### Assignment

Get familiar with the structure of the dataset, which stores the outcomes of rolling-stock derailments and wrecks that occurred outside turnouts and were caused by a rolling-stock malfunction.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_absolute_error
from scipy import stats
import statsmodels.api as sm

df = pd.read_excel("WO.xlsx")
df.head()

Unnamed: 0,date,length,commonlength,maxder,dcar,speed,weight,load,curve,profile,type
0,2013-01-03,68,70,61,1,6,4875,0.705669,0.005,0.0,2
1,2013-01-03,43,45,2,1,30,3955,0.999663,0.005,,2
2,2013-01-10,70,72,65,4,10,1658,0.009938,,,2
3,2013-01-10,56,58,44,10,29,2651,0.352743,0.003333,-0.0164,2
4,2013-01-12,68,71,23,1,5,2522,0.204177,0.003333,,2


Load the dataset into any mathematical package of the student's choice.

Using Poisson regression, build at least 12 dependencies of the mean number of rolling-stock units involved in a derailment on various movement factors. For each dependency, report the adjusted coefficient of determination, the mean absolute error, and the mean relative error for two forecasts (the mean and the value with the highest probability). Also report the AIC and the likelihood ratio (against the trivial model that contains only a constant).

Identify the best of the constructed dependencies and explain why, in the student's view, it is the best.

### Solution

Let us write a function that applies Poisson regression using `PoissonRegressor` from `sklearn.linear_model`. The method consists in maximizing the log-likelihood function $\overline{L}\left( x_1, \dots, x_n, \theta_1, \dots, \theta_p, y_1, \dots, y_n \right)$, which is built from the Poisson probability of a single observation and the likelihood $L$ of the whole sample:
$$
P(Y = y \mid X = x, \theta_1, \dots, \theta_p) = \frac{f(x, \theta_1, \dots, \theta_p)^{y}\exp(-f(x, \theta_1, \dots, \theta_p))}{y!}
$$
$$
L\left( x_1, \dots, x_n, \theta_1, \dots, \theta_p, y_1, \dots, y_n \right) = \prod\limits_{k=1}^n\frac{f(x_k, \theta_1, \dots, \theta_p)^{y_k}\exp(-f(x_k, \theta_1, \dots, \theta_p))}{y_k!}
$$
$$
\begin{aligned}
\overline{L}\left( x_1, \dots, x_n, \theta_1, \dots, \theta_p, y_1, \dots, y_n \right) = {} & \sum\limits_{k=1}^{n}\ln\left( \frac{1}{y_k!} \right) + \sum\limits_{k=1}^{n} y_k \cdot \ln{f\left( x_k, \theta_1, \dots, \theta_p \right)} \\
& - \sum\limits_{k=1}^{n} f\left( x_k, \theta_1, \dots, \theta_p \right) \rightarrow \max\limits_{\theta_1, \dots, \theta_p}
\end{aligned}
$$

Here $y_k$, $k=1,\dots,n$, are the scalar observations (in our case the table column `dcar` minus 1); $x_k$ are the vector observations, and $y$ is assumed to depend on $x$; $\theta_1, \dots, \theta_p$ are the model parameters; and $f(x, \theta_1, \dots, \theta_p) = \exp(\cdot)$ is a function determined by the vector of unknown parameters $\theta_1, \dots, \theta_p$, with a linear combination of the features plus an intercept as the argument of the exponential.
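To make the objective concrete, here is a minimal sketch of the standard Poisson log-likelihood for the log-linear case $f(x, \theta) = \exp(\theta_0 + \theta^\top x)$. The helper name `poisson_log_likelihood` and the toy numbers are illustrative assumptions and are not part of the lab code.

```python
import numpy as np
from math import lgamma

def poisson_log_likelihood(theta0, theta, X, y):
    """Log-likelihood of a log-linear Poisson model (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rate = np.exp(theta0 + X @ np.asarray(theta, dtype=float))  # f(x_k, theta)
    log_factorial = np.array([lgamma(v + 1.0) for v in y])      # ln(y_k!)
    return float(np.sum(-log_factorial + y * np.log(rate) - rate))

# Toy data, made up for illustration only
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([0, 1, 3])

ll_flat = poisson_log_likelihood(0.0, [0.0], X_toy, y_toy)   # rate equals 1 everywhere
ll_slope = poisson_log_likelihood(0.0, [0.5], X_toy, y_toy)  # rate grows with x
print(ll_flat, ll_slope)
```

On this toy sample the parameters with a positive slope give the larger log-likelihood, which is the direction the maximization would move in.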

In [None]:
def solve_poisson(df, target_col='dcar', feature_cols=None, alpha=0):
    """Fit a Poisson regression of (target - 1) on the features and collect quality metrics."""
    if feature_cols is None:
        feature_cols = [col for col in df.columns if col != target_col]
    # Shift the response so that it starts at 0, as a Poisson count does
    y = df[target_col] - 1
    X = df[feature_cols]
    n, p = X.shape

    # statsmodels fits provide the log-likelihoods: the full model and
    # the trivial model that contains only a constant
    poisson_model_sm = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
    null_model = sm.GLM(y, np.ones((n, 1)), family=sm.families.Poisson()).fit()

    # The scikit-learn fit provides the coefficients and the forecasts
    model = PoissonRegressor(alpha=alpha, max_iter=1000)
    model.fit(X, y)
    y_pred_mean = model.predict(X)                   # forecast as the mean
    y_pred_mode = np.floor(y_pred_mean).astype(int)  # forecast as the most probable value

    y_np = y.to_numpy()
    ss_tot = np.sum((y_np - y_np.mean()) ** 2)

    def adjusted_r2(y_pred):
        r2 = 1 - np.sum((y_np - y_pred) ** 2) / ss_tot
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    def mean_relative_error(y_pred):
        # Observations with a true value of 0 contribute 0 (avoids division by zero)
        nonzero = y_np != 0
        rel = np.zeros(len(y_np))
        rel[nonzero] = np.abs((y_np[nonzero] - y_pred[nonzero]) / y_np[nonzero])
        return rel.mean()

    terms = " + ".join(f"{coef:.3f}*{name}" for coef, name in zip(model.coef_, feature_cols))
    formula = f"exp({terms} + {model.intercept_:.3f})"

    return {
        'names': model.feature_names_in_,
        'intercept': model.intercept_,
        'params': model.coef_,
        'formula': formula,
        'y': df[target_col].to_numpy(),  # original (unshifted) target values
        'y_pred_mean': y_pred_mean,
        'y_pred_mode': y_pred_mode,
        'adj_r2_mean': adjusted_r2(y_pred_mean),
        'adj_r2_mode': adjusted_r2(y_pred_mode),
        'mae_mean': mean_absolute_error(y, y_pred_mean),
        'mae_mode': mean_absolute_error(y, y_pred_mode),
        'mre_mean': mean_relative_error(y_pred_mean),
        'mre_mode': mean_relative_error(y_pred_mode),
        'aic': poisson_model_sm.aic,
        # Likelihood-ratio statistic against the constant-only model
        'likelihood_ratio': -2 * (null_model.llf - poisson_model_sm.llf),
    }
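The two forecasts in `solve_poisson` differ only by rounding: for a Poisson distribution with a non-integer mean $\lambda$, the most probable value is $\lfloor \lambda \rfloor$, which is what `np.floor(y_pred_mean)` computes. The sketch below checks this by brute force. The helper names are hypothetical, and the three means are fitted values copied from the printed `speed` model output.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) for mean lam, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_mode_by_search(lam, k_max=100):
    """Most probable count found by scanning k = 0..k_max (hypothetical helper)."""
    return max(range(k_max + 1), key=lambda k: poisson_pmf(k, lam))

means = [1.17918121, 2.4572517, 8.10236726]
modes = [poisson_mode_by_search(lam) for lam in means]
print(modes)  # [1, 2, 8]
```

The brute-force modes agree with `math.floor` of each mean, so the mode forecast is never above the mean forecast.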

In [3]:
data = {
    'Зависимость': [],
    'Прогноз': [],
    'R^2_adj': [],
    'Δ': [],
    '𝛿': [],
    'AIC': [],
    'Отношение правдоподобия': []
}

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

result_table = pd.DataFrame(data)

In [4]:
# Test 1
dd = df[['speed', 'dcar']].copy().dropna()
res_1 = solve_poisson(dd)
ans_1 = "dcar = "
for i in range(len(res_1['params'])):
    ans_1 += f"{res_1['params'][i]:.4f}*{res_1['names'][i]} + "
ans_1 += str(format(res_1['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_1, "Среднее", res_1['adj_r2_mean'], res_1['mae_mean'], res_1['mre_mean'], res_1['aic'], res_1['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_1, "Макс. вер-ть", res_1['adj_r2_mode'], res_1['mae_mode'], res_1['mre_mode'], res_1['aic'], res_1['likelihood_ratio']]
res_1

{'names': array(['speed'], dtype=object),
 'intercept': -0.01873550186240651,
 'params': array([0.03059263]),
 'formula': 'exp(0.031*speed + -0.019)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([1.17918121, 2.4572517 , 1.33267852, 2.38321614, 1.14365317,
        2.10871838, 8.10236726, 1.33267852, 1.75509775, 1.98356373,
        1.10919556, 1.60118927, 7.62148326, 1.86583714, 1.04336364,
        1.2925257 , 1.65093085, 2.38321614, 1.33267852, 1.60118927,
        8.35407052, 2.17422655, 1.86583714, 1.46077738, 1.70221768,
        1.75509775, 1.37407871, 1.60118927, 3.04405946, 1.50615701,
        1.50615701, 1.50615701, 1.55294638, 1.75509775, 6.95313823,
        1.14365317, 1.65093085, 1.75509775, 1.55294638, 1.50615701]),
 'y_pred_mode': array([1, 2, 1, 2, 1, 2, 8, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 2, 1, 1, 8

In [5]:
# Test 2
dd = df[[ 'length', 'dcar']].copy().dropna()
res_2 = solve_poisson(dd)
ans_2 = "dcar = "
for i in range(len(res_2['params'])):
    ans_2 += f"{res_2['params'][i]:.4f}*{res_2['names'][i]} + "
ans_2 += str(format(res_2['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_2, "Среднее", res_2['adj_r2_mean'], res_2['mae_mean'], res_2['mre_mean'], res_2['aic'], res_2['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_2, "Макс. вер-ть", res_2['adj_r2_mode'], res_2['mae_mode'], res_2['mre_mode'], res_2['aic'], res_2['likelihood_ratio']]
res_2

{'names': array(['length'], dtype=object),
 'intercept': 0.6732042383462435,
 'params': array([0.00234544]),
 'formula': 'exp(0.002*length + 0.673)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([2.29950633, 2.16854915, 2.3103184 , 2.23568826, 2.29950633,
        2.25147478, 2.3103184 , 2.25676169, 2.28874486, 2.47873583,
        2.3103184 , 2.3762668 , 2.28338303, 2.31574348, 2.30490602,
        2.25676169, 2.31574348, 2.23568826, 2.28338303, 2.16854915,
        2.3103184 , 2.45558962, 2.01174826, 2.35407744, 2.28338303,
        2.10836553, 2.19928216, 2.3103184 , 2.49623854, 2.23045072,
        2.29950633, 2.17364133, 2.18898973, 2.18386159, 2.26737278,
        2.29411929, 2.11331639, 2.25676169, 2.45558962, 2.25676169]),
 'y_pred_mode': array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

In [6]:
# Test 3
dd = df[['maxder', 'dcar']].copy().dropna()
res_3 = solve_poisson(dd)
ans_3 = "dcar = "
for i in range(len(res_3['params'])):
    ans_3 += f"{res_3['params'][i]:.4f}*{res_3['names'][i]} + "
ans_3 += str(format(res_3['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_3, "Среднее", res_3['adj_r2_mean'], res_3['mae_mean'], res_3['mre_mean'], res_3['aic'], res_3['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_3, "Макс. вер-ть", res_3['adj_r2_mode'], res_3['mae_mode'], res_3['mre_mode'], res_3['aic'], res_3['likelihood_ratio']]
res_3

{'names': array(['maxder'], dtype=object),
 'intercept': 0.11507940518441316,
 'params': array([0.01791267]),
 'formula': 'exp(0.018*maxder + 0.115)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([3.34595595, 1.16288589, 3.59449363, 2.46757775, 1.69396321,
        3.2282077 , 1.85268185, 1.34206052, 1.95496494, 6.85007324,
        1.14224095, 3.0049965 , 1.24926508, 1.81979083, 1.75575019,
        2.69878148, 1.16288589, 2.25618118, 1.78748373, 2.46757775,
        4.22330156, 1.60533564, 1.41615316, 4.22330156, 1.60533564,
        1.9902991 , 1.18390397, 3.40643098, 1.14224095, 1.88616735,
        3.0049965 , 2.46757775, 2.65086948, 2.10017978, 1.18390397,
        1.16288589, 1.39101191, 2.69878148, 2.89924702, 1.92025806]),
 'y_pred_mode': array([3, 1, 3, 2, 1, 3, 1, 1, 1, 6, 1, 3, 1, 1, 1, 2, 1, 2, 1, 2, 4

In [7]:
# Test 4
dd = df[['weight', 'dcar']].copy().dropna()
res_4 = solve_poisson(dd)
ans_4 = "dcar = "
for i in range(len(res_4['params'])):
    ans_4 += f"{res_4['params'][i]:.4f}*{res_4['names'][i]} + "
ans_4 += str(format(res_4['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_4, "Среднее", res_4['adj_r2_mean'], res_4['mae_mean'], res_4['mre_mean'], res_4['aic'], res_4['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_4, "Макс. вер-ть", res_4['adj_r2_mode'], res_4['mae_mode'], res_4['mre_mode'], res_4['aic'], res_4['likelihood_ratio']]
res_4

{'names': array(['weight'], dtype=object),
 'intercept': 1.0740936357684274,
 'params': array([-6.56538136e-05]),
 'formula': 'exp(-0.000*weight + 1.074)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([2.12555149, 2.25789464, 2.62541666, 2.45971472, 2.48063535,
        2.33401037, 2.62558904, 2.43497061, 1.99152905, 2.5058435 ,
        2.22026249, 2.01705707, 1.97628973, 2.17082259, 2.04479092,
        2.06733375, 2.61046334, 2.6435779 , 2.33110069, 2.46133015,
        2.62593382, 1.62361265, 2.82222184, 1.84718608, 1.97460369,
        2.77154246, 2.16968271, 1.85532939, 2.06001739, 2.0925966 ,
        2.0956213 , 2.33799793, 2.22537026, 2.34399204, 2.00754479,
        1.94118328, 2.78869981, 2.37170121, 2.4285844 , 2.23239431]),
 'y_pred_mode': array([2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2,

In [8]:
# Test 5
dd = df[[ 'length', 'speed', 'dcar']].copy().dropna()
res_5 = solve_poisson(dd)
ans_5 = "dcar = "
for i in range(len(res_5['params'])):
    ans_5 += f"{res_5['params'][i]:.4f}*{res_5['names'][i]} + "
ans_5 += str(format(res_5['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_5, "Среднее", res_5['adj_r2_mean'], res_5['mae_mean'], res_5['mre_mean'], res_5['aic'], res_5['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_5, "Макс. вер-ть", res_5['adj_r2_mode'], res_5['mae_mode'], res_5['mre_mode'], res_5['aic'], res_5['likelihood_ratio']]
res_5

{'names': array(['length', 'speed'], dtype=object),
 'intercept': 0.11909401639741833,
 'params': array([-0.00228861,  0.03084733]),
 'formula': 'exp(-0.002*length + 0.031*speed + 0.119)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([1.16015353, 2.57565451, 1.30651654, 2.4242068 , 1.12491224,
        2.12814181, 8.06357614, 1.33676242, 1.74045212, 1.82161126,
        1.08576028, 1.52956765, 7.66838126, 1.83014742, 1.02313803,
        1.29615638, 1.61770291, 2.4242068 , 1.32155295, 1.67236871,
        8.3161922 , 2.01660997, 2.09952952, 1.40719387, 1.69145008,
        1.88560054, 1.41378805, 1.57215685, 2.78631031, 1.52971736,
        1.48487564, 1.56871644, 1.60679132, 1.82196788, 7.03874144,
        1.12748967, 1.76873231, 1.76451628, 1.43633855, 1.51231248]),
 'y_pred_mode': array([1, 2, 1, 2, 1, 2, 8, 1,

In [9]:
# Test 6
dd = df[['length', 'maxder', 'dcar']].copy().dropna()
res_6 = solve_poisson(dd)
ans_6 = "dcar = "
for i in range(len(res_6['params'])):
    ans_6 += f"{res_6['params'][i]:.4f}*{res_6['names'][i]} + "
ans_6 += str(format(res_6['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_6, "Среднее", res_6['adj_r2_mean'], res_6['mae_mean'], res_6['mre_mean'], res_6['aic'], res_6['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_6, "Макс. вер-ть", res_6['adj_r2_mode'], res_6['mae_mode'], res_6['mre_mode'], res_6['aic'], res_6['likelihood_ratio']]
res_6

{'names': array(['length', 'maxder'], dtype=object),
 'intercept': 0.5561656082001257,
 'params': array([-0.00907922,  0.02118982]),
 'formula': 'exp(-0.009*length + 0.021*maxder + 0.556)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([3.4258777 , 1.23138407, 3.6618136 , 2.66467312, 1.53134208,
        3.5633141 , 1.67185734, 1.25020885, 1.84747812, 5.97996958,
        0.94347071, 2.65676629, 1.09763212, 1.62201005, 1.58319499,
        2.85683285, 0.95496617, 2.39679467, 1.67690804, 2.99849726,
        4.43118449, 1.11444543, 2.07873982, 4.12074169, 1.47670422,
        2.59291842, 1.19107147, 3.4362786 , 0.69920996, 1.95680552,
        3.01686671, 2.97139646, 3.14732742, 2.41129804, 1.05846891,
        0.99028502, 1.68187361, 2.85683285, 2.24256653, 1.91000723]),
 'y_pred_mode': array([3, 1, 3, 2, 1, 3, 1, 1

In [10]:
# Test 7
dd = df[['length', 'weight', 'dcar']].copy().dropna()
res_7 = solve_poisson(dd)
ans_7 = "dcar = "
for i in range(len(res_7['params'])):
    ans_7 += f"{res_7['params'][i]:.4f}*{res_7['names'][i]} + "
ans_7 += str(format(res_7['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_7, "Среднее", res_7['adj_r2_mean'], res_7['mae_mean'], res_7['mre_mean'], res_7['aic'], res_7['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_7, "Макс. вер-ть", res_7['adj_r2_mode'], res_7['mae_mode'], res_7['mre_mode'], res_7['aic'], res_7['likelihood_ratio']]
res_7

{'names': array(['length', 'weight'], dtype=object),
 'intercept': 0.7758856233426799,
 'params': array([ 6.25430776e-03, -9.12515008e-05]),
 'formula': 'exp(0.006*length + -0.000*weight + 0.776)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([2.13043777, 1.98163284, 2.89328397, 2.42109309, 2.64069533,
        2.2934841 , 2.893548  , 2.44778523, 1.92186752, 3.27144847,
        2.29200527, 2.16207649, 1.8896028 , 2.23531496, 2.03143696,
        1.94973533, 2.88841412, 2.67624121, 2.37705939, 2.23407136,
        2.89407613, 1.74550375, 2.21190895, 1.86595848, 1.88736255,
        2.44429482, 1.94654763, 1.78577812, 2.53882478, 1.92187953,
        2.08885728, 2.0930661 , 1.9912684 , 2.12696791, 1.89536765,
        1.86631072, 2.48081867, 2.35983518, 3.05472218, 2.16941575]),
 'y_pred_mode': array([2, 1, 2, 2, 2,

In [11]:
# Test 8
dd = df[['speed', 'weight', 'dcar']].copy().dropna()
res_8 = solve_poisson(dd)
ans_8 = "dcar = "
for i in range(len(res_8['params'])):
    ans_8 += f"{res_8['params'][i]:.4f}*{res_8['names'][i]} + "
ans_8 += str(format(res_8['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_8, "Среднее", res_8['adj_r2_mean'], res_8['mae_mean'], res_8['mre_mean'], res_8['aic'], res_8['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_8, "Макс. вер-ть", res_8['adj_r2_mode'], res_8['mae_mode'], res_8['mre_mode'], res_8['aic'], res_8['likelihood_ratio']]
res_8

{'names': array(['speed', 'weight'], dtype=object),
 'intercept': 0.10672567713156512,
 'params': array([ 3.03119708e-02, -3.05693916e-05]),
 'formula': 'exp(0.030*speed + -0.000*weight + 0.107)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([1.14978126, 2.44778334, 1.43212783, 2.47127293, 1.19864233,
        2.13626835, 8.56426384, 1.38278315, 1.65417518, 2.07822193,
        1.10433777, 1.51937194, 7.06180355, 1.82953906, 1.00028794,
        1.24306086, 1.76594435, 2.55562948, 1.35499825, 1.66692972,
        8.82837784, 1.85964372, 2.06731345, 1.33162254, 1.59842139,
        1.92936092, 1.35080376, 1.46138149, 2.89985638, 1.45468876,
        1.45566741, 1.5317696 , 1.54302914, 1.78456686, 6.49524682,
        1.06930922, 1.82109579, 1.79435864, 1.60710644, 1.4991568 ]),
 'y_pred_mode': array([1, 2, 1, 2, 1, 

In [12]:
# Test 9
dd = df[['length', 'speed', 'weight', 'dcar']].copy().dropna()
res_9 = solve_poisson(dd)
ans_9 = "dcar = "
for i in range(len(res_9['params'])):
    ans_9 += f"{res_9['params'][i]:.4f}*{res_9['names'][i]} + "
ans_9 += str(format(res_9['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_9, "Среднее", res_9['adj_r2_mean'], res_9['mae_mean'], res_9['mre_mean'], res_9['aic'], res_9['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_9, "Макс. вер-ть", res_9['adj_r2_mode'], res_9['mae_mode'], res_9['mre_mode'], res_9['aic'], res_9['likelihood_ratio']]
res_9

{'names': array(['length', 'speed', 'weight'], dtype=object),
 'intercept': 0.15787129681392334,
 'params': array([-1.08435260e-03,  3.04726240e-02, -2.71961692e-05]),
 'formula': 'exp(-0.001*length + 0.030*speed + -0.000*weight + 0.158)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([1.14383853, 2.50393128, 1.40719957, 2.48123546, 1.1828298 ,
        2.14230705, 8.49530469, 1.3788518 , 1.65818554, 1.98556967,
        1.09344585, 1.49515892, 7.14428197, 1.81658073, 0.99537986,
        1.24979364, 1.73578464, 2.55644596, 1.34685347, 1.69381103,
        8.75863996, 1.82557401, 2.1613224 , 1.32142298, 1.60448075,
        1.9750265 , 1.37144916, 1.46319735, 2.79579407, 1.47078258,
        1.45106295, 1.55839561, 1.56897868, 1.81287123, 6.58403238,
        1.06974079, 1.86099009, 1.79427148, 1.54263147, 1.5025461

In [13]:
# Test 10
dd = df[['commonlength', 'maxder', 'load', 'dcar']].copy().dropna()
res_10 = solve_poisson(dd)
ans_10 = "dcar = "
for i in range(len(res_10['params'])):
    ans_10 += f"{res_10['params'][i]:.4f}*{res_10['names'][i]} + "
ans_10 += str(format(res_10['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_10, "Среднее", res_10['adj_r2_mean'], res_10['mae_mean'], res_10['mre_mean'], res_10['aic'], res_10['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_10, "Макс. вер-ть", res_10['adj_r2_mode'], res_10['mae_mode'], res_10['mre_mode'], res_10['aic'], res_10['likelihood_ratio']]
res_10

{'names': array(['commonlength', 'maxder', 'load'], dtype=object),
 'intercept': 0.6778102147026461,
 'params': array([-0.00758782,  0.01957089, -0.23676341]),
 'formula': 'exp(-0.008*commonlength + 0.020*maxder + -0.237*load + 0.678)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([3.23300006, 1.14891309, 4.06024037, 2.76024416, 1.71748952,
        3.48305564, 1.96829582, 1.37930464, 1.694275  , 6.54158374,
        1.02383825, 2.60742286, 1.05114166, 1.66719821, 1.54572819,
        2.56580333, 1.15309824, 2.63680529, 1.74872354, 2.90272431,
        4.77002544, 1.07901836, 2.09333817, 3.60329961, 1.36071116,
        2.83034333, 1.10673164, 2.95445802, 0.81358561, 1.75980399,
        2.84364181, 2.70256498, 2.77167416, 2.24750032, 1.00429105,
        0.93865267, 1.90935494, 2.8483385 , 2.58233109, 1.89143542])

In [14]:
# Test 11
dd = df[['length', 'maxder', 'speed', 'weight', 'dcar']].copy().dropna()
res_11 = solve_poisson(dd)
ans_11 = "dcar = "
for i in range(len(res_11['params'])):
    ans_11 += f"{res_11['params'][i]:.4f}*{res_11['names'][i]} + "
ans_11 += str(format(res_11['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_11, "Среднее", res_11['adj_r2_mean'], res_11['mae_mean'], res_11['mre_mean'], res_11['aic'], res_11['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_11, "Макс. вер-ть", res_11['adj_r2_mode'], res_11['mae_mode'], res_11['mre_mode'], res_11['aic'], res_11['likelihood_ratio']]
res_11

{'names': array(['length', 'maxder', 'speed', 'weight'], dtype=object),
 'intercept': -0.3414926391029168,
 'params': array([-0.0251866 ,  0.02746679,  0.03749348,  0.00016347]),
 'formula': 'exp(-0.025*length + 0.027*maxder + 0.037*speed + 0.000*weight + -0.341)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([ 1.90251469,  1.49443868,  1.38646489,  2.65719263,  0.43924273,
         3.64863058,  4.58338552,  0.47494854,  1.68055606,  3.20093762,
         0.28975017,  1.8797679 ,  5.34656773,  1.15450733,  0.65420108,
         2.00711148,  0.31595327,  1.93565294,  0.72435899,  2.26063549,
        16.82855228,  1.26163136,  1.85324795,  3.89749831,  1.25349341,
         1.83163308,  0.71530144,  3.7952642 ,  0.52409187,  1.5380555 ,
         2.25609047,  2.32441972,  2.82413382,  2.06884011,  4.56338255,
    

In [15]:
# Test 12
dd = df[['length', 'commonlength', 'maxder', 'speed', 'weight', 'load', 'dcar']].copy().dropna()
res_12 = solve_poisson(dd)
ans_12 = "dcar = "
for i in range(len(res_12['params'])):
    ans_12 += f"{res_12['params'][i]:.4f}*{res_12['names'][i]} + "
ans_12 += str(format(res_12['intercept'], '.4f'))
result_table.loc[len(result_table)] = [ans_12, "Среднее", res_12['adj_r2_mean'], res_12['mae_mean'], res_12['mre_mean'], res_12['aic'], res_12['likelihood_ratio']]
result_table.loc[len(result_table)] = [ans_12, "Макс. вер-ть", res_12['adj_r2_mode'], res_12['mae_mode'], res_12['mre_mode'], res_12['aic'], res_12['likelihood_ratio']]
res_12

{'names': array(['length', 'commonlength', 'maxder', 'speed', 'weight', 'load'],
       dtype=object),
 'intercept': 0.005815406450666909,
 'params': array([-3.36318859e-01,  2.85044153e-01,  2.61927686e-02,  3.44769784e-02,
         6.60482798e-04, -2.30045159e+00]),
 'formula': 'exp(-0.336*length + 0.285*commonlength + 0.026*maxder + 0.034*speed + 0.001*weight + -2.300*load + 0.006)',
 'y': array([ 1,  1,  4, 10,  1,  4,  2,  1,  5,  2,  1,  3,  6,  3,  2,  1,  1,
         2,  3,  1, 22,  7,  3,  1,  3,  2,  3,  4,  1,  1,  4,  9,  2,  1,
         3,  1,  2,  6,  1,  1], dtype=int64),
 'y_pred_mean': array([ 1.63287163,  0.79478747,  1.11203634,  2.21733371,  0.51950647,
         2.86893667,  3.22527419,  0.43472178,  2.47702068,  1.53453189,
         0.27069932,  1.76967781,  3.89614566,  1.00283279,  0.60499376,
         1.51483063,  0.46950894,  3.2024609 ,  1.12482365,  1.71372317,
        19.68943521,  2.55356766,  1.6869133 ,  5.20267259,  1.83918253,
         2.46143272,  0.59

In [None]:
numeric_cols = ["R^2_adj", "Δ", "𝛿", "AIC", "Отношение правдоподобия"]
styled_results = (
    result_table.style
    .format("{:.4f}", subset=numeric_cols)
    .highlight_max(color="grey", subset=["R^2_adj"])
)

styled_results

Unnamed: 0,Зависимость,Прогноз,R^2_adj,Δ,𝛿,AIC,Отношение правдоподобия
0,dcar = 0.0306*speed + -0.0187,Среднее,0.2331,2.1391,0.5354,194.1376,42.5379
1,dcar = 0.0306*speed + -0.0187,Макс. вер-ть,0.2135,2.075,0.5055,194.1376,42.5379
2,dcar = 0.0023*length + 0.6732,Среднее,-0.0258,2.2412,0.3758,236.5128,0.1627
3,dcar = 0.0023*length + 0.6732,Макс. вер-ть,-0.0319,2.125,0.3283,236.5128,0.1627
4,dcar = 0.0179*maxder + 0.1151,Среднее,0.0409,2.2168,0.4708,217.6235,19.052
5,dcar = 0.0179*maxder + 0.1151,Макс. вер-ть,0.0387,2.075,0.3955,217.6235,19.052
6,dcar = -0.0001*weight + 1.0741,Среднее,-0.0184,2.2864,0.427,235.2558,1.4197
7,dcar = -0.0001*weight + 1.0741,Макс. вер-ть,-0.0673,2.2,0.3646,235.2558,1.4197
8,dcar = -0.0023*length + 0.0308*speed + 0.1191,Среднее,0.2109,2.1446,0.5413,196.0045,42.6709
9,dcar = -0.0023*length + 0.0308*speed + 0.1191,Макс. вер-ть,0.1865,2.05,0.5055,196.0045,42.6709
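
One way to read the likelihood-ratio column is to compare each statistic with a chi-square distribution. The sketch below does this for the rows shown in the table above. It assumes the usual large-sample approximation (degrees of freedom equal to the number of features), which is only rough for a sample of this size. The `rows` list is copied by hand from the table.

```python
from scipy import stats

# (features, likelihood-ratio statistic, number of features), copied from the table
rows = [
    ("speed", 42.5379, 1),
    ("length", 0.1627, 1),
    ("maxder", 19.0520, 1),
    ("weight", 1.4197, 1),
    ("length + speed", 42.6709, 2),
]

# Upper-tail chi-square probability of a statistic at least this large
p_values = {name: stats.chi2.sf(lr, dof) for name, lr, dof in rows}
for name, p_value in p_values.items():
    print(f"{name:15s} p = {p_value:.3g}")
```

Under this approximation the `speed` and `maxder` models differ clearly from the constant-only model, while `length` and `weight` alone do not.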


### Conclusion

This lab examined Poisson regression and used it to forecast the number of rolling-stock units of a freight train that derail. The best-performing model uses the features `length`, `commonlength`, `maxder`, `speed`, `weight`, and `load`. Its adjusted coefficient of determination $R^2_{adj}$ is $0.6207$ for the mean forecast and $0.5825$ for the forecast given by the value with the highest probability. Its AIC is the lowest among the models considered and its likelihood ratio is the highest, which also counts in its favor. The drawback of this model is its complexity: with so many features and such a small sample, there is a risk of overfitting.
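
AIC already penalizes the number of parameters, so it can serve as a rough check on the complexity concern. As a sketch, and assuming the reported AIC follows the usual definition $\mathrm{AIC} = 2k - 2\ln L$ with $k$ counting the intercept, the AIC of the constant-only model can be recovered from any row of the results table. The helper name is hypothetical, and the numbers are copied from the `speed` and `length` rows.

```python
def null_aic_from_row(aic, likelihood_ratio, n_features):
    """Recover the constant-only model's AIC from one table row (hypothetical helper)."""
    k = n_features + 1                     # coefficients plus the intercept
    llf = (2 * k - aic) / 2                # from AIC = 2k - 2*llf
    llf_null = llf - likelihood_ratio / 2  # from LR = 2*(llf - llf_null)
    return 2 * 1 - 2 * llf_null            # the trivial model has one parameter

null_from_speed = null_aic_from_row(194.1376, 42.5379, 1)
null_from_length = null_aic_from_row(236.5128, 0.1627, 1)
print(round(null_from_speed, 4), round(null_from_length, 4))
```

Both rows give the same baseline of about 234.68, which suggests the assumption matches how the table was produced. Against that baseline, the single-feature `length` and `weight` models (AIC 236.51 and 235.26) do not improve on the constant-only model.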