## Model

In [None]:
Markdown('Model used: **{}**'.format(model_name))

In [None]:
Markdown('Number of features in model: **{}**'.format(len(features_used)))

In [None]:
ets_ols_models = ['empWt',
                  'empWtBalanced',
                  'empWtDropNeg',
                  'empWtStep',
                  'empWtLasso',
                  'empWtNNLS',
                  'empWtDropNegLasso',
                  'empWtLassoBest']

ets_lasso_models = ['lassoWtLasso',
                    'lassoWtLassoBest']

In [None]:
display(Markdown('### Model summary'))

In [None]:
# we first just show a summary of that model
summary_file = join(output_dir, '{}_Rmodel_summary.txt'.format(experiment_id))
with open(summary_file, 'r') as summf:
    model_summary = summf.read()

In [None]:
print(model_summary)

### Standardized and Relative Regression Coefficients (Betas)

The relative coefficients show the relative contribution of each feature. Their primary purpose is to identify whether any one feature has a disproportionate effect on the final score. Each relative coefficient is computed as the standardized coefficient divided by the sum of the absolute values of all standardized coefficients.

Negative standardized coefficients are highlighted in <span style="color: red">red</span>.

**Note**: If the model contains negative coefficients, the relative values will not sum to one and should be interpreted with caution.
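
As a minimal sketch of the computation described above, the snippet below divides each standardized coefficient by the sum of the absolute values of all standardized coefficients. The function name `relative_coefficients`, the feature names, and the coefficient values are hypothetical and chosen only for illustration; this is not necessarily how the values in the betas file were produced.

```python
def relative_coefficients(standardized):
    """Divide each standardized coefficient by the sum of absolute values.

    `standardized` maps feature names to standardized coefficients.
    """
    total = sum(abs(value) for value in standardized.values())
    if total == 0:
        raise ValueError('all standardized coefficients are zero')
    return {feature: value / total for feature, value in standardized.items()}


# Hypothetical coefficients, made up for illustration only
all_positive = {'grammar': 0.4, 'vocabulary': 0.4, 'length': 0.2}
with_negative = {'grammar': 0.5, 'vocabulary': 0.3, 'length': -0.2}

relative_positive = relative_coefficients(all_positive)
relative_mixed = relative_coefficients(with_negative)

# With only positive coefficients the relative values sum to one;
# with a negative coefficient the sum falls below one.
print(sum(relative_positive.values()))
print(sum(relative_mixed.values()))
```

In the second case the denominator still counts the negative coefficient by its absolute value, which is why the signed relative values no longer add up to one.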

In [None]:
markdown_str = """
**Note**: The coefficients were estimated using LASSO regression. Unlike OLS (standard) linear regression, LASSO estimates are computed by an iterative optimization routine, so the exact values may differ slightly across systems. """

if model_name in ets_lasso_models:
    display(Markdown(markdown_str))

In [None]:
df_betas = pd.read_csv(join(output_dir, '{}_betas.csv'.format(experiment_id)))
df_betas.sort_values('feature', inplace=True)
display(HTML(df_betas.to_html(classes=['sortable'], 
                              index=False, 
                              escape=False,
                              float_format=float_format_func,
                              formatters={'standardized': color_highlighter})))

Here are the same values, shown graphically.

In [None]:
df_betas_sorted = df_betas.sort_values('standardized', ascending=False)
df_betas_sorted.reset_index(drop=True, inplace=True)
fig = plt.figure()
fig.set_size_inches(8, 3)
fig.subplots_adjust(bottom=0.5)
grey_colors = sns.color_palette('Greys', len(features_used))[::-1]
with sns.axes_style('whitegrid'):
    ax1=fig.add_subplot(121)
    sns.barplot(x="feature", y="standardized", data=df_betas_sorted,
                order=df_betas_sorted['feature'].values,
                palette=sns.color_palette("Greys", 1), ax=ax1)
    ax1.set_xticklabels(df_betas_sorted['feature'].values, rotation=90)
    ax1.set_title('Values of standardized coefficients')
    ax1.set_xlabel('')
    ax1.set_ylabel('')
    # no pie chart if we have more than 15 features
    if len(features_used) <= 15:
        ax2 = fig.add_subplot(133, aspect='equal')
        ax2.pie(abs(df_betas_sorted['relative'].values), colors=grey_colors,
                labels=df_betas_sorted['feature'].values)
        ax2.set_title('Proportional contribution of each feature')
    else:
        fig.set_size_inches(0.35*len(features_used), 3)
plt.savefig(join(figure_dir, '{}_betas.svg'.format(experiment_id)))

In [None]:
if model_name in ets_ols_models:
    display(Markdown('### Model diagnostics'))
    display(Markdown("These are standard diagnostic plots for the main model. All values are computed on the training set."))

In [None]:
if model_name in ets_ols_models:
    imgfile = join(figure_dir, '{}_Rmodel_diagnostics.svg'.format(experiment_id))
    display(SVG(imgfile))