# Report Examples

Let's see how to create an HTML report for different models.

First, install the Insolver library.

In [None]:
!pip install insolver

## Import libraries and prepare data

In [2]:
# Import the key libraries this pipeline may need.

import numpy as np
import pandas as pd
from hyperopt import hp
from sklearn.metrics import mean_gamma_deviance, mean_squared_error

# InsolverDataFrame is a class built on the pandas DataFrame that adds extra properties and methods. We will look at these methods later.
from insolver import InsolverDataFrame 

# Insolver transformations handle data preprocessing, especially at inference time.
# After experimenting with a dataset, you can register your own transformations and reuse them anywhere. They are also very useful during model implementation.
from insolver.transforms import (
    InsolverTransform,
    TransformExp,
    TransformAge,
    TransformMapValues,
    TransformPolynomizer,
    TransformAgeGender,
)

# Wrappers give models a simple, uniform interface.
# Here we import GLM wrappers, which are widely used in insurance, GBM wrappers, which have become very popular in recent years, trivial models to use as baselines, and a random forest wrapper.

from insolver.wrappers import InsolverGLMWrapper, InsolverGBMWrapper, InsolverTrivialWrapper, InsolverRFWrapper
from insolver.model_tools import ModelMetricsCompare, deviance_gamma

In [3]:
# Custom transformations can be defined as callable classes
class TransformSocioCateg:
    def __init__(self, column_socio_categ):
        self.priority = 0
        self.column_socio_categ = column_socio_categ

    def __call__(self, df):
        # Keep only the first four characters of the category label
        df[self.column_socio_categ] = df[self.column_socio_categ].str.slice(0, 4)
        return df

In [4]:
# Custom experience function that will replace TransformExp._exp below
@staticmethod
def new_exp(exp, exp_max):
    # Missing or negative values stay missing
    if pd.isnull(exp) or exp < 0:
        return None
    # Rescale the raw value and cap it at exp_max
    exp = exp * 7 // 365
    return min(exp, exp_max)
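
The arithmetic in this function is easy to check by hand. The sketch below is a standalone version that uses only the standard library; `years_from_weeks` is a hypothetical name, and reading the `* 7 // 365` step as a weeks-to-years conversion is an assumption based on the formula alone, not on the dataset documentation.

```python
import math


def years_from_weeks(exp, exp_max):
    """Hypothetical standalone version of the capped conversion.

    Assumes `exp` counts weeks, so exp * 7 // 365 gives whole years.
    """
    is_missing = exp is None or (isinstance(exp, float) and math.isnan(exp))
    if is_missing or exp < 0:
        return None
    return min(exp * 7 // 365, exp_max)


for raw in (520, 3100, -5, float('nan')):
    print(raw, '->', years_from_weeks(raw, exp_max=57))
# 520 -> 9      (3640 days // 365)
# 3100 -> 57    (59 whole years, capped at exp_max)
# -5 -> None
# nan -> None
```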

In [5]:
# Load the data into a pandas DataFrame
pd.options.display.float_format = '{:.2f}'.format
df = pd.read_csv('freMPL-R.csv', low_memory=False)
df = df[df.Dataset.isin([5, 6, 7, 8, 9])]
df.dropna(how='all', axis=1, inplace=True)
df = df[df.ClaimAmount > 0]

# Convert the DataFrame to an InsolverDataFrame to get additional analytics and transformation methods.
InsDataFrame = InsolverDataFrame(df)

# Replace the experience function of TransformExp with new_exp defined above
TransformExp._exp = new_exp

# After that we can combine all transformations into one object
InsTransforms = InsolverTransform(InsDataFrame, [
    TransformSocioCateg('SocioCateg'),
    TransformAge('DrivAge', 18, 75),
    TransformExp('LicAge', 57),
    TransformMapValues('Gender', {'Male':0, 'Female':1}),
    TransformMapValues('MariStat', {'Other':0, 'Alone':1}),
    TransformAgeGender('DrivAge', 'Gender', 'Age_m', 'Age_f', age_default=18, gender_male=0, gender_female=1),
    TransformPolynomizer('Age_m'),
    TransformPolynomizer('Age_f'),
])

# Now we are ready to apply the transformations
InsTransforms.ins_transform()

# Classic train/validation/test split of the transformed data
train, valid, test = InsTransforms.split_frame(val_size=0.15, test_size=0.15, random_state=0, shuffle=True)

# Let's choose the features and the target
features = ['LicAge', 'Gender', 'MariStat', 'DrivAge', 'HasKmLimit', 'BonusMalus', 'RiskArea',
            'Age_m', 'Age_f', 'Age_m_2', 'Age_f_2']
target = 'ClaimAmount'

# Separate features, target and exposure for the train, validation and test sets
x_train, y_train = train[features], train[target]
x_valid, y_valid = valid[features], valid[target]
x_test, y_test = test[features], test[target]
offset_train = train['Exposure']
offset_valid = valid['Exposure']
offset_test = test['Exposure']
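
To make the 70/15/15 proportions concrete, here is a minimal sketch of a shuffled three-way split written with the standard library only. `three_way_split` is a hypothetical helper; it illustrates the idea behind `val_size=0.15, test_size=0.15` and is not a description of how `split_frame` is implemented inside Insolver.

```python
import random


def three_way_split(rows, val_size=0.15, test_size=0.15, seed=0):
    """Shuffle a copy of `rows` and cut it into train, validation and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # reproducible shuffle, like random_state
    n_val = round(len(rows) * val_size)
    n_test = round(len(rows) * test_size)
    n_train = len(rows) - n_val - n_test
    train_part = rows[:n_train]
    valid_part = rows[n_train:n_train + n_val]
    test_part = rows[n_train + n_val:]
    return train_part, valid_part, test_part


train_ids, valid_ids, test_ids = three_way_split(range(100))
print(len(train_ids), len(valid_ids), len(test_ids))  # 70 15 15
```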

## Report creation

Let's train some models and see how to create reports for them.

To create a report, we need the `insolver.report.Report` class.

In [6]:
from insolver.report import Report

**Random Forest**

In [7]:
irf = InsolverRFWrapper(backend='sklearn', task='reg')
irf.fit(x_train, y_train)
predict_rf_train = irf.predict(x_train)
predict_rf_test = irf.predict(x_test)

# Report takes the following parameters:
# model:             model instance
# task:              'reg' for regression and 'class' for classification
# X_train, y_train:  train dataset
# predicted_train:   model predictions for the train dataset
# X_test, y_test:    test dataset
# predicted_test:    model predictions for the test dataset

r = Report(model=irf,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_rf_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_rf_test),
           )

# To create the HTML files, use the `Report.to_html()` method.
# Its parameters are:
# path:        existing directory in which to save the report (default '.')
# report_name: name of the report directory to create (default 'report')

r.to_html(report_name='0_random_forest_report')

---

Let's check that the report has been created.

In [8]:
!ls -l | grep report

drwxr-xr-x  2 jovyan users     4096 Dec  3 11:04 0_random_forest_report


In [9]:
!ls -l $(ls | grep report | tail -n 1) | grep .html

-rw-r--r-- 1 jovyan users 5924577 Dec  3 11:04 profiling_report.html
-rw-r--r-- 1 jovyan users  220091 Dec  3 11:04 report.html


Now you can open `report.html` from the directory where it was created.
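
If you would rather stay in Python than shell out to `ls` and `grep`, the same check can be sketched with `pathlib`. `find_report_html` is a hypothetical helper, and the rule "directory name contains `report`" simply mirrors the `grep report` filter used above.

```python
from pathlib import Path


def find_report_html(base_dir='.'):
    """Return HTML files inside sub-directories whose name contains 'report'."""
    base = Path(base_dir)
    report_dirs = sorted(p for p in base.iterdir() if p.is_dir() and 'report' in p.name)
    return [html for d in report_dirs for html in sorted(d.glob('*.html'))]


for path in find_report_html('.'):
    print(path)
```

A path returned by this helper can then be opened in the default browser with the standard library, for example `webbrowser.open(path.resolve().as_uri())`.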

---

You can create reports for other models and dataframes in the same way.
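
Because every report is built with the same keyword arguments, the repeated call can be wrapped in a small helper. The sketch below is one possible shape, not part of Insolver: `save_report` and `DryRunReport` are hypothetical names, and `DryRunReport` is a stand-in that only exists so the example runs without Insolver installed. In the notebook you would pass the real `Report` class instead.

```python
def save_report(report_cls, model, name, train, test):
    """Build a regression report and write it to the directory `name`.

    `train` and `test` are (X, y, predictions) triples. `report_cls` is assumed
    to accept the same keyword arguments as Report does in this notebook.
    """
    x_train, y_train, pred_train = train
    x_test, y_test, pred_test = test
    report = report_cls(model=model, task='reg',
                        X_train=x_train, y_train=y_train, predicted_train=pred_train,
                        X_test=x_test, y_test=y_test, predicted_test=pred_test)
    report.to_html(report_name=name)
    return name


class DryRunReport:
    """Stand-in for a report class: records directory names instead of writing files."""
    created = []

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def to_html(self, report_name='report'):
        DryRunReport.created.append(report_name)


empty = ([], [], [])
names = [save_report(DryRunReport, None, f'{i}_{label}_report', empty, empty)
         for i, label in enumerate(['random_forest', 'glm_h2o'])]
print(names)  # ['0_random_forest_report', '1_glm_h2o_report']
```

With the real class, a call could look like `save_report(Report, irf, '0_random_forest_report', (x_train, y_train, pd.Series(predict_rf_train)), (x_test, y_test, pd.Series(predict_rf_test)))`.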

In [10]:
# GLM with the H2O backend
iglm = InsolverGLMWrapper(backend='h2o', family='gamma', link='log')
iglm.fit(x_train, y_train, sample_weight=offset_train, X_valid=x_valid, y_valid=y_valid, sample_weight_valid=offset_valid)
predict_glm_train = iglm.predict(x_train, sample_weight=offset_train)
predict_glm_test = iglm.predict(x_test, sample_weight=offset_test)

r = Report(model=iglm,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_glm_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_glm_test),
           )
r.to_html(report_name='1_glm_h2o_report')

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.11" 2021-04-20; OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04); OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
  Starting server from /opt/conda/lib/python3.9/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp0wvsnmzi
  JVM stdout: /tmp/tmp0wvsnmzi/h2o_jovyan_started_from_python.out
  JVM stderr: /tmp/tmp0wvsnmzi/h2o_jovyan_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


| Property | Value |
|---|---|
| H2O_cluster_uptime | 03 secs |
| H2O_cluster_timezone | Etc/UTC |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.32.0.3 |
| H2O_cluster_version_age | 11 months and 8 days !!! |
| H2O_cluster_name | H2O_from_python_jovyan_xk67ru |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 1.430 Gb |
| H2O_cluster_total_cores | 4 |
| H2O_cluster_allowed_cores | 4 |


Model Details
H2OGeneralizedLinearEstimator :  Generalized Linear Modeling
Model Key:  GLM_model_python_1638529468365_1


GLM Model: summary


| family | link | regularization | number_of_predictors_total | number_of_active_predictors | number_of_iterations | training_frame |
|---|---|---|---|---|---|---|
| gamma | log | Elastic Net (alpha = 0.5, lambda = 0.001825 ) | 11 | 10 | 8 | Key_Frame__upload_bed28a22283c84f2a5e80d70d1b24d4b.hex |




ModelMetricsRegressionGLM: glm
** Reported on train data. **

MSE: 86347495.37806311
RMSE: 9292.335302713904
MAE: 2350.2669145955892
RMSLE: 1.7468139514234649
R^2: -0.011339170162486756
Mean Residual Deviance: 2.0827971487219292
Null degrees of freedom: 10871
Residual degrees of freedom: 10861
Null deviance: 56280.753410690704
Residual deviance: 22644.170600904814
AIC: NaN

ModelMetricsRegressionGLM: glm
** Reported on validation data. **

MSE: 292795010.338008
RMSE: 17111.25390899241
MAE: 2565.6489323946876
RMSLE: 1.755816358282092
R^2: -0.003847731214389194
Mean Residual Deviance: 2.381531192097885
Null degrees of freedom: 2329
Residual degrees of freedom: 2319
Null deviance: 12098.930539986566
Residual deviance: 5548.967677588073
AIC: NaN

Scoring History: 


| timestamp | duration | iterations | negative_log_likelihood | objective | training_rmse | training_deviance | training_mae | training_r2 | validation_rmse | validation_deviance | validation_mae | validation_r2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-12-03 11:04:39 | 0.000 sec | 0 | 56280.75 | 5.18 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.213 sec | 1 | 39256.22 | 3.61 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.258 sec | 2 | 27770.77 | 2.55 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.292 sec | 3 | 23284.3 | 2.14 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.323 sec | 4 | 22673.04 | 2.09 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.364 sec | 5 | 22655.78 | 2.08 | | | | | | | | |
| 2021-12-03 11:04:39 | 0.399 sec | 6 | 22655.79 | 2.08 | | | | | | | | |
| 2021-12-03 11:04:40 | 0.500 sec | 7 | 22644.85 | 2.08 | | | | | | | | |
| 2021-12-03 11:04:40 | 0.548 sec | 8 | 22644.17 | 2.08 | 9292.34 | 2.08 | 2350.27 | -0.01 | 17111.25 | 2.38 | 2565.65 | -0.0 |



Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| Age_f_2 | 0.35 | 1.0 | 0.25 |
| Age_f | 0.27 | 0.76 | 0.19 |
| Age_m_2 | 0.2 | 0.57 | 0.15 |
| BonusMalus | 0.11 | 0.32 | 0.08 |
| MariStat | 0.11 | 0.31 | 0.08 |
| RiskArea | 0.09 | 0.26 | 0.07 |
| DrivAge | 0.09 | 0.26 | 0.06 |
| Gender | 0.08 | 0.23 | 0.06 |
| LicAge | 0.07 | 0.2 | 0.05 |
| HasKmLimit | 0.02 | 0.04 | 0.01 |
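
The "Mean Residual Deviance" values reported above are the residual deviance divided by the number of rows: 22644.17 / 10872 ≈ 2.08 on the training data and 5548.97 / 2330 ≈ 2.38 on the validation data, taking the row count as the null degrees of freedom plus one. For a gamma model the per-row term is usually the unit gamma deviance, which is also what `sklearn.metrics.mean_gamma_deviance` averages. The sketch below computes it with the standard library; `gamma_deviance_mean` is a hypothetical name, and how H2O normalises the weighted sum internally is not something this sketch claims to reproduce.

```python
import math


def gamma_deviance_mean(y_true, y_pred, weights=None):
    """Weighted mean of the unit gamma deviance 2 * (log(mu / y) + y / mu - 1).

    Both y_true and y_pred must be strictly positive.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    total = 0.0
    for y, mu, w in zip(y_true, y_pred, weights):
        total += w * 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return total / sum(weights)


print(round(gamma_deviance_mean([2.0, 4.0], [2.0, 4.0]), 6))  # 0.0 for a perfect fit
print(round(gamma_deviance_mean([2.0], [1.0]), 4))            # 0.6137
```

Because the deviance depends only on the ratio between observed and predicted values, it can keep improving while a squared-error measure such as R^2 stays near zero on heavy-tailed claim amounts.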




In [11]:
# GLM with the scikit-learn backend
iglm2 = InsolverGLMWrapper(backend='sklearn', family='gamma', link='log', standardize=True)
iglm2.fit(x_train, y_train, sample_weight=offset_train)
predict_glm2_train = iglm2.predict(x_train, sample_weight=offset_train)
predict_glm2_test = iglm2.predict(x_test, sample_weight=offset_test)

r = Report(model=iglm2,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_glm2_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_glm2_test),
           )
r.to_html(report_name='2_glm_sklearn_report')


In [12]:
# GBM with the XGBoost backend
igbm = InsolverGBMWrapper(backend='xgboost', task='reg', n_estimators=100, objective='gamma', tree_method='hist')
igbm.fit(x_train, y_train, sample_weight=offset_train)
predict_gbm_train = igbm.predict(x_train)
predict_gbm_test = igbm.predict(x_test)

r = Report(model=igbm,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_gbm_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_gbm_test),
           )
r.to_html(report_name='3_gbm_xgboost_report')


ntree_limit is deprecated, use `iteration_range` or model slicing instead.




In [13]:
# GBM with the LightGBM backend
igbm2 = InsolverGBMWrapper(backend='lightgbm', task='reg', n_estimators=100, objective='gamma',
                           metric='gamma_deviance', boosting_type='goss')
igbm2.fit(x_train, y_train, sample_weight=offset_train)
predict_gbm2_train = igbm2.predict(x_train)
predict_gbm2_test = igbm2.predict(x_test)

r = Report(model=igbm2,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_gbm2_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_gbm2_test),
           )
r.to_html(report_name='4_gbm_lightgbm_report')


In [14]:
# GBM with the CatBoost backend
igbm3 = InsolverGBMWrapper(backend='catboost', task='reg', n_estimators=100, objective='gamma', silent=True)
igbm3.fit(x_train, y_train, sample_weight=offset_train)
predict_gbm3_train = igbm3.predict(x_train)
predict_gbm3_test = igbm3.predict(x_test)

r = Report(model=igbm3,
           task='reg',
           X_train=x_train,
           y_train=y_train,
           predicted_train=pd.Series(predict_gbm3_train),
           X_test=x_test,
           y_test=y_test,
           predicted_test=pd.Series(predict_gbm3_test),
           )
r.to_html(report_name='5_gbm_catboost_report')


In [15]:
!ls -l | grep report

drwxr-xr-x  2 jovyan users     4096 Dec  3 11:04 0_random_forest_report
drwxr-xr-x  2 jovyan users     4096 Dec  3 11:05 1_glm_h2o_report
drwxr-xr-x  2 jovyan users     4096 Dec  3 11:06 2_glm_sklearn_report
drwxr-xr-x  2 jovyan users     4096 Dec  3 11:08 3_gbm_xgboost_report
drwxr-xr-x  2 jovyan users     4096 Dec  3 11:09 4_gbm_lightgbm_report
drwxr-xr-x  2 jovyan users     4096 Dec  3 11:10 5_gbm_catboost_report


In [16]:
!ls -l $(ls | grep report | tail -n 1) | grep .html

-rw-r--r-- 1 jovyan users 5924577 Dec  3 11:10 profiling_report.html
-rw-r--r-- 1 jovyan users   63150 Dec  3 11:10 report.html
