#  **<span style="color:orange"><center>Insurance Cost Predicion</span>**

# Setting up enviroment

In [0]:
!pip install pycaret

In [2]:
from pycaret.utils import enable_colab
enable_colab()

Colab mode activated.


In [3]:
from pycaret.regression import *

#Brief overview of techniques used for cleaning the dataset



Before we into the practical execution of the techniques mentioned above in Section 1, it is important to understand what are these techniques are and when to use them. More often than not most of these techniques will help linear and parametric algorithms, however it is not surprising to also see performance gains in tree-based models. The Below explanations are only brief and we recommend that you do extra reading to dive deeper and get a more thorough understanding of these techniques.

- **Normalization:** Normalization / Scaling (often used interchangeably with standardization) is used to transform the actual values of numeric variables in a way that provides helpful properties for machine learning. Many algorithms such as Linear Regression, Support Vector Machine and K Nearest Neighbors assume that all features are centered around zero and have variances that are at the same level of order. If a particular feature in a dataset has a variance that is larger in order of magnitude than other features, the model may not understand all features correctly and could perform poorly. __[Read more](https://sebastianraschka.com/Articles/2014_about_feature_scaling.html#z-score-standardization-or-min-max-scaling)__ <br/>
<br/>
- **Transformation:** While normalization transforms the range of data to remove the impact of magnitude in variance, transformation is a more radical technique as it changes the shape of the distribution so that transformed data can be represented by a normal or approximate normal distirbution. In general, you should transform the data if using algorithms that assume normality or a gaussian distribution. Examples of such models are Linear Regression, Lasso Regression and Ridge Regression. __[Read more](https://en.wikipedia.org/wiki/Power_transform)__<br/>
<br/>
- **Target Transformation:** This is similar to the `transformation` technique explained above with the exception that this is only applied to the target variable. __[Read more](https://scikit-learn.org/stable/auto_examples/compose/plot_transformed_target.html)__ to understand the effects of transforming the target variable in regression.<br/>
<br/>
- **Combine Rare Levels:** Sometimes categorical features have levels that are insignificant in the frequency distribution. As such, they may introduce noise into the dataset due to a limited sample size for learning. One way to deal with rare levels in categorical features is to combine them into a new class. <br/>
<br/>
- **Bin Numeric Variables:** Binning or discretization is the process of transforming numerical variables into categorical features. An example would be `Carat Weight` in this experiment. It is a continious distribution of numeric values that can be discretized into intervals. Binning may improve the accuracy of a predictive model by reducing the noise or non-linearity in the data. PyCaret automatically determines the number and size of bins using Sturges rule.  __[Read more](https://www.vosesoftware.com/riskwiki/Sturgesrule.php)__<br/>
<br/>
- **Model Ensembling and Stacking:** Ensemble modeling is a process where multiple diverse models are created to predict an outcome. This is achieved either by using many different modeling algorithms or using different samples of training data sets. The ensemble model then aggregates the predictions of each base model resulting in one final prediction for the unseen data. The motivation for using ensemble models is to reduce the generalization error of the prediction. As long as the base models are diverse and independent, the prediction error of the model decreases when the ensemble approach is used. The two most common methods in ensemble learning are `Bagging` and `Boosting`. Stacking is also a type of ensemble learning where predictions from multiple models are used as input features for a meta model that predicts the final outcome. __[Read more](https://blog.statsbot.co/ensemble-learning-d1dcd548e936)__<br/>
<br/>
- **Tuning Hyperparameters of Ensemblers:** Similar to hyperparameter tuning for a single machine learning model, we will also learn how to tune hyperparameters for an ensemble model.

#Getting the Data

In [4]:
from pycaret.datasets import get_data
dataset = get_data('insurance', profile=True)

  import pandas.util.testing as tm




Notice that when the `profile` parameter is to `True`, it displays a data profile for exploratory data analysis. Several pre-processing steps as discussed above will be performed in this experiment based on this analysis. 

In [5]:
#check the shape of data
dataset.shape

(1338, 7)

In order to demonstrate the `predict_model()` function on unseen data, 10% of the data has been withheld from the original dataset to be used for predictions. This should not be confused with a train/test split as this particular split is performed to simulate a real life scenario. 

In [6]:
data = dataset.sample(frac=0.9, random_state=786).reset_index(drop=True)
data_unseen = dataset.drop(data.index).reset_index(drop=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions ' + str(data_unseen.shape))

Data for Modeling: (1204, 7)
Unseen Data For Predictions (134, 7)


# Setting up Environment in PyCaret

In [0]:
from pycaret.regression import *

In [7]:
exp_reg102 = setup(data, target = 'charges', session_id = 123,
           normalize = True,
           polynomial_features = True, trigonometry_features = True, feature_interaction=True, 
           bin_numeric_features= ['age', 'bmi'])

 
Setup Succesfully Completed!


Unnamed: 0,Description,Value
0,session_id,123
1,Transform Target,False
2,Transform Target Method,
3,Original Data,"(1204, 7)"
4,Missing Values,False
5,Numeric Features,2
6,Categorical Features,4
7,Ordinal Features,False
8,High Cardinality Features,False
9,High Cardinality Method,


In [10]:
exp_reg102[0].columns

Index(['age_Power2', 'bmi_Power2', 'sex_male', 'children_0', 'children_1',
       'children_2', 'children_3', 'children_4', 'children_5', 'smoker_no',
       'region_northeast', 'region_northwest', 'region_southeast',
       'region_southwest', 'age_0.0', 'age_1.0', 'age_10.0', 'age_11.0',
       'age_2.0', 'age_3.0', 'age_4.0', 'age_5.0', 'age_6.0', 'age_7.0',
       'age_8.0', 'age_9.0', 'bmi_0.0', 'bmi_1.0', 'bmi_10.0', 'bmi_11.0',
       'bmi_2.0', 'bmi_3.0', 'bmi_4.0', 'bmi_5.0', 'bmi_6.0', 'bmi_7.0',
       'bmi_8.0', 'bmi_9.0', 'children_1_multiply_age_Power2',
       'bmi_7.0_multiply_age_Power2', 'smoker_no_multiply_children_0',
       'age_Power2_multiply_bmi_6.0', 'bmi_6.0_multiply_age_Power2',
       'sex_male_multiply_age_Power2', 'bmi_Power2_multiply_age_Power2',
       'sex_male_multiply_bmi_Power2', 'age_Power2_multiply_smoker_no',
       'smoker_no_multiply_bmi_Power2', 'children_1_multiply_bmi_Power2',
       'region_southeast_multiply_bmi_Power2', 'smoker_no_multiply

# Comparing All Models

In [8]:
compare_models()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Extreme Gradient Boosting,2508.2145,20805065.5585,4477.1037,0.8494,0.4313,0.3001
1,Gradient Boosting Regressor,2568.1833,21792129.5713,4593.0445,0.8427,0.4414,0.3072
2,CatBoost Regressor,2584.84,21755211.0984,4569.981,0.8422,0.4424,0.3077
3,Ridge Regression,2960.7657,22012035.3637,4656.1805,0.8422,0.4276,0.3237
4,Bayesian Ridge,2967.9944,22026324.2118,4658.2703,0.8421,0.4273,0.3246
5,Lasso Regression,2929.6218,22022559.5917,4655.4949,0.842,0.4308,0.3182
6,Linear Regression,2970.2482,22289948.4885,4683.8513,0.8398,0.4349,0.3271
7,Random Forest,2513.7604,22146936.9973,4611.1424,0.8394,0.4496,0.3028
8,TheilSen Regressor,2878.6208,23150364.3828,4779.3927,0.8343,0.4892,0.2902
9,Lasso Least Angle Regression,2996.6517,23216493.5527,4779.8457,0.8341,0.4327,0.3314


#Ensemble a Model

### Blending

Blending is another common technique for ensembling that can be used in PyCaret. It creates multiple models and then averages the individual predictions to form a final prediction. If no list is passed, PyCaret uses all of the models available in the model library by default. Let's see an example below:

In [11]:
blend_all = blend_models()

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2933.75,20257780.0,4500.864,0.8709,0.4299,0.3635
1,115004400000000.0,1.1242119999999999e+30,1060289000000000.0,-5.946253e+21,2.8528,2584288000.0
2,2701.868,16249200.0,4031.03,0.8742,0.4087,0.3523
3,3076.718,26395240.0,5137.63,0.7725,0.4939,0.3445
4,3005.766,29530930.0,5434.237,0.8131,0.4237,0.2477
5,3520.88,28049330.0,5296.162,0.8049,0.6172,0.4065
6,14381.09,339792400.0,18433.46,-0.8674,1.284,1.7858
7,2701.887,22364140.0,4729.074,0.7909,0.4649,0.3634
8,2392.756,12865150.0,3586.802,0.9365,0.4087,0.3761
9,3002.05,28570250.0,5345.114,0.7488,0.4816,0.3235


Now that we have created a voting regressor using the `blend_models()` function. The model stored in the variable `blend_all` is just like any other model that you would create using `create_model()` or `tune_model()`. You can use this model for predictions on unseen data using `predict_model()` in the same way you would for any other model. Notice that since we didn't pass the list of specific models for voting, it uses all of the models in the model library by default. The next example will show how to pass a specific set of models for blending.

In [0]:
"""
we will create 3 specific models to be passed into blend_models().
Note that verbose is set to False to avoid printing score grid of individual models.
"""
xgboost = create_model('xgboost', verbose = False)
gbr = create_model('gbr', verbose = False)
catboost = create_model('catboost', verbose = False)
ridge=create_model('ridge',verbose=False)
huber = create_model('huber', verbose=False)

In [23]:
blend_specific_1 = blend_models(estimator_list = [xgboost,gbr,catboost])

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2476.9678,18314980.0,4279.6005,0.8833,0.4273,0.3163
1,1627.0145,6441511.0,2538.0132,0.9659,0.2991,0.2487
2,2432.5497,17229520.0,4150.8457,0.8666,0.4269,0.3498
3,2851.1386,26916030.0,5188.0663,0.768,0.5166,0.277
4,2964.2989,31455810.0,5608.5475,0.8009,0.4521,0.2403
5,2564.2409,19703920.0,4438.9093,0.8629,0.4265,0.319
6,3012.1151,26467150.0,5144.623,0.8545,0.4822,0.3915
7,2561.2114,23309800.0,4828.0224,0.782,0.45,0.2981
8,2106.7925,12143270.0,3484.7197,0.94,0.3854,0.3219
9,2465.9684,27773080.0,5270.0166,0.7559,0.4563,0.2391


In [24]:
blend_specific_2 = blend_models(estimator_list = [xgboost,gbr,catboost,ridge])

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2515.5991,17860540.0,4226.1738,0.8862,0.4128,0.311
1,1828.129,7122695.0,2668.8378,0.9623,0.3034,0.2578
2,2448.2934,16192070.0,4023.9369,0.8746,0.4132,0.333
3,2744.4616,25092870.0,5009.2781,0.7837,0.4892,0.2709
4,2944.3157,30062650.0,5482.9416,0.8097,0.442,0.2417
5,2596.3316,19086570.0,4368.8177,0.8672,0.4218,0.3263
6,2946.3715,23449740.0,4842.4932,0.8711,0.455,0.3691
7,2535.0544,22717710.0,4766.3101,0.7876,0.4375,0.2904
8,1970.5727,10905290.0,3302.3165,0.9461,0.3693,0.309
9,2488.1683,27325710.0,5227.3995,0.7598,0.4541,0.2461


In [43]:
blend_specific_3 = blend_models(estimator_list = [xgboost,gbr,catboost,ridge, huber])

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2382.2645,17540170.0,4188.0992,0.8882,0.3897,0.2662
1,1796.7329,7704956.0,2775.7802,0.9592,0.2636,0.2162
2,2303.1086,15319140.0,3913.9675,0.8814,0.3821,0.2812
3,2586.7173,24524710.0,4952.2432,0.7886,0.4778,0.2343
4,2817.4201,29329120.0,5415.6367,0.8144,0.4327,0.2126
5,2492.9521,19086260.0,4368.7825,0.8672,0.3977,0.2789
6,2733.9466,21676780.0,4655.8334,0.8809,0.4231,0.3114
7,2392.5583,22794250.0,4774.3325,0.7868,0.428,0.2491
8,1842.9706,10320430.0,3212.5432,0.949,0.3295,0.2569
9,2417.5399,28021760.0,5293.5582,0.7537,0.4467,0.2157


### Stacking

In [15]:
stack_1 = stack_models([xgboost,gbr,catboost])

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2710.4451,19123230.0,4373.0119,0.8782,0.4581,0.3232
1,2400.7478,12183540.0,3490.4933,0.9356,0.3381,0.3008
2,2770.0776,17101690.0,4135.4187,0.8676,0.4081,0.3346
3,2799.7085,23757350.0,4874.1507,0.7952,0.4904,0.2621
4,3206.8705,31801400.0,5639.2731,0.7987,0.4743,0.2632
5,2854.3041,21796080.0,4668.6269,0.8484,0.4509,0.3558
6,3173.2611,23242920.0,4821.0908,0.8723,0.4389,0.3607
7,2711.3178,23669950.0,4865.1775,0.7786,0.4551,0.3046
8,2106.469,10781920.0,3283.5839,0.9468,0.3842,0.3298
9,2670.8192,27822450.0,5274.6989,0.7554,0.4788,0.2643


In [19]:
stack_1 = stack_models([xgboost,gbr,catboost,ridge])

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2124.5172,9968707.0,3157.3259,0.9365,0.4868,0.2629
1,2566.1734,22036460.0,4694.3009,0.8834,0.4168,0.2882
2,2231.7965,10457750.0,3233.8451,0.919,0.3372,0.2539
3,2684.196,18143090.0,4259.4704,0.8436,0.441,0.3046
4,3034.1984,20303020.0,4505.8873,0.8715,0.4325,0.3252
5,2200.5464,11072830.0,3327.5865,0.923,0.3405,0.2637
6,2375.7411,13865410.0,3723.6283,0.9238,0.3379,0.261
7,2666.0186,16000540.0,4000.068,0.8504,0.5203,0.446
8,2040.0343,7516428.0,2741.6105,0.9629,0.5929,0.3394
9,2897.3191,19229510.0,4385.146,0.831,0.4713,0.3898


Before we wrap up this section, there is another parameter in `stack_models()` that we haven't seen yet called `restack`. This parameter controls the ability to expose the raw data to the meta model. When set to `True`, it exposes the raw data to the meta model along with all the predictions of the base level models. By default it is set to `True`. See the example below with the `restack` parameter changed to `False`.

In [28]:
stack_2 = stack_models([xgboost,gbr,catboost,ridge], restack = False)

Unnamed: 0,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2541.2435,17663320.0,4202.7748,0.8875,0.4018,0.3081
1,2106.6237,9168611.0,3027.9715,0.9515,0.3105,0.2721
2,2504.802,15585750.0,3947.8788,0.8793,0.3995,0.3225
3,2730.3236,24230440.0,4922.4423,0.7911,0.481,0.2587
4,2948.5749,29366750.0,5419.11,0.8141,0.4382,0.2374
5,2721.6126,20730710.0,4553.0991,0.8558,0.4331,0.3419
6,2882.1589,22505150.0,4743.9596,0.8763,0.4369,0.3445
7,2523.1255,22592940.0,4753.2027,0.7887,0.4423,0.2853
8,1982.0847,10350970.0,3217.2928,0.9489,0.3686,0.3164
9,2532.2683,27094440.0,5205.2318,0.7618,0.4552,0.2538


#  Predict on test / hold-out Sample

In [45]:
predict_model(blend_specific_3);

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Voting Regressor,2537.516,23055810.0,4801.6469,0.8253,0.4227,0.2548


#  Finalize Model for Deployment

In [31]:
save_model(blend_specific_3, 'deployment_14052020')

Transformation Pipeline and Model Succesfully Saved


In [46]:
final_blend_3 = finalize_model(blend_specific_3)

#  Predict on unseen data

We will now use `blend-specific_3` to generate predictions on `data_unseen` which is the variable created at the beginning of the tutorial and contains 10% of the original dataset which was never exposed to PyCaret. 

In [47]:
unseen_predictions = predict_model(blend_specific_3, data=data_unseen, round=0)
unseen_predictions.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges,Label
0,18,female,27.28,3,yes,southeast,18223.4512,20409.0
1,35,male,17.86,1,no,northwest,5116.5004,5140.0
2,59,female,34.8,2,no,southwest,36910.60803,24249.0
3,36,male,33.4,2,yes,southwest,38415.474,36939.0
4,37,female,25.555,1,yes,northeast,20296.86345,21132.0


# Save the experiment

In [48]:
save_experiment('Experiment_insurance_cost 14May2020')

Experiment Succesfully Saved


# Loading saved experiment

In [49]:
saved_experiment = load_experiment('Experiment_insurance_cost 14May2020')



Unnamed: 0,Object
0,Regression Setup Config
1,X_training Set
2,y_training Set
3,X_test Set
4,y_test Set
5,Transformation Pipeline
6,Target Inverse Transformer
7,Compare Models Score Grid
8,Voting Regressor
9,Voting Regressor Score Grid


In [41]:
final_blend_3_loaded = saved_experiment[49]

In [50]:
new_prediction = predict_model(final_blend_3_loaded, data=data_unseen, round = 0)
new_prediction.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges,Label
0,18,female,27.28,3,yes,southeast,18223.4512,19044.0
1,35,male,17.86,1,no,northwest,5116.5004,5257.0
2,59,female,34.8,2,no,southwest,36910.60803,22707.0
3,36,male,33.4,2,yes,southwest,38415.474,37069.0
4,37,female,25.555,1,yes,northeast,20296.86345,21152.0
