#  <span style="color:orange">Regression Tutorial (REG102) - Level Intermediate</span>

**Created using: PyCaret 2.2** <br />
**Date Updated: November 25, 2020**

# 1.0 Tutorial Objective
Welcome to the regression tutorial **(REG102)** - Level Intermediate. This tutorial assumes that you have completed __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial to understand the basics of working in PyCaret.

In this tutorial we will use the `pycaret.regression` module to learn:

* **Normalization:**  How to normalize and scale the dataset
* **Transformation:**  How to apply transformations that make the data linear and approximately normal
* **Target Transformation:**  How to apply transformations to the target variable
* **Combine Rare Levels:**  How to combine rare levels in categorical features
* **Bin Numeric Variables:**  How to bin numeric variables and transform numeric features into categorical ones using Sturges' rule
* **Model Ensembling and Stacking:**  How to boost model performance using several ensembling techniques such as Bagging, Boosting, Voting and Generalized Stacking
* **Experiment Logging:** How to log experiments in PyCaret using the MLflow backend

Read Time : Approx 60 Minutes


# 1.1 Installing PyCaret
If you haven't installed PyCaret yet, please follow the link to the __[Beginner's Tutorial](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ for instructions on how to install PyCaret.

# 1.2 Pre-Requisites
- Python 3.6 or greater
- PyCaret 2.0 or greater
- Internet connection to load data from pycaret's repository
- Completion of Regression Tutorial (REG101) - Level Beginner

# 1.3 For Google Colab users:
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.<br/>
<br/>
`from pycaret.utils import enable_colab` <br/>
`enable_colab()`

# 1.4 See also:
- __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__
- __[Regression Tutorial (REG103) - Level Expert](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Expert%20-%20REG103.ipynb)__

# 2.0 Brief overview of techniques covered in this tutorial
Before we get into the practical execution of the techniques mentioned in Section 1, it is important to understand what these techniques are and when to use them. More often than not, these techniques help linear and parametric algorithms; however, it is not surprising to also see performance gains in tree-based models. The explanations below are only brief, and we recommend that you do extra reading to get a more thorough understanding of these techniques.

- **Normalization:** Normalization / Scaling (often used interchangeably with standardization) is used to transform the actual values of numeric variables in a way that provides helpful properties for machine learning. Many algorithms such as Linear Regression, Support Vector Machine and K Nearest Neighbors assume that all features are centered around zero and have variances of the same order of magnitude. If a particular feature in a dataset has a variance that is orders of magnitude larger than that of the other features, it can dominate the model, which may then fail to learn from the other features and perform poorly. __[Read more](https://sebastianraschka.com/Articles/2014_about_feature_scaling.html#z-score-standardization-or-min-max-scaling)__ <br/>
<br/>
- **Transformation:** While normalization transforms the range of data to remove the impact of magnitude in variance, transformation is a more radical technique as it changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. In general, you should transform the data if using algorithms that assume normality or a Gaussian distribution. Examples of such models are Linear Regression, Lasso Regression and Ridge Regression. __[Read more](https://en.wikipedia.org/wiki/Power_transform)__<br/>
<br/>
- **Target Transformation:** This is similar to the `transformation` technique explained above with the exception that this is only applied to the target variable. __[Read more](https://scikit-learn.org/stable/auto_examples/compose/plot_transformed_target.html)__ to understand the effects of transforming the target variable in regression.<br/>
<br/>
- **Combine Rare Levels:** Sometimes categorical features have levels that are insignificant in the frequency distribution. As such, they may introduce noise into the dataset due to a limited sample size for learning. One way to deal with rare levels in categorical features is to combine them into a new class. <br/>
<br/>
- **Bin Numeric Variables:** Binning or discretization is the process of transforming numerical variables into categorical features. An example would be `Carat Weight` in this experiment. It is a continuous distribution of numeric values that can be discretized into intervals. Binning may improve the accuracy of a predictive model by reducing the noise or non-linearity in the data. PyCaret automatically determines the number and size of bins using Sturges' rule.  __[Read more](https://www.vosesoftware.com/riskwiki/Sturgesrule.php)__<br/>
<br/>
- **Model Ensembling and Stacking:** Ensemble modeling is a process where multiple diverse models are created to predict an outcome. This is achieved either by using many different modeling algorithms or using different samples of training data sets. The ensemble model then aggregates the predictions of each base model resulting in one final prediction for the unseen data. The motivation for using ensemble models is to reduce the generalization error of the prediction. As long as the base models are diverse and independent, the prediction error of the model decreases when the ensemble approach is used. The two most common methods in ensemble learning are `Bagging` and `Boosting`. Stacking is also a type of ensemble learning where predictions from multiple models are used as input features for a meta model that predicts the final outcome. __[Read more](https://blog.statsbot.co/ensemble-learning-d1dcd548e936)__<br/>
<br/>
- **Tuning Hyperparameters of Ensemblers:** Similar to hyperparameter tuning for a single machine learning model, we will also learn how to tune hyperparameters for an ensemble model.
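
To make the normalization idea above concrete, here is a minimal sketch of z-score standardization in plain Python. It assumes the z-score method (subtract the mean, divide by the standard deviation); the sample values are hypothetical and only chosen to resemble the ranges of `Carat Weight` and `Price`. It is an illustration of the technique, not PyCaret's internal code.

```python
from statistics import mean, pstdev

def zscore(values):
    """Return values rescaled to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

# Hypothetical sample values (assumption, not taken from a real run)
carat = [0.75, 0.83, 0.91, 1.10, 2.91]
price = [2184.0, 3171.0, 4370.0, 5169.0, 101561.0]

carat_scaled = zscore(carat)
price_scaled = zscore(price)

# After scaling, both columns live on a comparable scale
print([round(v, 2) for v in carat_scaled])
print([round(v, 2) for v in price_scaled])
```

After this step both columns have mean 0 and standard deviation 1, so neither dominates a distance-based or linear model purely because of its units.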

# 3.0 Dataset for the Tutorial

For this tutorial we will be using the same dataset that was used in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__.

# Dataset Acknowledgements:
This case was prepared by Greg Mills (MBA ’07) under the supervision of Phillip E. Pfeifer, Alumni Research Professor of Business Administration. Copyright (c) 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved.

The original dataset and description can be __[found here.](https://github.com/DardenDSC/sarah-gets-a-diamond)__ 

# 4.0 Getting the Data

You can download the data from the original source __[found here](https://github.com/DardenDSC/sarah-gets-a-diamond)__ and load it using the pandas `read_csv` function, or you can use PyCaret's data repository to load the data using the `get_data` function (this requires an internet connection).

In [1]:
from pycaret.datasets import get_data
dataset = get_data('diamond')

Unnamed: 0,Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
0,1.1,Ideal,H,SI1,VG,EX,GIA,5169
1,0.83,Ideal,H,VS1,ID,ID,AGSL,3470
2,0.85,Ideal,H,SI1,EX,EX,GIA,3183
3,0.91,Ideal,E,SI1,VG,VG,GIA,4370
4,0.83,Ideal,G,SI1,EX,EX,GIA,3171


Notice that when the `profile` parameter is set to `True`, it displays a data profile for exploratory data analysis. Several pre-processing steps as discussed in section 2 above will be performed in this experiment based on this analysis. Let's summarize how the profile has helped make critical pre-processing choices with the data.

- **Missing Values:** There are no missing values in the data. However, we still need imputers in our pipeline just in case the new unseen data has missing values (not applicable in this case). When you execute the `setup()` function, imputers are created and stored in the pipeline automatically. By default, it uses a mean imputer for numeric values and a constant imputer for categorical. This can be changed using the `numeric_imputation` and `categorical_imputation` parameters in `setup()`. <br/>
<br/>
- **Combine Rare Levels:** Notice the distribution of the `Clarity` feature in the dataset. It has 7 distinct classes of which `FL` only appears 4 times. Similarly in the `Cut` feature, the `Fair` level only appears `2.1%` of the time in the training dataset. We will use the `combine_rare_levels` parameter in the setup to combine the rare levels. <br/>
<br/>
- **Data Scale / Range:** Notice how the scale / range of `Carat Weight` is significantly different from that of the `Price` variable. Carat Weight ranges from 0.75 to 2.91 while Price ranges from 2,184 all the way up to 101,561. We will deal with this problem by using the `normalize` parameter in setup. <br/>
<br/>
- **Target Transformation:** The target variable `Price` is not normally distributed. It is right skewed with high kurtosis. We will use the `transform_target` parameter in the setup to apply a power transformation to the target variable. <br/>
<br/>
- **Bin Numeric Features:** `Carat Weight` is the only numeric feature. When looking at its histogram, the distribution seems to have natural breaks. Binning will convert it into a categorical feature and create several levels using Sturges' rule. This will help remove the noise for linear algorithms. <br/>
<br/>
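
Two of the choices above can be sketched in a few lines of plain Python: merging rare categorical levels below a frequency threshold, and computing the number of bins that Sturges' rule suggests. The helper names `combine_rare_levels` and `sturges_bins`, the label `"rare"`, and the sample `clarity` column are hypothetical and used only for illustration. The bin count uses the textbook formula `ceil(log2(n)) + 1`; PyCaret's internal implementation may differ in its details.

```python
import math
from collections import Counter

def combine_rare_levels(levels, threshold=0.05, new_label="rare"):
    """Replace every level whose relative frequency is below threshold."""
    counts = Counter(levels)
    total = len(levels)
    rare = {lvl for lvl, c in counts.items() if c / total < threshold}
    return [new_label if lvl in rare else lvl for lvl in levels]

def sturges_bins(n_samples):
    """Number of bins suggested by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n_samples)) + 1

# Hypothetical 'Clarity' column in which 'FL' makes up 2% of the rows
clarity = ["SI1"] * 60 + ["VS1"] * 38 + ["FL"] * 2
combined = combine_rare_levels(clarity, threshold=0.05)
print(Counter(combined))

# 5400 is the number of rows kept for modeling in this tutorial
print(sturges_bins(5400))
```

With a threshold of `0.05`, the `FL` level is folded into the new `rare` level while the two frequent levels are left untouched, and Sturges' rule suggests 14 bins for 5,400 rows.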

In [2]:
#check the shape of data
dataset.shape

(6000, 8)

In order to demonstrate the `predict_model()` function on unseen data, a sample of 600 has been withheld from the original dataset to be used for predictions. This should not be confused with a train/test split as this particular split is performed to simulate a real life scenario. Another way to think about this is that these 600 records were not available at the time when the machine learning experiment was performed.

In [3]:
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)

data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions ' + str(data_unseen.shape))

Data for Modeling: (5400, 8)
Unseen Data For Predictions (600, 8)


# 5.0 Setting up Environment in PyCaret

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to initialize the environment in pycaret using `setup()`. No additional parameters were passed in our last example as we did not perform any pre-processing steps (other than those that are imperative for machine learning experiments which were performed automatically by PyCaret). In this example we will take it to the next level by customizing the pre-processing pipeline using `setup()`. Let's look at how to implement all the steps discussed in section 4 above.

In [4]:
from pycaret.regression import *

In [6]:
exp_reg102 = setup(data = data, target = 'Price', session_id=123,
                  normalize = True, transformation = True, transform_target = True, 
                  #combine_rare_levels = True, rare_level_threshold = 0.05,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95, 
                  bin_numeric_features = ['Carat Weight'],
                  log_experiment = True, experiment_name = 'diamond1') 

Unnamed: 0,Description,Value
0,Session id,123
1,Target,Price
2,Target type,Regression
3,Original data shape,"(5400, 8)"
4,Transformed data shape,"(5400, 28)"
5,Transformed train set shape,"(3779, 28)"
6,Transformed test set shape,"(1621, 28)"
7,Ordinal features,1
8,Numeric features,1
9,Categorical features,6


Note that this is the same setup grid that was shown in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. The only difference here is the customization parameters that were passed to `setup()` are now set to `True`. Also notice that the `session_id` is the same as the one used in the beginner tutorial, which means that the effect of randomization is completely isolated. Any improvements we see in this experiment are solely due to the pre-processing steps taken in `setup()` or any other modeling techniques used in later sections of this tutorial.
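
To give a feel for what `transform_target = True` does to `Price`, the sketch below applies a Box-Cox power transform and then inverts it. This is an illustration under stated assumptions, not PyCaret's implementation: the exponent `LAMBDA` is fixed by hand here (in practice it is estimated from the data), and the helper names are hypothetical. With `LAMBDA = 0` the Box-Cox transform reduces to the natural logarithm.

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform for a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

LAMBDA = 0.0  # assumed value; equivalent to a natural log transform

# Prices taken from the range reported in the data profile above
prices = [2184.0, 3470.0, 5169.0, 101561.0]
transformed = [box_cox(p, LAMBDA) for p in prices]
recovered = [inverse_box_cox(z, LAMBDA) for z in transformed]

# The long right tail is compressed on the transformed scale
print([round(z, 3) for z in transformed])
# The inverse transform brings predictions back to price units
print([round(p, 1) for p in recovered])
```

The important point is the round trip: a model can be fitted on the transformed target, and its predictions are mapped back with the inverse transform so that they are again expressed in the original price units.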

# 6.0 Comparing All Models

Similar to __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we will also begin this tutorial with `compare_models()`. We will then compare the below results with the last experiment.

In [7]:
top3 = compare_models(exclude = ['ransac'], n_select = 3)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
gbr,Gradient Boosting Regressor,1707.9544,5903344.1271,2417.2365,0.9413,0.2225,0.1887,0.537
lightgbm,Light Gradient Boosting Machine,1660.9066,6358156.8678,2495.2445,0.9377,0.2235,0.1842,0.479
xgboost,Extreme Gradient Boosting,1732.9093,6998796.7042,2619.1033,0.9308,0.2263,0.1874,0.379
rf,Random Forest Regressor,1741.3787,7137302.0316,2652.7687,0.9301,0.2402,0.1914,0.718
et,Extra Trees Regressor,1899.5518,8861962.0801,2964.1057,0.9123,0.2605,0.2042,0.907
dt,Decision Tree Regressor,1917.2621,9348381.9398,3041.8356,0.9078,0.2611,0.2045,0.296
huber,Huber Regressor,2459.0362,13448255.5674,3652.8315,0.8665,0.2773,0.245,0.453
ridge,Ridge Regression,2525.026,13756677.4502,3694.629,0.8634,0.2745,0.2399,0.442
br,Bayesian Ridge,2528.8946,13826571.1684,3703.9075,0.8627,0.2745,0.2399,0.502
lar,Least Angle Regression,2531.1665,13967491.9759,3720.5131,0.8616,0.2758,0.2402,0.23


Notice that we have used the `n_select` parameter within `compare_models`. In the last tutorial you saw that `compare_models` by default returns the best performing model (a single model based on the default sort order). However, you can use the `n_select` parameter to return the top N models. In this example `compare_models` has returned the top 3 models.

In [8]:
type(top3)

list

In [9]:
print(top3)

[GradientBoostingRegressor(random_state=123), LGBMRegressor(random_state=123), XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
             colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=0, gpu_id=-1,
             grow_policy='depthwise', importance_type=None,
             interaction_constraints='', learning_rate=0.300000012, max_bin=256,
             max_cat_threshold=64, max_cat_to_onehot=4, max_delta_step=0,
             max_depth=6, max_leaves=0, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=-1,
             num_parallel_tree=1, predictor='auto', random_state=123, ...)]


For the purpose of comparison we will use the `RMSLE` score. Notice how drastically a few of the algorithms have improved after we performed a few pre-processing steps in `setup()`. 
- Linear Regression RMSLE improved from `0.6690` to `0.0973`
- Ridge Regression RMSLE improved from `0.6689` to `0.0971`
- Huber Regression RMSLE improved from `0.4333` to `0.0972`

To see results for all of the models from the previous tutorial refer to Section 7 in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__.
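
Since `RMSLE` is the metric used for the comparison, here is a small sketch of how it can be computed. It assumes the common definition based on `log(1 + y)`; the `predicted` values are hypothetical, while the `actual` values are the first three prices from the dataset preview.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

actual = [5169.0, 3470.0, 3183.0]
predicted = [4800.0, 3600.0, 3183.0]  # hypothetical model output
print(round(rmsle(actual, predicted), 4))
```

Because the error is measured on the log scale, RMSLE penalizes relative rather than absolute differences, which suits a target such as `Price` that spans several orders of magnitude.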

# 7.0 Create a Model

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to create a model using the `create_model()` function. Now we will learn about a few other parameters that may come in handy. In this section, we will create all models using 5 fold cross validation. Notice how the `fold` parameter is passed inside `create_model()` to achieve this.

# 7.1 Create Model (with 5 Fold CV)

In [10]:
dt = create_model('dt', fold = 5)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1928.1512,10249478.4341,3201.4807,0.9075,0.2616,0.2009
1,1967.5722,9382421.9386,3063.0739,0.906,0.2607,0.2086
2,1950.6633,9201605.3151,3033.4148,0.9068,0.2626,0.2154
3,1551.3049,6042836.8337,2458.2182,0.9349,0.2205,0.1624
4,2076.3433,10966783.5924,3311.6134,0.8941,0.281,0.2219
Mean,1894.807,9168625.2228,3013.5602,0.9099,0.2573,0.2018
Std,179.1511,1686355.0612,295.0933,0.0135,0.0199,0.0209
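
Each row of the table above comes from one train/validation split. The sketch below shows how 5 such splits can be generated; it assumes contiguous, unshuffled folds, and the helper name `kfold_indices` is hypothetical. PyCaret delegates the actual splitting to its configured fold strategy, so this is only a picture of the idea.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

# 3779 is the number of training rows reported by setup() above
folds = list(kfold_indices(3779, 5))
for i, (train, valid) in enumerate(folds):
    print(f"fold {i}: train={len(train)} valid={len(valid)}")
```

Every training row is used for validation exactly once, and the per-fold scores are then summarized in the `Mean` and `Std` rows.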


# 7.2 Create Model (Metrics rounded to 2 decimals points)

In [11]:
rf = create_model('rf', round = 2)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1752.07,6449115.36,2539.51,0.93,0.25,0.19
1,1840.63,10718266.21,3273.88,0.92,0.24,0.19
2,1783.52,6505327.18,2550.55,0.94,0.25,0.2
3,1720.01,5945124.73,2438.26,0.93,0.23,0.18
4,1720.48,6022082.65,2453.99,0.92,0.24,0.2
5,1827.36,8641771.67,2939.69,0.93,0.23,0.19
6,1830.87,8040154.88,2835.52,0.92,0.24,0.19
7,1360.06,4271736.19,2066.82,0.95,0.21,0.16
8,1738.12,6622931.78,2573.51,0.93,0.24,0.19
9,1840.68,8156509.67,2855.96,0.92,0.27,0.22


Notice how passing the `round` parameter inside `create_model()` has rounded the evaluation metrics to 2 decimals.

# 7.3 Create Model (KNN)

In [12]:
knn = create_model('knn')

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,4096.8206,45400257.2877,6737.9713,0.5016,0.4575,0.34
1,4692.9487,79604690.3771,8922.1461,0.3899,0.5053,0.3763
2,3919.8681,56469332.3409,7514.6079,0.4821,0.455,0.3292
3,4140.9397,51540049.0058,7179.1399,0.4314,0.4902,0.3348
4,3617.9247,41613627.8755,6450.8626,0.481,0.4337,0.3064
5,3752.7984,54018481.146,7349.7266,0.539,0.4278,0.3217
6,4090.9154,54842846.7726,7405.5956,0.4848,0.447,0.3138
7,3849.6441,45791279.9598,6766.9254,0.4209,0.4881,0.3255
8,3985.4358,49882782.2179,7062.7744,0.5101,0.4404,0.306
9,3888.2621,47645336.0906,6902.5601,0.5463,0.4549,0.3611


In [13]:
print(knn)

KNeighborsRegressor(n_jobs=-1)


# 8.0 Tune a Model

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to automatically tune the hyperparameters of a model using pre-defined grids. Here we will introduce the `n_iter` parameter in `tune_model()`. `n_iter` is the number of iterations within a random grid search. For every iteration, the search randomly selects one value for each hyperparameter from a pre-defined grid. By default, the parameter is set to `10`, which means that at most 10 candidate combinations are evaluated to find the best hyperparameter values. Increasing the value may improve the performance but will also increase the training time. See the example below:

In [14]:
tuned_knn = tune_model(knn)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,2927.4439,30099480.7429,5486.2994,0.6695,0.3585,0.2484
1,3424.3837,47003436.8814,6855.9053,0.6398,0.3863,0.2631
2,3112.2023,42682256.6933,6533.1659,0.6085,0.3666,0.2612
3,3226.5194,34550553.6184,5877.9719,0.6188,0.3647,0.2468
4,2721.4621,26157896.4256,5114.4791,0.6737,0.3351,0.2395
5,2877.8925,41721821.9671,6459.2431,0.6439,0.3211,0.2322
6,3193.3727,48188698.9766,6941.808,0.5473,0.3664,0.2365
7,2641.5358,29258853.9933,5409.1454,0.63,0.3633,0.2233
8,3174.5941,35647145.5891,5970.5231,0.6499,0.3599,0.2469
9,3061.9084,34524306.3654,5875.7388,0.6713,0.3662,0.2697


Fitting 10 folds for each of 10 candidates, totalling 100 fits


In [15]:
tuned_knn2 = tune_model(knn, n_iter = 50)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,2962.6263,28183280.9659,5308.7928,0.6906,0.3603,0.2569
1,3534.2239,48705639.1999,6978.9426,0.6267,0.4141,0.2799
2,3063.7516,42889210.0893,6548.9854,0.6066,0.3769,0.2614
3,3182.445,35078591.6057,5922.7183,0.613,0.3776,0.2463
4,2769.7818,26753943.2658,5172.4214,0.6663,0.3482,0.249
5,2787.3481,37614195.3754,6133.0413,0.679,0.3247,0.2342
6,3187.845,44254879.554,6652.4341,0.5842,0.3676,0.245
7,2782.7807,31600957.9954,5621.4729,0.6004,0.3922,0.2379
8,3087.2763,33972247.1504,5828.5716,0.6664,0.3609,0.247
9,3000.9526,29751027.206,5454.4502,0.7167,0.3671,0.2801


Fitting 10 folds for each of 50 candidates, totalling 500 fits


Notice how two tuned K Nearest Neighbors models were created based on the `n_iter` parameter. In `tuned_knn`, the `n_iter` parameter is left at its default value and the fold scores shown above average to an R2 of about `0.635`. In `tuned_knn2`, the `n_iter` parameter was set to `50` and the average R2 improved to about `0.645`. Observe the differences between the hyperparameters of `tuned_knn` and `tuned_knn2` below:

In [16]:
plot_model(tuned_knn, plot = 'parameter')

Unnamed: 0,Parameters
algorithm,auto
leaf_size,30
metric,manhattan
metric_params,
n_jobs,-1
n_neighbors,13
p,2
weights,distance


In [17]:
plot_model(tuned_knn2, plot = 'parameter')

Unnamed: 0,Parameters
algorithm,auto
leaf_size,30
metric,manhattan
metric_params,
n_jobs,-1
n_neighbors,7
p,2
weights,distance
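
The random search controlled by `n_iter` can be sketched as follows. Everything in this block is an assumption made for illustration: `PARAM_GRID` is a hypothetical search space (the grid PyCaret uses for KNN may differ), and `toy_error` is a stand-in for the cross-validated error that a real search would compute.

```python
import random

# Hypothetical search space (assumption)
PARAM_GRID = {
    "n_neighbors": list(range(1, 51)),
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "minkowski"],
}

def toy_error(params):
    """Stand-in for a cross-validated error; lower is better."""
    penalty = 0.0 if params["weights"] == "distance" else 0.5
    penalty += 0.0 if params["metric"] == "manhattan" else 0.25
    return abs(params["n_neighbors"] - 7) / 10 + penalty

def random_search(param_grid, n_iter, seed=123):
    """Evaluate n_iter randomly drawn candidates and return the best one."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter for this candidate
        candidate = {name: rng.choice(values) for name, values in param_grid.items()}
        error = toy_error(candidate)
        if error < best_error:
            best_params, best_error = candidate, error
    return best_params, best_error

params_10, error_10 = random_search(PARAM_GRID, n_iter=10)
params_50, error_50 = random_search(PARAM_GRID, n_iter=50)
print(error_10, params_10)
print(error_50, params_50)
```

In this sketch both runs share the same seed, so the 50-candidate run sees the same first 10 candidates and can only match or beat the 10-candidate run. In a real experiment the gain from a larger `n_iter` is not guaranteed to be large, and the training time grows with the number of candidates.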


# 9.0 Ensemble a Model

Ensembling is a common machine learning technique used to improve the performance of models (mostly tree based). There are various techniques for ensembling that we will cover in this section. These include Bagging and Boosting __[(Read More)](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)__. We will use the `ensemble_model()` function in PyCaret which ensembles the trained base estimators using the method defined in the `method` parameter.

In [18]:
# lets create a simple dt
dt = create_model('dt')

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1924.4656,8736131.7449,2955.6948,0.9041,0.2668,0.2041
1,1980.1004,12538181.9475,3540.9295,0.9039,0.2646,0.2041
2,1912.0015,8982930.8341,2997.1538,0.9176,0.2651,0.2165
3,1923.16,8364765.4873,2892.1904,0.9077,0.2507,0.1954
4,1945.9234,8658428.0843,2942.5207,0.892,0.2634,0.2157
5,2020.7376,11914427.8011,3451.7282,0.8983,0.2598,0.2096
6,1939.6026,8723092.9019,2953.4883,0.918,0.2619,0.1962
7,1485.4546,5583419.3599,2362.926,0.9294,0.2226,0.1659
8,1997.2631,9908416.9058,3147.7638,0.9027,0.2708,0.2056
9,2043.9121,10074024.3315,3173.9604,0.9041,0.2857,0.2317


# 9.1 Bagging

In [19]:
bagged_dt = ensemble_model(dt)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1767.1462,6737781.6593,2595.7237,0.926,0.2513,0.1948
1,1829.8755,10975942.8633,3312.9961,0.9159,0.2415,0.191
2,1830.6263,8073105.9527,2841.3212,0.926,0.2503,0.2041
3,1706.5973,5818551.2544,2412.1673,0.9358,0.2293,0.1815
4,1754.0543,6420900.6174,2533.9496,0.9199,0.2487,0.2029
5,1875.1184,9842995.2813,3137.3548,0.916,0.24,0.1967
6,1904.4099,9289012.891,3047.7882,0.9127,0.2498,0.1926
7,1404.7415,4677902.9585,2162.846,0.9408,0.2103,0.1588
8,1769.3131,6349131.1466,2519.7482,0.9376,0.247,0.1947
9,1944.2884,9343070.0321,3056.6436,0.911,0.2709,0.2205


In [20]:
# check the parameter of bagged_dt
print(bagged_dt)

BaggingRegressor(base_estimator=DecisionTreeRegressor(random_state=123),
                 random_state=123)


Notice how ensembling has improved the `RMSLE`: averaged over the folds shown above, it drops from about `0.261` for the single decision tree to about `0.244` for the bagged model. In the above example we have used the default parameters of `ensemble_model()` which uses the `Bagging` method. Let's try `Boosting` by changing the `method` parameter in `ensemble_model()`. See example below: 

# 9.2 Boosting

In [21]:
boosted_dt = ensemble_model(dt, method = 'Boosting')

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1881.0771,7215903.09,2686.2433,0.9208,0.2699,0.2069
1,1990.8388,12728084.804,3567.6442,0.9025,0.2593,0.2019
2,1857.8921,6676250.8704,2583.8442,0.9388,0.2564,0.2103
3,1824.9308,6530790.3589,2555.5411,0.9279,0.2488,0.1935
4,1910.9129,7601374.4146,2757.059,0.9052,0.2709,0.2176
5,1889.2235,9030032.3369,3005.0012,0.9229,0.2506,0.2066
6,1919.9484,8871738.6963,2978.5464,0.9167,0.2549,0.1927
7,1511.9313,5357740.4775,2314.6793,0.9322,0.2268,0.1725
8,1777.2217,6350058.8393,2519.9323,0.9376,0.2614,0.1961
9,1941.6839,8829752.086,2971.4899,0.9159,0.2768,0.2207


Notice how easy it is to ensemble models in PyCaret. By simply changing the `method` parameter you can do bagging or boosting which would otherwise have taken multiple lines of code. Note that `ensemble_model()` will by default build `10` estimators. This can be changed using the `n_estimators` parameter. Increasing the number of estimators can sometimes improve results. See an example below:

In [22]:
bagged_dt2 = ensemble_model(dt, n_estimators=50)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1768.6334,6770978.3496,2602.1104,0.9257,0.2484,0.1941
1,1828.7557,10694621.1337,3270.2632,0.918,0.2426,0.1924
2,1790.8172,6550480.9886,2559.3907,0.9399,0.248,0.2049
3,1718.1859,5859451.483,2420.6304,0.9354,0.2277,0.1814
4,1731.322,6241468.6932,2498.2932,0.9222,0.2448,0.1999
5,1831.2207,8651946.9435,2941.4192,0.9262,0.2364,0.1958
6,1828.2129,8151819.5797,2855.1392,0.9234,0.2419,0.1865
7,1357.2761,4277227.2823,2068.1459,0.9459,0.2045,0.1552
8,1728.8104,6498070.4056,2549.1313,0.9362,0.2423,0.19
9,1852.2415,8135775.5017,2852.3281,0.9225,0.2677,0.2177


Notice how increasing the `n_estimators` parameter has improved the result. Averaged over the folds shown above, the `bagged_dt` model with the default `10` estimators resulted in an RMSLE of about `0.244`, whereas in `bagged_dt2`, where `n_estimators = 50`, the RMSLE improved slightly to about `0.240`.
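
The bagging idea behind these results can be sketched in plain Python: fit one base learner per bootstrap sample and average their predictions. To keep the example short, the base learner here is a 1-nearest-neighbour regressor rather than a decision tree; that swap, the helper names, and the `(carat, price)` pairs are assumptions made for illustration only.

```python
import random

def fit_1nn(xs, ys):
    """Return a predictor that answers with the target of the nearest x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return predict

def bagging_fit(xs, ys, n_estimators=10, seed=123):
    """Fit one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    estimators = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        estimators.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return estimators

def bagging_predict(estimators, x):
    """Average the predictions of all base learners."""
    return sum(est(x) for est in estimators) / len(estimators)

# Hypothetical (carat, price) pairs
carat = [0.75, 0.83, 0.91, 1.10, 1.50, 2.00, 2.91]
price = [2184.0, 3171.0, 4370.0, 5169.0, 9800.0, 18500.0, 101561.0]

bag_10 = bagging_fit(carat, price, n_estimators=10)
bag_50 = bagging_fit(carat, price, n_estimators=50)
print(bagging_predict(bag_10, 1.2))
print(bagging_predict(bag_50, 1.2))
```

Each base learner sees a slightly different resample of the data, so averaging more of them tends to smooth out the variance of any single learner, which is the effect that `n_estimators` controls.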

# 9.3 Blending

Blending is another common technique for ensembling that can be used in PyCaret. It creates multiple models and then averages the individual predictions to form a final prediction. Let's see an example below:

In [23]:
# train individual models to blend
lightgbm = create_model('lightgbm', verbose = False)
dt = create_model('dt', verbose = False)
lr = create_model('lr', verbose = False)

In [24]:
# blend individual models
blender = blend_models(estimator_list = [lightgbm, dt, lr])

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1792.3915,5612874.7665,2369.1506,0.9384,0.2292,0.1931
1,2060.5845,11431919.891,3381.1122,0.9124,0.234,0.2008
2,1833.7757,5767469.6529,2401.5557,0.9471,0.2392,0.209
3,1807.0819,5971875.5862,2443.7421,0.9341,0.2229,0.1862
4,1852.8244,5981259.2186,2445.6613,0.9254,0.2376,0.207
5,1938.964,8814640.4457,2968.946,0.9248,0.234,0.2028
6,1991.7775,8166767.6234,2857.7557,0.9233,0.2353,0.1987
7,1447.7937,4965029.0636,2228.2345,0.9372,0.2016,0.1587
8,1900.9482,7290265.3818,2700.0491,0.9284,0.2353,0.1961
9,1842.1676,6782325.0421,2604.2897,0.9354,0.2471,0.2136


In [25]:
# blend top3 models from compare_models
blender_top3 = blend_models(top3)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1635.6224,4741904.2455,2177.5914,0.9479,0.2199,0.1823
1,1825.6666,8213653.8017,2865.9473,0.9371,0.2241,0.1901
2,1706.3179,5390984.2434,2321.8493,0.9506,0.2382,0.202
3,1644.3962,4576692.1413,2139.3205,0.9495,0.209,0.1756
4,1716.49,6931055.2129,2632.6897,0.9136,0.2305,0.1956
5,1690.2676,5330966.8283,2308.8887,0.9545,0.2171,0.1867
6,1800.451,7509022.2552,2740.2595,0.9295,0.2243,0.1859
7,1252.6605,3282073.944,1811.6495,0.9585,0.1884,0.1478
8,1677.2267,6130810.4679,2476.0473,0.9398,0.2178,0.1794
9,1724.6525,6545587.1225,2558.4345,0.9377,0.2387,0.2027


In [26]:
print(blender_top3.estimators_)

[GradientBoostingRegressor(random_state=123), LGBMRegressor(random_state=123), XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
             colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=0, gpu_id=-1,
             grow_policy='depthwise', importance_type=None,
             interaction_constraints='', learning_rate=0.300000012, max_bin=256,
             max_cat_threshold=64, max_cat_to_onehot=4, max_delta_step=0,
             max_depth=6, max_leaves=0, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=-1,
             num_parallel_tree=1, predictor='auto', random_state=123, ...)]


We have now created a `VotingRegressor` using the `blend_models()` function. The model returned by the `blend_models` function is just like any other model that you would create using `create_model()` or `tune_model()`. You can use this model for predictions on unseen data using `predict_model()` in the same way you would for any other model.
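
Conceptually, the blended prediction for each row is just the (optionally weighted) average of the individual model predictions. The following sketch shows that arithmetic; the helper name `blend_predictions` and the three prediction lists are hypothetical.

```python
def blend_predictions(predictions_per_model, weights=None):
    """Combine per-model prediction lists into one (weighted) average per row."""
    n_models = len(predictions_per_model)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions_per_model)
    ]

# Hypothetical predictions from three models for the same three diamonds
model_a = [5000.0, 3400.0, 3200.0]
model_b = [5300.0, 3500.0, 3100.0]
model_c = [5200.0, 3600.0, 3300.0]

blended = blend_predictions([model_a, model_b, model_c])
print(blended)
```

Averaging helps most when the models make different kinds of errors, because individual over- and under-estimates partly cancel out.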

# 9.4 Stacking

Stacking is another popular technique for ensembling but is less commonly implemented due to practical difficulties. Stacking is an ensemble learning technique that combines multiple models via a meta-model. Another way to think about stacking is that multiple models are trained to predict the outcome and a meta-model is created that uses the predictions from those models as an input along with the original features. The implementation of `stack_models()` is based on Wolpert, D. H. (1992b). Stacked generalization __[(Read More)](https://www.sciencedirect.com/science/article/abs/pii/S0893608005800231)__. 
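
To illustrate the role of the meta-model, the sketch below fits a deliberately simplified one: a least-squares combination of two base-model predictions without an intercept, solved in closed form. The prediction lists are hypothetical stand-ins for out-of-fold predictions, and the helper names are assumptions; a real stacker would typically use a full regression model and more base learners.

```python
def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def fit_meta_weights(pred_1, pred_2, y):
    """Least-squares weights (no intercept) for two base-model predictions."""
    a11, a12, a22 = dot(pred_1, pred_1), dot(pred_1, pred_2), dot(pred_2, pred_2)
    b1, b2 = dot(pred_1, y), dot(pred_2, y)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base predictions are collinear")
    # Cramer's rule on the 2x2 normal equations
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

def sum_squared_error(weights, pred_1, pred_2, y):
    """Squared error of the weighted combination against the target."""
    return sum(
        (weights[0] * p1 + weights[1] * p2 - t) ** 2
        for p1, p2, t in zip(pred_1, pred_2, y)
    )

# Hypothetical targets and out-of-fold predictions of two base models
y = [5169.0, 3470.0, 3183.0, 4370.0]
pred_1 = [5000.0, 3600.0, 3100.0, 4500.0]
pred_2 = [5400.0, 3300.0, 3300.0, 4200.0]

weights = fit_meta_weights(pred_1, pred_2, y)
print(weights)
print(sum_squared_error(weights, pred_1, pred_2, y))
```

The meta-model learns how much to trust each base model. It should be trained on predictions for rows the base models did not see during fitting, otherwise it would simply reward the base model that overfits the most.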

Let's see an example below using the top 3 models we have obtained from `compare_models`:

In [27]:
stacker = stack_models(top3)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1632.3619,4524599.1384,2127.1105,0.9503,0.2195,0.1834
1,1790.7553,6451750.6996,2540.0297,0.9506,0.2225,0.1912
2,1728.5853,5658584.3262,2378.7779,0.9481,0.2375,0.2033
3,1657.0831,4694787.2553,2166.7458,0.9482,0.2111,0.1781
4,1749.1464,7033193.28,2652.0168,0.9123,0.2306,0.1978
5,6458.1397,8661019495.693,93064.5985,-72.9195,0.2664,0.2378
6,1806.6697,7240286.5919,2690.7781,0.932,0.226,0.1895
7,1569.2824,6031239.4375,2455.8582,0.9237,0.2064,0.1654
8,1734.9555,6543405.7723,2558.0082,0.9357,0.2195,0.1829
9,1693.9885,5443240.0594,2333.0752,0.9482,0.2333,0.202


By default, the meta model (final model to generate predictions) is Linear Regression. The meta model can be changed using the `meta_model` parameter. See an example below:

In [28]:
xgboost = create_model('xgboost')
stacker2 = stack_models(top3, meta_model = xgboost)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1673.7116,5319233.6757,2306.3464,0.9416,0.225,0.1834
1,1916.2224,9797890.6746,3130.1583,0.9249,0.2313,0.1949
2,1782.3041,6372413.3866,2524.364,0.9416,0.2444,0.2048
3,1674.4973,5021679.5404,2240.9104,0.9446,0.2122,0.1756
4,1803.2105,9087803.8697,3014.5985,0.8867,0.2384,0.1991
5,1835.9699,8405137.8449,2899.1616,0.9283,0.2218,0.1904
6,1888.8559,9015611.6168,3002.6008,0.9153,0.2275,0.1865
7,1304.4661,3674818.3166,1916.9816,0.9535,0.1918,0.1512
8,1708.6405,6795447.2166,2606.8079,0.9333,0.2237,0.1813
9,1741.2143,6497930.8996,2549.1039,0.9381,0.2474,0.207


Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1812.7926,6192806.5037,2488.535,0.932,0.2442,0.1942
1,1951.9821,13692958.9656,3700.3998,0.8951,0.2413,0.1948
2,1747.1343,5962745.1181,2441.8733,0.9453,0.2521,0.2079
3,1660.9236,5018632.3193,2240.2304,0.9446,0.2281,0.1841
4,1779.5969,6877323.0203,2622.4651,0.9142,0.2468,0.2037
5,1982.3381,19802716.3548,4450.0243,0.831,0.2353,0.1991
6,1808.8721,7240357.3288,2690.7912,0.932,0.2337,0.1891
7,1336.8906,4156648.6308,2038.7861,0.9474,0.1982,0.1503
8,1630.4806,5294020.9249,2300.8739,0.948,0.2265,0.1812
9,1809.3773,7746880.3306,2783.3218,0.9262,0.2565,0.2121


Before we wrap up this section, there is another parameter in `stack_models()` that we haven't seen yet called `restack`. This parameter controls the ability to expose the raw data to the meta model. When set to `True`, it exposes the raw data to the meta model along with all the predictions of the base level models. By default it is set to `True`. See the example below with the `restack` parameter changed to `False`.
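
In this notebook the corresponding call would presumably be `stack_models(top3, restack = False)`; that line was not executed here, so no output is shown for it. What the parameter changes can be sketched in plain Python by building the input rows for the meta model with and without the raw features. The helper name `build_meta_features` and the sample values are hypothetical.

```python
def build_meta_features(raw_rows, base_predictions, restack):
    """Assemble the meta-model input rows.

    raw_rows:          one list of original feature values per sample
    base_predictions:  one list of predictions per base model
    restack:           if True, keep the raw features next to the predictions
    """
    rows = []
    for i, raw in enumerate(raw_rows):
        preds = [model_preds[i] for model_preds in base_predictions]
        rows.append(list(raw) + preds if restack else preds)
    return rows

# Hypothetical data: one raw feature (carat) and two base models
raw_rows = [[1.10], [0.83]]
base_predictions = [[5000.0, 3400.0], [5300.0, 3500.0]]

print(build_meta_features(raw_rows, base_predictions, restack=True))
print(build_meta_features(raw_rows, base_predictions, restack=False))
```

With `restack=False` the meta model only sees the base-model predictions, which keeps it simpler and can reduce the risk of overfitting to the raw features.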

# 10.0 Experiment Logging

PyCaret 2.0 embeds the MLflow Tracking component as a backend API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. To log your experiments in PyCaret, simply use the `log_experiment` and `experiment_name` parameters in the `setup()` function, as we did in this example.

You can start the UI on `localhost:5000`. Simply start the MLflow server from the command line or from the notebook. See example below:

In [29]:
# to start the MLflow server from the notebook:
# !mlflow ui

# Open localhost:5000 in your browser (below is an example of what the UI looks like)
![title](https://i2.wp.com/pycaret.org/wp-content/uploads/2020/07/classification_mlflow_ui.png?resize=1080%2C508&ssl=1)

# 11.0 Wrap-up / Next Steps?

We have covered a lot of new concepts in this tutorial. Most importantly we have seen how to use exploratory data analysis to customize a pipeline in `setup()` which has improved the results considerably when compared to what we saw earlier in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. We have also learned how to perform and tune ensembling in PyCaret.

There are, however, a few more advanced things to cover in `pycaret.regression`, which include interpreting more complex tree-based models using Shapley values, advanced ensembling techniques such as multiple layer stacknet, and more pre-processing pipeline methods. We will cover all of this in our next and final tutorial in the `pycaret.regression` series. 

See you at the next tutorial. Follow the link to __[Regression Tutorial (REG103) - Level Expert](https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Expert%20-%20REG103.ipynb)__