<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# PyCaret - AutoML Regression
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/PyCaret/PyCaret_automl_regression.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/open_in_naas.svg"/></a>

**Tags:** #automl #pandas #snippet #regression #dataframe #visualize #pycaret #operations

**Author:** [Minura Punchihewa](https://www.linkedin.com/in/minurapunchihewa/)

## Input

### Import libraries

In [2]:
import pandas as pd

try:
    from pycaret.regression import (setup, compare_models, evaluate_model, predict_model,
                                    finalize_model, save_model, load_model, create_docker)
except ImportError:
    # PyCaret could not be imported: install it, then retry the import
    !pip install --user pycaret
    from pycaret.regression import (setup, compare_models, evaluate_model, predict_model,
                                    finalize_model, save_model, load_model, create_docker)

### Variables

In [3]:
csv_path = "https://raw.githubusercontent.com/MinuraPunchihewa/pycaret-automl/main/data/wine-quality.csv"
target_column = 'quality'

## Model

### Read the CSV from path

In [4]:
df = pd.read_csv(csv_path)

### View a sample of the data

In [5]:
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


### Set up the dataset

In [6]:
# setup() must be called before any other PyCaret function
# change target_column in the Variables section as required
# many preprocessing transformations can be configured through setup() arguments
# by default, missing value imputation, one-hot encoding and a train-test split are performed
# when prompted, press Enter to continue
grid = setup(data=df, target=target_column)

| | Description | Value |
|---|---|---|
| 0 | session_id | 7626 |
| 1 | Target | quality |
| 2 | Original Data | (1599, 12) |
| 3 | Missing Values | False |
| 4 | Numeric Features | 11 |
| 5 | Categorical Features | 0 |
| 6 | Ordinal Features | False |
| 7 | High Cardinality Features | False |
| 8 | High Cardinality Method | |
| 9 | Transformed Train Set | (1119, 11) |
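
The summary reports 1119 training rows out of 1599, which is consistent with a 70/30 train-test split. The 0.7 fraction below is an assumption (it is believed to be PyCaret's default `train_size`; check your installed version). A minimal arithmetic sketch using only the shapes printed in the summary:

```python
# Shapes taken from the setup summary.
total_rows, total_columns = 1599, 12
train_rows, train_features = 1119, 11

# Assumption: 70% of the rows go to the training set.
assumed_train_size = 0.7

# Rows expected in the training set under that assumption.
expected_train_rows = int(total_rows * assumed_train_size)

# Rows left over for the hold-out (test) set.
holdout_rows = total_rows - train_rows

# Every column except the target is a feature.
feature_columns = total_columns - 1

print(expected_train_rows, holdout_rows, feature_columns)
```

If the assumption holds, the remaining rows form the hold-out set that `finalize_model` later folds back into training.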


### Train and compare all supported models

In [7]:
# uses cross-validation
best_model = compare_models()

| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| et | Extra Trees Regressor | 0.402 | 0.3391 | 0.577 | 0.4647 | 0.0903 | 0.0749 | 0.154 |
| rf | Random Forest Regressor | 0.4374 | 0.3533 | 0.5904 | 0.4372 | 0.092 | 0.081 | 0.21 |
| gbr | Gradient Boosting Regressor | 0.4791 | 0.3816 | 0.6149 | 0.3926 | 0.0953 | 0.0882 | 0.069 |
| lightgbm | Light Gradient Boosting Machine | 0.4479 | 0.3808 | 0.6131 | 0.3909 | 0.0956 | 0.0828 | 0.095 |
| lr | Linear Regression | 0.5029 | 0.4168 | 0.6427 | 0.3369 | 0.0997 | 0.0927 | 0.353 |
| ridge | Ridge Regression | 0.503 | 0.4174 | 0.6433 | 0.3356 | 0.0998 | 0.0927 | 0.013 |
| br | Bayesian Ridge | 0.5032 | 0.4178 | 0.6435 | 0.3351 | 0.0998 | 0.0927 | 0.008 |
| ada | AdaBoost Regressor | 0.5229 | 0.4175 | 0.6442 | 0.3324 | 0.1 | 0.0964 | 0.051 |
| huber | Huber Regressor | 0.5087 | 0.4332 | 0.6551 | 0.31 | 0.1018 | 0.0942 | 0.018 |
| lar | Least Angle Regression | 0.5182 | 0.4431 | 0.6624 | 0.2928 | 0.1026 | 0.0955 | 0.008 |
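
Because `compare_models` uses cross-validation, each score in the comparison is presumably an average over folds. That would explain why the RMSE column is not exactly the square root of the MSE column: for Extra Trees, the square root of 0.3391 is about 0.582, while the reported RMSE is 0.577. The mean of per-fold RMSE values can never exceed the square root of the mean MSE. The sketch below illustrates this with made-up per-fold values; `fold_mse` is hypothetical and is not taken from this run.

```python
import math

# Hypothetical per-fold MSE values (illustrative only, not from the run above).
fold_mse = [0.30, 0.36, 0.33, 0.40, 0.31]

# RMSE is computed inside each fold, then averaged across folds.
fold_rmse = [math.sqrt(mse) for mse in fold_mse]
mean_mse = sum(fold_mse) / len(fold_mse)
mean_rmse = sum(fold_rmse) / len(fold_rmse)

# Square root of the averaged MSE, which a reader might expect the RMSE column to show.
rmse_of_mean_mse = math.sqrt(mean_mse)

print(f"mean of per-fold RMSE: {mean_rmse:.4f}")
print(f"sqrt of mean MSE:      {rmse_of_mean_mse:.4f}")
```

The gap between the two numbers grows with the spread of scores across folds, so a large gap hints at unstable performance between folds.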


### Report the best model

In [8]:
print(best_model)

ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                    max_depth=None, max_features='auto', max_leaf_nodes=None,
                    max_samples=None, min_impurity_decrease=0.0,
                    min_impurity_split=None, min_samples_leaf=1,
                    min_samples_split=2, min_weight_fraction_leaf=0.0,
                    n_estimators=100, n_jobs=-1, oob_score=False,
                    random_state=7626, verbose=0, warm_start=False)


### Evaluate the model using a number of different plots

In [9]:
# click on the different plot types to explore
# some plots may not work, depending on the data and the model
evaluate_model(best_model)


### Make predictions on new data

In [10]:
# new_data should be a DataFrame with the same feature columns as df, without the target column
# predict_model(best_model, data=new_data)
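
A minimal sketch of preparing unlabeled data for `predict_model`, assuming the new rows share the training columns. The two rows are copied from the data sample shown earlier, `labelled` and `new_data` are hypothetical names, and the `predict_model` call stays commented out because it needs the fitted `best_model` from this session.

```python
import pandas as pd

target_column = "quality"

# Two rows copied from the data sample earlier in this notebook.
labelled = pd.DataFrame({
    "fixed acidity": [7.4, 7.8],
    "volatile acidity": [0.7, 0.88],
    "citric acid": [0.0, 0.0],
    "residual sugar": [1.9, 2.6],
    "chlorides": [0.076, 0.098],
    "free sulfur dioxide": [11.0, 25.0],
    "total sulfur dioxide": [34.0, 67.0],
    "density": [0.9978, 0.9968],
    "pH": [3.51, 3.2],
    "sulphates": [0.56, 0.68],
    "alcohol": [9.4, 9.8],
    "quality": [5, 5],
})

# Drop the label so the frame contains features only.
new_data = labelled.drop(columns=[target_column])

# With a fitted model in scope, the prediction call would be:
# predictions = predict_model(best_model, data=new_data)
print(new_data.columns.tolist())
```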

### Finalize model

In [11]:
# retrains the model on the entire dataset, including the hold-out set
# the model's hyperparameters are left unchanged
final_model = finalize_model(best_model)
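
The idea behind finalizing can be sketched with scikit-learn directly: fit on the training portion, then refit an identically configured estimator on all rows. This is an illustration on synthetic data, not PyCaret's internal implementation; the dataset, the split and every variable name are assumptions.

```python
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (assumption, for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# Model fitted on the training portion only, as during model comparison.
candidate = ExtraTreesRegressor(n_estimators=20, random_state=0)
candidate.fit(X_train, y_train)

# "Finalized" model: same hyperparameters, refitted on training plus hold-out rows.
finalized = clone(candidate)
finalized.fit(X, y)

# The hyperparameters are identical; only the training data differs.
print(candidate.get_params() == finalized.get_params())
```

Since the finalized model has seen the hold-out rows, scores computed on those rows afterwards no longer estimate performance on unseen data.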

## Output

### Save model as a pickle file

In [12]:
save_model(final_model, 'regression_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[], ml_usecase='regression',
                                       numerical_features=[], target='quality',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_strategy...
                  ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0,
                                      criterion='mse', max_depth=None,
                                      max_features='auto', max_leaf_nodes=None,
                                      max_samples=None,
                                    

### Load saved model from pickle file

In [13]:
model = load_model('regression_model')

Transformation Pipeline and Model Successfully Loaded


### Create Dockerfile for model

In [14]:
# also creates a requirements.txt file for dependencies
create_docker('regression_model')

Writing requirements.txt
Writing Dockerfile
Dockerfile and requirements.txt successfully created.
To build image you have to run --> !docker image build -f "Dockerfile" -t IMAGE_NAME:IMAGE_TAG .
        
