<div style="text-align: center"> <h1>An open-source, low-code machine learning library in Python</h1></div> 

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*Cku5-rqmqSIuhUyFkIAdIA.png" 
        alt="Picture"  
        height="500"
        width="700"
        style="display: block; margin: 0 auto" />

### PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that substantially shortens the experiment cycle and makes you more productive.

### Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, Hyperopt, Ray, and a few more.

### The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise. PyCaret was inspired by the caret library in the R programming language.

# 👥 Who should use PyCaret?
### PyCaret is an open-source library that anybody can use. In our view, the ideal target audience of PyCaret is: <br />

- #### Experienced Data Scientists who want to increase productivity.
- #### Citizen Data Scientists who prefer a low code machine learning solution.
- #### Data Science Professionals who want to build rapid prototypes.
- #### Data Science and Machine Learning students and enthusiasts.

## Why use PyCaret?
- #### It helps with data preprocessing.
- #### It trains multiple models in a single call and outputs a table comparing their performance on metrics such as precision, recall, and F1-score.
- #### It makes models easy to analyze and interpret because only minimal code is required.
- #### It increases productivity by producing results in a few lines of code.
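
The metrics named in the list above can be computed by hand, which helps when reading the comparison tables later on. The following is a minimal sketch in plain Python, not PyCaret's own implementation; the function name `precision_recall_f1` and the labels are hypothetical and are not taken from the ad-click dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = precision_recall_f1(y_true, y_pred)
print(scores)
```

With three true positives, one false positive, and one false negative, all three metrics come out at 0.75.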

> ## PyCaret Implementation on Ad Click Prediction Dataset

In [1]:
# Import pandas
import pandas as pd

# Reading the data
df = pd.read_csv('test.csv')
df.head()

,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Male,Clicked on Ad
0,68.95,35,61833.9,256.09,0.0,0
1,80.23,31,68441.85,193.77,1.0,0
2,69.47,26,59785.94,236.5,0.0,0
3,74.15,29,54806.18,245.89,1.0,0
4,68.37,35,73889.99,225.58,0.0,0


> ## Importing the PyCaret classification module

In [2]:
from pycaret.classification import *

> ## Setting up the data and the preprocessing parameters

In [9]:
# Fix the seed and let setup() drop collinear features and outliers
s = setup(data=df, target='Clicked on Ad', session_id=123,
          remove_multicollinearity=True, remove_outliers=True)

,Description,Value
0,session_id,123
1,Target,Clicked on Ad
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(1000, 6)"
5,Missing Values,False
6,Numeric Features,5
7,Categorical Features,0
8,Ordinal Features,False
9,High Cardinality Features,False
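
To give a feel for what `remove_multicollinearity=True` is meant to achieve, here is a simplified sketch of a correlation filter in plain Python. It is an illustration under stated assumptions, not PyCaret's implementation: the helper names, the 0.9 threshold, and the toy data are all hypothetical. When two features are strongly correlated with each other, the sketch keeps the one that is more correlated with the target.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def drop_collinear(features, target, threshold=0.9):
    """Return the feature names that survive the correlation filter.

    `features` maps a column name to its list of values. The threshold
    is an illustrative value chosen for this sketch.
    """
    names = list(features)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(features[a], features[b])) > threshold:
                # Drop whichever of the pair tells us less about the target.
                corr_a = abs(pearson(features[a], target))
                corr_b = abs(pearson(features[b], target))
                dropped.add(a if corr_a < corr_b else b)
    return [name for name in names if name not in dropped]


# Hypothetical toy data: two near-duplicate columns and one independent one.
features = {
    "time_on_site": [10, 20, 30, 40, 50],
    "time_on_site_noisy": [11, 19, 32, 38, 52],
    "age": [35, 22, 41, 29, 50],
}
clicked = [1, 1, 0, 0, 0]
kept = drop_collinear(features, clicked)
print(kept)
```

Only one of the two near-duplicate columns survives, while `age` is kept because it is not strongly correlated with either of them.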


> ## Compare all models for classification

In [10]:
best = compare_models()

ID,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
nb,Naive Bayes,0.9639,0.9914,0.956,0.9684,0.9621,0.9276,0.9279,0.014
rf,Random Forest Classifier,0.9639,0.9865,0.956,0.9688,0.9623,0.9277,0.9279,0.347
qda,Quadratic Discriminant Analysis,0.9579,0.9914,0.9623,0.9512,0.9565,0.9158,0.9162,0.013
et,Extra Trees Classifier,0.9549,0.9877,0.9372,0.9686,0.9524,0.9095,0.9104,0.31
gbc,Gradient Boosting Classifier,0.9534,0.9826,0.9403,0.9632,0.9512,0.9066,0.9076,0.101
ridge,Ridge Classifier,0.9533,0.0,0.9089,0.9933,0.9489,0.9062,0.9096,0.014
lda,Linear Discriminant Analysis,0.9533,0.9914,0.9089,0.9933,0.9489,0.9062,0.9096,0.014
lightgbm,Light Gradient Boosting Machine,0.9504,0.9854,0.9372,0.9591,0.9478,0.9005,0.9011,0.047
ada,Ada Boost Classifier,0.9459,0.9766,0.9434,0.9447,0.9437,0.8917,0.8922,0.096
dt,Decision Tree Classifier,0.9293,0.9296,0.9371,0.9184,0.9273,0.8585,0.8593,0.014
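
The Kappa and MCC columns are less familiar than accuracy, so a short worked example may help. The sketch below computes accuracy, Cohen's kappa, and the Matthews correlation coefficient from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and are not derived from the table above.

```python
import math


def accuracy_kappa_mcc(tp, fp, fn, tn):
    """Accuracy, Cohen's kappa and MCC from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, kappa, mcc


# Hypothetical balanced example: 90 of 100 predictions are correct.
accuracy, kappa, mcc = accuracy_kappa_mcc(tp=45, fp=5, fn=5, tn=45)
print(accuracy, kappa, mcc)
```

On this balanced example accuracy is 0.9 while kappa and MCC are both 0.8, because both discount the agreement that would be expected by chance.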


> ## Creating a model with the best classification algorithm for the dataset

In [11]:
nb = create_model('nb')

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.9254,0.9857,0.9062,0.9355,0.9206,0.8502,0.8506
1,0.9701,0.9964,0.9688,0.9688,0.9688,0.9402,0.9402
2,0.9851,0.9973,0.9688,1.0,0.9841,0.97,0.9705
3,0.9552,0.9982,0.9688,0.9394,0.9538,0.9104,0.9108
4,0.9697,0.9724,0.9677,0.9677,0.9677,0.9392,0.9392
5,0.9848,0.977,0.9677,1.0,0.9836,0.9695,0.97
6,0.9545,0.9972,0.9375,0.9677,0.9524,0.9089,0.9093
7,1.0,1.0,1.0,1.0,1.0,1.0,1.0
8,0.9394,0.9926,0.9375,0.9375,0.9375,0.8787,0.8787
9,0.9545,0.9972,0.9375,0.9677,0.9524,0.9089,0.9093
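
The per-fold scores above tie back to the model comparison: averaging the Accuracy column over the ten folds gives the 0.9639 reported for Naive Bayes in the comparison table. A quick check in plain Python, using the fold values copied from the output above:

```python
from statistics import mean, pstdev

# Accuracy per fold, copied from the create_model('nb') output.
fold_accuracy = [0.9254, 0.9701, 0.9851, 0.9552, 0.9697,
                 0.9848, 0.9545, 1.0, 0.9394, 0.9545]

mean_accuracy = round(mean(fold_accuracy), 4)
# Spread across folds, as a rough indication of how stable the score is.
spread = round(pstdev(fold_accuracy), 4)

print("mean accuracy:", mean_accuracy)
print("spread across folds:", spread)
```

The spread is worth a glance as well: the fold scores range from 0.9254 to 1.0, so a single fold is a much noisier estimate than the mean.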


> ## Evaluating the model

In [12]:
evaluate_model(nb)

interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

> ## Predicting on the hold-out test set using the model

In [15]:
predict_model(nb)

,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Naive Bayes,0.9668,0.986,0.9728,0.9597,0.9662,0.9335,0.9336


,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Male,Clicked on Ad,Label,Score
0,41.669998,36.0,53817.019531,132.550003,0.0,1,1,1.0000
1,70.040001,31.0,74780.742188,183.850006,1.0,0,0,0.9537
2,73.180000,23.0,61526.250000,196.710007,1.0,0,0,0.9942
3,78.599998,46.0,41768.128906,254.589996,1.0,0,0,0.9827
4,71.889999,23.0,61617.980469,172.809998,1.0,0,0,0.9334
...,...,...,...,...,...,...,...,...
296,83.529999,36.0,67686.156250,204.559998,0.0,0,0,0.9980
297,64.879997,42.0,70005.507812,129.800003,1.0,1,1,0.9974
298,78.290001,38.0,57844.960938,252.070007,0.0,0,0,0.9997
299,44.570000,31.0,38349.781250,133.169998,1.0,1,1,1.0000


> ## Predicting the raw score of the labels

In [20]:
# Score the full dataset (including the rows used for training) and return per-class scores
predictions = predict_model(nb, data=df, raw_score=True)
predictions.head(10)

,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Naive Bayes,0.966,0.9906,0.964,0.9679,0.9659,0.932,0.932


,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Male,Clicked on Ad,Label,Score_0,Score_1
0,68.95,35,61833.9,256.09,0.0,0,0,0.9993,0.0007
1,80.23,31,68441.85,193.77,1.0,0,0,0.9966,0.0034
2,69.47,26,59785.94,236.5,0.0,0,0,0.9992,0.0008
3,74.15,29,54806.18,245.89,1.0,0,0,0.9996,0.0004
4,68.37,35,73889.99,225.58,0.0,0,0,0.9962,0.0038
5,59.99,23,59761.56,226.74,1.0,0,0,0.9821,0.0179
6,88.91,33,53852.85,208.36,0.0,0,0,0.998,0.002
7,66.0,48,24593.33,131.76,1.0,1,1,0.0,1.0
8,74.53,30,68862.0,221.51,1.0,0,0,0.9993,0.0007
9,69.88,20,55642.32,183.82,1.0,0,0,0.9434,0.0566
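
In the rows shown above, `Label` is the class with the larger raw score, and the single `Score` column of the earlier output corresponds to that larger value. The sketch below reproduces this relationship for three rows copied from the table. It assumes the label is simply the higher-scoring class, which is consistent with the rows shown but is not taken from PyCaret's source; the function name is hypothetical.

```python
def label_and_score(score_0, score_1):
    """Pick the class with the larger raw score and report that score."""
    label = 1 if score_1 > score_0 else 0
    return label, max(score_0, score_1)


# (Score_0, Score_1) pairs copied from rows 0, 7 and 9 of the output above.
raw_scores = [(0.9993, 0.0007), (0.0, 1.0), (0.9434, 0.0566)]
results = [label_and_score(s0, s1) for s0, s1 in raw_scores]

for (s0, s1), (label, score) in zip(raw_scores, results):
    print(f"Score_0={s0}, Score_1={s1} -> Label={label}, Score={score}")
```

Keeping both raw scores is useful when a decision threshold other than the default needs to be applied afterwards.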


> ## Transforming the model into a pipeline and saving it as a pickle file

In [21]:
save_model(nb, 'my_best_pipeline')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[],
                                       target='Clicked on Ad',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_...
                 ('fix_perfect', Remove_100(target='Clicked on Ad')),
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'),
                 ('fix_multi',
                  Fix_mul

# Regression

In [21]:
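# This star import replaces the classification functions of the same name (setup, compare_models, ...)
from pycaret.regression import *

# Load the housing data and display floats with three decimals
df = pd.read_csv('cleanusa.csv')
pd.set_option('display.float_format', lambda x: '%.3f' % x)
df.head()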

,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Avg. Area Number of Bedrooms,Area Population,Price
0,79545.459,5.683,7.009,4,23086.801,1059033.558
1,79248.642,6.003,6.731,3,40173.072,1505890.915
2,61287.067,5.866,8.513,5,36882.159,1058987.988
3,63345.24,7.188,5.587,3,34310.243,1260616.807
4,59982.197,5.041,7.839,4,26354.109,630943.489


In [22]:
# Same cleaning steps as before, plus min-max scaling
r = setup(data=df, target='Price', session_id=124,
          remove_multicollinearity=True, remove_outliers=True,
          normalize=True, normalize_method='minmax')

,Description,Value
0,session_id,124
1,Target,Price
2,Original Data,"(4943, 6)"
3,Missing Values,False
4,Numeric Features,4
5,Categorical Features,1
6,Ordinal Features,False
7,High Cardinality Features,False
8,High Cardinality Method,
9,Transformed Train Set,"(3287, 9)"
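
The setup summary reports one categorical feature and a transformed train set with nine columns. That is consistent with the bedroom count being one-hot encoded while the numeric columns are rescaled to the range 0 to 1. The sketch below shows both transformations in plain Python on the five rows printed by `df.head()`. It is an illustration only: the helper names are hypothetical, and a real pipeline would learn the minimum, maximum, and categories from the training split rather than from five rows.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values, prefix):
    """Expand a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{c}": [1.0 if v == c else 0.0 for v in values]
        for c in categories
    }


# Values copied from the df.head() output above.
income = [79545.459, 79248.642, 61287.067, 63345.24, 59982.197]
bedrooms = [4, 3, 5, 3, 4]

scaled_income = min_max_scale(income)
encoded = one_hot(bedrooms, "Avg. Area Number of Bedrooms")

print([round(v, 3) for v in scaled_income])
print(encoded)
```

The generated column names follow the `<column>_<category>` pattern that appears in the prediction output of this section.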


In [23]:
models()

ID,Name,Reference,Turbo
lr,Linear Regression,sklearn.linear_model._base.LinearRegression,True
lasso,Lasso Regression,sklearn.linear_model._coordinate_descent.Lasso,True
ridge,Ridge Regression,sklearn.linear_model._ridge.Ridge,True
en,Elastic Net,sklearn.linear_model._coordinate_descent.Elast...,True
lar,Least Angle Regression,sklearn.linear_model._least_angle.Lars,True
llar,Lasso Least Angle Regression,sklearn.linear_model._least_angle.LassoLars,True
omp,Orthogonal Matching Pursuit,sklearn.linear_model._omp.OrthogonalMatchingPu...,True
br,Bayesian Ridge,sklearn.linear_model._bayes.BayesianRidge,True
ard,Automatic Relevance Determination,sklearn.linear_model._bayes.ARDRegression,False
par,Passive Aggressive Regressor,sklearn.linear_model._passive_aggressive.Passi...,True


In [24]:
compare_models()

ID,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
lr,Linear Regression,81288.2547,10152786227.2,100691.2141,0.9099,0.0978,0.0741,0.014
lasso,Lasso Regression,81290.7836,10151754444.8,100686.0266,0.9099,0.0978,0.0742,0.016
lar,Least Angle Regression,81291.1226,10151834968.8694,100686.4407,0.9099,0.0978,0.0742,0.015
llar,Lasso Least Angle Regression,81284.364,10150397619.0056,100679.1207,0.9099,0.0978,0.0742,0.015
br,Bayesian Ridge,81291.3913,10151822457.8482,100686.3082,0.9099,0.0978,0.0742,0.021
ridge,Ridge Regression,81355.2305,10169067929.6,100768.6586,0.9098,0.098,0.0744,0.013
huber,Huber Regressor,88380.3278,12016800671.6732,109540.2414,0.8936,0.1101,0.0834,0.027
gbr,Gradient Boosting Regressor,87820.6689,12017006657.2405,109531.0231,0.8934,0.1066,0.0811,0.261
lightgbm,Light Gradient Boosting Machine,90206.2711,12525059031.5041,111827.5783,0.8888,0.1081,0.0826,0.084
et,Extra Trees Regressor,92194.518,13157742500.6925,114571.6625,0.8833,0.1126,0.0856,0.593


LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)

In [25]:
lr = create_model('lr')

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,77330.8672,9708203008.0,98530.2109,0.9054,0.0913,0.0688
1,84676.4922,10529668096.0,102614.1719,0.9099,0.106,0.0809
2,78545.0625,9009589248.0,94918.8594,0.921,0.0913,0.0721
3,74525.3828,8935982080.0,94530.3203,0.93,0.091,0.0675
4,80544.2578,9879180288.0,99394.0625,0.9174,0.1006,0.0741
5,81336.5,10318883840.0,101581.9062,0.8917,0.0928,0.0708
6,80459.6016,10178726912.0,100889.6797,0.9149,0.0955,0.0729
7,86828.125,11429193728.0,106907.4062,0.9075,0.1054,0.0804
8,83787.1875,10905564160.0,104429.7109,0.8924,0.0996,0.0744
9,84849.0703,10632870912.0,103115.8125,0.9086,0.1044,0.0796


In [26]:
evaluate_model(lr)

interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

In [28]:
predict_model(lr)

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Linear Regression,81855.3672,10473385984.0,102339.5625,0.9105,0.1002,0.0756


,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Area Population,Avg. Area Number of Bedrooms_2,Avg. Area Number of Bedrooms_3,Avg. Area Number of Bedrooms_4,Avg. Area Number of Bedrooms_5,Avg. Area Number of Bedrooms_6,Price,Label
0,0.516,0.776,0.572,0.485,0.000,1.000,0.000,0.000,0.000,1588196.250,1562133.250
1,0.245,0.509,0.413,0.673,0.000,1.000,0.000,0.000,0.000,847155.062,994223.375
2,0.623,0.064,0.569,0.402,0.000,0.000,1.000,0.000,0.000,1043281.312,937411.250
3,0.292,0.901,0.627,0.271,0.000,0.000,0.000,0.000,1.000,1164497.000,1233260.375
4,0.305,0.548,0.650,0.361,0.000,0.000,1.000,0.000,0.000,1048350.688,1008523.250
...,...,...,...,...,...,...,...,...,...,...,...
1478,0.251,0.245,0.703,0.651,0.000,0.000,0.000,0.000,1.000,858088.750,932052.375
1479,0.239,0.698,0.533,0.593,0.000,0.000,1.000,0.000,0.000,1237224.875,1190113.250
1480,0.801,0.394,0.399,0.413,1.000,0.000,0.000,0.000,0.000,1327975.250,1380786.750
1481,0.507,0.215,0.267,0.473,0.000,0.000,1.000,0.000,0.000,1084255.625,779375.500
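
The regression metrics reported above can be recomputed from the `Price` (actual) and `Label` (predicted) columns. The sketch below does this in plain Python for the first five rows of the prediction table, so the numbers will differ from the full hold-out scores. The function name is hypothetical, and the formulas are the standard textbook definitions rather than code taken from PyCaret.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, R2, RMSLE and MAPE for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot if ss_tot else 0.0

    # RMSLE and MAPE assume strictly positive targets, as house prices are.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / n
    )
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "R2": r2, "RMSLE": rmsle, "MAPE": mape}


# Price and Label values copied from rows 0-4 of the prediction table above.
price = [1588196.250, 847155.062, 1043281.312, 1164497.000, 1048350.688]
label = [1562133.250, 994223.375, 937411.250, 1233260.375, 1008523.250]

metrics = regression_metrics(price, label)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE is never smaller than MAE because squaring gives large errors more weight, which is why the two are usually read together.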
