# PyCaret 
PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of notebook environment.

In [1]:
# install pycaret
!pip install pycaret



# Create Model
Creating a model in any module is as simple as calling create_model. It takes only one required parameter: the model ID as a string. For the supervised modules (classification and regression), this function returns a table of k-fold cross-validated performance metrics along with the trained model object. For the unsupervised clustering module, it returns performance metrics along with the trained model object. For the remaining unsupervised modules (anomaly detection, natural language processing, and association rule mining), it returns only the trained model object. The evaluation metrics used are:

* Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
* Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE
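
As a rough illustration of how most of the classification metrics relate to one another, the sketch below derives them from the four confusion-matrix counts using only the standard library. The function name classification_metrics and the counts passed to it are hypothetical and not part of PyCaret. The counts were chosen so that the rounded results line up with the first fold of the logistic regression table further down, but the actual confusion matrix of that fold is not given in this notebook. AUC is left out because it needs predicted scores rather than counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)      # share of actual positives that were found
    precision = tp / (tp + fp)   # share of predicted positives that were right
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Accuracy": accuracy,
        "Recall": recall,
        "Prec.": precision,
        "F1": f1,
        "Kappa": kappa,
        "MCC": mcc,
    }


# Hypothetical counts for a 40-sample validation fold (an assumption).
metrics = classification_metrics(tp=25, fp=2, fn=1, tn=12)
# Round to 4 decimals, the default precision mentioned in this section.
rounded = {name: round(value, 4) for name, value in metrics.items()}
print(rounded)
```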

The number of folds can be set with the fold parameter of create_model; the default is 10. All metrics are rounded to 4 decimals by default, which can be changed with the round parameter of create_model. Although there is a separate function for ensembling a trained model, you can also ensemble a model while creating it by passing the ensemble parameter together with the method parameter to create_model.
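
To make the fold and round parameters more concrete, here is a minimal sketch of a plain, unstratified k-fold partition in pure Python. It is not PyCaret's implementation, which may stratify or shuffle the data. The helper names fold_sizes and kfold_indices are hypothetical, and the 398 training rows are an assumption (roughly what a 70% split of the 569-row dataset used below would give).

```python
def fold_sizes(n_samples, n_folds=10):
    """Split n_samples into n_folds parts whose sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def kfold_indices(n_samples, n_folds=10):
    """Yield (train_indices, validation_indices) for each fold, in order."""
    start = 0
    for size in fold_sizes(n_samples, n_folds):
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# 398 rows is an assumed training-set size, used only for illustration.
sizes_10 = fold_sizes(398)             # default of 10 folds
sizes_5 = fold_sizes(398, n_folds=5)   # analogous to fold = 5
splits = list(kfold_indices(398, n_folds=5))

# Rounding a per-fold score to 4 decimals, as the default round setting does.
# Hypothetical score: 37 of 39 validation rows classified correctly.
score = round(37 / 39, 4)
print(sizes_10, sizes_5, score)
```

Each row is used for validation exactly once, and the per-fold scores are then averaged to give the cross-validated estimate.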

## Load Library and Dataset

In [4]:
import numpy as np # for scientific computation 
import pandas as pd # for working with data
from sklearn.model_selection import train_test_split # for splitting the data
from sklearn.datasets import load_breast_cancer # dataset that we will be using
from pycaret.classification import * # Importing PyCaret Module

## Dataset: Breast Cancer

In [5]:
cancer = load_breast_cancer()
df = pd.DataFrame(np.c_[cancer['data'], cancer['target']],
                  columns= np.append(cancer['feature_names'], ['target']))
df

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.80,1001.0,0.11840,0.27760,0.30010,0.14710,0.2419,0.07871,1.0950,0.9053,8.589,153.40,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.380,17.33,184.60,2019.0,0.16220,0.66560,0.7119,0.2654,0.4601,0.11890,0.0
1,20.57,17.77,132.90,1326.0,0.08474,0.07864,0.08690,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.01860,0.01340,0.01389,0.003532,24.990,23.41,158.80,1956.0,0.12380,0.18660,0.2416,0.1860,0.2750,0.08902,0.0
2,19.69,21.25,130.00,1203.0,0.10960,0.15990,0.19740,0.12790,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.006150,0.04006,0.03832,0.02058,0.02250,0.004571,23.570,25.53,152.50,1709.0,0.14440,0.42450,0.4504,0.2430,0.3613,0.08758,0.0
3,11.42,20.38,77.58,386.1,0.14250,0.28390,0.24140,0.10520,0.2597,0.09744,0.4956,1.1560,3.445,27.23,0.009110,0.07458,0.05661,0.01867,0.05963,0.009208,14.910,26.50,98.87,567.7,0.20980,0.86630,0.6869,0.2575,0.6638,0.17300,0.0
4,20.29,14.34,135.10,1297.0,0.10030,0.13280,0.19800,0.10430,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.011490,0.02461,0.05688,0.01885,0.01756,0.005115,22.540,16.67,152.20,1575.0,0.13740,0.20500,0.4000,0.1625,0.2364,0.07678,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,21.56,22.39,142.00,1479.0,0.11100,0.11590,0.24390,0.13890,0.1726,0.05623,1.1760,1.2560,7.673,158.70,0.010300,0.02891,0.05198,0.02454,0.01114,0.004239,25.450,26.40,166.10,2027.0,0.14100,0.21130,0.4107,0.2216,0.2060,0.07115,0.0
565,20.13,28.25,131.20,1261.0,0.09780,0.10340,0.14400,0.09791,0.1752,0.05533,0.7655,2.4630,5.203,99.04,0.005769,0.02423,0.03950,0.01678,0.01898,0.002498,23.690,38.25,155.00,1731.0,0.11660,0.19220,0.3215,0.1628,0.2572,0.06637,0.0
566,16.60,28.08,108.30,858.1,0.08455,0.10230,0.09251,0.05302,0.1590,0.05648,0.4564,1.0750,3.425,48.55,0.005903,0.03731,0.04730,0.01557,0.01318,0.003892,18.980,34.12,126.70,1124.0,0.11390,0.30940,0.3403,0.1418,0.2218,0.07820,0.0
567,20.60,29.33,140.10,1265.0,0.11780,0.27700,0.35140,0.15200,0.2397,0.07016,0.7260,1.5950,5.772,86.22,0.006522,0.06158,0.07117,0.01664,0.02324,0.006185,25.740,39.42,184.60,1821.0,0.16500,0.86810,0.9387,0.2650,0.4087,0.12400,0.0


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [7]:
df.target.unique()

array([0., 1.])

## Initializing Setup 

In [6]:
clf1 = setup(data = df, target = 'target')

,Description,Value
0,session_id,7117
1,Target,target
2,Target Type,Binary
3,Label Encoded,"0.0: 0, 1.0: 1"
4,Original Data,"(569, 31)"
5,Missing Values,False
6,Numeric Features,30
7,Categorical Features,0
8,Ordinal Features,False
9,High Cardinality Features,False


## Check the Model Library to See All Models in the PyCaret Module

In [8]:
models()

ID,Name,Reference,Turbo
lr,Logistic Regression,sklearn.linear_model._logistic.LogisticRegression,True
knn,K Neighbors Classifier,sklearn.neighbors._classification.KNeighborsCl...,True
nb,Naive Bayes,sklearn.naive_bayes.GaussianNB,True
dt,Decision Tree Classifier,sklearn.tree._classes.DecisionTreeClassifier,True
svm,SVM - Linear Kernel,sklearn.linear_model._stochastic_gradient.SGDC...,True
rbfsvm,SVM - Radial Kernel,sklearn.svm._classes.SVC,False
gpc,Gaussian Process Classifier,sklearn.gaussian_process._gpc.GaussianProcessC...,False
mlp,MLP Classifier,sklearn.neural_network._multilayer_perceptron....,False
ridge,Ridge Classifier,sklearn.linear_model._ridge.RidgeClassifier,True
rf,Random Forest Classifier,sklearn.ensemble._forest.RandomForestClassifier,True


## Train Logistic Regression Model

In [7]:
lr = create_model('lr')  # 'lr' is the model ID for logistic regression

,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.925,0.9863,0.9615,0.9259,0.9434,0.8324,0.8337
1,0.95,0.9973,0.9615,0.9615,0.9615,0.8901,0.8901
2,0.9,0.9835,0.9615,0.8929,0.9259,0.7727,0.7778
3,0.95,0.9973,0.9615,0.9615,0.9615,0.8901,0.8901
4,0.9,0.9835,0.8846,0.9583,0.92,0.7872,0.7917
5,0.975,0.9973,1.0,0.9615,0.9804,0.9459,0.9473
6,0.95,0.9973,1.0,0.9259,0.9615,0.8904,0.8958
7,0.975,1.0,1.0,0.9615,0.9804,0.9459,0.9473
8,0.9487,0.9457,1.0,0.9259,0.9615,0.885,0.8909
9,0.9744,0.9857,1.0,0.9615,0.9804,0.9434,0.9449


## Train Random Forest Model Using 5-Fold Cross-Validation

In [9]:
rf = create_model('rf', fold = 5)

,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.9625,0.9976,0.9804,0.9615,0.9709,0.9183,0.9186
1,0.95,0.9936,0.9608,0.9608,0.9608,0.8918,0.8918
2,0.9625,0.9912,0.9804,0.9615,0.9709,0.9183,0.9186
3,0.962,0.9947,0.9608,0.98,0.9703,0.9177,0.918
4,0.9367,0.965,0.9608,0.9423,0.9515,0.8606,0.8609
Mean,0.9547,0.9884,0.9686,0.9612,0.9649,0.9013,0.9016
SD,0.0102,0.0119,0.0096,0.0119,0.0077,0.0228,0.0228
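
The Mean and SD rows summarize the per-fold rows. As a quick check, the sketch below recomputes them for the Accuracy column with the standard library. The variable names are hypothetical. The reported SD is consistent with the population standard deviation (dividing by the number of folds); the sample standard deviation would come out slightly larger.

```python
import statistics

# Per-fold accuracy values copied from the 5-fold random forest table above.
fold_accuracy = [0.9625, 0.95, 0.9625, 0.962, 0.9367]

mean_accuracy = round(statistics.mean(fold_accuracy), 4)
# pstdev is the population standard deviation (divides by n, not n - 1).
sd_accuracy = round(statistics.pstdev(fold_accuracy), 4)
print(mean_accuracy, sd_accuracy)
```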


## Train Support Vector Machine (SVM) model without Cross Validation

In [10]:
svm = create_model('svm', cross_validation = False)

,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.7953,0.7464,1.0,0.7445,0.8536,0.5368,0.6057


## Train Multiple Light Gradient Boosting Machine Models with Different Learning Rates

In [12]:
lgbms = [create_model('lightgbm', learning_rate = i) for i in np.arange(0.1,1,0.1)]

,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.925,0.9698,1.0,0.8966,0.9455,0.8266,0.8393
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0
2,0.975,0.9973,1.0,0.963,0.9811,0.9441,0.9456
3,0.925,0.9945,0.9615,0.9259,0.9434,0.8324,0.8337
4,0.975,0.9945,0.9615,1.0,0.9804,0.9459,0.9473
5,0.975,1.0,1.0,0.9615,0.9804,0.9459,0.9473
6,0.975,1.0,1.0,0.9615,0.9804,0.9459,0.9473
7,0.975,0.9973,1.0,0.9615,0.9804,0.9459,0.9473
8,0.9487,0.9714,1.0,0.9259,0.9615,0.885,0.8909
9,0.9487,0.9857,0.96,0.96,0.96,0.8886,0.8886
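
The list comprehension above trains one model per learning rate and keeps the models in a plain list, so the rate behind each model has to be recovered by position. A possible variation, sketched below under stated assumptions, builds the grid from integers to avoid accumulated floating-point error and keys each result by its rate. The names learning_rates and train_per_rate are hypothetical, and a stand-in trainer replaces create_model so the sketch runs without PyCaret.

```python
# Build the nine learning rates 0.1 ... 0.9 from integer steps so that each
# value is rounded to one decimal place.
learning_rates = [round(0.1 * step, 1) for step in range(1, 10)]


def train_per_rate(train_fn, rates):
    """Call train_fn once per rate and key each result by the rate used."""
    return {rate: train_fn(rate) for rate in rates}


# Stand-in trainer (an assumption, for illustration only). In the notebook it
# could be: lambda rate: create_model('lightgbm', learning_rate=rate)
results = train_per_rate(lambda rate: f"model(lr={rate})", learning_rates)
print(learning_rates)
print(results[0.5])
```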


## Train Custom Model (GPLearn Models)
While Genetic Programming (GP) can be used to perform a very wide variety of tasks, gplearn is purposefully constrained to solving symbolic regression problems.
Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship. It begins by building a population of naive random formulas to represent a relationship between known independent variables and their dependent variable targets to predict new data. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations.
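
The evolutionary loop described above can be sketched in a few lines of pure Python. This is a toy illustration only, not gplearn's implementation: it uses nothing but subtree mutation and elitism, whereas gplearn offers more genetic operations. All names (random_expr, evaluate, fitness, mutate, evolve) and the target formula are hypothetical.

```python
import operator
import random

# Toy building blocks: three binary operators; terminals are "x" or a constant.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def random_expr(rng, depth=2):
    """Build a random formula as nested tuples, e.g. ("add", "x", 1.0)."""
    if depth <= 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-2, 2))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    """Evaluate a formula tree at the point x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr


def fitness(expr, xs, ys):
    """Mean absolute error against the targets; lower means fitter."""
    return sum(abs(evaluate(expr, x) - y) for x, y in zip(xs, ys)) / len(xs)


def mutate(rng, expr, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth - 1), right)
    return (op, left, mutate(rng, right, depth - 1))


def evolve(xs, ys, pop_size=50, generations=20, seed=0):
    """Evolve formulas; return the best one and the best error per generation."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda e: fitness(e, xs, ys))
        history.append(fitness(ranked[0], xs, ys))
        survivors = ranked[: pop_size // 5]  # the fittest fifth survives unchanged
        children = [
            mutate(rng, rng.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=lambda e: fitness(e, xs, ys))
    history.append(fitness(best, xs, ys))
    return best, history


# Hypothetical target relationship: y = x * x + x.
xs = list(range(-3, 4))
ys = [x * x + x for x in xs]
best, history = evolve(xs, ys)
print(best, history[0], history[-1])
```

Because the fittest formulas are carried over unchanged, the best error can only stay the same or shrink from one generation to the next.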
To use models from gplearn you will have to first install it:

In [15]:
# install gplearn
!pip install gplearn

Collecting gplearn
  Downloading gplearn-0.4.1-py3-none-any.whl (41 kB)
[?25l[K     |████████                        | 10 kB 29.7 MB/s eta 0:00:01[K     |███████████████▉                | 20 kB 31.2 MB/s eta 0:00:01[K     |███████████████████████▊        | 30 kB 12.0 MB/s eta 0:00:01[K     |███████████████████████████████▊| 40 kB 9.4 MB/s eta 0:00:01[K     |████████████████████████████████| 41 kB 316 kB/s 
Installing collected packages: gplearn
Successfully installed gplearn-0.4.1


In [17]:
from gplearn.genetic import SymbolicClassifier
symclf = SymbolicClassifier()
sc = create_model(symclf)

,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.95,1.0,1.0,0.9286,0.963,0.8864,0.8921
1,0.925,0.9835,0.9231,0.96,0.9412,0.8378,0.8391
2,0.925,0.9615,0.9615,0.9259,0.9434,0.8324,0.8337
3,0.95,0.989,0.9615,0.9615,0.9615,0.8901,0.8901
4,0.975,0.9835,1.0,0.963,0.9811,0.9441,0.9456
5,0.975,0.9973,0.96,1.0,0.9796,0.9474,0.9487
6,0.9,0.9733,0.96,0.8889,0.9231,0.7808,0.7856
7,0.95,0.992,1.0,0.9259,0.9615,0.8904,0.8958
8,0.9231,0.91,0.96,0.9231,0.9412,0.8302,0.8315
9,0.9744,0.9743,1.0,0.9615,0.9804,0.9434,0.9449
