# **Introduction** 

PyCaret is an open-source machine learning library that automates much of the process of training a machine learning model, from model selection to training and testing. In this article, I will take you through a machine learning tutorial on PyCaret using Python.

**PyCaret**

PyCaret is an open-source machine learning library that automates the process of training a machine learning model. Once you have an idea of which features you want to train on, PyCaret can handle the rest: choosing a model, training it, and testing it.

The best feature of PyCaret is that it helps you find the best machine learning model for a particular dataset. It ranks the candidate models by standard performance metrics and shows you the top performers, all with a few lines of code.

So even if you don’t like using shortcuts while training a machine learning model, you can still use it to select the best model for your dataset. If you have never used it before, you can install it with the pip command: **pip install pycaret**. In the section below, I will take you through a machine learning tutorial on PyCaret using Python.
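Before running the tutorial, it can help to confirm that PyCaret is installed and which version you have, because some arguments differ between major versions. The following is a minimal sketch using only the standard library; the helper name `installed_version` is my own and is not part of PyCaret.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    Hypothetical helper for illustration; it is not part of PyCaret.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pycaret")
if version is None:
    print("PyCaret is not installed; run: pip install pycaret")
else:
    print(f"PyCaret {version} is installed")
```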


**PyCaret using Python**

I hope you now understand what PyCaret is and why it is used in machine learning. Now let’s see how to implement it using Python to automate model selection and model training. For this task, I will use the famous Titanic dataset to predict passenger survival with PyCaret. So let’s start by importing the dataset:




In [10]:
import numpy as np
import pandas as pd

In [None]:
!pip install pycaret



In [8]:
import zipfile
import os

In [None]:
!wget --no-check-certificate \
    "https://github.com/hussain0048/Machine-Learning/archive/refs/heads/master.zip" \
    -O "/tmp/Machine-Learning.zip"


# Open the zip file in read mode and extract its contents into /tmp;
# the with-block closes the file automatically.
with zipfile.ZipFile('/tmp/Machine-Learning.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp')
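The wget call above needs a Unix-like shell. Where that is not available, the same download and extraction can be sketched with Python's standard library alone. The helper names `download_file` and `extract_zip` are hypothetical, and the URL and paths in the usage comment are the ones from the cell above.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def download_file(url: str, dest: PathLike) -> Path:
    """Download `url` to `dest` and return the destination path."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest_path))
    return dest_path


def extract_zip(zip_path: PathLike, target_dir: PathLike) -> List[str]:
    """Extract every member of the archive into `target_dir`.

    Returns the member names so the caller can see what was unpacked.
    """
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Usage (commented out so that nothing is downloaded when the cell is loaded):
# url = "https://github.com/hussain0048/Machine-Learning/archive/refs/heads/master.zip"
# zip_path = download_file(url, "/tmp/Machine-Learning.zip")
# extract_zip(zip_path, "/tmp")
```

Unlike the wget command with `--no-check-certificate`, this sketch leaves TLS certificate verification switched on.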

In [12]:
data = pd.read_csv("/tmp/Machine-Learning-master/Datasets/train.csv")

In [None]:
data.head()

Now let’s set up the model. Since this is a classification problem, I will use PyCaret’s classification module. During setup we need to declare the data and the target label, as well as any features that should be ignored while training the model. Below is how to set up PyCaret for classification:

In [None]:
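from pycaret.classification import *
# Survived is the target; Ticket, Name, and PassengerId are left out of training.
# silent=True skips the data-type confirmation prompt (a PyCaret 2.x argument,
# removed in PyCaret 3.0); session_id fixes the random seed for reproducibility.
clf = setup(data, target="Survived",
            ignore_features=["Ticket", "Name", "PassengerId"],
            silent=True, session_id=786)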
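To make the roles of the target and the ignored features concrete, here is a rough pure-Python sketch of the separation that setup is asked to perform. It is not PyCaret's implementation (setup also takes care of preprocessing and the train/test split), the helper `split_features_and_target` is hypothetical, and the three sample rows are made-up records that only borrow the Titanic column names.

```python
def split_features_and_target(rows, target, ignore_features):
    """Separate each record into model inputs and the label to predict.

    Hypothetical helper for illustration only; PyCaret's setup() does much
    more than this (preprocessing, train/test splitting, and so on).
    """
    dropped = set(ignore_features) | {target}
    features = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Made-up records that use Titanic column names (values are examples only).
rows = [
    {"PassengerId": 1, "Survived": 0, "Pclass": 3, "Name": "Passenger A",
     "Sex": "male", "Age": 22.0, "Ticket": "T-001", "Fare": 7.25},
    {"PassengerId": 2, "Survived": 1, "Pclass": 1, "Name": "Passenger B",
     "Sex": "female", "Age": 38.0, "Ticket": "T-002", "Fare": 71.28},
    {"PassengerId": 3, "Survived": 1, "Pclass": 3, "Name": "Passenger C",
     "Sex": "female", "Age": 26.0, "Ticket": "T-003", "Fare": 7.93},
]

X, y = split_features_and_target(
    rows, target="Survived", ignore_features=["Ticket", "Name", "PassengerId"]
)
print(sorted(X[0]))  # ['Age', 'Fare', 'Pclass', 'Sex']
print(y)             # [0, 1, 1]
```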

Now I am going to use the most important feature of this library: comparing models. In machine learning, this is called model selection. The compare_models() function trains the available models with cross-validation and ranks them, so you can use it even if you are not yet familiar with model selection. Here’s how to compare machine learning models using PyCaret:

In [15]:
compare_models()

| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lightgbm | Light Gradient Boosting Machine | 0.8363 | 0.8739 | 0.7542 | 0.8123 | 0.7778 | 0.6489 | 0.6546 | 0.075 |
| lr | Logistic Regression | 0.8266 | 0.865 | 0.7493 | 0.7871 | 0.7649 | 0.6279 | 0.6312 | 0.318 |
| gbc | Gradient Boosting Classifier | 0.8217 | 0.8719 | 0.7245 | 0.7963 | 0.7551 | 0.6158 | 0.6209 | 0.117 |
| ridge | Ridge Classifier | 0.8202 | 0.0 | 0.7199 | 0.7892 | 0.7506 | 0.6108 | 0.6144 | 0.018 |
| rf | Random Forest Classifier | 0.8154 | 0.868 | 0.7292 | 0.7767 | 0.7489 | 0.6036 | 0.6079 | 0.452 |
| lda | Linear Discriminant Analysis | 0.8138 | 0.8587 | 0.7158 | 0.7786 | 0.7432 | 0.5979 | 0.6016 | 0.026 |
| et | Extra Trees Classifier | 0.8091 | 0.8621 | 0.7292 | 0.7636 | 0.7435 | 0.5919 | 0.5949 | 0.452 |
| ada | Ada Boost Classifier | 0.8074 | 0.8405 | 0.7746 | 0.7357 | 0.7519 | 0.595 | 0.5984 | 0.09 |
| dt | Decision Tree Classifier | 0.8041 | 0.7905 | 0.7413 | 0.7513 | 0.7415 | 0.5843 | 0.5896 | 0.015 |
| svm | SVM - Linear Kernel | 0.7382 | 0.0 | 0.7111 | 0.692 | 0.6722 | 0.4617 | 0.4897 | 0.016 |


LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
               random_state=786, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
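The leaderboard above is ordered by Accuracy. Ranking by a different metric can change which model comes out on top, which matters when, for example, recall is more important than overall accuracy. The sketch below re-sorts a few rows copied from the table above in plain Python; the helper `rank_models` is hypothetical. In PyCaret itself, compare_models accepts a sort argument for this purpose (for example, sort='Recall').

```python
# A few rows copied from the compare_models() output above.
leaderboard = [
    {"id": "lightgbm", "Accuracy": 0.8363, "AUC": 0.8739, "Recall": 0.7542},
    {"id": "lr", "Accuracy": 0.8266, "AUC": 0.8650, "Recall": 0.7493},
    {"id": "gbc", "Accuracy": 0.8217, "AUC": 0.8719, "Recall": 0.7245},
    {"id": "rf", "Accuracy": 0.8154, "AUC": 0.8680, "Recall": 0.7292},
    {"id": "ada", "Accuracy": 0.8074, "AUC": 0.8405, "Recall": 0.7746},
]


def rank_models(rows, metric):
    """Return model ids ordered from best to worst on `metric` (higher is better)."""
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [row["id"] for row in ordered]


for metric in ("Accuracy", "AUC", "Recall"):
    print(metric, rank_models(leaderboard, metric))
```

On these rows, Light Gradient Boosting Machine leads on Accuracy and AUC, while Ada Boost Classifier has the highest Recall.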

So according to the above output, the Light Gradient Boosting Machine is the best-performing model on the Titanic dataset. So let’s create the LightGBM model and make predictions on the test set:

In [17]:
lightgbm = create_model('lightgbm')
test_data = pd.read_csv('/tmp/Machine-Learning-master/Datasets/test.csv')
predict = predict_model(lightgbm, data=test_data)
predict.head()

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.8413 | 0.9017 | 0.7917 | 0.7917 | 0.7917 | 0.6635 | 0.6635 |
| 1 | 0.9048 | 0.8846 | 0.8333 | 0.9091 | 0.8696 | 0.7948 | 0.7967 |
| 2 | 0.7778 | 0.8494 | 0.7083 | 0.7083 | 0.7083 | 0.5288 | 0.5288 |
| 3 | 0.8226 | 0.8662 | 0.75 | 0.7826 | 0.766 | 0.6232 | 0.6236 |
| 4 | 0.8871 | 0.9397 | 0.7917 | 0.9048 | 0.8444 | 0.7565 | 0.7606 |
| 5 | 0.8387 | 0.8651 | 0.6667 | 0.8889 | 0.7619 | 0.6437 | 0.6589 |
| 6 | 0.8226 | 0.8066 | 0.6522 | 0.8333 | 0.7317 | 0.6021 | 0.6122 |
| 7 | 0.7742 | 0.8902 | 0.8696 | 0.6452 | 0.7407 | 0.5484 | 0.5676 |
| 8 | 0.8548 | 0.8685 | 0.7391 | 0.85 | 0.7907 | 0.6804 | 0.6843 |
| 9 | 0.8387 | 0.8673 | 0.7391 | 0.8095 | 0.7727 | 0.6481 | 0.6497 |
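The ten rows above are the per-fold cross-validation scores for LightGBM. Averaging the Accuracy column reproduces the 0.8363 reported for lightgbm in the model comparison, which shows that the leaderboard figure is a cross-validated mean. A quick check with the standard library:

```python
from statistics import mean, stdev

# Accuracy for folds 0-9, copied from the create_model('lightgbm') output above.
fold_accuracy = [0.8413, 0.9048, 0.7778, 0.8226, 0.8871,
                 0.8387, 0.8226, 0.7742, 0.8548, 0.8387]

mean_accuracy = mean(fold_accuracy)
spread = stdev(fold_accuracy)  # sample standard deviation across folds

print(f"mean accuracy: {mean_accuracy:.4f}")  # mean accuracy: 0.8363
print(f"std deviation: {spread:.4f}")
```

The spread across folds is worth a glance as well: accuracy ranges from 0.7742 to 0.9048, so a single train/test split could have given a noticeably different impression of the model.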


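| Row | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Label | Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 |  | Q | 0 | 0.8253 |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 |  | S | 0 | 0.8537 |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 |  | Q | 0 | 0.8944 |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 |  | S | 1 | 0.7349 |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 |  | S | 0 | 0.7536 |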
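In the prediction output, Label is the predicted class and Score is the model's confidence in it. The sketch below assumes that Score is the probability of the predicted label (the values above 0.5 on rows labelled 0 suggest this, but check the documentation for your PyCaret version) and converts it into a survival probability. The helper `survival_probability` is hypothetical.

```python
# (PassengerId, Label, Score) copied from the five prediction rows above.
predictions = [
    (892, 0, 0.8253),
    (893, 0, 0.8537),
    (894, 0, 0.8944),
    (895, 1, 0.7349),
    (896, 0, 0.7536),
]


def survival_probability(label, score):
    """Convert (Label, Score) into an estimate of P(Survived = 1).

    Assumes Score is the probability of the predicted label.
    """
    return score if label == 1 else 1.0 - score


for passenger_id, label, score in predictions:
    p = survival_probability(label, score)
    print(f"{passenger_id}: predicted={label}, P(survived)={p:.4f}")

predicted_survivors = sum(label for _, label, _ in predictions)
print(f"predicted survivors in this sample: {predicted_survivors}")
```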


**Summary**

PyCaret is a great machine learning library for automating the complete process of training a machine learning model, from model selection to training and testing. Even if you don’t like shortcuts while training machine learning models, you can use it at least for model selection.

I hope you liked this machine learning tutorial on PyCaret using the Python programming language. Feel free to ask your questions in the comments section below.


# **References**

[PyCaret in Machine Learning](https://thecleverprogrammer.com/2021/03/07/pycaret-in-machine-learning/)