<a href="https://colab.research.google.com/github/psrana/Machine-Learning-using-PyCaret/blob/main/02_PyCaret_for_Classification_without_Results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **PyCaret for Classification**
---
- PyCaret bundles many machine learning algorithms behind a single low-code interface.


### **(a) Install Pycaret**

In [None]:
!pip install pycaret &> /dev/null
print("PyCaret installed successfully!")

### **(b) Get the PyCaret version**

In [None]:
from pycaret.utils import version
version()

---
# **1. Classification: Basics**
---
### **1.1 Get the list of datasets available in pycaret (Total Datasets = 55)**




In [None]:
from pycaret.datasets import get_data
dataSets = get_data('index')

---
### **1.2 Get the "diabetes" dataset (Step-I)**
---

In [None]:
diabetesDataSet = get_data("diabetes")    # SN is 7

---
### **1.3 Parameter setting for all models (Step-II)**
---

In [None]:
from pycaret.classification import *
s = setup(data=diabetesDataSet, target='Class variable')

# Other Parameters:
# train_size = 0.7
# data_split_shuffle = False
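
The two commented parameters above control how `setup()` holds out test data. As a rough illustration of the idea only (not PyCaret's own code, and ignoring stratification), the sketch below splits a list of rows 70/30 with and without shuffling. The helper name `split_rows` and its `seed` argument are hypothetical.

```python
import random


def split_rows(rows, train_size=0.7, shuffle=True, seed=0):
    """Return (train, test) lists; a simplified stand-in for a train/test split."""
    rows = list(rows)
    if shuffle:
        # Shuffle a copy with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_size)  # number of rows kept for training
    return rows[:cut], rows[cut:]


# Without shuffling, the first 70% of rows become the training set.
ordered_train, ordered_test = split_rows(range(10), shuffle=False)
print(ordered_train, ordered_test)

# With shuffling, the same rows are distributed in random order.
shuffled_train, shuffled_test = split_rows(range(10), shuffle=True)
print(shuffled_train, shuffled_test)
```

Turning shuffling off mainly matters when row order carries meaning, for example when rows are sorted by time.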

---
### **1.4 Run all models (Step-III)**
---

In [None]:
cm = compare_models()

---
### **1.5 "Three lines of code" for model comparison for "Diabetes" dataset**
---



In [None]:
from pycaret.datasets import get_data
from pycaret.classification import *

diabetesDataSet = get_data("diabetes")
setup(data=diabetesDataSet, target='Class variable')
cm = compare_models()

---
### **1.6 "Three lines of code" for model comparison for "Cancer" dataset**
---



In [None]:
from pycaret.datasets import get_data
from pycaret.classification import *

cancerDataSet = get_data("cancer")
setup(data = cancerDataSet, target='Class')
cm = compare_models()

---
# **2. Classification: working with user dataset**
---
### **2.1 Download the "diabetes" dataset to local system**
---


In [None]:
diabetesDataSet.to_csv("diabetesDataSet.csv", index=False)

from google.colab import files
files.download('diabetesDataSet.csv')

---
### **2.2 Uploading "user file" from user system**
---

In [None]:
from google.colab import files
files.upload()

---
### **2.3 "Read" the uploaded file**
---

In [None]:
import pandas as pd
myDataSet = pd.read_csv('diabetesDataSet (1).csv')
myDataSet.head()
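
The file name in the cell above (`'diabetesDataSet (1).csv'`) depends on how the browser renamed the download, so it may differ on your machine. In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, which lets you avoid hard-coding the name. The sketch below shows that idea with a made-up dictionary standing in for the upload result; the helper `rows_from_upload` and the sample bytes are hypothetical, and only the standard library is used so that it runs outside Colab too.

```python
import csv
import io


def rows_from_upload(uploaded):
    """Take the first uploaded file and parse it as CSV.

    `uploaded` is expected to look like the return value of files.upload():
    a dict of {file_name: file_bytes}.
    """
    name, payload = next(iter(uploaded.items()))
    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    return name, list(reader)


# Made-up stand-in for the dictionary returned by files.upload().
fake_upload = {"diabetesDataSet (1).csv": b"Age,Class variable\n50,1\n31,0\n"}

uploaded_name, uploaded_rows = rows_from_upload(fake_upload)
print(uploaded_name)
print(uploaded_rows[0])
```

With pandas, the same bytes can be wrapped in `io.BytesIO` and passed to `pd.read_csv`.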

---
### **2.4 "Compare" the model performance**
---

In [None]:
from pycaret.classification import *

setup(data = myDataSet, target='Class variable')
cm = compare_models()

---
### **2.5 "Three lines of code" for model comparison for "user dataset"**

##### Use this when working in **"Anaconda/Jupyter notebook"** on a local machine
---

In [None]:
from pycaret.classification import *
import pandas as pd

#myDataSet = pd.read_csv("myData.csv")
#s = setup(data = myDataSet, target='cancer')
#cm = compare_models()

---
# **3. Classification: Apply "Data Preprocessing"**
---

### **3.1 Model performance using "Normalization"**

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      normalize = True, normalize_method = 'zscore')
cm = compare_models()

#normalize_method = {zscore, minmax, maxabs, robust}
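
To make the `zscore` and `minmax` options listed above concrete, here is a minimal sketch of what each rescaling does to a single numeric column. It is an illustration of the formulas, not PyCaret's implementation; the function names and the sample glucose values are made up, and it assumes the column is not constant (otherwise the divisions would fail).

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


glucose = [85.0, 120.0, 155.0]  # made-up sample column

glucose_z = zscore(glucose)
glucose_mm = minmax(glucose)
print(glucose_z)
print(glucose_mm)  # [0.0, 0.5, 1.0]
```

Scaling matters most for distance-based or gradient-based models such as `knn`, `svm`, and `lr`; tree-based models are largely insensitive to it.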

---
### **3.2 Model performance using "Feature Selection"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      feature_selection = True, feature_selection_method = 'classic')
cm = compare_models()

#feature_selection_method = {classic, univariate, sequential}

---
### **3.3 Model performance using "Outlier Removal"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05)
cm = compare_models()

---
### **3.4 Model performance using "Transformation"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      transformation = True, transformation_method = 'yeo-johnson')
cm = compare_models()
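
The `'yeo-johnson'` option above refers to the Yeo-Johnson power transform, which reshapes a skewed feature toward a more Gaussian-like distribution and, unlike Box-Cox, accepts zero and negative values. The sketch below applies the formula to single values with a hand-picked λ (`lam`); libraries normally estimate λ from the data for each feature, which this sketch does not attempt. The function name and sample inputs are hypothetical.

```python
import math


def yeo_johnson(y, lam):
    """Apply the Yeo-Johnson transform to one value for a fixed lambda."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)            # log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-y)              # -log(-y + 1)
    return -(((-y + 1) ** (2 - lam)) - 1) / (2 - lam)


# lam = 1 leaves values unchanged; smaller lam compresses large positive values.
for value in (-3.0, 0.0, 3.0, 30.0):
    print(value, yeo_johnson(value, 1), yeo_johnson(value, 0))
```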

---
### **3.5 Model performance using "PCA"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      pca = True, pca_method = 'linear')
cm = compare_models()
#pca_method = {linear, kernel, incremental}

---
### **3.6 Model performance using "Outlier Removal" + "Normalization"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore')
cm = compare_models()

---
### **3.7 Model performance using "Outlier Removal" + "Normalization" + "Transformation"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore',
      transformation = True, transformation_method = 'yeo-johnson')
cm = compare_models()

---
### **3.8 Explore more parameters of "setup()" on pycaret**
---
- Explore setup() parameters in **Step 1.3**
- **<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html" target="_blank"> Click Here</a>** for more

---
# **4. Classification: More Operations**
---
### **4.1 Build a single model - "RandomForest"**

In [None]:
from pycaret.datasets import get_data
from pycaret.classification import *

diabetesDataSet = get_data("diabetes")
setup(data=diabetesDataSet, target='Class variable')

rfModel = create_model('rf')
# Explore more parameters

---
### **4.2 Other available classification models**
---
-	'ada' -	Ada Boost Classifier
-	'dt' -	Decision Tree Classifier
-	'et' -	Extra Trees Classifier
-	'gbc' -	Gradient Boosting Classifier
-	'knn' -	K Neighbors Classifier
-	'lightgbm' -	Light Gradient Boosting Machine
-	'lda' -	Linear Discriminant Analysis
-	'lr' -	Logistic Regression
-	'nb' -	Naive Bayes
-	'qda' -	Quadratic Discriminant Analysis
-	'rf' -	Random Forest Classifier
-	'ridge' -	Ridge Classifier
-	'svm' -	SVM - Linear Kernel

---
### **4.3 Explore more parameters of "create_model()" on pycaret**
---

**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.create_model" target="_blank"> Click Here</a>**

---
### **4.4 Make prediction on the "new unseen dataset"**
---
#### **Get the "new unseen dataset"**



In [None]:
# Select top 10 rows from diabetes dataset
newDataSet = get_data("diabetes").iloc[:10]

#### **Make prediction on "new unseen dataset"**

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
### **4.5 "Save" and "Download" the prediction result**
---

In [None]:
newPredictions.to_csv("NewPredictions.csv", index=False)

from google.colab import files
files.download('NewPredictions.csv')

---
### **4.6 "Save" the trained model**
---

In [None]:
sm = save_model(rfModel, 'rfModelFile')

---
### **4.7 Download the "trained model file" to user local system**
---

In [None]:
from google.colab import files
files.download('rfModelFile.pkl')

---
### **4.8  "Upload the trained model" --> "Load the model"  --> "Make the prediction" on "new unseen dataset"**
---
### **4.8.1 Upload the  "Trained Model"**


In [None]:
from google.colab import files
files.upload()

---
### **4.8.2 Load the "Model"**
---

In [None]:
rfModel = load_model('rfModelFile (1)')

---
### **4.8.3 Make the prediction on "new unseen dataset"**
---

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
# **5. Plot the trained model**
---
**The following plots are available for a trained model**
*   Area Under the Curve         - 'auc'
*   Discrimination Threshold     - 'threshold'
*   Precision Recall Curve       - 'pr'
*   Confusion Matrix             - 'confusion_matrix'
*   Class Prediction Error       - 'error'
*   Classification Report        - 'class_report'
*   Decision Boundary            - 'boundary'
*   Recursive Feat. Selection    - 'rfe'
*   Learning Curve               - 'learning'
*   Manifold Learning            - 'manifold'
*   Calibration Curve            - 'calibration'
*   Validation Curve             - 'vc'
*   Dimension Learning           - 'dimension'
*   Feature Importance           - 'feature'
*   Model Hyperparameter         - 'parameter'

---
### **5.1 Create RandomForest model or any other model**
---

In [None]:
rfModel = create_model('rf')

---
### **5.2 Create "Confusion Matrix"**
---

In [None]:
plot_model(rfModel, plot='confusion_matrix')
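
To help read the plot produced above, the sketch below counts the four cells of a binary confusion matrix by hand and derives accuracy, precision, and recall from them. The labels and predictions are made up for illustration, and `confusion_counts` is a hypothetical helper, not a PyCaret function.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up model predictions

cells = confusion_counts(y_true, y_pred)
accuracy = (cells["TP"] + cells["TN"]) / len(y_true)
precision = cells["TP"] / (cells["TP"] + cells["FP"])
recall = cells["TP"] / (cells["TP"] + cells["FN"])

print(dict(cells))
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```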

---
### **5.3 Plot the "learning curve"**
---

In [None]:
plot_model(rfModel, plot='learning')

---
### **5.4 Plot the "AUC Curve" (Area Under the Curve)**
---

In [None]:
plot_model(rfModel, plot='auc')
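
The area under the ROC curve can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition by comparing every positive/negative pair; this is fine for small examples but slow on large datasets. The scores are made up and `roc_auc` is a hypothetical helper.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]            # made-up actual classes
scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities of class 1

auc_value = roc_auc(y_true, scores)
print(auc_value)  # 0.75
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to a perfect separation of the two classes.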

---
### **5.5 Plot the "Decision Boundary"**
---

In [None]:
plot_model(rfModel, plot='boundary')

---
### **5.6 Get the model "parameters"**
---

In [None]:
plot_model(rfModel, plot='parameter')

---
### **5.7 Explore more parameters of "plot_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.plot_model" target="_blank"> Click Here </a>**

---
# **6. Feature Importance**
---
### **6.1 Feature Importance using "Random Forest"**


In [None]:
rfModel = create_model('rf', verbose=False)
plot_model(rfModel, plot='feature')

---
### **6.2 Feature Importance using "Extra Trees Classifier"**
---

In [None]:
etModel = create_model('et', verbose=False)
plot_model(etModel, plot='feature')

---
### **6.3 Feature Importance using "Decision Tree"**
---

In [None]:
dtModel = create_model('dt', verbose=False)
plot_model(dtModel, plot='feature')

---
# **7. Tune/Optimize the model performance**
---
### **7.1 Train "Decision Tree" with default parameters**


In [None]:
dtModel = create_model('dt')

#### **Get the "parameters" of Decision Tree**

In [None]:
plot_model(dtModel, plot='parameter')

---
### **7.2 Tune "Decision Tree" model**
---

In [None]:
dtModelTuned = tune_model(dtModel, n_iter=50)
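
In the call above, `n_iter=50` sets how many candidate hyperparameter combinations are tried. As a rough sketch of the idea, assuming a random search over a fixed grid, the code below samples combinations and keeps the best-scoring one. Everything here is illustrative: `random_search`, the parameter grid, and `toy_score` (a stand-in for a cross-validated metric) are hypothetical and not part of PyCaret.

```python
import random


def random_search(score_fn, param_space, n_iter=50, seed=0):
    """Sample n_iter random parameter combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a decision tree.
space = {"max_depth": [2, 4, 6, 8, 10], "min_samples_leaf": [1, 2, 5, 10]}


def toy_score(params):
    """Made-up score that peaks at max_depth=6 and min_samples_leaf=5."""
    return (1.0
            - 0.01 * abs(params["max_depth"] - 6)
            - 0.01 * abs(params["min_samples_leaf"] - 5))


best_params, best_score = random_search(toy_score, space, n_iter=50)
print(best_params, best_score)
```

More iterations can only match or improve the best score found for a given seed, at the cost of longer training time.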

#### **Get the "tuned parameters" of Decision Tree**

In [None]:
plot_model(dtModelTuned, plot='parameter')

---
### **7.3 Explore more parameters of "tune_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.tune_model" target="_blank"> Click Here </a>**

---
# **8. AutoML - Advanced Machine Learning**
---

- Select n Best Models:
  - Ensemble, Stacking, Bagging, Blending
  - Auto tune the best n models
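
Of the techniques listed above, blending is the simplest to show by hand: several models each predict a class, and the blended prediction is the class most of them agree on (hard voting). The sketch below illustrates that with made-up predictions from three models; `majority_vote` is a hypothetical helper, and real blending may instead average predicted probabilities (soft voting).

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model class predictions by taking the most common vote per row."""
    blended = []
    for votes in zip(*predictions_per_model):
        blended.append(Counter(votes).most_common(1)[0][0])
    return blended


# Made-up predictions for four rows from three different models.
rf_pred = [1, 0, 1, 1]
et_pred = [1, 1, 0, 1]
dt_pred = [0, 0, 1, 1]

blended_pred = majority_vote([rf_pred, et_pred, dt_pred])
print(blended_pred)  # [1, 0, 1, 1]
```

Using an odd number of models avoids ties in a two-class problem.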

**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.automl" target="_blank">Click Here</a>**


---
# **9. Deploy the model on AWS / Azure**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.deploy_model" target="_blank">Click Here</a>**