<a href="https://colab.research.google.com/github/Rochika10/Machine-Learning_using_Pycaret-/blob/main/PyCaret_for_Clustering_without_Results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **PyCaret for Clustering**
---
- PyCaret bundles many Machine Learning algorithms behind one interface.
- Only three lines of code are required to compare 20 ML models.
- PyCaret is available for:
    - Classification
    - Regression
    - Clustering

---

### **Self-learning resources**
1. Tutorial on PyCaret: **<a href="https://pycaret.readthedocs.io/en/latest/tutorials.html"> Click Here</a>**

2. Documentation on PyCaret Clustering: **<a href="https://pycaret.readthedocs.io/en/latest/api/clustering.html"> Click Here </a>**

---


### **(a) Install PyCaret**

In [None]:
!pip install pycaret &> /dev/null
print("PyCaret installed successfully!!")

### **(b) Get the PyCaret version**

In [None]:
from pycaret.utils import version
version()

---
# **1. Clustering - Part 1 (K-Means Clustering)**
---
### **1.1 Get the list of datasets available in PyCaret (55)**

In [None]:
from pycaret.datasets import get_data
dataSets = get_data('index')

---
### **1.2 Get the "jewellery" dataset**
---

In [None]:
jewelleryDataSet = get_data("jewellery")    # Serial number (SN) 30 in the dataset index
# This is an unsupervised dataset:
# no target column is defined.

---
### **1.3 Download the "jewellery" dataset to local system**
---

In [None]:
jewelleryDataSet.to_csv("jewelleryDataSet.csv")
from google.colab import files
#files.download('jewelleryDataSet.csv')

---
### **1.4 "Parameter setting" for clustering model**
##### **Train/Test division, applying data pre-processing** {Sampling, Normalization, Transformation, PCA, Handling of Outliers, Feature Selection}
---

In [None]:
from pycaret.clustering import *
kMeanClusteringParameters = setup(jewelleryDataSet)
# Re-run this cell if an error occurs

---
### **1.5 Building "KMean" clustering model**
---

In [None]:
KMeanClusteringModel = create_model('kmeans', num_clusters=4)

---
### **1.6 Assign Model - "Assign the labels" to the dataset**
---



In [None]:
kMeanPrediction = assign_model(KMeanClusteringModel)
kMeanPrediction

---
### **1.7 "Saving" the result**
---



In [None]:
kMeanPrediction.to_csv("KMeanResult.csv")
print("Result file saved successfully!!")

---
### **1.8 Download the "result file" to user local system**
---

In [None]:
from google.colab import files
#files.download('KMeanResult.csv')      # Uncomment this line
# Open and Explore result file (KMeanResult.csv).

---
# **2. Clustering: Saving and Loading the Model**
---
### **2.1 Save the "trained model"**
---

In [None]:
x = save_model(KMeanClusteringModel, 'kMeanClusteringModelFile')

---
### **2.2 Download the "trained model"**
---

In [None]:
from google.colab import files
#files.download('kMeanClusteringModelFile.pkl')      # Uncomment this line

---
### **2.3 Load the model**
---
##### Use this when working in **"Anaconda/Jupyter notebook"** on a local machine

In [None]:
KMeanClusteringModel1 = load_model('kMeanClusteringModelFile')

---
### **2.4 Upload and Load the trained model to "Colab Environment"**
---
##### **Upload the trained model**

In [None]:
from google.colab import files
#files.upload()                     # Uncomment this line

##### **Load the trained model**

In [None]:
#KMeanClusteringModel1 = load_model('kMeanClusteringModelFile (1)')

---
# **3. Clustering: Cluster the new dataset (Unseen Data)**
---
### **3.1 Select some data or upload user dataset file**

In [None]:
# Select top 10 rows
newData = get_data("jewellery").iloc[:10]

---
### **3.2 Make prediction on the new dataset (Unseen Data)**
---

In [None]:
newPredictions = predict_model(KMeanClusteringModel, data = newData)
newPredictions
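
For K-Means, labelling unseen rows comes down to assigning each row to the nearest learned centroid. The sketch below illustrates that idea in plain Python. It is not PyCaret's implementation: the `assign_to_nearest` helper, the centroid values and the sample rows are all hypothetical.

```python
import math

def assign_to_nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids, as if they had been learned during training.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Hypothetical unseen rows with the same two features.
new_rows = [(1.0, 2.0), (9.0, 8.5), (0.5, 9.0)]

labels = [assign_to_nearest(row, centroids) for row in new_rows]
print(labels)
```

Each new row only needs the stored centroids, which is why a saved K-Means model can label data it has never seen.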

---
### **3.3 Save the prediction result to csv**
---

In [None]:
newPredictions.to_csv("NewPredictions.csv")
print("Result file saved successfully!!")

---
# **4. Clustering: Plotting the Clusters**
---
```
- Cluster PCA Plot (2d)          'cluster'
- Cluster t-SNE (3d)             'tsne'
- Elbow Plot                     'elbow'
- Silhouette Plot                'silhouette'
- Distance Plot                  'distance'
- Distribution Plot              'distribution'
```

---
### **4.1 Evaluate Cluster Model**
---

In [None]:
evaluate_model(KMeanClusteringModel)

---
### **4.2 2D-plot for Cluster**
---

In [None]:
plot_model(KMeanClusteringModel, plot='cluster')

---
### **4.3 3D-plot for Cluster**
---

In [None]:
plot_model(KMeanClusteringModel, plot = 'tsne')

---
### **4.4 Elbow Plot**
---

In [None]:
plot_model(KMeanClusteringModel, plot = 'elbow')
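
An elbow plot charts a within-cluster score against the number of clusters; the "elbow" is the point after which adding clusters stops reducing the score much. The sketch below assumes the score is the sum of squared distances from each point to its cluster centroid, which is an assumption about the plot's metric rather than something this notebook states. The `distortion` helper and the four sample points are hypothetical.

```python
def distortion(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # The centroid is the per-dimension mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total

# Hypothetical data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = distortion(points, [0, 0, 0, 0])
two_clusters = distortion(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
```

Going from one cluster to two cuts the score sharply here, which is the kind of drop the elbow plot makes visible.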

---
### **4.5 Silhouette Plot**
---

In [None]:
plot_model(KMeanClusteringModel, plot = 'silhouette')
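
The silhouette coefficient of a point compares `a`, its mean distance to the other points in its own cluster, with `b`, its mean distance to the points of the nearest other cluster: `s = (b - a) / max(a, b)`. Values near 1 indicate a well-placed point, negative values a likely misassignment. The plain-Python sketch below computes the mean coefficient; the `silhouette_score` helper and the sample points are hypothetical, and it is not the code behind the plot.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(label, []).append(math.dist(p, q))
        if own not in by_cluster:
            scores.append(0.0)  # a single-point cluster scores 0 by convention
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```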

---
### **4.6 Distribution Plot**
---

In [None]:
plot_model(KMeanClusteringModel, plot = 'distribution')

---
### **4.7 Distance Plot**
---

In [None]:
plot_model(KMeanClusteringModel, plot = 'distance')

---
# **5. Complete Code for Clustering (KMean)**
---
### **5.1 For Cluster = 3**

In [None]:
from pycaret.datasets import get_data
from pycaret.clustering import *

jewelleryDataSet = get_data('jewellery')
setup(data = jewelleryDataSet)
x = create_model('kmeans', num_clusters = 3)

---
### **5.2 For Cluster = 4**
---

In [None]:
from pycaret.datasets import get_data
from pycaret.clustering import *

jewelleryDataSet = get_data('jewellery')
setup(data = jewelleryDataSet)
x = create_model('kmeans', num_clusters = 4)

---
### **5.3 Other Clustering Algorithms**
---
```
- K-Means clustering                 'kmeans'
- Affinity Propagation               'ap'
- Mean shift clustering              'meanshift'
- Spectral Clustering                'sc'
- Agglomerative Clustering           'hclust'
- Density-Based Spatial Clustering   'dbscan'
- OPTICS Clustering                  'optics'
- Birch Clustering                   'birch'
- K-Modes clustering                 'kmodes'
```

---
# **6. Clustering: Apply "Data Preprocessing"**
---
### **Read the Dataset**

In [None]:
from pycaret.clustering import *
from pycaret.datasets import get_data

jewelleryDataSet = get_data('jewellery')

---
### **6.1 Model Performance using "Normalization"**
---
### **6.1.1 Elbow Plot**


In [None]:
setup(data = jewelleryDataSet, normalize = True, normalize_method = 'zscore')
x = create_model('kmeans')
plot_model(x, plot = 'elbow')
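
K-Means relies on Euclidean distance, so a feature measured on a large scale can dominate the clustering unless the features are rescaled. Z-score normalization rescales each feature to mean 0 and standard deviation 1 via `z = (x - mean) / std`. The sketch below shows the arithmetic in plain Python; the `zscore` helper and the two sample columns are hypothetical and are not taken from the jewellery dataset.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # a constant column (sigma == 0) would need special handling
    return [(v - mu) / sigma for v in values]

# Hypothetical features on very different scales.
incomes = [20000.0, 40000.0, 60000.0, 80000.0]
ages = [25.0, 35.0, 45.0, 55.0]

scaled_incomes = zscore(incomes)
scaled_ages = zscore(ages)
print([round(v, 3) for v in scaled_incomes])
print([round(v, 3) for v in scaled_ages])
```

After scaling, both columns span the same range, so neither dominates the distance calculation.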

---
### **6.1.2 For Cluster = 3**
---

In [None]:
setup(data = jewelleryDataSet, normalize = True, normalize_method = 'zscore')
x = create_model('kmeans', num_clusters = 3)

---
### **6.1.3 For Cluster = 4**
---

In [None]:
setup(data = jewelleryDataSet, normalize = True, normalize_method = 'zscore')
x = create_model('kmeans', num_clusters = 4)

---
### **6.1.4 For Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, normalize = True, normalize_method = 'zscore')
x = create_model('kmeans', num_clusters = 5)

---
### **6.1.5 3D Plot for Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, normalize = True, normalize_method = 'zscore')
x = create_model('kmeans', num_clusters = 5)
plot_model(x, plot = 'tsne')

---
### **6.2 Model Performance using "Transformation"**
---

### **6.2.1 Elbow Plot**


In [None]:
setup(data = jewelleryDataSet, transformation = True, transformation_method = 'yeo-johnson')
x = create_model('kmeans')
plot_model(x, plot = 'elbow')
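
The Yeo-Johnson transformation reshapes a skewed feature so that it looks more Gaussian, and unlike Box-Cox it accepts zero and negative values. The sketch below applies the formula to single values for a hand-picked `lam`. In practice the parameter is estimated from the data for each feature, so the fixed value, the `yeo_johnson` helper and the sample values here are assumptions made for illustration.

```python
import math

def yeo_johnson(y, lam):
    """Apply the Yeo-Johnson transform to one value for a fixed lambda."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    # Negative values use a mirrored form with exponent (2 - lambda).
    if lam == 2:
        return -math.log(-y + 1)
    return -(((-y + 1) ** (2 - lam)) - 1) / (2 - lam)

# Hypothetical right-skewed values; lam = 0 compresses the long tail.
skewed = [0.0, 1.0, 10.0, 100.0]
transformed = [yeo_johnson(v, 0.0) for v in skewed]
print([round(v, 3) for v in transformed])
```

With `lam = 1` the transform leaves values unchanged, which is a convenient sanity check.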

---
### **6.2.2 For Cluster = 3**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 3)

---
### **6.2.3 For Cluster = 4**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 4)

---
### **6.2.4 For Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 5)

---
### **6.3 Model Performance using "PCA"**
---
### **6.3.1 Elbow Plot**

In [None]:
setup(data = jewelleryDataSet, pca = True, pca_method = 'linear')
x = create_model('kmeans')
plot_model(x, plot = 'elbow')

---
### **6.3.2 For Cluster = 3**
---

In [None]:
setup(data = jewelleryDataSet, pca = True, pca_method = 'linear')
x = create_model('kmeans', num_clusters = 3)

---
### **6.3.3 For Cluster = 4**
---

In [None]:
setup(data = jewelleryDataSet, pca = True, pca_method = 'linear')
x = create_model('kmeans', num_clusters = 4)

---
### **6.3.4 For Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, pca = True, pca_method = 'linear')
x = create_model('kmeans', num_clusters = 5)

---
### **6.4 Model Performance using "Transformation" + "Normalization"**
---
### **6.4.1 Elbow Plot**

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson')
x = create_model('kmeans')
plot_model(x, plot = 'elbow')

---
### **6.4.2 For Cluster = 3**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 3)

---
### **6.4.3 For Cluster = 4**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 4)

---
### **6.4.4 For Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson')
x = create_model('kmeans', num_clusters = 5)

---
### **6.5 Model Performance using "Transformation" + "Normalization" + "PCA"**
---
### **6.5.1 Elbow Plot**

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True, pca = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson', pca_method = 'linear')
x = create_model('kmeans')
plot_model(x, plot = 'elbow')

---
### **6.5.2 For Cluster = 3**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True, pca = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson', pca_method = 'linear')
x = create_model('kmeans', num_clusters = 3)

---
### **6.5.3 For Cluster = 4**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True, pca = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson', pca_method = 'linear')
x = create_model('kmeans', num_clusters = 4)

---
### **6.5.4 For Cluster = 5**
---

In [None]:
setup(data = jewelleryDataSet, transformation = True, normalize = True, pca = True,
      normalize_method = 'zscore', transformation_method = 'yeo-johnson', pca_method = 'linear')
x = create_model('kmeans', num_clusters = 5)

---
# **7. Other Clustering Techniques**
---
```
K-Means clustering                 'kmeans'
Affinity Propagation               'ap'
Mean shift clustering              'meanshift'
Spectral Clustering                'sc'
Agglomerative Clustering           'hclust'
Density-Based Spatial Clustering   'dbscan'
OPTICS Clustering                  'optics'
Birch Clustering                   'birch'
K-Modes clustering                 'kmodes'
```

---
### **7.1 Building Agglomerative (Hierarchical) clustering model**
---

In [None]:
from pycaret.datasets import get_data
from pycaret.clustering import *

jewelleryDataSet = get_data('jewellery')
setup(data = jewelleryDataSet)
hierarchicalModel = create_model('hclust', num_clusters=3)

---
### **7.1.1 Assign Model - "Assign the labels" to the dataset**
---



In [None]:
hierarchicalModelPrediction = assign_model(hierarchicalModel)
hierarchicalModelPrediction

---
### **7.1.2 Evaluate Agglomerative (Hierarchical) Clustering**
---

In [None]:
evaluate_model(hierarchicalModel)

---
### **7.2 Density-Based Spatial Clustering**
---

In [None]:
from pycaret.datasets import get_data
from pycaret.clustering import *

jewelleryDataSet = get_data('jewellery')
setup(data = jewelleryDataSet)
dbscanModel = create_model('dbscan')

---
### **7.2.1 Assign Model - "Assign the labels" to the dataset**
---



In [None]:
dbscanModelPrediction = assign_model(dbscanModel)
dbscanModelPrediction

# Noisy samples are given the label -1 i.e. 'Cluster -1'

### **Key Points**

- num_clusters is not required for some of the clustering algorithms: Affinity Propagation ('ap'), Mean shift
  clustering ('meanshift'), Density-Based Spatial Clustering ('dbscan') and OPTICS Clustering ('optics').
- For these models, the number of clusters is determined automatically.

- When the fit does not converge in the Affinity Propagation ('ap') model, all data points are labelled -1.

- Noisy samples are given the label -1 when using Density-Based Spatial Clustering ('dbscan') or OPTICS Clustering ('optics').

- OPTICS ('optics') clustering may take longer training times on large datasets.
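
Since noisy samples carry their own label, it is worth counting them before interpreting the clusters. The sketch below tallies labels in the 'Cluster -1' format mentioned above. The `summarize_clusters` helper and the sample labels are hypothetical.

```python
from collections import Counter

def summarize_clusters(labels, noise_label="Cluster -1"):
    """Return (cluster sizes without noise, number of noise points)."""
    sizes = Counter(labels)
    noise = sizes.pop(noise_label, 0)  # 0 if no point was marked as noise
    return dict(sizes), noise

# Hypothetical labels, as they might appear after assigning clusters.
labels = ["Cluster 0", "Cluster 1", "Cluster -1", "Cluster 0", "Cluster -1"]

sizes, noise = summarize_clusters(labels)
print(sizes, noise)
```

With a labelled DataFrame such as `dbscanModelPrediction`, the label column could be passed in place of the list, assuming that column is named `Cluster`.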


---
# **8. Deploy the model on AWS**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/clustering.html#pycaret.clustering.deploy_model">Click Here</a>**