# Amr Hacoglu - #GRIPAUGUST2024

# 📝 #2 K-Means Clustering: Predicting Optimal Number of Clusters for Iris Dataset

## 📋 Overview
This notebook uses K-Means clustering, an unsupervised machine learning algorithm, to estimate the optimal number of clusters in the Iris dataset. It covers the following steps:

1. Importing Libraries
1. Loading and Displaying the Iris Dataset
1. Finding the Optimal Number of Clusters using the Elbow Method
1. Applying K-Means Clustering and Visualizing the Results

Let's get started! 🚀

# Importing Libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Loading and Displaying the Iris Dataset

In [None]:
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
iris_df.head() 

# Finding the Optimal Number of Clusters using the Elbow Method

In [None]:
# Extract the feature data
X = iris_df.values

# Elbow method: fit K-Means for k = 1..10 and record the
# within-cluster sum of squares (WCSS), exposed by scikit-learn as inertia_
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# Plot the elbow curve
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

*The WCSS (within-cluster sum of squares) drops sharply up to 3 clusters and decreases only slowly afterward. The elbow is therefore at k = 3, which we take as the optimal number of clusters for the Iris dataset.*
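To make the WCSS on the y-axis concrete, here is a minimal sketch that recomputes it by hand for k = 3 and compares it with `inertia_`. The names `assigned_centroids` and `manual_wcss` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

X = datasets.load_iris().data

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# WCSS: sum of squared Euclidean distances from each sample
# to the centroid of the cluster it was assigned to
assigned_centroids = kmeans.cluster_centers_[labels]
manual_wcss = np.sum((X - assigned_centroids) ** 2)

print(f"manual WCSS: {manual_wcss:.4f}")
print(f"inertia_:    {kmeans.inertia_:.4f}")
```

The two values should agree up to floating-point error, which is why the elbow curve can be built directly from `inertia_`.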

# Applying K-Means Clustering and Visualizing the Results

In [None]:
# Apply K-Means clustering with 3 clusters
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# Visualize the clusters using the first two features (sepal length and width).
# K-Means cluster indices are arbitrary, so label them by index, not by species.
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=70, c='green', label='Cluster 0')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=70, c='orange', label='Cluster 1')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=70, c='purple', label='Cluster 2')

# Plot the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red', label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend()
plt.show()

*The plot shows the 3 clusters identified by the K-Means algorithm, projected onto sepal length and sepal width, along with the centroid of each cluster. The clustering itself uses all four features.*
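K-Means assigns cluster indices arbitrarily, so cluster 0 is not guaranteed to be any particular species. The sketch below is one way to check which species dominates each cluster. It assumes the true labels in `iris.target` are used only for this after-the-fact comparison, and the names `table` and `cluster_to_species` are illustrative.

```python
import pandas as pd
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(iris.data)

# Cross-tabulate cluster index against the known species name
species = pd.Series(iris.target_names[iris.target], name='species')
clusters = pd.Series(cluster_ids, name='cluster')
table = pd.crosstab(clusters, species)
print(table)

# Majority species in each cluster (row-wise argmax of the table)
cluster_to_species = table.idxmax(axis=1).to_dict()
print(cluster_to_species)
```

A mapping like `cluster_to_species` could then be used to build the legend labels, instead of hard-coding a species name for each cluster index.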