# **Iris Information:**
Iris is a genus of 260–300 species of flowering plants with showy flowers. It takes its name from the Greek word for rainbow, which is also the name of the Greek goddess of the rainbow, Iris. Iris is also widely used as a common name for all Iris species, as well as for some species belonging to other closely related genera. Irises are extensively grown as ornamental plants in home and botanical gardens. Flower colors include white, pink, orange, purple, and lavender.

Almost 300 species of Iris have been discovered. For our data science purposes, we will perform exploratory data analysis (EDA) on the following 3 Iris species:

Setosa
Versicolor
Virginica

The flowers are classified by the following features:

sepal length in cm
sepal width in cm
petal length in cm
petal width in cm

Let us start working with the Iris dataset.

Load the required libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Now load the dataset

In [None]:
iris = pd.read_csv("../input/iris-flower-dataset/IRIS.csv")

In [None]:
iris.head()

In [None]:
iris.tail()

In [None]:
iris.shape

In [None]:
iris.isnull().sum()

In [None]:
iris.describe()

Let's check the count for each species

In [None]:
iris['species'].value_counts()

We can see that the Iris dataset is balanced. It consists of 150 instances in 3 classes (species): Iris Setosa, Iris Versicolor, and Iris Virginica, each with 50 instances.
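
The balance can also be checked numerically by looking at class proportions instead of raw counts. The following is a minimal sketch; it uses a small hypothetical stand-in DataFrame (`toy`) rather than the Kaggle CSV so that it runs on its own, and only the `species` column name is taken from the dataset above.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame: 3 species, 50 rows each.
toy = pd.DataFrame({
    "species": ["Iris-setosa"] * 50
               + ["Iris-versicolor"] * 50
               + ["Iris-virginica"] * 50
})

# normalize=True returns the fraction of rows per class instead of raw counts.
proportions = toy["species"].value_counts(normalize=True)

# The classes are perfectly balanced when every class has the same share.
is_balanced = bool(proportions.nunique() == 1)

print(proportions)
print("Balanced:", is_balanced)
```

On the real data, the same two lines can be applied to `iris['species']` directly.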

In [None]:
# Recent seaborn versions require the column to be passed by keyword (x=...)
sns.countplot(x='species', data=iris)
plt.show()

Since the dataset is balanced, every species has the same count.

In [None]:
plt.figure(figsize=(7,7))
sns.set_style('whitegrid')
sns.scatterplot(x=iris['sepal_length'], y=iris['sepal_width'], hue=iris['species'], palette=['green','orange','dodgerblue'])

From the above plot of sepal length against sepal width, the setosa variety is easily distinguishable. Versicolor and virginica overlap, so they are harder to distinguish.

In [None]:
plt.figure(figsize=(7,5))
sns.set_style('whitegrid')
sns.scatterplot(x=iris['petal_length'], y=iris['petal_width'], hue=iris['species'], palette=['green','orange','dodgerblue'])

From the above plot of petal length against petal width, the setosa variety is again easily distinguishable. Versicolor and virginica are also largely separable here, with only a small overlap between them.

In [None]:
sns.set_style('whitegrid')
# pairplot creates its own figure, so a preceding plt.figure() call would only leave an empty figure behind
sns.pairplot(data=iris, hue='species', palette=['green','orange','dodgerblue'])
plt.show()

From the above plot we can see that:

In the case of sepal length and sepal width, setosa is easily separable, but versicolor and virginica have some overlap.
In the case of petal length and petal width, all the species are quite separable, which makes these the most useful features for distinguishing the flower types.

Modelling with K-Means Clustering

In [None]:
x = iris.drop(['species'],axis=1)
y = iris.species

Describing the data set

In [None]:
iris.describe()

In [None]:
from sklearn.preprocessing import StandardScaler
x = StandardScaler().fit_transform(x)
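
K-Means relies on Euclidean distances, so a feature measured on a larger scale would otherwise dominate the clustering. Standardization rescales every column to zero mean and unit variance. The sketch below reproduces that transformation by hand with NumPy on a hypothetical toy matrix (`features` is not part of the notebook); it assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Hypothetical toy feature matrix: 4 samples, 2 features on different scales.
features = np.array([[5.1, 0.2],
                     [4.9, 0.2],
                     [6.3, 1.8],
                     [5.8, 2.2]])

# Column-wise mean and population standard deviation (ddof=0 is NumPy's default).
mean = features.mean(axis=0)
std = features.std(axis=0)

# z-score: subtract the column mean, then divide by the column standard deviation.
scaled = (features - mean) / std

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```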

Applying K-Means Clustering

In [None]:
from sklearn.cluster import KMeans
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

In [None]:
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

The code above implements the elbow method, which helps us pick the optimal number of clusters for clustering. The shape of the plot shows where the method gets its name.

The optimal number of clusters is where the elbow occurs, that is, the point after which the within-cluster sum of squares (WCSS) no longer decreases significantly with each additional cluster. In the above graph, the elbow is at k = 3.
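
To make WCSS concrete, here is a minimal sketch that computes it by hand: for each cluster, sum the squared distances of its points to the cluster mean. The helper `wcss` and the toy data are hypothetical and only for illustration; for a fitted model, `kmeans.inertia_` reports this quantity measured against the fitted centroids.

```python
import numpy as np

def wcss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for cluster in np.unique(labels):
        members = points[labels == cluster]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total

# Hypothetical toy data: two tight groups on a line.
points = np.array([[0.0], [1.0], [10.0], [11.0]])

one_cluster = wcss(points, np.array([0, 0, 0, 0]))
two_clusters = wcss(points, np.array([0, 0, 1, 1]))
four_clusters = wcss(points, np.array([0, 1, 2, 3]))

print(one_cluster, two_clusters, four_clusters)
```

Going from one cluster to two drops the WCSS from 101 to 1, while going from two to four only removes the remaining 1. That sharp bend followed by a flat tail is the elbow.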

Now that we have the optimal number of clusters, we can move on to applying K-Means clustering to the Iris dataset.

In [None]:
kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
y_kmeans = kmeans.fit_predict(x)

In [None]:
y_kmeans

In [None]:
centroids=kmeans.cluster_centers_
centroids

In [None]:
iris1=iris.copy()
iris1["species"]=iris1["species"].map({'Iris-versicolor':0,'Iris-setosa':1,'Iris-virginica':2}).astype(int)
iris1.head()
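
K-Means numbers its clusters arbitrarily, so a hard-coded species-to-integer mapping lines up with the cluster ids only for one particular run. A contingency table is a more robust way to compare clusters with known labels. The sketch below uses hypothetical toy arrays (`true_labels`, `cluster_ids`) instead of the notebook's data; the same idea could be applied to the encoded `species` column and `y_kmeans`.

```python
import numpy as np

# Hypothetical toy labels: true class codes and K-Means cluster ids.
true_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
cluster_ids = np.array([1, 1, 1, 0, 0, 2, 2, 2, 2])

n_classes = true_labels.max() + 1
n_clusters = cluster_ids.max() + 1

# table[i, j] counts the samples of true class i that fell into cluster j.
table = np.zeros((n_classes, n_clusters), dtype=int)
np.add.at(table, (true_labels, cluster_ids), 1)

# For each cluster, the true class that dominates it.
majority_class = table.argmax(axis=0)

# Purity: fraction of samples whose true class is the majority class of their cluster.
purity = table.max(axis=0).sum() / len(true_labels)

print(table)
print(majority_class, purity)
```

Here cluster 0 is dominated by class 1 and cluster 1 by class 0, which a fixed mapping would have reported as a complete mismatch even though the grouping is nearly perfect.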

In [None]:
# K-Means numbers its clusters arbitrarily, so label them by index rather than by species name
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 70, label = 'Cluster 0')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 70, label = 'Cluster 1')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 70, label = 'Cluster 2')

# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1], s = 70, c = 'black', label = 'Centroids')

plt.legend()
plt.show()

In [None]:
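fig = plt.figure(figsize = (10,10))
ax = fig.add_subplot(111, projection='3d')
# Draw on the 3D axes with the first three standardized features so the plot is actually three-dimensional
ax.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], x[y_kmeans == 0, 2], s = 100, label = 'Cluster 0')
ax.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], x[y_kmeans == 1, 2], s = 100, label = 'Cluster 1')
ax.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], x[y_kmeans == 2, 2], s = 100, label = 'Cluster 2')

# Plotting the centroids of the clusters
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], kmeans.cluster_centers_[:, 2], s = 70, c = 'black', label = 'Centroids')
ax.legend()
plt.show()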