# Iris Flower Dataset Classification

The Iris dataset is a classic and well-known dataset in the field of machine learning. It comprises information about three species of Iris flowers: Iris setosa, Iris versicolor, and Iris virginica. This dataset is often employed for various tasks such as classification, pattern recognition, and data visualization.

The dataset contains the following attributes for each flower specimen:

1. Sepal length in centimeters
2. Sepal width in centimeters
3. Petal length in centimeters
4. Petal width in centimeters

These attributes serve as features that help discriminate between the three Iris species. The Iris dataset is often used as a fundamental example for introducing concepts such as data preprocessing, exploratory data analysis, and classification algorithms in machine learning courses and tutorials.

Based on these four features, the goal is often to classify each flower into one of the three species. The dataset is balanced, meaning there is an equal number of samples for each species.
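
The description above can be checked quickly against the copy of Iris that ships with scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; it uses `load_iris` rather than the Kaggle CSV loaded later in this notebook, so the feature names differ from the CSV column names.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of Iris bundled with scikit-learn (not the Kaggle CSV used below)
iris = load_iris()

n_samples, n_features = iris.data.shape
class_counts = np.bincount(iris.target)  # number of samples per species

print("Features:", iris.feature_names)
print("Species:", list(iris.target_names))
print("Shape:", (n_samples, n_features))
print("Samples per species:", class_counts.tolist())
```

Equal counts per species are what "balanced" means here; the same check is done on the CSV further down with `value_counts()`.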

## Import Libraries

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


## Load the required dataset (Iris Dataset)

In [None]:
df= pd.read_csv("/kaggle/input/iriscsv/Iris.csv")
df.head()

In [None]:
# Drop the Id column as it is not significant here
df=df.drop("Id",axis=1)
df.head(10)

In [None]:
df.columns

In [None]:
df.info()

In [None]:
#Display the number of samples for each species
df["Species"].value_counts()

In [None]:
#Converting class labels into numerical form

from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
df['Species']=le.fit_transform(df['Species'])
df["Species"]
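
To see which integer each species receives, the fitted encoder can be inspected. The sketch below is standalone and uses a short hand-written list of species names; the exact strings are an assumption about how the CSV spells them. `LabelEncoder` sorts the classes, so the code for a name is its position in `classes_`.

```python
from sklearn.preprocessing import LabelEncoder

# Species names assumed to match the CSV spelling (illustrative only)
species_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species_names)

# classes_ is sorted, so each name's index is its integer code
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original names from the integer codes
decoded = [str(name) for name in encoder.inverse_transform(codes)]
print(decoded)
```

Keeping the fitted encoder around makes it possible to label plots and confusion matrices with species names instead of 0, 1, 2.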

## Exploratory Data Analysis

In [None]:
# Plotting the three varieties of iris flowers
sns.FacetGrid(df,hue='Species',height=5).map(plt.scatter,'SepalLengthCm','PetalLengthCm').add_legend()

In [None]:
sns.FacetGrid(df,hue='Species',height=5).map(plt.scatter,'SepalWidthCm','PetalWidthCm').add_legend()

In [None]:
# Pairwise scatter plot (pair plot)
# Disadvantages: 1. Cannot be used when the number of features is high.
#                2. Cannot visualize higher-dimensional patterns (3D, 4D); it only shows 2D projections.
plt.close() # closes any previously open plots that might be displayed
sns.set_style("darkgrid")
sns.pairplot(df,hue="Species",height=3) # `height` replaces the deprecated `size` argument
plt.show()

In [None]:
# Plotting histogram for each feature
df['SepalLengthCm'].hist()

In [None]:
df['SepalWidthCm'].hist()

In [None]:
df['PetalLengthCm'].hist()

In [None]:
df['PetalWidthCm'].hist()

In [None]:
#Plotting histogram for all the features together
df['SepalLengthCm'].hist(color='pink')
df['SepalWidthCm'].hist(color='red')
df['PetalLengthCm'].hist(color='blue')
df['PetalWidthCm'].hist(color='green')

In [None]:
# Plot scatter plot to visualize relationships between features
species=[0,1,2]
color=['red','orange','blue']

In [None]:
# scatter plot showing relation between Sepal Length and Sepal Width

for i in range(3):
    x = df[df['Species'] == species[i]]
    plt.scatter(x['SepalLengthCm'], x['SepalWidthCm'], c=color[i], label=species[i])
plt.xlabel("Sepal Length")
plt.ylabel("Sepal Width")
plt.legend()
    

In [None]:
# Scatter plot for Petal Length vs Petal Width 
for i in range(3):
    x = df[df['Species'] == species[i]]
    plt.scatter(x['PetalLengthCm'], x['PetalWidthCm'], c = color[i], label=species[i])
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.legend()

In [None]:
# Scatter plot for Petal Length vs Sepal Length
for i in range(3):
    x = df[df['Species'] == species[i]]
    plt.scatter(x['SepalLengthCm'], x['PetalLengthCm'], c = color[i], label=species[i])
plt.xlabel("Sepal Length")
plt.ylabel("Petal Length")
plt.legend()

In [None]:
# Scatter plot for Sepal Width vs Petal Width
for i in range(3):
    x = df[df['Species'] == species[i]]
    plt.scatter(x['SepalWidthCm'], x['PetalWidthCm'], c = color[i], label=species[i])
plt.xlabel("Sepal Width")
plt.ylabel("Petal Width")
plt.legend()

## Correlation Matrix

A correlation matrix summarizes how the variables in a dataset relate to one another. It helps identify patterns and dependencies, such as pairs of features that carry nearly the same information.



A correlation matrix is a table that displays the correlation coefficients between different variables in a dataset. Correlation coefficients quantify the strength and direction of the linear relationship between two variables. The values range from -1 to 1, where:

- `1`: A perfect positive correlation, meaning that as one variable increases, the other variable also increases proportionally.
- `0`: No correlation, indicating that there's no linear relationship between the variables.
- `-1`: A perfect negative correlation, implying that as one variable increases, the other variable decreases proportionally.
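
The three reference values can be reproduced on toy data. The following is a minimal sketch with made-up numbers and a hand-written `pearson` helper (a hypothetical name, not part of this notebook) that computes the Pearson coefficient, which is also the default method of `df.corr()`.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float((a_centered * b_centered).sum() / denom)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = pearson(x, 2 * x + 1)      # increasing linear relationship
r_neg = pearson(x, -3 * x + 10)    # decreasing linear relationship
r_zero = pearson(x, (x - 3) ** 2)  # symmetric curve: dependent on x, but no linear trend

print(r_pos, r_neg, r_zero)
```

The last case shows why a coefficient of 0 only rules out a linear relationship: the second variable is fully determined by the first, yet the correlation vanishes.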

In [None]:
# Finding the correlation matrix
df.corr()

In [None]:
# displaying the correlation matrix using a heatmap
corr=df.corr()
fig, ax=plt.subplots(figsize=(5,4))
sns.heatmap(corr,annot=True,ax=ax,cmap='RdBu') #cmap can be Greens,coolwarm,YlGnBu,RdBu

## Model Training

In [None]:
#splitting the data into features X and target Y
X = df.drop('Species',axis=1)
X

In [None]:
y = df["Species"]
y

In [None]:
# Doing train-test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size =0.3,random_state=1)

In [None]:
# Initializing logistic regression model
from sklearn.linear_model import LogisticRegression
model1=LogisticRegression()

# model fitting
model1.fit(X_train,y_train)

# model accuracy
print("Accuracy(Logistic Regression): ",model1.score(X_test,y_test)*100)
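# Note: model1 could also be trained on features standardized with StandardScaler.
# The scaled model's accuracy came out as 28.888 here, which makes it hard to compare with the other models;
# a score that low may indicate that the test set was not transformed with the same fitted scaler.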
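
One way to make sure the training and test sets are scaled consistently is to wrap the scaler and the classifier in a single pipeline. This is a sketch, not the notebook's original experiment: it assumes scikit-learn's bundled Iris data as a stand-in for the CSV and reuses the same split settings (`test_size=0.3`, `random_state=1`).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled Iris data stands in for the Kaggle CSV; same split settings as above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# The pipeline fits the scaler on the training data only and reuses that
# fitted scaler to transform the test data before scoring
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_train, y_train)

accuracy = scaled_model.score(X_test, y_test) * 100
print("Accuracy (scaled Logistic Regression):", accuracy)
```

Because `score` sends `X_test` through the same scaler, the model never sees unscaled inputs at prediction time.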

In [None]:
# K-nearest model (KNN)

from sklearn.neighbors import KNeighborsClassifier
model2=KNeighborsClassifier()
model2.fit(X_train,y_train)
print("Accuracy (KNN): ",model2.score(X_test,y_test)*100)

In [None]:
#Decision Tree model

from sklearn.tree import DecisionTreeClassifier
model3=DecisionTreeClassifier()
model3.fit(X_train,y_train)
print("Accuracy (DecisionTree): ",model3.score(X_test,y_test)*100)

## Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix


In [None]:
y_pred1=model1.predict(X_test)
y_pred2=model2.predict(X_test)
y_pred3=model3.predict(X_test)



In [None]:
conf_matrix1 = confusion_matrix(y_test, y_pred1)
conf_matrix2 = confusion_matrix(y_test, y_pred2)
conf_matrix3 = confusion_matrix(y_test, y_pred3)

In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix1, annot=True, fmt='d', cmap='Greens', xticklabels=np.unique(y), yticklabels=np.unique(y))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix of Logistic Regression')
plt.show()

In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix2, annot=True, fmt='d', cmap='Greens', xticklabels=np.unique(y), yticklabels=np.unique(y))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix of KNN')
plt.show()

In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix3, annot=True, fmt='d', cmap='Greens', xticklabels=np.unique(y), yticklabels=np.unique(y))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix of Decision Tree')
plt.show()
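
The heatmaps above show true labels on the rows and predicted labels on the columns, so per-class metrics can be read straight off the matrix. The sketch below uses small made-up label vectors for illustration only; they are not results from the models in this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (not output of the models above)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_hat = np.array([0, 0, 1, 2, 1, 2, 2, 1])

cm = confusion_matrix(y_true, y_hat)  # rows = true class, columns = predicted class

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
# Recall per class: diagonal divided by row totals (all samples of that true class)
recall = np.diag(cm) / cm.sum(axis=1)
# Precision per class: diagonal divided by column totals (all predictions of that class)
precision = np.diag(cm) / cm.sum(axis=0)

print(cm)
print("Accuracy:", accuracy)
print("Recall per class:", recall)
print("Precision per class:", precision)
```

Off-diagonal entries show which pairs of classes get confused with each other, which a single accuracy number hides.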