# Iris Flower Classification
## Introduction:

The Iris Flower Classification project at Oasis Infobyte aims to develop a machine learning model capable of accurately classifying iris flowers based on their measurements. The task involves differentiating between three species: setosa, versicolor, and virginica. By leveraging machine learning algorithms, we can automate the classification process and enhance efficiency in various fields, such as botany, horticulture, and environmental studies.

In this project, we will utilize the popular Iris flower dataset from Kaggle, which provides measurements of sepal length, sepal width, petal length, and petal width for each iris flower species. This dataset will serve as our training and testing data, enabling us to evaluate the performance of our classification models.

To achieve accurate classification, we will explore two machine learning algorithms:

## 1. K-Nearest Neighbors (KNN) Algorithm for Iris Flower Classification

In the Iris Flower Classification project at Oasis Infobyte, we will employ the K-Nearest Neighbors (KNN) algorithm to accurately classify different species of iris flowers. KNN is a non-parametric algorithm that operates based on the principle of similarity. By considering the K nearest neighbors from the training dataset, KNN assigns the majority class among those neighbors to a new, unlabeled data point.

By leveraging the measurements of sepal length, sepal width, petal length, and petal width, KNN will determine the similarity between iris flowers and classify them accordingly. Throughout this project, we will analyze the performance of the KNN algorithm, explore different values of K, and evaluate its effectiveness in accurately predicting the species of iris flowers.
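The majority-vote idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation used later in the notebook: `knn_predict` and `train_points` are hypothetical names, the six labelled rows are copied from the dataset preview, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter

# A few labelled rows (sepal length, sepal width, petal length, petal width)
# copied from the dataset preview; real use would pass the full training set.
train_points = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((4.7, 3.2, 1.3, 0.2), "Iris-setosa"),
    ((6.7, 3.0, 5.2, 2.3), "Iris-virginica"),
    ((6.3, 2.5, 5.0, 1.9), "Iris-virginica"),
    ((6.5, 3.0, 5.2, 2.0), "Iris-virginica"),
]


def knn_predict(query, labelled_points, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training points by Euclidean distance to the query (assumed metric).
    by_distance = sorted(labelled_points, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbours wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


predicted = knn_predict((5.0, 3.6, 1.4, 0.2), train_points, k=3)
print(predicted)
```

With an odd `k` and two classes, as here, the vote cannot tie; with three classes a tie-breaking rule would be needed.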

In [None]:
#importing required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df=pd.read_csv('Iris.csv') # Reading the dataset into a dataframe using the pandas library
df.head() # displays the first 5 rows of the DataFrame

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


In [None]:
df.shape # returns the dimensions (rows, columns) of the DataFrame.

(150, 6)

In [None]:
df.info() # provides a concise summary of the DataFrame's structure and content.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


In [None]:
df.describe() # generates summary statistics of the DataFrame's numerical columns

              Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count      150.0          150.0         150.0          150.0         150.0
mean        75.5       5.843333         3.054       3.758667      1.198667
std    43.445368       0.828066      0.433594        1.76442      0.763161
min          1.0            4.3           2.0            1.0           0.1
25%        38.25            5.1           2.8            1.6           0.3
50%         75.5            5.8           3.0           4.35           1.3
75%       112.75            6.4           3.3            5.1           1.8
max        150.0            7.9           4.4            6.9           2.5


In [None]:
df['Species'].value_counts() # returns the number of rows for each unique value in the 'Species' column.

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

In [None]:
df.isnull().sum() #returns the number of missing values (null values) in each column of the DataFrame

Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

In [None]:
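# Assigning the feature matrix and target variable
# Note: 'Id' is a row identifier, not a measurement. The rows are ordered by species, so keeping 'Id' in X lets a model infer the label from row position.
X= df[['Id','SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
Y= df['Species']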

In [None]:
Y # Display target variable

0         Iris-setosa
1         Iris-setosa
2         Iris-setosa
3         Iris-setosa
4         Iris-setosa
            ...      
145    Iris-virginica
146    Iris-virginica
147    Iris-virginica
148    Iris-virginica
149    Iris-virginica
Name: Species, Length: 150, dtype: object

In [None]:
X # Display feature matrix

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
0      1            5.1           3.5            1.4           0.2
1      2            4.9           3.0            1.4           0.2
2      3            4.7           3.2            1.3           0.2
3      4            4.6           3.1            1.5           0.2
4      5            5.0           3.6            1.4           0.2
..   ...            ...           ...            ...           ...
145  146            6.7           3.0            5.2           2.3
146  147            6.3           2.5            5.0           1.9
147  148            6.5           3.0            5.2           2.0
148  149            6.2           3.4            5.4           2.3


In [None]:
# Utilizing the `train_test_split` function from the `sklearn.model_selection` module to split the data into training and testing sets.
# Assigning the variables `X_train`, `X_test`, `Y_train`, and `Y_test` to the results of the `train_test_split` function.
#`X` represents the feature matrix, and `Y` represents the target variable.
# Setting the `test_size` parameter to 0.3, indicating that we want to allocate 30% of the data for testing, while using the remaining 70% for training.
# Additionally, setting the `random_state` parameter to 1 to ensure the same random split is obtained every time the code is run, providing reproducibility of the results.

from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=1)
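As a rough illustration of what the split above does with 150 rows and `test_size=0.3`, the sketch below shuffles row indices with a fixed seed and cuts them into two disjoint groups. It uses only the standard library; `split_indices` is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the exact rows that `train_test_split` selects.

```python
import math
import random


def split_indices(n_rows, test_size, seed):
    """Shuffle row indices reproducibly and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    # A seeded generator gives the same shuffle on every run.
    random.Random(seed).shuffle(indices)
    # Round the test share up to a whole number of rows.
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(150, 0.3, seed=1)
print(len(train_idx), len(test_idx))
```

The two index lists never overlap, which is what makes the test rows a fair check of the model, provided the model is fit on the training rows only.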

In [None]:
from sklearn.neighbors import KNeighborsClassifier # imports the KNeighborsClassifier class for performing K-Nearest Neighbors classification
knn = KNeighborsClassifier() # creates an instance of the KNeighborsClassifier model with the default n_neighbors=5.
knn.fit(X_train,Y_train) # trains the K-Nearest Neighbors classifier knn using the training data.

In [None]:
knn.fit(X,Y) # refits the K-Nearest Neighbors classifier knn on the full feature matrix X and target variable Y, replacing the earlier fit on the training split; the test rows are now part of the training data.

In [None]:
# Used the trained K-Nearest Neighbors classifier knn to predict the target variable values for the feature matrix X.
# The predicted values are assigned to the variable prediction1.
prediction1 = knn.predict(X)
# Comparison of actual and predicted values
Scores1 = pd.DataFrame({'Actual Values':Y,'Predicted values':prediction1})
Scores1.head()

  Actual Values Predicted values
0   Iris-setosa      Iris-setosa
1   Iris-setosa      Iris-setosa
2   Iris-setosa      Iris-setosa
3   Iris-setosa      Iris-setosa
4   Iris-setosa      Iris-setosa


In [None]:
# Predicting the target variable values for the testing feature matrix X_test. The predicted values are assigned to the variable Y_testk
Y_testk=knn.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score # imports the accuracy_score function for evaluating classification model accuracy.
print('Accuracy Score :',accuracy_score(Y_test,Y_testk)*100,'%') # Displays the accuracy score

Accuracy Score : 100.0 %
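The accuracy figure is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on a handful of made-up labels; `accuracy` is a hypothetical helper and the two lists are illustrative values, not results from this notebook.

```python
def accuracy(actual, predicted):
    """Return the share of positions where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Illustrative labels only: three of the four predictions are correct.
actual_labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
predicted_labels = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]

print("Accuracy Score :", accuracy(actual_labels, predicted_labels) * 100, "%")
```

Because the metric only counts matches, it says nothing about whether the scored rows were seen during training; that depends on how the model was fit.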


## 2. Logistic Regression Algorithm for Iris Flower Classification

Here we will utilize the Logistic Regression algorithm to classify iris flowers based on their measurements. Logistic Regression is a popular machine learning algorithm that models the relationship between independent variables and the probability of a specific outcome.

By analyzing the measurements of sepal length, sepal width, petal length, and petal width, Logistic Regression will learn the underlying patterns and calculate the probability of each iris flower species. Because there are three classes, each flower is assigned the species with the highest predicted probability, effectively classifying it as setosa, versicolor, or virginica. Throughout this project, we will explore the interpretability, performance, and limitations of the Logistic Regression algorithm for iris flower classification.
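To make the probability step concrete, here is a minimal sketch of how a multiclass logistic model could turn four measurements into class probabilities: one linear score per species, passed through a softmax, with the largest probability deciding the label. The weights and biases below are hypothetical values chosen for illustration, not coefficients learned in this notebook, and `softmax` and `predict_species` are illustrative names.

```python
import math

species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Hypothetical coefficients (one weight vector and bias per species), for illustration only.
weights = [
    (0.0, 1.0, -2.0, -1.0),
    (0.0, 0.0, 0.0, 0.0),
    (0.0, -1.0, 2.0, 1.0),
]
biases = [3.0, 0.0, -3.0]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_species(measurements):
    """Return (predicted label, class probabilities) for one flower."""
    scores = [
        sum(w * x for w, x in zip(w_row, measurements)) + b
        for w_row, b in zip(weights, biases)
    ]
    probabilities = softmax(scores)
    best = max(range(len(species)), key=lambda i: probabilities[i])
    return species[best], probabilities


label, probs = predict_species((5.1, 3.5, 1.4, 0.2))
print(label, [round(p, 3) for p in probs])
```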

In [None]:
# Utilizing the `train_test_split` function from the `sklearn.model_selection` module to split the data into training and testing sets.
# Assigning the variables `X_train`, `X_test`, `Y_train`, and `Y_test` to the results of the `train_test_split` function.
#`X` represents the feature matrix, and `Y` represents the target variable.
# Setting the `test_size` parameter to 0.20, indicating that we want to allocate 20% of the data for testing, while using the remaining 80% for training.
# Additionally, setting the `random_state` parameter to 40 to ensure the same random split is obtained every time the code is run, providing reproducibility of the results.

from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.20,random_state=40)


In [None]:
from sklearn.linear_model import LogisticRegression # imports the LogisticRegression class from the sklearn.linear_model module.
lr = LogisticRegression() # creates an instance of the LogisticRegression model.
lr.fit(X_train,Y_train) # trains the logistic regression model lr using the training data.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
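The warning above suggests two remedies: raising `max_iter` or scaling the data. The sketch below applies both by placing a `StandardScaler` in front of `LogisticRegression` in a pipeline. It is an illustration under stated assumptions rather than the notebook's own method: `load_iris` stands in for `Iris.csv` so the example is self-contained (it has the four measurements and no `Id` column), `max_iter=1000` is an arbitrary generous limit, and `stratify` is added so each species is equally represented in both splits.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data replaces Iris.csv for a self-contained sketch.
features, labels = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.20, random_state=40, stratify=labels
)

# The scaler is fit on the training split only, then reused to transform the test split.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_train, y_train)

test_accuracy = scaled_lr.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the test rows never influence the scaling statistics, so the reported score reflects data the model has not seen.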


In [None]:
lr.fit(X,Y) # refits the logistic regression model lr on the full feature matrix X and target variable Y, replacing the earlier fit on the training split; the test rows are now part of the training data.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [None]:
# Used the trained logistic regression model lr to predict the target variable values for the feature matrix X.
# The predicted values are assigned to the variable prediction.
prediction = lr.predict(X)

# Comparison of actual and predicted values
Scores = pd.DataFrame({'Actual Values':Y,'Predicted values':prediction})
Scores.head()

  Actual Values Predicted values
0   Iris-setosa      Iris-setosa
1   Iris-setosa      Iris-setosa
2   Iris-setosa      Iris-setosa
3   Iris-setosa      Iris-setosa
4   Iris-setosa      Iris-setosa


In [None]:
# Predicting the target variable values for the testing feature matrix X_test. The predicted values are assigned to the variable Y_testl
Y_testl=lr.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score # imports the accuracy_score function for evaluating classification model accuracy.
print('Accuracy Score :',accuracy_score(Y_test,Y_testl)*100,'%') # Displays the accuracy score

Accuracy Score : 100.0 %


## Conclusion:

In conclusion, the Iris Flower Classification project at Oasis Infobyte aimed to develop machine learning models capable of accurately classifying iris flowers based on their measurements. Two algorithms, Logistic Regression and K-Nearest Neighbors (KNN), were employed for this task.

The Logistic Regression algorithm, which models the relationship between independent variables and the probability of a specific outcome, was applied to classify the iris flowers. By analyzing the measurements of sepal length, sepal width, petal length, and petal width, Logistic Regression successfully learned the underlying patterns and accurately predicted the species of the iris flowers.

Similarly, the K-Nearest Neighbors algorithm, which makes predictions based on the similarity of instances in the feature space, was utilized for classification. By considering the nearest neighbors in the feature space, KNN achieved perfect accuracy in classifying the iris flowers.

Both algorithms demonstrated outstanding performance, achieving 100% accuracy on the given dataset. This suggests that the models were able to effectively distinguish between the different species of iris flowers based on their measurements.

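However, these scores should be read with caution. Both models were refit on the full dataset before the test split was scored, so the test rows had already been seen during training. The feature matrix also included the Id column, which tracks the species because the rows are ordered by species. Perfect accuracy under these conditions does not guarantee similar performance on unseen data. To assess generalization, the models should be fit on the training split only, using the four measurements, and then scored on a test set they have not seen.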
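One way to follow the recommendation of evaluating on data the model has not seen is k-fold cross-validation, which also makes it easy to compare several values of K for KNN. The sketch below is an illustration under assumptions, not the notebook's procedure: `load_iris` stands in for `Iris.csv` (four measurements, no `Id` column), the candidate K values and `cv=5` are arbitrary choices, and the logistic regression is wrapped in a scaling pipeline with `max_iter=1000`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data replaces Iris.csv for a self-contained sketch.
features, labels = load_iris(return_X_y=True)

# Candidate models: KNN with several K values, plus scaled logistic regression.
candidates = {f"KNN (k={k})": KNeighborsClassifier(n_neighbors=k) for k in (1, 3, 5, 7, 9)}
candidates["Logistic Regression"] = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000)
)

cv_means = {}
for name, model in candidates.items():
    # Each fold is scored by a copy of the model fit only on the other four folds.
    scores = cross_val_score(model, features, labels, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Averaging over five folds gives a steadier estimate than a single 30-row or 45-row test split, which matters on a dataset of only 150 rows.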

Overall, the Iris Flower Classification project provided valuable hands-on experience in developing machine learning models, understanding classification algorithms, and analyzing real-world datasets, contributing to a deeper understanding of the practical applications of machine learning in the field of flower classification.