<h1>Iris Dataset</h1>

<p>Use the iris flower dataset from sklearn.datasets to train a logistic regression model. Measure the accuracy
of the model on a held-out test set, then use the model to predict the species of individual samples. The iris dataset contains 150 samples (50 per species), each described by the following four features, all measured in centimetres:

1. Sepal Length
2. Sepal Width
3. Petal Length
4. Petal Width

Using these four features, you will classify a flower into one of three categories:

1. Setosa
2. Versicolor
3. Virginica
</p>
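The dataset dimensions described above can be confirmed with a short check. This is a minimal sketch; the names `iris_check`, `n_samples`, `n_features`, and `class_counts` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_check = load_iris()

# Rows are samples, columns are features
n_samples, n_features = iris_check.data.shape

# Count how many samples carry each integer class label (0, 1, 2)
class_counts = np.bincount(iris_check.target)

print(n_samples, n_features)
print({name: int(count) for name, count in zip(iris_check.target_names, class_counts)})
```

A balanced dataset such as this one makes plain accuracy a reasonable first metric, since no single class dominates the score.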

In [23]:
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import numpy as np
import seaborn as sns
%matplotlib inline

iris = load_iris()

In [22]:
dir(iris)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [4]:
iris.data[[0,1]]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2]])

In [5]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [6]:
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0


In [8]:
df['species_name'] = df['species'].apply(lambda x: iris.target_names[x])
df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species  species_name
0                5.1               3.5                1.4               0.2        0        setosa
1                4.9               3.0                1.4               0.2        0        setosa
2                4.7               3.2                1.3               0.2        0        setosa
3                4.6               3.1                1.5               0.2        0        setosa
4                5.0               3.6                1.4               0.2        0        setosa
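The label-to-name mapping can also be done without a per-row lambda, because `iris.target_names` is a NumPy array that accepts an array of integer indices. The sketch below rebuilds the frame under the illustrative name `df_demo`. The categorical variant is an optional alternative, not something the original notebook uses.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_demo = load_iris()
df_demo = pd.DataFrame(iris_demo.data, columns=iris_demo.feature_names)
df_demo['species'] = iris_demo.target

# NumPy fancy indexing: look up every integer label in a single vectorized step
df_demo['species_name'] = iris_demo.target_names[df_demo['species'].to_numpy()]

# Optional alternative: a categorical column built directly from the integer codes
species_cat = pd.Categorical.from_codes(
    df_demo['species'].to_numpy(), categories=list(iris_demo.target_names)
)

print(df_demo['species_name'].value_counts())
```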


In [9]:
X = df.drop(['species', 'species_name'], axis='columns')
y = df.species

In [10]:
# Raise max_iter above the default of 100 so the solver has room to converge on unscaled features
model = LogisticRegression(max_iter=2000)

In [11]:
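# Hold out 20% of the rows (30 samples) for testing; no random_state is set, so the split varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)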
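For results that can be reproduced from run to run, the split can be seeded and stratified. This is a sketch under assumptions: the seed `42` is arbitrary, and the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr`, and `y_te` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# as_frame=True returns a pandas DataFrame of features and a Series of labels
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.2,
    random_state=42,  # assumed seed; any fixed integer makes the split repeatable
    stratify=y_demo,  # keep the class proportions the same in train and test
)

print(y_te.value_counts().sort_index())
```

With stratification, each of the three species contributes the same number of rows to the test set, which is not guaranteed by a purely random split.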

In [12]:
model.fit(X_train, y_train)

In [13]:
y_predicted = model.predict(X_test)
y_predicted

array([2, 2, 0, 1, 2, 0, 0, 0, 2, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 0, 1, 0, 2, 2, 1, 1])

In [14]:
model.score(X_test, y_test)

1.0
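A perfect score on a single 30-sample test set can depend on which rows happened to land in that set. One way to get a steadier estimate is k-fold cross-validation. The sketch below assumes 5 folds and uses the illustrative names `X_cv`, `y_cv`, and `cv_scores`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X_cv, y_cv = load_iris(return_X_y=True)

# Each of the 5 folds serves once as the test set while the other 4 train the model
cv_scores = cross_val_score(LogisticRegression(max_iter=2000), X_cv, y_cv, cv=5)

print(cv_scores)
print(f"mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

The mean across folds is usually a more honest summary of expected accuracy than any one split.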

In [15]:
model.coef_, model.intercept_

(array([[-0.42381728,  0.88875343, -2.36877887, -0.95573856],
        [ 0.45866664, -0.31341647, -0.15358822, -0.93126851],
        [-0.03484936, -0.57533696,  2.52236708,  1.88700707]]),
 array([  9.47839054,   2.26433813, -11.74272866]))
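The coefficient matrix has one row per class and one column per feature, and the intercept has one entry per class. As a sketch of how these produce a prediction, the code below recomputes the class probabilities by hand. It assumes a recent scikit-learn release, where the default multiclass formulation is multinomial (softmax). The model is fitted on the full dataset under the illustrative name `clf` so the example is self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_all, y_all = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=2000).fit(X_all, y_all)

# One linear score per class and sample: z = X @ W^T + b
scores = X_all @ clf.coef_.T + clf.intercept_

# Softmax converts scores to probabilities; subtracting the row max avoids overflow
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# The predicted class is the one with the highest probability
manual_pred = manual_proba.argmax(axis=1)

print(np.allclose(manual_proba, clf.predict_proba(X_all)))
```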

In [16]:
y_test_df = pd.DataFrame(y_test).reset_index(drop=True)
y_pred_df = pd.DataFrame(y_predicted, columns=['y_predicted'])

# Combine X_test, y_test, and predictions
final_df = pd.concat([X_test.reset_index(drop=True),
                      y_test_df,
                      y_pred_df],
                     axis=1)
final_df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species  y_predicted
0                7.4               2.8                6.1               1.9        2            2
1                6.3               2.5                5.0               1.9        2            2
2                5.2               3.4                1.4               0.2        0            0
3                5.7               2.9                4.2               1.3        1            1
4                6.4               3.2                5.3               2.3        2            2


In [18]:
test = [[6.9, 2.0, 3.1, 1.8]]
# Convert test data to DataFrame with feature names
test_df = pd.DataFrame(test, columns=iris.feature_names)
predicted_species_encoded = model.predict(test_df)
predicted_species_name = iris.target_names[predicted_species_encoded][0]
print(predicted_species_name)

versicolor
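A predicted label alone does not show how confident the model is. The sketch below prints the per-class probabilities for the same sample using `predict_proba`. For self-containment it fits a model named `clf_demo` on the full dataset, which is an assumption and differs from the train/test split used in the notebook, so the exact probabilities may not match the notebook's model.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris_p = load_iris()
X_full = pd.DataFrame(iris_p.data, columns=iris_p.feature_names)
clf_demo = LogisticRegression(max_iter=2000).fit(X_full, iris_p.target)

# Same measurements as the sample above, wrapped in a DataFrame with matching column names
sample = pd.DataFrame([[6.9, 2.0, 3.1, 1.8]], columns=iris_p.feature_names)

# predict_proba returns one row per sample and one column per class
proba = clf_demo.predict_proba(sample)[0]
for name, p in zip(iris_p.target_names, proba):
    print(f"{name}: {p:.3f}")
```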


In [19]:
correct_count = (y_test_df.iloc[:, 0] == y_pred_df['y_predicted']).sum()
incorrect_count = (y_test_df.iloc[:, 0] != y_pred_df['y_predicted']).sum()

print("Correct predictions:", correct_count)
print("Incorrect predictions:", incorrect_count)

Correct predictions: 30
Incorrect predictions: 0


In [20]:
# 1. Accuracy
acc = accuracy_score(y_test, y_predicted)
print(f"1. Accuracy: {acc:.4f}\n")

# 2. Confusion Matrix
cm = confusion_matrix(y_test, y_predicted)
print("2. Confusion Matrix:")
print(cm, "\n")

# 3. Classification Report
report = classification_report(y_test, y_predicted)
print("3. Classification Report:")
print(report)

1. Accuracy: 1.0000

2. Confusion Matrix:
[[12  0  0]
 [ 0 10  0]
 [ 0  0  8]] 

3. Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00        10
           2       1.00      1.00      1.00         8

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
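The precision and recall columns of the classification report can be derived directly from the confusion matrix, whose rows are true classes and whose columns are predicted classes. The following is a sketch with an assumed seed of `0` and illustrative variable names. It presumes every class is predicted at least once, otherwise the precision division would hit a zero column sum.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X_m, y_m = load_iris(return_X_y=True)
X_mtr, X_mte, y_mtr, y_mte = train_test_split(
    X_m, y_m, test_size=0.2, random_state=0, stratify=y_m
)
pred_m = LogisticRegression(max_iter=2000).fit(X_mtr, y_mtr).predict(X_mte)

cm_demo = confusion_matrix(y_mte, pred_m)  # rows = true class, columns = predicted class
true_pos = np.diag(cm_demo)                # correct predictions per class

precision_manual = true_pos / cm_demo.sum(axis=0)  # correct / all predicted as that class
recall_manual = true_pos / cm_demo.sum(axis=1)     # correct / all truly in that class
accuracy_manual = true_pos.sum() / cm_demo.sum()   # correct / all test samples

print(precision_manual, recall_manual, accuracy_manual)
```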

