## Logistic Regression Workshop

1) Load the iris dataset from iris-data-clean.csv
   Replace the values in the column 'Class' as follows:
     "Setosa" = 0
     "Virginica" = 1
     "Versicolor" = 2
2) Using Logistic Regression, classify the outcome (Column: 'Class') based on the features (Columns: 'sepal length /cm', 'sepal width /cm', 'petal length /cm', 'petal width /cm')
    a) Provide some values to predict the outcome
    b) Validate the model - print the confusion matrix and the accuracy score
3) Redo the above steps with any two features
    a) Compare the accuracy score with that of the four-feature model built above

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the cleaned iris dataset and preview the first five rows
df = pd.read_csv('iris-data-clean.csv')
df.head()

Unnamed: 0,sepal_length_cm,sepal_width_cm,petal_length_cm,petal_width_cm,class
0,5.1,3.5,1.4,0.2,Setosa
1,4.9,3.0,1.4,0.2,Setosa
2,4.7,3.2,1.3,0.2,Setosa
3,4.6,3.1,1.5,0.2,Setosa
4,5.0,3.6,1.4,0.2,Setosa


In [4]:
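# Encode the species names as integer class codes.
# Note: the task statement lists Virginica = 1 and Versicolor = 2;
# this notebook uses the order below, and all later outputs reflect it.
df['class'] = df['class'].replace({
    'Setosa': 0,
    'Versicolor': 1,
    'Virginica': 2
})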

df.head()

Unnamed: 0,sepal_length_cm,sepal_width_cm,petal_length_cm,petal_width_cm,class
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
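
Because the model is trained on integer codes, its predictions also come back as integers. A minimal sketch of turning them back into species names, assuming the same mapping as the cell above (`CLASS_CODES`, `CODE_TO_CLASS`, and `decode` are illustrative names, not part of the notebook):

```python
# Mapping used in the cell above (species name -> integer code).
CLASS_CODES = {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}

# Invert it so integer predictions can be translated back into names.
CODE_TO_CLASS = {code: name for name, code in CLASS_CODES.items()}

def decode(predictions):
    """Translate an iterable of integer class codes into species names."""
    return [CODE_TO_CLASS[int(p)] for p in predictions]

print(decode([0, 1, 2, 1]))
```

The `int(p)` cast is there so the helper also accepts NumPy integers, which is what `model.predict` typically returns.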


In [20]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

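# Features are every column except the target
x = df.drop(columns=['class'])
y = df['class']

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(x_train, y_train)

# Predict a single sample; iloc[[0]] keeps a one-row DataFrame so the
# feature names match those seen during fit
sample = x_test.iloc[0]
pred = model.predict(x_test.iloc[[0]])
print("Sample features:\n", sample)
print("Predicted class for the sample:", pred[0])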

print()

y_pred = model.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print("Accuracy:", acc)

Sample features:
 sepal_length_cm    6.4
sepal_width_cm     2.9
petal_length_cm    4.3
petal_width_cm     1.3
Name: 69, dtype: float64
Predicted class for the sample: 1

Confusion Matrix:
 [[ 8  0  0]
 [ 0  9  1]
 [ 0  1 10]]
Accuracy: 0.9310344827586207
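
The accuracy score can be read directly off the confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of test samples. A small sketch of that arithmetic, using the matrix printed above and plain Python lists (the variable names are illustrative):

```python
# Confusion matrix printed above: rows are true classes, columns are predictions.
cm = [
    [8, 0, 0],
    [0, 9, 1],
    [0, 1, 10],
]

correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal entries
total = sum(sum(row) for row in cm)              # all test samples
accuracy = correct / total
print(f"{correct} of {total} correct, accuracy = {accuracy}")

# Per-class recall: the share of each true class that was predicted correctly.
recall = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
print("Recall per class:", recall)
```

Here 27 of the 29 test samples lie on the diagonal, which matches the reported accuracy; the two errors are one class-1 sample predicted as class 2 and one class-2 sample predicted as class 1.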




In [19]:
# Redo with any 2 features (here: the two sepal measurements)
x_2features = df[['sepal_length_cm', 'sepal_width_cm']]
y = df['class']
x_train, x_test, y_train, y_test = train_test_split(x_2features, y, test_size=0.2, random_state=42)
model_2features = LogisticRegression()
model_2features.fit(x_train, y_train)

# Predict a single sample; iloc[[0]] keeps a one-row DataFrame so the
# feature names match those seen during fit
sample_2features = x_test.iloc[0]
pred_2features = model_2features.predict(x_test.iloc[[0]])
print("Sample features:\n", sample_2features)
print("Predicted class for the sample:", pred_2features[0])

print()

y_pred_2features = model_2features.predict(x_test)
cm_2features = confusion_matrix(y_test, y_pred_2features)
acc_2features = accuracy_score(y_test, y_pred_2features)
print("Confusion Matrix (2 features):\n", cm_2features)
print("Accuracy (2 features):", acc_2features)

Sample features:
 sepal_length_cm    6.4
sepal_width_cm     2.9
Name: 69, dtype: float64
Predicted class for the sample: 2

Confusion Matrix (2 features):
 [[7 1 0]
 [0 7 3]
 [0 3 8]]
Accuracy (2 features): 0.7586206896551724
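
For the comparison asked for in step 3a, the two confusion matrices can be put side by side. The sketch below copies both matrices from the outputs above and uses plain Python; `summarize` and the variable names are illustrative helpers, not part of the notebook.

```python
# Confusion matrices copied from the two outputs above
# (rows: true class, columns: predicted class; class order 0, 1, 2).
cm_four = [[8, 0, 0], [0, 9, 1], [0, 1, 10]]
cm_two = [[7, 1, 0], [0, 7, 3], [0, 3, 8]]

def summarize(cm):
    """Return (correct, total, accuracy) for a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct, total, correct / total

correct_four, total, acc_four = summarize(cm_four)
correct_two, _, acc_two = summarize(cm_two)

print(f"Four features: {correct_four}/{total} correct, accuracy {acc_four:.3f}")
print(f"Two features:  {correct_two}/{total} correct, accuracy {acc_two:.3f}")
print(f"Accuracy drop: {acc_four - acc_two:.3f}")

# Errors between classes 1 and 2 (Versicolor and Virginica in this notebook).
confused_four = cm_four[1][2] + cm_four[2][1]
confused_two = cm_two[1][2] + cm_two[2][1]
print("Class 1 vs 2 confusions:", confused_four, "->", confused_two)
```

With only the two sepal measurements, the model gets 22 of 29 test samples right instead of 27, and most of the extra errors are confusions between classes 1 and 2. Both scores come from a single 29-row test split, so the size of the gap is indicative rather than precise.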


