## Logistic Regression Workshop

1) Load the iris dataset from iris-data-clean.csv
   Replace the values in the column 'Class' as follows:
     "Setosa" = 0
     "Virginica" = 1
     "Versicolor" = 2
2) Using Logistic Regression, classify the outcome (Column: 'Class') based on the features (Columns: 'sepal length /cm', 'sepal width /cm', 'petal length /cm', 'petal width /cm')
    a) Provide some values to predict the outcome
    b) Validate the model - print the confusion matrix and the accuracy score
3) Redo the above steps with any two features
    a) Compare the accuracy score with that of the four-feature model built above

In [32]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
df = pd.read_csv('iris-data-clean.csv')
print(df.head())

   sepal_length_cm  sepal_width_cm  petal_length_cm  petal_width_cm   class
0              5.1             3.5              1.4             0.2  Setosa
1              4.9             3.0              1.4             0.2  Setosa
2              4.7             3.2              1.3             0.2  Setosa
3              4.6             3.1              1.5             0.2  Setosa
4              5.0             3.6              1.4             0.2  Setosa


In [33]:
class_mapping = {
    "Setosa": 0,
    "Virginica": 1,
    "Versicolor": 2
}
# map() converts each label to its integer code; a label missing from class_mapping becomes NaN
df['class'] = df['class'].map(class_mapping)

print("Replace class (0=Setosa, 1=Virginica, 2=Versicolor):")
print(df['class'].value_counts())
print("\n")

Replace class (0=Setosa, 1=Virginica, 2=Versicolor):
class
2    50
1    50
0    45
Name: count, dtype: int64
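The mapping step above can be checked for labels that the dictionary does not cover. The following is a minimal sketch, assuming a hypothetical toy frame (`toy`) rather than the workshop CSV; the mis-cased `"setosa"` entry is invented purely to trigger the check.

```python
import pandas as pd

# Hypothetical toy frame; "setosa" is deliberately mis-cased to show the check.
toy = pd.DataFrame({"class": ["Setosa", "Virginica", "Versicolor", "setosa"]})

class_mapping = {"Setosa": 0, "Virginica": 1, "Versicolor": 2}

# map() returns NaN for any label missing from the mapping,
# whereas replace() would leave the original string in place.
encoded = toy["class"].map(class_mapping)

# Collect the labels that did not receive an integer code.
unmapped = toy.loc[encoded.isna(), "class"].unique().tolist()
print("Unmapped labels:", unmapped)
```

An empty list here would mean every label was encoded, which is what the 145-row count in the output above suggests for the workshop data.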






In [34]:
all_features = ['sepal_length_cm', 'sepal_width_cm', 'petal_length_cm', 'petal_width_cm']
y = df['class']
X_all = df[all_features]

X_train_all, X_test_all, y_train, y_test = train_test_split(X_all, y, test_size=0.2, random_state=42)
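Because the class counts printed above are slightly unequal (45 / 50 / 50), a plain random split can leave the test set with uneven class shares. One option is the `stratify` argument of `train_test_split`. The sketch below is an illustration only: it uses synthetic labels with the same counts and a placeholder feature column, not the workshop data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels that mirror the class counts printed above (45 / 50 / 50).
y = np.array([0] * 45 + [1] * 50 + [2] * 50)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

# stratify=y keeps the class proportions roughly equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Per-class counts in the held-out set.
test_counts = np.bincount(y_test)
print(test_counts)
```

Whether stratification changes the reported accuracy for the real data is not shown here; it only controls how the 29 test samples are distributed across classes.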

In [35]:
# Prepare
X_train_4 = X_train_all
X_test_4 = X_test_all

# Train model
model_4 = LogisticRegression(max_iter=500, random_state=42)
model_4.fit(X_train_4, y_train)

# 2a) Provide input and predict
# Provide input (Setosa-like, Virginica-like, Versicolor-like)
sample_data = pd.DataFrame({
    'sepal_length_cm': [5.0, 6.5, 5.8],
    'sepal_width_cm': [3.5, 3.0, 2.7],
    'petal_length_cm': [1.4, 5.5, 4.1],
    'petal_width_cm': [0.2, 2.0, 1.2]
})

sample_prediction_4 = model_4.predict(sample_data)

print("\n2a) Sample Prediction:")
print(f"Setosa-like: {sample_prediction_4[0]}")
print(f"Virginica-like: {sample_prediction_4[1]}")
print(f"Versicolor-like: {sample_prediction_4[2]}")

# 2b) Validate model
y_pred_4 = model_4.predict(X_test_4)
conf_matrix_4 = confusion_matrix(y_test, y_pred_4)
accuracy_4 = accuracy_score(y_test, y_pred_4)

print("\n2b) Model validation:")
print(f"Confusion Matrix:\n{conf_matrix_4}")
print(f"Accuracy Score: {accuracy_4:.4f}")


2a) Sample Prediction:
Setosa-like: 0
Virginica-like: 1
Versicolor-like: 2

2b) Model validation:
Confusion Matrix:
[[ 8  0  0]
 [ 0 10  1]
 [ 0  1  9]]
Accuracy Score: 0.9310
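The accuracy score can be read directly off the confusion matrix: it is the sum of the diagonal divided by the total number of test samples. A minimal sketch, with the matrix values copied from the output above and the per-class recall added as an extra illustration:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class, in the order 0, 1, 2).
conf_matrix = np.array([
    [8, 0, 0],
    [0, 10, 1],
    [0, 1, 9],
])

# Accuracy is the share of samples that lie on the diagonal: 27 of 29.
accuracy = np.trace(conf_matrix) / conf_matrix.sum()

# Recall per class: correct predictions divided by the row total.
recall = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

print(f"Accuracy: {accuracy:.4f}")
print("Recall per class:", np.round(recall, 4))
```

The two off-diagonal entries show that the only errors are one Virginica (1) predicted as Versicolor (2) and one Versicolor predicted as Virginica; every Setosa sample is classified correctly.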


In [36]:
print("\n" + "="*50)
print("--- Two feature model regression ('petal length /cm' & 'petal width /cm') ---")

# Prepare two feature model
features_2 = ['petal_length_cm', 'petal_width_cm']
X_train_2 = X_train_all[features_2]
X_test_2 = X_test_all[features_2]

# Train model
model_2 = LogisticRegression(max_iter=500, random_state=42)
model_2.fit(X_train_2, y_train)

# 3a) Predict result
sample_data_2 = sample_data[features_2]
sample_prediction_2 = model_2.predict(sample_data_2)

print("\n3a) Sample Prediction:")
print(f"Setosa-like: {sample_prediction_2[0]}")
print(f"Virginica-like: {sample_prediction_2[1]}")
print(f"Versicolor-like: {sample_prediction_2[2]}")


# 3b) Validate model
y_pred_2 = model_2.predict(X_test_2)
conf_matrix_2 = confusion_matrix(y_test, y_pred_2)
accuracy_2 = accuracy_score(y_test, y_pred_2)

print("\n3b) Model validation:")
print(f"Confusion Matrix:\n{conf_matrix_2}")
print(f"Accuracy Score: {accuracy_2:.4f}")


# 3a) Compare accuracy
print("\n" + "="*50)
print("--- Comparison of accuracy ---")
print(f"Four feature model accuracy: {accuracy_4:.4f}")
print(f"Two feature model accuracy: {accuracy_2:.4f}")

if accuracy_4 > accuracy_2:
    print("\nThe four feature model has higher accuracy")
elif accuracy_4 < accuracy_2:
    print("\nThe two feature model has higher accuracy")
else:
    print("\nSame accuracy")


--- Two feature model regression ('petal length /cm' & 'petal width /cm') ---

3a) Sample Prediction:
Setosa-like: 0
Virginica-like: 1
Versicolor-like: 2

3b) Model validation:
Confusion Matrix:
[[ 8  0  0]
 [ 0 10  1]
 [ 0  1  9]]
Accuracy Score: 0.9310

--- Comparison of accuracy ---
Four feature model accuracy: 0.9310
Two feature model accuracy: 0.9310

Same accuracy
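Both models score identically on a single 29-sample test split, so that split alone cannot tell them apart. Cross-validation averages the accuracy over several splits and gives a steadier comparison. The sketch below is a hedged illustration, not part of the workshop solution: it assumes scikit-learn's bundled `load_iris` data as a stand-in for iris-data-clean.csv (150 rows instead of 145, and a different label encoding, which does not affect accuracy).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: the bundled iris data stands in for iris-data-clean.csv.
X, y = load_iris(return_X_y=True)
X_petal = X[:, 2:4]  # petal length and petal width columns

# Five stratified folds, shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=500)

# One accuracy value per fold for each feature set.
scores_4 = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
scores_2 = cross_val_score(model, X_petal, y, cv=cv, scoring="accuracy")

print(f"Four features: {scores_4.mean():.4f} +/- {scores_4.std():.4f}")
print(f"Two features:  {scores_2.mean():.4f} +/- {scores_2.std():.4f}")
```

If the two means differ by less than the fold-to-fold spread, the two-feature model can reasonably be treated as performing on par with the four-feature one.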
