Lab 6 - Logistic Regression

1. Using scikit-learn, train a binary logistic regression model on the Iris dataset. Use all four features and define only two labels: virginica and non-virginica. See the logistic regression notebook presented in class for a demonstration of how to set up these labels.

2. Evaluate the model:
    i. Failure modes: in which data instances is the model wrong? 
    ii. Are there any shared properties for these cases?
    iii. How does the model perform across a set of evaluation metrics: accuracy and the confusion matrix?


In [2]:
from sklearn.datasets import load_iris

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

In [None]:
iris = load_iris()
# Separate the feature matrix (X) and target vector (y)
X, y = iris.data, iris.target

# Convert the multi-class labels to binary labels (1 for Virginica, 0 for others)
y_bin = (y == 2).astype(int)

# Split the dataset into training (80%) and testing (20%) sets with a random state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=0.2, random_state=42)

# Create a Logistic Regression model object
log_reg = LogisticRegression(random_state=42)
# Train the Logistic Regression model on the training data
log_reg.fit(X_train, y_train)

In [4]:
# Evaluate the model on the held-out test set
y_pred = log_reg.predict(X_test)

# Build a boolean mask that selects the misclassified test instances
misclassified_idx = (y_pred != y_test)
X_misclassified = X_test[misclassified_idx]
y_misclassified_true = y_test[misclassified_idx]
y_misclassified_pred = y_pred[misclassified_idx]

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Confusion Matrix:\n{conf_matrix}')

# Other metrics
print(classification_report(y_test, y_pred, target_names=["non-virginica", "virginica"]))

Accuracy: 100.00%
Confusion Matrix:
[[19  0]
 [ 0 11]]
               precision    recall  f1-score   support

non-virginica       1.00      1.00      1.00        19
    virginica       1.00      1.00      1.00        11

     accuracy                           1.00        30
    macro avg       1.00      1.00      1.00        30
 weighted avg       1.00      1.00      1.00        30



No failure modes are observed on this split: the model classifies all 30 test instances correctly.

With no misclassified instances, there are no failure cases whose shared properties could be analyzed.
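
A single 30-instance test split may simply contain no hard cases. One way to still look for failure cases is to collect out-of-fold predictions over all 150 instances, as in the minimal sketch below. The 5-fold stratified split, `max_iter=1000`, and the names `y_pred_cv` and `wrong` are assumptions for illustration and are not part of the original lab.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

iris = load_iris()
X, y = iris.data, iris.target
y_bin = (y == 2).astype(int)  # 1 = virginica, 0 = non-virginica

# Assumption: 5 stratified folds; max_iter is raised only to avoid convergence warnings
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000, random_state=42)

# Each instance is predicted by a model that did not see it during training
y_pred_cv = cross_val_predict(model, X, y_bin, cv=cv)

# Positions (in the full dataset) where the out-of-fold prediction is wrong
wrong = np.flatnonzero(y_pred_cv != y_bin)
print(f"Misclassified: {len(wrong)} of {len(y_bin)}")

# List each failure with its original species and feature values
for i in wrong:
    features = ", ".join(
        f"{name}={value:.1f}" for name, value in zip(iris.feature_names, X[i])
    )
    print(f"idx={i} species={iris.target_names[y[i]]} predicted={y_pred_cv[i]} {features}")
```

Any instances this lists can then be compared by species and feature values to check for shared properties.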


In the confusion matrix above, both diagonal entries (correct classifications: 19 non-virginica and 11 virginica) are non-zero, while both off-diagonal entries (incorrect classifications) are zero. The model therefore classifies the test set perfectly.

Accuracy is 100%, the highest possible value, so the model performs excellently on this 30-instance test set.

Because every test instance is classified correctly, precision, recall, and F1-score also reach their maximum value of 1.00 for both classes.
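
To make the link between the confusion matrix and the other metrics explicit, the following minimal sketch recomputes accuracy, precision, recall, and F1 for the virginica class directly from the matrix reported above. It assumes scikit-learn's layout for binary labels 0 and 1, where rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
conf_matrix = np.array([[19, 0],
                        [0, 11]])

# For binary labels {0, 1} the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = conf_matrix.ravel()

accuracy = (tp + tn) / conf_matrix.sum()             # share of all predictions that are correct
precision = tp / (tp + fp)                           # share of predicted virginica that are virginica
recall = tp / (tp + fn)                              # share of true virginica that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=1.00 precision=1.00 recall=1.00 f1=1.00
```

With zero false positives and zero false negatives, every ratio reduces to 1, which matches the classification report.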