In [1]:
# Importing necessary libraries
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Loading the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Defining labels: 0 for non-virginica, 1 for virginica
y_binary = (y == 2).astype(int)
# The line above converts the original labels (0, 1, 2) to binary format.
# It checks whether each label equals 2 (the label for virginica in this dataset) and maps it to 1; all other labels become 0.


# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)

# Initializing the Logistic Regression model
model = LogisticRegression()

# Training the model
model.fit(X_train, y_train)

# Evaluating the model
y_pred = model.predict(X_test)



In [5]:
#Failure modes: in which data instances is the model wrong? (1 point)


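# Identifying instances in the test set that were misclassified by the logistic regression model.
# Note: misclassified_indices is a boolean mask (True where the prediction is wrong), not a list of integer positions.
misclassified_indices = (y_pred != y_test)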
misclassified_instances = X_test[misclassified_indices]

print(misclassified_instances)



[]


The printed array is empty, so the model made no mistakes on the test set: every prediction matched the true label. There are therefore no "failure modes" to report for this specific evaluation.

#Are there any shared properties for these cases? (1 point)

Because there are no misclassified instances, there are no shared properties to analyze. The model performed perfectly on this particular test set.
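Since the mask is empty here, the output above cannot show what a failure analysis would look like. The sketch below illustrates the same masking approach on hypothetical toy data (the feature values, labels, and the names `X_demo`, `y_true_demo`, and `y_pred_demo` are invented for illustration and are not results from this notebook). It prints each misclassified row together with its true and predicted label, which is usually the starting point for looking for shared properties.

```python
import numpy as np

# Hypothetical toy data (NOT the notebook's results): 5 rows, 4 features each.
X_demo = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.3, 2.8, 5.1, 1.5],
    [6.0, 2.2, 5.0, 1.5],
    [5.9, 3.2, 4.8, 1.8],
    [6.5, 3.0, 5.8, 2.2],
])
y_true_demo = np.array([0, 1, 1, 0, 1])  # hypothetical true labels
y_pred_demo = np.array([0, 0, 1, 1, 1])  # hypothetical predictions

# Boolean mask: True where the prediction differs from the true label.
mask = y_pred_demo != y_true_demo

# Integer positions of the misclassified rows.
wrong_idx = np.flatnonzero(mask)

# Show features alongside the true and predicted labels for each error.
for i in wrong_idx:
    print(f"row {i}: features={X_demo[i]}, "
          f"true={y_true_demo[i]}, predicted={y_pred_demo[i]}")
```

Applied to this notebook, the same loop over `X_test`, `y_test`, and `y_pred` would print nothing, because the mask contains no True values.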


In [2]:
# How is the model doing across a set of evaluation metrics: accuracy and confusion matrix. (1 point)

# Calculating evaluation metrics: accuracy, confusion matrix, and classification report.

accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{confusion_mat}")
print(f"Classification Report:\n{report}")


Accuracy: 1.0
Confusion Matrix:
[[19  0]
 [ 0 11]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



The accuracy is 1.0, which means the model correctly classified all 30 instances in the test set.

In the confusion matrix, the first row represents the instances with true label 0 (non-virginica). 19 instances were correctly classified as non-virginica (true negatives). 0 instances were incorrectly classified as virginica (false positives).

The second row represents the instances with true label 1 (virginica). 0 instances were incorrectly classified as non-virginica (false negatives). 11 instances were correctly classified as virginica (true positives).
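The row-by-row reading above can be sketched in code. This is a minimal illustration that copies the counts from the printed confusion matrix into a nested list (the name `confusion_counts` is introduced here for illustration) and unpacks them using scikit-learn's layout, where rows are true labels and columns are predicted labels.

```python
# Counts copied from the confusion matrix printed above.
confusion_counts = [[19, 0],
                    [0, 11]]

# Row 0 = true non-virginica, row 1 = true virginica.
# Column 0 = predicted non-virginica, column 1 = predicted virginica.
(tn, fp), (fn, tp) = confusion_counts

print(f"True negatives:  {tn}")
print(f"False positives: {fp}")
print(f"False negatives: {fn}")
print(f"True positives:  {tp}")
```

With the NumPy array returned by `confusion_matrix`, the same four values can be obtained in the same order with `tn, fp, fn, tp = confusion_mat.ravel()`.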

The classification_report function also reports precision, recall, F1-score, and support for each class, along with the accuracy, macro average, and weighted average.

Precision: It is the ratio of true positives to the sum of true positives and false positives. It measures how many of the instances predicted as a given class actually belong to that class. In this case, it is 1.0 for both classes, meaning every prediction of each class was correct.

Recall: It is the ratio of true positives to the sum of true positives and false negatives. It measures the ability of the model to find all instances of a given class. Again, it is 1.0 for both classes (non-virginica and virginica).

F1-score: It is the harmonic mean of precision and recall. It provides a balance between precision and recall. A high F1-score indicates good performance.
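The three definitions above can be checked by hand from the confusion matrix counts. The sketch below is an illustration, not the scikit-learn implementation: the helper `precision_recall_f1` is a hypothetical name, and returning 0.0 when a denominator is zero is a convention chosen here. The second call uses invented counts to show how the metrics diverge when the model makes errors.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for one class from raw counts."""
    # Precision: correct positive predictions / all positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correct positive predictions / all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Counts for the virginica class, read from the confusion matrix above.
p, r, f = precision_recall_f1(tp=11, fp=0, fn=0)
print(f"virginica: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")

# Hypothetical imperfect counts (NOT from this notebook), for contrast.
p2, r2, f2 = precision_recall_f1(tp=9, fp=1, fn=2)
print(f"hypothetical: precision={p2:.2f}, recall={r2:.2f}, f1={f2:.2f}")
```

For the non-virginica class the roles swap: its true positives are the 19 correctly classified non-virginica instances.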

Support: It is the number of occurrences of each class in the test set.

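The 'accuracy', 'macro avg', and 'weighted avg' rows are aggregate measures. Accuracy is the fraction of all test instances classified correctly. The macro average is the unweighted mean of the per-class scores, and the weighted average is the mean of the per-class scores weighted by each class's support. Here all of them are 1.00.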
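Because every per-class score in this report is 1.00, the macro and weighted averages are identical here. The sketch below shows how the two differ, using the supports from the report (19 and 11) together with hypothetical per-class recall values that are invented for illustration and are not results from this notebook.

```python
# Supports taken from the classification report above.
support = [19, 11]

# Hypothetical per-class recall values (NOT from this notebook).
recall_per_class = [0.9, 0.6]

# Macro average: plain mean, every class counts equally.
macro_avg = sum(recall_per_class) / len(recall_per_class)

# Weighted average: each class contributes in proportion to its support.
weighted_avg = sum(s * r for s, r in zip(support, recall_per_class)) / sum(support)

print(f"macro avg:    {macro_avg:.2f}")
print(f"weighted avg: {weighted_avg:.2f}")

# With the notebook's actual scores (1.0 for both classes) the two agree.
actual = [1.0, 1.0]
actual_macro = sum(actual) / len(actual)
actual_weighted = sum(s * r for s, r in zip(support, actual)) / sum(support)
```

The weighted average is pulled toward the score of the larger class, so the two averages diverge most when classes are imbalanced and the per-class scores differ.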