from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
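
# Load the 8x8 handwritten digits dataset (10 classes, 64 pixel features per image).
digits = load_digits()
X, y = digits.data, digits.target

# Hold out 20% of the samples for testing; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate classifiers, using default hyperparameters except where noted.
models = {
    "Logistic Regression": LogisticRegression(max_iter=10000),  # high iteration cap so the solver converges
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Support Vector Machine": SVC(probability=True),  # probability=True enables predict_proba, needed for AUC
    "K-Nearest Neighbors": KNeighborsClassifier()
}

# Fit every model on the training split.
for name, model in models.items():
    model.fit(X_train, y_train)

# Evaluate every fitted model on the held-out test split.
results = {}
for name, model in models.items():
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)  # per-class probabilities for the multiclass AUC
    results[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        # average='weighted' averages the per-class scores, weighted by class support
        "Precision": precision_score(y_test, y_pred, average='weighted'),
        "Recall": recall_score(y_test, y_pred, average='weighted'),
        "F1 Score": f1_score(y_test, y_pred, average='weighted'),
        "Confusion Matrix": confusion_matrix(y_test, y_pred),
        "AUC": roc_auc_score(y_test, y_proba, multi_class='ovr', average='weighted'),
    }

# Pick the model with the highest test accuracy.
best_model = max(results, key=lambda x: results[x]['Accuracy'])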

print("Evaluation Metrics:")
for name, metrics in results.items():
    print(f"Model: {name}")
    print(f"Accuracy: {metrics['Accuracy']:.4f}")
    print(f"Precision: {metrics['Precision']:.4f}")
    print(f"Recall: {metrics['Recall']:.4f}")
    print(f"F1 Score: {metrics['F1 Score']:.4f}")
    print(f"Confusion Matrix:\n{metrics['Confusion Matrix']}")
    print(f"AUC: {metrics['AUC']:.4f}\n")

print(f"The best model based on accuracy is: {best_model}")

Evaluation Metrics:
Model: Logistic Regression
Accuracy: 0.9722
Precision: 0.9726
Recall: 0.9722
F1 Score: 0.9723
Confusion Matrix:
[[33  0  0  0  0  0  0  0  0  0]
 [ 0 28  0  0  0  0  0  0  0  0]
 [ 0  0 33  0  0  0  0  0  0  0]
 [ 0  0  0 33  0  1  0  0  0  0]
 [ 0  1  0  0 45  0  0  0  0  0]
 [ 0  0  1  0  0 44  1  0  0  1]
 [ 0  0  0  0  0  1 34  0  0  0]
 [ 0  0  0  0  0  1  0 33  0  0]
 [ 0  0  0  0  0  1  0  0 29  0]
 [ 0  0  0  1  0  0  0  0  1 38]]
AUC: 0.9989

Model: Decision Tree
Accuracy: 0.8639
Precision: 0.8672
Recall: 0.8639
F1 Score: 0.8638
Confusion Matrix:
[[29  0  0  0  2  1  0  1  0  0]
 [ 1 21  1  0  3  0  0  1  0  1]
 [ 0  0 27  3  1  0  0  1  1  0]
 [ 0  0  0 29  0  1  0  1  2  1]
 [ 0  0  0  0 41  2  2  1  0  0]
 [ 0  0  2  0  1 43  0  0  1  0]
 [ 0  0  0  0  1  0 34  0  0  0]
 [ 0  0  0  3  1  0  0 30  0  0]
 [ 0  2  0  1  0  1  0  0 23  3]
 [ 0  0  0  2  1  0  1  1  1 34]]
AUC: 0.9240

Model: Random Forest
Accuracy: 0.9694
Precision: 0.9705
Recall: 0.9694
F1 

1. Linear Regression

Strengths: Fast training and prediction

Weaknesses: Sensitive to outliers

2. Logistic Regression

Strengths: Efficient for binary classification tasks

Weaknesses: Sensitive to irrelevant features

3. Naive Bayes Classifier

Strengths: Performs well with high-dimensional data

Weaknesses: May not capture complex relationships between features

4. Support Vector Machine (SVM)

Strengths: Effective in high-dimensional spaces

Weaknesses: Computationally expensive for large datasets

5. Decision Trees

Strengths: Can handle both numerical and categorical data

Weaknesses: Prone to overfitting

6. Random Forests

Strengths: Combines multiple decision trees to reduce overfitting

Weaknesses: May not perform well with highly imbalanced datasets
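The overfitting contrast between a single decision tree and a random forest can be checked by comparing training and test accuracy. The snippet below is a minimal sketch on the same digits data; the `random_state` values and variable names are assumptions added here for reproducibility, not part of the original experiment.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Seeds are an assumption made for this sketch so that reruns give the same numbers.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# score() returns mean accuracy on the given data.
tree_train, tree_test = tree.score(X_train, y_train), tree.score(X_test, y_test)
forest_train, forest_test = forest.score(X_train, y_train), forest.score(X_test, y_test)

# A large gap between train and test accuracy is a sign of overfitting.
print(f"Decision Tree  train={tree_train:.4f}  test={tree_test:.4f}")
print(f"Random Forest  train={forest_train:.4f}  test={forest_test:.4f}")
```

An unpruned tree typically fits the training split almost perfectly and loses noticeably more accuracy on the test split than the forest does.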


1. Precision:

Precision measures the proportion of true positive predictions among all positive predictions made by the model.

Formula: Precision = TP / (TP + FP)

Importance: Precision is essential when the cost of false positives is high. It indicates the model's ability to avoid false alarms and make accurate positive predictions.
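As a small illustration of the precision formula, the sketch below computes it directly from counts. The function name and the spam-filter numbers are hypothetical and chosen only for this example.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical spam filter: 50 emails flagged, 40 are truly spam, 10 are not.
print(precision(tp=40, fp=10))  # 0.8
```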

2. Recall (Sensitivity):

Recall measures the proportion of true positive predictions among all actual positive instances in the dataset.

Formula: Recall = TP / (TP + FN)

Importance: Recall is crucial when the cost of false negatives is high. A high recall indicates that the model finds most of the positive instances and misses few important cases.
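The recall formula can also be applied to raw label lists. This is a minimal sketch; the helper name `recall` and the toy labels are assumptions made for illustration.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN), counted for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: four actual positives, of which the model finds three.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(recall(y_true, y_pred))  # 0.75
```

The false positive at the seventh position does not affect recall; it would lower precision instead.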

3. Accuracy:

Accuracy measures the proportion of correct predictions (both true positives and true negatives) among all predictions made by the model.

Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

Importance: Accuracy provides an overall assessment of the model's correctness. However, it may not be suitable for imbalanced datasets where the classes are unevenly distributed.
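The caveat about imbalanced data can be made concrete with a toy case. In the sketch below (made-up labels, 95 negatives and 5 positives), a model that always predicts the negative class scores high accuracy while finding no positives at all.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```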

4. F1 Score:

The F1 score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance.

Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

Importance: The F1 score balances precision and recall, making it useful when the class distribution is uneven or when false positives and false negatives matter equally. Because it is a harmonic mean, it is low whenever either precision or recall is low.
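To see how the harmonic mean behaves, the sketch below compares F1 with a plain arithmetic mean for a lopsided model. The function name and the input values are illustrative assumptions.

```python
def f1(precision: float, recall: float) -> float:
    """F1 = 2 * P * R / (P + R); defined as 0.0 when both inputs are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical model: every positive prediction is right, but it finds only 10% of positives.
p, r = 1.0, 0.1
print(f"Arithmetic mean: {(p + r) / 2:.4f}")
print(f"F1 score:        {f1(p, r):.4f}")
```

The arithmetic mean sits halfway between the two values, while F1 stays close to the weaker one.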

5. Confusion Matrix:

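A confusion matrix is a table that summarizes the performance of a classification model by comparing predicted and actual class labels.
For binary classification it consists of four counts: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For a multiclass problem with K classes, such as the ten digits above, it is a K x K table; in scikit-learn, entry (i, j) counts the samples of actual class i that were predicted as class j, so correct predictions lie on the diagonal.

Importance: The confusion matrix shows which types of errors the model makes and helps in understanding its strengths and weaknesses.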
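The following sketch builds a small multiclass confusion matrix by hand and derives one-vs-rest TP, FP, FN, and TN counts for a single class. It assumes rows are actual labels and columns are predicted labels, the same orientation scikit-learn's `confusion_matrix` uses; the helper names and toy labels are hypothetical.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Return a K x K list of lists; rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def one_vs_rest_counts(matrix, k):
    """TP, FP, FN, TN for class index k, treating all other classes as negative."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # rest of row k: actual k, predicted other
    fp = sum(row[k] for row in matrix) - tp   # rest of column k: predicted k, actual other
    tn = sum(map(sum, matrix)) - tp - fn - fp
    return tp, fp, fn, tn


labels = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = build_confusion_matrix(y_true, y_pred, labels)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 2]

print(one_vs_rest_counts(cm, 2))  # (2, 0, 1, 4)
```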

6. Area Under the ROC Curve (AUC):

AUC measures the ability of a model to distinguish between positive and negative instances.
The ROC curve is a graphical representation of the trade-off between true positive rate (TPR) and false positive rate (FPR) at various threshold settings.

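Importance: AUC provides a single scalar value for comparing the performance of different models. It does not depend on a particular decision threshold, because it aggregates over all thresholds. It is less sensitive to class imbalance than accuracy, although it can still look optimistic when the positive class is very rare.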
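One way to read AUC is as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes binary AUC from that pairwise definition; it is an illustrative O(n^2) version with a hypothetical function name, not how scikit-learn's `roc_auc_score` is implemented.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy scores: one positive (0.35) is ranked below one negative (0.4).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly, which gives 0.75; no threshold is chosen anywhere in the calculation.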
