# Classifier Comparison on Digits Dataset

In this notebook, we compare three basic classifiers (Decision Tree, K-Nearest Neighbors, and Logistic Regression) on the Digits dataset from `scikit-learn`. Comparing each model's training and test accuracy shows how well it generalizes to unseen data and which one performs best.

## Import Libraries

In [1]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
# Hide library warnings (e.g., solver convergence warnings) to keep the printed output short
warnings.filterwarnings('ignore')

## Load Dataset

In [3]:
digits = load_digits()
X = digits.data
y = digits.target

print("Dataset shape:", X.shape)

Dataset shape: (1797, 64)
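Each of the 1,797 rows is one handwritten digit stored as a flattened 8x8 grayscale image, which is where the 64 features come from; scikit-learn documents the pixel values as integers from 0 to 16. A small sketch, assuming only `load_digits` and NumPy, that reshapes a row back into its image and checks the pixel range and class counts:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

# Each row of X is an 8x8 image flattened row by row into 64 values.
first_image = X[0].reshape(8, 8)
assert np.array_equal(first_image, digits.images[0])

# Pixel intensities are integers from 0 to 16 (stored as floats).
print("Pixel range:", X.min(), "to", X.max())

# Ten classes (digits 0-9) with roughly equal counts.
print("Classes:", np.unique(y))
print("Samples per class:", np.bincount(y))
```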


## Split the Data (80% Train / 20% Test)

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
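With `test_size=0.20`, scikit-learn rounds the test size up, so the 1,797 samples split into 1,437 training and 360 test images. The sketch below checks those sizes and also shows an optional variant that this notebook does not use: passing `stratify=y` keeps the class proportions of the test split matched to the full dataset.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Same split as the notebook: 20% of 1797 samples, rounded up, goes to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
print("Train:", X_train.shape, "Test:", X_test.shape)

# Optional variant (not used in this notebook): stratify on y to preserve class proportions.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)
print("Test class counts (random):    ", np.bincount(y_test, minlength=10))
print("Test class counts (stratified):", np.bincount(ys_test, minlength=10))
```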

## Scale the Data

In [7]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
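`fit_transform` learns each pixel's mean and standard deviation from the training split only, and `transform` reuses those statistics on the test split, so no test information leaks into preprocessing. The sketch below verifies this on the notebook's split and then expresses the same steps with `make_pipeline`, an alternative (not used in this notebook) that applies fitting and transforming in the right order automatically. One detail: some border pixels are 0 in every image, and `StandardScaler` leaves such constant columns at 0 instead of dividing by a zero standard deviation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# Training columns are centred at 0 with std 1, except pixels that are constant
# in the training split: their scale_ is set to 1, so they stay all zeros.
train_std = X_train_scaled.std(axis=0)
print("Max |train column mean|:", np.abs(X_train_scaled.mean(axis=0)).max())
print("Constant pixels in train:", int(np.sum(train_std == 0)))

# Pipeline form: the scaler inside is fitted on the training data only.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipe.fit(X_train, y_train)
print("Pipeline KNN test accuracy:", pipe.score(X_test, y_test))
```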

## Create Classifiers

In [9]:
classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42)
}
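Apart from `max_iter` and `random_state`, these models run with scikit-learn's defaults, and those defaults shape the results: the tree has no depth limit (`max_depth=None`), KNN votes among `n_neighbors=5` neighbors, and logistic regression uses regularization strength `C=1.0`, with `max_iter` raised from its default of 100 to 1000 to give the solver room to converge. `get_params()` shows them directly; which parameters to print is an illustrative choice:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
}

# Illustrative selection of the hyperparameters most relevant to this comparison.
key_params = {
    "Decision Tree": ["max_depth", "min_samples_leaf"],
    "K-Nearest Neighbors": ["n_neighbors", "weights"],
    "Logistic Regression": ["C", "max_iter"],
}

for name, model in classifiers.items():
    params = model.get_params()
    shown = {p: params[p] for p in key_params[name]}
    print(f"{name}: {shown}")
```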

## Train and Evaluate Models

In [11]:
results = {}

for name, model in classifiers.items():
    
    model.fit(X_train_scaled, y_train)
    
    train_pred = model.predict(X_train_scaled)
    test_pred = model.predict(X_test_scaled)
    
    train_acc = accuracy_score(y_train, train_pred)
    test_acc = accuracy_score(y_test, test_pred)
    
    results[name] = {"Train Accuracy": train_acc, "Test Accuracy": test_acc}
    
    print(f"{name} -> Train Accuracy: {train_acc:.4f} | Test Accuracy: {test_acc:.4f}")

Decision Tree -> Train Accuracy: 1.0000 | Test Accuracy: 0.8417
K-Nearest Neighbors -> Train Accuracy: 0.9868 | Test Accuracy: 0.9750
Logistic Regression -> Train Accuracy: 0.9986 | Test Accuracy: 0.9722
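Accuracy is the fraction of correctly predicted images, so with 360 test images each mistake costs about 0.28 percentage points. Read that way, the outputs above correspond to 57 test errors for the Decision Tree, 9 for KNN, and 10 for Logistic Regression. The sketch below repeats the notebook's steps and counts misclassified test images directly (exact counts could differ slightly across scikit-learn versions):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
}

errors = {}
for name, model in classifiers.items():
    model.fit(X_train_scaled, y_train)
    test_pred = model.predict(X_test_scaled)
    # Count the test images whose predicted digit differs from the true label.
    errors[name] = int(np.sum(test_pred != y_test))
    acc = 1 - errors[name] / len(y_test)
    print(f"{name}: {errors[name]} of {len(y_test)} misclassified (accuracy {acc:.4f})")
```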


## Summary of Results

In [13]:
import pandas as pd

results_df = pd.DataFrame(results).T
print(results_df)

                     Train Accuracy  Test Accuracy
Decision Tree              1.000000       0.841667
K-Nearest Neighbors        0.986778       0.975000
Logistic Regression        0.998608       0.972222
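A useful extra column for this table is the generalization gap, training accuracy minus test accuracy, which makes overfitting visible at a glance. A sketch using the accuracies reported above, copied in by hand so the block runs on its own:

```python
import pandas as pd

# Accuracies copied from the summary table above (6 decimals, as printed).
results = {
    "Decision Tree": {"Train Accuracy": 1.000000, "Test Accuracy": 0.841667},
    "K-Nearest Neighbors": {"Train Accuracy": 0.986778, "Test Accuracy": 0.975000},
    "Logistic Regression": {"Train Accuracy": 0.998608, "Test Accuracy": 0.972222},
}

results_df = pd.DataFrame(results).T
# Generalization gap: how much accuracy drops from training data to unseen test data.
results_df["Gap"] = results_df["Train Accuracy"] - results_df["Test Accuracy"]
print(results_df.sort_values("Test Accuracy", ascending=False).round(4))
```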


## Conclusions

- **Decision Tree:** Reached 100% training accuracy but only 84.2% test accuracy. A gap of about 16 percentage points indicates overfitting: with no depth limit, the tree keeps splitting until every leaf is pure, effectively memorizing the training data.
- **K-Nearest Neighbors (KNN):** Scored 98.7% on the training set and 97.5% on the test set, the smallest gap of the three, which suggests good generalization. Because KNN classifies by distances between samples, it is sensitive to feature scales, so standardizing the features is a standard preprocessing step for it.
- **Logistic Regression:** Scored 99.9% on the training set and 97.2% on the test set, nearly matching KNN. Despite being a simple linear model, it separates the ten digit classes well using the 64 pixel features.
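KNN uses Euclidean distance by default, so features with larger numeric ranges would dominate the distance unless rescaled. On Digits every pixel already shares the 0 to 16 range, so how much scaling helps here is worth measuring rather than assuming. A sketch, reusing the notebook's split, that fits the same default KNN on raw and on standardized pixels:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same default KNN (k=5, Euclidean distance), trained on raw vs. standardized pixels.
raw_acc = KNeighborsClassifier().fit(X_train, y_train).score(X_test, y_test)
scaled_acc = KNeighborsClassifier().fit(X_train_scaled, y_train).score(X_test_scaled, y_test)

print(f"KNN test accuracy, raw pixels:    {raw_acc:.4f}")
print(f"KNN test accuracy, scaled pixels: {scaled_acc:.4f}")
```

With a single 360-image test split, a difference of one or two images between the two settings should not be over-interpreted.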

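**Overall:** KNN and Logistic Regression generalized much better than the Decision Tree (about 97% vs. 84% test accuracy). KNN and Logistic Regression differ by a single test image out of 360, too small a margin to rank them from one train/test split. An unconstrained Decision Tree needs tuning (e.g., limiting `max_depth`) to reduce overfitting.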
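To illustrate what limiting depth does, a sketch that refits the tree with several `max_depth` values on the notebook's split and reports train and test accuracy. The depth grid is an arbitrary choice; in practice the depth would be selected by cross-validation on the training data (for example with `GridSearchCV`) rather than by looking at test accuracy.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Scaling does not change a tree's splits; it is kept only to mirror the notebook.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Illustrative depth grid; None means "grow until every leaf is pure" (the notebook's setting).
depth_scores = {}
for depth in [2, 4, 6, 8, 10, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train_scaled, y_train)
    train_acc = tree.score(X_train_scaled, y_train)
    test_acc = tree.score(X_test_scaled, y_test)
    depth_scores[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train {train_acc:.4f} | test {test_acc:.4f}")
```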

---