🎯 Objective:
Learn how to explore built-in datasets in Scikit-learn.

Understand and compare classification models.

Practice model evaluation with accuracy, confusion matrix, and classification report.

📦 Step 1: Choose a Dataset
Scikit-learn has several toy datasets. Choose one from the list below:


load_wine

load_digits

In [1]:
# your code to load the dataset
# Load the dataset
from sklearn.datasets import load_digits

# Load the digits dataset
digits = load_digits()

# Features and target
X, y = digits.data, digits.target

print("Dataset loaded successfully.")

Dataset loaded successfully.
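Both loaders in the list return the same kind of object, so switching datasets only changes one line. The sketch below loads each one and prints its shape and class count; the names `shapes` and `class_counts` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits, load_wine

shapes = {}
class_counts = {}

# Each loader returns a Bunch with .data (features) and .target (labels)
for loader in (load_wine, load_digits):
    bunch = loader()
    shapes[loader.__name__] = bunch.data.shape
    class_counts[loader.__name__] = len(np.unique(bunch.target))
    print(f"{loader.__name__}: shape={bunch.data.shape}, "
          f"classes={class_counts[loader.__name__]}")
```

The rest of the notebook continues with `load_digits`.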


🔍 Step 2: Explore the Dataset
Answer these:

1. How many samples and features are there?

2. What are the feature names?

3. What are the target classes?

4. What are the dimensions of the dataset?

5. Is scaling/normalisation needed for our dataset, and what is the difference between the two?


In [6]:
# your code to explore the dataset
# Explore the dataset
print(f"Number of samples: {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")
print(f"Feature names: {digits.feature_names if hasattr(digits, 'feature_names') else 'Not available'}")
print(f"Target classes: {digits.target_names}")
print(f"Dataset dimensions: {X.shape}")

# Check if scaling/normalization is needed
print(f"Feature range (min, max): ({X.min()}, {X.max()})")
# Scaling is recommended for models sensitive to feature magnitudes (e.g., SVM).

Number of samples: 1797
Number of features: 64
Feature names: ['pixel_0_0', 'pixel_0_1', 'pixel_0_2', 'pixel_0_3', 'pixel_0_4', 'pixel_0_5', 'pixel_0_6', 'pixel_0_7', 'pixel_1_0', 'pixel_1_1', 'pixel_1_2', 'pixel_1_3', 'pixel_1_4', 'pixel_1_5', 'pixel_1_6', 'pixel_1_7', 'pixel_2_0', 'pixel_2_1', 'pixel_2_2', 'pixel_2_3', 'pixel_2_4', 'pixel_2_5', 'pixel_2_6', 'pixel_2_7', 'pixel_3_0', 'pixel_3_1', 'pixel_3_2', 'pixel_3_3', 'pixel_3_4', 'pixel_3_5', 'pixel_3_6', 'pixel_3_7', 'pixel_4_0', 'pixel_4_1', 'pixel_4_2', 'pixel_4_3', 'pixel_4_4', 'pixel_4_5', 'pixel_4_6', 'pixel_4_7', 'pixel_5_0', 'pixel_5_1', 'pixel_5_2', 'pixel_5_3', 'pixel_5_4', 'pixel_5_5', 'pixel_5_6', 'pixel_5_7', 'pixel_6_0', 'pixel_6_1', 'pixel_6_2', 'pixel_6_3', 'pixel_6_4', 'pixel_6_5', 'pixel_6_6', 'pixel_6_7', 'pixel_7_0', 'pixel_7_1', 'pixel_7_2', 'pixel_7_3', 'pixel_7_4', 'pixel_7_5', 'pixel_7_6', 'pixel_7_7']
Target classes: [0 1 2 3 4 5 6 7 8 9]
Dataset dimensions: (1797, 64)
Feature range (min, max): (0.0, 16.0)
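Question 5 asks about the difference between scaling and normalisation. One common reading is standardization (zero mean and unit variance per feature) versus min-max normalization (each feature rescaled to the range 0 to 1). The sketch below applies both to the digits data under that assumption; `X_demo`, `X_std`, `X_minmax`, and `varying` are illustrative names, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Separate variable names so the notebook's X and y are left untouched
X_demo, _ = load_digits(return_X_y=True)

# Standardization: per feature, subtract the mean and divide by the standard deviation
X_std = StandardScaler().fit_transform(X_demo)

# Min-max normalization: per feature, rescale linearly so values fall in [0, 1]
X_minmax = MinMaxScaler().fit_transform(X_demo)

# Columns that hold the same value in every image cannot be rescaled,
# so they are excluded from the mean/std check below
varying = X_demo.std(axis=0) > 0

print(f"Raw range:         ({X_demo.min()}, {X_demo.max()})")
print(f"Min-max range:     ({X_minmax.min()}, {X_minmax.max()})")
print(f"Standardized mean: {X_std[:, varying].mean():.3f}")
print(f"Standardized std:  {X_std[:, varying].std(axis=0).mean():.3f}")
```

Because every pixel already shares the same 0 to 16 scale, scaling probably matters less here than for a dataset whose features use different units. Distance- and gradient-based models such as SVM and logistic regression usually still benefit, while tree-based models are largely insensitive to it.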

Step 3: Based on the insights you gained from the previous step, perform any necessary preprocessing

In [3]:
# Your preprocessing steps
# Preprocessing: Scale the features
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Features scaled successfully.")

Features scaled successfully.
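The cell above calls `fit_transform` on all 1797 samples, so the scaler's mean and variance are computed partly from rows that later end up in the test set. The effect is often small, but one common way to avoid it is to split first and let a `Pipeline` fit the scaler on the training rows only. A minimal sketch, assuming an 80/20 split with `random_state=42`; the variable names are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = load_digits(return_X_y=True)

# Split the unscaled data first so the test rows stay unseen
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on X_tr only, then reuses those
# statistics to transform X_te at prediction time
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

pipe_accuracy = pipe.score(X_te, y_te)
print(f"Pipeline test accuracy: {pipe_accuracy:.3f}")
```

The same pattern works for any of the classifiers used later; only the final step of the pipeline changes.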


Step 4: Build and Train a Model
Use a classification model of your choice, and try more than one.

Steps:

Split the data using train_test_split()

Train your model

Predict on the test set

In [4]:
# explore and experiment with different models!
# provide a brief description of each model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

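# Initialize models
# Logistic regression: linear model that predicts class probabilities
log_reg = LogisticRegression(max_iter=1000)
# Decision tree: learns a hierarchy of threshold rules on individual features
decision_tree = DecisionTreeClassifier()
# Random forest: combines many decision trees trained on bootstrap samples
random_forest = RandomForestClassifier()
# Support vector classifier: maximum-margin classifier (RBF kernel by default)
svc = SVC()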

# Train models
log_reg.fit(X_train, y_train)
decision_tree.fit(X_train, y_train)
random_forest.fit(X_train, y_train)
svc.fit(X_train, y_train)

print("Models trained successfully.")


Models trained successfully.
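A single train/test split gives one accuracy figure per model, and that figure can shift with the split. As a complementary check, the sketch below compares the same four model types with 5-fold cross-validation. It is an illustration rather than part of the original notebook: the names `candidates` and `cv_means` and the `random_state=0` settings are assumptions added for reproducibility.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X_cv, y_cv = load_digits(return_X_y=True)

# random_state=0 is an assumption made here so repeated runs agree
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
}

cv_means = {}
for name, model in candidates.items():
    # Scaling lives inside the pipeline, so it is refit on each training fold
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X_cv, y_cv, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

For classifiers, an integer `cv` uses stratified folds, so each fold keeps roughly the same class proportions as the full dataset.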


📊 Step 5: Evaluate the Model
Use the following metrics to evaluate your classifier:

accuracy_score

confusion_matrix

classification_report
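The three metrics are closely related: accuracy and the per-class precision and recall shown in the classification report can all be read off the confusion matrix. The toy labels below are made up purely for illustration (they are not from the digits data) and show the arithmetic on a 3-class example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for a 3-class problem, chosen only to illustrate the arithmetic
y_true_toy = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 0, 2, 2, 1, 2])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy_from_cm = np.trace(cm) / cm.sum()

# Recall per class: diagonal divided by row sums (all true members of the class)
recall_from_cm = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal divided by column sums (all predictions of the class)
precision_from_cm = np.diag(cm) / cm.sum(axis=0)

print(f"Accuracy:  {accuracy_from_cm:.3f}")
print(f"Recall:    {np.round(recall_from_cm, 3)}")
print(f"Precision: {np.round(precision_from_cm, 3)}")
```

The same calculation applies to the 10x10 matrices produced for the digits models: the diagonal counts the correctly classified test images for each digit.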



In [5]:
# your code to evaluate the models
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Evaluate Logistic Regression
log_reg_preds = log_reg.predict(X_test)
print("Logistic Regression:")
print(f"Accuracy: {accuracy_score(y_test, log_reg_preds)}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, log_reg_preds))
print("Classification Report:")
print(classification_report(y_test, log_reg_preds))

# Evaluate Decision Tree
decision_tree_preds = decision_tree.predict(X_test)
print("\nDecision Tree:")
print(f"Accuracy: {accuracy_score(y_test, decision_tree_preds)}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, decision_tree_preds))
print("Classification Report:")
print(classification_report(y_test, decision_tree_preds))

# Evaluate Random Forest
random_forest_preds = random_forest.predict(X_test)
print("\nRandom Forest:")
print(f"Accuracy: {accuracy_score(y_test, random_forest_preds)}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, random_forest_preds))
print("Classification Report:")
print(classification_report(y_test, random_forest_preds))

# Evaluate SVC
svc_preds = svc.predict(X_test)
print("\nSupport Vector Classifier:")
print(f"Accuracy: {accuracy_score(y_test, svc_preds)}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, svc_preds))
print("Classification Report:")
print(classification_report(y_test, svc_preds))

Logistic Regression:
Accuracy: 0.9722222222222222
Confusion Matrix:
[[33  0  0  0  0  0  0  0  0  0]
 [ 0 28  0  0  0  0  0  0  0  0]
 [ 0  0 33  0  0  0  0  0  0  0]
 [ 0  0  0 33  0  1  0  0  0  0]
 [ 0  1  0  0 45  0  0  0  0  0]
 [ 0  0  0  0  0 44  1  0  0  2]
 [ 0  0  0  0  0  1 34  0  0  0]
 [ 0  0  0  0  0  0  0 33  0  1]
 [ 0  0  0  0  0  1  0  0 29  0]
 [ 0  0  0  1  0  0  0  0  1 38]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        33
           1       0.97      1.00      0.98        28
           2       1.00      1.00      1.00        33
           3       0.97      0.97      0.97        34
           4       1.00      0.98      0.99        46
           5       0.94      0.94      0.94        47
           6       0.97      0.97      0.97        35
           7       1.00      0.97      0.99        34
           8       0.97      0.97      0.97        30
           9       0.93      0.95    