# **Title of Project**

Handwritten Digit Classification

-------------

## **Objective**

To classify images of handwritten digits (0-9) with a random forest classifier.

## **Data Source**

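The data is scikit-learn's built-in digits dataset, loaded with the load_digits function from sklearn.datasets. It contains 1797 grayscale images of handwritten digits, each 8x8 pixels.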

## **Import Library**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report


## **Import Data**

In [None]:
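# load_digits returns a dict-like Bunch with .images, .data and .target;
# despite the name, df is not a pandas DataFrame
df = load_digits()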


## **Describe Data**

In [None]:
# The dataset contains 1797 samples, each an 8x8 grayscale image of a digit
print("Shape of images:", df.images.shape)
print("Shape of data:", df.data.shape)
print("Target labels:", np.unique(df.target))

Output:
Shape of images: (1797, 8, 8)
Shape of data: (1797, 64)
Target labels: [0 1 2 3 4 5 6 7 8 9]


## **Data Visualization**

In [None]:
# Show the first four images with their labels (no train/test split exists yet)
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, df.images, df.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Label: %i' % label)
plt.show()


## **Data Preprocessing**

In [None]:
# Flatten each 8x8 image into a 64-element feature vector
n_samples = len(df.images)
data = df.images.reshape((n_samples, -1))

# Pixel intensities range from 0 to 16, so dividing by 16 scales them to [0, 1]
data = data / 16
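
The reshape and scaling steps above can be checked with a short, self-contained sketch. The names `digits`, `flat`, and `scaled` are introduced here for illustration and are not part of the notebook. It assumes only what the notebook already shows: 8x8 images with integer intensities from 0 to 16.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Flatten each 8x8 image into one row of 64 pixel values.
flat = digits.images.reshape((len(digits.images), -1))

# The Bunch already exposes the same flattened values as .data,
# so the manual reshape and .data should agree element by element.
same_as_data = np.array_equal(flat, digits.data)

# Dividing by the maximum intensity (16) maps every pixel into [0, 1].
scaled = flat / 16.0

print("reshape matches .data:", same_as_data)
print("raw range:", flat.min(), flat.max())
print("scaled range:", scaled.min(), scaled.max())
```

Tree-based models such as random forests split on thresholds, so this kind of uniform rescaling should not change their predictions in any meaningful way. It is harmless here and keeps the features ready for scale-sensitive models.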


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = data
y = df.target


## **Train Test Split**

In [None]:
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Training set shape:", x_train.shape, y_train.shape)
print("Testing set shape:", x_test.shape, y_test.shape)


Output:
Training set shape: (1257, 64) (1257,)
Testing set shape: (540, 64) (540,)
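
The split sizes follow from simple arithmetic. With `test_size=0.3` and no `train_size`, scikit-learn rounds the test count up and gives the remainder to training. The sketch below reproduces that calculation and also counts how many examples of each digit land in each split. The variable names (`expected_test`, `train_counts`, and so on) are illustrative and not from the notebook.

```python
import math

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data / 16.0, digits.target

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 0.3 * 1797 = 539.1, rounded up to 540 test samples; the rest go to training.
expected_test = math.ceil(0.3 * len(X))
expected_train = len(X) - expected_test

# Number of examples of each digit (index = digit) in each split.
train_counts = np.bincount(y_train, minlength=10)
test_counts = np.bincount(y_test, minlength=10)

print("expected sizes:", expected_train, expected_test)
print("train per digit:", train_counts)
print("test per digit:", test_counts)
```

If the per-digit counts turn out uneven, passing `stratify=y` to `train_test_split` is one option for keeping class proportions equal across the two sets. The notebook itself does not use it.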

## **Modeling**

In [None]:
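# Fix random_state so the forest, and therefore the reported metrics, are reproducible
rf = RandomForestClassifier(random_state=42)
rf.fit(x_train, y_train)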


## **Model Evaluation**

In [None]:
y_pred = rf.predict(x_test)
print(classification_report(y_test, y_pred))
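
The notebook imports `confusion_matrix` but never calls it. As a hedged sketch of how it could complement the report, the block below rebuilds the pipeline in one place and relates the matrix to overall accuracy. The `random_state=42` on the classifier and the use of `accuracy_score` are assumptions added here, not part of the original cells.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data / 16.0, digits.target
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Seeding the forest is an assumption made here for repeatable numbers.
rf = RandomForestClassifier(random_state=42)
rf.fit(x_train, y_train)
y_pred = rf.predict(x_test)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, y_pred)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = accuracy_score(y_test, y_pred)
accuracy_from_cm = np.trace(cm) / cm.sum()

print(cm)
print("accuracy:", accuracy)
```

Off-diagonal cells show which digits get confused with which, a detail the per-class averages in the classification report do not expose.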


## **Prediction**

In [None]:
# Predicting on the test set
predictions = rf.predict(x_test)
print(predictions)


## **Explanation**

A RandomForestClassifier was trained on 70% of the digits dataset and evaluated on the remaining 30%, which the model never saw during training. The classification report gives precision, recall, and F1-score for each digit, so a weak class is visible on its own rather than hidden in a single average. The predictions array holds the model's label for each test image; comparing it with y_test is what produces those metrics.
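
To make the report's columns concrete, the following sketch derives per-digit precision, recall, and F1-score directly from the confusion matrix and compares them with scikit-learn's `precision_recall_fscore_support`. It is a self-contained illustration: the seeded classifier and all variable names other than those in the notebook are assumptions. It also assumes every digit is predicted at least once, so no division by zero occurs.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data / 16.0, digits.target
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rf = RandomForestClassifier(random_state=42)  # seed is an assumption
rf.fit(x_train, y_train)
y_pred = rf.predict(x_test)

cm = confusion_matrix(y_test, y_pred)
true_positives = np.diag(cm)

# Precision: of all images predicted as digit k, the share that really are k.
precision = true_positives / cm.sum(axis=0)
# Recall: of all images that really are digit k, the share predicted as k.
recall = true_positives / cm.sum(axis=1)
# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Reference values computed by scikit-learn, one entry per digit.
ref_p, ref_r, ref_f, support = precision_recall_fscore_support(y_test, y_pred)

for digit in range(10):
    print(digit, round(precision[digit], 3), round(recall[digit], 3), round(f1[digit], 3))
```

The `support` array is the number of test images per digit, which matches the last column of the classification report.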