**Handwritten Digit Prediction - Classification Analysis**

**Objective**

This project has four main objectives: explore scikit-learn's built-in handwritten digits dataset (1,797 grayscale images of 8x8 pixels, a small MNIST-style dataset); preprocess the images by flattening and scaling them for model training; train a random forest classifier to predict the digit shown in each image; and evaluate it on a held-out test set with a confusion matrix and a classification report. The code is also meant to be easy to read and modify, to be shared on GitHub, and to be adaptable to other classification tasks and to experiments with other models and hyperparameters.

In [None]:
# import libraries
import pandas as pd

In [None]:
import numpy as np


In [None]:
import matplotlib.pyplot as plt


In [None]:
# import data
from sklearn.datasets import load_digits

In [None]:
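# load_digits() returns a Bunch (a dict-like object, not a DataFrame)
# with .images, .data and .target attributes
df = load_digits()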

In [None]:
# show the first four digit images with their labels
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, df.images, df.target):
  ax.set_axis_off()
  ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
  ax.set_title("Training: %i" % label)

In [None]:
df.images.shape

(1797, 8, 8)

In [None]:
df.images[0]

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

In [None]:
df.images[0].shape

(8, 8)

In [None]:
len(df.images)

1797

In [None]:
# flatten each 8x8 image into a 64-element feature vector
n_samples = len(df.images)
data = df.images.reshape(n_samples, -1)

In [None]:
data[0]

array([ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
       15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
       12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
        0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
       10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.])

In [None]:
data[0].shape

(64,)

In [None]:
data.shape

(1797, 64)
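
The flattening step can be illustrated on a much smaller batch. This is a minimal sketch using two made-up 2x3 "images" (the array `images` here is hypothetical, not the digits data); it shows that `reshape(n_samples, -1)` keeps one row per image and lets NumPy infer the number of columns.

```python
import numpy as np

# Two made-up 2x3 "images" stacked into a batch of shape (2, 2, 3).
images = np.array([
    [[0, 5, 13],
     [9, 1, 0]],
    [[3, 15, 2],
     [0, 11, 8]],
])

n_samples = len(images)

# -1 asks NumPy to infer the remaining dimension (2 * 3 = 6 here).
flat = images.reshape(n_samples, -1)

# Reshaping back with the original shape recovers the images unchanged.
restored = flat.reshape(images.shape)

print(flat.shape)          # (2, 6)
print(flat[0].tolist())    # [0, 5, 13, 9, 1, 0]
```

Rows are flattened in row-major order, so pixel (r, c) of an image with width w ends up at index r * w + c of its feature vector.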

In [None]:
data.min()

0.0

In [None]:
data.max()

16.0

In [None]:
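# scale pixel values from the range [0, 16] toward [0, 1]
# run this cell once only: re-running divides again, which is why the
# maximum printed below is 0.0625 (16/256) rather than 1.0
data = data/16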

In [None]:
data.min()

0.0

In [None]:
data.max()

0.0625

In [None]:
data[0]

array([0.        , 0.        , 0.01953125, 0.05078125, 0.03515625,
       0.00390625, 0.        , 0.        , 0.        , 0.        ,
       0.05078125, 0.05859375, 0.0390625 , 0.05859375, 0.01953125,
       0.        , 0.        , 0.01171875, 0.05859375, 0.0078125 ,
       0.        , 0.04296875, 0.03125   , 0.        , 0.        ,
       0.015625  , 0.046875  , 0.        , 0.        , 0.03125   ,
       0.03125   , 0.        , 0.        , 0.01953125, 0.03125   ,
       0.        , 0.        , 0.03515625, 0.03125   , 0.        ,
       0.        , 0.015625  , 0.04296875, 0.        , 0.00390625,
       0.046875  , 0.02734375, 0.        , 0.        , 0.0078125 ,
       0.0546875 , 0.01953125, 0.0390625 , 0.046875  , 0.        ,
       0.        , 0.        , 0.        , 0.0234375 , 0.05078125,
       0.0390625 , 0.        , 0.        , 0.        ])
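
One way to keep the scaling from compounding when a cell is re-run is to compute the scaled array from an untouched copy of the raw values instead of overwriting the same variable. A minimal sketch on a small made-up array (`raw` and `scale_pixels` are illustrative names, not part of the notebook):

```python
import numpy as np

# Hypothetical stand-in for the flattened digit pixels (values in 0..16).
raw = np.array([[0.0, 5.0, 13.0, 16.0],
                [3.0, 15.0, 2.0, 0.0]])

def scale_pixels(pixels, max_value=16.0):
    """Return a new array scaled to [0, 1]; the input is left unchanged."""
    return pixels / max_value

scaled = scale_pixels(raw)
scaled_again = scale_pixels(raw)  # re-running gives the same result

print(scaled.max())                          # 1.0
print(np.array_equal(scaled, scaled_again))  # True
```

For a random forest the scale of the features does not change which splits are chosen, so the double division does not affect the results reported later; it would matter for scale-sensitive models.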

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
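# hold out 30% of the samples for testing
# no random_state is set, so the split (and the results below) vary between runs
x_train, x_test, y_train, y_test = train_test_split(data, df.target, test_size=0.3)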

In [None]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((1257, 64), (540, 64), (1257,), (540,))
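
If a repeatable split is wanted, `train_test_split` accepts `random_state` and `stratify` arguments. The sketch below is an assumption about how the split could be seeded, not what the notebook ran: the seed value 42 and the variable names are arbitrary choices, and `digits.data` is used as the already flattened form of the images.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data / 16.0   # (1797, 64) flattened pixels scaled to [0, 1]
y = digits.target

# random_state fixes the shuffle; stratify=y keeps the class proportions
# roughly equal in the training and test sets. The seed 42 is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)   # (1257, 64) (540, 64)
```

The subset sizes match the unseeded split above; only the choice of which samples land in each subset becomes repeatable.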

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf = RandomForestClassifier()

In [None]:
rf.fit(x_train, y_train)

In [None]:
y_pred = rf.predict(x_test)

In [None]:
y_pred

array([7, 1, 1, 5, 8, 3, 3, 2, 2, 1, 7, 1, 6, 4, 6, 0, 4, 3, 7, 3, 0, 4,
       3, 0, 8, 7, 8, 2, 7, 2, 6, 4, 9, 7, 9, 7, 8, 8, 3, 2, 8, 1, 2, 6,
       4, 9, 6, 0, 9, 8, 2, 7, 6, 6, 1, 1, 1, 3, 1, 7, 0, 0, 2, 4, 2, 5,
       5, 9, 6, 8, 6, 9, 5, 8, 7, 7, 4, 5, 5, 1, 2, 8, 6, 3, 4, 7, 4, 0,
       2, 6, 0, 4, 8, 1, 4, 1, 6, 7, 4, 0, 1, 6, 5, 6, 3, 8, 4, 5, 1, 4,
       6, 6, 1, 4, 2, 3, 9, 0, 6, 3, 7, 5, 3, 3, 7, 8, 7, 4, 0, 4, 4, 4,
       1, 6, 7, 6, 3, 9, 1, 8, 9, 0, 0, 7, 5, 5, 4, 9, 7, 3, 8, 0, 2, 8,
       5, 9, 5, 6, 8, 5, 6, 1, 4, 7, 5, 5, 4, 8, 0, 4, 6, 9, 1, 2, 8, 8,
       6, 8, 5, 5, 2, 1, 5, 0, 5, 3, 8, 2, 6, 2, 9, 8, 7, 0, 0, 8, 3, 7,
       7, 5, 9, 9, 1, 5, 6, 9, 3, 3, 4, 2, 3, 5, 7, 8, 8, 7, 4, 2, 6, 5,
       9, 8, 1, 4, 3, 3, 9, 3, 1, 3, 0, 1, 6, 7, 3, 6, 0, 0, 7, 2, 9, 3,
       9, 1, 2, 0, 4, 2, 4, 7, 5, 7, 7, 8, 9, 3, 2, 1, 1, 0, 6, 5, 6, 3,
       8, 0, 3, 6, 4, 6, 2, 5, 9, 8, 0, 0, 7, 4, 1, 5, 4, 2, 9, 6, 3, 5,
       8, 6, 7, 2, 7, 9, 6, 7, 8, 7, 0, 3, 4, 4, 7,

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
confusion_matrix(y_test, y_pred)

array([[42,  0,  0,  0,  1,  0,  0,  0,  0,  0],
       [ 0, 54,  0,  0,  0,  1,  0,  0,  0,  0],
       [ 1,  0, 59,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 63,  0,  1,  0,  0,  2,  1],
       [ 0,  1,  0,  0, 49,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  1, 50,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  1,  0, 56,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 54,  0,  0],
       [ 0,  0,  0,  0,  1,  0,  0,  0, 45,  0],
       [ 0,  0,  0,  1,  0,  0,  0,  0,  0, 55]])

In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.98      0.98        43
           1       0.98      0.98      0.98        55
           2       1.00      0.98      0.99        60
           3       0.98      0.94      0.96        67
           4       0.92      0.96      0.94        51
           5       0.96      0.98      0.97        51
           6       1.00      0.98      0.99        57
           7       0.98      1.00      0.99        54
           8       0.96      0.98      0.97        46
           9       0.98      0.98      0.98        56

    accuracy                           0.98       540
   macro avg       0.98      0.98      0.98       540
weighted avg       0.98      0.98      0.98       540
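
The precision, recall and accuracy figures in the report can be recomputed from the confusion matrix printed earlier. A minimal sketch, copying that matrix by hand (rows are true classes, columns are predicted classes, as in scikit-learn's convention):

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([
    [42,  0,  0,  0,  1,  0,  0,  0,  0,  0],
    [ 0, 54,  0,  0,  0,  1,  0,  0,  0,  0],
    [ 1,  0, 59,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0, 63,  0,  1,  0,  0,  2,  1],
    [ 0,  1,  0,  0, 49,  0,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  1, 50,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  1,  0, 56,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 54,  0,  0],
    [ 0,  0,  0,  0,  1,  0,  0,  0, 45,  0],
    [ 0,  0,  0,  1,  0,  0,  0,  0,  0, 55],
])

correct = np.diag(cm)                  # correctly classified samples per class

accuracy = correct.sum() / cm.sum()    # 527 correct out of 540
recall = correct / cm.sum(axis=1)      # divide by row sums (true counts)
precision = correct / cm.sum(axis=0)   # divide by column sums (predicted counts)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy: {accuracy:.4f}")     # accuracy: 0.9759
print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
print("f1:       ", np.round(f1, 2))
```

For example, digit 4 has the lowest precision (49 of 53 predictions, about 0.92) because four other digits were each mistaken for a 4 once, while digit 3 has the lowest recall (63 of 67, about 0.94).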



**Explanation**

This repository showcases a machine learning project for handwritten digit prediction using scikit-learn's built-in digits dataset, a small MNIST-style collection of 1,797 8x8 images. The notebook covers the key stages of a classification analysis: loading and inspecting the data, flattening each image into 64 features, scaling the pixel values, splitting the data 70/30 into training and test sets, training a RandomForestClassifier with default settings, and evaluating its predictions. On the 540 held-out test images the model reaches about 98% accuracy, with per-class F1 scores between 0.94 and 0.99. Because the split is not seeded, these numbers vary slightly between runs. Feel free to adapt the code for other classification tasks, experiment with different datasets and models, and further your understanding of machine learning concepts.

