# Handwritten Digit Recognition with Neural Networks

This notebook demonstrates how to build and train a neural network for digit recognition using TensorFlow, Keras, Pandas and NumPy.

## Setup Instructions

To set up the environment, please follow these steps:

0. Open a terminal and cd to the folder containing the Jupyter notebook.

1. Create a virtual environment using Python's built-in venv module:

    *python -m venv venv*

2. Activate the virtual environment:

    For Windows:

    *venv\Scripts\activate*

    For macOS/Linux:
    
    *source venv/bin/activate*

3. Install the dependencies:

    *pip install -r requirements.txt*

    The requirements.txt file lists all the dependencies the project needs.

If you encounter any issues with the setup, try updating pip and the installed packages:

*pip install --upgrade pip*

*pip install --upgrade -r requirements.txt*

If the save_canvas_image function does not work or you cannot draw on the canvas, try running the notebook in a browser or with a different kernel.
Alternatively, skip the canvas step and draw a handwritten digit in an image editor such as Paint.
Save the image in the same folder as your notebook and specify its path.



In [None]:
# Import necessary libraries
import numpy as np  # For numerical computations and handling arrays
import pandas as pd  # For data manipulation (optional, depending on use)
import matplotlib.pyplot as plt  # For plotting and visualizing data
from tensorflow.keras.datasets import mnist  # To load the MNIST dataset
from tensorflow.keras.utils import to_categorical  # For one-hot encoding of labels
from tensorflow.keras.models import Sequential  # To build a sequential model
from tensorflow.keras.layers import Dense  # To add dense layers to the model
from tensorflow.keras.optimizers import Adam  # For the optimizer used in training
from PIL import Image, ImageFilter, ImageOps  # To load and process images (useful for user input)
import cv2  # For image processing tasks
from ipycanvas import Canvas  # For drawing on a canvas
from IPython.display import display  # For displaying the canvas
from ipywidgets import IntSlider  # For slider widgets
print("Environment set up successfully!")



In [None]:
# Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Display the shapes of the loaded data
print(f'Training data shape: {x_train.shape}, Training labels shape: {y_train.shape}')
print(f'Testing data shape: {x_test.shape}, Testing labels shape: {y_test.shape}')

# Display a few samples of the data
plt.figure(figsize=(10, 5))  # Create a new figure of 10x5 inches
for i in range(10):
    plt.subplot(2, 5, i + 1)  # Select the (i+1)th cell of a 2x5 grid of subplots
    plt.imshow(x_train[i], cmap='gray')  # Display the image in grayscale
    plt.title(f'Label: {y_train[i]}') # Set the title of the subplot to the corresponding label
    plt.axis('off') # Turn off the axes for the subplot for clean display
plt.show()


In the context of the MNIST dataset, the training data shape refers to the dimensions of the **x_train array**, which contains the training images. The MNIST dataset consists of 28x28 pixel grayscale images of handwritten digits (0-9).

When you load the MNIST data using **mnist.load_data()**, the x_train array typically has the shape **(60000, 28, 28)**, meaning:

- 60000: the number of training images.
- 28: the height of each image in pixels.
- 28: the width of each image in pixels.

So x_train.shape outputs (60000, 28, 28), indicating that there are 60,000 training images, each of size 28x28 pixels.

Similarly, **y_train.shape** would typically be (60000,), indicating that there are 60,000 labels corresponding to the 60,000 training images. Each label is a single integer representing the digit (0-9) shown in the corresponding image.
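
The shapes described above can be unpacked directly in code. The sketch below uses zero-filled NumPy arrays as stand-ins for the real data so that it runs without downloading MNIST; the names `fake_x_train` and `fake_y_train` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Stand-in arrays with the same shapes as the MNIST training set
# (assumption: zeros instead of real pixel values and labels)
fake_x_train = np.zeros((60000, 28, 28), dtype=np.uint8)
fake_y_train = np.zeros((60000,), dtype=np.uint8)

# Unpack the three dimensions of the image array
num_images, height, width = fake_x_train.shape
pixels_per_image = height * width

print(f"{num_images} images of {height}x{width} pixels")
print(f"{fake_y_train.shape[0]} labels, one per image")
print(f"Pixels per image: {pixels_per_image}")
```

The value of `pixels_per_image` (784) is the input size the model expects once each image is flattened.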

**x set (Features/Input Data):**
This set contains the input data that the model will use to learn patterns. In the MNIST dataset, x_train contains the images of handwritten digits.

**y set (Labels/Output Data):**
This set contains the labels or the target values that correspond to the input data. In the MNIST dataset, y_train contains the digit labels (0-9) for each image in x_train.

During training, the model learns to map the **x set** to the **y set**: it makes predictions from the images, and the labels are used to measure how wrong those predictions are.


We have four arrays, **x_train, y_train, x_test, y_test**, so that the model can be trained on one set and then tested on new, unseen data (the test set).

In [None]:
# Define the model
model = Sequential()  # Initialize a sequential model, which means the layers are stacked in a linear way.

# Input layer with 784 neurons (28*28 pixels flattened)
# First hidden layer with 128 neurons and ReLU activation
model.add(Dense(128, activation='relu', input_shape=(784,)))

# Second hidden layer with 64 neurons and ReLU activation
model.add(Dense(64, activation='relu'))

# Output layer with 10 neurons (one for each digit) and softmax activation
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Display the model's architecture
model.summary()


A Sequential model is a linear stack of layers, where each layer has exactly one input tensor and one output tensor. This model is best suited for building simple neural networks where layers are added one after another.

The Sequential model is an object from the **tensorflow.keras.models module**, part of the **Keras API** integrated within TensorFlow.

**Dense** refers to a densely connected neural network layer, and is a type of layer in a neural network where **each neuron is connected to every neuron in the previous layer**.
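
Because every neuron connects to every neuron in the previous layer, the number of trainable parameters in a Dense layer is the number of inputs times the number of units, plus one bias per unit. The sketch below works this out for the layer sizes used in this notebook (784, 128, 64, 10); the helper `dense_param_count` is a hypothetical name introduced only for this illustration.

```python
# Layer sizes taken from the model in this notebook: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Pair each layer's input size with its number of units
per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)     # [100480, 8256, 650]
print(total_params)  # 109386
```

These figures should match the parameter counts that model.summary() reports for the three Dense layers.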

The **model.add** method adds layers to the model. In a Sequential model, layers form a linear stack, and each call to model.add appends a new layer to the end of it.

**Softmax** is specifically designed for multi-class classification problems where only one output class can be correct at a time.
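
To make this concrete, here is a minimal NumPy sketch of the softmax function. It is an illustration of the formula, not the Keras implementation, and the `logits` values are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up raw scores for the 10 digit classes (index = digit)
logits = np.array([1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 0.3, 1.5, 0.2, 0.1])
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted_digit = int(np.argmax(probs))
print(f"Sum of probabilities: {probs.sum():.4f}")
print(f"Predicted digit: {predicted_digit}")
```

Since the outputs always sum to 1, raising the probability of one class necessarily lowers the others, which is why softmax suits problems with exactly one correct class.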

**ReLU** outputs zero for all negative inputs, effectively "turning off" neurons that don’t contribute positively, leading to sparse representations that can improve efficiency.
Hidden layers are responsible for transforming the input data into more abstract and useful representations. ReLU enables the network to learn complex, non-linear patterns by introducing non-linearity, which is crucial for the model's overall performance.
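
A minimal NumPy sketch of ReLU shows the "turning off" behaviour described above. The input values are made up, and `relu` here is a standalone illustration rather than the Keras implementation.

```python
import numpy as np

def relu(x):
    """Keep positive values unchanged and replace negative values with zero."""
    return np.maximum(0.0, x)

# Made-up pre-activation values for five neurons
pre_activations = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
activations = relu(pre_activations)

# Fraction of neurons whose output is exactly zero
sparsity = float(np.mean(activations == 0))

print(activations.tolist())  # [0.0, 0.0, 0.0, 0.7, 3.0]
print(sparsity)              # 0.6
```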

**Compiling** the model sets up the rules for how the model will learn and be evaluated.

**optimizer='adam':** Specifies how the model updates its weights based on the error calculated by the loss function. Adam is efficient, adaptive, and widely used.

**loss='categorical_crossentropy':** Measures how well the model's predicted probabilities align with the true labels, crucial for multi-class classification tasks.
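
As a rough illustration of what this loss measures, the sketch below computes categorical cross-entropy for a single sample in NumPy. The function and the probability vectors are assumptions made for the example; Keras additionally averages the loss over the samples in a batch.

```python
import numpy as np

def categorical_crossentropy(y_true_one_hot, y_pred_probs, eps=1e-7):
    """Cross-entropy for one sample: -sum(true * log(predicted))."""
    clipped = np.clip(y_pred_probs, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true_one_hot * np.log(clipped)))

# One-hot label for the digit 3
y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)

# A confident, correct prediction: 0.91 on class 3, 0.01 elsewhere
confident = np.full(10, 0.01)
confident[3] = 0.91

# An undecided prediction: 0.1 on every class
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(f"Confident prediction loss: {loss_confident:.4f}")
print(f"Undecided prediction loss: {loss_unsure:.4f}")
```

The loss is small when the model puts most of its probability on the true class and grows as that probability shrinks.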

**metrics=['accuracy']:** Provides a way to monitor how well the model is performing during training and evaluation.

In [None]:
# Prepare the labels by converting them to one-hot encoded format
y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)

# Train the model
history = model.fit(
    x_train.reshape(-1, 784),  # Reshape training images to 784 (28x28 flattened)
    y_train_one_hot,           # Use one-hot encoded labels
    epochs=15,                 # Number of times the model will see the entire training set
    batch_size=64,             # Number of samples per gradient update
    validation_split=0.2,      # Use 20% of training data for validation to monitor performance
    verbose=1                  # Print progress during training
)


**to_categorical(y_train)** and **to_categorical(y_test)** convert each label into a binary vector whose length equals the number of classes (10). For example, the label "3" becomes **[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]**.
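
One-hot encoding can be sketched in plain NumPy as follows. The `one_hot` helper is a hypothetical stand-in written for this illustration; the notebook itself uses Keras's to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Pick row `label` of the identity matrix for each label."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([3, 0, 9])
encoded = one_hot(labels)

print(encoded.shape)                    # (3, 10)
print(encoded[0].astype(int).tolist())  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# argmax reverses the encoding and recovers the original labels
recovered = encoded.argmax(axis=1)
print(recovered.tolist())               # [3, 0, 9]
```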

**x_train.reshape(-1, 784)** Reshapes the training images into a 2D array where each image is a single row of 784 pixels. The -1 automatically adjusts the first dimension to match the number of training samples.
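
The effect of the reshape can be checked on a small stand-in array. The sketch below uses five synthetic "images" filled with consecutive integers (an assumption made so that individual pixels are easy to track).

```python
import numpy as np

# Five synthetic 28x28 "images" filled with consecutive integers
batch = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# -1 lets NumPy infer the first dimension (the number of images)
flat = batch.reshape(-1, 784)
print(flat.shape)  # (5, 784)

# Rows are laid out one after another, so the pixel at (row r, column c)
# ends up at position r * 28 + c in the flattened image
r, c = 2, 7
same_pixel = bool(flat[0, r * 28 + c] == batch[0, r, c])
print(same_pixel)  # True
```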

**y_train_one_hot** Uses the one-hot encoded labels for training, which match the softmax output format from the model.

**epochs=15** Specifies the number of times the model will iterate over the entire training dataset. More epochs give the model more opportunity to learn, but too many can lead to overfitting.

**batch_size=64** Determines how many training samples are used per gradient update. A smaller batch size gives more frequent updates but noisier training, while a larger batch size gives more stable gradient estimates but fewer updates per epoch.

**validation_split=0.2**: Allocates 20% of the training data for validation. This helps you monitor the model's performance on unseen data during training, providing a sense of whether it is overfitting.
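
The settings in the model.fit call determine how many gradient updates happen during training. The arithmetic below is a sketch using the values from that call (60,000 training images, validation_split=0.2, batch_size=64, epochs=15); the variable names are illustrative.

```python
import math

total_samples = 60000     # size of the MNIST training set
validation_split = 0.2
batch_size = 64
epochs = 15

# Keras holds out the last fraction of the data for validation
val_samples = round(total_samples * validation_split)
train_samples = total_samples - val_samples

# One gradient update per batch; the last batch may be smaller
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(train_samples, val_samples)  # 48000 12000
print(steps_per_epoch)             # 750
print(total_updates)               # 11250
```

The `steps_per_epoch` value corresponds to the step count shown in the progress bar when verbose=1.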

**verbose=1**: Controls the verbosity of the training output. A setting of 1 provides detailed logs of training progress, while 0 would suppress this output.

The **model.fit()** function returns a history object, which stores details about the training process, including the loss and accuracy over each epoch for both training and validation data. This can be useful for plotting learning curves and evaluating model performance over time.

In [None]:
# Plot training & validation accuracy values
plt.figure(figsize=(10, 5))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='best')
plt.grid(True)
plt.show()


**history.history['accuracy']** Extracts the training accuracy values for each epoch from the history object.

**history.history['val_accuracy']** Extracts the validation accuracy values for each epoch (computed on the 20% of the training data that was set aside by the validation split).
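
Besides plotting, the same lists can be inspected directly, for example to find the epoch with the best validation accuracy or to see how far training accuracy has pulled ahead of validation accuracy. The sketch below uses a made-up dictionary in the same layout as history.history; the numbers are placeholders, not results from this notebook.

```python
# Placeholder values for illustration only, not real training results
example_history = {
    "accuracy":     [0.90, 0.95, 0.97, 0.98],
    "val_accuracy": [0.93, 0.95, 0.96, 0.955],
}

train_acc = example_history["accuracy"]
val_acc = example_history["val_accuracy"]

# Epochs are 1-based, list indices are 0-based
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1

# A growing gap between training and validation accuracy hints at overfitting
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]

print(f"Best validation accuracy at epoch {best_epoch}")
print(f"Train minus validation accuracy per epoch: {gaps}")
```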



In [None]:
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(x_test.reshape(-1, 784), y_test_one_hot, verbose=1)

# Print the test accuracy
print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')


**x_test.reshape(-1, 784)** Reshapes the test images into flattened rows of 784 pixels, the input format the model expects.

**verbose=1** Prints the evaluation progress.
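
The accuracy metric is the fraction of samples whose most probable class matches the true label. A minimal NumPy sketch, using a made-up three-class example rather than real model output:

```python
import numpy as np

# Made-up predicted probabilities for 4 samples over 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the highest probability in each row
predicted = probs.argmax(axis=1)

# Accuracy is the share of predictions that match the true labels
accuracy = float(np.mean(predicted == true_labels))

print(predicted.tolist())  # [0, 1, 2, 0]
print(accuracy)            # 0.75
```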


In [None]:

# Create a canvas for drawing
canvas = Canvas(width=200, height=200, background_color='black')  # Larger canvas for easier drawing
display(canvas)

# Function to save the drawing as an image
def save_canvas_image(canvas, save_path):
    """
    Saves the current drawing on the canvas to an image file.

    Parameters:
    - canvas: ipycanvas.Canvas, the drawing canvas
    - save_path: str, the file path where the image will be saved
    """
    # Save the current drawing to a PNG file
    with open(save_path, 'wb') as f:
        f.write(canvas.to_filelike_object().getvalue())
    print(f'Image saved to {save_path}')

# Instructions for the user
print("Draw a digit on the canvas. Use the left mouse button to draw.")
print("When finished, run the cell below to save and predict the digit.")




If you can't draw on the canvas, try opening the Jupyter notebook in a browser.
Alternatively, skip the following cell and draw a digit in Paint, using a black background and a white digit.
Then use the preprocessing and prediction functions with your Paint image.
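
MNIST digits are white on a black background, which is why the drawing should use the same colours. If an image was drawn the other way round, it could be inverted before prediction. The sketch below is one possible approach, not part of the notebook's pipeline: `ensure_white_on_black` is a hypothetical helper, and the brightness threshold of 127 is an assumed heuristic.

```python
import numpy as np

def ensure_white_on_black(img_array):
    """Hypothetical helper: invert a grayscale image whose background looks light."""
    # Assumption: a mostly bright image means a light background
    if img_array.mean() > 127:
        return 255 - img_array
    return img_array

# Synthetic 28x28 image: white background with a dark vertical stroke
light_image = np.full((28, 28), 255, dtype=np.uint8)
light_image[10:18, 13:15] = 0

fixed = ensure_white_on_black(light_image)

print(fixed[0, 0])    # 0 (background is now black)
print(fixed[10, 13])  # 255 (stroke is now white)
```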

In [None]:
canvas_digit_path = 'path/to/save/digit.png'  # File path to save the drawn digit
save_canvas_image(canvas, canvas_digit_path)


In [None]:


def preprocess_image(image_path):
    """
    Loads and preprocesses an image to match the format used in training the model.
    
    Parameters:
    - image_path: str, path to the image file.

    Returns:
    - preprocessed_image: numpy array, formatted for prediction.
    """
    # Load, convert to grayscale, and resize
    img = Image.open(image_path).convert('L')  # 'L' mode is for grayscale
    img = img.resize((28, 28))  # Resize to 28x28 pixels


    # Convert to array without normalizing
    img_array = np.array(img).astype('float32')  # Keep values between 0 and 255
    print("Preprocessed image details:")
    print(f"Shape: {img_array.shape}, Min: {img_array.min()}, Max: {img_array.max()}")

    # Flatten to match input shape (1, 784)
    preprocessed_image = img_array.flatten().reshape(1, 784)

    # Display the processed image
    plt.imshow(img_array, cmap='gray')
    plt.title('Processed Test Image')
    plt.axis('off')
    plt.show()
    
    return preprocessed_image



In [None]:
def predict_user_images(model, preprocessed_image1, preprocessed_image2):
    """
    Predicts the digits from two preprocessed images using the trained model.

    Parameters:
    - model: trained Keras model.
    - preprocessed_image1: numpy array, first processed image ready for prediction (user image).
    - preprocessed_image2: numpy array, second processed image ready for prediction (additional image).

    Returns:
    - predicted_digit1: int, the digit predicted by the model for the first image.
    - predicted_digit2: int, the digit predicted by the model for the second image.
    """
    # Predict the first image
    prediction1 = model.predict(preprocessed_image1)
    predicted_digit1 = np.argmax(prediction1)
    
    # Print results for the first image
    print(f'Prediction Probabilities for the user digit: {prediction1}')
    print(f'Predicted Digit for the user digit: {predicted_digit1}')
    
    # Predict the second image
    prediction2 = model.predict(preprocessed_image2)
    predicted_digit2 = np.argmax(prediction2)
    
    # Print results for the second image
    print(f'Prediction Probabilities for the digit from dataset: {prediction2}')
    print(f'Predicted Digit for the dataset digit: {predicted_digit2}')
    
    return predicted_digit1, predicted_digit2


In [None]:
# Path to the user-provided handwritten digit image
image_path = 'path/to/user/image.png'
processed_user_image = preprocess_image(image_path)

random_image = x_train[np.random.randint(x_train.shape[0])]
random_image_path = 'path/to/random/digit.png'  # Save the random training set image
Image.fromarray(random_image).save(random_image_path)
processed_dataset_image = preprocess_image(random_image_path) 



In [None]:
predicted_digit_user = model.predict(processed_user_image).argmax()

predicted_digit_dataset = model.predict(processed_dataset_image).argmax()

print(f'Predicted Digit for User drawn digit: {predicted_digit_user}')
print(f'Predicted Digit for Training Dataset digit: {predicted_digit_dataset}')