# Neural Network from Scratch: Classifying Letters A, B, and C

This notebook implements a simple feedforward neural network **from scratch** using only NumPy.
The goal is to classify 5×6 binary images of letters **A**, **B**, and **C**.

## Outline
1. Create binary image patterns for A, B, and C (5×6 grid → 30 pixels)
2. Build a small dataset (with some noisy variants)
3. Define a two-layer neural network (input → hidden → output)
4. Implement forward pass, loss computation, and backpropagation manually
5. Train with gradient descent and track loss & accuracy
6. Visualize training curves and test predictions (with `imshow`)


In [None]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# For reproducibility
np.random.seed(42)

## 1. Create Binary Patterns for Letters A, B, and C
Each letter is represented as a 5×6 grid of 0s and 1s. We flatten this grid into a 1D array of length 30.
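
To make the flattening concrete, here is a minimal sketch, assuming NumPy's default row-major order. The `grid` array and the chosen `(row, col)` are illustrative only and are not part of the notebook's data. Pixel `(row, col)` of a 5×6 grid ends up at index `row * 6 + col`, and `reshape(5, 6)` restores the grid.

```python
import numpy as np

# Hypothetical 5x6 grid used only for illustration (not one of the letters).
grid = np.arange(30, dtype=np.float32).reshape(5, 6)

flat = grid.reshape(-1)        # row-major flattening -> shape (30,)
restored = flat.reshape(5, 6)  # inverse operation

row, col = 2, 4
flat_index = row * 6 + col     # position of pixel (row, col) in the flat vector

print(flat.shape)                          # (30,)
print(flat[flat_index] == grid[row, col])  # True
print(np.array_equal(restored, grid))      # True
```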

In [None]:
def show_pattern(pattern, title=""):
    """Utility to visualize a 5x6 binary pattern."""
    plt.imshow(pattern.reshape(5, 6), cmap='gray_r')
    plt.title(title)
    plt.axis('off')
    plt.show()

# Manually define base 5x6 patterns for A, B, C
# You can tweak these shapes if you like, but keep them 5x6.
A_pattern = np.array([
    [0,1,1,1,1,0],
    [1,0,0,0,0,1],
    [1,1,1,1,1,1],
    [1,0,0,0,0,1],
    [1,0,0,0,0,1]
], dtype=np.float32).reshape(-1)

B_pattern = np.array([
    [1,1,1,1,0,0],
    [1,0,0,0,1,0],
    [1,1,1,1,0,0],
    [1,0,0,0,1,0],
    [1,1,1,1,0,0]
], dtype=np.float32).reshape(-1)

C_pattern = np.array([
    [0,1,1,1,1,0],
    [1,0,0,0,0,1],
    [1,0,0,0,0,0],
    [1,0,0,0,0,1],
    [0,1,1,1,1,0]
], dtype=np.float32).reshape(-1)

print('Pattern shapes:', A_pattern.shape, B_pattern.shape, C_pattern.shape)

In [None]:
# Visualize the base patterns
show_pattern(A_pattern, "Letter A (base pattern)")
show_pattern(B_pattern, "Letter B (base pattern)")
show_pattern(C_pattern, "Letter C (base pattern)")

## 2. Build the Dataset
We will:
- Use the base patterns for A, B, and C
- Create 10 **noisy variants** of each letter by flipping 3 randomly chosen pixels (10% of the 30) in each copy
- Stack everything into `X` (features) and `y` (labels)

Labels:
- A → 0
- B → 1
- C → 2
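
The pixel-flipping idea can be sketched in isolation as follows. This is an illustrative variant, not the notebook's own helper: `flip_pixels` and `rng` are hypothetical names, and it uses NumPy's `default_rng` generator rather than the global seed.

```python
import numpy as np

def flip_pixels(pattern, num_flips, rng):
    """Return a copy of `pattern` with exactly `num_flips` distinct pixels inverted."""
    noisy = pattern.copy()
    # Sampling without replacement guarantees no pixel is flipped twice.
    idx = rng.choice(pattern.size, size=num_flips, replace=False)
    noisy[idx] = 1.0 - noisy[idx]
    return noisy

rng = np.random.default_rng(0)                # hypothetical seed
pattern = np.zeros(30, dtype=np.float32)      # blank 30-pixel image for the demo
noisy = flip_pixels(pattern, num_flips=3, rng=rng)

changed = int(np.sum(noisy != pattern))
print(changed)  # 3
```

Because the indices are drawn without replacement, the number of changed pixels always equals `num_flips`.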

In [None]:
def add_noise(pattern, noise_level=0.1):
    """Return a copy of pattern with a fraction (noise_level) of its pixels flipped."""
    noisy = pattern.copy()
    num_pixels = len(pattern)
    num_noisy = int(noise_level * num_pixels)
    # Sample distinct pixel indices so no pixel is flipped twice
    indices = np.random.choice(num_pixels, size=num_noisy, replace=False)
    noisy[indices] = 1.0 - noisy[indices]  # flip 0 -> 1 or 1 -> 0
    return noisy

# Create dataset
X_list = []
y_list = []

num_variants = 10  # how many noisy copies per letter

for _ in range(num_variants):
    X_list.append(add_noise(A_pattern, noise_level=0.1))
    y_list.append(0)
    X_list.append(add_noise(B_pattern, noise_level=0.1))
    y_list.append(1)
    X_list.append(add_noise(C_pattern, noise_level=0.1))
    y_list.append(2)

# Also include the clean base patterns
X_list.extend([A_pattern, B_pattern, C_pattern])
y_list.extend([0, 1, 2])

X = np.array(X_list)  # shape: (N, 30)
y = np.array(y_list)  # shape: (N,)

print('Dataset shape:', X.shape)
print('Labels shape:', y.shape)
print('Unique labels:', np.unique(y))

In [None]:
# Show a few random samples from the dataset
for i in range(3):
    idx = np.random.randint(0, len(X))
    label = y[idx]
    letter = ['A', 'B', 'C'][label]
    show_pattern(X[idx], f"Sample {idx} (Label: {letter})")

### One-Hot Encoding and Train/Test Split
We convert labels to one-hot vectors (for softmax cross-entropy) and shuffle the data into an 80/20 train/test split. With 33 samples, this gives 26 training and 7 test examples.
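
As a quick illustration of what one-hot encoding produces, the sketch below uses a hypothetical `labels_demo` vector and builds the matrix by indexing an identity matrix, which is one common alternative to filling a zero matrix. `argmax` recovers the original labels.

```python
import numpy as np

labels_demo = np.array([0, 2, 1, 2])   # hypothetical labels: A, C, B, C

# Row k of the 3x3 identity matrix is the one-hot vector for class k.
one_hot_demo = np.eye(3)[labels_demo]
print(one_hot_demo)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# argmax along each row inverts the encoding.
recovered = np.argmax(one_hot_demo, axis=1)
print(np.array_equal(recovered, labels_demo))  # True
```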

In [None]:
def one_hot_encode(y, num_classes=3):
    """Convert label vector y into one-hot encoded matrix."""
    N = y.shape[0]
    Y = np.zeros((N, num_classes))
    Y[np.arange(N), y] = 1.0
    return Y

Y = one_hot_encode(y, num_classes=3)
print('One-hot encoded Y shape:', Y.shape)

# Simple train/test split (80% train, 20% test)
num_samples = X.shape[0]
indices = np.arange(num_samples)
np.random.shuffle(indices)

split = int(0.8 * num_samples)
train_idx = indices[:split]
test_idx = indices[split:]

X_train, X_test = X[train_idx], X[test_idx]
Y_train, Y_test = Y[train_idx], Y[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print('Train set:', X_train.shape, Y_train.shape)
print('Test set:', X_test.shape, Y_test.shape)

## 3. Define the Neural Network Architecture
We build a **two-layer** neural network:
- Input layer: 30 neurons (pixels)
- Hidden layer: `hidden_size` neurons with **sigmoid** activation
- Output layer: 3 neurons (A, B, C) with **softmax** activation
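
To see how these layer sizes translate into array shapes, the following sketch pushes a hypothetical batch of four random binary inputs through randomly initialised weights. It assumes a hidden size of 16 and a row-per-sample layout. Every `_demo` name is made up for illustration, and the softmax step is left out because it does not change shapes.

```python
import numpy as np

input_size, hidden_size, output_size = 30, 16, 3   # hidden size is an assumption
rng = np.random.default_rng(0)

# Hypothetical batch of 4 binary images, one per row.
X_demo = rng.integers(0, 2, size=(4, input_size)).astype(np.float32)

W1_demo = rng.normal(scale=0.1, size=(hidden_size, input_size))
b1_demo = np.zeros((hidden_size, 1))
W2_demo = rng.normal(scale=0.1, size=(output_size, hidden_size))
b2_demo = np.zeros((output_size, 1))

hidden_demo = 1 / (1 + np.exp(-(X_demo @ W1_demo.T + b1_demo.T)))  # sigmoid layer
logits_demo = hidden_demo @ W2_demo.T + b2_demo.T                  # pre-softmax scores

num_params = W1_demo.size + b1_demo.size + W2_demo.size + b2_demo.size
print(hidden_demo.shape, logits_demo.shape, num_params)  # (4, 16) (4, 3) 547
```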

We will implement:
- Weight initialization
- Forward pass
- Loss function (cross-entropy)
- Backpropagation and gradient descent update
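
The backpropagation step for a softmax output trained with cross-entropy rests on a compact result: the gradient of the mean loss with respect to the logits is `(softmax(z) - y) / N`. The sketch below is a standalone sanity check under that assumption, comparing the formula with central finite differences on hypothetical random logits. `z_demo`, `y_demo`, and the helper names are illustrative.

```python
import numpy as np

def softmax_demo(z):
    z = z - np.max(z, axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy_demo(z, y):
    return -np.sum(y * np.log(softmax_demo(z))) / z.shape[0]

rng = np.random.default_rng(0)
z_demo = rng.normal(size=(4, 3))               # hypothetical logits
y_demo = np.eye(3)[np.array([0, 1, 2, 1])]     # hypothetical one-hot targets

# Analytic gradient of the mean loss with respect to the logits.
analytic = (softmax_demo(z_demo) - y_demo) / z_demo.shape[0]

# Numerical gradient via central differences, one logit at a time.
numeric = np.zeros_like(z_demo)
h = 1e-6
for i in range(z_demo.shape[0]):
    for j in range(z_demo.shape[1]):
        z_plus, z_minus = z_demo.copy(), z_demo.copy()
        z_plus[i, j] += h
        z_minus[i, j] -= h
        numeric[i, j] = (mean_cross_entropy_demo(z_plus, y_demo)
                         - mean_cross_entropy_demo(z_minus, y_demo)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff < 1e-6)  # True
```

The same finite-difference idea can be applied to the weights themselves when debugging a hand-written backward pass.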

In [None]:
# Activation functions and helpers
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    # derivative of sigmoid wrt its input z, expressed via its output a = sigmoid(z)
    return a * (1 - a)

def softmax(z):
    # subtract max for numerical stability
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def cross_entropy_loss(y_true, y_pred):
    # y_true and y_pred are (N, num_classes)
    # Clip predictions into [eps, 1 - eps] to avoid log(0)
    eps = 1e-10
    y_pred_clipped = np.clip(y_pred, eps, 1 - eps)
    loss = -np.sum(y_true * np.log(y_pred_clipped)) / y_true.shape[0]
    return loss

def accuracy(y_true_labels, y_pred_probs):
    y_pred_labels = np.argmax(y_pred_probs, axis=1)
    return np.mean(y_true_labels == y_pred_labels)

In [None]:
# Initialize network parameters
input_size = 30
hidden_size = 16   # you can experiment with this
output_size = 3

def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(hidden_size, input_size) * 0.1
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.1
    b2 = np.zeros((output_size, 1))
    return W1, b1, W2, b2

W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)
W1.shape, b1.shape, W2.shape, b2.shape

In [None]:
# Forward pass
def forward_pass(X, W1, b1, W2, b2):
    # X: (N, input_size)
    Z1 = X @ W1.T + b1.T          # (N, hidden_size)
    A1 = sigmoid(Z1)             # (N, hidden_size)
    Z2 = A1 @ W2.T + b2.T        # (N, output_size)
    A2 = softmax(Z2)             # (N, output_size)
    return Z1, A1, Z2, A2

In [None]:
# Backpropagation and parameter update
def backward_pass(X, Y, Z1, A1, Z2, A2, W1, b1, W2, b2, learning_rate=0.1):
    """One gradient-descent step; updates W1, b1, W2, b2 in place and returns them.

    Z1 and Z2 are accepted to mirror forward_pass's outputs but are not used.
    """
    N = X.shape[0]

    # Output layer gradient (softmax + cross-entropy)
    dZ2 = A2 - Y                  # (N, output_size)
    dW2 = (dZ2.T @ A1) / N        # (output_size, hidden_size)
    db2 = np.sum(dZ2, axis=0, keepdims=True).T / N  # (output_size, 1)

    # Hidden layer gradients
    dA1 = dZ2 @ W2                # (N, hidden_size)
    dZ1 = dA1 * sigmoid_derivative(A1)  # (N, hidden_size)
    dW1 = (dZ1.T @ X) / N         # (hidden_size, input_size)
    db1 = np.sum(dZ1, axis=0, keepdims=True).T / N  # (hidden_size, 1)

    # Gradient descent update
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

    return W1, b1, W2, b2

## 4. Training Loop
We train the network using full-batch **gradient descent**, so every update uses the whole training set:
- Forward pass on all training data
- Compute loss and accuracy
- Backpropagate errors and update weights
- Repeat for many epochs

We will store loss and accuracy per epoch and plot them.

In [None]:
learning_rate = 0.5
epochs = 500

train_losses = []
train_accuracies = []
test_losses = []
test_accuracies = []

W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)

for epoch in range(1, epochs + 1):
    # Forward pass on training data
    Z1_train, A1_train, Z2_train, A2_train = forward_pass(X_train, W1, b1, W2, b2)
    loss_train = cross_entropy_loss(Y_train, A2_train)
    acc_train = accuracy(y_train, A2_train)

    # Backpropagation
    W1, b1, W2, b2 = backward_pass(X_train, Y_train, Z1_train, A1_train, Z2_train, A2_train,
                                   W1, b1, W2, b2, learning_rate=learning_rate)

    # Evaluate on test data (after this epoch's update; train metrics above are pre-update)
    Z1_test, A1_test, Z2_test, A2_test = forward_pass(X_test, W1, b1, W2, b2)
    loss_test = cross_entropy_loss(Y_test, A2_test)
    acc_test = accuracy(y_test, A2_test)

    # Store metrics
    train_losses.append(loss_train)
    train_accuracies.append(acc_train)
    test_losses.append(loss_test)
    test_accuracies.append(acc_test)

    if epoch % 50 == 0 or epoch == 1:
        print(f"Epoch {epoch:3d}: Train Loss={loss_train:.4f}, Train Acc={acc_train:.3f}, "
              f"Test Loss={loss_test:.4f}, Test Acc={acc_test:.3f}")

## 5. Visualize Loss and Accuracy
We plot training and testing loss and accuracy over epochs to see how the model learns.
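
When reading such curves, one simple number to pull out is the epoch at which the test loss is lowest. A minimal sketch with made-up loss values (`demo_test_losses` is hypothetical, not output from this notebook):

```python
import numpy as np

# Hypothetical per-epoch test losses: they fall, then rise slightly.
demo_test_losses = [1.10, 0.80, 0.55, 0.50, 0.52, 0.60]

# Epochs are counted from 1, so add 1 to the zero-based argmin index.
best_epoch = int(np.argmin(demo_test_losses)) + 1
print(best_epoch)  # 4
```

A test loss that starts rising while the training loss keeps falling is a common sign of overfitting.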

In [None]:
epochs_range = np.arange(1, epochs + 1)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_losses, label='Train Loss')
plt.plot(epochs_range, test_losses, label='Test Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Loss vs Epoch')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_accuracies, label='Train Accuracy')
plt.plot(epochs_range, test_accuracies, label='Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Accuracy vs Epoch')
plt.legend()

plt.tight_layout()
plt.show()

## 6. Testing the Model and Visualizing Predictions
We use the trained model to predict the labels for some test samples and display their images using `imshow`.
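
Beyond a single accuracy number, a confusion matrix shows which letters get mistaken for which. The sketch below builds one with `np.add.at` from hypothetical true and predicted labels (`y_true_demo` and `y_pred_demo` are invented for illustration, not results from this model). Rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical labels for 7 samples (0 = A, 1 = B, 2 = C).
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

confusion = np.zeros((3, 3), dtype=int)
# Increment cell (true, predicted) once per sample; add.at handles repeated indices.
np.add.at(confusion, (y_true_demo, y_pred_demo), 1)

print(confusion)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]

# Correct predictions sit on the diagonal.
print(confusion.trace(), "of", len(y_true_demo), "correct")  # 5 of 7 correct
```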

In [None]:
def predict(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_pass(X, W1, b1, W2, b2)
    return np.argmax(A2, axis=1), A2

y_pred_test, y_pred_probs_test = predict(X_test, W1, b1, W2, b2)
test_acc_final = np.mean(y_pred_test == y_test)
print('Final Test Accuracy:', test_acc_final)

# Show a few predictions
for i in range(min(5, len(X_test))):
    pattern = X_test[i]
    true_label = y_test[i]
    pred_label = y_pred_test[i]
    true_letter = ['A', 'B', 'C'][true_label]
    pred_letter = ['A', 'B', 'C'][pred_label]
    show_pattern(pattern, f"True: {true_letter}, Predicted: {pred_letter}")

### Test on Clean Base Patterns
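Finally, we run the network on the original clean patterns for A, B, and C to see if it classifies them correctly. Because these clean patterns were also added to the dataset in step 2, some of them may have been in the training set, so treat this as a sanity check rather than a held-out evaluation.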

In [None]:
X_clean = np.stack([A_pattern, B_pattern, C_pattern])
y_clean = np.array([0, 1, 2])
y_pred_clean, _ = predict(X_clean, W1, b1, W2, b2)

for i in range(3):
    true_letter = ['A', 'B', 'C'][y_clean[i]]
    pred_letter = ['A', 'B', 'C'][y_pred_clean[i]]
    show_pattern(X_clean[i], f"Clean Pattern - True: {true_letter}, Predicted: {pred_letter}")