# Neural-Network-7-Segment-Display

### Problem Overview

1. **Input Representation**:
   Each digit is represented on a 7-segment display by a binary code over segments `a-g`, as defined in the 7-segment code table. These seven bits form the input layer of the network.
   
2. **Output Representation**:
   The network will have 10 output nodes (one for each digit, 0 to 9). Each node should output `1` if the corresponding digit is the input; otherwise, it should output `0`.

3. **Network Design**:
   - **Input Layer**: 7 nodes (corresponding to segments `a` to `g` of the 7-segment display).
   - **Hidden Layers**: Two hidden layers with adjustable numbers of neurons (the hyperparameter experiments under Solution Approach vary these).
   - **Output Layer**: 10 nodes (one-hot encoding for digits 0 to 9).

4. **Activation Function**: Sigmoid function (non-linear) for every layer, including the output layer. Each output therefore lies in $ (0, 1) $, and the function is differentiable everywhere, so gradients can be computed smoothly.
   
5. **Loss Function**: Mean Squared Error (MSE), averaged over the $ N $ training samples and the 10 outputs:
   $
   MSE = \frac{1}{10N} \sum_{i=1}^{N} \sum_{j=1}^{10} (y_{ij} - \hat{y}_{ij})^2
   $
   where $ y_{ij} $ is the target for output neuron $ j $ on sample $ i $ (1 if sample $ i $ shows the digit assigned to neuron $ j $, 0 otherwise) and $ \hat{y}_{ij} $ is the corresponding predicted output.
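
The input and output encodings described above can be sketched in NumPy as follows. The segment table itself is not reproduced in this README, so the patterns below assume the common `a`-`g` encoding in which a lit segment is `1`; the names `SEGMENTS`, `X`, and `Y` are illustrative, not part of the original.

```python
import numpy as np

# Assumed 7-segment patterns in the order a, b, c, d, e, f, g (1 = segment lit).
# This is a common encoding; replace it if your table differs.
SEGMENTS = {
    0: [1, 1, 1, 1, 1, 1, 0],
    1: [0, 1, 1, 0, 0, 0, 0],
    2: [1, 1, 0, 1, 1, 0, 1],
    3: [1, 1, 1, 1, 0, 0, 1],
    4: [0, 1, 1, 0, 0, 1, 1],
    5: [1, 0, 1, 1, 0, 1, 1],
    6: [1, 0, 1, 1, 1, 1, 1],
    7: [1, 1, 1, 0, 0, 0, 0],
    8: [1, 1, 1, 1, 1, 1, 1],
    9: [1, 1, 1, 1, 0, 1, 1],
}

# Inputs: one row per digit, shape (10, 7).
X = np.array([SEGMENTS[d] for d in range(10)], dtype=float)

# Targets: one-hot rows, shape (10, 10); row d has a 1 in column d.
Y = np.eye(10)
```

Row `d` of `X` is the network input for digit `d`, and row `d` of `Y` is the matching one-hot target.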

---

### Solution Approach

1. **Data Preparation**:
   - **Dataset Creation**: Use the 7-segment code table to create an input-output dataset. Each digit (0-9) will have a binary input vector of 7 segments and a one-hot encoded output vector of 10 values.
   - **Training Data**: Since the patterns are fixed, the clean dataset consists of exactly ten (input, output) pairs, one for each digit 0-9.
   - **Data Augmentation** (if needed): Introduce slight noise to the 7-segment code for robustness (optional).

2. **Mathematical Formulation**:

   - **Feedforward Pass**:
     - Let $ x $ represent the input vector (binary vector of length 7).
     - Let $ W^{(1)}, W^{(2)}, W^{(3)} $ represent the weight matrices between the input and first hidden layer, first and second hidden layer, and second hidden layer and output layer, respectively.
     - **Hidden Layer 1**: Compute activations $ h^{(1)} = \sigma(W^{(1)} x + b^{(1)}) $.
     - **Hidden Layer 2**: Compute activations $ h^{(2)} = \sigma(W^{(2)} h^{(1)} + b^{(2)}) $.
     - **Output Layer**: Compute the predicted output $ \hat{y} = \sigma(W^{(3)} h^{(2)} + b^{(3)}) $.

   - **Backpropagation**:
     - Compute the gradient of the loss with respect to the output-layer pre-activations, then propagate it back through each layer.
     - For the output layer, the error term is:
       $
       \delta^{(3)} = \frac{2}{10} (\hat{y} - y) \odot \sigma'(z^{(3)})
       $
       and for each hidden layer $ l \in \{1, 2\} $:
       $
       \delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot \sigma'(z^{(l)})
       $
       where $ \delta^{(l)} $ is the error term for layer $ l $, and $ z^{(l)} $ is the pre-activation (the linear combination of inputs) at layer $ l $.
     - Update the weights and biases using the gradients computed with respect to $ W^{(l)} $ and $ b^{(l)} $ for each layer.

Backpropagation computes the gradient of the loss function with respect to every weight and bias in the network. The network here is feedforward with two hidden layers, sigmoid activations, and Mean Squared Error (MSE) as the loss function. The sections below derive each step of the process.

### Notation and Setup

1. **Inputs and Outputs**:
   - Let $ x $ be the input vector of length 7 (for the 7-segment display segments).
   - The network has two hidden layers with $ H_1 $ and $ H_2 $ neurons respectively.
   - The output layer has 10 neurons (one for each digit, 0-9).

2. **Weight Matrices and Biases**:
   - $ W^{(1)} $: Weight matrix between the input layer and the first hidden layer, of shape $ H_1 \times 7 $.
   - $ W^{(2)} $: Weight matrix between the first and second hidden layer, of shape $ H_2 \times H_1 $.
   - $ W^{(3)} $: Weight matrix between the second hidden layer and the output layer, of shape $ 10 \times H_2 $.
   - $ b^{(1)}, b^{(2)}, b^{(3)} $: Bias vectors for each layer.

3. **Activations and Pre-Activations**:
   - $ z^{(l)} $: Linear combination (pre-activation) of inputs at layer $ l $.
   - $ a^{(l)} $: Activation (post-activation) of neurons at layer $ l $.

4. **Activation Function**:
   - Sigmoid function: $ \sigma(z) = \frac{1}{1 + e^{-z}} $.
   - Sigmoid derivative: $ \sigma'(z) = \sigma(z)(1 - \sigma(z)) $.
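
A minimal NumPy sketch of the setup above. The function names (`sigmoid`, `sigmoid_prime`, `init_params`), the hidden sizes `h1=16` and `h2=12`, and the small random-normal initialisation are assumptions made for illustration; the README does not fix any of them.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def init_params(h1=16, h2=12, seed=0):
    """Create weights and biases with the shapes listed above."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(h1, 7)),    # H1 x 7
        "b1": np.zeros(h1),
        "W2": rng.normal(0.0, 0.5, size=(h2, h1)),   # H2 x H1
        "b2": np.zeros(h2),
        "W3": rng.normal(0.0, 0.5, size=(10, h2)),   # 10 x H2
        "b3": np.zeros(10),
    }

params = init_params()
```

Storing each weight matrix as (outputs x inputs) lets every layer be written as `W @ a + b`, matching the column-vector convention used in the equations.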

### Forward Pass

1. **Layer 1 (Input to First Hidden Layer)**:
   $
   z^{(1)} = W^{(1)} x + b^{(1)}
   $
   $
   a^{(1)} = \sigma(z^{(1)})
   $

2. **Layer 2 (First Hidden Layer to Second Hidden Layer)**:
   $
   z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}
   $
   $
   a^{(2)} = \sigma(z^{(2)})
   $

3. **Output Layer (Second Hidden Layer to Output Layer)**:
   $
   z^{(3)} = W^{(3)} a^{(2)} + b^{(3)}
   $
   $
   a^{(3)} = \sigma(z^{(3)})
   $

Here, $ a^{(3)} $ is the final output vector of the network. Each entry lies in $ (0, 1) $ and acts as a score for the corresponding digit; because the ten sigmoids are independent, the entries need not sum to 1.
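
The three layer equations translate almost line for line into NumPy. The sketch below is illustrative: the function name `forward`, the hidden sizes, the random weights, and the segment pattern used for digit 0 are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2, W3, b3):
    """Run the three-layer forward pass and return every z and a."""
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    z3 = W3 @ a2 + b3
    a3 = sigmoid(z3)
    return z1, a1, z2, a2, z3, a3

# Hypothetical sizes and random weights, only to exercise the function.
rng = np.random.default_rng(0)
H1, H2 = 8, 6
W1, b1 = rng.normal(size=(H1, 7)), np.zeros(H1)
W2, b2 = rng.normal(size=(H2, H1)), np.zeros(H2)
W3, b3 = rng.normal(size=(10, H2)), np.zeros(10)

x = np.array([1, 1, 1, 1, 1, 1, 0], dtype=float)  # assumed pattern for digit 0
z1, a1, z2, a2, z3, a3 = forward(x, W1, b1, W2, b2, W3, b3)
print(a3.shape)  # (10,)
```

Returning the intermediate activations is a deliberate choice: backpropagation needs $ a^{(1)} $, $ a^{(2)} $, and $ a^{(3)} $ again, so caching them avoids a second forward pass.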

### Loss Function

The Mean Squared Error (MSE) loss is given by:
$
L = \frac{1}{10} \sum_{j=1}^{10} (y_j - a_j^{(3)})^2
$
where $ y $ is the true output (one-hot encoded vector for the target digit) and $ a^{(3)} $ is the network’s predicted output.
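
As a small worked example of this formula, the sketch below evaluates the loss for a prediction of 0.5 on every output. The function name `mse_loss` and the choice of digit 3 as the target are illustrative assumptions.

```python
import numpy as np

def mse_loss(y, a3):
    """Per-sample MSE: mean of squared differences over the 10 outputs."""
    return np.mean((y - a3) ** 2)

y = np.zeros(10)
y[3] = 1.0                 # one-hot target for digit 3
a3 = np.full(10, 0.5)      # an uninformative prediction of 0.5 everywhere

print(mse_loss(y, a3))     # 0.25
```

Every output is off by exactly 0.5, so each squared error is 0.25 and so is their mean.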

### Backpropagation Steps

The goal of backpropagation is to compute the gradients of the loss $ L $ with respect to each weight and bias in the network, so that we can update them to minimize the loss.

    
#### Step 1: Compute the Output Layer Error

For each output neuron $ j $ in the output layer:
$
\delta^{(3)}_j = \frac{\partial L}{\partial z^{(3)}_j}
$
Using the chain rule, we get:
$
\delta^{(3)}_j = \frac{\partial L}{\partial a^{(3)}_j} \cdot \frac{\partial a^{(3)}_j}{\partial z^{(3)}_j}
$
1. **Derivative of Loss w.r.t. $ a^{(3)}_j $**:
   $
   \frac{\partial L}{\partial a^{(3)}_j} = \frac{2}{10} (a^{(3)}_j - y_j)
   $

2. **Derivative of Activation w.r.t. $ z^{(3)}_j $**:
   Since $ a^{(3)}_j = \sigma(z^{(3)}_j) $:
   $
   \frac{\partial a^{(3)}_j}{\partial z^{(3)}_j} = \sigma(z^{(3)}_j) (1 - \sigma(z^{(3)}_j)) = a^{(3)}_j (1 - a^{(3)}_j)
   $

Combining these, we get:
$
\delta^{(3)}_j = \frac{2}{10} (a^{(3)}_j - y_j) \cdot a^{(3)}_j (1 - a^{(3)}_j)
$
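
The combined expression can be checked numerically. This is a minimal sketch with an assumed function name `output_delta` and a hypothetical prediction of 0.5 on every output.

```python
import numpy as np

def output_delta(a3, y):
    """delta3 = (2/10) * (a3 - y) * a3 * (1 - a3), element-wise."""
    return (2.0 / a3.size) * (a3 - y) * a3 * (1.0 - a3)

y = np.zeros(10)
y[3] = 1.0                 # one-hot target for digit 3
a3 = np.full(10, 0.5)      # hypothetical network output

delta3 = output_delta(a3, y)
# Non-target entries: 0.2 * 0.5 * 0.25 = 0.025
# Target entry:       0.2 * (-0.5) * 0.25 = -0.025
```

The negative entry at the target index means a gradient-descent step raises the bias feeding that output, while the positive entries lower the others.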

#### Step 2: Compute the Second Hidden Layer Error

The error at the second hidden layer is calculated by propagating the output layer error backward through the weights $ W^{(3)} $:
$
\delta^{(2)} = \left( (W^{(3)})^T \delta^{(3)} \right) \odot \sigma'(z^{(2)})
$
where $ \odot $ denotes element-wise multiplication, and $ \sigma'(z^{(2)}) $ is the derivative of the sigmoid activation at layer 2:
$
\sigma'(z^{(2)}) = a^{(2)} \odot (1 - a^{(2)})
$
Thus,
$
\delta^{(2)} = \left( (W^{(3)})^T \delta^{(3)} \right) \odot a^{(2)} \odot (1 - a^{(2)})
$
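
The transpose in this formula follows from the chain rule written per neuron. As a short derivation sketch, writing $ W^{(3)}_{jk} $ for the weight from hidden neuron $ k $ to output neuron $ j $ (an index convention assumed here, consistent with the $ 10 \times H_2 $ shape of $ W^{(3)} $):
$
\delta^{(2)}_k = \frac{\partial L}{\partial z^{(2)}_k} = \sum_{j=1}^{10} \frac{\partial L}{\partial z^{(3)}_j} \cdot \frac{\partial z^{(3)}_j}{\partial a^{(2)}_k} \cdot \frac{\partial a^{(2)}_k}{\partial z^{(2)}_k} = \left( \sum_{j=1}^{10} W^{(3)}_{jk} \, \delta^{(3)}_j \right) \sigma'(z^{(2)}_k)
$
The sum over $ j $ is the $ k $-th entry of $ (W^{(3)})^T \delta^{(3)} $, which gives the vector form.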

#### Step 3: Compute the First Hidden Layer Error

Similarly, we propagate the error backward from the second hidden layer to the first hidden layer:
$
\delta^{(1)} = \left( (W^{(2)})^T \delta^{(2)} \right) \odot \sigma'(z^{(1)})
$
where
$
\sigma'(z^{(1)}) = a^{(1)} \odot (1 - a^{(1)})
$
So,
$
\delta^{(1)} = \left( (W^{(2)})^T \delta^{(2)} \right) \odot a^{(1)} \odot (1 - a^{(1)})
$
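
Steps 2 and 3 apply the same rule with different inputs, so one helper can cover both. The sketch below is illustrative: `hidden_delta` is an assumed name, and the sizes and random values only stand in for real activations and errors.

```python
import numpy as np

def hidden_delta(W_next, delta_next, a):
    """Back-propagate an error term through W_next and this layer's sigmoid."""
    return (W_next.T @ delta_next) * a * (1.0 - a)

# Hypothetical sizes and values, only to show the shapes involved.
rng = np.random.default_rng(0)
H1, H2 = 8, 6
W2 = rng.normal(size=(H2, H1))
W3 = rng.normal(size=(10, H2))
a1 = rng.uniform(0.1, 0.9, size=H1)   # stand-in for sigmoid activations
a2 = rng.uniform(0.1, 0.9, size=H2)
delta3 = rng.normal(size=10)          # stand-in for the output-layer error

delta2 = hidden_delta(W3, delta3, a2)   # shape (H2,)
delta1 = hidden_delta(W2, delta2, a1)   # shape (H1,)
```

Each error term has the same length as the layer it belongs to, which is a quick sanity check when wiring the layers together.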

#### Step 4: Gradient Calculation

Using the error terms $ \delta^{(1)}, \delta^{(2)}, \delta^{(3)} $, we can now calculate the gradients with respect to each weight matrix and bias vector.

1. **Gradients for Output Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(3)}} = \delta^{(3)} (a^{(2)})^T
   $
   $
   \frac{\partial L}{\partial b^{(3)}} = \delta^{(3)}
   $

2. **Gradients for Second Hidden Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(2)}} = \delta^{(2)} (a^{(1)})^T
   $
   $
   \frac{\partial L}{\partial b^{(2)}} = \delta^{(2)}
   $

3. **Gradients for First Hidden Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(1)}} = \delta^{(1)} x^T
   $
   $
   \frac{\partial L}{\partial b^{(1)}} = \delta^{(1)}
   $
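
Each weight gradient is an outer product of a layer's error term with the activations feeding that layer. One way to gain confidence in the derivation is to compare an analytic gradient with a finite-difference estimate. The sketch below does this for a single entry of $ W^{(1)} $; the function name `loss_and_grads`, the sizes, the random parameters, and the segment pattern used for digit 1 are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, y, W1, b1, W2, b2, W3, b3):
    """Return the per-sample MSE and the gradients from Steps 1-4."""
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    a3 = sigmoid(W3 @ a2 + b3)
    loss = np.mean((y - a3) ** 2)

    delta3 = (2.0 / y.size) * (a3 - y) * a3 * (1 - a3)
    delta2 = (W3.T @ delta3) * a2 * (1 - a2)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    grads = {
        "W3": np.outer(delta3, a2), "b3": delta3,
        "W2": np.outer(delta2, a1), "b2": delta2,
        "W1": np.outer(delta1, x),  "b1": delta1,
    }
    return loss, grads

# Hypothetical sizes and random parameters for the check.
rng = np.random.default_rng(0)
H1, H2 = 5, 4
W1, b1 = rng.normal(size=(H1, 7)), rng.normal(size=H1)
W2, b2 = rng.normal(size=(H2, H1)), rng.normal(size=H2)
W3, b3 = rng.normal(size=(10, H2)), rng.normal(size=10)
x = np.array([0, 1, 1, 0, 0, 0, 0], dtype=float)  # assumed pattern for digit 1
y = np.eye(10)[1]

loss, grads = loss_and_grads(x, y, W1, b1, W2, b2, W3, b3)

# Central finite difference on one entry of W1 (column 1, where x[1] = 1).
eps = 1e-6
i, j = 2, 1
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[i, j] += eps
W1_minus[i, j] -= eps
loss_plus, _ = loss_and_grads(x, y, W1_plus, b1, W2, b2, W3, b3)
loss_minus, _ = loss_and_grads(x, y, W1_minus, b1, W2, b2, W3, b3)
numeric = (loss_plus - loss_minus) / (2 * eps)

print(abs(numeric - grads["W1"][i, j]))  # should be very small
```

Note that columns of `grads["W1"]` for unlit segments (`x[j] = 0`) are exactly zero, since those inputs contribute nothing to $ z^{(1)} $.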

#### Step 5: Update Weights and Biases

After computing the gradients, we update each weight and bias using gradient descent with learning rate $ \eta $:
$
W^{(l)} = W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}
$
$
b^{(l)} = b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}
$
for each layer $ l $.
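
Putting Steps 1-5 together gives a complete training loop. The sketch below is one possible arrangement, not a definitive implementation: it assumes the common segment encoding for digits 0-9, updates after every sample (the README does not say whether updates are per-sample or batched), and uses illustrative defaults for `h1`, `h2`, `eta`, and `epochs`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed 7-segment patterns (a..g) for digits 0-9; replace with your table.
X = np.array([
    [1, 1, 1, 1, 1, 1, 0], [0, 1, 1, 0, 0, 0, 0], [1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 1], [0, 1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1],
], dtype=float)
Y = np.eye(10)

def train(X, Y, h1=16, h2=12, eta=0.5, epochs=500, seed=0):
    """Per-sample gradient descent; returns parameters and mean loss per epoch."""
    rng = np.random.default_rng(seed)
    W = [rng.normal(0, 0.5, (h1, 7)), rng.normal(0, 0.5, (h2, h1)),
         rng.normal(0, 0.5, (10, h2))]
    b = [np.zeros(h1), np.zeros(h2), np.zeros(10)]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in zip(X, Y):
            # Forward pass.
            a1 = sigmoid(W[0] @ x + b[0])
            a2 = sigmoid(W[1] @ a1 + b[1])
            a3 = sigmoid(W[2] @ a2 + b[2])
            epoch_loss += np.mean((y - a3) ** 2)
            # Error terms (Steps 1-3), all computed before any update.
            d3 = (2.0 / 10) * (a3 - y) * a3 * (1 - a3)
            d2 = (W[2].T @ d3) * a2 * (1 - a2)
            d1 = (W[1].T @ d2) * a1 * (1 - a1)
            # Gradients and update (Steps 4-5).
            for l, (d, a_prev) in enumerate([(d1, x), (d2, a1), (d3, a2)]):
                W[l] -= eta * np.outer(d, a_prev)
                b[l] -= eta * d
        history.append(epoch_loss / len(X))
    return W, b, history

W, b, history = train(X, Y)
print(history[0], history[-1])  # the mean loss should fall over training
```

The returned `history` holds one mean loss value per epoch, so it can be plotted directly, and calling `train` with different `eta` values gives curves that can be compared side by side.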



3. **Convergence Analysis**:
   - Plot the loss function (MSE) over iterations to study convergence. Adjust learning rates to observe differences in the rate and stability of convergence.

4. **Model Evaluation**:
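   - **N-Fold Cross-Validation**: Divide the dataset into N folds and iteratively train on $ N-1 $ folds, evaluating on the remaining fold. With only the ten clean patterns, a held-out digit would never be seen during training, so this is meaningful only on the noise-augmented dataset. Repeat the process N times and compute the following metrics, per digit (one-vs-rest) where they are class-specific: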
     - **Accuracy**: Proportion of correct predictions over total predictions.
     - **Precision**: $ \frac{\text{True Positives}}{\text{True Positives + False Positives}} $
     - **Recall (Sensitivity)**: $ \frac{\text{True Positives}}{\text{True Positives + False Negatives}} $
     - **Specificity**: $ \frac{\text{True Negatives}}{\text{True Negatives + False Positives}} $
     - **F-Measure**: Harmonic mean of Precision and Recall, $ \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $.

5. **Experiment with Network Hyperparameters**:
   - **Learning Rate**: Test different learning rates (e.g., 0.01, 0.1, 0.5) to observe their effects on convergence.
   - **Hidden Layers**: Vary the number of hidden neurons and layers to study the trade-offs between network capacity and generalization.
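
The evaluation metrics listed under Model Evaluation can be computed per digit in a one-vs-rest fashion. The sketch below is a minimal version under stated assumptions: the helper names `per_class_metrics` and `k_fold_indices` are hypothetical, a metric with a zero denominator is reported as `0.0` by convention, and the labels in the example are made up to exercise the code rather than taken from a trained network.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes=10):
    """One-vs-rest precision, recall, specificity and F1 for each class."""
    metrics = {}
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = dict(precision=precision, recall=recall,
                          specificity=specificity, f1=f1)
    return metrics

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        yield train_idx, folds[k]

# Hypothetical labels, only to exercise the metric code.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
accuracy = np.mean(y_true == y_pred)          # 4 of 6 correct
m = per_class_metrics(y_true, y_pred, n_classes=3)
print(m[1])  # precision 2/3, recall 1.0, specificity 0.75, f1 0.8
```

In a full experiment, `y_pred` would come from taking the index of the largest network output for each test sample in a fold, and the per-fold metrics would be averaged over the N folds.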

---

