# Neural-Network-7-Segment-Display

### Problem Overview

1. **Input Representation**:
   Each digit is represented by the on/off state of the seven display segments `a`-`g`, encoded as a 7-element binary vector (the full code table appears in the data-preparation code below). This vector forms the input layer of our network.
   
2. **Output Representation**:
   The network will have 10 output nodes (one for each digit, 0 to 9). Each node should output `1` if the corresponding digit is the input; otherwise, it should output `0`.

3. **Network Design**:
   - **Input Layer**: 7 nodes (corresponding to segments `a` to `g` of the 7-segment display).
   - **Hidden Layers**: Two hidden layers with adjustable numbers of neurons (we'll analyze the impact of this in point (i)).
   - **Output Layer**: 10 nodes (one-hot encoding for digits 0 to 9).

4. **Activation Function**: Sigmoid function (non-linear) for every layer, including the output layer, as in the forward-pass equations below. Its smooth derivative keeps the gradients well defined everywhere.
   
5. **Loss Function**: Mean Squared Error (MSE), given by:
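   $
   MSE = \frac{1}{10N} \sum_{i=1}^{N} \sum_{j=1}^{10} (y_{ij} - \hat{y}_{ij})^2
   $
   where $ N $ is the number of samples, $ y_{ij} $ is the target for output neuron $ j $ on sample $ i $ (1 if sample $ i $ shows the digit assigned to neuron $ j $, 0 otherwise), and $ \hat{y}_{ij} $ is the corresponding predicted output. The factor $ \frac{1}{10} $ averages over the 10 output neurons, matching the per-sample loss used in the backpropagation derivation.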

---

### Solution Approach

1. **Data Preparation**:
   - **Dataset Creation**: Use the 7-segment code table to create an input-output dataset. Each digit (0-9) will have a binary input vector of 7 segments and a one-hot encoded output vector of 10 values.
   - **Training Data**: Since the patterns are fixed, we may create data pairs (input, output) for digits 0-9.
   - **Data Augmentation** (if needed): Introduce slight noise to the 7-segment code for robustness (optional).

2. **Mathematical Formulation**:

   - **Feedforward Pass**:
     - Let $ x $ represent the input vector (binary vector of length 7).
     - Let $ W^{(1)}, W^{(2)}, W^{(3)} $ represent the weight matrices between the input and first hidden layer, first and second hidden layer, and second hidden layer and output layer, respectively.
     - **Hidden Layer 1**: Compute activations $ h^{(1)} = \sigma(W^{(1)} x + b^{(1)}) $.
     - **Hidden Layer 2**: Compute activations $ h^{(2)} = \sigma(W^{(2)} h^{(1)} + b^{(2)}) $.
     - **Output Layer**: Compute the predicted output $ \hat{y} = \sigma(W^{(3)} h^{(2)} + b^{(3)}) $.

   - **Backpropagation**:
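     - Compute the loss gradient with respect to the output layer weights, then propagate this gradient back through each layer.
     - For the output layer ($ l = 3 $), the error term is:
       $
       \delta^{(3)} = \frac{2}{10} (\hat{y} - y) \odot \sigma'(z^{(3)})
       $
       and for each hidden layer $ l \in \{1, 2\} $ it is obtained from the layer above:
       $
       \delta^{(l)} = (W^{(l+1)})^T \delta^{(l+1)} \odot \sigma'(z^{(l)})
       $
       where $ \delta^{(l)} $ is the error term for layer $ l $, $ \hat{y} $ is the network output, $ y $ is the one-hot target, and $ z^{(l)} $ is the linear combination of inputs to layer $ l $.
     - Update the weights using the gradients computed with respect to $ W^{(l)} $ for each layer.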

Backpropagation is the process used to calculate the gradient of the loss function with respect to each weight in the network. In this problem, we have a feedforward neural network with two hidden layers, a sigmoid activation function, and Mean Squared Error (MSE) as the loss function. I'll walk through each step of the backpropagation process mathematically.

### Notation and Setup

1. **Inputs and Outputs**:
   - Let $ x $ be the input vector of length 7 (for the 7-segment display segments).
   - The network has two hidden layers with $ H_1 $ and $ H_2 $ neurons respectively.
   - The output layer has 10 neurons (one for each digit, 0-9).

2. **Weight Matrices and Biases**:
   - $ W^{(1)} $: Weight matrix between the input layer and the first hidden layer, of shape $ H_1 \times 7 $.
   - $ W^{(2)} $: Weight matrix between the first and second hidden layer, of shape $ H_2 \times H_1 $.
   - $ W^{(3)} $: Weight matrix between the second hidden layer and the output layer, of shape $ 10 \times H_2 $.
   - $ b^{(1)}, b^{(2)}, b^{(3)} $: Bias vectors for each layer.

3. **Activations and Pre-Activations**:
   - $ z^{(l)} $: Linear combination (pre-activation) of inputs at layer $ l $.
   - $ a^{(l)} $: Activation (post-activation) of neurons at layer $ l $.

4. **Activation Function**:
   - Sigmoid function: $ \sigma(z) = \frac{1}{1 + e^{-z}} $.
   - Sigmoid derivative: $ \sigma'(z) = \sigma(z)(1 - \sigma(z)) $.
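
As a quick sanity check on the derivative identity above, the sketch below compares it against a central finite difference. The name `sigmoid_prime` and the test grid are illustrative choices, not part of the original code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid with respect to the pre-activation z."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the analytic derivative with a central finite difference.
z = np.linspace(-4.0, 4.0, 9)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_abs_diff = np.max(np.abs(numeric - sigmoid_prime(z)))
print(f"max |numeric - analytic| = {max_abs_diff:.2e}")
```

The difference should be tiny (limited by floating-point rounding), which supports using $ \sigma(z)(1 - \sigma(z)) $ in the backpropagation steps.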

### Forward Pass

1. **Layer 1 (Input to First Hidden Layer)**:
   $
   z^{(1)} = W^{(1)} x + b^{(1)}
   $
   $
   a^{(1)} = \sigma(z^{(1)})
   $

2. **Layer 2 (First Hidden Layer to Second Hidden Layer)**:
   $
   z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}
   $
   $
   a^{(2)} = \sigma(z^{(2)})
   $

3. **Output Layer (Second Hidden Layer to Output Layer)**:
   $
   z^{(3)} = W^{(3)} a^{(2)} + b^{(3)}
   $
   $
   a^{(3)} = \sigma(z^{(3)})
   $

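Here, $ a^{(3)} $ is the final output vector of the network. Each entry lies in $ (0, 1) $ and acts as a score for one digit. Because the sigmoid is applied to each output independently, the entries need not sum to 1, so they are not a normalized probability distribution. The predicted digit is the index of the largest entry.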
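
The three forward-pass equations can be sketched directly in NumPy using the column-vector convention of the derivation, so each $ W^{(l)} $ has shape (outputs, inputs). The hidden sizes `H1 = 8` and `H2 = 6`, the random initialization, and the function name `forward` are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass; returns every z and a so backpropagation can reuse them."""
    W1, b1, W2, b2, W3, b3 = params
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    z3 = W3 @ a2 + b3
    a3 = sigmoid(z3)
    return z1, a1, z2, a2, z3, a3

rng = np.random.default_rng(0)
H1, H2 = 8, 6  # hypothetical hidden-layer sizes
params = (
    rng.standard_normal((H1, 7)), rng.standard_normal(H1),    # W1, b1
    rng.standard_normal((H2, H1)), rng.standard_normal(H2),   # W2, b2
    rng.standard_normal((10, H2)), rng.standard_normal(10),   # W3, b3
)

x = np.array([1, 1, 1, 1, 1, 1, 0], dtype=float)  # segments a-g for digit 0
z1, a1, z2, a2, z3, a3 = forward(x, params)
print(a1.shape, a2.shape, a3.shape)
```

The implementation later in this README stores the transposed matrices and multiplies row vectors instead; both conventions compute the same network.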

### Loss Function

The Mean Squared Error (MSE) loss is given by:
$
L = \frac{1}{10} \sum_{j=1}^{10} (y_j - a_j^{(3)})^2
$
where $ y $ is the true output (one-hot encoded vector for the target digit) and $ a^{(3)} $ is the network’s predicted output.
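
To make the formula concrete, here is a small worked example with made-up numbers: the target is digit 3, and the hypothetical network output is 0.8 for that digit and 0.1 for every other digit.

```python
import numpy as np

# One-hot target for digit 3
y = np.zeros(10)
y[3] = 1.0

# Hypothetical network output (illustrative values, not from a trained model)
a3 = np.full(10, 0.1)
a3[3] = 0.8

# L = (1/10) * sum_j (y_j - a_j)^2 = (9 * 0.1^2 + 0.2^2) / 10
loss = np.mean((y - a3) ** 2)
print(f"{loss:.4f}")  # 0.0130
```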

### Backpropagation Steps

The goal of backpropagation is to compute the gradients of the loss $ L $ with respect to each weight and bias in the network, so that we can update them to minimize the loss.

    
#### Step 1: Compute the Output Layer Error

For each output neuron $ j $ in the output layer:
$
\delta^{(3)}_j = \frac{\partial L}{\partial z^{(3)}_j}
$
Using the chain rule, we get:
$
\delta^{(3)}_j = \frac{\partial L}{\partial a^{(3)}_j} \cdot \frac{\partial a^{(3)}_j}{\partial z^{(3)}_j}
$
1. **Derivative of Loss w.r.t. $ a^{(3)}_j $**:
   $
   \frac{\partial L}{\partial a^{(3)}_j} = \frac{2}{10} (a^{(3)}_j - y_j)
   $

2. **Derivative of Activation w.r.t. $ z^{(3)}_j $**:
   Since $ a^{(3)}_j = \sigma(z^{(3)}_j) $:
   $
   \frac{\partial a^{(3)}_j}{\partial z^{(3)}_j} = \sigma(z^{(3)}_j) (1 - \sigma(z^{(3)}_j)) = a^{(3)}_j (1 - a^{(3)}_j)
   $

Combining these, we get:
$
\delta^{(3)}_j = \frac{2}{10} (a^{(3)}_j - y_j) \cdot a^{(3)}_j (1 - a^{(3)}_j)
$
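
The output-layer error can be evaluated numerically in the same way. The sketch below uses an illustrative target (digit 3) and a made-up output vector; none of the numbers come from a trained network.

```python
import numpy as np

y = np.zeros(10)
y[3] = 1.0                 # one-hot target for digit 3
a3 = np.full(10, 0.1)
a3[3] = 0.8                # hypothetical sigmoid outputs

# delta3_j = (2/10) * (a_j - y_j) * a_j * (1 - a_j)
delta3 = (2.0 / 10.0) * (a3 - y) * a3 * (1.0 - a3)

print(f"{delta3[3]:.4f}")  # -0.0064: the correct output is too low, so z3[3] should increase
print(f"{delta3[0]:.4f}")  # 0.0018: a wrong output is too high, so z3[0] should decrease
```

The sign of each entry tells gradient descent which way to move the corresponding pre-activation.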

#### Step 2: Compute the Second Hidden Layer Error

The error at the second hidden layer is calculated by propagating the output layer error backward through the weights $ W^{(3)} $:
$
\delta^{(2)} = (W^{(3)})^T \delta^{(3)} \odot \sigma'(z^{(2)})
$
where $ \odot $ denotes element-wise multiplication, and $ \sigma'(z^{(2)}) $ is the derivative of the sigmoid activation at layer 2:
$
\sigma'(z^{(2)}) = a^{(2)} \odot (1 - a^{(2)})
$
Thus,
$
\delta^{(2)} = \left( (W^{(3)})^T \delta^{(3)} \right) \odot a^{(2)} \odot (1 - a^{(2)})
$

#### Step 3: Compute the First Hidden Layer Error

Similarly, we propagate the error backward from the second hidden layer to the first hidden layer:
$
\delta^{(1)} = (W^{(2)})^T \delta^{(2)} \odot \sigma'(z^{(1)})
$
where
$
\sigma'(z^{(1)}) = a^{(1)} \odot (1 - a^{(1)})
$
So,
$
\delta^{(1)} = \left( (W^{(2)})^T \delta^{(2)} \right) \odot a^{(1)} \odot (1 - a^{(1)})
$

#### Step 4: Gradient Calculation

Using the error terms $ \delta^{(1)}, \delta^{(2)}, \delta^{(3)} $, we can now calculate the gradients with respect to each weight matrix and bias vector.

1. **Gradients for Output Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(3)}} = \delta^{(3)} (a^{(2)})^T
   $
   $
   \frac{\partial L}{\partial b^{(3)}} = \delta^{(3)}
   $

2. **Gradients for Second Hidden Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(2)}} = \delta^{(2)} (a^{(1)})^T
   $
   $
   \frac{\partial L}{\partial b^{(2)}} = \delta^{(2)}
   $

3. **Gradients for First Hidden Layer Weights and Biases**:
   $
   \frac{\partial L}{\partial W^{(1)}} = \delta^{(1)} x^T
   $
   $
   \frac{\partial L}{\partial b^{(1)}} = \delta^{(1)}
   $

#### Step 5: Update Weights and Biases

After computing the gradients, we update each weight and bias using gradient descent with learning rate $ \eta $:
$
W^{(l)} = W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}
$
$
b^{(l)} = b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}
$
for each layer $ l $.
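
One way to gain confidence in Steps 1 to 4 is a numerical gradient check: perturb each weight slightly and compare the resulting change in loss with the analytic gradient. The sketch below does this for $ W^{(1)} $ in the column-vector convention of the derivation. The layer sizes (7-5-4-10), the random weights, and the helper name `loss_and_grads` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, y, Ws, bs):
    """Per-sample MSE plus the analytic gradients from Steps 1-4."""
    a = [x]                                   # a[0] is the input
    for W, b in zip(Ws, bs):
        a.append(sigmoid(W @ a[-1] + b))
    loss = np.mean((y - a[-1]) ** 2)
    # Step 1: output-layer error (2 / y.size equals 2/10 for ten outputs)
    delta = (2.0 / y.size) * (a[-1] - y) * a[-1] * (1.0 - a[-1])
    gW, gb = [None] * len(Ws), [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        gW[l] = np.outer(delta, a[l])         # Step 4: delta times previous activation
        gb[l] = delta
        if l > 0:                             # Steps 2-3: propagate the error backward
            delta = (Ws[l].T @ delta) * a[l] * (1.0 - a[l])
    return loss, gW, gb

rng = np.random.default_rng(1)
sizes = [7, 5, 4, 10]
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.standard_normal(m) for m in sizes[1:]]
x = np.array([0, 1, 1, 0, 0, 0, 0], dtype=float)  # segments for digit 1
y = np.eye(10)[1]

_, gW, _ = loss_and_grads(x, y, Ws, bs)

# Central finite differences on every entry of W^(1)
eps = 1e-6
numeric = np.zeros_like(Ws[0])
for idx in np.ndindex(*Ws[0].shape):
    original = Ws[0][idx]
    Ws[0][idx] = original + eps
    loss_plus, _, _ = loss_and_grads(x, y, Ws, bs)
    Ws[0][idx] = original - eps
    loss_minus, _, _ = loss_and_grads(x, y, Ws, bs)
    Ws[0][idx] = original
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.max(np.abs(numeric - gW[0]))
print(f"max |numeric - analytic| for W1 = {max_err:.2e}")
```

A very small maximum difference indicates that the analytic gradients agree with the loss they are supposed to differentiate. The same loop can be repeated for the other weight matrices and the biases.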



3. **Convergence Analysis**:
   - Plot the loss function (MSE) over iterations to study convergence. Adjust learning rates to observe differences in the rate and stability of convergence.

4. **Model Evaluation**:
   - **N-Fold Cross-Validation**: Divide the dataset into N folds and iteratively train on $ N-1 $ folds, evaluating on the remaining fold. Repeat this process N times to compute performance metrics:
     - **Accuracy**: Proportion of correct predictions over total predictions.
     - **Precision**: $ \frac{\text{True Positives}}{\text{True Positives + False Positives}} $
     - **Recall (Sensitivity)**: $ \frac{\text{True Positives}}{\text{True Positives + False Negatives}} $
     - **Specificity**: $ \frac{\text{True Negatives}}{\text{True Negatives + False Positives}} $
     - **F-Measure**: Harmonic mean of Precision and Recall.

5. **Experiment with Network Hyperparameters**:
   - **Learning Rate**: Test different learning rates (e.g., 0.01, 0.1, 0.5) to observe their effects on convergence.
   - **Hidden Layers**: Vary the number of hidden neurons and layers to study the trade-offs between network capacity and generalization.

---

### Implementation Outline

1. **Data Preparation in Python**:
   ```python
   import numpy as np
   
   # 7-segment data for digits 0-9
   seven_segment_data = {
       0: [1, 1, 1, 1, 1, 1, 0],
       1: [0, 1, 1, 0, 0, 0, 0],
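       2: [1, 1, 0, 1, 1, 0, 1],
       3: [1, 1, 1, 1, 0, 0, 1],
       4: [0, 1, 1, 0, 0, 1, 1],
       5: [1, 0, 1, 1, 0, 1, 1],
       6: [1, 0, 1, 1, 1, 1, 1],
       7: [1, 1, 1, 0, 0, 0, 0],
       8: [1, 1, 1, 1, 1, 1, 1],
       9: [1, 1, 1, 1, 0, 1, 1],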
   }

   # One-hot encoded output
   labels = np.eye(10)  # 10x10 identity matrix
   ```

2. **Feedforward and Backpropagation Functions**:
   Implement the forward and backpropagation processes with numpy for efficiency. Here's a basic outline:

   ```python
   def sigmoid(x):
       return 1 / (1 + np.exp(-x))

   def sigmoid_derivative(a):
       # Expects the sigmoid *output* a = sigmoid(z), not the pre-activation z
       return a * (1 - a)

   # Initialize weights and biases (randomly)
   # Implement feedforward and backpropagation using numpy operations
   ```

3. **Training and Evaluation Loop**:
   Implement a training loop that logs the loss over iterations and evaluates performance using N-fold cross-validation.

4. **Plotting Convergence**:
   Use `matplotlib` to plot the loss against iterations for each learning rate and network configuration.

5. **Cross-Validation**:
   Implement cross-validation to compute accuracy, precision, recall, etc., and display the results.



### Step 1: Data Preparation

1. **Prepare the 7-segment data**: Each digit from 0 to 9 is represented as a binary vector based on the state of each segment (on or off).
2. **One-hot encoding for labels**: The output for each digit is a one-hot encoded vector of length 10.

Here's the code to prepare the data:

```python
import numpy as np
import pandas as pd

# 7-segment display encoding for digits 0-9
seven_segment_data = {
    0: [1, 1, 1, 1, 1, 1, 0],
    1: [0, 1, 1, 0, 0, 0, 0],
    2: [1, 1, 0, 1, 1, 0, 1],
    3: [1, 1, 1, 1, 0, 0, 1],
    4: [0, 1, 1, 0, 0, 1, 1],
    5: [1, 0, 1, 1, 0, 1, 1],
    6: [1, 0, 1, 1, 1, 1, 1],
    7: [1, 1, 1, 0, 0, 0, 0],
    8: [1, 1, 1, 1, 1, 1, 1],
    9: [1, 1, 1, 1, 0, 1, 1]
}

# Converting the data to arrays
inputs = np.array(list(seven_segment_data.values()))
labels = np.eye(10)  # One-hot encoded labels for digits 0-9

# Combine into a DataFrame for easy visualization (optional)
df = pd.DataFrame(inputs, columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df['digit'] = range(10)
print(df)

# The inputs and labels are ready for training
```

### Step 2: Define Activation Functions and Network Initialization

Since we’re using a feedforward neural network, we'll define the sigmoid activation function and initialize the weights and biases.

```python
# Activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects the sigmoid *output* a = sigmoid(z), not the pre-activation z
    return a * (1 - a)

# Network architecture
input_size = 7
hidden_layer1_size = 10
hidden_layer2_size = 10
output_size = 10

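# Initialize weights and biases randomly.
# Each matrix has shape (n_in, n_out), the transpose of W^(l) in the derivation,
# because the code treats each sample as a row vector (x @ W).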
np.random.seed(42)  # For reproducibility
weights1 = np.random.randn(input_size, hidden_layer1_size)
bias1 = np.random.randn(hidden_layer1_size)
weights2 = np.random.randn(hidden_layer1_size, hidden_layer2_size)
bias2 = np.random.randn(hidden_layer2_size)
weights3 = np.random.randn(hidden_layer2_size, output_size)
bias3 = np.random.randn(output_size)
```

### Step 3: Implement Feedforward and Backpropagation Functions

The feedforward function calculates the activations at each layer, and the backpropagation function updates weights based on the calculated gradients.

```python
# Feedforward function
def feedforward(x):
    # Layer 1
    z1 = np.dot(x, weights1) + bias1
    a1 = sigmoid(z1)
    
    # Layer 2
    z2 = np.dot(a1, weights2) + bias2
    a2 = sigmoid(z2)
    
    # Output Layer
    z3 = np.dot(a2, weights3) + bias3
    output = sigmoid(z3)
    return output, a1, a2, z1, z2, z3

# Backpropagation function
def backpropagation(x, y, a1, a2, output, z1, z2, z3, learning_rate=0.1):
    # Declare the globals before first use: Python raises a SyntaxError if a
    # name is read earlier in the function than its `global` statement.
    global weights1, weights2, weights3, bias1, bias2, bias3

    # Output layer error and delta.
    # The constant 2/10 from the MSE derivative is absorbed into learning_rate.
    error_output = output - y
    delta_output = error_output * sigmoid_derivative(output)

    # Second hidden layer error and delta
    error_hidden2 = np.dot(delta_output, weights3.T)
    delta_hidden2 = error_hidden2 * sigmoid_derivative(a2)

    # First hidden layer error and delta
    error_hidden1 = np.dot(delta_hidden2, weights2.T)
    delta_hidden1 = error_hidden1 * sigmoid_derivative(a1)

    # Gradient descent weight updates (all deltas are computed before any update)
    weights3 -= learning_rate * np.dot(a2.T, delta_output)
    bias3 -= learning_rate * delta_output.sum(axis=0)
    
    weights2 -= learning_rate * np.dot(a1.T, delta_hidden2)
    bias2 -= learning_rate * delta_hidden2.sum(axis=0)
    
    weights1 -= learning_rate * np.dot(x.T, delta_hidden1)
    bias1 -= learning_rate * delta_hidden1.sum(axis=0)
    
    # Return the mean squared error for monitoring
    mse = np.mean(error_output**2)
    return mse
```

### Step 4: Training Loop with Loss Plotting

This loop performs feedforward and backpropagation across multiple epochs to train the network and plot the loss function.

```python
import matplotlib.pyplot as plt

# Training parameters
epochs = 1000
learning_rate = 0.1
loss_history = []

for epoch in range(epochs):
    epoch_loss = 0
    for i in range(len(inputs)):
        x = inputs[i].reshape(1, -1)  # Input vector for the digit
        y = labels[i].reshape(1, -1)  # One-hot encoded target output

        # Feedforward
        output, a1, a2, z1, z2, z3 = feedforward(x)

        # Backpropagation and loss calculation
        mse = backpropagation(x, y, a1, a2, output, z1, z2, z3, learning_rate)
        epoch_loss += mse

    # Record the average loss for this epoch
    loss_history.append(epoch_loss / len(inputs))
    
    # Print loss every 100 epochs
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {epoch_loss / len(inputs):.4f}')

# Plotting the loss over epochs
plt.plot(loss_history)
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error (Loss)')
plt.title('Loss Curve')
plt.show()
```

### Step 5: Evaluation Using Cross-Validation (Optional)

For evaluation metrics, we would ideally use a larger dataset with noise to observe performance changes. However, with the limited data here, we’ll skip cross-validation and simply validate by checking accuracy on our fixed patterns.
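
A minimal sketch of that check is shown below. The helper name `pattern_accuracy` is hypothetical; it accepts any callable that maps a `(1, 7)` input to a `(1, 10)` score array. So that the block runs on its own, it uses a stand-in predictor based on Hamming distance rather than the trained network.

```python
import numpy as np

# Clean 7-segment patterns (segments a-g) for digits 0-9
patterns = np.array([
    [1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1],
])

def pattern_accuracy(predict, inputs):
    """Fraction of rows whose arg-max score equals the row index (the digit)."""
    predicted = [int(np.argmax(predict(x.reshape(1, -1)))) for x in inputs]
    return float(np.mean(np.array(predicted) == np.arange(len(inputs))))

def nearest_pattern_scores(x):
    """Stand-in predictor: score each digit by negative Hamming distance."""
    return -np.abs(patterns - x).sum(axis=1, keepdims=True).T  # shape (1, 10)

acc = pattern_accuracy(nearest_pattern_scores, patterns)
print(f"Accuracy on the 10 clean patterns: {acc:.2f}")
```

With the trained network from Step 3, the call would be `pattern_accuracy(lambda x: feedforward(x)[0], inputs)`.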

### Summary
This approach initializes, trains, and visualizes the network for recognizing digits from 7-segment patterns.

To expand the dataset, we can add slight noise to the binary segment values, flipping a segment on or off randomly. This will help simulate minor imperfections in input and make the model more robust to variations in the input patterns.

After generating this noisy dataset, we’ll apply **K-fold cross-validation** to evaluate the model’s performance on multiple splits of the data.


### Step 1: Generate Noisy Dataset

1. **Introduce Noise**: For each digit pattern, we’ll generate multiple noisy samples by flipping the values of each segment with a small probability. For example, with a 10% probability, each segment could randomly switch from 1 to 0 or vice versa.
2. **Create New Samples**: For each original sample, we can generate multiple noisy versions (e.g., 10 per sample).

Here’s the code to do that:

```python
import pandas as pd
import numpy as np

# Original dataset
data = {
    'digit': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'a': [1, 0, 1, 1, 0, 1, 1, 1, 1, 1],
    'b': [1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
    'c': [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    'd': [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    'e': [1, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    'f': [1, 0, 0, 0, 1, 1, 1, 0, 1, 1],
    'g': [0, 0, 1, 1, 1, 1, 1, 0, 1, 1]
}
df = pd.DataFrame(data)

# Function to add noise to a binary vector
def add_noise(row, noise_level=0.1):
    noisy_row = row.copy()
    for i in range(1, 8):  # Only apply noise to segments (columns a-g)
        if np.random.rand() < noise_level:
            noisy_row[i] = 1 - noisy_row[i]  # Flip the bit
    return noisy_row

# Generate noisy dataset
noisy_samples_per_digit = 10  # Number of noisy samples per digit
noisy_data = []

for _, row in df.iterrows():
    for _ in range(noisy_samples_per_digit):
        noisy_data.append(add_noise(row.values))

# Convert the noisy data to a DataFrame
noisy_df = pd.DataFrame(noisy_data, columns=df.columns)
print(noisy_df.head())  # Display some samples of the noisy dataset

# Save dataset to CSV for inspection if needed
noisy_df.to_csv("noisy_7_segment_data.csv", index=False)
```

This code generates 10 noisy samples per digit, creating a larger dataset with slightly varied patterns.
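
The same bit-flipping idea can also be written without Python loops. The sketch below is an alternative, not the code used above: the function name `flip_segments`, its parameters, and the two-digit example input are assumptions for illustration.

```python
import numpy as np

def flip_segments(patterns, noise_level=0.1, copies=10, rng=None):
    """Return `copies` noisy versions of each row, plus the matching digit labels.

    Each segment is flipped independently with probability `noise_level`.
    Row i of `patterns` is assumed to encode digit i.
    """
    rng = np.random.default_rng() if rng is None else rng
    repeated = np.repeat(patterns, copies, axis=0)
    flips = rng.random(repeated.shape) < noise_level   # True where a bit is flipped
    noisy = np.where(flips, 1 - repeated, repeated)
    digits = np.repeat(np.arange(len(patterns)), copies)
    return noisy, digits

# Example with the clean patterns for digits 0 and 1
clean = np.array([
    [1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 0],
])
noisy, digits = flip_segments(clean, noise_level=0.1, copies=10,
                              rng=np.random.default_rng(42))
print(noisy.shape, digits.shape)
```

Passing a seeded generator makes the noisy dataset reproducible, which helps when comparing hyperparameter settings.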

---

### Step 2: K-Fold Cross-Validation

We’ll split the noisy dataset into **K folds** (e.g., $ K = 5 $) and then train and evaluate the model on each fold. For each fold, we’ll train on $ K-1 $ parts and test on the remaining part.

Here's how to perform **K-fold cross-validation** on this dataset:

1. **K-Fold Splitting**: Use `KFold` from `sklearn.model_selection` to create training and testing splits.
2. **Model Training and Evaluation**: For each fold, train the model on the training split and evaluate it on the test split.

```python
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Prepare inputs and labels for training
X = noisy_df.iloc[:, 1:].values  # Segment data
y = noisy_df['digit'].values     # Labels (digits)

# One-hot encode labels
Y = np.eye(10)[y]  # Convert digit labels to one-hot encoded vectors

# Define K-Fold Cross Validation
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Initialize lists to track performance metrics
fold_accuracies = []

# Perform K-Fold Cross Validation
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    # Split data
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    
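    # Re-initialize the network so each fold starts from untrained weights;
    # otherwise samples seen while training earlier folds leak into later test folds.
    weights1 = np.random.randn(input_size, hidden_layer1_size)
    bias1 = np.random.randn(hidden_layer1_size)
    weights2 = np.random.randn(hidden_layer1_size, hidden_layer2_size)
    bias2 = np.random.randn(hidden_layer2_size)
    weights3 = np.random.randn(hidden_layer2_size, output_size)
    bias3 = np.random.randn(output_size)

    # Train model
    epochs = 200
    learning_rate = 0.1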
    for epoch in range(epochs):
        epoch_loss = 0
        for i in range(len(X_train)):
            x = X_train[i].reshape(1, -1)  # Input vector for the digit
            y_true = Y_train[i].reshape(1, -1)  # One-hot encoded target output

            # Feedforward
            output, a1, a2, z1, z2, z3 = feedforward(x)

            # Backpropagation and loss calculation
            mse = backpropagation(x, y_true, a1, a2, output, z1, z2, z3, learning_rate)
            epoch_loss += mse
    
    # Evaluate model on the test set
    predictions = []
    for i in range(len(X_test)):
        x = X_test[i].reshape(1, -1)
        output, _, _, _, _, _ = feedforward(x)
        predictions.append(np.argmax(output))  # Convert output to digit prediction
    
    accuracy = accuracy_score(np.argmax(Y_test, axis=1), predictions)
    fold_accuracies.append(accuracy)
    print(f"Fold {fold + 1} - Accuracy: {accuracy:.4f}")

# Calculate the average accuracy across all folds
average_accuracy = np.mean(fold_accuracies)
print(f"Average Accuracy across {k} folds: {average_accuracy:.4f}")

# Plot fold accuracies
plt.plot(range(1, k + 1), fold_accuracies, marker='o', color='b')
plt.xlabel('Fold Number')
plt.ylabel('Accuracy')
plt.title('K-Fold Cross-Validation Accuracy')
plt.show()
```

---

### Explanation of the Code

1. **Adding Noise**:
   - We create noisy samples for each digit pattern, which helps the model generalize better to variations in input.

2. **K-Fold Cross-Validation**:
   - We split the data into `k=5` folds.
   - For each fold, we train the neural network on $ K-1 $ folds and test it on the remaining fold.
   - The accuracy for each fold is computed and stored.

3. **Evaluating the Model**:
   - The average accuracy across all folds is calculated to give a robust estimate of model performance.
   - We plot the accuracy for each fold to observe any variations.

---

### Summary

This process expands the dataset with noise, which is intended to make the neural network more robust. K-fold cross-validation helps assess the model's generalization by testing it on different portions of the dataset. Because the noisy samples are synthetic, the results indicate robustness to random segment flips rather than to every real-world variation in 7-segment display patterns.

To evaluate the model more comprehensively, we’ll calculate **Precision**, **Recall (Sensitivity)**, **Specificity**, and **F-Measure** for each fold in addition to accuracy. We’ll also compute the average of each metric across all folds for an overall performance summary.

To compute these metrics, we’ll use the following definitions for a multi-class classification problem:

1. **Precision**: The number of true positives (correct predictions for a specific digit) divided by the number of true positives and false positives (all predictions for that digit).
2. **Recall (Sensitivity)**: The number of true positives divided by the number of true positives and false negatives (all actual instances of that digit).
3. **Specificity**: The ability to correctly identify negatives, calculated for each class as the number of true negatives divided by the number of true negatives and false positives.
4. **F-Measure**: The harmonic mean of precision and recall.
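
To illustrate these definitions, the sketch below derives the per-class counts and metrics from a small, made-up 3-class confusion matrix (rows are true classes, columns are predicted classes). The numbers are illustrative only.

```python
import numpy as np

# Hypothetical confusion matrix: cm[i, j] = samples of true class i predicted as class j
cm = np.array([
    [5, 1, 0],
    [2, 3, 1],
    [0, 0, 4],
])

tp = np.diag(cm)                 # correct predictions per class
fp = cm.sum(axis=0) - tp         # predicted as the class but actually another
fn = cm.sum(axis=1) - tp         # belong to the class but predicted as another
tn = cm.sum() - tp - fp - fn     # everything else

precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

for c in range(len(cm)):
    print(f"class {c}: precision={precision[c]:.3f}, recall={recall[c]:.3f}, "
          f"specificity={specificity[c]:.3f}, F1={f1[c]:.3f}")
```

Averaging each array over the classes gives the macro-averaged value of that metric.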

Here’s how to implement it:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Initialize lists to track metrics for each fold
fold_accuracies = []
fold_precisions = []
fold_recalls = []
fold_specificities = []
fold_f1_scores = []

# Perform K-Fold Cross Validation
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    # Split data
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    
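    # Re-initialize the weights so this fold starts from an untrained network
    weights1 = np.random.randn(input_size, hidden_layer1_size)
    bias1 = np.random.randn(hidden_layer1_size)
    weights2 = np.random.randn(hidden_layer1_size, hidden_layer2_size)
    bias2 = np.random.randn(hidden_layer2_size)
    weights3 = np.random.randn(hidden_layer2_size, output_size)
    bias3 = np.random.randn(output_size)

    # Train model (same as before)
    epochs = 200
    learning_rate = 0.1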
    for epoch in range(epochs):
        for i in range(len(X_train)):
            x = X_train[i].reshape(1, -1)  # Input vector for the digit
            y_true = Y_train[i].reshape(1, -1)  # One-hot encoded target output

            # Feedforward
            output, a1, a2, z1, z2, z3 = feedforward(x)

            # Backpropagation and loss calculation
            backpropagation(x, y_true, a1, a2, output, z1, z2, z3, learning_rate)

    # Evaluate model on the test set
    predictions = []
    for i in range(len(X_test)):
        x = X_test[i].reshape(1, -1)
        output, _, _, _, _, _ = feedforward(x)
        predictions.append(np.argmax(output))  # Convert output to digit prediction

    y_true = np.argmax(Y_test, axis=1)
    y_pred = np.array(predictions)
    
    # Calculate metrics for this fold
    accuracy = accuracy_score(y_true, y_pred)
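    precision = precision_score(y_true, y_pred, average='macro', zero_division=0)
    recall = recall_score(y_true, y_pred, average='macro', zero_division=0)
    f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
    
    # Specificity calculation
    # Pass the labels explicitly so the matrix is always 10x10, even if a digit
    # is absent from this fold's test split and predictions.
    cm = confusion_matrix(y_true, y_pred, labels=list(range(10)))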
    specificity = []
    for i in range(10):
        tn = np.sum(cm) - (np.sum(cm[i, :]) + np.sum(cm[:, i]) - cm[i, i])
        fp = np.sum(cm[:, i]) - cm[i, i]
        specificity.append(tn / (tn + fp))
    avg_specificity = np.mean(specificity)

    # Store metrics for each fold
    fold_accuracies.append(accuracy)
    fold_precisions.append(precision)
    fold_recalls.append(recall)
    fold_specificities.append(avg_specificity)
    fold_f1_scores.append(f1)
    
    print(f"Fold {fold + 1} - Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, Specificity: {avg_specificity:.4f}, F1 Score: {f1:.4f}")

# Calculate average metrics across all folds
average_accuracy = np.mean(fold_accuracies)
average_precision = np.mean(fold_precisions)
average_recall = np.mean(fold_recalls)
average_specificity = np.mean(fold_specificities)
average_f1_score = np.mean(fold_f1_scores)

print("\n--- Average Metrics Across All Folds ---")
print(f"Average Accuracy: {average_accuracy:.4f}")
print(f"Average Precision: {average_precision:.4f}")
print(f"Average Recall (Sensitivity): {average_recall:.4f}")
print(f"Average Specificity: {average_specificity:.4f}")
print(f"Average F1 Score: {average_f1_score:.4f}")

# Plotting for each fold
plt.figure(figsize=(10, 6))
plt.plot(range(1, k + 1), fold_accuracies, label="Accuracy", marker='o', color='b')
plt.plot(range(1, k + 1), fold_precisions, label="Precision", marker='o', color='g')
plt.plot(range(1, k + 1), fold_recalls, label="Recall", marker='o', color='r')
plt.plot(range(1, k + 1), fold_specificities, label="Specificity", marker='o', color='purple')
plt.plot(range(1, k + 1), fold_f1_scores, label="F1 Score", marker='o', color='orange')
plt.xlabel('Fold Number')
plt.ylabel('Metric Value')
plt.title('Performance Metrics Across Folds')
plt.legend()
plt.show()
```

---

### Explanation of Additional Code

1. **Calculating Metrics**:
   - `precision_score`, `recall_score`, and `f1_score` are calculated using `average='macro'` to treat each class equally, suitable for multi-class classification.
   - **Specificity**: Specificity is calculated manually using the confusion matrix, as it’s not directly available in `sklearn.metrics` for multi-class problems. Specificity for each class $ i $ is calculated as $ \frac{TN}{TN + FP} $, where `TN` (True Negatives) and `FP` (False Positives) are derived from the confusion matrix.
   
2. **Storing Fold Metrics**:
   - Each fold’s metrics are stored in respective lists, allowing us to compute the average of each metric across all folds at the end.

3. **Visualization**:
   - The plot shows how each metric varies across the folds, providing a visual representation of the model's performance consistency.

---

### Summary of Results

The above code gives the following:
- **Accuracy**, **Precision**, **Recall**, **Specificity**, and **F1 Score** for each fold.
- **Average metrics** across all folds, offering an overall evaluation of model performance.
- **Plot** displaying the consistency and variance of these metrics across the folds.

This approach provides a well-rounded assessment of the model’s performance on noisy data and a check on how robustly it recognizes digits on a 7-segment display.

## Part B

### Step 1: Load and Preprocess Data

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load CSV dataset
data = pd.read_csv("alphabet_dataset.csv")

# Separate features and labels
X = data.iloc[:, :-26].values  # 25 binary values per row
y = data.iloc[:, -26:].values  # One-hot encoded labels (26 columns)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Step 2: Define the Feedforward Neural Network Model

We’ll create a model with:
- **Two hidden layers** (with 32 and 16 neurons respectively)
- **Sigmoid activation** function
- **MSE loss** function for training

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the model
model = Sequential([
    Dense(32, input_dim=25, activation='sigmoid'),  # First hidden layer with 32 neurons
    Dense(16, activation='sigmoid'),                # Second hidden layer with 16 neurons
    Dense(26, activation='sigmoid')                 # Output layer with 26 neurons for A-Z classification
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
```
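
A quick way to gauge the size of this architecture is to count its trainable parameters: each dense layer has one weight per input-output pair plus one bias per output. The sketch below does the arithmetic in plain Python; `layer_sizes` simply restates the 25-32-16-26 layout above.

```python
# Layer widths: 25 inputs, two hidden layers (32 and 16), 26 outputs
layer_sizes = [25, 32, 16, 26]

# (n_in weights + 1 bias) for each of the n_out neurons in a layer
params_per_layer = [
    (n_in + 1) * n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [832, 528, 442]
print(total_params)      # 1802
```

Calling `model.summary()` on the Keras model reports the same per-layer counts.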

### Step 3: Train the Model

We’ll train the model using the **training set** and monitor the loss and accuracy during training.

```python
# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_split=0.1)
```

### Step 4: Evaluate the Model

After training, we evaluate the model on the test set and calculate metrics like **accuracy**, **precision**, **recall**, and **F1 score**.

```python
# Predict on the test set
y_pred = model.predict(X_test)

# Convert predictions and labels to class labels from one-hot encoding
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

# Calculate metrics
accuracy = accuracy_score(y_test_classes, y_pred_classes)
precision = precision_score(y_test_classes, y_pred_classes, average='weighted')
recall = recall_score(y_test_classes, y_pred_classes, average='weighted')
f1 = f1_score(y_test_classes, y_pred_classes, average='weighted')

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
```

### Step 5: Plot Training Loss vs. Epochs

To analyze convergence, we’ll plot the training loss and accuracy over epochs.

```python
import matplotlib.pyplot as plt

# Plot training & validation loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error Loss')
plt.legend()
plt.title('Training and Validation Loss Over Epochs')
plt.show()

# Plot training & validation accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training and Validation Accuracy Over Epochs')
plt.show()
```

### Step 6: Cross-Validation and Metrics Calculation

For **N-Fold Cross-Validation**, you can wrap the model training in a loop, splitting the data for each fold. Below is an example using **5-Fold Cross Validation**:

```python
from sklearn.model_selection import KFold

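# Initialize 5-Fold Cross Validator (shuffle so that data ordered by letter
# does not place whole classes into a single fold)
kf = KFold(n_splits=5, shuffle=True, random_state=42)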
fold_accuracy, fold_precision, fold_recall, fold_f1 = [], [], [], []

for train_index, val_index in kf.split(X):
    # Create train/val split for this fold
    X_train_fold, X_val_fold = X[train_index], X[val_index]
    y_train_fold, y_val_fold = y[train_index], y[val_index]
    
    # Define and train the model for each fold
    model = Sequential([
        Dense(32, input_dim=25, activation='sigmoid'),
        Dense(16, activation='sigmoid'),
        Dense(26, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
    model.fit(X_train_fold, y_train_fold, epochs=50, batch_size=16, verbose=0)

    # Evaluate on the validation set
    y_val_pred = model.predict(X_val_fold)
    y_val_pred_classes = np.argmax(y_val_pred, axis=1)
    y_val_classes = np.argmax(y_val_fold, axis=1)

    # Calculate metrics for this fold
    fold_accuracy.append(accuracy_score(y_val_classes, y_val_pred_classes))
    fold_precision.append(precision_score(y_val_classes, y_val_pred_classes, average='weighted'))
    fold_recall.append(recall_score(y_val_classes, y_val_pred_classes, average='weighted'))
    fold_f1.append(f1_score(y_val_classes, y_val_pred_classes, average='weighted'))

# Average metrics across folds
print(f"Cross-Validated Accuracy: {np.mean(fold_accuracy)}")
print(f"Cross-Validated Precision: {np.mean(fold_precision)}")
print(f"Cross-Validated Recall: {np.mean(fold_recall)}")
print(f"Cross-Validated F1 Score: {np.mean(fold_f1)}")
```

### Summary

1. **Model Design**: A feedforward neural network with two hidden layers and sigmoid activation for each layer.
2. **Training and Loss Calculation**: Using Mean Squared Error as the loss function.
3. **Hyperparameter Tuning**: Possible adjustments include the number of neurons, batch size, and learning rate.
4. **Cross-Validation**: 5-Fold Cross-Validation to ensure model stability and performance consistency.
5. **Metrics and Convergence Analysis**: Accuracy, precision, recall, F1 score, and plotting the convergence.

This code provides a full end-to-end approach for building a neural network that recognizes the letters A-Z from a 5x5 grid representation. Adjust hyperparameters as needed for optimal performance.