# Neural Networks

Neural networks are similar to perceptrons but much more powerful. They consist of multiple layers of interconnected nodes (called neurons), each performing a simple computation and passing the result forward. By stacking many layers, neural networks can learn complex, nonlinear patterns from feature data.



![Neural Network Diagram](Neural-Networks-Architecture.png)

(From GeeksForGeeks.org)

## Basic Components

1. **Neurons**:  
   Each neuron takes one or more inputs and produces an output. A neuron typically:
   - Multiplies each input by a corresponding weight.
   - Sums all these weighted inputs together and adds a bias term.
   - Passes this sum through a nonlinear activation function (e.g., ReLU, sigmoid) to produce the neuron’s output.

2. **Layers**:  
   Neurons are arranged in layers:
   - **Input Layer**: Receives the raw data.
   - **Hidden Layers**: One or more layers that process intermediate representations. Deeper networks (with more hidden layers) can extract more complex features.
   - **Output Layer**: Produces the final prediction or classification score.

3. **Weights and Biases**:  
   The learnable parameters of a neural network are the weights and biases. During training, these parameters are iteratively adjusted to minimize a chosen loss function, thereby improving the model’s performance.
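The computation a single neuron performs can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a sigmoid activation; the function names and the input, weight, and bias values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = np.dot(weights, inputs) + bias  # weighted sum plus bias (pre-activation)
    return sigmoid(z)                   # nonlinear activation

# Illustrative values only
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -0.25, 0.0])
bias = 0.0

# z = 0.5*1 - 0.25*2 + 0*3 + 0 = 0, and sigmoid(0) = 0.5
output = neuron_output(inputs, weights, bias)
print(output)
```

Training changes only `weights` and `bias`; the structure of the computation stays the same.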

## Math of Neural Networks: Linear Transformations and Nonlinearities

A single layer of neurons without activation functions can be viewed as a linear transformation. Consider one layer with:
- Input vector $ \mathbf{x} $ of shape $ (n,) $
- Weight matrix $ \mathbf{W} $ of shape $ (m \times n) $
- Bias vector $ \mathbf{b} $ of shape $ (m,) $

The computation for the layer’s output $ \mathbf{z} $ (also known as the pre-activation) is:
$$
\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}
$$

In matrix form, if you have a batch of inputs $ \mathbf{X} $ of shape $ (N \times n) $, where $ N $ is the batch size, the output for the entire batch is:
$$
\mathbf{Z} = \mathbf{X}\mathbf{W}^T + \mathbf{b}
$$
Here $ \mathbf{Z} $ has shape $ (N \times m) $.
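The shapes in the batch formula are easy to check numerically. The sketch below uses random stand-in values (the sizes `N`, `n`, and `m` are arbitrary choices) and confirms that each row of $ \mathbf{Z} $ equals the single-example result $ \mathbf{W}\mathbf{x} + \mathbf{b} $.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 4, 3, 2             # batch size, input features, neurons in the layer

X = rng.normal(size=(N, n))   # batch of inputs, one example per row
W = rng.normal(size=(m, n))   # weight matrix
b = rng.normal(size=(m,))     # bias vector, broadcast across the batch rows

Z = X @ W.T + b               # pre-activations for the whole batch, shape (N, m)

# The first row of Z should match the single-example formula z = W x + b
z_first = W @ X[0] + b
print(Z.shape, np.allclose(Z[0], z_first))
```

The transpose appears only because the examples are stored as rows of $ \mathbf{X} $; the arithmetic per example is unchanged.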

After computing $ \mathbf{z} $, we apply a nonlinear activation function $ \sigma(\cdot) $, resulting in the final output $ \mathbf{a} $:
$$
\mathbf{a} = \sigma(\mathbf{z})
$$

There are many activation functions, but we will mostly consider the sigmoid function:
- **Sigmoid**: $ \sigma(z) = \frac{1}{1 + e^{-z}} $

Some others include:
- **ReLU**: $ \sigma(z) = \max(0, z) $
- **Tanh**: $ \sigma(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} $
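The three activation functions above translate directly into NumPy. This is a small sketch; the helper names are our own, and `sigmoid_prime` is included because the derivative of the sigmoid, $ \sigma'(z) = \sigma(z)(1 - \sigma(z)) $, is needed for backpropagation.

```python
import numpy as np

def sigmoid(z):
    """Maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    """Keeps positive values and replaces negative values with 0."""
    return np.maximum(0.0, z)

def tanh(z):
    """Maps any real input into (-1, 1)."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
print(tanh(z))
```

All three functions operate element-wise, so they accept a single number, a vector, or a whole batch matrix.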

## Multiple Layers and Deep Architectures

By stacking multiple layers of the form:
$$
\mathbf{a}^{(1)} = \sigma(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)})
$$
$$
\mathbf{a}^{(2)} = \sigma(\mathbf{W}^{(2)}\mathbf{a}^{(1)} + \mathbf{b}^{(2)})
$$
$$
\ldots
$$
$$
\mathbf{a}^{(L)} = \sigma(\mathbf{W}^{(L)}\mathbf{a}^{(L-1)} + \mathbf{b}^{(L)})
$$

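we obtain a network whose output is a nonlinear function of $ \mathbf{x} $. The activation $ \sigma $ between layers is what makes this possible, and it is why deep networks can approximate functions that are not linear.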
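To see why the activation function matters, consider what happens to two stacked layers if $ \sigma $ is removed. The short derivation below uses the same symbols as the layer equations above:

$$
\mathbf{W}^{(2)}\left(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right) + \mathbf{b}^{(2)} = \left(\mathbf{W}^{(2)}\mathbf{W}^{(1)}\right)\mathbf{x} + \left(\mathbf{W}^{(2)}\mathbf{b}^{(1)} + \mathbf{b}^{(2)}\right)
$$

The right-hand side is a single layer with weight matrix $ \mathbf{W}^{(2)}\mathbf{W}^{(1)} $ and bias $ \mathbf{W}^{(2)}\mathbf{b}^{(1)} + \mathbf{b}^{(2)} $, so without a nonlinearity any number of stacked layers collapses into one linear transformation.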

## Training: Adjusting Parameters via Gradient Descent

Training a neural network involves:
1. **Forward Pass**: Compute predictions for given inputs.
2. **Loss Computation**: Compare predictions with true labels to obtain a loss value.
3. **Backward Pass (Backpropagation)**: Compute the gradients of the loss with respect to each weight and bias.
4. **Update Parameters**: Adjust weights and biases in the direction that reduces the loss (often using gradient descent or more sophisticated optimizers like Adam).

Over epochs, the network parameters are refined and the error generally decreases. 
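The four steps can be sketched for the smallest possible network: a single sigmoid neuron trained with full-batch gradient descent on a mean squared error loss. The toy OR dataset, the learning rate, and the number of epochs are assumptions chosen for illustration, not values used elsewhere in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumed for illustration): the logical OR of two binary inputs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # small random initial weights
b = 0.0
learning_rate = 1.0
losses = []

for epoch in range(2000):
    # 1. Forward pass
    a = sigmoid(X @ w + b)
    # 2. Loss computation (mean squared error)
    losses.append(np.mean((a - y) ** 2))
    # 3. Backward pass: chain rule through the loss and the sigmoid
    dz = 2.0 * (a - y) * a * (1.0 - a) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()
    # 4. Parameter update: step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

predictions = (sigmoid(X @ w + b) > 0.5).astype(float)
print("first loss:", losses[0], "last loss:", losses[-1])
print("predictions:", predictions)
```

A network with hidden layers follows the same loop; only the forward and backward passes involve more layers.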

### Be Careful!
Neural networks can overfit. Because they can represent highly nonlinear functions, a network that keeps training for too long can memorize the training set, noise included, instead of learning patterns that generalize to new data.
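One common guard against this, not used elsewhere in this notebook, is early stopping: track the loss on held-out validation data and stop once it has failed to improve for a few epochs. The helper below is a hypothetical sketch; the function name, the `patience` parameter, and the loss values are all made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has gone `patience` epochs
    without beating its best value so far. If that never happens, the
    index of the last epoch is returned.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: they improve until epoch 3, then rise
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)  # 6: three epochs after the best loss at epoch 3
```

In practice the parameters saved at the best epoch (epoch 3 here) would be the ones kept.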

---



Almost any differentiable cost function can be used to train a neural network; here we use the mean squared error (MSE) cost function.
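As a concrete reference, the MSE and its gradient with respect to the predictions can be written as follows. This sketch averages over all entries without a factor of 1/2; some texts include that factor, which only rescales the gradient. The function names and sample values are illustrative.

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between predictions y_hat and targets y."""
    return np.mean((y_hat - y) ** 2)

def mse_gradient(y_hat, y):
    """Gradient of the MSE with respect to each prediction."""
    return 2.0 * (y_hat - y) / y_hat.size

y_hat = np.array([0.8, 0.2, 0.6])  # illustrative predictions
y = np.array([1.0, 0.0, 1.0])      # illustrative targets

# Squared errors are 0.04, 0.04, 0.16, so the mean is 0.08
print(mse(y_hat, y))
print(mse_gradient(y_hat, y))
```

The gradient is what backpropagation starts from at the output layer.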

## Stochastic Gradient Descent

Stochastic gradient descent in a neural network uses backpropagation to compute the error at every layer, and then uses those errors to update the weights and biases:

**Definitions**
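- $L = $ number of layers
- $\ell = $ layer index
- $w^\ell = $ weight matrix of layer $\ell$
- $b^\ell = $ bias vector of layer $\ell$
- $\delta^\ell = $ error of layer $\ell$ (the gradient of the cost with respect to $z^\ell$)
- $C = $ the cost function
- $\sigma = $ the sigmoid function, with derivative $\sigma'$
- $\otimes = $ the element-wise product of two vectors
- $\mathbf{z} = $ pre-activation vector (the result of a linear transformation applied to the input data or the output from the previous layer)
- $\mathbf{a} = $ post-activation vector (the result of applying the nonlinear activation function $\sigma(\cdot)$, or whichever activation function you are using, to the pre-activation vector)
- $\hat{y}^{(i)} = $ the network's output vector for the input $x^{(i)}$, that is, $a^L$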

**Initialize.** Given a feature vector $ x^{(i)} $, let $ a^0 = x^{(i)} $.

**Feedforward.** For $\ell = 1,\ldots,L$:
$$
z^\ell = w^\ell a^{\ell-1} + b^\ell \quad \text{and} \quad a^\ell = \sigma(z^\ell).
$$

**Output error.** Compute:
$$
\delta^L = \nabla_a C \otimes \sigma'(z^L).
$$

**Backpropagate.** For $\ell = L-1,\ldots,1$:
$$
\delta^\ell = ((w^{\ell+1})^T \delta^{\ell+1}) \otimes \sigma'(z^\ell).
$$

**Update parameters.** For $\ell = L,L-1,\ldots,1$, with learning rate $\alpha$:
$$
w^\ell \leftarrow w^\ell - \alpha\, \delta^\ell (a^{\ell-1})^T,
$$
$$
b^\ell \leftarrow b^\ell - \alpha\, \delta^\ell.
$$
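The steps above can be sketched in NumPy for a single training example. This is a minimal illustration, not the implementation used later in this notebook. It assumes sigmoid activations in every layer and the cost $C = \frac{1}{2}\lVert a^L - y \rVert^2$, for which $\nabla_a C = a^L - y$. The function `sgd_step`, the layer sizes, the example values, and the learning rate are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def sgd_step(weights, biases, x, y, learning_rate):
    """Run one SGD update on a single example and return its cost.

    weights[l] has shape (n_out, n_in); biases[l], x, and y are column
    vectors. Assumes C = 0.5 * ||a^L - y||^2, so grad_a C = a^L - y.
    The cost is measured before the update is applied.
    """
    # Initialize and feed forward, storing every z and a
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        zs.append(w @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))

    # Output error
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    deltas = [delta]

    # Backpropagate the error through the earlier layers
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigmoid_prime(zs[l])
        deltas.insert(0, delta)

    # Update every layer only after all errors are known
    for l in range(len(weights)):
        weights[l] -= learning_rate * deltas[l] @ activations[l].T
        biases[l] -= learning_rate * deltas[l]

    return 0.5 * float(np.sum((activations[-1] - y) ** 2))

# A hypothetical 2-3-1 network repeatedly fitted to one made-up example
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]
x = np.array([[0.5], [-1.0]])
y = np.array([[1.0]])

losses = [sgd_step(weights, biases, x, y, learning_rate=0.5) for _ in range(200)]
print("first cost:", losses[0], "last cost:", losses[-1])
```

Computing all the errors before touching any weights matters: the backpropagation step for layer $\ell$ must use the old value of $w^{\ell+1}$, not the updated one.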


In [45]:
## Convert Data into Grayscale Matrices for Pixel Values

In [None]:
import os
import numpy as np
from PIL import Image
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

root_dir = 'faces_4'
HEIGHT = 30
WIDTH = 32

image_arrays = []
labels_userid = []
labels_pose = []
labels_expression = []
labels_eyes = []

# Load images and extract labels
for subdir, dirs, files in os.walk(root_dir):
    for file_name in files:
        if file_name.endswith('.pgm'):
            # Parse labels from the filename
            parts = file_name.split('_')
            userid = parts[0]
            pose = parts[1]
            expression = parts[2]
            eyes = parts[3]
            # scale_with_ext = parts[4] # Not needed if all are scale=4

            img_path = os.path.join(subdir, file_name)
            with Image.open(img_path).convert('L') as img:
                arr = np.array(img, dtype=np.float32)

                # Optional: Ensure correct size by resizing if needed.
                # (Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.)
                # img = img.resize((WIDTH, HEIGHT), Image.LANCZOS)
                # arr = np.array(img, dtype=np.float32)

                if arr.shape == (HEIGHT, WIDTH):
                    image_arrays.append(arr)
                    labels_userid.append(userid)
                    labels_pose.append(pose)
                    labels_expression.append(expression)
                    labels_eyes.append(eyes)
                else:
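                    print(f"Skipping {file_name} due to incorrect size {arr.shape}")

# Assumed step, implied by the printed output below and by the next cell,
# which loads 'data_array.npy': stack the images, save them, and report shapes.
data_array = np.array(image_arrays)
np.save('data_array.npy', data_array)

print("Data shape:", data_array.shape)
print("User IDs:", np.array(labels_userid).shape)
print("Poses:", np.array(labels_pose).shape)
print("Expressions:", np.array(labels_expression).shape)
print("Eyes:", np.array(labels_eyes).shape)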

Data shape: (624, 30, 32)
User IDs: (624,)
Poses: (624,)
Expressions: (624,)
Eyes: (624,)


In [47]:
import numpy as np
from sklearn.model_selection import train_test_split

# data_array: shape (M, 30, 32), i.e. M images of HEIGHT x WIDTH pixels
data_array = np.load('data_array.npy')

# Each image carries four labels parsed from its filename:
# (userid, pose, expression, eyes). Zip the four lists so that
# labels[i] is the 4-tuple of strings describing data_array[i].
labels = list(zip(labels_userid, labels_pose, labels_expression, labels_eyes))

# Split the data into train and test sets
train_X, test_X, train_y, test_y = train_test_split(
    data_array, 
    labels, 
    test_size=0.2,     # 20% of data goes to test set
    random_state=42     # For reproducibility
)

print("Training Data Shape (X):", train_X.shape) # (M_train, 30, 32)
print("Training Labels Shape (y):", np.array(train_y).shape) # (M_train, 4)
print("Test Data Shape (X):", test_X.shape)       # (M_test, 30, 32)
print("Test Labels Shape (y):", np.array(test_y).shape)      # (M_test, 4)

print(train_X[0])
for i in range(len(train_X)):
    print(f"Image {i}: Max value = {np.max(train_X[i])}, Min value = {np.min(train_X[i])}")


Training Data Shape (X): (499, 30, 32)
Training Labels Shape (y): (499, 4)
Test Data Shape (X): (125, 30, 32)
Test Labels Shape (y): (125, 4)
[[ 68.  65.  61.  58.  55.  51.  48.  45.  43.  36.  38.  38.  48.  43.
   33.  31.  36.  43.  45.  48.  51.  53.  56.  60.  63.  65.  68.  70.
   71.  71.  71.  71.]
 [ 68.  65.  61.  60.  56.  53.  50.  48.  45.  43.  46.  43.  50.  46.
   43.  36.  45.  50.  51.  53.  56.  58.  61.  65.  66.  66.  70.  70.
   71.  71.  73.  73.]
 [ 68.  66.  63.  61.  58.  55.  51.  50.  48.  46.  51.  48.  48.  51.
   48.  43.  51.  55.  56.  58.  60.  61.  63.  65.  66.  68.  70.  71.
   71.  73.  73.  73.]
 [ 70.  66.  65.  63.  60.  56.  55.  53.  50.  48.  56.  51.  51.  55.
   53.  50.  56.  58.  60.  61.  63.  63.  66.  68.  70.  73.  73.  73.
   73.  73.  75.  75.]
 [ 70.  68.  66.  65.  61.  60.  58.  55.  53.  51.  60.  55.  55.  58.
   56.  58.  65.  63.  63.  63.  66.  68.  68.  71.  73.  75.  73.  75.
   75.  75.  75.  75.]
 [ 71.  68.  66.  66.  

The training and testing features are stored in `train_X` and `test_X`; each image is a 30 x 32 array of grayscale values. Each image has four labels (user ID, pose, expression, and eyes), so every entry of `train_y` and `test_y` is a 4-element tuple, which is why the label arrays have shapes (499, 4) and (125, 4).

## Image Flattening

Next, we flatten each 30 x 32 grayscale matrix, turning the image's pixel grid into a single vector with 960 entries (30 x 32 = 960).
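Flattening is a single `reshape` call in NumPy. The sketch below uses a stand-in batch of five made-up images rather than the face data, and checks where a given pixel ends up in the flattened vector.

```python
import numpy as np

# Stand-in batch (assumed for illustration): 5 images of 30 x 32 pixels
images = np.arange(5 * 30 * 32, dtype=np.float32).reshape(5, 30, 32)

# Keep the image axis and merge the two pixel axes into one
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (5, 960)

# NumPy flattens row by row, so pixel (row r, column c) lands at index r * 32 + c
r, c = 1, 2
print(flat[0, r * 32 + c] == images[0, r, c])
```

The `-1` lets NumPy infer the 960 from the remaining dimensions, so the same line works for other image sizes.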

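## Binarization and One-Hot Encoding

We binarize each flattened image by thresholding its grayscale values, which range from 0 to 255: pixels brighter than 128 become 1 and all others become 0. The class labels are then one-hot encoded, so that each label becomes a vector with a single 1 in the position of its class.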
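The two encodings involved in this step can be illustrated on tiny arrays: thresholding pixel values to 0 or 1, and one-hot encoding string class labels. This sketch uses only NumPy, and the pixel values and pose labels are invented for the example.

```python
import numpy as np

# Thresholding: pixels above 128 become 1.0, everything else 0.0
pixels = np.array([12.0, 200.0, 128.0, 129.0])
binary = (pixels > 128).astype(np.float32)
print(binary)  # [0. 1. 0. 1.]

# One-hot encoding: map each string label to an integer index,
# then pick the matching row of an identity matrix
labels = np.array(["left", "right", "up", "left"])  # hypothetical labels
classes, indices = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(classes), dtype=np.float32)[indices]

print(classes)  # ['left' 'right' 'up']
print(one_hot)
```

String labels must be mapped to integer indices before they can be one-hot encoded, which is the job `np.unique(..., return_inverse=True)` does here.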

In [48]:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical  # or np.eye if you prefer manual encoding

# Assume you have:
# data_array: shape (M, 30, 32), pixel values in [0..255]
# labels: list of M tuples (userid, pose, expression, eyes), all strings

# 1. Convert data to binary:
# Threshold: pixels > 128 become 1; else 0
binary_data = (data_array > 128).astype(np.float32)  # Now every pixel is 0.0 or 1.0

# 2. Flatten the images:
# Each image is 30x32 = 960 pixels
M = binary_data.shape[0]
flattened_data = binary_data.reshape(M, -1)  # shape (M, 960)

# 3. Split into train and test sets:
# The labels are still tuples of strings at this point; the split keeps them aligned with the pixels
train_X, test_X, train_y, test_y = train_test_split(
    flattened_data, 
    labels, 
    test_size=0.2, 
    random_state=42
)

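# 4. One-hot encode the labels:
# to_categorical needs integer class indices, but each label here is a tuple of
# strings (userid, pose, expression, eyes); passing the tuples directly raises
# "ValueError: invalid literal for int() with base 10: 'steffi'".
# Assumption: this sketch uses the userid (position 0 of each tuple) as the
# target; change TARGET to 1, 2, or 3 to predict pose, expression, or eyes.
from sklearn.preprocessing import LabelEncoder

TARGET = 0
encoder = LabelEncoder().fit([label[TARGET] for label in labels])
train_y_int = encoder.transform([label[TARGET] for label in train_y])
test_y_int = encoder.transform([label[TARGET] for label in test_y])

num_classes = len(encoder.classes_)
train_y_oh = to_categorical(train_y_int, num_classes=num_classes)
test_y_oh = to_categorical(test_y_int, num_classes=num_classes)

print("train_X shape:", train_X.shape)  # (M_train, 960)
print("train_y_oh shape:", train_y_oh.shape)  # (M_train, num_classes)
print("test_X shape:", test_X.shape)    # (M_test, 960)
print("test_y_oh shape:", test_y_oh.shape)   # (M_test, num_classes)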