
# **HW 3.7: Neural Networks**

Concepts covered:


* 3.7.1. Mathematical formulation
* 3.7.2. Activation functions
* 3.7.3. Cost function
* 3.7.4. Backpropagation
* 3.7.5. Backpropagation algorithm

# 3.7.1. Mathematical formulation

Artificial neural networks (ANNs) are computational models inspired by the structure of biological neural networks. They consist of layers of interconnected nodes (neurons) that learn from data to perform tasks such as classification or regression. In a basic neural network, inputs $x_1$ and $x_2$ are combined with weights $w_1$ and $w_2$ and a bias $b$ into $z = w_1 x_1 + w_2 x_2 + b$, and an activation function $\sigma$ yields the output $\hat{y} = \sigma(z)$. The network adjusts its weights and biases using a learning algorithm that minimizes a cost function, gradually bringing the output closer to the target values.

For a neural network with multiple layers, the values in layer $l$ are calculated from the values in the previous layer $l-1$ using weights and biases: $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$, where $W^{(l)}$ is the weight matrix and $b^{(l)}$ is the bias vector. The activation function is then applied to give the node values $a^{(l)} = \sigma(z^{(l)})$ for layer $l$. Because $\sigma$ is non-linear, stacking layers allows the network to represent complex, non-linear mappings from inputs to outputs.

# Code description

This code calculates the output of a simple neural network layer consisting of a single neuron with two inputs, two weights, a bias, and an activation function. It computes the linear combination $$ z = w_1 \cdot x_1 + w_2 \cdot x_2 + b $$ and then applies the sigmoid activation function $$ \hat{y} = \frac{1}{1 + e^{-z}} $$ to produce the final output. The result, $\hat{y}$, is the activated output of the layer for the specified parameters.

In [1]:
import numpy as np

# Input values (two arbitrary examples)
x1 = 1.5
x2 = 2.0

# Weights and bias (also arbitrary)
w1 = 0.6  # weight for x1
w2 = 0.4  # weight for x2
b = 0.2   # bias term

# Activation function (sigmoid in this example)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Linear combination of inputs and weights, plus bias
z = w1 * x1 + w2 * x2 + b

# Apply the sigmoid activation function
y_hat = sigmoid(z)

print("The output of the neural network layer is:", y_hat)


The output of the neural network layer is: 0.8698915256370021
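
The single-neuron calculation above is the special case of the layer formula $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ with one row in $W$. As a minimal sketch of the multi-layer case, the code below applies that formula layer by layer. The 2-3-1 layer sizes, the random weights, and the helper name `forward` are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Apply z = W a + b followed by a = sigmoid(z), one layer at a time."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b   # linear combination for this layer
        a = sigmoid(z)  # node values passed on to the next layer
    return a

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden nodes, 1 output (arbitrary values)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]

x = np.array([1.5, 2.0])
y_hat = forward(x, weights, biases)
print("Network output:", y_hat)
```

Calling `forward(x, [np.array([[0.6, 0.4]])], [np.array([0.2])])` reduces to the single-neuron example, since the one-row weight matrix holds $w_1$ and $w_2$.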


# 3.7.2. Activation functions and 3.7.3. Cost function

In neural networks, **activation functions** play a key role by determining the output of each neuron in response to inputs, often tailored to tasks like classification. Activation functions simulate the "firing" mechanism in biological neurons and are denoted by $\sigma$. For a given layer, the activation output is represented by

$$ a^{(l)} = \sigma \left( z^{(l)} \right) = \sigma \left( W^{(l)} a^{(l-1)} + b^{(l)} \right) . $$

Common activation functions include:

* **step function** $ \sigma(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} $, which outputs a binary value and is useful for classification tasks.
* **ReLU function** $ \sigma(x) = \max(0, x) $, which is widely used in deep networks for its efficiency: it passes positive signals unchanged and blocks negative ones entirely.
* **sigmoid function** $ \sigma(x) = \frac{1}{1 + e^{-x}} $, which outputs values between 0 and 1 and is often used for probabilistic interpretations.
* **softmax function**, which transforms a vector into probabilities across multiple classes:

$$ \sigma(z)_k = \frac{e^{z_k}}{\sum_{j=1}^K e^{z_j}} . $$

To train the network, a **cost function** $ J $ is used, such as least squares for regression:

$$ J = \frac{1}{2} \sum_{n=1}^N \sum_{k=1}^K \left( \hat{y}_k^{(n)} - y_k^{(n)} \right)^2 , $$

or cross-entropy for binary classification:

$$ J = - \sum_{n=1}^N \left( y^{(n)} \ln \hat{y}^{(n)} + (1 - y^{(n)}) \ln (1 - \hat{y}^{(n)}) \right) . $$

These functions drive the learning process by optimizing network parameters through methods like gradient descent, enabling the model to accurately capture patterns in the data.

# Code description

This code demonstrates the application of common activation functions used in neural networks, including the step function, ReLU, sigmoid, and softmax, to a sample input array. Each activation function transforms the input values in a distinct way: the step function outputs binary values, ReLU blocks negative values, sigmoid squashes the input between 0 and 1, and softmax converts a vector of values into probabilities.

Additionally, the code calculates the least squares cost function, which measures the difference between predicted and true values and is commonly used in regression tasks. Together, the printed results illustrate both how the activation functions behave and how a cost function evaluates model performance.

In [2]:
import numpy as np

# Activation Functions
def step_function(x):
    return np.where(x >= 0, 1, 0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # for numerical stability
    return exp_x / np.sum(exp_x, axis=0)

# Sample input
x = np.array([-1.0, 0.0, 1.0])

# Apply activation functions
print("Step Function:", step_function(x))
print("ReLU:", relu(x))
print("Sigmoid:", sigmoid(x))
print("Softmax:", softmax(x))

# Cost Function - Least Squares
def least_squares_cost(y_true, y_pred):
    return 0.5 * np.sum((y_pred - y_true) ** 2)

# Sample true and predicted outputs
y_true = np.array([0.5, 1.0, 0.5])
y_pred = np.array([0.4, 1.2, 0.3])

# Calculate cost
print("Least Squares Cost:", least_squares_cost(y_true, y_pred))


Step Function: [0 1 1]
ReLU: [0. 0. 1.]
Sigmoid: [0.26894142 0.5        0.73105858]
Softmax: [0.09003057 0.24472847 0.66524096]
Least Squares Cost: 0.04499999999999999
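
The cell above evaluates only the least squares cost. As a minimal sketch, the cross-entropy cost for binary classification from this section can be computed in the same way. The function name `binary_cross_entropy`, the clipping constant `eps`, and the sample arrays are assumptions added for illustration. The clipping is a numerical safeguard, not part of the formula.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so both logarithms stay finite
    # (eps is an assumed safeguard, not part of the formula in the text).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Hypothetical labels and two sets of predicted probabilities
y_true = np.array([1.0, 0.0, 1.0])
y_good = np.array([0.9, 0.1, 0.8])  # close to the labels
y_bad = np.array([0.4, 0.6, 0.3])   # far from the labels

cost_good = binary_cross_entropy(y_true, y_good)
cost_bad = binary_cross_entropy(y_true, y_bad)
print("Cross-entropy (good predictions):", cost_good)
print("Cross-entropy (bad predictions):", cost_bad)
```

Predictions close to the labels give a smaller cost than predictions far from them, which is the property gradient descent exploits during training.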


# 3.7.4. Backpropagation and 3.7.5. Backpropagation algorithm

Backpropagation is a key process in neural network training. It computes the derivatives of the cost function $J$ with respect to the network parameters, the weights $W$ and biases $b$, so that $J$ can be minimized. The derivatives are obtained with the chain rule. The quantity $\delta^{(l)}_j = \frac{\partial J}{\partial z^{(l)}_j}$ is introduced, which represents the error at node $j$ in layer $l$. This error is computed first at the output layer and then propagated backward through the network. Because $z^{(l)}_j = \sum_k w^{(l)}_{j,k} a^{(l-1)}_k + b^{(l)}_j$, the chain rule gives $\frac{\partial J}{\partial w^{(l)}_{j,k}} = \delta^{(l)}_j a^{(l-1)}_k$ and $\frac{\partial J}{\partial b^{(l)}_j} = \delta^{(l)}_j$. Gradient descent with learning rate $\beta$ then updates the weights and biases as follows:

$$
\text{New } w^{(l)}_{j,k} = \text{Old } w^{(l)}_{j,k} - \beta \delta^{(l)}_j a^{(l-1)}_k
$$
$$
\text{New } b^{(l)}_j = \text{Old } b^{(l)}_j - \beta \delta^{(l)}_j
$$

This process is repeated iteratively, adjusting the weights and biases until the desired accuracy is achieved. The derivatives depend on the chosen activation function, such as ReLU or logistic (sigmoid), because $\delta^{(l)}_j$ contains the factor $\sigma'(z^{(l)}_j)$.
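
One way to check the gradient used in the weight update is to compare it with a finite-difference estimate. The sketch below does this for a single sigmoid layer with the least squares cost, writing $w_{j,k}$ for the weight that connects input $k$ of the previous layer to node $j$. The layer sizes, random values, step size `h`, and all variable names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, a_prev, y):
    """Least squares cost J = 1/2 * sum((a - y)^2) for one sigmoid layer."""
    a = sigmoid(W @ a_prev + b)
    return 0.5 * np.sum((a - y) ** 2)

# Hypothetical layer with 3 inputs and 2 output nodes (arbitrary values)
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)
a_prev = np.array([0.1, 0.2, 0.3])
y = np.array([0.7, 0.9])

# Analytic gradients from the chain rule
a = sigmoid(W @ a_prev + b)
delta = (a - y) * a * (1 - a)     # delta_j = dJ/dz_j, using sigma'(z) = a * (1 - a)
grad_W = np.outer(delta, a_prev)  # dJ/dw_{j,k} = delta_j * a_prev_k
grad_b = delta                    # dJ/db_j = delta_j

# Central finite differences for each weight, for comparison
h = 1e-6
num_grad_W = np.zeros_like(W)
for j in range(W.shape[0]):
    for k in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[j, k] += h
        W_minus[j, k] -= h
        num_grad_W[j, k] = (cost(W_plus, b, a_prev, y)
                            - cost(W_minus, b, a_prev, y)) / (2 * h)

print("Max difference:", np.max(np.abs(grad_W - num_grad_W)))

# One gradient descent step with learning rate beta
beta = 0.1
W_new = W - beta * grad_W
b_new = b - beta * grad_b
print("Cost before:", cost(W, b, a_prev, y))
print("Cost after: ", cost(W_new, b_new, a_prev, y))
```

If the analytic and numerical gradients agree to within rounding error, the expression $\delta_j a_k$ is the correct derivative for this layer. A different activation function would change only the factor $\sigma'(z)$ in `delta`.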


# Code description

The code below implements a single-layer network in which three input features feed one output node with the sigmoid activation function. In each training pass, the forward pass computes the predicted output from the input data, weights, and bias. The Mean Squared Error (MSE) cost function measures the difference between the predicted output and the target. Backpropagation then computes the gradients of the cost with respect to the weights and bias, and gradient descent updates the parameters with a learning rate of 0.1. The process is repeated for 100 passes, and the cost is printed every 10 passes.


In [6]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x))
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Mean Squared Error cost function
def mse_cost(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

# Derivative of MSE cost function
def mse_cost_derivative(y_pred, y_true):
    return 2 * (y_pred - y_true) / y_true.size

# Forward pass function for a simple neural network with one layer
def forward_pass(X, W1, b1):
    z1 = np.dot(X, W1) + b1
    a1 = sigmoid(z1)  # activate output layer
    return z1, a1

# Backpropagation for the network
def backpropagation(X, y, W1, b1, z1, a1, learning_rate=0.1):
    # Error at the output layer: delta1 = dJ/dz1 = dJ/da1 * sigma'(z1)
    delta1 = mse_cost_derivative(a1, y) * sigmoid_derivative(z1)

    # Gradients of the cost with respect to the weights and biases
    d_W1 = np.dot(X.T, delta1)
    d_b1 = np.sum(delta1, axis=0, keepdims=True)

    # Update weights and biases using gradient descent
    # (the arrays are modified in place and also returned)
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1

    # Return updated parameters
    return W1, b1

# Initialize random weights and zero biases for a 1-layer neural network
np.random.seed(42)
input_size = 3   # number of input features
output_size = 1  # number of output nodes

W1 = np.random.randn(input_size, output_size)
b1 = np.zeros((1, output_size))

# Input data (2 samples, 3 features each)
X = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# Target output (one target value per sample)
y = np.array([[0.7], [0.9]])

# Training loop (100 iterations)
for passes in range(100):
    # Perform forward pass
    z1, a1 = forward_pass(X, W1, b1)

    # Print cost every 10 iterations
    if passes % 10 == 0:
        cost = mse_cost(a1, y)
        print(f'Pass {passes}, Cost: {cost}')

    # Update parameters using backpropagation
    W1, b1 = backpropagation(X, y, W1, b1, z1, a1)

# Output final predictions after training
z1, a1 = forward_pass(X, W1, b1)
print("\nFinal predictions:")
print(a1)


Pass 0, Cost: 0.048018052021927264
Pass 10, Cost: 0.035606833356200324
Pass 20, Cost: 0.027082366250403618
Pass 30, Cost: 0.021114523358353895
Pass 40, Cost: 0.016850443770895333
Pass 50, Cost: 0.013743216600767867
Pass 60, Cost: 0.01143756228450613
Pass 70, Cost: 0.009698444008282956
Pass 80, Cost: 0.00836725526181136
Pass 90, Cost: 0.007334835667043262

Final predictions:
[[0.69436042]
 [0.7859059 ]]
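
The network above has a single layer, so no error is actually passed backward from one layer to another. As a minimal sketch of that step, the version below adds one hidden layer and computes the hidden-layer error from the output-layer error. The hidden size of 4, the helper names `forward`, `gradients`, and `mse`, and the use of `np.random.default_rng` are assumptions made for illustration and are not part of the original cell.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mse(a, y):
    return np.mean((a - y) ** 2)

def forward(X, W1, b1, W2, b2):
    a1 = sigmoid(X @ W1 + b1)   # hidden layer activations
    a2 = sigmoid(a1 @ W2 + b2)  # output layer activations
    return a1, a2

def gradients(X, y, W1, b1, W2, b2):
    a1, a2 = forward(X, W1, b1, W2, b2)
    # Output-layer error: dJ/dz2 for the MSE cost with a sigmoid output
    delta2 = 2 * (a2 - y) / y.size * a2 * (1 - a2)
    # Hidden-layer error: the output error sent back through W2,
    # scaled by the sigmoid derivative at the hidden nodes
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    d_W1 = X.T @ delta1
    d_b1 = delta1.sum(axis=0, keepdims=True)
    d_W2 = a1.T @ delta2
    d_b2 = delta2.sum(axis=0, keepdims=True)
    return d_W1, d_b1, d_W2, d_b2

# Hypothetical 3-4-1 network; same data as the single-layer example
rng = np.random.default_rng(42)
hidden_size = 4
W1 = rng.normal(size=(3, hidden_size))
b1 = np.zeros((1, hidden_size))
W2 = rng.normal(size=(hidden_size, 1))
b2 = np.zeros((1, 1))

X = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
y = np.array([[0.7], [0.9]])

initial_cost = mse(forward(X, W1, b1, W2, b2)[1], y)

learning_rate = 0.1
for _ in range(100):
    d_W1, d_b1, d_W2, d_b2 = gradients(X, y, W1, b1, W2, b2)
    W1 = W1 - learning_rate * d_W1
    b1 = b1 - learning_rate * d_b1
    W2 = W2 - learning_rate * d_W2
    b2 = b2 - learning_rate * d_b2

final_cost = mse(forward(X, W1, b1, W2, b2)[1], y)
print("Initial cost:", initial_cost)
print("Final cost:  ", final_cost)
```

The line that computes `delta1` is the backward propagation step: each hidden node receives a share of the output error weighted by its connection in `W2`.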
