# **Artificial Neural Networks**

Neural Networks: A neural network is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected nodes (neurons) that work together to transform input values into a desired output.

Emergence of Deep Learning:

Several shortcomings of traditional machine learning (ML) led to the emergence of deep learning:
1. Difficulty processing high-dimensional data
2. Reliance on manual feature engineering
3. Poor suitability for image processing
4. Inefficiency on datasets with very many data points

Components of a Neural Network:

1. Layers: hold neurons and pass their outputs to subsequent layers. Each layer performs specific transformations on the data.

2. Weights: parameters that set the strength of each connection between neurons. They are initialized randomly at the beginning and optimized during training to minimize the loss.

3. Biases: additional parameters added to the weighted sum of a neuron's inputs, shifting the neuron's output independently of the input values.

4. Activation Functions: compute a neuron's output by applying a transformation, typically non-linear, to the weighted sum of its inputs plus its bias.
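As a minimal sketch of how these components combine, a single neuron multiplies each input by its weight, adds the bias, and passes the result through an activation function. The input values, weights, and bias below are arbitrary illustrative numbers, and the sigmoid is assumed as the activation (it is the same function used in the XOR example later on).

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (the pre-activation value z)
    weighted_sum = np.dot(inputs, weights) + bias
    # Non-linear activation applied to z
    return sigmoid(weighted_sum)

# Illustrative values, not taken from any trained model
inputs = np.array([1.0, 0.5])
weights = np.array([0.4, -0.2])
bias = 0.1

# z = 0.4 * 1.0 + (-0.2) * 0.5 + 0.1 = 0.4, and sigmoid(0.4) is about 0.5987
print(neuron_output(inputs, weights, bias))
```

A layer is simply many such neurons evaluated side by side, which is why the XOR code below replaces the single dot product with a matrix product.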

Feedforward and Backpropagation

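Feedforward: in a feedforward network, connections between nodes do not form cycles. Data moves in one direction, from the input layer through any hidden layers to the output layer, to produce an output.

Backpropagation: an algorithm used in the supervised training of neural networks. It applies the chain rule to compute the gradient of the loss with respect to every weight, propagating the error backward from the output layer; gradient descent then uses these gradients to adjust the weights and reduce the error.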
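To make this division of labour concrete, here is a minimal sketch for a single sigmoid neuron with one weight and one bias, trained on one example with a squared-error loss. The chain rule supplies the gradients (the backward pass) and gradient descent uses them to update the parameters. The training example, starting parameters, learning rate, and step count are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One training example and initial parameters (illustrative values)
x, target = 1.5, 1.0
w, b = 0.2, 0.0
learning_rate = 0.5

def loss_fn(w, b):
    # Squared error 0.5 * (target - prediction)^2 for the single example
    prediction = sigmoid(w * x + b)
    return 0.5 * (target - prediction) ** 2

initial_loss = loss_fn(w, b)
for step in range(100):
    # Forward pass
    z = w * x + b
    a = sigmoid(z)
    # Backward pass (chain rule): dL/dw = dL/da * da/dz * dz/dw
    dL_da = -(target - a)
    da_dz = a * (1 - a)   # sigmoid derivative written in terms of its output
    dL_dz = dL_da * da_dz
    dL_dw = dL_dz * x     # dz/dw = x
    dL_db = dL_dz         # dz/db = 1
    # Gradient descent: step against the gradient
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db

print(f"loss before: {initial_loss:.4f}, after: {loss_fn(w, b):.4f}")
```

The XOR network below follows the same pattern, only with matrices in place of scalars and one extra layer for the error to travel back through.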

## **Training a Neural Network in Python from Scratch**

### **Example: Solving the XOR Problem**


### Importing Libraries

```python
import numpy as np
```

### Defining the XOR Input and Output

```python
# XOR input values
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# XOR output values
y = np.array([[0], [1], [1], [0]])
```

### Initializing Weights and Bias

```python
# Randomly initializing weights and bias
np.random.seed(42)  # For reproducibility
weights_input_hidden = np.random.rand(2, 2)  # 2 input nodes to 2 hidden nodes
weights_hidden_output = np.random.rand(2, 1)  # 2 hidden nodes to 1 output node
bias_hidden = np.random.rand(1, 2)
bias_output = np.random.rand(1, 1)
```

Setting the seed makes NumPy generate the same sequence of random numbers on every run, so the initial weights, and therefore the training results, are reproducible.
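A quick way to see what seeding does is to reset the seed and draw again; the sketch below uses the same global np.random.seed and np.random.rand calls as the notebook.

```python
import numpy as np

# Seeding the global generator and drawing twice gives identical arrays
np.random.seed(42)
first_draw = np.random.rand(2, 2)

np.random.seed(42)
second_draw = np.random.rand(2, 2)

print(np.array_equal(first_draw, second_draw))  # True

# Without re-seeding, the generator continues its sequence, so the next draw differs
third_draw = np.random.rand(2, 2)
print(np.array_equal(second_draw, third_draw))  # False
```

NumPy's newer Generator interface, created with np.random.default_rng(42), offers the same reproducibility through a local generator object instead of global state.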

np.random.rand(2, 2) generates a 2x2 matrix of random numbers drawn uniformly from [0, 1).
This matrix holds the weights connecting the 2 input nodes to the 2 hidden nodes: element [i, j] is the weight of the connection from input node i to hidden node j.

np.random.rand(2, 1) generates a 2x1 matrix of random numbers drawn uniformly from [0, 1).
This matrix holds the weights connecting the 2 hidden nodes to the single output node: element [i, 0] is the weight of the connection from hidden node i to the output node.

np.random.rand(1, 2) generates a 1x2 matrix of random numbers drawn uniformly from [0, 1).
This matrix holds the biases of the 2 hidden nodes; each element is added to the weighted sum of one hidden node's inputs before the activation function is applied.

np.random.rand(1, 1) generates a 1x1 matrix holding a single random number drawn uniformly from [0, 1).
This number is the bias of the output node.
This bias term is added to the weighted sum of the output node's inputs before the activation function is applied.
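To check the dimensions described above, the following sketch repeats the initialization, confirms every value lies in [0, 1), and prints each parameter's shape together with the total number of trainable parameters (a count computed here; it is not stated in the original).

```python
import numpy as np

np.random.seed(42)
# Same draws, in the same order, as the initialization cell
params = {
    "weights_input_hidden": np.random.rand(2, 2),   # 2 inputs -> 2 hidden nodes
    "weights_hidden_output": np.random.rand(2, 1),  # 2 hidden nodes -> 1 output
    "bias_hidden": np.random.rand(1, 2),            # one bias per hidden node
    "bias_output": np.random.rand(1, 1),            # one bias for the output node
}

for name, value in params.items():
    # np.random.rand samples uniformly from the half-open interval [0, 1)
    assert np.all((value >= 0) & (value < 1))
    print(f"{name}: shape {value.shape}")

total = sum(value.size for value in params.values())
print(f"Trainable parameters: {total}")  # 4 + 2 + 2 + 1 = 9
```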

### Defining the Activation Function and Its Derivative

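```python
def sigmoid(x):
    # Logistic sigmoid: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Derivative of the sigmoid written in terms of its OUTPUT:
    # if s = sigmoid(x), then d sigmoid(x)/dx = s * (1 - s).
    # Pass already-activated values (e.g. final_output), not raw inputs.
    return s * (1 - s)
```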
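Because sigmoid_derivative expects the sigmoid's output rather than its raw input, it is easy to misuse. A minimal sanity check, assuming a central finite difference with step h = 1e-5 is accurate enough, compares s * (1 - s) with a numerical derivative of the sigmoid at a few points, and also shows what goes wrong if the raw input is passed instead.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(x), not x itself
    return s * (1 - s)

x = np.linspace(-4, 4, 9)
h = 1e-5
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytical = sigmoid_derivative(sigmoid(x))              # correct: pass the sigmoid's output
wrong = sigmoid_derivative(x)                            # common mistake: pass x directly

print(np.max(np.abs(numerical - analytical)))  # close to zero
print(np.allclose(numerical, wrong))            # False
```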

### Forward Propagation

```python
def forward_propagation(X, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output):
    # Compute the hidden layer activation
    hidden_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_output = sigmoid(hidden_input)
    
    # Compute the output layer activation
    final_input = np.dot(hidden_output, weights_hidden_output) + bias_output
    final_output = sigmoid(final_input)
    
    return hidden_output, final_output
```

np.dot(X, weights_input_hidden): For two 2-D arrays, np.dot computes the matrix product, here of the input data X and the weight matrix.

+ X has dimensions (m, n), where m is the number of data points and n is the number of input features.

+ weights_input_hidden has dimensions (n, h), where h is the number of neurons in the hidden layer.

+ The result of np.dot(X, weights_input_hidden) therefore has dimensions (m, h). In the XOR example, m = 4, n = 2, and h = 2.

bias_hidden: The bias vector bias_hidden (with dimensions (1, h)) is added to each row of the result. NumPy broadcasting stretches this single row across all m rows, so the same biases are added to every data point's weighted sums.
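A small sketch of the broadcasting step, using made-up numbers for m = 3 data points and h = 2 hidden neurons, shows that adding a (1, h) bias row to an (m, h) matrix gives the same result as adding the bias to every row explicitly.

```python
import numpy as np

# Hypothetical weighted sums for m = 3 data points and h = 2 hidden neurons
weighted_sums = np.array([[1.0, 2.0],
                          [3.0, 4.0],
                          [5.0, 6.0]])   # shape (3, 2)
bias = np.array([[0.5, -1.0]])           # shape (1, 2)

# Broadcasting stretches the single bias row to shape (3, 2)
result = weighted_sums + bias
print(result)  # rows: [1.5, 1.0], [3.5, 3.0], [5.5, 5.0]

# Equivalent explicit version: repeat the bias row once per data point
explicit = weighted_sums + np.tile(bias, (3, 1))
print(np.array_equal(result, explicit))  # True
```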


hidden_input now holds the pre-activation values (weighted sums) for the hidden layer neurons.


The sigmoid(hidden_input) applies the sigmoid activation function element-wise to the hidden_input matrix, resulting in the activations of the hidden layer neurons. The dimensions of hidden_output are (m, h).


np.dot(hidden_output, weights_hidden_output): This computes the matrix product of the hidden layer activations hidden_output and the weight matrix weights_hidden_output.

+ hidden_output has dimensions (m, h).

+ weights_hidden_output has dimensions (h, o), where o is the number of neurons in the output layer.

+ The result of np.dot(hidden_output, weights_hidden_output) therefore has dimensions (m, o). In the XOR example, o = 1.

bias_output: The bias vector bias_output (with dimensions (1, o)) is added to each row of the result, again via broadcasting.

final_input now holds the pre-activation values (weighted sums) for the output layer neurons.

The sigmoid(final_input) applies the sigmoid activation function element-wise to the final_input matrix, resulting in the activations of the output layer neurons. The dimensions of final_output are (m, o).

The function returns the activations of the hidden layer (hidden_output) and the output layer (final_output).
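One way to confirm this dimension bookkeeping is to run the forward pass on the XOR data and print the shapes, with m = 4, n = 2, h = 2, and o = 1. The sketch repeats the definitions so it runs on its own; its forward_propagation is a condensed but equivalent form of the function above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_propagation(X, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output):
    # Same computation as above, with the intermediate pre-activations inlined
    hidden_output = sigmoid(np.dot(X, weights_input_hidden) + bias_hidden)
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output) + bias_output)
    return hidden_output, final_output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # (m, n) = (4, 2)
np.random.seed(42)
weights_input_hidden = np.random.rand(2, 2)       # (n, h) = (2, 2)
weights_hidden_output = np.random.rand(2, 1)      # (h, o) = (2, 1)
bias_hidden = np.random.rand(1, 2)                # (1, h)
bias_output = np.random.rand(1, 1)                # (1, o)

hidden_output, final_output = forward_propagation(
    X, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output)
print(hidden_output.shape)  # (4, 2), i.e. (m, h)
print(final_output.shape)   # (4, 1), i.e. (m, o)
# The sigmoid keeps every activation strictly between 0 and 1
print(np.all((final_output > 0) & (final_output < 1)))  # True
```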

### Backpropagation Algorithm

```python
def backpropagation(X, y, hidden_output, final_output, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output, learning_rate=0.1):
    # Calculate the output error
    output_error = y - final_output
    output_delta = output_error * sigmoid_derivative(final_output)
    
    # Calculate the hidden layer error
    hidden_error = output_delta.dot(weights_hidden_output.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_output)
    
    # Update weights and biases IN PLACE: `+=` modifies the caller's NumPy arrays
    # directly, which is why this function does not need to return them.
    # The deltas are negative gradients, so adding them is a gradient descent step.
    weights_hidden_output += hidden_output.T.dot(output_delta) * learning_rate
    bias_output += np.sum(output_delta, axis=0) * learning_rate
    weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
    bias_hidden += np.sum(hidden_delta, axis=0) * learning_rate
```

Calculate the output error:
output_error = y - final_output: Computes the difference between the target output (y) and the predicted output (final_output).

Calculate the output delta:
output_delta = output_error * sigmoid_derivative(final_output): Multiplies the output error by the slope of the sigmoid at the output. The result is the negative gradient of the squared error with respect to the output layer's pre-activation values (final_input).

Calculate the hidden layer error:
hidden_error = output_delta.dot(weights_hidden_output.T): Propagates the output delta backward through the hidden-to-output weights (via their transpose), giving each hidden neuron its share of the error.

Calculate the hidden delta:
hidden_delta = hidden_error * sigmoid_derivative(hidden_output): Multiplies the hidden error by the slope of the sigmoid at each hidden neuron, giving the negative gradient of the error with respect to the hidden layer's pre-activation values (hidden_input).
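These deltas follow from the chain rule. As a worked derivation, write $\hat{y}$ for final_output, $a$ for hidden_output, $W^{(1)}, b^{(1)}$ and $W^{(2)}, b^{(2)}$ for the input-to-hidden and hidden-to-output parameters, $\sigma$ for the sigmoid, and assume the objective is $L = \tfrac{1}{2}\sum (y - \hat{y})^2$. This notation and the factor of one half are introduced here for the derivation; the training loop prints the mean squared error instead, which differs only by a constant factor and so has gradients pointing in the same direction.

$$
\begin{aligned}
z^{(1)} &= X W^{(1)} + b^{(1)}, \qquad a = \sigma\!\left(z^{(1)}\right), \qquad z^{(2)} = a W^{(2)} + b^{(2)}, \qquad \hat{y} = \sigma\!\left(z^{(2)}\right) \\
\frac{\partial L}{\partial z^{(2)}} &= -(y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y}) = -\delta^{(2)} \\
\frac{\partial L}{\partial z^{(1)}} &= \left(\frac{\partial L}{\partial z^{(2)}} W^{(2)\top}\right) \odot a \odot (1 - a) = -\delta^{(1)} \\
\frac{\partial L}{\partial W^{(2)}} &= a^{\top} \frac{\partial L}{\partial z^{(2)}} = -a^{\top}\delta^{(2)}, \qquad
\frac{\partial L}{\partial W^{(1)}} = X^{\top} \frac{\partial L}{\partial z^{(1)}} = -X^{\top}\delta^{(1)}
\end{aligned}
$$

Here $\delta^{(2)}$ is output_delta, $\delta^{(2)} W^{(2)\top}$ is hidden_error, and $\delta^{(1)}$ is hidden_delta. The factors $\hat{y}(1-\hat{y})$ and $a(1-a)$ come from $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is why sigmoid_derivative is applied to activations rather than pre-activations. Because each delta is a negative gradient, the update $W^{(2)} \leftarrow W^{(2)} + \eta\, a^{\top}\delta^{(2)}$ is the gradient descent step $W^{(2)} \leftarrow W^{(2)} - \eta\, \partial L / \partial W^{(2)}$, with $\eta$ the learning rate; summing a delta over its rows gives the corresponding bias gradient in the same way.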

Update weights and biases:

weights_hidden_output += hidden_output.T.dot(output_delta) * learning_rate: Adjusts the weights between the hidden and output layers, scaled by the learning rate.

bias_output += np.sum(output_delta, axis=0) * learning_rate: Adjusts the bias of the output layer; summing over axis 0 accumulates the contribution of every data point.

weights_input_hidden += X.T.dot(hidden_delta) * learning_rate: Adjusts the weights between the input and hidden layers.

bias_hidden += np.sum(hidden_delta, axis=0) * learning_rate: Adjusts the biases of the hidden layer.

Because the deltas are negative gradients, adding them moves each parameter downhill on the error surface. A single call performs one such step; repeating it over many epochs gradually reduces the error and improves the network's predictions.
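One way to check that these updates really implement gradient descent is a numerical gradient check. The sketch below assumes the loss is half the sum of squared errors and uses central finite differences; it compares the analytical gradients implied by the backpropagation code (the negatives of hidden_output.T.dot(output_delta) and X.T.dot(hidden_delta)) with numerical estimates for both weight matrices. The helper names forward, numerical_gradient, and loss are hypothetical, introduced only for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, W1, W2, b1, b2):
    # Condensed forward pass: returns (hidden activations, output activations)
    hidden = sigmoid(X.dot(W1) + b1)
    return hidden, sigmoid(hidden.dot(W2) + b2)

def numerical_gradient(param, loss_fn, eps=1e-6):
    # Central-difference estimate of d(loss)/d(param), perturbing param in place
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        loss_plus = loss_fn()
        param[idx] = original - eps
        loss_minus = loss_fn()
        param[idx] = original  # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
np.random.seed(42)
W1, W2 = np.random.rand(2, 2), np.random.rand(2, 1)
b1, b2 = np.random.rand(1, 2), np.random.rand(1, 1)

def loss():
    # Assumed objective: half the sum of squared errors over all data points
    _, out = forward(X, W1, W2, b1, b2)
    return 0.5 * np.sum((y - out) ** 2)

# Analytical gradients from the same formulas used in backpropagation();
# the deltas are negative gradients, hence the minus signs
hidden, out = forward(X, W1, W2, b1, b2)
output_delta = (y - out) * out * (1 - out)
hidden_delta = output_delta.dot(W2.T) * hidden * (1 - hidden)
grad_W2 = -hidden.T.dot(output_delta)
grad_W1 = -X.T.dot(hidden_delta)

num_grad_W2 = numerical_gradient(W2, loss)
num_grad_W1 = numerical_gradient(W1, loss)

print(np.allclose(grad_W2, num_grad_W2, atol=1e-6))  # True
print(np.allclose(grad_W1, num_grad_W1, atol=1e-6))  # True
```

Gradient checks like this are slow (two loss evaluations per parameter), so they are typically run once while debugging rather than during training.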

### Training the Neural Network

```python
epochs = 10000
learning_rate = 0.1

for epoch in range(epochs):
    hidden_output, final_output = forward_propagation(X, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output)
    backpropagation(X, y, hidden_output, final_output, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output, learning_rate)
    
    if epoch % 1000 == 0:
        loss = np.mean(np.square(y - final_output))
        print(f'Epoch {epoch}, Loss: {loss}')
```

```
Epoch 0, Loss: 0.287974821321425
Epoch 1000, Loss: 0.24943329766543196
Epoch 2000, Loss: 0.24567537147115226
Epoch 3000, Loss: 0.21996241841579695
Epoch 4000, Loss: 0.1621992454420142
Epoch 5000, Loss: 0.05270887579146118
Epoch 6000, Loss: 0.016926012420416685
Epoch 7000, Loss: 0.008917785314199879
Epoch 8000, Loss: 0.005844546663693253
Epoch 9000, Loss: 0.004281790023659584
```


epochs = 10000: The number of times the entire dataset is passed forward and backward through the neural network.

learning_rate = 0.1: The step size; every weight and bias update is multiplied by this factor, so larger values take bigger steps.

Training Loop: for each epoch (from 0 to 9999):

+ Forward Propagation: compute the outputs of the network.
+ Backpropagation: adjust the weights and biases to reduce the error between the predicted and actual outputs.

Loss Calculation: every 1000 epochs, calculate the loss (the mean squared error between the actual outputs y and the predicted outputs final_output) and print it. Because final_output comes from the forward pass at the start of the epoch, each printed value reflects the parameters before that epoch's update.
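If you want the full loss curve rather than a printout every 1000 epochs (for example, to plot it), one option is to append the loss to a list on every epoch. The sketch below is a self-contained, condensed version of the training loop above; the function name train_xor and its return values are illustrative assumptions. It uses the same seed, initialization order, and update formulas, so it should follow the same trajectory as the run shown above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_xor(epochs=10000, learning_rate=0.1, seed=42):
    """Train the 2-2-1 XOR network and record the MSE loss at every epoch."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])
    np.random.seed(seed)
    # Same draw order as the notebook: W1, W2, b1, b2
    W1, W2 = np.random.rand(2, 2), np.random.rand(2, 1)
    b1, b2 = np.random.rand(1, 2), np.random.rand(1, 1)

    losses = []
    for _ in range(epochs):
        # Forward pass
        hidden = sigmoid(np.dot(X, W1) + b1)
        out = sigmoid(np.dot(hidden, W2) + b2)
        # Record the loss of the current (pre-update) predictions
        losses.append(np.mean(np.square(y - out)))
        # Backward pass, using the same formulas as backpropagation()
        output_delta = (y - out) * (out * (1 - out))
        hidden_delta = output_delta.dot(W2.T) * (hidden * (1 - hidden))
        W2 += hidden.T.dot(output_delta) * learning_rate
        b2 += np.sum(output_delta, axis=0) * learning_rate
        W1 += X.T.dot(hidden_delta) * learning_rate
        b1 += np.sum(hidden_delta, axis=0) * learning_rate
    return (W1, W2, b1, b2), losses

params, losses = train_xor()
print(len(losses))             # 10000
print(losses[0], losses[-1])   # first and last recorded loss
```

Returning the history instead of printing keeps the training function free of side effects, so the same list can be plotted or inspected afterwards.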

### Testing the Network

```python
# Testing the trained network
test_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
_, test_output = forward_propagation(test_input, weights_input_hidden, weights_hidden_output, bias_hidden, bias_output)
print("Predictions:")
print(test_output)
```

```
Predictions:
[[0.06028403]
 [0.9444784 ]
 [0.9443732 ]
 [0.05996465]]
```


### Calculating Accuracy

```python
def calculate_accuracy(predictions, labels):
    predictions_binary = (predictions >= 0.5).astype(int)
    accuracy = np.mean(predictions_binary == labels)
    return accuracy


# Calculating final accuracy on the test input
final_accuracy = calculate_accuracy(test_output, y)
print(f'Final Accuracy: {final_accuracy * 100}%')
```

```
Final Accuracy: 100.0%
```
