# Modeling and Simulation 6 - Introduction to Machine Learning

## The Perceptron

![image.png](attachment:fb6cae6e-3635-4d42-ab6a-f4167f167025.png)

The perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input $\mathbf {x}$ (a real-valued vector) to an output value $f(\mathbf {x} )$ (a single binary value): 

$$ f(\mathbf {x} )={\begin{cases}1&{\text{if }}\ \mathbf {w} \cdot \mathbf {x} +b>0,\\0&{\text{otherwise}}\end{cases}} $$

On a more informal level, a perceptron is a simple type of artificial neuron, which is a fundamental building block of artificial neural networks. It is a basic model used for binary classification tasks, where the goal is to separate data points into one of two classes or categories. The perceptron is considered a linear classifier because it draws a linear decision boundary to distinguish between the two classes.

In simple terms, a perceptron takes multiple input values, processes them, and produces a single output value. It does this by performing the following steps:

* Assigning weights: Each input value is multiplied by a corresponding weight. Weights are initially set to random values and are adjusted during the training process. The weights determine the importance of each input in making the final decision.

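* Summation: The weighted input values are summed together, and a bias value is added. The bias is also adjusted during training and plays the role of a negated threshold: $\mathbf{w} \cdot \mathbf{x} + b > 0$ is equivalent to $\mathbf{w} \cdot \mathbf{x} > -b$. It allows the perceptron to shift the decision boundary away from the origin.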

* Activation: The sum of the weighted inputs and the bias is passed through an activation function, which determines the output of the perceptron. In the case of a simple perceptron, the activation function is usually a binary step function. If the weighted sum is greater than 0, the output is 1, and if it's less than or equal to 0, the output is 0.
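The three steps can be sketched in plain Python: compute the weighted sum, add the bias, and apply a binary step with threshold 0, following $f(\mathbf{x})$ above. The function names are hypothetical, and the weights `[1.0, 1.0]` and bias `-1.5` are hand-picked for illustration rather than learned. With these values the perceptron fires only when both inputs are 1.

```python
def step(z):
    """Binary step with threshold 0: 1 if z > 0, else 0 (as in f(x) above)."""
    return 1 if z > 0 else 0

def perceptron_output(weights, bias, inputs):
    """Weight each input, sum, add the bias, then apply the step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hand-picked (not learned) parameters for illustration
w = [1.0, 1.0]
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_output(w, b, x))  # only [1, 1] gives 1
```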

Training a perceptron means adjusting the weights and bias so that it makes accurate predictions for the given data points. Because the perceptron is a supervised learning algorithm, every training example comes with a target output, and the weights and bias are updated iteratively based on the error between the perceptron's prediction and that target.
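The classic perceptron learning rule is $w_i \leftarrow w_i + \eta\,(t - y)\,x_i$ and $b \leftarrow b + \eta\,(t - y)$, where $t$ is the target, $y$ the prediction and $\eta$ the learning rate; it is the same rule given in the Task 1 hints. A minimal sketch training one perceptron on logical AND, assuming zero initial weights and $\eta = 1$ (chosen here so the arithmetic stays exact; the lab itself uses random initialization and `lr=0.1`). Training stops after the first epoch with no mistakes, which on linearly separable data such as AND is guaranteed to happen after finitely many updates (the perceptron convergence theorem). The names `predict` and `train_perceptron` are hypothetical.

```python
def predict(weights, bias, x):
    """Step activation (threshold 0) applied to the weighted sum plus bias."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

def train_perceptron(data, lr=1.0, max_epochs=100):
    """Perceptron learning rule; returns (weights, bias, epochs_used)."""
    weights = [0.0] * len(data[0][0])  # assumption: zero initialization
    bias = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass with no errors: training has converged
            return weights, bias, epoch + 1
    return weights, bias, max_epochs

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, epochs = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

Swapping in XOR targets (`0, 1, 1, 0`) makes the loop run for all `max_epochs`, because no single line separates the XOR classes.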

### Supervised Learning

Supervised learning is a type of machine learning where a model is trained to make predictions or decisions based on labeled input-output pairs. In this learning paradigm, the training dataset consists of input data along with their corresponding target outputs or labels. The main goal of supervised learning is to learn a relationship or mapping between the input data and the target outputs, so that the model can make accurate predictions for new, unseen data.

Supervised learning can be broadly categorized into two types of problems:

- Regression: In regression problems, the target output is a continuous numerical value. The goal is to predict this value based on the input features. Examples include predicting house prices based on various attributes, predicting stock prices, or predicting the temperature for a specific day.

- Classification: In classification problems, the target output is a discrete class label. The goal is to assign the input data to one of several predefined categories or classes. Examples include email spam detection, handwritten digit recognition, or medical diagnosis based on patient data.

In this lab, we will focus only on classification.

### Activation Functions

Activation functions are mathematical functions used in artificial neural networks, primarily to introduce non-linearity into the model. They play a crucial role in determining the output of a neuron, based on the weighted sum of its inputs and a bias value. Activation functions help neural networks learn and model complex patterns and relationships in the input data.
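To see why the non-linearity matters, the NumPy sketch below checks that two stacked layers *without* an activation function collapse into a single linear layer, since $(\mathbf{x}W_1 + \mathbf{b}_1)W_2 + \mathbf{b}_2 = \mathbf{x}(W_1W_2) + (\mathbf{b}_1W_2 + \mathbf{b}_2)$. The layer sizes (4, 5, 3) and the random values are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=3)
x = rng.normal(size=4)

# Two layers with no activation in between...
two_layers = (x @ W1 + b1) @ W2 + b2
# ...equal one layer with W = W1 W2 and b = b1 W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
print(np.allclose(two_layers, one_layer))  # True
```

Inserting a ReLU after the first layer breaks this identity in general, which is what lets hidden layers represent non-linear relationships.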

Some commonly used activation functions:

1. Sigmoid (Logistic) function:
    The sigmoid function maps any real input to the open interval (0, 1), so its output can be read as a probability. It is widely used in the output layer for binary classification tasks.

    $$ \sigma(x) = \frac{1}{1 + e^{-x}}$$

2. Hyperbolic Tangent (tanh) function:
    The tanh function maps input values to the interval (-1, 1). It is a rescaled sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, so it has the same S-shape but a wider output range centered at 0.

    $$ \tanh(x)=\frac{e^x - e^{-x}}{e^x + e^{-x}} $$

3. Rectified Linear Unit (ReLU) function:
    The ReLU function is a popular activation function in deep learning, mainly because of its simplicity and efficient computation. It is defined as the maximum of 0 and the input value.

    $$ \operatorname{ReLU}(x)=\max(0,x) $$

4. Leaky Rectified Linear Unit (Leaky ReLU) function:
    The Leaky ReLU function is a variation of the ReLU function that allows a small, non-zero gradient for negative input values. It is defined by a slope parameter α (typically 0.01) for negative input values.

    $$ \operatorname{LeakyReLU}(x)= \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{otherwise} \end{cases} $$

5. Exponential Linear Unit (ELU) function:
    The ELU function is another variation of the ReLU function that has a smooth, non-zero gradient for negative input values. It is defined by a parameter α (typically 1) for negative input values.

    $$ \operatorname{ELU}(x)= \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{otherwise} \end{cases} $$
    
6. Binary step function:
    The binary step function is commonly used in simple models such as perceptrons. It is a threshold-based function that maps input values to either 0 or 1, depending on whether the input is below or above a chosen threshold. Conventions differ on where the threshold value itself goes: the formula below maps it to 1, while the perceptron definition above maps a weighted sum of exactly 0 to 0.

    $$ \operatorname{BinaryStep}(x)= \begin{cases} 1 & \text{if } x \geq \text{threshold} \\ 0 & \text{otherwise} \end{cases} $$
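A minimal NumPy sketch of the six functions above, written to work element-wise on arrays. The default `alpha` values follow the typical values quoted above. The ELU applies `np.expm1` (which computes $e^t - 1$) only to the negative part; that is an implementation choice, since `np.where` evaluates both branches and a plain `np.exp(x)` could overflow for large positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def elu(x, alpha=1.0):
    # expm1 is applied to min(x, 0) only, so large positive x cannot overflow
    return np.where(x >= 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def binary_step(x, threshold=0.0):
    return np.where(x >= threshold, 1, 0)

x = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu, elu, binary_step):
    print(f.__name__, f(x))
```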

### Task 1

We will create a simple perceptron-based classifier that takes a 2x2 grid of binary values as input and predicts whether all cells are 1, all cells are 0, or there is a mix of 0s and 1s.

We will flatten the 2x2 grid into an array of 4 elements. The output should be:
* 0 if all cells have a value of 0
* 1 if all cells have a value of 1
* 2 if there is a mix

As a consequence, we need 3 perceptrons, each with 4 inputs, trained so that only one perceptron is activated for a given input. This is a one-vs-rest scheme: perceptron $i$ is trained with target 1 on grids labeled $i$ and target 0 on all other grids.
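Because each cell is 0 or 1, there are only $2^4 = 16$ possible grids, so the full labeled set can be enumerated and used as training data. The sketch also shows the one-vs-rest targets each perceptron sees; the helper names `grid_label` and `one_vs_rest_targets` are hypothetical. One caveat worth knowing: the "mix" class is not linearly separable from the other two, since $(0.5, 0.5, 0.5, 0.5)$ is both the midpoint of `[0, 0, 0, 0]` and `[1, 1, 1, 1]` and the average of the six grids with exactly two 1s. Perceptron 2 therefore cannot, on its own, fire on exactly the mixed grids; comparing the raw scores of the three perceptrons, as `predict_grid` asks, avoids relying on perceptron 2 alone.

```python
from itertools import product

def grid_label(grid):
    """Task 1 label: 0 if every cell is 0, 1 if every cell is 1, 2 for a mix."""
    if all(v == 0 for v in grid):
        return 0
    if all(v == 1 for v in grid):
        return 1
    return 2

def one_vs_rest_targets(label, num_classes=3):
    """Binary target for each perceptron: 1 for the perceptron matching the label."""
    return [1 if i == label else 0 for i in range(num_classes)]

# All 16 flattened binary grids with their labels
all_grids = [(list(g), grid_label(g)) for g in product([0, 1], repeat=4)]
print(len(all_grids))          # 16
print(one_vs_rest_targets(2))  # [0, 0, 1]
```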

In [None]:
import random

class Perceptron:
    def __init__(self, num_inputs):
        self.weights = [random.uniform(-1, 1) for _ in range(num_inputs)]
        self.bias = random.uniform(-1, 1)

    def activate(self, x):
        # TODO: Use a binary step activation with threshold 0: output 1 only when x > 0, as in f(x) above
        return 0

    def predict(self, inputs):
        # TODO: Calculate the weighted sum of inputs and add the bias
        weighted_sum = 0
        
        return self.activate(weighted_sum)

    # `inputs` is a list of the 4 grid values; `target` is the desired output (0 or 1) of this perceptron
    def train(self, inputs, target, lr=0.1):
        # TODO: Compute the error of the perceptron's prediction: error = target - prediction

        # TODO: Update every weight: w_i <- w_i + lr * error * inputs[i]

        # TODO: Update the bias: bias <- bias + lr * error
        return

In [None]:
data = [
    ([0, 0, 0, 0], 0),
    ([1, 1, 1, 1], 1),
    ([0, 1, 0, 1], 2),
    ([1, 0, 1, 0], 2),
    # Add more examples if needed
]

# An epoch is one complete pass over the training data
num_epochs = 1000
# We'll use 3 perceptrons with 4 inputs each.
perceptrons = [Perceptron(4) for _ in range(3)]

# TODO: Train the perceptrons for `num_epochs` epochs
# hint: In each epoch, train every perceptron on the whole data set. Perceptron i gets
# target 1 for examples labeled i and target 0 for all other examples.

def predict_grid(grid):
    # TODO: Return the index of the perceptron with the highest confidence for the given grid
    # hint: the step output is only 0 or 1, so use the raw score (weighted sum + bias) as the confidence
    return 0

In [None]:
# Test the model
test_data = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]

for grid in test_data:
    print(f"Grid: {grid}, Prediction: {predict_grid(grid)}")

## Feedforward Neural Networks

Neural networks, also known as artificial neural networks (ANNs), are a class of machine learning models inspired by the structure and functioning of the biological neural networks found in the human brain. They consist of interconnected nodes or artificial neurons organized in layers, which work together to learn complex patterns and relationships in data.

A neural network typically consists of three main types of layers:

- Input layer: The input layer receives the input data and passes it to the next layer. Each node in the input layer corresponds to a single feature or attribute of the input data.

- Hidden layer(s): These are the layers between the input and output layers. They perform the majority of the computations and transformations in the network. A neural network can have multiple hidden layers, and the number of nodes in each layer can vary. Deep learning models have many hidden layers, allowing them to learn more complex and abstract representations of the data.

- Output layer: The output layer produces the final predictions or decisions made by the neural network. The number of nodes in the output layer depends on the task being performed, such as the number of classes in a classification problem or a single node for a regression problem.

The connections between nodes in the network have associated weights and biases, which are adjusted during the training process. The goal of training a neural network is to find the optimal values for these weights and biases, allowing the network to make accurate predictions or decisions based on the input data.

Neural networks use activation functions, such as sigmoid, ReLU, or tanh, to introduce non-linearity and enable the network to learn complex, non-linear relationships in the data. The learning process in neural networks is typically performed using gradient-based optimization algorithms like gradient descent or its variants, along with backpropagation for computing the gradients efficiently.

Feedforward neural networks are a type of artificial neural network where the connections between nodes do not form any cycles or loops. The information flows in one direction, from the input layer through the hidden layers, and finally to the output layer. This unidirectional flow of information gives them the name "feedforward." Feedforward neural networks are among the simplest and most common types of neural networks used for various machine learning tasks.
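A minimal NumPy sketch of this one-directional flow: each layer is represented as a `(W, b, activation)` tuple (a representation chosen here for illustration), and the input is pushed through the layers in order, with no cycles. The 4-5-3 shapes match the network used in Task 2; the random weights are placeholders, not trained values.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    return z

def forward(x, layers):
    """Push x through the layers in order: a <- activation(a @ W + b)."""
    a = np.asarray(x, dtype=float)
    for W, b, activation in layers:
        a = activation(a @ W + b)
    return a

rng = np.random.default_rng(42)
layers = [
    (rng.uniform(-1, 1, (4, 5)), rng.uniform(-1, 1, 5), relu),      # input (4) -> hidden (5)
    (rng.uniform(-1, 1, (5, 3)), rng.uniform(-1, 1, 3), identity),  # hidden (5) -> output (3)
]
out = forward([0, 1, 0, 1], layers)
print(out.shape)  # (3,)
```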

Training a feedforward neural network involves two main steps: the forward pass and the backward pass (also known as backpropagation).

- Forward pass: In the forward pass, the input data is passed through the network to produce an output. For each layer in the network, the weighted sum of inputs from the previous layer is calculated, and an activation function is applied to introduce non-linearity. This process is repeated for all layers in the network until the output layer generates the final prediction.

- Backward pass (backpropagation): In the backward pass, the error between the predicted output and the target output is computed, and this error is propagated back through the network to update the weights and biases. The backpropagation algorithm computes the gradient of the error with respect to each weight and bias by applying the chain rule of calculus. Once the gradients are computed, an optimization algorithm, such as gradient descent or its variants (e.g., stochastic gradient descent, Adam, RMSprop), updates the weights and biases using these gradients.
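The sketch below makes both passes concrete for a network with one ReLU hidden layer and a softmax output. It assumes a cross-entropy loss $L = -\log a_2[y]$ for the true class $y$ (the text does not fix a loss; this pairing is common because the output error then becomes $\mathbf{a}_2$ minus the one-hot encoding of $y$). To check the chain-rule gradients, it compares them with central finite differences $\frac{L(\theta+\varepsilon) - L(\theta-\varepsilon)}{2\varepsilon}$, which should agree closely. The function name `loss_and_grads` is hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_derivative(z):
    return (z > 0).astype(float)  # 0 at z == 0 by convention

def softmax(z):
    e = np.exp(z - np.max(z))  # shifting by the max avoids overflow
    return e / e.sum()

def loss_and_grads(params, x, y):
    """Forward pass, cross-entropy loss and backpropagated gradients."""
    W1, b1, W2, b2 = params
    # Forward pass
    z1 = x @ W1 + b1
    a1 = relu(z1)
    a2 = softmax(a1 @ W2 + b2)
    loss = -np.log(a2[y])
    # Backward pass (chain rule, output layer first)
    delta_out = a2.copy()
    delta_out[y] -= 1.0                                      # dL/dz2 = a2 - one_hot(y)
    dW2, db2 = np.outer(a1, delta_out), delta_out
    delta_hidden = (delta_out @ W2.T) * relu_derivative(z1)  # dL/dz1
    dW1, db1 = np.outer(x, delta_hidden), delta_hidden
    return loss, [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
params = [rng.uniform(-1, 1, (4, 5)), rng.uniform(-1, 1, 5),
          rng.uniform(-1, 1, (5, 3)), rng.uniform(-1, 1, 3)]
x, y = np.array([1.0, 2.0, 1.0, 2.0]), 1
_, grads = loss_and_grads(params, x, y)

# Central finite differences, one parameter entry at a time
eps = 1e-6
max_diff = 0.0
for p, g in zip(params, grads):
    for idx in np.ndindex(p.shape):
        original = p[idx]
        p[idx] = original + eps
        loss_plus, _ = loss_and_grads(params, x, y)
        p[idx] = original - eps
        loss_minus, _ = loss_and_grads(params, x, y)
        p[idx] = original
        numeric = (loss_plus - loss_minus) / (2 * eps)
        max_diff = max(max_diff, abs(numeric - g[idx]))
print(f"largest gradient mismatch: {max_diff:.2e}")
```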

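This forward and backward pass process is repeated for multiple epochs (complete passes through the training dataset), typically until the prediction error stops improving or a fixed number of epochs has been reached. For networks with hidden layers the error is generally a non-convex function of the weights, so training reduces the error but is not guaranteed to reach the globally optimal weights and biases.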
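A minimal sketch of the epoch loop, using the smallest possible network (a single sigmoid neuron, i.e. logistic regression) so the code stays short. Each epoch visits every example once in a freshly shuffled order and applies one gradient-descent update per example (stochastic gradient descent). The cross-entropy loss, learning rate `0.5`, 500 epochs and the AND data are illustrative assumptions, not values from the lab.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_loss(w, b, X, y):
    """Average binary cross-entropy over the whole data set."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
rng = np.random.default_rng(0)
w, b, lr, num_epochs = np.zeros(2), 0.0, 0.5, 500

initial = mean_loss(w, b, X, y)
for epoch in range(num_epochs):              # one epoch = one full pass over the data
    for i in rng.permutation(len(X)):        # new random order every epoch
        grad = sigmoid(X[i] @ w + b) - y[i]  # dL/dz for sigmoid + cross-entropy
        w -= lr * grad * X[i]
        b -= lr * grad
final = mean_loss(w, b, X, y)
print(f"loss before: {initial:.3f}, after {num_epochs} epochs: {final:.3f}")
```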

Feedforward neural networks can be applied to a wide range of tasks, including:

- Regression: Predicting continuous numerical values based on input features, such as predicting house prices, energy consumption, or sales forecasts.

- Classification: Assigning input data points to one of several predefined classes or categories, such as email spam detection, handwritten digit recognition, or medical diagnosis.

- Feature extraction: Learning meaningful representations of input data, which can be used as input for other machine learning models or for dimensionality reduction.

- Function approximation: Learning an approximation of a function or a mapping between input and output spaces, which can be useful in control systems, robotics, or optimization problems.

Feedforward neural networks have been successfully applied in various domains such as computer vision, natural language processing, speech recognition, and more. However, they have some limitations, such as a lack of memory or context, which make them less suitable for certain tasks, like time series analysis or sequence prediction. For these tasks, other types of neural networks, such as recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, are more appropriate.

### Task 2

We'll use a 2x2 grid in which each cell takes the value 0, 1, or 2, and try to predict the dominant (most frequent) value in the grid.

As before, the grid is flattened into an array and used as input, and the output is 0, 1, or 2, the dominant value. If there is a tie, any of the tied values counts as dominant.

We'll use 4 input neurons, 5 neurons in the hidden layer, and 3 output neurons.
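Since each cell takes one of three values, there are $3^4 = 81$ possible grids, few enough to label exhaustively and use as extra training data. A minimal sketch, using a hypothetical `dominant_value` helper that breaks ties by picking the smallest tied value; that rule is an arbitrary choice permitted by the task statement (the sample training data uses other tie-breaks, e.g. it labels `[2, 0, 2, 0]` as 2).

```python
from collections import Counter
from itertools import product

def dominant_value(grid):
    """Most frequent value in the grid; ties go to the smallest tied value."""
    counts = Counter(grid)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)

# Every 2x2 grid with cells in {0, 1, 2}, flattened, paired with its label
dataset = [(list(g), dominant_value(g)) for g in product([0, 1, 2], repeat=4)]
print(len(dataset))  # 81
print(dataset[:3])   # [([0, 0, 0, 0], 0), ([0, 0, 0, 1], 0), ([0, 0, 0, 2], 0)]
```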

In [None]:
import numpy as np

def relu(x):
    # TODO: the standard ReLU function
    # hint: keep in mind you work with np.arrays
    return 0

def relu_derivative(x):
    # TODO: the derivative of the ReLU function (ReLU is not differentiable at 0; use 0 there)
    return 0

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # We initialize all weights and biases with random values in [-1, 1]
        self.W1 = np.random.uniform(-1, 1, (input_size, hidden_size))
        self.b1 = np.random.uniform(-1, 1, hidden_size)
        self.W2 = np.random.uniform(-1, 1, (hidden_size, output_size))
        self.b2 = np.random.uniform(-1, 1, output_size)

    # This function will "feed forward" the input through the NN. It should return the values of the output neurons.
    def forward(self, x):
        # TODO: Compute the weighted sums for all neurons in the hidden layer
        # hint: z1 = x @ W1 + b1 (a matrix product; `*` would be element-wise)
        self.z1 = np.zeros_like(self.b1)  # placeholder, replace
        # TODO: Apply ReLU to z1 to compute the signals fed to the next layer
        self.a1 = np.zeros_like(self.b1)  # placeholder, replace
        # TODO: Compute the weighted sums for all neurons in the output layer
        self.z2 = np.zeros_like(self.b2)  # placeholder, replace
        # Softmax turns the output sums into probabilities that add up to 1, so the most
        # confident class stands out; subtracting the max first avoids overflow in np.exp
        exp_z2 = np.exp(self.z2 - np.max(self.z2))
        self.a2 = exp_z2 / np.sum(exp_z2)
        return self.a2

    def train(self, x, y, learning_rate=0.1):
        output = self.forward(x)

        # Backpropagation
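        # TODO: Compute the error of the output as a vector
        # hint: with softmax outputs and a cross-entropy loss, this is output - one_hot(y),
        # where one_hot(y) has a 1 at index y and 0 elsewhere
        delta_output = 0
        # TODO: Compute the gradients of the weights and biases in the output layer
        # dW2 = outer(a1, delta_output), shape (hidden_size, output_size)
        # db2 = delta_output
        dW2 = np.zeros_like(self.W2)  # placeholder, replace
        db2 = np.zeros_like(self.b2)  # placeholder, replace

        # TODO: Compute the error at the hidden layer:
        # delta_hidden = (delta_output @ W2.T) * relu_derivative(z1)
        delta_hidden = 0

        # TODO: Compute the gradients of the weights and biases in the hidden layer, similarly to the output layer
        dW1 = np.zeros_like(self.W1)  # placeholder, replace
        db1 = np.zeros_like(self.b1)  # placeholder, replace

        # TODO: Update weights and biases given the gradients computed and the learning rate factor
        # hint: This is gradient descent, so we step in the direction opposite to the gradient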
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2

In [None]:
data = [
    ([0, 0, 0, 0], 0),
    ([1, 1, 1, 1], 1),
    ([2, 2, 2, 2], 2),
    ([0, 1, 0, 1], 0),
    ([1, 2, 1, 2], 1),
    ([2, 0, 2, 0], 2),
    # Add more examples if needed
]

input_size = 4
hidden_size = 5
output_size = 3
num_epochs = 5000
learning_rate = 0.01

nn = NeuralNetwork(input_size, hidden_size, output_size)

# TODO: Train the NN on the training data for `num_epochs` epochs, passing `learning_rate` to `nn.train`

def predict_grid(grid):
    predictions = nn.forward(grid)
    # TODO: Return the index (0, 1 or 2) of the output neuron with the highest probability
    return 0

In [None]:
# Test the model
test_data = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [2, 0, 2, 0],
]

for grid in test_data:
    print(f"Grid: {grid}, Prediction: {predict_grid(grid)}")