# Neural Networks

One of the earliest and simplest neural network models is the perceptron, introduced by Frank Rosenblatt in the late 1950s. The perceptron was inspired by the way biological neurons in the nervous system process signals, and its working mechanism is very similar to that of logistic regression.

## Logistic Regression vs Perceptron

Logistic regression is a binary classification algorithm that applies a linear function to the input features and passes the result through the logistic (sigmoid) function to estimate the probability that the input belongs to the positive class.

A perceptron is a basic building block of a neural network. It takes a set of input features, applies weights to them, and produces an output using a step function.

![Perceptron vs Logistic Regression](./../../assets/perceptron.jpg)

The diagram above shows the structure shared by the perceptron and logistic regression. In both cases, the weighted sum of the input features plus a bias term is passed through an activation function to produce the predicted output. The only difference between the two models is the activation function.

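In both cases, $z_i = \mathbf{w}^\top \mathbf{x}_i + b$ is the weighted sum of the input features of sample $i$ plus the bias.

For logistic regression (sigmoid activation):

$$f(z_i) = \frac{1}{1 + e^{-z_i}}$$

For the perceptron (step activation):

$$f(z_i) = \begin{cases}
    1, & \text{if } z_i > 0 \\
    0, & \text{otherwise}
\end{cases}$$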
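To make the comparison concrete, the sketch below computes a single weighted sum and passes it through both activation functions. The weights, bias, and input are hypothetical values chosen for illustration, and the helper names `linear`, `logistic`, and `step` are our own.

```python
import numpy as np

def linear(w, x, b):
    """Weighted sum of the inputs plus the bias: z = w . x + b."""
    return np.dot(w, x) + b

def logistic(z):
    """Logistic regression activation: a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Perceptron activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

# Hypothetical weights, bias, and input (illustration only)
w = np.array([0.5, -0.5])
b = 0.2
x = np.array([0.3, -0.7])

z = linear(w, x, b)  # the same z feeds both models
print("z =", z)
print("logistic regression output:", logistic(z))  # a probability
print("perceptron output:", step(z))               # a hard 0/1 label
```

Both models see the same `z`; logistic regression returns a graded probability, while the perceptron collapses it to a hard class label.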



```python
import numpy as np

class Perceptron:
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = np.zeros(num_features)  # one weight per input feature
        self.bias = 0.0
        # Step activation: 1 if the weighted sum is positive, else 0
        self.activation_function = lambda z: 1 if z > 0 else 0

    def predict(self, x):
        # Weighted sum of the inputs plus the bias term
        linear_output = np.dot(self.weights, x) + self.bias
        return self.activation_function(linear_output)
```

```python
# Create a perceptron with 2 input features; it uses the step function as its activation
perceptron = Perceptron(2)

# Set the weights and bias
perceptron.weights = np.array([0.5, -0.5])
perceptron.bias = 0.2

# Make a prediction for a single input
x = np.array([0.3, -0.7])
prediction = perceptron.predict(x)
print("Perceptron prediction:", prediction)
```

```text
Perceptron prediction: 1
```
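The prediction can be checked by hand using the weights, bias, and input set above:

$$z = \mathbf{w}^\top \mathbf{x} + b = (0.5)(0.3) + (-0.5)(-0.7) + 0.2 = 0.15 + 0.35 + 0.2 = 0.7$$

Since $z = 0.7 > 0$, the step function outputs 1.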


## Multi-Layer Perceptron

A multi-layer perceptron (MLP), also known as a feedforward neural network, extends the perceptron by stacking multiple layers of neurons. It consists of an input layer, one or more hidden layers, and an output layer. Each neuron in a layer is connected to every neuron in the adjacent layers (the layers are fully connected). The hidden layers, combined with non-linear activation functions, enable the model to learn more complex representations of the input data.
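To illustrate why hidden layers matter, consider XOR: a single perceptron cannot represent it because its two classes are not linearly separable. The sketch below hand-picks (rather than learns) the weights of a small network with 2 inputs, 2 hidden step units, and 1 output. One hidden unit acts as OR, the other as AND, and the output fires only when OR is on and AND is off. The weights and the function name `xor_mlp` are illustrative assumptions.

```python
import numpy as np

def step(z):
    """Element-wise perceptron step: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# Hand-picked (hypothetical) weights, not learned.
# Hidden unit 1 computes OR, hidden unit 2 computes AND.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # shape (inputs, hidden units)
b_hidden = np.array([-0.5, -1.5])
# Output unit fires when OR is on and AND is off, i.e. XOR
W_out = np.array([1.0, -1.0])
b_out = -0.5

def xor_mlp(X):
    h = step(X @ W_hidden + b_hidden)  # hidden layer, shape (n, 2)
    return step(h @ W_out + b_out)     # output layer, shape (n,)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_mlp(X))  # [0 1 1 0]
```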

The architecture of an MLP can be represented as follows:

![Multi-Layer Perceptron](./../../assets/mlp.png)

```text
Input Layer -> Receives the input features and passes them to the hidden layers
Hidden Layer(s) -> Applies a weighted sum of inputs and an activation function to produce an output
Output Layer -> Applies the same process to generate the final output
```
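Because adjacent layers are fully connected, each pair of layers contributes one weight matrix and one bias vector. The sketch below counts them for layer sizes `[4, 8, 6, 3]`, the sizes used in the example later in this notebook, assuming weights are stored with shape `(fan_in, fan_out)` as the `MLP` class below does.

```python
# Parameter shapes for a fully connected MLP, assuming W has shape (fan_in, fan_out)
layer_sizes = [4, 8, 6, 3]  # input, two hidden layers, output

total = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    n_weights = fan_in * fan_out  # every neuron connects to every neuron in the next layer
    n_biases = fan_out            # one bias per neuron in the next layer
    total += n_weights + n_biases
    print(f"W: ({fan_in}, {fan_out}), b: ({fan_out},) -> {n_weights + n_biases} parameters")

print("total parameters:", total)  # 40 + 54 + 21 = 115
```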

## Activation Functions

An activation function introduces non-linearity into the network, enabling it to learn complex relationships between inputs and outputs. Commonly used activation functions include:

**Sigmoid:** Squashes the weighted sum into the open interval (0, 1).

$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$

**Tanh:** Similar to sigmoid, but squashes the weighted sum into the open interval (-1, 1).

$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$

**ReLU (Rectified Linear Unit):** Sets negative values to 0 and keeps positive values as they are.

$\text{ReLU}(x) = \max(0, x)$

**Leaky ReLU:** Similar to ReLU, but instead of setting negative inputs to 0 it multiplies them by a small slope (0.01 here), so negative inputs produce small negative outputs.

$\text{LeakyReLU}(x) = \max(0.01x, x)$
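A minimal NumPy sketch of the four functions above, evaluated on a few sample points. The function names are our own, and the Leaky ReLU slope `alpha=0.01` matches the formula above but is a tunable choice rather than a fixed constant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # For 0 < alpha < 1, max(alpha * x, x) keeps positives and scales negatives by alpha
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(f"{name:>10}: {np.round(f(x), 3)}")
```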

Here are a few more commonly used activation functions, with their corresponding graphs:

![Activation Functions](./../../assets/activation.jpeg)

## Forward Propagation

Forward propagation is the process of computing the output of an MLP given an input. Each neuron takes the weighted sum of its inputs, adds a bias term, and applies an activation function to produce the output.

The forward propagation equations for an MLP with one or more hidden layers can be written as follows:

```text
z1 = activation_function(W1 * x + b1)
z2 = activation_function(W2 * z1 + b2)
...
output = activation_function(W_output * z_last_hidden + b_output)
```

where:

- `x` is the input vector,
- `W1, W2, ..., W_output` are the weight matrices for each layer,
- `b1, b2, ..., b_output` are the bias vectors for each layer,
- `z1, z2, ..., z_last_hidden` are the outputs of each hidden layer, and
- `output` is the final output of the MLP.
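The equations use the column-vector convention `W1 * x`, while the `MLP` class below computes `np.dot(x, W)` with each weight matrix shaped `(fan_in, fan_out)`. The sketch below uses hypothetical layer sizes (3 inputs, 4 hidden units, 2 outputs) and a seeded random generator (an assumption, for reproducibility). It runs one hidden layer both ways and checks that they agree when the row-vector form uses the transposed weights.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, one hidden layer of 4 units, 2 outputs
W1 = rng.standard_normal((4, 3))     # shape (hidden, input) for the W1 * x form
b1 = np.zeros(4)
W_out = rng.standard_normal((2, 4))  # shape (output, hidden)
b_out = np.zeros(2)

x = rng.standard_normal(3)

# Column-vector form, as in the equations above
z1 = sigmoid(W1 @ x + b1)
output = sigmoid(W_out @ z1 + b_out)

# Row-vector form (x @ W with W of shape (in, out)) using the transposed matrices
output_rows = sigmoid(sigmoid(x @ W1.T + b1) @ W_out.T + b_out)

print(output)
print(np.allclose(output, output_rows))  # True
```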

```python
import numpy as np

class MLP:
    def __init__(self, input_size, hidden_sizes, output_size, activation='sigmoid'):
        self.input_size = input_size
        self.hidden_sizes = hidden_sizes  # list of hidden layer sizes, e.g. [8, 6]
        self.output_size = output_size

        self.weights = []
        self.biases = []

        layer_sizes = [input_size] + hidden_sizes + [output_size]

        # One weight matrix of shape (fan_in, fan_out) and one bias vector per pair
        # of adjacent layers; weights start from a standard normal distribution
        for i in range(len(layer_sizes) - 1):
            weight_matrix = np.random.randn(layer_sizes[i], layer_sizes[i+1])
            self.weights.append(weight_matrix)
            bias_vector = np.zeros(layer_sizes[i+1])
            self.biases.append(bias_vector)

        # The same activation is applied in every layer, including the output layer
        if activation == 'sigmoid':
            self.activation_function = self._sigmoid
        elif activation == 'tanh':
            self.activation_function = self._tanh
        elif activation == 'relu':
            self.activation_function = self._relu
        else:
            raise ValueError("Unsupported activation function.")

    def forward(self, x):
        # x has shape (n_samples, input_size); each row is one sample
        activations = [x]

        for i in range(len(self.weights)):
            # Row-vector convention: x @ W + b, with W of shape (fan_in, fan_out)
            linear_output = np.dot(activations[-1], self.weights[i]) + self.biases[i]
            output = self.activation_function(linear_output)
            activations.append(output)

        # Output of the final layer
        return activations[-1]

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _tanh(self, x):
        return np.tanh(x)

    def _relu(self, x):
        return np.maximum(0, x)
```

This implementation covers the architecture, forward propagation, and activation functions of an MLP, and supports the sigmoid, tanh, and ReLU activations. It does not include training: no backpropagation step is implemented, so the weights keep their random initial values.

You can create an instance of the `MLP` class and compute its outputs; once trained, a network like this can be used for tasks such as classification and regression.

```python
# Example usage
mlp = MLP(input_size=4, hidden_sizes=[8, 6], output_size=3, activation='sigmoid')
input_data = np.random.randn(10, 4)  # Example input data with 10 samples
output = mlp.forward(input_data)  # Compute the MLP's output
print(output)
```

```text
[[0.54564303 0.67919588 0.37609245]
 [0.54153638 0.72084591 0.20390913]
 [0.52509919 0.71018972 0.2952373 ]
 [0.53787453 0.68127392 0.34917849]
 [0.58678118 0.63960352 0.29108797]
 [0.5503897  0.65551025 0.3184623 ]
 [0.67536001 0.68319327 0.12238282]
 [0.71297346 0.63467466 0.12780863]
 [0.59128551 0.68059875 0.2756035 ]
 [0.64998348 0.71544514 0.10785669]]
```


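In this example, we create an MLP with an input size of 4, two hidden layers of sizes 8 and 6, and an output size of 3. We then generate random input data with 10 samples and compute the MLP's output using the `forward` method, which returns one row of 3 values per sample. Because the weights and inputs are drawn with `np.random.randn` and no seed is set, the printed values will differ from run to run.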
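The `forward` method handles a whole batch at once because `np.dot(activations[-1], self.weights[i])` treats each row of the input as a separate sample. The sketch below re-implements that forward pass as a standalone function (the function name `forward` and the seed `42` are assumptions, not part of the class) and checks that a batch of 10 samples gives the same result as processing the rows one at a time. Seeding `np.random.default_rng` also makes the output reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, for reproducible output

def forward(x, weights, biases):
    """Row-vector forward pass mirroring MLP.forward: x @ W + b, then sigmoid."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a

layer_sizes = [4, 8, 6, 3]
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

batch = rng.standard_normal((10, 4))
batch_out = forward(batch, weights, biases)                      # whole batch at once
row_out = np.stack([forward(row, weights, biases) for row in batch])  # one sample at a time

print(batch_out.shape)                  # (10, 3)
print(np.allclose(batch_out, row_out))  # True
```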
