<a href="https://colab.research.google.com/github/Raghav1378/Deep-Learning/blob/main/MLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Layer Perceptron (MLP) Theory

A Multi-Layer Perceptron (MLP) is a type of artificial neural network characterized by its feedforward architecture and multiple layers of interconnected nodes. It's a fundamental building block in deep learning.

## Key Theoretical Points:

### 1. Architecture:
* **Input Layer:** The first layer that receives the raw input data. The number of nodes here corresponds to the number of features in your dataset.
* **Hidden Layers:** One or more layers situated between the input and output layers. These layers are crucial for learning complex, non-linear relationships within the data. The number of hidden layers (depth) and the number of neurons in each hidden layer (width) are hyperparameters.
* **Output Layer:** The final layer of the network, responsible for producing the model's prediction. The number of nodes in this layer depends on the task:
    * **Regression:** Typically one node for predicting a continuous value.
    * **Binary Classification:** One node (often with a sigmoid activation for probability).
    * **Multi-class Classification:** One node per class (often with a softmax activation for class probabilities).
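
As a rough illustration of how these layers fit together, the sketch below builds one weight matrix and one bias vector for each pair of adjacent layers. The layer sizes (20 inputs, hidden layers of 100 and 50, one output node) and the names `layer_sizes`, `weights`, and `biases` are assumptions chosen for the example, not a required layout.

```python
import numpy as np

# Assumed sizes: 20 input features, two hidden layers (100 and 50), one output node.
layer_sizes = [20, 100, 50, 1]

rng = np.random.default_rng(0)

# One weight matrix and one bias vector per pair of adjacent layers.
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

weight_shapes = [W.shape for W in weights]
print(weight_shapes)  # [(20, 100), (100, 50), (50, 1)]
```

Changing the depth or width of the network only means editing `layer_sizes`; the shapes of the matrices follow from it.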

### 2. Neurons (Nodes):
Each neuron in a hidden or output layer performs two main operations:
* **Weighted Sum:** It calculates a weighted sum of its inputs from the previous layer, adding a bias term.
    * `Net Input (z) = Σ (weight_i * input_i) + bias`
* **Activation:** It then passes this weighted sum through a non-linear activation function.
    * `Output (a) = Activation_Function(z)`
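
The two operations can be sketched for a single neuron in plain Python. The function name `neuron_output`, the sample inputs, and the choice of a sigmoid activation are assumptions made for illustration.

```python
import math

def neuron_output(inputs, weights, bias):
    """Return the net input z and the sigmoid activation a of one neuron."""
    # Weighted sum of the inputs plus the bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear activation (sigmoid chosen here as an example).
    a = 1.0 / (1.0 + math.exp(-z))
    return z, a

# 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
z, a = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(z, a)  # 0.0 0.5
```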

### 3. Weights and Biases:
* **Weights ($W$):** Numerical values that represent the strength of the connection between neurons in adjacent layers. They determine the importance of each input to a neuron.
* **Biases ($b$):** Additional numerical values added to the weighted sum of inputs. They shift the input to the activation function, so a neuron can produce a non-zero output even when all of its inputs are zero.
* Both weights and biases are the **learnable parameters** of the MLP, meaning they are adjusted during the training process.
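
To get a feel for how many learnable parameters an MLP has, the sketch below counts them for a fully connected network. The helper `count_parameters` and the layer sizes are hypothetical and chosen only for illustration.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per neuron in the receiving layer
    return total

# (20*100 + 100) + (100*50 + 50) + (50*1 + 1) = 2100 + 5050 + 51
n_params = count_parameters([20, 100, 50, 1])
print(n_params)  # 7201
```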

### 4. Activation Functions:
These functions introduce non-linearity into the network, enabling MLPs to learn and approximate complex, non-linear relationships in the data. Without them, an MLP, regardless of its depth, would effectively be just a single linear transformation.
* **ReLU (Rectified Linear Unit):** `f(x) = max(0, x)` - Popular in hidden layers for its computational efficiency and ability to mitigate vanishing gradients.
* **Sigmoid:** `f(x) = 1 / (1 + e^(-x))` - Outputs values between 0 and 1, often used in the output layer for binary classification.
* **Tanh (Hyperbolic Tangent):** `f(x) = (e^x - e^(-x)) / (e^x + e^(-x))` - Outputs values between -1 and 1, similar to sigmoid but zero-centered.
* **Softmax:** `f(x_i) = e^(x_i) / Σ e^(x_j)` - Used in the output layer for multi-class classification, converting raw scores into a probability distribution where the sum of probabilities is 1.
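
A minimal NumPy sketch of these functions follows. The max-subtraction in `softmax` is a common numerical-stability trick rather than part of the formula above, and the sample `scores` are arbitrary. The last lines illustrate the claim that stacked layers without an activation collapse into a single linear transformation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the max does not change the result but avoids overflow.
    exps = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([-2.0, 0.0, 3.0])
relu_out = relu(scores)        # negative entries become 0
sigmoid_out = sigmoid(scores)  # each value lies in (0, 1)
tanh_out = np.tanh(scores)     # each value lies in (-1, 1)
softmax_out = softmax(scores)  # entries sum to 1

# Without a non-linearity, two stacked linear layers equal one linear layer.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
stacked = (scores @ W1) @ W2
collapsed = scores @ (W1 @ W2)
print(np.allclose(stacked, collapsed))  # True
```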

### 5. Forward Propagation:
This is the process of feeding the input data through the network, from the input layer, through all hidden layers, to the output layer, to generate a prediction. It's a series of matrix multiplications and activation function applications.
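
A forward pass can be sketched in a few lines of NumPy. The function `forward`, the layer sizes, and the choice of ReLU in the hidden layers with a sigmoid output are assumptions made for a binary-classification example.

```python
import numpy as np

def forward(X, weights, biases):
    """Propagate a batch X through the network and return output probabilities."""
    a = X
    # Hidden layers: matrix multiplication, bias, then ReLU.
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)
    # Output layer: sigmoid turns the final net input into a probability.
    z_out = a @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z_out))

rng = np.random.default_rng(0)
sizes = [4, 8, 3, 1]  # assumed: 4 features, hidden layers of 8 and 3, 1 output
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

X = rng.normal(size=(5, 4))  # a batch of 5 samples
probs = forward(X, weights, biases)
print(probs.shape)  # (5, 1)
```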

### 6. Loss Function (Cost Function):
A mathematical function that quantifies the difference or "error" between the network's predicted output and the actual target values. The primary goal of training is to minimize this loss.
* **Mean Squared Error (MSE):** Common for regression problems.
* **Binary Cross-Entropy:** Used for binary classification.
* **Categorical Cross-Entropy:** Used for multi-class classification.
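
The three losses can be written out directly, as in the sketch below. The function names and the `eps` clipping (which guards against `log(0)`) are illustrative choices, and the inputs are small hand-picked values.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

# Both predictions are off by 0.5, so the MSE is 0.25.
mse_value = mse(np.array([1.0, 2.0]), np.array([1.5, 2.5]))

# Predicting 0.5 for every sample gives a loss of ln(2), about 0.693.
bce_value = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))

# The true class receives probability 0.5, so the loss is again ln(2).
cce_value = categorical_cross_entropy(
    np.array([[1.0, 0.0, 0.0]]), np.array([[0.5, 0.25, 0.25]])
)
```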

### 7. Backpropagation:
The fundamental algorithm for training MLPs. It's an efficient method to compute the gradients of the loss function with respect to each weight and bias in the network.
* It works by propagating the error backwards from the output layer to the input layer, using the chain rule of calculus to determine how much each weight and bias contributed to the overall error.
* These gradients indicate the direction and magnitude by which the parameters should be adjusted to reduce the loss.
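
The chain-rule bookkeeping can be made concrete with a small hand-written example. The sketch below assumes a network with one tanh hidden layer, a linear output, and an MSE loss; the function `loss_and_grads` is hypothetical. It then compares one analytic gradient entry against a finite-difference estimate as a sanity check.

```python
import numpy as np

def loss_and_grads(X, y, W1, b1, W2, b2):
    # Forward pass.
    a1 = np.tanh(X @ W1 + b1)
    y_hat = a1 @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: apply the chain rule from the output towards the input.
    d_yhat = 2.0 * (y_hat - y) / y.size   # dL/dy_hat
    dW2 = a1.T @ d_yhat                   # dL/dW2
    db2 = d_yhat.sum(axis=0)              # dL/db2
    d_a1 = d_yhat @ W2.T                  # error sent back to the hidden layer
    d_z1 = d_a1 * (1.0 - a1 ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1                      # dL/dW1
    db1 = d_z1.sum(axis=0)                # dL/db1
    return loss, (dW1, db1, dW2, db2)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(6, 3)), rng.normal(size=(6, 1))
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

loss, (dW1, db1, dW2, db2) = loss_and_grads(X, y, W1, b1, W2, b2)

# Finite-difference check of a single entry, W1[0, 0].
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
loss_plus, _ = loss_and_grads(X, y, W1_plus, b1, W2, b2)
loss_minus, _ = loss_and_grads(X, y, W1_minus, b1, W2, b2)
numeric_grad = (loss_plus - loss_minus) / (2.0 * eps)
grad_error = abs(numeric_grad - dW1[0, 0])
```

A small `grad_error` suggests the backward pass matches the forward pass; libraries compute these gradients automatically, so a check like this is mainly useful for learning or debugging.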

### 8. Optimization Algorithm (e.g., Gradient Descent):
Once gradients are computed via backpropagation, an optimizer uses them to update the network's weights and biases.
* **Gradient Descent:** Iteratively adjusts parameters in the direction opposite to the gradient of the loss function.
* **Learning Rate:** A crucial hyperparameter that controls the step size of these updates. A learning rate that is too small makes convergence slow and can leave training stuck on a plateau or in a poor local minimum; one that is too large can overshoot the minimum and cause the loss to oscillate or diverge.
* Common variants: Stochastic Gradient Descent (SGD), Adam, RMSprop, Adagrad.
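
The effect of the learning rate is easy to see on a one-dimensional toy problem. The sketch below runs plain gradient descent on the assumed loss `(w - 3)^2`, whose minimum is at `w = 3`; the function name and the three learning rates are illustrative.

```python
def gradient_descent(learning_rate, steps=50, w0=0.0):
    """Minimize (w - 3)^2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)     # derivative of (w - 3)^2
        w -= learning_rate * grad  # step against the gradient
    return w

w_small = gradient_descent(0.001)  # barely moves away from 0 in 50 steps
w_good = gradient_descent(0.1)     # ends very close to 3
w_large = gradient_descent(1.1)    # each step overshoots further: divergence
```

Each update multiplies the distance to the minimum by `1 - 2 * learning_rate`, which explains the three behaviours: a factor near 1 is slow, a factor near 0 is fast, and a factor below -1 diverges.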

### 9. Training Process:
The iterative procedure to teach the MLP to perform a task:
1.  **Initialization:** Weights and biases are typically initialized randomly (e.g., with small random values or specific techniques like Xavier/He initialization).
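2.  **Iterative Loop (Epochs):** The entire training set is passed through the network multiple times; each full pass is one epoch. Within an epoch, the following steps are repeated for each batch of samples: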
    * **Forward Pass:** Input data goes through the network, and a prediction is made.
    * **Loss Calculation:** The loss function quantifies the error between prediction and actual target.
    * **Backward Pass (Backpropagation):** Gradients of the loss with respect to all weights and biases are calculated.
    * **Parameter Update:** An optimizer uses the calculated gradients and the learning rate to adjust weights and biases.
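
Putting the steps together, a minimal mini-batch training loop might look like the sketch below. Everything in it is an assumption for illustration: the toy dataset, the single tanh hidden layer of 8 neurons, the sigmoid output with binary cross-entropy, and plain SGD as the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): the label is 1 when x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

def forward(X, W1, b1, W2, b2):
    a1 = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(a1 @ W2 + b2)))
    return a1, p

def bce(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# 1. Initialization: small random weights, zero biases.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

learning_rate, epochs, batch_size = 0.1, 50, 20
losses = [bce(y, forward(X, W1, b1, W2, b2)[1])]  # loss before training

# 2. Iterative loop over epochs and mini-batches.
for epoch in range(epochs):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]

        a1, p = forward(xb, W1, b1, W2, b2)  # forward pass
        batch_loss = bce(yb, p)              # loss calculation (monitoring only)

        # Backward pass: for sigmoid + cross-entropy, dL/dz_out = (p - y) / batch.
        dz2 = (p - yb) / len(xb)
        dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1.0 - a1 ** 2)
        dW1, db1 = xb.T @ dz1, dz1.sum(axis=0)

        # Parameter update (plain SGD).
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2

    losses.append(bce(y, forward(X, W1, b1, W2, b2)[1]))

print(f"loss before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```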

### 10. Hyperparameters:
These are parameters whose values are set **before** the training process begins, rather than being learned by the model. Their selection significantly impacts model performance.
* Number of hidden layers
* Number of neurons per hidden layer
* Choice of activation functions
* Learning rate
* Batch size (number of samples processed before updating weights)
* Number of epochs
* Regularization strength (e.g., `alpha`, the L2 penalty strength in `sklearn`'s `MLPClassifier`)
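
Batch size and number of epochs together determine how many weight updates training can perform. The arithmetic below is a small sketch under assumed values: 800 training samples, a cap of 500 epochs, and the `sklearn` rule that `batch_size='auto'` means `min(200, n_samples)`.

```python
import math

n_train = 800                   # assumed number of training samples
batch_size = min(200, n_train)  # sklearn's 'auto' rule for MLPClassifier
max_epochs = 500                # assumed cap on the number of epochs

# Each epoch processes every sample once, one batch per weight update.
updates_per_epoch = math.ceil(n_train / batch_size)
max_updates = updates_per_epoch * max_epochs

print(updates_per_epoch, max_updates)  # 4 2000
```

Halving the batch size doubles the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.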

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import make_classification # To create a synthetic dataset

# -----------------------------------------------------------------------------
# 1. Generate Synthetic Data
# -----------------------------------------------------------------------------
# We'll create a simple binary classification dataset.
# X: features, y: target labels
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

print("Shape of X:", X.shape)
print("Shape of y:", y.shape)

# -----------------------------------------------------------------------------
# 2. Split Data into Training and Testing Sets
# -----------------------------------------------------------------------------
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\nTraining data shape (X_train, y_train):", X_train.shape, y_train.shape)
print("Testing data shape (X_test, y_test):", X_test.shape, y_test.shape)

# -----------------------------------------------------------------------------
# 3. Initialize and Train the MLP Classifier
# -----------------------------------------------------------------------------
# MLPClassifier parameters:
#   hidden_layer_sizes: Tuple, i-th element represents the number of neurons in the i-th hidden layer.
#                       (e.g., (100,) means one hidden layer with 100 neurons)
#                       (e.g., (50, 20) means two hidden layers with 50 and 20 neurons respectively)
#   activation: Activation function for the hidden layers. ('relu', 'tanh', 'logistic', 'identity')
#   solver: The algorithm for weight optimization. ('adam', 'lbfgs', 'sgd')
#   alpha: Strength of the L2 regularization penalty.
#   learning_rate_init: The initial learning rate (only used by the 'sgd' and 'adam' solvers).
#   max_iter: Maximum number of iterations; for 'sgd' and 'adam' this is the number of epochs.
#   random_state: Seed for reproducibility.

mlp = MLPClassifier(hidden_layer_sizes=(100, 50), # Two hidden layers: 100 neurons then 50 neurons
                    activation='relu',             # ReLU activation for hidden layers
                    solver='adam',                 # Adam optimizer
                    alpha=0.0001,                  # L2 regularization
                    batch_size='auto',             # 'auto' uses min(200, n_samples)
                    learning_rate_init=0.001,      # Initial learning rate
                    max_iter=500,                  # Maximum number of epochs
                    verbose=True,                  # Print progress messages to stdout
                    random_state=42)

print("\nTraining the MLP Classifier...")
mlp.fit(X_train, y_train)
print("Training complete.")

# -----------------------------------------------------------------------------
# 4. Make Predictions and Evaluate the Model
# -----------------------------------------------------------------------------
y_pred = mlp.predict(X_test)

print("\nModel Evaluation:")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# -----------------------------------------------------------------------------
# 5. Inspecting Model Parameters (Optional)
# -----------------------------------------------------------------------------
# The fitted attributes n_layers_, out_activation_, n_iter_, and loss_ are
# printed in the next cell.

In [None]:
print("\nNumber of layers:", mlp.n_layers_)
print("Output layer activation:", mlp.out_activation_)
print("Number of iterations (epochs):", mlp.n_iter_)
print("Loss at the end of training:", mlp.loss_)