## Artificial Neural Network (ANN) / Multi-Layer Perceptron (MLP)

---

### 1. Theoretical Intuition
- An **ANN** is a network of interconnected neurons (nodes), loosely inspired by the human brain.  
- A **Multi-Layer Perceptron (MLP)** is a feedforward ANN with **one or more hidden layers** between the input and output layers.  
- Each neuron computes a **weighted sum of its inputs**, adds a **bias**, and applies an **activation function**.  
- MLPs can solve **non-linear problems** (for example XOR, which is not linearly separable) that a single Perceptron cannot.  
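
To make the weighted-sum, bias, and activation description concrete, here is a minimal sketch of a single neuron in plain Python. The function names `neuron` and `sigmoid`, the example weights, and the choice of sigmoid as the activation are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration
out = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(out)  # z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
```

An MLP layer is many such neurons applied to the same inputs, each with its own weights and bias.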

---

### 2. Key Pointers
- **Layers in ANN / MLP**:  
  1. **Input Layer**: Receives features from data  
  2. **Hidden Layer(s)**: Perform computations & learn features  
  3. **Output Layer**: Produces prediction / classification  
- **Activation Functions** introduce non-linearity (ReLU, Sigmoid, Tanh, etc.)  
- **Feedforward**: Inputs → hidden layers → output  
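- **Backpropagation**: Algorithm that computes the gradient of the loss with respect to every weight; an optimizer such as gradient descent then uses those gradients to update the weights  
- **MLPs are universal function approximators**: with a non-linear activation and enough hidden neurons, they can approximate any continuous function on a compact (closed and bounded) input domain to arbitrary accuracy. This guarantees that suitable weights exist, not that training will find them  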
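
The point about activation functions introducing non-linearity can be checked numerically. The sketch below, using hypothetical scalar weights, shows that two stacked linear layers without an activation are equivalent to a single linear layer, while inserting a ReLU between them changes the function.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

# Hypothetical scalar weights and biases for two stacked "layers"
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def stacked_linear(x):
    # Two linear layers with no activation in between
    return w2 * (w1 * x + b1) + b2

def single_linear(x):
    # The same function rewritten as one linear layer
    return (w2 * w1) * x + (w2 * b1 + b2)

def stacked_with_relu(x):
    # A ReLU between the layers breaks the equivalence for negative pre-activations
    return w2 * relu(w1 * x + b1) + b2

for x in [-2.0, 0.0, 1.5]:
    print(x, stacked_linear(x), single_linear(x), stacked_with_relu(x))
```

For `x = -2.0` the two linear versions agree (both give 9.5), while the ReLU version gives 0.5 because the hidden pre-activation is negative and is clipped to zero.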

---

### 3. Use Cases
- Image recognition  
- Speech recognition  
- NLP tasks: sentiment analysis, text classification  
- Fraud detection in finance  
- Structured/tabular prediction tasks (classification or regression)  

---

### 4. Mathematical Intuition
- **Forward pass for a layer**:

\[
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}
\]

\[
a^{(l)} = \sigma(z^{(l)})
\]

where:  
- \(l\) = layer index  
- \(W^{(l)}\), \(b^{(l)}\) = weights & biases  
- \(a^{(l-1)}\) = activation from previous layer  
- \(\sigma\) = activation function  
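
The two forward-pass equations translate almost line for line into code. The following is a minimal sketch in plain Python, assuming \(W^{(l)}\) is stored as a list of rows (one row per neuron) and using ReLU as \(\sigma\); the function name `layer_forward` and the numeric values are hypothetical.

```python
def relu(z):
    """Element-wise ReLU on a list of pre-activations."""
    return [max(0.0, v) for v in z]

def layer_forward(W, a_prev, b, activation):
    """Compute a = activation(W a_prev + b) for one layer.

    W is a list of rows, one row of weights per neuron in this layer.
    """
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i
         for row, b_i in zip(W, b)]
    return activation(z)

# Hypothetical layer with 2 inputs and 3 neurons
W = [[1.0, -1.0],
     [0.5, 0.5],
     [-2.0, 0.0]]
b = [0.0, 1.0, 1.0]
a_prev = [2.0, 1.0]

a = layer_forward(W, a_prev, b, relu)
print(a)  # z = [1.0, 2.5, -3.0], so ReLU gives [1.0, 2.5, 0.0]
```

Feeding `a` into another call to `layer_forward` with the next layer's weights gives the full feedforward pass.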

- **Output layer** produces final prediction  
- **Backpropagation** uses the **chain rule** to compute the gradients; gradient descent then updates the weights:  

\[
W^{(l)} := W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}
\]  

where \(L\) is the loss function and \(\eta\) is the learning rate. The biases \(b^{(l)}\) are updated the same way, using \(\partial L / \partial b^{(l)}\).
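
As a worked instance of the update rule, the sketch below takes one gradient-descent step for a single linear neuron \(\hat{y} = wx + b\) with squared-error loss. The gradients are derived by hand with the chain rule and checked against a finite-difference estimate. The model, the numeric values, and the function names are assumptions chosen to keep the arithmetic easy to follow.

```python
def loss(w, b, x, y):
    """Squared error of a single linear neuron y_hat = w*x + b."""
    y_hat = w * x + b
    return (y_hat - y) ** 2

def gradients(w, b, x, y):
    """Chain rule: dL/dw = dL/dy_hat * dy_hat/dw, and likewise for b."""
    y_hat = w * x + b
    dL_dyhat = 2.0 * (y_hat - y)
    dL_dw = dL_dyhat * x      # dy_hat/dw = x
    dL_db = dL_dyhat * 1.0    # dy_hat/db = 1
    return dL_dw, dL_db

# Hypothetical training example, parameters, and learning rate
x, y = 2.0, 3.0
w, b = 0.5, 0.0
eta = 0.1

loss_before = loss(w, b, x, y)          # y_hat = 1.0, so loss = 4.0
dL_dw, dL_db = gradients(w, b, x, y)    # -8.0 and -4.0

# Finite-difference check of dL/dw
eps = 1e-6
numeric_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

# Gradient-descent update: parameter := parameter - eta * gradient
w = w - eta * dL_dw
b = b - eta * dL_db
loss_after = loss(w, b, x, y)

print(loss_before, dL_dw, dL_db, numeric_dw)
print(w, b, loss_after)
```

With these particular numbers a single step moves the parameters to roughly `w = 1.3`, `b = 0.4`, where the prediction matches the target; in general many small steps are needed.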

---

### 5. Interview Q&A

| Question | Answer |
|----------|--------|
| What is an Artificial Neural Network (ANN)? | A network of interconnected neurons (nodes) that processes data in a way loosely inspired by the human brain. |
| What is a Multi-Layer Perceptron (MLP)? | A feedforward ANN with one or more hidden layers, capable of solving non-linear problems. |
| Why do we need hidden layers? | Hidden layers allow the network to learn complex, non-linear patterns in data. |
| What is the role of activation functions in MLP? | They introduce non-linearity; without them, stacked layers collapse into a single linear transformation. |
| What is backpropagation? | An algorithm that uses the chain rule to compute gradients of the loss w.r.t. the weights; an optimizer then uses those gradients to update them. |
| What is feedforward in an MLP? | The process of passing input through all layers to compute output predictions. |
| Can MLP solve XOR problem? | Yes. Unlike a single Perceptron, an MLP with a hidden layer can solve XOR. |
| What is the universal approximation theorem? | An MLP with at least one hidden layer and a non-linear activation can approximate any continuous function on a compact domain, given enough neurons. |
| What are common activation functions used? | Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax (output layer for classification). |
| How are weights initialized in MLP? | Usually small random values scaled by layer size (Xavier for Sigmoid/Tanh, He for ReLU) to break symmetry between neurons and keep gradients well scaled. |
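
The XOR answer in the table can be verified with a tiny network whose weights are set by hand rather than learned. This is a sketch under that assumption: the weights below are one hand-picked solution with two ReLU hidden units and a linear output, and the function name `xor_mlp` is hypothetical.

```python
def relu(z):
    return max(0.0, z)

def xor_mlp(x1, x2):
    """2-2-1 MLP with hand-picked weights that computes XOR on {0, 1} inputs."""
    # Hidden layer: both units use weights [1, 1]; only the biases differ
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: linear combination of the hidden activations
    return 1.0 * h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
# Outputs are 0.0, 1.0, 1.0, 0.0 for the four input pairs
```

The second hidden unit only activates when both inputs are 1, and its negative output weight cancels the contribution of the first unit. A single Perceptron has no hidden unit to play that role.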

---

### 6. Code Demo: Simple MLP in PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy dataset: y = x1 + x2
X = torch.tensor([[0,0],[0,1],[1,0],[1,1]], dtype=torch.float32)
y = torch.tensor([[0],[1],[1],[2]], dtype=torch.float32)

# Define MLP
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 5)   # input -> hidden
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)   # hidden -> output
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Initialize
model = SimpleMLP()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Train
for epoch in range(500):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

# Predictions
with torch.no_grad():
    for xi in X:
        print(f"Input: {xi.numpy()}, Predicted: {model(xi).item():.2f}")


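```

The same network can also be written with explicit weight tensors and a manual SGD update, which exposes each step of the forward and backward pass:

```python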
import torch
import torch.nn as nn
import torch.optim as optim

# ----------------------------
# 1️⃣ Prepare dataset (simple sum example)
# Inputs: x1, x2
X = torch.tensor([[0,0],
                  [0,1],
                  [1,0],
                  [1,1]], dtype=torch.float32)

# Outputs: y = x1 + x2
y = torch.tensor([[0],
                  [1],
                  [1],
                  [2]], dtype=torch.float32)

# ----------------------------
# 2️⃣ Define network structure
# Input layer: 2 neurons
# Hidden layer: 5 neurons
# Output layer: 1 neuron

hidden_neurons = 5

# Initialize weights and biases manually for interpretability
W1 = torch.randn((2, hidden_neurons), requires_grad=True)
b1 = torch.randn((hidden_neurons,), requires_grad=True)
W2 = torch.randn((hidden_neurons, 1), requires_grad=True)
b2 = torch.randn((1,), requires_grad=True)

# Learning rate
lr = 0.1

# ----------------------------
# 3️⃣ Training loop (500 epochs)
for epoch in range(500):
    
    # Forward pass
    hidden_input = torch.matmul(X, W1) + b1       # Linear for hidden layer
    hidden_output = torch.relu(hidden_input)      # ReLU activation
    
    output_input = torch.matmul(hidden_output, W2) + b2   # Linear for output layer
    y_pred = output_input                            # No activation (regression)
    
    # Compute loss (Mean Squared Error)
    loss = ((y - y_pred)**2).mean()
    
    # Backward pass
    loss.backward()
    
    # Update weights manually (SGD)
    with torch.no_grad():
        W1 -= lr * W1.grad
        b1 -= lr * b1.grad
        W2 -= lr * W2.grad
        b2 -= lr * b2.grad
    
    # Zero gradients
    W1.grad.zero_()
    b1.grad.zero_()
    W2.grad.zero_()
    b2.grad.zero_()

# ----------------------------
# 4️⃣ Predictions
print("Predictions after training:")
for i, xi in enumerate(X):
    # Forward pass for each input
    h_input = torch.matmul(xi, W1) + b1
    h_output = torch.relu(h_input)
    out = torch.matmul(h_output, W2) + b2
    print(f"Input: {xi.numpy()}, Predicted: {out.item():.2f}, True: {y[i].item()}")


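```

Output from one run:

```text
Predictions after training:
Input: [0. 0.], Predicted: 1.00, True: 0.0
Input: [0. 1.], Predicted: 1.00, True: 1.0
Input: [1. 0.], Predicted: 1.00, True: 1.0
Input: [1. 1.], Predicted: 1.00, True: 2.0
```

In this run every prediction is 1.00, the mean of the targets, so the network did not learn \(y = x_1 + x_2\). This is consistent with all hidden ReLU units being inactive ("dead") for every input, which leaves the output equal to `b2` alone; minimizing MSE then drives `b2` to the target mean. The weights are drawn from `torch.randn` without a fixed seed, so results vary between runs. Likely remedies are smaller initial weights (for example He initialization), a lower learning rate, or a fixed seed for reproducibility.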
