## Neural Networks

Neural networks are capable of learning and identifying patterns directly from data without pre-defined rules. These models consist of interconnected nodes or neurons that process data.

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20251213151009129404/_neural_network.webp" width="500">

- **Input Layer**: This is where the network receives its input data. Each input neuron in the layer corresponds to a feature in the input data.
- **Hidden Layers**: These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.
- **Output Layer**: The final layer produces the output of the model. The format of this output depends on the task, for example class probabilities for classification or a continuous value for regression.

#### Components:
- **Neurons**: The basic units that receive inputs. Each neuron is governed by a bias (which acts as a threshold) and an activation function.
- **Connections**: Links between neurons that carry information. Each connection has a weight.
- **Weights and Biases**: Weights determine the strength and influence of each connection; a bias shifts a neuron's weighted sum before the activation is applied.
- **Propagation Functions**: Mechanisms that help process and transfer data across layers of neurons.
- **Learning Rule**: The method that adjusts weights and biases over time to improve accuracy.

#### Neuron:
A single neuron:
- Multiplies inputs by weights ($w$).
- Adds a bias ($b$) to the result.
- Passes the total through an Activation Function.
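
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library API: the function name `neuron`, the choice of ReLU as the activation, and all input, weight, and bias values are assumptions made for the example.

```python
import numpy as np

def neuron(x, w, b):
    """Single neuron: weighted sum of inputs plus bias, then an activation."""
    z = np.dot(w, x) + b        # multiply inputs by weights, sum, add bias
    return max(0.0, float(z))   # activation function (ReLU chosen for this sketch)

x = np.array([1.0, 2.0, 3.0])       # hypothetical inputs
w = np.array([0.5, -0.25, 0.25])    # hypothetical weights
b = 0.5                             # hypothetical bias

print(neuron(x, w, b))   # 0.5 - 0.5 + 0.75 + 0.5 = 1.25
```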

### Learning process
1. **Input Computation**: Data is fed into the network.
2. **Output Generation**: Based on the current parameters, the network generates an output.
3. **Adjusting weights and biases**: The network compares its output with the expected result and adjusts its weights and biases to reduce the error, gradually improving its performance.

The learning process is repeated for many iterations over the dataset.

#### Forward Propagation
Data passes through the network in the forward direction, from the input layer through the hidden layers to the output layer. This process is known as forward propagation.
1. Linear Transformation:
    - Each neuron in a layer receives inputs, which are multiplied by the weights of the corresponding connections.
    - These products are summed and a bias is added: $z = \sum_i w_i x_i + b$.
2. Activation: The result of the linear transformation is then passed through an activation function, $a = f(z)$.
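
As a rough sketch of these two steps applied layer by layer, the snippet below runs a forward pass through one hidden layer and one output layer in NumPy. The layer sizes, the random weights, and the helper names `relu` and `forward` are assumptions for illustration; the output layer is left linear here.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass: input -> hidden layer (ReLU) -> linear output layer."""
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1      # linear transformation: weighted sum plus bias
    a1 = relu(z1)         # activation of the hidden layer
    z2 = W2 @ a1 + b2     # output layer (no activation in this sketch)
    return z2

rng = np.random.default_rng(0)
params = (
    rng.normal(size=(4, 3)), np.zeros(4),   # 3 inputs -> 4 hidden units
    rng.normal(size=(2, 4)), np.zeros(2),   # 4 hidden units -> 2 outputs
)
x = np.array([0.2, -1.0, 0.5])              # hypothetical input sample
y_hat = forward(x, params)
print(y_hat.shape)   # (2,)
```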

#### Backpropagation
After forward propagation, the network evaluates its performance using a loss function which measures the difference between the actual output and the predicted output. The goal of training is to minimize this loss.
- **Loss Calculation**: The network calculates the loss which provides a measure of error in the predictions. The loss function could vary; common choices are mean squared error for regression tasks or cross-entropy loss for classification.
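- **Gradient Calculation**: Using the chain rule, the network computes the gradient of the loss with respect to every weight and bias, working backward from the output layer. Each gradient indicates how much that parameter contributed to the error.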
- **Weight Update**: Once the gradients are calculated, the weights and biases are updated using an optimization algorithm like stochastic gradient descent (SGD). The weights are adjusted in the opposite direction of the gradient to minimize the loss. The size of the step taken in each update is determined by the **learning rate**.
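
To make the loss, gradient, and update steps concrete, here is a minimal sketch that trains a single linear neuron with hand-derived gradients. It uses plain (full-batch) gradient descent rather than SGD, and the toy data, the target line $y = 2x + 1$, and the learning rate are all assumptions chosen for the example. In a deeper network the same gradients would be obtained layer by layer through the chain rule.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (an assumed target for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0

w, b = 0.0, 0.0
learning_rate = 0.05
losses = []

for step in range(500):
    y_pred = w * X + b                    # forward pass
    error = y_pred - y
    losses.append(np.mean(error ** 2))    # loss calculation (mean squared error)

    grad_w = 2.0 * np.mean(error * X)     # dLoss/dw
    grad_b = 2.0 * np.mean(error)         # dLoss/db

    w -= learning_rate * grad_w           # step in the opposite direction of the gradient
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```

A larger learning rate takes bigger steps and can overshoot; a smaller one converges more slowly.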

### Activation Function
Activation functions introduce non-linearity. Without them, a stack of layers collapses into a single linear transformation; with them, the network can learn complex patterns (like circles, waves, or faces) rather than just straight lines.

Common Functions:
- **ReLU (Rectified Linear Unit)**: Returns 0 if the input is negative, and the input itself if positive. It is the usual default because it is cheap to compute and its gradient does not shrink for positive inputs, so training is less likely to stall.
- **Sigmoid**: Squishes values between 0 and 1. Well suited to the output of a binary classifier.
- **Softmax**: Used in the final layer for multi-class problems (turns outputs into probabilities that sum to 100%).
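
For reference, the three functions can be written directly in NumPy. This is a sketch for intuition rather than a production implementation; subtracting the maximum inside `softmax` is a common numerical-stability trick and is an addition not mentioned above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z)      # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

z = np.array([-2.0, 0.0, 3.0])   # hypothetical raw layer outputs
print("relu:   ", relu(z))
print("sigmoid:", sigmoid(z))
print("softmax:", softmax(z), "sum =", softmax(z).sum())
```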

#### Output Layer
| Task | Activation Function | Why? |
| :--- | :--- | :--- |
| **Regression** (Predicting Price/Age) | **Linear** (None) | You need the output to be any continuous number $(-\infty, \infty)$. |
| **Binary Classification** (Yes/No) | **Sigmoid** | Squishes the output between 0 and 1, representing a probability. |
| **Multi-class Classification** | **Softmax** | Squishes outputs so they sum to 1.0 (100%) across all classes. |
| **Multi-label Classification** | **Sigmoid** | Each class gets an independent 0-1 score (e.g., an image can be both a 'dog' AND 'indoors'). |

#### Hidden Layer
- **ReLU (Rectified Linear Unit) — The Default**
    - Formula: $f(x) = \max(0, x)$
    - When to use: Use this for almost everything as your starting point.
    - Pros: It’s computationally very fast and helps prevent the vanishing gradient problem for positive values.
    - Cons: "Dying ReLU". If a neuron's pre-activation stays negative for every training example, it outputs 0, receives zero gradient, and stops learning.
- **Leaky ReLU / ELU**
    - When to use: If you notice your network has many "dead" neurons (sparsity is too high) or training has stalled.
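    - How it works: Leaky ReLU keeps a small, non-zero slope when the input is negative (e.g., $0.01x$). ELU uses a smooth exponential curve, $\alpha(e^x - 1)$, for negative inputs instead. In both cases the gradient is non-zero for negative inputs, so the neuron can recover.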
- **Tanh (Hyperbolic Tangent)**
    - When to use: In RNNs (Recurrent Neural Networks) or when your data is centered around zero.
    - Why: It outputs values between -1 and 1. Being "zero-centered" helps the next layer learn more efficiently than Sigmoid.
- **Sigmoid (Avoid in hidden layers)**
    - When to use: Almost never in modern deep hidden layers.
    - Why: It contributes to the **vanishing gradient problem**. The sigmoid's derivative is at most 0.25, so the backpropagated signal shrinks at every layer, and in deep networks the early layers barely update their weights.
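
The difference between ReLU and its "leaky" variants is easiest to see in their gradients for negative inputs. The sketch below is illustrative only; the function names and the default values `negative_slope=0.01` and `alpha=1.0` are assumptions for this example.

```python
import numpy as np

def leaky_relu(z, negative_slope=0.01):
    return np.where(z > 0, z, negative_slope * z)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def relu_grad(z):
    return (z > 0).astype(float)                 # exactly 0 for negative inputs

def leaky_relu_grad(z, negative_slope=0.01):
    return np.where(z > 0, 1.0, negative_slope)  # small but non-zero for negative inputs

z = np.array([-3.0, -0.5, 2.0])   # hypothetical pre-activations
print("ReLU gradient:      ", relu_grad(z))
print("Leaky ReLU gradient:", leaky_relu_grad(z))
print("ELU output:         ", elu(z))
```

With ReLU, the two negative inputs pass no gradient back at all; with Leaky ReLU they still pass a small one, which is what lets a "dead" neuron recover.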

#### The "Vanishing Gradient" Problem
If the network is too deep and the wrong activation functions are used, the "signal" sent back during backpropagation gets weaker until the weights in the early layers stop updating. The model "stops learning."

**The Fix**: 
- Use ReLU (or one of its variants) in the hidden layers.
- Apply Batch Normalization (re-centering and re-scaling the inputs to each layer).
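
A back-of-the-envelope calculation shows how quickly the signal can shrink. The sketch below deliberately ignores the weights (treating them as 1) and assumes the best case for sigmoid, where every layer sits at the point of maximum derivative; both simplifications are assumptions made to isolate the effect of the activation function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)     # peaks at 0.25 when z = 0

# Backpropagation multiplies one activation derivative per layer.
for depth in (1, 5, 10, 20):
    sigmoid_factor = sigmoid_grad(0.0) ** depth   # best case for sigmoid
    relu_factor = 1.0 ** depth                    # ReLU derivative is 1 for positive inputs
    print(f"depth={depth:2d}  sigmoid factor={sigmoid_factor:.2e}  relu factor={relu_factor:.0f}")
```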

#### The "Exploding Gradient" Problem
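This problem occurs during the training of deep neural networks when the gradients (the "signals" used to update weights) grow exponentially as they are propagated backward through the layers. The resulting weight updates are so large that training becomes unstable and the loss may turn into `NaN`.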

**The Fix**: 
- Gradient Clipping: cap the gradients at a maximum threshold (by value or by norm) before the weight update.
- Batch Normalization
- Use LSTM/GRU instead of a standard RNN: Standard RNNs are notoriously prone to unstable gradients because they reuse the exact same weight matrix at every time step. LSTMs and GRUs use "gates" to control the flow of information, which dampens the effect, although gradient clipping is still commonly used alongside them.
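
Gradient clipping by norm can be sketched as follows. The helper name `clip_by_global_norm`, the gradient values, and the threshold of 5.0 are assumptions for illustration. In PyTorch, `torch.nn.utils.clip_grad_norm_` serves the same purpose and is typically called between `loss.backward()` and `optimizer.step()`.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))   # never scale gradients up
    return [g * scale for g in grads], total_norm

grads = [np.array([30.0, -40.0]), np.array([0.0])]      # hypothetical large gradients
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print("norm before:", norm_before)
print("norm after: ", norm_after)
print("clipped:    ", clipped[0])
```

The direction of the gradient is preserved; only its length is capped.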

#### Dropout 
To prevent a neural network from overfitting, it is good practice to randomly "turn off" a percentage of neurons during each training step. This forces the network to be redundant and robust: it can't rely on one "super-neuron" to do all the work, so it has to distribute the knowledge. Dropout is applied only during training; at inference time all neurons are active.
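
One common way to implement this is "inverted" dropout, sketched below in NumPy. The function name `dropout` and the 50% drop probability are assumptions for the example. The kept activations are scaled up during training so that their expected value matches what the next layer sees at inference time.

```python
import numpy as np

def dropout(activations, drop_prob, training, rng):
    """Inverted dropout: zero units at random during training, rescale the rest."""
    if not training or drop_prob == 0.0:
        return activations                        # no-op at inference time
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob         # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones(10_000)                               # hypothetical layer activations
train_out = dropout(a, drop_prob=0.5, training=True, rng=rng)
eval_out = dropout(a, drop_prob=0.5, training=False, rng=rng)

print("fraction dropped in training:", (train_out == 0).mean())
print("mean activation in training: ", train_out.mean())
print("mean activation at inference:", eval_out.mean())
```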

```python
import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define the Architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # Layer 1: Input to Hidden
        self.fc1 = nn.Linear(input_size, hidden_size)
        # Activation Function
        self.relu = nn.ReLU()
        # Layer 2: Hidden to Output
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Define the path the data takes
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)  # Raw scores (logits); no softmax here
        return out

# 2. Initialize Model, Loss, and Optimizer
model = SimpleNN(input_size=10, hidden_size=20, num_classes=2)
criterion = nn.CrossEntropyLoss()  # Standard for classification; expects raw logits
optimizer = optim.Adam(model.parameters(), lr=0.001)  # The "Engine"

# 3. Single Dummy Training Step
# In a real scenario, you'd loop over batches of your actual dataset
dummy_input = torch.randn(1, 10)  # One sample with 10 features
target = torch.tensor([1])        # The true label

# Forward pass
output = model(dummy_input)
loss = criterion(output, target)

# Backward pass (The magic part)
optimizer.zero_grad()  # Clear old gradients
loss.backward()        # Compute the gradient of the loss for every parameter
optimizer.step()       # Update weights

# This loss was computed before the update, so it reflects the initial weights
print(f"Loss before the update: {loss.item():.4f}")
```

```python
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Tiny toy dataset: two features and a binary label
data = {
    'feature1': [0.1, 0.2, 0.3, 0.4, 0.5],
    'feature2': [0.5, 0.4, 0.3, 0.2, 0.1],
    'label': [0, 0, 1, 1, 1]
}

df = pd.DataFrame(data)
X = df[['feature1', 'feature2']].values
y = df['label'].values

model = Sequential()
model.add(Input(shape=(2,)))               # Two input features
model.add(Dense(8, activation='relu'))     # Hidden layer
model.add(Dense(1, activation='sigmoid'))  # Output: probability of class 1

# Compile the model by specifying the loss function, optimizer
# and metrics to evaluate during training.
# Here binary cross-entropy and the Adam optimizer are used.
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])

model.fit(X, y, epochs=100, batch_size=1, verbose=1)

# Predict on a new sample and threshold the probability at 0.5
test_data = np.array([[0.2, 0.4]])
prediction = model.predict(test_data)
predicted_label = (prediction > 0.5).astype(int)
print(predicted_label)
```

### Types of Neural Networks

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20251213153112518597/types_of_neural_networks.webp" width="500">

| NN Type | Data Type | Key Strength |
| :--- | :--- | :--- |
| **ANN** | Tabular | Simple and structured patterns. |
| **CNN** | Images | Spatial patterns and feature extraction. |
| **RNN/LSTM** | Sequences | Temporal patterns and "memory." |
| **Transformer** | Sequences | Massive parallelization and "attention." |
| **GAN** | Images and other media | Creating realistic synthetic data. |

#### ANN (Artificial Neural Networks)
Also known as Multi-Layer Perceptrons (MLP), these are the "classic" neural networks.
- **Structure**: They consist of an input layer, one or more hidden layers, and an output layer. Every neuron in one layer is connected to every neuron in the next (Fully Connected).
- **Best For**: Tabular data (Excel-style data) and simple classification/regression tasks.
- **Limitation**: They don't handle spatial or temporal relationships well. If you pass an image into an ANN, it treats pixels as independent numbers, losing the "shape" of the object.

#### CNN (Convolutional Neural Networks)
CNNs are the gold standard for Computer Vision.
- **How they work**: Instead of looking at every pixel individually, they use Filters (kernels) that slide over the image to detect patterns like edges, textures, and eventually complex objects like faces.
- **Key Layers**:
    - Convolutional Layer: Extracts features.
    - Pooling Layer: Downsamples the feature maps to reduce computation and make the model less sensitive to exactly where a feature appears ("translation invariance"), which helps it recognize a cat whether it's in the top-left or bottom-right corner.
- **Best For**: Image recognition, medical imaging, and video analysis.
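
The sliding-filter idea and pooling can be sketched in plain NumPy. This is a slow, loop-based illustration for intuition, not how libraries implement it. The 6x6 image, the hand-written edge filter, and the helper names `conv2d` and `max_pool2d` are assumptions; in a real CNN the filter values are learned.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (the operation deep learning libraries call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the filter: multiply it with the patch under it and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a block are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 1.0]])   # responds where brightness increases left to right

features = conv2d(image, vertical_edge)   # shape (6, 5)
pooled = max_pool2d(features)             # shape (3, 2)
print(features[0])   # non-zero only at the edge position
print(pooled)
```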

#### RNN (Recurrent Neural Networks)
RNNs are designed for Sequential Data—where the order of the data points matters.

- **How they work**: They have a "loop" (recurrence) that allows information to persist. They process inputs one by one while keeping a "hidden state" (memory) of what they saw previously.
- **The Problem**: Standard RNNs have "short-term memory." They struggle to remember things from the beginning of a long sentence (the Vanishing Gradient problem).
- **Best For**: Time-series forecasting (stock prices, weather) and simple speech-to-text.
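
The "loop" and hidden state can be sketched as a vanilla RNN cell in NumPy. The sizes, the random weights, and the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` are assumptions for illustration; in practice the weights are learned.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])                    # initial hidden state (empty "memory")
    states = []
    for x_t in inputs:                             # process one time step at a time
        # New memory = f(current input, previous memory); same weights at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 5, 3, 4         # hypothetical sizes
inputs = rng.normal(size=(seq_len, input_size))
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)   # (5, 4): one hidden state per time step
```

Because `W_hh` is multiplied in at every step, gradients flowing back through long sequences are repeatedly scaled by it, which is the source of the vanishing (and exploding) gradient behaviour described earlier.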

#### Transformers
- **The Innovation**: Instead of processing words one by one, Transformers look at the entire sentence at once. They use an "Attention Mechanism" to determine which words are most relevant to each other, regardless of how far apart they are.
- **Best For**: Large Language Models (LLMs), chatbots, and high-end translation.
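
The core of the attention mechanism, scaled dot-product attention, fits in a few lines of NumPy. This is a simplified sketch: a real Transformer derives the queries, keys, and values from learned linear projections and uses multiple heads, whereas here the token embeddings are used directly for all three. The sizes and random embeddings are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # relevance of every token to every other token
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights           # weighted mix of the value vectors

rng = np.random.default_rng(0)
num_tokens, d_model = 4, 8                       # hypothetical sentence of 4 tokens
X = rng.normal(size=(num_tokens, d_model))       # hypothetical token embeddings

# Simplification: use X directly as queries, keys, and values (no learned projections)
output, weights = scaled_dot_product_attention(X, X, X)
print(weights.shape, output.shape)   # (4, 4) (4, 8)
```

All token pairs are scored in a single matrix product, which is why the whole sentence can be processed at once instead of one word at a time.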

#### GANs (Generative Adversarial Networks)
GANs are used to generate new data (images, music, art).
- **The Logic**: Two networks are pitted against each other:
    - **The Generator**: Tries to create a fake image.
    - **The Discriminator**: Tries to guess if the image is real or fake.
- **The Result**: As they compete, the Generator becomes an expert at creating incredibly realistic fake data.
- **Best For**: Deepfakes, generating synthetic data, and image-to-image translation.
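
The competition can be expressed through the two loss functions. The sketch below only computes the losses from made-up discriminator outputs; it does not train any network. The function names, the probability values, and the use of the common "non-saturating" generator loss are assumptions for illustration.

```python
import numpy as np

def bce(probs, targets, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

def discriminator_loss(d_real, d_fake):
    """The Discriminator wants to output 1 on real data and 0 on fakes."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """The Generator wants the Discriminator to output 1 on its fakes."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical Discriminator outputs: probability that a sample is real
d_real = np.array([0.9, 0.8])
fooled = np.array([0.7, 0.6])   # fakes the Discriminator mostly believes
caught = np.array([0.1, 0.2])   # fakes the Discriminator mostly rejects

print("generator loss when fooling D:    ", generator_loss(fooled))
print("generator loss when caught by D:  ", generator_loss(caught))
print("discriminator loss when fooled:   ", discriminator_loss(d_real, fooled))
print("discriminator loss when catching: ", discriminator_loss(d_real, caught))
```

Whatever lowers one network's loss raises the other's, which is what drives both to improve.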