# Artificial Neural Networks (ANN)

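## What is an ANN?

An Artificial Neural Network (ANN) is a computational model loosely inspired by the human brain. It is made of interconnected artificial neurons arranged in layers, where each neuron computes a weighted sum of its inputs and passes it through an activation function.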

ANNs are used for:

- Regression → continuous output
- Classification → discrete class labels

## How ANN Learns

1. Forward propagation → prediction
2. Loss calculation
3. Backpropagation → gradient computation
4. Weight update → Gradient Descent
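The four steps above can be sketched for the smallest possible "network", a single linear neuron. This is a minimal illustration in plain Python rather than Keras; the toy data (generated from $y = 2x + 1$), the learning rate, and the epoch count are assumptions chosen only so the loop converges.

```python
# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0   # parameters of a single linear neuron
lr = 0.05         # learning rate (hypothetical value)
losses = []

for epoch in range(2000):
    # 1. Forward propagation: compute predictions.
    preds = [w * x + b for x in xs]

    # 2. Loss calculation: mean squared error.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    losses.append(loss)

    # 3. Backpropagation: gradients of the loss w.r.t. w and b.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2.0 * sum(errors) / len(xs)

    # 4. Weight update: one gradient descent step.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.4f}, b = {b:.4f}, final loss = {losses[-1]:.2e}")
```

With these settings the parameters should approach the generating values $w = 2$ and $b = 1$. A real network repeats the same loop with many neurons and lets the framework compute the gradients.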

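## ANN for Regression

### Output Layer

- 1 neuron (one per target value)
- Linear activation

### Loss Functions

- MSE (mean squared error): penalizes large errors heavily
- MAE (mean absolute error): more robust to outliers
- Huber Loss: quadratic for small errors, linear for large ones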
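To make the difference between the three regression losses concrete, here is a small plain-Python sketch. The function names and the sample values are illustrative assumptions, and `delta=1.0` is just a common default for the Huber threshold.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic when |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 9.0]   # the last prediction is a deliberate outlier

print("MSE:  ", mse(y_true, y_pred))
print("MAE:  ", mae(y_true, y_pred))
print("Huber:", huber(y_true, y_pred))
```

The single outlier dominates the MSE, while MAE and Huber grow only linearly with it, which is why they are often preferred for noisy targets.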

## ANN for Classification

### Binary Classification

- Output neurons: 1
- Activation: Sigmoid
- Loss: Binary Cross-Entropy

### Multi-Class Classification

- Output neurons: number of classes
- Activation: Softmax
- Loss: Categorical Cross-Entropy
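The two output-layer pairings above (sigmoid with binary cross-entropy, softmax with categorical cross-entropy) can be sketched in plain Python as follows. The helper names, the `eps` clamp, and the example logits are assumptions for illustration; frameworks such as Keras implement numerically hardened versions of the same formulas.

```python
import math

def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """Loss for one example with label y in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(true_class, probs, eps=1e-12):
    """Loss for one example given the integer index of the true class."""
    return -math.log(max(probs[true_class], eps))

# Binary case: a logit of 0 means the model is undecided.
p = sigmoid(0.0)
bce = binary_cross_entropy(1, p)

# Multi-class case: three classes, the true class is index 0.
probs = softmax([2.0, 1.0, 0.1])
cce = categorical_cross_entropy(0, probs)

print("sigmoid(0) =", p, " BCE =", bce)
print("softmax    =", [round(q, 3) for q in probs], " CCE =", round(cce, 3))
```

Note that `categorical_cross_entropy` here takes an integer class index, which corresponds to the sparse variant of the loss; with one-hot labels the same value is obtained by summing $-y_i \log p_i$ over classes.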


| Function | Use                 |
| -------- | ------------------- |
| ReLU     | Hidden layers       |
| Sigmoid  | Binary output       |
| Softmax  | Multi-class         |
| Tanh     | Alternative to ReLU |



## Regularization Techniques

- L1 / L2 Regularization
- Dropout
- Batch Normalization
- Early Stopping
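Of the techniques listed, early stopping is the easiest to express as plain logic. The sketch below is a hypothetical helper (not the Keras `EarlyStopping` callback itself) that scans a list of validation losses and reports where training would stop; the `patience` and `min_delta` parameters and the sample history are assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch

# Made-up validation curve: improves until epoch 3, then drifts upward.
history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print("stop at epoch", stop_epoch, "and restore weights from epoch", best_epoch)
```

In practice the weights from `best_epoch` are restored, so the extra `patience` epochs cost training time but not model quality.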


| Aspect                     | ANN  | Linear / Tree models |
| -------------------------- | ---- | -------------------- |
| Non-linearity captured     | High | Limited              |
| Feature engineering needed | Less | More                 |
| Interpretability           | Low  | High                 |



-----

## Activation Functions

| Function | Formula | Use Case | Derivative |
| :--- | :--- | :--- | :--- |
| **ReLU** | $f(x) = \max(0, x)$ | Hidden layers (fast, standard). | 1 ($x>0$), else 0 |
| **Sigmoid**| $f(x) = \frac{1}{1+e^{-x}}$ | Binary classification output. | $f(x)(1-f(x))$ |
| **Tanh** | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Hidden layers (zero-centered). | $1 - f(x)^2$ |
| **Softmax** | $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | Multi-class output. | Jacobian: $\frac{\partial f_i}{\partial x_j} = f_i(\delta_{ij} - f_j)$ |
| **Linear** | $f(x) = x$ | Regression output. | 1 |
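The derivative column can be checked numerically. The following plain-Python sketch compares each analytic derivative from the table with a central finite difference; the helper names, the step size `h`, and the test points are assumptions (the point $x = 0$ is skipped because ReLU is not differentiable there).

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

functions = [
    ("relu", relu, relu_grad),
    ("sigmoid", sigmoid, sigmoid_grad),
    ("tanh", math.tanh, tanh_grad),
    ("linear", lambda x: x, lambda x: 1.0),
]

# Largest gap between analytic and numerical derivative per function.
checks = {}
for name, f, grad in functions:
    checks[name] = max(
        abs(grad(x) - numerical_grad(f, x)) for x in (-2.0, -0.5, 0.5, 2.0)
    )

for name, err in checks.items():
    print(f"{name:8s} max abs error: {err:.2e}")
```

All reported errors should be tiny, confirming that the closed-form derivatives match the functions they belong to.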


----


### Summary of Deep Learning Optimization

| Technique | Goal | Key Mechanism |
| :--- | :--- | :--- |
| **Backpropagation** | Training | Chain rule computes the gradients used by gradient descent |
| **ReLU** | Stability | Mitigates vanishing gradients (derivative is 1 for $x>0$) |
| **Dropout** | Generalization | Randomly disables neurons during training |
| **L2 Regularization**| Complexity Control | Penalizes large weight values |
| **Gradient Clipping** | Stability | Caps the gradient's value or norm |
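Gradient clipping from the last row comes in two common flavors, sketched here in plain Python. The function names and the example gradient are assumptions; Keras optimizers expose the same ideas through their `clipvalue` and `clipnorm` arguments.

```python
import math

def clip_by_value(grads, limit):
    """Clamp each gradient component to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

grads = [3.0, -4.0]                                  # L2 norm is 5
by_value = clip_by_value(grads, limit=1.0)           # each component capped
by_norm = clip_by_global_norm(grads, max_norm=1.0)   # direction preserved

print("by value:", by_value)
print("by norm: ", by_norm)
```

Clipping by value can change the direction of the update, while clipping by norm only shortens it, which is one reason norm clipping is often preferred.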







In [None]:
from tensorflow.keras import layers, models

# Placeholder sizes (assumptions) so this cell runs on its own;
# replace them with the values from your dataset.
input_dim = 20     # number of input features
num_classes = 3    # number of target classes

# Regression ANN
reg_model = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),  # Regularization
    layers.Dense(32, activation='relu'),
    layers.Dense(1)  # Single linear output for regression
])

# Classification ANN (Binary)
clf_binary = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')  # Probability of the positive class
])

# Classification ANN (Multi-class)
clf_multiclass = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax')  # One probability per class
])

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumes X_train (2-D feature array) and y_train (labels) are already loaded.

# Binary Classification
binary_model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # Single neuron for binary
])

binary_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.AUC(name='auc')]
)

# Multi-class Classification
num_classes = len(np.unique(y_train))
multi_model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

# Integer labels (0 .. num_classes - 1) use the sparse loss.
# For one-hot encoded labels, use loss='categorical_crossentropy' instead.
multi_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

### 1. Backpropagation Explanation

Backpropagation computes gradients layer by layer using the chain rule:

1. Forward pass: compute predictions and the loss $L$
2. Backward pass: compute the gradient $\frac{\partial L}{\partial w}$ for each weight
3. Update weights: $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate

The key insight is that each layer's gradient reuses the intermediate results already computed for the layer after it, rather than recomputing them from scratch for every weight.
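A tiny worked example may help here. The sketch below hand-derives the gradients for a two-weight network, $\hat{y} = w_2 \tanh(w_1 x)$ with loss $L = \frac{1}{2}(\hat{y} - y)^2$, and checks them against finite differences. The network shape, the starting values, and the learning rate are all assumptions chosen for illustration.

```python
import math

def forward(w1, w2, x):
    """Forward pass: hidden activation h and prediction y_hat."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat

def loss_fn(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y            # dL/dy_hat
    grad_w2 = d_yhat * h          # dL/dw2
    d_h = d_yhat * w2             # dL/dh, reuses d_yhat
    d_z = d_h * (1.0 - h * h)     # through tanh: dL/dz, reuses d_h
    grad_w1 = d_z * x             # dL/dw1
    return grad_w1, grad_w2

w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Finite-difference check of both gradients.
eps = 1e-6
num_w1 = (loss_fn(w1 + eps, w2, x, y) - loss_fn(w1 - eps, w2, x, y)) / (2 * eps)
num_w2 = (loss_fn(w1, w2 + eps, x, y) - loss_fn(w1, w2 - eps, x, y)) / (2 * eps)

# One gradient descent update with learning rate eta.
eta = 0.1
loss_before = loss_fn(w1, w2, x, y)
loss_after = loss_fn(w1 - eta * grad_w1, w2 - eta * grad_w2, x, y)

print("analytic :", grad_w1, grad_w2)
print("numerical:", num_w1, num_w2)
print("loss before/after one step:", loss_before, loss_after)
```

The reuse is visible in `backward`: `d_yhat` feeds both `grad_w2` and `d_h`, and `d_h` feeds `d_z`, so nothing upstream is computed twice.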

### 2. Vanishing/Exploding Gradients

Vanishing gradients occur in deep networks with sigmoid/tanh activations. Their derivatives are at most 0.25 (sigmoid) or 1 (tanh) and approach 0 when a unit saturates, so the product of many such factors across layers becomes extremely small. Solutions:

- Use ReLU activation
- Batch Normalization to stabilize activations
- Residual connections (skip connections)

Exploding gradients happen when gradients grow exponentially with depth; the usual fix is gradient clipping.
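The exponential behavior is easy to see with a back-of-the-envelope calculation. The sketch below assumes, as a simplification, that every layer multiplies the gradient by the same constant factor; 0.25 is the maximum of the sigmoid derivative, and 1.5 is a hypothetical factor greater than 1.

```python
def gradient_scale(per_layer_factor, depth):
    """Gradient magnitude after passing back through `depth` layers,
    assuming each layer scales it by the same constant factor."""
    return per_layer_factor ** depth

depths = (5, 10, 20)

# Best case for sigmoid: its derivative never exceeds 0.25.
vanishing = {d: gradient_scale(0.25, d) for d in depths}

# Hypothetical per-layer factor above 1 leads to explosion instead.
exploding = {d: gradient_scale(1.5, d) for d in depths}

for d in depths:
    print(f"depth {d:2d}: vanishing {vanishing[d]:.2e}, exploding {exploding[d]:.2e}")
```

Real networks also multiply by the weight matrices at each layer, so the actual factor varies, but the same compounding effect applies.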

### 3. Overfitting Prevention

- Dropout: Randomly disable neurons during training
- L1/L2 Regularization: Penalize large weights
- Early Stopping: Stop when validation loss plateaus
- Data Augmentation: Artificially increase training data
- Batch Normalization: Originally motivated as reducing internal covariate shift; the noise from mini-batch statistics also has a mild regularizing effect
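As an illustration of the first item, here is a minimal plain-Python sketch of inverted dropout, the variant most frameworks use. The function name, the `rng` parameter, and the sample activations are assumptions; the point is that surviving activations are scaled by $1/(1 - \text{rate})$ so the expected value is unchanged and nothing needs rescaling at inference time.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate`
    during training and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)          # fixed seed so the run is repeatable
acts = [1.0] * 10000

dropped = dropout(acts, rate=0.3, rng=rng)
mean_train = sum(dropped) / len(dropped)
zero_fraction = dropped.count(0.0) / len(dropped)

inference = dropout(acts, rate=0.3, training=False)

print("fraction zeroed:", zero_fraction)
print("mean during training:", mean_train)
print("unchanged at inference:", inference == acts)
```

The training-time mean should stay close to 1.0 even though roughly 30% of the activations are zeroed.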

### 4. Batch Size vs Learning Rate

There is a linear scaling rule: when the batch size is multiplied by $k$, multiply the learning rate by $k$ as well to keep the gradient noise roughly similar. This is a heuristic, and very large batches may still generalize worse (the generalization gap).
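The rule itself is a one-line calculation. The helper below is a hypothetical convenience function, and the base learning rate and batch sizes are example values, not recommendations.

```python
def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: multiply the learning rate by
    k = new_batch_size / base_batch_size."""
    k = new_batch_size / base_batch_size
    return base_lr * k

# Example: a setup tuned at batch size 32 is moved to batch size 256 (k = 8).
lr_256 = scaled_learning_rate(base_lr=0.001, base_batch_size=32, new_batch_size=256)
print("scaled learning rate:", lr_256)
```

Treat the result as a starting point for tuning rather than a guaranteed optimum, since the rule is known to lose accuracy as $k$ grows.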