# 🎯 Advanced Neural Networks

1. Optimizers: Adam, RMSprop, SGD+momentum
2. Regularization: Dropout, BatchNorm
3. Activation functions: ReLU, Leaky ReLU, Swish
4. Exercises + competition + interviews


In [None]:
import numpy as np
import matplotlib.pyplot as plt
print('✅ Advanced NN ready!')


## Optimizers

### Adam (Adaptive Moment Estimation)

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$
$$\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

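**Why**: Adam divides each parameter's step by a running estimate of its gradient magnitude ($\sqrt{v_t}$), so every parameter gets its own effective learning rate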


In [None]:
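class Adam:
    """Adam optimizer with bias-corrected moment estimates."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr = lr
        self.beta1 = beta1  # decay rate of the first moment (mean of gradients)
        self.beta2 = beta2  # decay rate of the second moment (mean of squared gradients)
        self.eps = eps      # keeps the denominator away from zero
        self.m = None
        self.v = None
        self.t = 0          # step counter, used for bias correction

    def update(self, params, grads):
        # Lazily create the moment buffers with the same shape as params.
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)

        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * (grads ** 2)

        # m and v start at zero, so early estimates are biased toward zero; rescale them.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)

        # In-place update: params must be a float NumPy array.
        params -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return params

print('✅ Adam optimizer!')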
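
To see the update rule in action, here is a minimal sketch that runs the same Adam steps on a one-dimensional quadratic. The helper name `adam_minimize`, the test function $f(x) = (x-3)^2$, and the learning rate of 0.1 are assumptions chosen for illustration, not part of the notebook above.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run Adam from x0 and return the final parameters (hypothetical helper)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction: at t = 1 this makes m_hat equal to g exactly.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# f(x) = (x - 3)^2 has gradient 2 (x - 3) and its minimum at x = 3.
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=[0.0])
print(x_final)
```

Because the step is normalized by $\sqrt{\hat{v}_t}$, the first update has size close to `lr` regardless of how large the raw gradient is.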


## Regularization

### Dropout
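Randomly zero each neuron's output with probability $p$ during training; at inference, all neurons are kept
- Prevents co-adaptation: a unit cannot rely on specific other units being present
- Ensemble effect: each mini-batch trains a different thinned sub-network that shares weights with the others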
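
A forward pass with dropout could be sketched as follows. This uses the common "inverted dropout" variant, which rescales the surviving activations during training so that nothing needs to change at inference; that choice, the function name `dropout_forward`, and its parameters are assumptions made for illustration.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero units with probability p_drop, rescale the survivors."""
    if not training or p_drop == 0.0:
        return x  # inference: keep every unit, no scaling needed
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - p_drop
    mask = rng.random(x.shape) < keep_prob  # True where the unit survives
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
y_train = dropout_forward(x, p_drop=0.5, training=True, rng=rng)
y_eval = dropout_forward(x, p_drop=0.5, training=False)

print(y_train.mean())         # close to 1.0: rescaling preserves the expected activation
print((y_train == 0).mean())  # close to 0.5: about half the units were dropped
```

Dividing by `keep_prob` is what keeps the expected activation the same in training and inference mode.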

### Batch Normalization
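Normalize each feature to zero mean and unit variance over the mini-batch, then apply a learnable scale $\gamma$ and shift $\beta$
- Faster convergence
- Originally motivated as reducing internal covariate shift (the change in a layer's input distribution as earlier layers update)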
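
The training-mode computation can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: inputs are a `(batch, features)` array, the name `batchnorm_forward` and the `eps` value are hypothetical, and the running averages that a full implementation would keep for inference are left out.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a (batch, features) array."""
    mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, roughly unit variance
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # per-feature means, all near 0
print(out.std(axis=0))   # per-feature standard deviations, all near 1
```

With `gamma` set to ones and `beta` to zeros the layer only standardizes; during training both would be updated by the optimizer.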


## Interviews

### Q1: Adam vs SGD?
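- Adam: usually converges faster and needs less learning-rate tuning, because it adapts the step size per parameter
- SGD+momentum: simpler, with one state buffer per parameter instead of two; sometimes reaches better final accuracy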
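
For comparison, a minimal sketch of SGD with classical momentum is shown here. The class name `SGDMomentum`, its hyperparameters, and the quadratic test problem are assumptions for illustration; note that it keeps a single velocity buffer, whereas Adam tracks both `m` and `v`.

```python
import numpy as np

class SGDMomentum:
    """SGD with classical (heavy-ball) momentum; hypothetical illustrative class."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = None

    def update(self, params, grads):
        if self.velocity is None:
            self.velocity = np.zeros_like(params)
        # Velocity is a decaying sum of past gradient steps.
        self.velocity = self.momentum * self.velocity - self.lr * grads
        return params + self.velocity

# Minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
opt = SGDMomentum(lr=0.1, momentum=0.9)
x = np.array([0.0])
for _ in range(300):
    x = opt.update(x, 2.0 * (x - 3.0))
print(x)
```

Unlike Adam, the step size here scales directly with the gradient magnitude, so the learning rate usually has to be tuned to the problem.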

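### Q2: Why does BatchNorm work?
- Reduces internal covariate shift (the original explanation, though later work disputes that this is the main cause)
- Acts as a regularizer, because mini-batch statistics add noise to each activation
- Allows higher learning rates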

### Q3: Dropout rate?
Typical drop probabilities: 0.5 for hidden layers, 0.2 for input layers
