# Advanced DL: ResNets, RNNs

	• Residual Learning and Skip Connections
	• Vanishing Gradients and RNNs
	• Gating Mechanisms in LSTM 
	• Python: Custom RNN Implementation


⸻

1. Residual Learning and Skip Connections (ResNets)

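Problem: As plain networks get deeper, training error can go up rather than down (the degradation problem). This is an optimization difficulty, commonly associated with vanishing/exploding gradients, and not a sign of overfitting.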

Solution: Residual blocks with skip connections:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x}
$$

Where:
	•	$\mathcal{F}(\mathbf{x})$: Residual mapping (e.g., two conv layers).
	•	$\mathbf{x}$: Identity shortcut.

Benefits:
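	•	Easier to optimize: when the identity mapping is close to optimal, the block only has to push $\mathcal{F}(\mathbf{x})$ toward zero.
	•	Allows very deep networks (e.g., ResNet-50, ResNet-101).
	•	Gradient flows directly through the skip path, since $\partial \mathbf{y} / \partial \mathbf{x} = I + \partial \mathcal{F} / \partial \mathbf{x}$.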
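
A minimal NumPy sketch of the residual formula above. It assumes two dense layers with a ReLU in between for $\mathcal{F}$ (a simplification of the two conv layers mentioned earlier) and equal input and output widths so the shortcut can be added directly. The name `residual_block` and the sizes are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Compute y = F(x) + x, where F is two dense layers with a ReLU in between."""
    f = relu(x @ W1 + b1) @ W2 + b2   # residual mapping F(x)
    return f + x                      # identity shortcut

rng = np.random.default_rng(0)
d = 8                                 # feature width (assumed; input and output must match)
x = rng.standard_normal((4, d))       # batch of 4 inputs

# With all-zero weights F(x) = 0, so the block is exactly the identity map.
W1, b1 = np.zeros((d, d)), np.zeros(d)
W2, b2 = np.zeros((d, d)), np.zeros(d)
y_identity = residual_block(x, W1, b1, W2, b2)
print(np.allclose(y_identity, x))     # True

# With small random weights the block adds a small correction on top of x.
W1 = rng.standard_normal((d, d)) * 0.01
W2 = rng.standard_normal((d, d)) * 0.01
y = residual_block(x, W1, b1, W2, b2)
print(y.shape)                        # (4, 8)
```

The zero-weight case shows why depth is cheap here: an extra block can always fall back to the identity instead of degrading the signal.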

⸻

2. Vanishing Gradients and RNNs

Recurrent Neural Networks (RNNs) handle sequential data by carrying a hidden state $h_t$ from one time step to the next:

$$
h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b)
$$

Vanishing Gradient Problem:
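	•	Backpropagation through time multiplies the gradient by one Jacobian per step, $\partial h_t / \partial h_{t-1} = \mathrm{diag}(1 - h_t^2)\, W_{hh}$.
	•	When these factors have norm below 1, the gradient shrinks exponentially with the number of steps (when above 1, it can explode instead).
	•	Makes learning long-term dependencies difficult.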
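
The shrinking can be observed numerically. The sketch below assumes a randomly initialized RNN of the form given above, with $W_{hh}$ rescaled so its largest singular value is 0.9 (an arbitrary choice that guarantees decay), and tracks the norm of $\partial h_T / \partial h_{T-k}$ as $k$ grows. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, T = 10, 30

W_hh = rng.standard_normal((hidden_size, hidden_size))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)   # rescale: largest singular value becomes 0.9
W_xh = rng.standard_normal((hidden_size, 1)) * 0.1
b = np.zeros(hidden_size)

# Forward pass over one random scalar sequence, keeping every hidden state.
xs = rng.random((T, 1))
h = np.zeros(hidden_size)
hs = []
for t in range(T):
    h = np.tanh(W_hh @ h + W_xh @ xs[t] + b)
    hs.append(h)

# Walk backward in time, accumulating dh_T/dh_{T-k} as a product of Jacobians.
grad = np.eye(hidden_size)
norms = []
for h_t in reversed(hs):
    jac = np.diag(1.0 - h_t ** 2) @ W_hh   # dh_t / dh_{t-1}
    grad = grad @ jac
    norms.append(np.linalg.norm(grad, 2))

for k in (1, 10, 20, 30):
    print(f"{k:2d} steps back: gradient norm = {norms[k - 1]:.2e}")
```

Because each Jacobian has spectral norm at most 0.9 here, the norm after $k$ steps is bounded by $0.9^k$, so early inputs have almost no influence on the gradient.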

⸻

3. Gating Mechanisms in LSTM

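LSTM (Long Short-Term Memory) units mitigate the vanishing gradient problem by routing information through a cell state $C_t$ controlled by gates:
	•	Forget Gate ($f_t$): How much of the previous cell state to keep.
	•	Input Gate ($i_t$): How much new candidate information to add.
	•	Output Gate ($o_t$): How much of the cell state to expose as the hidden state.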

Core equations:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$
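
The equations above translate almost line by line into NumPy. This is a sketch for a single unbatched time step; the function name `lstm_step`, the parameter dictionary layout, and the sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step for a single (unbatched) input vector."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # input gate
    C_tilde = np.tanh(p["W_C"] @ z + p["b_C"])    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde            # new cell state
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate
    h_t = o_t * np.tanh(C_t)                      # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4                    # illustrative sizes
p = {}
for name in ("f", "i", "C", "o"):
    p[f"W_{name}"] = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
    p[f"b_{name}"] = np.zeros(hidden_size)

# Run five random inputs through the cell.
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h, C = lstm_step(x_t, h, C, p)
print(h.shape, C.shape)                           # (4,) (4,)

# Forget gate pushed toward 1 and input gate toward 0: the cell state is carried
# along almost unchanged, even over 50 steps.
p_keep = dict(p, b_f=np.full(hidden_size, 20.0), b_i=np.full(hidden_size, -20.0))
C_start = np.array([1.0, -2.0, 0.5, 3.0])
h_keep, C_keep = np.zeros(hidden_size), C_start
for x_t in rng.standard_normal((50, input_size)):
    h_keep, C_keep = lstm_step(x_t, h_keep, C_keep, p_keep)
print(np.allclose(C_keep, C_start, atol=1e-4))    # True
```

The second experiment illustrates the point of the gating: along the cell-state path the per-step factor is $f_t$ itself, so a forget gate near 1 lets both information and gradients survive many steps.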

⸻

4. Python: Custom RNN Implementation (from scratch)

import numpy as np

class SimpleRNN:
    """Minimal vanilla RNN (forward pass only) that maps a sequence to one output."""

    def __init__(self, input_size, hidden_size, output_size, seq_len):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.seq_len = seq_len

        # Small random weights. Inputs are row vectors, so the update is
        # h_t = tanh(x_t @ Wxh + h_{t-1} @ Whh + bh).
        self.Wxh = np.random.randn(input_size, hidden_size) * 0.1
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.1
        self.Why = np.random.randn(hidden_size, output_size) * 0.1
        self.bh = np.zeros((1, hidden_size))
        self.by = np.zeros((1, output_size))

    def forward(self, X):
        # X has shape (batch, seq_len, input_size); the hidden state starts at zero.
        self.h = np.zeros((X.shape[0], self.hidden_size))
        self.hs = []  # hidden state after each time step

        for t in range(self.seq_len):
            x_t = X[:, t, :]
            self.h = np.tanh(x_t @ self.Wxh + self.h @ self.Whh + self.bh)
            self.hs.append(self.h.copy())

        # Linear readout from the final hidden state only.
        y = self.h @ self.Why + self.by
        return y

# Example: predict the sum of each sequence.
# The weights are untrained, so this only checks that the shapes line up.
np.random.seed(0)
seq_len = 5
X = np.random.rand(100, seq_len, 1)
y = np.sum(X, axis=1)  # targets, shape (100, 1)

rnn = SimpleRNN(input_size=1, hidden_size=10, output_size=1, seq_len=seq_len)
y_pred = rnn.forward(X)

print("Predicted shape:", y_pred.shape)  # (100, 1)
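
The forward pass alone does not learn anything. As a possible next step, here is a sketch of backpropagation through time for the same architecture, written as standalone functions (hypothetical names `init_params`, `forward`, `loss_fn`, `backward`, `numeric_grad`) and assuming a loss of half the mean squared error. The analytic gradients are checked against finite differences before being used for one gradient-descent step.

```python
import numpy as np

def init_params(input_size, hidden_size, output_size, rng):
    return {
        "Wxh": rng.standard_normal((input_size, hidden_size)) * 0.1,
        "Whh": rng.standard_normal((hidden_size, hidden_size)) * 0.1,
        "Why": rng.standard_normal((hidden_size, output_size)) * 0.1,
        "bh": np.zeros((1, hidden_size)),
        "by": np.zeros((1, output_size)),
    }

def forward(p, X):
    """Return the prediction and the hidden states [h_0, h_1, ..., h_T]."""
    hs = [np.zeros((X.shape[0], p["Whh"].shape[0]))]
    for t in range(X.shape[1]):
        hs.append(np.tanh(X[:, t, :] @ p["Wxh"] + hs[-1] @ p["Whh"] + p["bh"]))
    return hs[-1] @ p["Why"] + p["by"], hs

def loss_fn(p, X, y):
    y_pred, _ = forward(p, X)
    return 0.5 * np.mean(np.sum((y_pred - y) ** 2, axis=1))

def backward(p, X, y):
    """Backpropagation through time: gradients of loss_fn w.r.t. every parameter."""
    y_pred, hs = forward(p, X)
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    dy = (y_pred - y) / X.shape[0]                 # dLoss/dy_pred
    grads["Why"] = hs[-1].T @ dy
    grads["by"] = dy.sum(axis=0, keepdims=True)
    dh = dy @ p["Why"].T                           # gradient reaching the last hidden state
    for t in reversed(range(X.shape[1])):
        da = dh * (1.0 - hs[t + 1] ** 2)           # back through tanh
        grads["Wxh"] += X[:, t, :].T @ da
        grads["Whh"] += hs[t].T @ da
        grads["bh"] += da.sum(axis=0, keepdims=True)
        dh = da @ p["Whh"].T                       # pass the gradient to the previous step
    return grads

def numeric_grad(p, X, y, name, eps=1e-5):
    """Central finite-difference gradient for one parameter array."""
    g = np.zeros_like(p[name])
    for idx in np.ndindex(*p[name].shape):
        orig = p[name][idx]
        p[name][idx] = orig + eps
        plus = loss_fn(p, X, y)
        p[name][idx] = orig - eps
        minus = loss_fn(p, X, y)
        p[name][idx] = orig                        # restore the original value
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.random((20, 5, 1))
y = X.sum(axis=1)
params = init_params(1, 10, 1, rng)

grads = backward(params, X, y)
max_err = max(np.max(np.abs(grads[k] - numeric_grad(params, X, y, k))) for k in params)
print("max |analytic - numeric| =", max_err)

# One plain gradient-descent step with a small, arbitrarily chosen learning rate.
lr = 0.01
loss_before = loss_fn(params, X, y)
for k in params:
    params[k] -= lr * grads[k]
loss_after = loss_fn(params, X, y)
print("loss before:", loss_before, "loss after:", loss_after)
```

The backward loop repeats the factor $(1 - h_t^2)$ followed by a multiplication with `Whh` at every step, which is the same Jacobian product that causes vanishing gradients on long sequences.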


