<a href="https://colab.research.google.com/github/apester/IME/blob/main/Lab16_Vanish_Grad.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step 1: Setup and Data Preparation

In [None]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras import layers
from tensorflow.keras.optimizers import SGD

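# Generate synthetic data: each sample is a sequence of 50 uniform random values
np.random.seed(42)
time_steps = 50
samples = 1000

X = np.random.rand(samples, time_steps, 1)  # shape: (samples, time_steps, features)
# Label is 1 when the sequence sum exceeds its expected value (time_steps / 2),
# so every time step, including the earliest, contributes equally to the label
y = (np.sum(X, axis=1) > (time_steps / 2)).astype(int)  # shape: (samples, 1)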
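
As a quick sanity check on the data, the sketch below regenerates it with the same seed and prints the array shapes and the class balance. If the labels are roughly balanced, a training accuracy near 50% in Step 2 corresponds to chance level.

```python
import numpy as np

# Same seed and sizes as the data-preparation cell
np.random.seed(42)
time_steps = 50
samples = 1000

X = np.random.rand(samples, time_steps, 1)
y = (np.sum(X, axis=1) > (time_steps / 2)).astype(int)

print("X shape:", X.shape)
print("y shape:", y.shape)
print("fraction of positive labels:", y.mean())
```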

Step 2: Define and Train a Simple RNN Model

In [None]:
model = Sequential()
model.add(layers.Input(shape=(time_steps, 1)))
model.add(SimpleRNN(5, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))

optimizer = SGD(learning_rate=0.01)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

history = model.fit(X, y, epochs=30, batch_size=32, verbose=1)

Step 3: Observing the Vanishing Gradient Effect

In [None]:
import tensorflow as tf

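# Compute the gradient of the loss w.r.t. every trainable weight on one batch
inputs = tf.convert_to_tensor(X[:32], dtype=tf.float32)
targets = tf.convert_to_tensor(y[:32], dtype=tf.float32)
with tf.GradientTape() as tape:
    preds = model(inputs)
    # Reduce the per-sample losses to a scalar, matching the training loss
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(targets, preds))
grads = tape.gradient(loss, model.trainable_weights)

# Each entry is one weight tensor (kernel, recurrent kernel, bias, ...), not one layer
for weight, grad in zip(model.trainable_weights, grads):
    print(f"{weight.name} {tuple(weight.shape)} mean |gradient|: {np.mean(np.abs(grad.numpy())):.3e}")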

Step 4: Analysis

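The gradient magnitudes printed in Step 3 are reported per weight tensor, and each one sums the contributions from all 50 time steps, so they do not show the decay directly. The vanishing-gradient effect appears when the gradient is broken out by time step: the contribution from early time steps is typically far smaller than the contribution from late ones, which makes it difficult for the network to learn dependencies spanning many time steps.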
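
One way to look at the effect per time step is to differentiate the loss with respect to the input sequence and average the absolute gradient at each step. The sketch below is self-contained, so it builds a fresh, untrained copy of the Step 2 architecture on a small batch; in the notebook you could reuse the trained `model`, `X`, and `y` instead. Using the input gradient as a proxy for how strongly each time step can influence the loss is an illustrative choice, not the only possible one.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

np.random.seed(42)
tf.random.set_seed(42)
time_steps = 50

# One batch of synthetic data, generated as in Step 1
X = np.random.rand(32, time_steps, 1).astype("float32")
y = (np.sum(X, axis=1) > (time_steps / 2)).astype("float32")

# Untrained copy of the Step 2 architecture (assumption: used here only
# to keep the example self-contained)
model = Sequential([
    layers.Input(shape=(time_steps, 1)),
    layers.SimpleRNN(5, activation="tanh"),
    layers.Dense(1, activation="sigmoid"),
])

inputs = tf.convert_to_tensor(X)
with tf.GradientTape() as tape:
    tape.watch(inputs)  # inputs are not variables, so they must be watched
    preds = model(inputs)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, preds))

# Gradient of the loss w.r.t. each input value: shape (batch, time_steps, 1)
input_grads = tape.gradient(loss, inputs)

# Mean absolute gradient at each time step, averaged over the batch
per_step = tf.reduce_mean(tf.abs(input_grads), axis=[0, 2]).numpy()

print("first 5 time steps:", per_step[:5])
print("last 5 time steps: ", per_step[-5:])
```

If the values for the first time steps are orders of magnitude smaller than those for the last ones, the early inputs have almost no influence on the weight updates.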

Discuss why this prevents Simple RNNs from effectively learning long-term dependencies.
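
As a starting point for the discussion: in a tanh RNN with h_t = tanh(W_h h_{t-1} + W_x x_t + b), the Jacobian of one step is diag(1 - h_t^2) W_h, and backpropagation through time multiplies one such factor per step. The NumPy sketch below tracks the norm of that product as the distance from the final step grows. All weights here are hypothetical random values, not those of the Keras model, and `W_h` is rescaled to spectral norm 0.9, an assumption chosen so that decay is guaranteed (each factor then has norm at most 0.9). With larger recurrent weights the same product can grow instead, which is the exploding-gradient case.

```python
import numpy as np

rng = np.random.default_rng(0)
time_steps, hidden = 50, 5

# Hypothetical weights for illustration only
W_x = rng.normal(size=(hidden, 1))
W_h = rng.normal(size=(hidden, hidden))
W_h *= 0.9 / np.linalg.norm(W_h, 2)  # rescale to spectral norm 0.9 (assumption)
b = np.zeros(hidden)

x = rng.random((time_steps, 1))

# Forward pass, storing every hidden state
h = np.zeros(hidden)
states = []
for t in range(time_steps):
    h = np.tanh(W_h @ h + W_x @ x[t] + b)
    states.append(h)

# Backward pass: J holds d h_last / d h_t, built up one step at a time
J = np.eye(hidden)
norms = [np.linalg.norm(J, 2)]  # distance 0 from the last step
for t in range(time_steps - 1, 0, -1):
    # One-step Jacobian: d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_h
    J = J @ (np.diag(1.0 - states[t] ** 2) @ W_h)
    norms.append(np.linalg.norm(J, 2))

# norms[k] is the spectral norm of d h_last / d h_{last - k}
for k in range(0, time_steps, 10):
    print(f"{k:2d} steps back: {norms[k]:.3e}")
```

Because the tanh derivative never exceeds 1, each factor can only shrink the product in this setting, so the signal reaching a step k positions back is bounded by 0.9^k.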