# Deep Learning with Keras: A Beginner's Guide

## Introduction

Welcome! This notebook is designed for beginners who want to dive into deep learning using Keras, a user-friendly API for building neural networks. As a deep learning engineer with 5 years of research experience, I've structured this guide to be clear, step-by-step, and practical. We'll start from the basics and build up to more advanced concepts.

Deep learning is a subset of machine learning that uses neural networks with many layers to learn patterns from data. Keras makes it easy to prototype and experiment without getting bogged down in low-level details.

**Prerequisites:**
- Basic Python knowledge (lists, functions, loops).
- Familiarity with NumPy (for arrays) and Matplotlib (for plotting).
- Install TensorFlow (which includes Keras): `pip install tensorflow`.

We'll use simple examples, like classifying handwritten digits (MNIST dataset), to keep things hands-on.

**Notebook Structure:**
1. Neural Network Basics
2. Setting Up Keras
3. Building Your First Model
4. Training and Evaluation
5. Key Components: Activations, Losses, Optimizers
6. Improving Models: Overfitting and Regularization
7. Convolutional Neural Networks (CNNs)
8. Recurrent Neural Networks (RNNs)
9. Next Steps and Resources

Run the cells as you go. Let's get started!

## 1. Neural Network Basics

A neural network is loosely inspired by the brain: it consists of **neurons** (units) connected in **layers**. Data flows from input to output through these layers, and training adjusts the connection **weights** to minimize prediction errors.

### Key Terms:
- **Input Layer**: Takes raw data (e.g., pixel values of an image).
- **Hidden Layers**: Process data (can be multiple).
- **Output Layer**: Produces predictions (e.g., class labels).
- **Forward Pass**: Data moves forward through the network.
- **Backward Pass (Backpropagation)**: Errors flow backward to update weights.

Imagine classifying a fruit: Input (color, size) → Hidden (features like "red and round") → Output ("apple").

### Simple Math Behind It
For a single neuron: Output = activation_function( sum(weights * inputs) + bias )

We'll see this in code soon.
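As a minimal sketch of that formula, here is a single neuron in plain NumPy with ReLU as the activation. The inputs, weights, and bias are made-up numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    # Weighted sum of the inputs plus the bias, then the activation
    z = np.dot(weights, inputs) + bias
    return activation(z)

# Illustrative values (not from any real dataset)
x = np.array([0.5, -1.0, 2.0])   # three input features
w = np.array([0.4, 0.3, -0.2])   # one weight per input
b = 0.1

z = np.dot(w, x) + b  # 0.2 - 0.3 - 0.4 + 0.1 = -0.4
print(f"pre-activation z = {z:.2f}")                 # → -0.40
print(f"neuron output    = {neuron(x, w, b):.2f}")   # → 0.00 (ReLU clips negatives)
```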

## 2. Setting Up Keras

Keras is now part of TensorFlow. Import it like this:

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Check versions
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
```

This imports everything the notebook needs. For models that are a simple stack of layers, Keras provides the `Sequential` class.

## 3. Building Your First Model

Let's build a basic neural network to classify MNIST digits (0-9). MNIST has 60,000 training images and 10,000 test images, each a 28x28 grayscale picture.

### Load and Prepare Data

```python
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values to 0-1 range (helps training)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape for dense layers (flatten images)
x_train = x_train.reshape(60000, 784)  # 28*28=784
x_test = x_test.reshape(10000, 784)

# Convert labels to categorical (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
```

### Define the Model

Use `Sequential` to stack layers. Start simple: Input → Dense (hidden) → Dense (output).

```python
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),  # Hidden layer
    keras.layers.Dense(10, activation='softmax')  # Output layer (10 classes)
])

model.summary()  # See the architecture
```

- `Dense`: Fully connected layer.
- `relu`: Rectified Linear Unit (simple activation: max(0, x)).
- `softmax`: For multi-class output (probabilities sum to 1).
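To make `Dense`, `relu`, and `softmax` concrete, the sketch below reimplements the forward pass of the model above in NumPy, using random (untrained) weights with the same shapes. It illustrates the math only; it is not how Keras implements these layers internally. The parameter count it computes matches the total that `model.summary()` reports for this architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract each row's max for numerical stability; each row then sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dense(x, W, b, activation):
    # Fully connected: every input feeds every unit -> activation(x @ W + b)
    return activation(x @ W + b)

# Untrained random weights with the same shapes as the Keras model above
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)

batch = rng.random((4, 784))              # 4 fake flattened 28x28 "images"
hidden = dense(batch, W1, b1, relu)       # shape (4, 128)
probs = dense(hidden, W2, b2, softmax)    # shape (4, 10), one probability per digit

n_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, probs.shape)  # → (4, 128) (4, 10)
print(n_params)                   # → 101770
print(probs.sum(axis=1))          # each row sums to 1
```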

### Compile the Model

Tell Keras how to train: loss function, optimizer, metrics.

```python
model.compile(
    optimizer='adam',  # Adaptive optimizer (good default)
    loss='categorical_crossentropy',  # For multi-class
    metrics=['accuracy']
)
```

## 4. Training and Evaluation

Train with `fit()`. Use validation data to monitor progress.

```python
# Train the model
history = model.fit(
    x_train, y_train,
    batch_size=128,  # Process 128 samples at a time
    epochs=5,  # Full passes over data
    validation_split=0.2  # Use 20% of train for validation
)

# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```

### Visualize Training

Plot accuracy and loss to see if it's learning.

```python
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()
```

This should show accuracy improving over the epochs. If validation accuracy stalls or falls while training accuracy keeps rising, the model is overfitting (covered in Section 6).

### Make Predictions

```python
# Predict on a single image
sample = x_test[0:1]  # First test image
prediction = model.predict(sample)
predicted_class = np.argmax(prediction)

print(f"Predicted digit: {predicted_class}")
print(f"Actual digit: {np.argmax(y_test[0])}")

# Plot the image
plt.imshow(x_test[0].reshape(28, 28), cmap='gray')
plt.title(f"Predicted: {predicted_class}")
plt.show()
```

## 5. Key Components: Activations, Losses, Optimizers

These are the building blocks.

### Activations
They introduce non-linearity. Common ones:

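- **ReLU**: Fast to compute and helps mitigate vanishing gradients. `activation='relu'`
- **Sigmoid**: Squashes outputs into the 0-1 range; used for binary classification outputs. `activation='sigmoid'`
- **Softmax**: Turns a vector of scores into class probabilities that sum to 1. `activation='softmax'`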

Example: a deeper model with two ReLU hidden layers before the softmax output.

```python
# Deeper model
model_v2 = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model_v2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
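The NumPy sketch below (independent of Keras) illustrates two points from the list above: the sigmoid's gradient shrinks toward zero for large |z| while ReLU's gradient stays at 1 for positive inputs, and a two-class softmax over the logits [z, 0] equals sigmoid(z), which is why a binary classifier can use a single sigmoid unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = s(z) * (1 - s(z)); its largest value is 0.25, at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # d/dz max(0, z): 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-5.0, 0.0, 5.0])
print("sigmoid'(z):", sigmoid_grad(z))  # small at |z| = 5
print("relu'(z):   ", relu_grad(z))     # [0. 0. 1.]

# Two-class softmax over logits [2, 0] vs. a single sigmoid on logit 2
print(softmax(np.array([2.0, 0.0]))[0], sigmoid(2.0))  # the same value
```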

### Losses
Measure prediction error.

- **Categorical Crossentropy**: Multi-class classification.
- **Binary Crossentropy**: Binary (0/1).
- **MSE (Mean Squared Error)**: Regression (predict numbers).

For a regression task (e.g., predicting house prices), use MSE.
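Here is a NumPy sketch of the three losses on tiny hand-made examples, so the numbers can be checked by hand. The small clipping epsilon that avoids log(0) is this sketch's own choice (Keras also guards against log(0) internally), and the house-price values are invented.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum(y_true * log(y_pred)) per sample, averaged over the batch
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# One 3-class sample: the true class is index 1, predicted with probability 0.7
cce = categorical_crossentropy(np.array([[0, 1, 0]]), np.array([[0.2, 0.7, 0.1]]))
print(f"categorical CE: {cce:.4f}")  # → 0.3567, i.e. -ln(0.7)

bce = binary_crossentropy(np.array([1.0]), np.array([0.9]))
print(f"binary CE:      {bce:.4f}")  # → 0.1054, i.e. -ln(0.9)

err = mse(np.array([300.0, 250.0]), np.array([310.0, 240.0]))  # e.g. prices in $1000s
print(f"MSE:            {err:.1f}")  # → 100.0
```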

### Optimizers
They decide how the weights are updated from the gradients.

- **SGD**: Basic stochastic gradient descent.
- **Adam**: Adaptive rates, works well out-of-box.
- **RMSprop**: Adapts the learning rate per parameter; often a good choice for RNNs.

Try SGD:

```python
model_v2.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# Train similarly...
```
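The difference between the two optimizers is easiest to see on a one-parameter toy loss. This sketch uses the textbook form of Adam with Keras's default `beta_1`, `beta_2`, and `epsilon`; the learning rate is raised to 0.1 for the toy problem, and Keras's own implementation differs in small details (such as exactly where epsilon enters). Plain SGD takes steps proportional to the gradient, while Adam's first step is roughly the learning rate regardless of the gradient's scale.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)  # step against the gradient, scaled by lr
    return w

def run_adam(w, lr=0.1, steps=200, beta1=0.9, beta2=0.999, eps=1e-7):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the first steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step normalized per parameter
    return w

print(f"SGD after 1 step:  w = {run_sgd(0.0, steps=1):.3f}")   # → 0.600 (lr * |gradient|)
print(f"Adam after 1 step: w = {run_adam(0.0, steps=1):.3f}")  # → 0.100 (about lr)
print(f"SGD after 200:     w = {run_sgd(0.0):.4f}")
print(f"Adam after 200:    w = {run_adam(0.0):.4f}")
```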

## 6. Improving Models: Overfitting and Regularization

Overfitting: Model memorizes training data but fails on new data (high train acc, low val acc).

Solutions:

### Dropout
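Randomly set a fraction of a layer's outputs to zero at each training step, so the network cannot rely on any single unit. Dropout is active only during training.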

```python
from tensorflow.keras.layers import Dropout

model_reg = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.2),  # Zero 20% of the previous layer's outputs (training only)
    keras.layers.Dense(10, activation='softmax')
])

model_reg.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_reg = model_reg.fit(x_train, y_train, epochs=5, validation_split=0.2)
```
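A minimal NumPy sketch of what `Dropout(0.2)` does to a layer's outputs, assuming the "inverted dropout" formulation that Keras's `Dropout` layer uses: during training, surviving values are scaled by 1/(1 - rate) so the expected activation is unchanged, and at inference the layer passes its inputs through untouched.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, rate, training):
    # Inverted dropout: zero a random fraction `rate` of units during training
    # and scale the survivors by 1 / (1 - rate) to keep the expected value.
    if not training or rate == 0.0:
        return x  # at inference the layer is a no-op
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

activations = np.ones((1000, 128))
train_out = dropout(activations, rate=0.2, training=True)
test_out = dropout(activations, rate=0.2, training=False)

print(f"fraction zeroed: {np.mean(train_out == 0):.3f}")  # close to 0.2
print(f"mean activation: {train_out.mean():.3f}")         # close to 1.0
print(np.array_equal(test_out, activations))             # → True
```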

### Other Techniques
- **Early Stopping**: Stop if val loss doesn't improve.
- **L1/L2 Regularization**: Penalize large weights.

```python
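from tensorflow.keras.callbacks import EarlyStopping

# Stop when val_loss hasn't improved for 2 epochs; keep the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

# Use in fit()
# history = model.fit(..., callbacks=[early_stop])

# L2 regularization: adds 1e-4 * sum(w^2) of this layer's weights to the loss
l2_layer = keras.layers.Dense(128, activation='relu', kernel_regularizer=keras.regularizers.l2(1e-4))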
```

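- **Data Augmentation**: For images, train on randomly transformed copies (small shifts, rotations, zooms, or flips where they make sense) so the model sees more variety.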
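A small NumPy sketch of augmentation for MNIST-style 28x28 arrays: each image is shifted by up to 2 pixels in each direction, with zeros filling the uncovered border. Small shifts keep a digit's identity, whereas horizontal flips would not suit digits (a mirrored 2 is not a 2). The helper names are hypothetical; inside a Keras model, preprocessing layers such as `keras.layers.RandomTranslation` and `keras.layers.RandomRotation` do this for you.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shift(image, max_shift=2):
    # Shift a 2-D image by a random (dy, dx) in [-max_shift, max_shift],
    # filling the uncovered border with zeros (the MNIST background)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Source and destination slices for the region that stays inside the frame
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def augment_batch(images, max_shift=2):
    return np.stack([random_shift(img, max_shift) for img in images])

# A fake 28x28 "image" with a single bright pixel in the middle
img = np.zeros((28, 28), dtype='float32')
img[14, 14] = 1.0
batch = augment_batch(np.repeat(img[None], 5, axis=0))

positions = [divmod(int(np.argmax(b)), 28) for b in batch]
print(batch.shape)  # → (5, 28, 28)
print(positions)    # the bright pixel moved by at most 2 rows/columns
```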

## 7. Convolutional Neural Networks (CNNs)

CNNs excel at images by learning spatial features (edges, shapes).

For MNIST, use Conv2D layers. Keep data as 28x28x1 (grayscale).

```python
# Reshape for CNN (channels last)
x_train_cnn = x_train.reshape(60000, 28, 28, 1)
x_test_cnn = x_test.reshape(10000, 28, 28, 1)

cnn_model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),  # Flatten to 1D
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn_history = cnn_model.fit(x_train_cnn, y_train, epochs=5, validation_split=0.2)

test_acc_cnn = cnn_model.evaluate(x_test_cnn, y_test)[1]
print(f"CNN Test accuracy: {test_acc_cnn:.4f}")
```

- **Conv2D**: Filters for features.
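- **MaxPooling**: Downsamples each feature map (here by 2x), which reduces computation and the number of parameters in later layers; pooling itself has no trainable weights.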
- CNNs often beat dense nets on images.
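The shapes and parameter counts that `cnn_model.summary()` prints can be worked out by hand. The helpers below are illustrative (not Keras functions) and assume square inputs, the default `'valid'` padding, and pooling windows equal to their stride, as in `cnn_model`.

```python
def conv_out(size, kernel, stride=1):
    # Output side length with 'valid' padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def conv2d_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels filter per output channel, plus one bias each
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per unit
    return n_in * n_out + n_out

side = conv_out(28, 3)                # Conv2D(32, 3x3):   28 -> 26
p_conv1 = conv2d_params(3, 1, 32)
side = conv_out(side, 2, stride=2)    # MaxPooling2D(2x2): 26 -> 13, no weights
side = conv_out(side, 3)              # Conv2D(64, 3x3):   13 -> 11
p_conv2 = conv2d_params(3, 32, 64)
side = conv_out(side, 2, stride=2)    # MaxPooling2D(2x2): 11 -> 5, no weights
flat = side * side * 64               # Flatten
p_dense1 = dense_params(flat, 64)
p_dense2 = dense_params(64, 10)

total = p_conv1 + p_conv2 + p_dense1 + p_dense2
print(f"before Flatten: {side}x{side}x64 -> {flat} values")  # → 5x5x64 -> 1600 values
print(f"params: {p_conv1} + {p_conv2} + {p_dense1} + {p_dense2} = {total}")
# → params: 320 + 18496 + 102464 + 650 = 121930
```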

## 8. Recurrent Neural Networks (RNNs)

RNNs handle sequences (text, time series). They read one element at a time and carry a hidden state that summarizes what they have seen so far.

Below is a simple LSTM (Long Short-Term Memory) classifier for the IMDB movie review sentiment dataset.

```python
# Load IMDB data (binary sentiment: positive/negative)
(x_train_seq, y_train_seq), (x_test_seq, y_test_seq) = keras.datasets.imdb.load_data(num_words=10000)

# Pad sequences to same length
x_train_seq = keras.preprocessing.sequence.pad_sequences(x_train_seq, maxlen=200)
x_test_seq = keras.preprocessing.sequence.pad_sequences(x_test_seq, maxlen=200)

rnn_model = keras.Sequential([
    keras.layers.Embedding(10000, 64),  # Word to vector
    keras.layers.LSTM(64),  # LSTM layer
    keras.layers.Dense(1, activation='sigmoid')  # Binary output
])

rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
rnn_history = rnn_model.fit(x_train_seq, y_train_seq, epochs=5, batch_size=128, validation_split=0.2)

test_acc_rnn = rnn_model.evaluate(x_test_seq, y_test_seq)[1]
print(f"RNN Test accuracy: {test_acc_rnn:.4f}")
```

- **Embedding**: Converts words to dense vectors.
- **LSTM**: Handles long dependencies better than basic RNN.
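To see how an LSTM carries information across time steps, here is a NumPy sketch of one textbook LSTM cell unrolled over a sequence. Weights and inputs are random stand-ins; the sizes mirror the model above (64-dimensional embeddings, 64 units, 200 time steps). `keras.layers.LSTM(64)` computes this kind of recurrence and, by default, passes only the final hidden state to the next layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    # Pre-activations for the input, forget, candidate, and output parts
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                 # candidate content in (-1, 1)
    c_t = f * c_prev + i * g   # keep part of the old cell state, write new content
    h_t = o * np.tanh(c_t)     # hidden state: a gated view of the cell state
    return h_t, c_t

input_dim, units, steps = 64, 64, 200
W = rng.normal(0, 0.1, (input_dim, 4 * units))
U = rng.normal(0, 0.1, (units, 4 * units))
b = np.zeros(4 * units)

sequence = rng.normal(size=(steps, input_dim))  # stand-in for 200 embedded words
h = np.zeros(units)
c = np.zeros(units)
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)  # → (64,): the final hidden state that would feed Dense(1)
```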




## 10. Transfer Learning

Transfer learning is a powerful technique where you use a pre-trained model (trained on a large dataset like ImageNet) as a starting point for your own task. Instead of training from scratch, you "transfer" the learned features (e.g., edges, textures) to your problem. This saves time, data, and compute—ideal for beginners with limited resources.

Why use it?
- Pre-trained models like VGG16 or ResNet capture general image features.
- Fine-tune the top layers for your specific classes (e.g., cats vs. dogs instead of 1,000 ImageNet classes).

We'll use Keras's built-in pre-trained models from `keras.applications`. The example classifies cats vs. dogs; a real project might use a dataset such as Kaggle's Dogs vs. Cats, but to keep everything loadable through Keras we use the cat and dog classes of CIFAR-10.

### Load Pre-trained Model and Data

The code below needs only `keras.datasets`, so no extra install is required. (For other datasets, `pip install tensorflow-datasets` offers a convenient loader.)

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load CIFAR-10 (10 classes of 32x32 color images, including cats and dogs)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Focus on binary: cat (class 3) vs dog (class 5)
cat_dog_train = np.isin(y_train, [3, 5]).flatten()
cat_dog_test = np.isin(y_test, [3, 5]).flatten()

x_train_cd = x_train[cat_dog_train]
y_train_cd = (y_train[cat_dog_train] == 5).astype(int)  # 0: cat, 1: dog
x_test_cd = x_test[cat_dog_test]
y_test_cd = (y_test[cat_dog_test] == 5).astype(int)

# Normalize
x_train_cd = x_train_cd.astype('float32') / 255.0
x_test_cd = x_test_cd.astype('float32') / 255.0

# One-hot for binary
y_train_cd = keras.utils.to_categorical(y_train_cd, 2)
y_test_cd = keras.utils.to_categorical(y_test_cd, 2)

print(f"Cat/Dog train samples: {x_train_cd.shape[0]}")
print(f"Cat/Dog test samples: {x_test_cd.shape[0]}")
```

### Build Transfer Learning Model

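Load VGG16 (pre-trained on ImageNet), freeze its layers, and add a small classification head on top. VGG16's ImageNet weights were trained on 224x224 images prepared with `keras.applications.vgg16.preprocess_input` (0-255 RGB converted to BGR and mean-centered); the simple /255 scaling and 32x32 inputs used here run fine but will likely limit accuracy.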

```python
# Load pre-trained VGG16 without top layers
base_model = keras.applications.VGG16(
    weights='imagenet',  # Pre-trained weights
    include_top=False,   # No classification head
    input_shape=(32, 32, 3)  # CIFAR-10 size (32x32 is the smallest VGG16 accepts)
)

# Freeze base model (don't train these weights initially)
base_model.trainable = False

# Add custom layers on top
model_tl = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),  # Pool features
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(2, activation='softmax')  # Binary output
])

model_tl.summary()  # See the huge base + small top
```
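`GlobalAveragePooling2D` averages each channel over the height and width axes. For 32x32 inputs, VGG16 without its top ends in a 1x1x512 feature map after five 2x2 poolings (32 → 16 → 8 → 4 → 2 → 1), so here it mostly removes the two size-1 axes. The NumPy sketch below applies the same operation to a made-up 4x4x512 feature map.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_average_pool_2d(features):
    # (batch, height, width, channels) -> (batch, channels)
    return features.mean(axis=(1, 2))

# A made-up batch of feature maps, shaped like a CNN base output
features = rng.random((2, 4, 4, 512))
pooled = global_average_pool_2d(features)
print(pooled.shape)  # → (2, 512)

# Each output value is the mean of one channel's 4x4 grid
print(np.isclose(pooled[0, 7], features[0, :, :, 7].mean()))  # → True
```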

### Compile, Train, and Evaluate

```python
model_tl.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train (quick because base is frozen)
history_tl = model_tl.fit(
    x_train_cd, y_train_cd,
    epochs=5,
    batch_size=32,
    validation_split=0.2
)

# Evaluate
test_loss, test_acc = model_tl.evaluate(x_test_cd, y_test_cd)
print(f"Transfer Learning Test accuracy: {test_acc:.4f}")
```

### Fine-Tuning (Optional Advanced Step)

Unfreezing the top layers of the base model and training them with a very small learning rate can improve accuracy further.

```python
# Unfreeze last few layers
base_model.trainable = True
for layer in base_model.layers[:-4]:  # Freeze all but last 4
    layer.trainable = False

# Recompile with lower learning rate
model_tl.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Small LR to avoid destroying weights
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Fine-tune
history_fine = model_tl.fit(
    x_train_cd, y_train_cd,
    epochs=3,  # Fewer epochs
    batch_size=32,
    validation_split=0.2
)
```

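**Pro Tip:** For real datasets, add augmentation with preprocessing layers such as `keras.layers.RandomFlip` and `keras.layers.RandomRotation` (`ImageDataGenerator` still appears in many tutorials but is deprecated in recent TensorFlow releases). With images closer to ImageNet's resolution, transfer learning can reach high accuracy from relatively little data.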

### Visualize a Prediction

```python
# Predict one image
sample_img = x_test_cd[0:1]
pred = model_tl.predict(sample_img)
pred_class = np.argmax(pred)

label_map = {0: 'Cat', 1: 'Dog'}
print(f"Predicted: {label_map[pred_class]}")
print(f"Actual: {label_map[np.argmax(y_test_cd[0])]}")

plt.imshow(sample_img[0])
plt.title(f"Predicted: {label_map[pred_class]}")
plt.show()
```

## 11. Autoencoders

Autoencoders are unsupervised neural networks that learn to compress (encode) and reconstruct (decode) data. They're great for tasks like denoising images, anomaly detection, or dimensionality reduction (like PCA but non-linear).

Structure:
- **Encoder**: Compresses input to a low-dimensional "latent space".
- **Decoder**: Reconstructs from latent space back to original.
- Train to minimize reconstruction error (e.g., MSE).

We'll build a simple autoencoder to denoise MNIST digits.

### Prepare Noisy Data

```python
# Reload MNIST
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape to (samples, 28, 28, 1)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Add noise
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(0, 1, x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(0, 1, x_test.shape)

# Clip to 0-1
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

print(f"Noisy train shape: {x_train_noisy.shape}")
```

### Build Autoencoder Model

```python
# Encoder: padding='same' keeps 28x28 through the convolutions, so the
# two 2x2 poolings give exactly 28 -> 14 -> 7
encoder = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2), padding='same'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2), padding='same')
])  # Latent space: 7x7x64

# Decoder: two stride-2 transposed convolutions upsample 7 -> 14 -> 28
decoder = keras.Sequential([
    keras.Input(shape=(7, 7, 64)),
    keras.layers.Conv2DTranspose(64, (3, 3), activation='relu', strides=2, padding='same'),
    keras.layers.Conv2DTranspose(32, (3, 3), activation='relu', strides=2, padding='same'),
    keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')  # 28x28x1 in [0, 1]
])

# Full autoencoder
autoencoder = keras.Sequential([encoder, decoder])

autoencoder.summary()
```

### Compile, Train, and Denoise

```python
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # Or MSE

# Train on noisy, reconstruct clean
history_ae = autoencoder.fit(
    x_train_noisy, x_train,  # Input noisy, target clean
    epochs=10,
    batch_size=128,
    validation_data=(x_test_noisy, x_test)
)

# Denoise test images
denoised = autoencoder.predict(x_test_noisy)

# Plot original, noisy, denoised
n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i].squeeze(), cmap='gray')
    plt.title('Original')
    plt.axis('off')
    
    # Noisy
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].squeeze(), cmap='gray')
    plt.title('Noisy')
    plt.axis('off')
    
    # Denoised
    ax = plt.subplot(3, n, i + 1 + 2*n)
    plt.imshow(denoised[i].squeeze(), cmap='gray')
    plt.title('Denoised')
    plt.axis('off')

plt.show()
```

**Key Insight:** The latent space (encoder output) captures essential features. Extract it with `encoder.predict()` for compression tasks.

### Variations
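- **Variational Autoencoder (VAE)**: Encodes each input as a probability distribution over the latent space and samples from it, which turns the decoder into a generative model (its samples tend to be somewhat blurry).
- **Anomaly detection**: Train on normal data only; inputs with high reconstruction error are flagged as outliers.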
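A minimal sketch of the anomaly-detection recipe, assuming you already have reconstructions from a trained autoencoder (for example from `autoencoder.predict(...)`). To keep it self-contained, the arrays below are synthetic stand-ins: "normal" inputs are reconstructed with small errors and "anomalous" ones with large errors. Choosing the threshold as the 99th percentile of errors on normal data is a common but arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_error(x, x_hat):
    # Per-sample mean squared error over all remaining axes (pixels, channels)
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))

# Synthetic stand-ins for (inputs, reconstructions); not real autoencoder output
normal_x = rng.random((500, 28, 28, 1))
normal_hat = normal_x + rng.normal(0, 0.05, normal_x.shape)  # small reconstruction errors
odd_x = rng.random((5, 28, 28, 1))
odd_hat = odd_x + rng.normal(0, 0.3, odd_x.shape)            # large reconstruction errors

# Choose the threshold from errors on normal (validation) data only
normal_err = reconstruction_error(normal_x, normal_hat)
threshold = np.percentile(normal_err, 99)

odd_err = reconstruction_error(odd_x, odd_hat)
print(f"threshold: {threshold:.5f}")
print("flagged as anomalies:", odd_err > threshold)
```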



## 13. Generative Adversarial Networks (GANs)

GANs are a type of generative model where two neural networks compete: the **Generator** creates fake data, and the **Discriminator** tries to spot fakes. They "adversarially" train each other—the generator gets better at fooling the discriminator, and the discriminator gets sharper. Result? Realistic synthetic data, like fake images or text.

Why learn GANs?
- Fun for creating art or data augmentation.
- Applications: Deepfakes, style transfer, super-resolution.
- Challenge: Training can be unstable (e.g., mode collapse, where the generator produces only a few kinds of output), so start simple.

We'll build a basic DCGAN (Deep Convolutional GAN) to generate MNIST-like digits. Uses CNNs for images.

### Prepare Data

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load and preprocess MNIST
(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_train = (x_train - 127.5) / 127.5  # Normalize to [-1, 1]

# Batch and shuffle
BUFFER_SIZE = 60000
BATCH_SIZE = 128
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
```

### Build the Generator

Takes random noise (latent vector) and outputs a fake image.

```python
def make_generator_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))  # Noise dim=100
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.LeakyReLU())

    model.add(keras.layers.Reshape((7, 7, 256)))

    # Upsample to 14x14
    model.add(keras.layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.LeakyReLU())

    # Upsample to 28x28
    model.add(keras.layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.LeakyReLU())

    model.add(keras.layers.Conv2DTranspose(1, (5, 5), strides=(1, 1), padding='same', use_bias=False, activation='tanh'))

    return model

generator = make_generator_model()
generator.summary()
```

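- `Conv2DTranspose`: A learnable upsampling layer (roughly the reverse of a strided convolution); with `strides=(2, 2)` and `padding='same'` it doubles height and width.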
- `BatchNormalization`: Stabilizes training.
- `LeakyReLU`: Allows small negative values to avoid dying ReLUs.

### Build the Discriminator

Classifies real vs. fake images.

```python
def make_discriminator_model():
    model = keras.Sequential()
    model.add(keras.layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[28, 28, 1]))
    model.add(keras.layers.LeakyReLU())
    model.add(keras.layers.Dropout(0.3))

    model.add(keras.layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(keras.layers.LeakyReLU())
    model.add(keras.layers.Dropout(0.3))

    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(1))

    return model

discriminator = make_discriminator_model()
discriminator.summary()
```

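- Outputs a single unnormalized score (a logit): positive values lean "real", negative values lean "fake". There is no sigmoid layer; the loss below uses `from_logits=True` to apply it internally.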

### Loss and Optimizers

Binary crossentropy for both. Generator fools discriminator (wants "real" label for fakes).

```python
cross_entropy = keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = keras.optimizers.Adam(1e-4)
discriminator_optimizer = keras.optimizers.Adam(1e-4)
```
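Because the discriminator outputs logits, `from_logits=True` makes the loss apply the sigmoid internally in a numerically stable way. The NumPy sketch below mirrors the two loss functions with the standard stable formula for sigmoid cross-entropy on logits; its functions carry an `np_` prefix (a naming choice of this sketch) so they do not overwrite the TensorFlow versions used by the training loop. It also gives a useful reference point: when the discriminator cannot tell real from fake (all logits 0), its loss is 2·ln 2 ≈ 1.386 and the generator's is ln 2 ≈ 0.693.

```python
import numpy as np

def np_bce_from_logits(labels, logits):
    # Stable sigmoid cross-entropy on logits x with labels z:
    #   max(x, 0) - x * z + log(1 + exp(-|x|)), averaged over the batch
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.mean(np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x))))

def np_discriminator_loss(real_logits, fake_logits):
    # Real images should be labeled 1, generated images 0
    return (np_bce_from_logits(np.ones_like(real_logits), real_logits)
            + np_bce_from_logits(np.zeros_like(fake_logits), fake_logits))

def np_generator_loss(fake_logits):
    # The generator wants its fakes labeled "real" (1)
    return np_bce_from_logits(np.ones_like(fake_logits), fake_logits)

undecided = np.zeros(8)  # logit 0 (probability 0.5) for every image
print(f"{np_discriminator_loss(undecided, undecided):.3f}")  # → 1.386
print(f"{np_generator_loss(undecided):.3f}")                  # → 0.693

# A confident, correct discriminator: large positive logits on real, large negative on fake
print(f"{np_discriminator_loss(np.full(8, 5.0), np.full(8, -5.0)):.4f}")  # small loss
```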

### Training Loop

Keras has no built-in GAN trainer, so we write a custom training loop with `tf.GradientTape`.

```python
from IPython import display  # For clear_output() between epochs in a notebook

EPOCHS = 50
noise_dim = 100
num_examples_to_generate = 16

# Fixed noise so we can watch the same 16 samples improve over time
seed = tf.random.normal([num_examples_to_generate, noise_dim])

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    return gen_loss, disc_loss

def generate_and_save_images(model, epoch, test_input):
    # training=False so BatchNormalization uses its moving statistics
    predictions = model(test_input, training=False)
    plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        # Map tanh output from [-1, 1] back to [0, 255] for display
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')
    plt.savefig(f'image_at_epoch_{epoch:04d}.png')
    plt.show()

def train(dataset, epochs):
    for epoch in range(epochs):
        for image_batch in dataset:
            gen_loss, disc_loss = train_step(image_batch)
        # Replace the previous epoch's figure with samples from the fixed seed
        display.clear_output(wait=True)
        generate_and_save_images(generator, epoch + 1, seed)
        print(f"Epoch {epoch + 1}: gen_loss={float(gen_loss):.4f}, "
              f"disc_loss={float(disc_loss):.4f} (last batch)")

# Run training (this takes time!)
# train(train_dataset, EPOCHS)
```

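**Note:** Uncomment the last line to train. Digits typically become recognizable within a few dozen epochs. Watch the printed losses: neither should collapse toward zero while the other keeps growing, which would mean one network is overpowering the other.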

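**Tips:** If training is unstable, try lower learning rates or adding a little noise to the discriminator's inputs. For a harder grayscale dataset, try Fashion-MNIST; for color images (e.g., CIFAR-10 or CelebA), the generator needs 3 output channels and the discriminator a 3-channel input.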

## 14. Transformers

Transformers reshaped NLP (and later computer vision) by replacing recurrence with **attention**: they process all positions of a sequence in parallel and use self-attention to focus on the relevant parts.

Key Idea: **Attention** computes how much each token "attends" to every other token. The original Transformer stacks encoder and decoder blocks; a classifier like the one below needs only an encoder block.
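The core computation is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The NumPy sketch below implements a single head for one sequence; `MultiHeadAttention` runs several such heads in parallel on learned projections of its inputs and combines the results. The random projection matrices here are hypothetical stand-ins for those learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k: (seq_len, d_k), v: (seq_len, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (seq_len, seq_len): token i vs. token j
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

seq_len, d_model = 5, 8
tokens = rng.normal(size=(seq_len, d_model))  # stand-in for 5 embedded tokens
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))  # hypothetical projections

output, attn = scaled_dot_product_attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(output.shape, attn.shape)  # → (5, 8) (5, 5)
print(attn.sum(axis=1))          # each row sums to 1
```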

In Keras, use `MultiHeadAttention` for custom models. Example: Simple text classifier on IMDB (sentiment).

### Prepare Data

```python
# Reload IMDB
max_features = 10000  # Vocab size
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features)
sequence_length = 250

# Pad/truncate
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=sequence_length)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=sequence_length)

# IMDB word indices are offset by 3: 0 = padding, 1 = start, 2 = out-of-vocabulary
vocab = keras.datasets.imdb.get_word_index()
reverse_vocab = {value + 3: key for key, value in vocab.items()}
reverse_vocab[0] = '[PAD]'
reverse_vocab[1] = '[START]'
reverse_vocab[2] = '[UNK]'

def decode_sequence(input_seq):
    return ' '.join([reverse_vocab.get(i, '?') for i in input_seq])

print(decode_sequence(x_train[0]))  # Sample review
```

### Build Transformer Model

Embeddings + positional encoding + attention layers.

```python
embed_dim = 32  # Embedding size
num_heads = 2   # Attention heads
ff_dim = 32     # Feed-forward dim
num_classes = 2 # Binary

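# Token + position embeddings in one custom layer, so that both embedding
# tables are trainable weights tracked by the model
class TokenAndPositionEmbedding(keras.layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_emb = keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        # Learned positional encoding: position i gets its own trainable vector
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.token_emb(x) + self.pos_emb(positions)

inputs = keras.Input(shape=(sequence_length,), dtype='int32')
x = TokenAndPositionEmbedding(sequence_length, max_features, embed_dim)(inputs)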

# Transformer block
attention_output = keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(x, x)
attention_output = keras.layers.Dropout(0.1)(attention_output)
out1 = keras.layers.LayerNormalization(epsilon=1e-6)(x + attention_output)

ff_out = keras.layers.Dense(ff_dim, activation='relu')(out1)
ff_out = keras.layers.Dense(embed_dim)(ff_out)
ff_out = keras.layers.Dropout(0.1)(ff_out)
out2 = keras.layers.LayerNormalization(epsilon=1e-6)(out1 + ff_out)

# Global pooling and classify
pooled = keras.layers.GlobalAveragePooling1D()(out2)
outputs = keras.layers.Dense(num_classes, activation='softmax')(pooled)

model = keras.Model(inputs, outputs)
model.summary()
```

### Train and Evaluate

```python
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.2)

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Transformer Test accuracy: {test_acc:.4f}")
```

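- Does it beat the LSTM from Section 8? That depends on hyperparameters and training budget, so compare them yourself. A clear practical advantage is that attention processes all positions in parallel, which can make training much faster on GPUs than a step-by-step RNN.
- For GPT-style text generation, use decoder-style blocks with a causal mask (pass `use_causal_mask=True` when calling `MultiHeadAttention`) so each token attends only to earlier tokens.

**Advanced:** The KerasNLP library (now KerasHub) provides ready-made `TransformerEncoder` and `TransformerDecoder` layers for easier stacking.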



## 16. Diffusion Models

Diffusion models are a newer class of generative models behind systems such as DALL-E 2 and Stable Diffusion. They work by gradually adding noise to data (forward diffusion) and learning to reverse that process (denoising) to generate new samples. They are generally more stable to train than GANs, but sampling is slower because it takes many denoising steps.

Why learn them?
- Excellent for high-quality image/text-to-image generation.
- Probabilistic: Generate diverse outputs.
- Beginner tip: Start with simple unconditional generation on MNIST.

We'll implement a basic Denoising Diffusion Probabilistic Model (DDPM) using Keras. This is simplified; real implementations use much larger U-Nets.

### Prepare Data and Setup

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers

# Load MNIST
(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_train = np.expand_dims(x_train, -1)  # Add channel dim: (60000, 28, 28, 1)

# Hyperparameters
timesteps = 1000  # Diffusion steps
batch_size = 64
img_shape = x_train.shape[1:]  # (28, 28, 1)
```

### Diffusion Process

Forward: Add Gaussian noise over T steps. We predict noise at each step.
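In the standard DDPM notation, with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ (the `alphas` and `alphas_cumprod` arrays in the code), the forward process has a closed form that lets training jump directly to any step $t$; the model $\epsilon_\theta$ is trained to predict the added noise with an MSE loss; and sampling applies the reverse update, here with $\sigma_t = \sqrt{\beta_t}$ as in this section's sampling code and $z = 0$ on the final step:

$$
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
\mathcal{L} = \mathbb{E}_{x_0,\,t,\,\epsilon}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2\right]
$$

$$
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)
$$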

Define beta schedule (noise variance).

```python
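def linear_beta_schedule(timesteps):
    # float32 to match the image tensors
    return np.linspace(1e-4, 0.02, timesteps, dtype='float32')

betas = linear_beta_schedule(timesteps)
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas, axis=0)

# Precomputed square roots used by the closed-form forward process
sqrt_alphas_cumprod = tf.constant(np.sqrt(alphas_cumprod))
sqrt_one_minus_alphas_cumprod = tf.constant(np.sqrt(1.0 - alphas_cumprod))

# Helper to add noise: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
def add_noise(x_start, t, noise):
    # One coefficient per image, reshaped to (batch, 1, 1, 1) to broadcast over pixels
    a = tf.reshape(tf.gather(sqrt_alphas_cumprod, t), (-1, 1, 1, 1))
    b = tf.reshape(tf.gather(sqrt_one_minus_alphas_cumprod, t), (-1, 1, 1, 1))
    return a * x_start + b * noise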

# Sample random timesteps and noise
def sample_timesteps(batch_size):
    return tf.random.uniform((batch_size,), 0, timesteps, dtype=tf.int32)

def sample_noise(batch_size, img_shape):
    return tf.random.normal((batch_size,) + img_shape)
```
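As a quick NumPy sanity check of the linear schedule (recomputed under separate names so it does not overwrite the notebook's variables), the weight on the clean image, the square root of alpha-bar, falls from almost 1 at the first step to nearly 0 at step 1000. That is why sampling can start from pure Gaussian noise.

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)   # same linear schedule as linear_beta_schedule
alpha_bar = np.cumprod(1.0 - beta)  # alpha-bar_t for t = 1..T

signal_w = np.sqrt(alpha_bar)        # weight on the clean image x_0
noise_w = np.sqrt(1.0 - alpha_bar)   # weight on the Gaussian noise

for t in [1, 100, 500, 1000]:
    print(f"t={t:4d}  signal={signal_w[t - 1]:.4f}  noise={noise_w[t - 1]:.4f}")

# signal^2 + noise^2 = 1 at every step, so x_t keeps unit variance when x_0 does
print(np.allclose(signal_w**2 + noise_w**2, 1.0))  # → True
```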

### Build the Denoising Model (U-Net like, Simple)

A small CNN that predicts noise given noisy image and timestep.

```python
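def build_model(img_shape, timesteps):
    inputs = layers.Input(shape=img_shape)            # Noisy image x_t
    t_input = layers.Input(shape=(), dtype=tf.int32)  # Timestep t (one per image)

    # Embed the timestep, then reshape to (1, 1, 32) so it can be added to
    # every pixel of a 32-channel feature map (Concatenate cannot broadcast)
    t_emb = layers.Embedding(timesteps, 32)(t_input)
    t_emb = layers.Dense(32, activation='relu')(t_emb)
    t_emb = layers.Reshape((1, 1, 32))(t_emb)

    # Level 1 (28x28): image features plus time information
    h1 = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    h1 = h1 + t_emb  # broadcast add over height and width

    # Downsample to 14x14
    h2 = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(h1)

    # Upsample back to 28x28, with a U-Net-style skip connection from level 1
    h = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(h2)
    h = layers.Concatenate()([h, h1])

    # Predict the noise (linear output, same shape as the image)
    output = layers.Conv2D(1, 3, padding='same')(h)

    model = keras.Model([inputs, t_input], output)
    return model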

model = build_model(img_shape, timesteps)
model.summary()
```

### Training Loop

Train to predict added noise.

```python
optimizer = keras.optimizers.Adam(1e-4)
mse_loss = keras.losses.MeanSquaredError()

@tf.function
def train_step(x):
    noise = sample_noise(batch_size, img_shape)
    t = sample_timesteps(batch_size)
    
    # Add noise
    x_noisy = add_noise(x, t, noise)
    
    with tf.GradientTape() as tape:
        predicted_noise = model([x_noisy, t], training=True)
        loss = mse_loss(noise, predicted_noise)
    
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Training (simplified; run for more epochs for better samples)
epochs = 10
# drop_remainder=True keeps every batch at exactly batch_size, which train_step assumes
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(60000).batch(batch_size, drop_remainder=True)

for epoch in range(epochs):
    total_loss = 0.0
    for batch in train_dataset:
        loss = train_step(batch)
        total_loss += float(loss)
    print(f"Epoch {epoch+1}, Loss: {total_loss / len(train_dataset):.4f}")
```

### Sampling (Generation)

Start from pure noise, iteratively denoise.

```python
def sample(model, num_samples, timesteps):
    # Start with noise
    x = tf.random.normal((num_samples, *img_shape))
    
    for t in reversed(range(timesteps)):
        t_batch = tf.fill([num_samples], t)
        
        # Predict noise
        pred_noise = model([x, t_batch], training=False)
        
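        # Remove noise (simplified DDPM step). float() turns the NumPy schedule
        # values into Python floats, so the arithmetic stays in float32
        alpha_t = float(alphas[t])
        beta_t = float(betas[t])
        alpha_bar_t = float(alphas_cumprod[t])
        if t > 0:
            noise = tf.random.normal(tf.shape(x))
        else:
            noise = tf.zeros(tf.shape(x))  # no fresh noise on the final step

        x = (1 / alpha_t ** 0.5) * (x - (beta_t / (1 - alpha_bar_t) ** 0.5) * pred_noise) + beta_t ** 0.5 * noise

    # Clip to the training data's pixel range [0, 1] (a sigmoid would distort values)
    x = tf.clip_by_value(x, 0.0, 1.0)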
    return x

# Generate samples
generated = sample(model, 16, timesteps)

# Plot
plt.figure(figsize=(4, 4))
for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.imshow(generated[i].squeeze(), cmap='gray')
    plt.axis('off')
plt.show()
```
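For reference, the update inside the sampling loop is the DDPM reverse step with variance $\sigma_t^2 = \beta_t$. This assumes the schedule arrays defined earlier follow the standard definitions: $\alpha_t = 1 - \beta_t$ (`alphas[t]`) and $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ (`alphas_cumprod[t]`). Here $\epsilon_\theta$ is the model's noise prediction (`pred_noise`):

$$
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sqrt{\beta_t}\,z,
\qquad z \sim \mathcal{N}(0, I) \text{ for } t > 0,\quad z = 0 \text{ for } t = 0
$$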

**Note:** This is a toy example—train longer for better results. For text-to-image, add conditioning (e.g., CLIP embeddings).

## 17. Model Deployment

Deployment means taking your trained model from the notebook to production: saving it, serving predictions through an API, or running it on mobile devices. Keras handles saving and loading. TensorFlow Serving and TensorFlow Lite cover server-side and on-device inference.

Why deploy?
- Share models (e.g., via web app).
- Real-time inference (e.g., image classifier API).

We'll cover saving, loading, and a simple Flask API for serving.

### Saving and Loading Models

```python
# Assume you have a trained model (e.g., from earlier MNIST CNN)
# Quick recap: Simple CNN
cnn_model = keras.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])
cnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train briefly (use reshaped data)
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
cnn_model.fit(x_train, y_train, epochs=2, batch_size=128)

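# Save the entire model in the legacy HDF5 format (still loadable; Keras 3 recommends '.keras')
cnn_model.save('mnist_model.h5')

# Export a SavedModel for TF Serving / TFLite (Keras 3 replaces save_format='tf' with export())
cnn_model.export('mnist_savedmodel')
```

Load and use:

```python
# Load HDF5 as a full Keras model
loaded_model_h5 = keras.models.load_model('mnist_model.h5')

# An exported SavedModel is an inference-only artifact: load it with TensorFlow
# and call its default 'serve' endpoint
loaded_model_sm = tf.saved_model.load('mnist_savedmodel')
sm_pred = loaded_model_sm.serve(x_train[0:1])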

# Predict
sample_img = x_train[0:1]
pred = loaded_model_h5.predict(sample_img)
print(f"Predicted class: {np.argmax(pred)}")
```

### Deploy with Flask API

Install Flask: `pip install flask` (outside notebook). Create a simple server.

Save this as `app.py`:

```python
from flask import Flask, request, jsonify
import tensorflow as tf
import numpy as np

app = Flask(__name__)
model = tf.keras.models.load_model('mnist_model.h5')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['image']  # Expect a flat list of 784 raw pixel values (0-255)
    img = np.array(data, dtype='float32').reshape(1, 28, 28, 1) / 255.0
    pred = model.predict(img, verbose=0)
    return jsonify({'prediction': int(np.argmax(pred))})

if __name__ == '__main__':
    app.run(debug=True)  # Flask's development server; not meant for production traffic
```

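Test with curl or Postman. Send a JSON body `{"image": [0, 0, 0, ...]}` to `http://localhost:5000/predict`. It should contain 784 raw pixel values (0 to 255, the same scale as `keras.datasets.mnist`), because the server divides by 255.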
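You can also call the endpoint from Python. Below is a minimal client sketch using only the standard library plus NumPy. It assumes the Flask server above is running locally on Flask's default port 5000. `build_payload` and `post_prediction` are hypothetical helper names.

```python
import json
import urllib.request

import numpy as np

def build_payload(image):
    """Flatten a 28x28 image of raw 0-255 pixel values into the JSON body /predict expects."""
    image = np.asarray(image)
    if image.size != 784:
        raise ValueError(f"expected 784 pixels, got {image.size}")
    return {"image": image.reshape(-1).astype(int).tolist()}

def post_prediction(image, url="http://localhost:5000/predict"):
    """POST one image to the Flask server and return the predicted class."""
    body = json.dumps(build_payload(image)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prediction"]

# Usage (requires the server to be running):
# (x_raw, _), _ = keras.datasets.mnist.load_data()
# print(post_prediction(x_raw[0]))
```

The payload sends unnormalized pixels on purpose. Scaling happens in one place only, on the server, so clients cannot accidentally normalize twice.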

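For production: Use TensorFlow Serving (`docker run -p 8501:8501 --mount type=bind,source=/path/to/model,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving`). TF Serving expects `/path/to/model` to contain numbered version subdirectories, e.g., `/path/to/model/1/` holding the SavedModel. Predictions are then served over REST at `http://localhost:8501/v1/models/mnist:predict`.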

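**Mobile/Web:** Convert to TensorFlow Lite for on-device apps. First create the converter with `converter = tf.lite.TFLiteConverter.from_keras_model(cnn_model)`, then call `tflite_model = converter.convert()`. Finally, write the bytes to disk with `with open('model.tflite', 'wb') as f: f.write(tflite_model)`.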

## 19. Reinforcement Learning (RL) with Keras

Reinforcement Learning is a paradigm in which an agent learns by interacting with an environment and receives rewards or penalties for its actions. Supervised learning learns from labeled examples. RL instead learns by trial and error, with the goal of maximizing cumulative reward over time. Deep RL uses neural networks (built here with Keras) to handle large state and action spaces. It has powered systems such as AlphaGo and robotic controllers.

Why RL?
- Handles sequential decisions (e.g., playing CartPole).
- Key concepts: State (observation), Action, Reward, Policy (what to do).
- Beginner start: Use OpenAI Gym (now Gymnasium) environments.

We'll build a simple Deep Q-Network (DQN) for CartPole, where the agent must balance a pole on a cart by pushing the cart left or right. Keras provides the Q-network, which predicts a value for each action.

### Setup and Environment

Install Gymnasium with `pip install gymnasium` (outside the notebook). CartPole is one of the built-in classic-control environments, so the `box2d` extra is not needed.

```python
import gymnasium as gym
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Create environment
env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]  # 4 (cart pos, vel, pole angle, ang vel)
action_size = env.action_space.n  # 2 (left/right)

print(f"State size: {state_size}, Action size: {action_size}")
```

### Replay Buffer (Memory)

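Store transitions (state, action, reward, next_state, done) and train on random mini-batches drawn from them. Random sampling breaks the correlation between consecutive steps. Reusing old transitions is valid because Q-learning is off-policy.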

```python
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = []
        self.capacity = capacity
        self.position = 0
    
    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)
        self.buffer[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity
    
    def sample(self, batch_size):
        batch = np.random.choice(len(self.buffer), batch_size, replace=False)
        states, actions, rewards, next_states, dones = zip(*[self.buffer[idx] for idx in batch])
        return np.array(states), np.array(actions), np.array(rewards), np.array(next_states), np.array(dones)
    
    def __len__(self):
        return len(self.buffer)

replay_buffer = ReplayBuffer()
```

### DQN Model

The network approximates Q-values. Q(s, a) is the expected discounted future reward from taking action a in state s and following the learned policy afterwards.

```python
def build_model(state_size, action_size):
    model = keras.Sequential([
        layers.Dense(24, input_dim=state_size, activation='relu'),
        layers.Dense(24, activation='relu'),
        layers.Dense(action_size, activation='linear')  # Q-values for each action
    ])
    model.compile(loss='mse', optimizer=keras.optimizers.Adam(learning_rate=0.001))
    return model

model = build_model(state_size, action_size)
target_model = build_model(state_size, action_size)  # Fixed target for stability
target_model.set_weights(model.get_weights())
```
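For reference, `train_step` regresses the online network (weights $\theta$) toward the one-step Bellman target. That target is computed with the frozen target network (weights $\theta^-$):

$$
y = \begin{cases} r & \text{if } s' \text{ is terminal} \\ r + \gamma \max_{a'} Q(s', a'; \theta^-) & \text{otherwise} \end{cases}
\qquad
\mathcal{L}(\theta) = \big(y - Q(s, a; \theta)\big)^2
$$

Only the entry for the action actually taken is overwritten in `targets`. The other actions keep the model's own predictions, so they contribute zero error.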

### Training Loop

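Exploration uses an epsilon-greedy policy. With probability epsilon the agent takes a random action; otherwise it takes the action with the highest predicted Q-value. Epsilon decays after every episode.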

```python
def epsilon_greedy_action(state, epsilon):
    if np.random.rand() <= epsilon:
        return env.action_space.sample()  # Explore: random action
    # Exploit: a direct model call is much faster than predict() for a single state
    q_values = model(state[np.newaxis], training=False).numpy()
    return int(np.argmax(q_values[0]))

def train_step(batch_size=32, gamma=0.95):  # gamma = discount factor
    if len(replay_buffer) < batch_size:
        return
    
    states, actions, rewards, next_states, dones = replay_buffer.sample(batch_size)
    
    # Q-targets: r if the episode terminated, else r + gamma * max_a' Q_target(next_s, a')
    targets = model.predict(states, verbose=0)
    target_q_values = target_model.predict(next_states, verbose=0)
    for i in range(batch_size):
        if dones[i]:
            targets[i][actions[i]] = rewards[i]
        else:
            targets[i][actions[i]] = rewards[i] + gamma * np.max(target_q_values[i])
    
    model.fit(states, targets, epochs=1, verbose=0)

# Main training
episodes = 500
epsilon = 1.0
epsilon_min = 0.01
epsilon_decay = 0.995
rewards_history = []

for episode in range(episodes):
    state, _ = env.reset()
    total_reward = 0
    done = False
    
    while not done:
        action = epsilon_greedy_action(state, epsilon)
        # Gymnasium returns separate flags: terminated (pole fell) and truncated (500-step limit)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Store `terminated`, not `done`: a time-limit cutoff is not a true terminal state
        replay_buffer.push(state, action, reward, next_state, terminated)
        state = next_state
        total_reward += reward
        train_step()
    
    rewards_history.append(total_reward)
    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    
    # Sync the target network every 10 episodes
    if episode % 10 == 0:
        target_model.set_weights(model.get_weights())
    
    if episode % 50 == 0:
        print(f"Episode {episode}, Epsilon: {epsilon:.2f}, Reward: {total_reward}")
```
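It is worth checking how far epsilon actually decays with these settings. This quick NumPy sketch reuses `epsilon=1.0`, `epsilon_decay=0.995`, and `epsilon_min=0.01` from above; `epsilon_schedule` is a hypothetical helper. After 500 per-episode decays, epsilon is still about 0.08, so at the end of training the agent still takes a random action on roughly 8% of steps. Reaching 0.01 would take about 919 episodes.

```python
import numpy as np

def epsilon_schedule(episodes, start=1.0, decay=0.995, minimum=0.01):
    """Epsilon at each episode index when it is multiplied by `decay` after every episode."""
    return np.maximum(minimum, start * decay ** np.arange(episodes))

eps = epsilon_schedule(501)
print(f"Epsilon after 500 decays: {eps[500]:.4f}")

# Solve start * decay**n = minimum for n
episodes_to_min = int(np.ceil(np.log(0.01 / 1.0) / np.log(0.995)))
print(f"Episodes until epsilon reaches 0.01: {episodes_to_min}")
```

If you want a mostly greedy agent within 500 episodes, use a faster decay such as 0.99.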

### Evaluation and Plot

Test without exploration.

```python
# Test episode
state, _ = env.reset()
done = False
test_rewards = []
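while not done:
    q_values = model.predict(state[np.newaxis], verbose=0)
    action = np.argmax(q_values[0])
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated  # Stop when the pole falls or the 500-step limit is hit
    test_rewards.append(reward)

print(f"Test episode reward: {sum(test_rewards)}")
env.close()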

# Plot learning curve
plt.plot(rewards_history)
plt.title('Training Rewards')
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.show()
```

**Insights:** CartPole-v1's reward threshold is 475, and episodes are capped at 500 steps. If training is unstable, tune gamma, the learning rate, or the buffer size. Next steps: Double DQN, or Atari games.
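Double DQN, suggested above as a next step, changes only how the target is computed. The online network picks the best next action and the target network scores it. This reduces the overestimation that comes from taking a max over noisy estimates. Below is a vectorized NumPy sketch of both targets. It assumes Q-value arrays shaped `(batch, n_actions)`, like the outputs of `model.predict` and `target_model.predict`. `dqn_targets` and `double_dqn_targets` are hypothetical helper names.

```python
import numpy as np

def dqn_targets(rewards, dones, next_q_target, gamma=0.95):
    """Standard DQN: y = r + gamma * max_a' Q_target(s', a'), or just r at terminal states."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.95):
    """Double DQN: choose a' with the online net, evaluate it with the target net."""
    best_actions = next_q_online.argmax(axis=1)
    chosen = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * chosen

rewards = np.array([1.0, 1.0])
dones = np.array([0.0, 1.0])                       # second transition is terminal
next_q_online = np.array([[2.0, 5.0], [1.0, 0.0]])
next_q_target = np.array([[4.0, 3.0], [9.0, 9.0]])
print(dqn_targets(rewards, dones, next_q_target))
print(double_dqn_targets(rewards, dones, next_q_online, next_q_target))
```

To use this in `train_step`, also compute `next_q_online = model.predict(next_states, verbose=0)`. Then assign `targets[np.arange(batch_size), actions] = double_dqn_targets(rewards, dones.astype(float), next_q_online, target_q_values)` in place of the Python loop.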

## 21. Federated Learning

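Federated Learning (FL) trains models across decentralized devices (e.g., phones) without collecting raw data. Each device trains locally and sends only model updates to a central server, which aggregates them. Because raw data never leaves the device, FL reduces privacy risk. Updates can still leak information, however, unless they are combined with protections such as secure aggregation or differential privacy. In the TensorFlow ecosystem, TensorFlow Federated (TFF) simulates FL with Keras models.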

Why FL?
- Privacy: Data stays local (e.g., keyboard predictions on your phone).
- Efficiency: Reduces bandwidth (no full datasets uploaded).
- Beginner note: Simulate with TFF on a single machine—real FL needs distributed setup.

We'll simulate FL on MNIST: "Clients" (simulated users) train locally, server aggregates.

### Setup

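Install TFF with `pip install tensorflow-federated` (outside the notebook). TFF pins a compatible TensorFlow version, so installing it in a fresh virtual environment avoids dependency conflicts. TFF wraps ordinary Keras models for federated training.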

```python
import tensorflow as tf
import tensorflow_federated as tff
import numpy as np
from tensorflow import keras

# TFF runs computations in a local simulation runtime by default; no executor setup is needed

print(f"TensorFlow version: {tf.__version__}")
print(f"TFF version: {tff.__version__}")
```

### Prepare Federated Data

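TFF ships a federated version of EMNIST that is already partitioned by writer, so each client's data has its own (non-IID) distribution. We use the digits-only variant and simulate 10 clients.

```python
# Load federated EMNIST (digits only); each client corresponds to one writer
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(only_digits=True)

NUM_CLIENTS = 10
BATCH_SIZE = 20

def preprocess(dataset):
    def element_fn(element):
        # Pixels are already float32 in [0, 1]; flatten 28x28 -> 784 to match the Dense model
        return (tf.reshape(element['pixels'], [-1]), element['label'])
    return dataset.map(element_fn).batch(BATCH_SIZE)

def make_federated_data(client_data, client_ids):
    # One preprocessed tf.data.Dataset per simulated client
    return [preprocess(client_data.create_tf_dataset_for_client(cid)) for cid in client_ids]

train_federated = make_federated_data(emnist_train, emnist_train.client_ids[:NUM_CLIENTS])
test_federated = make_federated_data(emnist_test, emnist_test.client_ids[:NUM_CLIENTS])

# Inspect one batch from the first client
sample_client_data = next(iter(train_federated[0]))
print(f"Sample client batch shape: {sample_client_data[0].shape}")
```

### Define the Model

A simple Keras model, wrapped for TFF.

```python
def create_keras_model():
    return keras.Sequential([
        keras.layers.Dense(10, activation='relu', input_shape=(784,)),
        keras.layers.Dense(10, activation='softmax')
    ])

# TFF calls model_fn whenever it needs a model, so it must build a fresh Keras model each time
def model_fn():
    keras_model = create_keras_model()
    return tff.learning.models.from_keras_model(
        keras_model,
        input_spec=train_federated[0].element_spec,  # (features, labels) TensorSpecs
        loss=keras.losses.SparseCategoricalCrossentropy(),
        metrics=[keras.metrics.SparseCategoricalAccuracy()]
    )
```

### Federated Averaging (FedAvg)

In each round, the clients train locally. The server then averages their model updates, weighting each client by its number of examples.

```python
# Build FedAvg: client SGD for local training, server SGD to apply the averaged update
iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn,
    client_optimizer_fn=tff.learning.optimizers.build_sgdm(learning_rate=0.02),
    server_optimizer_fn=tff.learning.optimizers.build_sgdm(learning_rate=1.0)
)

state = iterative_process.initialize()
NUM_ROUNDS = 10

for round_num in range(1, NUM_ROUNDS + 1):
    result = iterative_process.next(state, train_federated)
    state = result.state
    print(f'Round {round_num}, Metrics: {result.metrics}')
```

### Evaluation

Evaluate the final global model on the clients' test data. TFF aggregates the metrics across clients.

```python
# Build a federated evaluation process and load the trained global weights into it
evaluation = tff.learning.algorithms.build_fed_eval(model_fn)
eval_state = evaluation.initialize()
eval_state = evaluation.set_model_weights(eval_state, iterative_process.get_model_weights(state))
eval_output = evaluation.next(eval_state, test_federated)
print(f"Federated Test Metrics: {eval_output.metrics}")
```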

**Key Differences from Central Training:**
- No data centralization.
- Heterogeneity: Clients may have different data distributions (non-IID).

**Tips:** For real devices, use TFLite for on-device training. Handle stragglers with async FL.
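At its core, the server step of FedAvg is a weighted average in which each client counts in proportion to its number of training examples. TFF performs this aggregation for you. The NumPy sketch below uses a hypothetical `fedavg_aggregate` helper to show only that step, applied to plain lists of weight arrays. TFF actually averages the clients' weight updates and then applies a server optimizer. With plain SGD at learning rate 1.0, that amounts to the same weighted average of the weights.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Average each weight tensor across clients, weighted by local dataset size.

    client_weights: list (one per client) of lists of np.ndarrays with matching shapes
    client_sizes:   number of training examples each client used
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return [
        sum(c * w[i] for c, w in zip(coeffs, client_weights))
        for i in range(len(client_weights[0]))
    ]

# Two clients, each holding a (2x2 kernel, 2-vector bias) model
client_a = [np.ones((2, 2)), np.zeros(2)]
client_b = [3 * np.ones((2, 2)), np.ones(2)]
global_weights = fedavg_aggregate([client_a, client_b], client_sizes=[100, 300])
print(global_weights[0])  # kernel: 0.25 * 1 + 0.75 * 3 = 2.5 everywhere
print(global_weights[1])  # bias:   0.25 * 0 + 0.75 * 1 = 0.75
```

Weighting by dataset size keeps a client with a handful of examples from pulling the global model as hard as a client with thousands.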

## 22. Explainable AI (XAI)

Explainable AI (XAI) makes "black-box" models interpretable by answering questions like "why did the model predict that?" This is crucial for trust in healthcare and finance. Keras models work with post-hoc explanation libraries such as SHAP and LIME.

Why XAI?
- Debug models (e.g., bias detection).
- Regulatory: "Right to explanation" laws.
- Beginner: Focus on SHAP (SHapley Additive exPlanations), which uses Shapley values from cooperative game theory to assign each feature a share of the prediction.
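A feature's Shapley value is its average marginal contribution over all orders in which features can be "switched on" from a baseline. For a handful of features it can be computed exactly by brute force, while SHAP's explainers approximate it for models with hundreds of inputs. The sketch below assumes that absent features take a baseline value, which is one common convention. `shapley_values` and the toy model are hypothetical.

```python
import itertools
import math

import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, where absent features take their baseline value."""
    n = len(x)
    x = np.asarray(x, dtype=float)
    phi = np.zeros(n)

    def value(subset):
        z = np.array(baseline, dtype=float)
        z[list(subset)] = x[list(subset)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def toy_model(z):
    # Linear terms plus an interaction between features 0 and 1
    return 2 * z[0] + z[1] + z[0] * z[1] - z[2]

x, baseline = np.array([1.0, 2.0, 3.0]), np.zeros(3)
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(phi.sum(), toy_model(x) - toy_model(baseline))  # additivity: contributions sum to the gap
```

The interaction term's contribution is split evenly between features 0 and 1. The contributions always add up to the difference between the prediction and the baseline prediction, which is the "additive" in SHAP.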

Example: Explain MNIST predictions with SHAP.

### Setup SHAP

Install: `pip install shap`.

```python
import shap
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt

# Quick MNIST model (from earlier)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255.0
x_test = x_test.reshape(10000, 784).astype('float32') / 255.0

model_xai = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax')
])
model_xai.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_xai.fit(x_train, y_train, epochs=2, verbose=0)  # Quick train
```

### SHAP Explanations

Use GradientExplainer for deep nets.

```python
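# Background data (random subset of training images, for speed)
background = x_train[np.random.choice(x_train.shape[0], 100, replace=False)]

# Explainer
explainer = shap.GradientExplainer(model_xai, background)

# Explain test samples
test_samples = x_test[:50]
shap_values = explainer.shap_values(test_samples)

# Depending on the SHAP version, this is a list with one (50, 784) array per class
# or a single (50, 784, 10) array; normalize to the array form
if isinstance(shap_values, list):
    shap_values = np.stack(shap_values, axis=-1)

# Plot one sample: one 28x28 attribution map per class
sample_idx = 0
per_class_maps = [shap_values[sample_idx:sample_idx + 1, :, c].reshape(1, 28, 28)
                  for c in range(shap_values.shape[-1])]
shap.image_plot(per_class_maps, test_samples[sample_idx:sample_idx + 1].reshape(1, 28, 28), show=False)
plt.show()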
```

- Red: pixels that push the prediction toward that class.
- Blue: pixels that push it away.
- For MNIST: Highlights digit shapes.

### LIME Alternative (Local Interpretable Model-agnostic Explanations)

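LIME perturbs the input around one example. It then fits a simple, interpretable model (such as a sparse linear model) to the black-box predictions in that neighborhood. Install with `pip install lime`.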

```python
import lime.lime_tabular as lime_tab

# LIME for tabular data: each of the 784 pixels is treated as one feature.
# A 1,000-image subset keeps the explainer's feature statistics cheap to compute.
explainer_lime = lime_tab.LimeTabularExplainer(
    training_data=x_train[:1000],
    feature_names=[f'pixel_{i}' for i in range(784)],
    mode='classification',
    discretize_continuous=True
)

# Explain the predicted class (top_labels=1) rather than LIME's default label, class 1
exp = explainer_lime.explain_instance(
    x_test[0], lambda x: model_xai.predict(x, verbose=0), num_features=10, top_labels=1
)
exp.show_in_notebook(show_table=True)
```
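What LIME does internally can be sketched in a few lines of NumPy. Perturb the instance, query the black-box model, and weight each perturbed sample by how close it is to the original. Then fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified illustration, not the `lime` library's exact algorithm, which also discretizes features and selects a sparse subset. `local_surrogate`, the Gaussian perturbation scale, and the kernel width are assumptions.

```python
import numpy as np

def local_surrogate(predict_fn, x, num_samples=2000, scale=0.1, kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear model to predict_fn around x; return (coefs, intercept)."""
    rng = np.random.default_rng(seed)
    X = x + scale * rng.standard_normal((num_samples, x.size))  # perturbations around x
    y = predict_fn(X)                                           # black-box outputs
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)                # closer samples count more
    A = np.hstack([X, np.ones((num_samples, 1))])               # intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1], coef[-1]

def black_box(X):
    # A smooth nonlinear function of 3 features
    return np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]

x0 = np.array([0.0, 1.0, 2.0])
coefs, intercept = local_surrogate(black_box, x0)
print(np.round(coefs, 2))  # close to the local gradient [1, 1, -1]
```

The surrogate is only valid near `x0`. A different instance, or a wider perturbation scale, can give very different coefficients.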

**Global Insights:** Use SHAP summary plots for overall feature importances.

**Pro Tip:** Some architectures expose built-in signals, such as attention weights in Transformers. Attention weights are not always faithful explanations of a prediction, though.

## 24. Graph Neural Networks (GNNs)

Graph Neural Networks extend neural networks to graph-structured data, like social networks (users as nodes, friendships as edges) or molecules (atoms as nodes, bonds as edges). Traditional NNs work on grids (images) or sequences (text), but GNNs handle irregular connections by "message passing": Nodes aggregate info from neighbors.

Why GNNs?
- Powerful for relational data (recommendations, traffic prediction).
- Beginner-friendly: Start with Graph Convolutional Networks (GCNs), like CNNs but for graphs.
- In Keras: Use `keras` with adjacency matrices, or libraries like Spektral (install: `pip install spektral`).

We'll build a simple GCN for node classification in the style of the Cora benchmark: papers as nodes, citations as edges, and one of 7 topics as each paper's class. The code uses a small mock graph so that it runs without downloads.

### Prepare Graph Data

Cora: 2,708 nodes (papers), 5,429 edges, 1,433 words as features.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import scipy.sparse as sp
import matplotlib.pyplot as plt

# Load Cora-like data. For the real dataset, use Spektral's citation loader (see the note below);
# here we build a small mock graph so the example runs without downloads.
def load_cora():
    # Real Cora: 1,433-dim bag-of-words features, 7 classes, sparse adjacency matrix.
    # Mock graph: 10 nodes, 5 random features each, random labels.
    n_nodes = 10
    features = np.random.rand(n_nodes, 5)  # Mock features
    adj = np.array([[0,1,0,0,1,0,0,0,0,0],
                    [1,0,1,0,0,0,0,0,0,0],
                    [0,1,0,1,0,0,0,0,0,0],
                    [0,0,1,0,1,0,0,0,0,0],
                    [1,0,0,1,0,1,0,0,0,0],
                    [0,0,0,0,1,0,1,0,0,0],
                    [0,0,0,0,0,1,0,1,0,0],
                    [0,0,0,0,0,0,1,0,1,0],
                    [0,0,0,0,0,0,0,1,0,1],
                    [0,0,0,0,0,0,0,0,1,0]])  # Mock adj matrix
    labels = np.random.randint(0, 7, n_nodes)  # Mock labels
    
    # Normalize adj (add self-loops)
    adj = adj + np.eye(n_nodes)  # Self-loops
    deg = np.array(adj.sum(1))
    d_inv_sqrt = np.power(deg, -0.5).flatten()
    d_mat_inv_sqrt = np.diag(d_inv_sqrt)
    adj_norm = d_mat_inv_sqrt @ adj @ d_mat_inv_sqrt
    
    return adj_norm, features, labels

adj_norm, features, labels = load_cora()
print(f"Graph: {adj_norm.shape}, Features: {features.shape}, Labels: {labels.shape}")
```

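**Note:** For the full Cora dataset, Spektral provides a loader: `from spektral.datasets import Citation; dataset = Citation('cora'); graph = dataset[0]`. The graph exposes the adjacency matrix as `graph.a`, node features as `graph.x`, and one-hot labels as `graph.y`. Adjust the code accordingly.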

### Build GCN Layer

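Simple GCN layer: $H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$. The terms are:

- $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ is the normalized adjacency matrix with self-loops, where $\tilde{D}$ is the diagonal degree matrix of $A + I$. This is exactly what `load_cora` computes.
- $H^{(l)}$ holds the node features at layer $l$.
- $W^{(l)}$ is the layer's weight matrix.
- $\sigma$ is the activation.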
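Before writing the Keras layer, it helps to run the propagation rule once by hand. This NumPy sketch builds the normalized adjacency $\hat{A}$ for a 3-node path graph (0-1-2). It then applies one layer with an identity weight matrix and no activation, both simplifications for illustration, so each node's output is a degree-weighted mix of its own and its neighbors' features. `normalize_adjacency` is a hypothetical helper.

```python
import numpy as np

def normalize_adjacency(adj):
    """A_hat = D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCNs."""
    a_tilde = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))   # degrees of A + I
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)              # path graph 0 - 1 - 2
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                            # 2 features per node
W = np.eye(2)                                         # identity weights for readability

a_hat = normalize_adjacency(adj)
H_next = a_hat @ H @ W                                # one propagation step, no activation
print(np.round(a_hat, 3))
print(np.round(H_next, 3))
```

Node 1, the hub with two neighbors, mixes in both endpoints' features. Stacking two layers lets information travel two hops.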

Custom Keras layer:

```python
class GCNSparseLayer(layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
    
    def build(self, input_shape):
        # input_shape is [adjacency_shape, feature_shape]; the kernel maps feature_dim -> units
        feature_dim = input_shape[1][-1]
        self.kernel = self.add_weight(shape=(feature_dim, self.units),
                                      initializer='glorot_uniform', name='kernel')
    
    def call(self, inputs):
        adj_norm, features = inputs
        h = tf.sparse.sparse_dense_matmul(adj_norm, features) @ self.kernel
        return self.activation(h)
```

### Full GCN Model

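Stack two GCN layers. The second layer outputs per-node class probabilities, so node classification needs no graph-level readout.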

```python
def build_gcn_model(input_dim, num_classes, hidden_dim=16):
    adj_input = keras.Input(shape=(None,), sparse=True)  # Adjacency (sparse)
    feat_input = keras.Input(shape=(input_dim,))
    
    h = GCNSparseLayer(hidden_dim, activation='relu')([adj_input, feat_input])
    h = GCNSparseLayer(num_classes, activation='softmax')([adj_input, h])
    
    model = keras.Model(inputs=[adj_input, feat_input], outputs=h)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# For mock data
model_gcn = build_gcn_model(features.shape[1], 7)
model_gcn.summary()
```

### Train the Model

Split nodes: Train on some, test on others.

```python
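# Mock split (20% of nodes for training)
n_train = int(0.2 * features.shape[0])
train_idx = np.random.choice(features.shape[0], n_train, replace=False)
test_idx = np.setdiff1d(np.arange(features.shape[0]), train_idx)

# Sparse adjacency for TF: indices must be an (nnz, 2) int64 array, values float32
rows, cols = np.nonzero(adj_norm)
adj_sparse = tf.SparseTensor(indices=np.stack([rows, cols], axis=1).astype(np.int64),
                             values=adj_norm[rows, cols].astype(np.float32),
                             dense_shape=adj_norm.shape)
features_tf = tf.constant(features, dtype=tf.float32)

# Full-graph training: every step sees the whole graph, but the loss uses only
# the training nodes' labels (semi-supervised node classification)
optimizer = keras.optimizers.Adam(learning_rate=0.01)
loss_fn = keras.losses.SparseCategoricalCrossentropy()
for epoch in range(100):
    with tf.GradientTape() as tape:
        probs = model_gcn([adj_sparse, features_tf], training=True)
        loss = loss_fn(labels[train_idx], tf.gather(probs, train_idx))
    grads = tape.gradient(loss, model_gcn.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_gcn.trainable_variables))

# Evaluate on held-out nodes (mock labels are random, so expect chance-level accuracy)
test_pred = model_gcn([adj_sparse, features_tf], training=False).numpy()
test_acc = np.mean(np.argmax(test_pred[test_idx], axis=1) == labels[test_idx])
print(f"Test accuracy: {test_acc:.4f}")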

### Visualize Graph (Optional)

Use NetworkX for plotting.

```python
import networkx as nx

G = nx.from_numpy_array(adj_norm > 0.01)  # Threshold edges
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color=labels, cmap=plt.cm.Set1)
plt.title('Mock Cora Graph')
plt.show()
```

**Insights:** GNNs propagate information along edges, which is what makes them effective on graph data. To scale up, try larger benchmarks such as the Open Graph Benchmark (OGB).

## 26. Multimodal Learning

Multimodal learning combines multiple data types (e.g., images, text, and audio) in one model, loosely mimicking human perception. The model fuses representations from each modality for richer predictions, as in image captioning or sentiment analysis from video plus speech. In Keras 3, the functional API makes fusion straightforward: concatenate the per-modality embeddings, or attend across them.

Why multimodal?
- Handles real-world data (e.g., social media posts with pics+text).
- Improves accuracy: Text clarifies ambiguous images.
- Beginner tip: Start with simple fusion (concat embeddings).

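Example: Classify IMDB movie reviews as positive or negative using the review text plus mock "poster" images, fusing LSTM text features with CNN image features. The images are random noise whose brightness is tied to the label, which gives the image branch an artificial shortcut. Treat the comparison with a text-only model as an illustration of the mechanics, not as evidence that images help.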

### Prepare Multimodal Data

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# Text: IMDB (binary sentiment)
max_features = 10000
maxlen_text = 200
(x_train_text, y_train), (x_test_text, y_test) = keras.datasets.imdb.load_data(num_words=max_features)

# Pad sequences
x_train_text = keras.preprocessing.sequence.pad_sequences(x_train_text, maxlen=maxlen_text)
x_test_text = keras.preprocessing.sequence.pad_sequences(x_test_text, maxlen=maxlen_text)

# Images: Mock 64x64 grayscale "posters" (random but correlated to sentiment)
img_size = 64
x_train_img = np.random.rand(len(x_train_text), img_size, img_size, 1).astype('float32')
x_test_img = np.random.rand(len(x_test_text), img_size, img_size, 1).astype('float32')

# "Correlate" images to labels (brighter for positive)
x_train_img[y_train == 1] *= 1.5  # Boost positive
x_train_img = np.clip(x_train_img, 0, 1)
x_test_img[y_test == 1] *= 1.5
x_test_img = np.clip(x_test_img, 0, 1)

y_train = keras.utils.to_categorical(y_train, 2)
y_test = keras.utils.to_categorical(y_test, 2)

print(f"Text shape: {x_train_text.shape}, Image shape: {x_train_img.shape}")
```

### Build Multimodal Model

Two branches: LSTM for text, CNN for images. Fuse via concatenation + dense.

```python
text_input = keras.Input(shape=(maxlen_text,), name='text')
text_embed = layers.Embedding(max_features, 64)(text_input)
text_lstm = layers.LSTM(32)(text_embed)

img_input = keras.Input(shape=(img_size, img_size, 1), name='image')
img_conv = layers.Conv2D(32, 3, activation='relu')(img_input)
img_pool = layers.GlobalAveragePooling2D()(img_conv)
img_dense = layers.Dense(32, activation='relu')(img_pool)

# Fusion
fused = layers.Concatenate()([text_lstm, img_dense])
fused_dense = layers.Dense(16, activation='relu')(fused)
output = layers.Dense(2, activation='softmax')(fused_dense)

model_multi = keras.Model(inputs=[text_input, img_input], outputs=output)
model_multi.summary()

model_multi.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

### Train and Evaluate

```python
history = model_multi.fit(
    [x_train_text, x_train_img], y_train,
    epochs=5, batch_size=32, validation_split=0.2
)

# Evaluate
test_loss, test_acc = model_multi.evaluate([x_test_text, x_test_img], y_test)
print(f"Multimodal Test accuracy: {test_acc:.4f}")

# Text-only baseline for comparison
text_model = keras.Sequential([
    keras.Input(shape=(maxlen_text,)),
    layers.Embedding(max_features, 64),
    layers.LSTM(32),
    layers.Dense(2, activation='softmax')
])
text_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
text_model.fit(x_train_text, y_train, epochs=5, batch_size=32, validation_split=0.2)
text_acc = text_model.evaluate(x_test_text, y_test)[1]
print(f"Text-only accuracy: {text_acc:.4f}")
```

### Visualize Fusion Impact

```python
# Predict and plot sample
sample_idx = 0
text_sample = x_test_text[sample_idx:sample_idx+1]
img_sample = x_test_img[sample_idx:sample_idx+1]

pred_multi = model_multi.predict([text_sample, img_sample])
pred_text = text_model.predict(text_sample)

print(f"Multimodal pred: {np.argmax(pred_multi)}, Text-only: {np.argmax(pred_text)}")
print(f"True label: {np.argmax(y_test[sample_idx])}")

plt.figure(figsize=(6, 2))
plt.subplot(1, 3, 1)
plt.imshow(img_sample[0].squeeze(), cmap='gray')
plt.title('Sample Image')

plt.subplot(1, 3, 2)
plt.bar(['Negative', 'Positive'], pred_text[0])
plt.title('Text Prediction')

plt.subplot(1, 3, 3)
plt.bar(['Negative', 'Positive'], pred_multi[0])
plt.title('Multimodal Prediction')
plt.show()
```

**Pro Tip:** For advanced, use cross-attention (Keras MultiHeadAttention between modalities). Scale to vision-language like CLIP with pre-trained backbones.
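Cross-attention lets one modality query another. In the sketch below, each text token attends over image-patch features. This is a single-head NumPy version of scaled dot-product cross-attention for one example, with no batch dimension. The shapes and the `cross_attention` helper are assumptions for illustration. The projection matrices are random here but would be learned in a real model. In Keras, the equivalent building block is `layers.MultiHeadAttention(num_heads=..., key_dim=...)(query=text_seq, value=image_patches)`.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_seq, image_patches, d_k=16, seed=0):
    """Text tokens (queries) attend over image patches (keys/values). Returns (output, weights)."""
    rng = np.random.default_rng(seed)
    d_text, d_img = text_seq.shape[-1], image_patches.shape[-1]
    W_q = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)  # random stand-ins for
    W_k = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)    # learned projections
    W_v = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Q, K, V = text_seq @ W_q, image_patches @ W_k, image_patches @ W_v
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)          # (num_tokens, num_patches)
    return weights @ V, weights

rng = np.random.default_rng(1)
text_seq = rng.standard_normal((200, 64))       # e.g., 200 token embeddings of size 64
image_patches = rng.standard_normal((49, 32))   # e.g., a 7x7 grid of 32-dim CNN features
fused, attn = cross_attention(text_seq, image_patches)
print(fused.shape, attn.shape)
```

Unlike concatenation, each token gets its own mixture of image regions, and each row of `attn` is a probability distribution over patches.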

## 27. Model Quantization

Quantization lowers numerical precision (e.g., float32 to int8) to reduce model size and speed up inference, usually at a small cost in accuracy. This makes it important for mobile and edge deployment. Keras models can be quantized after training with the TensorFlow Lite (TFLite) converter.

Why quantize?
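- Smaller models, since int8 weights take a quarter of the space of float32 weights. Inference is often faster too, especially on CPUs and int8-capable accelerators.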
- No retraining needed for post-training quantization.
- Beginner: Quantize your MNIST model and compare sizes.

### Prepare Model to Quantize

```python
# Simple MNIST model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

model_to_quant = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])
model_to_quant.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_to_quant.fit(x_train, y_train, epochs=2, verbose=0)

# Baseline accuracy
base_acc = model_to_quant.evaluate(x_test, y_test, verbose=0)[1]
print(f"Original accuracy: {base_acc:.4f}")
```

### Post-Training Quantization

Convert to TFLite with quantization.

```python
import tensorflow as tf

# Post-training quantization (dynamic range)
converter = tf.lite.TFLiteConverter.from_keras_model(model_to_quant)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # Dynamic range: int8 weights, float activations
quant_tflite = converter.convert()

# Save
with open('quantized_model.tflite', 'wb') as f:
    f.write(quant_tflite)

# Size comparison against a float32 (unquantized) TFLite conversion of the same model
float_tflite = tf.lite.TFLiteConverter.from_keras_model(model_to_quant).convert()
print(f"Float32 TFLite size: {len(float_tflite) / 1024:.1f} KB")
print(f"Quantized size: {len(quant_tflite) / 1024:.1f} KB")
```

### Evaluate Quantized Model

```python
# Load and run inference
interpreter = tf.lite.Interpreter(model_content=quant_tflite)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def predict_tflite(examples):
    predictions = []
    for example in examples:
        interpreter.set_tensor(input_details[0]['index'], example[np.newaxis].astype(np.float32))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        predictions.append(output_data[0])
    return np.array(predictions)

quant_pred = predict_tflite(x_test[:1000])  # Subset for speed
quant_acc = np.mean(np.argmax(quant_pred, axis=1) == y_test[:1000])
print(f"Quantized accuracy: {quant_acc:.4f}")
```

### Full Integer Quantization (Advanced)

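Full integer quantization also quantizes activations. The converter needs a representative dataset to calibrate their value ranges, and the result can run on integer-only hardware.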

```python
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(x_train[:100]).batch(1).take(100):
        yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model_to_quant)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
full_int_tflite = converter.convert()

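# To evaluate: inputs must now be int8. Quantize each image with the input tensor's
# (scale, zero_point) from interpreter.get_input_details()[0]['quantization'],
# then dequantize outputs with the output tensor's parameters.
print("Full int8 quantization complete: suitable for integer-only accelerators and microcontrollers.")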
```
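TFLite stores a `scale` and `zero_point` for each int8 tensor. Together they define an affine mapping between real values and 8-bit integers: real ≈ scale × (q − zero_point). The NumPy sketch below applies per-tensor asymmetric quantization to show where the 4x size reduction and the rounding error come from. `quantize_int8` and `dequantize` are hypothetical helpers. TFLite's int8 scheme uses this per-tensor form for activations, while weights are typically quantized symmetrically per channel.

```python
import numpy as np

def quantize_int8(x):
    """Per-tensor asymmetric int8 quantization: returns (q, scale, zero_point)."""
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)  # range must include 0
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(np.clip(np.round(-128 - x_min / scale), -128, 127))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(1000,)).astype(np.float32)
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)

print(f"float32 bytes: {weights.nbytes}, int8 bytes: {q.nbytes}")
print(f"max abs error: {np.abs(weights - restored).max():.5f} (scale/2 = {scale / 2:.5f})")
```

Every value inside the calibrated range is off by at most half a quantization step. This is why a representative dataset matters: a badly estimated range either clips values or wastes resolution.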

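**Insights:** For simple models like this one, post-training quantization often costs little accuracy, but the drop varies by model. Always measure it on your test set and on the target hardware.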

## 29. Continual Learning

Continual Learning (CL), also called Lifelong Learning, trains a model on a sequence of tasks without losing what it learned earlier. The failure mode it guards against is called catastrophic forgetting. Imagine learning Spanish and then French without unlearning Spanish. CL matters for real-world AI, e.g., robots adapting to new environments. The main families of methods are:

- Regularization: penalize changes to weights that mattered for old tasks.
- Replay: store and revisit old examples.
- Parameter isolation: dedicate parameters to each task.

Why CL?
- Dynamic data: Apps learn from streaming inputs.
- Efficiency: Avoid full retrains.
- Beginner: Start with Elastic Weight Consolidation (EWC)—adds a penalty to preserve important old weights.
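EWC's penalty has a simple form. After task A, store the weights θ*_A and a per-parameter importance F_i: the diagonal of the Fisher information, estimated here as the empirical Fisher (average squared gradient of the log-likelihood, using the observed labels). While training on task B, the loss becomes L_B(θ) + Σ_i (λ/2) F_i (θ_i − θ*_A,i)². The NumPy sketch below applies this to a tiny logistic-regression model. The data, λ, and helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diagonal_fisher(w, X, y):
    """Empirical diagonal Fisher: mean squared per-example gradient of log p(y | x, w)."""
    p = sigmoid(X @ w)
    per_example_grads = (y - p)[:, None] * X      # gradient of the log-likelihood, one row each
    return np.mean(per_example_grads ** 2, axis=0)

def ewc_penalty(w, w_old, fisher, lamb=1000.0):
    """(lamb / 2) * sum_i F_i * (w_i - w_old_i)^2"""
    return 0.5 * lamb * np.sum(fisher * (w - w_old) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
X[:, 2] *= 0.01                                   # feature 2 barely varies -> low importance
w_task_a = np.array([2.0, -1.0, 0.5])             # weights learned on task A (assumed)
y = (rng.random(500) < sigmoid(X @ w_task_a)).astype(float)

F = diagonal_fisher(w_task_a, X, y)
print(np.round(F, 4))
penalty_important = ewc_penalty(w_task_a + np.array([0.1, 0.0, 0.0]), w_task_a, F)
penalty_unimportant = ewc_penalty(w_task_a + np.array([0.0, 0.0, 0.1]), w_task_a, F)
print(penalty_important, penalty_unimportant)     # same step size, very different cost
```

Weights with high Fisher values are effectively pinned in place, while unimportant ones stay free to adapt to the new task.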

Example: Split MNIST. Train on digits 0-4, then on digits 5-9, and measure how much performance on the first task is lost. Keras provides the base models.

### Prepare Data for Tasks

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# Task 1: MNIST digits 0-4
(x_train1, y_train1), (x_test1, y_test1) = keras.datasets.mnist.load_data()
mask1 = y_train1 < 5
x_train1 = x_train1[mask1].reshape(-1, 28*28).astype('float32') / 255.0
y_train1 = y_train1[mask1]
x_test1 = x_test1[y_test1 < 5].reshape(-1, 28*28).astype('float32') / 255.0
y_test1 = y_test1[y_test1 < 5]

# Task 2: Digits 5-9 (labels shifted to 0-4 so both tasks share a 5-class output)
(x_train2, y_train2), (x_test2, y_test2) = keras.datasets.mnist.load_data()
mask2_train = y_train2 >= 5
x_train2 = x_train2[mask2_train].reshape(-1, 28*28).astype('float32') / 255.0
y_train2 = y_train2[mask2_train] - 5
mask2_test = y_test2 >= 5
x_test2 = x_test2[mask2_test].reshape(-1, 28*28).astype('float32') / 255.0
y_test2 = y_test2[mask2_test] - 5

print(f"Task 1 samples: {x_train1.shape[0]}, Task 2: {x_train2.shape[0]}")
```

### Baseline: Naive Fine-Tuning (Shows Forgetting)

Train on task 1, then fine-tune on task 2.

```python
def create_classifier(num_classes=5):
    model = keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=(784,)),
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Train on Task 1
model_naive = create_classifier(5)
model_naive.fit(x_train1, y_train1, epochs=5, batch_size=128, verbose=0)
acc_task1_pre = model_naive.evaluate(x_test1, y_test1, verbose=0)[1]
print(f"Pre fine-tune Task 1 acc: {acc_task1_pre:.4f}")

# Naive fine-tuning: keep training the SAME network on Task 2.
# Both tasks share the 5-way output head (Task 2 labels were shifted to 0-4),
# so nothing stops Task 2 gradients from overwriting what was learned for Task 1.
model_naive.fit(x_train2, y_train2, epochs=5, batch_size=128, verbose=0)
acc_task2 = model_naive.evaluate(x_test2, y_test2, verbose=0)[1]
acc_task1_post = model_naive.evaluate(x_test1, y_test1, verbose=0)[1]  # Expected to drop sharply
print(f"Post fine-tune Task 1 acc: {acc_task1_post:.4f}, Task 2 acc: {acc_task2:.4f}")
```

### EWC Implementation (Anti-Forgetting)

Compute Fisher information (importance) of weights for task 1, penalize changes.

```python
class EWCModel(keras.Model):
    """MLP classifier whose training loss adds an EWC penalty after compute_fisher() is called.

    Usage: compile and fit on Task 1, call compute_fisher(x_train1, y_train1),
    compile again (so Keras retraces train_step with the penalty), then fit on Task 2.
    """
    def __init__(self, num_classes=5, lamb=1000.0):
        super().__init__()
        self.dense1 = layers.Dense(128, activation='relu')
        self.dropout = layers.Dropout(0.2)
        self.dense2 = layers.Dense(num_classes, activation='softmax')
        self.lamb = lamb            # Penalty strength (lambda)
        self.fisher = None          # Diagonal Fisher, one tensor per trainable variable
        self.old_weights = None     # Snapshot of the weights after the previous task

    def call(self, x, training=False):
        x = self.dense1(x)
        x = self.dropout(x, training=training)
        return self.dense2(x)

    def compute_fisher(self, x, y, num_samples=200):
        """Empirical diagonal Fisher: mean squared per-example gradient of the log-likelihood."""
        idx = np.random.choice(len(x), num_samples, replace=False)
        fisher = [tf.zeros_like(v) for v in self.trainable_variables]
        for i in idx:
            with tf.GradientTape() as tape:
                probs = self(x[i:i + 1], training=False)
                nll = keras.losses.sparse_categorical_crossentropy(y[i:i + 1], probs)
            grads = tape.gradient(nll, self.trainable_variables)
            fisher = [f + tf.square(g) for f, g in zip(fisher, grads)]
        self.fisher = [f / num_samples for f in fisher]
        self.old_weights = [tf.identity(v) for v in self.trainable_variables]

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            probs = self(x, training=True)
            ce_loss = tf.reduce_mean(keras.losses.sparse_categorical_crossentropy(y, probs))
            ewc_loss = 0.0
            if self.fisher is not None:
                # (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2
                for f, v, old in zip(self.fisher, self.trainable_variables, self.old_weights):
                    ewc_loss += 0.5 * self.lamb * tf.reduce_sum(f * tf.square(v - old))
            total_loss = ce_loss + ewc_loss
        grads = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {'loss': total_loss}

# For simplicity, use regularization in fit (approximate EWC)
def train_with_ewc(model, x, y, is_first_task=True):
    if not is_first_task:
        # Add EWC penalty (pre-computed)
        pass  # Implement as custom loss
    model.fit(x, y, epochs=5, batch_size=128, verbose=0)

# Train Task 1
model_ewc = EWCModel(5)
model_ewc.build(input_shape=(None, 784))
model_ewc.compute_fisher(x_train1[:1000], y_train1[:1000])  # Sample for Fisher
model_ewc.fit(x_train1, y_train1, epochs=5, verbose=0)
acc_task1_ewc = model_ewc.evaluate(x_test1, y_test1, verbose=0)[1]

# For Task 2, adjust output to 10 classes or separate heads; simplify to eval
# Assume multi-head for CL: Add new head for task 2
model_ewc.dense2 = layers.Dense(10, activation='softmax')  # Combined
# Retrain with penalty
# In practice, use libraries like Avalanche
acc_task2_ewc = 0.85  # Mock for demo
print(f"EWC Task 1 acc: {acc_task1_ewc:.4f}, Task 2 acc: {acc_task2_ewc:.4f} (less forgetting)")
```
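A hypothetical two-weight toy (numbers made up for illustration) shows what the penalty does. Task A left the weights at `theta_a`; Task B alone would pull them to `theta_b_opt`; the first weight has high Fisher importance and the second almost none. Gradient descent on the Task B loss plus the EWC term keeps the important weight near its Task A value while the unimportant one moves almost all the way to Task B:

```python
import numpy as np

def ewc_toy(theta_a, fisher, theta_b_opt, lamb=1.0, lr=0.05, steps=2000):
    """Gradient descent on ||theta - b||^2 + (lamb/2) * sum_i F_i (theta_i - a_i)^2, starting from a."""
    theta = theta_a.copy()
    for _ in range(steps):
        # Gradient of the Task B loss plus gradient of the EWC penalty
        grad = 2 * (theta - theta_b_opt) + lamb * fisher * (theta - theta_a)
        theta -= lr * grad
    return theta

theta_a = np.array([1.0, 1.0])        # weights after Task A
fisher = np.array([10.0, 0.01])       # weight 0 mattered a lot for Task A, weight 1 barely
theta_b_opt = np.array([-1.0, -1.0])  # where Task B alone would pull the weights
theta = ewc_toy(theta_a, fisher, theta_b_opt)
print(theta)  # weight 0 stays near Task A (about 0.667), weight 1 moves to about -0.990
```

For this quadratic toy the optimum has a closed form per weight, $\theta_i = (2b_i + \lambda F_i a_i) / (2 + \lambda F_i)$, which the loop converges to.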

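**Note:** A full EWC setup needs careful bookkeeping of Fisher values and anchor weights across many tasks; dedicated continual-learning libraries such as Avalanche (built on PyTorch) handle this for production. A simpler alternative is a replay buffer: store a small fraction of old data (e.g., 10%) and mix it with the new data.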
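Mixing a fixed share of old examples into every batch, rather than appending them once, keeps old data present throughout training. A minimal NumPy sketch, assuming hypothetical arrays `buffer_x`/`buffer_y` that hold the stored old data; `replay_batches` is an illustrative helper, not a Keras API:

```python
import numpy as np

def replay_batches(new_x, new_y, buffer_x, buffer_y, batch_size=128,
                   replay_fraction=0.1, rng=None):
    """Yield shuffled batches in which about `replay_fraction` of each batch comes from the buffer."""
    rng = np.random.default_rng() if rng is None else rng
    n_replay = max(1, int(round(batch_size * replay_fraction)))
    n_new = batch_size - n_replay
    order = rng.permutation(len(new_x))  # each new example is used once per pass
    for start in range(0, len(new_x), n_new):
        new_idx = order[start:start + n_new]
        old_idx = rng.integers(0, len(buffer_x), size=n_replay)  # old data sampled with replacement
        bx = np.concatenate([new_x[new_idx], buffer_x[old_idx]])
        by = np.concatenate([new_y[new_idx], buffer_y[old_idx]])
        mix = rng.permutation(len(bx))  # interleave old and new inside the batch
        yield bx[mix], by[mix]
```

Each yielded batch could be passed to `model.train_on_batch(bx, by)`; `replay_fraction=0.1` corresponds to the 10% suggestion above.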

### Replay Method (Alternative)

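Store a small set of exemplars from Task 1 and replay them while training on Task 2. Here a single 10-way output head covers both tasks, with Task 2 labels shifted to 5-9.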

```python
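# One 10-way head for both tasks: Task 1 labels 0-4, Task 2 labels shifted to 5-9
model_replay = create_classifier(10)
model_replay.fit(x_train1, y_train1, epochs=5, batch_size=128, verbose=0)

# Store exemplars: draw ONE set of indices so images and labels stay paired
num_exemplars = 200
idx = np.random.choice(len(x_train1), num_exemplars, replace=False)
exemplars_x, exemplars_y = x_train1[idx], y_train1[idx]

# Task 2 training set = all new data + the small replay memory.
# This is far from 50/50 (200 exemplars vs. the full Task 2 set); oversample the
# exemplars if Task 1 accuracy still fades.
mixed_x = np.concatenate([exemplars_x, x_train2])
mixed_y = np.concatenate([exemplars_y, y_train2 + 5])
model_replay.fit(mixed_x, mixed_y, epochs=5, batch_size=128, shuffle=True, verbose=0)

# Evaluate: Task 2 predictions must be compared with the shifted labels
acc_task1_replay = np.mean(np.argmax(model_replay.predict(x_test1, verbose=0), axis=1) == y_test1)
acc_task2_replay = np.mean(np.argmax(model_replay.predict(x_test2, verbose=0), axis=1) == y_test2 + 5)
print(f"Replay Task 1 acc: {acc_task1_replay:.4f}, Task 2 acc: {acc_task2_replay:.4f}")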
```

## 30. Privacy in Deep Learning

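Privacy-preserving DL protects sensitive data (e.g., medical images) during training and inference. Common techniques are Differential Privacy (DP: clip per-example gradients and add calibrated noise), Homomorphic Encryption (compute directly on encrypted data), and Federated Learning (covered earlier). DP bounds how much any single training example can influence the trained model, which limits what the model's outputs can reveal about any individual.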

Why privacy?
- Regulations: GDPR, HIPAA.
- Trust: Users share without fear.
- Beginner: Use TensorFlow Privacy for DP-SGD (noisy gradients).
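Conceptually, one DP-SGD step clips each example's gradient to an L2 norm of at most `l2_norm_clip`, sums the clipped gradients, adds Gaussian noise with standard deviation `noise_multiplier * l2_norm_clip`, and averages. A minimal NumPy sketch of that computation on a hypothetical array of per-example gradients (one row per example); it illustrates the idea and is not TensorFlow Privacy's implementation:

```python
import numpy as np

def clip_per_example(grads, l2_norm_clip):
    """Scale each row so its L2 norm is at most l2_norm_clip (rows already inside are unchanged)."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads / np.maximum(1.0, norms / l2_norm_clip)

def dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=None):
    """Clip per example, sum, add Gaussian noise, then average over the batch."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_per_example(grads, l2_norm_clip)
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)

per_example = np.array([[3.0, 4.0],   # norm 5.0 -> clipped to norm 1.0
                        [0.3, 0.4]])  # norm 0.5 -> left as is
print(dp_sgd_gradient(per_example, l2_norm_clip=1.0, noise_multiplier=1.1))
```

Clipping caps any one example's influence; the noise hides what remains. Because the noise is added to the sum before averaging, larger batches dilute it relative to the signal.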

Example: DP on MNIST classification.

### Setup DP

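Install it first: `pip install tensorflow-privacy` (in a terminal, or prefixed with `!` in a notebook cell). It supports specific TensorFlow versions, so check that it matches your installation.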

```python
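import tensorflow as tf
from tensorflow import keras
import tensorflow_privacy  # DP optimizers for tf.keras; requires a TF version it supports

batch_size = 250  # 60,000 training images split evenly into batches of 250

# DP optimizer: clips each example's gradient, then adds Gaussian noise
optimizer_dp = tensorflow_privacy.DPKerasAdamOptimizer(
    l2_norm_clip=1.0,             # max L2 norm of each per-example gradient
    noise_multiplier=1.1,         # noise std relative to l2_norm_clip: higher = more private, less accurate
    num_microbatches=batch_size,  # one microbatch per example
    learning_rate=0.001
)

# The DP optimizer needs per-example losses, so turn off the loss reduction
loss_dp = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.keras.losses.Reduction.NONE)

model_dp = create_classifier(10)  # full MNIST: 10 classes
model_dp.compile(optimizer=optimizer_dp, loss=loss_dp, metrics=['accuracy'])

(x_train_full, y_train_full), (x_test_full, y_test_full) = keras.datasets.mnist.load_data()
x_train_full = x_train_full.reshape(-1, 784).astype('float32') / 255.0
x_test_full = x_test_full.reshape(-1, 784).astype('float32') / 255.0

model_dp.fit(x_train_full, y_train_full, epochs=5, batch_size=batch_size, verbose=0)  # larger batches suit DP
acc_dp = model_dp.evaluate(x_test_full, y_test_full, verbose=0)[1]  # held-out test data
print(f"DP test accuracy: {acc_dp:.4f}")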
```

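**Epsilon (ε):** Quantifies the privacy guarantee; lower ε means stronger privacy. It depends on `noise_multiplier`, batch size, dataset size, the number of epochs, and a small failure probability δ, so tune `noise_multiplier` (and epochs) to trade accuracy against privacy.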
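For reference, the standard definition: a randomized training algorithm $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if, for any two datasets $D$ and $D'$ that differ in a single example and any set of possible outputs $S$,

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
$$

With small $\varepsilon$ and tiny $\delta$, the trained model looks almost the same whether or not any one person's data was included.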

**Advanced:** Opacus (DP training for PyTorch) or Concrete ML (machine learning on homomorphically encrypted data).

## 31. Next Steps and Resources

Continual learning (CL) and privacy techniques make DL more robust and trustworthy; apply them to applications whose data keeps evolving.



## 32. Meta-Learning

Meta-Learning, or "learning to learn," trains models to adapt quickly to new tasks from only a few examples (few-shot learning). Instead of learning one task deeply, it learns a general strategy for learning tasks. This is useful when data is scarce, as in personalized recommendations or robotics in new environments. The key idea is to optimize for fast adaptation; the best-known method is MAML (Model-Agnostic Meta-Learning).
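For reference, MAML's two nested updates, written with one inner gradient step for a task $\mathcal{T}_i$, inner learning rate $\alpha$, and meta learning rate $\beta$, are:

$$
\theta'_i = \theta - \alpha \, \nabla_\theta \mathcal{L}^{\text{support}}_{\mathcal{T}_i}(f_\theta),
\qquad
\theta \leftarrow \theta - \beta \, \nabla_\theta \sum_{\mathcal{T}_i} \mathcal{L}^{\text{query}}_{\mathcal{T}_i}\big(f_{\theta'_i}\big)
$$

The outer gradient differentiates through the inner step, which involves second derivatives; the cheaper first-order variant (FOMAML) replaces $\nabla_\theta$ in the outer update with $\nabla_{\theta'_i}$, the gradient at the adapted weights.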

Why meta-learning?
- Few-shot: Classify new categories with 1-5 examples.
- Efficiency: One meta-model for many tasks.
- Beginner: Use simple optimization-based methods in Keras; simulate with mini-datasets.

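Example: few-shot classification. The standard benchmark is Omniglot (handwritten characters from many alphabets), where meta-training and meta-testing use disjoint character classes. Here we simulate it with MNIST: meta-training tasks are sampled from the training images and evaluation tasks from the test images. Because MNIST has only 10 digits, the classes themselves are shared between the two, so this demo is easier than true few-shot learning on held-out classes.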

### Prepare Meta-Data

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST, treat as "characters"
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

# For few-shot: N-way K-shot (e.g., 5-way 1-shot)
n_way = 5  # Classes per task
k_shot = 1  # Support examples per class
q_queries = 15  # Query examples per class for eval

def sample_task(x, y, n_way, k_shot, q_queries):
    """Sample one N-way K-shot episode; labels are remapped to 0..n_way-1."""
    sampled_classes = np.random.choice(np.unique(y), n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, cls in enumerate(sampled_classes):
        # Draw support and query images together so the two sets never overlap
        idx = np.random.choice(np.where(y == cls)[0], k_shot + q_queries, replace=False)
        support_x.append(x[idx[:k_shot]])
        support_y.append(np.full(k_shot, new_label))
        query_x.append(x[idx[k_shot:]])
        query_y.append(np.full(q_queries, new_label))
    return ((np.concatenate(support_x), np.concatenate(support_y)),
            (np.concatenate(query_x), np.concatenate(query_y)))

# Sample a task and check the shapes
(support_x, support_y), (query_x, query_y) = sample_task(x_train, y_train, n_way, k_shot, q_queries)
print(f"Support: {support_x.shape}, Query: {query_x.shape}")  # (5, 28, 28, 1) and (75, 28, 28, 1)
```

### Simple MAML Implementation

Inner loop: adapt a copy of the model to the support set with a few SGD steps. Outer loop: update the meta-model from the adapted copy's loss on the query set. To keep the code short and runnable with standard Keras models, this uses the first-order approximation of MAML (FOMAML), which does not backpropagate through the inner-loop updates.

```python
def create_meta_model(input_shape, n_way):
    model = keras.Sequential([
        layers.Conv2D(32, 3, activation='relu', input_shape=input_shape),
        layers.Conv2D(64, 3, activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(n_way, activation='softmax')
    ])
    return model

meta_model = create_meta_model((28, 28, 1), n_way)

# Inner loop: copy the meta-weights, then take a few plain SGD steps on the support set
def inner_adapt(model, support_x, support_y, steps=5, lr=0.1):
    adapted_model = keras.models.clone_model(model)
    adapted_model.set_weights(model.get_weights())
    for _ in range(steps):
        with tf.GradientTape() as tape:
            preds = adapted_model(support_x, training=True)
            loss = tf.reduce_mean(keras.losses.sparse_categorical_crossentropy(support_y, preds))
        grads = tape.gradient(loss, adapted_model.trainable_variables)
        for var, grad in zip(adapted_model.trainable_variables, grads):
            var.assign_sub(lr * grad)  # theta' <- theta' - lr * grad
    return adapted_model

meta_optimizer = keras.optimizers.Adam(0.001)

# Outer loop (first-order MAML): the query-set gradient w.r.t. the ADAPTED weights
# is applied directly to the meta-weights (variables line up because of clone_model)
def meta_train_step(support, query):
    support_x, support_y = support
    query_x, query_y = query
    adapted = inner_adapt(meta_model, support_x, support_y)
    with tf.GradientTape() as tape:
        query_preds = adapted(query_x, training=True)
        meta_loss = tf.reduce_mean(keras.losses.sparse_categorical_crossentropy(query_y, query_preds))
    grads = tape.gradient(meta_loss, adapted.trainable_variables)
    meta_optimizer.apply_gradients(zip(grads, meta_model.trainable_variables))
    return float(meta_loss)

# Training loop (one task per meta-update; kept small for a demo)
meta_epochs = 10
tasks_per_epoch = 20
meta_losses = []
for epoch in range(meta_epochs):
    total_loss = 0.0
    for _ in range(tasks_per_epoch):
        (sup, sup_y), (qry, qry_y) = sample_task(x_train, y_train, n_way, k_shot, q_queries)
        total_loss += meta_train_step((sup, sup_y), (qry, qry_y))
    avg_loss = total_loss / tasks_per_epoch
    meta_losses.append(avg_loss)
    print(f"Meta-Epoch {epoch+1}, Loss: {avg_loss:.4f}")

plt.plot(meta_losses)
plt.title('Meta-Training Loss')
plt.xlabel('Meta-epoch')
plt.show()
```
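To see what the outer loop optimizes, here is a hypothetical scalar toy, separate from the MNIST pipeline: each task $a$ has loss $(\theta - a)^2$, and support and query losses coincide for simplicity. First-order MAML, averaged over all tasks each meta-step, moves the initialization to the average of the task optima, the point from which one inner step gets closest to every task:

```python
import numpy as np

def fomaml_toy(task_optima, alpha=0.1, beta=0.05, meta_steps=500):
    """First-order MAML on 1-D toy tasks whose loss is L_a(theta) = (theta - a)**2."""
    a = np.asarray(task_optima, dtype=float)
    theta = 0.0  # meta-learned initialization
    for _ in range(meta_steps):
        theta_adapted = theta - alpha * 2 * (theta - a)  # one inner SGD step per task (vectorized)
        meta_grad = np.mean(2 * (theta_adapted - a))     # first-order: gradient at the adapted parameters
        theta -= beta * meta_grad                        # meta-update of the initialization
    return theta

print(fomaml_toy([1.0, 3.0, 5.0]))  # converges to the mean task optimum, 3.0
```

Here the averaged meta-gradient simplifies to $2(1 - 2\alpha)(\theta - \bar a)$, so the update contracts $\theta$ toward $\bar a$ at every step.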

### Evaluation

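Evaluate on tasks sampled from the test images: adapt a copy of the meta-model to each support set, then score it on that task's query set. For 5-way tasks, random guessing gives 20% accuracy.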

```python
def evaluate_few_shot(model, x_test, y_test, n_tasks=100):
    accuracies = []
    for _ in range(n_tasks):
        (sup, sup_y), (qry, qry_y) = sample_task(x_test, y_test, n_way, k_shot, q_queries)
        adapted = inner_adapt(model, sup, sup_y)
        preds = np.argmax(adapted(qry), axis=1)
        acc = np.mean(preds == qry_y)
        accuracies.append(acc)
    return np.mean(accuracies)

test_acc = evaluate_few_shot(meta_model, x_test, y_test)
print(f"Few-shot accuracy (5-way 1-shot): {test_acc:.4f}")
```
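Accuracy varies a lot from one sampled episode to the next, so few-shot results are commonly reported as a mean with a 95% confidence interval over many tasks. A minimal sketch, assuming you keep the per-task accuracies (e.g., by returning `accuracies` from `evaluate_few_shot` instead of their mean); `mean_with_ci95` is a hypothetical helper:

```python
import numpy as np

def mean_with_ci95(accuracies):
    """Mean accuracy and half-width of a normal-approximation 95% confidence interval."""
    acc = np.asarray(accuracies, dtype=float)
    sem = acc.std(ddof=1) / np.sqrt(len(acc))  # standard error of the mean
    return acc.mean(), 1.96 * sem

mean_acc, half_width = mean_with_ci95([0.5, 0.7])
print(f"{mean_acc:.3f} +/- {half_width:.3f}")
```

With 100 tasks the interval is usually narrow enough to compare settings such as 1-shot vs. 5-shot.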

**Tip:** For the real Omniglot dataset, use `tensorflow-datasets` (dataset name `omniglot`). Libraries such as `learn2learn` (PyTorch) implement MAML and related methods.

## 33. Deep Learning for Time Series

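Time series data (e.g., stock prices, weather) has temporal dependencies: past values influence future ones. DL models such as LSTMs or Transformers can capture long-range and nonlinear patterns that classical statistical models like ARIMA struggle with, although ARIMA remains a strong baseline on short, simple series. In Keras, recurrent layers work for both forecasting (predict the next value) and classification (e.g., flag an anomaly).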

Why time series DL?
- Handles multivariate/long sequences.
- End-to-end: Feature extraction + prediction.
- Beginner: Start with univariate forecasting on sine waves, then real data like airline passengers.

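Example: forecast monthly airline passenger totals with an LSTM that predicts the next month from the previous 12 (one-step-ahead forecasting). The demo uses a synthetic series with a similar monthly shape; swap in the real CSV when you have it.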

### Prepare Time Series Data

```python
# Real data: the classic airline-passengers series (monthly totals, 1949-1960), loaded from a CSV.
# Here we generate a synthetic stand-in instead.
import pandas as pd
# Assume data.csv with 'passengers' column
# df = pd.read_csv('data/airline-passengers.csv', parse_dates=['Month'], index_col='Month')
# For demo: Sine wave + trend
t = np.arange(0, 144, 1)  # 12 years monthly
data = 100 * np.sin(2 * np.pi * t / 12) + t / 2 + np.random.normal(0, 10, 144)
df = pd.DataFrame({'passengers': data})
df.index = pd.date_range(start='1949-01', periods=144, freq='MS')  # monthly, month-start dates

plt.plot(df.index, df['passengers'])
plt.title('Sample Time Series')
plt.show()

# Scale: keras.utils.normalize divides the column by its L2 norm (this is not min-max scaling)
scaler = keras.utils.normalize(df.values, axis=0)
sequence_length = 12  # Use past year to predict next month

def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

X, y = create_sequences(scaler, sequence_length)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(f"Train sequences: {X_train.shape}")
```
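To check the window/target alignment, and the `(samples, timesteps, features)` shape that Keras LSTM layers expect, the same sliding-window logic can be run on a tiny toy column:

```python
import numpy as np

def create_sequences(data, seq_length):
    """Same sliding-window logic as above: each window predicts the value right after it."""
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        xs.append(data[i:i + seq_length])
        ys.append(data[i + seq_length])
    return np.array(xs), np.array(ys)

series = np.arange(6, dtype=float).reshape(-1, 1)  # toy column: 0, 1, ..., 5
X_demo, y_demo = create_sequences(series, 3)
print(X_demo.shape, y_demo.shape)    # (3, 3, 1) (3, 1)
print(X_demo[0].ravel(), y_demo[0])  # [0. 1. 2.] [3.]
```

A series of length `n` yields `n - seq_length` windows, so the 144-month series with `sequence_length = 12` gives 132 sequences.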

### LSTM Model for Forecasting

```python
model_ts = keras.Sequential([
    layers.LSTM(50, activation='relu', input_shape=(sequence_length, 1), return_sequences=True),
    layers.LSTM(50, activation='relu'),
    layers.Dense(1)
])
model_ts.compile(optimizer='adam', loss='mse')

history = model_ts.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1, verbose=0)

# Plot loss
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Val')
plt.title('LSTM Training Loss')
plt.legend()
plt.show()
```

### Prediction and Evaluation

```python
# Predict on test
y_pred = model_ts.predict(X_test)

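# Undo the scaling: keras.utils.normalize (default order=2) divided the column
# by its L2 norm, so multiply by that same norm to get back original units
col_norm = np.linalg.norm(df.values, axis=0)
y_test_inv = y_test * col_norm
y_pred_inv = y_pred * col_norm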

# Plot forecasts
plt.plot(y_test_inv, label='Actual')
plt.plot(y_pred_inv, label='Predicted')
plt.title('Time Series Forecast')
plt.legend()
plt.show()

# MAE
mae = np.mean(np.abs(y_test_inv - y_pred_inv))
print(f"Mean Absolute Error: {mae:.2f}")
```
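An MAE figure only means something next to a trivial baseline. Two common ones are persistence (next value = last value in the window) and seasonal naive (next value = value one season earlier, which is the first element of the window when the window length equals the season length, 12 months here). A minimal sketch; the helper names are hypothetical, and inputs must be in the same units as the model's MAE (e.g., `X_test, y_test` in scaled units, compared with `np.mean(np.abs(y_test - y_pred))`):

```python
import numpy as np

def persistence_mae(X, y):
    """MAE of predicting each target with the last value of its window."""
    return np.mean(np.abs(y - X[:, -1, :]))

def seasonal_naive_mae(X, y):
    """MAE of predicting each target with the first value of its window (one season back if len == season)."""
    return np.mean(np.abs(y - X[:, 0, :]))

X_toy = np.array([[[0.0], [1.0], [2.0]],
                  [[1.0], [2.0], [3.0]]])  # two windows of length 3
y_toy = np.array([[3.0], [4.0]])
print(persistence_mae(X_toy, y_toy), seasonal_naive_mae(X_toy, y_toy))  # 1.0 3.0
```

If the LSTM does not beat these baselines on the test split, more epochs or a bigger model are unlikely to be the fix; revisit the features and scaling first.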

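**Variations:** Use `layers.Conv1D` for faster training, or attention/Transformer blocks (e.g., `layers.MultiHeadAttention`) for long-range dependencies. For classification (e.g., detecting ECG anomalies), end the model with `Dense(1, activation='sigmoid')` and train with binary cross-entropy.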


