# Module 04: Introduction to TensorFlow and Keras

**Difficulty**: ⭐⭐ (Intermediate)
**Estimated Time**: 45-60 minutes
**Prerequisites**: 
- Module 00: Introduction to Neural Networks
- Module 01: Perceptrons and Activation Functions
- Module 02: Backpropagation and Gradient Descent
- Module 03: Building Neural Networks with NumPy
- Python programming and NumPy

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** the TensorFlow ecosystem and its components
2. **Work** with TensorFlow tensors and operations
3. **Build** neural networks using Keras Sequential API
4. **Design** complex architectures with Keras Functional API
5. **Train** models on real datasets (MNIST handwritten digits)
6. **Evaluate** model performance with metrics and visualization
7. **Save** and load trained models for deployment

## 1. Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist, fashion_mnist
from tensorflow.keras.utils import to_categorical, plot_model

# Scikit-learn for utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure plotting
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("Setup complete!")
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

## 2. TensorFlow Basics

### What is TensorFlow?

**TensorFlow** is an open-source deep learning framework developed by Google. It provides:
- Efficient tensor operations (CPU and GPU)
- Automatic differentiation (via `tf.GradientTape`)
- High-level APIs (Keras)
- Production deployment tools (TensorFlow Serving, TensorFlow Lite)
- Distributed training support
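
As a small illustration of the automatic differentiation item above, the sketch below uses `tf.GradientTape` to differentiate `y = x**2 + 2*x` at `x = 3`. The function and variable names are illustrative choices and are not used elsewhere in this notebook.

```python
import tensorflow as tf

# A trainable scalar; GradientTape watches tf.Variable objects automatically.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations executed inside the context are recorded on the tape.
    y = x ** 2 + 2.0 * x

# Analytically dy/dx = 2x + 2, which is 8 at x = 3.
dy_dx = tape.gradient(y, x)
print(f"dy/dx at x=3: {dy_dx.numpy():.1f}")
```

Keras relies on the same mechanism inside `model.fit()` to compute gradients of the loss with respect to every trainable weight.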

### Tensors

A **tensor** is a multi-dimensional array that generalizes scalars, vectors, and matrices to any number of dimensions (its *rank*):
- **Scalar** (rank-0 tensor): Single number
- **Vector** (rank-1 tensor): 1D array
- **Matrix** (rank-2 tensor): 2D array
- **3D+ Tensor** (rank-3+ tensor): Multi-dimensional arrays

### Keras

**Keras** is a high-level neural networks API that runs on top of TensorFlow. It provides:
- A simple, intuitive interface
- Modular, composable building blocks (layers, losses, optimizers)
- Fast prototyping
- Support for CNNs, RNNs, and combinations of the two

In [None]:
# Working with TensorFlow tensors

print("TENSORFLOW TENSOR BASICS")
print("=" * 70)

# Create tensors
scalar = tf.constant(42)
vector = tf.constant([1, 2, 3, 4])
matrix = tf.constant([[1, 2], [3, 4], [5, 6]])
tensor_3d = tf.random.normal(shape=(2, 3, 4))

print("\n1. TENSOR CREATION")
print("-" * 70)
print(f"Scalar (rank-0): {scalar.numpy()}")
print(f"  Shape: {scalar.shape}, Dtype: {scalar.dtype}")

print(f"\nVector (rank-1): {vector.numpy()}")
print(f"  Shape: {vector.shape}, Dtype: {vector.dtype}")

print(f"\nMatrix (rank-2):\n{matrix.numpy()}")
print(f"  Shape: {matrix.shape}, Dtype: {matrix.dtype}")

print(f"\n3D Tensor (rank-3): Shape {tensor_3d.shape}")

# Tensor operations
print("\n2. TENSOR OPERATIONS")
print("-" * 70)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

print(f"Tensor a:\n{a.numpy()}")
print(f"\nTensor b:\n{b.numpy()}")

# Element-wise operations
add = tf.add(a, b)
multiply = tf.multiply(a, b)

print(f"\nElement-wise addition:\n{add.numpy()}")
print(f"\nElement-wise multiplication:\n{multiply.numpy()}")

# Matrix multiplication
matmul = tf.matmul(a, b)
print(f"\nMatrix multiplication (a @ b):\n{matmul.numpy()}")

# Shape manipulation
reshaped = tf.reshape(a, [1, 4])
print(f"\nReshaped a from {a.shape} to {reshaped.shape}:\n{reshaped.numpy()}")

# Useful functions
print("\n3. USEFUL FUNCTIONS")
print("-" * 70)

x = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f"Tensor x:\n{x.numpy()}")
print(f"\nMean: {tf.reduce_mean(x).numpy():.2f}")
print(f"Sum: {tf.reduce_sum(x).numpy():.2f}")
print(f"Max: {tf.reduce_max(x).numpy():.2f}")
print(f"Argmax (axis=1): {tf.argmax(x, axis=1).numpy()}")

print("\n" + "=" * 70)
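
One detail worth knowing when combining tensors like the ones above: `tf.constant` infers `int32` for Python integers and `float32` for Python floats, and TensorFlow generally does not convert between the two implicitly. A minimal sketch of aligning dtypes with `tf.cast` (the tensor names here are illustrative):

```python
import tensorflow as tf

int_matrix = tf.constant([[1, 2], [3, 4]])            # inferred dtype: int32
float_matrix = tf.constant([[0.5, 0.5], [0.5, 0.5]])  # inferred dtype: float32

# Cast explicitly so both operands share a dtype before combining them.
int_as_float = tf.cast(int_matrix, tf.float32)
total = tf.add(int_as_float, float_matrix)

print(int_matrix.dtype, float_matrix.dtype, total.dtype)
print(total.numpy())
```

Passing `dtype=tf.float32` to `tf.constant` at creation time is an alternative that avoids the cast altogether.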

## 3. Load and Explore MNIST Dataset

**MNIST** (Modified National Institute of Standards and Technology) is a classic dataset of handwritten digits (0-9):
- 60,000 training images
- 10,000 test images
- Each image is 28×28 pixels, grayscale
- 10 classes (digits 0-9)

In [None]:
# Load MNIST dataset
print("Loading MNIST dataset...")
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("\nDataset Information:")
print("=" * 70)
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Image shape: {X_train.shape[1:]}")
print(f"Pixel value range: {X_train.min()} to {X_train.max()}")
print(f"\nClasses: {np.unique(y_train)}")
print(f"Class distribution (train): {np.bincount(y_train)}")

# Visualize sample images
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
axes = axes.ravel()

for i in range(10):
    # Find first occurrence of each digit
    idx = np.where(y_train == i)[0][0]
    axes[i].imshow(X_train[idx], cmap='gray')
    axes[i].set_title(f'Digit: {i}', fontsize=12, weight='bold')
    axes[i].axis('off')

plt.suptitle('MNIST Sample Images (One Per Class)', fontsize=15, weight='bold', y=1.02)
plt.tight_layout()
plt.show()

In [None]:
# Preprocess the data
print("Preprocessing data...")

# Normalize pixel values to [0, 1]
X_train_normalized = X_train.astype('float32') / 255.0
X_test_normalized = X_test.astype('float32') / 255.0

# Flatten images for fully-connected network
# Shape: (num_samples, 28, 28) -> (num_samples, 784)
X_train_flat = X_train_normalized.reshape(-1, 28 * 28)
X_test_flat = X_test_normalized.reshape(-1, 28 * 28)

# Convert labels to one-hot encoding
# Example: 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train_onehot = to_categorical(y_train, num_classes=10)
y_test_onehot = to_categorical(y_test, num_classes=10)

print("\nPreprocessing complete!")
print(f"Flattened shape: {X_train_flat.shape}")
print(f"One-hot labels shape: {y_train_onehot.shape}")
print(f"\nExample label conversion:")
print(f"  Original: {y_train[0]}")
print(f"  One-hot: {y_train_onehot[0]}")
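
To make the one-hot step concrete, here is a NumPy-only sketch of what `to_categorical` produces for integer labels. The helper name `one_hot` is hypothetical and exists only for this illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example)

# np.argmax along the class axis recovers the original integer labels.
print(np.argmax(example, axis=1))
```

If you prefer to keep integer labels, Keras also provides the `sparse_categorical_crossentropy` loss, which accepts them directly.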

## 4. Keras Sequential API

The **Sequential API** is the simplest way to build models in Keras. It fits any model that is a plain stack of layers, where each layer has exactly one input tensor and one output tensor.

### Building a Model

Two ways to create a Sequential model:

**Method 1: Pass layers to constructor**
```python
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
```

**Method 2: Use .add() method**
```python
model = models.Sequential()
model.add(layers.Dense(128, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))
```
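
Either version defines the same 784 → 128 → 10 network, and the parameter counts that `model.summary()` reports for it can be checked by hand. The helper `dense_param_count` below is a hypothetical name used only for this arithmetic.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters in a Dense layer: one weight per input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(784, 128)  # 784 * 128 weights + 128 biases
output_params = dense_param_count(128, 10)   # 128 * 10 weights + 10 biases
total_params = hidden_params + output_params

print(f"Hidden layer: {hidden_params}")
print(f"Output layer: {output_params}")
print(f"Total:        {total_params}")
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to every hidden unit.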

In [None]:
# Build a simple neural network using Sequential API

print("BUILDING MODEL WITH SEQUENTIAL API")
print("=" * 70)

# Create model
model_sequential = models.Sequential([
    # Input layer (implicit) + First hidden layer
    layers.Dense(128, activation='relu', input_shape=(784,), name='hidden_1'),
    
    # Second hidden layer
    layers.Dense(64, activation='relu', name='hidden_2'),
    
    # Output layer (10 classes)
    layers.Dense(10, activation='softmax', name='output')
])

print("\nModel created successfully!")
print("\nModel Architecture:")
model_sequential.summary()

# Compile the model
print("\n" + "=" * 70)
print("COMPILING MODEL")
print("=" * 70)

model_sequential.compile(
    optimizer='adam',                    # Optimizer: Adam (adaptive learning rate)
    loss='categorical_crossentropy',     # Loss: Cross-entropy for multi-class
    metrics=['accuracy']                 # Track accuracy during training
)

print("\nModel compiled!")
print("Optimizer: Adam")
print("Loss: Categorical Cross-Entropy")
print("Metrics: Accuracy")
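
To see what the `softmax` output and the `categorical_crossentropy` loss compute, here is a NumPy sketch of both for a single sample. It follows the textbook definitions; Keras's own implementation may differ in numerical details, and the function names are chosen for this illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true_onehot, y_prob, eps=1e-7):
    """Negative log-probability assigned to the true class, per sample."""
    y_prob = np.clip(y_prob, eps, 1.0)  # guard against log(0)
    return -np.sum(y_true_onehot * np.log(y_prob), axis=-1)

# A classifier with no preference: identical scores for all 10 classes.
logits = np.zeros((1, 10))
probs = softmax(logits)

y_true = np.zeros((1, 10))
y_true[0, 3] = 1.0  # the true class is digit 3

loss = categorical_crossentropy(y_true, probs)
print(f"Loss for uniform predictions: {loss[0]:.4f}")
```

For 10 classes, uniform predictions give a loss of ln(10) ≈ 2.30, which is roughly where an untrained classifier tends to start.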

In [None]:
# Train the model

print("TRAINING MODEL")
print("=" * 70)

history = model_sequential.fit(
    X_train_flat,           # Training data
    y_train_onehot,         # Training labels
    epochs=10,              # Number of epochs
    batch_size=128,         # Batch size
    validation_split=0.2,   # Use 20% of training data for validation
    verbose=1               # Show progress
)

print("\n" + "=" * 70)
print("Training complete!")
print("=" * 70)
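
The step count shown in the progress bar follows directly from the `fit()` arguments above. A quick sketch of the arithmetic, assuming the full 60,000-image MNIST training set:

```python
import math

n_samples = 60_000        # MNIST training images
validation_split = 0.2
batch_size = 128

# Keras holds out the validation fraction from the END of the arrays, before shuffling.
n_train = math.floor(n_samples * (1.0 - validation_split))
n_val = n_samples - n_train

# One step = one batch = one weight update.
steps_per_epoch = math.ceil(n_train / batch_size)

print(f"Training samples:   {n_train}")
print(f"Validation samples: {n_val}")
print(f"Steps per epoch:    {steps_per_epoch}")
```

Because the split is taken from the end of the arrays, data that is sorted by class should be shuffled before calling `fit()` with `validation_split`.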

In [None]:
# Plot training history

def plot_training_history(history, title="Training History"):
    """
    Plot training and validation loss/accuracy.
    
    Parameters:
    -----------
    history : keras History object
        Training history from model.fit()
    title : str
        Plot title
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot loss
    ax1.plot(history.history['loss'], label='Training Loss', linewidth=2)
    ax1.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
    ax1.set_xlabel('Epoch', fontsize=12, weight='bold')
    ax1.set_ylabel('Loss', fontsize=12, weight='bold')
    ax1.set_title('Model Loss', fontsize=13, weight='bold')
    ax1.legend(fontsize=11)
    ax1.grid(True, alpha=0.3)
    
    # Plot accuracy
    ax2.plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
    ax2.plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
    ax2.set_xlabel('Epoch', fontsize=12, weight='bold')
    ax2.set_ylabel('Accuracy', fontsize=12, weight='bold')
    ax2.set_title('Model Accuracy', fontsize=13, weight='bold')
    ax2.legend(fontsize=11)
    ax2.grid(True, alpha=0.3)
    
    plt.suptitle(title, fontsize=15, weight='bold', y=1.02)
    plt.tight_layout()
    plt.show()

plot_training_history(history, "MNIST Training - Sequential Model")

In [None]:
# Evaluate on test set

print("EVALUATING ON TEST SET")
print("=" * 70)

test_loss, test_accuracy = model_sequential.evaluate(
    X_test_flat, 
    y_test_onehot,
    verbose=0
)

print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")
print("=" * 70)

# Make predictions
predictions = model_sequential.predict(X_test_flat[:10], verbose=0)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = y_test[:10]

print("\nSample Predictions:")
print("-" * 70)
print("True | Predicted | Confidence")
print("-" * 70)
for i in range(10):
    confidence = predictions[i][predicted_classes[i]]
    correct = "✓" if predicted_classes[i] == true_classes[i] else "✗"
    print(f" {true_classes[i]}   |     {predicted_classes[i]}     | {confidence:.4f}   {correct}")
print("-" * 70)

## 5. Keras Functional API

The **Functional API** is more flexible than Sequential. It allows:
- Multi-input and multi-output models
- Shared layers
- Residual connections
- Complex architectures

### Syntax

```python
# Define input
inputs = layers.Input(shape=(784,))

# Build layers (connect them like a graph)
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

# Create model
model = models.Model(inputs=inputs, outputs=outputs)
```
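
The syntax example above is still a linear stack, so it does not yet show what the Functional API adds. As a hedged sketch, the model below includes a residual (skip) connection, which cannot be expressed with `Sequential`. The layer sizes and names are arbitrary choices for illustration.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,), name="pixels")
x = layers.Dense(64, activation="relu", name="stem")(inputs)

# Residual branch: two Dense layers whose output is added back to its input.
branch = layers.Dense(64, activation="relu", name="branch_1")(x)
branch = layers.Dense(64, name="branch_2")(branch)
x = layers.Add(name="skip_add")([x, branch])  # both tensors have shape (None, 64)
x = layers.Activation("relu", name="post_add_relu")(x)

outputs = layers.Dense(10, activation="softmax", name="digits")(x)

residual_model = models.Model(inputs=inputs, outputs=outputs, name="residual_sketch")
residual_model.summary()
```

The tensor `x` is used twice, once as the branch input and once in `Add`, which is exactly the kind of graph structure a linear stack cannot describe.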

In [None]:
# Build model using Functional API

print("BUILDING MODEL WITH FUNCTIONAL API")
print("=" * 70)

# Define input
inputs = layers.Input(shape=(784,), name='input')

# Build network
x = layers.Dense(256, activation='relu', name='hidden_1')(inputs)
x = layers.Dropout(0.3, name='dropout_1')(x)  # Dropout for regularization
x = layers.Dense(128, activation='relu', name='hidden_2')(x)
x = layers.Dropout(0.3, name='dropout_2')(x)
x = layers.Dense(64, activation='relu', name='hidden_3')(x)
outputs = layers.Dense(10, activation='softmax', name='output')(x)

# Create model
model_functional = models.Model(inputs=inputs, outputs=outputs, name='mnist_functional')

print("\nModel created!")
print("\nModel Architecture:")
model_functional.summary()

# Compile
model_functional.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("\nModel compiled and ready for training!")

In [None]:
# Train functional model

print("TRAINING FUNCTIONAL MODEL")
print("=" * 70)

history_functional = model_functional.fit(
    X_train_flat,
    y_train_onehot,
    epochs=10,
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

# Evaluate
test_loss_func, test_accuracy_func = model_functional.evaluate(
    X_test_flat, 
    y_test_onehot,
    verbose=0
)

print("\n" + "=" * 70)
print("RESULTS COMPARISON")
print("=" * 70)
print(f"Sequential Model:  {test_accuracy * 100:.2f}%")
print(f"Functional Model:  {test_accuracy_func * 100:.2f}%")
print("=" * 70)

# Plot history
plot_training_history(history_functional, "MNIST Training - Functional Model (with Dropout)")

## 6. Model Evaluation and Visualization

Let's evaluate the trained Functional model in more detail: a confusion matrix, per-class accuracy, and a look at individual predictions.

In [None]:
# Generate predictions for entire test set
y_pred_proba = model_functional.predict(X_test_flat, verbose=0)
y_pred = np.argmax(y_pred_proba, axis=1)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
           xticklabels=range(10), yticklabels=range(10),
           cbar_kws={'label': 'Count'})
plt.xlabel('Predicted Label', fontsize=13, weight='bold')
plt.ylabel('True Label', fontsize=13, weight='bold')
plt.title('Confusion Matrix - MNIST Test Set', fontsize=15, weight='bold')
plt.tight_layout()
plt.show()

# Per-class accuracy: fraction of each true digit classified correctly (i.e., per-class recall)
print("\nPER-CLASS ACCURACY")
print("=" * 70)
for digit in range(10):
    mask = y_test == digit
    accuracy = np.mean(y_pred[mask] == digit)
    print(f"Digit {digit}: {accuracy * 100:.2f}%")
print("=" * 70)
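
The per-class numbers above are recall values. Precision can be read from the same confusion matrix by normalizing over columns instead of rows. A small sketch on a made-up 3-class matrix (the counts are invented for illustration and follow scikit-learn's convention of rows = true labels, columns = predicted labels):

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm_example = np.array([
    [50,  2,  3],
    [ 4, 40,  6],
    [ 1,  5, 44],
])

true_positives = np.diag(cm_example)

# Recall: of all samples that truly belong to a class, how many were found?
recall = true_positives / cm_example.sum(axis=1)

# Precision: of all samples predicted as a class, how many were correct?
precision = true_positives / cm_example.sum(axis=0)

for cls in range(3):
    print(f"Class {cls}: precision={precision[cls]:.3f}, recall={recall[cls]:.3f}")
```

The same two lines work unchanged on the 10×10 `cm` computed for MNIST.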

In [None]:
# Visualize predictions (correct and incorrect)

def visualize_predictions(X, y_true, y_pred, num_samples=10, incorrect_only=False):
    """
    Visualize model predictions.
    
    Parameters:
    -----------
    X : ndarray
        Images
    y_true : ndarray
        True labels
    y_pred : ndarray
        Predicted labels
    num_samples : int
        Number of samples to show
    incorrect_only : bool
        Show only incorrect predictions
    """
    if incorrect_only:
        # Find incorrect predictions
        incorrect_mask = y_pred != y_true
        indices = np.where(incorrect_mask)[0][:num_samples]
        title = 'Incorrect Predictions'
    else:
        indices = np.random.choice(len(X), num_samples, replace=False)
        title = 'Random Predictions'
    
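    # Size the grid from the number of samples actually selected
    n_cols = 5
    n_rows = max(1, int(np.ceil(len(indices) / n_cols)))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(14, 3 * n_rows))
    axes = np.atleast_1d(axes).ravel()
    for ax in axes:
        ax.axis('off')  # Hide every axis up front so unused slots stay blank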
    
    for i, idx in enumerate(indices):
        axes[i].imshow(X[idx].reshape(28, 28), cmap='gray')
        
        true_label = y_true[idx]
        pred_label = y_pred[idx]
        
        color = 'green' if pred_label == true_label else 'red'
        axes[i].set_title(f'True: {true_label}\nPred: {pred_label}', 
                         fontsize=11, weight='bold', color=color)
        axes[i].axis('off')
    
    plt.suptitle(title, fontsize=15, weight='bold', y=1.02)
    plt.tight_layout()
    plt.show()

# Show random predictions
visualize_predictions(X_test_normalized, y_test, y_pred, num_samples=10)

# Show incorrect predictions
visualize_predictions(X_test_normalized, y_test, y_pred, 
                     num_samples=10, incorrect_only=True)

## 7. Saving and Loading Models

After training, we need to save models for later use or deployment.

In [None]:
import os
import tempfile

# Create temporary directory for saving models
save_dir = tempfile.mkdtemp()
print(f"Saving models to: {save_dir}")

print("\n" + "=" * 70)
print("SAVING MODELS")
print("=" * 70)

# Method 1: Save entire model (architecture + weights + optimizer state)
# Note: HDF5 (.h5) is a legacy format; recent Keras releases recommend the native .keras format
model_path = os.path.join(save_dir, 'mnist_model.h5')
model_functional.save(model_path)
print(f"\n✓ Full model saved to: {model_path}")
print(f"  File size: {os.path.getsize(model_path) / 1e6:.2f} MB")

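# Method 2: Save only weights
# (Keras 3 requires weight files to end in '.weights.h5'; older versions accept this name too)
weights_path = os.path.join(save_dir, 'model_weights.weights.h5')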
model_functional.save_weights(weights_path)
print(f"\n✓ Weights saved to: {weights_path}")
print(f"  File size: {os.path.getsize(weights_path) / 1e6:.2f} MB")

# Method 3: Save architecture as JSON
architecture_path = os.path.join(save_dir, 'model_architecture.json')
with open(architecture_path, 'w') as f:
    f.write(model_functional.to_json())
print(f"\n✓ Architecture saved to: {architecture_path}")

print("\n" + "=" * 70)
print("LOADING MODELS")
print("=" * 70)

# Load full model
loaded_model = models.load_model(model_path)
print("\n✓ Full model loaded successfully")

# Verify it works
test_loss_loaded, test_accuracy_loaded = loaded_model.evaluate(
    X_test_flat, y_test_onehot, verbose=0
)

print("\nVerification:")
print(f"  Original model accuracy: {test_accuracy_func * 100:.2f}%")
print(f"  Loaded model accuracy:   {test_accuracy_loaded * 100:.2f}%")

if abs(test_accuracy_func - test_accuracy_loaded) < 1e-6:
    print("  ✓ Models are identical!")
else:
    print("  ⚠️ Warning: Models differ!")

print("\n" + "=" * 70)
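
The cell above saves the weights and the JSON architecture but only reloads the full model. As a hedged sketch, the other two artifacts can be combined with `models.model_from_json` and `load_weights`. The tiny stand-in model and the `demo_*` names are assumptions made so the example runs in seconds on its own.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras import layers, models

# A tiny stand-in model; the same steps apply to any saved Keras model.
demo_model = models.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(3, activation="softmax"),
])

demo_dir = tempfile.mkdtemp()
demo_weights_path = os.path.join(demo_dir, "demo.weights.h5")
demo_model.save_weights(demo_weights_path)
demo_json = demo_model.to_json()

# Rebuild the architecture (with fresh random weights), then load the saved weights.
rebuilt_model = models.model_from_json(demo_json)
rebuilt_model.load_weights(demo_weights_path)

sample = np.random.rand(2, 4).astype("float32")
outputs_match = np.allclose(
    demo_model.predict(sample, verbose=0),
    rebuilt_model.predict(sample, verbose=0),
)
print(f"Outputs match after reload: {outputs_match}")
```

A model rebuilt this way carries no optimizer or loss, so it needs a fresh `compile()` call before `fit()` or `evaluate()`.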

## 8. Exercises

### Exercise 1: Build and Train on Fashion-MNIST

**Fashion-MNIST** is a drop-in replacement for MNIST with clothing images instead of digits:
- Same size (28×28 grayscale)
- 10 classes: T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
- More challenging than MNIST

**Tasks**:
1. Load Fashion-MNIST dataset
2. Preprocess the data
3. Build a neural network (your choice of architecture)
4. Train for at least 10 epochs
5. Achieve >85% test accuracy
6. Visualize some predictions

In [None]:
# Exercise 1: Your solution here

# TODO: Load Fashion-MNIST
# (X_train_fashion, y_train_fashion), (X_test_fashion, y_test_fashion) = fashion_mnist.load_data()

# TODO: Preprocess data (normalize, flatten, one-hot encode)

# TODO: Build model (Sequential or Functional API)

# TODO: Compile and train

# TODO: Evaluate and visualize

In [None]:
# Solution to Exercise 1

# Class names for Fashion-MNIST
fashion_class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                      'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Load data
print("Loading Fashion-MNIST...")
(X_train_fashion, y_train_fashion), (X_test_fashion, y_test_fashion) = fashion_mnist.load_data()

# Preprocess
X_train_fashion_norm = X_train_fashion.astype('float32') / 255.0
X_test_fashion_norm = X_test_fashion.astype('float32') / 255.0
X_train_fashion_flat = X_train_fashion_norm.reshape(-1, 784)
X_test_fashion_flat = X_test_fashion_norm.reshape(-1, 784)
y_train_fashion_oh = to_categorical(y_train_fashion, 10)
y_test_fashion_oh = to_categorical(y_test_fashion, 10)

print(f"Dataset loaded: {X_train_fashion.shape[0]} training samples")

# Build model
model_fashion = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),
    layers.Dropout(0.3),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model_fashion.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("\nModel architecture:")
model_fashion.summary()

# Train
print("\nTraining...")
history_fashion = model_fashion.fit(
    X_train_fashion_flat,
    y_train_fashion_oh,
    epochs=15,
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

# Evaluate
test_loss_fashion, test_acc_fashion = model_fashion.evaluate(
    X_test_fashion_flat, y_test_fashion_oh, verbose=0
)

print("\n" + "=" * 70)
print(f"Fashion-MNIST Test Accuracy: {test_acc_fashion * 100:.2f}%")
print("=" * 70)

if test_acc_fashion > 0.85:
    print("✓ Target achieved: >85% accuracy!")
else:
    print("✗ Target not reached; try more epochs or a different architecture.")

# Plot training history
plot_training_history(history_fashion, "Fashion-MNIST Training")

# Visualize predictions
y_pred_fashion = np.argmax(model_fashion.predict(X_test_fashion_flat, verbose=0), axis=1)

fig, axes = plt.subplots(2, 5, figsize=(14, 6))
axes = axes.ravel()
indices = np.random.choice(len(X_test_fashion), 10, replace=False)

for i, idx in enumerate(indices):
    axes[i].imshow(X_test_fashion_norm[idx], cmap='gray')
    true_label = fashion_class_names[y_test_fashion[idx]]
    pred_label = fashion_class_names[y_pred_fashion[idx]]
    color = 'green' if y_pred_fashion[idx] == y_test_fashion[idx] else 'red'
    axes[i].set_title(f'True: {true_label}\nPred: {pred_label}', 
                     fontsize=9, weight='bold', color=color)
    axes[i].axis('off')

plt.suptitle('Fashion-MNIST Predictions', fontsize=15, weight='bold', y=1.02)
plt.tight_layout()
plt.show()

### Exercise 2: Implement Early Stopping

**Early stopping** is a regularization technique that stops training when validation loss stops improving.

**Tasks**:
1. Research Keras callbacks (look up `EarlyStopping`)
2. Train a model on MNIST with early stopping
3. Set `patience=3` (stop if no improvement for 3 epochs)
4. Monitor validation loss
5. Compare total epochs with and without early stopping
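
Before reaching for the callback, it may help to see the patience rule in isolation. The pure-Python sketch below mimics the core logic on a list of validation losses. The function name and the loss values are made up for illustration, and the real `EarlyStopping` callback has more options than shown here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stopped_epoch, best_epoch), both 1-based, for a sequence of losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # patience exhausted: stop here

    return len(val_losses), best_epoch  # never triggered: ran to the end

# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stopped, best = early_stopping_epoch(losses, patience=3)
print(f"Stopped after epoch {stopped}; best epoch was {best}")
```

Training stops before the last value (0.30) is ever seen, which is the trade-off of a small `patience`: it saves time but can miss a late improvement.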

In [None]:
# Exercise 2: Your solution here

from tensorflow.keras.callbacks import EarlyStopping

# TODO: Create EarlyStopping callback
# early_stop = EarlyStopping(...)

# TODO: Build and compile model

# TODO: Train with callbacks=[early_stop]

# TODO: Check how many epochs it trained

In [None]:
# Solution to Exercise 2

from tensorflow.keras.callbacks import EarlyStopping

print("IMPLEMENTING EARLY STOPPING")
print("=" * 70)

# Create callback
early_stop = EarlyStopping(
    monitor='val_loss',      # Monitor validation loss
    patience=3,              # Stop if no improvement for 3 epochs
    restore_best_weights=True,  # Restore weights from best epoch
    verbose=1
)

# Build model
model_early = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model_early.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train with early stopping
print("\nTraining with early stopping (max 50 epochs)...")
history_early = model_early.fit(
    X_train_flat,
    y_train_onehot,
    epochs=50,  # Set high number
    batch_size=128,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=1
)

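actual_epochs = len(history_early.history['loss'])
best_epoch = int(np.argmin(history_early.history['val_loss'])) + 1  # 1-based

print("\n" + "=" * 70)
if early_stop.stopped_epoch > 0:
    print("Early stopping triggered!")
else:
    print("Early stopping did not trigger; training ran for all 50 epochs.")
print(f"  Trained for: {actual_epochs} epochs (out of max 50)")
print(f"  Best validation loss achieved at epoch: {best_epoch}")
print("=" * 70)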

# Evaluate
test_loss_early, test_acc_early = model_early.evaluate(
    X_test_flat, y_test_onehot, verbose=0
)

print(f"\nTest Accuracy: {test_acc_early * 100:.2f}%")

# Plot
plot_training_history(history_early, "Training with Early Stopping")

### Exercise 3: Compare Optimizers

Test different optimizers on MNIST and compare their performance:

**Optimizers to test**:
1. SGD (Stochastic Gradient Descent)
2. RMSprop
3. Adam
4. AdamW (Adam with weight decay)

**Tasks**:
1. Build identical models for each optimizer
2. Train for 10 epochs each
3. Track training time
4. Compare final accuracy and convergence speed
5. Plot loss curves on same graph
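
To build intuition for why these optimizers behave differently, the sketch below applies a single textbook SGD update and a single textbook Adam update to the toy loss `f(w) = w**2`. The function names are hypothetical, and Keras's implementations differ in detail (for example in how the epsilon term is applied).

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Plain gradient descent: the step is proportional to the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w0 = 5.0
grad = 2.0 * w0  # derivative of w**2 at w = 5

w_sgd = sgd_step(w0, grad, lr=0.1)
w_adam, m1, v1 = adam_step(w0, grad, m=0.0, v=0.0, t=1, lr=0.1)

print(f"SGD  after one step: {w_sgd:.4f}")
print(f"Adam after one step: {w_adam:.4f}")
```

SGD's step scales with the gradient magnitude, while Adam's first step is close to the learning rate itself regardless of how large the gradient is. This normalization is one reason the two optimizers usually need different learning rates.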

In [None]:
# Exercise 3: Your solution here

# TODO: Test different optimizers
# optimizers = ['sgd', 'rmsprop', 'adam', ...]

# TODO: For each optimizer:
#   - Build model
#   - Compile with optimizer
#   - Train and time
#   - Record results

# TODO: Compare and visualize

In [None]:
# Solution to Exercise 3

import time

print("COMPARING OPTIMIZERS")
print("=" * 70)

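optimizers = [
    ('SGD', keras.optimizers.SGD(learning_rate=0.01)),
    ('RMSprop', keras.optimizers.RMSprop(learning_rate=0.001)),
    ('Adam', keras.optimizers.Adam(learning_rate=0.001)),
    # AdamW is listed in the exercise; it is available in TensorFlow 2.11 and later
    ('AdamW', keras.optimizers.AdamW(learning_rate=0.001)),
]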

results = []
histories = {}

for name, optimizer in optimizers:
    print(f"\nTesting: {name}")
    print("-" * 70)
    
    # Build model
    model = models.Sequential([
        layers.Dense(128, activation='relu', input_shape=(784,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    
    model.compile(
        optimizer=optimizer,
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Train and time
    start_time = time.time()
    history = model.fit(
        X_train_flat,
        y_train_onehot,
        epochs=10,
        batch_size=128,
        validation_split=0.2,
        verbose=0
    )
    training_time = time.time() - start_time
    
    # Evaluate
    test_loss, test_acc = model.evaluate(X_test_flat, y_test_onehot, verbose=0)
    
    results.append({
        'name': name,
        'test_acc': test_acc,
        'time': training_time,
        'final_val_loss': history.history['val_loss'][-1]
    })
    
    histories[name] = history
    
    print(f"Test Accuracy: {test_acc * 100:.2f}%")
    print(f"Training Time: {training_time:.2f}s")

# Summary table
print("\n" + "=" * 80)
print("OPTIMIZER COMPARISON")
print("=" * 80)
print(f"{'Optimizer':<12} | {'Test Acc':<10} | {'Val Loss':<10} | {'Time (s)':<10}")
print("-" * 80)
for r in results:
    print(f"{r['name']:<12} | {r['test_acc']*100:>9.2f}% | {r['final_val_loss']:>9.4f} | {r['time']:>9.2f}s")
print("=" * 80)

# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

for name, history in histories.items():
    ax1.plot(history.history['loss'], label=f'{name} (train)', linewidth=2)
    ax2.plot(history.history['val_loss'], label=f'{name} (val)', linewidth=2)

ax1.set_xlabel('Epoch', fontsize=12, weight='bold')
ax1.set_ylabel('Loss', fontsize=12, weight='bold')
ax1.set_title('Training Loss', fontsize=13, weight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

ax2.set_xlabel('Epoch', fontsize=12, weight='bold')
ax2.set_ylabel('Loss', fontsize=12, weight='bold')
ax2.set_title('Validation Loss', fontsize=13, weight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.suptitle('Optimizer Comparison', fontsize=15, weight='bold', y=1.02)
plt.tight_layout()
plt.show()

best = max(results, key=lambda x: x['test_acc'])
fastest = min(results, key=lambda x: x['time'])

print(f"\nBest Accuracy: {best['name']} ({best['test_acc']*100:.2f}%)")
print(f"Fastest Training: {fastest['name']} ({fastest['time']:.2f}s)")

## 9. Summary

Congratulations! You've learned to use TensorFlow and Keras, two of the most widely used tools for deep learning.

### Key Accomplishments

1. **TensorFlow Fundamentals**
   - Understood tensors and operations
   - Saw that TensorFlow provides automatic differentiation
   - Explored the TensorFlow ecosystem

2. **Keras APIs**
   - **Sequential API**: Simple, linear architectures
   - **Functional API**: Complex, flexible architectures
   - Model compilation and training

3. **Real-World Application**
   - Trained on MNIST (roughly 98% test accuracy; exact numbers vary between runs)
   - Targeted 85%+ test accuracy on Fashion-MNIST
   - Walked through an end-to-end workflow: preprocess, train, evaluate, save

4. **Best Practices**
   - Data preprocessing and normalization
   - Model evaluation and metrics
   - Saving/loading models
   - Using callbacks (early stopping)

### Technical Insights

**Sequential vs Functional API**:
- Sequential: Simple, one input → one output, linear stack
- Functional: Flexible, multiple inputs/outputs, complex graphs

**Model Compilation**:
```python
model.compile(
    optimizer='adam',                # How to update weights
    loss='categorical_crossentropy', # What to minimize
    metrics=['accuracy']             # What to track
)
```

**Training**:
```python
model.fit(
    X_train, y_train,
    epochs=10,                # Number of full passes through data
    batch_size=128,           # Samples per gradient update
    validation_split=0.2      # Portion for validation
)
```

### Important Concepts

- **Epochs**: Number of complete passes through the training data
- **Batch Size**: Number of samples processed before each weight update
- **Validation Split**: Fraction of the training data held out to monitor overfitting
- **Callbacks**: Objects Keras invokes at set points during training (e.g., early stopping, checkpoints)
- **One-Hot Encoding**: Converting integer class labels to binary indicator vectors
- **Dropout**: Regularization by randomly dropping neurons
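
Of these concepts, dropout is the one whose mechanics are least visible from the Keras call. The NumPy sketch below shows the commonly used "inverted dropout" formulation, in which surviving activations are scaled up during training so their expected value is unchanged. The function name and signature are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of activations and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # at inference time dropout is a no-op
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit survives
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
ones = np.ones(100_000)

dropped = dropout(ones, rate=0.3, rng=rng)
print(f"Fraction zeroed: {np.mean(dropped == 0):.3f}")
print(f"Mean after dropout: {dropped.mean():.3f}")

unchanged = dropout(ones, rate=0.3, rng=rng, training=False)
print(f"Inference output equals input: {np.array_equal(unchanged, ones)}")
```

In Keras, `model.fit()` runs layers in training mode and `model.predict()` or `model.evaluate()` run them in inference mode, so `layers.Dropout` switches behavior without any extra code.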

### Comparison: NumPy vs Keras

| Aspect | NumPy (Module 03) | Keras (Module 04) |
|--------|------------------|------------------|
| **Code Length** | ~200 lines | ~10 lines |
| **Training Speed** | Slow (CPU only) | Fast (GPU support) |
| **Debugging** | Full control | Higher level |
| **Learning Value** | High (understand internals) | High (practical skills) |
| **Production Use** | Research/Education | Industry Standard |

### What's Next?

Continue your deep learning journey:

**Next Modules** (Advanced Topics):
- **Module 05**: Convolutional Neural Networks (CNNs)
- **Module 06**: Recurrent Neural Networks (RNNs)
- **Module 07**: Advanced Architectures (ResNet, Transformers)
- **Module 08**: Transfer Learning and Fine-Tuning
- **Module 09**: Model Optimization and Deployment

**Skills to Develop**:
- Hyperparameter tuning
- Data augmentation
- Regularization techniques
- Custom layers and losses
- Distributed training

### Additional Resources

**Official Documentation**:
- TensorFlow Guide: https://www.tensorflow.org/guide
- Keras API Reference: https://keras.io/api/
- TensorFlow Tutorials: https://www.tensorflow.org/tutorials

**Books**:
- "Deep Learning with Python" by François Chollet (Keras creator)
- "Hands-On Machine Learning" by Aurélien Géron

**Courses**:
- TensorFlow in Practice Specialization (Coursera)
- Fast.ai Practical Deep Learning
- DeepLearning.AI TensorFlow Developer Certificate

**Practice**:
- Kaggle competitions and datasets
- Build projects: Image classifier, chatbot, recommendation system
- Contribute to open-source ML projects

---

**Congratulations!** You've completed the Deep Learning Fundamentals modules. You now have:
- Theoretical understanding (Modules 00-02)
- Implementation skills (Module 03)
- Professional tools expertise (Module 04)

You're ready to build real-world deep learning applications!