# Introduction to Deep Learning

**Interactive Notebook** - Section 5: Deep Learning Fundamentals

Welcome! This notebook guides you through the fundamental concepts of neural networks and deep learning, with hands-on exercises and interactive visualizations.

## 🎯 Learning Objectives

By the end of this notebook, you will:
- ✅ Understand the basic concepts of neural networks
- ✅ Learn about perceptrons, activation functions, and backpropagation
- ✅ Build your first neural networks using TensorFlow and PyTorch
- ✅ Implement deep learning models for real-world problems
- ✅ Understand training techniques and optimization
- ✅ Evaluate and tune deep learning models

## 📋 Prerequisites

- Completion of "Introduction to Machine Learning" notebook
- Understanding of basic ML concepts (supervised learning, model evaluation)
- Basic calculus and linear algebra knowledge
- Python programming skills

**Estimated Time**: 3-4 hours

## 🔧 Setup and Installation

Let's set up our environment with the necessary deep learning libraries.

In [None]:
# Install required packages
!pip install -q tensorflow torch torchvision matplotlib seaborn numpy pandas scikit-learn ipywidgets

# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_moons, make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import ipywidgets as widgets
from IPython.display import display, Markdown
import warnings
warnings.filterwarnings('ignore')

# Import deep learning frameworks
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
torch.manual_seed(42)

# Set visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ Deep learning environment setup complete!")
print(f"TensorFlow version: {tf.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")

## 🧠 What is Deep Learning?

Deep learning is a subset of machine learning based on artificial neural networks with multiple layers. These networks are loosely inspired by biological neurons and can learn complex patterns from large amounts of data.

### Key Concepts:

1. **Neurons**: Basic computational units that receive inputs, apply weights, and produce outputs
2. **Layers**: Collections of neurons (input, hidden, output layers)
3. **Activation Functions**: Non-linear functions that enable neural networks to learn complex patterns
4. **Backpropagation**: Algorithm that applies the chain rule to compute the gradient of the loss with respect to every weight
5. **Gradient Descent**: Optimization algorithm that uses those gradients to update the weights and minimize the loss function

### Types of Neural Networks:
- **Feedforward Neural Networks (FNN)**: Basic neural network architecture
- **Convolutional Neural Networks (CNN)**: Specialized for image processing
- **Recurrent Neural Networks (RNN)**: Specialized for sequential data
- **Transformers**: Attention-based architecture, state of the art for many NLP tasks
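
To make the key concepts concrete, here is a minimal sketch of a single neuron with a sigmoid activation, followed by one gradient-descent update computed with the chain rule. The input, target, initial weights, learning rate, and the squared-error loss are all illustrative assumptions, not values used elsewhere in this notebook.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def gradient_step(x, y_true, w, b, lr=0.5):
    """One gradient-descent update for the loss L = (y_hat - y_true)**2."""
    y_hat = neuron_forward(x, w, b)
    # Chain rule (backpropagation for a single neuron):
    # dL/dz = dL/dy_hat * dy_hat/dz = 2 * (y_hat - y_true) * y_hat * (1 - y_hat)
    dz = 2.0 * (y_hat - y_true) * y_hat * (1.0 - y_hat)
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]  # dL/dw_i = dz * x_i
    new_b = b - lr * dz                                   # dL/db = dz
    return new_w, new_b

x, y_true = [1.0, 2.0], 1.0   # illustrative input and target
w, b = [0.1, -0.2], 0.0       # illustrative initial parameters

loss_before = (neuron_forward(x, w, b) - y_true) ** 2
w, b = gradient_step(x, y_true, w, b)
loss_after = (neuron_forward(x, w, b) - y_true) ** 2
print(f"Loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

Frameworks such as TensorFlow and PyTorch automate the gradient computation for networks with many layers, but each update follows this same pattern.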

In [None]:
# Visualize neural network architectures
def visualize_neural_networks():
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Neural Network Architectures', fontsize=16, fontweight='bold')

    # Single Perceptron
    ax = axes[0, 0]
    ax.scatter([0], [0], s=200, c='lightblue', marker='o', label='Input')
    ax.scatter([1], [0], s=200, c='lightgreen', marker='o', label='Neuron')
    ax.scatter([2], [0], s=200, c='lightcoral', marker='o', label='Output')
    ax.arrow(0.15, 0, 0.7, 0, head_width=0.05, head_length=0.05, fc='black', ec='black')
    ax.arrow(1.15, 0, 0.7, 0, head_width=0.05, head_length=0.05, fc='black', ec='black')
    ax.text(0, -0.3, 'x₁', ha='center', fontsize=12)
    ax.text(1, -0.3, 'w₁', ha='center', fontsize=12)
    ax.text(2, -0.3, 'y', ha='center', fontsize=12)
    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(-0.5, 0.5)
    ax.set_title('Single Perceptron')
    ax.legend()
    ax.axis('off')

    # Multi-layer Perceptron
    ax = axes[0, 1]
    layers = [3, 4, 2]  # Input, hidden, output
    layer_positions = [0, 1, 2]
    colors = ['lightblue', 'lightgreen', 'lightcoral']

    for i, (n_neurons, x_pos, color) in enumerate(zip(layers, layer_positions, colors)):
        y_positions = np.linspace(-1, 1, n_neurons)
        for y_pos in y_positions:
            ax.scatter(x_pos, y_pos, s=200, c=color, marker='o')

        # Draw connections
        if i < len(layers) - 1:
            next_y_positions = np.linspace(-1, 1, layers[i + 1])
            for y1 in y_positions:
                for y2 in next_y_positions:
                    ax.arrow(x_pos + 0.1, y1, 0.8, y2 - y1, 
                            head_width=0.02, head_length=0.05, 
                            fc='gray', ec='gray', alpha=0.3)

    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_title('Multi-layer Perceptron')
    ax.axis('off')

    # CNN Architecture
    ax = axes[1, 0]
    # Input image
    ax.add_patch(plt.Rectangle((0, 0), 0.5, 1, fill=True, color='lightblue', alpha=0.7))
    ax.text(0.25, 1.1, 'Input\n32x32', ha='center', fontsize=10)
    # Convolutional layers
    ax.add_patch(plt.Rectangle((0.8, 0), 0.4, 1, fill=True, color='lightgreen', alpha=0.7))
    ax.text(1.0, 1.1, 'Conv\n28x28', ha='center', fontsize=10)
    ax.add_patch(plt.Rectangle((1.4, 0), 0.4, 1, fill=True, color='lightgreen', alpha=0.7))
    ax.text(1.6, 1.1, 'Conv\n24x24', ha='center', fontsize=10)
    # Pooling
    ax.add_patch(plt.Rectangle((2.0, 0), 0.4, 1, fill=True, color='yellow', alpha=0.7))
    ax.text(2.2, 1.1, 'Pool\n12x12', ha='center', fontsize=10)
    # Fully connected
    ax.add_patch(plt.Rectangle((2.6, 0.2), 0.4, 0.6, fill=True, color='lightcoral', alpha=0.7))
    ax.text(2.8, 1.1, 'FC\n128', ha='center', fontsize=10)
    # Output
    ax.add_patch(plt.Rectangle((3.2, 0.3), 0.4, 0.4, fill=True, color='orange', alpha=0.7))
    ax.text(3.4, 1.1, 'Out\n10', ha='center', fontsize=10)

    ax.set_xlim(-0.2, 3.8)
    ax.set_ylim(-0.2, 1.3)
    ax.set_title('CNN Architecture')
    ax.axis('off')

    # RNN Architecture
    ax = axes[1, 1]
    # Input sequence
    for i in range(4):
        ax.scatter(0, i*0.3, s=100, c='lightblue', marker='s')
        ax.text(-0.1, i*0.3, f'x{i+1}', ha='right', va='center')
    # RNN cells
    for i in range(4):
        circle = plt.Circle((0.5, i*0.3), 0.08, fill=True, color='lightgreen', alpha=0.7)
        ax.add_patch(circle)
        ax.text(0.5, i*0.3, 'RNN', ha='center', va='center', fontsize=8)
        # Connections
        if i > 0:
            ax.arrow(0.5, (i-1)*0.3, 0, 0.25, head_width=0.02, head_length=0.02, fc='red', ec='red')
        ax.arrow(0.15, i*0.3, 0.25, 0, head_width=0.02, head_length=0.02, fc='black', ec='black')
        ax.arrow(0.58, i*0.3, 0.25, 0, head_width=0.02, head_length=0.02, fc='black', ec='black')
    # Output
    for i in range(4):
        ax.scatter(1, i*0.3, s=100, c='lightcoral', marker='s')
        ax.text(1.1, i*0.3, f'y{i+1}', ha='left', va='center')

    ax.set_xlim(-0.3, 1.3)
    ax.set_ylim(-0.2, 1.2)
    ax.set_title('RNN Architecture')
    ax.axis('off')

    plt.tight_layout()
    plt.show()

visualize_neural_networks()

## 🔢 Activation Functions

Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns. Let's explore the most common activation functions.
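
Why does non-linearity matter? A quick NumPy sketch, using arbitrary random weights and layer sizes chosen only for illustration, suggests the answer: two stacked layers without an activation collapse into a single linear layer, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # 5 samples, 3 features (arbitrary sizes)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two stacked layers with no activation in between...
two_linear_layers = (X @ W1 + b1) @ W2 + b2

# ...are equivalent to one linear layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_linear_layer = X @ W_merged + b_merged
collapses = bool(np.allclose(two_linear_layers, one_linear_layer))

# Inserting ReLU between the layers breaks this equivalence.
with_relu = np.maximum(0, X @ W1 + b1) @ W2 + b2
relu_differs = not np.allclose(with_relu, one_linear_layer)

print(f"Linear stack collapses to one layer: {collapses}")
print(f"ReLU network differs from that layer: {relu_differs}")
```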

In [None]:
# Explore activation functions
def explore_activation_functions():
    x = np.linspace(-5, 5, 1000)

    # Define activation functions
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)

    def relu(x):
        return np.maximum(0, x)

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1))

    # Create subplots
    fig, axes = plt.subplots(2, 3, figsize=(18, 10))
    fig.suptitle('Activation Functions', fontsize=16, fontweight='bold')

    # Sigmoid
    ax = axes[0, 0]
    y = sigmoid(x)
    ax.plot(x, y, 'b-', linewidth=2, label='sigmoid(x)')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Sigmoid')
    ax.set_xlabel('x')
    ax.set_ylabel('sigmoid(x)')
    ax.grid(True, alpha=0.3)
    ax.legend()

    # Tanh
    ax = axes[0, 1]
    y = tanh(x)
    ax.plot(x, y, 'g-', linewidth=2, label='tanh(x)')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Hyperbolic Tangent (tanh)')
    ax.set_xlabel('x')
    ax.set_ylabel('tanh(x)')
    ax.grid(True, alpha=0.3)
    ax.legend()

    # ReLU
    ax = axes[0, 2]
    y = relu(x)
    ax.plot(x, y, 'r-', linewidth=2, label='ReLU(x)')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Rectified Linear Unit (ReLU)')
    ax.set_xlabel('x')
    ax.set_ylabel('ReLU(x)')
    ax.grid(True, alpha=0.3)
    ax.legend()

    # Leaky ReLU
    ax = axes[1, 0]
    y = leaky_relu(x)
    ax.plot(x, y, 'm-', linewidth=2, label='Leaky ReLU(x)')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Leaky ReLU')
    ax.set_xlabel('x')
    ax.set_ylabel('Leaky ReLU(x)')
    ax.grid(True, alpha=0.3)
    ax.legend()

    # ELU
    ax = axes[1, 1]
    y = elu(x)
    ax.plot(x, y, 'c-', linewidth=2, label='ELU(x)')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Exponential Linear Unit (ELU)')
    ax.set_xlabel('x')
    ax.set_ylabel('ELU(x)')
    ax.grid(True, alpha=0.3)
    ax.legend()

    # Derivatives
    ax = axes[1, 2]
    dy_sigmoid = sigmoid(x) * (1 - sigmoid(x))
    dy_tanh = 1 - tanh(x)**2
    dy_relu = np.where(x > 0, 1, 0)

    ax.plot(x, dy_sigmoid, 'b--', alpha=0.7, label="sigmoid'")
    ax.plot(x, dy_tanh, 'g--', alpha=0.7, label="tanh'")
    ax.plot(x, dy_relu, 'r--', alpha=0.7, label="ReLU'")
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
    ax.set_title('Derivatives')
    ax.set_xlabel('x')
    ax.set_ylabel("f'(x)")
    ax.grid(True, alpha=0.3)
    ax.legend()

    plt.tight_layout()
    plt.show()

    # Print properties
    print("📊 Activation Function Properties:")
    print("="*60)
    print(f"{'Function':<15} {'Range':<10} {'Vanishing gradient':<20} {'Compute cost':<12}")
    print("="*60)
    print(f"{'Sigmoid':<15} {'(0,1)':<10} {'Yes':<20} {'Expensive':<12}")
    print(f"{'Tanh':<15} {'(-1,1)':<10} {'Yes':<20} {'Expensive':<12}")
    print(f"{'ReLU':<15} {'[0,∞)':<10} {'No':<20} {'Cheap':<12}")
    print(f"{'Leaky ReLU':<15} {'(-∞,∞)':<10} {'No':<20} {'Cheap':<12}")
    print(f"{'ELU':<15} {'(-α,∞)':<10} {'No':<20} {'Expensive':<12}")

explore_activation_functions()

## 🏗️ Building Your First Neural Network with TensorFlow

Let's build our first neural network using TensorFlow/Keras to solve a classification problem.

In [None]:
# Generate synthetic dataset
def generate_complex_dataset():
    # Generate different types of datasets
    X_moons, y_moons = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X_circles, y_circles = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)
    X_class, y_class = make_classification(n_samples=1000, n_features=2, n_redundant=0, 
                                          n_informative=2, random_state=42, n_clusters_per_class=1)

    # Visualize datasets
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    fig.suptitle('Synthetic Datasets for Neural Network Training', fontsize=16, fontweight='bold')

    datasets = [(X_moons, y_moons, 'Moons'), (X_circles, y_circles, 'Circles'), (X_class, y_class, 'Linear')]

    for i, (X, y, title) in enumerate(datasets):
        ax = axes[i]
        scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.7)
        ax.set_title(title)
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')
        ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    return X_moons, y_moons  # We'll use the moons dataset for our first model

X, y = generate_complex_dataset()

# Split and scale the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"✅ Dataset prepared:")
print(f"Training set: {X_train_scaled.shape}")
print(f"Test set: {X_test_scaled.shape}")
print(f"Classes: {np.unique(y)}")

In [None]:
# Build TensorFlow/Keras neural network
def build_tensorflow_model(input_shape, hidden_layers=(64, 32), activation='relu'):
    model = keras.Sequential()
    
    # Input layer (keras.Input avoids the InputLayer `input_shape` argument,
    # which is deprecated in recent Keras versions)
    model.add(keras.Input(shape=input_shape))
    
    # Hidden layers
    for units in hidden_layers:
        model.add(layers.Dense(units, activation=activation))
        model.add(layers.Dropout(0.2))  # Regularization
    
    # Output layer
    model.add(layers.Dense(1, activation='sigmoid'))
    
    return model

# Create and compile the model
model = build_tensorflow_model(input_shape=(2,), hidden_layers=[64, 32, 16])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Display model summary
print("🏗️ Model Architecture:")
model.summary()

# Train the model
print("\n🚀 Training the model...")
history = model.fit(
    X_train_scaled, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
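
The model above is compiled with `loss='binary_crossentropy'`. As a rough sketch of what that loss measures, the pure-Python function below averages the negative log-probability assigned to the true label. The labels, probabilities, and the clipping constant `eps` are illustrative assumptions; this is not the Keras implementation.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]                    # illustrative labels
confident_right = [0.9, 0.1, 0.8, 0.2]   # confident and correct
uninformed = [0.5, 0.5, 0.5, 0.5]        # always predicts 50%
confident_wrong = [0.1, 0.9, 0.2, 0.8]   # confident and wrong

loss_right = binary_cross_entropy(labels, confident_right)
loss_uninformed = binary_cross_entropy(labels, uninformed)
loss_wrong = binary_cross_entropy(labels, confident_wrong)

print(f"Confident and right: {loss_right:.4f}")
print(f"Uninformed (p=0.5):  {loss_uninformed:.4f}")
print(f"Confident and wrong: {loss_wrong:.4f}")
```

The loss grows sharply for confident wrong predictions, which is why it is a common choice for training a sigmoid output.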

In [None]:
# Visualize training history
def plot_training_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot training & validation accuracy
    ax1.plot(history.history['accuracy'], label='Training Accuracy')
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
    ax1.set_title('Model Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot training & validation loss
    ax2.plot(history.history['loss'], label='Training Loss')
    ax2.plot(history.history['val_loss'], label='Validation Loss')
    ax2.set_title('Model Loss')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

plot_training_history(history)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"\n📊 Test Results:")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print(f"Test Loss: {test_loss:.4f}")

In [None]:
# Visualize decision boundary
def plot_decision_boundary(model, X, y, title="Decision Boundary"):
    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                          np.arange(y_min, y_max, 0.1))
    
    # Make predictions (the sigmoid output is the predicted probability of class 1)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot decision boundary
    plt.figure(figsize=(10, 8))
    contour = plt.contourf(xx, yy, Z, alpha=0.4, cmap='viridis')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8, edgecolors='black')
    plt.title(title, fontsize=14, fontweight='bold')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    # Attach the colorbar to the probability contours, not the class-colored scatter
    plt.colorbar(contour, label='Predicted probability of class 1')
    plt.grid(True, alpha=0.3)
    plt.show()

plot_decision_boundary(model, X_test_scaled, y_test, "Neural Network Decision Boundary")

## 🔥 Building the Same Model with PyTorch

Now let's implement the same neural network using PyTorch to understand the differences between the frameworks.

In [None]:
# PyTorch Neural Network Implementation
class PyTorchNN(nn.Module):
    def __init__(self, input_size, hidden_sizes=[64, 32, 16]):
        super(PyTorchNN, self).__init__()
        
        # Build layers dynamically
        layers = []
        prev_size = input_size
        
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))
            prev_size = hidden_size
        
        # Output layer
        layers.append(nn.Linear(prev_size, 1))
        layers.append(nn.Sigmoid())
        
        self.network = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.network(x)

# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.reshape(-1, 1))
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.FloatTensor(y_test.reshape(-1, 1))

# Create data loaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Initialize model, loss function, and optimizer
pytorch_model = PyTorchNN(input_size=2)
criterion = nn.BCELoss()
optimizer = optim.Adam(pytorch_model.parameters(), lr=0.001)

print("🔥 PyTorch Model Architecture:")
print(pytorch_model)

# Training loop
epochs = 100
train_losses = []
train_accuracies = []

print("\n🚀 Training PyTorch model...")
for epoch in range(epochs):
    pytorch_model.train()
    epoch_loss = 0
    correct = 0
    total = 0
    
    for batch_X, batch_y in train_loader:
        # Forward pass
        outputs = pytorch_model(batch_X)
        loss = criterion(outputs, batch_y)
        
        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        predicted = (outputs > 0.5).float()
        total += batch_y.size(0)
        correct += (predicted == batch_y).sum().item()
    
    avg_loss = epoch_loss / len(train_loader)
    accuracy = correct / total
    train_losses.append(avg_loss)
    train_accuracies.append(accuracy)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}, Accuracy: {accuracy:.4f}")
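
The training loop calls `pytorch_model.train()` because layers such as `nn.Dropout(0.2)` behave differently during training and evaluation. The NumPy sketch below illustrates the idea with "inverted" dropout: during training a random fraction of units is zeroed and the rest are rescaled, while in evaluation mode the input passes through unchanged. The function name, array sizes, and seed are hypothetical and chosen for illustration only.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                  # evaluation mode: identity
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob   # rescale so the expected value is unchanged

rng = np.random.default_rng(42)
activations = np.ones((1000, 16))           # illustrative layer output

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False, rng=rng)

dropped_fraction = float((train_out == 0).mean())
print(f"Fraction of units dropped in training mode: {dropped_fraction:.3f}")
print(f"Mean activation in training mode: {train_out.mean():.3f}")
print(f"Evaluation output unchanged: {np.array_equal(eval_out, activations)}")
```

Forgetting to switch to evaluation mode before testing leaves dropout active, which typically makes test metrics noisier and lower.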

In [None]:
# Evaluate PyTorch model
pytorch_model.eval()
with torch.no_grad():
    test_outputs = pytorch_model(X_test_tensor)
    test_predictions = (test_outputs > 0.5).float()
    test_accuracy = (test_predictions == y_test_tensor).float().mean()
    test_loss = criterion(test_outputs, y_test_tensor)

print(f"\n📊 PyTorch Test Results:")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print(f"Test Loss: {test_loss:.4f}")

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

ax1.plot(train_losses, label='Training Loss')
ax1.set_title('PyTorch Training Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(train_accuracies, label='Training Accuracy')
ax2.set_title('PyTorch Training Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

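# Compare TensorFlow vs PyTorch results
# `test_accuracy` now holds the PyTorch result (it was overwritten above),
# so re-evaluate the Keras model to recover the TensorFlow accuracy.
_, tf_test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
pytorch_test_accuracy = float(test_accuracy)
print("\n🏆 Framework Comparison:")
print(f"TensorFlow Test Accuracy: {tf_test_accuracy:.4f} ({tf_test_accuracy*100:.2f}%)")
print(f"PyTorch Test Accuracy: {pytorch_test_accuracy:.4f} ({pytorch_test_accuracy*100:.2f}%)")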
print("\n📝 Key Differences:")
print("• TensorFlow (Keras): Higher-level API; fit() and evaluate() handle the training loop")
print("• PyTorch: Explicit training loop; more flexible and widely used in research")
print("• TensorFlow: Mature deployment tooling (e.g. TensorFlow Serving, TensorFlow Lite)")
print("• PyTorch: More Pythonic, dynamic (define-by-run) computation graphs")

## 🧪 Interactive Exercise: Hyperparameter Tuning

Let's experiment with different hyperparameters and see how they affect model performance!
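
Before touching the sliders, it may help to see how sensitive gradient descent is to the learning rate. The sketch below minimizes the toy function f(w) = w² with three learning rates; the function, starting point, and step count are assumptions chosen for illustration and are unrelated to the model trained in this notebook.

```python
def minimize_quadratic(lr, steps=20, w0=5.0):
    """Run gradient descent on f(w) = w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # gradient-descent update
    return abs(w)

for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr:<6} final |w| = {minimize_quadratic(lr):.4f}")
```

A very small learning rate barely moves toward the minimum at w = 0, a moderate one converges quickly, and one that is too large overshoots and diverges.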

In [None]:
# Interactive hyperparameter tuning
def interactive_hyperparameter_tuning():
    # Create widgets
    learning_rate_slider = widgets.FloatLogSlider(
        value=0.001, base=10, min=-4, max=-1, step=0.1,
        description='Learning Rate:', style={'description_width': 'initial'}
    )
    
    hidden_units_slider = widgets.IntSlider(
        value=32, min=8, max=128, step=8,
        description='Hidden Units:', style={'description_width': 'initial'}
    )
    
    layers_slider = widgets.IntSlider(
        value=2, min=1, max=4, step=1,
        description='Hidden Layers:', style={'description_width': 'initial'}
    )
    
    dropout_slider = widgets.FloatSlider(
        value=0.2, min=0.0, max=0.5, step=0.05,
        description='Dropout Rate:', style={'description_width': 'initial'}
    )
    
    epochs_slider = widgets.IntSlider(
        value=50, min=10, max=200, step=10,
        description='Epochs:', style={'description_width': 'initial'}
    )
    
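    def train_model_with_params(learning_rate, hidden_units, layers, dropout, epochs):
        # NOTE: the `layers` argument (number of hidden layers) shadows the imported
        # Keras `layers` module inside this function, so use keras.layers explicitly.
        n_hidden_layers = layers
        
        # Build model with specified parameters
        model = keras.Sequential()
        model.add(keras.Input(shape=(2,)))
        
        for _ in range(n_hidden_layers):
            model.add(keras.layers.Dense(hidden_units, activation='relu'))
            if dropout > 0:
                model.add(keras.layers.Dropout(dropout))
        
        model.add(keras.layers.Dense(1, activation='sigmoid'))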
        
        # Compile with custom learning rate
        optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
        model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
        
        # Train model
        history = model.fit(
            X_train_scaled, y_train,
            epochs=epochs,
            batch_size=32,
            validation_split=0.2,
            verbose=0
        )
        
        # Evaluate
        test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
        
        # Display results
        print(f"\n🎯 Results with LR={learning_rate:.4f}, Hidden={hidden_units}, Layers={layers}, Dropout={dropout:.2f}, Epochs={epochs}")
        print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
        print(f"Test Loss: {test_loss:.4f}")
        print(f"Final Training Accuracy: {history.history['accuracy'][-1]:.4f}")
        print(f"Final Validation Accuracy: {history.history['val_accuracy'][-1]:.4f}")
        
        # Plot training history
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        
        ax1.plot(history.history['accuracy'], label='Training')
        ax1.plot(history.history['val_accuracy'], label='Validation')
        ax1.set_title('Model Accuracy')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Accuracy')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        ax2.plot(history.history['loss'], label='Training')
        ax2.plot(history.history['val_loss'], label='Validation')
        ax2.set_title('Model Loss')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Loss')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Check for overfitting
        train_acc = history.history['accuracy'][-1]
        val_acc = history.history['val_accuracy'][-1]
        overfitting_gap = train_acc - val_acc
        
        if overfitting_gap > 0.1:
            print(f"⚠️  Potential overfitting detected! Gap: {overfitting_gap:.4f}")
        elif overfitting_gap < 0:
            print(f"✅ No sign of overfitting: validation accuracy exceeds training accuracy (common with dropout). Gap: {overfitting_gap:.4f}")
        else:
            print(f"📊 Training and validation accuracy are close. Gap: {overfitting_gap:.4f}")
    
    # Create interactive widget
    interactive_plot = widgets.interactive(
        train_model_with_params,
        learning_rate=learning_rate_slider,
        hidden_units=hidden_units_slider,
        layers=layers_slider,
        dropout=dropout_slider,
        epochs=epochs_slider
    )
    
    display(interactive_plot)

interactive_hyperparameter_tuning()

## 🎯 Hands-on Challenge

Now it's your turn to apply what you've learned! Complete the following challenges:

In [None]:
# Challenge 1: Multi-class classification
def challenge_1():
    print("🎯 Challenge 1: Multi-class Classification")
    print("="*50)
    
    # Generate multi-class dataset
    X_multi, y_multi = make_classification(
        n_samples=2000, n_features=4, n_classes=4, n_informative=4,
        n_redundant=0, n_clusters_per_class=1, random_state=42
    )
    
    # Split and scale data
    X_train, X_test, y_train, y_test = train_test_split(
        X_multi, y_multi, test_size=0.2, random_state=42, stratify=y_multi
    )
    
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # TODO: Complete the following tasks
    
    # Task 1: Build a multi-class neural network
    model_multi = keras.Sequential([
        keras.Input(shape=(4,)),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(32, activation='relu'),
        layers.Dense(4, activation='softmax')  # 4 classes with softmax
    ])
    
    model_multi.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',  # For integer labels
        metrics=['accuracy']
    )
    
    print("\n✅ Task 1: Multi-class model built successfully!")
    print("Model architecture:")
    model_multi.summary()  # summary() prints the table itself and returns None
    
    # Task 2: Train the model
    history = model_multi.fit(
        X_train_scaled, y_train,
        epochs=100,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    print("\n✅ Task 2: Model training complete!")
    
    # Task 3: Evaluate the model
    test_loss, test_accuracy = model_multi.evaluate(X_test_scaled, y_test, verbose=0)
    print(f"\n📊 Test Results:")
    print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
    print(f"Test Loss: {test_loss:.4f}")
    
    # Task 4: Make predictions and analyze results
    y_pred = model_multi.predict(X_test_scaled)
    y_pred_classes = np.argmax(y_pred, axis=1)
    
    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred_classes)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix - Multi-class Classification')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    
    # Classification report
    print("\n📋 Classification Report:")
    print(classification_report(y_test, y_pred_classes))
    
    return test_accuracy

challenge_1()
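
Challenge 1 pairs a `softmax` output layer with `sparse_categorical_crossentropy`. The NumPy sketch below shows roughly what those two pieces compute: softmax turns raw scores into class probabilities, and the loss is the mean negative log-probability of the true integer label. The logits, labels, and clipping constant are made up for illustration; this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Convert each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true (integer) class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(true_class_probs, eps, 1.0)).mean())

logits = np.array([[2.0, 0.5, 0.1, -1.0],    # illustrative scores for 4 classes
                   [0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 3])                    # integer labels, no one-hot encoding needed

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
predicted = probs.argmax(axis=1)             # same step as np.argmax(y_pred, axis=1)

print("Probabilities:\n", probs.round(3))
print(f"Loss: {loss:.4f}")
print("Predicted classes:", predicted)
```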

In [None]:
# Challenge 2: Regression with Neural Networks
def challenge_2():
    print("📈 Challenge 2: Neural Network Regression")
    print("="*50)
    
    # Generate regression dataset
    from sklearn.datasets import make_regression
    X_reg, y_reg = make_regression(
        n_samples=2000, n_features=6, n_targets=1, noise=0.1, random_state=42
    )
    
    # Split and scale data
    X_train, X_test, y_train, y_test = train_test_split(
        X_reg, y_reg, test_size=0.2, random_state=42
    )
    
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Scale target variable
    y_scaler = StandardScaler()
    y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).flatten()
    y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).flatten()
    
    # TODO: Complete the following tasks
    
    # Task 1: Build regression neural network
    model_reg = keras.Sequential([
        keras.Input(shape=(6,)),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='linear')  # Linear activation for regression
    ])
    
    model_reg.compile(
        optimizer='adam',
        loss='mse',  # Mean squared error
        metrics=['mae']  # Mean absolute error
    )
    
    print("\n✅ Task 1: Regression model built successfully!")
    
    # Task 2: Train the model
    history = model_reg.fit(
        X_train_scaled, y_train_scaled,
        epochs=100,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    print("\n✅ Task 2: Model training complete!")
    
    # Task 3: Evaluate the model
    test_loss, test_mae = model_reg.evaluate(X_test_scaled, y_test_scaled, verbose=0)
    print(f"\n📊 Test Results (scaled):")
    print(f"Test MSE: {test_loss:.4f}")
    print(f"Test MAE: {test_mae:.4f}")
    
    # Make predictions and inverse transform
    y_pred_scaled = model_reg.predict(X_test_scaled)
    # Flatten to 1-D so the predictions have the same shape as y_test
    y_pred = y_scaler.inverse_transform(y_pred_scaled).flatten()
    
    # Calculate metrics on original scale
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    print(f"\n📊 Test Results (original scale):")
    print(f"MSE: {mse:.4f}")
    print(f"MAE: {mae:.4f}")
    print(f"R² Score: {r2:.4f}")
    
    # Plot predictions vs actual
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred, alpha=0.6)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Predictions vs Actual Values')
    plt.grid(True, alpha=0.3)
    plt.show()
    
    # Plot training history
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    
    ax1.plot(history.history['loss'], label='Training')
    ax1.plot(history.history['val_loss'], label='Validation')
    ax1.set_title('Model Loss (MSE)')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    ax2.plot(history.history['mae'], label='Training')
    ax2.plot(history.history['val_mae'], label='Validation')
    ax2.set_title('Model MAE')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('MAE')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return r2

challenge_2()
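
Challenge 2 reports MSE, MAE, and R² through scikit-learn. As a sanity check, the same quantities can be written out by hand; the sketch below does so in NumPy on a tiny made-up example (the helper name and data are hypothetical).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for 1-D targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # accepts (n, 1) model outputs
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    mae = float(np.mean(np.abs(residuals)))
    ss_res = float(np.sum(residuals ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [[2.5], [5.0], [7.5], [9.0]]   # column vector, like a Keras prediction

mse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}, R²: {r2:.4f}")
```

An R² of 1 means the predictions explain all of the variation in the targets, while 0 means they do no better than always predicting the mean.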

In [None]:
# Challenge 3: Model Optimization Experiment
def challenge_3():
    print("🔧 Challenge 3: Model Optimization Experiment")
    print("="*50)
    
    # Use the circles dataset from earlier
    X_circles, y_circles = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X_circles, y_circles, test_size=0.2, random_state=42, stratify=y_circles)
    
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Different architectures to test
    architectures = [
        {'name': 'Shallow', 'layers': [32, 16], 'dropout': 0.1},
        {'name': 'Medium', 'layers': [64, 32, 16], 'dropout': 0.2},
        {'name': 'Deep', 'layers': [128, 64, 32, 16], 'dropout': 0.3},
        {'name': 'Wide', 'layers': [256, 128], 'dropout': 0.2},
    ]
    
    results = []
    
    for arch in architectures:
        print(f"\n🏗️ Training {arch['name']} architecture...")
        
        # Build model
        model = keras.Sequential()
        model.add(keras.Input(shape=(2,)))
        
        for units in arch['layers']:
            model.add(layers.Dense(units, activation='relu'))
            model.add(layers.Dropout(arch['dropout']))
        
        model.add(layers.Dense(1, activation='sigmoid'))
        
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        
        # Train model
        history = model.fit(
            X_train_scaled, y_train,
            epochs=100,
            batch_size=32,
            validation_split=0.2,
            verbose=0
        )
        
        # Evaluate
        test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
        
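        # Store results. Note: Keras reports training accuracy with dropout
        # active and averaged over the epoch, so the train/val gap below can
        # understate overfitting (or even be negative).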
        results.append({
            'architecture': arch['name'],
            'layers': arch['layers'],
            'dropout': arch['dropout'],
            'test_accuracy': test_accuracy,
            'test_loss': test_loss,
            'final_train_accuracy': history.history['accuracy'][-1],
            'final_val_accuracy': history.history['val_accuracy'][-1],
            'overfitting_gap': history.history['accuracy'][-1] - history.history['val_accuracy'][-1]
        })
    
    # Display results
    df_results = pd.DataFrame(results)
    print("\n📊 Architecture Comparison Results:")
    display(df_results[['architecture', 'test_accuracy', 'test_loss', 'overfitting_gap']])
    
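    # Find best architecture. Ranking by test accuracy is fine for this exercise,
    # but it makes the reported score optimistic; prefer validation accuracy
    # for real model selection.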
    best_idx = df_results['test_accuracy'].idxmax()
    best_arch = df_results.loc[best_idx]
    print(f"\n🏆 Best Architecture: {best_arch['architecture']}")
    print(f"Test Accuracy: {best_arch['test_accuracy']:.4f}")
    print(f"Overfitting Gap: {best_arch['overfitting_gap']:.4f}")
    
    # Visualize results
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    # Test accuracy comparison
    bars1 = ax1.bar(df_results['architecture'], df_results['test_accuracy'], color='lightblue')
    ax1.set_title('Test Accuracy by Architecture')
    ax1.set_ylabel('Accuracy')
    ax1.set_ylim(0, 1.05)  # headroom so value labels above near-perfect bars stay inside the axes
    for bar, acc in zip(bars1, df_results['test_accuracy']):
        ax1.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.01,
                f'{acc:.3f}', ha='center', va='bottom')
    
    # Overfitting gap comparison
    bars2 = ax2.bar(df_results['architecture'], df_results['overfitting_gap'], color='lightcoral')
    ax2.set_title('Overfitting Gap by Architecture')
    ax2.set_ylabel('Train - Val Accuracy Gap')
    ax2.axhline(y=0.1, color='red', linestyle='--', label='Rule-of-thumb threshold (0.1)')
    ax2.legend()
    
    # Test loss comparison
    bars3 = ax3.bar(df_results['architecture'], df_results['test_loss'], color='lightgreen')
    ax3.set_title('Test Loss by Architecture')
    ax3.set_ylabel('Loss')
    
    # Parameter count per architecture (Dropout layers add no parameters)
    param_counts = []
    for arch in architectures:
        # Each Dense layer has inputs * units weights plus units biases
        total_params = 2 * arch['layers'][0] + arch['layers'][0]  # Input (2 features) to first hidden
        for i in range(len(arch['layers']) - 1):
            total_params += arch['layers'][i] * arch['layers'][i + 1] + arch['layers'][i + 1]
        total_params += arch['layers'][-1] + 1  # Last hidden to output
        param_counts.append(total_params)
    
    ax4.scatter(param_counts, df_results['test_accuracy'], s=100, alpha=0.7)
    for i, arch in enumerate(architectures):
        ax4.annotate(arch['name'], (param_counts[i], df_results['test_accuracy'].iloc[i]),
                     xytext=(5, 5), textcoords='offset points')
    ax4.set_xlabel('Estimated Parameters')
    ax4.set_ylabel('Test Accuracy')
    ax4.set_title('Accuracy vs Model Complexity')
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return df_results

challenge_3()
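
The parameter estimate inside `challenge_3` can be checked independently of Keras. The sketch below is a minimal, framework-free version of the same arithmetic; the function name `count_dense_params` is hypothetical, and it assumes a plain stack of fully connected layers with biases (Dropout layers contribute no parameters).

```python
def count_dense_params(input_dim, hidden_units, output_dim=1):
    """Count weights and biases in a stack of fully connected layers.

    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    sizes = [input_dim, *hidden_units, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Same hidden-layer sizes as the architectures compared in Challenge 3
architectures = {
    'Shallow': [32, 16],
    'Medium': [64, 32, 16],
    'Deep': [128, 64, 32, 16],
    'Wide': [256, 128],
}

for name, units in architectures.items():
    print(f"{name:8s} {count_dense_params(2, units):>6d} parameters")
```

Note that the Wide network has roughly three times as many parameters as the Deep one despite having fewer layers, because the 256 × 128 weight matrix dominates the count. For a built Keras model, `model.count_params()` should report the same totals.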

## 🎓 Summary and Key Takeaways

### What You've Learned:

1. **Deep Learning Fundamentals**:
   - Neural network architectures (perceptrons, MLPs, CNNs, RNNs)
   - Activation functions and their properties
   - Forward propagation and backpropagation

2. **Framework Proficiency**:
   - TensorFlow/Keras implementation
   - PyTorch implementation
   - Understanding the differences and use cases

3. **Model Building**:
   - Binary classification with neural networks
   - Multi-class classification
   - Regression with neural networks
   - Proper model architecture design

4. **Training and Optimization**:
   - Loss functions and optimizers
   - Hyperparameter tuning
   - Overfitting prevention (dropout, regularization)
   - Model evaluation and validation

5. **Practical Skills**:
   - Data preprocessing for neural networks
   - Training history visualization
   - Decision boundary visualization
   - Model comparison and selection
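
Besides dropout and regularization, a common way to limit overfitting is to stop training once the validation loss stops improving. The sketch below illustrates the patience logic in plain Python on a list of per-epoch validation losses. The function name `early_stopping_epoch` and the sample losses are made up for illustration; in Keras, the `keras.callbacks.EarlyStopping` callback provides this behavior through its `patience` and `min_delta` arguments.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for all epochs
    return len(val_losses) - 1, best_epoch

# Illustrative (made-up) validation losses
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

Keeping the weights from the best epoch rather than the last one (in Keras, `restore_best_weights=True`) is usually what makes early stopping pay off.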

### 🚀 Next Steps:

- **Continue to next notebook**: "Advanced Deep Learning Techniques" for CNNs, RNNs, and transformers
- **Practice with real datasets**: MNIST, CIFAR-10, ImageNet
- **Learn about transfer learning**: Using pre-trained models
- **Explore advanced topics**: Attention mechanisms, generative models

### 📚 Additional Resources:

- [Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville](https://www.deeplearningbook.org/)
- [TensorFlow Tutorials](https://www.tensorflow.org/tutorials)
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [Neural Networks and Deep Learning by Michael Nielsen](http://neuralnetworksanddeeplearning.com/)

**Congratulations on completing your first Deep Learning notebook! 🎉**
You've taken a significant step into the exciting world of neural networks and deep learning!