# Week 8: Introduction to Neural Networks and Deep Learning

## Learning Objectives:
- Understand neural network fundamentals
- Understand how to use TensorFlow/Keras
- Understand the pros and cons of standard neural networks (MLPs), CNNs, RNNs, LSTMs, and GNNs

## Topics Covered:
- Perceptron and multi-layer perceptrons (lightly)
- Activation functions (lightly)
- Backpropagation algorithm (lightly)
- Gradient descent optimization (lightly)
- Introduction to TensorFlow/Keras
- Convolutional Neural Networks (CNN) basics
- Recurrent Neural Networks (RNN) basics
- Long Short-Term Memory (LSTM) network basics
- Graph Neural Network (GNN) basics

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import make_classification, make_regression
import warnings
warnings.filterwarnings('ignore')

# Deep Learning libraries
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.datasets import mnist, cifar10
    from tensorflow.keras.utils import to_categorical
    print(f"TensorFlow version: {tf.__version__}")
    print("TensorFlow/Keras available")
except ImportError:
    print("TensorFlow not available - install with: pip install tensorflow")
    tf = None
    keras = None

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

# Set random seeds for reproducibility
np.random.seed(42)
if tf is not None:
    tf.random.set_seed(42)

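# Report setup status without claiming success when TensorFlow is missing
print("Library setup complete" + ("" if tf is not None else " (TensorFlow examples will be skipped)"))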

## 1. Neural Network Fundamentals

Neural networks are loosely inspired by biological neurons: they consist of interconnected nodes (neurons) that each compute a weighted sum of their inputs and pass the result on to the next layer.

### Key Components:
- **Neurons**: Basic processing units
- **Weights**: Connection strengths between neurons
- **Biases**: Learnable offsets added to the weighted sum, shifting each neuron's activation threshold
- **Activation Functions**: Non-linear transformations
- **Layers**: Groups of neurons
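
To make these components concrete, here is a minimal sketch of a single neuron in plain Python (standard library only, so it runs without NumPy or TensorFlow). The input values, weights, and bias are hypothetical numbers chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical example values (not taken from any dataset)
example_inputs = [0.5, -1.0, 2.0]
example_weights = [0.4, 0.3, -0.2]
example_bias = 0.1

neuron_result = neuron_output(example_inputs, example_weights, example_bias)
print(f"Neuron output: {neuron_result:.4f}")
```

The weighted sum here is 0.2 - 0.3 - 0.4 + 0.1 = -0.4, so the sigmoid returns a value a little below 0.5. A layer is simply many such neurons applied to the same inputs.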

### Types of Neural Networks:
1. **Perceptron**: Single neuron, linear classifier
2. **Multi-layer Perceptron (MLP)**: Fully connected layers
3. **Convolutional Neural Networks (CNN)**: For image processing
4. **Recurrent Neural Networks (RNN)**: For sequential data
5. **Long Short-Term Memory (LSTM)**: Gated RNN variant that retains information over longer sequences
6. **Graph Neural Networks (GNN)**: For graph-structured data
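
As a small illustration of the first entry in this list, the sketch below trains a perceptron on the logical AND function using the classic perceptron learning rule. It is a teaching example under simple assumptions (step activation, learning rate of 1.0 so that all updates stay integer-valued, a fixed number of epochs), not how Keras trains the models later in this notebook.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def perceptron_predict(x, weights, bias):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge weights by (target - prediction) * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - perceptron_predict(x, weights, bias)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]

and_weights, and_bias = train_perceptron(and_inputs, and_labels)
and_predictions = [perceptron_predict(x, and_weights, and_bias) for x in and_inputs]
print("Learned weights:", and_weights, "bias:", and_bias)
print("Predictions:", and_predictions)
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why problems such as XOR need the hidden layers of an MLP.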

In [None]:
# Visualize a simple neural network
def plot_neural_network():
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    
    # Define network structure
    layers = [3, 4, 4, 2]  # Input, Hidden1, Hidden2, Output
    layer_names = ['Input\nLayer', 'Hidden\nLayer 1', 'Hidden\nLayer 2', 'Output\nLayer']
    
    # Calculate positions
    max_neurons = max(layers)
    layer_positions = np.arange(len(layers))
    
    # Plot neurons
    for i, (layer_size, layer_name) in enumerate(zip(layers, layer_names)):
        neuron_positions = np.linspace(0, max_neurons-1, layer_size)
        neuron_positions = neuron_positions - np.mean(neuron_positions) + (max_neurons-1)/2
        
        for j, pos in enumerate(neuron_positions):
            circle = plt.Circle((layer_positions[i], pos), 0.15, 
                              color='lightblue', ec='black', linewidth=1.5)
            ax.add_patch(circle)
            
            # Add connections to next layer
            if i < len(layers) - 1:
                next_layer_positions = np.linspace(0, max_neurons-1, layers[i+1])
                next_layer_positions = next_layer_positions - np.mean(next_layer_positions) + (max_neurons-1)/2
                
                for next_pos in next_layer_positions:
                    ax.plot([layer_positions[i] + 0.15, layer_positions[i+1] - 0.15], 
                           [pos, next_pos], 'k-', alpha=0.3, linewidth=0.8)
        
        # Add layer labels
        ax.text(layer_positions[i], -0.8, layer_name, 
                ha='center', va='top', fontsize=12, fontweight='bold')
    
    ax.set_xlim(-0.5, len(layers)-0.5)
    ax.set_ylim(-1.2, max_neurons-0.5)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title('Multi-Layer Perceptron (MLP) Architecture', fontsize=16, fontweight='bold')
    
    plt.tight_layout()
    plt.show()

plot_neural_network()

## 2. Activation Functions

Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns.

### Common Activation Functions:
- **Sigmoid**: σ(x) = 1/(1+e^(-x)) - Outputs between 0 and 1
- **Tanh**: tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)) - Outputs between -1 and 1
- **ReLU**: f(x) = max(0, x) - Common default for hidden layers; cheap to compute
- **Leaky ReLU**: f(x) = max(0.01x, x) - Addresses the dying ReLU problem
- **Softmax**: softmax(x_i) = e^(x_i) / Σ_j e^(x_j) - Converts a vector of scores into class probabilities for multi-class outputs
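
Softmax is the one function in this list that operates on a whole vector rather than element by element, so a short worked sketch may help. The version below is plain Python and subtracts the maximum score before exponentiating, a common trick to avoid overflow; the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

example_scores = [2.0, 1.0, 0.1]  # hypothetical logits for three classes
class_probs = softmax(example_scores)
print([round(p, 3) for p in class_probs])  # [0.659, 0.242, 0.099]

# Very large scores would overflow math.exp without the max-subtraction
large_probs = softmax([1000.0, 1001.0])
```

The largest score always receives the largest probability, which is why `np.argmax` on a softmax output recovers the predicted class.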

In [None]:
# Visualize activation functions
def plot_activation_functions():
    x = np.linspace(-5, 5, 1000)
    
    # Define activation functions
    sigmoid = 1 / (1 + np.exp(-x))
    tanh = np.tanh(x)
    relu = np.maximum(0, x)
    leaky_relu = np.maximum(0.01 * x, x)
    
    # Plot
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Sigmoid
    axes[0, 0].plot(x, sigmoid, 'b-', linewidth=2, label='Sigmoid')
    axes[0, 0].set_title('Sigmoid Activation')
    axes[0, 0].set_xlabel('Input')
    axes[0, 0].set_ylabel('Output')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].axhline(y=0, color='k', linestyle='-', alpha=0.3)
    axes[0, 0].axvline(x=0, color='k', linestyle='-', alpha=0.3)
    
    # Tanh
    axes[0, 1].plot(x, tanh, 'r-', linewidth=2, label='Tanh')
    axes[0, 1].set_title('Tanh Activation')
    axes[0, 1].set_xlabel('Input')
    axes[0, 1].set_ylabel('Output')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].axhline(y=0, color='k', linestyle='-', alpha=0.3)
    axes[0, 1].axvline(x=0, color='k', linestyle='-', alpha=0.3)
    
    # ReLU
    axes[1, 0].plot(x, relu, 'g-', linewidth=2, label='ReLU')
    axes[1, 0].set_title('ReLU Activation')
    axes[1, 0].set_xlabel('Input')
    axes[1, 0].set_ylabel('Output')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].axhline(y=0, color='k', linestyle='-', alpha=0.3)
    axes[1, 0].axvline(x=0, color='k', linestyle='-', alpha=0.3)
    
    # Leaky ReLU
    axes[1, 1].plot(x, leaky_relu, 'purple', linewidth=2, label='Leaky ReLU')
    axes[1, 1].set_title('Leaky ReLU Activation')
    axes[1, 1].set_xlabel('Input')
    axes[1, 1].set_ylabel('Output')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].axhline(y=0, color='k', linestyle='-', alpha=0.3)
    axes[1, 1].axvline(x=0, color='k', linestyle='-', alpha=0.3)
    
    plt.tight_layout()
    plt.show()

plot_activation_functions()

# Print characteristics
print("Activation Function Characteristics:")
print("- Sigmoid: Smooth, outputs in (0, 1), vanishing gradient problem")
print("- Tanh: Smooth, outputs in (-1, 1), zero-centered, vanishing gradient problem")
print("- ReLU: Simple, unbounded above, efficient, dying ReLU problem")
print("- Leaky ReLU: Addresses dying ReLU, keeps a small slope for negative inputs")

## 3. Building Neural Networks with TensorFlow/Keras

TensorFlow is a popular deep learning framework, and Keras is its high-level API that makes building neural networks intuitive.

### Key Components:
- **Sequential Model**: Linear stack of layers
- **Functional API**: More flexible model building
- **Layers**: Dense, Conv2D, LSTM, etc.
- **Optimizers**: Adam, SGD, RMSprop
- **Loss Functions**: MSE, categorical_crossentropy, binary_crossentropy
- **Metrics**: Accuracy, precision, recall
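
The examples in this section use the Sequential model. For comparison, here is a hedged sketch of how a similar three-class classifier could be written with the Functional API. The layer sizes mirror the Sequential example in this section; the variable name `functional_model` is introduced here purely for illustration, and the block is skipped when TensorFlow is not installed.

```python
try:
    from tensorflow import keras
    from tensorflow.keras import layers
except ImportError:
    keras = None  # TensorFlow not installed; the sketch below is skipped

functional_model = None
if keras is not None:
    # In the Functional API each layer is called on the previous tensor
    inputs = keras.Input(shape=(20,))
    hidden = layers.Dense(64, activation="relu")(inputs)
    hidden = layers.Dropout(0.3)(hidden)
    hidden = layers.Dense(32, activation="relu")(hidden)
    hidden = layers.Dropout(0.3)(hidden)
    outputs = layers.Dense(3, activation="softmax")(hidden)

    functional_model = keras.Model(inputs=inputs, outputs=outputs)
    functional_model.compile(optimizer="adam",
                             loss="categorical_crossentropy",
                             metrics=["accuracy"])
    functional_model.summary()
```

For a plain stack of layers the two styles are equivalent; the Functional API becomes useful once a model needs multiple inputs, multiple outputs, or skip connections.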

In [None]:
if tf is not None:
    # Create a simple dataset for demonstration
    print("=== CREATING NEURAL NETWORK WITH TENSORFLOW/KERAS ===")
    
    # Generate synthetic classification data
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, 
                              n_redundant=5, n_classes=3, random_state=42)
    
    # Split and scale data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Convert labels to categorical
    y_train_cat = to_categorical(y_train, num_classes=3)
    y_test_cat = to_categorical(y_test, num_classes=3)
    
    print(f"Training data shape: {X_train_scaled.shape}")
    print(f"Training labels shape: {y_train_cat.shape}")
    print(f"Number of classes: {len(np.unique(y))}")
    
    # Build a simple neural network
    model = models.Sequential([
        layers.Dense(64, activation='relu', input_shape=(20,)),
        layers.Dropout(0.3),  # Regularization
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(3, activation='softmax')  # Output layer
    ])
    
    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    # Display model architecture
    print("\nModel Architecture:")
    model.summary()
    
else:
    print("TensorFlow not available - skipping neural network examples")

In [None]:
if tf is not None:
    # Train the model
    print("=== TRAINING THE NEURAL NETWORK ===")
    
    # Train with validation split
    history = model.fit(X_train_scaled, y_train_cat,
                       epochs=50,
                       batch_size=32,
                       validation_split=0.2,
                       verbose=1)
    
    # Evaluate on test set
    test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test_cat, verbose=0)
    print(f"\nTest Accuracy: {test_accuracy:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    
    # Make predictions
    y_pred_proba = model.predict(X_test_scaled)
    y_pred = np.argmax(y_pred_proba, axis=1)
    
    # Classification report
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))
    
else:
    print("TensorFlow not available - skipping training")

In [None]:
if tf is not None:
    # Plot training history
    def plot_training_history(history):
        fig, axes = plt.subplots(1, 2, figsize=(15, 5))
        
        # Plot training & validation accuracy
        axes[0].plot(history.history['accuracy'], label='Training Accuracy')
        axes[0].plot(history.history['val_accuracy'], label='Validation Accuracy')
        axes[0].set_title('Model Accuracy')
        axes[0].set_xlabel('Epoch')
        axes[0].set_ylabel('Accuracy')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Plot training & validation loss
        axes[1].plot(history.history['loss'], label='Training Loss')
        axes[1].plot(history.history['val_loss'], label='Validation Loss')
        axes[1].set_title('Model Loss')
        axes[1].set_xlabel('Epoch')
        axes[1].set_ylabel('Loss')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    plot_training_history(history)
    
    # Plot confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=['Class 0', 'Class 1', 'Class 2'],
                yticklabels=['Class 0', 'Class 1', 'Class 2'])
    plt.title('Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()
    
else:
    print("TensorFlow not available - skipping visualization")

## 4. Convolutional Neural Networks (CNNs)

CNNs are specialized neural networks for processing grid-like data such as images.

### Key Components:
- **Convolutional Layers**: Apply filters to detect features
- **Pooling Layers**: Reduce spatial dimensions
- **Filters/Kernels**: Small matrices that detect patterns
- **Feature Maps**: Outputs of convolutional operations
- **Padding**: Control output size
- **Stride**: Step size of convolution
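
The sketch below shows what a convolutional layer and a pooling layer compute, using plain Python loops on a tiny hypothetical image. Like Keras' `Conv2D`, it slides the kernel without flipping it (strictly speaking a cross-correlation) and uses no padding. The helper `conv_output_size` is an illustrative function, not a library call.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical 6x6 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge_kernel)  # 4x4, peaks at the edge
pooled = max_pool2d(feature_map)                         # 2x2

# Spatial size of a 28x28 input after conv3, pool2, conv3, pool2, conv3
sizes = [28]
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    sizes.append(conv_output_size(sizes[-1], kernel, stride))
print(sizes)  # [28, 26, 13, 11, 5, 3]
```

The feature map is non-zero only where the kernel straddles the dark-to-bright boundary, which is the sense in which a filter "detects" a pattern.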

### Advantages:
- Translation equivariance in convolutional layers (pooling adds approximate invariance to small shifts)
- Parameter sharing
- Hierarchical feature learning
- Excellent for image recognition
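
Parameter sharing is easiest to appreciate with a quick count. The sketch below compares a `Conv2D(32, (3, 3))` layer applied to a 28x28x1 image with a hypothetical fully connected layer that produces the same number of outputs (26 x 26 x 32). Both helper functions are illustrative, not part of Keras.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_param_count(n_inputs, n_units):
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * n_units

conv_params = conv2d_param_count(3, 3, 1, 32)
dense_params = dense_param_count(28 * 28, 26 * 26 * 32)

print(f"Conv2D parameters: {conv_params:,}")           # 320
print(f"Equivalent Dense parameters: {dense_params:,}")  # 16,981,120
```

The convolutional layer reuses the same 3x3 weights at every image position, so its parameter count does not depend on the image size at all.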

In [None]:
if tf is not None:
    # Load and preprocess MNIST dataset
    print("=== CONVOLUTIONAL NEURAL NETWORK (CNN) ===")
    
    # Load MNIST data
    (X_train_img, y_train_img), (X_test_img, y_test_img) = mnist.load_data()
    
    # Normalize pixel values
    X_train_img = X_train_img.astype('float32') / 255.0
    X_test_img = X_test_img.astype('float32') / 255.0
    
    # Reshape for CNN (add channel dimension)
    X_train_img = X_train_img.reshape(-1, 28, 28, 1)
    X_test_img = X_test_img.reshape(-1, 28, 28, 1)
    
    # Convert labels to categorical
    y_train_img = to_categorical(y_train_img, 10)
    y_test_img = to_categorical(y_test_img, 10)
    
    print(f"Training images shape: {X_train_img.shape}")
    print(f"Training labels shape: {y_train_img.shape}")
    
    # Visualize some samples
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    for i in range(10):
        row = i // 5
        col = i % 5
        axes[row, col].imshow(X_train_img[i].reshape(28, 28), cmap='gray')
        axes[row, col].set_title(f'Label: {np.argmax(y_train_img[i])}')
        axes[row, col].axis('off')
    plt.suptitle('Sample MNIST Images')
    plt.tight_layout()
    plt.show()
    
else:
    print("TensorFlow not available - skipping CNN example")

In [None]:
if tf is not None:
    # Build CNN model
    cnn_model = models.Sequential([
        # First convolutional block
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        
        # Second convolutional block
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        
        # Third convolutional block
        layers.Conv2D(64, (3, 3), activation='relu'),
        
        # Flatten and dense layers
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])
    
    # Compile the CNN
    cnn_model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
    
    # Display model architecture
    print("\nCNN Model Architecture:")
    cnn_model.summary()
    
    # Train the CNN (using a subset for speed)
    print("\nTraining CNN...")
    cnn_history = cnn_model.fit(X_train_img[:5000], y_train_img[:5000],
                                epochs=10,
                                batch_size=128,
                                validation_split=0.2,
                                verbose=1)
    
    # Evaluate CNN
    cnn_test_loss, cnn_test_accuracy = cnn_model.evaluate(X_test_img[:1000], y_test_img[:1000], verbose=0)
    print(f"\nCNN Test Accuracy: {cnn_test_accuracy:.4f}")
    
else:
    print("TensorFlow not available - skipping CNN training")

## 5. Recurrent Neural Networks (RNNs)

RNNs are designed for sequential data where the order of inputs matters.

### Key Concepts:
- **Memory**: Hidden state carries information from previous time steps
- **Sequential Processing**: Process one element at a time
- **Shared Parameters**: Same weights used at each time step
- **Variable Length**: Can handle sequences of different lengths
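
These four ideas fit into a few lines of code. Below is a minimal sketch of a single-unit recurrent layer with tanh activation, written in plain Python; the weights are hypothetical scalars, whereas a real `SimpleRNN` layer uses weight matrices.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias, h0=0.0):
    """Run h_t = tanh(w_input * x_t + w_hidden * h_(t-1) + bias) over a sequence."""
    hidden = h0
    hidden_states = []
    for x_t in sequence:
        # The same three parameters are reused at every time step
        hidden = math.tanh(w_input * x_t + w_hidden * hidden + bias)
        hidden_states.append(hidden)
    return hidden_states

# Hypothetical parameters and inputs; sequences may have any length
rnn_params = dict(w_input=0.8, w_hidden=0.5, bias=0.0)
short_states = rnn_forward([1.0, 0.5], **rnn_params)
long_states = rnn_forward([1.0, 0.5, -0.3, 0.2, 0.9], **rnn_params)

print("Short sequence states:", [round(h, 3) for h in short_states])
print("Long sequence states: ", [round(h, 3) for h in long_states])
```

Because the hidden state is the only thing carried forward, the two sequences produce identical states for their shared first two inputs.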

### Applications:
- Natural Language Processing
- Time series prediction
- Speech recognition
- Machine translation

### Limitations:
- Vanishing gradient problem
- Difficulty capturing long-term dependencies
- Sequential processing (time steps within a sequence cannot be computed in parallel)
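
The vanishing gradient problem can be seen numerically. For a single-unit tanh RNN, h_t = tanh(w_x x_t + w_h h_(t-1)), the derivative of each state with respect to the previous one is w_h (1 - h_t^2). Backpropagation through time multiplies these factors together, so when |w_h| < 1 the product shrinks at least as fast as |w_h|^t. The sketch below tracks that product; the weights and the sine-wave input are hypothetical.

```python
import math

def gradient_through_time(sequence, w_input, w_hidden):
    """Track d h_t / d h_0 for a scalar tanh RNN as t grows."""
    hidden = 0.0
    gradient = 1.0
    gradients = []
    for x_t in sequence:
        hidden = math.tanh(w_input * x_t + w_hidden * hidden)
        # Chain rule: multiply by d h_t / d h_(t-1) = w_hidden * (1 - h_t^2)
        gradient *= w_hidden * (1.0 - hidden ** 2)
        gradients.append(gradient)
    return gradients

sine_inputs = [math.sin(0.3 * t) for t in range(50)]
time_gradients = gradient_through_time(sine_inputs, w_input=1.0, w_hidden=0.9)

print(f"After 1 step:   {abs(time_gradients[0]):.4f}")
print(f"After 10 steps: {abs(time_gradients[9]):.2e}")
print(f"After 50 steps: {abs(time_gradients[-1]):.2e}")
```

An error signal from step 50 therefore has almost no influence on how the network treated the first inputs, which is the practical meaning of "difficulty capturing long-term dependencies".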

In [None]:
if tf is not None:
    # Generate synthetic time series data
    print("=== RECURRENT NEURAL NETWORK (RNN) ===")
    
    # Create sine wave time series
    def generate_time_series(n_samples=1000, sequence_length=50):
        X = []
        y = []
        
        for i in range(n_samples):
            # Generate sine wave with some noise
            start = np.random.uniform(0, 4*np.pi)
            t = np.linspace(start, start + 2*np.pi, sequence_length + 1)
            series = np.sin(t) + 0.1 * np.random.normal(0, 1, sequence_length + 1)
            
            # Use first 'sequence_length' points as input, next point as target
            X.append(series[:-1])
            y.append(series[-1])
        
        return np.array(X), np.array(y)
    
    # Generate time series data
    X_ts, y_ts = generate_time_series(n_samples=1000, sequence_length=20)
    
    # Reshape for RNN (samples, time_steps, features)
    X_ts = X_ts.reshape(X_ts.shape[0], X_ts.shape[1], 1)
    
    # Split data
    X_train_ts, X_test_ts, y_train_ts, y_test_ts = train_test_split(X_ts, y_ts, test_size=0.2, random_state=42)
    
    print(f"Time series data shape: {X_train_ts.shape}")
    print(f"Time series labels shape: {y_train_ts.shape}")
    
    # Visualize sample time series
    plt.figure(figsize=(12, 6))
    for i in range(3):
        plt.subplot(1, 3, i+1)
        plt.plot(X_ts[i].flatten(), 'b-', label='Input sequence')
        plt.axhline(y=y_ts[i], color='r', linestyle='--', label='Target')
        plt.title(f'Time Series {i+1}')
        plt.xlabel('Time Step')
        plt.ylabel('Value')
        plt.legend()
        plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
else:
    print("TensorFlow not available - skipping RNN example")

In [None]:
if tf is not None:
    # Build simple RNN model
    rnn_model = models.Sequential([
        layers.SimpleRNN(50, activation='relu', input_shape=(20, 1)),
        layers.Dense(25, activation='relu'),
        layers.Dense(1)
    ])
    
    # Compile RNN
    rnn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    
    print("\nRNN Model Architecture:")
    rnn_model.summary()
    
    # Train RNN
    print("\nTraining RNN...")
    rnn_history = rnn_model.fit(X_train_ts, y_train_ts,
                               epochs=20,
                               batch_size=32,
                               validation_split=0.2,
                               verbose=1)
    
    # Evaluate RNN
    rnn_test_loss, rnn_test_mae = rnn_model.evaluate(X_test_ts, y_test_ts, verbose=0)
    print(f"\nRNN Test Loss: {rnn_test_loss:.4f}")
    print(f"RNN Test MAE: {rnn_test_mae:.4f}")
    
else:
    print("TensorFlow not available - skipping RNN training")

## 6. Long Short-Term Memory (LSTM) Networks

LSTMs are a special type of RNN designed to mitigate the vanishing gradient problem and capture long-term dependencies.

### Key Components:
- **Cell State**: Long-term memory
- **Hidden State**: Short-term memory
- **Gates**: Control information flow
  - **Forget Gate**: Decides what to forget from cell state
  - **Input Gate**: Decides what new information to store
  - **Output Gate**: Decides what to output

### Advantages over RNNs:
- Better at capturing long-term dependencies
- Reduced vanishing gradient problem
- More stable training
- Better performance on complex sequences
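
To show how the gates protect the cell state, here is a minimal single-unit LSTM step in plain Python, following the standard gate equations. All parameter values are hypothetical and hand-picked: one setting keeps the forget gate almost fully open and the input gate almost closed, the other lets the cell state decay by half at every step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM with scalar parameters p."""
    forget_gate = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])
    input_gate = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])
    output_gate = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])
    candidate = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])
    c_t = forget_gate * c_prev + input_gate * candidate  # long-term memory
    h_t = output_gate * math.tanh(c_t)                   # short-term output
    return h_t, c_t

# Hypothetical parameters: forget gate ~1 (b_f = 10), input gate ~0 (b_i = -10)
remember_params = {
    "w_f": 0.0, "u_f": 0.0, "b_f": 10.0,
    "w_i": 0.0, "u_i": 0.0, "b_i": -10.0,
    "w_o": 0.0, "u_o": 0.0, "b_o": 0.0,
    "w_c": 1.0, "u_c": 0.0, "b_c": 0.0,
}
# Same cell, but the forget gate sits at 0.5, so memory halves each step
forgetful_params = dict(remember_params, b_f=0.0)

def run_cell(p, steps=100, c_start=1.0):
    h, c = 0.0, c_start
    for _ in range(steps):
        h, c = lstm_step(0.5, h, c, p)
    return c

c_remembered = run_cell(remember_params)
c_forgotten = run_cell(forgetful_params)
print(f"Cell state after 100 steps (forget gate ~1):  {c_remembered:.4f}")
print(f"Cell state after 100 steps (forget gate 0.5): {c_forgotten:.6f}")
```

Because the cell state is updated additively and scaled by a gate the network can learn to hold near 1, information (and gradient) can survive many more steps than in a plain RNN.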

In [None]:
if tf is not None:
    # Build LSTM model
    print("=== LONG SHORT-TERM MEMORY (LSTM) ===")
    
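    # Note: Keras' default LSTM activation is tanh. ReLU is used here to mirror
    # the SimpleRNN above; a non-default activation also disables the fast
    # cuDNN kernel on GPU.
    lstm_model = models.Sequential([
        layers.LSTM(50, activation='relu', input_shape=(20, 1)),
        layers.Dense(25, activation='relu'),
        layers.Dense(1)
    ])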
    
    # Compile LSTM
    lstm_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    
    print("\nLSTM Model Architecture:")
    lstm_model.summary()
    
    # Train LSTM
    print("\nTraining LSTM...")
    lstm_history = lstm_model.fit(X_train_ts, y_train_ts,
                                 epochs=20,
                                 batch_size=32,
                                 validation_split=0.2,
                                 verbose=1)
    
    # Evaluate LSTM
    lstm_test_loss, lstm_test_mae = lstm_model.evaluate(X_test_ts, y_test_ts, verbose=0)
    print(f"\nLSTM Test Loss: {lstm_test_loss:.4f}")
    print(f"LSTM Test MAE: {lstm_test_mae:.4f}")
    
    # Compare RNN vs LSTM
    print(f"\nComparison:")
    print(f"RNN Test MAE: {rnn_test_mae:.4f}")
    print(f"LSTM Test MAE: {lstm_test_mae:.4f}")
    print(f"LSTM improvement: {((rnn_test_mae - lstm_test_mae) / rnn_test_mae * 100):.1f}%")
    
else:
    print("TensorFlow not available - skipping LSTM example")

## 7. Graph Neural Networks (GNNs) - Basics

GNNs are designed to work with graph-structured data where relationships between entities are as important as the entities themselves.

### Key Concepts:
- **Nodes**: Entities in the graph
- **Edges**: Relationships between entities
- **Node Features**: Properties of entities
- **Edge Features**: Properties of relationships
- **Message Passing**: Nodes exchange information with neighbors
- **Aggregation**: Combining information from neighbors
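
Message passing and aggregation can be sketched in a few lines. The version below averages each node's features with those of its neighbors, treating the node as its own neighbor (a self-loop). This is one simple aggregation choice among many and is an assumption of this sketch; the three-node path graph and its features are hypothetical.

```python
def mean_aggregate_with_self_loops(adjacency, features):
    """One message-passing step: average over each node and its neighbors."""
    n_nodes = len(adjacency)
    updated = []
    for i in range(n_nodes):
        neighborhood = [j for j in range(n_nodes) if adjacency[i][j] == 1 or j == i]
        n_dims = len(features[i])
        updated.append([
            sum(features[j][d] for j in neighborhood) / len(neighborhood)
            for d in range(n_dims)
        ])
    return updated

# Hypothetical path graph: node 0 - node 1 - node 2
path_adjacency = [[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]]
path_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

one_hop = mean_aggregate_with_self_loops(path_adjacency, path_features)
two_hop = mean_aggregate_with_self_loops(path_adjacency, one_hop)

# Change only node 2 and see how far the change travels
altered_features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
one_hop_altered = mean_aggregate_with_self_loops(path_adjacency, altered_features)
two_hop_altered = mean_aggregate_with_self_loops(path_adjacency, one_hop_altered)

print("Node 0 after one step: ", one_hop[0], "vs altered:", one_hop_altered[0])
print("Node 0 after two steps:", two_hop[0], "vs altered:", two_hop_altered[0])
```

Node 0 is two edges away from node 2, so it only "sees" the change after two steps. Stacking k message-passing layers lets each node draw on information from up to k hops away.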

### Applications:
- Social network analysis
- Molecular property prediction
- Recommendation systems
- Knowledge graphs
- Traffic prediction

### Advantages:
- Naturally handles non-Euclidean data
- Captures relational information
- Flexible for various graph sizes
- Good for semi-supervised learning

In [None]:
# Simple GNN concept demonstration (without specialized libraries)
print("=== GRAPH NEURAL NETWORK (GNN) CONCEPTS ===")

# Create a simple graph representation
# Node features: [feature1, feature2] for each node, keyed by node id
# Edge list: [(source, target), ...]

# Example: Simple social network
nodes = {
    0: {'features': [1.0, 0.5], 'label': 'Person A'},
    1: {'features': [0.8, 0.3], 'label': 'Person B'},
    2: {'features': [0.2, 0.9], 'label': 'Person C'},
    3: {'features': [0.6, 0.7], 'label': 'Person D'},
    4: {'features': [0.4, 0.1], 'label': 'Person E'}
}

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

# Create adjacency matrix
n_nodes = len(nodes)
adj_matrix = np.zeros((n_nodes, n_nodes))
for source, target in edges:
    adj_matrix[source, target] = 1
    adj_matrix[target, source] = 1  # Undirected graph

# Extract node features
node_features = np.array([nodes[i]['features'] for i in range(n_nodes)])

print(f"Number of nodes: {n_nodes}")
print(f"Number of edges: {len(edges)}")
print(f"Node features shape: {node_features.shape}")
print(f"Adjacency matrix shape: {adj_matrix.shape}")

# Visualize the graph
plt.figure(figsize=(10, 8))

# Plot nodes
pos = {0: (0, 1), 1: (1, 2), 2: (2, 1), 3: (1, 0), 4: (0, -1)}
for node_id, (x_pos, y_pos) in pos.items():  # avoid overwriting the earlier label array `y`
    plt.scatter(x_pos, y_pos, s=300, c='lightblue', edgecolors='black', linewidth=2)
    plt.text(x_pos, y_pos, nodes[node_id]['label'], ha='center', va='center', fontweight='bold')

# Plot edges
for source, target in edges:
    x1, y1 = pos[source]
    x2, y2 = pos[target]
    plt.plot([x1, x2], [y1, y2], 'k-', alpha=0.5, linewidth=2)

plt.title('Sample Graph Structure')
plt.axis('equal')
plt.grid(True, alpha=0.3)
plt.show()

print("\nGNN Processing Steps:")
print("1. Initialize node features")
print("2. For each layer:")
print("   - Aggregate features from neighbors")
print("   - Update node features using aggregated information")
print("3. Use final node features for prediction")

# Simple message passing example
print("\nSimple Message Passing Example:")
print("Original node features:")
for i, features in enumerate(node_features):
    print(f"Node {i}: {features}")

# Aggregate neighbors (mean aggregation)
new_features = np.zeros_like(node_features)
for i in range(n_nodes):
    neighbors = np.where(adj_matrix[i] == 1)[0]
    if len(neighbors) > 0:
        # Average of neighbor features
        neighbor_features = node_features[neighbors]
        aggregated = np.mean(neighbor_features, axis=0)
        # Combine with own features
        new_features[i] = 0.5 * node_features[i] + 0.5 * aggregated
    else:
        new_features[i] = node_features[i]

print("\nAfter one message passing step:")
for i, features in enumerate(new_features):
    print(f"Node {i}: {features}")

## 8. Neural Network Comparison

Let's compare the different types of neural networks we've covered.

In [None]:
# Neural Network Comparison
print("=== NEURAL NETWORK COMPARISON ===")

comparison_data = {
    'Network Type': ['MLP', 'CNN', 'RNN', 'LSTM', 'GNN'],
    'Best For': [
        'Tabular data, general classification',
        'Images, spatial data',
        'Sequential data, time series',
        'Long sequences, NLP',
        'Graph data, relationships'
    ],
    'Key Strength': [
        'Universal approximator',
        'Translation equivariance, parameter sharing',
        'Handles sequences',
        'Long-term memory',
        'Relational reasoning'
    ],
    'Main Weakness': [
        'No structure awareness',
        'Needs large labeled datasets and compute',
        'Vanishing gradients',
        'Computational complexity',
        'Complex implementation'
    ],
    'Typical Use Cases': [
        'Fraud detection, credit scoring',
        'Image classification, object detection',
        'Stock prediction, simple NLP',
        'Machine translation, chatbots',
        'Social networks, drug discovery'
    ]
}

comparison_df = pd.DataFrame(comparison_data)

print("\nNeural Network Comparison:")
for idx, row in comparison_df.iterrows():
    print(f"\n{row['Network Type']}:")
    print(f"  Best For: {row['Best For']}")
    print(f"  Key Strength: {row['Key Strength']}")
    print(f"  Main Weakness: {row['Main Weakness']}")
    print(f"  Typical Use Cases: {row['Typical Use Cases']}")

# Create a visual comparison
fig, ax = plt.subplots(figsize=(14, 8))

# Create a heatmap-like visualization
categories = ['Tabular Data', 'Image Data', 'Sequential Data', 'Graph Data', 'Interpretability']
networks = ['MLP', 'CNN', 'RNN', 'LSTM', 'GNN']

# Suitability scores (1-5 scale); illustrative, subjective ratings rather than measured results
scores = np.array([
    [5, 3, 3, 2, 4],  # MLP
    [3, 5, 2, 1, 3],  # CNN
    [2, 2, 4, 1, 3],  # RNN
    [2, 2, 5, 1, 2],  # LSTM
    [3, 2, 3, 5, 2]   # GNN
])

im = ax.imshow(scores, cmap='YlOrRd', aspect='auto')

# Add labels
ax.set_xticks(range(len(categories)))
ax.set_yticks(range(len(networks)))
ax.set_xticklabels(categories)
ax.set_yticklabels(networks)

# Add text annotations
for i in range(len(networks)):
    for j in range(len(categories)):
        ax.text(j, i, scores[i, j], ha='center', va='center',
                color='white' if scores[i, j] > 3 else 'black', fontweight='bold')

ax.set_title('Illustrative Neural Network Suitability Scores (1-5 Scale)', fontsize=14, fontweight='bold')
plt.colorbar(im, ax=ax, label='Suitability Score')
plt.tight_layout()
plt.show()

## 9. Summary

Congratulations! You've completed your introduction to neural networks and deep learning. Here's what you learned:

### Key Concepts Mastered:
1. **Neural Network Fundamentals**: Neurons, weights, biases, activation functions
2. **Multi-layer Perceptrons (MLPs)**: Fully connected networks for general tasks
3. **Convolutional Neural Networks (CNNs)**: Specialized for image processing
4. **Recurrent Neural Networks (RNNs)**: For sequential data processing
5. **Long Short-Term Memory (LSTM)**: Advanced RNNs for long-term dependencies
6. **Graph Neural Networks (GNNs)**: For graph-structured data
7. **TensorFlow/Keras**: Deep learning framework and high-level API

### Key Skills Acquired:
- Understanding different neural network architectures
- Building and training neural networks with TensorFlow/Keras
- Choosing appropriate architectures for different data types
- Understanding the strengths and weaknesses of each approach
- Implementing basic deep learning workflows
- Evaluating neural network performance

### When to Use Each Architecture:
- **MLP**: Tabular data, general classification/regression
- **CNN**: Images, spatial data, pattern recognition
- **RNN**: Sequential data, simple time series
- **LSTM**: Complex sequences, NLP, long-term dependencies
- **GNN**: Graph data, social networks, molecular structures

### Best Practices:
- Start with simpler architectures before moving to complex ones
- Normalize or scale your input data, fitting the scaler on the training set only
- Use appropriate activation functions for your task
- Monitor training to avoid overfitting
- Use dropout and regularization techniques
- Experiment with different optimizers and learning rates
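
The last point is worth a quick experiment. The sketch below runs plain gradient descent on a hypothetical one-parameter loss, f(w) = (w - 3)^2, with three learning rates. It is a toy illustration of why the learning rate matters, not a model of what the Adam optimizer does in the examples in this notebook.

```python
def gradient_descent(learning_rate, start=0.0, steps=25):
    """Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient  # step against the gradient
    return w

w_too_small = gradient_descent(learning_rate=0.01)  # still far from 3
w_reasonable = gradient_descent(learning_rate=0.1)  # close to 3
w_too_large = gradient_descent(learning_rate=1.1)   # overshoots and diverges

print(f"lr=0.01 -> w = {w_too_small:.3f}")
print(f"lr=0.1  -> w = {w_reasonable:.3f}")
print(f"lr=1.1  -> w = {w_too_large:.1f}")
```

Each update multiplies the distance to the minimum by (1 - 2 * learning_rate), so a rate that is too small converges slowly and a rate above 1.0 makes every step overshoot by more than the last.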

### Real-world Applications:
- **Computer Vision**: Image classification, object detection, medical imaging
- **Natural Language Processing**: Translation, sentiment analysis, chatbots
- **Time Series**: Stock prediction, weather forecasting, sensor data
- **Recommendation Systems**: Content filtering, collaborative filtering
- **Game Playing**: Chess, Go, video games
- **Scientific Research**: Drug discovery, protein folding, climate modeling

### Next Steps:
Next week, we'll dive deeper into modern AI, focusing on transformers, large language models, and their use in natural language processing.