# Tutorial 19: GNN for Network Traffic Prediction

In this tutorial, we will explore how to use Graph Neural Networks (GNNs) for network traffic prediction. This is a practical application of GNNs in the field of computer networks and telecommunications.

## Outline

1. Introduction: GNNs in Network Applications
2. Dataset Preparation: Simulating Network Traffic Data
3. Graph Construction: Creating PyG Data Objects
4. Model Implementation: GCN and GAT for Traffic Prediction
5. Training and Evaluation
6. Visualization of Results

## 1. Introduction: GNNs in Network Applications

### Why GNNs for Networks?

Computer networks are naturally represented as graphs:
- **Nodes**: Routers, switches, servers, or network devices
- **Edges**: Physical or logical connections between devices
- **Node features**: Device properties, current load, capacity
- **Edge features**: Bandwidth, latency, link utilization
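
As a minimal sketch of this mapping, the snippet below encodes a hypothetical three-device network as plain Python lists. The device names and the `to_coo` helper are illustrative assumptions, not part of the tutorial code. The two-row source/target layout mirrors the `edge_index` convention used by PyTorch Geometric.

```python
# Hypothetical three-device network: one core router linked to two edge routers.
toy_devices = ["core-0", "edge-0", "edge-1"]
toy_links = [("core-0", "edge-0"), ("core-0", "edge-1")]  # undirected physical links


def to_coo(devices, links):
    """Return [sources, targets], storing each undirected link in both directions."""
    index = {name: i for i, name in enumerate(devices)}
    sources, targets = [], []
    for a, b in links:
        sources += [index[a], index[b]]
        targets += [index[b], index[a]]
    return [sources, targets]


toy_edge_index = to_coo(toy_devices, toy_links)
print(toy_edge_index)  # [[0, 1, 0, 2], [1, 0, 2, 0]]
```

Storing both directions is what makes message passing symmetric: each device can receive information from every device it is linked to.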

### Applications of GNNs in Networking

1. **Traffic Prediction**: Forecasting network load and congestion
2. **Fault Detection**: Identifying anomalies and failures
3. **Network Optimization**: Routing optimization, resource allocation
4. **Quality of Service (QoS)**: Predicting latency, packet loss
5. **Security**: Intrusion detection, anomaly detection

In this tutorial, we'll focus on **traffic prediction**: predicting the traffic load at each network node from the network topology and historical traffic patterns.

In [None]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(f"PyTorch version: {torch.__version__}")

# Install PyG and its optional extensions, using wheels built for the installed PyTorch version
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-geometric

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, GATConv
from torch_geometric.utils import to_networkx
import matplotlib.pyplot as plt
import networkx as nx
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 2. Dataset Preparation: Simulating Network Traffic Data

We'll create a synthetic network traffic dataset that simulates a realistic network scenario. The dataset includes:
- A network topology (graph structure)
- Node features (device properties)
- Traffic measurements as labels

### Network Topology

We'll simulate a small ISP network with:
- Core routers (high capacity)
- Edge routers (medium capacity)
- Access nodes (connecting end users)

In [None]:
class NetworkTopologyGenerator:
    """
    Generates a synthetic network topology with realistic properties.
    The network has a hierarchical structure: Core -> Edge -> Access
    """
    
    def __init__(self, n_core=3, n_edge=6, n_access=12):
        self.n_core = n_core
        self.n_edge = n_edge
        self.n_access = n_access
        self.n_total = n_core + n_edge + n_access
        
    def generate(self):
        """
        Generate the network topology and return edge_index.
        """
        edges = []
        
        # Core nodes are fully connected (mesh topology)
        for i in range(self.n_core):
            for j in range(i + 1, self.n_core):
                edges.append([i, j])
                edges.append([j, i])  # Bidirectional
        
        # Each edge node connects to 2 core nodes
        for i in range(self.n_edge):
            edge_node = self.n_core + i
            # Connect to 2 random core nodes
            core_connections = np.random.choice(self.n_core, 2, replace=False)
            for core in core_connections:
                edges.append([edge_node, core])
                edges.append([core, edge_node])
        
        # Each access node connects to 1-2 edge nodes
        for i in range(self.n_access):
            access_node = self.n_core + self.n_edge + i
            # Connect to 1-2 edge nodes
            n_connections = np.random.randint(1, 3)
            edge_connections = np.random.choice(
                range(self.n_core, self.n_core + self.n_edge), 
                n_connections, 
                replace=False
            )
            for edge in edge_connections:
                edges.append([access_node, edge])
                edges.append([edge, access_node])
        
        edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
        return edge_index
    
    def get_node_types(self):
        """
        Return node types: 0=core, 1=edge, 2=access
        """
        types = [0] * self.n_core + [1] * self.n_edge + [2] * self.n_access
        return torch.tensor(types, dtype=torch.long)

In [None]:
# Generate the network topology
topology_gen = NetworkTopologyGenerator(n_core=3, n_edge=6, n_access=12)
edge_index = topology_gen.generate()
node_types = topology_gen.get_node_types()

print(f"Total nodes: {topology_gen.n_total}")
print(f"  - Core nodes: {topology_gen.n_core}")
print(f"  - Edge nodes: {topology_gen.n_edge}")
print(f"  - Access nodes: {topology_gen.n_access}")
print(f"Total edges: {edge_index.shape[1]}")
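
The printed edge count can be sanity-checked by hand. The sketch below derives the possible range of directed edges from the construction rules of the generator. The helper name `directed_edge_bounds` is illustrative, and the calculation assumes no duplicate links, which holds because connections are sampled without replacement.

```python
def directed_edge_bounds(n_core=3, n_edge=6, n_access=12):
    """Lower and upper bound on edge_index.shape[1] for the hierarchical generator."""
    core_links = n_core * (n_core - 1) // 2  # full mesh among core routers
    edge_links = 2 * n_edge                  # every edge router has 2 core uplinks
    access_min = 1 * n_access                # every access node has 1 uplink ...
    access_max = 2 * n_access                # ... or 2 uplinks
    # Each undirected link is stored twice (once per direction).
    low = 2 * (core_links + edge_links + access_min)
    high = 2 * (core_links + edge_links + access_max)
    return low, high


print(directed_edge_bounds())  # (54, 78)
```

With the default sizes, any value of `edge_index.shape[1]` outside this range would point to a bug in the generator.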

### Node Features

For each node, we'll create features that represent:
1. **Node type** (one-hot encoded): Core, Edge, or Access
2. **Capacity**: Maximum bandwidth the node can handle
3. **Degree centrality**: Number of connections, normalized by the maximum degree in the network
4. **Historical average load**: Past traffic patterns
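
To make the layout of a feature row concrete, here is a small hand-built example in plain Python. The toy graph, the fixed capacity values, and the constant load of 0.5 are assumptions chosen for readability; the tutorial's own function adds random noise instead.

```python
# Toy directed edge list (both directions stored), same layout as edge_index.
toy_sources = [0, 1, 0, 2]
toy_targets = [1, 0, 2, 0]
toy_types = [0, 1, 1]  # 0=core, 1=edge, 2=access

n_toy_nodes = len(toy_types)
out_degree = [toy_sources.count(i) for i in range(n_toy_nodes)]
max_degree = max(out_degree)
norm_degree = [d / max_degree for d in out_degree]


def one_hot(node_type, n_types=3):
    """Return a list with a single 1 at position node_type."""
    return [1 if k == node_type else 0 for k in range(n_types)]


# Capacity and historical load are fixed illustrative values here (no noise).
toy_features = [
    one_hot(t) + [1.0 if t == 0 else 0.4, norm_degree[i], 0.5]
    for i, t in enumerate(toy_types)
]
for row in toy_features:
    print(row)
```

Each row has six entries: three for the node type, then capacity, normalized degree, and historical load.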

In [None]:
def create_node_features(n_total, node_types, edge_index):
    """
    Create node features for the network.
    
    Features:
    - One-hot encoding of node type (3 features)
    - Capacity (1 feature)
    - Degree centrality (1 feature)
    - Historical average load (1 feature)
    """
    features = []
    
    # Calculate degree for each node
    degrees = torch.zeros(n_total)
    for i in range(edge_index.shape[1]):
        degrees[edge_index[0, i]] += 1
    degrees = degrees / degrees.max()  # Normalize
    
    for i in range(n_total):
        node_type = node_types[i].item()
        
        # One-hot encoding of node type
        type_onehot = [0, 0, 0]
        type_onehot[node_type] = 1
        
        # Capacity based on node type
        # Core: 100 Gbps, Edge: 40 Gbps, Access: 10 Gbps (normalized)
        capacity_map = {0: 1.0, 1: 0.4, 2: 0.1}
        capacity = capacity_map[node_type]
        
        # Add some noise to capacity
        capacity += np.random.normal(0, 0.05)
        capacity = max(0.05, capacity)  # Ensure positive
        
        # Historical average load (simulated)
        # Core nodes typically have higher utilization
        base_load = {0: 0.7, 1: 0.5, 2: 0.3}[node_type]
        hist_load = base_load + np.random.normal(0, 0.1)
        hist_load = np.clip(hist_load, 0, 1)
        
        feature = type_onehot + [capacity, degrees[i].item(), hist_load]
        features.append(feature)
    
    return torch.tensor(features, dtype=torch.float32)

# Create node features
node_features = create_node_features(topology_gen.n_total, node_types, edge_index)
print(f"Node features shape: {node_features.shape}")
print(f"Feature dimensions: {node_features.shape[1]} (3 type + 1 capacity + 1 degree + 1 hist_load)")

### Traffic Labels (Target Variable)

We'll simulate traffic load as our prediction target. The traffic depends on:
- Node capacity and historical load (both are higher for core nodes)
- Connectivity (more connections = more traffic)
- Time of day (a sinusoidal cycle that repeats every 24 time steps), plus Gaussian noise

In [None]:
def generate_traffic_labels(node_features, node_types, edge_index, n_samples=100):
    """
    Generate traffic labels (target variable) for multiple time steps.
    Traffic is influenced by node features and a daily time factor.
    """
    n_nodes = node_features.shape[0]
    all_features = []
    all_labels = []

    # Load observed at the previous time step (starts from the static historical load)
    prev_load = node_features[:, 5].clone()

    for t in range(n_samples):
        # Time-varying factor (simulates daily patterns)
        time_factor = 0.5 + 0.5 * np.sin(2 * np.pi * t / 24)  # 24-hour cycle

        labels = []
        features_t = node_features.clone()
        # Use the previous step's load as the input feature, so the
        # current label never leaks into the model input
        features_t[:, 5] = prev_load

        for i in range(n_nodes):
            capacity = node_features[i, 3].item()
            degree = node_features[i, 4].item()
            hist_load = node_features[i, 5].item()

            # Traffic formula: combination of features + time factor + noise
            base_traffic = 0.3 * capacity + 0.3 * degree + 0.2 * hist_load
            traffic = base_traffic * time_factor
            traffic += np.random.normal(0, 0.05)  # Add noise
            traffic = np.clip(traffic, 0, 1)  # Keep within [0, 1]

            labels.append(traffic)

        labels_t = torch.tensor(labels, dtype=torch.float32)
        all_features.append(features_t)
        all_labels.append(labels_t)
        prev_load = labels_t  # Becomes the load feature of the next step

    return all_features, all_labels

# Generate traffic data for 100 time steps
all_features, all_labels = generate_traffic_labels(
    node_features, node_types, edge_index, n_samples=100
)

print(f"Generated {len(all_labels)} time steps of traffic data")
print(f"Each time step has {len(all_labels[0])} node traffic values")
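
To see how the daily cycle shapes the target, the sketch below evaluates the traffic formula from `generate_traffic_labels` with the noise term removed. The helper name `noise_free_traffic` and the idealized core-router values are assumptions for illustration.

```python
import math


def noise_free_traffic(capacity, degree, hist_load, t):
    """Traffic formula from generate_traffic_labels with the noise term removed."""
    time_factor = 0.5 + 0.5 * math.sin(2 * math.pi * t / 24)
    base_traffic = 0.3 * capacity + 0.3 * degree + 0.2 * hist_load
    return min(max(base_traffic * time_factor, 0.0), 1.0)


# Idealized core router: full capacity, highest degree, 0.7 historical load.
for t in (0, 6, 12, 18):
    print(t, round(noise_free_traffic(1.0, 1.0, 0.7, t), 3))
```

For this node the load peaks at about 0.74 at `t = 6` and falls to roughly 0 at `t = 18`, so even the busiest node never reaches the upper clipping bound of 1.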

## 3. Graph Construction: Creating PyG Data Objects

Now we'll create PyTorch Geometric `Data` objects from our network data.

In [None]:
def create_pyg_data(features, labels, edge_index):
    """
    Create a PyTorch Geometric Data object.
    """
    data = Data(
        x=features,
        edge_index=edge_index,
        y=labels.unsqueeze(1)  # Shape: [n_nodes, 1]
    )
    return data

# Create dataset
dataset = [create_pyg_data(f, l, edge_index) for f, l in zip(all_features, all_labels)]

# Chronological split into train/val/test (70/15/15), so later time steps are never used for training
train_data = dataset[:70]
val_data = dataset[70:85]
test_data = dataset[85:]

print(f"Training samples: {len(train_data)}")
print(f"Validation samples: {len(val_data)}")
print(f"Test samples: {len(test_data)}")
print(f"\nSample data object: {dataset[0]}")
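
The slicing above keeps the samples in time order, which is usually preferable to a random shuffle for time series because the test set then contains only time steps that follow the training period. A minimal sketch of the same split as a reusable helper follows; the name `chronological_split` and its default fractions are assumptions.

```python
def chronological_split(n_samples, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) index ranges that preserve time order."""
    train_end = round(n_samples * train_frac)
    val_end = train_end + round(n_samples * val_frac)
    return range(0, train_end), range(train_end, val_end), range(val_end, n_samples)


train_idx, val_idx, test_idx = chronological_split(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```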

### Visualize the Network Topology

In [None]:
def visualize_network(data, node_types, title="Network Topology"):
    """
    Visualize the network graph with node types colored differently.
    """
    G = to_networkx(data, to_undirected=True)
    
    # Color mapping for node types
    color_map = {0: '#FF6B6B', 1: '#4ECDC4', 2: '#95E1D3'}  # Core=red, Edge=teal, Access=light green
    node_colors = [color_map[node_types[i].item()] for i in range(len(node_types))]
    
    # Size mapping for node types (core nodes are larger)
    size_map = {0: 800, 1: 500, 2: 300}
    node_sizes = [size_map[node_types[i].item()] for i in range(len(node_types))]
    
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G, seed=42, k=2)
    
    nx.draw(G, pos, 
            node_color=node_colors, 
            node_size=node_sizes,
            with_labels=True,
            font_size=10,
            font_weight='bold',
            edge_color='gray',
            alpha=0.9)
    
    # Add legend
    legend_elements = [
        plt.scatter([], [], c='#FF6B6B', s=200, label='Core Router'),
        plt.scatter([], [], c='#4ECDC4', s=150, label='Edge Router'),
        plt.scatter([], [], c='#95E1D3', s=100, label='Access Node')
    ]
    plt.legend(handles=legend_elements, loc='upper left')
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Visualize the network
visualize_network(dataset[0], node_types, "ISP Network Topology")

## 4. Model Implementation: GCN and GAT for Traffic Prediction

We'll implement two models:
1. **GCN (Graph Convolutional Network)**: Simple and effective baseline
2. **GAT (Graph Attention Network)**: Uses attention mechanism to weight neighbor importance

In [None]:
class GCNTrafficPredictor(nn.Module):
    """
    Graph Convolutional Network for traffic prediction.
    
    Architecture:
    - 2 GCN layers with ReLU activation
    - Dropout for regularization
    - Final linear layer for regression
    """
    
    def __init__(self, in_channels, hidden_channels, out_channels=1, dropout=0.3):
        super(GCNTrafficPredictor, self).__init__()
        
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.lin = nn.Linear(hidden_channels, out_channels)
        self.dropout = dropout
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        # First GCN layer
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Second GCN layer
        x = self.conv2(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Output layer
        x = self.lin(x)
        
        return x
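
To illustrate what a GCN layer aggregates, the sketch below reproduces the symmetric normalization that `GCNConv` applies with its default settings: self-loops are added and each message is weighted by 1/sqrt(d_i * d_j). The learnable weight matrix and bias are left out, and the toy path graph and feature values are assumptions.

```python
import math

# Toy path graph 0 - 1 - 2 with one scalar feature per node.
toy_neighbors = {0: [1], 1: [0, 2], 2: [1]}
toy_x = {0: 1.0, 1: 2.0, 2: 3.0}

# Degree after adding a self-loop to every node.
toy_deg = {i: len(nbrs) + 1 for i, nbrs in toy_neighbors.items()}


def gcn_aggregate(i):
    """Symmetric-normalized sum over node i and its neighbors."""
    total = 0.0
    for j in toy_neighbors[i] + [i]:
        total += toy_x[j] / math.sqrt(toy_deg[i] * toy_deg[j])
    return total


for i in toy_neighbors:
    print(i, round(gcn_aggregate(i), 4))
```

The weights depend only on node degrees, so they are fixed by the topology and are not learned.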

In [None]:
class GATTrafficPredictor(nn.Module):
    """
    Graph Attention Network for traffic prediction.
    
    Architecture:
    - 2 GAT layers with multi-head attention
    - ELU activation
    - Dropout for regularization
    - Final linear layer for regression
    """
    
    def __init__(self, in_channels, hidden_channels, out_channels=1, heads=4, dropout=0.3):
        super(GATTrafficPredictor, self).__init__()
        
        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads, dropout=dropout)
        self.conv2 = GATConv(hidden_channels * heads, hidden_channels, heads=1, dropout=dropout)
        self.lin = nn.Linear(hidden_channels, out_channels)
        self.dropout = dropout
        
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        
        # First GAT layer
        x = self.conv1(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Second GAT layer
        x = self.conv2(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Output layer
        x = self.lin(x)
        
        return x
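
In contrast to fixed degree-based weights, a GAT layer scores each neighbor and normalizes the scores with a softmax. The sketch below shows this for a single attention head on scalar features. The values `a_src` and `a_dst` are hypothetical stand-ins for the learned attention vector, and the feature projection is assumed to have been applied already.

```python
import math


def leaky_relu(v, slope=0.2):
    """LeakyReLU with the negative slope commonly used in GAT."""
    return v if v > 0 else slope * v


def attention_weights(h_i, h_neighbors, a_src=0.5, a_dst=1.0):
    """Softmax-normalized attention of node i over its neighbors.

    h_i and h_neighbors hold already-projected scalar features; a_src and
    a_dst are hypothetical values standing in for the learned attention vector.
    """
    scores = [leaky_relu(a_src * h_i + a_dst * h_j) for h_j in h_neighbors]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


alphas = attention_weights(h_i=0.2, h_neighbors=[0.1, 0.9, -0.4])
print([round(a, 3) for a in alphas])
```

The weights sum to 1, and the neighbor with the largest score receives the largest share of the aggregated message.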

In [None]:
# Initialize models
in_channels = node_features.shape[1]  # 6 features
hidden_channels = 32

gcn_model = GCNTrafficPredictor(in_channels, hidden_channels)
gat_model = GATTrafficPredictor(in_channels, hidden_channels)

print("GCN Model:")
print(gcn_model)
print(f"\nTotal parameters: {sum(p.numel() for p in gcn_model.parameters())}")

print("\n" + "="*50 + "\n")

print("GAT Model:")
print(gat_model)
print(f"\nTotal parameters: {sum(p.numel() for p in gat_model.parameters())}")

## 5. Training and Evaluation

We'll train both models and compare their performance.

In [None]:
def train_model(model, train_data, val_data, epochs=200, lr=0.01):
    """
    Train the model and return training history.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)
    criterion = nn.MSELoss()
    
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training
        model.train()
        total_train_loss = 0
        
        for data in train_data:
            optimizer.zero_grad()
            out = model(data)
            loss = criterion(out, data.y)
            loss.backward()
            optimizer.step()
            total_train_loss += loss.item()
        
        avg_train_loss = total_train_loss / len(train_data)
        train_losses.append(avg_train_loss)
        
        # Validation
        model.eval()
        total_val_loss = 0
        
        with torch.no_grad():
            for data in val_data:
                out = model(data)
                loss = criterion(out, data.y)
                total_val_loss += loss.item()
        
        avg_val_loss = total_val_loss / len(val_data)
        val_losses.append(avg_val_loss)
        
        if (epoch + 1) % 50 == 0:
            print(f"Epoch {epoch+1}/{epochs} - Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}")
    
    return train_losses, val_losses

In [None]:
print("Training GCN Model...")
print("=" * 50)
gcn_train_losses, gcn_val_losses = train_model(gcn_model, train_data, val_data, epochs=200)

In [None]:
print("\nTraining GAT Model...")
print("=" * 50)
gat_train_losses, gat_val_losses = train_model(gat_model, train_data, val_data, epochs=200)
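
The returned loss histories can also be used to find the epoch at which each model generalized best. A minimal sketch, assuming a hypothetical helper named `best_epoch` and a made-up loss history:

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    best_idx = min(range(len(val_losses)), key=lambda k: val_losses[k])
    return best_idx + 1, val_losses[best_idx]


# Hypothetical loss history, for illustration only.
toy_val_losses = [0.090, 0.041, 0.028, 0.031, 0.035]
print(best_epoch(toy_val_losses))  # (3, 0.028)
```

Calling `best_epoch(gcn_val_losses)` or `best_epoch(gat_val_losses)` would show whether validation loss bottomed out well before the final epoch, which can be a sign of overfitting.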

### Evaluation on Test Set

In [None]:
def evaluate_model(model, test_data):
    """
    Evaluate the model on test data and return metrics.
    """
    model.eval()
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for data in test_data:
            out = model(data)
            all_preds.append(out.numpy())
            all_labels.append(data.y.numpy())
    
    preds = np.concatenate(all_preds).flatten()
    labels = np.concatenate(all_labels).flatten()
    
    mse = mean_squared_error(labels, preds)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(labels, preds)
    r2 = r2_score(labels, preds)
    
    return {
        'MSE': mse,
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'predictions': preds,
        'labels': labels
    }

# Evaluate both models
gcn_results = evaluate_model(gcn_model, test_data)
gat_results = evaluate_model(gat_model, test_data)

print("Test Set Evaluation Results")
print("=" * 50)
print(f"\n{'Metric':<10} {'GCN':>12} {'GAT':>12}")
print("-" * 35)
print(f"{'MSE':<10} {gcn_results['MSE']:>12.6f} {gat_results['MSE']:>12.6f}")
print(f"{'RMSE':<10} {gcn_results['RMSE']:>12.6f} {gat_results['RMSE']:>12.6f}")
print(f"{'MAE':<10} {gcn_results['MAE']:>12.6f} {gat_results['MAE']:>12.6f}")
print(f"{'R2':<10} {gcn_results['R2']:>12.6f} {gat_results['R2']:>12.6f}")
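
For reference, the four reported metrics can be computed by hand. The sketch below is a plain-Python version using the standard definitions; the toy values are assumptions, and the tutorial itself relies on scikit-learn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # undefined when all targets are identical
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


toy_true = [0.2, 0.4, 0.6, 0.8]
toy_pred = [0.25, 0.35, 0.65, 0.75]
print(regression_metrics(toy_true, toy_pred))
```

Every toy prediction is off by 0.05, so RMSE and MAE are both about 0.05 and R2 is about 0.95. R2 compares the model with always predicting the mean, so a value near 0 means the model does no better than that constant baseline.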

## 6. Visualization of Results

In [None]:
# Plot training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# GCN training curves
axes[0].plot(gcn_train_losses, label='Train Loss', color='blue')
axes[0].plot(gcn_val_losses, label='Val Loss', color='orange')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('MSE Loss')
axes[0].set_title('GCN Training Curves')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# GAT training curves
axes[1].plot(gat_train_losses, label='Train Loss', color='blue')
axes[1].plot(gat_val_losses, label='Val Loss', color='orange')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('MSE Loss')
axes[1].set_title('GAT Training Curves')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Prediction vs Actual scatter plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# GCN predictions
axes[0].scatter(gcn_results['labels'], gcn_results['predictions'], alpha=0.5, c='blue')
axes[0].plot([0, 1], [0, 1], 'r--', label='Perfect Prediction')
axes[0].set_xlabel('Actual Traffic')
axes[0].set_ylabel('Predicted Traffic')
axes[0].set_title(f'GCN: Predicted vs Actual (R² = {gcn_results["R2"]:.4f})')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# GAT predictions
axes[1].scatter(gat_results['labels'], gat_results['predictions'], alpha=0.5, c='green')
axes[1].plot([0, 1], [0, 1], 'r--', label='Perfect Prediction')
axes[1].set_xlabel('Actual Traffic')
axes[1].set_ylabel('Predicted Traffic')
axes[1].set_title(f'GAT: Predicted vs Actual (R² = {gat_results["R2"]:.4f})')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Visualize predictions on the network graph
def visualize_predictions(data, predictions, node_types, title="Network Traffic Predictions"):
    """
    Visualize traffic predictions on the network graph.
    Node color intensity represents predicted traffic load.
    """
    G = to_networkx(data, to_undirected=True)
    
    # Use predictions as node colors (traffic intensity)
    node_colors = predictions.flatten()
    
    # Size based on node type
    size_map = {0: 800, 1: 500, 2: 300}
    node_sizes = [size_map[node_types[i].item()] for i in range(len(node_types))]
    
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G, seed=42, k=2)
    
    nodes = nx.draw_networkx_nodes(G, pos, 
                                   node_color=node_colors, 
                                   node_size=node_sizes,
                                   cmap=plt.cm.RdYlGn_r,
                                   vmin=0, vmax=1)
    nx.draw_networkx_edges(G, pos, edge_color='gray', alpha=0.5)
    nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')
    
    plt.colorbar(nodes, label='Traffic Load (0=Low, 1=High)')
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Visualize GAT predictions for the first test sample
gat_model.eval()
with torch.no_grad():
    test_pred = gat_model(test_data[0]).numpy()

visualize_predictions(test_data[0], test_pred, node_types, 
                      "GAT Traffic Predictions on Network")

## Summary

In this tutorial, we demonstrated how to use Graph Neural Networks for network traffic prediction:

### Key Takeaways

1. **Graph Representation**: Network topology naturally maps to graphs, with nodes representing network devices and edges representing connections.

2. **Node Features**: We used device properties (type, capacity, connectivity, historical load) as node features that influence traffic patterns.

3. **GNN Models**: Both GCN and GAT can effectively capture the relational structure of networks:
   - **GCN** uses fixed aggregation weights based on node degrees
   - **GAT** learns attention weights to focus on more important neighbors

4. **Applications**: This approach can be extended to:
   - Real network datasets (e.g., Abilene, GEANT)
   - Time-series forecasting with temporal GNNs
   - Anomaly detection for network security
   - QoS prediction in SDN/NFV environments

### Further Reading

- [RouteNet: Learning to Route](https://arxiv.org/abs/1901.08113) - GNN for network modeling
- [Graph Neural Networks for Communication Networks](https://ieeexplore.ieee.org/document/9269987) - Survey paper
- [PyG Documentation](https://pytorch-geometric.readthedocs.io/) - Official PyTorch Geometric docs

## Exercise

Try the following modifications to extend this tutorial:

1. **Different topology**: Generate a random network with one of the generators in `networkx.generators.random_graphs` (for example `nx.erdos_renyi_graph`) and compare GNN performance

2. **Edge features**: Add edge attributes (bandwidth, latency) as `edge_attr` and pass them to `GATConv` by setting its `edge_dim` argument

3. **Multi-step prediction**: Modify the model to predict traffic for multiple future time steps

4. **Real dataset**: Download and use a real network traffic dataset like the Abilene network
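
As a possible starting point for the multi-step exercise, the sketch below pairs each time step with the next few snapshots as its target. The helper name `make_multistep_pairs` and the toy series are assumptions; it works on plain lists and would need adapting to PyG `Data` objects.

```python
def make_multistep_pairs(series, horizon=3):
    """Pair each time step with the next `horizon` snapshots as its target.

    `series` is a list of per-time-step snapshots (for example, one list of
    node loads per step). Steps without a full horizon ahead are dropped.
    """
    pairs = []
    for t in range(len(series) - horizon):
        inputs = series[t]
        targets = series[t + 1 : t + 1 + horizon]
        pairs.append((inputs, targets))
    return pairs


# Toy series: 6 time steps, 2 nodes each.
toy_series = [[0.1 * t, 0.2 * t] for t in range(6)]
toy_pairs = make_multistep_pairs(toy_series, horizon=3)
print(len(toy_pairs))  # 3
```

One option for the model side is to set `out_channels` equal to the horizon, so that the final linear layer emits one value per future step for every node.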