# Lightweight Intent-Aware Network Slicing for 5G

**Team:** Hamsavardhini S (23PD13)  

## Project Overview
5G networks face significant challenges in efficiently allocating resources and managing slice admission due to dynamic traffic patterns, heterogeneous service requirements (eMBB, URLLC, mIoT), and strict Quality of Service (QoS) and Service Level Agreement (SLA) constraints. Traditional slicing mechanisms often lack real-time adaptability and generalization across different network topologies.  

This project presents a **lightweight, AI-driven network slicing framework** that enables **intent-aware, real-time slicing decisions**. The framework integrates multiple AI components:

1. **Natural Language Processing (NLP)** – Extracts user intent and QoS requirements from textual service requests.  
2. **Graph Neural Networks (GNN)** – Predicts slice-level KPIs such as latency, jitter, and packet loss.  
3. **Reinforcement Learning (RL)** – Performs adaptive resource allocation and slice admission control.  
4. **Explainable AI (XAI)** – Provides transparency for GNN predictions and RL decisions.  

## Key Objectives
- Translate natural language service requests into structured slice requirements.
- Simulate network topologies, traffic patterns, and dynamic slice states.
- Predict slice KPIs accurately for informed resource allocation.
- Enable real-time slice admission and bandwidth allocation using RL agents.
- Ensure SLA compliance and provide interpretable explanations for network decisions.

## Workflow Summary
1. **Module 1: Intent Parser (NLP Layer)** – Converts user requests into structured JSON slice intents.  
2. **Module 2: Topology & Traffic Simulator** – Builds heterogeneous network graphs and simulates dynamic traffic.  
3. **Module 3: KPI Predictor (GNN)** – Predicts slice KPIs using a lightweight graph neural network.  
4. **Module 4: Slice Admission & Resource Allocation (RL Agent)** – Makes real-time decisions on slice acceptance and resource assignment.  
5. **Module 5: Slice Lifecycle Manager** – Monitors active slices, triggers scaling, and logs performance metrics.  
6. **Module 6: Explainability Layer** – Provides interpretable insights into GNN predictions and RL decisions.

## Evaluation Metrics
- SLA Satisfaction & Delay Satisfaction Ratio  
- Bandwidth Utilization  
- Fairness Index (Jain’s)  
- Admission Rate & Policy Efficiency Score  
- Detection of Under-/Over-Provisioning  

This notebook documents the full implementation workflow of the framework, along with simulation results, KPI predictions, and evaluation outcomes.

*Install and import Dependencies*

In [None]:
!pip install --upgrade --force-reinstall "numpy==1.26.4"

In [None]:
!pip install -q torch torchvision torchaudio torch-geometric dgl
!pip install -q scikit-learn networkx
!pip install -q transformers sentencepiece captum shap spacy
!pip install -q gymnasium stable-baselines3
!pip install -q matplotlib seaborn
!python -m spacy download en_core_web_sm

In [None]:
import re, json, math, random, time
import numpy as np
import torch, shap, spacy
import torch.nn as nn
import torch.nn.functional as F
import networkx as nx
import gymnasium as gym

from collections import defaultdict
from transformers import pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from torch_geometric.data import HeteroData
from torch_geometric.nn import HeteroConv, GATConv
from captum.attr import IntegratedGradients
from stable_baselines3 import PPO

## Module 1: Intent Parser (LLM-Based)

**Purpose:**  
Convert a natural language service request into a structured JSON object representing 5G slice requirements. This allows the slicing framework to interpret user intent and QoS needs in a machine-readable format.

**Input:**  
- Natural language request string, e.g.,  
  _"low-latency video streaming for 1000 users with 10ms delay and 5ms jitter"_

**Process:**  
1. Use a Large Language Model (LLM) to extract key slice parameters.  
2. Extract the following fields: `slice_type` (eMBB, URLLC, mIoT), `latency` (ms), `jitter` (ms), `bandwidth` (Mbps), `priority` (low, medium, high), and `user_count`.  
3. Apply fallback rules if the LLM output is malformed:  
   - Detect keywords (e.g., "video" → eMBB, "iot" → mIoT, "latency" → URLLC).  
   - Extract numeric values from text for latency, jitter, and user count.  
   - Assign default bandwidth and priority based on heuristics.

**Output:**  
Structured JSON object representing slice intent. Example:  
```json
{
  "slice_type": "eMBB",
  "latency": 10,
  "jitter": 5,
  "bandwidth": 100,
  "priority": "high",
  "user_count": 1000
}
```

In [None]:
llm_parser = pipeline("text2text-generation", model="google/flan-t5-small", max_new_tokens=128)

def parse_intent_llm(request_text):
    prompt = f"""
      You are an expert intent parser for 5G slicing.
      Extract the following fields and return only a JSON object:

      - slice_type (eMBB, URLLC, mIoT)
      - latency (in ms)
      - jitter (in ms)
      - bandwidth (in Mbps)
      - priority (low, medium, high)
      - user_count (number of users)

      Example:
      Request: "low-latency video streaming for 1000 users with 10ms delay and 5ms jitter"
      Output: {{"slice_type": "eMBB", "latency": 10, "jitter": 5, "bandwidth": 100, "priority": "high", "user_count": 1000}}

      Now parse:
      Request: "{request_text}"
      Output:
      """
    response = llm_parser(prompt)[0]["generated_text"]

    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        json_str = match.group(0)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            pass

    # Rule-based fallback, used when the LLM output contains no valid JSON object.
    t = request_text.lower()

    def first_int(pattern, default):
        # Return the first captured integer, or the default when the pattern does not match.
        m = re.search(pattern, t)
        return int(m.group(1)) if m else default

    intent = {}
    if "video" in t: intent["slice_type"] = "eMBB"
    elif "iot" in t: intent["slice_type"] = "mIoT"
    else: intent["slice_type"] = "URLLC" if "latency" in t else "eMBB"
    intent["latency"] = first_int(r"(\d+)\s*ms delay", 50)
    intent["jitter"] = first_int(r"(\d+)\s*ms jitter", 5)
    intent["user_count"] = first_int(r"(\d+)\s*users", 100)
    intent["bandwidth"] = 100  # default; the fallback does not extract bandwidth
    intent["priority"] = "high" if "low-latency" in t else "medium"

    return intent
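
A small model such as `flan-t5-small` may return well-formed JSON whose values are still unusable, for example `"10ms"` as a string or an unknown slice type. The sketch below shows one way to coerce and default such output before it reaches Module 2. `normalize_intent`, `_to_number`, and the constant names are hypothetical helpers, not part of the notebook, and the defaults simply mirror the rule-based fallback above.

```python
import re

ALLOWED_TYPES = {"embb": "eMBB", "urllc": "URLLC", "miot": "mIoT"}
ALLOWED_PRIORITIES = {"low", "medium", "high"}
# Assumption: defaults copied from the rule-based fallback in parse_intent_llm.
DEFAULTS = {"slice_type": "eMBB", "latency": 50, "jitter": 5,
            "bandwidth": 100, "priority": "medium", "user_count": 100}

def _to_number(value, default):
    """Coerce ints, floats, or strings such as '10ms' to a number."""
    if isinstance(value, bool):
        return default
    if isinstance(value, (int, float)):
        return value
    m = re.search(r"\d+(?:\.\d+)?", str(value))
    if not m:
        return default
    num = float(m.group(0))
    return int(num) if num.is_integer() else num

def normalize_intent(raw):
    """Hypothetical helper: return a complete, type-checked intent dict."""
    raw = raw if isinstance(raw, dict) else {}
    intent = {}
    slice_type = str(raw.get("slice_type", "")).strip().lower()
    intent["slice_type"] = ALLOWED_TYPES.get(slice_type, DEFAULTS["slice_type"])
    priority = str(raw.get("priority", "")).strip().lower()
    intent["priority"] = priority if priority in ALLOWED_PRIORITIES else DEFAULTS["priority"]
    for key in ("latency", "jitter", "bandwidth", "user_count"):
        intent[key] = _to_number(raw.get(key), DEFAULTS[key])
    return intent

print(normalize_intent({"slice_type": "urllc", "latency": "10ms", "priority": "HIGH"}))
```

Missing or unparseable fields fall back to the defaults, so downstream code can index the result without `KeyError` checks.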

## Module 2: Topology, Traffic Simulator & Resource State Generator

**Purpose:**  
Simulate a 5G network environment by generating network topologies, traffic flows, and resource states. Prepares heterogeneous graph representations and KPI metrics for downstream GNN-based predictions and RL-based slice management.

**Input:**  
- Structured intent JSON from Module 1  
- Number of network nodes (optional)  
- Random seed for reproducibility (optional)  
- Routing strategy (e.g., shortest path, ECMP)

**Process:**  
1. **Topology Generation:**  
   - Create synthetic network topologies using a directed multigraph.  
   - Assign routers as nodes and links as edges with attributes like bandwidth, base latency, buffer size, and ports.  
   - Optionally load a real-world network graph.

2. **Traffic Derivation:**  
   - Generate flows for each slice based on user count and bandwidth in the intent JSON.  
   - Simulate multiple flows per slice with time-varying bandwidth to capture burstiness.  
   - Assign slice-specific parameters (e.g., buffer size, service rate factors) based on slice type (eMBB, URLLC, mIoT).

3. **Routing & KPI Calculation:**  
   - Compute routing paths using shortest-path or ECMP algorithms.  
   - Calculate per-flow metrics:  
     - **Delay:** sum over the path of base link latency plus queueing delay (M/M/1 approximation).  
     - **Jitter:** standard deviation of the path delay across time steps.  
     - **Packet loss:** percentage of lost packets based on service rate vs arrival rate.  
     - **Delta:** target latency minus achieved path delay (positive values indicate slack, negative values a missed target).  
   - Compile flow-level metrics including priority, slice type, and user count.

4. **Heterogeneous Graph Construction:**  
   - Build a graph with node types: `link`, `queue`, `flow`.  
   - Connect nodes via edges representing dependencies: `link->flow`, `queue->flow`, and `flow->flow` (shared paths).  
   - Normalize node features for GNN input (e.g., Min-Max or Standard scaling).  
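
As a worked example of the delay and jitter calculation in step 3, the sketch below evaluates the M/M/1 formula for a single hop. The function name `mm1_sojourn_ms`, the 2 ms base latency, and the bandwidth samples are illustrative assumptions; the 1200-byte packet size matches the simulator's default.

```python
import statistics

def mm1_sojourn_ms(link_mbps, flow_mbps, packet_bytes=1200):
    """Mean M/M/1 time in system, 1 / (mu - lambda), in ms; inf if overloaded."""
    mu = link_mbps * 1e6 / 8 / packet_bytes   # service rate, packets/s
    lam = flow_mbps * 1e6 / 8 / packet_bytes  # arrival rate, packets/s
    if lam >= mu:
        return float("inf")
    return 1000.0 / (mu - lam)

base_latency_ms = 2.0  # illustrative propagation latency of the hop

# A 50 Mbps flow on a 100 Mbps link.
print(round(base_latency_ms + mm1_sojourn_ms(100, 50), 3))  # 2.192

# Jitter: population standard deviation of the delay over time-varying bandwidth.
samples_mbps = [40, 50, 60]
delays = [base_latency_ms + mm1_sojourn_ms(100, bw) for bw in samples_mbps]
jitter_ms = statistics.pstdev(delays)
print(jitter_ms)
```

Because the queueing term grows as the arrival rate approaches the service rate, the same absolute swing in bandwidth produces more jitter on a heavily loaded link than on a lightly loaded one.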

**Output:**  
- Network topology (graph) with node and link attributes  
- Routing paths for each flow  
- Flow-level KPI metrics (delay, jitter, loss, delta, bandwidth, priority, user count, slice type)  
- Heterogeneous graph (`pyg_graph`) ready for GNN input

**Remarks:**  
- Captures realistic network behavior including dynamic traffic, burstiness, and queueing effects.  
- Output serves as the input for **Module 3: KPI Predictor (GNN)**.  
- Supports multiple routing strategies and slice types for flexible simulation.  
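
To make the traffic derivation concrete, the following sketch reproduces how one intent is split into flows: one flow per 100 users, an equal share of the requested bandwidth, and a uniform perturbation of ±`burst_factor` per time step. `split_into_flows` is a hypothetical stand-alone helper written for illustration; the seeded `random.Random` instance is an assumption added for reproducibility.

```python
import math
import random

def split_into_flows(user_count, total_bw_mbps, burst_factor=0.3, timesteps=5, seed=0):
    """Return one list of per-timestep bandwidths (Mbps) for each derived flow."""
    rng = random.Random(seed)
    flows_count = max(1, math.ceil(user_count / 100))
    per_flow_bw = total_bw_mbps / flows_count
    return [
        [per_flow_bw * (1 + burst_factor * rng.uniform(-1, 1)) for _ in range(timesteps)]
        for _ in range(flows_count)
    ]

flows = split_into_flows(user_count=1000, total_bw_mbps=100)
print(len(flows), len(flows[0]))  # 10 5
```

With 1000 users and 100 Mbps, each of the 10 flows fluctuates between 7 and 13 Mbps around its 10 Mbps share.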


In [None]:
def generate_topology(num_nodes=20, seed=42, real_graph=None):
    random.seed(seed); np.random.seed(seed)
    if real_graph is not None:
        G = real_graph.copy()
    else:
        G = nx.MultiDiGraph()
        base = nx.erdos_renyi_graph(n=num_nodes, p=0.15, seed=seed)
        for u, v in base.edges():
            for a, b in [(u, v), (v, u)]:
                bw = random.choice([50, 100, 200, 500])  # Mbps
                latency = round(random.uniform(0.5, 5.0), 3)  # ms
                port = random.randint(0, 3)
                buffer = random.randint(50, 500)  # packets
                G.add_edge(a, b, bandwidth=bw, latency_base=latency, port=port, buffer=buffer)
    return G

def mm1_delay(mu, lam):
    if lam >= mu: return float('inf')
    return 1.0 / (mu - lam)

def derive_traffic(G, intents, packet_size=1200, burst_factor=0.3, timesteps=5):
    nodes = list(G.nodes())
    T = defaultdict(lambda: {'Flows': []})
    fid = 0

    # Slice-specific parameters
    slice_params = {
        "eMBB": {"buffer":200, "mu_factor":1.0},
        "URLLC": {"buffer":50, "mu_factor":1.2},
        "mIoT": {"buffer":500, "mu_factor":0.8}
    }

    for intent in intents:
        num_users = intent['user_count']
        total_bw = intent['bandwidth']  # Mbps
        flows_count = max(1, math.ceil(num_users / 100))
        per_flow_bw = total_bw / flows_count

        for _ in range(flows_count):
            src, dst = random.sample(nodes, 2)
            # Generate time-varying bandwidth per flow
            bw_time = [per_flow_bw * (1 + burst_factor*random.uniform(-1,1)) for _ in range(timesteps)]
            flow = {
                "flow_id": fid,
                "origin_node": src,
                "dest_node": dst,
                "bandwidth_t": bw_time,
                "avgPacketSize": packet_size,
                "type": intent['slice_type'],
                "priority": intent['priority'],
                "target_latency": intent['latency'],
                "user_count": intent['user_count'],
                "buffer": slice_params[intent['slice_type']]['buffer'],
                "mu_factor": slice_params[intent['slice_type']]['mu_factor']  # scales the link service rate
            }
            T[(src, dst)]['Flows'].append(flow)
            fid += 1
    return T

def route_and_metrics(G, T, routing='shortest', packet_size=1200):
    R, metrics = {}, []

    # Build the weighted routing graph once instead of once per (src, dst) pair.
    tempG = nx.DiGraph()
    for u,v,d in G.edges(data=True):
        tempG.add_edge(u, v, weight=d['latency_base'])

    for (src,dst), entry in T.items():
        try:
            if routing=='shortest':
                path = nx.shortest_path(tempG, source=src, target=dst, weight='weight')
            elif routing=='ecmp':
                paths = list(nx.all_shortest_paths(tempG, source=src, target=dst, weight='weight'))
                path = random.choice(paths)
            else:
                path = nx.shortest_path(tempG, source=src, target=dst, weight='weight')
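        except (nx.NetworkXNoPath, nx.NodeNotFound):
            # No route exists: fall back to a direct virtual hop, which uses default link parameters below.
            path = [src, dst]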
        R[(src,dst)] = path

        for f in entry['Flows']:
            delays_all, losses_all = [], []
            for bw in f['bandwidth_t']:  # time-varying
                lam = (bw*1e6/8)/packet_size
                delays, losses = [], []
                for u,v in zip(path[:-1], path[1:]):
                    edata = G.get_edge_data(u,v)
                    if not edata:
                        edata = {'bandwidth':100, 'latency_base':1.0, 'buffer':100}
                    else:
                        edata = list(edata.values())[0]
                    mu = (edata['bandwidth']*1e6/8)/packet_size * f['mu_factor']
                    qd = mm1_delay(mu, lam)
                    qd_ms = 1000 if math.isinf(qd) else qd*1000
                    delays.append(edata['latency_base'] + qd_ms)
                    # Share of offered packets exceeding the service rate; stays within 0-100%.
                    loss = max(0.0,(1.0-mu/lam)*100.0) if mu>0 and lam>mu else 0.0
                    losses.append(loss)
                delays_all.append(sum(delays))
                losses_all.append(max(losses))
            path_delay = np.mean(delays_all)
            jitter = np.std(delays_all)
            loss_pct = np.mean(losses_all)
            delta = f['target_latency'] - path_delay
            metrics.append({
                "flow_id": f['flow_id'], "src": src, "dst": dst, "path": path,
                "delay": path_delay, "jitter": jitter, "loss": loss_pct,
                "bw": np.mean(f['bandwidth_t']), "delta": delta, "priority": f['priority'],
                "user_count": f['user_count'], "slice_type": f['type']
            })
    return R, metrics

def build_pyg_graph(G, metrics):
    """
    Creates heterogeneous graph with nodes: link, queue, flow
    Edges: link->flow, queue->flow, flow->flow dependencies
    """
    data = HeteroData()
    pr_map = {'low':0, 'medium':1, 'high':2}

    link_feats = [[d['bandwidth'], d['latency_base'], d.get('buffer',100)] for _,_,d in G.edges(data=True)]
    if link_feats:
        link_feats = np.array(link_feats, dtype=float)
        data['link'].x = torch.tensor(MinMaxScaler().fit_transform(link_feats), dtype=torch.float32)

    queue_feats = [[d.get('buffer',100)] for _,_,d in G.edges(data=True)]
    if queue_feats:
        queue_feats = np.array(queue_feats, dtype=float)
        data['queue'].x = torch.tensor(MinMaxScaler().fit_transform(queue_feats), dtype=torch.float32)

    path_feats = []
    for m in metrics:
        path_feats.append([
            m['bw'], m['delay'], m['jitter'], m['loss'], m['delta'],
            pr_map.get(m['priority'],1), m['user_count']
        ])
    if path_feats:
        path_feats = np.array(path_feats, dtype=float)
        path_feats[:,4] = StandardScaler().fit_transform(path_feats[:,4].reshape(-1,1)).flatten()
        data['flow'].x = torch.tensor(MinMaxScaler().fit_transform(path_feats), dtype=torch.float32)

    src_idx, dst_idx = [], []
    link_list = list(G.edges(keys=True))
    for i, m in enumerate(metrics):
        for u,v in zip(m['path'][:-1], m['path'][1:]):
            link_idx = next((idx for idx,(a,b,k) in enumerate(link_list) if a==u and b==v), None)
            if link_idx is not None:
                src_idx.append(link_idx)
                dst_idx.append(i)
    if src_idx:
        data['link','to','flow'].edge_index = torch.tensor([src_idx, dst_idx], dtype=torch.long)

    src_idx, dst_idx = [], []
    for i, m in enumerate(metrics):
        for u,v in zip(m['path'][:-1], m['path'][1:]):
            queue_idx = next((idx for idx,(a,b,k) in enumerate(link_list) if a==u and b==v), None)
            if queue_idx is not None:
                src_idx.append(queue_idx)
                dst_idx.append(i)
    if src_idx:
        data['queue','to','flow'].edge_index = torch.tensor([src_idx, dst_idx], dtype=torch.long)

    src_idx, dst_idx = [], []
    for i, fi in enumerate(metrics):
        for j, fj in enumerate(metrics):
            if i!=j and len(set(fi['path']).intersection(set(fj['path'])))>0:
                src_idx.append(i)
                dst_idx.append(j)
    if src_idx:
        data['flow','to','flow'].edge_index = torch.tensor([src_idx,dst_idx], dtype=torch.long)

    return data

def module2(intents, num_nodes=20, seed=42, routing='shortest'):
    topo = generate_topology(num_nodes=num_nodes, seed=seed)
    T = derive_traffic(topo, intents, timesteps=5)
    R, metrics = route_and_metrics(topo, T, routing=routing)
    g = build_pyg_graph(topo, metrics)
    return {"topology": topo, "routing": R, "metrics": metrics, "pyg_graph": g}


## Module 3: KPI Predictor (Graph Neural Network)

**Purpose:**  
Predict slice-level Key Performance Indicators (KPIs) — delay, jitter, and packet loss — using a lightweight graph neural network (GNN) applied to the heterogeneous network graph generated in Module 2. This enables the framework to anticipate slice performance before admission or resource allocation decisions.

**Input:**  
- Heterogeneous graph (`pyg_graph`) from Module 2 containing:  
  - Nodes: `flow`, `link`, `queue` (optional)  
  - Edges representing dependencies: `link->flow`, `queue->flow`, `flow->flow`  
- Flow-level metrics (targets) for supervised training  
- Intent-based features (priority, delta, user count)  

**Process:**  
1. **Feature Preprocessing:**  
   - Normalize node features (Min-Max or standard scaling).  
   - Concatenate intent-based features to flow nodes.  

2. **GNN Architecture:**  
   - Use a **heterogeneous GAT (Graph Attention Network)** for message passing between nodes.  
   - Node types (`flow`, `link`, `queue`) are projected into a hidden dimension.  
   - Information is aggregated along graph edges using an attention mechanism.  

3. **KPI Prediction:**  
   - Apply a small feedforward network to predict three KPIs per flow: delay, jitter, and loss.  
   - Train the model using regression loss (e.g., MSE) and evaluate using metrics such as SMAPE.  

4. **Explainability:**  
   - Use Integrated Gradients or similar XAI techniques to attribute predicted KPIs to input features.  
   - Identify which node or feature contributes most to high delay, jitter, or loss.  
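
To illustrate what Integrated Gradients computes, here is a dependency-free sketch on a toy function rather than the GNN. It approximates the path integral with a midpoint Riemann sum and central finite differences; `integrated_gradients` and `toy_delay` are hypothetical names, and Captum's implementation uses autograd instead of finite differences.

```python
def integrated_gradients(f, x, baseline, steps=200, h=1e-5):
    """Approximate IG_i = (x_i - b_i) * integral over alpha of df/dx_i along the path b -> x."""
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = point[:]
            up[i] += h
            down = point[:]
            down[i] -= h
            grad_i = (f(up) - f(down)) / (2 * h)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions

def toy_delay(features):
    """Toy stand-in for a delay predictor: quadratic in load, linear in hop count."""
    load, hops = features
    return 4.0 * load ** 2 + 1.5 * hops

x, baseline = [0.8, 3.0], [0.0, 0.0]
attr = integrated_gradients(toy_delay, x, baseline)
print([round(a, 3) for a in attr])  # [2.56, 4.5]
```

The attributions sum to `toy_delay(x) - toy_delay(baseline)`, which is the completeness property that makes per-feature scores comparable across flows.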

**Output:**  
- Predicted KPIs for each flow in the network graph:  
  ```json
  [
    {"flow_id": 0, "delay": 12.5, "jitter": 1.2, "loss": 0.3},
    {"flow_id": 1, "delay": 8.7, "jitter": 0.9, "loss": 0.1},
    ...
  ]
  ```

In [None]:
class KPI_GNN(nn.Module):
    def __init__(self, data, hidden_dim=32, intent_features_dim=2):
        super().__init__()
        self.hidden_dim = hidden_dim

        in_channels_flow = data['flow'].x.shape[1] + intent_features_dim
        self.flow_lin = nn.Linear(in_channels_flow, hidden_dim)

        in_channels_link = data['link'].x.shape[1]
        self.link_lin = nn.Linear(in_channels_link, hidden_dim)

        in_channels_queue = data['queue'].x.shape[1] if 'queue' in data else 0
        if in_channels_queue > 0:
            self.queue_lin = nn.Linear(in_channels_queue, hidden_dim)

        conv_dict = {
            ('link','to','flow'): GATConv(hidden_dim, hidden_dim, heads=2, concat=False, add_self_loops=False),
            ('flow','to','flow'): GATConv(hidden_dim, hidden_dim, heads=2, concat=False, add_self_loops=True)
        }
        if 'queue' in data:
            conv_dict[('queue','to','flow')] = GATConv(hidden_dim, hidden_dim, heads=2, concat=False, add_self_loops=False)

        self.conv = HeteroConv(conv_dict, aggr='mean')

        self.flow_predictor = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, 3)  # Predict 3 KPIs: delay, jitter, loss
        )

    def forward(self, data):
        flow_x = (data['flow'].x - data['flow'].x.mean(0)) / (data['flow'].x.std(0) + 1e-6)
        if hasattr(data['flow'], 'intent_features'):
            flow_x = torch.cat([flow_x, data['flow'].intent_features], dim=1)
        else:
            flow_x = torch.cat([flow_x, torch.zeros(flow_x.shape[0], 2, device=flow_x.device)], dim=1)
        flow_x = self.flow_lin(flow_x)

        link_x = (data['link'].x - data['link'].x.mean(0)) / (data['link'].x.std(0) + 1e-6)
        link_x = self.link_lin(link_x)

        if 'queue' in data:
            queue_x = (data['queue'].x - data['queue'].x.mean(0)) / (data['queue'].x.std(0) + 1e-6)
            queue_x = self.queue_lin(queue_x)
        else:
            queue_x = None

        x_dict = {'flow': flow_x, 'link': link_x}
        if queue_x is not None:
            x_dict['queue'] = queue_x

        x_dict = self.conv(x_dict, data.edge_index_dict)
        flow_out = F.relu(x_dict['flow'])
        return self.flow_predictor(flow_out)

def smape(y_true, y_pred, eps=1e-6):
    # Symmetric MAPE in percent, averaged over every element so 2-D inputs stay within 0-200%.
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return 100 * np.mean(
        2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred) + eps)
    )

def set_flow_targets(data, metrics):
    y = torch.tensor([[m['delay'], m['jitter'], m['loss']] for m in metrics], dtype=torch.float32)
    data['flow'].y = y
    return data

def train_model(model, data, epochs=50, lr=0.01, device=None):
    device = device or torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    data = data.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    y_true = data['flow'].y
    for epoch in range(epochs):
        optimizer.zero_grad()
        y_pred = model(data)
        loss = F.mse_loss(y_pred, y_true)
        loss.backward()
        optimizer.step()
        if epoch % 10 == 0:
            smape_val = smape(y_true.detach().cpu().numpy(), y_pred.detach().cpu().numpy())
            print(f"Epoch {epoch}: Loss={loss.item():.4f}, SMAPE={smape_val:.2f}%")
    return model

def explain_predictions(model, data, node_type='flow', feature_idx=0):
    model.eval()
    device = next(model.parameters()).device

    flow_x = (data['flow'].x - data['flow'].x.mean(0)) / (data['flow'].x.std(0) + 1e-6)
    flow_x = flow_x.to(device)
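    if hasattr(data['flow'], 'intent_features'):
        flow_x = torch.cat([flow_x, data['flow'].intent_features.to(device)], dim=1)
    else:
        # Mirror KPI_GNN.forward: pad with zeros so the width matches flow_lin's input size.
        pad = model.flow_lin.in_features - flow_x.shape[1]
        flow_x = torch.cat([flow_x, torch.zeros(flow_x.shape[0], pad, device=device)], dim=1)
    flow_x.requires_grad = True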

    link_x = (data['link'].x - data['link'].x.mean(0)) / (data['link'].x.std(0) + 1e-6)
    link_x = model.link_lin(link_x.to(device))
    if 'queue' in data:
        queue_x = (data['queue'].x - data['queue'].x.mean(0)) / (data['queue'].x.std(0)+1e-6)
        queue_x = model.queue_lin(queue_x.to(device))
    else:
        queue_x = torch.zeros((0, model.flow_lin.out_features), device=device)

    def forward_wrapper(flow_feats):
        x_dict = {
            'flow': model.flow_lin(flow_feats),
            'link': link_x,
            'queue': queue_x
        }
        x_out = model.conv(x_dict, data.edge_index_dict)
        return model.flow_predictor(F.relu(x_out['flow']))[:, feature_idx]

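    # IntegratedGradients is imported at the top of the notebook.
    ig = IntegratedGradients(forward_wrapper)
    # Evaluate one interpolation step per forward pass so the node count matches edge_index.
    return ig.attribute(flow_x, internal_batch_size=flow_x.shape[0]).detach().cpu().numpy()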
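
Delay, jitter, and loss live on different scales, so a single pooled SMAPE can hide a poorly predicted KPI. The sketch below reports SMAPE per column instead. It is a plain-Python illustration with a hypothetical name, `smape_per_kpi`, and assumes the column order delay, jitter, loss used by `set_flow_targets`.

```python
def smape_per_kpi(y_true, y_pred, eps=1e-6):
    """SMAPE in percent, computed separately for each KPI column."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    scores = []
    for c in range(n_cols):
        total = 0.0
        for r in range(n_rows):
            t, p = y_true[r][c], y_pred[r][c]
            total += 2 * abs(p - t) / (abs(t) + abs(p) + eps)
        scores.append(100 * total / n_rows)
    return scores

# Columns: delay, jitter, loss (illustrative values).
y_true = [[10.0, 1.0, 0.0], [20.0, 2.0, 0.0]]
y_pred = [[12.0, 1.0, 0.0], [20.0, 1.0, 0.0]]
scores = smape_per_kpi(y_true, y_pred)
print([round(s, 2) for s in scores])  # [9.09, 33.33, 0.0]
```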

## Module 4: Slice Admission & Resource Allocation (RL-Based)

**Purpose:**  
Decide in real-time whether to admit a network slice and how to allocate resources (e.g., bandwidth) using reinforcement learning. This module ensures SLA compliance while optimizing network utilization and fairness.

**Input:**  
- Predicted KPIs from Module 3 (delay, jitter, loss per flow)  
- Structured intent JSON from Module 1 (slice type, latency, jitter, bandwidth, priority, user count)  
- Environment constraints such as maximum available bandwidth  

**Process:**  
1. **Environment Setup:**  
   - Model the network slicing scenario as a Gymnasium environment.  
   - Each step represents processing a single slice request.  
   - Observation includes normalized metrics, slice intent features, and graph-level statistics.

2. **Action Space:**  
   - Admit or reject the slice.  
   - Allocate bandwidth (up to maximum available).  

3. **Reward Design:**  
   - Positive reward for meeting slice latency, jitter, and loss targets.  
   - Negative reward for SLA violations, over-allocation, or rejection of feasible slices.  
   - Encourage fairness and efficient resource usage across slices.

4. **Reinforcement Learning Agent:**  
   - Use policy-gradient or actor-critic algorithms (e.g., PPO) to learn optimal admission and allocation policies.  
   - Train the agent over multiple episodes to generalize across varying slice requests and network conditions.  

5. **State Update:**  
   - After each action, the environment advances to the next slice request.  
   - Track cumulative rewards, the `terminated` flag, and ongoing resource usage.

**Output:**  
- Slice admission decision (`admit`: true/false)  
- Bandwidth allocation for each slice  
- Updated network state after each action  
- Learned policy that can generalize to new slice requests  

**Remarks:**  
- Enables adaptive and intent-aware slice admission and resource allocation.  
- Works in conjunction with KPI predictions from Module 3 to proactively avoid SLA violations.  
- Provides feedback for **Module 5: Slice Lifecycle Management** to monitor and adjust active slices.
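
The reward design described above can be sketched as a pure function. This is one possible shaping, not the notebook's exact reward: `slice_reward` and its penalty weights are hypothetical, and the flat per-KPI violation penalty is an assumption added to reflect the "negative reward for SLA violations" item.

```python
def slice_reward(admit, bw_alloc, intent, kpi, loss_target=0.1,
                 violation_penalty=1.0, reject_penalty=0.1, over_alloc_cost=0.01):
    """Hypothetical reward: slack when targets are met, penalties otherwise."""
    if not admit:
        return -reject_penalty
    reward = 0.0
    for target, achieved in (
        (intent["latency"], kpi["delay"]),
        (intent["jitter"], kpi["jitter"]),
        (loss_target, kpi["loss"]),
    ):
        if achieved <= target:
            reward += target - achieved   # reward the remaining slack
        else:
            reward -= violation_penalty   # flat penalty per violated KPI
    # Discourage allocating more bandwidth than the intent requested.
    reward -= max(0, bw_alloc - intent["bandwidth"]) * over_alloc_cost
    return reward

intent = {"latency": 10, "jitter": 5, "bandwidth": 100}
ok = slice_reward(True, 120, intent, {"delay": 8.0, "jitter": 1.0, "loss": 0.0})
late = slice_reward(True, 100, intent, {"delay": 12.0, "jitter": 1.0, "loss": 0.0})
print(round(ok, 2), round(late, 2))  # 5.9 3.1
```

Because the slack terms are in raw units, latency dominates the reward for slices with loose targets; normalising each term by its target is a common alternative.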

In [None]:
class SliceEnv(gym.Env):
    metadata = {'render_modes': ['human']}

    def __init__(self, intents, metrics, max_bandwidth=100):
        super().__init__()
        self.intents = intents
        self.metrics = metrics
        self.max_bandwidth = max_bandwidth
        self.current_slice = 0
        self.obs_dim = 8
        self.observation_space = gym.spaces.Box(low=0.0, high=1.0, shape=(self.obs_dim,), dtype=np.float32)
        self.action_space = gym.spaces.MultiDiscrete([2, max_bandwidth+1])
        self.state = self._get_state()
        self.done = False

    def _get_state(self):
        if self.current_slice >= len(self.intents):
            return np.zeros(self.obs_dim, dtype=np.float32)
        intent = self.intents[self.current_slice]
        metric = self.metrics[self.current_slice]
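        # Guard against zero targets in the intent to avoid division by zero.
        delay = min(metric['delay']/max(intent['latency'], 1e-6), 1.0)
        jitter = min(metric['jitter']/max(intent['jitter'], 1e-6), 1.0)
        loss = min(metric['loss']/0.1, 1.0)
        user_count = min(intent['user_count']/1000.0, 1.0)
        priority = {'low':0.0, 'medium':0.5, 'high':1.0}.get(intent['priority'],0.5)
        # Graph-level statistics are not produced by route_and_metrics; the caller is expected to
        # add them to each metric dict, scaled to [0, 1]. Missing keys default to 0.0.
        graph_feats = [
            metric.get('total_link_bw', 0.0),
            metric.get('avg_flow_delay', 0.0),
            metric.get('avg_flow_jitter', 0.0)
        ]
        return np.array([delay, jitter, loss, user_count, priority] + graph_feats, dtype=np.float32)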

    def step(self, action):
        admit, bw_alloc = action
        intent = self.intents[self.current_slice]
        metric = self.metrics[self.current_slice]
        reward = 0.0
        if admit:
            reward += max(0, intent['latency'] - metric['delay'])
            reward += max(0, intent['jitter'] - metric['jitter'])
            reward += max(0, 0.1 - metric['loss'])
            reward -= max(0, bw_alloc - intent['bandwidth'])*0.01
        else:
            reward -= 0.1
        self.current_slice += 1
        terminated = self.current_slice >= len(self.intents)
        truncated = False
        self.state = self._get_state()
        return self.state, reward, terminated, truncated, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random, as the Gymnasium API expects
        self.current_slice = 0
        self.done = False
        self.state = self._get_state()
        return self.state, {}

    def render(self, mode='human'):
        print(f"Slice {self.current_slice}/{len(self.intents)} - State: {self.state}")

def create_rl_agent(env, total_timesteps=5000):
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    return model
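
`SliceEnv._get_state` reads three graph-level keys (`total_link_bw`, `avg_flow_delay`, `avg_flow_jitter`) that `route_and_metrics` does not emit. A minimal sketch of how they might be attached before building the environment follows. `add_graph_features` is a hypothetical helper, and the normalisation choices (capacity relative to the 500 Mbps maximum link size, averages relative to the maximum) are assumptions made to keep the values inside the `[0, 1]` observation bounds.

```python
def add_graph_features(metrics, link_bandwidths_mbps, max_link_bw=500.0):
    """Hypothetical helper: attach normalised graph statistics to each metric dict."""
    if not metrics:
        return metrics
    avg_delay = sum(m["delay"] for m in metrics) / len(metrics)
    avg_jitter = sum(m["jitter"] for m in metrics) / len(metrics)
    # Fall back to 1.0 when the maximum is zero to avoid dividing by zero.
    max_delay = max(m["delay"] for m in metrics) or 1.0
    max_jitter = max(m["jitter"] for m in metrics) or 1.0
    capacity = max_link_bw * max(1, len(link_bandwidths_mbps))
    shared = {
        "total_link_bw": min(sum(link_bandwidths_mbps) / capacity, 1.0),
        "avg_flow_delay": avg_delay / max_delay,
        "avg_flow_jitter": avg_jitter / max_jitter,
    }
    # Return new dicts so the original metrics are left untouched.
    return [{**m, **shared} for m in metrics]

demo = add_graph_features(
    [{"delay": 2.0, "jitter": 0.5}, {"delay": 4.0, "jitter": 1.0}],
    link_bandwidths_mbps=[100, 500],
)
print(demo[0]["total_link_bw"], demo[0]["avg_flow_delay"], demo[0]["avg_flow_jitter"])  # 0.6 0.75 0.75
```

In the notebook, the link bandwidths could be collected with `[d["bandwidth"] for _, _, d in topo.edges(data=True)]` and the result passed to `SliceEnv` in place of the raw metrics.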


## Module 5: Slice Lifecycle Manager

**Purpose:**  
Monitor active network slices, ensure SLA compliance, and trigger dynamic scaling or teardown actions based on KPI updates. Maintains slice status throughout its lifecycle.

**Input:**  
- RL agent outputs from Module 4 (admit/reject decisions, bandwidth allocations)  
- Real-time or simulated KPI updates per slice (delay, jitter, loss)  
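- SLA thresholds for each KPI (optional; defaults: delay=0.05, jitter=0.01, loss=0.01). Thresholds are compared directly against the incoming KPI values, so both must use the same units; note that Module 2 reports delay and jitter in ms and loss in percent.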

**Process:**  
1. **Initialization:**  
   - Track each slice’s status (active/inactive), SLA violations, and rescaling triggers.  
   - Set default thresholds for delay, jitter, and loss if not provided.

2. **Metrics Update:**  
   - Receive periodic KPI updates for each slice.  
   - Compare each KPI against SLA thresholds.  
   - Count the number of SLA violations per slice.  
   - Trigger rescaling if any KPI exceeds its threshold; otherwise, mark the slice as compliant.

3. **Status Tracking:**  
   - Maintain a structured record of all slices including:  
     - `slice_id`  
     - Admission status (`admit`)  
     - Current operational status (`active` or `inactive`)  
     - Number of SLA violations  
     - Rescaling triggers  
     - Current KPI metrics  

4. **Simulation (Optional):**  
   - Simulate time-varying KPI updates by adding small random noise to predicted KPIs.  
   - Iterate over multiple time steps to emulate realistic dynamic network conditions.  
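
The threshold comparison in step 2 can be sketched as follows. The example assumes the default thresholds are meant in seconds (delay, jitter) and as a loss fraction, which is an interpretation rather than something the notebook states, and converts ms and percent values accordingly. `to_manager_units` and `violated_kpis` are hypothetical helper names.

```python
DEFAULT_THRESHOLDS = {"delay": 0.05, "jitter": 0.01, "loss": 0.01}

def to_manager_units(kpi_ms_pct):
    """Convert delay and jitter from ms to s, and loss from percent to a fraction."""
    return {
        "delay": kpi_ms_pct["delay"] / 1000.0,
        "jitter": kpi_ms_pct["jitter"] / 1000.0,
        "loss": kpi_ms_pct["loss"] / 100.0,
    }

def violated_kpis(kpi, thresholds=DEFAULT_THRESHOLDS):
    """Return the names of the KPIs that exceed their threshold."""
    return [name for name, limit in thresholds.items() if kpi[name] > limit]

healthy = to_manager_units({"delay": 12.5, "jitter": 1.2, "loss": 0.3})
degraded = to_manager_units({"delay": 80.0, "jitter": 1.2, "loss": 2.0})
print(violated_kpis(healthy))   # []
print(violated_kpis(degraded))  # ['delay', 'loss']
```

A non-empty list corresponds to the condition under which the manager sets `rescale_triggered`.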

**Output:**  
- Updated slice status for all slices:  
  ```json
  [
    {
      "slice_id": 0,
      "admit": true,
      "status": "active",
      "sla_violations": 0,
      "rescale_triggered": false,
      "current_metrics": {"delay": 0.012, "jitter": 0.003, "loss": 0.001}
    },
    ...
  ]
  ```


In [None]:
class SliceLifecycleManager:
    def __init__(self, rl_output, sla_thresholds=None):
        self.slices = rl_output
        self.num_slices = len(rl_output)
        self.sla_thresholds = sla_thresholds or {
            "delay": 0.05,
            "jitter": 0.01,
            "loss": 0.01
        }
        self.status = [
            {
                "slice_id": s['slice_id'],
                "admit": s['admit'],
                "status": "active" if s['admit'] else "inactive",
                "sla_violations": 0,
                "rescale_triggered": False,
                "current_metrics": {"delay": 0.0, "jitter": 0.0, "loss": 0.0}
            } for s in rl_output
        ]

    def update_slice_metrics(self, kpi_updates):
        for i, kpi in enumerate(kpi_updates):
            if not self.slices[i]['admit']:
                continue
            self.status[i]['current_metrics'] = kpi
            # Count how many KPIs exceed their SLA threshold in this update.
            violations = int(sum(kpi[k] > self.sla_thresholds[k] for k in ["delay", "jitter", "loss"]))
            self.status[i]['sla_violations'] = violations
            self.status[i]['rescale_triggered'] = violations > 0

    def get_status(self):
        return self.status

def simulate_kpi_updates(rl_output, predicted_kpis, steps=10, noise_scale=0.002):
    num_slices = len(rl_output)

    for t in range(steps):
        kpi_updates = []
        for i in range(num_slices):
            if not rl_output[i]['admit']:
                kpi_updates.append({"delay": 0.0, "jitter": 0.0, "loss": 0.0})
                continue
            delay = max(0, predicted_kpis['delay'][i] + np.random.normal(0, noise_scale))
            jitter = max(0, predicted_kpis['jitter'][i] + np.random.normal(0, noise_scale))
            loss = max(0, predicted_kpis['loss'][i] + np.random.normal(0, noise_scale))
            kpi_updates.append({"delay": delay, "jitter": jitter, "loss": loss})

        yield kpi_updates
        time.sleep(0.1)

## Module 6: Explainability Layer & Dashboard

**Purpose:**  
Provide interpretability and transparency for the predictions and decisions made by the GNN-based KPI predictor (Module 3) and the RL-based slice admission agent (Module 4). Generate a consolidated dashboard summarizing explanations and key evaluation metrics for network slices.

**Input:**  
- Trained GNN model from Module 3  
- Trained RL agent from Module 4  
- Heterogeneous graph (`pyg_graph`) containing network, flow, and queue features  
- RL outputs including slice admission and bandwidth allocation  
- Predicted KPIs (delay, jitter, loss)  
- Optional: Flow feature names for readable explanations  

**Process:**  
1. **GNN Explainability:**  
   - Apply Integrated Gradients to the flow node features of the GNN.  
   - Identify which features most influence predicted KPIs (delay, jitter, loss).  
   - Return per-flow feature attributions.  

2. **RL Explainability:**  
   - Use SHAP (KernelExplainer) to attribute slice admission and allocation decisions to input features.  
   - Determine which state variables most influenced the agent’s decisions.  

3. **Evaluation Metrics Computation:**  
   - SLA satisfaction ratio: fraction of admitted slices meeting KPI thresholds.  
   - Bandwidth utilization: fraction of allocated bandwidth over maximum available.  
   - Fairness index (Jain’s index) for bandwidth allocation across slices.  
   - Admission rate: fraction of slices admitted.  
   - Policy efficiency: product of SLA satisfaction ratio, bandwidth utilization, and fairness index.  

4. **Dashboard Generation:**  
   - Summarize GNN feature importances per flow.  
   - Summarize RL decision explanations per slice decision.  
   - Include computed evaluation metrics for overall network slicing performance.  
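
The evaluation metrics in step 3 can be checked by hand on a small example. The sketch below is a minimal, standard-library illustration of those definitions, not the notebook's implementation: the helper names (`jain_index`, `evaluate`) and the dictionary keys (`alloc`, `capacity`, `sla_met`) are hypothetical, and policy efficiency is assumed to be the plain product of the three ratios.

```python
def jain_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    total = sum(allocations)
    squares = sum(a * a for a in allocations)
    if not allocations or squares == 0:
        return 0.0
    return total * total / (len(allocations) * squares)


def evaluate(slices):
    """Compute the evaluation metrics for a list of slice dicts.

    Each dict uses hypothetical keys: 'admit', 'alloc', 'capacity', 'sla_met'.
    """
    admitted = [s for s in slices if s["admit"]]
    # Rejected slices count as zero allocation in utilization and fairness.
    allocs = [s["alloc"] if s["admit"] else 0 for s in slices]
    capacity = sum(s["capacity"] for s in slices)
    sla = sum(s["sla_met"] for s in admitted) / len(admitted) if admitted else 0.0
    util = sum(allocs) / capacity if capacity > 0 else 0.0
    fairness = jain_index(allocs)
    return {
        "sla_satisfaction_ratio": sla,
        "bandwidth_utilization": util,
        "fairness_index": fairness,
        "admission_rate": len(admitted) / len(slices) if slices else 0.0,
        "policy_efficiency": sla * util * fairness,
    }


example = [
    {"admit": True, "alloc": 40, "capacity": 50, "sla_met": True},
    {"admit": True, "alloc": 40, "capacity": 50, "sla_met": True},
    {"admit": True, "alloc": 20, "capacity": 50, "sla_met": False},
    {"admit": False, "alloc": 0, "capacity": 50, "sla_met": False},
]
report = evaluate(example)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this example two of the three admitted slices meet their SLA, half of the total capacity is allocated, and the unequal shares (40, 40, 20, 0) pull the fairness index below 1. Because the efficiency score is a product, a low value in any single ratio drags the combined score down.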

**Output:**  
- GNN explanations per flow, highlighting most influential features  
- RL decision explanations per slice, highlighting key influencing state features  
- Consolidated dashboard report with performance and fairness metrics, for example:  
  ```json
  {
    "gnn_explanations": ["Flow 0 influenced by delay(0.12), bw(0.08), priority(0.05)", "..."],
    "rl_explanations": ["Decision 0 influenced by feature_2(0.15), feature_4(0.10), feature_0(0.08)", "..."],
    "evaluation_metrics": {
      "sla_satisfaction_ratio": 0.9,
      "bandwidth_utilization": 0.85,
      "fairness_index": 0.95,
      "admission_rate": 0.8,
      "policy_efficiency": 0.73
    }
  }
  ```

In [None]:
class Module6Dashboard:
    def __init__(self, gnn_model, rl_agent, pyg_graph, rl_output, predicted_kpis, flow_feature_names=None):
        self.gnn_model = gnn_model
        self.rl_agent = rl_agent
        self.data = pyg_graph
        self.rl_output = rl_output
        self.predicted_kpis = predicted_kpis
        self.flow_feature_names = flow_feature_names or ["bw", "delay", "jitter", "loss", "delta", "priority", "user_count"]

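    def gnn_explain(self, feature_idx=0):
        """Integrated Gradients attributions for one predicted KPI.

        feature_idx selects the output column of the flow predictor
        (0 = delay, 1 = jitter, 2 = loss), not an input feature.
        """
        self.gnn_model.eval()
        device = next(self.gnn_model.parameters()).device

        # Work on a detached copy so the graph's stored features are not modified.
        flow_x = self.data['flow'].x.detach().clone().to(device)
        if flow_x.shape[1] != self.gnn_model.flow_lin.in_features:
            raise ValueError(f"flow_x has {flow_x.shape[1]} features, but model expects {self.gnn_model.flow_lin.in_features}")
        flow_x.requires_grad = True
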
        def forward_wrapper(x):
            x_dict = {
                'flow': self.gnn_model.flow_lin(x),   # x must have same dim as trained GNN
                'link': self.gnn_model.link_lin(self.data['link'].x.to(device))
            }
            if 'queue' in self.data:
                x_dict['queue'] = self.gnn_model.queue_lin(self.data['queue'].x.to(device))
            x_out = self.gnn_model.conv(x_dict, self.data.edge_index_dict)
            return self.gnn_model.flow_predictor(torch.relu(x_out['flow']))[:, feature_idx]

        ig = IntegratedGradients(forward_wrapper)
        attr = ig.attribute(flow_x)
        return attr.detach().cpu().numpy()

    def rl_explain(self, state_samples, nsamples=50):
        def rl_forward(states):
            actions = []
            for s in states:
                s_tensor = np.array(s).reshape(1,-1)
                a, _ = self.rl_agent.predict(s_tensor, deterministic=True)
                actions.append(a[0])
            return np.array(actions)

        explainer = shap.KernelExplainer(rl_forward, np.array(state_samples[:10]))
        shap_values = explainer.shap_values(np.array(state_samples[:nsamples]))
        return shap_values

    def compute_metrics(self):
        metrics = {}
        # SLA satisfaction is measured over admitted slices only.
        # If an rl_output entry has no 'latency' or 'jitter' key, the defaults below apply.
        sla_ok = []
        for i, s in enumerate(self.rl_output):
            if not s['admit']:
                continue
            kpi = self.predicted_kpis
            delay_ok = kpi['delay'][i] <= s.get('latency', 50)
            jitter_ok = kpi['jitter'][i] <= s.get('jitter', 5)
            loss_ok = kpi['loss'][i] <= 0.05
            sla_ok.append(delay_ok and jitter_ok and loss_ok)
        metrics['sla_satisfaction_ratio'] = sum(sla_ok)/len(sla_ok) if sla_ok else 0.0

        total_alloc = sum([s['bandwidth_allocation'] if s['admit'] else 0 for s in self.rl_output])
        max_possible = sum([s.get('bandwidth',100) for s in self.rl_output])
        metrics['bandwidth_utilization'] = total_alloc/max_possible if max_possible>0 else 0.0

        allocs = np.array([s['bandwidth_allocation'] if s['admit'] else 0 for s in self.rl_output])
        if np.sum(allocs) > 0:
            metrics['fairness_index'] = np.sum(allocs)**2 / (len(allocs) * np.sum(allocs**2))
        else:
            metrics['fairness_index'] = 0.0

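        # Guard against an empty decision list before dividing.
        metrics['admission_rate'] = (sum([s['admit'] for s in self.rl_output])/len(self.rl_output)) if self.rl_output else 0.0
        # Policy efficiency is the product of the three ratios above.
        metrics['policy_efficiency'] = metrics['sla_satisfaction_ratio'] * metrics['bandwidth_utilization'] * metrics['fairness_index']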

        return metrics

    def generate_dashboard(self, gnn_attr=None, rl_shap=None, top_features=3):
        gnn_msgs = []
        if gnn_attr is not None:
            for i, attr in enumerate(gnn_attr):
                top_idx = np.argsort(-np.abs(attr))[:top_features]
                features = [self.flow_feature_names[j] for j in top_idx]
                values = [attr[j] for j in top_idx]
                gnn_msgs.append(f"Flow {i} influenced by " + ", ".join([f"{f}({v:.3f})" for f,v in zip(features, values)]))

        rl_msgs = []
        if rl_shap is not None:
            for i, shap_vals in enumerate(rl_shap):
                top_idx = np.argsort(-np.abs(shap_vals))[:top_features]
                features = [f"feature_{j}" for j in top_idx]
                values = [shap_vals[j] for j in top_idx]
                rl_msgs.append(
                    f"Decision {i} influenced by " +
                    ", ".join([f"{f}({np.mean(v):.3f})" if isinstance(v, np.ndarray) else f"{f}({v:.3f})"
                              for f, v in zip(features, values)])
                )
        metrics = self.compute_metrics()

        dashboard = {
            "gnn_explanations": gnn_msgs,
            "rl_explanations": rl_msgs,
            "evaluation_metrics": metrics
        }
        return dashboard
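
For intuition about what `gnn_explain` returns, the following sketch approximates Integrated Gradients on a toy scalar function using only the standard library. It is an illustration under stated assumptions rather than the Captum implementation used above: gradients are estimated with central finite differences instead of autograd, the baseline is all zeros, and `toy_delay` with its two inputs is a hypothetical stand-in for the GNN.

```python
def integrated_gradients(f, x, baseline=None, steps=50, eps=1e-5):
    """Approximate IG attributions for a scalar function f at point x.

    The path integral from baseline to x uses the midpoint rule, and the
    gradient at each path point is a central finite difference.
    """
    if baseline is None:
        baseline = [0.0] * len(x)
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            up = point[:]
            up[i] += eps
            down = point[:]
            down[i] -= eps
            avg_grad[i] += (f(up) - f(down)) / (2 * eps) / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_delay(features):
    """Hypothetical delay model: grows with load squared, shrinks with bandwidth."""
    bw, load = features
    return 2.0 * load ** 2 - 0.5 * bw


x = [4.0, 3.0]
attr = integrated_gradients(toy_delay, x)
print("attributions (bw, load):", [round(a, 4) for a in attr])
print("sum of attributions:", round(sum(attr), 4))
print("f(x) - f(baseline):", toy_delay(x) - toy_delay([0.0, 0.0]))
```

The attributions sum to the difference between the model output at the input and at the baseline (the completeness property), which is a useful sanity check when reading the per-flow attributions in the dashboard.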


## Run All Modules: End-to-End Simulation

**Purpose:**  
Execute the full workflow of the Lightweight Intent-Aware Network Slicing framework, from intent parsing through KPI prediction, RL-based slice admission, and lifecycle management to explainability dashboard generation. The run demonstrates how the modules integrate and reports per-slice decisions, SLA status, and evaluation metrics.



In [None]:
# Module 1:
example_request = "low-latency video streaming for 1000 users with 10ms delay and 5ms jitter"
intent_output = parse_intent_llm(example_request)
print("Intent parsed:", intent_output)

# Module 2:
result = module2([intent_output], num_nodes=20)
data = result['pyg_graph']
metrics = result['metrics']
num_links, num_flows = data['link'].x.shape[0], data['flow'].x.shape[0]
print("Graph info:", data)
print("Sample metrics:", metrics[:2])

slice_map = {"low": 0, "medium": 1, "high": 2}
intent_feats = np.array([[slice_map[m['priority']], m['user_count']/1000.0] for m in metrics])
data['flow'].intent_features = torch.tensor(intent_feats, dtype=torch.float32)

kpi_labels = np.array([[m['delay'], m['jitter'], m['loss']] for m in metrics])
data['flow'].y = torch.tensor(kpi_labels, dtype=torch.float32)

# Module 3:
hidden_dim = 32
model = KPI_GNN(data, hidden_dim=hidden_dim)
model = train_model(model, data, epochs=50, lr=0.01)
model.eval()
# Move predictions to the CPU before converting, in case the model runs on a GPU.
flow_preds = model(data).detach().cpu().numpy()
predicted_kpis = {
    "delay": flow_preds[:,0].tolist(),
    "jitter": flow_preds[:,1].tolist(),
    "loss": flow_preds[:,2].tolist()
}
print("Predicted KPIs:", predicted_kpis)
attributions = explain_predictions(model, data, feature_idx=0)
print("Feature attributions (delay):", attributions)

# Module 4:
num_flows = data['flow'].x.shape[0]
total_link_bw = data['link'].x[:,0].sum().item()
avg_flow_delay = data['flow'].x[:,1].mean().item()
avg_flow_jitter = data['flow'].x[:,2].mean().item()
rl_metrics = []
for i, m in enumerate(metrics):
    rl_metrics.append({
        "delay": predicted_kpis['delay'][i],
        "jitter": predicted_kpis['jitter'][i],
        "loss": predicted_kpis['loss'][i],
        "bandwidth": m['bw'],
        "priority": m['priority'],
        "latency": m['delta'] + m['delay'],
        "user_count": m['user_count'],
        "total_link_bw": total_link_bw,
        "avg_flow_delay": avg_flow_delay,
        "avg_flow_jitter": avg_flow_jitter
    })
env = SliceEnv([intent_output]*num_flows, rl_metrics)
model_rl = create_rl_agent(env, total_timesteps=5000)

state, _ = env.reset()
actions_taken = []
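for i in range(num_flows):
    # Deterministic actions match the policy that rl_explain later explains.
    action, _ = model_rl.predict(state, deterministic=True)
    actions_taken.append(action)
    state, reward, terminated, truncated, info = env.step(action)
    # Stop when the episode ends for either reason.
    if terminated or truncated:
        break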

output = []
for i, a in enumerate(actions_taken):
    slice_out = {
        "slice_id": i,
        "admit": bool(a[0]),
        "bandwidth_allocation": int(a[1]),
        "routing_update": []
    }
    output.append(slice_out)
print("RL Agent Decisions:")
for o in output:
    print(o)

# Module 5:
manager = SliceLifecycleManager(rl_output=output, sla_thresholds={
    "delay": 0.01,
    "jitter": 0.005,
    "loss": 0.01
})
for kpi_step in simulate_kpi_updates(output, predicted_kpis, steps=20, noise_scale=0.002):
    manager.update_slice_metrics(kpi_step)
    current_status = manager.get_status()
    print("Slice Status Update:")
    for s in current_status:
        print(s)
    for s in current_status:
        if s['rescale_triggered']:
            print(f"Slice {s['slice_id']} requires rescaling due to SLA violations.")

# Module 6:
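# Pad the flow features with two zero columns so their width passes the
# flow_lin input-size check in gnn_explain; "feat8" and "feat9" name the padded columns.
flow_feature_names = ["bw", "delay", "jitter", "loss", "delta", "priority", "user_count", "feat8", "feat9"]
data['flow'].x = torch.cat([data['flow'].x, torch.zeros(data['flow'].x.shape[0], 2)], dim=1)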
dashboard_obj = Module6Dashboard(
    gnn_model=model,
    rl_agent=model_rl,
    pyg_graph=data,
    rl_output=output,
    predicted_kpis=predicted_kpis,
    flow_feature_names=flow_feature_names
)

gnn_attr = dashboard_obj.gnn_explain(feature_idx=0)
state_samples = []
state, _ = env.reset()
for i in range(len(output)):
    state_samples.append(state)
    action, _ = model_rl.predict(state, deterministic=True)
    state, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break
rl_shap = dashboard_obj.rl_explain(state_samples, nsamples=len(state_samples))

dashboard_report = dashboard_obj.generate_dashboard(
    gnn_attr=gnn_attr,
    rl_shap=rl_shap,
    top_features=3
)
print("\n=== KPI + RL Dashboard Report ===\n")
print("GNN Explanations:")
for msg in dashboard_report['gnn_explanations']:
    print(msg)
print("\nRL Explanations:")
for msg in dashboard_report['rl_explanations']:
    print(msg)
print("\nEvaluation Metrics:")
for k, v in dashboard_report['evaluation_metrics'].items():
    print(f"{k}: {v:.3f}")
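
The RL explanations in the report come from SHAP's `KernelExplainer`, which estimates Shapley values by sampling feature coalitions against background data. As a rough illustration of the quantity being estimated, the sketch below computes exact Shapley values by brute force for a three-feature score. Everything in it is hypothetical: `admission_score`, its features (`free_bw`, `delay`, `priority`), and the single background point are made up for the example and do not reflect the trained agent or the `SliceEnv` state layout.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x relative to a baseline point.

    Features outside a coalition are replaced by their baseline value,
    the same masking idea Kernel SHAP applies with its background data.
    """
    n = len(x)

    def coalition_value(members):
        masked = [x[i] if i in members else baseline[i] for i in range(n)]
        return value_fn(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(members | {i}) - coalition_value(members)
                phi[i] += weight * gain
    return phi


def admission_score(state):
    """Hypothetical score over (free_bw, delay, priority)."""
    free_bw, delay, priority = state
    return 0.02 * free_bw - 5.0 * delay + 0.3 * priority * free_bw / 100.0


state = [80.0, 0.01, 2.0]
background = [50.0, 0.02, 0.0]
phi = shapley_values(admission_score, state, background)
for name, value in zip(["free_bw", "delay", "priority"], phi):
    print(f"{name}: {value:+.4f}")
```

The values sum to the difference between the score at `state` and at `background`, and the purely additive `delay` term receives exactly its own contribution. Brute force needs every coalition, so it only scales to a handful of features; Kernel SHAP trades exactness for a sampled approximation.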

In [None]:
import gradio as gr
import matplotlib.pyplot as plt
import networkx as nx
import json
import pandas as pd

# ------------------------
# Visualization functions
# ------------------------

def plot_topology(G, metrics):
    fig, ax = plt.subplots(figsize=(8,6))
    pos = nx.spring_layout(G)
    nx.draw(G, pos, with_labels=True, node_color='skyblue', edge_color='gray', ax=ax)
    ax.set_title("Module 2: Network Topology")
    return fig

def plot_kpi(predicted_kpis):
    fig, ax = plt.subplots(figsize=(8,4))
    flows = list(range(len(predicted_kpis['delay'])))
    ax.bar(flows, predicted_kpis['delay'], alpha=0.5, label='Delay')
    ax.bar(flows, predicted_kpis['jitter'], alpha=0.5, label='Jitter')
    ax.bar(flows, predicted_kpis['loss'], alpha=0.5, label='Loss')
    ax.set_xlabel("Flow ID"); ax.set_ylabel("KPI Value"); ax.set_title("Module 3: Predicted KPIs")
    ax.legend()
    return fig

def plot_rl_decisions(rl_output):
    fig, ax = plt.subplots(figsize=(8,4))
    admits = [s['admit'] for s in rl_output]
    allocations = [s['bandwidth_allocation'] for s in rl_output]
    ax.bar(range(len(rl_output)), allocations, color=['green' if a else 'red' for a in admits])
    ax.set_xlabel("Slice ID"); ax.set_ylabel("Bandwidth Allocation")
    ax.set_title("Module 4: RL Slice Admission & Allocation")
    return fig

def plot_slice_status(slice_status):
    fig, ax = plt.subplots(figsize=(8,4))
    delays = [s['current_metrics']['delay'] for s in slice_status]
    jitt = [s['current_metrics']['jitter'] for s in slice_status]
    losses = [s['current_metrics']['loss'] for s in slice_status]
    ax.plot(delays, label='Delay')
    ax.plot(jitt, label='Jitter')
    ax.plot(losses, label='Loss')
    ax.set_xlabel("Slice ID"); ax.set_ylabel("KPI Value")
    ax.set_title("Module 5: Slice Lifecycle Status")
    ax.legend()
    return fig

def show_dashboard_structured(dashboard):
    gnn_rows = []
    for i, msg in enumerate(dashboard['gnn_explanations']):
        gnn_rows.append({"Flow ID": i, "Explanation": msg})
    gnn_df = pd.DataFrame(gnn_rows)

    rl_rows = []
    for i, msg in enumerate(dashboard['rl_explanations']):
        rl_rows.append({"Decision ID": i, "Explanation": msg})
    rl_df = pd.DataFrame(rl_rows)
    metrics = dashboard['evaluation_metrics']
    metrics_md = "\n".join([f"**{k.replace('_',' ').title()}:** {v:.3f}" for k,v in metrics.items()])

    return gnn_df, rl_df, metrics_md


# ------------------------
# Gradio function
# ------------------------

def visualize_module(module_choice):
    if module_choice == "Module 2: Topology":
        return plot_topology(result['topology'], metrics)
    elif module_choice == "Module 3: KPI Prediction":
        return plot_kpi(predicted_kpis)
    elif module_choice == "Module 4: RL Decisions":
        return plot_rl_decisions(output)
    elif module_choice == "Module 5: Slice Status":
        return plot_slice_status(manager.get_status())
    elif module_choice == "Module 6: Dashboard":
        # Returns (gnn_df, rl_df, metrics_md) rather than a figure.
        return show_dashboard_structured(dashboard_report)

# ------------------------
# Gradio Interface
# ------------------------

module_options = [
    "Module 2: Topology",
    "Module 3: KPI Prediction",
    "Module 4: RL Decisions",
    "Module 5: Slice Status",
    "Module 6: Dashboard"
]

with gr.Blocks() as demo:
    dropdown = gr.Dropdown(module_options, label="Select Module")
    output_plot = gr.Plot()
    gnn_table = gr.Dataframe(headers=["Flow ID","Explanation"], interactive=False)
    rl_table = gr.Dataframe(headers=["Decision ID","Explanation"], interactive=False)
    metrics_md = gr.Markdown()


    def update(module_choice):
        if module_choice == "Module 6: Dashboard":
            gnn_df, rl_df, metrics_text = show_dashboard_structured(dashboard_report)
            return gr.update(visible=False), gr.update(value=gnn_df), gr.update(value=rl_df), gr.update(value=metrics_text)
        else:
            fig = visualize_module(module_choice)
            return gr.update(visible=True, value=fig), gr.update(value=pd.DataFrame()), gr.update(value=pd.DataFrame()), gr.update(value="")


    dropdown.change(
        fn=update,
        inputs=dropdown,
        outputs=[output_plot, gnn_table, rl_table, metrics_md]
    )

demo.launch()