<a href="https://colab.research.google.com/github/dp457/LLM-UTM-Adversarial-Attacks/blob/main/Aligning_LLM_UTM_Operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install "transformers>=4.42"
!pip install datasets
!pip install peft
!pip install trl
!pip install accelerate
!pip install bitsandbytes
!pip install jsonschema

Collecting trl
  Downloading trl-0.22.2-py3-none-any.whl.metadata (11 kB)
Downloading trl-0.22.2-py3-none-any.whl (544 kB)
Installing collected packages: trl
Successfully installed trl-0.22.2
Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.47.0


In [2]:
from __future__ import annotations
import argparse, json, os, random
from functools import partial
from datasets import load_dataset
import torch

from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

Now, we will design the prompt scaffolds for different types of UAVs in UTM.

**What is the scaffold doing?**

*  **System role anchoring.**

    "You are Patrol-LLM" vs "You are Delivery-LLM" pins a distinct operating policy into each system message. This prevents goal leakage between roles (e.g., delivery logic creeping into patrol) and makes each model a specialized decision head.


*   **Goals = objective function (soft targets).**
    
    Each role lists what to optimize:

    Patrol: coverage, conflict reduction, geofence respect.

    Delivery: meet SLA, minimize conflicts, balance port queues, corridor compliance.
    
    These goals map to the reward terms or KPI deltas that the validator/simulator can score.

*   **TOOLS**.
    
    Enumerating the legal actuators the LLM may call narrows the action space and reduces hallucinated commands.

    * Patrol can `grant_corridor` (authority) while Delivery can only `request_corridor` (deference). This clean command/permission split prevents a delivery agent from unilaterally changing airspace structure.

    * **throttle_flow** exists for both (shared knob for rate-limiting).

    * **resequence_patrol / reroute_flows / set_param** expose sequencing, routing, and parameter tuning without letting the model invent new APIs.

*   **CONSTRAINTS = hard invariants.**

    `min_sep = D`, `energy <= Emax`, `SLA >= S`, `geo-fences/corridors`.
    
    These are feasibility constraints the checker can enforce post-generation. Stating them in-prompt conditions the LLM to avoid proposing infeasible actions; the environment still hard-rejects if it must.
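
As a rough illustration of the post-generation check described above, the sketch below parses a model output and rejects any tool outside the role's whitelist. The `ALLOWED_TOOLS` table mirrors the tool lists of the two roles. The per-action `"tool"` key and the `check_decision` helper are assumptions made for this example, since the action schema is not spelled out in the notebook.

```python
import json

# Tool whitelists per role, mirroring the TOOLS lines of the two system prompts.
ALLOWED_TOOLS = {
    "Patrol": {"declare_no_go_volume", "grant_corridor", "resequence_patrol", "throttle_flow"},
    "Delivery": {"reroute_flows", "set_param", "request_corridor", "throttle_flow"},
}

def check_decision(role: str, raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output passes."""
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(decision, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    actions = decision.get("actions")
    if not isinstance(actions, list):
        problems.append("'actions' must be a list")
        actions = []
    if not isinstance(decision.get("rationale"), str):
        problems.append("'rationale' must be a string")

    for i, action in enumerate(actions):
        # Assumption: each action is an object that names its tool under "tool".
        tool = action.get("tool") if isinstance(action, dict) else None
        if tool not in ALLOWED_TOOLS[role]:
            problems.append(f"action {i}: tool {tool!r} not allowed for {role}")
    return problems

sample = '{"actions":[{"tool":"grant_corridor"}],"rationale":"open a lane"}'
print(check_decision("Patrol", sample))    # passes for Patrol
print(check_decision("Delivery", sample))  # rejected: Delivery may only request corridors
```

A numeric check on `min_sep`, `energy`, or `SLA` would follow the same pattern once the simulator exposes the predicted values.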


**Why two roles (Patrol vs Delivery)?**

  * Deconfliction by design.
  * Patrol owns global airspace shaping (no-go volumes, corridors, resequencing). Delivery owns local routing and service performance.
  * The asymmetry (“request” vs “grant”) provides a natural arbitration path and avoids deadlocks from symmetric agents issuing competing structural edits.

**How do the placeholders fit?**

`D`, `Emax`, `S` are environment-bound parameters. Bind them at runtime (e.g., via prompt templating) so the same policy runs across sectors, weather, battery chemistries, or SLAs without retraining.
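
One way such runtime binding could look is sketched below. The `CONSTRAINTS_TEMPLATE` string and the `bind_constraints` helper are hypothetical (the prompts in this notebook keep `D`, `Emax`, and `S` as literal symbols), and the numeric values are made up for illustration.

```python
# Hypothetical template: braces mark the environment-bound placeholders.
CONSTRAINTS_TEMPLATE = "CONSTRAINTS: min_sep={D}, energy<={Emax}, SLA>={S}."

def bind_constraints(D: float, Emax: float, S: float) -> str:
    """Fill the constraint line with concrete values for one sector or mission."""
    return CONSTRAINTS_TEMPLATE.format(D=D, Emax=Emax, S=S)

# Illustrative values only; units depend on the simulator.
print(bind_constraints(D=50, Emax=120, S=0.95))
# CONSTRAINTS: min_sep=50, energy<=120, SLA>=0.95.
```

Using named fields instead of replacing the bare letters avoids accidentally rewriting other occurrences of `D` or `S` in the prompt text.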

In [3]:
# ------------------------
# Prompt scaffolds
# ------------------------
SYSTEM_PROMPT = {
    "Patrol": (
        "You are Patrol-LLM. Goals: maintain coverage, reduce conflicts, respect geo-fences.\n"
        "TOOLS: declare_no_go_volume, grant_corridor, resequence_patrol, throttle_flow.\n"
        "CONSTRAINTS: min_sep=D, energy<=Emax, SLA>=S.\n"
        "Output JSON ONLY that matches the schema: "
        '{"actions":[...], "rationale": "short sentence"}.'
    ),
    "Delivery": (
        "You are Delivery-LLM. Goals: meet SLA, minimize conflicts, balance port queues, respect corridors.\n"
        "TOOLS: reroute_flows, set_param, request_corridor, throttle_flow.\n"
        "CONSTRAINTS: min_sep=D, energy<=Emax, geo-fences, corridors.\n"
        "Output JSON ONLY that matches the schema: "
        '{"actions":[...], "rationale": "short sentence"}.'
    ),
}


1. **Pulls the inputs**

    `format_example` reads `state` and `decision` from each example. The model is supervised to reproduce `decision` (the ground-truth action JSON) when shown `state` under a given role.

2. **Compacts the state vector**

   * **Feature selection:** Only the fields that are predictive for decisions are kept. That trims tokens and reduces noise/leakage.

   * **Compact JSON (`separators=(",",":")`)**: Removes the spaces after commas and colons, which lowers the token count and keeps serialization consistent across examples (helps training stability and throughput).

   * If more signals are needed (weather, corridor loads), fields can be added without changing the I/O contract.

3. **User Message Building**

    * Clear instruction: “Produce only the decision JSON.” This keeps training targets clean and discourages extra prose.

    * `STATE:` prefix: makes the state visually and semantically distinct from the system policy, reducing confusion for chat models.

4. **Selects the system policy by role**

    * Binds the example to Patrol-LLM or Delivery-LLM (different goals/tools/constraints).

    * Teaches the same base model to behave differently under different system frames (multi-persona SFT).

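5. **Canonicalizes the target (assistant)**

    * The ground-truth `decision` is serialized with the same compact separators, so the training target is a single-line JSON string with no stray whitespace.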

6. **Uses the model's native chat template**
    * Why: Every chat model (Llama, Mistral, Qwen, etc.) has its own BOS/EOS tokens, role tokens, and spacing. `apply_chat_template` renders exactly what the base model expects, so SFT matches inference conditions.

    * `add_generation_prompt=False`: During SFT the full conversation is needed, including the assistant’s target (teacher forcing). At inference, set it to `True` so the template appends the assistant header without any target text.

7. **Fallback Formatting**

    * Provides a stable, readable structure if the tokenizer lacks a template.

    * Not ideal (tokens may not align with the pretraining chat style), but it preserves the system→user→assistant pattern.
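
To make the compaction step concrete, the snippet below compares the default `json.dumps` output with the compact form. The state values are made up, and character count is only a proxy for token count, which depends on the tokenizer.

```python
import json

state = {"t": 12, "conflicts": 3, "queues": [2, 0, 5]}  # made-up values

default = json.dumps(state)                          # separators default to (", ", ": ")
compact = json.dumps(state, separators=(",", ":"))   # no spaces after , and :

print(default)  # {"t": 12, "conflicts": 3, "queues": [2, 0, 5]}
print(compact)  # {"t":12,"conflicts":3,"queues":[2,0,5]}
print(len(default), len(compact))
```

Both strings parse back to the same object, so the compaction is lossless.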

**Why does this structure work well?**

  * **Exact match training target:** The assistant side is pure JSON, matching the runtime parser schema, so little post-processing is needed.

  * **Low-entropy outputs:** “Produce only the decision JSON” + compact target reduce variability, improving convergence and eval scoring (string-match or JSON-schema validation).

  * **Role-conditioned behavior:** One function covers both agents; the role selects a different system policy without needing two datasets or two trainers.

  * **Token-efficient:** Compact inputs/outputs let you fit more examples per batch or use larger contexts for richer states.
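
For the inference side, a minimal sketch of the matching prompt construction could look as follows. The helper name `build_inference_messages` is hypothetical. It builds the same system and user turns as training but omits the assistant turn, which the model is expected to generate.

```python
import json

def build_inference_messages(system_msg: str, state: dict) -> list[dict]:
    """Same system/user turns as in training, but with no assistant turn."""
    state_str = json.dumps(state, separators=(",", ":"))
    user_msg = f"STATE:\n{state_str}\nProduce only the decision JSON."
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]

msgs = build_inference_messages("You are Patrol-LLM.", {"t": 0, "conflicts": 1})
print(msgs[-1]["content"])

# With a Hugging Face tokenizer that has a chat template, the prompt would then be rendered as:
# prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
```

Keeping the user message byte-identical to the training format is the point: any drift in the `STATE:` prefix or the instruction line moves inference away from the SFT distribution.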

In [4]:
def format_example(role: str, example: dict, tokenizer) -> str:
    """Return a single-chat example string for SFT. Uses chat_template if available."""
    state = example["state"]
    decision = example["decision"]

    # Compact the state to keep tokens small (you can customize further)
    state_str = json.dumps({
        "t": state["t"],
        "queues": state["queues"],
        "conflicts": state["conflicts"],
        "hotspots": state["hotspots"],
        "uavs_sample": state["uavs_sample"],
        "min_pairwise_sep": state["min_pairwise_sep"]
    }, separators=(",",":"))

    user_msg = f"STATE:\n{state_str}\nProduce only the decision JSON."

    system_msg = SYSTEM_PROMPT[role]
    assistant_msg = json.dumps(decision, separators=(",",":"))  # ground-truth JSON

    if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
        msgs = [
            {"role":"system", "content": system_msg},
            {"role":"user", "content": user_msg},
            {"role":"assistant", "content": assistant_msg},
        ]
        return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)
    else:
        # Fallback simple instruction formatting
        return f"<s>[SYSTEM]\n{system_msg}\n[/SYSTEM]\n[USER]\n{user_msg}\n[/USER]\n[ASSISTANT]\n{assistant_msg}</s>"

In [5]:
def make_formatting_func(role: str, tokenizer):
    def _fmt(example):
        return format_example(role, example, tokenizer)
    return _fmt

In [6]:
# Define the Main Parameters

data = "utm_normal_ops.jsonl"
base = "Qwen/Qwen2.5-1.5B-Instruct"
steps =  5000
lr = 1e-4
batch_size = 4
grad_accum = 4
max_seq_len = 1024
bnb_4bit = True # enable 4-bit loading (QLoRA)
seed = 42

In [7]:
torch.manual_seed(seed)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Set right padding regardless of whether a pad token already existed
tokenizer.padding_side = "right"


The following code loads the base causal LM quantized to 4-bit with bitsandbytes (which needs much less VRAM) and prepares it for k-bit training.

The PEFT utility `prepare_model_for_kbit_training` does a few important things to make training on a quantized model stable:

  * Casts certain **LayerNorms / output** heads to **float32** to avoid loss-of-precision in normalization.

  * Enables gradient settings compatible with quantized linear layers (so LoRA adapters can be trained on top).

  * Freezes base quantized weights.


Together, these steps stabilize the mixed-precision graph and help prevent numerical blow-ups during LoRA training.
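
A back-of-the-envelope estimate shows why 4-bit loading matters for VRAM. The sketch assumes roughly 1.5B parameters (as the model name suggests) and counts weight storage only. It ignores quantization constants, layers kept in higher precision, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 1.5e9  # assumption: roughly 1.5B parameters

print(weight_memory_gb(n_params, 16))  # bf16 weights: 3.0
print(weight_memory_gb(n_params, 4))   # 4-bit weights: 0.75
```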

In [8]:
# Load base model (optionally 4-bit)
use_4bit = bnb_4bit and torch.cuda.is_available()  # bitsandbytes 4-bit loading needs a CUDA GPU
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

quant_config = None
if use_4bit:
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        # nf4 (NormalFloat4): a fixed, non-uniform 4-bit data type whose levels follow
        # the quantiles of a normal distribution; the usual choice for QLoRA.
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=dtype,
    )

# Load outside the `if` so `model` is defined even when quantization is off
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=dtype,
    device_map="auto",
    quantization_config=quant_config,
)
if use_4bit:
    model = prepare_model_for_kbit_training(model)


In [9]:
# PEFT LoRA config (safe defaults for small models)
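lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM",  # lets PEFT wrap the model as a causal-LM PEFT model
    # Adapt all attention and MLP projections
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
)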

In [11]:
# ------------------------
# Load dataset and split by role
# ------------------------
# `data` must exist in the working directory (e.g., uploaded to /content in Colab)
ds_all = load_dataset("json", data_files=data, split="train")
ds_patrol   = ds_all.filter(lambda ex: ex["role"] == "Patrol")
ds_delivery = ds_all.filter(lambda ex: ex["role"] == "Delivery")

# Small validation split from the tail
val_frac = 0.05
nP = len(ds_patrol); nD = len(ds_delivery)
nP_val = max(1, int(val_frac*nP)); nD_val = max(1, int(val_frac*nD))

dsP_train = ds_patrol.select(range(0, nP - nP_val))
dsP_val   = ds_patrol.select(range(nP - nP_val, nP))
dsD_train = ds_delivery.select(range(0, nD - nD_val))
dsD_val   = ds_delivery.select(range(nD - nD_val, nD))

# ------------------------
# Train Patrol adapter
# ------------------------
patrol_out = "patrol-lora"
os.makedirs(patrol_out, exist_ok=True)
model_patrol = get_peft_model(model, lora_config)

FileNotFoundError: Unable to find '/content/utm_normal_ops.jsonl'