An Agentic Adaptive Model Selection Framework inspired by SafeRoute, dynamically routing prompts between lightweight safety classifiers and high-capacity LLM guardrails.
All official dataset benchmarks, routing groundtruth targets, and trained Multi-Layer Perceptron (MLP) neural router checkpoints are publicly hosted on Hugging Face Hub:
- 📊 Evaluation & Training Dataset: StevenMup2004/DynaGuard-Data
  - Contains `train_router.json` (~7+ training instances) and `test_router.json` (1,086 benchmark samples).
- 🧠 Neural Router Model Checkpoints: StevenMup2004/DynaRoute
  - Contains trained PyTorch model weights (`model.pt`) optimized via Focal Loss for causal adaptive model escalation.
As Large Language Models (LLMs) are deployed in agentic workflows, enforcing robust guardrails against jailbreaks, toxicity, and policy violations is paramount. However, production guardrails face a severe Performance vs. Cost trade-off:
- High-Capacity Models (`DynaGuard-8B`): Achieve exceptional safety recognition accuracy but incur prohibitive inference costs and high latency.
- Lightweight Models (`DynaGuard-1.7B`): Provide near-instantaneous inference at a fraction of the cost but are vulnerable to sophisticated contextual bypasses.
DynaRoute resolves this dilemma. Inspired by the SafeRoute methodology, DynaRoute introduces an intelligent neural router that intercepts hidden feature embeddings (
            Input Prompt + Dynamic Policy
                        |
                        v
          Lightweight Guard (DynaGuard-1.7B)
                        |
                        +--- [Extract Hidden Features: h ∈ R^2048]
                        |
                        v
    Neural Router MLP (2048 -> 1024 -> 512 -> 256 -> 1)
                   /         \
           P <= 0.60         P > 0.60   (Decision Threshold)
               /                 \
       [Easy / Safe]         [Hard / Unsafe]
             |                     |
             v                     v
       Fast Verdict        Heavy Guard (DynaGuard-8B)
       (Cost = 1.7B)               |
                                   v
                             Final Verdict
                             (Cost = 8B)
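The branch at the bottom of the diagram can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `route_prompt`, `escalation_prob`, and `call_large_guard` are hypothetical names, and the large guard is passed as a callable so that it is only invoked when the router decides to escalate.

```python
from typing import Callable, Tuple

FIXED_THRESHOLD = 0.60  # decision threshold stated in this README


def route_prompt(
    escalation_prob: float,
    small_verdict: str,
    call_large_guard: Callable[[], str],
    threshold: float = FIXED_THRESHOLD,
) -> Tuple[str, str]:
    """Return (verdict, model_used) for one prompt.

    escalation_prob is the router MLP output P, assumed to lie in [0, 1].
    """
    if not 0.0 <= escalation_prob <= 1.0:
        raise ValueError("escalation_prob must be a probability in [0, 1]")
    if escalation_prob > threshold:
        # Hard case: pay for the large guard and use its verdict.
        return call_large_guard(), "DynaGuard-8B"
    # Easy case: keep the verdict the small guard already produced.
    return small_verdict, "DynaGuard-1.7B"


if __name__ == "__main__":
    def large_guard() -> str:
        # Stand-in for a real DynaGuard-8B call (hypothetical).
        return "unsafe"

    print(route_prompt(0.35, "safe", large_guard))
    print(route_prompt(0.82, "safe", large_guard))
```

Because the comparison is strict (`P > 0.60`), a score of exactly 0.60 stays on the lightweight path, matching the `P <= 0.60` branch in the diagram.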
To train the neural router without biased heuristics, we define an ideal Oracle Groundtruth Assignment (
- $y = 0$ (Easy Sample): If `DynaGuard-1.7B` correctly predicts the safety groundtruth label ($Pred_{small} = GT$). Calling the small model is sufficient; zero additional inference cost is incurred.
- $y = 1$ (Hard Sample): If `DynaGuard-1.7B` fails ($Pred_{small} \neq GT$), but `DynaGuard-8B` succeeds ($Pred_{large} = GT$). The router is forced to escalate the query to the large model.
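The two labeling rules above can be written as a small function. This is a sketch under stated assumptions: the function name and the string verdicts are hypothetical, and the case where both guards are wrong is not defined in this README, so the sketch assigns it $y = 0$ on the reasoning that escalation would not change the outcome.

```python
def oracle_label(pred_small: str, pred_large: str, gt: str) -> int:
    """Assign the router training target y for one sample."""
    if pred_small == gt:
        # Easy sample: the small guard is already correct.
        return 0
    if pred_large == gt:
        # Hard sample: only the large guard recovers the groundtruth.
        return 1
    # ASSUMPTION (not stated in the README): both guards fail, so
    # escalating buys nothing and the sample is labeled 0.
    return 0


if __name__ == "__main__":
    samples = [
        ("safe", "safe", "safe"),      # small correct
        ("safe", "unsafe", "unsafe"),  # small wrong, large correct
        ("safe", "safe", "unsafe"),    # both wrong (assumed label)
    ]
    print([oracle_label(*s) for s in samples])
```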
- Architecture: A 5-layer deep Multi-Layer Perceptron (`2048 -> 1024 -> 512 -> 256 -> 1`) with `BatchNorm1d`, `GELU` activations, and regularized `Dropout(0.3)`.
- Focal Loss: Severe class imbalance (easy samples vastly outnumber hard samples) is handled via Focal Loss ($\alpha=0.75, \gamma=2.0$), forcing the gradient optimization to focus heavily on misclassified hard instances.
- Decision Threshold: Fixed globally at `FIXED_THRESHOLD = 0.60` to maintain deterministic evaluation across all splits.
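To make the effect of the Focal Loss settings concrete, the sketch below evaluates the commonly used binary form $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$ for a single prediction in plain Python. It assumes that $\alpha = 0.75$ weights the hard class ($y = 1$) and $1 - \alpha$ weights the easy class; the README does not state which convention the training code follows, and `binary_focal_loss` is a hypothetical name.

```python
import math

ALPHA = 0.75  # from this README
GAMMA = 2.0   # from this README


def binary_focal_loss(
    p: float, y: int, alpha: float = ALPHA, gamma: float = GAMMA, eps: float = 1e-7
) -> float:
    """Focal loss for one sample; p is the predicted probability of y = 1."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability given to the true class
    # ASSUMPTION: alpha weights the positive (hard) class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy_correct = binary_focal_loss(p=0.1, y=0)  # confident and right
    hard_missed = binary_focal_loss(p=0.1, y=1)   # confident and wrong
    print(f"easy, well classified: {easy_correct:.6f}")
    print(f"hard, misclassified:   {hard_missed:.6f}")
```

The $(1 - p_t)^{\gamma}$ factor shrinks the loss of well-classified samples, so the many easy samples contribute little and a missed hard sample dominates the gradient. Setting `gamma=0` and `alpha=0.5` reduces the expression to half of the ordinary binary cross-entropy.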
Guardrail/fix-ver/
├── safe-route/
│ ├── models.py # BNN / MLP Router neural network definitions
│ ├── eval_end_to_end_safety.py # E2E causal evaluation pipeline (Calculates F1 & Breakdown)
│ ├── eval_all_splits.py # Validation / Test split evaluation script
│ ├── valid_router.json # Router metadata and split tags
│ └── Plan.MD # Detailed executive presentation slide deck plan
├── DynaGuard/
│ ├── eval.py # Downstream safety test harness
│ └── ... # Feature extraction logs & hidden state buffers
└── .gitignore # Production gitignore excluding large weights & JSONs
All evaluation scripts have been upgraded with dynamic base directory resolution (BASE_DIR), enabling seamless execution from any working directory.
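One common way to get this behavior is to anchor every data path to the script's own location instead of the current working directory. The sketch below shows that idiom; the helper names are hypothetical and the actual scripts may define `BASE_DIR` differently.

```python
from pathlib import Path


def resolve_base_dir(script_file: str) -> Path:
    """Return the absolute directory that contains the given script."""
    return Path(script_file).resolve().parent


def data_path(script_file: str, name: str) -> Path:
    """Build a path to a file that sits next to the script."""
    return resolve_base_dir(script_file) / name


# Inside a script, the module-level constant would typically be:
#   BASE_DIR = resolve_base_dir(__file__)
#   router_meta = BASE_DIR / "valid_router.json"
```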
    # Execute pipeline verification across Original, Augmented, Test, and Valid splits
    python safe-route/eval_end_to_end_safety.py
    # Evaluate router classification precision, recall, and AUC
    python safe-route/eval_all_splits.py
- Lightweight Pre-Routing Classifier: Integrate a fast text-based Domain Classifier ($R_{domain}$). Incoming queries classified as VN-native are routed directly to `1.7B`; international or translated prompts trigger the `DynaRoute MLP`.
- Dynamic Threshold Tuning: Enable runtime sliding thresholds based on server load and SLA budget constraints.
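As a rough illustration of the dynamic threshold idea, the sketch below raises the escalation threshold linearly as server load grows, so that fewer prompts reach the large guard under pressure. Everything here except the 0.60 base value is an assumption made for illustration: the function name, the normalized `load` signal, the 0.90 ceiling, and the linear schedule are not part of the current codebase.

```python
BASE_THRESHOLD = 0.60  # current fixed threshold from this README
MAX_THRESHOLD = 0.90   # hypothetical ceiling under full load


def sliding_threshold(
    load: float, base: float = BASE_THRESHOLD, ceiling: float = MAX_THRESHOLD
) -> float:
    """Interpolate the threshold between base (idle) and ceiling (saturated).

    load is a normalized utilization signal; values outside [0, 1] are clamped.
    """
    load = min(max(load, 0.0), 1.0)
    return base + load * (ceiling - base)


if __name__ == "__main__":
    for load in (0.0, 0.5, 1.0):
        print(f"load={load:.1f} -> threshold={sliding_threshold(load):.2f}")
```

A real deployment would also need to bound the safety cost of a higher threshold, since every prompt kept on the lightweight path inherits the small guard's error rate.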