This repository releases the open-source artifact for Seraph, a system for LLM-assisted network intent processing. The current public release focuses on the intent comprehension module, including datasets, prompts, topology assets, evaluation scripts, and the expert review workflow used in our study. Code and data for conflict detection & resolution and deployment optimization will be released soon.
The current release includes:
- intent dataset/: natural-language intents and expected IR annotations
- network/: synthetic network topologies and generation scripts
- snmt/: semantics-network mapping tables for each topology
- prompts/: prompt templates used in the reported experiments
- src/seraph_intent_comprehension/: implementation of the intent comprehension workflow
- scripts/: command-line entry points for evaluation, automatic checking, expert review preparation, and feedback
Use Python 3.9 or above.
python3 -m venv .venv
source .venv/bin/activate
python -m pip install .
Seraph intent comprehension is designed to run with a standard OpenAI-compatible chat completions API.
Before running experiments, copy .env.example to .env (or export the variables directly in your shell) and fill in your own credentials:
export SERAPH_API_KEY="your_key_here"
export SERAPH_API_BASE_URL="https://your-openai-compatible-endpoint/v1"
Optional settings:
- SERAPH_API_KEYS: comma-separated API keys for key rotation
- SERAPH_TIMEOUT_SECONDS
- SERAPH_MAX_RETRIES
- SERAPH_LOG_LEVEL
No private or personal API credentials are included in this repository.
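For orientation, the sketch below shows one way these variables could be consumed. load_seraph_config is a hypothetical helper written for this README, not part of the package, and the default values shown are assumptions.

import os
from itertools import cycle

def load_seraph_config():
    # Accept either a single key or a comma-separated pool for rotation.
    raw_keys = os.environ.get("SERAPH_API_KEYS") or os.environ.get("SERAPH_API_KEY", "")
    key_pool = cycle([k.strip() for k in raw_keys.split(",") if k.strip()])
    return {
        "base_url": os.environ["SERAPH_API_BASE_URL"],
        "keys": key_pool,  # call next(...) on this to rotate keys
        "timeout": float(os.environ.get("SERAPH_TIMEOUT_SECONDS", "60")),
        "max_retries": int(os.environ.get("SERAPH_MAX_RETRIES", "3")),
        "log_level": os.environ.get("SERAPH_LOG_LEVEL", "INFO"),
    }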
repository-root/
├── intent dataset/
├── network/
├── prompts/
├── scripts/
├── snmt/
└── src/seraph_intent_comprehension/
The evaluation code automatically looks for datasets under intent dataset/, and also accepts intent datasets/ or dataset/ if users rename the folder locally.
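A minimal sketch of that lookup order (a hypothetical helper, not the shipped resolver):

from pathlib import Path

def resolve_dataset_root(repo_root: Path) -> Path:
    # Try the documented folder names in order of preference.
    for name in ("intent dataset", "intent datasets", "dataset"):
        candidate = repo_root / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no dataset folder found under {repo_root}")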
List the built-in dataset mappings:
python scripts/run_eval.py --model gpt-4o --mode basic --list-datasets
Run one dataset with the basic prompt:
python scripts/run_eval.py \
--model gpt-4o \
--mode basic \
--topology campus_net \
--task intent \
--experiment-name paper_basic
Run the iterative variant for the extreme topology:
python scripts/run_eval.py \
--model gpt-4o \
--mode iterative \
--topology extreme \
--task protect \
--experiment-name paper_iterative
Run the baseline variant:
python scripts/run_eval.py \
--model gpt-4o-mini \
--mode baseline \
--topology cloud_net \
--task intent \
--experiment-name baseline
Run all registered datasets in one command:
python scripts/run_eval.py \
--model gpt-4o \
--mode basic \
--all-datasets \
--experiment-name full_run
Run a prompt ablation by overriding the prompt file:
python scripts/run_eval.py \
--model gpt-4o \
--mode basic \
--all-datasets \
--prompt-file prompt_without_examples.txt \
--experiment-name ablation_without_examples
Outputs are written to outputs/<experiment>/<mode>/<model>/<topology>/.
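For example, assuming the full_run experiment above, the result workbooks can be enumerated with a simple glob over that layout (illustrative snippet, not a repository script):

from pathlib import Path

# Walk outputs/<experiment>/<mode>/<model>/<topology>/ and list result files.
for workbook in sorted(Path("outputs/full_run").glob("*/*/*/*.xlsx")):
    print(workbook)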
The evaluation workflow for the intent comprehension module is:
- Run the model and generate output Excel files under outputs/.
- Run the automatic structural checker if desired.
- Prepare files for expert review.
- Ask domain experts to manually annotate the generated outputs.
- Optionally run the expert feedback loop on incorrect cases.
python scripts/score_results.py --target outputs
This command writes:
- Correctness
- Error Analysis
These columns are generated automatically by comparing the parsed IR structure against the expected IR. This check is useful for large-scale screening, but it does not replace expert judgment.
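As a rough illustration of what a structural comparison involves (a simplified sketch; the shipped checker in scripts/score_results.py is more thorough):

def structurally_equal(generated, expected):
    # Dicts must share the same keys and match recursively on each value.
    if isinstance(expected, dict):
        return (isinstance(generated, dict)
                and set(generated) == set(expected)
                and all(structurally_equal(generated[k], expected[k]) for k in expected))
    # Lists must match element-wise in order and length.
    if isinstance(expected, list):
        return (isinstance(generated, list)
                and len(generated) == len(expected)
                and all(structurally_equal(g, e) for g, e in zip(generated, expected)))
    # Leaves are compared by value.
    return generated == expected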
python scripts/prepare_manual_review.py --target outputs
This command adds the following columns when they are missing:
- Expert Correctness
- Expert Error Type
- Expert Notes
Reviewers can then open the output Excel files and manually annotate the generated IRs.
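The equivalent of this preparation step for a single file looks roughly like the following (an illustrative pandas sketch, assuming openpyxl is installed for .xlsx I/O; not the actual script):

import pandas as pd

REVIEW_COLUMNS = ["Expert Correctness", "Expert Error Type", "Expert Notes"]

def add_review_columns(path: str) -> None:
    # Add each review column only if it is missing; existing annotations stay intact.
    df = pd.read_excel(path)
    for column in REVIEW_COLUMNS:
        if column not in df.columns:
            df[column] = ""
    df.to_excel(path, index=False)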
After expert review, run the interactive feedback loop on a scored or manually reviewed output file:
python scripts/run_feedback.py \
--model gpt-4o-mini \
--topology cloud_net \
--task intent \
--result-file outputs/baseline/baseline/gpt-4o-mini/cloud_net/intent_cloud_results.xlsx \
--feedback-prompt-mode baseline
The feedback script:
- prefers Expert Correctness when manual annotations are available
- falls back to Correctness if expert annotations are not present
- records each session in a separate *_feedback.xlsx file
- keeps the original evaluation output unchanged
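The label-preference rule can be pictured as follows (a sketch of the behavior described above, not the exact code in scripts/run_feedback.py):

import pandas as pd

def correctness_label(row: pd.Series):
    # Prefer the expert annotation when one exists; otherwise fall back to
    # the automatic Correctness column.
    expert = row.get("Expert Correctness")
    if pd.notna(expert) and str(expert).strip():
        return expert
    return row.get("Correctness")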
- Conflict detection & resolution: code and data coming soon.
- Deployment optimization: code and data coming soon.