This repository is the working codebase for a simple claim:
narrow specialist models become far more controllable when they operate over explicit structured state and emit typed actions instead of free-form text.
The repo currently has two active threads:
- sst-code: structured code-edit specialists, with Rust as the main implemented package
- sst-world-robotics: structured coordination specialists over multi-robot world state
The current research direction is increasingly centered on Canonical Entity IDs (CEIDs): a model-facing identity layer that rewrites runtime entities into compact, typed, reversible aliases so specialists can ground actions more reliably.
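
As a rough illustration of what "compact, typed, reversible" could mean in practice, the sketch below gives each runtime ID an alias built from a type prefix and a counter, and keeps the reverse map alongside it. The `AliasTable` class, the prefix scheme (`R1`, `Z1`), and the example runtime IDs are all assumptions made for illustration; they are not the repository's actual CEID format.

```python
from dataclasses import dataclass, field


@dataclass
class AliasTable:
    """Hypothetical CEID-style table: runtime ID <-> short typed alias."""
    forward: dict[str, str] = field(default_factory=dict)   # runtime ID -> alias
    reverse: dict[str, str] = field(default_factory=dict)   # alias -> runtime ID
    counters: dict[str, int] = field(default_factory=dict)  # per-prefix counters

    def alias_for(self, runtime_id: str, entity_type: str) -> str:
        """Return a stable alias for runtime_id, creating one on first use."""
        if runtime_id in self.forward:
            return self.forward[runtime_id]
        # Assumed scheme: first letter of the type name. Types that share a
        # first letter would also share a prefix, so a real scheme needs more care.
        prefix = entity_type[0].upper()
        self.counters[prefix] = self.counters.get(prefix, 0) + 1
        alias = f"{prefix}{self.counters[prefix]}"
        self.forward[runtime_id] = alias
        self.reverse[alias] = runtime_id
        return alias

    def resolve(self, alias: str) -> str:
        """Map an alias back to its runtime ID; raises KeyError if unknown."""
        return self.reverse[alias]


table = AliasTable()
robot = table.alias_for("robot-7f3a9c", "robot")
zone = table.alias_for("zone-loading-dock-east", "zone")
assert (robot, zone) == ("R1", "Z1")
assert table.resolve("R1") == "robot-7f3a9c"
```

The point of the reverse map is that a specialist only ever sees and emits the short aliases, while the runtime still receives its own identifiers.
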
At a high level, the system shape is:
- A router or wrapper interprets the incoming request.
- A state builder constructs canonical structured state.
- A task-local view is extracted for the target specialist.
- The specialist predicts one typed action.
- A validator checks the action for schema and grounding.
- The resolved action is applied back to the runtime state.
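
The six steps above can be sketched as a small pipeline. This is a toy version under stated assumptions: the state layout, the two-key action schema (`op`, `target`), the keyword routing rule, and every function name here are hypothetical and do not mirror `router.py` or `wrapper_runner.py`.

```python
import json
from typing import Any, Callable

State = dict[str, Any]
Action = dict[str, Any]


def route(request: str) -> str:
    """Step 1 (assumed rule): pick a specialist from a keyword in the request."""
    return "robotics" if "robot" in request.lower() else "code"


def build_state(runtime: State) -> State:
    """Step 2: canonical state, here just the entities sorted by ID."""
    return {"entities": sorted(runtime["entities"], key=lambda e: e["id"])}


def task_view(state: State, specialist: str) -> State:
    """Step 3: keep only the entities the target specialist should see."""
    return {"entities": [e for e in state["entities"] if e["domain"] == specialist]}


def validate(action: Any, view: State) -> bool:
    """Step 5: schema check (exact keys) plus grounding check (target exists)."""
    if not isinstance(action, dict) or set(action) != {"op", "target"}:
        return False
    return action["target"] in {e["id"] for e in view["entities"]}


def apply_action(runtime: State, action: Action) -> State:
    """Step 6: record the resolved action on a copy of the runtime state."""
    return {**runtime, "log": runtime.get("log", []) + [action]}


def run(request: str, runtime: State,
        specialist_fn: Callable[[str, State], str]) -> State:
    specialist = route(request)
    view = task_view(build_state(runtime), specialist)
    action = json.loads(specialist_fn(request, view))  # Step 4: one typed action
    if not validate(action, view):
        raise ValueError(f"rejected action: {action}")
    return apply_action(runtime, action)


# Usage with a stub specialist that always targets the first visible entity.
runtime = {"entities": [
    {"id": "r2", "domain": "robotics"},
    {"id": "r1", "domain": "robotics"},
    {"id": "f1", "domain": "code"},
]}
stub = lambda request, view: json.dumps({"op": "move", "target": view["entities"][0]["id"]})
result = run("Send a robot to the dock", runtime, stub)
assert result["log"] == [{"op": "move", "target": "r1"}]
```

Keeping the validator between the specialist and the runtime means an ungrounded target is rejected before it can change any state.
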
This repository is both implementation and research workspace for that workflow.
The strongest current result is in packages/sst-world-robotics.
On the harder robotics hard_v2 candidate-pressure slice, the current 250-example evaluation progression is:
| Variant | Valid JSON | Valid Schema | Exact Op | Exact Match |
|---|---|---|---|---|
| Baseline specialist | 1.000 | 0.992 | 1.000 | 0.616 |
| Baseline + CEIDs and ranked hints | 1.000 | 0.856 | 1.000 | 0.660 |
| Prior row + consistent hint rewriting | 1.000 | 0.980 | 1.000 | 0.968 |
| Prior row + parser and action normalization | 1.000 | 0.992 | 1.000 | 0.980 |
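
The column definitions are not spelled out in this section, so the sketch below shows only one plausible reading of them: Valid JSON means the raw output parses, Valid Schema means it has the expected keys, Exact Op compares only the `op` field, and Exact Match compares the whole action. The `score` function and the two-key schema (`op`, `args`) are assumptions, not the repository's evaluation code.

```python
import json

REQUIRED_KEYS = {"op", "args"}  # assumed minimal action schema


def score(predictions: list[str], golds: list[dict]) -> dict[str, float]:
    """Return four rates over paired raw model outputs and gold actions.

    Assumes golds is non-empty and the same length as predictions.
    """
    counts = {"valid_json": 0, "valid_schema": 0, "exact_op": 0, "exact_match": 0}
    for raw, gold in zip(predictions, golds):
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output scores zero on every column
        counts["valid_json"] += 1
        if not isinstance(action, dict):
            continue
        # Exact Op is checked independently of schema validity.
        if action.get("op") == gold["op"]:
            counts["exact_op"] += 1
        if set(action) == REQUIRED_KEYS and isinstance(action["args"], dict):
            counts["valid_schema"] += 1
        if action == gold:
            counts["exact_match"] += 1
    n = len(golds)
    return {name: count / n for name, count in counts.items()}


golds = [
    {"op": "assign", "args": {"robot": "R1"}},
    {"op": "assign", "args": {"robot": "R2"}},
]
preds = [
    '{"op": "assign", "args": {"robot": "R1"}}',
    '{"op": "assign", "args": {"robot": "R3"}}',  # right op, wrong entity
]
assert score(preds, golds) == {
    "valid_json": 1.0, "valid_schema": 1.0, "exact_op": 1.0, "exact_match": 0.5,
}
```

At 250 examples, each example moves a rate by 0.004, so the Exact Match values 0.616 and 0.980 in the table correspond to 154 and 245 exact matches.
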
The main lesson is no longer “small specialists can work.” That is already established. The current lesson is:
- indirect entity references are the main hard problem
- reference consistency matters materially
- CEIDs plus consistent alias rewriting largely solve the grounding collapse
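
To make "reference consistency" concrete, the sketch below rewrites every runtime ID inside hint text to the same alias used in the structured state, so the specialist never sees two names for one entity. It assumes hints are plain strings that contain runtime IDs verbatim; `rewrite_hints` and the example IDs are hypothetical.

```python
import re


def rewrite_hints(hints: list[str], aliases: dict[str, str]) -> list[str]:
    """Replace every runtime ID in each hint with its alias.

    Longer IDs are tried first so that an ID which is a prefix of another
    (for example "robot-1" and "robot-10") is never rewritten partially.
    """
    if not aliases:
        return list(hints)
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(rid) for rid in ordered))
    return [pattern.sub(lambda m: aliases[m.group(0)], hint) for hint in hints]


aliases = {"robot-1": "R1", "robot-10": "R2", "dock-east": "Z1"}
hints = ["1. robot-10 is closest to dock-east", "2. robot-1 is idle"]
assert rewrite_hints(hints, aliases) == ["1. R2 is closest to Z1", "2. R1 is idle"]
```
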
- docs/ — notes, writeups, and supporting documentation
- schemas/ — shared schemas, including CEID-related schema work
- configs/ — registry and configuration stubs
- router.py — simple router example
- wrapper_runner.py — wrapper dispatch harness
- papers/ceid-paper/ — Research paper focusing on Canonical Entity IDs for Grounded Action Generation within the broader structured SSTT framework.
packages/sst-code/ is the container for the code-oriented specialist family.
Main package:
- packages/sst-code/rust/ — Rust structured-edit specialist
Useful entry points:
- packages/sst-code/README.md
- packages/sst-code/docs/architecture.md
- packages/sst-code/docs/specialist_contract.md
packages/sst-world-robotics/ is the robotics coordination specialist family.
Main capabilities:
- synthetic dataset generation
- canonical world-state construction
- typed coordination actions
- training and validation scripts
- CEID aliasing and identity rewriting
- ranked grounding hints for candidate-pressure settings
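
As a hedged illustration of ranked grounding hints, the sketch below orders candidate robots and emits a short numbered list the specialist could read alongside the state. The ranking signal used here (idle before busy, then distance to the target) is an assumption chosen for the example, as are the `Candidate` fields and the `ranked_hints` name; the package may rank on entirely different features.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    alias: str   # CEID-style alias such as "R1" (assumed format)
    x: float
    y: float
    busy: bool


def ranked_hints(candidates: list[Candidate],
                 target_xy: tuple[float, float],
                 top_k: int = 3) -> list[str]:
    """Return up to top_k hint lines, best candidate first.

    Assumed ordering: idle robots first, then nearest to the target,
    with the alias as a deterministic tie-breaker.
    """
    tx, ty = target_xy

    def sort_key(c: Candidate):
        return (c.busy, math.hypot(c.x - tx, c.y - ty), c.alias)

    ranked = sorted(candidates, key=sort_key)[:top_k]
    return [f"{rank}. {c.alias}" for rank, c in enumerate(ranked, start=1)]


robots = [
    Candidate("R1", 0.0, 0.0, busy=False),
    Candidate("R2", 5.0, 5.0, busy=False),
    Candidate("R3", 4.0, 4.0, busy=True),   # closest, but busy
]
assert ranked_hints(robots, (4.0, 4.0), top_k=2) == ["1. R2", "2. R1"]
```
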
Useful entry points:
- packages/sst-world-robotics/README.md
- packages/sst-world-robotics/src/
- packages/sst-world-robotics/training/TRAINING.md
- packages/sst-world-robotics/evals/
The Rust track is the earlier structured specialist line. It already has:
- trained checkpoints
- wrapper/routing support
- validator-backed evaluation
- result summarization scripts
Useful locations:
The robotics package is the current center of gravity.
It now contains:
- frozen easy slices
- harder hard_v2 candidate-pressure slices
- CEID alias generation and reverse mapping
- training/eval workflows on Modal
- saved eval artifacts and spot-check examples
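
Reverse mapping is the step that turns an aliased action back into one the runtime understands. A minimal sketch, assuming actions are nested JSON-like values and aliases appear only as whole string values; `resolve_action` and the example action shape are hypothetical and not taken from `training/identity.py`.

```python
from typing import Any


def resolve_action(action: Any, reverse: dict[str, str]) -> Any:
    """Recursively replace alias strings with runtime IDs.

    Strings that are not known aliases (op names, free text) are left as-is.
    Dictionary keys are never rewritten, only values.
    """
    if isinstance(action, dict):
        return {key: resolve_action(value, reverse) for key, value in action.items()}
    if isinstance(action, list):
        return [resolve_action(value, reverse) for value in action]
    if isinstance(action, str):
        return reverse.get(action, action)
    return action  # numbers, booleans, None


reverse = {"R1": "robot-7f3a9c", "Z1": "zone-loading-dock-east"}
aliased = {"op": "assign", "args": {"robot": "R1", "zones": ["Z1"], "priority": 2}}
assert resolve_action(aliased, reverse) == {
    "op": "assign",
    "args": {
        "robot": "robot-7f3a9c",
        "zones": ["zone-loading-dock-east"],
        "priority": 2,
    },
}
```

One caveat of this simple approach: any string value that happens to equal an alias is rewritten, so a real implementation would likely restrict resolution to fields the schema marks as entity references.
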
Useful artifacts:
- packages/sst-world-robotics/evals/
- packages/sst-world-robotics/evals/spotcheck_examples/
- packages/sst-world-robotics/training/identity.py
Router examples:

    python router.py --example rust
    python router.py --example game

Single wrapped task:

    python wrapper_runner.py \
      "Remove the unused helper function from the Rust project" \
      --project-id generated_lib \
      --domain-hint code \
      --language-hint rust \
      --model qwen3.5:cloud \
      --dispatch-local

Batch of wrapped tasks from a file:

    python wrapper_runner.py dummy \
      --tasks-file examples/wrapper_calls/wrapped_rust_tasks.jsonl \
      --model qwen3.5:cloud \
      --dispatch-local \
      --output wrapped_rust_results.jsonl

Robotics dataset export:

    cd packages/sst-world-robotics
    node --import tsx src/export-dataset.ts ./data 10000 hard_v2

Robotics training and validation on Modal:

    cd packages/sst-world-robotics
    modal run training/modal_train.py --use-identity-schema --no-resume --num-epochs 1
    modal run training/modal_validate.py --use-identity-schema --max-examples 250 --progress-every 50

Rust result summary:

    cd packages/sst-code/rust
    python scripts/summarize_results.py training/eval-results.json

The repo ignores local generated artifacts such as:

- packages/sst-world-robotics/data/
- packages/sst-world-robotics/trained-model*/
- local eval JSON outputs

Those are intentionally reproducible and should not normally be committed.
The current draft is here:
This paper focuses on Canonical Entity IDs for Grounded Action Generation within the broader structured SSTT framework.
