Problem
The shipped compacted model gives garbage for general conversation and can't use our tools reliably. We need a model that:
- Passes real coding challenges (not trivial hello worlds)
- Uses OUR tool system correctly (code/edit, code/write, shell, etc.)
- Ships with Continuum — works out of the box, zero API keys
The pipeline
- Academy RealClassEval: 488 Python challenges (390 train, 98 eval). Current best: 53.1% Pass@1 with DeepSeek-Chat (cloud). Local model needs to match or exceed this.
- Tool-call LoRA: Fine-tune on successful tool invocation traces from CodingAgent sessions. Model learns OUR tool schema, not generic function calling.
- Sentinel coding pipelines: dev/build-feature, dev/fix-bug must work end-to-end with the shipped model.
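The tool-call LoRA step depends on harvesting successful invocations from session logs. A minimal sketch of that filtering, assuming a hypothetical trace schema (`steps`, `tool`, `ok`, `prompt`, `tool_call` are illustrative field names, not the actual CodingAgent format):

```python
import json
from pathlib import Path

def collect_tool_traces(trace_dir: str, out_path: str) -> int:
    """Filter CodingAgent session traces down to successful tool calls
    and emit them as JSONL chat examples for LoRA fine-tuning.
    The input schema here is an assumption for illustration."""
    kept = 0
    with open(out_path, "w") as out:
        for path in Path(trace_dir).glob("*.json"):
            session = json.loads(path.read_text())
            for step in session.get("steps", []):
                # Keep only invocations of OUR tools that succeeded,
                # so the model learns the project's schema from positives.
                if step.get("tool") in {"code/edit", "code/write", "shell"} and step.get("ok"):
                    out.write(json.dumps({
                        "messages": [
                            {"role": "user", "content": step["prompt"]},
                            {"role": "assistant", "content": step["tool_call"]},
                        ]
                    }) + "\n")
                    kept += 1
    return kept
```

The point of filtering on success is that failed invocations would teach the model the wrong schema; whatever the real trace format is, only passing calls should reach the training set.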
Approach
- Base: Qwen2.5-Coder-14B (fits 16GB MacBook Air at Q5_K)
- LoRA training on: RealClassEval solutions + CodingAgent tool traces + Academy exam passes
- Eval: must pass >40% of RealClassEval via the tool system (not raw code generation)
- Ship as: `continuum-ai/qwen2.5-coder-14b-continuum` on HuggingFace, auto-discovered by CandleAdapter on first run
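With one sample per problem, the >40% gate reduces to a pass fraction. A trivial helper makes the ship bar explicit (the boolean result list and helper names are assumptions about the eval harness, not existing code):

```python
def pass_at_1(results: list[bool]) -> float:
    """Pass@1 with a single sample per problem is just the pass fraction."""
    if not results:
        return 0.0
    return sum(results) / len(results)

def meets_ship_bar(results: list[bool], threshold: float = 0.40) -> bool:
    # Ship gate: the LOCAL model must clear 40% on the 98-question eval split.
    return pass_at_1(results) > threshold
```

For scale: 40 of 98 passes is ~40.8% and clears the bar; the DeepSeek-Chat baseline of 53.1% corresponds to 52 of 98.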
Success criteria
- `./jtag academy/start --mode=realclasseval --questionsPerExam=98` passes >40% with LOCAL model
- Sentinel `dev/fix-bug` completes successfully on real repo bugs with LOCAL model
- Zero API keys needed; works on MacBook Air 16GB
Dependencies
- P6: Tool calling reliability — parser-per-model-family #324 (tool calling parsers)
- P6B: Ship 14B compacted model, research 32B QAT #325 (ship 14B model)
- Local inference quality — compacted 14B model gives poor responses #321 (local inference quality)