Skip to content

Ship a LoRA-tuned local model that passes coding challenges via our tool system #344

@joelteply

Description

@joelteply

Problem

The shipped compacted model gives garbage for general conversation and can't use our tools reliably. We need a model that:

  1. Passes real coding challenges (not trivial hello worlds)
  2. Uses OUR tool system correctly (code/edit, code/write, shell, etc.)
  3. Ships with Continuum — works out of the box, zero API keys

The pipeline

  1. Academy RealClassEval: 488 Python challenges (390 train, 98 eval). Current best: 53.1% Pass@1 with DeepSeek-Chat (cloud). Local model needs to match or exceed this.
  2. Tool-call LoRA: Fine-tune on successful tool invocation traces from CodingAgent sessions. Model learns OUR tool schema, not generic function calling.
  3. Sentinel coding pipelines: dev/build-feature, dev/fix-bug must work end-to-end with the shipped model.

Approach

  • Base: Qwen2.5-Coder-14B (fits 16GB MacBook Air at Q5_K)
  • LoRA training on: RealClassEval solutions + CodingAgent tool traces + Academy exam passes
  • Eval: must pass >40% RealClassEval with tool system (not raw code generation)
  • Ship as: continuum-ai/qwen2.5-coder-14b-continuum on HuggingFace
  • Auto-discovered by CandleAdapter on first run

Success criteria

  • ./jtag academy/start --mode=realclasseval --questionsPerExam=98 passes >40% with LOCAL model
  • Sentinel dev/fix-bug completes successfully on real repo bugs with LOCAL model
  • Zero API keys needed. Works on MacBook Air 16GB.

Dependencies

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions