Text-to-Polars code generation using Qwen2.5-Coder (MLX backend, 4-bit quantization).
- Model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit (or 3B variant for speed)
- Prompting: System instruction with Polars-specific syntax rules + 5 carefully chosen few-shot examples targeting common LLM failure modes (date handling, sort direction, membership tests, scalar extraction)
- Self-repair loop: If generated code throws an exception, the error is fed back to the model for one retry
- Output parsing: Strips markdown fences and special tokens (<|im_end|>), then extracts the executable expression
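The output-parsing step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual parser: the function name `clean_generation`, the fence pattern, and every token other than `<|im_end|>` are assumptions made for the example.

```python
import re

# Built at runtime so this example contains no literal fence inside itself.
FENCE = "`" * 3

# Assumed token list: only <|im_end|> is named in this README.
SPECIAL_TOKENS = ("<|im_end|>", "<|im_start|>", "<|endoftext|>")

# Matches an optionally language-tagged fenced block and captures its body.
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def clean_generation(raw: str) -> str:
    """Return the bare code from a raw model completion (hypothetical helper)."""
    text = raw
    # Drop chat-template tokens that can leak into the completion.
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    # If the model wrapped its answer in a fenced block, keep only the body.
    match = FENCE_RE.search(text)
    if match:
        text = match.group(1)
    return text.strip()
```

Completions that arrive without a fence pass through with only the special tokens and surrounding whitespace removed.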
16/16 correct in ~39 s (N/T ≈ 0.41, i.e. 16 correct answers / 39 s)
Apple Silicon (MLX backend):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-apple.txt
python data/make_data.py
Linux / CUDA (transformers + bitsandbytes backend):
python3 -m venv .venv
source .venv/bin/activate
# Install torch with the CUDA version matching your driver (example: cu121)
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
python data/make_data.py
Benchmark server (used by the platform runner):
bash start.sh
# or: uvicorn server:app --host 0.0.0.0 --port 9000
The server exposes:
- POST /chat: receives {question_id, message, schema, data_path?, data_b64?}, returns {question_id, response}
- GET /health: readiness probe
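As a rough illustration of the request shape, the sketch below builds a JSON body for POST /chat. The field names come from the list above. The helper name `build_chat_payload`, the schema being a column-to-dtype mapping, and `data_b64` being the base64-encoded file contents are all assumptions, since the README does not specify them.

```python
import base64
import json
from typing import Optional


def build_chat_payload(question_id: str,
                       message: str,
                       schema: dict,
                       data_path: Optional[str] = None,
                       parquet_bytes: Optional[bytes] = None) -> str:
    """Build a JSON request body for POST /chat (hypothetical helper)."""
    payload = {"question_id": question_id, "message": message, "schema": schema}
    # data_path and data_b64 are marked optional in the endpoint description.
    if data_path is not None:
        payload["data_path"] = data_path
    if parquet_bytes is not None:
        # Assumption: data_b64 carries the raw file bytes, base64-encoded.
        payload["data_b64"] = base64.b64encode(parquet_bytes).decode("ascii")
    return json.dumps(payload)
```

The resulting string can be sent with any HTTP client using a `Content-Type: application/json` header; the response is expected to echo `question_id` alongside `response`.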
Local eval loop (development only):
python run.py
The model backend is selected automatically: MLX on Apple Silicon, transformers (4-bit via bitsandbytes, float16 fallback) on Linux/CUDA.
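One plausible way to implement the automatic backend selection is a platform check, sketched below. How src/model.py actually detects the platform is not shown in this README, so the function name `select_backend` and the detection rule are assumptions.

```python
import platform
from typing import Optional


def select_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Pick an inference backend name from the host platform (hypothetical helper)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Native Python on Apple Silicon reports Darwin / arm64.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    # Everything else falls back to the transformers path.
    return "transformers"
```

The optional arguments exist only to make the rule testable without changing machines; calling `select_backend()` with no arguments inspects the current host.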
- server.py: FastAPI inference server (benchmark entrypoint)
- start.sh: Starts the server on port 8000
- src/model.py: Code generator (MLX on Apple Silicon, transformers on Linux/CUDA)
- src/prompt.py: System instruction + few-shot examples
- src/executor.py: Safe code execution with timeout and output cleanup
- src/evaluator.py: Eval loop with self-repair retry
- run.py: Local evaluation script (development only)
- data/eval_set.json: Ground-truth test cases
- data/make_data.py: Generates synthetic sales parquet
- Targeted few-shots: each example fixes a specific Polars footgun (.dt.month() parens, .is_in() vs. pandas-style .isin(), descending=True not pandas-style ascending=False)
- Explicit syntax rules in system prompt: cheaper than adding more few-shots
- Self-repair: catches transient generation errors with one retry
- max_tokens=200: covers complex group-by chains without over-generating
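The self-repair idea with a single retry can be sketched as below. This is an illustration under stated assumptions rather than the code in src/evaluator.py: `run_with_repair`, the `generate(question, error)` callable, and the `execute(code)` callable are hypothetical names standing in for the model call and the sandboxed executor.

```python
from typing import Any, Callable, Optional


def run_with_repair(generate: Callable[[str, Optional[str]], str],
                    execute: Callable[[str], Any],
                    question: str,
                    max_retries: int = 1) -> Any:
    """Generate code, run it, and feed any exception back for a retry."""
    error: Optional[str] = None
    for _ in range(max_retries + 1):
        # On the first pass error is None; on a retry it holds the last failure.
        code = generate(question, error)
        try:
            return execute(code)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"failed after {max_retries + 1} attempts: {error}")
```

Capping `max_retries` at 1 mirrors the one-retry policy described above and bounds the worst-case latency at two generations per question.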
- Developed on Apple Silicon (M-series). src/model.py auto-selects MLX on Apple Silicon and transformers (4-bit via bitsandbytes, float16 fallback) on Linux/CUDA.
- Linux target model: Qwen/Qwen2.5-Coder-7B-Instruct (same model family, downloaded from HuggingFace Hub on first run).