Model v4 — Diverse Augmentation
Training data now uses real-world package names from PyPI (800+), npm (1300+), and crates.io (500+), plus curated Docker images, repo names, and system packages. No more `myapp` hallucinations.
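As a rough illustration of this kind of augmentation, the sketch below fills command templates with names sampled from per-ecosystem pools. The pools, templates, and the `make_examples` helper are all hypothetical stand-ins. The actual generation pipeline is not shown here and may work differently.

```python
import random

# Hypothetical name pools; the real lists are far larger (PyPI, npm, crates.io, ...).
NAME_POOLS = {
    "pypi": ["requests", "numpy", "flask"],
    "npm": ["express", "lodash", "react"],
    "crates": ["serde", "tokio", "clap"],
}

# Hypothetical templates: (mistyped command, corrected command, pool to draw from).
TEMPLATES = [
    ("pip instal {name}", "pip install {name}", "pypi"),
    ("npm isntall {name}", "npm install {name}", "npm"),
    ("cargo ad {name}", "cargo add {name}", "crates"),
]


def make_examples(n, seed=0):
    """Build n (input, target) pairs, sampling a real-looking name for each."""
    rng = random.Random(seed)  # seeded so the same call reproduces the same data
    examples = []
    for _ in range(n):
        wrong, right, pool = rng.choice(TEMPLATES)
        name = rng.choice(NAME_POOLS[pool])
        examples.append({
            "input": wrong.format(name=name),
            "target": right.format(name=name),
            "name": name,
        })
    return examples
```

Sampling the name independently for each example is what keeps any single placeholder from dominating the data.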
Improvements
- `pytohn` → `python`: previously hallucinated `myapp:v1.0`, now correct
- `rm mydir/` → `rm -rf mydir/`: previously garbled, now correct
- `pip install` on PEP 668: now suggests `uvx`, `pipx`, and venv creation
- `docker ps` permission: now suggests both `sudo` and `usermod -aG docker`
- All v3 fixes retained (clean EOS stopping, multi-alt where appropriate)
Stats
- 60K training examples with 2600+ unique package/project names
- No single placeholder exceeds 0.1% of training data (was 7% for `myapp`)
- Train loss: 0.099, eval loss: 0.068
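
The placeholder cap can be checked with a simple frequency count. The sketch below is a minimal version, assuming each training example has been reduced to the single name it uses. The helper names `name_shares` and `over_cap` are hypothetical. At 60K examples, a 0.1% cap works out to at most 60 examples per name.

```python
from collections import Counter


def name_shares(names):
    """Return each name's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(names)
    total = len(names)
    return {name: count / total for name, count in counts.items()}


def over_cap(names, cap=0.001):
    """Return the names whose share exceeds the cap (0.001 means 0.1%)."""
    return {name: share for name, share in name_shares(names).items() if share > cap}
```

An empty result from `over_cap` means no name exceeds the cap. A dataset where `myapp` filled 7% of examples would be flagged.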