## Bug

When piping input to `dlgo run` (non-interactive use), the `--max-tokens` and `--temp` flags appear to be ignored.
## Reproduction

```sh
echo "Explain what a compiler does in one paragraph." | \
  dlgo run tinyllama-1.1b-chat-v1.0.Q4_0.gguf --temp 0 --max-tokens 64 --no-stream
```
Expected: temperature 0.0, and generation stops at 64 tokens.

Actual:

- Banner shows `temp=0.70` despite `--temp 0`
- Generated 246 tokens despite `--max-tokens 64`
```text
Model:    llama
Params:   22 layers, 2048 dim, 32 heads, vocab 32000
Context:  2048 tokens
Backend:  CPU (4 threads)
Sampling: temp=0.70 top-k=40 top-p=0.90   <-- should be temp=0.00
>>> A compiler is a software tool that translates ...
9.1 tok/s | 246 tokens | 37.4s            <-- should stop at 64
```
Across multiple runs with `--max-tokens 128`, dlgo generated 51, 82, and 102 tokens: variable counts, none matching the requested 128.
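For reference, the counts above were collected by repeating the same piped invocation and extracting the token field from dlgo's summary line. A minimal harness sketch (assuming the `N tokens` summary format shown in the output above, and that dlgo prints it to stdout or stderr):

```sh
#!/bin/sh
# Repeat the reproduction three times and print only the token-count field.
# Assumes the "N tokens" summary line shown in the captured output above.
for i in 1 2 3; do
  echo "Explain what a compiler does in one paragraph." | \
    dlgo run tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
      --temp 0 --max-tokens 128 --no-stream 2>&1 |
    grep -oE '[0-9]+ tokens'
done
```

With `--temp 0` the counts would be expected to be identical across runs (deterministic sampling), which makes the 51/82/102 spread easy to spot.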
## Environment

- dlgo built from `main` (commit around 2026-03-19)
- Linux 6.6.87 (WSL2), Go 1.26.0
- Model: TinyLlama 1.1B Chat v1.0 Q4_0