More complete DMR support #103

Merged
krissetto merged 1 commit into docker:main from krissetto:better-dmr
Sep 4, 2025

Conversation

@krissetto
Contributor

With this PR the dmr provider now supports:

  • Proper context length setup using `max_tokens`
  • `temperature`, `top_p`, `frequency_penalty`, and `presence_penalty` are all mapped to the appropriate runtime flags for the engine in use (for now, only llama.cpp mappings)
  • Raw runtime flags passed through to the inference engine via `provider_opts:runtime_flags`

Configuration example supported by these changes:

```yaml
models:
  root:
    provider: dmr
    model: ai/qwen3:14B-Q6_K
    max_tokens: 32768
    temperature: 0.7
    top_p: 0.95
    frequency_penalty: 0.2
    presence_penalty: 0.1
    provider_opts:
      runtime_flags: |
        --batch-size 1024
        --ubatch-size 512
```
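To illustrate the idea, here is a minimal sketch of how such a config might be translated into engine flags. The flag names follow llama.cpp's CLI (`--ctx-size`, `--temp`, `--top-p`, `--frequency-penalty`, `--presence-penalty`), but the mapping code itself is a hypothetical illustration, not the provider's actual implementation:

```python
# Hypothetical sketch: map dmr-style config keys to llama.cpp runtime flags,
# then append any raw flags from provider_opts.runtime_flags verbatim.
# This is an illustration of the concept, not the provider's real code.

LLAMA_CPP_FLAGS = {
    "max_tokens": "--ctx-size",
    "temperature": "--temp",
    "top_p": "--top-p",
    "frequency_penalty": "--frequency-penalty",
    "presence_penalty": "--presence-penalty",
}

def build_runtime_flags(config: dict) -> list[str]:
    """Translate supported config keys into engine flags, then pass raw flags through."""
    flags: list[str] = []
    for key, flag in LLAMA_CPP_FLAGS.items():
        if key in config:
            flags += [flag, str(config[key])]
    # runtime_flags is treated as an opaque string handed to the engine as-is
    raw = config.get("provider_opts", {}).get("runtime_flags", "")
    flags += raw.split()
    return flags

config = {
    "max_tokens": 32768,
    "temperature": 0.7,
    "top_p": 0.95,
    "provider_opts": {"runtime_flags": "--batch-size 1024 --ubatch-size 512"},
}
print(build_runtime_flags(config))
```

The raw `runtime_flags` string is appended last, so engine-specific options the provider knows nothing about still reach the inference engine untouched.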

closes #71

Supports most top-level `models:` configuration options, proper context size configuration via `max_tokens`, and manually defining engine-specific runtime flags via `provider_opts:runtime_flags`

references docker#71

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
@krissetto krissetto requested a review from rumpl September 4, 2025 16:12
@krissetto krissetto merged commit 9239963 into docker:main Sep 4, 2025
4 checks passed


Development

Successfully merging this pull request may close these issues.

[FEAT] - Better DMR integration
