
Centralize Turbo export/push config, make quantization defaults deterministic, and prepare v2.1 pre-release#25

Merged
codewithdark-git merged 6 commits into main from copilot/centralize-config-export-push-parameters
Apr 25, 2026

Conversation

Contributor

Copilot AI commented Apr 24, 2026

Previously, export/push behavior required repeating format/quantization settings across multiple calls, which let export() and push() diverge and produce inconsistent outputs. This change introduces a single Turbo-level config for downstream export/push behavior, with explicit defaults, format-safe quantization handling, and v2.1 pre-release documentation/version updates.

  • Shared Turbo-level export/push config

    • Added config support to turbo(...) and TurboModel.from_pretrained(...).
    • Introduced centralized defaults in TurboModel:
      • format: "safetensors"
      • push_format: "safetensors"
      • quantization: "Q4_K_M"
      • push_quantization: None
    • Added internal resolver logic to normalize and store this config once at initialization.
    • Supports nullable overrides for push_quantization (explicit None is preserved).
  • export() now consumes internal config

    • format is optional; when omitted, uses shared config.
    • GGUF quantization defaults come from shared config (Q4_K_M) unless explicitly overridden.
    • Explicit format / quantization args still take precedence.
  • push() now consumes internal config with format-safe quantization

    • format is optional; when omitted, uses push_format.
    • push_quantization is no longer always-on by default.
    • GGUF push still resolves quantization deterministically (falls back to shared GGUF quantization).
    • ONNX/MLX push no longer gets forced GGUF-style quantization when not configured.
  • Docs + examples updated for new usage

    • Updated API/docs/examples to show configuring once at Turbo init, then calling export() / push() without repeating format/quantization in common GGUF flows.
    • Refreshed old usage snippets where the new shared-config API is preferred.
  • Pre-release version update

    • Updated project/docs versioning and visible version strings to v2.1.0rc1 for pre-release preparation.
  • Focused coverage for config propagation

    • Added tests validating:
      • deterministic default resolution,
      • export/push default propagation from shared config,
      • synchronization behavior when only export-side values are provided,
      • nullable push_quantization override behavior,
      • ONNX push does not force quantization by default,
      • portable temp staging paths via tmp_path.
from quantllm import turbo

model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={
        "format": "gguf",
        "quantization": "Q4_K_M",
        "push_format": "gguf",
    },
)

model.export()               # uses gguf + Q4_K_M from shared config
model.push("user/my-model")  # uses gguf defaults from shared config
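The one-time default resolution described above can be sketched roughly as follows. This is an illustrative sketch only: the function name `resolve_export_push_config` and the `DEFAULTS` mapping mirror the defaults listed in this PR, but they are assumptions, not QuantLLM's actual internals.

```python
# Illustrative sketch of resolving the shared export/push config once at
# TurboModel init. Names here are assumptions, not QuantLLM's real API.
DEFAULTS = {
    "format": "safetensors",
    "push_format": "safetensors",
    "quantization": "Q4_K_M",
    "push_quantization": None,
}

def resolve_export_push_config(user_config=None):
    """Merge user overrides onto the defaults, preserving explicit None."""
    user_config = dict(user_config or {})
    unknown = set(user_config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown export/push config keys: {sorted(unknown)}")
    # Dict merge keeps an explicit None override (nullable push_quantization).
    resolved = {**DEFAULTS, **user_config}
    # Keep push_format in sync when only the export-side format is given.
    if "format" in user_config and "push_format" not in user_config:
        resolved["push_format"] = user_config["format"]
    return resolved
```

With this shape, `resolve_export_push_config({"format": "gguf"})` yields a config whose `push_format` is also `"gguf"`, matching the synchronization behavior the tests cover.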

Copilot AI and others added 2 commits April 24, 2026 20:23
Agent-Logs-Url: https://github.com/codewithdark-git/QuantLLM/sessions/aa78d528-be1d-4467-813d-711a55ade22a

Co-authored-by: codewithdark-git <144595403+codewithdark-git@users.noreply.github.com>
Copilot AI changed the title [WIP] Centralize config for export/push parameters and improve quantization default handling Centralize Turbo export/push config and make quantization defaults deterministic Apr 24, 2026
Copilot AI requested a review from codewithdark-git April 24, 2026 20:29
Owner

@codewithdark-git codewithdark-git left a comment


@copilot The changes are correct, but we need to update all of the documentation, migrate any old code usage to the new API, and bump the pre-release to v2.1.

@codewithdark-git codewithdark-git marked this pull request as ready for review April 25, 2026 06:11
Copilot AI review requested due to automatic review settings April 25, 2026 06:11
Contributor

Copilot AI left a comment


Pull request overview

This PR centralizes export/push configuration at the Turbo/TurboModel level to avoid per-call duplication and to make default export/push behavior deterministic across export() and push().

Changes:

  • Added shared config support to turbo(...) and TurboModel.from_pretrained(...), resolved once into TurboModel.export_push_config.
  • Updated export() and push_to_hub() to consume the resolved internal config when per-call args are omitted.
  • Added tests plus documentation/examples updates to reflect configuring export/push defaults once at initialization.
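The per-call fallback in the second bullet can be sketched as below. The class and attribute names (`TurboModelSketch`, `export_push_config`) are illustrative stand-ins for the real `TurboModel`, not its actual implementation.

```python
# Sketch of export() consuming the resolved shared config when per-call
# arguments are omitted. Names are assumptions, not QuantLLM's real API.
class TurboModelSketch:
    def __init__(self, export_push_config):
        self.export_push_config = export_push_config

    def export(self, format=None, quantization=None):
        cfg = self.export_push_config
        fmt = format or cfg["format"]        # explicit per-call arg wins
        quant = quantization
        if quant is None and fmt == "gguf":  # GGUF default from shared config
            quant = cfg["quantization"]
        return (fmt, quant)
```

Note that a non-GGUF export leaves quantization unset unless the caller passes it explicitly, which is the format-safe behavior the review describes.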

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
quantllm/core/turbo_model.py Introduces default export/push config, resolver, and uses it to default export()/push_to_hub() behavior.
tests/test_export_push_config.py Adds coverage for deterministic config resolution and propagation into export()/push().
docs/guide/hub-integration.md Updates hub examples to show one-time config at turbo(...) init.
docs/api/turbo.md Documents the new config parameter on turbo(...).
README.md Updates quickstart snippet to use shared config + argument-free export()/push().


Comment thread quantllm/core/turbo_model.py
Comment thread quantllm/core/turbo_model.py Outdated
Comment thread quantllm/core/turbo_model.py Outdated
Comment thread tests/test_export_push_config.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@codewithdark-git
Owner

@copilot apply changes based on the comments in this thread

Copilot AI and others added 2 commits April 25, 2026 06:29
Agent-Logs-Url: https://github.com/codewithdark-git/QuantLLM/sessions/4f6f0bc8-c954-444c-afa4-6244c9cf4865

Co-authored-by: codewithdark-git <144595403+codewithdark-git@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot The changes are correct, but we need to update all of the documentation, migrate any old code usage to the new API, and bump the pre-release to v2.1.

Done in 1aa2f52 (and follow-up 1c63330). I updated documentation/examples to the new shared config API usage where applicable and bumped the project/docs pre-release versioning to v2.1.0rc1.

@copilot apply changes based on the comments in this thread

Applied the review-thread fixes in 1aa2f52 and 1c63330: ONNX/MLX push no longer gets forced GGUF quantization defaults, nullable push_quantization override is supported, test staging paths now use tmp_path, and added explicit test coverage for nullable override behavior.

Copilot AI changed the title Centralize Turbo export/push config and make quantization defaults deterministic Centralize Turbo export/push config, make quantization defaults deterministic, and prepare v2.1 pre-release Apr 25, 2026
Copilot AI requested a review from codewithdark-git April 25, 2026 06:31
@codewithdark-git codewithdark-git merged commit 4828488 into main Apr 25, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

Centralize config for export/push parameters and improve quantization default handling

3 participants