
Cache already downloaded HuggingFace shards. #3972

Open: niting wants to merge 1 commit into AI-Hypercomputer:main from niting:conversion_perf

@niting (Collaborator) commented May 22, 2026

Description

Currently, shards appear to be re-downloaded every time they are required, which slows down conversion. Running the conversion script with these changes shows a significant improvement.
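For readers unfamiliar with the pattern, here is a minimal sketch of the general idea of memoizing shard fetches. It is not the code in this PR: `ShardCache`, `fake_fetch`, the shard file names, and the tensor names are all hypothetical, and the fetch function stands in for whatever actually downloads a shard.

```python
from typing import Callable, Dict


class ShardCache:
    """Memoizes shard fetches so each shard file is retrieved at most once."""

    def __init__(self, fetch: Callable[[str], str]):
        # `fetch` is a hypothetical callable that downloads one shard and
        # returns its local path.
        self._fetch = fetch
        self._paths: Dict[str, str] = {}
        self.fetch_count = 0

    def get(self, shard_name: str) -> str:
        # Only call the (expensive) fetch the first time a shard is requested.
        if shard_name not in self._paths:
            self.fetch_count += 1
            self._paths[shard_name] = self._fetch(shard_name)
        return self._paths[shard_name]


def fake_fetch(shard_name: str) -> str:
    # Stand-in for a real download; returns a made-up local path.
    return f"/tmp/hf_cache/{shard_name}"


# Hypothetical weight map: several tensors live in the same shard file.
weight_map = {
    "layers.0.wq": "model-00001-of-00002.safetensors",
    "layers.0.wk": "model-00001-of-00002.safetensors",
    "layers.1.wq": "model-00002-of-00002.safetensors",
    "layers.1.wk": "model-00002-of-00002.safetensors",
}

cache = ShardCache(fake_fetch)
resolved = {tensor: cache.get(shard) for tensor, shard in weight_map.items()}
print(cache.fetch_count)
```

In this sketch four tensors resolve to two shard files, so the fetch function runs twice rather than four times. With lazy loading, where each tensor is materialized separately, this kind of repeated lookup is where redundant downloads would otherwise accumulate.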

Benchmark: 2-Layer Qwen3 MoE Checkpoint Conversion (Lazy Loading Enabled)

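| Metric                     | Baseline (Cached) | Optimized        | Speedup |
|----------------------------|-------------------|------------------|---------|
| Sharding (Materialization) | 81.6s (1.36 min)  | 16.2s (0.27 min) | 5.0x    |
| Overall Elapsed            | 83.4s (1.39 min)  | 17.4s (0.29 min) | 4.8x    |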

Integration Tests (tests/integration/checkpoint_conversion_test.py):

  • Baseline: 148.73s (2:28)
  • Optimized: 77.33s (1:17) -> 1.9x speedup overall (includes model download)
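The speedup figures above are the baseline time divided by the optimized time, rounded to one decimal place. A small check of that arithmetic, using the timings reported in this description (the `speedup` helper is only for illustration):

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """Returns baseline / optimized, rounded to one decimal place."""
    return round(baseline_s / optimized_s, 1)


# Timings in seconds, copied from the benchmark and integration test results.
sharding = speedup(81.6, 16.2)
overall = speedup(83.4, 17.4)
integration = speedup(148.73, 77.33)

print(sharding, overall, integration)  # 5.0 4.8 1.9
```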

Tests

Ran the conversion script to confirm it works.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Currently, shards appear to be re-downloaded every time they are required,
which slows down conversion. Running the script with these changes shows a
significant improvement.

Benchmark: 2-Layer Qwen3 MoE Checkpoint Conversion (Lazy Loading Enabled)

| Metric                       | Baseline (Cached) | Optimized (Phase 1 Only) | Speedup  |
|------------------------------|-------------------|--------------------------|----------|
| Sharding (Materialization)   | 81.6s (1.36 min)  | 16.2s (0.27 min)         | **5.0x** |
| Overall Elapsed              | 83.4s (1.39 min)  | 17.4s (0.29 min)         | **4.8x** |

Integration Tests (tests/integration/checkpoint_conversion_test.py):
- Baseline: 148.73s (2:28)
- Optimized: 77.33s (1:17) -> **1.9x speedup overall** (includes model download)

@khatwanimohit (Collaborator) left a comment


LGTM

@niting assigned and then unassigned niting on May 22, 2026