Restore default hw_decoder_load to 0.65 in imgcodec decoder #6366

Merged
jantonguirao merged 1 commit into main from janton/restore-hw-decoder-load-default-065 on May 23, 2026


@jantonguirao
Collaborator

jantonguirao commented May 23, 2026

Summary

  • Change the default value of the hw_decoder_load argument in the nvImageCodec-based image decoder (dali/operators/imgcodec/decoder_schema.cc) from 0.90f back to 0.65f, matching the legacy decoder's default.

Motivation

The legacy decoder (dali/operators/decoder/image_decoder.cc) defaults hw_decoder_load to 0.65. The new nvImageCodec-based decoder ships with 0.90, so users who relied on the default silently get a much larger share of each batch routed to the HW decoder, with a visible effect on decoder latency.

Nsight Systems (nsys) profiling of EfficientNet + AutoAugment ImageNet training (single GPU, batch size 256) with each default shows:

Metric                  legacy default (0.65)   new default (0.90)
Measured HW path share  64.8 %                  90.2 %
Hybrid (CPU+GPU) share  34.6 %                  9.8 %
Decoder p50             15.6 ms                 17.7 ms
Decoder p99             24.5 ms                 34.3 ms
Decoder max             30.8 ms                 72.7 ms
End-to-end iters/s      18.2                    18.6 (+1.8 %)
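Expressed as relative changes, the latency rows show the tail growing much faster than the median. A short Python sketch of that arithmetic, using the latency values copied from the table above (the helper name relative_change is illustrative only, not part of DALI):

```python
# Latency figures (ms) copied from the profiling table: (legacy 0.65, new 0.90).
LATENCY_MS = {
    "p50": (15.6, 17.7),
    "p99": (24.5, 34.3),
    "max": (30.8, 72.7),
}


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new; 0.40 means +40 %."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old


if __name__ == "__main__":
    for name, (legacy, new) in LATENCY_MS.items():
        # Format as a signed percentage with one decimal place.
        print(f"Decoder {name}: {relative_change(legacy, new):+.1%}")
```

Run as a script, it prints +13.5% for p50, +40.0% for p99 and +136.0% for the maximum.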

Users who want the higher HW-routing fraction can still opt in explicitly via the operator argument.
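To make the HW-routing fraction concrete, here is a hypothetical Python model of how hw_decoder_load might divide one batch between the HW JPEG decoder and the hybrid CPU+GPU path. It assumes the HW share is rounded to the nearest whole image; the real scheduling inside DALI and nvImageCodec may differ, so this model will not necessarily reproduce the measured shares above. split_batch is an illustrative name, not a DALI API.

```python
def split_batch(batch_size: int, hw_decoder_load: float) -> tuple[int, int]:
    """Hypothetical model: return (n_hw, n_hybrid) for one batch.

    Assumes the HW share is rounded to the nearest whole image; this is
    an illustration, not DALI's or nvImageCodec's actual scheduling code.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    if not 0.0 <= hw_decoder_load <= 1.0:
        raise ValueError("hw_decoder_load must be in [0, 1]")
    n_hw = round(batch_size * hw_decoder_load)  # images sent to the HW JPEG engine
    return n_hw, batch_size - n_hw              # remainder goes to the hybrid path


if __name__ == "__main__":
    # Batch size 256 matches the profiled EfficientNet run.
    for load in (0.65, 0.90):
        n_hw, n_hybrid = split_batch(256, load)
        print(f"hw_decoder_load={load:.2f}: {n_hw} HW, {n_hybrid} hybrid")
```

Under this assumption a batch of 256 sends 166 images to the HW engine at 0.65 and 230 at 0.90. In a real pipeline the fraction is set on the decoder itself, e.g. `fn.decoders.image(jpegs, device="mixed", hw_decoder_load=0.90)` to opt back into the previous default.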

Test plan

  • Existing decoder tests pass
  • internal_tools/hw_decoder_bench.py shows no regression at default settings

The nvImageCodec-based image decoder (dali/operators/imgcodec) sets the
default `hw_decoder_load` to 0.90, whereas the legacy decoder
(dali/operators/decoder) used 0.65. Profiling on
EfficientNet+AutoAugment ImageNet training showed that the higher
default increases p99 decoder latency (24.5 ms -> 34.3 ms) and the
worst-case batch decode (30.8 ms -> 72.7 ms), driven by synchronous
`cuMemFree_v2` calls inside nvjpeg that scale with HW-engine pressure.
Aggregate throughput is essentially unchanged on that workload.

Revert the default to 0.65 to match the legacy behavior and restore
more predictable decoder latency. Users who want the prior 0.90
behavior can opt in explicitly via the operator argument.

Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
Copilot AI review requested due to automatic review settings May 23, 2026 14:07

Copilot AI left a comment


Pull request overview

Restores the nvImageCodec-based image decoder's default hardware JPEG decoder load (hw_decoder_load) to match the legacy decoder, avoiding an unintended behavior change for users who rely on the default.

Changes:

  • Change default hw_decoder_load from 0.90f to 0.65f in the imgcodec decoder schema.


Comment on lines 91 to +93 of dali/operators/imgcodec/decoder_schema.cc

       the DALI pipeline and should be found empirically. More details can be found at
       https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100)code",
    -  0.90f)
    +  0.65f)
@greptile-apps
Contributor

greptile-apps Bot commented May 23, 2026

Greptile Summary

Restores the hw_decoder_load default in the nvImageCodec-based decoder (decoder_schema.cc) from 0.90f back to 0.65f, matching the legacy decoder's long-standing default. The motivation is well-documented: profiling shows that 0.90 increases worst-case decode latency significantly (30.8 ms → 72.7 ms max) due to synchronous cuMemFree_v2 calls under high HW-engine pressure, while end-to-end throughput is essentially unchanged.

  • One-line change in decoder_schema.cc; the copyright header (2023-2026) and argument docstring are already correct and require no update.
  • No documentation, changelog, or test files reference the 0.90 value, so no secondary cleanup is needed.

Confidence Score: 5/5

Safe to merge — restores a well-understood default to align with the legacy decoder, backed by profiling data showing measurably better worst-case latency.

The change is a single-constant restoration with strong empirical justification. The legacy decoder, all tests, and all documentation are already consistent with 0.65; no secondary files need updating. No correctness, memory, or API contract concerns arise from the change.

No files require special attention.

Important Files Changed

Filename: dali/operators/imgcodec/decoder_schema.cc
Overview: Single-line change restoring hw_decoder_load default from 0.90f to 0.65f, matching the legacy decoder; copyright year already includes 2026.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Image Batch] --> B{hw_decoder_load routing}
    B -->|"~65% of images (default)"| C[HW JPEG Decoder\nNVIDIA A100 engine]
    B -->|"remaining ~35% of images"| D[Hybrid CPU+GPU Decoder\nnvJPEG hybrid path]
    C --> E[Decoded Output]
    D --> E
    style B fill:#f0f4ff,stroke:#3366cc
    style C fill:#d4edda,stroke:#28a745
    style D fill:#fff3cd,stroke:#ffc107

Reviews (1): Last reviewed commit: "Restore default hw_decoder_load to 0.65 ..."

@dali-automaton
Collaborator

CI MESSAGE: [52351570]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [52351570]: BUILD PASSED

jantonguirao merged commit fc59d42 into main on May 23, 2026
7 checks passed
JanuszL self-assigned this on May 23, 2026
JanuszL deleted the janton/restore-hw-decoder-load-default-065 branch on May 23, 2026 17:26
JanuszL self-requested a review on May 23, 2026 17:26
