Limit CPU video decoder codec support #6352

Open
JanuszL wants to merge 1 commit into NVIDIA:main from JanuszL:remove_cpu_codecs

@JanuszL
Contributor

@JanuszL JanuszL commented May 18, 2026

Limit CPU video decoder codec support

Restrict the CPU frames decoder to codecs supported by the currently
compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant while VP8, VP9, and MJPEG remain
enabled.

Make ReadRegularFrame mark end-of-stream by setting next_frame_idx_
to -1 when the index reaches NumFrames(), mirroring the existing
guard in ReadFlushFrame. Without this, codecs with no decoder latency
(VP9 on the new test inputs) deliver the final frame via the regular
path, leaving next_frame_idx_ at NumFrames() and causing
VideoInput depletion to be reported one batch late.

Reset the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.

Update video decoder tests to expect CPU failures for unsupported
codecs instead of skipping only MPEG4. Use VP9 CFR/VFR test inputs
and device-less CPU pipelines where appropriate. Point the CFR/VFR
reference frame folders at `frames_{1,2}_vp9/` so CPU decode of the
new VP9 fixtures matches at the existing eps=10 tolerance. Drop the
CPU HEVC frames-decoder tests (`ConstantFrameRateHevc`,
`VariableFrameRateHevc`, `VariableFrameRateHevcNoIndex`) — HEVC is
no longer in the CPU codec allow-list.

Tolerate up to 16 isolated subpixel deviations exceeding eps in
TestVideo::CompareFrame (out of ~2.7M subpixels per frame). The CPU
VP9 decode path occasionally produces a single byte that differs by
~32 — a SIMD glitch inside libavcodec/sws_scale that Valgrind cannot
instrument. The budget is orders of magnitude below what any genuine
regression would produce, so test sensitivity is preserved.

In dali/test/python/input/test_video.py, filter out h264 from the
round-robin fixture (the unsuffixed test_{1,2}.mp4 in cfr/ and vfr/
are h264) and restrict test_video_input_audio_stream to the mixed
backend, because the only DALI_extra video with an audio stream is h264.

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Restricts the CPU video frames decoder to the codecs supported by the
currently compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant, while VP8, VP9, and MJPEG remain enabled.

ReadRegularFrame now mirrors ReadFlushFrame and signals end-of-stream
by setting next_frame_idx_ to -1 once NumFrames() is reached, so
codecs with no decoder latency report depletion immediately instead of
one batch late.
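The off-by-one can be illustrated with a toy model of the frame-index bookkeeping. This is a hedged sketch in Python, not DALI's C++ code: `ToyDecoder`, its `latency` parameter, and `index_after_last_frame` are hypothetical names, and the codec is reduced to a packet counter plus a count of buffered frames.

```python
class ToyDecoder:
    """Toy model of next_frame_idx_ bookkeeping; not DALI's actual class."""

    def __init__(self, num_frames, latency, guard_regular_path):
        self.num_frames = num_frames
        self.latency = latency  # frames the codec holds back before emitting
        self.guard_regular_path = guard_regular_path
        self.next_frame_idx = 0
        self.packets_left = num_frames
        self.buffered = 0
        self.flushing = False

    def read_frame(self):
        """Return True if a frame was produced, False at end-of-stream."""
        if self.next_frame_idx == -1:
            return False
        if not self.flushing:
            # Regular path: feed packets until the codec emits a frame.
            while self.packets_left > 0:
                self.packets_left -= 1
                self.buffered += 1
                if self.buffered > self.latency:
                    self.buffered -= 1
                    self.next_frame_idx += 1
                    if self.guard_regular_path and self.next_frame_idx >= self.num_frames:
                        self.next_frame_idx = -1  # the guard added by this PR
                    return True
            self.flushing = True
        # Flush path: drain frames still buffered inside the codec.
        if self.buffered > 0:
            self.buffered -= 1
            self.next_frame_idx += 1
            if self.next_frame_idx >= self.num_frames:
                self.next_frame_idx = -1  # guard that already existed
            return True
        self.flushing = False
        self.next_frame_idx = -1
        return False


def index_after_last_frame(num_frames, latency, guard_regular_path):
    """Decode every frame and report the index left behind."""
    dec = ToyDecoder(num_frames, latency, guard_regular_path)
    for _ in range(num_frames):
        assert dec.read_frame()
    return dec.next_frame_idx
```

In this model, a zero-latency codec without the regular-path guard leaves the index at `num_frames` after the last frame, so end-of-stream is only noticed on the next call. A codec with latency reaches -1 even without the guard because its tail is delivered through the flush path.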

Resets the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.
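A minimal sketch of the reset-before-seek idea, in Python rather than the C++ of the PR. `ToySeekableDecoder` and its members are hypothetical, and treating a negative index as out of range is an assumption; the review summary only mentions the `next_frame_idx_ >= NumFrames()` case guarded by `HasIndex()`.

```python
class ToySeekableDecoder:
    """Toy model of the seek-time reset; not DALI's actual class."""

    def __init__(self, num_frames, has_index=True):
        self.num_frames = num_frames
        self.has_index = has_index
        self.next_frame_idx = 0
        self.resets = 0

    def reset(self):
        # Return the decoder to a known-good position.
        self.next_frame_idx = 0
        self.resets += 1

    def seek_frame(self, frame_id):
        # Only trust the frame count when an index exists.
        out_of_range = not (0 <= self.next_frame_idx < self.num_frames)
        if self.has_index and out_of_range:
            self.reset()
        self.next_frame_idx = frame_id
```

The point of the sketch is the ordering: the stale position is discarded before the seek target is applied, so the seek never starts from an invalid state.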

The video decoder tests now expect CPU failures for unsupported codecs
instead of skipping only MPEG4. The affected CFR/VFR test inputs are
switched to VP9 variants, and CPU pipelines use device_id=None where
appropriate. The CFR/VFR reference frame folders are repointed at the
new VP9-derived frames_{1,2}_vp9/ so CPU decode matches at the
existing eps=10 tolerance. CPU HEVC frames-decoder tests are removed.

TestVideo::CompareFrame now tolerates up to 16 isolated subpixel
deviations exceeding eps per frame (out of ~2.7M). The CPU VP9 decode
path occasionally produces a single byte that differs by ~32 — a SIMD
glitch inside libavcodec/sws_scale that Valgrind cannot instrument.
The budget is orders of magnitude below any genuine regression, so
test sensitivity is preserved.
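The outlier budget can be sketched as follows. This is an illustrative Python model, not the gtest helper itself: `frames_match` and `max_outliers` are hypothetical names, frames are flat lists of subpixel values, and the defaults mirror the eps=10 and 16-subpixel figures quoted above.

```python
def frames_match(frame, reference, eps=10, max_outliers=16):
    """Return True if at most `max_outliers` subpixels differ by more than `eps`."""
    if len(frame) != len(reference):
        raise ValueError("frames must have the same number of subpixels")
    bad = sum(1 for a, b in zip(frame, reference) if abs(a - b) > eps)
    return bad <= max_outliers


reference = [128] * 1000

# A single byte off by 32, like the glitch described above, stays within budget.
glitched = list(reference)
glitched[10] += 32

# A uniform shift beyond eps touches every subpixel and is rejected.
shifted = [v + 20 for v in reference]
```

For scale, if the frames were 1280x720 RGB (an assumption; the description only says ~2.7M), that is 2,764,800 subpixels, so a budget of 16 is roughly 0.0006% of the frame.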

dali/test/python/input/test_video.py filters out h264 from the
round-robin fixture and restricts test_video_input_audio_stream to
the mixed backend — the only DALI_extra video with an audio stream is
h264, which CPU can no longer decode.
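A hedged sketch of the fixture filter. The helper name `drop_h264_fixtures`, the constant `H264_BASENAMES`, and the example paths are hypothetical; only the use of `os.path.basename` for an exact match on the unsuffixed `test_{1,2}.mp4` names comes from the review summary of this PR.

```python
import os

# Unsuffixed fixtures that the description identifies as h264.
H264_BASENAMES = {"test_1.mp4", "test_2.mp4"}


def drop_h264_fixtures(paths):
    """Keep only files whose basename is not one of the h264 fixtures."""
    return [p for p in paths if os.path.basename(p) not in H264_BASENAMES]


example_paths = [
    "/data/db/video/cfr/test_1.mp4",
    "/data/db/video/cfr/test_1_vp9.mp4",
    "/data/db/video/vfr/test_2.mp4",
    "/data/db/video/vfr/test_2_vp9.mp4",
]
```

Matching on the basename rather than on a substring avoids accidentally dropping an unrelated file whose name merely ends with `test_1.mp4`.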

Additional information:

Affected modules and functionalities:

  • CPU video frames decoder: codec allow-list and ReadRegularFrame
    end-of-stream signalling.
  • Video frames decoder seek path (reset when seek target is out of
    range).
  • Video decoder gtests: reference frame paths, dropped CPU HEVC cases,
    relaxed CompareFrame tolerance.
  • dali/test/python/input/test_video.py: h264 fixture filter and
    audio-stream test backend restriction.
  • DALI deps and DALI_extra version pins.

Key points relevant for the review:

  • DALI_DEPS_VERSION and DALI_EXTRA_VERSION are temporary ToDo
    placeholders until the corresponding dali_deps and dali_extra
    repository changes merge.
  • CPU unsupported-codec behavior is now tested as an expected failure
    instead of being skipped for a subset of codecs.
  • The ReadRegularFrame EOS guard is required for VideoInput
    depletion to fire on the right batch with VP9 inputs (h264 hid the
    off-by-one through its decoder latency, which routed the tail
    through ReadFlushFrame).
  • The 16-subpixel tolerance in CompareFrame is a flake mitigation,
    not a tolerance loosening: a real codec/colorspace bug would touch
    thousands of subpixels.

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Not run locally.

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@JanuszL
Contributor Author

JanuszL commented May 18, 2026

NVIDIA/DALI_extra#135 & NVIDIA/DALI_deps#162 are related to this change.

@dali-automaton

CI MESSAGE: [51666287]: BUILD STARTED

@greptile-apps
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR restricts the CPU video decoder to VP8, VP9, and MJPEG codecs (removing H264 and HEVC), fixes an off-by-one EOS signalling bug in ReadRegularFrame, and adds a decoder reset when a seek target falls outside the valid frame range.

  • Codec allow-list narrowed (frames_decoder_cpu.cc): std::array size corrected from 7 to 3 matching the 3 active entries; H264 and HEVC moved to the commented-out section.
  • EOS fix (ReadRegularFrame): next_frame_idx_ is now set to -1 when it reaches NumFrames(), mirroring the existing guard in ReadFlushFrame and ensuring VP9 (no decoder latency) signals depletion on the correct batch.
  • Test suite updated: VP9 CFR/VFR fixtures replace H264 fixtures across C++ and Python tests; CPU pipelines use device_id=None; unsupported-codec tests now assert failure instead of skipping; a 16-subpixel tolerance budget is introduced in TestVideo::CompareFrame to absorb an isolated SIMD glitch in the libavcodec/sws_scale CPU VP9 path.

Confidence Score: 4/5

The core decoder logic changes are sound, but two unresolved defects from earlier review rounds remain: NumFrames() is called unconditionally in the new ReadRegularFrame EOS guard (can exhaust the demuxer packet stream on index-less decode paths), and the test_video_index_reuse test no longer exercises the index-reuse scenario it claims to test.

The EOS fix and codec allow-list narrowing are correct. The seek-reset guard in SeekFrame is also correct and the post-reset assert holds. However, the unconditional NumFrames() call on every decoded frame in ReadRegularFrame can silently trigger ParseNumFrames() when no index exists, exhausting remaining demuxer packets and dropping all subsequent frames — a real data-loss path for index-less VP9 streams. The test_video_index_reuse first pipeline is now garbage-collected before being built, so no index files are written and the test no longer validates what its comment describes.

dali/operators/video/frames_decoder_cpu.cc (unconditional NumFrames() in ReadRegularFrame EOS guard) and dali/test/python/decoder/test_video.py (test_video_index_reuse pipe.build() removal)

Important Files Changed

| Filename | Overview |
| --- | --- |
| dali/operators/video/frames_decoder_cpu.cc | Codec allow-list narrowed to 3 entries (VP8, VP9, MJPEG) with correct array size; EOS guard added in ReadRegularFrame; AVERROR_EOF tolerated in flush-mode send_packet. The unconditional NumFrames() call in the new EOS guard can exhaust the demuxer when no index is built (flagged previously). |
| dali/operators/video/frames_decoder_base.cc | SeekFrame now resets the decoder when next_frame_idx_ >= NumFrames() (with HasIndex() guard), avoiding reuse of an invalid decoder position. Reset() sets next_frame_idx_=0, so the post-reset assert holds. |
| dali/operators/video/video_test.cc | CompareFrame updated to count bad subpixels per thread (fixing a pre-existing data race on frames_match) and tolerates up to 16 deviations exceeding eps; applies globally to all callers. Reference frame paths switched to VP9 directories. |
| dali/operators/video/frames_decoder_test.cc | Removed CPU HEVC frames-decoder tests (ConstantFrameRateHevc, VariableFrameRateHevc, VariableFrameRateHevcNoIndex, InMemoryVfrHevcVideo) consistent with HEVC being dropped from CPU allow-list. |
| dali/operators/video/input/video_input_test.cc | Test file paths updated from H264 test_{1,2}.mp4 to VP9 test_{1,2}_vp9.mp4 for VideoInputNextOutputDataIdTest. |
| dali/test/python/decoder/test_video.py | Significant test rework: VP9 fixtures, device_id=None for CPU, unsupported-codec assertion path, cfr_test.mp4 now only appended for mixed device. Several pre-flagged issues remain: unconditional NumFrames() call, test_video_index_reuse index-reuse invariant broken, device_id=0 hardcoded in test_multichannel_fill_value. |
| dali/test/python/input/test_video.py | Adds module-level filter for H264 test_{1,2}.mp4 (using os.path.basename for exact match); restricts test_video_input_audio_stream to mixed backend only. |
| dali/test/python/test_dali_cpu_only.py | video_files updated from H264 vfr/test_{1,2}.mp4 to VP9 vfr/test_{1,2}_vp9.mp4. |
| dali/test/python/test_dali_variable_batch_size.py | test_video_decoder file path updated to VP9 variant. |
| dali/test/python/test_video_pipeline.py | check_corrupted_videos updated to use VP9 test_2_vp9.mp4 as the good reference video. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ReadFrame called] --> B{flush_state_?}
    B -- yes --> C[ReadFlushFrame]
    B -- no --> D[ReadRegularFrame]
    D --> E{av_read_frame ok and video stream?}
    E -- no --> F[send null packet]
    F --> G[flush_state_ = true, return false]
    E -- yes --> H[avcodec_receive_frame]
    H -- EAGAIN --> E
    H -- EOF --> G
    H -- ok --> I{copy_to_output?}
    I -- yes --> J[CopyToOutput]
    I -- no --> K[skip]
    J & K --> L[++next_frame_idx_]
    L --> M{next_frame_idx_ >= NumFrames?}
    M -- yes --> N[next_frame_idx_ = -1 NEW EOS signal]
    M -- no --> O[return true]
    N --> O
    C --> P{avcodec_receive_frame ok?}
    P -- fail --> Q[flush_state_=false, next_frame_idx_=-1, return false]
    P -- ok --> R[CopyToOutput if needed, ++next_frame_idx_]
    R --> S{next_frame_idx_ >= NumFrames?}
    S -- yes --> T[next_frame_idx_ = -1]
    S -- no --> U[return true]
    T --> U

Reviews (11): Last reviewed commit: "Limit CPU video decoder codec support"

Comment thread dali/operators/video/frames_decoder_cpu.cc Outdated
Comment thread DALI_DEPS_VERSION
@@ -1 +1 @@
-b270f29e9d7655512e7e8eaf055cca4d19b55f55
+ToDo
Contributor

P1 [Bug] Version pin files set to literal ToDo placeholder.

Both DALI_DEPS_VERSION and DALI_EXTRA_VERSION now contain the string ToDo instead of a commit SHA. Any CI job that reads these files to fetch the matching dali_deps / dali_extra artefacts will either fail outright or pick up an incorrect/stale revision, breaking reproducibility for the entire build. These files should be updated to the real commit SHAs before this PR merges (or the dependent PRs should land first).
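One way to catch a placeholder pin before merge is a small format check. This is a hypothetical sketch, not part of DALI's CI: `is_valid_pin` is an invented helper, and it assumes that a valid pin is a full 40-character lowercase hexadecimal commit SHA, as the removed line in the diff above suggests.

```python
import re

# Assumption: pin files hold a full 40-character lowercase SHA-1.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def is_valid_pin(text):
    """Return True if the pin file content is a full commit SHA."""
    return SHA1_RE.fullmatch(text.strip()) is not None
```

With this check, the previous content `b270f29e9d7655512e7e8eaf055cca4d19b55f55` passes and the literal `ToDo` is rejected.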

Comment thread dali/test/python/decoder/test_video.py Outdated
Comment thread dali/test/python/decoder/test_video.py
@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from 13fd2c1 to 8375cb2 on May 18, 2026 22:02
@JanuszL
Contributor Author

JanuszL commented May 18, 2026

@greptile review

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 8375cb2 to 9389ee9 on May 18, 2026 22:18
@JanuszL
Contributor Author

JanuszL commented May 18, 2026

@greptile review

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 9389ee9 to d364075 on May 18, 2026 22:27
@JanuszL
Contributor Author

JanuszL commented May 18, 2026

@greptile review

Comment on lines 160 to 165
  ++next_frame_idx_;
  if (next_frame_idx_ >= NumFrames()) {
    next_frame_idx_ = -1;
    LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
  }
  return true;
Contributor

P1 The new EOS guard calls NumFrames() unconditionally on every decoded frame, which can invoke ParseNumFrames() when no index is built and nb_frames is zero in the container. ParseNumFrames() reads all remaining demuxer packets to completion, so the very first frame's increment will exhaust the packet stream and cause all subsequent av_read_frame calls to return EOF — silently dropping every frame after the first. The existing guard in ReadFlushFrame has the same limitation (documented with a TODO) but that function runs only after the demuxer is already exhausted. The fix is to guard the check with HasIndex(), mirroring the SeekFrame condition added in this same PR.

Suggested change
-  ++next_frame_idx_;
-  if (next_frame_idx_ >= NumFrames()) {
-    next_frame_idx_ = -1;
-    LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
-  }
-  return true;
+  ++next_frame_idx_;
+  // TODO(awolant): Figure out how to handle this during index building
+  // Or when NumFrames is unavailable
+  if (HasIndex() && next_frame_idx_ >= NumFrames()) {
+    next_frame_idx_ = -1;
+    LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
+  }
+  return true;
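The hazard described in this comment can be reproduced with a toy model. The sketch below is Python, not the decoder's C++; `ToyStream` and `decode_all` are hypothetical, and the only behavior taken from the comment is that counting frames without an index drains the remaining packets.

```python
class ToyStream:
    """Toy demuxer whose frame count is only known up front if an index exists."""

    def __init__(self, packets, has_index):
        self._packets = iter(packets)
        self._has_index = has_index
        self._consumed = 0
        self._num_frames = len(packets) if has_index else None

    def has_index(self):
        return self._has_index

    def read_packet(self):
        pkt = next(self._packets, None)
        if pkt is not None:
            self._consumed += 1
        return pkt

    def num_frames(self):
        if self._num_frames is None:
            # Counting without an index drains every remaining packet.
            remaining = sum(1 for _ in self._packets)
            self._num_frames = self._consumed + remaining
        return self._num_frames


def decode_all(stream, guard_with_has_index):
    """Read frames until end-of-stream and return those actually delivered."""
    frames = []
    next_idx = 0
    while (pkt := stream.read_packet()) is not None:
        frames.append(pkt)
        next_idx += 1
        if guard_with_has_index:
            at_end = stream.has_index() and next_idx >= stream.num_frames()
        else:
            at_end = next_idx >= stream.num_frames()
        if at_end:
            break
    return frames
```

In this model, an index-less stream of four packets yields only the first frame when the count is queried unconditionally, and all four when the query is guarded by `has_index()`. Short-circuit evaluation is what keeps the guarded variant from touching the packet iterator.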

Comment on lines 711 to 712
batch_size = 3
pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=0)
Contributor

P1 test_multichannel_fill_value hard-codes device_id=0 even though the test body uses fn.experimental.decoders.video which is a CPU/mixed operator; on a device-less CI machine this will fail at pipeline construction. Other tests in this PR were correctly updated to derive device_id from device, so this one was apparently missed.

Suggested change
-batch_size = 3
-pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=0)
+batch_size = 3
+device_id = None if device == "cpu" else 0
+pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=device_id)

@dali-automaton

CI MESSAGE: [51735694]: BUILD STARTED

@dali-automaton

CI MESSAGE: [51735694]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from a8f64cc to fba22e3 on May 19, 2026 05:11
@dali-automaton

CI MESSAGE: [51770028]: BUILD STARTED

Comment on lines 311 to 322
# test overflow of frame_buffer_
filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
filenames = filter(lambda filename: "mpeg4" not in filename, filenames)
filenames = filter(lambda filename: "hevc" not in filename, filenames)
filenames = filter(lambda filename: "av1" not in filename, filenames)
if device == "cpu":
    # some formats are not yet supported in the CPU operator itself
    filenames = filter(lambda filename: "mpeg4" not in filename, filenames)
    filenames = filter(
        lambda filename: "test_1.mp4" not in filename and "test_2.mp4" not in filename,
        filenames,
    )
filenames = cycle(filenames)
Contributor

P1 cfr_test.mp4 is H264 and is not filtered for CPU.

The DALI_extra README.rst shows cfr_test.mp4 is generated with -c:v libx264, so it's H264. This file is appended to filenames before the CPU-conditional filters run. The CPU block filters mpeg4, test_1.mp4, and test_2.mp4, but cfr_test.mp4 slips through. When the CPU decoder encounters it, SelectVideoStream issues a DALI_WARN and returns false, causing a RuntimeError that fails the test.

The fix is to guard the filenames.append call (or add "cfr_test" not in filename to the CPU filter) so that the H264-specific overflow fixture is only used when the device can decode H264.

Contributor Author

Fixed

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from fba22e3 to c87a527 on May 19, 2026 05:51
@dali-automaton

CI MESSAGE: [51774937]: BUILD STARTED

Comment thread dali/test/python/decoder/test_video.py Outdated
Comment on lines +312 to +313
if device == "gpu":
    filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
Contributor

P1 [Bug] Dead condition — cfr_test.mp4 is never appended for the "mixed" device.

test_multi_gpu_video is decorated with @params("cpu", "mixed"), so device is never "gpu". The frame-buffer overflow fixture (cfr_test.mp4) is silently skipped for the "mixed" case, which was the very path the file was added to stress. The condition should be device == "mixed" (or device != "cpu").

Suggested change
-if device == "gpu":
-    filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
+if device == "mixed":
+    filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
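The dead condition is easy to confirm by evaluating each candidate condition over the parametrized devices. This is an illustrative Python check, not code from the test suite: `PARAMS` mirrors the `@params("cpu", "mixed")` decorator quoted in the comment, and `devices_getting_fixture` is a hypothetical helper.

```python
# Devices the test is parametrized over, per the review comment.
PARAMS = ("cpu", "mixed")


def devices_getting_fixture(condition):
    """Return the parametrized devices for which the fixture would be appended."""
    return [device for device in PARAMS if condition(device)]


never = devices_getting_fixture(lambda device: device == "gpu")
fixed = devices_getting_fixture(lambda device: device == "mixed")
alternative = devices_getting_fixture(lambda device: device != "cpu")
```

Here `never` is empty, while both `fixed` and `alternative` select only `"mixed"`, which matches the two corrections the comment proposes.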

Contributor Author

Fixed

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from c87a527 to 4e1a23c on May 19, 2026 06:14
@dali-automaton

CI MESSAGE: [51777473]: BUILD STARTED

@dali-automaton

CI MESSAGE: [51777473]: BUILD FAILED
