
Add custom bt601 full range cuda kernel and fix color conversion #323

Merged
farbod-nv merged 5 commits into main from fm/color_fix on Apr 1, 2026

@farbod-nv
Contributor

@farbod-nv farbod-nv commented Mar 25, 2026

NVENC's color conversion is CSC, so for custom encoding we need the CSC NV12→RGB conversion.
The OAK-D, on the other hand, performs its on-device RGB→NV12 conversion with full-range BT.601, which NPPI does not support. This PR adds a custom CUDA kernel that applies the correct color conversion for OAK-D streams. The alternative is to chain two NPPI calls (convert to YUV420, then to RGB), which is less efficient than a single kernel.
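For readers who want to sanity-check the math, here is a minimal CPU reference sketch of a full-range BT.601 NV12→RGB conversion. It uses the standard full-range (JFIF-style) BT.601 coefficients and assumes a tightly packed buffer (pitch equal to width). The function name `nv12_to_rgb_fullrange_bt601_ref` and the helper `clamp8` are hypothetical; the actual CUDA kernel in this PR may use different rounding, fixed-point arithmetic, or pitched memory.

```python
def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(x))))


def nv12_to_rgb_fullrange_bt601_ref(nv12: bytes, width: int, height: int) -> bytes:
    """CPU reference: full-range BT.601 NV12 -> packed RGB (3 bytes per pixel).

    Assumption: tightly packed buffer (pitch == width) with even dimensions.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    if len(nv12) != width * height * 3 // 2:
        raise ValueError("unexpected NV12 buffer size")

    uv_base = width * height  # the interleaved UV plane follows the Y plane
    rgb = bytearray(width * height * 3)
    for row in range(height):
        for col in range(width):
            y = nv12[row * width + col]
            # One (U, V) pair is shared by each 2x2 block of luma samples.
            uv = uv_base + (row // 2) * width + (col // 2) * 2
            cb = nv12[uv] - 128
            cr = nv12[uv + 1] - 128
            out = (row * width + col) * 3
            # Full range: luma is used as-is, with no 16..235 rescaling.
            rgb[out] = clamp8(y + 1.402 * cr)
            rgb[out + 1] = clamp8(y - 0.344136 * cb - 0.714136 * cr)
            rgb[out + 2] = clamp8(y + 1.772 * cb)
    return bytes(rgb)
```

A reference like this can be useful for comparing a handful of GPU output pixels against known values, for example a neutral-chroma frame, which should map each luma value directly to an equal R, G, and B.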

Summary by CodeRabbit

  • New Features
    • Added GPU-accelerated full-range BT.601 NV12→RGB conversion option for decoded video.
    • Added a force_full_range toggle to the stream decoder API to override automatic range detection.
    • Per-camera color_range setting ("auto"/"full"/"limited"); "auto" resolves per camera type (OAKD defaults to full-range) and is passed to the decoder.
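The per-camera resolution described in the last bullet can be sketched roughly as follows. This is an illustration, not the code in `camera_config.py`: the class name `CameraConfigSketch`, the `camera_type` field, the lookup table, and the choice to fall back to limited range for unknown camera types are all assumptions. Only the `color_range` values, the `is_full_range` property name, and the OAK-D default come from the summary.

```python
from dataclasses import dataclass

# Hypothetical per-camera-type defaults used when color_range == "auto".
# Only the OAK-D entry (full range) is stated in the PR summary.
_AUTO_FULL_RANGE = {"oakd": True}


@dataclass
class CameraConfigSketch:
    camera_type: str            # hypothetical field name
    color_range: str = "auto"   # "auto", "full", or "limited"

    def __post_init__(self) -> None:
        if self.color_range not in ("auto", "full", "limited"):
            raise ValueError(f"invalid color_range: {self.color_range!r}")

    @property
    def is_full_range(self) -> bool:
        """Resolve "auto" per camera type; otherwise honor the explicit setting."""
        if self.color_range == "auto":
            # Assumption: unknown camera types fall back to limited range.
            return _AUTO_FULL_RANGE.get(self.camera_type.lower(), False)
        return self.color_range == "full"
```

Under these assumptions, an OAK-D camera left on "auto" resolves to full range, while an explicit "limited" setting overrides that default.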

@farbod-nv farbod-nv requested review from jiwenc-nv and life1ess March 25, 2026 20:15
@coderabbitai

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds a CUDA full-range BT.601 NV12→RGB converter, exposes a force_full_range operator parameter, auto-detects video full-range on first decode (unless forced), and wires per-camera color_range config to the decoder instantiation.
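The resolve-once behavior described above can be modeled in a few lines. This is a simplified sketch in Python rather than the operator's C++: `RangeResolver` and `read_bitstream_flag` are hypothetical names, and the callable stands in for whatever the decoder reports as its full-range flag.

```python
from typing import Callable, Optional


class RangeResolver:
    """Resolves the NV12 color range once and caches the decision."""

    def __init__(self, force_full_range: bool) -> None:
        self.force_full_range = force_full_range
        self.use_full_range: Optional[bool] = None  # None until the first decode

    def resolve(self, read_bitstream_flag: Callable[[], int]) -> bool:
        """Return True for full range; query the bitstream only if needed."""
        if self.use_full_range is None:
            if self.force_full_range:
                # The override wins; the bitstream flag is not consulted.
                self.use_full_range = True
            else:
                self.use_full_range = read_bitstream_flag() != 0
        return self.use_full_range
```

Caching the decision keeps the per-frame path down to a single branch, at the cost of not reacting if the stream's signalled range changes mid-session.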

Changes

| Cohort / File(s) | Summary |
|---|---|
| CUDA kernel & build: examples/camera_streamer/operators/nv_stream_decoder/CMakeLists.txt, examples/camera_streamer/operators/nv_stream_decoder/nv12_to_rgb.cu, examples/camera_streamer/operators/nv_stream_decoder/nv12_to_rgb.cuh | Added nv12_to_rgb_fullrange_bt601 declaration and implementation; updated CMakeLists to compile and link the new CUDA compilation unit into the nv_stream_decoder target. |
| Decoder operator (C++): examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.hpp, examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp | Added force_full_range parameter and internal use_full_range_/range_detected_ state. On first successful decode, resolve range (detected VUI flag or forced) and branch to either new CUDA full-range conversion or existing NPP limited-range conversion; adjusted CUDA context push/pop around conversion and NPP stream context setup. |
| Python bindings: examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op_py.cpp | Exposed force_full_range boolean in the pybind11 constructor and forwarded it into the operator ArgList; updated signature and docstring. |
| Application config & integration: examples/camera_streamer/camera_config.py, examples/camera_streamer/teleop_camera_subgraph.py | Added color_range field and is_full_range property to CameraConfig (defaults to "auto"); teleop_camera_subgraph now passes force_full_range=cam_cfg.is_full_range when constructing NvStreamDecoderOp. |

Sequence Diagram(s)

sequenceDiagram
  participant RTP as RTP Source
  participant Op as NvStreamDecoderOp
  participant Decoder as HW Decoder
  participant CUDA as nv12_to_rgb_fullrange
  participant NPP as NPP_NV12ToRGB

  RTP->>Op: RTP packet/frame
  Op->>Decoder: submit packet / decode
  Decoder-->>Op: decoded NV12 frame + full_range_flag
  Op->>Op: resolve range (force_full_range or detected)
  alt full-range
    Op->>CUDA: launch nv12_to_rgb_fullrange (cudaStream)
    CUDA-->>Op: RGB frame
  else limited-range
    Op->>NPP: call nppiNV12ToRGB_... (Ctx)
    NPP-->>Op: RGB frame
  end
  Op-->>RTP: emit RGB tensor/frame

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through kernels, quick and keen,
NV12 to RGB — a shiny new routine.
Auto or forced, the range I choose,
Pixels dance in proper hues.
Oak‑D and friends now paint the scene ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a custom BT.601 full-range CUDA kernel and fixing the associated color conversion logic. It accurately reflects the changeset across multiple files implementing NV12→RGB conversion support. |


@farbod-nv farbod-nv requested a review from nvddr March 25, 2026 20:16

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp`:
- Around line 225-230: The custom CUDA kernel path that calls
nv12_to_rgb_fullrange_bt601 lacks post-launch error checking; after calling
nv12_to_rgb_fullrange_bt601 add a cudaGetLastError() check (e.g., cudaError_t
err = cudaGetLastError()) and handle non-success by logging/returning the same
error flow used by the NPP path (mirror its processLogger/error return behavior)
so launch failures are caught and the function returns early on error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bbfbb1b0-a798-44dd-8626-ad12f9dcaa6b

📥 Commits

Reviewing files that changed from the base of the PR and between 46c9fb6 and 77fdef1.

📒 Files selected for processing (2)
  • examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp
  • examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op_py.cpp


Copilot AI left a comment

Pull request overview

Adds full-range BT.601 NV12→RGB conversion support to the camera_streamer NVDEC-based decoder to fix color issues with OAK-D VPU-encoded streams, and exposes an override to force full-range handling when bitstream metadata is missing.

Changes:

  • Add force_full_range parameter to NvStreamDecoderOp (C++ + pybind) and wire it from the teleop subgraph.
  • Implement a custom CUDA kernel for full-range BT.601 NV12→RGB conversion and use it when full-range is selected.
  • Switch the limited-range conversion path to use NPP’s BT.709 CSC variant.
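To illustrate why choosing the wrong range produces visible color errors, the sketch below applies the standard 8-bit limited-range luma expansion (16..235 mapped to 0..255) to arbitrary luma codes. It is a property of the standard ranges rather than code from this PR, and the function name `limited_to_full_luma` is hypothetical.

```python
def limited_to_full_luma(y: int) -> int:
    """Expand a limited-range luma code (16..235) to 0..255, clamping the result."""
    return max(0, min(255, round((y - 16) * 255 / 219)))


# If the source was actually full range, codes below 16 and above 235 are
# legitimate picture content, but the limited-range expansion clips them.
crushed_shadow = limited_to_full_luma(8)      # dark gray collapses to black
blown_highlight = limited_to_full_luma(245)   # light gray saturates to white
```

When full-range data is decoded as limited range, shadows are crushed, highlights are blown out, and midtones are stretched, which is consistent with the kind of color issue this PR targets for OAK-D streams.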

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| examples/camera_streamer/teleop_camera_subgraph.py | Forces full-range decode for OAK-D streams (currently for all OAK-D). |
| examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op_py.cpp | Exposes force_full_range in Python bindings and docs. |
| examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.hpp | Adds operator parameters/state for range selection. |
| examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp | Adds range detection/override and switches NV12→RGB implementation paths. |
| examples/camera_streamer/operators/nv_stream_decoder/nv12_to_rgb.cuh | Declares the custom CUDA conversion API. |
| examples/camera_streamer/operators/nv_stream_decoder/nv12_to_rgb.cu | Implements the full-range BT.601 NV12→RGB CUDA kernel. |
| examples/camera_streamer/operators/nv_stream_decoder/CMakeLists.txt | Builds the new CUDA source into the decoder library. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Run ID: ff4b44e6-fbd1-4b52-8317-325306809639

📥 Commits

Reviewing files that changed from the base of the PR and between 2374d1e and 62466c0.

📒 Files selected for processing (1)
  • examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp (1)

185-197: 🧹 Nitpick | 🔵 Trivial

Minor optimization opportunity (previously noted).

When force_full_range_ is true, decoder_->GetVideoFormatInfo() is still called even though its result doesn't affect use_full_range_. This could be skipped:

if (force_full_range_.get()) {
    use_full_range_ = true;
    HOLOSCAN_LOG_INFO("NV12->RGB color range: full (force_full_range=true)");
} else {
    auto fmt = decoder_->GetVideoFormatInfo();
    int bitstream_flag = fmt.video_signal_description.video_full_range_flag;
    use_full_range_ = (bitstream_flag != 0);
    HOLOSCAN_LOG_INFO("NV12->RGB color range: {} (bitstream flag={})",
                      use_full_range_ ? "full" : "limited", bitstream_flag);
}

That said, the current approach provides useful diagnostic info by always logging the bitstream flag, which can help debug encoder misconfigurations. If the GetVideoFormatInfo() call is lightweight, keeping it for diagnostics is reasonable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp`
around lines 185 - 197, When determining use_full_range_ inside the
range_detected_ block, avoid calling decoder_->GetVideoFormatInfo() when
force_full_range_.get() is true since that call doesn't affect the result;
implement a conditional: if force_full_range_.get() set use_full_range_ = true
and log that fact (using HOLOSCAN_LOG_INFO with force_full_range_.get()), else
call decoder_->GetVideoFormatInfo(), read
fmt.video_signal_description.video_full_range_flag, set use_full_range_ =
(bitstream_flag != 0) and log the bitstream flag; keep the range_detected_ =
true behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Run ID: 2669718a-96da-42e8-9ac9-e7c5fbac6086

📥 Commits

Reviewing files that changed from the base of the PR and between 62466c0 and 8389b95.

📒 Files selected for processing (1)
  • examples/camera_streamer/operators/nv_stream_decoder/nv_stream_decoder_op.cpp

@farbod-nv farbod-nv merged commit 052a024 into main Apr 1, 2026
31 checks passed
@farbod-nv farbod-nv deleted the fm/color_fix branch April 1, 2026 03:48
4 participants