CTC Greedy Decoding with NGPU-LM (N-Gram LM on GPU) #13597
Conversation
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Force-pushed from aca9428 to efe9423
```python
from nemo.core.utils.cuda_python_utils import (
    check_cuda_python_cuda_graphs_conditional_nodes_supported,
    cu_call,
    run_nvrtc,
    with_conditional_node,
)
```
Check notice (Code scanning / CodeQL): Unused import

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the unused import `check_cuda_python_cuda_graphs_conditional_nodes_supported` from the `nemo.core.utils.cuda_python_utils` import statement. This eliminates an unnecessary dependency and improves readability without affecting functionality.
```diff
@@ -28,3 +28,2 @@
 from nemo.core.utils.cuda_python_utils import (
-    check_cuda_python_cuda_graphs_conditional_nodes_supported,
     cu_call,
```
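For reference, the import statement after applying this fix (derived directly from the hunk above) would read:

```python
from nemo.core.utils.cuda_python_utils import (
    cu_call,
    run_nvrtc,
    with_conditional_node,
)
```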
```python
        arcs_weights_ptr=arcs_weights_ptr,
        BLOCK_SIZE=BLOCK_SIZE,
    )
    ...
```
Check notice (Code scanning / CodeQL): Statement has no effect

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the `...` statement on line 179. This eliminates the unnecessary, non-functional statement, making the code cleaner and less confusing. No additional changes are required: `_ctc_greedy_decode_lm_triton` appears to be fully implemented, so the ellipsis is a leftover rather than a placeholder.
```diff
@@ -178,3 +178,2 @@
     )
-    ...
```
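As background, a bare `...` on its own line is just an expression statement that evaluates to the `Ellipsis` singleton and is discarded, which is why CodeQL flags it. A minimal self-contained demonstration:

```python
# A bare `...` is an expression statement with no effect:
# it evaluates to the Ellipsis singleton and is thrown away.
def fully_implemented() -> int:
    result = 1 + 1
    ...  # no-op: removing this line changes nothing at runtime
    return result

assert fully_implemented() == 2
assert ... is Ellipsis
```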
```python
        ):
            self._inner_loop_code()
        else:
            stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)
```
Check warning (Code scanning / CodeQL): Unreachable code

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the unreachable `else` block entirely, since it serves no purpose if the condition `use_full_cuda_graphs` is always `True`. This simplifies the code and eliminates the unreachable path. Any comments or variables relevant only to the removed block should also be cleaned up for clarity.
```diff
@@ -840,28 +840,2 @@
             self._inner_loop_code()
-        else:
-            stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)
-            stream_for_graph.wait_stream(torch.cuda.default_stream(self._cuda_graphs_state.device))
-            self._cuda_graphs_state.before_loop_graph = torch.cuda.CUDAGraph()
-            self._cuda_graphs_state.inner_loop_graph = torch.cuda.CUDAGraph()
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.before_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._before_loop()
-
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.inner_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._inner_loop_code()
```
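For context on the pattern being deleted: PyTorch's capture-then-replay API records kernels into a `torch.cuda.CUDAGraph` on a side stream and replays them later against tensors at fixed addresses. A minimal, self-contained sketch of that pattern (all names here are illustrative, not this PR's decoder code):

```python
import torch

# Minimal capture-then-replay sketch mirroring the pattern removed above.
device = torch.device("cuda")
static_input = torch.randn(8, 16, device=device)
static_output = torch.empty(8, 16, device=device)

# Warm up on a side stream so lazy kernel initialization happens before capture.
side_stream = torch.cuda.Stream(device)
side_stream.wait_stream(torch.cuda.default_stream(device))
with torch.cuda.stream(side_stream):
    static_output.copy_(static_input.relu())
torch.cuda.default_stream(device).wait_stream(side_stream)

# Capture: every tensor touched here must keep a fixed memory address.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph, stream=side_stream, capture_error_mode="thread_local"):
    static_output.copy_(static_input.relu())

# Replay: update inputs in place, then re-launch the recorded kernels.
static_input.copy_(torch.randn(8, 16, device=device))
graph.replay()
torch.cuda.synchronize()
```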
```python
# for _ in range(current_max_time):
#     self._cuda_graphs_state.inner_loop_graph.replay()
```
Check notice (Code scanning / CodeQL): Commented-out code

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the commented-out code block (lines 884–895) entirely, including the line flagged by CodeQL (`# for _ in range(current_max_time): ...`) and the related commented-out lines around it. Removing the block cleans up the code and eliminates any confusion about its purpose.
```diff
@@ -883,15 +883,4 @@
         self._cuda_graphs_state.logits_len[current_batch_size:].fill_(0)
-        # if self.cuda_graphs_mode is self.CudaGraphsMode.FULL_GRAPH:
-        #     self._cuda_graphs_state.full_graph.replay()
-
-        # self._before_loop()
-        # while self._cuda_graphs_state.decoding_active:
-        #     self._inner_loop_code()
-
         self._cuda_graphs_state.full_graph.replay()
-
-        # self._cuda_graphs_state.before_loop_graph.replay()
-        # for _ in range(current_max_time):
-        #     self._cuda_graphs_state.inner_loop_graph.replay()

         return (
```
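Applying the suggestion leaves only the live statements from the hunk's context lines (reconstructed from the diff above; the trailing `return (` is truncated in the diff and the surrounding method is elided):

```python
self._cuda_graphs_state.logits_len[current_batch_size:].fill_(0)
self._cuda_graphs_state.full_graph.replay()
```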
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
Implemented in #13917 |
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
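Since the template's snippet was left unfilled, here is a hypothetical sketch of what greedy CTC decoding with a GPU N-gram LM might look like. The `ngram_lm_model` and `ngram_lm_alpha` fields, the checkpoint name, and the file paths are assumptions (the merged implementation lives in #13917), not an API confirmed by this PR:

```python
# Hypothetical usage sketch; `ngram_lm_model` / `ngram_lm_alpha` and the
# checkpoint name are assumptions, not an API confirmed by this PR.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    "nvidia/stt_en_fastconformer_ctc_large"
)

decoding_cfg = asr_model.cfg.decoding
decoding_cfg.greedy.ngram_lm_model = "ngram_lm.nemo"  # N-gram LM evaluated on GPU
decoding_cfg.greedy.ngram_lm_alpha = 0.3              # LM fusion weight
asr_model.change_decoding_strategy(decoding_cfg)

hypotheses = asr_model.transcribe(["audio.wav"])
print(hypotheses[0])
```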
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information