Conversation

@artbataev
Collaborator

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific, line-by-line info about the high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
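For illustration only, a hedged sketch of how this PR's feature (greedy CTC decoding with an N-gram LM on GPU) might be exercised. from_pretrained, change_decoding_strategy, and transcribe are existing NeMo APIs; the ngram_lm_* config keys below are assumed names for this PR's options, not its confirmed interface.

from omegaconf import open_dict
from nemo.collections.asr.models import EncDecCTCModelBPE

model = EncDecCTCModelBPE.from_pretrained("stt_en_conformer_ctc_small")
decoding_cfg = model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "greedy_batch"
    decoding_cfg.greedy.ngram_lm_model = "/path/to/lm.nemo"  # hypothetical key
    decoding_cfg.greedy.ngram_lm_alpha = 0.3                 # hypothetical key
model.change_decoding_strategy(decoding_cfg)
print(model.transcribe(["audio.wav"]))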

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries? (A minimal guard sketch follows this checklist.)
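For reference, a minimal sketch of the optional-dependency guard pattern this checklist item refers to. The flag name, function, and error message are illustrative, not specific NeMo helpers.

try:
    import numba  # optional dependency

    HAVE_NUMBA = True
except (ImportError, ModuleNotFoundError):
    HAVE_NUMBA = False


def fast_op(x):
    # Fail with an actionable message only when the optional path is hit.
    if not HAVE_NUMBA:
        raise ImportError("fast_op requires numba: pip install numba")
    return numba.njit(lambda v: v * 2)(x)


print(fast_op(21) if HAVE_NUMBA else "numba not installed; guarded path skipped")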

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs for various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the ASR label May 15, 2025
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@artbataev artbataev force-pushed the gpu_lm_decoding_ctc_greedy branch from aca9428 to efe9423 on May 15, 2025 at 09:51
Comment on lines +28 to +33
from nemo.core.utils.cuda_python_utils import (
check_cuda_python_cuda_graphs_conditional_nodes_supported,
cu_call,
run_nvrtc,
with_conditional_node,
)

Check notice from Code scanning / CodeQL: Unused import

Import of 'check_cuda_python_cuda_graphs_conditional_nodes_supported' is not used.

Copilot Autofix (AI, 8 months ago)

To fix the issue, remove the unused name check_cuda_python_cuda_graphs_conditional_nodes_supported from the import statement for nemo.core.utils.cuda_python_utils. This eliminates the unnecessary dependency and improves readability without affecting functionality.

Suggested changeset: nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
--- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
+++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
@@ -28,3 +28,2 @@
 from nemo.core.utils.cuda_python_utils import (
-    check_cuda_python_cuda_graphs_conditional_nodes_supported,
     cu_call,
EOF
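As an aside, the essence of this CodeQL query can be reproduced with a stdlib-only sketch (it handles only from-imports, for brevity): collect imported names, then report any name never loaded elsewhere in the module.

import ast

SOURCE = """
from nemo.core.utils.cuda_python_utils import (
    check_cuda_python_cuda_graphs_conditional_nodes_supported,
    cu_call,
)
cu_call(None)
"""

tree = ast.parse(SOURCE)
imported = {}  # imported name -> line number of the import statement
used = set()
for node in ast.walk(tree):
    if isinstance(node, ast.ImportFrom):
        for alias in node.names:
            imported[alias.asname or alias.name] = node.lineno
    elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
        used.add(node.id)

for name, lineno in sorted(imported.items()):
    if name not in used:
        print(f"line {lineno}: import of '{name}' is not used")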
arcs_weights_ptr=arcs_weights_ptr,
BLOCK_SIZE=BLOCK_SIZE,
)
...

Check notice from Code scanning / CodeQL: Statement has no effect

This statement has no effect.

Copilot Autofix (AI, 8 months ago)

To fix the issue, the ... statement on line 179 should be removed. This will eliminate the unnecessary and non-functional statement, making the code cleaner and less confusing. No additional changes are required, as the function _ctc_greedy_decode_lm_triton appears to be fully implemented without the need for the ellipsis.

Suggested changeset: nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
--- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
+++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
@@ -178,3 +178,2 @@
         )
-        ...
 
EOF
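Context on this finding: ... is the Ellipsis literal, an ordinary expression. As a standalone statement it evaluates to the Ellipsis singleton and the result is discarded, which is exactly what CodeQL's "statement has no effect" query flags; it is only idiomatic as a stub body.

def not_yet_implemented():
    ...  # conventional placeholder body, equivalent in effect to `pass`


def fully_implemented(x):
    result = x * 2
    ...  # a dead expression statement, like the one flagged here
    return result


print(... is Ellipsis)        # True: `...` is just a value
print(fully_implemented(21))  # 42: the stray `...` changes nothing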
):
self._inner_loop_code()
else:
stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)

Check warning from Code scanning / CodeQL: Unreachable code

This statement is unreachable.

Copilot Autofix (AI, 8 months ago)

To fix the issue, the unreachable else block should be removed entirely, as it serves no purpose if the condition use_full_cuda_graphs is always True. This will simplify the code and eliminate the unreachable code. Additionally, any associated comments or variables that are only relevant to the removed block should also be cleaned up to maintain clarity.


Suggested changeset: nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
--- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
+++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
@@ -840,28 +840,2 @@
                     self._inner_loop_code()
-        else:
-            stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)
-            stream_for_graph.wait_stream(torch.cuda.default_stream(self._cuda_graphs_state.device))
-            self._cuda_graphs_state.before_loop_graph = torch.cuda.CUDAGraph()
-            self._cuda_graphs_state.inner_loop_graph = torch.cuda.CUDAGraph()
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.before_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._before_loop()
-
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.inner_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._inner_loop_code()
 
EOF
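For context, a minimal runnable sketch of the capture-and-replay pattern the removed branch used. It assumes a CUDA device and a recent PyTorch; real code also warms the captured ops up on a side stream before capture, which is omitted here for brevity.

import torch

if torch.cuda.is_available():
    static_in = torch.zeros(4, device="cuda")
    static_out = torch.zeros(4, device="cuda")

    stream = torch.cuda.Stream()
    stream.wait_stream(torch.cuda.current_stream())
    graph = torch.cuda.CUDAGraph()
    # Record the work once; shapes and tensor addresses are frozen from here on.
    with torch.cuda.stream(stream), torch.cuda.graph(graph, stream=stream):
        static_out.copy_(static_in * 2 + 1)
    torch.cuda.current_stream().wait_stream(stream)

    # Update inputs in place, then re-launch the recorded kernels.
    static_in.copy_(torch.arange(4.0, device="cuda"))
    graph.replay()
    print(static_out)  # tensor([1., 3., 5., 7.], device='cuda:0')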
Comment on lines +894 to +895
# for _ in range(current_max_time):
# self._cuda_graphs_state.inner_loop_graph.replay()

Check notice from Code scanning / CodeQL: Commented-out code

This comment appears to contain commented-out code.

Copilot Autofix (AI, 8 months ago)

To fix the issue, we will remove the commented-out code block (lines 884–895) entirely. This includes the specific line flagged by CodeQL (# for _ in range(current_max_time):...) and other related commented-out lines. Removing this block will clean up the code and eliminate any confusion about its purpose.


Suggested changeset: nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
--- a/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
+++ b/nemo/collections/asr/parts/submodules/ctc_greedy_decoding.py
@@ -883,15 +883,4 @@
         self._cuda_graphs_state.logits_len[current_batch_size:].fill_(0)
-        # if self.cuda_graphs_mode is self.CudaGraphsMode.FULL_GRAPH:
-        # self._cuda_graphs_state.full_graph.replay()
-
-        # self._before_loop()
-        # while self._cuda_graphs_state.decoding_active:
-        #     self._inner_loop_code()
-
         self._cuda_graphs_state.full_graph.replay()
 
-        # self._cuda_graphs_state.before_loop_graph.replay()
-        # for _ in range(current_max_time):
-        #     self._cuda_graphs_state.inner_loop_graph.replay()
-
         return (
EOF
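The commented-out lines corresponded to an alternative execution mode: capture the loop setup and the loop body as separate graphs and drive the iteration from the host. A toy version of that idea (CUDA device required; the graph names mirror the PR's fields but the example is illustrative, not the PR's code):

import torch

if torch.cuda.is_available():
    state = torch.zeros(1, device="cuda")

    before_loop_graph = torch.cuda.CUDAGraph()
    inner_loop_graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(before_loop_graph):
        state.fill_(1.0)   # one-time setup work
    with torch.cuda.graph(inner_loop_graph):
        state.mul_(2.0)    # one decoding step

    before_loop_graph.replay()
    for _ in range(3):             # host-side loop, as in the commented-out
        inner_loop_graph.replay()  # `for _ in range(current_max_time)` pattern
    print(state)  # tensor([8.], device='cuda:0')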
@github-actions github-actions bot added the stale label May 30, 2025
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from github-actions bot May 30, 2025
@artbataev artbataev removed the stale label May 30, 2025
@artbataev artbataev changed the title CTC Greedy Decoding with NGPU-LM CTC Greedy Decoding with NGPU-LM (N-Gram LM on GPU) May 30, 2025
@github-actions
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR; otherwise it will be closed in 7 days.

@github-actions github-actions bot added the stale label Jun 14, 2025
@github-actions
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Jun 21, 2025
@artbataev
Collaborator Author

Implemented in #13917

@artbataev artbataev deleted the gpu_lm_decoding_ctc_greedy branch June 24, 2025 09:53