CTC Greedy Decoding with NGPU-LM (N-Gram LM on GPU) #13597
Conversation
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Force-pushed from aca9428 to efe9423
```python
from nemo.core.utils.cuda_python_utils import (
    check_cuda_python_cuda_graphs_conditional_nodes_supported,
    cu_call,
    run_nvrtc,
    with_conditional_node,
)
```
Check notice (Code scanning / CodeQL): Unused import

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the unused import `check_cuda_python_cuda_graphs_conditional_nodes_supported` from the `nemo.core.utils.cuda_python_utils` import statement. This eliminates an unnecessary dependency and improves readability without affecting functionality.
```diff
@@ -28,3 +28,2 @@
 from nemo.core.utils.cuda_python_utils import (
-    check_cuda_python_cuda_graphs_conditional_nodes_supported,
     cu_call,
```
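For reference, the import statement after applying this fix (derived directly from the hunk above) would read:

```python
from nemo.core.utils.cuda_python_utils import (
    cu_call,
    run_nvrtc,
    with_conditional_node,
)
```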
```python
        arcs_weights_ptr=arcs_weights_ptr,
        BLOCK_SIZE=BLOCK_SIZE,
    )
    ...
```
Check notice (Code scanning / CodeQL): Statement has no effect

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the `...` statement on line 179. This eliminates the unnecessary, non-functional statement, making the code cleaner and less confusing. No additional changes are required: `_ctc_greedy_decode_lm_triton` appears to be fully implemented, so the ellipsis is a leftover rather than a placeholder.
```diff
@@ -178,3 +178,2 @@
     )
-    ...
```
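As background, a bare `...` on its own line is just an expression statement that evaluates to the `Ellipsis` singleton and is discarded, which is why CodeQL flags it. A minimal self-contained demonstration:

```python
# A bare `...` is an expression statement with no effect:
# it evaluates to the Ellipsis singleton and is thrown away.
def fully_implemented() -> int:
    result = 1 + 1
    ...  # no-op: removing this line changes nothing at runtime
    return result

assert fully_implemented() == 2
assert ... is Ellipsis
```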
```python
        ):
            self._inner_loop_code()
        else:
            stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)
```
Check warning (Code scanning / CodeQL): Unreachable code

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the unreachable `else` block entirely, since it serves no purpose if the condition `use_full_cuda_graphs` is always `True`. This simplifies the code and eliminates the unreachable path. Any comments or variables relevant only to the removed block should also be cleaned up for clarity.
```diff
@@ -840,28 +840,2 @@
             self._inner_loop_code()
-        else:
-            stream_for_graph = torch.cuda.Stream(self._cuda_graphs_state.device)
-            stream_for_graph.wait_stream(torch.cuda.default_stream(self._cuda_graphs_state.device))
-            self._cuda_graphs_state.before_loop_graph = torch.cuda.CUDAGraph()
-            self._cuda_graphs_state.inner_loop_graph = torch.cuda.CUDAGraph()
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.before_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._before_loop()
-
-            with (
-                torch.cuda.stream(stream_for_graph),
-                torch.inference_mode(),
-                torch.cuda.graph(
-                    self._cuda_graphs_state.inner_loop_graph,
-                    stream=stream_for_graph,
-                    capture_error_mode="thread_local",
-                ),
-            ):
-                self._inner_loop_code()
```
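For context on the pattern being deleted: PyTorch's capture-then-replay API records kernels into a `torch.cuda.CUDAGraph` on a side stream and replays them later against tensors at fixed addresses. A minimal, self-contained sketch of that pattern (all names here are illustrative, not this PR's decoder code):

```python
import torch

# Minimal capture-then-replay sketch mirroring the pattern removed above.
device = torch.device("cuda")
static_input = torch.randn(8, 16, device=device)
static_output = torch.empty(8, 16, device=device)

# Warm up on a side stream so lazy kernel initialization happens before capture.
side_stream = torch.cuda.Stream(device)
side_stream.wait_stream(torch.cuda.default_stream(device))
with torch.cuda.stream(side_stream):
    static_output.copy_(static_input.relu())
torch.cuda.default_stream(device).wait_stream(side_stream)

# Capture: every tensor touched here must keep a fixed memory address.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph, stream=side_stream, capture_error_mode="thread_local"):
    static_output.copy_(static_input.relu())

# Replay: update inputs in place, then re-launch the recorded kernels.
static_input.copy_(torch.randn(8, 16, device=device))
graph.replay()
torch.cuda.synchronize()
```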
```python
# for _ in range(current_max_time):
#     self._cuda_graphs_state.inner_loop_graph.replay()
```
Check notice (Code scanning / CodeQL): Commented-out code

Copilot Autofix (AI, 8 months ago):
To fix the issue, remove the commented-out code block (lines 884–895) entirely, including the line flagged by CodeQL (`# for _ in range(current_max_time): ...`) and the related commented-out lines around it. Removing the block cleans up the code and eliminates any confusion about its purpose.
```diff
@@ -883,15 +883,4 @@
         self._cuda_graphs_state.logits_len[current_batch_size:].fill_(0)
-        # if self.cuda_graphs_mode is self.CudaGraphsMode.FULL_GRAPH:
-        #     self._cuda_graphs_state.full_graph.replay()
-
-        # self._before_loop()
-        # while self._cuda_graphs_state.decoding_active:
-        #     self._inner_loop_code()
-
         self._cuda_graphs_state.full_graph.replay()
-
-        # self._cuda_graphs_state.before_loop_graph.replay()
-        # for _ in range(current_max_time):
-        #     self._cuda_graphs_state.inner_loop_graph.replay()

         return (
```
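Applying the suggestion leaves only the live statements from the hunk's context lines (reconstructed from the diff above; the trailing `return (` is truncated in the diff and the surrounding method is elided):

```python
self._cuda_graphs_state.logits_len[current_batch_size:].fill_(0)
self._cuda_graphs_state.full_graph.replay()
```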
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
Implemented in #13917 |
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
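Since the template's snippet was left unfilled, here is a hypothetical sketch of what greedy CTC decoding with a GPU N-gram LM might look like. The `ngram_lm_model` and `ngram_lm_alpha` fields, the checkpoint name, and the file paths are assumptions (the merged implementation lives in #13917), not an API confirmed by this PR:

```python
# Hypothetical usage sketch; `ngram_lm_model` / `ngram_lm_alpha` and the
# checkpoint name are assumptions, not an API confirmed by this PR.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    "nvidia/stt_en_fastconformer_ctc_large"
)

decoding_cfg = asr_model.cfg.decoding
decoding_cfg.greedy.ngram_lm_model = "ngram_lm.nemo"  # N-gram LM evaluated on GPU
decoding_cfg.greedy.ngram_lm_alpha = 0.3              # LM fusion weight
asr_model.change_decoding_strategy(decoding_cfg)

hypotheses = asr_model.transcribe(["audio.wav"])
print(hypotheses[0])
```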
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information