
Fix numba-cuda and cuda-python installation and usage#15506

Merged
artbataev merged 24 commits into main from vbataev/fix_numba
Mar 23, 2026

Conversation

@artbataev
Collaborator

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
huggingface_hub>=0.24
numba ; platform_system == 'Darwin'
numba-cuda==0.15.1 ; platform_system != 'Darwin'
numba-cuda[cu13] ; platform_system != 'Darwin'
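The platform markers in the requirement lines above let pip pick a different package per OS. A minimal sketch of the equivalent selection logic (illustrative only; pip evaluates these markers itself at install time, and the helper name is hypothetical):

```python
import platform

def select_numba_requirement() -> str:
    # Mirrors the environment markers above: plain numba on macOS (Darwin),
    # numba-cuda with the cu13 extra on other platforms.
    # Hypothetical helper for illustration; pip handles this natively.
    if platform.system() == "Darwin":
        return "numba"
    return "numba-cuda[cu13]"
```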
Collaborator


Thinking from first principles:

  • we have CU13 users that have cutting edge hardware / want cutting edge perf
  • we have CU12 users that don't have cutting edge hardware
  • we likely have CU11 users too
  • we have CPU-only env users, also on platforms like MacOS

How do we cater to all of them? Sane default:

  • pip install nemo-toolkit - does not pull any CUDA dependencies; the codebase treats all of them as optional and raises an informative error about what to install if you try to use them

The CUDA version is a bit more problematic because it depends on what pytorch was built against. We can't really control this in a robust way, so let's give options. Is it possible to do something like this?

  • pip install nemo-toolkit[cu13] - pulls CUDA 13 deps
  • pip install nemo-toolkit[cu12] - pulls CUDA 12 deps
  • pip install nemo-toolkit[cu11] - pulls CUDA 11 deps
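Something like this is indeed possible with setuptools extras. A rough sketch of how such opt-in CUDA extras could be declared (the extras dictionary and version pins here are illustrative, not NeMo's actual setup.py):

```python
from itertools import chain

# Illustrative collection extras; the real lists live in requirements files.
extras_require = {
    "asr": ["librosa"],
    "tts": ["matplotlib"],
}

# 'all' aggregates only the non-CUDA extras, so a default install stays CPU-safe.
extras_require["all"] = sorted(set(chain.from_iterable(extras_require.values())))

# CUDA extras must be requested explicitly:
#   pip install nemo-toolkit[cu12]   or   pip install nemo-toolkit[cu13]
extras_require["cu12"] = ["numba-cuda", "cuda-python>=12.3"]        # pins illustrative
extras_require["cu13"] = ["numba-cuda[cu13]", "cuda-python>=13.0"]  # pins illustrative
```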


FWIW Curator also currently only supports CUDA 12, and Curator depends on this. So +1-ing that recommendation, Curator can then depend on nemo-toolkit[cu12]

Collaborator Author


Regarding CUDA 11.x - I think it is OK if CUDA 11.x users install the extra dependencies manually.
I'm unsure about the best numba version for this case, but the latest versions do not support 11.x.
For cuda-python, we require at least 12.3.x (ideally 12.6.x), so it is unusable on CUDA 11.x.

Since both libraries are optional, let's for now keep only cu12 and cu13 extras.

Collaborator


Since both libraries are optional, let's for now keep only cu12 and cu13 extras.

OK

Member

@nithinraok nithinraok left a comment


can we add cuda-python to the reqs
(nemo/core/utils/cuda_python_utils.py)

@github-actions github-actions bot added the core Changes to NeMo Core label Mar 20, 2026
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@artbataev artbataev changed the title Relax numba-cuda version Fix numba-cuda and cuda-python installation and usage Mar 20, 2026
artbataev and others added 4 commits March 20, 2026 10:06
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
"""

__all__ = ["KENLM_AVAILABLE", "K2_AVAILABLE", "TRITON_AVAILABLE", "kenlm_required", "k2_required", "triton_required"]
__all__ = [
Collaborator


Now I'm wondering if we should have API like this instead:

k2 = ext_k2.maybe_import()  # returns python module or None

@ext_k2.required()
def foo(): ...

if ext_k2.available(): ...

possibly implemented like this:

class OptionalDependency:
    NAME = ""
    INSTALLATION_MESSAGE = ""
    def required(self): ...  # moved from _lib_required
    def available(self): ...  # moved from is_lib_available

class ext_k2(OptionalDependency):
    NAME = "k2"
    INSTALLATION_MESSAGE = "..."

Collaborator Author

@artbataev artbataev Mar 20, 2026


k2 = ext_k2.maybe_import()

I initially thought about this, but this approach makes imports more strict: you can no longer use from k2 import ... etc.

Collaborator Author


class OptionalDependency:
    NAME = ""
    INSTALLATION_MESSAGE = ""
    def required(self): ...  # moved from _lib_required
    def available(self): ...  # moved from is_lib_available

class ext_k2(OptionalDependency):
    NAME = "k2"
    INSTALLATION_MESSAGE = "..."

This approach adds overhead to required due to the dynamic availability check. I prefer to check once and store the result in a constant.
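The check-once preference can be sketched as a module-level constant plus a decorator (a rough illustration of the pattern, not the PR's actual implementation; the k2 names mirror the __all__ shown earlier, and the decorated function is hypothetical):

```python
import functools
import importlib.util

# Probe availability once at import time; afterwards it is a cheap constant.
K2_AVAILABLE = importlib.util.find_spec("k2") is not None

# Illustrative message; the real one would point at concrete install steps.
K2_INSTALL_MESSAGE = "k2 is required for this feature; see the k2 installation docs."

def k2_required(func):
    """Raise an informative error on call if k2 is missing (no per-call probing)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not K2_AVAILABLE:
            raise ModuleNotFoundError(K2_INSTALL_MESSAGE)
        return func(*args, **kwargs)
    return wrapper

@k2_required
def build_decoding_graph():
    # Hypothetical function: real code would import and use k2 here.
    import k2  # noqa: F401
    return "ok"
```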

Collaborator


OK, maybe I'm overthinking this :)

huggingface_hub>=0.24
numba ; platform_system == 'Darwin'
numba-cuda[cu13] ; platform_system != 'Darwin'
numba-cuda ; platform_system != 'Darwin'
Collaborator


Should this be removed from main requirements? Doesn't it conflict with lines added by cu12/cu13?

Collaborator Author


Done

setup.py Outdated
extras_require['all'] = list(chain.from_iterable(extras_require.values()))

# CUDA version extras (not included in 'all' - user must explicitly select)
extras_require['cu12'] = [
Collaborator


For consistency, can we move to requirements_cu12.txt (same with cu13)?

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
from nemo.utils import logging, logging_mode

if NUMBA_CUDA_AVAILABLE:
    from nemo.collections.asr.parts.numba.spec_augment import SpecAugmentNumba, spec_augment_launch_heuristics
Collaborator


Note to self: remove these, as we have had a fully vectorized PyTorch-native implementation that's faster for a while now.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
@artbataev
Collaborator Author

/claude review

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@artbataev
Collaborator Author

/claude review

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@artbataev
Collaborator Author

/claude review


@claude claude bot left a comment


LGTM

@artbataev artbataev enabled auto-merge (squash) March 22, 2026 05:58
@github-actions github-actions bot removed the Run CICD label Mar 22, 2026
@github-actions
Contributor

[🤖]: Hi @artbataev 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Collaborator

@pzelasko pzelasko left a comment


Excellent work @artbataev!

@artbataev artbataev merged commit 603e922 into main Mar 23, 2026
132 checks passed
@artbataev artbataev deleted the vbataev/fix_numba branch March 23, 2026 12:55
pzelasko pushed a commit that referenced this pull request Mar 23, 2026
* Fix int type in grid
* Adjust numba and cuda-python requirements
* Add cuda-python guards to optional libs
* Use cuda-python guards
* Add guards for numba

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
artbataev pushed a commit that referenced this pull request Mar 23, 2026
…15540)

* Fix int type in grid
* Adjust numba and cuda-python requirements
* Add cuda-python guards to optional libs
* Use cuda-python guards
* Add guards for numba



---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

Labels

ASR core Changes to NeMo Core

5 participants