
Inconsistency between setup.py and modules.py? #368

@BjoernHaefner

Description


I have a machine with two GPUs:

$ nvidia-smi
Thu Sep  7 13:43:12 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro K620                    Off | 00000000:03:00.0 Off |                  N/A |
| 43%   55C    P8               1W /  30W |    119MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 1070        Off | 00000000:04:00.0 Off |                  N/A |
| 28%   62C    P2              34W / 151W |    258MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

They seem to have compute capabilities 50 and 61, respectively.
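For reference, these two-digit values are CUDA compute capabilities 5.0 (Quadro K620) and 6.1 (GTX 1070) written as major * 10 + minor, the same form that appears in the error message below. A minimal sketch of that encoding follows; the (major, minor) pairs are hard-coded here as an assumption, standing in for what torch.cuda.get_device_capability(i) would return, so the snippet runs without a GPU. The helper name to_cc_int is hypothetical.

```python
# (major, minor) pairs as torch.cuda.get_device_capability(i) would report them;
# hard-coded here (assumption) so the sketch runs without a GPU.
capabilities = {
    "Quadro K620": (5, 0),
    "NVIDIA GeForce GTX 1070": (6, 1),
}


def to_cc_int(major, minor):
    """Encode a compute capability as a two-digit integer, e.g. (6, 1) -> 61."""
    return major * 10 + minor


for name, (major, minor) in capabilities.items():
    print(f"{name}: {to_cc_int(major, minor)}")
```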

On this machine, I tried to run neuralangelo, which requires this repo to be installed, i.e., setup.py is run.
The installation succeeds, but when neuralangelo runs, the code tries to import the tinycudann_bindings module here, in modules.py. This fails with the following error:

OSError: Could not find compatible tinycudann extension for compute capability 50.

After a bit of digging into your code, I think I understand where this error comes from.
When setup.py is called, (in my case) this if-statement is executed, so the extension is built for a single compute capability only, namely the one returned by torch.cuda.get_device_capability(). However, when the tinycudann_bindings module is imported, as mentioned above, it loads the binding for the smallest compute capability across all devices, see here. This causes a potential inconsistency: setup.py builds for compute capability 61, while neuralangelo tries to import tinycudann for compute capability 50.
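The mismatch can be reproduced with a simplified model of the two code paths. This is a sketch based only on the behavior described in this issue, not the actual setup.py/modules.py source; the function names are hypothetical and capabilities are already encoded as two-digit integers. Note that PyTorch's default device reported 61 here even though nvidia-smi lists the K620 as GPU 0; a likely reason is that CUDA enumerates devices fastest-first by default (CUDA_DEVICE_ORDER=FASTEST_FIRST), whereas nvidia-smi orders them by PCI bus ID.

```python
# Simplified model of the behavior described above (NOT the real tiny-cuda-nn code).
# Compute capabilities are encoded as major * 10 + minor, e.g. 6.1 -> 61.


def build_time_capabilities(current_device_cc):
    """setup.py path without TCNN_CUDA_ARCHITECTURES: one capability,
    taken from the current device only (hypothetical helper)."""
    return [current_device_cc]


def load_time_binding(device_ccs, built_ccs):
    """modules.py path as described in the issue: the system capability is the
    minimum over all devices, and a binding for it must exist (hypothetical helper)."""
    system_cc = min(device_ccs)
    if system_cc not in built_ccs:
        raise OSError(
            f"Could not find compatible tinycudann extension for compute capability {system_cc}."
        )
    return system_cc


device_ccs = [50, 61]                    # Quadro K620, GTX 1070
built_ccs = build_time_capabilities(61)  # PyTorch's default device was the GTX 1070

try:
    load_time_binding(device_ccs, built_ccs)
except OSError as err:
    print(err)
```

Running it prints the same message as the OSError quoted above.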

Now my questions :)

  1. What is the intended way to tackle this issue? Should I set the environment variable TCNN_CUDA_ARCHITECTURES to force the extension to be built for multiple compute capabilities?
  2. Since setup.py already seems to support building for multiple compute capabilities (or did I get this wrong?), why not use the same scheme as during module loading, i.e., iterate over all detected devices here and build a list of compute_capabilities?
  3. Even if 1) and 2) work, is there a built-in way to force modules.py to load a specific installed binding (in my case compute capability 61)? I.e., make sure this executes not with the smallest system capability by default, but with the installed/desired/maximum capability or something similar.
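Questions 1 and 2 can be sketched as a single build-side helper: if TCNN_CUDA_ARCHITECTURES is set, use it; otherwise collect the capability of every detected device rather than only the current one. The function name build_capabilities and the acceptance of both "," and ";" as separators are assumptions for illustration, and the device list is passed in rather than queried via torch.cuda.device_count() and torch.cuda.get_device_capability(i), so the snippet runs without a GPU.

```python
import os


def build_capabilities(device_ccs, env=os.environ):
    """Hypothetical helper: pick the compute capabilities to build for."""
    # Question 1: an explicit TCNN_CUDA_ARCHITECTURES value takes precedence.
    # Accepting both ',' and ';' as separators is an assumption of this sketch.
    override = env.get("TCNN_CUDA_ARCHITECTURES", "")
    if override:
        parts = override.replace(";", ",").split(",")
        return sorted({int(p) for p in parts if p.strip()})
    # Question 2: otherwise build for every detected device, mirroring how the
    # loader considers all devices instead of only the current one.
    return sorted(set(device_ccs))


print(build_capabilities([50, 61], env={}))
print(build_capabilities([50, 61], env={"TCNN_CUDA_ARCHITECTURES": "61"}))
```

With both 50 and 61 built, a loader that picks the minimum capability across devices would find a matching binding on this machine, at the cost of running the older-architecture code on the GTX 1070 as well.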
