process_utilization_stats failed with NOT_FOUND error, Ubuntu 22.04 #56

Open
tubzby opened this issue Mar 15, 2024 · 3 comments

tubzby commented Mar 15, 2024

use nvml_wrapper::Nvml;

fn main() {
    let nvml = Nvml::init().unwrap();
    let device = nvml.device_by_index(0).unwrap();

    let st = device.process_utilization_stats(None).unwrap();
}

Running cargo run fails with this error:

thread 'main' panicked at src/main.rs:7:53:
called `Result::unwrap()` on an `Err` value: NotFound
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

My device:

Fri Mar 15 07:01:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8             24W /  350W |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

It's quite strange: the first call to nvmlDeviceGetProcessUtilization, which retrieves the process count, returned 79 in my case, when it should have been 0.

tubzby changed the title from "process_utilization_stats failed with NOT_FOUND error, Ubuntu 22.04, cuda" to "process_utilization_stats failed with NOT_FOUND error, Ubuntu 22.04" on Mar 15, 2024

Baughn commented Jun 1, 2024

Some observations:

  • The problem persists between restarts of the NVML-using program.
  • If there is only a single compute process running, and I restart it, then the problem temporarily disappears.
  • Passing in a timestamp makes it far more likely to break. If I always pass in None, then it'll usually keep working for half a minute or so, polling every 2s.
  • nvtop doesn't have the issue. What do they do differently?
  • Feels like a driver bug. Does this happen on every GPU?

Baughn commented Jun 1, 2024

Another observation: Processes appear to only be returned if they are running. An idle process doesn't end up in the array, unless it was non-idle very recently. This accounts for what happens if I set the timestamp -- it reduces the horizon.

Also means that swallowing the error (and returning []) should be a valid workaround.
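
A minimal sketch of that workaround, for anyone who wants to try it. The helper name `empty_if` is hypothetical and not part of nvml_wrapper; it is written generically over the error type so that it compiles without the crate or a GPU. It only maps the "not found" case to an empty list and passes every other error through, on the assumption that other errors still indicate a real problem.

```rust
/// Hypothetical helper (not part of nvml_wrapper): turn a "not found"
/// error into an empty sample list and leave every other result untouched.
///
/// `is_not_found` decides which errors count as "no active processes".
fn empty_if<T, E>(
    res: Result<Vec<T>, E>,
    is_not_found: impl Fn(&E) -> bool,
) -> Result<Vec<T>, E> {
    match res {
        // Assumption from this thread: NotFound means nothing was recently active.
        Err(e) if is_not_found(&e) => Ok(Vec::new()),
        // Successful results and all other errors are passed through unchanged.
        other => other,
    }
}
```

With nvml_wrapper, the call would presumably look like `empty_if(device.process_utilization_stats(None), |e| matches!(e, NvmlError::NotFound))`, where `NotFound` is the variant shown in the panic message above. I have not verified this against every driver version.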
