Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvml-wrapper 0.7.0 enables loading NVML at runtime #87

Open
Cldfire opened this issue Dec 7, 2020 · 3 comments · Fixed by #89
Open

nvml-wrapper 0.7.0 enables loading NVML at runtime #87

Cldfire opened this issue Dec 7, 2020 · 3 comments · Fixed by #89

Comments

@Cldfire
Copy link

Cldfire commented Dec 7, 2020

Hi! Author of nvml-wrapper here.

I've just published nvml-wrapper 0.7.0, the headlining feature of which is the ability to load the NVML library at runtime (as opposed to linking to it at compiletime). Updating to this version of nvml-wrapper will allow you to get rid of the nvidia feature in zenith and give you the ability to enable / disable GPU-related functionality at runtime based on the hardware in the system!

I cloned the repo locally and tried zenith out with the new nvml-wrapper release; everything worked with zero code changes, which was great to see. One thing I would highly recommend you do, though, is change your strategy for constructing the NVML struct. Currently, the code does the following every time it polls the GPU for information:

let nvml = NVML::init();
let n = match nvml {
    Ok(n) => n,
    Err(e) => {
        error!("Couldn't init NVML: {:?}", e);
        return;
    }
};

This isn't ideal today because it performs all of the work to initialize NVML every time data is collected, which (at least on my machine) skews GPU usage stats and increases power usage. With nvml-wrapper 0.7.0 and beyond this call is also responsible for finding and loading all NVML function symbols, so it'll become even more expensive.

Instead, I'd recommend initializing NVML once at startup, and using the resulting instance for the rest of the program's lifetime. This could be accomplished quite simply as follows using once_cell (error handling up to you):

static NVML_INSTANCE: Lazy<NVML> = Lazy::new(|| NVML::init().unwrap());

Or, depending on how you want to architect it, you could initialize NVML elsewhere and pass it into the function calls where you need access to it.

If you run into any issues updating to 0.7.0 please let me know and I'd be happy to help! :)

@bvaisvil
Copy link
Owner

bvaisvil commented Dec 7, 2020

This sounds fantastic! Thank you!

When I was first working on this I was hoping to accomplish exactly this, awesome.

@bvaisvil
Copy link
Owner

Re-opening as though @alexmaco changes address many of the issues here, I did want to be able to remove --features nvidia and have supported detected at runtime.

@alexmaco
Copy link
Contributor

@bvaisvil To safely have nvidia support always-on, the error handling should also be adjusted a bit. I still have an nvidia gpu for which nvml prints errors on every loop. That detection and counting of gpus should either be done only at initialization, or only logged to files, instead of the stdout where it interferes with the display.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants