Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Use official nvidia-ml-py package instead of fork #4208

Merged
merged 11 commits into from
Nov 7, 2023
Merged

build: Use official nvidia-ml-py package instead of fork #4208

merged 11 commits into from
Nov 7, 2023

Conversation

ecederstrand
Copy link
Contributor

@ecederstrand ecederstrand commented Sep 25, 2023

What does this PR address?

There are two packages in PyPI installing a pynvml module - pynvml and nvidia-ml-py. The former is a forked version of the latter, which is the official package published by NVIDIA.

I stumbled upon this because I built a virtual environment that also installs gpustat, and that failed because that package pulls in nvidia-ml-py, thus overwriting the BentoML dependency.

Before submitting:

@ecederstrand ecederstrand requested a review from a team as a code owner September 25, 2023 13:16
@ecederstrand ecederstrand requested review from sauyon and removed request for a team September 25, 2023 13:16
@ecederstrand ecederstrand changed the title build: Use the official package instead of a forked version build: Use official nvidia-ml-py package instead of fork Sep 26, 2023
@sauyon
Copy link
Contributor

sauyon commented Oct 10, 2023

Hiya, sorry about the slow response! Mind if I push to fix the CI errors?

@ecederstrand
Copy link
Contributor Author

Please do 😊

@sauyon
Copy link
Contributor

sauyon commented Oct 10, 2023

Ah, seems like there are real test failures caused by the dependency change, we'll probably have to go through and fix those.

@ecederstrand
Copy link
Contributor Author

ecederstrand commented Oct 12, 2023

Tests are failing because they can't find the NVidia driver (cannot find libnvidia-ml.so.1). I assume the CI pipeline is missing e.g. apt install libnvidia-compute-535-server. Any suggestions on where a good place to add that would be? Also, something similar will be needed for the OS X and Windows tests.

@ecederstrand
Copy link
Contributor Author

ecederstrand commented Oct 13, 2023

It turns out the tests don't actually test the pynvml package. They just die because pynvml now throws a different exception when being imported. Should be fixed now.

@ecederstrand
Copy link
Contributor Author

Test failures are dues to a bad pdm.lock file. I'm not familiar with pdm and just wanted to swap pynvml with nvidia-ml-py in pyproject.toml. What's the correct way to do that and re-generate the lock file?

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
@ecederstrand
Copy link
Contributor Author

The 3 failing checks don't look related to the changes in this MR.

@aarnphm
Copy link
Member

aarnphm commented Nov 7, 2023

Yep. Our tests are just very much broken.

@aarnphm
Copy link
Member

aarnphm commented Nov 7, 2023

Thanks.

@aarnphm aarnphm merged commit a59750c into bentoml:main Nov 7, 2023
35 of 38 checks passed
@ecederstrand ecederstrand deleted the patch-1 branch November 7, 2023 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants