Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use globally insatlled installed packages for GPU tests #1011

Merged
merged 8 commits into from
Mar 1, 2023

Conversation

edknv
Copy link
Contributor

@edknv edknv commented Mar 1, 2023

For GPU tests, we should use globally installed packages as we don't install cudf, cupy, etc. in the test environment. By setting sitepackages=true in tox, tox will use the packages in the CI container if not available in the environment. We had sitepackages=true before but it got dropped at some point. This PR restores sitepackages=true in the tox test py38-gpu.

@github-actions
Copy link

github-actions bot commented Mar 1, 2023

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1011

@edknv edknv changed the title User globally insatlled installed packages for GPU tests Use globally insatlled installed packages for GPU tests Mar 1, 2023
@edknv edknv self-assigned this Mar 1, 2023
@edknv edknv added ci chore Maintenance for the repository labels Mar 1, 2023
@edknv edknv added this to the Merlin 23.03 milestone Mar 1, 2023
@@ -15,9 +15,9 @@ commands =
deps =
-rrequirements/test.txt
setenv =
CUDA_VISIBLE_DEVICES=0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without setting CUDA_VISIBLE_DEVICES=0, we get an error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Device ordinals must be set for all virtual devices or none. But the device_ordinal is specified for 1 while previous devices didn't have any set.

We didn't have to set this with TF 2.9. Is it specific to TF 2.10? Is there a better way to handle this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If would like to run these tests with only 1 GPU, we can also change the runner we use in the workflow config. Currently set to 2GPU

runs-on: 2GPU

Comment on lines -65 to -66
_ = model.evaluate(valid)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because this is unnecessary for these tests, and evaluate() is already tested in one of the other unit tests.

@edknv edknv requested a review from jperez999 March 1, 2023 08:13
@edknv edknv marked this pull request as ready for review March 1, 2023 08:13
@edknv edknv merged commit f14fb4d into NVIDIA-Merlin:main Mar 1, 2023
@edknv edknv deleted the ci/tox_site_packages branch March 1, 2023 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore Maintenance for the repository ci
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants