Use globally installed packages for GPU tests #1011

edknv · 2023-03-01T04:37:02Z

For GPU tests, we should use globally installed packages, since we don't install cudf, cupy, etc. in the test environment. With sitepackages=true, tox uses the packages installed in the CI container when they are not available in the tox environment. We had sitepackages=true before, but it got dropped at some point. This PR restores sitepackages=true in the py38-gpu tox environment.
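A quick way to confirm that the container's packages are actually visible from inside the tox environment is to look them up without importing them. The snippet below is only a minimal sketch: the helper name `missing_packages` is hypothetical and not part of this PR, and the package list simply reuses the names mentioned above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # cudf and cupy are expected to come from the CI container's
    # site-packages, not from the tox virtualenv itself.
    missing = missing_packages(["cudf", "cupy"])
    print("missing:", missing if missing else "none")
```

Run inside the py38-gpu environment, this should report nothing missing when sitepackages=true is in effect and the container provides those packages.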

github-actions · 2023-03-01T04:42:48Z

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1011

edknv · 2023-03-01T08:11:44Z

tox.ini

@@ -15,9 +15,9 @@ commands =
 deps =
    -rrequirements/test.txt
 setenv =
+    CUDA_VISIBLE_DEVICES=0


Without setting CUDA_VISIBLE_DEVICES=0, we get an error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Device ordinals must be set for all virtual devices or none. But the device_ordinal is specified for 1 while previous devices didn't have any set.

We didn't have to set this with TF 2.9. Is it specific to TF 2.10? Is there a better way to handle this?
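For reference, a rough per-process equivalent of the setenv line is to set the variable from Python before TensorFlow is imported. This is only a sketch under assumptions: the helper name `restrict_visible_gpus` and the idea of placing it in a conftest.py are hypothetical, and it does not address whether the error is specific to TF 2.10.

```python
import os


def restrict_visible_gpus(devices="0"):
    """Expose only the given GPU ordinals to CUDA libraries loaded afterwards."""
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Must run before TensorFlow (or any other CUDA library) initializes,
# for example at the top of a conftest.py (hypothetical placement).
restrict_visible_gpus("0")
```

Setting it in tox.ini, as this PR does, keeps the restriction in the CI config and leaves the test code untouched.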

If we would like to run these tests with only 1 GPU, we can also change the runner we use in the workflow config. It is currently set to 2GPU:

models/.github/workflows/gpu-ci.yml

Line 15 in e2276f4

runs-on: 2GPU

edknv · 2023-03-01T08:12:58Z

tests/unit/lightfm/test_lightfm.py

-    _ = model.evaluate(valid)
-


Removed because this is unnecessary for these tests, and evaluate() is already tested in one of the other unit tests.

edknv added 2 commits February 28, 2023 20:05

testing if cudf is available in gpu-ci

a6a14c4

set sitepackages=true in tox

34a94ed

edknv changed the title ~~User globally insatlled installed packages for GPU tests~~ Use globally insatlled installed packages for GPU tests Mar 1, 2023

edknv self-assigned this Mar 1, 2023

edknv added ci chore Maintenance for the repository labels Mar 1, 2023

edknv added this to the Merlin 23.03 milestone Mar 1, 2023

edknv added 6 commits February 28, 2023 21:01

remove cudf availability test

6a3da92

test cuda_visible_devices

2621c4b

set cuda_visible_devices=0

be56632

restore commented out code

1249c09

fix tests

b2763f9

remove unnecessary evaluate in lightfm tests

b35ca7f

edknv commented Mar 1, 2023

edknv requested a review from jperez999 March 1, 2023 08:13

edknv marked this pull request as ready for review March 1, 2023 08:13

marcromeyn approved these changes Mar 1, 2023

oliverholworthy mentioned this pull request Mar 1, 2023

Install gpu requirements in tox config #1005

Closed

1 task

edknv merged commit f14fb4d into NVIDIA-Merlin:main Mar 1, 2023

edknv deleted the ci/tox_site_packages branch March 1, 2023 17:22
