Skip to content

Conversation

@edknv
Copy link
Contributor

@edknv edknv commented Nov 10, 2022

Hot fix for NVIDIA-Merlin/Transformers4Rec#526.

Implementation

  • With sparse tensors, create the tensor for values on the same device as the tensor for offsets.

Testing

python -m pytest tests/unit/loader/test_torch_dataloader.py
Manually tested the notebook from T4R as described in NVIDIA-Merlin/Transformers4Rec#526.
Ideally we probably want to set up automated tests for multi-gpu settings but it will probably have to be a follow-up work after the current release.

@edknv edknv added the bug Something isn't working label Nov 10, 2022
@edknv edknv self-assigned this Nov 10, 2022
@edknv edknv marked this pull request as ready for review November 10, 2022 02:38
@github-actions
Copy link

Documentation preview

https://nvidia-merlin.github.io/dataloader/review/pr-45

Comment on lines +26 to +27
passenv =
NR_USER
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tox blocks all environment variables by default. We need to pass this in order to skip this test that uses horovod. The test was skipped previously because horovod was not installed in the ci runner image, but horovod was recently installed in the ci runner image because of multi-gpu support in Models.

@edknv edknv requested review from jperez999, nzarif and rnyak November 10, 2022 02:47
@edknv edknv merged commit db6a1fe into NVIDIA-Merlin:main Nov 10, 2022
Copy link

@nzarif nzarif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with T4Rec multi-gpu training and the fix seems to work.

edknv added a commit that referenced this pull request Nov 14, 2022
* Put values and offsets on the same device on gpu

* pass environ var in tox
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants