
cuda_filter set cudaStream in global scope breaking multi-stream support #5153

Closed
nv-dlasalle opened this issue Jan 11, 2023 · 0 comments · Fixed by #5157
Assignees
Labels
Release Blocker Issues that blocks release

Comments

@nv-dlasalle
Collaborator

🐛 Bug

This line:
https://github.com/dmlc/dgl/blob/master/src/array/cuda/cuda_filter.cu#L21
should be moved inside the function bodies so that it captures the current CUDA stream at call time rather than once at static initialization.

I believe this has been an issue since multi-stream support was officially added: what was once a constexpr default stream declared globally was converted by #4480 and #4503 into a call that fetches the current CUDA stream, but the declaration remained in global scope, so the stream is only captured once when the library is loaded.

This bug effectively causes undefined behavior: the filter's kernels are launched on a stale stream that is not ordered with the caller's current stream, so work on alternate CUDA streams may access GPU memory that has not yet been initialized.
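To make the failure mode concrete, here is a minimal sketch of the buggy vs. fixed pattern. The names are hypothetical (the real code is in `src/array/cuda/cuda_filter.cu`), and `GetCurrentStream()` stands in for DGL's runtime call that returns the caller's current stream:

```cpp
// Minimal sketch, not the actual DGL source.
#include <cuda_runtime.h>

// Stand-in for the runtime call that returns the caller's current stream.
// In DGL this would return whatever stream the framework has made current;
// here it simply returns the default stream for illustration.
static cudaStream_t GetCurrentStream() {
  return nullptr;
}

__global__ void _FillKernel(int* out, int n, int value) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = value;
}

// BUGGY PATTERN: evaluated once during static initialization, so every
// later launch reuses whatever stream was current when the library loaded,
// ignoring streams selected afterwards by the caller.
static cudaStream_t g_stream = GetCurrentStream();

void FillGlobalStream(int* out, int n, int value) {
  _FillKernel<<<(n + 255) / 256, 256, 0, g_stream>>>(out, n, value);
}

// FIXED PATTERN: query the stream inside the function, at call time, so
// each call enqueues work on the stream the caller actually selected.
void FillCurrentStream(int* out, int n, int value) {
  cudaStream_t stream = GetCurrentStream();
  _FillKernel<<<(n + 255) / 256, 256, 0, stream>>>(out, n, value);
}
```

This matches the direction of the fix in #5157, which replaces the global cudaStream variable in the filter with per-call runtime lookups.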

@nv-dlasalle nv-dlasalle self-assigned this Jan 11, 2023
@frozenbugs frozenbugs added the Release Blocker Issues that blocks release label Jan 12, 2023
yaox12 added a commit that referenced this issue Jan 12, 2023
[Bugfix] Replace global cudaStream in Filter with runtime calls (fix #5153) (#5157)

* Add failing unit test

* Add fix

* Remove extra newline

* skip cpu test

Co-authored-by: Xin Yao <yaox12@outlook.com>
frozenbugs added a commit that referenced this issue Jan 17, 2023
* [Bugfix] Fix the call of tar.extractall() method (#5139)

* [Sparse] Add elementwise mul and div method, and polish the python doc. (#5142)

* add mul and div

* polish

* equivalent

* indent

* double

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Polish (#5149)

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* polish (#5150)

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* blabla (#5152)

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* typo (#5133)

* fix (#5151)

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Sparse] Reduce peak memory of SpSpAdd. (#5143)

* [Sparse] Reduce peak memory usage of SpSpAdd

* Update

* Update

* [Sparse] Use efficient implementation for Diag @ Sparse and Sparse @ Diag. (#5147)

* change regex (#5158)

* [Bugfix] Replace global cudaStream in Filter with runtime calls (fix #5153) (#5157)

* Add failing unit test

* Add fix

* Remove extra newline

* skip cpu test

Co-authored-by: Xin Yao <yaox12@outlook.com>

* [Sparse] Use NotImplemented to let Python dispatch types (#5160)

* use NotImplemented

* add TypeError tests

* format

* lint

* fix type

* fix docstring

* lint

* remove redundant condition

* Delete train_sampling_unsupervised.py (#5161)

* [CI] fix device configure when run on GPU (#5154)

* [Doc] remove deprected Node/EdgeDataLoader (#5171)

* [Sparse] Fix sddmm docstring (#5169)

* [Sparse] Make functions compatible with PyTorch scalar tensors (#5163)

* use NotImplemented

* format

* extend to pytorch scalar

* reformat

* reformat

* lint

* [Sparse] Rename as_sparse to to_sparse, dense to to_dense. (#5170)

* as_sp_to_sp

* dense

* revert_mock

* test

* revert

Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Sparse] Refactor matmul interface. (#5162)

* [Sparse] Refactor matmul interface.

* Update

* [Sparse] Support sparse matrix dividing scalar (#5173)

* use NotImplemented

* format

* extend to pytorch scalar

* sparse div scalar

* oops

* Apply suggestions from code review

Co-authored-by: Mufei Li <mufeili1996@gmail.com>

Co-authored-by: Mufei Li <mufeili1996@gmail.com>

* [Sparse] Use X to represent dense tensors in sddmm (#5174)

* [Sparse] Use X to represent dense tensors in sddmm

* indent

* pylint: disable=invalid-name

Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>

* [Sparse] Sparse matrix subtraction (#5164)

* use NotImplemented

* format

* sparse sub

* address comments

* lint

* Update elementwise_op_diag.py

* ugh

Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* [BUILD] enable dgl sparse build in default (#5175)

* enable dgl sparse build in default

* disable sparse build on cugraph

* update

* [Doc] sparse lib tutorial and colab links (#5181)

* Created using Colaboratory

* nbsphinx & nblink

* Created using Colaboratory

* update notebook

* minor update

Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: czkkkkkk <zekucai@gmail.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this issue Mar 12, 2024
[Bugfix] Replace global cudaStream in Filter with runtime calls (fix dmlc#5153) (dmlc#5157)

* Add failing unit test

* Add fix

* Remove extra newline

* skip cpu test

Co-authored-by: Xin Yao <yaox12@outlook.com>