[Feature] Import PyTorch's CUDA stream management #4503

yaox12 · 2022-09-01T09:11:23Z

Description

Resolve #4467.

Add stream-related utilities:

Add RecordStream() for NDArray and HeteroGraph on the C++ side,
Add record_stream() for dgl.heterograph on the Python side.
Replace all runtime::CUDAThreadEntry::ThreadLocal()->stream with runtime::getCurrentCUDAStream() to ensuring we're using PyTorch's current stream. With this change, we don't need dgl.set_stream() method or dgl.cuda.stream() context, and just use torch's stream management instead. This will simplify the user experience.
Add unit tests.

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
Related issue is referred in this PR

python/dgl/heterograph.py

src/graph/unit_graph.cc

tests/pytorch/test_stream.py

yaox12 · 2022-09-07T01:13:12Z

Paste @chang-l 's comments in #4480 (comment).

I (Xin) created a method runtime::getCurrentCUDAStream() that would query the current stream from pytorch, as a replacement of runtime::CUDAThreadEntry::ThreadLocal()->stream.

(Chang): I see... in this case, I wonder (1) are we going to keep SetStream/CreateStream/... methods in DGL core library? Or, they are going to in fact set/create pytorch stream instead of cuda stream? (2) if pytorch current stream is changing outside, would that introduces any sync issue to DGL core library? (e.g., when users trying to overlap some operations between dgl and pytorch and they want to call torch.cuda.set_stream when dgl library is running? or it should almost never happen?)

chang-l · 2022-09-07T20:10:06Z

@yaox12 Thanks for pulling my comment here :)
In general, I agree on using the current stream from pytorch to launch every dgl kernels. There are two things I suggest to fix in the meanwhile:
(1) Update SetStream/CreateStream methods as well, to set/create from pytorch stream pool.
(2) Update doc to remind users that all dgl functions use the same current stream as backend (pytorch), so if they changed to another stream, they should always restore the stream before calling next dgl function, for example better using with instead of set_stream.

jermainewang · 2022-09-08T05:05:06Z

Reassigned the reviewers. For CUDA related PRs, please reach out to @isratnisa for review. Changes to TensorAdapter needs @BarclayII 's approval.

nv-dlasalle

I think we also need to delete the CUDAThreadEntry::stream member, to ensure it is not used after this change.

How do we handle the fact that PyTorch's streams are not created with the cudaStreamNonBlocking flag, so to get real parallelism, users will need to do their training on the non-default stream, and dataloading on other streams as well.

python/dgl/frame.py

src/graph/unit_graph.cc

yaox12 · 2022-09-14T02:09:50Z

I think we also need to delete the CUDAThreadEntry::stream member, to ensure it is not used after this change.

What if TensorAdaptor is not available? Should runtime::getCurrentCUDAStream() just return the default stream?

How do we handle the fact that PyTorch's streams are not created with the cudaStreamNonBlocking flag, so to get real parallelism, users will need to do their training on the non-default stream, and dataloading on other streams as well.

PyTorch's streams are created in
https://github.com/pytorch/pytorch/blob/d00cabae7bc894b951d4b2c9c24c7d95bebd86e1/c10/cuda/CUDAStream.cpp#L165-L168

    C10_CUDA_CHECK(cudaStreamCreateWithPriority(
        &lowpri_stream, kDefaultFlags, kLowPriority));
    C10_CUDA_CHECK(cudaStreamCreateWithPriority(
        &hipri_stream, kDefaultFlags, kHighPriority));

with the flag
https://github.com/pytorch/pytorch/blob/d00cabae7bc894b951d4b2c9c24c7d95bebd86e1/c10/cuda/CUDAStream.cpp#L25

static constexpr unsigned int kDefaultFlags = cudaStreamNonBlocking;

nv-dlasalle · 2022-09-14T04:14:07Z

I think we also need to delete the CUDAThreadEntry::stream member, to ensure it is not used after this change.

What if TensorAdaptor is not available? Should runtime::getCurrentCUDAStream() just return the default stream?

If tensoradapter is not available, I don't think we can safely pass streams between DGL and a backend, so I think returning the default stream is the correct behavior. I think this also makes #3998 more of a priority, since if tensoradapter fails to load we lose more than just faster memory management.

How do we handle the fact that PyTorch's streams are not created with the cudaStreamNonBlocking flag, so to get real parallelism, users will need to do their training on the non-default stream, and dataloading on other streams as well.

PyTorch's streams are created in https://github.com/pytorch/pytorch/blob/d00cabae7bc894b951d4b2c9c24c7d95bebd86e1/c10/cuda/CUDAStream.cpp#L165-L168
    C10_CUDA_CHECK(cudaStreamCreateWithPriority(
        &lowpri_stream, kDefaultFlags, kLowPriority));
    C10_CUDA_CHECK(cudaStreamCreateWithPriority(
        &hipri_stream, kDefaultFlags, kHighPriority));
with the flag https://github.com/pytorch/pytorch/blob/d00cabae7bc894b951d4b2c9c24c7d95bebd86e1/c10/cuda/CUDAStream.cpp#L25
static constexpr unsigned int kDefaultFlags = cudaStreamNonBlocking;

Sorry, you're right, I read the pytorch source wrong.

dgl-bot · 2022-09-15T08:45:44Z

Commit ID: ef1d3a3

Build ID: 16

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

* add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle

* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430) * Add refactors for multi-gpu and full-graph example * Fix format * Update * Update * Update * [Cleanup] Remove async_transferer (#4505) * Remove async_transferer * remove test * Remove AsyncTransferer Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> * [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499) * remove third_part/cub * remove from third_party Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Xin Yao <xiny@nvidia.com> * [Bug] Enable turn on/off libxsmm at runtime (#4455) * enable turn on/off libxsmm at runtime by adding a global config and related API Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> * [Feature] Unify the cuda stream used in core library (#4480) * Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api Co-authored-by: xiny <xiny@nvidia.com> * [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389) * * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph * Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information * * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing * * Added test to ensure dgl.remove_self_loop function correctly updates batch information * * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph * * Formatting * * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph * * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass * Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed * * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments * * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test * * Added doc comments for new parameters * * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph * Added test cases for more than k coincident points * * Updated doc comments for output_batch parameters for clarity * * Linter formatting fixes * * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu * * Rewording in doc comments * * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [CI] only known devs are authorized to trigger CI (#4518) * [CI] only known devs are authorized to trigger CI * fix if author is null * add comments * [Readability] Auto fix setup.py and update-version.py (#4446) * Auto fix update-version * Auto fix setup.py * Auto fix update-version * Auto fix setup.py * [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438) * Update distributed-preprocessing.rst * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * fix unpinning when tensoradaptor is not available (#4450) * [Doc] fix print issue in tutorial (#4459) * [Example][Refactor] Refactor RGCN example (#4327) * Refactor full graph entity classification * Refactor rgcn with sampling * README update * Update * Results update * Respect default setting of self_loop=false in entity.py * Update * Update README * Update for multi-gpu * Update * [doc] fix invalid link in user guide (#4468) * [Example] directional_GSN for ogbg-molpcba (#4405) * version-1 * version-2 * version-3 * update examples/README * Update .gitignore * update performance in README, delete scripts * 1st approving review * 2nd approving review Co-authored-by: Mufei Li <mufeili1996@gmail.com> * Clarify the message name, which is 'm'. (#4462) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * [Refactor] Auto fix view.py. (#4461) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Example] SEAL for OGBL (#4291) * [Example] SEAL for OGBL * update index * update * fix readme typo * add seal sampler * modify set ops * prefetch * efficiency test * update * optimize * fix ScatterAdd dtype issue * update sampler style * update Co-authored-by: Quan Gan <coin2028@hotmail.com> * [CI] use https instead of http (#4488) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() * fix test failure in TF * [Feature] Make TensorAdapter Stream Aware (#4472) * Allocate tensors in DGL's current stream * make tensoradaptor stream-aware * replace TAemtpy with cpu allocator * fix typo * try fix cpu allocation * clean header * redirect AllocDataSpace as well * resolve comments * [Build][Doc] Specify the sphinx version (#4465) Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * reformat * reformat * Auto fix update-version * Auto fix setup.py * reformat * reformat Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> * Move mock version of dgl_sparse library to DGL main repo (#4524) * init * Add api doc for sparse library * support op btwn matrices with differnt sparsity * Fixed docstring * addresses comments * lint check * change keyword format to fmt Co-authored-by: Israt Nisa <nisisrat@amazon.com> * [DistPart] expose timeout config for process group (#4532) * [DistPart] expose timeout config for process group * refine code * Update tools/distpartitioning/data_proc_pipeline.py Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Feature] Import PyTorch's CUDA stream management (#4503) * add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle * [examples]educe memory consumption (#4558) * [examples]educe memory consumption * reffine help message * refine * [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525) * Added cugraph nightly scripts * Removed nvcr.io//nvidia/pytorch:22.04-py3 reference Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c6. * [Misc] Add flake8 lint workflow. (#4566) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. * Add flake8 annotation in workflow. * remove * add * clean up Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Try use official pylint workflow. (#4568) * polish update_version * update pylint workflow. * add * revert. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [CI] refine stage logic (#4565) * [CI] refine stage logic * refine * refine * remove (#4570) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Add Pylint workflow for flake8. (#4571) * remove * Add pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Update the python version in Pylint workflow for flake8. (#4572) * remove * Add pylint. * Change the python version for pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4574) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Use another workflow. (#4575) * Update pylint. * Use another workflow. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4576) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint.yml * Update pylint.yml * Delete pylint.yml * [Misc]Add pyproject.toml for autopep8 & black. (#4543) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) * rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle

* Update from master (#4584) * [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430) * Add refactors for multi-gpu and full-graph example * Fix format * Update * Update * Update * [Cleanup] Remove async_transferer (#4505) * Remove async_transferer * remove test * Remove AsyncTransferer Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> * [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499) * remove third_part/cub * remove from third_party Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Xin Yao <xiny@nvidia.com> * [Bug] Enable turn on/off libxsmm at runtime (#4455) * enable turn on/off libxsmm at runtime by adding a global config and related API Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> * [Feature] Unify the cuda stream used in core library (#4480) * Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api Co-authored-by: xiny <xiny@nvidia.com> * [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389) * * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph * Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information * * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing * * Added test to ensure dgl.remove_self_loop function correctly updates batch information * * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph * * Formatting * * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph * * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass * Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed * * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments * * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test * * Added doc comments for new parameters * * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph * Added test cases for more than k coincident points * * Updated doc comments for output_batch parameters for clarity * * Linter formatting fixes * * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu * * Rewording in doc comments * * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [CI] only known devs are authorized to trigger CI (#4518) * [CI] only known devs are authorized to trigger CI * fix if author is null * add comments * [Readability] Auto fix setup.py and update-version.py (#4446) * Auto fix update-version * Auto fix setup.py * Auto fix update-version * Auto fix setup.py * [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438) * Update distributed-preprocessing.rst * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * fix unpinning when tensoradaptor is not available (#4450) * [Doc] fix print issue in tutorial (#4459) * [Example][Refactor] Refactor RGCN example (#4327) * Refactor full graph entity classification * Refactor rgcn with sampling * README update * Update * Results update * Respect default setting of self_loop=false in entity.py * Update * Update README * Update for multi-gpu * Update * [doc] fix invalid link in user guide (#4468) * [Example] directional_GSN for ogbg-molpcba (#4405) * version-1 * version-2 * version-3 * update examples/README * Update .gitignore * update performance in README, delete scripts * 1st approving review * 2nd approving review Co-authored-by: Mufei Li <mufeili1996@gmail.com> * Clarify the message name, which is 'm'. (#4462) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * [Refactor] Auto fix view.py. (#4461) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Example] SEAL for OGBL (#4291) * [Example] SEAL for OGBL * update index * update * fix readme typo * add seal sampler * modify set ops * prefetch * efficiency test * update * optimize * fix ScatterAdd dtype issue * update sampler style * update Co-authored-by: Quan Gan <coin2028@hotmail.com> * [CI] use https instead of http (#4488) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() * fix test failure in TF * [Feature] Make TensorAdapter Stream Aware (#4472) * Allocate tensors in DGL's current stream * make tensoradaptor stream-aware * replace TAemtpy with cpu allocator * fix typo * try fix cpu allocation * clean header * redirect AllocDataSpace as well * resolve comments * [Build][Doc] Specify the sphinx version (#4465) Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * reformat * reformat * Auto fix update-version * Auto fix setup.py * reformat * reformat Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> * Move mock version of dgl_sparse library to DGL main repo (#4524) * init * Add api doc for sparse library * support op btwn matrices with differnt sparsity * Fixed docstring * addresses comments * lint check * change keyword format to fmt Co-authored-by: Israt Nisa <nisisrat@amazon.com> * [DistPart] expose timeout config for process group (#4532) * [DistPart] expose timeout config for process group * refine code * Update tools/distpartitioning/data_proc_pipeline.py Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Feature] Import PyTorch's CUDA stream management (#4503) * add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle * [examples]educe memory consumption (#4558) * [examples]educe memory consumption * reffine help message * refine * [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525) * Added cugraph nightly scripts * Removed nvcr.io//nvidia/pytorch:22.04-py3 reference Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c6. * [Misc] Add flake8 lint workflow. (#4566) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. * Add flake8 annotation in workflow. * remove * add * clean up Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Try use official pylint workflow. (#4568) * polish update_version * update pylint workflow. * add * revert. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [CI] refine stage logic (#4565) * [CI] refine stage logic * refine * refine * remove (#4570) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Add Pylint workflow for flake8. (#4571) * remove * Add pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Update the python version in Pylint workflow for flake8. (#4572) * remove * Add pylint. * Change the python version for pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4574) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Use another workflow. (#4575) * Update pylint. * Use another workflow. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4576) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint.yml * Update pylint.yml * Delete pylint.yml * [Misc]Add pyproject.toml for autopep8 & black. (#4543) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) * rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> * [Deprecation] Dataset Attributes (#4546) * Update * CI * CI * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * [Example] Bug Fix (#4665) * Update * CI * CI * Update * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * Update Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* Update from master (#4584) * [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430) * Add refactors for multi-gpu and full-graph example * Fix format * Update * Update * Update * [Cleanup] Remove async_transferer (#4505) * Remove async_transferer * remove test * Remove AsyncTransferer Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> * [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499) * remove third_part/cub * remove from third_party Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Xin Yao <xiny@nvidia.com> * [Bug] Enable turn on/off libxsmm at runtime (#4455) * enable turn on/off libxsmm at runtime by adding a global config and related API Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> * [Feature] Unify the cuda stream used in core library (#4480) * Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api Co-authored-by: xiny <xiny@nvidia.com> * [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389) * * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph * Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information * * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing * * Added test to ensure dgl.remove_self_loop function correctly updates batch information * * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph * * Formatting * * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph * * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass * Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed * * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments * * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test * * Added doc comments for new parameters * * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph * Added test cases for more than k coincident points * * Updated doc comments for output_batch parameters for clarity * * Linter formatting fixes * * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu * * Rewording in doc comments * * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [CI] only known devs are authorized to trigger CI (#4518) * [CI] only known devs are authorized to trigger CI * fix if author is null * add comments * [Readability] Auto fix setup.py and update-version.py (#4446) * Auto fix update-version * Auto fix setup.py * Auto fix update-version * Auto fix setup.py * [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438) * Update distributed-preprocessing.rst * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * fix unpinning when tensoradaptor is not available (#4450) * [Doc] fix print issue in tutorial (#4459) * [Example][Refactor] Refactor RGCN example (#4327) * Refactor full graph entity classification * Refactor rgcn with sampling * README update * Update * Results update * Respect default setting of self_loop=false in entity.py * Update * Update README * Update for multi-gpu * Update * [doc] fix invalid link in user guide (#4468) * [Example] directional_GSN for ogbg-molpcba (#4405) * version-1 * version-2 * version-3 * update examples/README * Update .gitignore * update performance in README, delete scripts * 1st approving review * 2nd approving review Co-authored-by: Mufei Li <mufeili1996@gmail.com> * Clarify the message name, which is 'm'. (#4462) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * [Refactor] Auto fix view.py. (#4461) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Example] SEAL for OGBL (#4291) * [Example] SEAL for OGBL * update index * update * fix readme typo * add seal sampler * modify set ops * prefetch * efficiency test * update * optimize * fix ScatterAdd dtype issue * update sampler style * update Co-authored-by: Quan Gan <coin2028@hotmail.com> * [CI] use https instead of http (#4488) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() * fix test failure in TF * [Feature] Make TensorAdapter Stream Aware (#4472) * Allocate tensors in DGL's current stream * make tensoradaptor stream-aware * replace TAemtpy with cpu allocator * fix typo * try fix cpu allocation * clean header * redirect AllocDataSpace as well * resolve comments * [Build][Doc] Specify the sphinx version (#4465) Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * reformat * reformat * Auto fix update-version * Auto fix setup.py * reformat * reformat Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> * Move mock version of dgl_sparse library to DGL main repo (#4524) * init * Add api doc for sparse library * support op btwn matrices with differnt sparsity * Fixed docstring * addresses comments * lint check * change keyword format to fmt Co-authored-by: Israt Nisa <nisisrat@amazon.com> * [DistPart] expose timeout config for process group (#4532) * [DistPart] expose timeout config for process group * refine code * Update tools/distpartitioning/data_proc_pipeline.py Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Feature] Import PyTorch's CUDA stream management (#4503) * add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle * [examples]educe memory consumption (#4558) * [examples]educe memory consumption * reffine help message * refine * [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525) * Added cugraph nightly scripts * Removed nvcr.io//nvidia/pytorch:22.04-py3 reference Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c6. * [Misc] Add flake8 lint workflow. (#4566) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. * Add flake8 annotation in workflow. * remove * add * clean up Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Try use official pylint workflow. (#4568) * polish update_version * update pylint workflow. * add * revert. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [CI] refine stage logic (#4565) * [CI] refine stage logic * refine * refine * remove (#4570) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Add Pylint workflow for flake8. (#4571) * remove * Add pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Update the python version in Pylint workflow for flake8. (#4572) * remove * Add pylint. * Change the python version for pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4574) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Use another workflow. (#4575) * Update pylint. * Use another workflow. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4576) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint.yml * Update pylint.yml * Delete pylint.yml * [Misc]Add pyproject.toml for autopep8 & black. (#4543) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) * rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> * [Deprecation] Dataset Attributes (#4546) * Update * CI * CI * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * [Example] Bug Fix (#4665) * Update * CI * CI * Update * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * Update * Update (#4724) Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* Update from master (#4584) * [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430) * Add refactors for multi-gpu and full-graph example * Fix format * Update * Update * Update * [Cleanup] Remove async_transferer (#4505) * Remove async_transferer * remove test * Remove AsyncTransferer Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> * [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499) * remove third_part/cub * remove from third_party Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Xin Yao <xiny@nvidia.com> * [Bug] Enable turn on/off libxsmm at runtime (#4455) * enable turn on/off libxsmm at runtime by adding a global config and related API Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> * [Feature] Unify the cuda stream used in core library (#4480) * Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api Co-authored-by: xiny <xiny@nvidia.com> * [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389) * * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph * Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information * * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing * * Added test to ensure dgl.remove_self_loop function correctly updates batch information * * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph * * Formatting * * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph * * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass * Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed * * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments * * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test * * Added doc comments for new parameters * * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph * Added test cases for more than k coincident points * * Updated doc comments for output_batch parameters for clarity * * Linter formatting fixes * * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu * * Rewording in doc comments * * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [CI] only known devs are authorized to trigger CI (#4518) * [CI] only known devs are authorized to trigger CI * fix if author is null * add comments * [Readability] Auto fix setup.py and update-version.py (#4446) * Auto fix update-version * Auto fix setup.py * Auto fix update-version * Auto fix setup.py * [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438) * Update distributed-preprocessing.rst * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * fix unpinning when tensoradaptor is not available (#4450) * [Doc] fix print issue in tutorial (#4459) * [Example][Refactor] Refactor RGCN example (#4327) * Refactor full graph entity classification * Refactor rgcn with sampling * README update * Update * Results update * Respect default setting of self_loop=false in entity.py * Update * Update README * Update for multi-gpu * Update * [doc] fix invalid link in user guide (#4468) * [Example] directional_GSN for ogbg-molpcba (#4405) * version-1 * version-2 * version-3 * update examples/README * Update .gitignore * update performance in README, delete scripts * 1st approving review * 2nd approving review Co-authored-by: Mufei Li <mufeili1996@gmail.com> * Clarify the message name, which is 'm'. (#4462) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * [Refactor] Auto fix view.py. (#4461) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Example] SEAL for OGBL (#4291) * [Example] SEAL for OGBL * update index * update * fix readme typo * add seal sampler * modify set ops * prefetch * efficiency test * update * optimize * fix ScatterAdd dtype issue * update sampler style * update Co-authored-by: Quan Gan <coin2028@hotmail.com> * [CI] use https instead of http (#4488) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() * fix test failure in TF * [Feature] Make TensorAdapter Stream Aware (#4472) * Allocate tensors in DGL's current stream * make tensoradaptor stream-aware * replace TAemtpy with cpu allocator * fix typo * try fix cpu allocation * clean header * redirect AllocDataSpace as well * resolve comments * [Build][Doc] Specify the sphinx version (#4465) Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * reformat * reformat * Auto fix update-version * Auto fix setup.py * reformat * reformat Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> * Move mock version of dgl_sparse library to DGL main repo (#4524) * init * Add api doc for sparse library * support op btwn matrices with differnt sparsity * Fixed docstring * addresses comments * lint check * change keyword format to fmt Co-authored-by: Israt Nisa <nisisrat@amazon.com> * [DistPart] expose timeout config for process group (#4532) * [DistPart] expose timeout config for process group * refine code * Update tools/distpartitioning/data_proc_pipeline.py Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Feature] Import PyTorch's CUDA stream management (#4503) * add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle * [examples]educe memory consumption (#4558) * [examples]educe memory consumption * reffine help message * refine * [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525) * Added cugraph nightly scripts * Removed nvcr.io//nvidia/pytorch:22.04-py3 reference Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c6. * [Misc] Add flake8 lint workflow. (#4566) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. * Add flake8 annotation in workflow. * remove * add * clean up Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Try use official pylint workflow. (#4568) * polish update_version * update pylint workflow. * add * revert. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [CI] refine stage logic (#4565) * [CI] refine stage logic * refine * refine * remove (#4570) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Add Pylint workflow for flake8. (#4571) * remove * Add pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Update the python version in Pylint workflow for flake8. (#4572) * remove * Add pylint. * Change the python version for pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4574) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Use another workflow. (#4575) * Update pylint. * Use another workflow. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4576) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint.yml * Update pylint.yml * Delete pylint.yml * [Misc]Add pyproject.toml for autopep8 & black. (#4543) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) * rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> * [Deprecation] Dataset Attributes (#4546) * Update * CI * CI * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * [Example] Bug Fix (#4665) * Update * CI * CI * Update * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * Update * Update (#4724) Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * change DGLHeteroGraph to DGLGraph in DOC * revert c change Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-19.ap-northeast-1.compute.internal>

* Update from master (#4584) * [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430) * Add refactors for multi-gpu and full-graph example * Fix format * Update * Update * Update * [Cleanup] Remove async_transferer (#4505) * Remove async_transferer * remove test * Remove AsyncTransferer Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> * [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499) * remove third_part/cub * remove from third_party Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Xin Yao <xiny@nvidia.com> * [Bug] Enable turn on/off libxsmm at runtime (#4455) * enable turn on/off libxsmm at runtime by adding a global config and related API Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> * [Feature] Unify the cuda stream used in core library (#4480) * Use an internal cuda stream for CopyDataFromTo * small fix white space * Fix to compile * Make stream optional in copydata for compile * fix lint issue * Update cub functions to use internal stream * Lint check * Update CopyTo/CopyFrom/CopyFromTo to use internal stream * Address comments * Fix backward CUDA stream * Avoid overloading CopyFromTo() * Minor comment update * Overload copydatafromto in cuda device api Co-authored-by: xiny <xiny@nvidia.com> * [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389) * * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph * Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information * * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing * * Added test to ensure dgl.remove_self_loop function correctly updates batch information * * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph * * Formatting * * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph * * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass * Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed * * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments * * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test * * Added doc comments for new parameters * * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph * Added test cases for more than k coincident points * * Updated doc comments for output_batch parameters for clarity * * Linter formatting fixes * * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu * * Rewording in doc comments * * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [CI] only known devs are authorized to trigger CI (#4518) * [CI] only known devs are authorized to trigger CI * fix if author is null * add comments * [Readability] Auto fix setup.py and update-version.py (#4446) * Auto fix update-version * Auto fix setup.py * Auto fix update-version * Auto fix setup.py * [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438) * Update distributed-preprocessing.rst * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * fix unpinning when tensoradaptor is not available (#4450) * [Doc] fix print issue in tutorial (#4459) * [Example][Refactor] Refactor RGCN example (#4327) * Refactor full graph entity classification * Refactor rgcn with sampling * README update * Update * Results update * Respect default setting of self_loop=false in entity.py * Update * Update README * Update for multi-gpu * Update * [doc] fix invalid link in user guide (#4468) * [Example] directional_GSN for ogbg-molpcba (#4405) * version-1 * version-2 * version-3 * update examples/README * Update .gitignore * update performance in README, delete scripts * 1st approving review * 2nd approving review Co-authored-by: Mufei Li <mufeili1996@gmail.com> * Clarify the message name, which is 'm'. (#4462) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * [Refactor] Auto fix view.py. (#4461) Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Example] SEAL for OGBL (#4291) * [Example] SEAL for OGBL * update index * update * fix readme typo * add seal sampler * modify set ops * prefetch * efficiency test * update * optimize * fix ScatterAdd dtype issue * update sampler style * update Co-authored-by: Quan Gan <coin2028@hotmail.com> * [CI] use https instead of http (#4488) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487) * [BugFix] fix crash due to incorrect dtype in dgl.to_block() * fix test failure in TF * [Feature] Make TensorAdapter Stream Aware (#4472) * Allocate tensors in DGL's current stream * make tensoradaptor stream-aware * replace TAemtpy with cpu allocator * fix typo * try fix cpu allocation * clean header * redirect AllocDataSpace as well * resolve comments * [Build][Doc] Specify the sphinx version (#4465) Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * reformat * reformat * Auto fix update-version * Auto fix setup.py * reformat * reformat Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> * Move mock version of dgl_sparse library to DGL main repo (#4524) * init * Add api doc for sparse library * support op btwn matrices with differnt sparsity * Fixed docstring * addresses comments * lint check * change keyword format to fmt Co-authored-by: Israt Nisa <nisisrat@amazon.com> * [DistPart] expose timeout config for process group (#4532) * [DistPart] expose timeout config for process group * refine code * Update tools/distpartitioning/data_proc_pipeline.py Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> * [Feature] Import PyTorch's CUDA stream management (#4503) * add set_stream * add .record_stream for NDArray and HeteroGraph * refactor dgl stream Python APIs * test record_stream * add unit test for record stream * use pytorch's stream * fix lint * fix cpu build * address comments * address comments * add record stream tests for dgl.graph * record frames and update dataloder * add docstring * update frame * add backend check for record_stream * remove CUDAThreadEntry::stream * record stream for newly created formats * fix bug * fix cpp test * fix None c_void_p to c_handle * [examples]educe memory consumption (#4558) * [examples]educe memory consumption * reffine help message * refine * [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525) * Added cugraph nightly scripts * Removed nvcr.io//nvidia/pytorch:22.04-py3 reference Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> * Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c6. * [Misc] Add flake8 lint workflow. (#4566) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. * Add flake8 annotation in workflow. * remove * add * clean up Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Try use official pylint workflow. (#4568) * polish update_version * update pylint workflow. * add * revert. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [CI] refine stage logic (#4565) * [CI] refine stage logic * refine * refine * remove (#4570) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Add Pylint workflow for flake8. (#4571) * remove * Add pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Update the python version in Pylint workflow for flake8. (#4572) * remove * Add pylint. * Change the python version for pylint. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4574) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Misc] Use another workflow. (#4575) * Update pylint. * Use another workflow. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint. (#4576) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * Update pylint.yml * Update pylint.yml * Delete pylint.yml * [Misc]Add pyproject.toml for autopep8 & black. (#4543) * Add pyproject.toml for autopep8. * Add pyproject.toml for autopep8. Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454) * rename `DLContext` to `DGLContext` * rename `kDLGPU` to `kDLCUDA` * replace DLTensor with DGLArray * fix linting * Unify DGLType and DLDataType to DGLDataType * Fix FFI * rename DLDeviceType to DGLDeviceType * decouple dlpack from the core library * fix bug * fix lint * fix merge * fix build * address comments * rename dl_converter to dlpack_convert * remove redundant comments Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> * [Deprecation] Dataset Attributes (#4546) * Update * CI * CI * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * [Example] Bug Fix (#4665) * Update * CI * CI * Update * Update Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * Update * Update (#4724) Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> * [API Deprecation]Rename DGLHeterpGraph to DGLGraph in Py files (#4835) * rename DGLHeterpGraph to DGLGraph * [Sparse] Add sparse matrix C++ implementation (#4773) * [Sparse] Add sparse matrix C++ implementation * Add documentation * Update * Minor fix * Move Python code to dgl/mock_sparse2 * Move headers to include * lint * Update * Add dgl_sparse directory * Move src code to dgl_sparse * Add __init__.py in tests to avoid naming conflict * Add dgl sparse so in Jenkinsfile * Complete docstring & SparseMatrix basic op * lint * Disable win tests * fix lint issue * [Misc] clang-format auto fix. (#4831) * [Misc] clang-format auto fix. * blabla * nolint * blabla Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Dist] enable access DistGraph.edges via canonical etype (#4814) * [Dist] enable access DistGraph.edges via canonical etype * refine code * refine test * refine code * Reading files in chunks to reduce the memory footprint of pyarrow (#4795) All tasks completed. * [Dist] Create <graph_name>_stats.txt file if it does not exist before ParMETIS execution (#4791) * check if stats file exists, if not create one before parmetis run * correct the typo error and correctly use constants.GRAPH_NAME * alltoall returns tensor list with None values, which is failing torch.cat(). (#4788) * replace batch_hetero * [Misc] Add // NOLINT for the very long code. (#4834) * alternative * fix * remove_todo * blabl * ablabl Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * fix (#4841) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * a better way to init threadlocal prng (#4808) Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> * [DIST] Message size to retrieve SHUFFLE_GLOBAL_NIDs is resulting in very large messages and resulting in killed process (#4790) * Send out the message to the distributed lookup service in batches. * Update function signature for allgather_sizes function call. * Removed the unnecessary if statement . * Removed logging.info message, which is not needed. * [Misc] Minor code style fix. (#4843) * [Misc] Change the max line length for cpp to 80 in lint. * blabla * blabla * blabla * ablabla Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> * [Sparse] Lint C++ files (#4845) * [Dist] Fix typo of metis preprocess in dist partitin pipeline * Fix ogb/ogbn-mag/heter-RGCN example (#4839) Co-authored-by: Mufei Li <mufeili1996@gmail.com> * fix issue * [Misc] Update cpplint. (#4844) Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-19.ap-northeast-1.compute.internal> Co-authored-by: czkkkkkk <zekucai@gmail.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: kylasa <kylasa@gmail.com> Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com> Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: Mufei Li <mufeili1996@gmail.com> * [API Deprecate]Remove as_heterograph and as_immutable_graph (#4851) * Remove batch_hetero and unbatch_hetero * [API Deprecate]Rename DGLHeterpGraph to DGLGraph in python files (#4833) * remove training new line * remove node_attrs and edge_attrs in batch.py (#4890) * [API Deprecation] Remove copy_src,copy_edge,src_mul_edge in dgl.function (#4891) * remove emb_tensor in NodeEmbedding (#4892) * [API Deprecation]Remove 5 APIs in DGLGraph (#4902) * [API Deprecation]Remove APIs in old dgl section in DGLGraph (#4901) * [API Deprecation]Remove adjacency_matrix_scipy and inplace args in candidates (#4895) * [API Deprecation] Remove add_edge in DGLGraph (#4894) * [API Derepcation]Remove __contains__ in DGLGraph (#4937) * [API Deprecation]Remove edge_id() and force_multi argument in edge_ids() (#4896) * fix issue * remove deprecated_kwargs * remove unused import Co-authored-by: Mufei Li <mufeili1996@gmail.com> Co-authored-by: Chang Liu <chang.liu@utexas.edu> Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Xin Yao <yaox12@outlook.com> Co-authored-by: Israt Nisa <neesha295@gmail.com> Co-authored-by: Israt Nisa <nisisrat@amazon.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal> Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com> Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com> Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com> Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal> Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com> Co-authored-by: rudongyu <ru_dongyu@outlook.com> Co-authored-by: Quan Gan <coin2028@hotmail.com> Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-19.ap-northeast-1.compute.internal> Co-authored-by: czkkkkkk <zekucai@gmail.com> Co-authored-by: kylasa <kylasa@gmail.com> Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

yaox12 added 7 commits August 29, 2022 11:36

add set_stream

86a70e8

add .record_stream for NDArray and HeteroGraph

de30736

refactor dgl stream Python APIs

1c13dce

test record_stream

0f3124e

Merge branch 'master' into add_stream_utility

a6f02b3

add unit test for record stream

1224fb5

Merge branch 'master' into add_stream_utility

fab2243

yaox12 requested review from Rhett-Ying, BarclayII, chang-l, jermainewang and nv-dlasalle September 1, 2022 09:11

yaox12 mentioned this pull request Sep 1, 2022

[Bug] CUDA stream synchronization issue between pytorch and DGL internal functions #4436

Closed

3 tasks

nv-dlasalle reviewed Sep 1, 2022

View reviewed changes

python/dgl/heterograph.py Outdated Show resolved Hide resolved

src/graph/unit_graph.cc Show resolved Hide resolved

tests/pytorch/test_stream.py Outdated Show resolved Hide resolved

Merge branch 'master' into add_stream_utility

5f8551e