[Feat] Distributed Sparse Backend #36

Seventeen17 opened this issue May 24, 2023 · 0 comments

🚀 The feature, motivation and pitch

Background

GNN convolutions generate a lot of memory expansion during message passing, and the sparsity of graphs makes it hard to use parallel resources optimally, which results in poor computational performance and high peak memory.
geSpMM and geSDDMM fuse the graph operator, matrix computation, and reduce operator into a single sparse kernel, reducing kernel launches and memory usage and thereby improving performance.
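
As a point of reference, the unfused semantics of these two kernels can be written in a few lines of dense PyTorch. This is purely an illustration of what an SpMM/SDDMM kernel computes, not the fused implementations themselves; the dense intermediate in the SDDMM case is exactly the memory expansion a fused kernel is meant to avoid.

```python
import torch

# Reference (unfused) semantics, written with dense PyTorch ops purely for
# illustration -- real geSpMM/geSDDMM kernels fuse these steps and never
# materialize the dense intermediates.

def spmm_reference(adj: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    """SpMM: aggregate neighbor features, out[i] = sum_j adj[i, j] * feat[j]."""
    return adj @ feat  # (N, N) x (N, F) -> (N, F)

def sddmm_reference(adj_mask: torch.Tensor, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """SDDMM: compute a[i] . b[j] only where an edge (i, j) exists."""
    scores = a @ b.T          # dense (N, N) intermediate -- the memory
    return scores * adj_mask  # expansion a fused kernel avoids

N, F = 4, 8
adj = (torch.rand(N, N) < 0.5).float()
x = torch.randn(N, F)
out = spmm_reference(adj, x)              # aggregated features, shape (N, F)
edge_scores = sddmm_reference(adj, x, x)  # per-edge scores, zero off-edges
```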

Objective

Build a Distributed Sparse Backend that uses sparse matrix multiplication to express GNN convolutions, replacing the commonly used message passing paradigm and supporting highly efficient distributed sparse convolution.

Moreover, we can optimize the parallel implementation of the kernel based on the sparsity and feature dimensions of the input data.
When the graph data or model is too large, we can use data parallelism, model parallelism, and pipeline parallelism for distributed optimization.

Tasks

This work includes the following major tasks; we will break each one down into detailed subtasks.

Phase 1: Implementations

  • Sparse matrix representation: convert GNN graph data into sparse matrix formats for efficient matrix computations such as multiplication and softmax (see the sketch after this list).
  • Sparse matrix computation kernels: e.g. geSpMM, geSDDMM, EdgeSoftmax.
  • GNN models: implement basic GNN models and LLM-GNN models with the sparse kernels to improve computational efficiency and reduce peak memory.
  • Distributed sparse modules: for commonly used GNN models, use data, model, and pipeline parallelism to implement efficient distributed sparse convolutions, similar to Megatron.
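
As an illustration of the representation and model items above, a minimal sketch of a GCN-style convolution expressed as a single SpMM over a sparse adjacency (instead of explicit per-edge message passing) could look as follows. The function name and the `edge_index` layout follow common PyTorch/PyG conventions and are not this backend's API.

```python
import torch

# A minimal sketch: GCN-style aggregation as one SpMM over a sparse adjacency.
# `edge_index` uses the usual 2 x E layout (source row, target row).

def gcn_conv_spmm(edge_index: torch.Tensor,
                  x: torch.Tensor,
                  weight: torch.Tensor) -> torch.Tensor:
    num_nodes = x.size(0)
    values = torch.ones(edge_index.size(1), device=x.device)
    adj = torch.sparse_coo_tensor(edge_index, values,
                                  (num_nodes, num_nodes)).coalesce()

    # Symmetric normalization D^{-1/2} A D^{-1/2}, applied to the edge values.
    deg = torch.sparse.sum(adj, dim=1).to_dense().clamp(min=1)
    d_inv_sqrt = deg.pow(-0.5)
    row, col = adj.indices()
    norm_vals = d_inv_sqrt[row] * adj.values() * d_inv_sqrt[col]
    adj_norm = torch.sparse_coo_tensor(adj.indices(), norm_vals,
                                       (num_nodes, num_nodes))

    # Dense feature transform, then one SpMM for the whole aggregation step.
    return torch.sparse.mm(adj_norm, x @ weight)

edge_index = torch.tensor([[0, 1, 2, 2], [1, 0, 0, 1]])
x = torch.randn(3, 16)
w = torch.randn(16, 32)
out = gcn_conv_spmm(edge_index, x, w)  # shape (3, 32)
```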

Phase 2: Performance optimizations

  • Kernel optimization: optimize kernel parallelization for different workloads, and support half-precision and mixed-precision.
  • Computation graph capture and compilation optimization: use TorchDynamo or similar techniques to capture GNN operators and dynamic sparse shapes, extend HLO to support lowering the sparse kernels above, and optimize based on the input graph (see the sketch after this list).
  • Memory optimization: use techniques such as CPU offload and ZeRO.
  • Distributed optimization: more efficient parallelism, caching, etc.
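
For the graph-capture item, a minimal sketch of capturing an SpMM-based layer with TorchDynamo via `torch.compile` is shown below. Whether the captured graph actually lowers the sparse ops to fused kernels depends on the compiler backend; today Dynamo may simply fall back to eager execution for unsupported sparse ops.

```python
import torch

# Illustrative only: capture an SpMM-based layer with TorchDynamo.
# dynamic=True hints that shapes (e.g. nnz, feature dims) may vary between
# calls; lowering the sparse ops to fused kernels is up to the backend.

@torch.compile(dynamic=True)
def spmm_layer(adj: torch.Tensor, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return torch.sparse.mm(adj, x @ w)

adj = torch.eye(4).to_sparse()       # toy sparse adjacency
x, w = torch.randn(4, 8), torch.randn(8, 16)
out = spmm_layer(adj, x, w)          # first call triggers graph capture
```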

Alternatives

No response

Additional context

No response
