🚀 The feature, motivation and pitch

Background
GNN convolutions generate significant memory expansion during message passing, and the sparsity of graph data makes it hard to use parallel resources optimally, which results in poor computational performance and excessive peak memory. geSpMM and geSDDMM integrate the graph operator, the matrix computation, and the reduce operator into a single sparse kernel, reducing kernel launch counts and memory usage and thereby improving performance.
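As a concrete illustration (a minimal sketch in PyTorch; the shapes and variable names are illustrative, not from this proposal): gather-based message passing materializes a per-edge message tensor before reduction, while a single SpMM over the sparse adjacency produces the same aggregation without that temporary.

```python
import torch

num_nodes, num_edges, feat_dim = 1000, 20000, 128
src = torch.randint(0, num_nodes, (num_edges,))
dst = torch.randint(0, num_nodes, (num_edges,))
x = torch.randn(num_nodes, feat_dim)

# Message passing: gather source features per edge, then scatter-reduce.
# The [num_edges, feat_dim] temporary is the memory-expansion problem.
messages = x[src]
out_mp = torch.zeros(num_nodes, feat_dim).index_add_(0, dst, messages)

# SpMM: one sparse kernel over the adjacency, no per-edge feature tensor.
indices = torch.stack([dst, src])
adj = torch.sparse_coo_tensor(indices, torch.ones(num_edges),
                              (num_nodes, num_nodes)).coalesce()
out_spmm = torch.sparse.mm(adj, x)

assert torch.allclose(out_mp, out_spmm, atol=1e-4)
```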
Objective
A distributed sparse backend that uses sparse matrix multiplication to express GNN convolutions, replacing the commonly used message-passing paradigm and supporting highly efficient distributed sparse convolution.
Moreover, we can optimize the parallel implementation of the kernel based on the sparsity and feature dimensions of the input data.
When the graph data or model is too large, we can use data parallelism, model parallelism, and pipeline parallelism for distributed optimization.
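For reference, a GCN-style convolution expressed this way reduces to one dense GEMM plus one SpMM (a sketch assuming the normalized adjacency Â = D^{-1/2}(A+I)D^{-1/2} is precomputed as a sparse tensor; the class and argument names are mine):

```python
import torch
import torch.nn as nn

class SpMMGraphConv(nn.Module):
    """GCN convolution written as SpMM rather than explicit message passing."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj_norm, x):
        # adj_norm: sparse normalized adjacency; x: [num_nodes, in_dim]
        # One dense GEMM followed by one SpMM -- no per-edge temporaries.
        return torch.sparse.mm(adj_norm, self.weight(x))
```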
Tasks
This work includes the following major tasks; we will break each one down into detailed subtasks.
Phase 1: Implementations
Sparse matrix representation: convert GNN graph data into a sparse matrix format (e.g., COO/CSR) for efficient matrix computation such as multiplication and softmax.
Sparse matrix computation kernels: e.g., geSpMM, geSDDMM, and EdgeSoftmax (see the unfused sketch after this list).
GNN models: implement basic GNN models and LLM-GNN models on top of the sparse kernels to improve computational efficiency and reduce peak memory.
Distributed sparse modules: for commonly used GNN models, use DP, MP, and PP to implement highly efficient distributed sparse convolutions, in the spirit of Megatron.
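As a reference point for the geSDDMM and EdgeSoftmax items above, here is a deliberately unfused sketch in plain PyTorch (the function names are mine; a real kernel would fuse these stages into one launch): SDDMM evaluates dense dot products only at edge positions, and edge softmax normalizes the resulting scores over each destination node's incoming edges.

```python
import torch

def sddmm(src, dst, q, k):
    # SDDMM: e_ij = <q_i, k_j>, computed only where edge (j -> i) exists.
    return (q[dst] * k[src]).sum(-1)              # [num_edges]

def edge_softmax(dst, scores, num_nodes):
    # Per-destination-node softmax, with a scatter-max for numerical stability.
    node_max = torch.full((num_nodes,), float("-inf"))
    node_max = node_max.scatter_reduce(0, dst, scores, reduce="amax")
    exp = (scores - node_max[dst]).exp()
    denom = torch.zeros(num_nodes).index_add_(0, dst, exp)
    return exp / denom[dst]
```

A fused geSDDMM/EdgeSoftmax kernel would avoid round-tripping the intermediate [num_edges] tensors through global memory, which is where the launch-count and memory savings come from.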
Phase 2: Performance optimizations
Kernel optimization: optimize kernel parallelization for different workloads, including half-precision and mixed-precision execution.
Computation graph capture and compilation optimization: use TorchDynamo or other techniques to capture GNN operators and dynamic sparse shapes, extend HLO to support lowering the sparse kernels mentioned above, and optimize based on the input graph (see the sketch after this list).
Memory optimization: techniques such as CPU offloading and ZeRO.
Distributed optimization: more efficient parallelism strategies, feature caching, etc.
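For the graph-capture task above, the public torch.compile entry point is the natural starting place (a sketch only; sparse ops that Dynamo cannot trace today fall back to eager, and closing that gap by lowering the sparse kernels above is the proposed work):

```python
import torch

def gcn_forward(adj_norm, x, weight):
    # The same SpMM-based convolution as the earlier sketch.
    return torch.sparse.mm(adj_norm, x @ weight)

# dynamic=True keeps node/edge counts symbolic, so the compiled graph is
# not re-specialized every time the input graph's size changes.
compiled_fwd = torch.compile(gcn_forward, dynamic=True)
```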
Alternatives
No response
Additional context
No response