AGL: A Scalable System for Industrial-purpose Graph Machine Learning

File metadata and controls

28 lines (23 loc) · 2.41 KB

Key ideas

  • GNNs are inherently difficult to train in parallel because of the dependencies between edges.
  • The usual alternatives are storing the whole graph in memory (very expensive) or using third-party storage (very slow).
  • Instead: generate an information-complete K-hop neighborhood subgraph for each node.
  • Merge the values of in-edge neighbors, then propagate the merged result along out-edges via MapReduce.
  • Trains a 2-layer GNN on a graph with 2B nodes and 100B edges in 14 hours.
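The K-hop neighborhood idea above can be sketched with a toy breadth-first traversal. This is a minimal illustration, not AGL's distributed implementation: the adjacency-dict format and the function name are hypothetical.

```python
from collections import deque

def k_hop_neighborhood(adj, target, k):
    """Collect an information-complete K-hop subgraph around `target`.

    `adj` maps node -> list of out-neighbors (toy in-memory format).
    Returns the nodes within k hops and the edges induced among them.
    """
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    # keep only edges whose endpoints both fall inside the neighborhood
    edges = [(u, v) for u in seen for v in adj.get(u, ()) if v in seen]
    return seen, edges

# Toy chain graph 0 -> 1 -> 2 -> 3: the 2-hop neighborhood of node 0
nodes, edges = k_hop_neighborhood({0: [1], 1: [2], 2: [3]}, 0, 2)
# nodes == {0, 1, 2}; edges cover (0, 1) and (1, 2)
```

In AGL these subgraphs are generated offline for every target node, so training can load each one independently instead of the whole graph.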

Introduction

  • Heterogeneous graph at Ant Financial, processed by message passing: merging and propagation.
  • Each node’s K-hop embedding is computed via message passing:
    • merge information from in-edge neighbors, then propagate the merged result to neighbors along out-edges.
  • Each K-hop neighborhood forms an information-complete subgraph.
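One round of the merge-and-propagate scheme can be written as a MapReduce-style aggregation. A minimal sketch, assuming scalar node values and summation as the merge function (the real system merges feature vectors with learned operators):

```python
def propagate(edges, values):
    """One merge-and-propagate round (toy sketch).

    `edges` is a list of directed (src, dst) pairs. Conceptually, the
    "map" step emits (dst, values[src]) for every edge, and the "reduce"
    step merges (here: sums) all values arriving on a node's in-edges.
    """
    merged = {}
    for src, dst in edges:
        merged[dst] = merged.get(dst, 0) + values[src]
    return merged

edges = [(0, 2), (1, 2), (2, 3)]
values = {0: 1.0, 1: 2.0, 2: 5.0, 3: 0.0}
print(propagate(edges, values))  # {2: 3.0, 3: 5.0}
```

Running this round K times gives every node information from its full K-hop in-neighborhood.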

System


  • GraphFlat is an efficient, distributed generator that produces K-hop neighborhoods via message passing.
    • These neighborhoods are flattened into protobuf strings.
  • GraphTrainer applies techniques such as pipelining, pruning, and edge partitioning to eliminate I/O overhead and optimize floating-point computation.
  • GraphInfer is a distributed inference module that splits a K-layer GNN model into K slices and applies message passing K times via MapReduce.
  • GraphInfer makes maximal use of each node’s embedding: every intermediate embedding from the k-th layer is propagated to the next round of message passing, which significantly speeds up inference.
  • An industrial-scale graph can be divided in advance into a massive number of tiny K-hop neighborhoods w.r.t. their target nodes; training then loads one neighborhood (or a batch of them) rather than the entire graph into memory.
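The layer-sliced inference idea behind GraphInfer can be sketched as K full-graph rounds, where each intermediate embedding is computed once and reused by every downstream node. This is a toy single-machine illustration; the `infer` signature and the toy layer are hypothetical, not AGL's API.

```python
def infer(adj_in, features, layers, k):
    """Apply a K-layer GNN as K rounds of message passing.

    `adj_in` maps node -> list of in-neighbors; `features` maps node ->
    initial embedding; each element of `layers` is a callable
    layer(own_embedding, in_neighbor_embeddings) -> new embedding.
    Round k applies layer k to all nodes, so the k-th intermediate
    embedding of each node is computed exactly once.
    """
    h = dict(features)
    for layer in layers[:k]:
        nxt = {}
        for node in h:
            msgs = [h[u] for u in adj_in.get(node, [])]
            nxt[node] = layer(h[node], msgs)
        h = nxt
    return h

# Toy layer: new embedding = own embedding + sum of in-neighbor embeddings.
layer = lambda h_self, msgs: h_self + sum(msgs)
adj_in = {2: [0, 1]}                      # node 2 has in-edges from 0 and 1
h = infer(adj_in, {0: 1.0, 1: 2.0, 2: 0.0}, [layer, layer], k=2)
# h[2] == 6.0: round 1 gives node 2 the value 3.0, round 2 adds 1.0 + 2.0 again
```

Compared with recomputing each target node's K-hop neighborhood at inference time, this shares intermediate embeddings across all nodes, which is where the reported speedup comes from.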
