This repository is the official implementation of MSPipe: Efficient Temporal GNN Training via Staleness-aware Pipeline
Our development environment:
- Ubuntu 20.04 LTS
- g++ 9.4
- CUDA 11.3 / 11.6
- cmake 3.23
Dependencies:
- torch >= 1.10
- dgl (CUDA version)
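For example, with CUDA 11.3 the dependencies could be installed as follows (version numbers are illustrative; any torch >= 1.10 build matching your CUDA version should work):
pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html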
Compile and install MSPipe:
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
For debug mode:
DEBUG=1 pip install -v -e .
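As an optional sanity check (not part of the repo), confirm that the CUDA build of PyTorch is visible before training:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"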
Compile and install TGL (presample version):
cd tgl
python setup_tgl.py build_ext --inplace
cd scripts/ && ./download_data.sh
MSPipe
Train the TGN model on the REDDIT dataset with MSPipe on 4 GPUs:
cd scripts
./run_offline.sh TGN REDDIT 4
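The script follows the pattern ./run_offline.sh MODEL DATASET NUM_GPUS. For example (the model and dataset names other than TGN/REDDIT are assumptions based on the MSPipe paper; check the script for the exact names it accepts):
# hypothetical examples with other models/datasets
./run_offline.sh JODIE WIKI 2
./run_offline.sh APAN MOOC 4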
Presample (TGL)
Train the TGN model on the REDDIT dataset with Presample (TGL) on 4 GPUs:
cd tgl
./run_tgl.sh TGN REDDIT 4
Distributed training
To train the TGN model on the GDELT dataset across more than one server, perform the following steps on each server:
- Change INTERFACE to your network interface name (can be found using ifconfig).
- Change the following variables (see the example after this list):
  - HOST_NODE_ADDR: IP address of the host machine
  - HOST_NODE_PORT: port of the host machine
  - NNODES: total number of servers
  - NPROC_PER_NODE: number of GPUs on each server
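For example, the variables might be set as follows (all values are placeholders; where exactly they are defined in run_offline_dist.sh depends on the script):
INTERFACE=eth0               # network interface name reported by ifconfig
HOST_NODE_ADDR=192.168.1.10  # IP address of the host machine
HOST_NODE_PORT=29500         # port on the host machine
NNODES=2                     # total number of servers
NPROC_PER_NODE=4             # number of GPUs on each server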
cd scripts
./run_offline_dist.sh TGN GDELT