Artifacts for Spara

This repository contains the source code of Spara, an efficient distributed GNN training system. Scripts to reproduce the Figures and Tables in the paper are also provided.

Project Structure

Spara is highly integrated into DGL. The Python code for Spara is in the python/dgl/dev directory, and the C++ source code is in the src/spara directory.

Software

We assume you use a Linux machine.

How to build

Install Docker

Install Docker by following the official installation instructions for your distribution.

Install NVIDIA CUDA Driver

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
sudo sh cuda_12.3.2_545.23.08_linux.run --silent --driver
sudo reboot
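After the reboot, verify that the driver loaded correctly; nvidia-smi should list your GPUs and the installed driver version:

nvidia-smi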

Option 1: Using Docker

Install NVIDIA Container Toolkit

We recommend using Docker to create the development environment. To do this, you need to configure Docker to use NVIDIA GPUs inside a container. After installing Docker and the NVIDIA driver, follow the remaining instructions to install the NVIDIA Container Toolkit.

Configure the NVIDIA Container Toolkit production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install the toolkit, then configure the Docker runtime using the nvidia-ctk command:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
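To confirm that Docker can now see the GPUs, you can run a throwaway container; with the toolkit configured, the runtime mounts nvidia-smi into the container, so the plain ubuntu image (used here purely as a test image) is enough:

sudo docker run --rm --gpus all ubuntu nvidia-smi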

Build the Docker Image

sudo docker build -t spara:latest .

Run the Docker Image

$ sudo docker run --shm-size=180GB --gpus all -itd spara:latest
<container_id>
$ sudo docker exec -it <container_id> /bin/bash

The docker run command prints the ID of the new container; pass that ID to docker exec to open a shell inside it.

Option 2: Not Using Docker

conda create -n spara python=3.10
conda activate spara

# install pytorch v2.0
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

# install pyg
pip install torch_geometric

# install pyg lib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html

# install dependencies
pip install torchmetrics jupyterlab numpy matplotlib pandas ogb

Then, install CUDA Toolkit 11.8 and set the CUDA_HOME environment variable accordingly. Also install a C++ compiler, CMake, and Ninja, which are used to build the source code.
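For example, assuming the toolkit lands in its default location /usr/local/cuda-11.8 (an assumption; adjust the path to your install), you can set up the environment and sanity-check that PyTorch sees the GPU before building:

export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

# PyTorch should report CUDA 11.8 and at least one visible GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"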

Build Spara from Source

Inside the Docker container, all the source code is in the /spara directory. (If you set up the environment without Docker, work from your clone of this repository instead.)

Then, you can build the project by running:

./build.sh

Next, check that Spara was installed correctly:

python -c "import dgl; print(dgl.__path__)"

The output should be something like ['/conda/lib/python3.10/site-packages/dgl'], indicating you have installed Spara correctly.

Download Dataset

Option 1: Download the Raw Datasets and Process Them

Please refer to README.md for more details.

Option 2: Download the Preprocessed Dataset

  • Graphs: graph topology, labels, and node and edge weights for products, paper100M, orkut, and friendster. Download size: 75 GB; uncompressed size: 144 GB.
  • Partition Maps: partition maps for products, paper100M, orkut, and friendster, generated with different combinations of node and edge weights. Download size: 1 GB; uncompressed size: 4.1 GB.

You can use the download.sh script to download the dataset and save it to the ./dataset directory. The graph topology data will be in the ./dataset/graph directory, and the partition maps will be in the ./dataset/graph/partition_map directory. You can change the dataset_dir variable in the env.sh script to change the default download location. (Note that the partition_map directory must stay inside the graph directory so that the data-loading function can find the maps.)
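As a quick sanity check after downloading, you can list the expected layout (directory names as described above; the file names inside vary by graph):

ls ./dataset/graph                  # topology, labels, and weights for each graph
ls ./dataset/graph/partition_map    # partition maps, nested inside the graph directory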

Artifact Execution

  • S1: The first step is to obtain the input datasets, which include the graph topology data and partition maps. We provide pre-processed datasets that can be downloaded from Amazon S3; the download.sh script fetches them automatically. Alternatively, you can use the scripts in the repository to generate the prepared datasets, but note that this can take several days.
  • S2: After obtaining the prepared datasets, run the main experiment with the bash script experiment/script/main.sh. This script runs all the baselines and generates the log files. (Expected time: 120 min; depends on S1.)
  • S3: Post-process the logs from S2 and generate Figure 3 using the notebook plot/time breakdown. (Depends on S2.)
  • S4: Post-process the logs from S2 using the notebook plot/main to generate Table 3. (Depends on S2.)
  • S5: Run the sampling simulation experiment/sample main to measure the number of edges computed and features loaded for varying batch sizes and graphs; this produces the data points in Table 1. (Expected time: 30 min; depends on S1.)
  • S6: Run the script experiment/scripts/ablation.sh to run all the ablation experiments on the papers graph. (Depends on S1.)
  • S7: Post-process the logs generated in the previous step with the Jupyter notebook plot/final_ablation. (Expected time: 3 hours; depends on S6.)
  • S8: Run the Python file experiment/simulate main on friendster with various partitioning schemes to collect the workload characteristics (command-line argument details are provided in the README). (Depends on S1.)
  • S9: Post-process the workloads generated in step S8 with the notebook plot/simulation plot to generate Figure 5.
  • S10: Run the script experiment/partition ablation to collect the training logs for varying partitioning strategies. (Depends on S1.)
  • S11: Run the notebook plot/partitioning to generate Table 4 from the training logs. (Depends on S10.)
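For reference, a minimal session covering the main pipeline (S1-S4) might look like the following; it assumes you are in the repository root and that the notebooks are opened from JupyterLab:

./download.sh                    # S1: fetch the pre-processed datasets
bash experiment/script/main.sh   # S2: run all baselines (~120 min)
jupyter lab                      # S3/S4: run the plot/time breakdown and plot/main notebooks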

Experiment Result Generation

Steps S2, S5, S6, S8, and S10 above produce the raw training and sampling logs that feed all figures and tables.

Data Analysis and Visualization

Steps S3, S4, S7, S9, and S11 post-process those logs with the Jupyter notebooks under plot/ to produce the paper's figures and tables.
