Release v0.9.0 · dmlc/dgl

This is a major update with several new features, including a graph prediction pipeline in DGL-Go, cuGraph support, mixed precision support, and more.

Starting from 0.9 we also ship arm64 builds for Linux and OSX.

DGL-Go

DGL-Go now supports training GNNs for graph property prediction tasks. It includes two popular GNN models – Graph Isomorphism Network (GIN) and Principal Neighborhood Aggregation (PNA). For example, to train a GIN model on the ogbg-molpcba dataset, first generate a YAML configuration file using the following command:

dgl configure graphpred --data ogbg-molpcba --model gin

The command generates the following configuration file, which users can then adjust manually.

version: 0.0.2
pipeline_name: graphpred
pipeline_mode: train
device: cpu                     # Torch device name, e.g., cpu or cuda or cuda:0
data:
    name: ogbg-molpcba
    split_ratio:                # Ratio to generate data split, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
     name: gin
     embed_size: 300            # Embedding size
     num_layers: 5              # Number of layers
     dropout: 0.5               # Dropout rate
     virtual_node: false        # Whether to use virtual node
general_pipeline:
    num_runs: 1                 # Number of experiments to run
    train_batch_size: 32        # Graph batch size when training
    eval_batch_size: 32         # Graph batch size when evaluating
    num_workers: 4              # Number of workers for data loading
    optimizer:
        name: Adam
        lr: 0.001
        weight_decay: 0
    lr_scheduler:
        name: StepLR
        step_size: 100
        gamma: 1
    loss: BCEWithLogitsLoss
    metric: roc_auc_score
    num_epochs: 100             # Number of training epochs
    save_path: results          # Directory to save the experiment results
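The lr_scheduler entries above map onto PyTorch's StepLR rule, which multiplies the learning rate by gamma once every step_size epochs. The plain-Python sketch below reproduces that rule without PyTorch (the helper name step_lr is hypothetical) to show that the generated default gamma: 1 keeps the learning rate constant. The second schedule uses values chosen only for illustration; they are not part of the generated configuration.

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Learning rate at `epoch` under a StepLR-style schedule.

    Hypothetical helper mirroring the closed form of
    torch.optim.lr_scheduler.StepLR: lr = base_lr * gamma ** (epoch // step_size).
    """
    return base_lr * gamma ** (epoch // step_size)


# Defaults from the generated config: lr=0.001, step_size=100, gamma=1.
default_schedule = [step_lr(0.001, 1.0, 100, e) for e in (0, 50, 99)]

# Illustrative alternative (not from the config): halve the rate every 20 epochs.
decayed_schedule = [step_lr(0.001, 0.5, 20, e) for e in (0, 19, 20, 40)]

print(default_schedule)
print(decayed_schedule)
```

With the defaults, the scheduler only has an effect once gamma is set below 1.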

Alternatively, users can fetch a model recipe with the pre-defined hyperparameters used in the original experiments.

dgl recipe get graphpred_pcba_gin.yaml

To launch training:

dgl train --cfg graphpred_ogbg-molpcba_gin.yaml

Another addition is a new command for running inference with a trained model on another dataset. For example, the following shows how to apply the GIN model trained on ogbg-molpcba to ogbg-molhiv.

# Generate an inference configuration file from a saved experiment checkpoint
dgl configure-apply graphpred --data ogbg-molhiv --cpt results/run_0.pth

# Apply the trained model for inference
dgl apply --cfg apply_graphpred_ogbg-molhiv_gin.yaml

It will save the model predictions in a CSV file.

Mixed Precision

DGL is compatible with the PyTorch Automatic Mixed Precision (AMP) package for mixed precision training, which reduces both training time and GPU memory consumption. This feature requires PyTorch 1.6+ and Python 3.7+.

By wrapping the forward pass with torch.cuda.amp.autocast(), PyTorch automatically selects the appropriate data type for each op and tensor. Half precision tensors are memory efficient, and most operators on them are faster because they leverage GPU Tensor Cores.

import torch.nn.functional as F
from torch.cuda.amp import autocast

def forward(g, feat, label, mask, model):
    # Ops inside the autocast context run in float16 where it is safe to do so
    with autocast(enabled=True):
        logit = model(g, feat)
        loss = F.cross_entropy(logit[mask], label[mask])
        return loss

Small gradients can underflow (flush to zero) in float16 format. PyTorch provides a GradScaler module to address this issue. It multiplies the loss by a factor and invokes the backward pass on the scaled loss to prevent underflow. It then unscales the computed gradients before the optimizer updates the parameters. The scale factor is determined automatically.

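from torch.cuda.amp import GradScaler

scaler = GradScaler()

def backward(scaler, loss, optimizer):
    scaler.scale(loss).backward()  # backpropagate the scaled loss
    scaler.step(optimizer)         # unscale gradients, then step (skipped on inf/NaN)
    scaler.update()                # adjust the scale factor for the next iteration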
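To see why scaling helps, the following standalone sketch imitates float16 rounding with Python's struct module instead of real GPU tensors. It is only an illustration under stated assumptions: the helper to_fp16 is hypothetical, the gradient value is made up, and the fixed scale of 2**16 simply matches GradScaler's default initial scale rather than the dynamically adjusted factor used in practice.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE float16 value and back.

    Hypothetical helper: struct's 'e' format stands in for a float16 tensor.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8          # made-up gradient, below the smallest float16 subnormal (~6e-8)
scale = 2.0 ** 16    # fixed scale for illustration

underflowed = to_fp16(grad)        # stored directly in float16: flushed to zero
scaled = to_fp16(grad * scale)     # the scaled value is representable in float16
recovered = scaled / scale         # unscale in higher precision before the update

print(underflowed, scaled, recovered)
```

The unscaled gradient is lost entirely, while the scaled one survives the round trip with only a small rounding error.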

Putting everything together, we have the example below.

import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.data import RedditDataset
from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop

# forward(), backward() and scaler are the ones defined in the snippets above

class GAT(nn.Module):
    def __init__(self, in_feats, num_classes, num_hidden=256, num_heads=2):
        super().__init__()
        self.conv1 = GATConv(in_feats, num_hidden, num_heads, activation=F.elu)
        # The output layer produces one logit per class
        self.conv2 = GATConv(num_hidden * num_heads, num_classes, num_heads)

    def forward(self, g, h):
        h = self.conv1(g, h).flatten(1)  # concatenate the attention heads
        h = self.conv2(g, h).mean(1)     # average the attention heads
        return h

device = torch.device('cuda')

transform = AddSelfLoop()
# Pass the transform by keyword: the first positional argument is self_loop
data = RedditDataset(transform=transform)

g = data[0]
g = g.int().to(device)
train_mask = g.ndata['train_mask']
feat = g.ndata['feat']
label = g.ndata['label']
in_feats = feat.shape[1]

model = GAT(in_feats, data.num_classes).to(device)
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)

for epoch in range(100):
    optimizer.zero_grad()
    loss = forward(g, feat, label, train_mask, model)
    backward(scaler, loss, optimizer)

Thanks @nv-dlasalle @ndickson-nvidia @yaox12 etc. for support!

cuGraph Interface

The RAPIDS cuGraph library provides a collection of GPU accelerated algorithms for graph analytics, such as centrality computation and community detection. According to its documentation, “the latest NVIDIA GPUs (RAPIDS supports Pascal and later GPU architectures) make graph analytics 1000x faster on average over NetworkX”.

To install cuGraph, we recommend following the practice below.

conda install mamba -n base -c conda-forge

mamba create -n dgl_and_cugraph -c dglteam -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge cugraph pytorch torchvision torchaudio cudatoolkit=11.3 dgl-cuda11.3 tqdm

conda activate dgl_and_cugraph

DGL is now interoperable with cuGraph: a DGLGraph object can be converted to and from a cuGraph graph object, which gives DGL users access to the efficient graph analytics implementations in cuGraph. For example, users can perform community detection on a graph with the Louvain method available in cuGraph.

import cugraph

from dgl.data import CoraGraphDataset

dataset = CoraGraphDataset()
g = dataset[0].to('cuda')
cugraph_g = g.to_cugraph()
cugraph_g = cugraph_g.to_undirected()
parts, modularity_score = cugraph.louvain(cugraph_g)

The community membership of nodes from parts['partition'] can then be used as auxiliary node labels or node features.
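As a rough sketch of that last step, the snippet below turns community assignments into per-node labels and one-hot features. It uses plain Python lists with made-up values standing in for the 'vertex' and 'partition' columns of parts, and it assumes the rows may arrive in any order, so the labels are aligned with node IDs first. Real code would read these columns from the DataFrame returned by cugraph.louvain.

```python
# Stand-ins for the 'vertex' and 'partition' columns of `parts`
# (made-up values for a 6-node graph, deliberately not sorted by vertex).
vertex = [3, 0, 5, 1, 4, 2]
partition = [1, 0, 1, 0, 2, 0]

num_nodes = len(vertex)

# Align community IDs with node IDs so that labels[i] belongs to node i.
labels = [0] * num_nodes
for v, p in zip(vertex, partition):
    labels[v] = p

# One-hot encode the community ID so it can serve as a node feature.
num_communities = max(labels) + 1
one_hot = [
    [1.0 if c == label else 0.0 for c in range(num_communities)]
    for label in labels
]

print(labels)
print(one_hot[4])
```

In DGL, the aligned labels or the one-hot matrix could then be converted to a tensor and stored in g.ndata under a key of your choice.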

If you have modified the structure of a cuGraph graph object or loaded graph data with cuGraph, you can also convert it to a DGLGraph object.

import dgl
g = dgl.from_cugraph(cugraph_g)

Credits to @VibhuJawa!

Arm64 builds

Linux AArch64 and OSX M1 (arm64) are now supported. One can install them as usual with pip and conda:

pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html
conda install -c dglteam dgl-cudaXX.X   # currently not available for OSX M1
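To check whether the current Python interpreter is running on one of these platforms before choosing a package, a small standard-library sketch such as the following can help. The helper is_arm64 is hypothetical; it relies on platform.machine() reporting 'aarch64' on Linux and 'arm64' on macOS for 64-bit ARM.

```python
import platform

# Machine strings reported by platform.machine() on 64-bit ARM systems:
# 'aarch64' on Linux and 'arm64' on macOS.
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True if the given machine string denotes a 64-bit ARM platform."""
    return machine.lower() in ARM64_NAMES


print(platform.system(), platform.machine(), is_arm64(platform.machine()))
```

Note that an x86_64 Python build running under emulation on an M1 machine reports x86_64, in which case the regular x86_64 packages apply.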

Quality-of-life updates

Added more missing FP16 specializations (#4140, @ndickson-nvidia)
Allow communicators of size one when NCCL is missing (#3713, @nv-dlasalle)
Automatically unpin DGL tensors when out of scope to avoid potential bugs (#4135, @yaox12)

System optimizations

Enable using UVA and FP16 with SparseAdam Optimizer (#3885, @nv-dlasalle)
Enable USE_EPOLL by default in distributed training (#4167)
Optimize the use of alternative streams in dataloader (#4177, @yaox12)
Redirect AllocWorkspace to PyTorch's allocator if available (#4199, @yaox12)

Bug fixes

Massive refactoring of examples including GCN, GraphSAGE, PinSAGE, EGES, DGI, GATv2, and many more (#4130, #4194, #4186, #4197, #4201, #4160, #4220, #4219, #4218, #4242, #4255, huge thanks to @chang-l!)
Fix CareGNN example to adapt to new sampler interface (#4211, @yaox12)
Fix #4150 (#4164, #4198, #4212)
Fix etype not guaranteed to be sorted in distributed training (#4156)
Fix compiler warnings (#4051, @TristonC)
Fix the Correct and Smooth example using validation labels during prediction in validation (#4158, @LucasPrietoAl)
Fix build issues on macOS (#4168, #4175)
Fix that pin_prefetcher is not actually enabled (#4169, @yaox12)
Fix a bug related to GroupRevRes (#4181)
Fix deferred_dtype missing error (#4174, @nv-dlasalle)
Add CUDA context availability check before setting curand seed (#4223, @yaox12)
Fix dtype mismatch when copying a graph into shared memory and getting it back (#4222) (#4228)
Fix graph attribute missing in DataLoader when device is not specified (#4245)
Record stream when using another CUDA stream for data transfer (#4250, @yaox12)
Fix multiple backward passes error with retain_graph being set (#4078) (#4249)
Doc fixes (#4149, #4180, #4193, #4246, #4248, @PotatoChipsNinja @yaox12 @alxwen711 @Zhanghyi)

Misc

Test pipeline for distributed training (#4122, @Kh4L)
