Bugs in minibatch training #131

suxnju · 2022-08-16T19:40:35Z

🐛 Bug

To Reproduce

An error occurs in the _mini_train_step function in trainerflow/node_classification.py when mini_batch_flag is enabled for the node_classification task with the SimpleHGN model.

import argparse
from openhgnn.experiment import Experiment

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', default='SimpleHGN', type=str, help='name of models')
    parser.add_argument('--task', '-t', default='node_classification', type=str, help='name of task')
    # link_prediction / node_classification
    parser.add_argument('--dataset', '-d', default='imdb4MAGNN', type=str, help='name of datasets')
    parser.add_argument('--gpu', '-g', default=0, type=int, help='-1 means cpu')
    parser.add_argument('--use_best_config', action='store_true', help='will load utils.best_config')
    parser.add_argument('--load_from_pretrained', action='store_true', help='load model from the checkpoint')
    args = parser.parse_args()

    experiment = Experiment(model=args.model, dataset=args.dataset, task=args.task, gpu=args.gpu,
                            use_best_config=args.use_best_config,
                            load_from_pretrained=args.load_from_pretrained,
                            mini_batch_flag=True, batch_size=64)
    experiment.run()

Expected behavior

Minibatch training should work on a large heterograph.

Environment

torch==1.12.1
dgl-cu113==0.9.0 # for CUDA support
openhgnn==0.3.0
Linux
Python 3.8.13

Additional context

The default minibatch sampler is MultiLayerFullNeighborSampler.
In the training loop, blocks is a list (line 164), but the forward function of the model (e.g. SimpleHGN) expects a single heterograph hg (line 159).

for i, (input_nodes, seeds, blocks) in enumerate(loader_tqdm):
    blocks = [blk.to(self.device) for blk in blocks]
    ...
    logits = self.model(blocks, emb)[self.category]

def forward(self, hg, h_dict):
    with hg.local_scope():
        hg.ndata['h'] = h_dict
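
The mismatch can be reproduced without DGL. In the minimal sketch below, FakeHeteroGraph and forward are hypothetical stand-ins (not OpenHGNN or DGL code). The only point is that a Python list has no local_scope() method, so passing a list of blocks to a forward function written for a single graph fails with an AttributeError.

```python
from contextlib import contextmanager


class FakeHeteroGraph:
    """Hypothetical stand-in for a DGL heterograph; only mimics local_scope()."""

    def __init__(self):
        self.ndata = {}

    @contextmanager
    def local_scope(self):
        # Restore ndata on exit, loosely imitating a graph's local scope.
        saved = dict(self.ndata)
        try:
            yield self
        finally:
            self.ndata = saved


def forward(hg, h_dict):
    """Mirrors the shape of the model's forward: assumes hg is one graph."""
    with hg.local_scope():
        hg.ndata['h'] = h_dict
        return hg.ndata['h']


def try_forward(hg, h_dict):
    """Return the output, or the name of the exception that was raised."""
    try:
        return forward(hg, h_dict)
    except AttributeError as exc:
        return type(exc).__name__


# A single graph works; a list of blocks does not.
full_graph_result = try_forward(FakeHeteroGraph(), {'movie': [1.0]})
blocks_result = try_forward([FakeHeteroGraph(), FakeHeteroGraph()], {'movie': [1.0]})
```

Here full_graph_result holds the features that were passed in, while blocks_result holds the string 'AttributeError', which matches the kind of failure described above.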


dddg617 · 2022-08-17T02:49:17Z

Currently, many models do not support mini-batch training; we are working on fixing this. You can refer to RGCN.py for an example of how to support mini-batch training. However, models like SimpleHGN, HGT, and HetSANN may be more troublesome because they need dgl.to_homogeneous. As far as I know, this API has bugs when used with mini-batches, and we are reporting this to the DGL team.
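
One common way to let a single forward accept both a full graph and a list of sampled blocks is to branch on the input type and feed one block to each layer. The sketch below uses hypothetical stand-in classes (FakeGraph, FakeBlock, CountingLayer, DualModeModel) in place of DGL objects and real layers. It illustrates the pattern only, is not a copy of RGCN.py, and does not address the dgl.to_homogeneous problem.

```python
class FakeGraph:
    """Hypothetical stand-in for a full heterograph (has an ntypes attribute)."""
    ntypes = ['movie', 'actor']


class FakeBlock:
    """Hypothetical stand-in for one sampled block."""


class CountingLayer:
    """Hypothetical layer that records what kind of graph object it received."""

    def __init__(self):
        self.seen = []

    def __call__(self, g, h_dict):
        self.seen.append(type(g).__name__)
        # A real layer would do message passing here; this just adds 1.
        return {ntype: h + 1 for ntype, h in h_dict.items()}


class DualModeModel:
    def __init__(self, num_layers):
        self.layers = [CountingLayer() for _ in range(num_layers)]

    def forward(self, hg, h_dict):
        if hasattr(hg, 'ntypes'):
            # Full-graph training: every layer sees the same graph.
            for layer in self.layers:
                h_dict = layer(hg, h_dict)
        else:
            # Mini-batch training: hg is a list of blocks, one per layer.
            if len(hg) != len(self.layers):
                raise ValueError('need exactly one block per layer')
            for layer, block in zip(self.layers, hg):
                h_dict = layer(block, h_dict)
        return h_dict


model = DualModeModel(num_layers=2)
full_out = model.forward(FakeGraph(), {'movie': 0})
batch_out = model.forward([FakeBlock(), FakeBlock()], {'movie': 0})
```

With this shape, the number of blocks produced by the sampler has to equal the number of layers, which is why the sketch raises a ValueError when the two differ.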

suxnju · 2022-09-02T05:52:00Z

Thank you for your reply. I eventually solved the minibatch training problem for my scenario: in short, I use dgl.dataloading.GraphDataLoader, since my dataset consists of many small graphs.

However, I found a strange little problem: the process does not shut down properly.

To Reproduce

from openhgnn.config import Config

config = Config(file_path="./model/config.ini", model="SimpleHGN", dataset="imdb4MAGNN", task="node_classification", gpu=2)

print(config)

The console output is

[Config Info]   Model: SimpleHGN,       Task: node_classification,      Dataset: imdb4MAGNN

But the program does not shut down. File ./model/config.ini is copied from openhgnn/config.ini.

If I instead run the following code, it ends normally.

import configparser
import numpy as np
import torch as th
import sys

print(111)
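
A process that prints its output but never exits is often being kept alive by a non-daemon thread or a child process, although that is only a guess here and has not been confirmed for Config. Below is a minimal, standard-library-only sketch for checking which non-daemon threads are still alive; lingering_threads is a hypothetical helper name, and the worker thread exists only to demonstrate the check.

```python
import threading


def lingering_threads():
    """Return names of alive non-daemon threads other than the main thread."""
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]


# Demonstration: a non-daemon worker that waits on an event keeps the
# interpreter from exiting until the event is set.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name='demo-worker')
worker.start()
before = lingering_threads()  # includes 'demo-worker'

stop.set()
worker.join()
after = lingering_threads()  # 'demo-worker' is gone
```

Calling print(lingering_threads()) as the last line of the failing script would show whether any such thread remains after Config is created. An empty list would point toward a different cause, such as a child process.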

suxnju closed this as completed Sep 11, 2022
