How can one add a new algorithm and benchmark it with GOOD? #4

Closed · LFhase opened this issue Aug 26, 2022 · 7 comments
Labels: good first issue (Good for newcomers)

LFhase commented Aug 26, 2022

Hi GOOD authors,

Thanks for your impressive amount of work on developing the GOOD benchmark. As someone who also works on OOD algorithms for graph data, I believe this benchmark can provide valuable insights for future developments in this field. Recently, I have been trying to add and benchmark my own graph OOD algorithm (which happens to have the same name as GOOD 🤣) with the GOOD benchmark.

However, when reading through the code and the documentation, I found no explicit description of the pipeline for adding a new algorithm and benchmarking it with GOOD. For example:

  • How can one add OOD algorithm-specific parameters into the pipeline and use them in the code?
  • After adding the new algorithm, how can one evaluate it with all datasets in the GOOD benchmark?
  • How can one sweep the hyperparameters based on the GOOD benchmark?
  • How can one compare different algorithms in a fair way with the GOOD benchmark?

It would facilitate follow-up development with the GOOD benchmark if the authors provided a convenient way, along with an explicit description, to add and benchmark new algorithms. Going through the OOD literature, DomainBed seems to be a good example: it provides both rigorous evaluation and a convenient pipeline for integrating new algorithms. I believe the GOOD benchmark would have a much greater impact on the community if you added the corresponding features to the existing code : )

Best Regards,
Andrew


CM-BF commented Aug 26, 2022

Hi Andrew,

Your suggestions are very helpful! We noticed your GOOD algorithm, too 😄. Recently, we added DIR, EERM, and SRGNN to our baselines in the dev branch. During this period, we have been continuously improving our code to better facilitate development, and we will start building better documentation for contributions as soon as we finish the current merging process.
We aim to make adding new algorithms a breeze. Since these improvements will take some time, let me first answer your questions with quick solutions:

How can one add OOD algorithm-specific parameters into the pipeline and use them in the code?

Let's take GOOD-Motif as an example. The first thing I would do is copy the ERM config file as my_algorithm.yaml in the same folder, then change the config file as follows:

includes:
  - base.yaml
model:
  model_name: GIN
ood:
  ood_alg: my_algorithm #ERM
  ood_param: 100 # if I only have one OOD-specific parameter
  extra_param:      # More OOD-specific parameters go into this extra parameter list.
    - 1e-3
    - 0.5
  DIR_ratio: 0.6     # To access parameters as attributes instead of elements of the extra_param list, I recommend creating a new named parameter like this.
train:
  max_epoch: 200
  lr: 1e-3
  mile_stones: [150]

If one chooses to add a named parameter such as DIR_ratio, one also needs to add it as a class attribute of OODArgs in args.py for code completion and documentation.
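
As a rough illustration, the addition might look like the following sketch (a sketch only; the real OODArgs in args.py may have a different base class and defines more fields, so follow the existing style there):

class OODArgs:
    """Sketch of the OOD argument group; the real class in args.py has more fields."""

    def __init__(self):
        self.ood_alg: str = None       # name of the OOD algorithm, e.g. 'my_algorithm'
        self.ood_param: float = None   # the single main OOD-specific parameter
        self.extra_param: list = None  # list of additional OOD-specific parameters
        # New named parameter, later readable as config.ood.DIR_ratio:
        self.DIR_ratio: float = None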

To use these new parameters in the code, we need access to config, which is created by config = config_summoner(args_parser()) as shown in pipeline.py. Then we can read them, for example:

loss = loss + config.ood.ood_param * IRM_loss        # the main OOD-specific parameter
p = config.train.epoch * config.ood.extra_param[0]   # an element of the extra parameter list
num_subgraph_node = num_node * config.ood.DIR_ratio  # a custom named parameter

When it is time to run the new algorithm, the command is:

goodtg --config_path GOOD_configs/GOODMotif/basis/covariate/my_algorithm.yaml

After adding the new algorithm, how can one evaluate it with all datasets in the GOOD benchmark?

The new algorithm will be evaluated automatically once it inherits from BaseOODAlg. IRM is a good example if one only needs to modify the loss, and Mixup is a good example if one is developing a data augmentation method. In your case, I think DIR may be the closest example. You may also be interested in how to modify the model structure; see DIR's example for that.

BaseOODAlg acts as a pipeline manager, handling all the important logic outside the main pipeline stream in train.py. You can see how the OOD algorithm instance ood_algorithm affects train_batch, train, and evaluate, and how the modified GNN interacts with OOD algorithms there. For example, if the GNN has several outputs, BaseOODAlg.output_postprocess should handle them:

  • Save necessary extra results into instance attributes.
  • Return only the output that can be used at both training and inference time.

Actually, we are trading off between POP (procedure-oriented programming) and OOP (object-oriented programming). We will consider incorporating the pipeline as part of the OOD algorithm if more flexible training/test pipelines are needed.

Currently, the overall structure is a main pipeline (training/val/test) embedded with three types of modules (datasets/networks/ood_algorithms). Therefore, apart from adding new arguments, the major changes should live in a new network file with a registered GNN class and a new ood_algorithm file with a registered algorithm class.
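
To make this concrete, here is a minimal sketch of such an algorithm class. The import path, the method names other than output_postprocess, and the exact signatures are my assumptions; mirror IRM's or DIR's implementation in the repo for the real interface and for how registration works:

from torch import Tensor

# Adjust this import to wherever BaseOODAlg lives in the repo.
from GOOD.ood_algorithms.algorithms.BaseOOD import BaseOODAlg


class MyAlgorithm(BaseOODAlg):
    """Hypothetical OOD algorithm; register it the same way IRM/DIR are registered."""

    def __init__(self, config):
        super().__init__(config)
        self.extra_output = None  # auxiliary GNN output saved for the loss

    def output_postprocess(self, model_output, **kwargs) -> Tensor:
        # If the modified GNN returns several outputs (assumed here to be a pair):
        # keep the extras as instance attributes and return only the prediction
        # that is valid at both training and inference time.
        if isinstance(model_output, tuple):
            raw_pred, self.extra_output = model_output
        else:
            raw_pred = model_output
        return raw_pred

    def loss_postprocess(self, loss, data, mask, config, **kwargs) -> Tensor:
        # Combine the task loss with an algorithm-specific penalty, weighted by
        # the values from my_algorithm.yaml (the penalty here is a placeholder).
        penalty = self.extra_output.norm() if self.extra_output is not None else 0.0
        return loss.sum() / mask.sum() + config.ood.ood_param * penalty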

How can one sweep the hyperparameters based on the GOOD benchmark?

We have some small scripts to do that, and we will organize them as part of GOOD soon.
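
In the meantime, a small external driver script can do a basic sweep by generating one config per hyperparameter setting and launching goodtg on each. The sketch below is not part of GOOD; the config path and the ood keys follow the my_algorithm.yaml example above, and the sequential subprocess launch is only illustrative:

import copy
import itertools
import subprocess
from pathlib import Path

import yaml  # pip install pyyaml

BASE_CONFIG = Path('GOOD_configs/GOODMotif/basis/covariate/my_algorithm.yaml')
SWEEP = {'ood_param': [1, 10, 100], 'DIR_ratio': [0.4, 0.6, 0.8]}

base = yaml.safe_load(BASE_CONFIG.read_text())
for values in itertools.product(*SWEEP.values()):
    cfg = copy.deepcopy(base)
    for key, value in zip(SWEEP.keys(), values):
        cfg['ood'][key] = value
    # Write each variant next to the base config so its relative includes still resolve.
    out_path = BASE_CONFIG.with_name(
        'my_algorithm_' + '_'.join(str(v) for v in values) + '.yaml')
    out_path.write_text(yaml.safe_dump(cfg))
    # One run per config; swap this for your own job-submission command if needed.
    subprocess.run(['goodtg', '--config_path', str(out_path)], check=True)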

How can one compare different algorithms in a fair way with the GOOD benchmark?

First, the algorithm itself should control variables properly. For example, if the novelty of the algorithm does not lie in a new GNN encoder, the algorithm should use the same GNN encoder as the others; this can be done simply by setting the same model.model_name in the config files. Second, set the argument exp_round from 1 to 10 for ten-round experiments; generally, 3+ rounds are acceptable.

Thank you again for your valuable suggestions! We would like to share our roadmap/TODO list in the README. Please keep an eye on our work for updates! 😊


LFhase commented Sep 4, 2022

Hi GOOD team, thanks for your detailed answers! I have similar questions to those in #5. It seems that, in the current version, we have to manually sweep the hyperparameters and collect the results of different runs. Maybe I missed something; is there a convenient way to do so?


CM-BF commented Sep 4, 2022

Hi Andrew,

As I mentioned in #5, the best way to do this in the current version is to manage the log files: we assign one log file to each run. The way log files are saved may be helpful here, so we wrote a script that reads log files in the same way we save them.

A more convenient way is to rerun the same config file with a different --task option. For example, modify the load_task function and use it to return the results for your hyperparameter sweep:

def load_task(task: str, model: torch.nn.Module, loader: DataLoader, ood_algorithm: BaseOODAlg,
              config: Union[CommonArgs, Munch]):
    r"""
    Launch a training or a test. (Project use only)
    """
    if task == 'train':
        train(model, loader, ood_algorithm, config)

    elif task == 'test':

        # config model
        print('#D#Config model and output the best checkpoint info...')
        test_score, test_loss = config_model(model, 'test', config=config)
    elif task == 'read_log':
        # Return the last line of the log file, which holds the final results.
        with open(config.log_path, 'r') as f:
            return f.readlines()[-1]
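
For example, a small helper script (not part of GOOD) could gather those final lines across a sweep; the log directory layout and the .log suffix are assumptions, so adapt them to how your runs actually save logs:

from pathlib import Path


def collect_results(log_dir: str) -> dict:
    """Map each log file to its last line, which holds the final results."""
    results = {}
    for log_file in sorted(Path(log_dir).rglob('*.log')):
        lines = log_file.read_text().splitlines()
        if lines:  # skip empty or still-running logs
            results[str(log_file)] = lines[-1]
    return results


if __name__ == '__main__':
    for name, final_line in collect_results('log').items():
        print(f'{name}: {final_line}')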


CM-BF commented Sep 30, 2022

Hi Andrew,

We have updated this project to version 1. You can now launch multiple jobs and collect their results easily. Please refer to the new README.

Please let me know if you have any questions.


LFhase commented Oct 1, 2022

Hi Shurui,

Thanks very much for the update! It will help us a lot in incorporating our algorithm into the GOOD benchmark with the updated code.

Besides the running code, I am wondering whether it would be possible for you to maintain an online "leaderboard", which could help followers identify the state of the art in OOD generalization on graphs : )

BTW, congratulations on your NeurIPS acceptance! We believe GOOD can facilitate the development of better OOD algorithms for graph data!


CM-BF commented Oct 3, 2022

Hi Andrew,

You are very welcome! And thank you for your advice and congratulations!

For the online leaderboard: yes, I plan to maintain a leaderboard later. It will take some time to complete the full documentation update and build the online leaderboard, but I will try my best to work on them as early as possible. : )


CM-BF commented Oct 22, 2022

Hi Andrew,

Currently, the documentation for "how to add a new algorithm" has been completed. The leaderboard construction is on our roadmap. I'll close this issue since the main question has been addressed. Please open a new issue or a discussion if you have any questions. 😄

CM-BF closed this as completed on Oct 22, 2022
CM-BF added the good first issue label on Oct 22, 2022