
Torchdrug integration #49

Merged 31 commits into main on Mar 14, 2022

Conversation

@jannisborn (Contributor) commented Mar 10, 2022

TorchDrug integration (inference-only) as discussed @drugilsberg!

  • 2 models implemented: GCPN (Graph Convolutional Policy Network, NeurIPS 2018) and GAF (Graph Autoregressive Flow, ICLR 2020).

  • For each model, 3 pretrained checkpoints are made available:

    • trained on zinc250k: TorchDrugZincGCPN and TorchDrugZincGAF
    • optimized to generate high-QED molecules: TorchDrugQedGAF and TorchDrugQedGCPN
    • optimized to generate high-pLogP molecules: TorchDrugPlogpGCPN and TorchDrugPlogpGAF
  • unit tests implemented for all models

ToDo:

  • fix CI pipeline
  • upload models on COS (2 times)

Regarding CI:

I knew this would backfire: the tests are still failing at installation. Locally, I had to install pytorch-scatter via conda; I did the following:

pip3 install torch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
conda install pytorch-scatter -c pyg
pip install torchdrug

and this worked fine.

When I reproduce the CI pipeline locally, I also do not get a functioning installation. The env is created successfully but importing pytorch-scatter results in segmentation faults.

Possible solution: Adapt the conda.yml like so:

name: gt4sd
channels:
  - https://conda.anaconda.org/pyg
dependencies:
  - python>=3.7,<3.8
  - pip>=19.1,<20.3
  - pytorch-scatter
  - pip:
      - -r requirements.txt
      - -r vcs_requirements.txt
      # development
      - -r dev_requirements.txt

@cla-bot added the cla-signed (CLA has been signed) label on Mar 10, 2022
@jannisborn jannisborn marked this pull request as ready for review March 10, 2022 23:52
@jannisborn (Contributor, Author)

Open for review and discussion about pytorch-scatter/torchdrug installation workflow @drugilsberg!

@jannisborn (Contributor, Author) commented Mar 11, 2022

Final updates on this PR @drugilsberg:

  • I parametrized num_sample for the generation, as we discussed. I checked it locally; sample generation is faster now.
  • Good news about openmp, which I initially had to disable package-wide because torchdrug assumes specific versions of libomp that are beyond our control (shipped via brew/apt): openmp can be re-enabled after the torchdrug imports, and it does not interfere with torchdrug-related inference, even when we mix it with inference of other models.
  • torchdrug installation: it was tricky to ensure a correct environment setup. We cannot follow the canonical way of installing torchdrug via pip; I tried in 9b65f45, but the installation fails because torch_scatter does not find pytorch. I rolled back to installing torch_scatter via conda, but this gives conflicts in the env setup (df99b5c) unless we do the same with pytorch too. Then it works on Mac but not on Ubuntu, because of segfaults (803a39b). The segfaults were caused by an interference between sentencepiece and pytorch_lightning that was reported here. I found a fix by forcing sentencepiece to be imported before pytorch_lightning. In sum, I changed the conda.yml and enforced the import ordering, and the installation now runs through with no segmentation faults on either Mac or Ubuntu 🚀 The conda incubator setup in every build goes up from 4 to 10 min, but I guess that's the price we have to pay for torchdrug 😐
  • I found (and fixed) a terrible bug caused by torchdrug: they overwrite the default nn.Module of torch (see here). The crazy thing is that all subsequent code running native torch uses this patched module. I found out because they forgot to implement a keyword argument that is present in native torch and that was used by the EnzymeOptimizer (Roberta). Glad that we do thorough unit testing :) I patched it in torchdrug.implementation (see bda7951) and opened an issue here.
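The nn.Module patch described in the last bullet can be undone with a simple save-and-restore pattern. A minimal sketch of that pattern follows; it uses a stand-in module and a hypothetical `state`/`keep_vars` method so it runs without torch or torchdrug installed (the actual fix lives in gt4sd's torchdrug implementation module):

```python
import types

# Stand-in for torch.nn, so this sketch runs without torch/torchdrug.
nn = types.ModuleType("nn")

class Module:
    """Stand-in for the native torch.nn.Module (hypothetical method/kwarg)."""
    def state(self, *, keep_vars=False):  # keyword argument present natively
        return {"keep_vars": keep_vars}

nn.Module = Module

# 1. Save a reference to the original class BEFORE the patching import.
original_module = nn.Module

# 2. Simulate what the third-party import does: overwrite the base class
#    with a variant that silently drops the keyword argument.
class PatchedModule:
    def state(self):
        return {}

nn.Module = PatchedModule

# 3. Restore the saved class so all downstream native-torch code
#    sees the original signature again.
nn.Module = original_module

print(nn.Module().state(keep_vars=True))  # -> {'keep_vars': True}
```

The same three steps apply to the real case: capture `torch.nn.Module` before `import torchdrug`, let the import run, then reassign the saved class.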

I'm glad I managed to find fixes for all these issues, but the sheer wall of issues I encountered in this PR should make it evident that the torchdrug integration raises some concerns regarding overall code reliability 😐

@drugilsberg (Contributor) left a comment

Great job! Some general comments we need to address before merging:

  • I would not have configuration classes for specific datasets. Since the classes are the same, we should simply use versions for the different datasets (see comment in PR).
  • sentencepiece treatment. Instead of doing the trick every time I would do it once in src/gt4sd/__init__.py. After the definition of __version__ and __name__ we can have the following:
import sentencepiece as _sentencepiece  # noqa: F401
import pytorch_lightning as _pl  # noqa: F401

In this way we only have it there and inside the module we can import pytorch_lightning without any issue.

  • I left a note on the installation.

Once those are addressed I would say we are good to go.

conda.yml (outdated, resolved)
conda.yml (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
src/gt4sd/algorithms/generation/torchdrug/core.py (outdated, resolved)
@jannisborn (Contributor, Author) commented Mar 12, 2022

I addressed all comments @drugilsberg, but unfortunately your solution of importing sentencepiece in the global init file does not work. Depending on which functions we import first from the package, the __init__ is not always executed before an import of lightning, so this "global" solution might still cause segfaults.
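One way to make the ordering robust per-module, rather than relying on the package __init__, is a small helper that forces the first import before the second. A hedged sketch (the helper name is hypothetical, and stdlib modules stand in for sentencepiece and pytorch_lightning so it runs anywhere):

```python
import importlib

def import_after(first: str, then: str):
    """Import `first` before `then`, returning the second module.

    Hypothetical helper: each module could call this instead of importing
    pytorch_lightning directly, guaranteeing sentencepiece is loaded first
    regardless of which entry point into the package is used.
    """
    importlib.import_module(first)
    return importlib.import_module(then)

# Stdlib stand-ins for sentencepiece / pytorch_lightning:
pl = import_after("decimal", "fractions")
print(pl.__name__)  # -> fractions
```

In the PR itself the equivalent effect is achieved by enforcing the `import sentencepiece` line before the `import pytorch_lightning` line in each affected module.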

@drugilsberg (Contributor) left a comment

Looks good, great job! Just a few comments remain.

src/gt4sd/algorithms/generation/torchdrug/core.py (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
src/gt4sd/algorithms/generation/torchdrug/abc.py (outdated, resolved)
@drugilsberg (Contributor) left a comment

Looks good, great job! Just a few comments remain.

src/gt4sd/algorithms/generation/torchdrug/abc.py (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
Signed-off-by: Matteo Manica <drugilsberg@gmail.com>
@drugilsberg (Contributor) left a comment

Looks great, well done!

@drugilsberg drugilsberg merged commit 0064ef6 into GT4SD:main Mar 14, 2022
@jannisborn jannisborn deleted the torchdrug branch May 13, 2022 08:02