
Torchdrug integration #49

Merged 31 commits into main on Mar 14, 2022

Conversation

@jannisborn (Contributor) commented Mar 10, 2022

TorchDrug integration (inference-only) as discussed @drugilsberg!

  • 2 models implemented: GCPN (Graph Convolutional Policy Network, NeurIPS 2018) and GAF (Graph Autoregressive Flow, ICLR 2020).

  • For each model, 3 pretrained checkpoints are made available:

    • trained on zinc250k: TorchDrugZincGCPN and TorchDrugZincGAF
    • optimized to generate high-QED molecules: TorchDrugQedGAF and TorchDrugQedGCPN
    • optimized to generate high-pLogP molecules: TorchDrugPlogpGCPN and TorchDrugPlogpGAF
  • unit tests implemented for all models

ToDo:

  • fix CI pipeline
  • upload models on COS (2 times)

Regarding CI:

I knew this would backfire: the tests are still failing at installation. Locally, I had to install pytorch-scatter via conda; I did the following:

pip3 install torch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
conda install pytorch-scatter -c pyg
pip install torchdrug

and this worked fine.

When I reproduce the CI pipeline locally, I also do not get a functioning installation. The env is created successfully but importing pytorch-scatter results in segmentation faults.

Possible solution: Adapt the conda.yml like so:

name: gt4sd
channels:
  - https://conda.anaconda.org/pyg
dependencies:
  - python>=3.7,<3.8
  - pip>=19.1,<20.3
  - pytorch-scatter
  - pip:
      - -r requirements.txt
      - -r vcs_requirements.txt
      # development
      - -r dev_requirements.txt

@cla-bot added the cla-signed (CLA has been signed) label on Mar 10, 2022
@jannisborn jannisborn marked this pull request as ready for review March 10, 2022 23:52
@jannisborn (Contributor, Author)

Open for review and discussion about pytorch-scatter/torchdrug installation workflow @drugilsberg!

@jannisborn (Contributor, Author) commented Mar 11, 2022

Final updates on this PR @drugilsberg:

  • I parametrized num_sample for the generation, as we discussed. I checked it locally; sample generation is faster now.
  • Good news about openmp, which I initially had to disable package-wide because torchdrug assumes specific versions of libomp that are beyond our control (shipped via brew/apt): openmp can be re-enabled after the torchdrug imports, and it does not interfere with torchdrug-related inference, even when we mix it with inference of other models.
  • torchdrug installation: it was tricky to ensure a correct environment setup. We cannot follow the canonical way of installing torchdrug via pip; I tried in 9b65f45, but the installation fails because torch_scatter does not find pytorch. I rolled back to installing torch_scatter via conda, but this gives conflicts in the env setup (df99b5c) unless we do the same with pytorch too. Then it works on Mac but not on Ubuntu, because of segfaults (803a39b). The segfaults were caused by an interference between sentencepiece and pytorch_lightning that was reported here. I found a fix by forcing sentencepiece to be imported before pytorch_lightning. In sum, I changed the conda.yml and enforced the import ordering, and the installation now runs through with no segmentation faults on either Mac or Ubuntu 🚀 The conda incubator setup in every build goes up from 4 to 10 min, but I guess that's the price we have to pay for torchdrug 😐
  • I found (and fixed) a terrible bug caused by torchdrug: they overwrite the default nn.Module of torch (see here). The crazy thing is that all subsequent code running native torch uses this patched module. I found out because they forgot to implement a keyword argument that is present in native torch and that was used by the EnzymeOptimizer (Roberta). Glad that we do thorough unit testing :) I patched it in torchdrug.implementation (see bda7951) and opened an issue here.
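The nn.Module patch described in the last bullet can be undone with a simple save-and-restore pattern. A minimal sketch of that pattern follows; it uses a stand-in module and a hypothetical `state`/`keep_vars` method so it runs without torch or torchdrug installed (the actual fix lives in gt4sd's torchdrug implementation module):

```python
import types

# Stand-in for torch.nn, so this sketch runs without torch/torchdrug.
nn = types.ModuleType("nn")

class Module:
    """Stand-in for the native torch.nn.Module (hypothetical method/kwarg)."""
    def state(self, *, keep_vars=False):  # keyword argument present natively
        return {"keep_vars": keep_vars}

nn.Module = Module

# 1. Save a reference to the original class BEFORE the patching import.
original_module = nn.Module

# 2. Simulate what the third-party import does: overwrite the base class
#    with a variant that silently drops the keyword argument.
class PatchedModule:
    def state(self):
        return {}

nn.Module = PatchedModule

# 3. Restore the saved class so all downstream native-torch code
#    sees the original signature again.
nn.Module = original_module

print(nn.Module().state(keep_vars=True))  # -> {'keep_vars': True}
```

The same three steps apply to the real case: capture `torch.nn.Module` before `import torchdrug`, let the import run, then reassign the saved class.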

I'm glad I managed to find fixes for all these issues, but the sheer wall of issues I encountered in this PR should make it evident that the torchdrug integration raises some concerns regarding overall code reliability 😐

@drugilsberg (Contributor) left a comment

Great job! Some general comments we need to address before merging:

  • I would not have configuration classes for specific datasets. Since the classes are the same, we should simply use versions for the different datasets (see comment in PR).
  • sentencepiece treatment. Instead of doing the trick every time I would do it once in src/gt4sd/__init__.py. After the definition of __version__ and __name__ we can have the following:
import sentencepiece as _sentencepiece  # noqa: F401
import pytorch_lightning as _pl  # noqa: F401

In this way we only have it there and inside the module we can import pytorch_lightning without any issue.

  • I left a note on the installation.

Once those are addressed I would say we are good to go.

conda.yml (outdated, resolved)
conda.yml (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
src/gt4sd/algorithms/generation/torchdrug/core.py (outdated, resolved)
@jannisborn (Contributor, Author) commented Mar 12, 2022

I addressed all comments @drugilsberg, but unfortunately your solution of importing sentencepiece in the global init file does not work. Depending on which functions we import first from the package, the __init__ is not always executed before an import of lightning, so this "global" solution might still cause segfaults.
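One way to make the ordering robust per-module, rather than relying on the package __init__, is a small helper that forces the first import before the second. A hedged sketch (the helper name is hypothetical, and stdlib modules stand in for sentencepiece and pytorch_lightning so it runs anywhere):

```python
import importlib

def import_after(first: str, then: str):
    """Import `first` before `then`, returning the second module.

    Hypothetical helper: each module could call this instead of importing
    pytorch_lightning directly, guaranteeing sentencepiece is loaded first
    regardless of which entry point into the package is used.
    """
    importlib.import_module(first)
    return importlib.import_module(then)

# Stdlib stand-ins for sentencepiece / pytorch_lightning:
pl = import_after("decimal", "fractions")
print(pl.__name__)  # -> fractions
```

In the PR itself the equivalent effect is achieved by enforcing the `import sentencepiece` line before the `import pytorch_lightning` line in each affected module.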

@drugilsberg (Contributor) left a comment

Looks good, great job! Just a few comments remain.

src/gt4sd/algorithms/generation/torchdrug/core.py (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
src/gt4sd/algorithms/generation/torchdrug/abc.py (outdated, resolved)
@drugilsberg (Contributor) left a comment

Looks good, great job! Just a few comments remain.

src/gt4sd/algorithms/generation/torchdrug/abc.py (outdated, resolved)
src/gt4sd/frameworks/granular/arg_parser/parser.py (outdated, resolved)
Signed-off-by: Matteo Manica <drugilsberg@gmail.com>
@drugilsberg (Contributor) left a comment

Looks great, well done!

@drugilsberg drugilsberg merged commit 0064ef6 into GT4SD:main Mar 14, 2022
@jannisborn jannisborn deleted the torchdrug branch May 13, 2022 08:02