Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Optional RDKit run-time dependency not correctly pinned #50

Closed
xiki-tempula opened this issue May 3, 2023 · 3 comments · Fixed by #51
Closed

[BUG] Optional RDKit run-time dependency not correctly pinned #50

xiki-tempula opened this issue May 3, 2023 · 3 comments · Fixed by #51
Labels
bug Something isn't working exscientia Related to work with Exscientia

Comments

@xiki-tempula
Copy link

Describe the bug
toRDKit function is broken

To Reproduce
create the env with
mamba create -n BSS_test -c openbiosim/label/main -c conda-forge BioSimSpace rdkit ambertools
Then run

import BioSimSpace as BSS
mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()
BSS.Convert.toRDKit(mol)

Gives

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1                                                                                    │
│                                                                                                  │
│ /home/ubuntu/miniconda3/envs/BSS_test/lib/python3.10/site-packages/BioSimSpace/Convert/_convert. │
│ py:518 in toRDKit                                                                                │
│                                                                                                  │
│   515 │   converted_obj :                                                                        │
│   516 │      The object in OpenMM format.                                                        │
│   517 │   """                                                                                    │
│ ❱ 518 │   return to(obj, format="rdkit", property_map=property_map)                              │
│   519                                                                                            │
│   520                                                                                            │
│   521 def toSire(obj, property_map={}):                                                          │
│                                                                                                  │
│ /home/ubuntu/miniconda3/envs/BSS_test/lib/python3.10/site-packages/BioSimSpace/Convert/_convert. │
│ py:182 in to                                                                                     │
│                                                                                                  │
│   179 │   format = format.lower().replace(" ", "")                                               │
│   180 │                                                                                          │
│   181 │   if not format in supportedFormats():                                                   │
│ ❱ 182 │   │   raise ValueError(                                                                  │
│   183 │   │   │   f"Unsupported format '{format}', options are: {', '.join(supportedFormats())   │
│   184 │   │   )                                                                                  │
│   185                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Unsupported format 'rdkit', options are: biosimspace, openmm, sire.

Expected behavior
It works.

(please complete the following information):

  • OS: Linux
  • Version of Python: 3.8
  • Version of BioSimSpace: latest
  • I confirm that I have checked this bug still exists in the latest released version of BioSimSpace: yes
@xiki-tempula xiki-tempula added the bug Something isn't working label May 3, 2023
@lohedges
Copy link
Contributor

lohedges commented May 4, 2023

Thanks for reporting. This is an issue with Sire, so I'll transfer over. I am not quite sure how this has happened, since there is a lot of internal functionality that uses this feature and the CI passed on main, which includes unit tests for this functionality.

Bizarrely, doing the following with Sire:

import sire as sr

sr.sr.__version__
'2023.2.2'

sr.convert.supported_formats()
['biosimspace', 'openmm', 'sire']

RDKit functionality is optional, so would be suppressed if the package is not installed. However, RDKit is a dependency of BioSimSpace and Sire will have been built against the same version.

Trying to convert directly with Sire gives:

import BioSimSpace as BSS
import sire as sr
mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()

sr.convert.to_rdkit(mol)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1                                                                                    │
│                                                                                                  │
│ /home/lester/.conda/envs/test/lib/python3.10/site-packages/sire/convert/__init__.py:184 in       │
│ to_rdkit                                                                                         │
│                                                                                                  │
│   181Convert the passed object from its current object format to a                          │
│   182rdkit object format.                                                                   │
│   183 │   """                                                                                    │
│ ❱ 184 │   return sire_to_rdkit(to_sire(obj, map=map), map=map)                                   │
│   185                                                                                            │
│   186                                                                                            │
│   187 def to_openmm(obj, map=None):                                                              │
│                                                                                                  │
│ /home/lester/.conda/envs/test/lib/python3.10/site-packages/sire/convert/__init__.py:419 in       │
│ sire_to_rdkit                                                                                    │
│                                                                                                  │
│   416 │                                                                                          │
│   417 │   from ..base import create_map                                                          │
│   418 │                                                                                          │
│ ❱ 419 │   mols = _sire_to_rdkit(obj, map=create_map(map))                                        │
│   420 │                                                                                          │
│   421 │   if mols is None:                                                                       │
│   422 │   │   return None                                                                        │
│                                                                                                  │
│ /home/lester/.conda/envs/test/lib/python3.10/site-packages/sire/legacy/Convert/__init__.py:38 in │
│ sire_to_rdkit                                                                                    │
│                                                                                                  │
│    35 │   _has_rdkit = False                                                                     │
│    36 │                                                                                          │
│    37 │   def sire_to_rdkit(*args, **kwargs):                                                    │
│ ❱  38 │   │   _no_rdkit()                                                                        │
│    39 │                                                                                          │
│    40 │   def rdkit_to_sire(*args, **kwargs):                                                    │
│    41 │   │   _no_rdkit()                                                                        │
│                                                                                                  │
│ /home/lester/.conda/envs/test/lib/python3.10/site-packages/sire/legacy/Convert/__init__.py:29 in │
│ _no_rdkit                                                                                        │
│                                                                                                  │
│    26 except Exception:                                                                          │
│    27 │   # RDKit support is not available                                                       │
│    28 │   def _no_rdkit():                                                                       │
│ ❱  29 │   │   raise ModuleNotFoundError(                                                         │
│    30 │   │   │   "Unable to convert to/from RDKit as it is not installed. "                     │
│    31 │   │   │   "Please install using `mamba install -c conda-forge rdkit` "                   │
│    32 │   │   │   "and then re-run this script."                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: Unable to convert to/from RDKit as it is not installed. Please install using `mamba install
-c conda-forge rdkit` and then re-run this script.

However:

mamba list rdkit                                                                               ✘ 1
# packages in environment at /home/lester/.conda/envs/test:
#
# Name                    Version                   Build  Channel
rdkit                     2023.03.1       py310h399bcf7_0    conda-forge

If I dig into the Python wrappers for sire.legacy.Convert, then the actual error that is raised when trying to load the RDKit C++ library is:

ImportError: /home/lester/.conda/envs/test/lib/python3.10/site-packages/sire/legacy/Convert/_SireRDKit.so:
undefined symbol:
_ZN5RDKit12DGeomHelpers18EmbedMultipleConfsERNS_5ROMolERSt6vectorIiSaIiEEjRKNS0_15EmbedParametersE

What I think is going on is that the RDKit version pinning is failing. Either RDKit have made a breaking change for a minor version bump, or the versions of rdkit and rdkit-dev have changed. Sire is built against rdkit and rdkit-dev, but only requires rdkit at runtime. Perhaps some functionality has been removed from the core package?

Looking at the generated meta.yaml for the Sire 2023.2.2 package I see the following in the host section:

- rdkit 2022.09.5 py310h7988725_0

This approach is only used to build Sire in an environment with a consistent set of dependencies (and transitive dependencies) to those BioSimSpace. However, it doesn't guarantee the pinning if any of those dependencies are actually needed by Sire at run time. As such, we'll likely need to add a run_constrained for rdkit to the Sire recipe.

@lohedges lohedges transferred this issue from OpenBioSim/biosimspace May 4, 2023
@lohedges lohedges changed the title [BUG] BSS.Convert.toRDKit broken [BUG] Optional RDKit run-time dependency not correctly pinned May 4, 2023
@lohedges lohedges added the exscientia Related to work with Exscientia label May 4, 2023
@lohedges
Copy link
Contributor

lohedges commented May 9, 2023

As a workaround until this is fixed, you can pin rdkit to version 2022 when installing, i.e.:

mamba install -c conda-forge -c openbiosim sire rdkit=2022

@lohedges
Copy link
Contributor

Just to say that while I think the solution used will work, i.e. adding...

  run_constrained:
    - {{ pin_compatible('rdkit', max_pin='x.x') }}

...it's possible that conda could resolve a different version of rdkit for the run-time requirement to the one that is used for the build. Annoyingly there is no equivalent version of the configuration option pin_run_as_build for the run_constrained section, i.e. it only applies to the actual run requirements. (I've tested this.) There is a way to use jinja expressions to loop over the build or host requirements and manually pin them, but this is quite clunky and there is no way to select only a specific package without some complex logic.

I guess we can revisit this if it turns out to be problematic.

(Note that pin_run_as_build is quite useful since it allows us to automatically pin the runtime requirement of packages that we build against when the don't impose their own run_exports. For example, we use this for boost and openmm, where the packages don't assume that you are linking against their libraries, i.e. you are only using the Python layer.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exscientia Related to work with Exscientia
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants