CLI for AtomMapping #57

dwhswenson · 2022-02-15T21:54:18Z

Includes and builds on #49, should only be merged after that one.

This adds some reusable parameters (--mol and --mapper), as well as a command that outputs the mol1_to_mol2 mapping dict.

The current usage is only intended to be a starting point, which we can improve later. It looks something like this:

$ openfe atommapping --mol "Cc1ccccc1" --mol "CC1CCCCC1" --mapper LomapAtomMapper
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}

Things that aren't ideal there:

Takes SMILES strings (and only SMILES for now). That can easily be extended. Currently this is also using an ugly workaround to avoid issue in openfreeenergy/lomap#4.
There's no way to provide options to mapper. ~~I'm not sure that there's an easy way to do that, either.~~ [EDIT: I came up with an approach that I think will work, but the question whether it's worth it is still on the table.] Perhaps it would make more sense to load from a serialized network and to find the edge(s?) associated with the two molecules -- in this sense, you'd use it to look at the mapping after it has been generated (but before simulating), instead of generating it before. @richardjgowers, you would have better sense on whether that approach would fit with what was requested.

A few corners are still WIP, but the main ideas are here.

Currently requires the OPS CLI to be installed as well; next steps: 1. Add parameter core classes to infrastructure 2. Fully separate infrastructure into its own package

pep8speaks · 2022-02-15T21:54:23Z

Hello @dwhswenson! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-21 16:44:43 UTC

…rt-atommapping

codecov · 2022-02-16T15:45:16Z

Codecov Report

Merging #57 (01008c3) into main (ea64274) will increase coverage by 4.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #57      +/-   ##
==========================================
+ Coverage   94.69%   98.75%   +4.05%     
==========================================
  Files          24       36      +12     
  Lines         452      640     +188     
==========================================
+ Hits          428      632     +204     
+ Misses         24        8      -16

Impacted Files	Coverage Δ
openfecli/commands/atommapping.py	`100.00% <100.00%> (ø)`
openfecli/parameters/__init__.py	`100.00% <100.00%> (ø)`
openfecli/parameters/mapper.py	`100.00% <100.00%> (ø)`
openfecli/parameters/mol.py	`100.00% <100.00%> (ø)`
openfecli/parameters/utils.py	`100.00% <100.00%> (ø)`
openfecli/tests/commands/test_atommapping.py	`100.00% <100.00%> (ø)`
openfecli/tests/parameters/test_mapper.py	`100.00% <100.00%> (ø)`
openfecli/tests/parameters/test_mol.py	`100.00% <100.00%> (ø)`
openfecli/tests/parameters/test_utils.py	`100.00% <100.00%> (ø)`
openfecli/tests/test_utils.py	`100.00% <100.00%> (ø)`
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea64274...01008c3. Read the comment docs.

dwhswenson · 2022-02-16T20:42:14Z

This is ready for review.

@mikemhenry : You should be able to plug in your visualization pretty easily. You'll probably want to create something in parameters to add an IMAGE_OUTPUT_FILE reusable decorator (or maybe specify the image format you're using?); otherwise, the existing command should do the job.

Question: do we want to keep the option of outputting the dict, or will we completely replace that with Mike's visualization? I implemented this under the assumption we'd replace the text output with the more useful image file, but we could find a way to keep both.

mikemhenry · 2022-02-16T21:36:30Z

@dwhswenson I think we can use some dunder magic to determine what the environment is in and produce the right output, like if in a notebook we dump the viz + dict but if we are in a cli, just the dict.

dwhswenson · 2022-02-16T21:58:25Z

I think we can use some dunder magic to determine what the environment is in and produce the right output, like if in a notebook we dump the viz + dict but if we are in a cli, just the dict.

What I mean is that, in the CLI, there should be an option to do something like:

$ openfe atommapping --mol $MOL1 --mol $MOL2 --mapper $MAPPER -o output.pdf

(or whatever output file format you can easily support). Using your visualization will be much better for users than outputting the dict in the terminal, which I only intended as a placeholder. The question is whether that replaces the text output (my assumption) or whether, perhaps if the -o argument isn't provided, we use that text as a fallback.

As for identifying whether you're in a notebook, that should be a last-second check. FWIW, OPS has this code for identifying if it is running in a notebook, although there are probably better ways (that code hasn't changed in 6 years). I'd take a look at tqdm.auto, which is probably better maintained and tested (even though the lead dev of tqdm doesn't actively maintain the Jupyter integration.)

richardjgowers · 2022-02-17T10:37:31Z

openfecli/parameters/mol.py

+
+get_molecule = MultiStrategyGetter(
+    strategies=[
+        # NOTE: I think loading from smiles must be last choice, because
+        # failure will give meaningless user-facing errors
+        _load_molecule_from_smiles,
+    ],
+    error_message="Unable to generate a molecule from '{user_input}'."
+)


So a . in a SMILES would mean a disconnected component, which I think we can assume we won't be given... so another strategy could be that any string with a dot is a filepath. Is the StrategyGetter just iterating through the list of functions until it finds a hit?

Is the StrategyGetter just iterating through the list of functions until it finds a hit?

Basically, yeah. It attempts to parse user input according the function in each of the strategies, and those functions return the sentinel plugcli.params.NOT_PARSED if they can't interpret the input. So it's sort of a "forgiveness, not permission" approach -- if we added a _load_molecule_from_file strategy before _load_molecule_from_smiles, then it will take the user input, try to interpret it as a filename. If that fails (file doesn't exist or can't be understood) then it will fall back to _load_molecule_from_smiles, which would treat the same string as SMILES.

Adding new strategies is easy. I didn't include file loading in this PR just to try to keep it minimal, but my assumption is that loading from file will usually be the first choice.

Also, get_molecule can be replaced with a function, if we want more control on, e.g., error handling. MultiStrategyGetter is just a shortcut for making a callable from composable pieces.

richardjgowers · 2022-02-21T10:05:29Z

@dwhswenson re: options to the mapper, could we not just have something like --mapper-kwargs={'foo': 'bar'} etc?

dwhswenson · 2022-02-21T16:14:06Z

re: options to the mapper, could we not just have something like --mapper-kwargs={'foo': 'bar'} etc?

In theory, something like this could work, but I don't think it creates a good user experience. To me, the core idea in CLI design is that the CLI is for users who are less comfortable with Python (because if they're comfortable with Python, then the same tasks should be nearly as easy in Python!)

A --mapper-kwargs wouldn't follow that principle:

It requires that the user be familiar with Python/JSON dicts (and somewhat assumes familiarity with the idea of **kwargs). If a user is not comfortable passing the relevant arguments to the AtomMapper in Python, then I would expect them to be equally uncomfortable trying to manage that through the shell, which will lead to user confusion. For example, a common user mistake will be to forget to quote strings.
As written, I don't think it works. It would require quoting, otherwise some shells would try to interpret the { and } before the string even gets to our Python. How shells do this depends somewhat on the shell. I'd rather avoid situations where we're answering user questions that are really about how each shell handles quoting and special characters.
No discoverability. There's no way for a user to learn, from the CLI, what the allowed options are. The only documentation would be the main API docs -- and again, if the user isn't comfortable with Python, I wouldn't want to assume that they can translate API docs to reasonable input strings here.
It potentially introduces a security weakness and/or still remains limited. I don't see a way to parse that without an eval (perhaps literal_eval for some level of security). Even so, if some AtomMapper takes a parameter that isn't a builtin, we need to maintain the globals/locals in eval to whitelist certain items, which places a maintenance burden on the CLI that should be the role of the AtomMapper itself.

If this functionality is really important, there are better ways around this, but implementing that might take me a day or two -- I could basically make a plugcli-style plugin class that generates this from inspecting an AtomMapper. I'd rather do that as a follow-up, so that the basic functionality here is available for Mike to build on. Switching the output to an image should be orthogonal to handling the specific inputs, so work could proceed on both at once without and blocking.

richardjgowers

Trying this out locally I get this traceback on openfe --help

Traceback (most recent call last):
  File "/home/richard/miniconda3/envs/openfe/bin/openfe", line 5, in <module>
    from openfecli.cli import main
  File "/home/richard/miniconda3/envs/openfe/lib/python3.9/site-packages/openfecli/__init__.py", line 5, in <module>
    from . import commands
ImportError: cannot import name 'commands' from partially initialized module 'openfecli' (most likely due to a circular import) (/home/richard/miniconda3/envs/openfe/lib/python3.9/site-packages/openfecli/__init__.py)

I have a vague idea this might be because internally the import order is incorrect, I think I've fixed this with an __init__.py in commands/

richardjgowers

Seems to work nicely. Maybe --mol1 and --mol2 arguments would be clearer/better than just --mol twice. I'm not sure we actually guarantee that AtomMappers are symmetrical, (or if there is any ordering?) but this would make issues around that easier to debug.

dwhswenson · 2022-02-21T16:39:08Z

@richardjgowers : Did you try installing again? I think that fixed this for me. I had this at one point, and just running pip install -e . fixed.

You should NOT need to update commands/__init__.py. That's the whole point of the plugin architecture. You only ever change the file that contains the plugin. If that isn't working, then the problem is with the plugin architecture itself.

richardjgowers · 2022-02-21T16:45:47Z

@dwhswenson hmm yeah removing it works now. It might be something about a mix of develop and regular installs...

richardjgowers · 2022-02-21T16:58:18Z

openfecli/parameters/mol.py

+
+    # TODO: next is (temporary?) hack: see
+    # https://github.com/OpenFreeEnergy/Lomap/issues/4
+    Chem.rdDepictor.Compute2DCoords(mol)


This fix should probably get pushed back onto the LomapAtomMapper until the bug is fixed. (Check if any conformers, if not add 2d)

Probably a more elegant way but I was doing this at one point:

# Not sure if there is a better way to check if 2DCoords exist for mol in [mol_1, mol_2]: try: mol.GetConformers()[0] except IndexError: Chem.rdDepictor.Compute2DCoords(mol)

dwhswenson added 11 commits February 10, 2022 15:21

Initial skeleton for the CLI

291b560

Currently requires the OPS CLI to be installed as well; next steps: 1. Add parameter core classes to infrastructure 2. Fully separate infrastructure into its own package

Merge branch 'main' of github.com:OpenFreeEnergy/openfe into cli

3f7a345

switch to using plugcli

6dfc1e9

tests for CLI

f76dd0e

helps if you add the tests...

f57ee68

pep8 and deps cleanup

1c1753f

add test_plugins

ac23677

Start of CLI atommapping report

530af99

Add tests for parameters

3bc4957

Almost-working atommapping command

f4c2feb

add hack to work-around lomap issue

8401c21

dwhswenson added 2 commits February 16, 2022 10:20

Merge branch 'main' of github.com:OpenFreeEnergy/openfe into cli-repo…

d729d4d

…rt-atommapping

fix test

570e8aa

dwhswenson added 3 commits February 16, 2022 12:37

pep8; docstrings; add header comments

4a124bb

Add tests for commands/atommapping

9e7795a

docstrings

4148c2b

dwhswenson changed the title ~~[WIP] CLI for AtomMapping~~ CLI for AtomMapping Feb 16, 2022

dwhswenson marked this pull request as ready for review February 16, 2022 20:32

dwhswenson requested review from IAlibay, mikemhenry and richardjgowers February 16, 2022 20:42

richardjgowers reviewed Feb 17, 2022

View reviewed changes

add init to command to stop circular import error

d43b78e

richardjgowers reviewed Feb 21, 2022

View reviewed changes

remove init and it still works?

01008c3

richardjgowers reviewed Feb 21, 2022

View reviewed changes

richardjgowers merged commit 41068e0 into main Feb 23, 2022

richardjgowers deleted the cli-report-atommapping branch February 23, 2022 16:43

richardjgowers mentioned this pull request Mar 1, 2022

CLI isn't working with environment.yaml OpenFreeEnergy/ExampleNotebooks#5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLI for AtomMapping #57

CLI for AtomMapping #57

dwhswenson commented Feb 15, 2022 •

edited

Loading

pep8speaks commented Feb 15, 2022 •

edited

Loading

codecov bot commented Feb 16, 2022 •

edited

Loading

dwhswenson commented Feb 16, 2022

mikemhenry commented Feb 16, 2022

dwhswenson commented Feb 16, 2022

richardjgowers Feb 17, 2022

dwhswenson Feb 17, 2022

richardjgowers commented Feb 21, 2022

dwhswenson commented Feb 21, 2022

richardjgowers left a comment

richardjgowers left a comment

dwhswenson commented Feb 21, 2022

richardjgowers commented Feb 21, 2022

richardjgowers Feb 21, 2022

mikemhenry Feb 21, 2022

CLI for AtomMapping #57

CLI for AtomMapping #57

Conversation

dwhswenson commented Feb 15, 2022 • edited Loading

pep8speaks commented Feb 15, 2022 • edited Loading

Comment last updated at 2022-02-21 16:44:43 UTC

codecov bot commented Feb 16, 2022 • edited Loading

Codecov Report

dwhswenson commented Feb 16, 2022

mikemhenry commented Feb 16, 2022

dwhswenson commented Feb 16, 2022

richardjgowers Feb 17, 2022

Choose a reason for hiding this comment

dwhswenson Feb 17, 2022

Choose a reason for hiding this comment

richardjgowers commented Feb 21, 2022

dwhswenson commented Feb 21, 2022

richardjgowers left a comment

Choose a reason for hiding this comment

richardjgowers left a comment

Choose a reason for hiding this comment

dwhswenson commented Feb 21, 2022

richardjgowers commented Feb 21, 2022

richardjgowers Feb 21, 2022

Choose a reason for hiding this comment

mikemhenry Feb 21, 2022

Choose a reason for hiding this comment

dwhswenson commented Feb 15, 2022 •

edited

Loading

pep8speaks commented Feb 15, 2022 •

edited

Loading

codecov bot commented Feb 16, 2022 •

edited

Loading