Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI for AtomMapping #57

Merged
merged 18 commits into from
Feb 23, 2022
Merged

CLI for AtomMapping #57

merged 18 commits into from
Feb 23, 2022

Conversation

dwhswenson
Copy link
Member

@dwhswenson dwhswenson commented Feb 15, 2022

Includes and builds on #49, should only be merged after that one.

This adds some reusable parameters (--mol and --mapper), as well as a command that outputs the mol1_to_mol2 mapping dict.

The current usage is only intended to be a starting point, which we can improve later. It looks something like this:

$ openfe atommapping --mol "Cc1ccccc1" --mol "CC1CCCCC1" --mapper LomapAtomMapper
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}

Things that aren't ideal there:

  • Takes SMILES strings (and only SMILES for now). That can easily be extended. Currently this is also using an ugly workaround to avoid issue in openfreeenergy/lomap#4.
  • There's no way to provide options to mapper. I'm not sure that there's an easy way to do that, either. [EDIT: I came up with an approach that I think will work, but the question whether it's worth it is still on the table.] Perhaps it would make more sense to load from a serialized network and to find the edge(s?) associated with the two molecules -- in this sense, you'd use it to look at the mapping after it has been generated (but before simulating), instead of generating it before. @richardjgowers, you would have better sense on whether that approach would fit with what was requested.

A few corners are still WIP, but the main ideas are here.

  • MOL parameter
  • MAPPER parameter
  • atommapping command
  • Tests for everything
  • Docstrings for everything

@pep8speaks
Copy link

pep8speaks commented Feb 15, 2022

Hello @dwhswenson! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-21 16:44:43 UTC

@codecov
Copy link

codecov bot commented Feb 16, 2022

Codecov Report

Merging #57 (01008c3) into main (ea64274) will increase coverage by 4.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #57      +/-   ##
==========================================
+ Coverage   94.69%   98.75%   +4.05%     
==========================================
  Files          24       36      +12     
  Lines         452      640     +188     
==========================================
+ Hits          428      632     +204     
+ Misses         24        8      -16     
Impacted Files Coverage Δ
openfecli/commands/atommapping.py 100.00% <100.00%> (ø)
openfecli/parameters/__init__.py 100.00% <100.00%> (ø)
openfecli/parameters/mapper.py 100.00% <100.00%> (ø)
openfecli/parameters/mol.py 100.00% <100.00%> (ø)
openfecli/parameters/utils.py 100.00% <100.00%> (ø)
openfecli/tests/commands/test_atommapping.py 100.00% <100.00%> (ø)
openfecli/tests/parameters/test_mapper.py 100.00% <100.00%> (ø)
openfecli/tests/parameters/test_mol.py 100.00% <100.00%> (ø)
openfecli/tests/parameters/test_utils.py 100.00% <100.00%> (ø)
openfecli/tests/test_utils.py 100.00% <100.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea64274...01008c3. Read the comment docs.

@dwhswenson dwhswenson changed the title [WIP] CLI for AtomMapping CLI for AtomMapping Feb 16, 2022
@dwhswenson dwhswenson marked this pull request as ready for review February 16, 2022 20:32
@dwhswenson
Copy link
Member Author

This is ready for review.

@mikemhenry : You should be able to plug in your visualization pretty easily. You'll probably want to create something in parameters to add an IMAGE_OUTPUT_FILE reusable decorator (or maybe specify the image format you're using?); otherwise, the existing command should do the job.

Question: do we want to keep the option of outputting the dict, or will we completely replace that with Mike's visualization? I implemented this under the assumption we'd replace the text output with the more useful image file, but we could find a way to keep both.

@mikemhenry
Copy link
Contributor

@dwhswenson I think we can use some dunder magic to determine what the environment is in and produce the right output, like if in a notebook we dump the viz + dict but if we are in a cli, just the dict.

@dwhswenson
Copy link
Member Author

I think we can use some dunder magic to determine what the environment is in and produce the right output, like if in a notebook we dump the viz + dict but if we are in a cli, just the dict.

What I mean is that, in the CLI, there should be an option to do something like:

$ openfe atommapping --mol $MOL1 --mol $MOL2 --mapper $MAPPER -o output.pdf

(or whatever output file format you can easily support). Using your visualization will be much better for users than outputting the dict in the terminal, which I only intended as a placeholder. The question is whether that replaces the text output (my assumption) or whether, perhaps if the -o argument isn't provided, we use that text as a fallback.

As for identifying whether you're in a notebook, that should be a last-second check. FWIW, OPS has this code for identifying if it is running in a notebook, although there are probably better ways (that code hasn't changed in 6 years). I'd take a look at tqdm.auto, which is probably better maintained and tested (even though the lead dev of tqdm doesn't actively maintain the Jupyter integration.)

Comment on lines +22 to +30

get_molecule = MultiStrategyGetter(
strategies=[
# NOTE: I think loading from smiles must be last choice, because
# failure will give meaningless user-facing errors
_load_molecule_from_smiles,
],
error_message="Unable to generate a molecule from '{user_input}'."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a . in a SMILES would mean a disconnected component, which I think we can assume we won't be given... so another strategy could be that any string with a dot is a filepath. Is the StrategyGetter just iterating through the list of functions until it finds a hit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the StrategyGetter just iterating through the list of functions until it finds a hit?

Basically, yeah. It attempts to parse user input according the function in each of the strategies, and those functions return the sentinel plugcli.params.NOT_PARSED if they can't interpret the input. So it's sort of a "forgiveness, not permission" approach -- if we added a _load_molecule_from_file strategy before _load_molecule_from_smiles, then it will take the user input, try to interpret it as a filename. If that fails (file doesn't exist or can't be understood) then it will fall back to _load_molecule_from_smiles, which would treat the same string as SMILES.

Adding new strategies is easy. I didn't include file loading in this PR just to try to keep it minimal, but my assumption is that loading from file will usually be the first choice.

Also, get_molecule can be replaced with a function, if we want more control on, e.g., error handling. MultiStrategyGetter is just a shortcut for making a callable from composable pieces.

@richardjgowers
Copy link
Contributor

@dwhswenson re: options to the mapper, could we not just have something like --mapper-kwargs={'foo': 'bar'} etc?

@dwhswenson
Copy link
Member Author

re: options to the mapper, could we not just have something like --mapper-kwargs={'foo': 'bar'} etc?

In theory, something like this could work, but I don't think it creates a good user experience. To me, the core idea in CLI design is that the CLI is for users who are less comfortable with Python (because if they're comfortable with Python, then the same tasks should be nearly as easy in Python!)

A --mapper-kwargs wouldn't follow that principle:

  1. It requires that the user be familiar with Python/JSON dicts (and somewhat assumes familiarity with the idea of **kwargs). If a user is not comfortable passing the relevant arguments to the AtomMapper in Python, then I would expect them to be equally uncomfortable trying to manage that through the shell, which will lead to user confusion. For example, a common user mistake will be to forget to quote strings.
  2. As written, I don't think it works. It would require quoting, otherwise some shells would try to interpret the { and } before the string even gets to our Python. How shells do this depends somewhat on the shell. I'd rather avoid situations where we're answering user questions that are really about how each shell handles quoting and special characters.
  3. No discoverability. There's no way for a user to learn, from the CLI, what the allowed options are. The only documentation would be the main API docs -- and again, if the user isn't comfortable with Python, I wouldn't want to assume that they can translate API docs to reasonable input strings here.
  4. It potentially introduces a security weakness and/or still remains limited. I don't see a way to parse that without an eval (perhaps literal_eval for some level of security). Even so, if some AtomMapper takes a parameter that isn't a builtin, we need to maintain the globals/locals in eval to whitelist certain items, which places a maintenance burden on the CLI that should be the role of the AtomMapper itself.

If this functionality is really important, there are better ways around this, but implementing that might take me a day or two -- I could basically make a plugcli-style plugin class that generates this from inspecting an AtomMapper. I'd rather do that as a follow-up, so that the basic functionality here is available for Mike to build on. Switching the output to an image should be orthogonal to handling the specific inputs, so work could proceed on both at once without and blocking.

Copy link
Contributor

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying this out locally I get this traceback on openfe --help

Traceback (most recent call last):
  File "/home/richard/miniconda3/envs/openfe/bin/openfe", line 5, in <module>
    from openfecli.cli import main
  File "/home/richard/miniconda3/envs/openfe/lib/python3.9/site-packages/openfecli/__init__.py", line 5, in <module>
    from . import commands
ImportError: cannot import name 'commands' from partially initialized module 'openfecli' (most likely due to a circular import) (/home/richard/miniconda3/envs/openfe/lib/python3.9/site-packages/openfecli/__init__.py)

I have a vague idea this might be because internally the import order is incorrect, I think I've fixed this with an __init__.py in commands/

Copy link
Contributor

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work nicely. Maybe --mol1 and --mol2 arguments would be clearer/better than just --mol twice. I'm not sure we actually guarantee that AtomMappers are symmetrical, (or if there is any ordering?) but this would make issues around that easier to debug.

@dwhswenson
Copy link
Member Author

@richardjgowers : Did you try installing again? I think that fixed this for me. I had this at one point, and just running pip install -e . fixed.

You should NOT need to update commands/__init__.py. That's the whole point of the plugin architecture. You only ever change the file that contains the plugin. If that isn't working, then the problem is with the plugin architecture itself.

@richardjgowers
Copy link
Contributor

@dwhswenson hmm yeah removing it works now. It might be something about a mix of develop and regular installs...


# TODO: next is (temporary?) hack: see
# https://github.com/OpenFreeEnergy/Lomap/issues/4
Chem.rdDepictor.Compute2DCoords(mol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix should probably get pushed back onto the LomapAtomMapper until the bug is fixed. (Check if any conformers, if not add 2d)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a more elegant way but I was doing this at one point:

    # Not sure if there is a better way to check if 2DCoords exist
    for mol in [mol_1, mol_2]:
        try:
            mol.GetConformers()[0]
        except IndexError:
            Chem.rdDepictor.Compute2DCoords(mol)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants