Wrap RDKit drawing code for AtomGroups #2900

Draft
wants to merge 12 commits into
base: develop

Conversation

cbouy
Member

@cbouy cbouy commented Aug 11, 2020

Part of #2468
Depends on #2775

Changes made in this Pull Request:

  • In notebooks, running from MDAnalysis.visualization.RDKit import RDKitDrawer changes the default representation of small-molecule AtomGroups: the AtomGroup is displayed as an image generated with RDKit's drawing code. Large AtomGroups keep the default representation.
    image
  • By default, the drawer displays PNG images for AtomGroups below 200 atoms, removes non-polar hydrogens, kekulizes the structure and flattens it; all of these settings are configurable
    image
  • You can reuse the code that generates the PNG/SVG by calling drawer.atomgroup_to_image(ag, ...). There is also some code to automatically display or save GIFs from an AtomGroup:
    image
  • And all of this can be cancelled at any point
    image

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Tagging @MDAnalysis/coredevs

I'm not sure I've put the code in the right place, but "visualization" made sense to me.

@pep8speaks

pep8speaks commented Aug 11, 2020

Hello @cbouy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 146:80: E501 line too long (98 > 79 characters)

Line 64:73: W291 trailing whitespace
Line 87:73: W291 trailing whitespace

Line 41:9: E731 do not assign a lambda expression, use a def
Line 44:1: W293 blank line contains whitespace
Line 46:9: E731 do not assign a lambda expression, use a def
Line 52:9: E731 do not assign a lambda expression, use a def
Line 56:46: W292 no newline at end of file

Line 72:62: W291 trailing whitespace

Comment last updated at 2020-08-31 16:50:33 UTC

@cbouy
Member Author

cbouy commented Aug 11, 2020

I added a metaclass and a base class for "formatters" (i.e. classes that can modify the representation of other objects in interactive shells/notebooks), in case we want to add another visualization package like nglview to display atomgroups directly.
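
For readers unfamiliar with the pattern, here is a minimal sketch of how a registration metaclass plus a base class can collect formatters automatically. It is not the code in this PR: FormatterMeta, FormatterBase, the render method and the two example subclasses are hypothetical names chosen for illustration; only the format = "RDKIT" class attribute mirrors the diff.

```python
class FormatterMeta(type):
    """Hypothetical metaclass: records each concrete formatter by its `format` name."""

    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        fmt = namespace.get("format")
        # The base class defines no `format` attribute, so it is not registered.
        if fmt is not None:
            FormatterMeta.registry[fmt.upper()] = cls


class FormatterBase(metaclass=FormatterMeta):
    """Hypothetical base class for objects that customise a notebook repr."""

    def render(self, obj):
        raise NotImplementedError


class RDKitLikeFormatter(FormatterBase):
    format = "RDKIT"

    def render(self, obj):
        return f"<image of {obj!r}>"


class NGLViewLikeFormatter(FormatterBase):
    format = "NGLVIEW"

    def render(self, obj):
        return f"<3D widget for {obj!r}>"
```

Defining a subclass is enough to make it discoverable through FormatterMeta.registry, which is what would let another visualization package be plugged in later without touching the AtomGroup code.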

Also, I'm not sure how to test the AtomGroup representation and the generation of images.
Does anyone have some pointers?

@codecov

codecov bot commented Aug 18, 2020

Codecov Report

Merging #2900 into develop will increase coverage by 5.37%.
The diff coverage is 1.02%.

@@             Coverage Diff             @@
##           develop    #2900      +/-   ##
===========================================
+ Coverage    87.20%   92.57%   +5.37%     
===========================================
  Files          167      189      +22     
  Lines        21744    24690    +2946     
  Branches      3186     3196      +10     
===========================================
+ Hits         18961    22857    +3896     
+ Misses        2258     1787     -471     
+ Partials       525       46     -479     
Impacted Files Coverage Δ
package/MDAnalysis/visualization/RDKit.py 0.00% <0.00%> (ø)
package/MDAnalysis/visualization/base.py 0.00% <0.00%> (ø)
package/MDAnalysis/__init__.py 92.68% <100.00%> (+0.18%) ⬆️
package/MDAnalysis/topology/tpr/obj.py 96.96% <0.00%> (-3.04%) ⬇️
package/MDAnalysis/lib/formats/libdcd.pyx 88.48% <0.00%> (-0.85%) ⬇️
package/MDAnalysis/lib/formats/libmdaxdr.pyx 89.81% <0.00%> (-0.16%) ⬇️
package/MDAnalysis/core/topology.py 100.00% <0.00%> (ø)
package/MDAnalysis/analysis/polymer.py 100.00% <0.00%> (ø)
package/MDAnalysis/coordinates/MMTF.py 100.00% <0.00%> (ø)
package/MDAnalysis/coordinates/NAMDBIN.py 100.00% <0.00%> (ø)
... and 124 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 563995c...1f9d3d0.

Member

@tylerjereddy tylerjereddy left a comment

Not sure about testing images themselves. I don't think we do much (or any) of that for, e.g., the streamlines modules at the moment. I think matplotlib does have a testing harness/examples for that kind of testing, though.

package/MDAnalysis/visualization/RDKit.py Outdated Show resolved Hide resolved
package/MDAnalysis/visualization/base.py Outdated Show resolved Hide resolved
@IAlibay IAlibay added this to the 2.0 milestone Apr 6, 2021
@IAlibay IAlibay modified the milestones: 2.0, 2.1.0 Aug 17, 2021
@IAlibay
Member

IAlibay commented Aug 17, 2021

I'm moving this to the 2.1.0 milestone

@ldomic

ldomic commented Sep 2, 2021

@ldomic ldomic left a comment

Lots of little nitpicky comments and questions about how this works! Appreciate that I am coming late to the party and would like to say that this is looking really good @cbouy

Parameters
----------
size : tuple
    default width and height of images, in pixels

Would it be possible to set up your own width and height for an image? If so, then default is not necessary - the fact that it is set up as an attribute of the function suggests the default :)
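
To make the question concrete, here is one common way that constructor defaults and per-call overrides are combined. This is a sketch under assumptions, not the PR's implementation: DrawerDefaults and resolve are hypothetical names, and only the default values (450, 250), 200 and useSVG=False are taken from the __init__ signature in the diff.

```python
class DrawerDefaults:
    """Hypothetical stand-in for a drawer that stores its defaults at construction."""

    def __init__(self, size=(450, 250), max_atoms=200, useSVG=False):
        self.size = size
        self.max_atoms = max_atoms
        self.useSVG = useSVG

    def resolve(self, **overrides):
        """Merge per-call overrides on top of the instance defaults."""
        options = {
            "size": self.size,
            "max_atoms": self.max_atoms,
            "useSVG": self.useSVG,
        }
        unknown = set(overrides) - set(options)
        if unknown:
            raise TypeError(f"unknown option(s): {sorted(unknown)}")
        options.update(overrides)
        return options
```

With this layout, DrawerDefaults().resolve(size=(800, 600)) uses a custom width and height for one call while the stored default stays untouched, which is the sense in which the value passed to __init__ is a "default".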

format = "RDKIT"

def __init__(self, size=(450, 250), max_atoms=200, removeHs=True,
             kekulize=True, drawOptions=None, useSVG=False):

I am of course biased, but if SVG were the default (or even the only option), this would become much easier to test. Users could then convert the image to the format of their choosing (PNG, JPEG, etc.), or, if something like this is needed, a separate function could convert the SVG to PNG or JPEG; there seem to be libraries that could do this easily. That way the testing burden is reduced: you know that RDKitDrawer produces the expected content, and only the conversion remains untested.

size : tuple
    default width and height of images, in pixels
max_atoms : int
    AtomGroups with more atoms that this limit won't be displayed as

More of a question - I reckon this would be enough to show a regular lipid molecule without the hydrogens? With increased interest in lipids in MD, it could be useful to consider their drawings as well.

keep_3D : bool
Keep or remove 3D coordinates to generate the image
"""
mol = ag.convert_to("RDKIT")

One of the tricks I found while trying to make this happen a few years ago was that RDKit drew the molecules best when they were loaded as SMILES - I suppose they might already be, but if not it is worth a shot!

if isinstance(fp, BytesIO):
    b64 = b64encode(fp.getvalue()).decode("ascii")
    display(HTML(f"<img src='data:image/gif;base64,{b64}' />"))

Suggested change
else:
    raise ValueError("Creation of GIF requires a trajectory as part of the Universe")

Or something along those lines
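
As an illustration of the two pieces discussed here, the sketch below separates the base64 embedding from the error check. The function names gif_to_html and check_has_trajectory are hypothetical, and the rule "fewer than two frames means no trajectory" is an assumption made for the example, not something the PR states.

```python
from base64 import b64encode
from io import BytesIO


def gif_to_html(fp):
    """Return an <img> tag embedding the GIF bytes held in the buffer ``fp``."""
    if not isinstance(fp, BytesIO):
        raise TypeError("expected an in-memory BytesIO buffer")
    b64 = b64encode(fp.getvalue()).decode("ascii")
    return f"<img src='data:image/gif;base64,{b64}' />"


def check_has_trajectory(n_frames):
    """Hypothetical guard (assumption): an animated GIF needs at least two frames."""
    if n_frames < 2:
        raise ValueError(
            "Creation of GIF requires a trajectory as part of the Universe"
        )
```

Returning the HTML string instead of displaying it directly keeps the embedding step testable without a notebook; the caller can still pass the result to a display function.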

class _Formattermeta(type):
    """Automatic Formatter registration metaclass

    .. versionadded:: 2.0.0

I guess these need to move up to 2.1.0 (provisionally?)

    shell = get_ipython()
except NameError:
    shell = None
    warnings.warn("You should be in an interactive python shell (IPython or "

Could I not create visualizations outside of the IPython ecosystem? If, e.g., I would like to create a pipeline such as LINTools used to be, I might struggle with this requirement, as it requires chaining functions in this PR and new ones. I do wonder if this requirement could make the visualization packages harder to use.
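
One way to keep drawing usable in scripts is to make the shell lookup optional and keep the drawing function independent of it. The sketch below shows that split; get_shell, draw and enable_rich_repr are hypothetical names, and draw is only a stand-in that returns a placeholder string rather than a real image.

```python
import warnings


def get_shell():
    """Return the active IPython shell, or None when running in plain Python."""
    try:
        # IPython injects get_ipython into builtins; elsewhere the name is undefined.
        return get_ipython()  # noqa: F821
    except NameError:
        return None


def draw(obj):
    """Stand-in for a drawing function that needs no interactive shell."""
    return f"<svg><!-- {obj} --></svg>"


def enable_rich_repr():
    """Report whether a rich repr could be installed in the current session."""
    if get_shell() is None:
        warnings.warn(
            "Not in an interactive shell: rich display is disabled, "
            "but draw() can still be called directly."
        )
        return False
    return True
```

In a pipeline, draw() would be called directly and its output written to disk, while enable_rich_repr() would only matter in a notebook session.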


def test_ag_to_img_svg(self, u, drawer):
    svg = drawer.atomgroup_to_image(u.atoms, useSVG=True)
    assert len(svg) == 1275

I think this could be extended to check the contents of the file as well as the length :)
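
A content check could parse the SVG as XML and assert on its structure rather than its byte length. The sketch below uses only the standard library; the SVG string is a hand-written stand-in, not real drawer output, and the assumption that bond paths carry a class starting with "bond-" should be verified against the actual RDKit output before relying on it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hand-written stand-in for the drawer output; a real RDKit SVG will differ.
svg = """<svg xmlns='http://www.w3.org/2000/svg' width='450px' height='250px' viewBox='0 0 450 250'>
<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='450' height='250' x='0' y='0'> </rect>
<path class='bond-0' d='M 20,125 L 430,125' style='fill:none;stroke:#000000'/>
</svg>"""


def summarize_svg(text):
    """Parse an SVG string and return a few structural facts to assert on."""
    root = ET.fromstring(text)
    bonds = [
        el for el in root.iter(SVG_NS + "path")
        if el.get("class", "").startswith("bond-")
    ]
    return {
        "tag": root.tag,
        "width": root.get("width"),
        "height": root.get("height"),
        "n_bonds": len(bonds),
    }
```

Assertions on the root tag, the image size and the number of bond paths are less brittle than a fixed length such as 1275, which can change with any RDKit release that alters whitespace or styling.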

@IAlibay IAlibay modified the milestones: 2.1.0, 3.0 Jun 2, 2022
@hmacdope hmacdope added the stale label Nov 5, 2023
Labels
GSoC GSoC project stale

8 participants