Directly using an RDKit mol with the mol_col parameter #3

Merged
merged 5 commits into from
Mar 29, 2021
11 changes: 6 additions & 5 deletions README.md
@@ -12,11 +12,11 @@ mols2grid is a Python chemical viewer for 2D structures of small molecules, base
## Installation ℹ️
---

mols2grid was developed for Python 3.6+ and requires rdkit, pandas and jinja2 as dependencies.
mols2grid was developed for Python 3.6+ and requires rdkit (>=2019.09.1), pandas and jinja2 as dependencies.

To install mols2grid from a clean conda environment:
```shell
conda install -c conda-forge rdkit
conda install -c conda-forge 'rdkit>=2019.09.1'
pip install mols2grid
```
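The new `rdkit>=2019.09.1` requirement can also be checked at runtime. The following is a minimal sketch, assuming RDKit version strings follow the `YYYY.MM.patch` pattern used in the pin above; `version_tuple` and `meets_minimum` are hypothetical helpers and are not part of mols2grid:

```python
import re


def version_tuple(version):
    """Turn '2020.09.1' (or '2021.03.1b1') into a tuple of ints.

    Only the leading digits of each dot-separated part are kept, so any
    pre-release suffix is ignored. This is a simplification.
    """
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum="2019.09.1"):
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    return version_tuple(version) >= version_tuple(minimum)


# In practice the first argument would be `rdkit.__version__`
print(meets_minimum("2020.09.1"))  # True
print(meets_minimum("2019.03.4"))  # False
```

Comparing integer tuples rather than raw strings avoids surprises if a component ever changes width.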

@@ -44,20 +44,21 @@ mols2grid.display("path/to/molecules.sdf",
#### Input parameters

You can set up the grid from various inputs:
* a pandas **DataFrame** (with a column of SMILES, controlled by the `smiles_col="SMILES"` parameter),
* a pandas **DataFrame** (with a column of SMILES or RDKit molecules, controlled by the `smiles_col` and `mol_col` parameters),
* a list of **RDKit molecules** (with properties accessible through the `mol.GetPropsAsDict()` method),
* or an **SDF file**

You can also rename each field of your input with the `mapping` parameter. Please note that 2 fields are automatically added regardless of your input: `SMILES` and `img`. If a "SMILES" field already exists, it will not be overwritten.
You can also rename each field of your input with the `mapping` parameter. Please note that 3 fields are automatically added regardless of your input: `mols2grid-id`, `SMILES` and `img`. If a "SMILES" field already exists, it will not be overwritten.

#### Parameters for the drawing of each molecule

* `useSVG=True`: use SVG images or PNG
* `coordGen=True`: use the coordGen library instead of the RDKit one to depict the molecules in 2D
* `size=(160, 120)`: size of each image
* `use_coords=True`: use the coordinates of the input molecules if available
* `remove_Hs=True`: remove hydrogen atoms from the drawings
* and all the arguments available in RDKit's [MolDrawOptions](https://www.rdkit.org/docs/source/rdkit.Chem.Draw.rdMolDraw2D.html#rdkit.Chem.Draw.rdMolDraw2D.MolDrawOptions), like `addStereoAnnotation=True`


#### Parameters for the grid

You can control the general look of the document through the `template` argument:
2 changes: 1 addition & 1 deletion demo.ipynb

Large diffs are not rendered by default.

11 changes: 9 additions & 2 deletions mols2grid/dispatch.py
@@ -37,13 +37,20 @@ def display(arg, **kwargs):
----------
arg : pandas.DataFrame, SDF file or list of molecules
The input containing your molecules
smiles_col : str ("SMILES")
If a pandas DataFrame is used, name of the columns with SMILES
smiles_col : str or None ("SMILES")
If a pandas DataFrame is used, name of the column with SMILES
mol_col : str or None (None)
If a pandas DataFrame is used, name of the column with RDKit molecules
useSVG : bool (True)
Use SVG images or PNG
coordGen : bool (True)
Use the coordGen library instead of the RDKit one to depict the
molecules in 2D
use_coords : bool
Use the coordinates of the molecules (only relevant when an SDF file, a
list of molecules or a DataFrame of RDKit molecules was used as input)
remove_Hs : bool
Remove hydrogen atoms from the drawings
size : tuple ((160, 120))
Size of each image
mapping : dict (None)
58 changes: 39 additions & 19 deletions mols2grid/molgrid.py
@@ -2,11 +2,12 @@
from base64 import b64encode
from html import escape
from pathlib import Path
from copy import deepcopy
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Draw
from jinja2 import Environment, FileSystemLoader
from .utils import requires, tooltip_formatter
from .utils import requires, tooltip_formatter, mol_to_record
try:
from IPython.display import HTML
except ModuleNotFoundError:
@@ -24,16 +25,19 @@ class MolGrid:
"""Class that handles drawing molecules, rendering the HTML document and
saving or displaying it in a notebook
"""
def __init__(self, df, smiles_col="SMILES", coordGen=True, useSVG=True,
mapping=None, **kwargs):
def __init__(self, df, smiles_col="SMILES", mol_col=None, coordGen=True,
useSVG=True, mapping=None, **kwargs):
"""
Parameters
----------
df : pandas.DataFrame
Dataframe containing a SMILES column and some other information
about each molecule
smiles_col : str
Name of the SMILES column in the dataframe
smiles_col : str or None
Name of the SMILES column in the dataframe, if available
mol_col : str or None
Name of an RDKit molecule column. If available, coordinates and
atom/bond annotations from this will be used for depiction
coordGen : bool
Sets whether or not the CoordGen library should be preferred to the
RDKit depiction library
@@ -44,14 +48,22 @@ def __init__(self, df, smiles_col="SMILES", coordGen=True, useSVG=True,
kwargs : object
Arguments passed to the `draw_mol` method
"""
if not (smiles_col or mol_col):
raise ValueError("One of `smiles_col` or `mol_col` must be set")
Draw.rdDepictor.SetPreferCoordGen(coordGen)
self.useSVG = useSVG
dataframe = df.copy()
if mapping:
dataframe.rename(columns=mapping, inplace=True)
dataframe["img"] = dataframe[smiles_col].apply(self.smi_to_img,
**kwargs)
if smiles_col and not mol_col:
mol_col = "mol"
dataframe[mol_col] = dataframe[smiles_col].apply(Chem.MolFromSmiles)
elif mol_col and not smiles_col:
dataframe["SMILES"] = dataframe[mol_col].apply(Chem.MolToSmiles)
dataframe["img"] = dataframe[mol_col].apply(self.mol_to_img, **kwargs)
dataframe["mols2grid-id"] = list(range(len(dataframe)))
self.dataframe = dataframe
self.mol_col = mol_col

@classmethod
def from_mols(cls, mols, **kwargs):
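The branching added to `__init__` above decides which column is derived from the other. A minimal sketch of that decision in plain Python follows; `resolve_columns` is a hypothetical helper written only for illustration, and it returns column names instead of touching a DataFrame:

```python
def resolve_columns(smiles_col="SMILES", mol_col=None):
    """Mirror the smiles_col / mol_col branching shown in the diff.

    Returns the molecule column that would be used for drawing and the
    list of columns that would be derived. Hypothetical helper, not part
    of mols2grid.
    """
    if not (smiles_col or mol_col):
        raise ValueError("One of `smiles_col` or `mol_col` must be set")
    derived = []
    if smiles_col and not mol_col:
        # Only SMILES given: a "mol" column is built from the SMILES
        mol_col = "mol"
        derived.append(mol_col)
    elif mol_col and not smiles_col:
        # Only molecules given: a "SMILES" column is built from them
        derived.append("SMILES")
    # Both given: nothing is derived, images come straight from mol_col
    return mol_col, derived


print(resolve_columns())                                   # ('mol', ['mol'])
print(resolve_columns(smiles_col=None, mol_col="ROMol"))   # ('ROMol', ['SMILES'])
print(resolve_columns(smiles_col="SMILES", mol_col="ROMol"))  # ('ROMol', [])
```

Because `smiles_col` defaults to `"SMILES"`, the second branch is only reached when the caller passes `smiles_col=None` explicitly.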
@@ -65,10 +77,9 @@ def from_mols(cls, mols, **kwargs):
kwargs : object
Other arguments passed on initialization
"""
df = pd.DataFrame([{"SMILES": Chem.MolToSmiles(mol),
**mol.GetPropsAsDict()}
for mol in mols if mol])
return cls(df, **kwargs)
df = pd.DataFrame([mol_to_record(mol) for mol in mols if mol])
mol_col = kwargs.pop("mol_col", "mol")
return cls(df, mol_col=mol_col, **kwargs)

@classmethod
def from_sdf(cls, sdf_file, **kwargs):
@@ -81,10 +92,10 @@ def from_sdf(cls, sdf_file, **kwargs):
kwargs : object
Other arguments passed on initialization
"""
df = pd.DataFrame([{"SMILES": Chem.MolToSmiles(mol),
**mol.GetPropsAsDict()}
df = pd.DataFrame([mol_to_record(mol)
for mol in Chem.SDMolSupplier(sdf_file) if mol])
return cls(df, **kwargs)
mol_col = kwargs.pop("mol_col", "mol")
return cls(df, mol_col=mol_col, **kwargs)

@property
def template(self):
@@ -101,7 +112,8 @@ def template(self, value):
"Use one of 'pages' or 'table'")
self._template = value

def draw_mol(self, mol, size=(160, 120), **kwargs):
def draw_mol(self, mol, size=(160, 120), use_coords=True, remove_Hs=True,
**kwargs):
"""Draw a molecule

Parameters
@@ -110,6 +122,10 @@ def draw_mol(self, mol, size=(160, 120), **kwargs):
The molecule to draw
size : tuple
The size of the drawing canvas
use_coords : bool
Use the 2D or 3D coordinates of the molecule
remove_Hs : bool
Remove hydrogen atoms from the drawing
**kwargs : object
Attributes of the rdkit.Chem.Draw.rdMolDraw2D.MolDrawOptions class
like `fixedBondLength=35, bondLineWidth=2`
@@ -127,6 +143,11 @@ def draw_mol(self, mol, size=(160, 120), **kwargs):
for key, value in kwargs.items():
setattr(opts, key, value)
d2d.SetDrawOptions(opts)
if not use_coords:
mol = deepcopy(mol)
mol.RemoveAllConformers()
if remove_Hs:
mol = Chem.RemoveHs(mol)
d2d.DrawMolecule(mol)
d2d.FinishDrawing()
return d2d.GetDrawingText()
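In the hunk above, `deepcopy(mol)` precedes `mol.RemoveAllConformers()`, presumably so that the molecule stored in the dataframe keeps its coordinates when `use_coords=False`. The copy-before-mutate pattern can be sketched without RDKit; `ToyMol` and `prepare_for_drawing` are hypothetical stand-ins, not mols2grid or RDKit objects:

```python
from copy import deepcopy


class ToyMol:
    """Stand-in for a molecule: holds conformers and mutates in place."""

    def __init__(self, conformers):
        self.conformers = list(conformers)

    def RemoveAllConformers(self):
        # Mutates the object itself, like the call used in the diff
        self.conformers.clear()


def prepare_for_drawing(mol, use_coords=True):
    """Drop coordinates on a copy so the caller's object is untouched."""
    if not use_coords:
        mol = deepcopy(mol)
        mol.RemoveAllConformers()
    return mol


original = ToyMol(["conf0"])
drawn = prepare_for_drawing(original, use_coords=False)
print(drawn.conformers)     # []
print(original.conformers)  # ['conf0']
```

When `use_coords=True`, the sketch returns the input object unchanged and makes no copy.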
@@ -145,7 +166,7 @@ def smi_to_img(self, smi, **kwargs):
the molecule"""
mol = Chem.MolFromSmiles(smi)
return self.mol_to_img(mol, **kwargs)

def render(self, template="pages", **kwargs):
"""Returns the HTML document corresponding to the "pages" or "table"
template. See `to_pages` and `to_table` for the list of arguments
@@ -215,7 +236,7 @@ def to_pages(self, subset=None, tooltip=None,
if you want to color the text corresponding to the "Solubility"
column in your dataframe.
"""
df = self.dataframe.copy()
df = self.dataframe.drop(columns=self.mol_col).copy()
if subset is None:
subset = df.columns.tolist()
subset = [subset.pop(subset.index("img"))] + subset
@@ -253,7 +274,6 @@ def to_pages(self, subset=None, tooltip=None,
value_names = value_names[:-1] + f", {{ attr: 'style', name: {name!r} }}]"

item = f'<div class="cell" data-mols2grid-id="0">{"".join(content)}</div>'
df["mols2grid-id"] = [str(i) for i in range(len(df))]
df = df[final_columns].rename(columns=column_map)

template = env.get_template('pages.html')
@@ -328,7 +348,7 @@ def to_table(self, subset=None, tooltip=None, n_cols=6,
"""
tr = []
data = []
df = self.dataframe
df = self.dataframe.drop(columns=self.mol_col)

if subset is None:
subset = df.columns.tolist()
9 changes: 8 additions & 1 deletion mols2grid/utils.py
@@ -1,5 +1,6 @@
from functools import wraps
from importlib.util import find_spec
from rdkit.Chem import MolToSmiles

def requires(module):
def inner(func):
@@ -31,4 +32,10 @@ def tooltip_formatter(s, subset, fmt, style):
for k, v in s[subset].to_dict().items():
v = f'<span style="{style[k](v)}">{v}</span>' if style.get(k) else v
items.append(fmt.format(key=k, value=v))
return "<br>".join(items)
return "<br>".join(items)

def mol_to_record(mol):
"""Function to create a dict of data from an RDKit molecule"""
return {"SMILES": MolToSmiles(mol),
**mol.GetPropsAsDict(),
"mol": mol}
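The order of the keys in the dict literal of `mol_to_record` determines which value wins on a name clash. A minimal sketch with plain dicts, where `to_record` is a hypothetical stand-in that takes the SMILES string and properties directly instead of an RDKit molecule:

```python
def to_record(computed_smiles, props, mol):
    """Same dict layout as mol_to_record, with the RDKit calls factored out.

    Later keys override earlier ones, so a "SMILES" property replaces the
    computed SMILES, while the final "mol" entry replaces any "mol" property.
    """
    return {"SMILES": computed_smiles, **props, "mol": mol}


mol = object()  # placeholder for an RDKit molecule
record = to_record(
    "CCO",
    {"SMILES": "OCC", "mol": "stale", "name": "ethanol"},
    mol,
)
print(record["SMILES"])       # OCC
print(record["mol"] is mol)   # True
print(list(record))           # ['SMILES', 'mol', 'name']
```

This is consistent with the README note that an existing "SMILES" field is not overwritten.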