# __Pandas examples with ChemAxon molecules__

Simpler examples of loading ChemAxon molecules with pandas.read_table are in the [Calculators](02_calculators.ipynb) and [Molecular similarity](03_molecular_similarity.ipynb) notebooks. This notebook adds further examples of working with ChemAxon molecules in pandas DataFrames.

By default, molecules are printed in the _cxsmiles_ format. If a molecule cannot be written as _cxsmiles_, the _cxsmarts_ format is used instead. If neither is possible, the representation falls back to the original format that was recognized during import.
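The fallback order described above can be pictured as a chain of exporters tried one after another. The following is only a sketch of that idea in plain Python: `first_working_format` and the three `to_*` functions are hypothetical stand-ins invented for illustration, not ChemAxon functions, and the real library may implement the fallback differently.

```python
def first_working_format(mol, formatters, fallback):
    """Return (format_name, text) from the first formatter that succeeds."""
    for name, fmt in formatters:
        try:
            return name, fmt(mol)
        except Exception:
            # This format cannot express the structure; try the next one.
            continue
    return "original", fallback(mol)


# Hypothetical stand-in exporters (assumptions, not the ChemAxon API).
def to_cxsmiles(mol):
    raise ValueError("cannot be written as cxsmiles")


def to_cxsmarts(mol):
    return f"{mol} (as cxsmarts)"


def to_original(mol):
    return f"{mol} (as imported)"


result = first_working_format(
    "[#6]~[#7]",
    [("cxsmiles", to_cxsmiles), ("cxsmarts", to_cxsmarts)],
    to_original,
)
print(result)
```

Here the first exporter fails, so the second one supplies the text; if both failed, the `fallback` callable would be used.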

In [2]:
import pandas as pd
from chemaxon import import_mol, open_for_import

mol = import_mol('CC(=O)NC1=CC=C(O)C=C1')

# Read every molecule from the SDF file into a list
with open_for_import('/home/lnagy/IdeaProjects/python-api/chemaxon/resources/test.sdf') as mol_importer:
    mol_lst = list(mol_importer)

d = {'molecule': [mol] + mol_lst }
df_mols = pd.DataFrame(data=d)

df_mols

Unnamed: 0,molecule
0,"CC(=O)NC1=CC=C(O)C=C1 |c:9,t:4,6|"
1,[Li]C1=C(I)C(Br)=C(F)C2=C1C(Cl)=C(N)C(O)=C2S |...


When creating HTML output from a pandas.DataFrame, you can use the helper function _mol_to_svg_formatter_ to render the molecules as SVG images.

In [3]:
from chemaxon import mol_to_svg_formatter

df_mols.to_html('web_view.html', escape=False, formatters=dict(molecule=mol_to_svg_formatter))
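To see why `escape=False` matters in the call above, here is a small sketch that uses only pandas. The `svg_circle_formatter` function is a hypothetical stand-in that returns SVG markup for a number; it is not part of ChemAxon and only plays the role that _mol_to_svg_formatter_ plays for molecules.

```python
import pandas as pd


def svg_circle_formatter(radius):
    """Hypothetical formatter: turn a number into a small inline SVG."""
    size = 2 * radius + 2
    center = radius + 1
    return (
        f'<svg width="{size}" height="{size}">'
        f'<circle cx="{center}" cy="{center}" r="{radius}"/></svg>'
    )


df_demo = pd.DataFrame({"radius": [5, 10], "label": ["small", "large"]})

# escape=False keeps the "<" and ">" of the SVG markup intact, so a
# browser renders the image instead of showing the markup as text.
html_raw = df_demo.to_html(escape=False, formatters={"radius": svg_circle_formatter})

# With the default escape=True the same markup is written as &lt;svg ...&gt;.
html_escaped = df_demo.to_html(formatters={"radius": svg_circle_formatter})
```

Calling `to_html` without a file name returns the HTML as a string, which is convenient for inspecting the result before writing it to disk.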

Because the __DataFrame__ stores the __Molecule__ objects themselves, not just their string representation, you can calculate properties for them and store the results in new columns.

In [4]:
from chemaxon import logp

# logp takes a molecule, so it can be passed to apply directly
df_mols['LogP'] = df_mols['molecule'].apply(logp)
df_mols

Unnamed: 0,molecule,LogP
0,"CC(=O)NC1=CC=C(O)C=C1 |c:9,t:4,6|",0.92
1,[Li]C1=C(I)C(Br)=C(F)C2=C1C(Cl)=C(N)C(O)=C2S |...,4.06


You can also create new molecule columns from existing columns that contain molecules in any supported format.

In [6]:
d = {'SMILES': ['CN1C=NC2=C1C(=O)N(C)C(=O)N2C'], 'name': ['coffein'] }
df = pd.DataFrame(data=d)

df = pd.concat([df, pd.DataFrame.from_records([{'SMILES' : 'CC[C@H](C)[C@@H]1NC(=O)[C@H](CC2=CC=C(O)C=C2)NC(=O)[C@@H](N)CSSC[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC1=O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O', 'name' : 'oxytocin'}])])

df['molecule'] = df['SMILES'].apply(lambda s: import_mol(s))
df


Unnamed: 0,SMILES,name,molecule
0,CN1C=NC2=C1C(=O)N(C)C(=O)N2C,coffein,"CN1C=NC2=C1C(=O)N(C)C(=O)N2C |c:2,4|"
0,CC[C@H](C)[C@@H]1NC(=O)[C@H](CC2=CC=C(O)C=C2)N...,oxytocin,CC[C@H](C)[C@@H]1NC(=O)[C@H](CC2=CC=C(O)C=C2)N...
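Both rows in the output above carry the index label 0, because `pd.concat` keeps the index of each input frame by default. If unique row labels are wanted, passing `ignore_index=True` renumbers the rows. The sketch below shows the difference with plain strings; the SMILES and names are illustrative values, not data from this notebook.

```python
import pandas as pd

df_a = pd.DataFrame({"SMILES": ["CCO"], "name": ["ethanol"]})
df_b = pd.DataFrame.from_records([{"SMILES": "CC(C)O", "name": "isopropanol"}])

# Default behaviour: each frame keeps its own index, so 0 appears twice.
kept = pd.concat([df_a, df_b])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([df_a, df_b], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

With duplicate labels, `kept.loc[0]` selects both rows, which can be surprising when looking up a single molecule by its index.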


Properties stored on the molecules can also be read into __DataFrame__ columns:

In [11]:
from chemaxon import open_for_import

with open_for_import('mol_with_properties.mrv') as mol_iterator:
    mols = list(mol_iterator)

d = {'molecule': mols }
df_props = pd.DataFrame(data=d)
df_props['string property'] = df_props['molecule'].apply(lambda m: m.get_property('test_str_property'))
df_props['int array property'] = df_props['molecule'].apply(lambda m: m.get_property('test_int_array_property'))

df_props

Unnamed: 0,molecule,string property,int array property
0,"C1=CC=C(C=C1)C1=CC=C(C=C1)C1=CC=CC=C1 |c:0,2,4...",asd,"[1, 2, 3, 4, 5]"
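When several properties are needed, the per-column `apply` calls can be driven from a mapping of column labels to property keys. The sketch below assumes only that each object offers a `get_property(name)` method, as used in the cell above. `StandInMolecule` is a hypothetical class written for this illustration so that the example runs without ChemAxon installed, and its values are made up.

```python
import pandas as pd


class StandInMolecule:
    """Hypothetical stand-in exposing only get_property(name)."""

    def __init__(self, properties):
        self._properties = dict(properties)

    def get_property(self, name):
        return self._properties[name]


mols_demo = [
    StandInMolecule({"test_str_property": "asd", "test_int_array_property": [1, 2, 3]}),
    StandInMolecule({"test_str_property": "qwe", "test_int_array_property": [4, 5]}),
]
df_demo = pd.DataFrame({"molecule": mols_demo})

# Map each new column label to the property key it is read from.
columns = {
    "string property": "test_str_property",
    "int array property": "test_int_array_property",
}
for label, key in columns.items():
    # apply runs immediately, so the lambda sees the current value of key.
    df_demo[label] = df_demo["molecule"].apply(lambda m: m.get_property(key))
```

Adding another property then only requires one more entry in `columns`.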
