In [1]:
from rdkit import Chem

te (aromatic Te) is accepted. Here is an example with tellurophene-2-carboxylic acid:

In [2]:
m = Chem.MolFromSmiles('OC(=O)c1[te]ccc1')
m.GetAtomWithIdx(4).GetIsAromatic()

True

<- and -> create a dative bond between the atoms. The direction matters: the arrow points from the donor (start) atom to the acceptor (end) atom.

In [7]:
bipycu = Chem.MolFromSmiles('c1cccn->2c1-c1n->3cccc1.[Cu]23(Cl)Cl')
bipycu.GetBondBetweenAtoms(4,12).GetBondType()  # Chem.rdchem.BondType.DATIVE
Chem.MolToSmiles(bipycu)

'Cl[Cu]1(Cl)<-n2ccccc2-c2ccccn->12'

Dative bonds have the special characteristic that they don’t affect the valence on the start atom, but do affect the end atom. So in this case, the N atoms involved in the dative bond have the valence of 3 that we expect from bipy, while the Cu has a valence of 4:

In [9]:
bipycu.GetAtomWithIdx(4).GetTotalValence()

3

In [10]:
bipycu.GetAtomWithIdx(12).GetTotalValence()

4
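The effect of direction can be checked on a smaller case. The following is a minimal sketch, assuming the valence rule described above: reversing the arrow swaps which atom has the bond counted against its valence, so the number of implicit hydrogens on the nitrogen should change. The helper `n_hydrogens` is a hypothetical name introduced only for this illustration.

```python
from rdkit import Chem

def n_hydrogens(smiles):
    """Return the total hydrogen count on the (single) nitrogen atom."""
    mol = Chem.MolFromSmiles(smiles)
    nitrogen = next(a for a in mol.GetAtoms() if a.GetSymbol() == 'N')
    return nitrogen.GetTotalNumHs()

# N is the start (donor) atom: the dative bond does not count towards its valence.
donor_h = n_hydrogens('N->[Cu]')
# N is the end (acceptor) atom: the dative bond does count towards its valence.
acceptor_h = n_hydrogens('N<-[Cu]')
print(donor_h, acceptor_h)
```

If the two counts differ, the parser has treated the two arrows as different bonds rather than as two spellings of the same one.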

Ring closures

The %(N) notation is supported for ring closures, where N can be anything from a single digit, %(N), up to five digits, %(NNNNN). Here is an example:

In [11]:
m = Chem.MolFromSmiles('C%(1000)OC%(1000)')
m.GetAtomWithIdx(0).IsInRing()

True

In [12]:
m.GetAtomWithIdx(2).IsInRing()

True
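As a quick cross-check, the sketch below compares the %(N) form against an ordinary single-digit ring closure. It assumes only that the ring-closure label is a bookkeeping device, so all three spellings of cyclopropane should yield the same canonical SMILES.

```python
from rdkit import Chem

# Three spellings of the same three-membered ring.
plain = Chem.MolToSmiles(Chem.MolFromSmiles('C1CC1'))
one_digit = Chem.MolToSmiles(Chem.MolFromSmiles('C%(1)CC%(1)'))
five_digit = Chem.MolToSmiles(Chem.MolFromSmiles('C%(10000)CC%(10000)'))

print(plain == one_digit == five_digit)
```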

Specifying atoms by atomic number
> The [#6] construct from SMARTS is supported in SMILES.
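A minimal sketch of what this looks like in practice, assuming the `[#6]` form behaves like any other bracket atom (so no implicit hydrogens are added). The example checks only the atomic numbers that come back from the parser.

```python
from rdkit import Chem

# Carbon and oxygen, each given by atomic number rather than by symbol.
by_number = Chem.MolFromSmiles('[#6][#8]')
atomic_nums = [atom.GetAtomicNum() for atom in by_number.GetAtoms()]
print(atomic_nums)
```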

Quadruple bonds
> The token $ can be used to represent quadruple bonds in SMILES and SMARTS.
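The following sketch illustrates the `$` token on a metal-metal bond. The choice of a rhenium dimer is an assumption made for illustration; any pair of atoms whose valence permits a bond order of four should behave the same way.

```python
from rdkit import Chem

# Two rhenium atoms joined by a quadruple bond.
dimer = Chem.MolFromSmiles('[Re]$[Re]')
bond = dimer.GetBondWithIdx(0)
is_quadruple = bond.GetBondType() == Chem.BondType.QUADRUPLE
print(bond.GetBondType(), bond.GetBondTypeAsDouble())
```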

CXSMILES/CXSMARTS extensions
> The RDKit supports parsing and writing a subset of the extended SMILES/SMARTS functionality introduced by ChemAxon [4].

The features which are parsed include:

- atomic coordinates ()

- atomic values $_AV:

- atomic labels/aliases $ (recognized aliases are _AP, star_e, Q_e, QH_p, AH_p, X_p, XH_p, M_p, MH_p, *)

- atomic properties atomprop

- coordinate/dative bonds C (these are translated into dative bonds)

- radicals ^

- enhanced stereo (these are converted into StereoGroups)

- linknodes LN

- variable/multi-center attachments m

- ring bond count specifications rb

- non-hydrogen substitution count specifications s

- unsaturation specification u

- wedged bonds (only when atomic coordinates are present): wU, wD

- wiggly bonds w

- double bond stereo (only for ring bonds) c, t, ctu

- SGroup Data SgD

- polymer SGroups Sg

- SGroup Hierarchy SgH
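Two of the parsed features listed above can be sketched briefly: atomic coordinates and enhanced stereo. The input strings are illustrative assumptions rather than examples from the original text. The first supplies one x,y,z triple per atom, which should end up in a conformer; the second places atom 1 in an OR group, which should appear as a StereoGroup.

```python
from rdkit import Chem

# Atomic coordinates: one x,y,z triple per atom, separated by semicolons.
with_coords = Chem.MolFromSmiles('CO |(0,0.75,0;0,-0.75,0)|')
n_conformers = with_coords.GetNumConformers()
y_of_first_atom = with_coords.GetConformer().GetAtomPosition(0).y

# Enhanced stereo: "o1:1" assigns atom 1 to OR group number 1.
with_stereo = Chem.MolFromSmiles('C[C@H](F)Cl |o1:1|')
groups = with_stereo.GetStereoGroups()
group_types = [group.GetGroupType() for group in groups]

print(n_conformers, y_of_first_atom, group_types)
```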

The features which are written by rdkit.Chem.rdmolfiles.MolToCXSmiles() and rdkit.Chem.rdmolfiles.MolToCXSmarts() (note the specialized writer functions) include:

- atomic coordinates

- atomic values

- atomic labels

- atomic properties

- dative bonds (only if dative bonds are not also being written to the SMILES/SMARTS)

- radicals

- enhanced stereo

- linknodes

- wedged bonds (only when atomic coordinates are also written)

- wiggly bonds

- double bond stereo (only for ring bonds)

- SGroup Data

- polymer SGroups

- SGroup Hierarchy

In [14]:
m = Chem.MolFromSmiles('OC')
m.GetAtomWithIdx(0).SetProp('p1','2')
m.GetAtomWithIdx(1).SetProp('p1','5')
m.GetAtomWithIdx(1).SetProp('p2','A1')
m.GetAtomWithIdx(0).SetProp('atomLabel','O1')
m.GetAtomWithIdx(1).SetProp('atomLabel','C2')
Chem.MolToCXSmiles(m)

'CO |$C2;O1$,atomProp:0.p1.5:0.p2.A1:1.p1.2|'

Reading molecule names

If the SMILES/SMARTS and the optional CXSMILES extensions are followed by whitespace and another string, the SMILES/SMARTS parsers will interpret that string as the molecule name:

In [16]:
m = Chem.MolFromSmiles('CO carbon monoxide')
m.GetProp('_Name')

'carbon monoxide'

In [17]:
m2 = Chem.MolFromSmiles('CO |$C2;O1$| carbon monoxide')
m2.GetAtomWithIdx(0).GetProp('atomLabel')

'C2'

In [18]:
m2.GetProp('_Name')

'carbon monoxide'

Name parsing can be disabled while still parsing the CXSMILES:

In [19]:
ps = Chem.SmilesParserParams()
ps.parseName = False
m3 = Chem.MolFromSmiles('CO |$C2;O1$| carbon monoxide',ps)
m3.HasProp('_Name')

0

In [20]:
m3.GetAtomWithIdx(0).GetProp('atomLabel')

'C2'

Note that if we disable CXSMILES parsing but pass in a string which includes CXSMILES, the CXSMILES will be interpreted as (part of) the name:

In [21]:
ps = Chem.SmilesParserParams()
ps.allowCXSMILES = False
m4 = Chem.MolFromSmiles('CO |$C2;O1$| carbon monoxide',ps)
m4.GetProp('_Name')

'|$C2;O1$| carbon monoxide'

Finally, if you disable parsing of both CXSMILES and names, then extra text in the SMILES/SMARTS string will result in errors:

In [22]:
ps = Chem.SmilesParserParams()
ps.allowCXSMILES = False
ps.parseName = False
m5 = Chem.MolFromSmiles('CO |$C2;O1$| carbon monoxide',ps)
m5 is None

[16:25:15] SMILES Parse Error: syntax error while parsing: CO |$C2;O1$| carbon monoxide
[16:25:15] SMILES Parse Error: Failed parsing SMILES 'CO |$C2;O1$| carbon monoxide' for input: 'CO |$C2;O1$| carbon monoxide'


True

In [23]:
m5 = Chem.MolFromSmiles('CO carbon monoxide',ps)
m5 is None

[16:25:53] SMILES Parse Error: syntax error while parsing: CO carbon monoxide
[16:25:53] SMILES Parse Error: Failed parsing SMILES 'CO carbon monoxide' for input: 'CO carbon monoxide'


True
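The four combinations of the two parser flags shown above can be collected in one place. This is a sketch only: the helper `parse` is a hypothetical name introduced for illustration, and the call to `RDLogger.DisableLog` is there only to keep the expected parse errors out of the output.

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog('rdApp.error')  # silence the expected parse errors

def parse(smiles, allow_cx, parse_name):
    """Parse with the given flags; return the name, None, or a failure marker."""
    ps = Chem.SmilesParserParams()
    ps.allowCXSMILES = allow_cx
    ps.parseName = parse_name
    mol = Chem.MolFromSmiles(smiles, ps)
    if mol is None:
        return 'parse failure'
    return mol.GetProp('_Name') if mol.HasProp('_Name') else None

text = 'CO |$C2;O1$| carbon monoxide'
# Keys are (allowCXSMILES, parseName) pairs.
results = {
    (allow_cx, parse_name): parse(text, allow_cx, parse_name)
    for allow_cx in (True, False)
    for parse_name in (True, False)
}
RDLogger.EnableLog('rdApp.error')  # restore error logging

for flags, outcome in results.items():
    print(flags, repr(outcome))
```

Returning a marker string instead of raising keeps the table easy to print. Code that needs to distinguish failures more carefully would check for `None` directly, as the cells above do.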