IvanChernyshov/cTopo

cTopo

cTopo is a small Python package for analyzing the chemical space of multidentate ligands (and, optionally, coordination complexes). It focuses on concepts that matter for coordination chemistry (donor atoms, ligand skeletons, and reduced topologies) and provides tooling to:

  • organize ligand datasets hierarchically (denticity → topology → skeleton → ligand),
  • visualize these abstractions (SVG/SMILES),
  • compute role-aware fingerprints using donor/skeleton/substituent atom typing,
  • extract ligands from datasets of metal complexes.

The core idea: classic “organic” chemical space maps (2D embeddings of ECFP) are built for organic molecules in general, whereas ligand behavior in complexes is heavily controlled by the coordination cage formed by the metal and the ligand skeleton. cTopo makes that cage explicit and measurable.


Key concepts

Donor atoms

A ligand is defined by its molecule plus a set of donor atom indices (the atoms that bind the metal, e.g., N/O/S/P atoms). In cTopo, donors are either:

  • explicitly provided (via ligand_from_mol(..., donor_atoms=...)), or
  • marked in SMILES using atom-map numbers (ligand_from_smiles).
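To make the atom-map convention concrete, here is a minimal standard-library sketch that lists the mapped bracket atoms of a SMILES string. It is only an illustration: `mapped_atoms` is a hypothetical helper, not part of cTopo, and a regular expression cannot give atom indices, so real code should rely on a proper SMILES parser.

```python
import re

# Matches bracket atoms that carry an atom-map number, e.g. "[NH2:1]".
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def mapped_atoms(smiles: str) -> dict:
    """Return {map_number: bracket-atom text} for every mapped atom.

    Hypothetical helper for illustration only; it does not reproduce
    cTopo's own parsing.
    """
    return {int(num): body for body, num in MAPPED_ATOM.findall(smiles)}


donors = mapped_atoms("[NH2:1]CC[NH:2]CC[NH2:3]")
print(donors)  # {1: 'NH2', 2: 'NH', 3: 'NH2'}
```

Atoms without a map number (the four carbons above) are simply not reported, which mirrors the idea that only marked atoms count as donors.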

Skeleton

The ligand skeleton consists of the donor atoms plus the atoms that connect donors to each other. Formally, it is the union of atoms on the shortest paths between all donor pairs.
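The definition above can be sketched on a plain adjacency dictionary. This is a minimal illustration, not cTopo's implementation: the function names and the graph encoding are assumptions. It uses the fact that an atom x lies on some shortest path between donors a and b exactly when dist(a, x) + dist(x, b) = dist(a, b).

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Bond-count distances from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def skeleton_atoms(adj, donors):
    """Union of atoms on shortest paths between all donor pairs (sketch)."""
    dist = {d: bfs_distances(adj, d) for d in donors}
    skeleton = set(donors)
    for a, b in combinations(sorted(donors), 2):
        if b not in dist[a]:
            continue  # the two donors are not connected
        for atom in adj:
            if (atom in dist[a] and atom in dist[b]
                    and dist[a][atom] + dist[b][atom] == dist[a][b]):
                skeleton.add(atom)
    return skeleton


# Heavy atoms of N(CCN)(CCN) with a methyl on the central nitrogen:
# 0 N, 1 C, 2 C, 3 N, 4 C (methyl), 5 C, 6 C, 7 N
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4, 5},
       4: {3}, 5: {3, 6}, 6: {5, 7}, 7: {6}}
skeleton = skeleton_atoms(adj, donors={0, 3, 7})
print(sorted(skeleton))  # [0, 1, 2, 3, 5, 6, 7]
```

The methyl carbon (atom 4) is not on any donor-to-donor path, so it falls outside the skeleton and would count as a substituent.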

Topology

The topology is a reduced representation of the skeleton where donor-to-donor paths are contracted to the shortest non-reducible form (removing purely “length” information while keeping branching/connection patterns).

For many denticities this produces only a few common topologies (e.g. for tridentates: “linear” vs “tripod”), which makes it ideal for dataset overview.
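One plausible reading of this reduction, shown purely as a sketch, is to repeatedly bypass non-donor atoms that have exactly two neighbours. The rule, the function names, and the graph encoding below are assumptions; cTopo's actual contraction may differ (for example in how it treats rings or spacer atoms).

```python
def from_edges(edges):
    """Build an undirected adjacency dict from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def contract_topology(adj, donors):
    """Bypass degree-2 non-donor atoms until none can be removed (assumed rule)."""
    graph = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in sorted(graph):
            if node in donors or len(graph[node]) != 2:
                continue
            left, right = graph[node]
            if right in graph[left]:
                continue  # would duplicate an existing edge; keep the atom
            # Connect the two neighbours directly and drop the atom.
            graph[left].discard(node)
            graph[right].discard(node)
            graph[left].add(right)
            graph[right].add(left)
            del graph[node]
            changed = True
            break  # rescan the modified graph
    return graph


# Skeleton of diethylenetriamine: N0-C1-C2-N3-C4-C5-N6
linear = contract_topology(
    from_edges([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]),
    donors={0, 3, 6},
)
# Three donor arms meeting at a non-donor branching atom (atom 0)
tripod = contract_topology(
    from_edges([(0, 1), (1, 2), (0, 3), (3, 4), (0, 5), (5, 6)]),
    donors={2, 4, 6},
)
print(linear)
print(tripod)
```

Under this rule the first skeleton collapses to a three-node chain of donors ("linear"), while the second keeps its branching atom with three donor neighbours ("tripod"). Chain lengths are gone, but the connection pattern survives.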


Installation

With pip (released package)

pip install ctopo

RDKit note

ctopo depends on RDKit and NetworkX. If RDKit isn’t already available in your environment, the most reliable route is Conda:

conda create -n ctopo python=3.10 -y
conda activate ctopo
conda install -c conda-forge rdkit networkx -y
pip install ctopo

Development install

pip install -e .

Quickstart

1) Build a ligand (donors from atom-map numbers)

ligand_from_smiles treats atoms that carry atom-map numbers (:1, :2, …) as donors.

from ctopo import ligand_from_smiles

# Ethylenediamine (bidentate) with both nitrogens marked as donors
lig = ligand_from_smiles("[NH2:1]CC[NH2:2]")

print(lig.denticity)           # 2
print(sorted(lig.donor_atoms)) # donor atom indices in the RDKit molecule

2) Visualize ligand / skeleton / topology (SVG)

from ctopo import ligand_from_smiles

lig = ligand_from_smiles("[NH2:1]CC[NH:2]CC[NH2:3]")  # diethylenetriamine (example)

v_lig = lig.visualize_ligand()
v_skel = lig.visualize_skeleton()
v_topo = lig.visualize_topology()

# In a notebook you can do:
# from IPython.display import SVG, display
# display(SVG(v_topo.svg))

Each visualize_*() returns a simple container with:

  • smiles (for that abstraction),
  • svg (a depiction suitable for HTML reports or dataset browsers).

Dataset “chemical space” as a hierarchy

Build a tree grouped by abstraction levels and export it to a single self-contained HTML.

from pathlib import Path
from ctopo import ligand_from_smiles
from ctopo.trees import build_ligand_tree, tree_to_html

ligands = [
    ligand_from_smiles("[NH2:1]CC[NH2:2]"),
    ligand_from_smiles("[NH2:1]CC[NH:2]CC[NH2:3]"),
    ligand_from_smiles("[O-:1]C(=O)CC(=O)[O-:2]"),  # example bidentate carboxylate
]

tree = build_ligand_tree(ligands)
html = tree_to_html(tree)

Path("ligand_tree.html").write_text(html, encoding="utf-8")
print("Wrote ligand_tree.html")

Default grouping levels are:

denticity → topology → skeleton → skeleton+bonds → skeleton+donors+bonds → ligand

This is designed to answer: “What does my dataset actually contain?” in a coordination-chemistry-relevant way.
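The grouping idea itself is simple and can be sketched with nested dictionaries. The records and their labels below are hand-written, hypothetical stand-ins (cTopo derives the real keys from structure), and `build_tree` is not the library's `build_ligand_tree`.

```python
from collections import defaultdict


def build_tree(records, levels):
    """Nest records into dicts keyed by each level in turn; leaves are lists."""
    if not levels:
        return list(records)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[levels[0]]].append(rec)
    return {key: build_tree(group, levels[1:]) for key, group in groups.items()}


# Hypothetical labels for illustration only.
records = [
    {"name": "en", "denticity": 2, "topology": "chain-2", "skeleton": "NCCN"},
    {"name": "pn", "denticity": 2, "topology": "chain-2", "skeleton": "NCCCN"},
    {"name": "dien", "denticity": 3, "topology": "linear", "skeleton": "NCCNCCN"},
]

tree = build_tree(records, ["denticity", "topology", "skeleton"])
for denticity, topologies in sorted(tree.items()):
    for topology, skeletons in topologies.items():
        print(denticity, topology, sorted(skeletons))
```

Each extra level only refines the groups of the level above, so coarse questions (how many tridentates?) and fine ones (which skeletons share a topology?) are answered by the same structure.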


Complexes: encoding convention (dative bonds)

cTopo can also construct a Complex if coordination is represented using RDKit dative bonds:

  • coordination bonds must be dative,
  • each dative bond must be oriented donor → metal (metal is the end atom),
  • a metal center must not have non-dative bonds to non-metals (metal–metal bonds may exist but must be non-dative).

from ctopo import complex_from_smiles

# Minimal sketch example (exact SMILES depends on your RDKit encoding)
# Donor -> metal must be a dative bond.
cx = complex_from_smiles("[NH3]->[Cu+2]<-[NH3]")

print(cx.metal_atoms)
print(cx.donor_atoms)

If you already have an RDKit molecule and metal indices, you can also build directly via complex_from_mol(mol, metal_atoms=...).
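The three rules of the encoding convention can be expressed as a small validator over a bond list. This sketch does not use RDKit or cTopo; the `(begin, end, kind)` tuple encoding and the function name are assumptions made for illustration, with a dative bond pointing from `begin` to `end`.

```python
def check_dative_convention(bonds, metals):
    """Return a list of violations (empty if the encoding follows the rules).

    bonds:  iterable of (begin, end, kind); a dative bond points begin -> end
    metals: set of atom indices treated as metal centers
    """
    problems = []
    for begin, end, kind in bonds:
        begin_metal, end_metal = begin in metals, end in metals
        if begin_metal and end_metal:
            # Metal-metal bonds may exist but must be non-dative.
            if kind == "dative":
                problems.append(f"metal-metal bond {begin}-{end} must not be dative")
        elif begin_metal or end_metal:
            if kind != "dative":
                problems.append(f"metal-ligand bond {begin}-{end} must be dative")
            elif not end_metal:
                problems.append(f"dative bond {begin}->{end} must end at the metal")
    return problems


metals = {1}
good = [(0, 1, "dative"), (2, 1, "dative")]   # N -> Cu <- N
bad = [(1, 0, "dative"), (2, 1, "single")]    # wrong direction, wrong bond type

print(check_dative_convention(good, metals))
print(check_dative_convention(bad, metals))
```

Running a check like this before building complexes makes encoding problems visible early, instead of surfacing later as missing donors.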


Role-aware fingerprints (donor / skeleton / substituent)

cTopo assigns each atom an AtomType such as:

  • DONOR
  • SKELETON
  • SUBSTITUENT
  • CENTER (metal atoms, complexes only)

You can then compute fingerprints that focus on what matters (e.g. skeleton-only with skeleton bond types preserved).

from ctopo import ligand_from_smiles
from ctopo.descriptors import MorganSpec, make_fingerprinter, DEFAULT_PROPERTIES
from ctopo.distances import tanimoto_similarity_bits

lig1 = ligand_from_smiles("[NH2:1]CC[NH2:2]")
lig2 = ligand_from_smiles("[NH2:1]CCC[NH2:2]")

fp = make_fingerprinter(
    kind="morgan",
    spec=MorganSpec(radius=2, use_chirality=False),
    atomic_properties=DEFAULT_PROPERTIES,
    graph_view="skeleton",        # focus on the cage-defining part
    bond_mode="skeleton_only",    # keep bond types only in skeleton
    output="bits",
    fp_size=2048,
)

f1 = fp(lig1)
f2 = fp(lig2)

print(tanimoto_similarity_bits(f1, f2))
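For reference, Tanimoto similarity on bit fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below computes it on sets of on-bit positions; `tanimoto` is a stand-alone illustration, and the input format expected by `tanimoto_similarity_bits` may be different.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


fp_a = {1, 5, 9, 12}
fp_b = {1, 5, 7}
print(tanimoto(fp_a, fp_b))  # 2 shared bits / 5 bits in total = 0.4
```

A value of 1.0 means identical on-bits and 0.0 means no shared bits, so restricting the fingerprint to skeleton atoms makes this number reflect cage similarity rather than substituent similarity.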

Extract ligands from complexes

To go from a dataset of complexes to ligands, extract the ligands of each complex:

from ctopo import complex_from_smiles
from ctopo.fragments import ligands_from_complex

cx = complex_from_smiles("[NH3]->[Cu+2]<-[NH3]")  # example sketch
ligs = ligands_from_complex(cx)

print(len(ligs))

Note: bridging ligands are not handled specially in v1; removing metals may split a bridging ligand into multiple fragments.
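The metal-removal idea mentioned in the note can be sketched as a connected-components search that skips metal atoms. The graph encoding and the function name are assumptions, and like the note says of v1, the sketch makes no attempt to treat bridging ligands specially.

```python
def ligand_fragments(adj, metals):
    """Split a complex into (fragment atoms, donor atoms) after removing metals."""
    seen = set(metals)  # never walk through a metal atom
    fragments = []
    for start in sorted(adj):
        if start in seen:
            continue
        fragment, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            fragment.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        # Donors are the fragment atoms that were bonded to a metal.
        donors = {atom for atom in fragment if adj[atom] & metals}
        fragments.append((fragment, donors))
    return fragments


# Cu (atom 0) bound to ethylenediamine (N1-C2-C3-N4) and ammonia (N5)
adj = {0: {1, 4, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {0, 3}, 5: {0}}
fragments = ligand_fragments(adj, metals={0})
for atoms, donors in fragments:
    print(sorted(atoms), sorted(donors))
```

Each fragment keeps track of which of its atoms touched the metal, which is exactly the donor set needed to rebuild a ligand with the right denticity.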


Project status

cTopo is a research-oriented library aimed at coordination-chemistry workflows. The API is compact and intended to support:

  • ligand dataset browsing and reporting,
  • reproducible topology/skeleton extraction,
  • feature engineering for ML on complexes/ligands.

License

MIT License (see LICENSE.md).


Citing

If you use cTopo in academic work, please cite the associated paper (citation details to be added once finalized).
