A lightweight Python library and CLI tool for converting PDB Chemical Component Dictionary (CCD) files to RDKit molecule objects.

N283T/ccd2rdmol

ccd2rdmol

CI codecov PyPI version License: MIT Python 3.10+

A lightweight Python library and CLI tool for converting PDB Chemical Component Dictionary (CCD) files to RDKit molecule objects.

This project is a simplified implementation inspired by pdbeccdutils, focusing solely on CCD to RDKit conversion with 3D conformer support.

Features

  • Fast CIF parsing using gemmi
  • Conversion to RDKit molecule objects
  • Support for both Ideal and Model 3D conformers
  • Automatic metal bond to dative bond conversion
  • Stereochemistry assignment from 3D coordinates
  • Deuterium isotope handling
  • Degenerate conformer detection and rejection
  • CLI tool with rich output

Installation

# Library only
uv add ccd2rdmol

# With CLI support (quoted so shells such as zsh do not expand the brackets)
uv add "ccd2rdmol[cli]"

Or with pip:

# Library only
pip install ccd2rdmol

# With CLI support
pip install "ccd2rdmol[cli]"

For development:

git clone https://github.com/N283T/ccd2rdmol.git
cd ccd2rdmol
uv sync  # CLI is included in dev dependencies

Quick Start

from ccd2rdmol import read_ccd_file

result = read_ccd_file("ATP.cif")
print(f"Atoms: {result.mol.GetNumAtoms()}")
print(f"Sanitized: {result.sanitized}")

Usage

Reading from a CIF File

from ccd2rdmol import read_ccd_file

# Default: sanitize, add conformers, remove hydrogens
result = read_ccd_file("ATP.cif")
mol = result.mol

print(f"Atoms: {mol.GetNumAtoms()}")
print(f"Bonds: {mol.GetNumBonds()}")
print(f"Conformers: {mol.GetNumConformers()}")  # 2 (IDEAL + MODEL)
print(f"Sanitized: {result.sanitized}")

# With options
result = read_ccd_file(
    "ATP.cif",
    sanitize_mol=True,      # Sanitize molecule (default: True)
    add_conformers=True,    # Add 3D conformers (default: True)
    remove_hydrogens=True,  # Remove hydrogens (default: True)
)

Reading from a gemmi CIF Block

import gemmi
from ccd2rdmol import read_ccd_block

doc = gemmi.cif.read("components.cif")
for block in doc:
    result = read_ccd_block(block)
    print(f"{block.name}: {result.mol.GetNumAtoms()} atoms")

Low-Level API: chemcomp_to_mol

import gemmi
from ccd2rdmol import chemcomp_to_mol

doc = gemmi.cif.read("ATP.cif")
block = doc.sole_block()
cc = gemmi.make_chemcomp_from_block(block)

result = chemcomp_to_mol(
    cc, block,
    sanitize_mol=False,       # Skip sanitization
    add_conformers=True,
    remove_hydrogens=False,   # Keep all hydrogens
)

Generating SMILES and InChI

from rdkit import Chem
from rdkit.Chem.inchi import MolToInchi
from ccd2rdmol import read_ccd_file

result = read_ccd_file("ATP.cif")

smiles = Chem.MolToSmiles(result.mol)
inchi = MolToInchi(result.mol)

print(f"SMILES: {smiles}")
print(f"InChI: {inchi}")

Accessing Conformer Coordinates

from ccd2rdmol import read_ccd_file

result = read_ccd_file("ATP.cif", add_conformers=True)
mol = result.mol

for conf in mol.GetConformers():
    name = conf.GetProp("name")  # "IDEAL" or "MODEL"
    print(f"\n{name} conformer:")
    for i in range(mol.GetNumAtoms()):
        pos = conf.GetAtomPosition(i)
        atom = mol.GetAtomWithIdx(i)
        print(f"  {atom.GetSymbol()} ({pos.x:.3f}, {pos.y:.3f}, {pos.z:.3f})")

Handling Conversion Errors

from ccd2rdmol import read_ccd_file

result = read_ccd_file("complex_molecule.cif")

if result.errors:
    print("Errors:", result.errors)

if result.warnings:
    print("Warnings:", result.warnings)

if not result.sanitized:
    print("Sanitization failed — molecule may have valence issues")
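
When many files are processed, the three checks above can be folded into one triage helper. The following is a minimal sketch, not part of ccd2rdmol: triage and StubResult are hypothetical names, and the stand-in class only mirrors the documented errors, warnings, and sanitized fields so that the example runs without a CIF file. The priority order (errors first, then sanitization, then warnings) is a design choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StubResult:
    """Hypothetical stand-in mirroring the documented ConversionResult fields."""

    mol: object = None
    sanitized: bool = True
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def triage(result) -> str:
    """Classify a result as 'failed', 'unsanitized', 'warned', or 'ok'."""
    if result.errors:
        return "failed"
    if not result.sanitized:
        return "unsanitized"
    if result.warnings:
        return "warned"
    return "ok"


print(triage(StubResult()))
print(triage(StubResult(sanitized=False)))
print(triage(StubResult(errors=["example error"])))
```

Since triage only reads attributes, it should accept a real ConversionResult as well, provided the fields match the ones documented here.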

API Reference

Functions

read_ccd_file(path, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult

Read a CCD CIF file and convert it to an RDKit molecule.

Parameter         Type  Default  Description
path              str   -        Path to CIF file
sanitize_mol      bool  True     Sanitize the molecule (fix valence, kekulize)
add_conformers    bool  True     Add IDEAL and MODEL 3D conformers
remove_hydrogens  bool  True     Remove hydrogen atoms from the molecule

Raises FileNotFoundError if the file does not exist.

read_ccd_block(cif_block, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult

Convert a gemmi.cif.Block to an RDKit molecule. Takes the same keyword parameters as read_ccd_file, but accepts a pre-parsed CIF block instead of a file path.

chemcomp_to_mol(cc, cif_block, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult

Convert a gemmi.ChemComp and its gemmi.cif.Block to an RDKit molecule. This is the lowest-level API, for maximum control.

Data Classes

ConversionResult

Frozen dataclass returned by all conversion functions.

Field      Type       Description
mol        Chem.Mol   RDKit molecule object
sanitized  bool       Whether sanitization succeeded
errors     list[str]  Errors encountered during conversion
warnings   list[str]  Warnings (e.g., missing conformer data)
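
Because ConversionResult is a frozen dataclass, its fields cannot be reassigned after creation. The sketch below shows that behavior with a hypothetical stand-in class, ResultSketch, which only mirrors the fields documented above; the real class may differ in detail.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in with the documented ConversionResult fields."""

    mol: object
    sanitized: bool
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


result = ResultSketch(mol=None, sanitized=False)

try:
    result.sanitized = True  # reassignment is blocked on a frozen dataclass
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

# dataclasses.replace builds a modified copy and leaves the original intact
updated = replace(result, sanitized=True)

print(blocked, result.sanitized, updated.sanitized)  # True False True
```

Note that frozen only blocks attribute reassignment. The errors and warnings lists themselves remain ordinary mutable lists.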

SanitizationResult

Frozen dataclass returned by sanitize().

Field    Type      Description
mol      Chem.Mol  Sanitized molecule (always a copy)
success  bool      Whether sanitization succeeded

How It Works

The conversion pipeline:

  1. Parse CIF: gemmi reads the CIF file and produces a ChemComp (atoms, bonds, charges) and a cif.Block (coordinate data).
  2. Build molecule: Atoms are added to an RDKit RWMol with element types, charges, and isotope labels (deuterium is assigned isotope 2). Bonds are mapped from gemmi bond types to RDKit bond types via BOND_TYPE_MAP.
  3. Set hydrogen flags: Atoms without explicit hydrogen neighbors are flagged NoImplicit=True so that RDKit does not add implicit hydrogens.
  4. Add conformers: IDEAL and MODEL 3D coordinates are read from the CIF coordinate columns. A conformer is rejected if all of its coordinates are missing or if it is degenerate (more than one atom at the origin).
  5. Sanitize: The sanitizer fixes valence errors caused by metal-ligand bonds by converting those bonds to dative bonds. It uses Chem.DetectChemistryProblems() to identify problematic atoms and fixes them iteratively (up to 11 attempts). The input molecule is never modified; the sanitizer works on a copy.
  6. Assign stereochemistry: AssignStereochemistryFrom3D is called with the IDEAL conformer when it is available, otherwise with the MODEL conformer.
  7. Remove hydrogens: Optionally strips hydrogen atoms from the final molecule.
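
The conformer rejection rule in step 4 can be sketched in plain Python. This illustrates the rule as described above and is not the library's code: is_usable_conformer is a hypothetical name, missing coordinates are modeled as None, and the exact comparison against (0.0, 0.0, 0.0) for "at the origin" is an assumption. How partially missing coordinates are treated is not stated above, so this sketch simply ignores the missing entries.

```python
from typing import Optional, Sequence, Tuple

Coord = Optional[Tuple[float, float, float]]


def is_usable_conformer(coords: Sequence[Coord]) -> bool:
    """Return False for all-missing or degenerate coordinate sets."""
    present = [c for c in coords if c is not None]
    if not present:
        return False  # every coordinate is missing
    at_origin = sum(1 for c in present if c == (0.0, 0.0, 0.0))
    return at_origin <= 1  # more than one atom at the origin is degenerate


print(is_usable_conformer([(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]))  # True
print(is_usable_conformer([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))  # False
print(is_usable_conformer([None, None]))  # False
```

A single atom at the origin is accepted because a real structure can legitimately place one atom there; a second one is taken as a sign that the coordinates were never filled in.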

Comparison with pdbeccdutils

              ccd2rdmol                                    pdbeccdutils
Focus         CCD → RDKit conversion only                  Full CCD processing toolkit
Dependencies  gemmi + rdkit                                gemmi + rdkit + scipy + numpy + ...
Scope         Single molecules from CIF                    Depictions, scaffolds, fragments, PDB integration
Install size  Minimal                                      ~50+ transitive dependencies
Use case      "I just need an RDKit Mol from a CCD entry"  Full cheminformatics pipeline

If you only need to convert CCD entries to RDKit molecules, ccd2rdmol provides a simpler, lighter alternative.

CLI

Note: The CLI requires extra dependencies. Install them with pip install "ccd2rdmol[cli]".

# Output SMILES to stdout
ccd2rdmol convert ATP.cif

# Write to MOL file
ccd2rdmol convert ATP.cif -o ATP.mol

# Write to SDF format
ccd2rdmol convert ATP.cif -o ATP.sdf

# Output InChI
ccd2rdmol convert ATP.cif -f inchi

# Keep hydrogen atoms
ccd2rdmol convert ATP.cif --keep-hydrogens

# Show verbose information
ccd2rdmol convert ATP.cif -v

# Show molecule information only
ccd2rdmol info ATP.cif

CLI Options

ccd2rdmol convert [OPTIONS] INPUT_FILE

Arguments:
  INPUT_FILE  Input CCD CIF file path [required]

Options:
  -o, --output PATH       Output file path (.mol, .sdf)
  -f, --format TEXT       Output format (mol, sdf, smiles, inchi)
  --no-sanitize           Skip sanitization step
  --no-conformers         Skip adding 3D conformers
  -H, --keep-hydrogens    Keep hydrogen atoms
  -v, --verbose           Show detailed information
  --help                  Show help message
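
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the options listed above; build_convert_args is a hypothetical helper, not something shipped with ccd2rdmol, and it builds the command without running it.

```python
from typing import List, Optional


def build_convert_args(
    input_file: str,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
    keep_hydrogens: bool = False,
    sanitize: bool = True,
    conformers: bool = True,
) -> List[str]:
    """Build an argument list for `ccd2rdmol convert` (hypothetical helper)."""
    args = ["ccd2rdmol", "convert", input_file]
    if output is not None:
        args += ["-o", output]
    if fmt is not None:
        args += ["-f", fmt]
    if keep_hydrogens:
        args.append("--keep-hydrogens")
    if not sanitize:
        args.append("--no-sanitize")
    if not conformers:
        args.append("--no-conformers")
    return args


print(build_convert_args("ATP.cif", output="ATP.sdf", keep_hydrogens=True))
```

With the CLI extra installed, the resulting list can be passed to subprocess.run(args, check=True). Passing a list rather than a single string avoids shell quoting issues with file names.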

Development

# Install dev dependencies
uv sync

# Format code
uv run ruff format src/ tests/

# Lint
uv run ruff check src/ tests/

# Lint and auto-fix
uv run ruff check --fix src/ tests/

# Type check
uv run ty check src/

# Run tests
uv run pytest tests/ -v

# Run tests with coverage
uv run pytest tests/ -v --cov=ccd2rdmol --cov-report=term-missing

# Multi-version testing with nox (3.10, 3.11, 3.12, 3.13, 3.14)
uv run nox

Acknowledgments

This project is inspired by and built upon concepts from pdbeccdutils by PDBe (Protein Data Bank in Europe). Test data files are derived from the pdbeccdutils test suite.

We thank the PDBe team for their excellent work on chemical component processing tools.

License

MIT License

Test data files in tests/data/ are from pdbeccdutils (Apache-2.0 License).
