# CafChem tools: docking and rescoring with the UMA MLIP

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChem/blob/main/notebooks/Rescore_Docking_UMA_CafChem.ipynb)

## This notebook allows you to:
- Dock a single SMILES string, a list of SMILES strings, or a CSV file with SMILES in one column.
- Save poses as SDF files.
- Calculate the interaction energy between the ligand and the protein using Meta's UMA MLIP.

## Requirements:
- This notebook installs deepchem, dockstring, Open Babel, fairchem-core and py3Dmol.
- It pulls the CafChem tools from GitHub.
- You need an HF_TOKEN set as a secret to access the UMA MLIP.

# Set-up

This block installs and loads all needed modules and libraries.

### Install a few libraries

In [1]:
! pip install deepchem
! pip install dockstring
! pip install openbabel-wheel

Collecting deepchem
  Downloading deepchem-2.8.0-py3-none-any.whl.metadata (2.0 kB)
Collecting rdkit (from deepchem)
  Downloading rdkit-2025.3.5-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.1 kB)
Downloading deepchem-2.8.0-py3-none-any.whl (1.0 MB)
Downloading rdkit-2025.3.5-cp311-cp311-manylinux_2_28_x86_64.whl (36.3 MB)
Installing collected packages: rdkit, deepchem
Successfully installed deepchem-2.8.0 rdkit-2025.3.5
Collecting dockstring
  Downloading dockstring-0.3.4-py3-none-any.whl.metadata (19 kB)
Downloading dockstring-0.3.4-py3-none-any.whl (4.4 MB)
Installing collected packages: dockstring
Successfully ins

In [2]:
! pip install py3Dmol
! pip install fairchem-core

Collecting py3Dmol
  Downloading py3dmol-2.5.2-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading py3dmol-2.5.2-py2.py3-none-any.whl (7.2 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.5.2
Collecting fairchem-core
  Downloading fairchem_core-2.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting ase-db-backends>=0.10.0 (from fairchem-core)
  Downloading ase_db_backends-0.10.0-py3-none-any.whl.metadata (600 bytes)
Collecting ase>=3.25.0 (from fairchem-core)
  Downloading ase-3.25.0-py3-none-any.whl.metadata (4.2 kB)
Collecting clusterscope (from fairchem-core)
  Downloading clusterscope-0.0.10-py3-none-any.whl.metadata (3.1 kB)
Collecting e3nn>=0.5 (from fairchem-core)
  Downloading e3nn-0.5.6-py3-none-any.whl.metadata (5.4 kB)
Collecting hydra-core (from fairchem-core)
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting lmdb (from fairchem-core)
  Downloading lmdb-1.7.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadat

### Import libraries, pull CafChem from Github

In [13]:
!git clone https://github.com/MauricioCafiero/CafChem.git

Cloning into 'CafChem'...
remote: Enumerating objects: 610, done.
remote: Counting objects: 100% (187/187), done.
remote: Compressing objects: 100% (142/142), done.
remote: Total 610 (delta 134), reused 45 (delta 45), pack-reused 423 (from 1)
Receiving objects: 100% (610/610), 40.80 MiB | 33.27 MiB/s, done.
Resolving deltas: 100% (352/352), done.


In [4]:
import torch
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from google.colab import files
from fairchem.core import FAIRChemCalculator, pretrained_mlip
import CafChem.CafChemReDock as ccr

cpuCount = os.cpu_count()
print(cpuCount)

Instructions for updating:
experimental_relax_shapes is deprecated, use reduce_retracing instead


2


## Set-up Fairchem
- Must have HF_TOKEN saved as a secret
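
One way to make the token available is sketched below. It assumes the secret is named HF_TOKEN and that notebook access to it has been enabled in the Colab secrets panel. The helper name `get_hf_token` is hypothetical, and the fallback to an environment variable is an addition for running outside Colab.

```python
import os


def get_hf_token():
    """Return the Hugging Face token, or None if it cannot be found."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        return userdata.get("HF_TOKEN")
    except Exception:
        # Outside Colab, or if the secret is missing or not shared with
        # this notebook, fall back to an environment variable.
        return os.environ.get("HF_TOKEN")


token = get_hf_token()
if token is None:
    print("HF_TOKEN not found; the UMA checkpoint download will likely fail.")
else:
    # huggingface_hub reads the token from this environment variable.
    os.environ["HF_TOKEN"] = token
```

Access to the UMA checkpoints must also have been granted to the account that owns the token.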

In [5]:
device = "cuda" if torch.cuda.is_available() else "cpu"

predictor = pretrained_mlip.get_predict_unit("uma-s-1", device=device)
calculator = FAIRChemCalculator(predictor, task_name="omol")
model = "UMA-OMOL"

checkpoints/uma-s-1.pt:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

iso_atom_elem_refs.yaml:   0%|          | 0.00/9.00k [00:00<?, ?B/s]

# Calculations

## Dock molecules
- Available tools are ccr.dock_dataframe, ccr.dock_list and ccr.dock_smiles.
- Each takes the SMILES input (a CSV filename, a list, or a single SMILES string, respectively), the target protein, and the number of CPU cores to use. For ccr.dock_dataframe, you must also provide the key for the SMILES column in the CSV file.
- XYZ structures can be visualized with the ccr.visualize_molecule tool, which accepts an XYZ string as its argument. The string can be read directly from an XYZ file, as seen below.
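
A minimal sketch of preparing the input for the CSV and list routes, using only the standard library. The file location and the extra "name" column are illustrative assumptions; only the "smiles" column is needed by the cells below.

```python
import csv
import os
import tempfile

# Raw strings (r"...") keep the backslashes that mark double-bond
# stereochemistry in SMILES from being read as Python escape sequences.
molecules = {
    "resveratrol": r"Oc1ccc(cc1)\C=C\c2cc(O)cc(O)c2",
    "fluorobenzene": "c1cc(F)ccc1",
}

# Hypothetical location; the notebook cells use "file.csv" in the working directory.
csv_path = os.path.join(tempfile.mkdtemp(), "file.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "smiles"])  # "smiles" is the column key used below
    for name, smiles in molecules.items():
        writer.writerow([name, smiles])

# Read the column back as a plain list, the form passed to ccr.dock_list.
with open(csv_path, newline="") as f:
    smiles_list = [row["smiles"] for row in csv.DictReader(f)]

print(smiles_list)
```

The same list can be obtained with pandas, as the second docking cell does.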

In [None]:
scores = ccr.dock_dataframe("file.csv", "HMGCR", cpuCount, "smiles")
print(scores)

Docking 1 molecules in HMGCR.
Docking molecule 1.
SDF file written for score -4.5
[-4.5]


In [None]:
df = pd.read_csv("file.csv")
smiles_list = df["smiles"].tolist()
scores = ccr.dock_list(smiles_list,"HMGCR",cpuCount)
print(scores)

Docking 1 molecules in HMGCR.
Docking molecule 1.
SDF file written for score -4.5
[-4.5]


In [9]:
# Raw strings keep the SMILES backslashes from being treated as escape sequences
statin = r"OC(=O)C[C@H](O)C[C@H](O)\C=C\c1c(C(C)C)nc(N(C)S(=O)(=O)C)nc1c2ccc(F)cc2"
resveratrol = r"Oc1ccc(cc1)\C=C\c2cc(O)cc(O)c2"
score = ccr.dock_smiles(resveratrol,"DRD2",cpuCount)
print(f"score: {score}")

Docking molecule.
SDF file written for score -8.5
score: -8.5


## Calculate interaction energies between a docking pose and the protein using Meta's UMA MLIP
- If CafChem has an XYZ QM active site prepared for the protein, then the interaction between a ligand (SDF file) and the protein active site (from the library) may be calculated using Meta's UMA MLIP.
- Supply as arguments the name of the SDF file (without .sdf), the protein information (in the form ccr.[your protein]_data), the ASE calculator, and the charge and spin multiplicity of the ligand. The final True/False argument in the calls below switches geometry optimization on or off.
- Returns a list of XYZ strings for the ligands in the input SDF files.
- The XYZ strings may be visualized with the ccr.visualize_molecule tool, which accepts the XYZ string as its argument.
- The complex XYZ file can be transformed into a G16 counterpoise input file using ccr.complexG16, which takes as its arguments the complex XYZ file, the target object, the ligand charge and the ligand spin multiplicity.
- Test data: docking rosuvastatin ("OC(=O)C[C@H](O)C[C@H](O)\C=C\c1c(C(C)C)nc(N(C)S(=O)(=O)C)nc1c2ccc(F)cc2") should give a score of -8.1. Passing that SDF file into the uma_interaction function with optimization on should give an energy of -285 kcal/mol. Making a G16 file and running it as is (wB97XD/def2-TZVPP) should give a CP-corrected interaction energy of -275 kcal/mol, a difference of only 3.5%.
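
The 3.5% figure in the test data can be checked with a few lines of arithmetic. This sketch only uses the two energies quoted above; the variable names are illustrative.

```python
uma_energy = -285.0  # kcal/mol, UMA interaction energy quoted above
dft_energy = -275.0  # kcal/mol, counterpoise-corrected DFT value quoted above

abs_diff = abs(uma_energy - dft_energy)
pct_vs_uma = 100.0 * abs_diff / abs(uma_energy)
pct_vs_dft = 100.0 * abs_diff / abs(dft_energy)

print(f"{abs_diff:.1f} kcal/mol: {pct_vs_uma:.1f}% of the UMA value, "
      f"{pct_vs_dft:.1f}% of the DFT value")
```

The quoted 3.5% corresponds to taking the UMA value as the reference; relative to the DFT value the difference is slightly larger.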

In [None]:
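# Arguments: SDF name (no extension), target data, ASE calculator, ligand charge, spin multiplicity, optimize flag
total_xyz = ccr.uma_interaction("trial_1", ccr.DRD2_data, calculator, 0, 1, False)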

The size of the complex is: 245
Energy of complex is: -6116.220 ha
The size of the ligand is: 29
Energy of ligand is: -766.377 ha
The size of the active site is: 216
Energy of active site is: -5349.794 ha
Energy difference is: -30.501 kcal/mol
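
The printed energy difference is consistent with a supermolecular interaction energy, E(complex) - E(ligand) - E(active site), converted from hartree to kcal/mol. The sketch below recomputes it from the rounded values in the output above; the function name is illustrative, and because each energy is only printed to 0.001 ha, agreement to within about 1 kcal/mol is all that can be expected.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def interaction_energy(e_complex, e_ligand, e_site):
    """Supermolecular interaction energy in kcal/mol from energies in hartree."""
    return (e_complex - e_ligand - e_site) * HARTREE_TO_KCAL_PER_MOL


# Rounded energies copied from the output above (hartree).
e_int = interaction_energy(-6116.220, -766.377, -5349.794)
print(f"Interaction energy from rounded values: {e_int:.1f} kcal/mol")
```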


In [None]:
# Same call with the final argument set to True, which runs a BFGS optimization of the complex
total_xyz = ccr.uma_interaction("trial_1", ccr.DRD2_data, calculator, 0, 1, True)

The size of the complex is: 245
      Step     Time          Energy          fmax
BFGS:    0 15:45:15  -166430.928651        4.866301
BFGS:    1 15:45:15  -166432.582790        6.651774
BFGS:    2 15:45:16  -166434.612979        1.908395
BFGS:    3 15:45:16  -166435.247077        1.086299
BFGS:    4 15:45:17  -166436.100641        1.691052
BFGS:    5 15:45:17  -166436.586570        1.570512
BFGS:    6 15:45:18  -166436.911283        1.904616
BFGS:    7 15:45:18  -166437.335115        2.254813
BFGS:    8 15:45:19  -166438.167941        3.363757
BFGS:    9 15:45:19  -166438.890375        3.532845
BFGS:   10 15:45:20  -166440.174909        8.536960
BFGS:   11 15:45:20  -166441.228078        2.831573
BFGS:   12 15:45:20  -166441.778791        3.241074
BFGS:   13 15:45:21  -166442.140051        1.445429
BFGS:   14 15:45:21  -166442.305537        0.773789
BFGS:   15 15:45:22  -166442.413109        0.773316
BFGS:   16 15:45:22  -166442.463093        0.939834
BFGS:   17 15:45:24  -166442.51974

In [None]:
ccr.complexG16("optimized_complex.xyz",ccr.DRD2_data,0,1)

In [None]:
ccr.visualize_molecule(total_xyz[1][0])

In [None]:
with open("/content/total_complex.xyz", "r") as f:
  structure = f.read()

ccr.visualize_molecule(structure)

# Solvation

## Create a new molecule from SMILES and solvate
- Creates a new atoms object from a SMILES string
- Creates a solvation instance and then solvates the atoms object
- Saves the atoms to an XYZ file and visualizes them
- Optimizes the solvated molecule and visualizes it again

In [None]:
atoms = ccr.smiles_to_atoms("c1cc(F)ccc1")

In [None]:
ccr.atoms_to_xyz(atoms,"test")

In [None]:
solvate = ccr.solvation("test.xyz",2)

add_waters class initialized


In [None]:
solvated = solvate.add_waters(20)

Maximum dimensions after augmentation are:
x - Max: 5.747877792534911, Min: -3.658880771899452
y - Max: 6.049240354686117, Min: -3.878139181087324
z - Max: 3.605391383716146, Min: -3.451909003884898
Volume is 659.0422031891669 A^3
Added 14/20 waters.
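
The reported volume matches the product of the three bounding-box side lengths. A short check using the limits printed above (the variable names are illustrative):

```python
# (min, max) along each axis, copied from the output above, in angstrom.
bounds = {
    "x": (-3.658880771899452, 5.747877792534911),
    "y": (-3.878139181087324, 6.049240354686117),
    "z": (-3.451909003884898, 3.605391383716146),
}

side_lengths = {axis: hi - lo for axis, (lo, hi) in bounds.items()}
volume = side_lengths["x"] * side_lengths["y"] * side_lengths["z"]

print(f"Volume is {volume:.3f} A^3")
```

The box size limits how many waters fit, which is presumably why only 14 of the 20 requested waters were placed.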


In [None]:
ccr.visualize_molecule(solvated)

In [None]:
with open("test_solvated.xyz", "w") as f:
  f.write(solvated)

In [None]:
atoms2 = ccr.XYZ_to_atoms("test_solvated.xyz")

In [None]:
energy = ccr.opt_energy(atoms2[0],calculator)

Initial energy: -1400.640732 ha
      Step     Time          Energy          fmax
BFGS:    0 15:48:13   -38113.398945       13.268503
BFGS:    1 15:48:13   -38123.124885        9.154819
BFGS:    2 15:48:13   -38128.399667        6.077850
BFGS:    3 15:48:14   -38131.936663        3.101922
BFGS:    4 15:48:14   -38133.472583        4.246012
BFGS:    5 15:48:14   -38134.874689        1.945612
BFGS:    6 15:48:14   -38136.354073        2.073375
BFGS:    7 15:48:14   -38137.098312        1.943051
BFGS:    8 15:48:14   -38137.702662        1.251167
BFGS:    9 15:48:14   -38138.149193        1.486121
BFGS:   10 15:48:15   -38138.481368        1.434557
BFGS:   11 15:48:15   -38138.759452        1.351181
BFGS:   12 15:48:15   -38138.972488        1.280200
BFGS:   13 15:48:15   -38139.189689        1.161082
BFGS:   14 15:48:15   -38139.382103        0.980640
BFGS:   15 15:48:15   -38139.541806        0.899366
BFGS:   16 15:48:16   -38139.643345        0.729426
BFGS:   17 15:48:16   -38139.76365
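
Note the two unit systems in this output: the "Initial energy" line is in hartree, while the BFGS table comes from ASE's optimizer log, which reports energies in eV and forces (fmax) in eV/Å. A sketch of the conversion, assuming a rounded CODATA factor rather than whatever constant CafChem uses internally:

```python
EV_PER_HARTREE = 27.211386  # 1 hartree in eV (rounded)


def ev_to_hartree(energy_ev):
    """Convert an energy from eV to hartree."""
    return energy_ev / EV_PER_HARTREE


step0_ev = -38113.398945  # BFGS step 0 from the table above
step0_ha = ev_to_hartree(step0_ev)
print(f"{step0_ha:.3f} ha")
```

The result agrees with the printed initial energy to within about 0.001 ha; the small residual is consistent with a slightly different conversion constant.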

In [None]:
ccr.atoms_to_xyz(atoms2[0],"test_solv_opt")

In [None]:
with open("test_solv_opt.xyz", "r") as f:
  solv_text = f.read()

ccr.visualize_molecule(solv_text)

## Dock a molecule, then solvate the ligand
- Docks the molecule
- Solvates the ligand and visualizes it
- Optimizes the solvated ligand and visualizes it again

In [None]:
score = ccr.dock_smiles("c1cc(F)ccc1", "MAOB", cpuCount)

Docking molecule.
SDF file written for score -5.1


In [None]:
xyz_string = ccr.sdf_to_xyz("trial_1.sdf", "new.xyz")

In [None]:
solvate = ccr.solvation("new.xyz",2)

add_waters class initialized


In [None]:
new_string = solvate.add_waters(20)

Maximum dimensions after augmentation are:
x - Max: 51.66445280344513, Min: 43.82454562980532
y - Max: 165.85770858406514, Min: 154.28602574341997
z - Max: 37.36743850242533, Min: 25.875391636810598
Volume is 1042.5690564419203 A^3
Added 14/20 waters.


In [None]:
ccr.visualize_molecule(new_string)

In [None]:
with open("mol_to_opt.xyz", "w") as f:
  f.write(new_string)

atoms3 = ccr.XYZ_to_atoms("mol_to_opt.xyz")
energy = ccr.opt_energy(atoms3[0],calculator)

Initial energy: -1401.475109 ha
      Step     Time          Energy          fmax
BFGS:    0 15:49:45   -38136.103530        5.542143
BFGS:    1 15:49:45   -38136.849682        3.265410
BFGS:    2 15:49:45   -38137.608087        1.941174
BFGS:    3 15:49:45   -38138.081427        1.360634
BFGS:    4 15:49:45   -38138.315323        0.820210
BFGS:    5 15:49:45   -38138.514705        1.143375
BFGS:    6 15:49:45   -38138.754487        1.174291
BFGS:    7 15:49:46   -38138.962992        0.922679
BFGS:    8 15:49:46   -38139.124187        0.980227
BFGS:    9 15:49:46   -38139.256009        1.187606
BFGS:   10 15:49:46   -38138.956460        3.957392
BFGS:   11 15:49:46   -38139.402984        0.657055
BFGS:   12 15:49:46   -38139.478624        0.483265
BFGS:   13 15:49:46   -38139.523238        2.322676
BFGS:   14 15:49:47   -38139.662148        0.443250
BFGS:   15 15:49:47   -38139.705490        0.381057
BFGS:   16 15:49:47   -38139.744717        1.654108
BFGS:   17 15:49:47   -38139.82712

In [None]:
ccr.atoms_to_xyz(atoms3[0],"mol_opt")
with open("mol_opt.xyz", "r") as f:
  solv_text = f.read()

ccr.visualize_molecule(solv_text)

# Calculate ligand strain energy
- Calculates the energy difference between the bound ligand structure and the relaxed ligand structure (in the gas phase).
- strain = bound - relaxed

In [15]:
strain = ccr.ligand_relaxation("/content/trial_1","DRD2",calculator, 0, 1)

      Step     Time          Energy          fmax
BFGS:    0 10:44:04   -20854.201998        2.637528
BFGS:    1 10:44:04   -20854.471429        1.125575
BFGS:    2 10:44:04   -20854.637658        1.152140
BFGS:    3 10:44:04   -20854.785260        1.087608
BFGS:    4 10:44:04   -20854.841045        0.557638
BFGS:    5 10:44:04   -20854.887277        0.417100
BFGS:    6 10:44:04   -20854.906212        0.298064
BFGS:    7 10:44:05   -20854.913521        0.209430
BFGS:    8 10:44:05   -20854.918678        0.151998
BFGS:    9 10:44:05   -20854.922812        0.147228
BFGS:   10 10:44:05   -20854.925831        0.107107
BFGS:   11 10:44:05   -20854.927489        0.072066
BFGS:   12 10:44:05   -20854.928734        0.076378
BFGS:   13 10:44:05   -20854.930015        0.100502
BFGS:   14 10:44:06   -20854.931247        0.089114
BFGS:   15 10:44:06   -20854.932100        0.056821
BFGS:   16 10:44:06   -20854.932615        0.041458
Strain energy is: 16.848 kcal/mol
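
The printed strain energy can be reproduced from the optimizer log, assuming the first BFGS step is the ligand in its docked (bound) geometry and the last step is the relaxed geometry. The conversion factor is a rounded standard value, not necessarily the constant used inside CafChem.

```python
EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV in kcal/mol (rounded)

e_bound_ev = -20854.201998    # step 0: ligand in its docked geometry
e_relaxed_ev = -20854.932615  # step 16: ligand after gas-phase relaxation

# strain = bound - relaxed, so a positive value means the pose is strained
strain = (e_bound_ev - e_relaxed_ev) * EV_TO_KCAL_PER_MOL
print(f"Strain energy: {strain:.2f} kcal/mol")
```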


## Loop over a set of files and calculate all strain energies

In [16]:
strains = []
for i in range(1):
  strain = ccr.ligand_relaxation(f"/content/trial_{i+1}","DRD2",calculator, 0, 1)
  strains.append(strain)

print("Strain energies:")
for strain in strains:
  print(strain)

      Step     Time          Energy          fmax
BFGS:    0 10:46:22   -20854.201998        2.637528
BFGS:    1 10:46:22   -20854.471429        1.125575
BFGS:    2 10:46:22   -20854.637658        1.152140
BFGS:    3 10:46:22   -20854.785260        1.087605
BFGS:    4 10:46:22   -20854.841045        0.557604
BFGS:    5 10:46:22   -20854.887277        0.417118
BFGS:    6 10:46:23   -20854.906211        0.298071
BFGS:    7 10:46:23   -20854.913522        0.209422
BFGS:    8 10:46:23   -20854.918677        0.151998
BFGS:    9 10:46:23   -20854.922812        0.147243
BFGS:   10 10:46:23   -20854.925830        0.107080
BFGS:   11 10:46:23   -20854.927490        0.072175
BFGS:   12 10:46:23   -20854.928734        0.076367
BFGS:   13 10:46:24   -20854.930014        0.100584
BFGS:   14 10:46:24   -20854.931248        0.089139
BFGS:   15 10:46:24   -20854.932100        0.056782
BFGS:   16 10:46:24   -20854.932615        0.041476
Strain energy is: 16.848 kcal/mol
Strain energies:
16.848279650550

# Generate a constraints list

In [None]:
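with open("/content/HMGCR_dude_QM_site.pdb", "r") as f:
  lines = f.readlines()

# Collect zero-based indices of the alpha carbons (atom name "CA")
constraints = []
for line in lines:
  parts = line.split()
  # parts[1] is the atom serial number, parts[2] the atom name
  if len(parts) > 2 and parts[2] == "CA":
    constraints.append(int(parts[1])-1)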

print(constraints)

[1, 11, 16, 24, 33, 41, 54, 60, 72, 83, 92, 98, 107, 124, 132, 140, 148, 159, 168, 181]


In [None]:
print(len(constraints))

20
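
PDB is a fixed-column format, so slicing by column is an alternative to split() that does not depend on whitespace between fields. The sketch below is self-contained: the helper names and the two-residue sample are hypothetical, and the sample lines are generated rather than taken from the HMGCR file.

```python
def alpha_carbon_indices(pdb_lines):
    """Return zero-based indices (serial - 1) of ATOM records named CA."""
    indices = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, END, ...
        serial = int(line[6:11])    # columns 7-11: atom serial number
        name = line[12:16].strip()  # columns 13-16: atom name
        if name == "CA":
            indices.append(serial - 1)
    return indices


def atom_line(serial, name, res_name, res_seq):
    """Build a minimal ATOM record with dummy coordinates (illustration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


sample = [
    "REMARK hypothetical two-residue fragment",
    atom_line(1, " N  ", "ALA", 1),
    atom_line(2, " CA ", "ALA", 1),
    atom_line(3, " C  ", "ALA", 1),
    atom_line(4, " O  ", "ALA", 1),
    atom_line(5, " N  ", "GLY", 2),
    atom_line(6, " CA ", "GLY", 2),
    "END",
]

ca_indices = alpha_carbon_indices(sample)
print(ca_indices)
```

A list of zero-based indices like this has the form accepted by ASE's FixAtoms(indices=...) constraint, if the aim is to hold the alpha carbons fixed during an optimization. The notebook does not show that step, so treat it as one possible use.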
