# CafChem tools for exploring a binding site with fragments; energy evaluation with the UMA MLIP

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChem/blob/main/notebooks/FragGrow_CafChem.ipynb)

## This notebook allows you to:
- Load a binding site and a set of chemical fragments.
- Randomly distribute the fragments into the binding site, rejecting poses with repulsive interaction energies.
- View all poses or only the best poses, and view the combined poses from all fragments at once.
- Calculate the average pose for all fragments; view them individually or all together.

## Requirements:
- This notebook installs deepchem, Fairchem (`fairchem-core`), and py3Dmol, along with their dependencies.
- It pulls the CafChem tools from GitHub.
- You need an `HF_Token` set as a secret to access the UMA MLIP.

## Install and set-up

### Install libraries

In [1]:
!pip install py3Dmol
!pip install fairchem-core
!pip install deepchem

Collecting py3Dmol
  Downloading py3dmol-2.5.2-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading py3dmol-2.5.2-py2.py3-none-any.whl (7.2 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.5.2
Collecting fairchem-core
  Downloading fairchem_core-2.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting ase-db-backends>=0.10.0 (from fairchem-core)
  Downloading ase_db_backends-0.10.0-py3-none-any.whl.metadata (600 bytes)
Collecting ase>=3.25.0 (from fairchem-core)
  Downloading ase-3.25.0-py3-none-any.whl.metadata (4.2 kB)
Collecting clusterscope (from fairchem-core)
  Downloading clusterscope-0.0.10-py3-none-any.whl.metadata (3.1 kB)
Collecting e3nn>=0.5 (from fairchem-core)
  Downloading e3nn-0.5.6-py3-none-any.whl.metadata (5.4 kB)
Collecting hydra-core (from fairchem-core)
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting lmdb (from fairchem-core)
  Downloading lmdb-1.7.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadat

### Load CafChem from GitHub and import libraries

In [25]:
!git clone https://github.com/MauricioCafiero/CafChem.git

Cloning into 'CafChem'...
remote: Enumerating objects: 590, done.
remote: Counting objects: 100% (167/167), done.
remote: Compressing objects: 100% (122/122), done.
remote: Total 590 (delta 123), reused 45 (delta 45), pack-reused 423 (from 1)
Receiving objects: 100% (590/590), 40.13 MiB | 15.65 MiB/s, done.
Resolving deltas: 100% (341/341), done.


In [1]:
import py3Dmol
import os
import torch
import numpy as np
from fairchem.core import FAIRChemCalculator, pretrained_mlip
import CafChem.CafChemFragGrow as ccfg

cpuCount = os.cpu_count()
print(cpuCount)

2


### Set-up Fairchem
- A Hugging Face token (`HF_Token`) must be available so that the UMA model can be downloaded.
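
One way to make the token available is sketched below. It assumes the secret is named `HF_Token`, as stated in the requirements, and that it is exposed through Colab's `userdata` module; outside Colab it falls back to an environment variable of the same name. The helper `get_hf_token` is a hypothetical name and is not part of CafChem or Fairchem.

```python
import os

def get_hf_token(secret_name="HF_Token"):
    """Return the token from Colab secrets, or from the environment outside Colab."""
    try:
        # google.colab can only be imported inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        return os.environ.get(secret_name)
    return userdata.get(secret_name)

token = get_hf_token()
if token:
    # huggingface_hub reads the HF_TOKEN environment variable when downloading
    os.environ["HF_TOKEN"] = token
```

Setting `HF_TOKEN` before the model is requested avoids an interactive login prompt. The token's account must also have been granted access to the UMA model on Hugging Face.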

In [2]:
device = "cuda" if torch.cuda.is_available() else "cpu"

predictor = pretrained_mlip.get_predict_unit("uma-s-1", device=device)
calculator = FAIRChemCalculator(predictor, task_name="omol")
model = "UMA-OMOL"

## Set-up fragments and binding site
- Define the fragments and the fragment cutoffs (the physical size of each fragment plus an optional vdW distance). The cutoff dictates how close a fragment can be placed to an atom in the protein.
- Get a pre-loaded binding site and calculate its physical dimensions. An optional parameter enlarges or shrinks the active site.

In [4]:
frags = ccfg.define_fragments()

In [5]:
for frag in frags:
    cutoff = ccfg.get_fragment_cutoff(frag, 0.0)

OHH
cutoff is 0.7839758934366993
FH
cutoff is 0.44
CHHCHHCHH
cutoff is 1.5668431822419242
CHCH
cutoff is 1.6705999999999999
HHHCOH
cutoff is 1.4160551209684602
CCCCCCHHHHHH
cutoff is 2.494682117530007
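
To make the idea of a size-based cutoff concrete, here is a minimal sketch. It assumes the cutoff is the largest atom-to-centroid distance plus an optional padding; this is an illustrative definition, not necessarily the one used by `ccfg.get_fragment_cutoff`. The function name and the diatomic coordinates are hypothetical.

```python
import numpy as np

def fragment_cutoff(coords, vdw_padding=0.0):
    """Largest atom-to-centroid distance (in Angstrom) plus an optional padding."""
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    radius = np.linalg.norm(coords - centroid, axis=1).max()
    return float(radius + vdw_padding)

# Hypothetical diatomic with a 0.88 Angstrom bond along z
diatomic = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.88]]
print(fragment_cutoff(diatomic))
print(fragment_cutoff(diatomic, vdw_padding=1.2))
```

With a padding of zero the cutoff is purely geometric; a positive padding keeps fragments further from the protein atoms.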


In [7]:
all_molecules, atom_symbols = ccfg.get_binding_site_xyz(ccfg.DRD2_data["file_location"])

There are 1 molecules with size: 216
for 2, 218


In [8]:
max_values, min_values = ccfg.get_binding_site_dims(all_molecules, -2.5)

Maximum dimensions after augmentation are:
x - Max:     17.569, Min:      5.390
y - Max:     12.493, Min:      0.854
z - Max:      2.368, Min:    -16.614
Volume is 2690.583895289619 A^3
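
The reported volume is consistent with a rectangular box spanning the printed bounds. The sketch below shows one way such a box could be built, assuming the padding is added to the maxima and subtracted from the minima, so that a negative value shrinks the box. `padded_box` is a hypothetical helper, not the CafChem implementation.

```python
import numpy as np

def padded_box(coords, padding=0.0):
    """Axis-aligned bounding box of the coordinates, grown or shrunk by `padding`."""
    coords = np.asarray(coords, dtype=float)
    box_min = coords.min(axis=0) - padding
    box_max = coords.max(axis=0) + padding
    if np.any(box_max <= box_min):
        raise ValueError("padding shrinks the box to zero or negative size")
    volume = float(np.prod(box_max - box_min))
    return box_max, box_min, volume

# Check against the bounds printed above (rounded to three decimals)
printed_max = np.array([17.569, 12.493, 2.368])
printed_min = np.array([5.390, 0.854, -16.614])
box_volume = float(np.prod(printed_max - printed_min))
print(f"{box_volume:.1f} A^3")
```

The product of the three printed extents comes to about 2690.7 A^3, which matches the reported 2690.58 A^3 to within the rounding of the printed bounds.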


## Grow Fragments
- Loop through the fragments and add each one to the binding site.
- The number of attempts to add fragments is user-definable. It is a maximum, because poses are rejected based on sterics and interaction energies.
- Find the best poses for each fragment (most attractive energies).
- Save XYZ files for each pose for each fragment in the /frag_files folder.
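
The rejection loop described above can be sketched as follows. This is an illustration under stated assumptions, not the code inside `ccfg.grow_fragments`: the steric test compares the fragment centre to every protein atom, a pose is kept only if its interaction energy is negative, and the energy comes from a toy Lennard-Jones function standing in for the UMA calculator. All function names and parameters here are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so rotations are sampled uniformly
    if np.linalg.det(q) < 0:      # flip one column to get determinant +1
        q[:, 0] = -q[:, 0]
    return q

def toy_interaction_energy(site, pose, epsilon=0.01, sigma=3.0):
    """Stand-in energy: pairwise Lennard-Jones between site atoms and fragment atoms."""
    d = np.linalg.norm(pose[:, None, :] - site[None, :, :], axis=2)
    sr6 = (sigma / d) ** 6
    return float(np.sum(4.0 * epsilon * (sr6 ** 2 - sr6)))

def grow_poses(site, frag, box_min, box_max, cutoff, energy_fn, rng, attempts=50):
    """Try `attempts` random placements; keep poses that pass both filters."""
    site = np.asarray(site, dtype=float)
    frag = np.asarray(frag, dtype=float)
    centred = frag - frag.mean(axis=0)
    accepted = []
    for _ in range(attempts):
        centre = rng.uniform(box_min, box_max)
        # Steric filter: fragment centre must not sit within `cutoff` of a site atom
        if np.linalg.norm(site - centre, axis=1).min() < cutoff:
            continue
        pose = centred @ random_rotation(rng).T + centre
        # Energy filter: discard repulsive (non-negative) interaction energies
        energy = energy_fn(site, pose)
        if energy < 0.0:
            accepted.append((pose, energy))
    return accepted

rng = np.random.default_rng(0)
site = np.array([[0.0, 0.0, 0.0]])
frag = np.array([[0.0, 0.0, -0.5], [0.0, 0.0, 0.5]])
poses = grow_poses(site, frag, [-5.0] * 3, [5.0] * 3, cutoff=1.0,
                   energy_fn=toy_interaction_energy, rng=rng)
print(f"accepted {len(poses)} of 50 attempts")
```

Running the cheap steric check first means the more expensive energy evaluation is only spent on poses that are geometrically plausible, which matters when the energy comes from an MLIP.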

In [9]:
new_molecules, ies = ccfg.grow_fragments(all_molecules, frags, atom_symbols, 50, calculator, ccfg.DRD2_data, max_values, min_values)

temp_files directory already exists
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
adding fragment: water
Added 14 water fragments
adding fragment: hydrogen fluoride
adding fragment: hydrogen fluoride
adding fragment: hydrogen fluoride
adding fragment: hydrogen fluoride
Added 4 hydrogen fluoride fragments
adding fragment: cyclopropyl
adding fragment: cyclopropyl
adding fragment: cyclopropyl
Added 3 cyclopropyl fragments
adding fragment: acetylene
adding fragment: acetylene
adding fragment: acetylene
adding fragment: acetylene
adding fragment: acetylene
adding fragment: acetylene
Added 6 acetylene fragments
adding fragment: methanol
adding fragment: methanol
adding fragment: methanol
adding fragment: methanol
adding fragment: methanol
Adde

In [10]:
best_pose_for_fragments = ccfg.get_best_poses(ies, frags)

best pose for water is: -3.514 at location: 6
best pose for hydrogen fluoride is: -0.929 at location: 0
best pose for cyclopropyl is: -3.437 at location: 0
best pose for acetylene is: -5.427 at location: 3
best pose for methanol is: -1.219 at location: 4
best pose for phenyl is: -2.515 at location: 0
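
Selecting the best pose amounts to finding the most negative interaction energy for each fragment. A minimal sketch, assuming the energies are stored as one list per fragment; the data layout, the helper name `best_pose`, and the numbers are hypothetical and do not reflect the actual structure of `ies`.

```python
import numpy as np

def best_pose(energies):
    """Return (index, energy) of the most attractive pose in a list of energies."""
    energies = np.asarray(energies, dtype=float)
    idx = int(np.argmin(energies))
    return idx, float(energies[idx])

# Hypothetical interaction energies, one list per fragment
example_ies = {
    "fragment_a": [-0.4, -2.1, -1.3],
    "fragment_b": [-0.9, -0.2],
}
for name, values in example_ies.items():
    idx, energy = best_pose(values)
    print(f"best pose for {name} is: {energy:.3f} at location: {idx}")
```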


In [12]:
ccfg.save_xyz_files(new_molecules, frags, ccfg.DRD2_data, atom_symbols)

frag_files directory already exists
14 files written for water.
4 files written for hydrogen fluoride.
3 files written for cyclopropyl.
6 files written for acetylene.
5 files written for methanol.
2 files written for phenyl.
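
For reference, an XYZ file holds the atom count on the first line, a free-form comment on the second, and one `symbol x y z` line per atom. The sketch below writes that layout; `xyz_block` and `write_xyz` are hypothetical helpers, and the file naming used by `ccfg.save_xyz_files` is not shown here.

```python
from pathlib import Path

def xyz_block(symbols, coords, comment=""):
    """Format atoms as XYZ text: count line, comment line, then one line per atom."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"

def write_xyz(path, symbols, coords, comment=""):
    """Write a single structure to `path` in XYZ format."""
    Path(path).write_text(xyz_block(symbols, coords, comment))

# Hypothetical diatomic fragment
text = xyz_block(["H", "F"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.92)], comment="example pose")
print(text)
```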


## Viewing options
- View an individual pose for an individual fragment.
- Combine the best poses for each fragment and display all at once.
- View all poses for a specific fragment at once (lowest energy pose in green, all others in pink).
- Calculate and view an average pose for a specific fragment.
- Calculate the average pose for all fragments and display them all at once.
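
An average pose can be illustrated as the atom-wise mean of the coordinates over all accepted poses of a fragment. The sketch below assumes every pose lists its atoms in the same order; whether `ccfg.average_pose_for_frag` aligns the poses first is not covered here, and `average_pose` is a hypothetical helper.

```python
import numpy as np

def average_pose(poses):
    """Atom-wise mean over poses; `poses` has shape (n_poses, n_atoms, 3)."""
    stack = np.asarray(poses, dtype=float)
    if stack.ndim != 3 or stack.shape[2] != 3:
        raise ValueError("expected an array of shape (n_poses, n_atoms, 3)")
    return stack.mean(axis=0)

# Two hypothetical poses of a diatomic fragment
pose_a = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
pose_b = [[2.0, 0.0, 0.0], [2.0, 0.0, 1.0]]
print(average_pose([pose_a, pose_b]))
```

A plain mean is reliable for the average position of the fragment, but it can shrink or distort the internal geometry when the poses are oriented very differently, so the result is best read as a location rather than a physical structure.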

In [18]:
ccfg.view_frag_pose(5, 1, frags, ccfg.DRD2_data)

In [19]:
combined_poses, combined_atoms = ccfg.combine_best_poses(frags, ccfg.DRD2_data, new_molecules, best_pose_for_fragments)

In [20]:
ccfg.view_combined_poses(ccfg.DRD2_data, frags)

In [27]:
ccfg.view_all_poses_for_frag(5, frags, ccfg.DRD2_data, new_molecules, best_pose_for_fragments)

In [28]:
average_coords = ccfg.average_pose_for_frag(5, frags, ccfg.DRD2_data, new_molecules, atom_symbols, display_flag = True)

In [29]:
average_poses, average_atoms = ccfg.combine_average_poses(frags, ccfg.DRD2_data, new_molecules, atom_symbols)