# CafChem tools for co-folding proteins and ligands using Boltz2

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChem/blob/main/notebooks/Boltz_CafChem.ipynb)

## This notebook allows you to:
- Input a protein sequence and a list of ligands to generate .yaml files for use in Boltz2.
- Run Boltz2 co-folding on the list of ligands in an automated fashion, reporting pIC50 and IC50 values and a binder/decoy score for each ligand.
- Convert the .cif file to XYZ files for the complex, the protein and the ligand.
- Visualize predicted data.

## Requirements:
- This notebook will install Boltz2 and py3Dmol.
- It will pull the CafChem tools from GitHub.
- It will install all needed libraries.
- Small proteins can run on the L4 runtime, but larger systems will need the memory provided by the A100 runtime.

### Install Boltz library
- Requires a confirmation (answer `y`) and a runtime restart.
- Can take ~5 minutes.

In [1]:
!pip uninstall torch torchvision

Found existing installation: torch 2.8.0+cu126
Uninstalling torch-2.8.0+cu126:
  Would remove:
    /usr/local/bin/torchfrtrace
    /usr/local/bin/torchrun
    /usr/local/lib/python3.12/dist-packages/functorch/*
    /usr/local/lib/python3.12/dist-packages/torch-2.8.0+cu126.dist-info/*
    /usr/local/lib/python3.12/dist-packages/torch/*
    /usr/local/lib/python3.12/dist-packages/torchgen/*
Proceed (Y/n)? Y
Y
  Successfully uninstalled torch-2.8.0+cu126
Found existing installation: torchvision 0.23.0+cu126
Uninstalling torchvision-0.23.0+cu126:
  Would remove:
    /usr/local/lib/python3.12/dist-packages/torchvision-0.23.0+cu126.dist-info/*
    /usr/local/lib/python3.12/dist-packages/torchvision.libs/libcudart.45e7f3ed.so.12
    /usr/local/lib/python3.12/dist-packages/torchvision.libs/libjpeg.bd6b9199.so.8
    /usr/local/lib/python3.12/dist-packages/torchvision.libs/libnvjpeg.e5f20359.so.12
    /usr/local/lib/python3.12/dist-packages/torchvision.libs/libpng16.0481ee11.so.16
    /usr/local

In [2]:
!pip install torch torchvision boltz[cuda] -Uq
! pip install -q py3Dmol

### Import libraries
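Before importing, it can help to confirm that the packages installed above are visible after the runtime restart. The following is a minimal sketch using only the standard library; the helper name `check_installed` is hypothetical and not part of CafChem or Boltz.

```python
from importlib.util import find_spec


def check_installed(packages=("torch", "torchvision", "boltz", "py3Dmol")):
    """Return {package: True/False} for whether each top-level module can be found."""
    # find_spec returns None when a top-level module is not installed.
    return {name: find_spec(name) is not None for name in packages}


status = check_installed()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

If any package is reported as missing, re-running the install cell and restarting the runtime is the likely fix.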

In [3]:
!git clone https://github.com/MauricioCafiero/CafChem.git

Cloning into 'CafChem'...
remote: Enumerating objects: 1055, done.
remote: Counting objects: 100% (377/377), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 1055 (delta 340), reused 257 (delta 257), pack-reused 678 (from 1)
Receiving objects: 100% (1055/1055), 44.85 MiB | 39.63 MiB/s, done.
Resolving deltas: 100% (621/621), done.


In [4]:
import os
import py3Dmol
import CafChem.CafChemBoltz as ccb

## Co-fold ligands and proteins with Boltz2
- The `make_boltz_files` tool accepts a list of SMILES, a protein sequence (single chain for now), the name of the protein, and an optional list of ligand names.
  - Returns a .yaml file for each ligand paired with the protein, ready for analysis by Boltz.
- The `cofold` tool accepts the list of names returned by the previous tool and runs the analysis on each. Returns a list of pIC50 values.
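For orientation, a Boltz input file for one protein chain and one ligand, with an affinity prediction requested for the ligand, might look like the text produced below. This is only a sketch based on the YAML schema in the Boltz documentation; it is an assumption that `make_boltz_files` writes something similar, and the function name `sketch_boltz_yaml` and the chain IDs `A` and `B` are hypothetical. MSA settings are left out.

```python
def sketch_boltz_yaml(sequence: str, smiles: str) -> str:
    """Build the text of a one-protein, one-ligand Boltz input with an affinity request."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",                      # assumed chain ID for the protein
        f"      sequence: {sequence}",
        "  - ligand:",
        "      id: B",                      # assumed chain ID for the ligand
        f"      smiles: '{smiles}'",        # single quotes keep SMILES characters literal in YAML
        "properties:",
        "  - affinity:",
        "      binder: B",                  # affinity is predicted for the ligand chain
    ]
    return "\n".join(lines) + "\n"


text = sketch_boltz_yaml("MLEGTFTSDVSSYLEGQALKEAIAWLERLRG", "CC(=O)Nc1ccc(O)cc1")
print(text)
```

Writing one such file per ligand gives the set of inputs that a batch co-folding run iterates over.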

In [5]:
ccb.get_sequences()

Available proteins are:
MAOB
HMGCR
SULT1A3
ADRB1
ADRB2
DRD2

reference a protein sequence with: sequence_bank['YOUR_PROTEIN_NAME']


In [9]:
# sult1a3 example
mols = ["dop","ldop","para","napqi"]
smiles = ["NCCc1cc(O)c(O)cc1","C1=CC(=C(C=C1C[C@@H](C(=O)O)N)O)O","CC(=O)Nc1ccc(O)cc1","CC(=O)N=c1ccc(=O)cc1"]
seq = ccb.sequence_bank["SULT1A3"]

In [None]:
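# HMGCR example
mols = ["rosuvastatin"]
# raw string: the SMILES contains backslashes (double-bond stereo) that Python would otherwise treat as escapes
smiles = [r"OC(=O)C[C@H](O)C[C@H](O)\C=C\c1c(C(C)C)nc(N(C)S(=O)(=O)C)nc1c2ccc(F)cc2"]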
seq = ccb.sequence_bank["HMGCR"]

In [6]:
# semaglutide example: peptide sequence with water ("O") as the ligand; the ligand entry is labeled "semaglutide"
mols = ["semaglutide"]
smiles = ["O"]
seq = "MLEGTFTSDVSSYLEGQALKEAIAWLERLRG"

In [7]:
files = ccb.make_boltz_files(smiles,seq,"test",mols)

In [8]:
pic50s = ccb.cofold(files)

semaglutide done
pIC50 is: 7.7196283290386205
IC50 is: 1.9070921168757313e-08
Binder or Decoy: 0.17689204216003418
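The printed pIC50 and IC50 values are consistent with the usual relation IC50 = 10^(-pIC50). The sketch below reproduces that conversion; it assumes the IC50 is expressed in mol/L, and the helper names are hypothetical rather than CafChem functions.

```python
import math


def pic50_to_ic50(pic50: float) -> float:
    """Convert pIC50 to IC50, assuming IC50 is in mol/L."""
    return 10.0 ** (-pic50)


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert IC50 (assumed mol/L) back to pIC50."""
    return -math.log10(ic50_molar)


ic50 = pic50_to_ic50(7.7196283290386205)
print(f"IC50 = {ic50:.3e} M = {ic50 * 1e9:.1f} nM")
```

Higher pIC50 means a lower IC50, so ligands can be ranked directly on the list returned by `cofold`.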


In [13]:
files = ccb.make_boltz_files(smiles,seq,"Novel_SGT",mols)

In [14]:
pIC50s = ccb.cofold(files)

semaglutide done
pIC50 is: 7.396324897527696
IC50 is: 4.014903418592768e-08
Binder or Decoy: 0.17538872361183167


## Get structure files
- Retrieve and download the CIF file made by Boltz.
- Produce two XYZ files for the folded protein and the ligand (no Hs on either).

In [9]:
ccb.download_cif(files[0])


In [11]:
ccb.get_XYZ_files("semaglutide", "sgt")

In [12]:
with open("/content/semaglutide_sgt_protein.xyz", "r") as f:
    xyz = f.read()

ccb.visualize_molecule(xyz)
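A quick sanity check on an XYZ string is to count atoms per element, for example to confirm that no hydrogens are present. The sketch below assumes the standard XYZ layout (atom count on the first line, a comment on the second, then one `element x y z` line per atom); whether the CafChem files follow this layout exactly is an assumption. The helper name `summarize_xyz` is hypothetical and the three-atom fragment is illustrative data, not Boltz output.

```python
from collections import Counter


def summarize_xyz(xyz_text: str) -> Counter:
    """Count atoms per element in a standard-layout XYZ string."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])       # first line: number of atoms
    atom_lines = lines[2:2 + n_atoms]        # skip the comment line
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atom_lines)}")
    # the element symbol is the first field on each atom line
    return Counter(line.split()[0] for line in atom_lines)


# Illustrative fragment only (made-up coordinates)
example_xyz = """3
illustrative fragment
N 0.000 0.000 0.000
C 1.458 0.000 0.000
C 2.009 1.420 0.000
"""

counts = summarize_xyz(example_xyz)
print(dict(counts))
print("hydrogens:", counts.get("H", 0))
```

The same function can be applied to the `xyz` string read in the cell above.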