# CafChem tools for co-folding proteins and ligands using Boltz2

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChem/blob/main/notebooks/Boltz_CafChem.ipynb)

## This notebook allows you to:
- Input a protein sequence and a list of ligands (as SMILES) to generate .yaml input files for Boltz2.
- Run Boltz2 co-folding on each ligand in an automated fashion, producing predicted pIC50/IC50 values and binder/decoy probabilities.
- Convert the .cif output file to XYZ files for the complex, the protein, and the ligand.
- Visualize the predicted structures.

## Requirements:
- This notebook installs Boltz2 and py3Dmol.
- It pulls the CafChem tools from GitHub.
- It installs all other needed libraries.
- Small proteins can run on the L4 GPU runtime, but larger systems need the memory provided by the A100 runtime.

### Install Boltz library
- The uninstall step asks for confirmation (answer `y`), and the runtime must be restarted afterwards.
- Installation can take ~5 minutes.

In [1]:
!pip uninstall torch torchvision

Found existing installation: torch 2.6.0+cu124
Uninstalling torch-2.6.0+cu124:
  Would remove:
    /usr/local/bin/torchfrtrace
    /usr/local/bin/torchrun
    /usr/local/lib/python3.11/dist-packages/functorch/*
    /usr/local/lib/python3.11/dist-packages/torch-2.6.0+cu124.dist-info/*
    /usr/local/lib/python3.11/dist-packages/torch/*
    /usr/local/lib/python3.11/dist-packages/torchgen/*
Proceed (Y/n)? y
y
  Successfully uninstalled torch-2.6.0+cu124
Found existing installation: torchvision 0.21.0+cu124
Uninstalling torchvision-0.21.0+cu124:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/torchvision-0.21.0+cu124.dist-info/*
    /usr/local/lib/python3.11/dist-packages/torchvision.libs/libcudart.41118559.so.12
    /usr/local/lib/python3.11/dist-packages/torchvision.libs/libjpeg.1c1c4b09.so.8
    /usr/local/lib/python3.11/dist-packages/torchvision.libs/libnvjpeg.02b6d700.so.12
    /usr/local/lib/python3.11/dist-packages/torchvision.libs/libpng16.0364a1db.so.16
    /usr/local

In [2]:
! pip install torch torchvision torchaudio
! pip install py3Dmol
! pip install boltz -U

Collecting torch
  Downloading torch-2.7.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting torchvision
  Downloading torchvision-0.22.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.6.77 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.6.80 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.5.1.17 (from torch)
  Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.6.4.

### Clone CafChem and import libraries

In [1]:
!git clone https://github.com/MauricioCafiero/CafChem.git

Cloning into 'CafChem'...
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 129 (delta 65), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (129/129), 1.71 MiB | 8.24 MiB/s, done.
Resolving deltas: 100% (65/65), done.


In [2]:
import os
import CafChem.CafChemBoltz as ccb

## Co-fold ligands and proteins with Boltz2
- `make_boltz_files` accepts a list of SMILES, a protein sequence (single chain for now), the name of the protein, and an optional list of ligand names.
  - It writes one .yaml file per ligand, paired with the protein and ready for Boltz, and returns the list of their names.
- `cofold` accepts the list of names returned by `make_boltz_files` and runs Boltz on each. It returns a list of pIC50 values and prints the IC50 and binder/decoy probability for each ligand.
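For reference, a Boltz-2 input file for one protein-ligand pair with an affinity request looks roughly like the text generated below. This is a minimal sketch, not CafChem's implementation: the helper names `boltz_yaml_text` and `write_boltz_inputs`, the chain IDs `A`/`B`, and the `<ligand>_<protein>.yaml` naming are assumptions. The field layout (`version`, `sequences`, `protein`/`ligand`, `properties: affinity: binder`) follows the Boltz input schema documented in the Boltz repository. No MSA is given here, so Boltz would presumably need to be run with `--use_msa_server`.

```python
from pathlib import Path


def boltz_yaml_text(smiles, sequence, protein_chain="A", ligand_chain="B"):
    """Return Boltz-2 YAML text for one protein chain plus one ligand, with an affinity request."""
    # The SMILES is single-quoted so YAML keeps backslashes (E/Z bond directions) verbatim.
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {protein_chain}\n"
        f"      sequence: {sequence}\n"
        "  - ligand:\n"
        f"      id: {ligand_chain}\n"
        f"      smiles: '{smiles}'\n"
        "properties:\n"
        "  - affinity:\n"
        f"      binder: {ligand_chain}\n"
    )


def write_boltz_inputs(smiles_list, sequence, protein_name, ligand_names=None, out_dir="."):
    """Hypothetical helper: write one YAML file per ligand and return the file stems."""
    if ligand_names is None:
        ligand_names = [f"ligand{i}" for i in range(len(smiles_list))]
    stems = []
    for smi, name in zip(smiles_list, ligand_names):
        stem = f"{name}_{protein_name}"
        Path(out_dir, stem + ".yaml").write_text(boltz_yaml_text(smi, sequence))
        stems.append(stem)
    return stems
```

With the HMGCR example below, `write_boltz_inputs(smiles, seq, "HMGCR", mols)` would write `rosuvastatin_HMGCR.yaml` to the current directory and return `["rosuvastatin_HMGCR"]`.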

In [3]:
ccb.get_sequences()

Available proteins are:
MAOB
HMGCR
SULT1A3
ADRB1
ADRB2

reference a protein sequence with: sequence_bank['YOUR_PROTEIN_NAME']


In [None]:
# SULT1A3 example: dopamine, L-DOPA, paracetamol (acetaminophen), and NAPQI
mols = ["dop","ldop","para","napqi"]
smiles = ["NCCc1cc(O)c(O)cc1","C1=CC(=C(C=C1C[C@@H](C(=O)O)N)O)O","CC(=O)Nc1ccc(O)cc1","CC(=O)N=c1ccc(=O)cc1"]
seq = ccb.sequence_bank["SULT1A3"]

In [4]:
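# HMGCR example: rosuvastatin
mols = ["rosuvastatin"]
# raw string keeps the "\" bond-direction characters literal (avoids invalid-escape warnings)
smiles = [r"OC(=O)C[C@H](O)C[C@H](O)\C=C\c1c(C(C)C)nc(N(C)S(=O)(=O)C)nc1c2ccc(F)cc2"]
seq = ccb.sequence_bank["HMGCR"]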

In [5]:
files = ccb.make_boltz_files(smiles,seq,"HMGCR",mols)

In [6]:
pIC50s = ccb.cofold(files)

rosuvastatin done
pIC50 is: 7.41160203742981
IC50 is: 3.876126682671495e-08
Binder or Decoy: 0.34502899646759033
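The printed values can be related to Boltz-2's raw affinity output. According to the Boltz documentation, the affinity JSON reports `affinity_pred_value` as log10(IC50) with IC50 in µM, and `affinity_probability_binary` as the probability that the ligand is a binder (the "Binder or Decoy" number). Under that convention pIC50 = 6 - affinity_pred_value. The sketch below assumes this convention; the function name `affinity_to_potency` is hypothetical and not taken from CafChem, which may compute these values differently.

```python
def affinity_to_potency(affinity):
    """Convert a Boltz-2 affinity record (a dict) to pIC50, IC50 in molar, and binder probability.

    Assumes affinity["affinity_pred_value"] is log10(IC50) with IC50 in micromolar.
    """
    log10_ic50_uM = affinity["affinity_pred_value"]
    # IC50_M = 10**(log10_ic50_uM - 6), so pIC50 = -log10(IC50_M) = 6 - log10_ic50_uM
    pic50 = 6.0 - log10_ic50_uM
    ic50_molar = 10.0 ** (-pic50)
    return {
        "pIC50": pic50,
        "IC50_M": ic50_molar,
        "binder_probability": affinity["affinity_probability_binary"],
    }
```

For the rosuvastatin run above, pIC50 = 7.41 corresponds to 10^-7.41 M, about 3.9 × 10^-8 M (roughly 39 nM), consistent with the printed IC50.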


## Get structure files
- Retrieve and download the CIF file made by Boltz.
- Produce two XYZ files, one for the folded protein and one for the ligand (no hydrogens on either).
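CafChem's `get_XYZ_files` handles this conversion. As a rough illustration of what it involves, the sketch below reads the `_atom_site` loop of an mmCIF file and writes XYZ text (atom count, comment line, then element and Cartesian coordinates in Å) for selected chains, dropping hydrogens by default. It assumes one atom per line with whitespace-separated values (quoted tokens are handled by `shlex`), and that the protein and ligand are separate `label_asym_id` chains such as `A` and `B`. The function names and the tiny `SAMPLE` record are made up for illustration; for real Boltz output, a full mmCIF parser such as gemmi or Biopython's `MMCIFParser` is the safer choice.

```python
import shlex


def read_atom_site(cif_text):
    """Return one dict per row of the first _atom_site loop in an mmCIF string."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    for i, line in enumerate(lines):
        if line != "loop_" or i + 1 >= len(lines) or not lines[i + 1].startswith("_atom_site."):
            continue
        # Collect the column names, e.g. "_atom_site.Cartn_x" -> "Cartn_x"
        j = i + 1
        fields = []
        while j < len(lines) and lines[j].startswith("_atom_site."):
            fields.append(lines[j].split(".", 1)[1])
            j += 1
        # Data rows run until a blank line, comment, or the next CIF item/loop/block
        rows = []
        while j < len(lines) and lines[j] and not lines[j].startswith(("#", "_", "loop_", "data_")):
            rows.append(dict(zip(fields, shlex.split(lines[j]))))
            j += 1
        return rows
    return []


def cif_to_xyz(cif_text, chains=None, drop_hydrogens=True, comment=""):
    """Build XYZ text from atom_site rows, optionally keeping only the given chain IDs."""
    atoms = []
    for row in read_atom_site(cif_text):
        chain = row.get("label_asym_id", row.get("auth_asym_id"))
        if chains is not None and chain not in chains:
            continue
        element = row.get("type_symbol", "X")
        if drop_hydrogens and element.upper() in ("H", "D"):
            continue
        x, y, z = (float(row[k]) for k in ("Cartn_x", "Cartn_y", "Cartn_z"))
        atoms.append(f"{element} {x:.3f} {y:.3f} {z:.3f}")
    return f"{len(atoms)}\n{comment}\n" + "\n".join(atoms) + "\n"


# Made-up two-chain record: protein chain A, ligand chain B
SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N N MET A 0.000 1.000 2.000
ATOM 2 C CA MET A 1.000 1.500 2.000
HETATM 3 C C1 LIG B 5.000 5.000 5.000
HETATM 4 O O1 LIG B 6.200 5.000 5.000
HETATM 5 H H1 LIG B 4.500 4.100 5.000
#
"""

ligand_xyz = cif_to_xyz(SAMPLE, chains={"B"}, comment="ligand")
protein_xyz = cif_to_xyz(SAMPLE, chains={"A"}, comment="protein")
```

Calling `cif_to_xyz` with `chains=None` keeps every chain, which gives the XYZ for the whole complex.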

In [8]:
ccb.download_cif(files[0])


In [9]:
ccb.get_XYZ_files("rosuvastatin", "HMGCR")

In [10]:
# read the ligand XYZ file written by get_XYZ_files and display it
with open("rosuvastatin_HMGCR_ligand.xyz", "r") as f:
    xyz = f.read()

ccb.visualize_molecule(xyz)