# Chroma Demo

In this demo we conduct a simplified enhancement workflow for the IsPETase enzyme. The main goal is to improve the enzyme's thermal stability while retaining its high PET degradation efficiency.

This is done by generating the backbone de novo while preserving the enzyme's PETase capabilities: both the docking site and the catalytic triad are kept fixed. The catalytic triad is composed of Ser160, Asp206, and His237. The docking site is composed of two subsites: subsite I positions one monomer, with a benzene ring from PET, between Tyr87 and Trp185; subsite II holds three more monomers and consists of amino acids including Thr88, Ala89, Trp159, Ile232, Asn233, Ser236, Ser238, Asn241, Asn244, Ser245, Asn246, and Arg280.
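The residue lists above can be collected into a selection string programmatically instead of being typed by hand. The following is a minimal sketch: the constant names and the `build_selection` helper are illustrative and not part of Chroma, and the `resid ... or resid ...` syntax simply mirrors the selection string used later in this demo.

```python
# Residue numbers taken from the description above.
CATALYTIC_TRIAD = [160, 206, 237]
SUBSITE_I = [87, 185]
SUBSITE_II = [88, 89, 159, 232, 233, 236, 238, 241, 244, 245, 246, 280]


def build_selection(residues, invert=False):
    """Join residue numbers into a 'resid A or resid B ...' expression.

    Hypothetical helper for illustration; it is not a Chroma function.
    """
    if not residues:
        raise ValueError("at least one residue number is required")
    expression = " or ".join(f"resid {number}" for number in residues)
    return f"not ({expression})" if invert else expression


functional_residues = CATALYTIC_TRIAD + SUBSITE_I + SUBSITE_II
selection_string = build_selection(functional_residues, invert=True)
print(selection_string)
```

Keeping the residue numbers in named lists makes it easier to see which functional group each number belongs to and to spot a missing or duplicated residue.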

## Installing Chroma

In [None]:
#Install condacolab to battle the dependency issues
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
#Optional cell to verify the condacolab installation
#If this cell fails, run it again; the error is intermittent and a second run usually succeeds
import condacolab
condacolab.check()

In [None]:
#Create the environment (-y skips the interactive confirmation prompt)
!conda create -y --name chromademo python=3.9

In [None]:
%%shell
#Shell cell to install chroma (the %%shell magic must be the first line of the cell)
eval "$(conda shell.bash hook)"
conda activate chromademo
pip install git+https://github.com/generatebio/chroma.git
python --version

In [None]:
#General imports needed for chroma
import sys
import os
import contextlib
import torch
from pathlib import Path

#The Chroma demo sets the preferred encoding, so we do the same to be safe
import locale
locale.getpreferredencoding = lambda: "UTF-8"

#Google Colab can't find the conda environment, so add its site-packages to the path
sys.path.append("/usr/local/envs/chromademo/lib/python3.9/site-packages")

#Chroma imports as described in demo
from chroma import Chroma, Protein, conditioners
from chroma.models import graph_classifier, procap
from chroma.utility.api import register_key
from chroma.utility.chroma import letter_to_point_cloud, plane_split_protein

In [None]:
#Register the API key and select device
api_key = ""
register_key(api_key)

#The T4 runtime provides an NVIDIA GPU
device = "cuda"
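When the notebook runs on a runtime without a GPU, hard-coding `"cuda"` fails at the first tensor transfer. A minimal sketch of a fallback, using PyTorch's `torch.cuda.is_available()`; the `try`/`except` guard is only there so that the snippet also runs where PyTorch is not installed. Sampling on the CPU is expected to be much slower.

```python
# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
try:
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # PyTorch is not installed in this interpreter, so only the CPU is usable.
    device = "cpu"

print(f"Using device: {device}")
```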

In [None]:
#Instantiate chroma
with contextlib.redirect_stdout(None):
    chroma = Chroma()

## Generating alternative IsPETase enzymes

In [None]:
output_dir_alt = Path("alt_proteins")
output_dir_alt.mkdir(exist_ok = True)

#Select IsPETase PDB ID
PDB_ID = "5XJH"

#Number of proteins to generate
NUM_PROTEINS = 100

#List to save all protein sequences
alternative_proteins = []

#Load protein
protein = Protein.from_PDBID(PDB_ID, device = device)

#Keep the catalytic triad and docking site fixed; the selection covers all other residues
selection_string = "not (resid 160 or resid 206 or resid 237 or resid 87 or resid 185 or resid 88 or resid 89 or resid 159 or resid 232 or resid 233 or resid 236 or resid 238 or resid 241 or resid 244 or resid 245 or resid 246 or resid 280)"

#SubstructureConditioner
substructure_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection=selection_string
).to(device)

#Generate alternative proteins
for i in range(NUM_PROTEINS):
    print(f"Generating protein {i + 1}/{NUM_PROTEINS}")
    alt_protein, trajectories = chroma.sample(
        protein_init=protein,
        conditioner=substructure_conditioner,
        langevin_factor=2.0,
        langevin_isothermal=True,
        inverse_temperature=8.0,
        sde_func="langevin",
        steps=200,
        full_output=True,
    )

    #Extract amino acid sequence, and add it to the list
    sequence = alt_protein.sequence()
    name = f"alt_protein_{i + 1}"
    alternative_proteins.append((name, sequence))

    #Save the PDB structure to the output folder
    alt_protein.to(f"{output_dir_alt}/generated_protein_{i + 1}.pdb")
    print(f"\nAlternative protein {i + 1}: {alt_protein}")

print(f"All {NUM_PROTEINS} alternative proteins have been generated!")
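Before moving on, it can be useful to see how far each design has drifted from the starting sequence. The sketch below computes a simple per-position identity. It assumes every generated sequence has the same length as the original, which is worth verifying for your own run. The `sequence_identity` helper and the toy sequences are hypothetical and not part of Chroma; in the notebook the inputs would be `protein.sequence()` and the entries of `alternative_proteins`.

```python
def sequence_identity(reference, candidate):
    """Return the fraction of positions that carry the same residue letter.

    Hypothetical helper for illustration; assumes equal-length sequences.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return matches / len(reference)


# Toy sequences for illustration only, not real IsPETase data.
wild_type = "ACDEFGHIKL"
designs = [
    ("alt_protein_1", "ACDEYGHIKL"),  # differs at one position
    ("alt_protein_2", "ACNEFGHLKV"),  # differs at three positions
]

identities = {name: sequence_identity(wild_type, seq) for name, seq in designs}
for name, identity in identities.items():
    print(f"{name}: {identity:.2f}")
```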

### Chroma citation
@Article{Chroma2023,
  author  = {Ingraham, John B. and Baranov, Max and Costello, Zak and Barber, Karl W. and Wang, Wujie and Ismail, Ahmed and Frappier, Vincent and Lord, Dana M. and Ng-Thow-Hing, Christopher and Van Vlack, Erik R. and Tie, Shan and Xue, Vincent and Cowles, Sarah C. and Leung, Alan and Rodrigues, Jo\~{a}o V. and Morales-Perez, Claudio L. and Ayoub, Alex M. and Green, Robin and Puentes, Katherine and Oplinger, Frank and Panwar, Nishant V. and Obermeyer, Fritz and Root, Adam R. and Beam, Andrew L. and Poelwijk, Frank J. and Grigoryan, Gevorg},
  journal = {Nature},
  title   = {Illuminating protein space with a programmable generative model},
  year    = {2023},
  volume  = {},
  number  = {},
  pages   = {},
  doi     = {10.1038/s41586-023-06728-8}
}

## Calculating thermal stability
To test whether any of the generated alternative proteins is more thermally stable than the original enzyme, we use the [DeepSTABp](https://csb-deepstabp.bio.rptu.de/) predictor.

The online GUI of this predictor takes as input the FASTA file of all alternative proteins, the growth temperature, and the environment temperature prediction.

In [None]:
#Generate a combined fasta file of all generated alternative proteins
fasta_file = output_dir_alt / "alternative_proteins.fasta"

with open(fasta_file, "w") as f:
    for name, sequence in alternative_proteins:
        f.write(f">{name}\n{sequence}\n")

print(f"All {len(alternative_proteins)} alternative proteins have been saved to {fasta_file}")
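Before uploading, the FASTA file can be read back to confirm that every record survived and contains only the 20 standard amino acid letters. This is a minimal sketch using only the standard library. `read_fasta` and `invalid_records` are hypothetical helpers, the toy file stands in for `alternative_proteins.fasta`, and whether DeepSTABp accepts other letters is not checked here.

```python
import tempfile
from pathlib import Path

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(path):
    """Parse a FASTA file into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def invalid_records(records):
    """Return names of records that are empty or contain non-standard letters."""
    return [name for name, seq in records if not seq or set(seq) - VALID_RESIDUES]


# Toy file for illustration; the second record contains the non-standard letter X.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "example.fasta"
    fasta_path.write_text(">alt_protein_1\nACDEFGHIKL\n>alt_protein_2\nACDXFGHIKL\n")
    records = read_fasta(fasta_path)

print(f"{len(records)} records, invalid: {invalid_records(records)}")
```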

In [None]:
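import pandas as pd

#Load the DeepSTABp predictions (name and predicted melting temperature Tm)
tm = pd.read_csv("20251114_105022_DeepStabP.csv", header=None, names=["name", "Tm"])

#Show the ten proteins with the highest predicted Tm
tm.nlargest(10, "Tm")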
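The ranking can also be restricted to designs that beat a reference melting temperature, for example the value predicted for the original enzyme. The following standard-library sketch makes several assumptions: the two-column layout (name, Tm) is taken from the `read_csv` call above and may differ from the actual DeepSTABp export, the numbers are made up for illustration, and `rank_by_tm` and `reference_tm` are hypothetical names.

```python
import csv
import io

# Made-up values for illustration only; these are not DeepSTABp predictions.
example_csv = io.StringIO(
    "alt_protein_1,48.2\n"
    "alt_protein_2,55.7\n"
    "alt_protein_3,51.0\n"
)

rows = [(name, float(tm_value)) for name, tm_value in csv.reader(example_csv)]


def rank_by_tm(rows, reference_tm=None):
    """Sort (name, Tm) rows by descending Tm.

    When reference_tm is given, only rows with a strictly higher Tm are kept.
    """
    if reference_tm is not None:
        rows = [row for row in rows if row[1] > reference_tm]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_tm(rows)
improved = rank_by_tm(rows, reference_tm=50.0)  # hypothetical reference value
print(ranked)
print(improved)
```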

In [None]:
#Zip the generated proteins so they can be downloaded from Colab
!zip -r /content/alt_proteins.zip /content/alt_proteins

### DeepSTABp citation
Jung F, Frey K, Zimmer D, Mühlhaus T. DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. International Journal of Molecular Sciences. 2023; 24(8):7444. https://doi.org/10.3390/ijms24087444