# Uni-Fold Notebook

This notebook provides protein structure prediction with [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/) and [UF-Symmetry](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1). Both protein monomers and multimers are supported. Homology search in this notebook uses the [MMseqs2](https://github.com/soedinglab/MMseqs2.git) server provided by [ColabFold](https://github.com/sokrypton/ColabFold). For results more consistent with the original AlphaFold(-Multimer), please refer to the open-source repository of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/), or our web server at [Hermite™](https://hermite.dp.tech/).

Please note that this notebook is provided as an early-access prototype and is NOT an official product of DP Technology. It is intended for theoretical modeling only and should be used with caution.

**Licenses**

This Colab uses the [Uni-Fold model parameters](https://github.com/dptech-corp/Uni-Fold/#model-parameters-license), and its outputs are under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Details are available at https://creativecommons.org/licenses/by/4.0/legalcode. The Colab itself is provided under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).


**Citations**

Please cite the following papers if you use this notebook:

*   Ziyao Li, Xuyang Liu, Weijie Chen, Fan Shen, Hangrui Bi, Guolin Ke, Linfeng Zhang. "[Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold.](https://www.biorxiv.org/content/10.1101/2022.08.04.502811v1)" bioRxiv (2022)
*   Ziyao Li, Shuwen Yang, Xuyang Liu, Weijie Chen, Han Wen, Fan Shen, Guolin Ke, Linfeng Zhang. "[Uni-Fold Symmetry: Harnessing Symmetry in Folding Large Protein Complexes.](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1)" bioRxiv (2022)
*   Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. "[ColabFold: Making protein folding accessible to all.](https://www.nature.com/articles/s41592-022-01488-1)" Nature Methods (2022)

**Acknowledgements**

The model architecture of Uni-Fold is largely based on [AlphaFold](https://doi.org/10.1038/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). The design of this notebook closely follows [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We especially thank [@sokrypton](https://twitter.com/sokrypton) for his helpful suggestions on this notebook.

Copyright © 2022 DP Technology. All rights reserved.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
from prtm import protein
from prtm.models.unifold.data.protein import to_pdb as uf_pdb
from prtm.models.unifold.modeling import UniFoldForFolding
from prtm.visual import view_superimposed_structures






## Fold Monomer

In [3]:
monomer_sequence = (
    "LILNLRGGAFVSNTQITMADKQKKFINEIQEGDLVRSYSITDETFQQNAVTSIV"
    "KHEADQLCQINFGKQHVVCTVNHRFYDPESKLWKSVCPHPGSGISFLKKYDYLLS"
    "EEGEKLQITEIKTFTTKQPVFIYHIQVENNHNFFANGVLAHAMQVSI"
)
monomer_sequence_dict = {"A": monomer_sequence}
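Before folding, it can help to confirm that every input chain contains only the 20 standard amino-acid one-letter codes. How the folding call treats other characters (spaces, digits, or ambiguous codes such as `X`) is not documented in this notebook. The sketch below is a stand-alone check; `validate_sequences` is a hypothetical helper introduced here for illustration and is not part of `prtm` or Uni-Fold.

```python
# Hypothetical helper (not part of prtm or Uni-Fold): sanity-check input sequences.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_sequences(sequence_dict):
    """Map each problematic chain ID to its sorted non-standard characters.

    An empty result means every chain is non-empty and uses only the 20
    standard one-letter codes. Lowercase letters are accepted (compared
    after upper-casing).
    """
    problems = {}
    for chain_id, sequence in sequence_dict.items():
        bad = sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)
        if bad or not sequence:
            problems[chain_id] = bad
    return problems


# Chain "A" is clean; chain "B" contains a space, an ambiguous code, and a digit.
example = {"A": "LILNLRGGAFVSNTQITMADKQKK", "B": "MKT XLV1"}
print(validate_sequences(example))
```

Applied to the notebook's own input, `validate_sequences(monomer_sequence_dict)` would be expected to return an empty dict.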

In [5]:
uf_folder = UniFoldForFolding(model_name="model_2_ft", use_templates=False, random_seed=0)
af_folder = UniFoldForFolding(
    model_name="model_1_af2", use_templates=False, random_seed=0
)

In [6]:
uf_monomer_structure, uf_aux_output = uf_folder(monomer_sequence_dict, max_recycling_iters=3, num_ensembles=2)
af_monomer_structure, af_aux_output = af_folder(monomer_sequence_dict, max_recycling_iters=3, num_ensembles=2)

Loaded result from cache.
Loaded result from cache.


In [7]:
view_superimposed_structures(uf_monomer_structure, af_monomer_structure, color1="green")

<py3Dmol.view at 0x7f717a6bbca0>
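The viewer gives a visual comparison of the two predictions. A common numeric complement is the RMSD after optimal rigid superposition. The sketch below implements the Kabsch algorithm with NumPy for two matched `(N, 3)` coordinate arrays (for example, Cα positions in the same residue order). `kabsch_rmsd` is a hypothetical helper, and extracting such arrays from the `prtm` structure objects is assumed rather than shown.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two matched (N, 3) point sets after optimal rigid superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both point sets on their centroids.
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(a_centered.T @ b_centered)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    a_rotated = a_centered @ rotation
    return float(np.sqrt(np.mean(np.sum((a_rotated - b_centered) ** 2, axis=1))))


# Synthetic check: a rotated and translated copy should superimpose almost exactly.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
theta = 0.7
rot_z = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
moved = points @ rot_z.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(points, moved))
```

The function only measures how well two already-matched point sets agree; it does not align sequences or choose which atoms to compare.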

## Fold Multimer

In [8]:
complex_sequence_a = (
    "TTPLVHVASVEKGRSYEDFQKVYNAIALKLREDDEYDNYIGYGPVLVRLAWHTSGTW"
    "DKHDNTGGSYGGTYRFKKEFNDPSNAGLQNGFKFLEPIHKEFPWISSGDLFSLGGVTA"
    "VQEMQGPKIPWRCGRVDTPEDTTPDNGRLPDADKDADYVRTFFQRLNMNDREVVALMGAH"
    "ALGKTHLKNSGYEGPWGAANNVFTNEFYLNLLNEDWKLEKNDANNEQWDSKSGYMMLPTDY"
    "SLIQDPKYLSIVKEYANDQDKFFKDFSKAFEKLLENGITFPKDAPSPFIFKTLEEQGL"
)
complex_sequence_b = (
    "TEFKAGSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDA"
    "NIKKNVLWDENNMSEYLTNPKKYIPGTKMAIGGLKKEKDRNDLITYLKKACE"
)
complex_sequence_dict = {"A": complex_sequence_a, "B": complex_sequence_b}
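A chain dictionary like the one above can also be written out as FASTA text, which is convenient for archiving the exact input or for running the same complex through another tool. The sketch below uses only the standard library. `to_fasta` and the `chain_<id>` header format are assumptions made for illustration, not a convention of `prtm` or Uni-Fold.

```python
# Hypothetical helper: serialize a {chain_id: sequence} dict as FASTA text.
def to_fasta(sequence_dict, width=60):
    """Format each chain as a FASTA record with sequence lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    records = []
    for chain_id, sequence in sequence_dict.items():
        header = f">chain_{chain_id} length={len(sequence)}"
        # Split the sequence into fixed-width lines.
        lines = [sequence[i : i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([header, *lines]))
    return "\n".join(records) + "\n"


# Short demo with the first 12 residues of each chain and a narrow line width.
demo = {"A": "TTPLVHVASVEK", "B": "TEFKAGSAKKGA"}
print(to_fasta(demo, width=8), end="")
```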

### Uni-Fold and AlphaFold Weights

In [9]:
uf_folder = UniFoldForFolding(model_name="multimer_ft", use_templates=False, random_seed=0)
af_folder = UniFoldForFolding(
    model_name="multimer_4_af2_v3", use_templates=False, random_seed=0
)

Downloading model from https://github.com/dptech-corp/Uni-Fold/releases/download/v2.0.0/unifold_params_2022-08-01.tar.gz...
Model downloaded and cached in /home/ubuntu/.cache/torch/hub/checkpoints/unifold_multimer.unifold.pt.
Downloading: "https://huggingface.co/conradry/unifold-alphafold-weights/resolve/main/params_model_4_multimer_v3.pth" to /home/ubuntu/.cache/torch/hub/checkpoints/unifold_multimer_4_af2_v3.pth


In [10]:
uf_complex_structure, uf_aux_output = uf_folder(complex_sequence_dict, max_recycling_iters=3, num_ensembles=2)
af_complex_structure, af_aux_output = af_folder(complex_sequence_dict, max_recycling_iters=3, num_ensembles=2)

Loaded result from cache.
Loaded result from cache.
Loaded result from cache.
Loaded result from cache.


In [17]:
view_superimposed_structures(
    uf_complex_structure.get_chain("A"), af_complex_structure.get_chain("A"), color1="green"
)

<py3Dmol.view at 0x7f7172b73ee0>

In [18]:
view_superimposed_structures(
    uf_complex_structure.get_chain("B"), af_complex_structure.get_chain("B"), color1="green"
)

<py3Dmol.view at 0x7f715cc5bf40>

## Fold Symmetric

In [3]:
symmetric_sequence = (
    "PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGD"
    "LTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYV"
    "KALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAY"
    "VGRLSARPKLKAFLASPEYVNLPINGNGKQ"
)
symmetric_sequence_dict = {"A": symmetric_sequence}

In [5]:
sym_folder = UniFoldForFolding(
    model_name="uf_symmetry", use_templates=True, random_seed=0, symmetry_group="C2"
)

In [12]:
sym_structure, sym_aux_output = sym_folder(symmetric_sequence_dict, max_recycling_iters=3, num_ensembles=2)

Loaded result from cache.
Loaded result from cache.




aatype shape (418,)
atom_positions shape (418, 37, 3)
atom_mask shape (418, 37)
residue_index shape (418,)
chain_index shape (418,)
b_factors shape (418, 37)
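The leading dimension of 418 in the shapes above is consistent with two copies of the single input chain under `symmetry_group="C2"`. This reading is an inference from the printed shapes, not something the notebook states. The sketch below checks the arithmetic; `cyclic_order` is a hypothetical helper that parses only cyclic labels of the form `Cn` and makes no claim about which groups UF-Symmetry accepts.

```python
import re


def cyclic_order(symmetry_group):
    """Return n for a cyclic group label 'Cn' (for example, 'C2' gives 2)."""
    match = re.fullmatch(r"C([1-9]\d*)", symmetry_group)
    if match is None:
        raise ValueError(f"not a cyclic group label: {symmetry_group!r}")
    return int(match.group(1))


# Same sequence as in the notebook cell that defines the symmetric input.
symmetric_sequence = (
    "PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGD"
    "LTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYV"
    "KALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAY"
    "VGRLSARPKLKAFLASPEYVNLPINGNGKQ"
)

n_copies = cyclic_order("C2")
expected_residues = len(symmetric_sequence) * n_copies
# Compare the last number with the leading dimension of the printed shapes.
print(len(symmetric_sequence), n_copies, expected_residues)
```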


In [13]:
sym_structure.show()

<py3Dmol.view at 0x7fa4f87afbe0>