Refine colab notebook #52

Merged: 8 commits, Sep 22, 2022

notebooks/unifold.ipynb: 30 changes, 15 additions and 15 deletions
@@ -6,13 +6,11 @@
"id": "jMGcXXPabEN4"
},
"source": [
"# Uni-Fold Colab\n",
"# Uni-Fold Notebook\n",
"\n",
"This Colab notebook provides an online runnable version of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/) for users to predict the structure of a protein, single chain or multimer, with custom settings.\n",
"This notebook provides protein structure prediction service of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/) as well as [UF-Symmetry](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1). Predictions of both protein monomers and multimers are supported. The homology search process in this notebook is enabled with the [MMSeqs2](https://github.com/soedinglab/MMseqs2.git) server provided by [ColabFold](https://github.com/sokrypton/ColabFold). For more consistent results with the original AlphaFold(-Multimer), please refer to the open-source repository of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/), or our convenient web server at [Hermite™](https://hermite.dp.tech/).\n",
"\n",
"Thanks to [MMSeqs2](https://github.com/soedinglab/MMseqs2.git) and the server provided by [ColabFold](https://github.com/sokrypton/ColabFold), the homogeneous searching in this notebook is very fast and is comparable with the original AlphaFold(-Multimer). If you want more consistent results with the original AlphaFold(-Multimer), you can use the [full open source Uni-Fold](https://github.com/dptech-corp/Uni-Fold/), or the convenient web server at [Hermite™](https://hermite.dp.tech/).\n",
"\n",
"Please note that this Colab notebook is not a finished product and is provided as an early-access prototype. It is provided for theoretical modeling only and caution should be exercised in its use. \n",
"Please note that this notebook is provided as an early-access prototype, and is NOT an official product of DP Technology. It is provided for theoretical modeling only and caution should be exercised in its use. \n",
"\n",
"**Licenses**\n",
"\n",
@@ -23,16 +21,15 @@
"\n",
"Please cite the following papers if you use this notebook:\n",
"\n",
"* Jumper et al. \"[Highly accurate protein structure prediction with AlphaFold.](https://doi.org/10.1038/s41586-021-03819-2)\" Nature (2021)\n",
"* Evans et al. \"[Protein complex prediction with AlphaFold-Multimer.](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1)\" biorxiv (2021)\n",
"* Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. \"[ColabFold: Making protein folding accessible to all.](https://www.nature.com/articles/s41592-022-01488-1)\" Nature Methods (2022) \n",
"* Ziyao Li, Xuyang Liu, Weijie Chen, Fan Shen, Hangrui Bi, Guolin Ke, Linfeng Zhang. \"[Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold.](https://www.biorxiv.org/content/10.1101/2022.08.04.502811v1)\" biorxiv (2022)\n",
"* Ziyao Li, Shuwen Yang, Xuyang Liu, Weijie Chen, Han Wen, Fan Shen, Guolin Ke, Linfeng Zhang. \"[Uni-Fold Symmetry: Harnessing Symmetry in Folding Large Protein Complexes.](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1)\" bioRxiv (2022)\n",
"\n",
"* Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. \"[ColabFold: Making protein folding accessible to all.](https://www.nature.com/articles/s41592-022-01488-1)\" Nature Methods (2022)\n",
"\n",
"**Acknowledgements**\n",
"\n",
"We thank [@sokrypton](https://twitter.com/sokrypton) for many helpful suggestions to this notebook.\n"
"The model architecture of Uni-Fold is largely based on [AlphaFold](https://doi.org/10.1038/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). The design of this notebook refers directly to [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We specially thank [@sokrypton](https://twitter.com/sokrypton) for his helpful suggestions to this notebook.\n",
"\n",
"Copyright © 2022 DP Technology. All rights reserved."
]
},
{
@@ -127,7 +124,6 @@
"output_dir_base = \"./prediction\"\n",
"os.makedirs(output_dir_base, exist_ok=True)\n",
"\n",
"\n",
"def clean_and_validate_sequence(\n",
" input_sequence: str, min_length: int, max_length: int) -> str:\n",
" \"\"\"Checks that the input sequence is ok and returns a clean version of it.\"\"\"\n",
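The body of `clean_and_validate_sequence` is collapsed in this diff, so its exact checks are not visible here. As a rough illustration only, a validator with this signature might strip whitespace, upper-case the input, reject letters outside the 20 standard amino acids, and enforce the length bounds. The accepted alphabet and the error messages below are assumptions, not the notebook's actual code.

```python
from typing import Set

# Assumption: only the 20 standard one-letter amino-acid codes are accepted.
STANDARD_AMINO_ACIDS: Set[str] = set("ACDEFGHIKLMNPQRSTVWY")


def clean_and_validate_sequence(
        input_sequence: str, min_length: int, max_length: int) -> str:
    """Checks that the input sequence is ok and returns a clean version of it.

    Hypothetical sketch; the notebook's real implementation is not shown in the diff.
    """
    # Remove all whitespace (spaces, tabs, newlines) and normalise to upper case.
    clean_sequence = "".join(input_sequence.split()).upper()
    # Reject any character outside the assumed amino-acid alphabet.
    unknown = set(clean_sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(
            f"Input sequence contains non-amino-acid letters: {sorted(unknown)}")
    # Enforce the caller-supplied length bounds.
    if len(clean_sequence) < min_length:
        raise ValueError(
            f"Input sequence is too short: {len(clean_sequence)} < {min_length}")
    if len(clean_sequence) > max_length:
        raise ValueError(
            f"Input sequence is too long: {len(clean_sequence)} > {max_length}")
    return clean_sequence
```

For example, `clean_and_validate_sequence("mkt ayi\n", 1, 100)` returns `"MKTAYI"` under these assumptions, while a sequence containing `X` raises `ValueError`.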
@@ -203,21 +199,25 @@
"def add_hash(x,y):\n",
" return x+\"_\"+hashlib.sha1(y.encode()).hexdigest()[:5]\n",
"\n",
"jobname = 'unifold_colab' #@param {type:\"string\"}\n",
"\n",
"sequence_1 = 'LILNLRGGAFVSNTQITMADKQKKFINEIQEGDLVRSYSITDETFQQNAVTSIVKHEADQLCQINFGKQHVVCTVNHRFYDPESKLWKSVCPHPGSGISFLKKYDYLLSEEGEKLQITEIKTFTTKQPVFIYHIQVENNHNFFANGVLAHAMQVSI' #@param {type:\"string\"}\n",
"sequence_2 = '' #@param {type:\"string\"}\n",
"sequence_3 = '' #@param {type:\"string\"}\n",
"sequence_4 = '' #@param {type:\"string\"}\n",
"\n",
"#@markdown Use symmetry group `C1` for default Uni-Fold predictions.\n",
"#@markdown Or, specify a **cyclic** symmetry group (e.g. `C4`) and\n",
"#@markdown the sequences of the asymmetric unit (i.e. **do not copy\n",
"#@markdown them multiple times**) to predict with UF-Symmetry.\n",
"\n",
"symmetry_group = 'C1' #@param {type:\"string\"}\n",
"\n",
"use_templates = True #@param {type:\"boolean\"}\n",
"msa_mode = \"MMseqs2\" #@param [\"MMseqs2\",\"single_sequence\"]\n",
"\n",
"input_sequences = [sequence_1, sequence_2, sequence_3, sequence_4]\n",
"\n",
"jobname = 'unifold_colab' #@param {type:\"string\"}\n",
"\n",
"basejobname = \"\".join(input_sequences)\n",
"basejobname = re.sub(r'\\W+', '', basejobname)\n",
"target_id = add_hash(jobname, basejobname)\n",
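The input cell in this hunk ends with a small naming step: all input sequences are concatenated, non-word characters are stripped with `re.sub(r'\W+', '', ...)`, and `add_hash` appends the first five hex digits of a SHA-1 digest to `jobname`. The sketch below reproduces that step outside the notebook and adds a check reflecting the form text on symmetry groups (`C1` for default Uni-Fold, or a cyclic group such as `C4` for UF-Symmetry). `make_target_id` and `parse_symmetry_group` are hypothetical names introduced for illustration, not functions from the notebook; `typing.List` is used so the code also runs on the Python 3.8 kernel listed in the metadata.

```python
import hashlib
import re
from typing import List


def add_hash(x: str, y: str) -> str:
    # Same logic as the notebook helper: "<x>_<first 5 hex chars of SHA-1(y)>".
    return x + "_" + hashlib.sha1(y.encode()).hexdigest()[:5]


def make_target_id(jobname: str, input_sequences: List[str]) -> str:
    # Hypothetical wrapper around the steps shown in the diff.
    basejobname = "".join(input_sequences)
    basejobname = re.sub(r"\W+", "", basejobname)  # keep only word characters
    return add_hash(jobname, basejobname)


def parse_symmetry_group(symmetry_group: str) -> int:
    """Hypothetical check: returns n for a cyclic group 'Cn'; C1 means no symmetry."""
    match = re.fullmatch(r"C(\d+)", symmetry_group.strip().upper())
    if match is None or int(match.group(1)) < 1:
        raise ValueError(
            f"Expected a cyclic group such as C1 or C4, got {symmetry_group!r}")
    return int(match.group(1))
```

For example, `make_target_id("unifold_colab", [sequence_1, "", "", ""])` yields `unifold_colab_` followed by five hex digits, and `parse_symmetry_group("C4")` returns `4`. Because the sequences are joined before hashing, splitting the same residues differently across chains (e.g. `["AB", "C"]` versus `["A", "BC"]`) produces the same `target_id`.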
@@ -1046,7 +1046,7 @@
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3.8.10 ('ProteinMD')",
"display_name": "Python 3.8.10 64-bit",
"language": "python",
"name": "python3"
},
@@ -1056,7 +1056,7 @@
},
"vscode": {
"interpreter": {
"hash": "af92dc656850d97b5469b75c9ef2009aaa936e713f0093b069a7ff14eeb2ca8d"
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},