## 1. Structure Prediction

### 1.1 Introduction
- Goal: predict the structures of the target enzymes as a basis for functional analyses.
- Tools:
  - ColabFold (AlphaFold2-based): deep-learning structure prediction; can be run locally or online.
  - SWISS-MODEL: homology modeling; relies on templates.

### 1.2 Target enzyme information
- **BfIMTD**: An exo-dextranase and the primary target enzyme in this study. It possesses a novel loop region outside the catalytic tunnel, which is explored computationally here.
- DexKQ: An endo-dextranase (GH49 family) used as the template enzyme for BfIMTD in homology modeling. DexKQ typically produces a mixture of isomaltooligosaccharides containing glucose. In contrast, BfIMTD shows exo-activity and specifically produces isomaltotriose (IG3), a clear functional distinction between the two enzymes.
- **CcDex** and **MrDex**: Two uncharacterized dextranases identified from databases. Both share high sequence similarity with BfIMTD and contain similar novel loops. These candidates were mined and analyzed in my master's thesis.
- IPU (Isopullulanase): A related enzyme with structural similarity, included here as an additional reference for comparative analysis.

- FASTA files are included in [data/fasta].

- <span style="color:blue">Notes: **loop** regions were identified during the master's thesis via structure alignments. More details are included in [docs].</span>  


Sequence information table

| Enzyme | Accession | Organism | Length (amino acids) | Loop region |
|:----:|:----:|----:|----:|----:|
| BfIMTD |  BAA76382.1  |  Brevibacterium fuscum  |  641 | 392-402 |
| DexKQ |  MK118723.1/QDQ17819.1  |  Pseudarthrobacter oxydans | 640 | - |
| CcDex |  SDF96286  |  Cellulosimicrobium cellulans |  730 | 392-402 |
| MrDex |  WP_087138477  |  Mycetocola reblochoni |  719 | 381-491 |
| Isopullulanase (IPU) |  1WMR_A  |  Aspergillus niger  |  549 | - |
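The lengths and loop ranges in the table can be cross-checked against the FASTA files. The sketch below is only an illustration: it assumes the loop ranges are 1-based and inclusive, and `read_fasta` and `loop_segment` are hypothetical helper names, not part of the scripts in this repository.

```python
from pathlib import Path


def read_fasta(path):
    """Return {record_id: sequence} for a FASTA file (id = first word of the header)."""
    records, header, chunks = {}, None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def loop_segment(sequence, loop_range):
    """Slice a 1-based inclusive range such as '392-402' out of a sequence."""
    start, end = (int(part) for part in loop_range.split("-"))
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {loop_range} does not fit a sequence of length {len(sequence)}")
    return sequence[start - 1:end]
```

With a record loaded by `read_fasta`, `len(sequence)` can be compared with the length column, and `loop_segment(sequence, "392-402")` returns the 11 residues of that range.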

### 1.3 Practical workflow

### ColabFold

#### 1. Deployment attempts (notes):

At first I tried the web version of ColabFold, but it was quickly overloaded. I then hit errors from a jaxlib version mismatch, so I decided to switch to local deployment.
I started on Windows with my GPU (RTX 3060) and tried several times to install ColabFold, but each attempt failed on the environment or CUDA setup. After a few failed attempts, I upgraded my system and installed Ubuntu under WSL2. I managed to set up conda, install the dependencies, and even create the colabfold-env successfully, but the GPU was still not recognized, no matter how many times I reinstalled the drivers or tried different CUDA versions. I was stuck in this loop for days, each attempt ending in frustration.
In the end, I went back to the web version. This time it worked, and I finally obtained the predicted structures.


#### 2. Web server: 
Input the amino acid sequence, set `num_relax = 1`, keep all other options at their defaults, and then run all cells. All structures predicted by ColabFold are included in [results/protein_prediction/colabfold].


#### 3. Extract key results after structure prediction

- Run `extract_pLDDT_colab.py` in [scripts] to extract **pLDDT** from the ColabFold result folders.
- Provide the result path, target name, loop range, and FASTA file as input.

`python extract_pLDDT_colab.py`
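The script itself is not reproduced here. As a rough sketch of the kind of summary reported in this section, the snippet below assumes that the ColabFold per-model scores JSON exposes a `plddt` list with one value per residue, and that the loop range is 1-based and inclusive. `load_plddt` and `summarize_plddt` are hypothetical names and may differ from what `extract_pLDDT_colab.py` actually does.

```python
import json
from statistics import mean


def load_plddt(scores_json):
    """Read the per-residue pLDDT list from a scores JSON file (assumed key: 'plddt')."""
    with open(scores_json) as handle:
        return json.load(handle)["plddt"]


def summarize_plddt(plddt, loop_range=None):
    """Return the mean pLDDT, the percentage of residues >= 70 and >= 90, and the loop mean."""
    n_residues = len(plddt)
    summary = {
        "mean_plddt": mean(plddt),
        "pct_ge_70": 100 * sum(value >= 70 for value in plddt) / n_residues,
        "pct_ge_90": 100 * sum(value >= 90 for value in plddt) / n_residues,
    }
    if loop_range is not None:
        start, end = loop_range  # 1-based, inclusive (assumption)
        summary["loop_mean_plddt"] = mean(plddt[start - 1:end])
    return summary
```

For a loop annotated as 392-402, the call would be `summarize_plddt(load_plddt(path), (392, 402))`.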

- Run `extract_pae_colab.py` in [scripts] to extract **PAE** from the ColabFold result folders.
- Provide the result path, target name, and loop range as input.

`python extract_pae_colab.py`
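One plausible way to obtain block-wise PAE means is sketched below. It assumes the scores JSON stores the PAE as a square matrix under a `pae` key, that the loop range is 1-based and inclusive, and that every matrix entry (including the diagonal) counts toward the means. The actual definitions in `extract_pae_colab.py` may differ; `load_pae` and `pae_block_means` are hypothetical names.

```python
import json


def load_pae(scores_json):
    """Read the PAE matrix from a scores JSON file (assumed key: 'pae')."""
    with open(scores_json) as handle:
        return json.load(handle)["pae"]


def pae_block_means(pae, loop_range):
    """Mean PAE over all pairs and over the loop-loop, loop-core and core-core blocks."""
    start, end = loop_range  # 1-based, inclusive (assumption)
    loop = set(range(start - 1, end))
    totals = {key: [0.0, 0] for key in ("global", "loop_loop", "loop_core", "core_core")}
    for i, row in enumerate(pae):
        for j, value in enumerate(row):
            in_loop = (i in loop) + (j in loop)  # 0, 1 or 2 residues of the pair lie in the loop
            block = ("core_core", "loop_core", "loop_loop")[in_loop]
            for key in ("global", block):
                totals[key][0] += value
                totals[key][1] += 1
    # Blocks with no pairs are reported as None rather than dividing by zero.
    return {key: (total / count if count else None) for key, (total, count) in totals.items()}
```

Here `loop_core` pools both directions of the matrix (loop residue aligned against core residue and the reverse), which is a design choice of this sketch.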

#### <span style="color:green">4. Results summary</span>  

| Enzyme | Mean pLDDT | pLDDT ≥ 70 (%) | pLDDT ≥ 90 (%) | Loop mean pLDDT | PAE_global_mean (Å) | PAE_loop_loop_mean (Å) | PAE_loop_core_mean (Å) | PAE_core_core_mean (Å) |
|:----:|:----:|----:|----:|----:|:----:|----:|----:|----:|
| BfIMTD | 93.3 | 92.8 | 91.7 | 91.98 | 6.15 | 1.58 | 6.75 | 6.13 |
| DexKQ | 92.17 | 92.0 | 91.1 | - | 6.78 | - | - | - |
| CcDex | 92.73 | 93.7 | 91.5 | 92.09 | 7.33 | 1.52 | 7.28 | 7.33 |
| MrDex | 92.49 | 93.7 | 89.0 | 89.25 | 7.44 | 1.96 | 8.35 | 7.41 |
| Isopullulanase (IPU) | 95.74 | 98.5 | 92.3 | - | 4.04 | - | - | - |

#### ➡️ Conclusions:
1. The mean pLDDT scores of all five relaxed models are >90, with >90% of residues above the 70 threshold. This indicates that the overall folds are highly reliable.

2. For the three loop-containing enzymes (BfIMTD, CcDex, MrDex), the loop regions also show high mean pLDDT (89.25 to 92.09), suggesting that the local conformations of these loops are predicted with high confidence.

3. However, the PAE values for loop_core pairs are consistently higher (6.75 to 8.35 Å) than those for loop_loop pairs (1.52 to 1.96 Å). This indicates that while the loops themselves are well defined, their positioning relative to the protein core is less certain.

4. <span style="color:green">These results suggest that the loop regions are **flexible relative to the core**. They might undergo restricted motions during enzymatic activity, potentially contributing to functional dynamics.</span>  

### SWISS-MODEL

#### Web server: 
Input the amino acid sequence and start the model building.

All results are included in [results/SwissModel]


| Enzyme | Template PDB | Seq. identity (%) | Coverage (%) | GMQE | QMEANDisCo Global | Model |
|:----:|:----:|----:|----:|----:|----:|----:|
| BfIMTD | 6nzs.1.A | 85.18 | 92 | 0.88 | 0.92 ± 0.05 | 02 |
| DexKQ | 6nzs.1.A | 100.00 | 92 | 0.90 | 0.96 ± 0.05 | 01 |
| CcDex | 6nzs.1.A | 68.27 | 54 | 0.49 | 0.85 ± 0.05 | 01 |
| MrDex | 6nzs.1.A | 64.32 | 26 | 0.22 | 0.81 ± 0.05 | 02 |
| Isopullulanase (IPU) | 2z8g.1.A | 100.00 | 99 | 0.99 | 0.96 ± 0.05 | 01 |

#### ➡️ Conclusions:
The SWISS-MODEL results show that the BfIMTD, DexKQ, and IPU models are highly reliable (GMQE >0.85, QMEANDisCo ≥0.9, coverage >90%), while the CcDex and MrDex models are of low confidence (GMQE <0.5, coverage <60%).

Therefore, only the BfIMTD, DexKQ, and IPU models were considered robust enough for comparison with the ColabFold predictions; the CcDex and MrDex models need to be treated with caution.
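The selection above can be written as a simple threshold filter. The sketch below hard-codes the values from the SWISS-MODEL table, reading the fractional coverage entries as percentages (an assumption, e.g. 0.54 is taken as 54%). `is_robust` is a hypothetical helper, and the cutoffs are simply the ones quoted in the text.

```python
# Values copied from the SWISS-MODEL table; coverage is expressed in percent (assumption).
models = [
    {"enzyme": "BfIMTD", "gmqe": 0.88, "coverage_pct": 92, "qmeandisco": 0.92},
    {"enzyme": "DexKQ", "gmqe": 0.90, "coverage_pct": 92, "qmeandisco": 0.96},
    {"enzyme": "CcDex", "gmqe": 0.49, "coverage_pct": 54, "qmeandisco": 0.85},
    {"enzyme": "MrDex", "gmqe": 0.22, "coverage_pct": 26, "qmeandisco": 0.81},
    {"enzyme": "IPU", "gmqe": 0.99, "coverage_pct": 99, "qmeandisco": 0.96},
]


def is_robust(model, gmqe_min=0.85, coverage_min=90, qmeandisco_min=0.9):
    """Apply the cutoffs quoted in the text: GMQE > 0.85, coverage > 90%, QMEANDisCo >= 0.9."""
    return (
        model["gmqe"] > gmqe_min
        and model["coverage_pct"] > coverage_min
        and model["qmeandisco"] >= qmeandisco_min
    )


robust = [model["enzyme"] for model in models if is_robust(model)]
cautious = [model["enzyme"] for model in models if not is_robust(model)]
```

Keeping the cutoffs as keyword arguments makes it easy to check how sensitive the selection is to a stricter or looser threshold.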

#### References
1. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15. PMID: 34265844; PMCID: PMC8371605.
2. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022 Jun;19(6):679-682. doi: 10.1038/s41592-022-01488-1. Epub 2022 May 30. PMID: 35637307; PMCID: PMC9184281.
3. EMBL-EBI Training. pLDDT: Understanding local confidence. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/plddt-understanding-local-confidence/


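Deploy necessary packages on Windows:

- GPU driver + CUDA 11.7 ✅ installed
- cuDNN 9.1.2.0 ✅ downloaded (assumed to be copied into the CUDA directory)
- Conda virtual environment ✅ deepfri-env created and activated
- PyTorch + torchvision + torchaudio ✅ GPU-build .whl packages installed successfully
- GPU test script ✅ output correct, device is RTX 3060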
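
The GPU test script itself is not included in these notes. A minimal check along the same lines might look like the sketch below, which assumes PyTorch is the framework being tested. `describe_gpu` is a hypothetical name, and the function falls back to a plain message when PyTorch or CUDA is unavailable.

```python
import importlib.util


def describe_gpu():
    """Return the name of the first CUDA device seen by PyTorch, or a short diagnostic message."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check also runs without PyTorch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    return torch.cuda.get_device_name(0)


print(describe_gpu())
```

Running it inside the activated conda environment shows whether the GPU build of PyTorch can see the card.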
