## 1. DeepFRI

### 1. Deployment on Windows + RTX 3060

#### 1.1 System requirements:
- OS: Windows 11 (x64)
- GPU: NVIDIA RTX 3060
- CUDA Toolkit: version 11.7
- cuDNN: version 8.9.7
- Python: version 3.8
  
#### 1.2 CUDA and cuDNN installation
- Download the CUDA Toolkit installer from the NVIDIA CUDA website and add its `bin` directory to the system PATH:
    - `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin`
- Download the cuDNN package, extract the zip, then copy the contents of `bin/`, `lib/`, and `include/` into the matching folders under:
    - `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\`
      
#### 1.3 Conda environment setup
- Create and activate the environment:
    - `conda create -n deepfri-env python=3.8 -y`
    - `conda activate deepfri-env`
- Install PyTorch (GPU-enabled):
    - `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117`
      
#### 1.4 Verify GPU
- Run the commands in the cell below.
- ✔️ Expected output:
    - 2.0.1+cu117
    - 11.7
    - True
    - NVIDIA GeForce RTX 3060

In [None]:
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

#### 1.5 Install DeepFRI packages

In [None]:
pip install numpy pandas biopython scikit-learn matplotlib networkx tensorflow==2.9.1

- Clone the DeepFRI source code from https://github.com/flatironinstitute/DeepFRI
- Put it in the working directory.
- Download the pretrained models ("trained_models.tar.gz") and extract the archive to the same path.
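The extraction step can also be done from Python, which helps on Windows where `tar` may not be on the PATH. This is a minimal sketch using only the standard library; the function name `extract_archive` and the example paths are my own placeholders, not part of DeepFRI.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Only extract archives from a source you trust
        tar.extractall(dest)
    return names
```

For example, `extract_archive("trained_models.tar.gz", ".")` would unpack the archive into the current directory, assuming the archive was downloaded there.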

### 2. Running DeepFRI
#### 2.1 PDB files preparation
- All PDB files come from ColabFold predictions and are included in [results/protein_prediction/colabfold].
- I used the **relaxed models** for the molecular function prediction that follows.
- e.g. "BAA763821BfIMTD_989bc_relaxed_rank_001_alphafold2_ptm_model_1_seed_000.pdb"
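Picking the relaxed models out of the ColabFold output by hand is error-prone, so the selection can be scripted. The sketch below assumes that the wanted files are the top-ranked relaxed models, i.e. that their names contain `_relaxed_rank_001_` as in the example above; the function name and the folder arguments are hypothetical.

```python
import shutil
from pathlib import Path


def collect_relaxed_top_models(colabfold_dir, target_dir):
    """Copy every *_relaxed_rank_001_*.pdb found under colabfold_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # The leading underscore in the pattern excludes "unrelaxed" models
    for pdb in sorted(Path(colabfold_dir).rglob("*_relaxed_rank_001_*.pdb")):
        shutil.copy2(pdb, target / pdb.name)
        copied.append(pdb.name)
    return copied
```

The returned list of file names gives a quick check that exactly one model per enzyme ended up in the batch folder.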
  

#### 2.2 Run DeepFRI via the PDB module
- Follow the commands of "Option 6: predicting functions of a protein from a directory with PDB files".

In [None]:
python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_backprop

#### <mark>2.2.2 Debug 1:</mark>

In [None]:
python predict.py -pdb "D:\pro_pred\input\BAA763821BfIMTD_989bc_unrelaxed_rank_005_alphafol.pdb" -ont mf ec -v --saliency --use_guided_grads --output_fn_prefix "D:\pro_pred\output\BfIMTD"

In [None]:
''' tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-08-14 10:56:37.853453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3477 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
### Computing predictions on a single protein...
D:\miniconda3\envs\deepfri-env\lib\site-packages\Bio\SeqIO\PdbIO.py:322: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
  warnings.warn(
Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path! '''

- This log shows that the model started loading and that TensorFlow had already detected the GPU.
- The "'HEADER' line not found" warning is harmless.
- Key problem: **Could not locate cudnn_ops_infer64_8.dll.**
  - Solution: copy the .dll file into the correct folder (the CUDA `bin` directory).
  - I simply copied all ".dll" files into the `bin` folder and restarted the computer.
  - To check whether the GPU is now visible, use this command:

In [None]:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

- It printed:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
- which meant the problem was solved.
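Before copying files around, it can help to see whether the missing DLL is in any directory listed in PATH, which is one of the places Windows searches when loading libraries. This is a rough diagnostic sketch using only the standard library; `find_on_path` is a hypothetical helper, not a TensorFlow or CUDA utility.

```python
import os


def find_on_path(filename, path_value=None):
    """Return every directory listed in PATH that contains filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries left by stray separators
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

Calling `find_on_path("cudnn_ops_infer64_8.dll")` returns an empty list when no PATH directory contains the file, which would be consistent with the error above.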

#### 2.2.3 Rerun predict.py to get <span style="color:blue">**saliency**</span>-related predictions

- I copied all target PDB files into one folder so that they could be predicted in a single batch.
- I need the saliency results for the subsequent analysis.
- Note: the paths in the commands are my previous working directories; all results have since been moved to the related folders named in the adjacent cells.

In [None]:
python predict.py --pdb_dir "/Volumes/APFS/protein_prediction/data/ColabFold/colabfold_pre_str" -ont mf ec --saliency --use_guided_grads -v --output_fn_prefix "/Volumes/APFS/protein_prediction/data/deepfri_viz/deepfri_output"

- The MF and EC predictions with saliency results are included in [results/deepfri/deepfri_output].
- The results consist of the following 6 files:
    - output_dpfri_EC_pred_scores.json
    - output_dpfri_EC_predictions.csv
    - output_dpfri_EC_saliency_maps.json
    - output_dpfri_MF_pred_scores.json
    - output_dpfri_MF_predictions.csv
    - output_dpfri_MF_saliency_maps.json

#### Results summary:
##### 🧬 Residue Index Summary per Protein and GO Term

| **Protein ID**         | **GO Term**     | **GO Term Name**                                             | **Residue Indices (saliency > 0.8)**                       |
|------------------------|------------------|---------------------------------------------------------------|------------------------------------------------------------|
| BfIMTD_relaxed_r1      | GO:0004553       | hydrolase activity, hydrolyzing O-glycosyl compounds          | 304, 303, 305, 302, 99, 100, 59, 98                        |
|                        | GO:0005215       | transporter activity                                          | 59, 99, 100, 98, 58                                        |
|                        | GO:0016798       | hydrolase activity, acting on glycosyl bonds                  | 304, 303, 305, 302, 99, 100, 59                            |
|                        | GO:0022857       | transmembrane transporter activity                            | 59, 99, 100, 98, 58                                        |
|                        | GO:0030246       | carbohydrate binding                                          | 99, 100, 304, 98, 303, 59, 58, 302, 305, 134               |
| KQ640_relaxed_r1       | GO:0004553       | hydrolase activity, hydrolyzing O-glycosyl compounds          | 309, 310, 308, 307, 311, 332, 333                          |
|                        | GO:0016798       | hydrolase activity, acting on glycosyl bonds                  | 308, 309, 310, 307, 311, 332                               |
|                        | GO:0030246       | carbohydrate binding                                          | 310, 309, 308, 311, 106, 105                               |
| MrDex719_relaxed_r1    | GO:0004553       | hydrolase activity, hydrolyzing O-glycosyl compounds          | 294, 293, 292, 137, 138, 295, 291                          |
|                        | GO:0005215       | transporter activity                                          | 138, 137                                                   |
|                        | GO:0016798       | hydrolase activity, acting on glycosyl bonds                  | 292, 294, 293, 295, 291, 137, 138                          |
|                        | GO:0022857       | transmembrane transporter activity                            | 138, 137                                                   |
|                        | GO:0030246       | carbohydrate binding                                          | 137, 138                                                   |
| ccdex730_relaxed_r1    | GO:0004553       | hydrolase activity, hydrolyzing O-glycosyl compounds          | 145, 180, 146, 179, 147, 144                               |
|                        | GO:0016798       | hydrolase activity, acting on glycosyl bonds                  | 145, 180, 146, 179, 147, 144                               |
|                        | GO:0030246       | carbohydrate binding                                          | 147, 84                                                    |
| ipu549_relaxed_r1      | GO:0004553       | hydrolase activity, hydrolyzing O-glycosyl compounds          | 57, 58, 56                                                 |
|                        | GO:0016798       | hydrolase activity, acting on glycosyl bonds                  | 57, 58, 56                                                 |
|                        | GO:0030246       | carbohydrate binding                                          | 58, 57, 56                                                 |
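A table like the one above can be derived from the saliency JSON by thresholding each map. The sketch below rests on several assumptions that should be checked against the actual file: that each protein entry has parallel `GO_ids` and `saliency_maps` lists, that each map holds one value per residue (possibly wrapped in one extra list level), and that residues are numbered from 1. Ordering the residues by descending saliency is my own choice.

```python
import json


def high_saliency_residues(saliency_json, threshold=0.8):
    """Map (protein_id, GO term) to residue numbers with saliency above threshold."""
    with open(saliency_json) as fh:
        data = json.load(fh)
    result = {}
    for protein_id, entry in data.items():
        # ASSUMPTION: "GO_ids" and "saliency_maps" are parallel lists
        for go_id, saliency in zip(entry["GO_ids"], entry["saliency_maps"]):
            # ASSUMPTION: unwrap one extra nesting level if the map is stored as [[...]]
            if saliency and isinstance(saliency[0], list):
                saliency = saliency[0]
            # ASSUMPTION: residue numbering starts at 1
            ranked = sorted(enumerate(saliency, start=1), key=lambda pair: pair[1], reverse=True)
            result[(protein_id, go_id)] = [idx for idx, value in ranked if value > threshold]
    return result
```

If the residue numbers do not match the PDB numbering, the `start=1` offset is the first thing to revisit.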

### 3. Saliency visualization

#### 3.1 Visualize saliency residues
- Use viz_gradCAM.py to visualize residues with high saliency.
- Locate functional residues based on GO term saliency.
In [None]:
python viz_gradCAM.py -i "D:\pro_pred\output\output_dpfri_go_term\output_dpfri_MF_saliency_maps.json" -p query_prot -go GO:0030246

#### <mark>3.1.1 Debug 2:</mark>

In [None]:
'''Traceback (most recent call last):
  File "viz_gradCAM.py", line 142, in <module>
    raise ValueError("Protein ID not in the list.")
ValueError: Protein ID not in the list.'''

- Key problem: the protein IDs stored in the .json file are full file paths, so my query protein name did not match any of them and the script could not find the related saliency map.
    - Solution: clean the protein IDs in the .json file using the following script:

In [None]:
import json
import os

input_file = r"D:\pro_pred\output\output_dpfri_go_term\output_dpfri_MF_saliency_maps.json"
output_file = r"D:\pro_pred\output\output_dpfri_go_term\output_dpfri_MF_saliency_maps_cleaned.json"

# Load the saliency maps; the keys are the protein IDs
with open(input_file, 'r') as f:
    data = json.load(f)

# Replace each key (a full path) with its last path component
new_data = {}
for full_path, value in data.items():
    short_id = os.path.basename(full_path)
    new_data[short_id] = value

# Write to a new file so that the original output stays untouched
with open(output_file, 'w') as f:
    json.dump(new_data, f, indent=2)

print(f"✅ Cleaned JSON saved to: {output_file}")

- Run the script to clean the protein IDs.
- The cleaned file is stored as "output_dpfri_MF_saliency_maps_cleaned.json", also in [results/deepfri/deepfri_output].

#### 3.1.2 Rerun the visualization script using the cleaned JSON

- For convenience, I added an output-path option to the original "viz_gradCAM.py" and saved the modified script as "viz_gradCAM_output_path.py", which is included in [scripts].
- Run the command below for each GO term and each enzyme to produce the saliency-mapping visualizations.
- Results are included in [results/deepfri/viz].

In [None]:
python viz_gradCAM_output_path.py -i "/Volumes/APFS/protein_prediction/protein_structure_loop_analysis/results/deepfri/deepfri_output/output_dpfri_MF_saliency_maps_cleaned.json" -p BfIMTD_relaxed_r1 -go GO:0004553 --out_dir "D:\pro_pred\output\viz"
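Since the command has to be repeated for every enzyme and GO term, the command lines can be generated from the cleaned JSON instead of typed by hand. This sketch only builds the argument lists and reuses the flags shown in the command above; it assumes that each protein entry in the JSON carries a `GO_ids` list, and `build_viz_commands` is a hypothetical helper name.

```python
import json


def build_viz_commands(saliency_json, out_dir, script="viz_gradCAM_output_path.py"):
    """Build one command (as an argument list) per protein and GO term."""
    with open(saliency_json) as fh:
        data = json.load(fh)
    commands = []
    for protein_id, entry in data.items():
        # ASSUMPTION: every entry lists its predicted GO terms under "GO_ids"
        for go_id in entry["GO_ids"]:
            commands.append([
                "python", script,
                "-i", saliency_json,
                "-p", protein_id,
                "-go", go_id,
                "--out_dir", out_dir,
            ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the script.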

### Summary and Conclusions:

### Saliency Map analysis
I visualized the saliency maps from the DeepFRI predictions for five enzyme structures across the related GO terms. Each saliency heatmap highlights the residues most important for predicting a molecular function.
#### 1. BfIMTD
5 GO terms are predicted by DeepFRI:
   - ① GO:0004553 and GO:0016798 both describe hydrolase activity, and their high-saliency residues are almost the same (resi 304, 303, 305, 302, 99, 100, 59; GO:0004553 additionally includes 98).
   - ② GO:0005215 and GO:0022857 both describe transporter activity, and their residues are exactly the same (resi 59, 99, 100, 98, 58).
   - ③ GO:0030246 describes carbohydrate binding (resi 99, 100, 304, 98, 303, 59, 58, 302, 305, 134); this set contains all GO:0004553 residues plus 58 and 134.
#### 2. DexKQ
3 GO terms are predicted by DeepFRI:
   - ① GO:0004553 and GO:0016798 both describe hydrolase activity, and their high-saliency residues are almost the same (resi 309, 310, 308, 307, 311, 332; GO:0004553 additionally includes 333).
   - ② GO:0030246 describes carbohydrate binding (resi 310, 309, 308, 311, 106, 105); residues 308 to 311 are shared with the two hydrolase terms, while 105 and 106 appear only for this term.
#### 3. MrDex
5 GO terms are predicted by DeepFRI:
   - ① GO:0004553 and GO:0016798 both describe hydrolase activity, and their residues are exactly the same (resi 294, 293, 292, 137, 138, 295, 291).
   - ② GO:0005215 and GO:0022857 both describe transporter activity, and their residues are exactly the same (resi 138, 137; both are also among the hydrolase residues above).
   - ③ GO:0030246 describes carbohydrate binding (resi 137, 138), the same residues as for GO:0005215 and GO:0022857.
#### 4. CcDex
3 GO terms are predicted by DeepFRI:
   - ① GO:0004553 and GO:0016798 both describe hydrolase activity, and their residues are exactly the same (resi 145, 180, 146, 179, 147, 144).
   - ② GO:0030246 describes carbohydrate binding (resi 147, 84); resi 147 is also one of the hydrolase residues.
#### 5. ipu549
3 GO terms are predicted by DeepFRI:
   - ① GO:0004553 and GO:0016798 both describe hydrolase activity, and their residues are exactly the same (resi 57, 58, 56).
   - ② GO:0030246 describes carbohydrate binding (resi 58, 57, 56), the same residues as for hydrolase activity.
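Phrases such as "almost the same" can be made concrete by comparing the residue sets directly. The sketch below uses plain set operations and the Jaccard index (shared residues divided by all residues in either set); the helper name is my own, and the example values are the BfIMTD residues for GO:0004553 and GO:0016798 listed above.

```python
def compare_residue_sets(a, b):
    """Return shared residues, residues unique to each set, and the Jaccard index."""
    a, b = set(a), set(b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# BfIMTD high-saliency residues for the two hydrolase terms
go_0004553 = [304, 303, 305, 302, 99, 100, 59, 98]
go_0016798 = [304, 303, 305, 302, 99, 100, 59]
overlap = compare_residue_sets(go_0004553, go_0016798)
print(overlap["only_a"], overlap["jaccard"])  # prints: [98] 0.875
```

The same call works for any pair of rows in the residue table, for example carbohydrate binding against either hydrolase term.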
#### Common GO terms
- The predictions for all 5 enzymes share hydrolase activity (GO:0004553 and GO:0016798) and carbohydrate binding (GO:0030246), while BfIMTD and MrDex are additionally predicted to have transporter activity (GO:0005215 and GO:0022857).
- This suggests sequence and functional similarities among these 5 enzymes, especially between BfIMTD and MrDex, which supports my earlier selection to some extent and also points to structural relatedness between these enzymes.
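The shared and enzyme-specific GO terms can be cross-checked with a set intersection. The dictionary below is transcribed by hand from the residue table (protein IDs as they appear there); it is an illustration rather than a parser for the DeepFRI output files.

```python
# GO terms predicted per protein, transcribed from the residue table
CORE = {"GO:0004553", "GO:0016798", "GO:0030246"}
TRANSPORT = {"GO:0005215", "GO:0022857"}
go_terms = {
    "BfIMTD_relaxed_r1": CORE | TRANSPORT,
    "KQ640_relaxed_r1": set(CORE),
    "MrDex719_relaxed_r1": CORE | TRANSPORT,
    "ccdex730_relaxed_r1": set(CORE),
    "ipu549_relaxed_r1": set(CORE),
}

# Terms predicted for every protein
shared = set.intersection(*go_terms.values())

# Terms predicted for only some proteins, grouped by protein
extra = {pid: sorted(terms - shared) for pid, terms in go_terms.items() if terms - shared}

print(sorted(shared))
print(extra)
```

Only BfIMTD_relaxed_r1 and MrDex719_relaxed_r1 appear in `extra`, each with the two transporter terms.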
#### Abnormal situation


![Residues exhibition]