Etienne JEAN

# Introduction
*Imidazole Glycerol Phosphate Synthase* (IGPS) from *Thermotoga maritima* is a heterodimeric enzyme composed of two subunits: a lyase named *hisF* and a transferase named *hisH*. IGPS takes part in the histidine biosynthesis pathway by catalyzing the conversion of PRFAR and glutamine to IGP, AICAR and glutamate. The hisF subunit catalyzes the cyclization reaction that produces IGP and AICAR from PRFAR, using the ammonia provided by the hisH subunit. The protein is located in the cytoplasm.  

The goal of this study is to dock the two unbound subunits of IGPS *in silico*, using both a rigid-body technique and an energy-based technique, and then to evaluate and discuss the quality of the docking results.


# Materials & Methods

## Structures  
The structures of the unbound proteins and of the complex were taken from the *Protein Data Bank* (PDB).  
- Unbound hisF subunit (253 residues): PDB code 1THF, single chain D, https://www.rcsb.org/structure/1THF  
- Unbound hisH subunit (201 residues): PDB code 1K9V, single chain F, https://www.rcsb.org/structure/1K9V  
- IGPS complex: PDB code 1GPW, chains A and B, https://www.rcsb.org/structure/1GPW  

## Z-Dock 
From these structures, two dockings were computed using two different methods. The first was a rigid-body docking, based on the geometric complementarity of the two subunits and a fast Fourier transform (FFT) search, performed with the program Z-Dock 2.1.  

The docking was conducted as follows:  
1. **Preparation of the input files by keeping only the "ATOM" lines of the PDB files:**  
```
# Match the record name at the start of the line; a bare "grep ATOM" would also keep HETATM records
grep '^ATOM' 1THF.pdb > 1THFatom.pdb
grep '^ATOM' 1K9V.pdb > 1K9Vatom.pdb
```
As there is only one chain in each file, no further editing was needed.  
<br>

2. **Preparation of Ligand and Receptor files.**  
From this point, hisH is considered the receptor and hisF the ligand. This choice is unconventional, since hisF (253 residues) is slightly larger than hisH (201 residues), but its only consequence was a longer computation time.  
The receptor and ligand input files were then created with the command `mark_sur`.
```
mark_sur 1K9Vatom.pdb 1GPW_r.pdb
mark_sur 1THFatom.pdb 1GPW_l.pdb
```
<br>

3. **Docking and PDB generation.**
Finally, the docking was computed using `zdock`.
```
zdock -R 1GPW_r.pdb -L 1GPW_l.pdb -o 1GPW_zdock.out
```
The output file lists 2000 docking poses, sorted by decreasing Z-Dock score. Only the ten top-scoring poses were kept in a truncated copy of the output file (its first 14 lines, that is, the header followed by ten poses), and the corresponding PDB files were then generated from it.
```
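# Keep the header lines and the ten top-scoring poses
head -n 14 1GPW_zdock.out > 1GPW_zdock_10.out
# Generate one PDB file per pose listed in the truncated output
create.pl 1GPW_zdock_10.out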
```
<br>
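
The input preparation in step 1 relies on two properties of the filtered files: only `ATOM` records remain, and each file holds a single chain. A minimal Python sketch of how this could be checked is given below. It assumes the standard fixed-column PDB layout (record name in columns 1-6, chain identifier in column 22, residue number and insertion code in columns 23-27); the function name `summarize_atom_records` and the demo lines are hypothetical and not part of the original workflow.

```python
def summarize_atom_records(pdb_lines):
    """Return {chain_id: number_of_residues}, counting ATOM records only."""
    residues_per_chain = {}
    for line in pdb_lines:
        # Compare the whole 6-character record name: this excludes HETATM,
        # which a substring match on "ATOM" would let through.
        if line[:6] != "ATOM  ":
            continue
        chain_id = line[21]        # column 22
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        residues_per_chain.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(keys) for chain, keys in residues_per_chain.items()}


def demo_line(record, serial, atom, residue, chain, number):
    """Build the first 26 columns of a PDB coordinate line (illustration only)."""
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}".format(
        record, serial, atom, residue, chain, number
    )


demo_lines = [
    demo_line("ATOM", 1, "N", "MET", "D", 1),
    demo_line("ATOM", 2, "CA", "MET", "D", 1),
    demo_line("ATOM", 3, "N", "LEU", "D", 2),
    demo_line("HETATM", 4, "O", "HOH", "D", 301),  # water: must be ignored
]
summary = summarize_atom_records(demo_lines)
print(summary)  # {'D': 2}
```

On a real file, the same function could be called as `summarize_atom_records(open("1THFatom.pdb"))`; a single key in the returned dictionary would confirm that the file contains one chain only.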


## PyDock
The second method used to dock hisH and hisF was an energy-based method, using PyDock 3.0.

1. **Initial setup**  
To generate the PDB files that PyDock uses for docking, a `1gpw_pydock.ini` file was written with the following content:  
```  
[receptor]  
pdb = 1K9Vatom_scwrl.pdb  
mol = F  
newmol = F    
[ligand]  
pdb = 1THFatom.pdb  
mol = D  
newmol = D  
```
Note that some atoms are missing from the 1K9V structure, so the side-chain modelling program `scwrl3` was run beforehand to rebuild them.
```
scwrl3 -i 1K9Vatom.pdb -o 1K9Vatom_scwrl.pdb
```
Finally, the argument `setup` was passed to PyDock to create the receptor and ligand files.
```
pyDock3 1gpw_pydock setup
```
<br>

2. **Rigid-body docking as a base for PyDock**  
Rigid-body docking orientations were generated with the zdock algorithm, to serve as starting poses for the subsequent energy calculations.   
```
pyDock3 1gpw_pydock zdock
```
That command creates a file named `1gpw_pydock.zdock`.  
<br>

3. **Computation of translation and rotation matrix**  
The rotation matrix and translation vector of each pose need to be computed from the zdock output file.
```
pyDock3 1gpw_pydock rotzdock
```
That command creates a file named `1gpw_pydock.rot`. To limit the computation time, only 100 of the 2000 conformations in this file were kept for the next step.  
<br>

4. **Computation of energy scores**  
Finally, the energies were computed to score and rank the retained docking poses, using the argument `dockser`.
```
pyDock3 1gpw_pydock dockser > dockser.log
```
That command creates the final file, named `1gpw_pydock.ene`.  
<br>
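
The total energies reported in the Results section are consistent with a simple combination of the three terms: electrostatics plus desolvation plus the van der Waals term scaled down by a factor of 0.1. The sketch below checks this against the ten rows of the PyDock results table. The weight `VDW_WEIGHT = 0.1` is inferred from those values and should be read as an assumption of this sketch, not as a setting taken from the PyDock configuration; the function name `total_energy` is hypothetical.

```python
VDW_WEIGHT = 0.1  # assumed down-weighting of the van der Waals term


def total_energy(electrostatic, desolvation, vdw, vdw_weight=VDW_WEIGHT):
    """Combine the three energy terms into a single score."""
    return electrostatic + desolvation + vdw_weight * vdw


# (electrostatic, desolvation, VDW, reported total), copied from the results table
ROWS = [
    (-18.843, -3.831, 44.166, -18.257),
    (-33.248, 15.107, 70.255, -11.116),
    (-20.052, 3.819, 70.430, -9.190),
    (-18.476, 7.023, 56.848, -5.768),
    (-23.479, 15.693, 26.901, -5.096),
    (-39.603, 31.659, 68.477, -1.096),
    (-24.285, 20.566, 41.517, 0.432),
    (-45.283, 22.002, 240.869, 0.806),
    (-31.099, 22.158, 129.200, 3.979),
    (-38.364, 14.441, 282.226, 4.299),
]

# Largest gap between the recomputed and the reported total energy
max_deviation = max(
    abs(total_energy(ele, desolv, vdw) - reported)
    for ele, desolv, vdw, reported in ROWS
)
print(f"largest deviation: {max_deviation:.4f}")
```

The remaining deviations are of the order of the rounding of the tabulated values. One consequence of this weighting is that a pose with a very unfavourable van der Waals term (for example 240.869 for the pose ranked 3rd by Zdock) can still reach a total energy close to zero.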


## PyMOL
The visualisation and analysis of the results were done with PyMOL. After loading both the reference structure of the hisF/hisH complex (PDB 1GPW) and the docking results, the receptor structures were superposed, and the Root Mean Square Deviation of the ligands (L-RMSD) was computed to evaluate the distance between the poses proposed by Zdock and PyDock and the reference docking pose.
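
The L-RMSD itself is a plain RMSD over matched ligand atoms, computed without any further fitting once the receptors have been superposed (in PyMOL, an RMSD without refitting is what the `rms_cur` command reports). The sketch below illustrates that calculation in pure Python. It assumes the two coordinate lists are already expressed in the receptor-superposed frame and list the same atoms in the same order; the function name `ligand_rmsd` and the toy coordinates are hypothetical.

```python
import math


def ligand_rmsd(model_coords, reference_coords):
    """RMSD between matched ligand atoms, with no additional superposition.

    Both arguments are sequences of (x, y, z) tuples, in the same atom order,
    taken after the receptors have been superposed.
    """
    if not model_coords or len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for model_atom, reference_atom in zip(model_coords, reference_coords):
        # Squared Euclidean distance between the two positions of one atom
        squared_sum += sum((m - r) ** 2 for m, r in zip(model_atom, reference_atom))
    return math.sqrt(squared_sum / len(model_coords))


# Toy example: the docked ligand is the reference ligand shifted by (3, 4, 0)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked = [(x + 3.0, y + 4.0, z) for x, y, z in reference]
print(ligand_rmsd(docked, reference))  # 5.0
```

Because no fitting is applied to the ligand, a rigid displacement of the whole ligand shows up directly in the value, which is what makes L-RMSD a measure of how far a pose is from the reference binding mode.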




# Results & Discussion

## Z-Dock
The results of the z-dock docking and their ligand-RMSD are listed in the table below.

Rank in Zdock scoring|Zdock score|L-RMSD to the reference structure (Å)
:------------:|:--------------:|:------------------------------------:
**1**   |**18.34**|**14.886**
**2**   |**18.32**|**14.660**
3   |17.60|21.302
4   |17.48|15.545
5   |17.36|15.963
6   |16.88|19.840
7   |16.64|20.133
8   |16.56|15.243
**9**   |**16.54**|**13.980**
10  |16.38|15.989

All L-RMSD values range from 14 Å to 21 Å. Three poses stand out among the top 10 scores: poses 1 and 2 combine the highest Z-Dock scores with relatively low L-RMSD values (14.886 Å and 14.660 Å), while pose 9 has a lower score but the lowest L-RMSD to the reference complex (13.980 Å), which still makes it an interesting pose, possibly the most interesting one.


## PyDock
The PyDock energy calculations and rankings are listed in the table below.

Zdock rank|E.electrostatic|E.desolvation|E.VDW|E.Total|PyDock rank
:-------:|:-----------:|:---------:|:----------:|:-----------:|:-------:
    97   |  -18.843    |  -3.831   |   44.166   |  -18.257    |       1
    83   |  -33.248    |  15.107   |   70.255   |  -11.116    |       2
    63   |  -20.052    |   3.819   |   70.430   |   -9.190    |       3
    14   |  -18.476    |   7.023   |   56.848   |   -5.768    |       4
     9   |  -23.479    |  15.693   |   26.901   |   -5.096    |       5
    90   |  -39.603    |  31.659   |   68.477   |   -1.096    |       6
    88   |  -24.285    |  20.566   |   41.517   |    0.432    |       7
     3   |  -45.283    |  22.002   |  240.869   |    0.806    |       8
    22   |  -31.099    |  22.158   |  129.200   |    3.979    |       9
    77   |  -38.364    |  14.441   |  282.226   |    4.299    |      10

Once again, only the 10 best results are shown. The total energies of these poses range from -18 kcal/mol to +4 kcal/mol. The most surprising observation is the lack of agreement between the Zdock ranking and the PyDock ranking: the best pose according to PyDock is almost the lowest ranked of the 100-pose Zdock subset (ranked 97/100 by Zdock). This emphasizes the limitations of rigid-body docking methods: they are fast and simple to run, but their scoring is often less precise than that of other methods.  
The pose ranked 9th by Zdock is ranked 5th by PyDock. It may therefore be considered one of the best among all the results, because both its L-RMSD and its total energy are low.
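
The relation between the two rankings can be quantified with a Spearman rank correlation computed on the ten rows of the table above. The sketch below uses the textbook formula for untied ranks; the function names `ranks` and `spearman_rho` are hypothetical. It only covers the ten poses best ranked by PyDock, so it describes this subset and not the full set of 100 rescored poses.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equally long sequences without ties."""
    n = len(x)
    squared_differences = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * squared_differences / (n * (n ** 2 - 1))


# Zdock rank of each pose, listed in PyDock rank order (1 to 10)
ZDOCK_RANKS = [97, 83, 63, 14, 9, 90, 88, 3, 22, 77]
PYDOCK_RANKS = list(range(1, 11))

rho = spearman_rho(ZDOCK_RANKS, PYDOCK_RANKS)
print(f"Spearman rho = {rho:.3f}")  # Spearman rho = -0.333
```

A value of 1 would mean that both programs order these ten poses identically. The weakly negative value obtained here indicates that, within this subset, a good Zdock rank does not go with a good PyDock rank.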





# Conclusion
In this study, two different and complementary docking methods, a rigid-body docking and an energy-based ranking, were used to generate and score a hundred docking poses of the lyase subunit hisF and the transferase subunit hisH of imidazole glycerol phosphate synthase. One pose in particular was found in the top 10 of both methods, and showed both a low ligand-RMSD and a low docking energy.  
This study also made it possible to compare the two techniques and their quality in protein docking: rigid-body docking generates thousands of poses quickly and simply, but its scoring function lacks precision and often fails to reflect what really happens *in vivo*. The energy-based method, in contrast, improves the scoring considerably. Combining the two methods therefore yields a set of docking poses of reasonable precision.