HydraProt: A New Deep Learning Tool for Fast and Accurate Prediction of Explicit Water Positions for Protein Structures.
Figure: Schematic representation of the deep Hydration of Protein (HydraProt) pipeline. The process starts by transforming the protein's coordinates into a 3D grid array, which is then divided into submaps. These submaps are processed by a 3D U-net for water molecule sampling. The resulting hydration submaps are reconstructed into a full grid and converted back to 3D coordinates. This procedure is reiterated twice, each time adding the newly produced 3D coordinates to the previous set. The candidate water molecule points are subsequently embedded and evaluated via a Multilayer Perceptron (MLP). The final refinement step prunes and improves the placement of predicted water molecules.
HydraProt is a deep learning methodology for predicting explicit water positions in protein structures. It combines a 3D U-net and a Multi-Layer Perceptron (MLP) to accurately sample and evaluate water coordinates. The methodology has been validated using a high-resolution dataset and offers valuable insights for protein structure studies and drug discovery.
- Utilizes a 3D U-net architecture for accurate sampling of water coordinates in protein structures.
- Incorporates a Multi-Layer Perceptron (MLP) to evaluate water positions in relation to protein atoms.
- Fast inference runtimes for rapid prediction of explicit water positions.
- Supports PDB and CIF file formats.
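The first stage of the pipeline described above, mapping protein coordinates onto a 3D grid and cutting it into submaps for the U-net, can be sketched as follows. This is a minimal illustration, not HydraProt's actual implementation: the grid spacing, padding, submap size, and function names are all assumptions.

```python
# Illustrative sketch of the voxelization stage: atom coordinates -> 3D
# occupancy grid -> fixed-size cubic submaps. Spacing (1 A), padding, and
# the 32^3 submap size are assumed values, not HydraProt's real settings.
import numpy as np

def voxelize(coords, spacing=1.0, pad=4.0):
    """Map (N, 3) atom coordinates to a binary occupancy grid."""
    origin = coords.min(axis=0) - pad
    idx = np.floor((coords - origin) / spacing).astype(int)
    shape = idx.max(axis=0) + 1 + int(pad / spacing)
    grid = np.zeros(shape, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid, origin

def split_submaps(grid, size=32):
    """Cut the grid into cubic submaps, zero-padding the borders."""
    padded_shape = [int(np.ceil(s / size)) * size for s in grid.shape]
    padded = np.zeros(padded_shape, dtype=grid.dtype)
    padded[:grid.shape[0], :grid.shape[1], :grid.shape[2]] = grid
    subs = []
    for x in range(0, padded_shape[0], size):
        for y in range(0, padded_shape[1], size):
            for z in range(0, padded_shape[2], size):
                subs.append(padded[x:x + size, y:y + size, z:z + size])
    return subs

np.random.seed(0)
coords = np.random.rand(100, 3) * 40.0   # fake protein atoms in a ~40 A box
grid, origin = voxelize(coords)
submaps = split_submaps(grid)
print(len(submaps), submaps[0].shape)
```

Keeping `origin` around is what allows the inverse step, converting hydration submaps back into 3D coordinates, to place predicted waters in the protein's original frame.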
1. Open the env_hydraprot.yml file and modify the last line to specify the desired installation location. For example, change prefix: /home/********/miniconda3/envs/hydraprot to prefix: /home/myplace/miniconda3/envs/hydraprot.
2. Create the HydraProt environment by running the following command in your terminal:
conda env create -f ./env_hydraprot.yml
3. Whenever you want to work on the project, activate the HydraProt environment by executing the following command in the terminal:
conda activate hydraprot
4. Download the model checkpoints, the datasets used in this work, and the PDBs of the presented results from Zenodo: https://doi.org/10.5281/zenodo.10517963. Extract the files into the main directory of the project.
- Open the `params/prediction_params.py` file.
- Set `config.pdb_path` (line 9) to the directory containing the PDB files you want to hydrate.
- To hydrate only a subset of the PDBs in this directory, provide the list of these files via `config.pdb_list_path` (line 11).
- Select the cap for your predictions via `config.final_cap` (line 42); the default of 0.05 yields high-recall predictions.
- Define the directory where the results will be saved via `config.results_dir` (line 48).
- Optionally, depending on your machine's specifications, adjust `config.device` (line 24), `config.unet_batch_size` (line 25), and `config.mlp_batch_size` (line 40). The defaults are tuned for a CUDA device with 4 GB of RAM.
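Put together, a filled-in `params/prediction_params.py` might look like the excerpt below. The attribute names and line numbers come from the steps above; all paths and the batch-size values are placeholders you should replace with your own.

```python
# params/prediction_params.py (excerpt) -- illustrative values only
config.pdb_path = "/data/my_structures/"                  # directory with the PDB files to hydrate
config.pdb_list_path = "/data/my_structures/subset.txt"   # optional: hydrate only these files
config.device = "cuda:0"                                  # or "cpu" if no GPU is available
config.unet_batch_size = 2                                # lower this if you run out of GPU memory
config.mlp_batch_size = 4096                              # lower this if you run out of GPU memory
config.final_cap = 0.05                                   # default; high-recall predictions
config.results_dir = "/data/hydraprot_results/"           # where predictions are written
```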
# Then execute
python predict.py
- Open the Jupyter Notebook `datasets/create_datasets/create_unet_dataset.ipynb` with Jupyter Lab.
- Change the variables `h5_dir`, `training_list`, `pdb_path`, and `validation_list` to your parameters.
- Run all cells, then update the dataset paths in `params/unet_params.py`.
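The notebook cell that sets these variables might look like the following; every path here is a placeholder for your own setup.

```python
# Configuration cell of create_unet_dataset.ipynb -- paths are placeholders
h5_dir = "/data/hydraprot/unet_dataset/"          # output directory for the generated .h5 files
pdb_path = "/data/hydraprot/pdbs/"                # directory with the source PDB structures
training_list = "/data/hydraprot/train_list.txt"  # PDB IDs used for training
validation_list = "/data/hydraprot/val_list.txt"  # PDB IDs used for validation
```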
Check and change the parameters in `params/unet_params.py`.
# Then execute
python train_unet.py
For the evaluation of checkpoints, check and modify the parameters in `params/unet_params.py` and `evaluate_unet.py`, then:
# Execute the following command
python evaluate_unet.py
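One common way to score predicted water positions against crystallographic waters is to count a prediction as a hit when it lies within a distance cutoff of an as-yet-unmatched reference water. The sketch below uses a greedy one-to-one matching with a 1 Å cutoff; both the cutoff and the matching scheme are illustrative assumptions, and `evaluate_unet.py` may compute its metrics differently.

```python
# Hedged sketch of recall/precision for predicted vs. reference waters.
# Greedy one-to-one matching by ascending distance; cutoff is assumed.
import numpy as np
from scipy.spatial.distance import cdist

def match_waters(pred, ref, cutoff=1.0):
    """Return (recall, precision) under greedy nearest-pair matching."""
    d = cdist(pred, ref)                      # (n_pred, n_ref) distances
    used_p, used_r = set(), set()
    hits = 0
    for flat in np.argsort(d, axis=None):     # pairs in ascending distance
        p, r = np.unravel_index(flat, d.shape)
        if d[p, r] > cutoff:                  # all remaining pairs are farther
            break
        if p in used_p or r in used_r:        # each water matched at most once
            continue
        used_p.add(p)
        used_r.add(r)
        hits += 1
    return hits / len(ref), hits / len(pred)

recall, precision = match_waters(
    np.array([[0.2, 0.0, 0.0], [10.0, 10.0, 10.0], [5.0, 5.0, 5.3]]),
    np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]))
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=1.00 precision=0.67
```

Raising `config.final_cap` trades recall for precision under a metric of this kind: fewer candidate waters survive the cap, so precision rises while recall drops.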
- Open the Jupyter Notebook `datasets/create_datasets/unet_prediction.ipynb` with Jupyter Lab.
- Modify the variables `train_list`, `validation_list`, `pdb_path`, `config`, the checkpoint of the 3D U-net, and `pdb_dir` based on your parameters.
- Run all cells.
- Next, open the Jupyter Notebook `datasets/create_datasets/create_mlp_dataset.ipynb`.
- Modify the variables `train_list`, `validation_list`, `pdb_path`, and `prediction_dir` according to your parameters.
- Run all cells and update the dataset paths in `params/mlp_params.py`.
Check and modify the parameters in `params/mlp_params.py`.
# Execute the following command
python train_mlp.py
For the evaluation of checkpoints, check and modify the parameters in `params/mlp_params.py` and `evaluate_mlp.py`, then:
# Execute the following command
python evaluate_mlp.py
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have suggestions for improvement, please create an issue on GitHub. We appreciate your contribution!
For queries and suggestions, please contact: andreas.zamanos@athenarc.gr
LINK TO PAPER PUBLICATION