IF-SitePred is a method for predicting ligand-binding sites on protein structures. It first generates an embedding for each residue of the protein using the ESM-IF1 (inverse folding) model, then performs point cloud clustering to identify binding site centres.
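To illustrate what "point cloud clustering to identify binding site centres" can look like, here is a minimal, standard-library sketch. It is not the IF-SitePred implementation: the distance cutoff, the single-linkage grouping, the ranking by cluster size, and the names `cluster_points` and `centre` are all assumptions made for illustration, and the coordinates are made up.

```python
import math
from collections import defaultdict


def cluster_points(points, cutoff=1.5):
    """Group 3D points so that points within `cutoff` of each other share a cluster.

    The cutoff value is an arbitrary, illustrative choice.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that lie within the cutoff.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, point in enumerate(points):
        clusters[find(i)].append(point)
    # Largest clusters first (ranking by size is an assumption).
    return sorted(clusters.values(), key=len, reverse=True)


def centre(cluster):
    """Return the centroid (mean x, y, z) of a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(3))


# Toy point cloud: two well-separated groups (made-up coordinates).
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
         (10.0, 10.0, 10.0), (11.0, 10.0, 10.0)]
centres = [centre(c) for c in cluster_points(cloud)]
```

In this toy example the three points near the origin form the larger cluster, so its centroid is listed first.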
Follow these steps to prepare the two virtual environments for IF-SitePred:
- Clone the repository and move into it
git clone https://github.com/annacarbery/binding-sites
cd binding-sites
- Create and activate virtual environment
conda create -n esm_env
conda activate esm_env
- Install esm
pip install fair-esm
- Install torch (system-dependent, see instructions in torch documentation)
- Install remaining dependencies
pip install scipy
pip install torch-geometric
pip install torch-scatter
pip install biotite
pip install lightgbm
pip install scikit-learn
- Create and activate virtual environment with PyMOL installed
conda create -n pymol_env -c conda-forge -c schrodinger pymol-bundle -y
conda activate pymol_env
conda install -c conda-forge scikit-learn
- Place the PDB file of the target of interest in the 'input' directory
- Activate environment with ESM installed
conda activate esm_env
- Run residue prediction script:
python src/predict_residues.py -t <target_name>
- The residues predicted to be binding are saved in the 'predictions' directory
- Activate environment with PyMOL installed
conda activate pymol_env
- Run centre prediction script:
python src/predict_centres.py -t <target_name>
- The three top-ranked sites and their centres will be saved in the 'predictions' directory
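The two prediction steps above can also be chained from a single script. The sketch below is not part of the repository: it assumes your conda installation provides `conda run`, that the environments are named `esm_env` and `pymol_env` as in the setup steps, and that it is launched from the repository root. The helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess

# (conda environment, script) pairs in the order given in the steps above.
STAGES = [
    ("esm_env", "src/predict_residues.py"),
    ("pymol_env", "src/predict_centres.py"),
]


def build_commands(target_name):
    """Return one command per stage, each run inside its own conda environment."""
    return [
        ["conda", "run", "-n", env, "python", script, "-t", target_name]
        for env, script in STAGES
    ]


def run_pipeline(target_name):
    """Run both stages in order, stopping at the first one that fails."""
    for command in build_commands(target_name):
        # check=True raises CalledProcessError if a stage exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("<target_name>")` with your own target name would then run residue prediction followed by centre prediction, without switching environments by hand.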
Please report issues at https://github.com/oxpig/binding-sites
Carbery, A., Buttenschoen, M., Skyner, R. et al. Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures. J Cheminform 16, 32 (2024). https://doi.org/10.1186/s13321-024-00821-4