# Talktorial 11

# Binding Site Detection (DoGSiteScorer)

Abishek

## Aim of the talktorial

The aim is to take a protein structure and **detect its ligand binding site**, using **EGFR** as an example.

Be more precise (can be done once you outlined the practical).

## Learning goals

### Theory
* Introduction to pocket detection
* Pocket detection approaches
    * DoGSiteScorer
    * (Fpocket)    
* Pocket detection methods used in this talktorial   
* Validation
  * Ligand-based detection
  * KLIFS pocket definitions
  

### Practical

TBA

## References

* Volkamer et al., 2011, Noordwijkerhout Cheminformatics, **Combining Global and Local Measures for Structure-Based Druggability Predictions**

* Volkamer et al., 2012, **DoGSiteScorer: a web server for automatic binding site prediction, analysis and druggability assessment**

* Le Guilloux et al., 2009, **Fpocket: An open source platform for ligand pocket detection**

* Protein–ligand complex (https://en.wikipedia.org/wiki/Protein%E2%80%93ligand_complex)

* Voronoi tessellation (https://en.wikipedia.org/wiki/Voronoi_diagram)

* KLIFS database (http://klifs.vu-compmedchem.nl/)

* Daniel A. Erlanson, Wolfgang Jahnke (eds.), **Fragment-based Drug Discovery**

* J. Kooistra et al., 2015, **KLIFS: a structural kinase-ligand interaction database**

* P. J. van Linden et al., 2013, Journal of Medicinal Chemistry, **KLIFS: A Knowledge-Based Structural Database To Navigate Kinase-Ligand Interaction Space**


## Theory

### Protein binding sites

The main objective of pharmaceutical research is to find new drugs to cure specific diseases. Structure-based approaches in drug design develop drugs with respect to a target protein that is associated with the disease under investigation. To develop such a drug, it is crucial to know where it is supposed to bind in the protein in order to inhibit or activate it. Such a small-molecule binding site is not always known, which is why tools for binding site detection are important.

Pocket detection therefore plays an important role in the drug design process. Pockets are **hollow three-dimensional spaces** on the surface of a protein that serve as regions for **ligand**, **peptide**, or **protein** binding. Several factors determine whether a region of a protein forms a pocket:

* The shape of the binding region
* The possible interactions in a region such as
    * Van der Waals forces
    * Hydrogen bond interaction
    * Hydrophobic forces
    * Electrostatic interactions
    * π-π interactions
   
Sometimes ligand information is available (protein-ligand complex). In that case, the protein region surrounding the ligand is usually defined as the pocket (e.g. using a 6 Å radius).

### Binding site detection approaches

When a bound ligand is known, the region within a given radius around the ligand can be used to define the pocket. When no ligand is available, detection tools can be used to find the pocket. There are two standard approaches:
* Geometry-based approach
* Energy-based approach

<div style="width:500; font-size:90%; text-align:center;">
    <img src="images/img2.png" alt="alternate text"  width="500" style="padding:0.8em;"/>
    Figure 1: Energy-based and geometry-based binding site detection methods.
</div>

#### Geometry-based approaches

Geometry-based methods analyze the shape of the molecular surface to locate cavities, based on the 3D spatial arrangement of the protein atoms. They are broadly divided into grid-based and grid-free approaches.

* **Grid-based approaches**

    * POCKET and LIGSITE are common grid-based algorithms.
    * The protein is embedded in a Cartesian grid with a spacing between 0.4 Å and 2.0 Å.
    * Grid points are assigned to the protein or to the solvent based on the van der Waals radii of the surrounding protein atoms.
   
* **Grid-free approaches**

    * SURFNET is a common grid-free algorithm.
    * In grid-free methods, spheres are placed on the protein surface.
    * In SURFNET, probe spheres are placed midway between pairs of atoms of the protein. In case a probe overlaps any nearby atom, its radius is reduced until no overlap occurs. The resulting probes define the pockets and cavities.
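
The SURFNET idea described above can be sketched for a single atom pair. This is a minimal illustration under stated assumptions, not the original implementation: the function name `gap_sphere`, the toy coordinates, and the single van der Waals radius used for all atoms are hypothetical choices.

```python
import math


def gap_sphere(atom_a, atom_b, atoms, atom_radius=1.8):
    """Place a probe sphere midway between two atoms and shrink it
    until it no longer overlaps any atom (SURFNET-like sketch).

    atom_radius is one illustrative van der Waals radius in Å that is
    applied to every atom (an assumption made for brevity).
    Returns (centre, radius); a radius <= 0 means that no gap is left.
    """
    centre = tuple((a + b) / 2 for a, b in zip(atom_a, atom_b))
    # Largest sphere that just touches the surfaces of both pair atoms.
    radius = math.dist(atom_a, atom_b) / 2 - atom_radius
    for atom in atoms:
        # Largest radius that does not penetrate this atom's surface.
        allowed = math.dist(centre, atom) - atom_radius
        radius = min(radius, allowed)
    return centre, radius


# Toy coordinates in Å: two pair atoms and one neighbouring atom.
atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 4.0, 0.0)]
centre, radius = gap_sphere(atoms[0], atoms[1], atoms)
print(centre, round(radius, 2))
```

Here the third atom lies closer to the probe centre than the pair atoms do, so it is the one that limits the final probe radius.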

#### Energy-based approaches

The interaction of a molecular fragment (probe) with the protein is calculated, and regions with favourable interaction energies are assigned to pockets. DrugSite and docking are well-known energy-based methods.

* **Grid-based approaches**

    * DrugSite is a grid-based approach within the class of energy-based algorithms.
    * This method places carbon probes on each grid point and calculates the van der Waals energies between each probe and the surrounding protein atoms within a distance of 8 Å. The mean energy and the standard deviation are calculated for these grid points.
    * Grid points with unfavourable energies are removed. Grid points fulfilling the energy cut-off are merged into pockets.
   
* **Grid-free approaches**

    * Docking is an important method among the energy-based grid-free approaches.
    * In this method, fragments are placed at positions on the protein surface and evaluated with a scoring function.
    * Pockets are assigned based on the number of fragments that bind to a specific area.
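
The energy-based grid idea can be illustrated with a small sketch. It is not the DrugSite implementation: the 12-6 Lennard-Jones form, its parameters `epsilon` and `r_min`, the rule "keep points below mean minus `n_std` standard deviations", and the toy coordinates are all assumptions made for illustration. Only the 8 Å cut-off is taken from the description above.

```python
import math
import statistics


def probe_energy(point, atoms, cutoff=8.0, epsilon=0.1, r_min=4.0):
    """Sum a 12-6 Lennard-Jones term between a probe at `point`
    and all atoms within `cutoff` Å (parameters are illustrative)."""
    energy = 0.0
    for atom in atoms:
        r = math.dist(point, atom)
        if 0.0 < r <= cutoff:
            ratio = r_min / r
            energy += epsilon * (ratio ** 12 - 2 * ratio ** 6)
    return energy


def favourable_points(grid_points, atoms, n_std=1.0):
    """Keep grid points whose energy is at least `n_std` standard
    deviations below the mean energy of all grid points (assumed rule)."""
    energies = [probe_energy(p, atoms) for p in grid_points]
    threshold = statistics.mean(energies) - n_std * statistics.pstdev(energies)
    return [p for p, e in zip(grid_points, energies) if e <= threshold]


# Toy system: six atoms enclosing a cavity around the origin.
atoms = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0),
         (0.0, -4.0, 0.0), (0.0, 0.0, 4.0), (0.0, 0.0, -4.0)]
grid_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0),
               (0.0, 10.0, 0.0), (0.0, 0.0, 20.0)]
pocket_points = favourable_points(grid_points, atoms)
print(pocket_points)
```

In this toy system only the grid point inside the cavity interacts favourably with several atoms at once, so it is the only point that passes the cut-off.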

### DoGSiteScorer

**DoGSiteScorer** predicts pockets and subpockets with the DoGSite algorithm, which uses a **Difference of Gaussian (DoG) filter**, and therefore falls into the category of grid-based approaches. Geometric and physico-chemical properties are then calculated automatically for the predicted pockets and subpockets. Pocket volume and surface are calculated by counting the **grid points** that constitute the pocket volume or its surface and multiplying this number by the grid box volume or surface, respectively. A **breadth-first search** is used to compute the pocket depth, starting from the solvent-exposed parts of the pocket and moving toward the most deeply buried regions.
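
The volume and depth calculation can be sketched on a toy grid. This is a simplified illustration, not the DoGSiteScorer code: the function names, the 6-connected neighbourhood, the definition of depth as the longest breadth-first path times the grid spacing, and the spacing value are assumptions.

```python
from collections import deque


def pocket_volume(pocket_points, spacing):
    """Volume = number of pocket grid points times the grid box volume."""
    return len(pocket_points) * spacing ** 3


def pocket_depth(pocket_points, exposed_points, spacing):
    """Breadth-first search from the solvent-exposed pocket points
    toward the buried ones, using 6-connected grid neighbours.
    The depth is taken as the largest number of steps times the spacing."""
    steps = {point: 0 for point in exposed_points}
    queue = deque(exposed_points)
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in neighbours:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in pocket_points and nxt not in steps:
                steps[nxt] = steps[(x, y, z)] + 1
                queue.append(nxt)
    return max(steps.values()) * spacing


# Toy pocket: a column of four grid points, open to the solvent at the top.
pocket = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3)}
exposed = [(0, 0, 3)]
print(pocket_volume(pocket, spacing=0.5), pocket_depth(pocket, exposed, spacing=0.5))
```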

#### Algorithm steps

* The protein of interest is embedded in a grid.
* Each of the **grid points** is **labelled**.
* A Difference of **Gaussian filter** is applied across the grid.
* This operation identifies positions on the protein surface where the location of a **sphere-like object** is favorable.
* In the last step, neighbouring subpockets are merged into pockets.
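
To show what a Difference of Gaussian filter does, the following sketch applies it to a one-dimensional toy profile instead of a 3D protein grid. Everything here is an illustrative assumption: the labelling (1 for free space, 0 for protein), the two filter widths, and the sign convention (narrow minus wide). DoGSite itself works on a labelled 3D grid.

```python
import math


def gaussian_kernel(sigma):
    """Normalised, truncated Gaussian weights for a 1D convolution."""
    half = int(3 * sigma) + 1
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian kernel (zero padding at the ends)."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(signal)):
        value = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                value += weight * signal[j]
        smoothed.append(value)
    return smoothed


def difference_of_gaussians(signal, sigma_narrow, sigma_wide):
    """Subtract the widely smoothed signal from the narrowly smoothed one."""
    narrow = smooth(signal, sigma_narrow)
    wide = smooth(signal, sigma_wide)
    return [n - w for n, w in zip(narrow, wide)]


# Toy profile: a cavity of five free grid points inside the protein.
profile = [0.0] * 41
for i in range(18, 23):
    profile[i] = 1.0

response = difference_of_gaussians(profile, sigma_narrow=1.5, sigma_wide=3.0)
best = max(range(len(response)), key=response.__getitem__)
print(best)
```

The filter response peaks at the centre of the cavity, which is the sense in which the filter highlights positions where a compact, sphere-like object fits.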

#### Druggability score

* The druggability prediction is based on a machine learning technique, a **Support Vector Machine (SVM)**.
* **Discriminant analysis** is used to find the pocket descriptors best suited to separate druggable from undruggable pockets.
* The model has been trained and tested on both the redundant and the non-redundant version of the druggable data set.
* The model showed a mean accuracy of 90%.
* The druggability score lies between 0 and 1. The higher the score, the higher the chance that the pocket is druggable.
    
<div style="width:400; font-size:90%; text-align:center;">
    <img src="images/img1.png" alt="alternate text"  width="400" style="padding:0.8em;"/>
    Figure 2: Grid point view
</div>

### Validation methods

#### Ligand-based detection

If a bound ligand is known for a structure (protein-ligand complex), it can be used for binding-site annotation: the protein region surrounding the ligand, e.g. all residues within a 6 Å radius, is defined as the pocket and can serve as a reference for predicted pockets.
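
A ligand-based pocket definition can be sketched as a simple distance filter. The function name `pocket_from_ligand`, the data layout (residue id plus atom coordinate), and the toy coordinates are hypothetical. With a real structure the coordinates would come from a parsed PDB file.

```python
import math


def pocket_from_ligand(protein_atoms, ligand_atoms, radius=6.0):
    """Return the sorted ids of all residues that have at least one atom
    within `radius` Å of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    """
    pocket = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_atoms):
            pocket.add(residue_id)
    return sorted(pocket)


# Toy data in Å (invented for illustration, not taken from a real structure).
protein_atoms = [
    (10, (1.0, 0.0, 0.0)),
    (11, (0.0, 5.0, 0.0)),
    (12, (0.0, 0.0, 9.0)),
    (13, (12.0, 0.0, 0.0)),
]
ligand_atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, 4.0)]
pocket_residues = pocket_from_ligand(protein_atoms, ligand_atoms)
print(pocket_residues)
```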


#### KLIFS pocket definitions

KLIFS stands for **Kinase-Ligand Interaction Fingerprints and Structures database**. It is a structural repository of over 2900 structures of human and mouse kinases, covering their catalytic kinase domains as well as the inhibitors that interact with them. Kinases share similar conserved regions, which makes it challenging to find small molecules that bind selectively to a specific kinase.

KLIFS enables a systematic comparison and analysis of the structural and chemical features of ligand binding across all available kinase structures. Its classification is based on an all-encompassing binding site definition of 85 residues. This makes it possible to compare the **interaction patterns** of **kinase inhibitors** to each other, for example to identify crucial interactions that determine kinase-inhibitor selectivity. The KLIFS pocket definition is therefore a suitable reference for validating a predicted binding site.
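
One possible way to validate a predicted pocket against a reference definition (from KLIFS or from a bound ligand) is to compare the two residue sets. The function name `pocket_overlap`, the chosen measures, and the residue numbers below are illustrative assumptions, not part of KLIFS or DoGSiteScorer.

```python
def pocket_overlap(predicted, reference):
    """Compare two pockets given as collections of residue ids."""
    predicted, reference = set(predicted), set(reference)
    shared = predicted & reference
    union = predicted | reference
    return {
        "shared": sorted(shared),
        # Fraction of the reference pocket recovered by the prediction.
        "reference_coverage": len(shared) / len(reference) if reference else 0.0,
        # Jaccard index: shared residues relative to all residues involved.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy residue ids (invented for illustration).
result = pocket_overlap(predicted=[1, 2, 3, 4], reference=[2, 3, 4, 5, 6])
print(result)
```

A high reference coverage together with a low Jaccard index would indicate that the predicted pocket contains the reference site but is considerably larger.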

## Practical



In [1]:
def get_dogsite_pockets(pdb_code, analysis_detail, binding_site_prediction_granularity, ligand='', chain=''):
    pass

def select_dogsite_pocket(pockets):
    pass

In [2]:
def get_klifs_pocket(pdb_code, chain='', alternate_model=''):
    pass

In [3]:
def get_pocket_from_ligand(pdb_code, ligand_code, chain='', alternate_model='', radius=6.5):
    pass

In [4]:
def step_01(pdb_code, ligand_code, analysis_detail, binding_site_prediction_granularity, chain='', alternate_model=''):
    '''
    Get pocket data from DoGSiteScorer, KLIFS, and the bound ligand.
    '''
    
    # Get pockets from DoGSiteScorer and select one of them
    pockets_dogsite = get_dogsite_pockets(
        pdb_code, 
        analysis_detail, 
        binding_site_prediction_granularity, 
        ligand='', 
        chain=chain
    )
    pocket_dogsite = select_dogsite_pocket(pockets_dogsite)
    
    # Get pocket from KLIFS
    pocket_klifs = get_klifs_pocket(
        pdb_code, 
        chain=chain, 
        alternate_model=alternate_model
    )
    
    # Get pocket based on ligand
    pocket_ligand = get_pocket_from_ligand(
        pdb_code,
        ligand_code, 
        chain=chain, 
        alternate_model=alternate_model, 
        radius=6.5
    )
    
    return pocket_dogsite, pocket_klifs, pocket_ligand

In [5]:
def step_02(pocket_list):
    '''
    Compare different pocket results.
    '''
    
    pass

In [6]:
def step_03():
    '''
    Visualize different pocket results.
    # TODO:
    - NGLViewer
    '''
    
    pass

## Appendix

#### Introduction to the used methods
 
There are 3 automatic methods for pocket and druggability prediction:
* SiteMap (2009)
* Fpocket (2009/2010)
* DoGSiteScorer (2010)

All 3 algorithms predict pockets solely based on the **atomic coordinates** of the protein.

* **SiteMap** uses **van der Waals (VdW) energies** and a **buriedness value** calculated on a grid to predict pockets on the protein surface.
* **Fpocket** investigates **α spheres** for active site predictions.
* In **DoGSiteScorer**, pockets and subpockets are predicted with DoGSite using a Difference of **Gaussian filter**.

* The shape (and the molecular orientation) of the region
* (London dispersion forces)
* (Covalent interaction)
  

Ref

* D. Durrant et al., 2009 **POVME: An Algorithm for Measuring Binding-Pocket Volumes**


If we have a ligand: Use ligand radius

If we don't have a ligand: Detection tools

* Grid-based vs. grid-free


* The radii are scaled down to ensure that they don't clash with the surrounding atoms of the protein.
    * Small virtual probes are used to cover and coat the protein surface. The number of probes surrounding each atom is calculated. The spaces on the outer surface are filled and the potential binding site centers are selected.

 
#### Fpocket 

Fpocket stands for "Finding Pocket". It is a pocket detection package based on Voronoi tessellation and alpha spheres: **Fpocket** investigates **α spheres** for active site predictions. The Fpocket algorithm consists of three major steps.

* <i>Step 1:</i> The whole ensemble of alpha spheres is determined from the protein structure. Fpocket returns a pre-filtered collection of spheres.
* <i>Step 2:</i> Clusters of spheres that lie close together are identified to define pockets, and clusters of poor interest are removed.
* <i>Step 3:</i> Properties are calculated from the atoms of each pocket in order to score it.

<div style="width:400; font-size:90%; text-align:center;">
    <img src="images/img4.png" alt="alternate text"  width="400" style="padding:0.8em;"/>
    Figure 3: Voronoi tessellation based method
</div>
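
An alpha sphere is a sphere that contacts four atoms on its boundary and contains no atom inside. As a rough illustration of Step 1, the sketch below computes the sphere through four atom centres and checks whether it is empty. This is not Fpocket's implementation, which derives the spheres from a Voronoi tessellation. The function names and the toy coordinates are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(p0, p1, p2, p3):
    """Centre and radius of the sphere through four non-coplanar points.

    Solves 2 * (p_i - p0) . c = |p_i|^2 - |p0|^2 for the centre c
    with Cramer's rule.
    """
    rows = [[2 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(c * c for c in p) - sum(c * c for c in p0) for p in (p1, p2, p3)]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are coplanar, no unique sphere exists")
    centre = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(det3(m) / d)
    centre = tuple(centre)
    return centre, math.dist(centre, p0)


def is_alpha_sphere(centre, radius, atoms, tol=1e-6):
    """True if no atom centre lies inside the sphere."""
    return all(math.dist(centre, atom) >= radius - tol for atom in atoms)


# Toy atom centres (invented for illustration).
four_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
centre, radius = circumsphere(*four_atoms)
print(centre, round(radius, 3))
print(is_alpha_sphere(centre, radius, four_atoms + [(3.0, 3.0, 3.0)]))
```

Clustering such spheres by proximity (Step 2) and scoring the resulting pockets (Step 3) would build on this collection of spheres.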

