Sliding window dN/dS vs. functional protein domain comparison tool. Given VectorBase gene id inputs it will aggregate various useful Bioinformatics information available, then allow the user to compare selective pressures acting along the protein sequence vs. an overlay of functional protein domain annotations.

a1ultima/hpcleap_dnds

VectorBase gene aggregator and dN/dS analysis tool

What?

Web service (vg-genes.html) to do the following for available VectorBase (VB) gene IDs:

  1. dN/dS analysis: a.k.a. Ka/Ks analysis. Reveals selective pressures along the protein sequence of the user-specified VB gene id query (e.g. AGAP010815), using the protein of an orthologous VB gene id as a reference (e.g. AAEL001802). Selective pressure is measured as dN/dS, calculated using the pairwise Nei-Gojobori algorithm, which we modified to allow Gaussian-smoothed sliding window dN/dS output (see image below). Responsiveness was achieved by pickling pre-computed input statistics for the dN/dS outputs for all possible codon and amino acid pairs: ~900x speedup achieved. Accuracy benchmarks vs. MATLAB are available here. If any of this confuses you, please see here for detailed explanations, some hints on how to use the scripts separately, links to publications, and general tips on best practices for dN/dS analysis, or here for mathematical notation. Relevant code: dnds.py, changes.py, start_server.py (contact: Andy).

  2. Functional Domains Overlay: Show functional protein domain annotations along the Query sequence, available from VB (see image below). Responsiveness was achieved using REST API calls to VB. Relevant code: vg-genes.html (contact: Wenping).

  3. Aggregate data: Bioinformatics data available for the user-specified VB gene id query (e.g. AGAP010815): General info, Transcript sequences (DNA) and Protein sequences (AA). Responsiveness was achieved using REST API calls to VB. Relevant code: vg-genes.html (contact: John, Bob, Wenping, Andy).

  4. Codon alignment (coming soon): Codon alignments of the Query and Reference sequences are a necessary input for the dN/dS computations, but the alignments are not yet shown in their own panel in the web app. Alignments have not been optimised for responsiveness; this is currently a performance bottleneck, to be fixed soon. Relevant code: align.py, start_server.py (contact: Andy).
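
The pre-computation idea in item 1 can be sketched roughly as follows. This is a minimal illustration, not the contents of dnds.py or changes.py: the function name `build_pair_cache` and the cached statistic (number of nucleotide differences, and whether the two codons encode the same amino acid) are assumptions chosen for brevity, and the pathway averaging that Nei-Gojobori applies to codons differing at several positions is left out. The point is only that every sense-codon pair can be enumerated once, pickled, and later looked up instead of recomputed.

```python
import itertools
import pickle

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def build_pair_cache():
    """Hypothetical helper: enumerate all sense-codon pairs once.

    Each entry maps (codon_a, codon_b) to a tuple of
    (number of differing nucleotides, True if both encode the same amino acid).
    """
    sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    cache = {}
    for a in sense_codons:
        for b in sense_codons:
            n_diff = sum(x != y for x, y in zip(a, b))
            cache[(a, b)] = (n_diff, CODON_TABLE[a] == CODON_TABLE[b])
    return cache


# Serialise once (here to bytes; a file on disk works the same way),
# then reload and use plain dictionary lookups at request time.
blob = pickle.dumps(build_pair_cache(), protocol=2)
pair_cache = pickle.loads(blob)

print(len(pair_cache))               # 61 * 61 = 3721 pairs
print(pair_cache[("TTT", "TTC")])    # (1, True): Phe -> Phe, synonymous
print(pair_cache[("TTT", "TTA")])    # (1, False): Phe -> Leu, nonsynonymous
```

A dictionary keyed by codon pair keeps each lookup constant-time, which is the kind of trade (memory for speed) the reported speedup relies on.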

Why?

Combining the two types of information (1. dN/dS analysis, 2. functional domain annotations) can aid in exploring hypotheses about the ancestral evolutionary selective pressures acting on a protein and its functional domains. One example insight that can be drawn from combining 1. and 2. is as follows: if we observe that the majority of functional domains annotated onto a protein sequence overlap well with dN/dS values ~ 1 (i.e. near-neutral evolution), it is likely that the protein as a whole is in the process of becoming a pseudogene, and so we can infer that the corresponding functional domains are, for some reason, no longer essential to that particular species' survival, which helps elucidate the evolutionary history of that species' ancestors. Alternatively, if we measure dN/dS

Example

  • Query id: AGAP010815
  • Reference id: AAEL001802

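[Image: Gaussian-smoothed sliding window dN/dS output with functional domain annotations along the Query sequence]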

REQUIREMENTS (tested on):

  • Python 2.7.3
  • Biopython 1.66
  • Firefox 50.1.0
  • Bash

USAGE (Bash terminal):

1. Start the Python server (this is just so the web page can talk with Python):

  • In its own terminal: bash ./run_py_server.sh

2. Open the web-page:

  • In its own terminal: xdg-open ./vb-genes.html

3. Aggregate sequence data from VectorBase:

  • Enter a valid VectorBase gene id into the text field (Query)
  • Click "Go!"

4. Retrieve protein functional domains:

  • Click the "Protein domain annotation (from VectorBase)" panel

5. Compute sliding window dN/dS analysis curve (this is what we need the Python server for):

  • Click the "Sliding Window dN/dS analysis" panel
  • Enter a valid VectorBase gene id into the text field (Reference); it must be an orthologue of the Query (see: Nei-Gojobori for help)
  • Click "Calculate dN/dS!"
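
For readers curious what the curve from step 5 represents, here is a minimal sketch of Gaussian-weighted sliding window smoothing over a per-codon series of values. It is an assumption-laden illustration rather than the code in dnds.py: the function name `gaussian_smooth`, the `sigma` and `radius` parameters, and the choice to smooth a single series (instead of, say, smoothing the numerator and denominator separately) are all hypothetical.

```python
import math


def gaussian_smooth(values, sigma=2.0, radius=None):
    """Hypothetical helper: Gaussian-weighted moving average.

    Each output position is a weighted mean of its neighbours within
    `radius` positions, with weights falling off as a Gaussian of width
    `sigma`. Windows are truncated at the sequence ends and the weights
    renormalised, so the output has the same length as the input.
    """
    if radius is None:
        radius = int(3 * sigma)  # weights beyond ~3 sigma are negligible
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            weight = math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))
            total += weight * values[j]
            weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


# Toy per-codon signal: a single spike in an otherwise flat series.
raw = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
curve = gaussian_smooth(raw, sigma=1.0)
print(["%.3f" % v for v in curve])
```

A larger `sigma` gives a smoother curve at the cost of blurring sharp changes between neighbouring codons, so it is worth trying a few values when comparing the curve against domain boundaries.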
