TrinotateWeb: Graphical Interface for Navigating Trinotate Annotations and Expression Analyses
TrinotateWeb provides a web-based graphical interface to support local user-based navigation of Trinotate annotations and differential expression data. Simply run the included light-weight webserver, open your web browser and point it to your Trinotate SQLite database, and begin your data exploration. Note, Trinotate is not yet a full-featured application, but is instead in a very early state of development. As an open source community-driven software development project, contributions are always welcome.
Trinotate makes extensive use of the CanvasXpress infrastructure, as developed and made publicly available by Isaac Neuhaus. "CanvasXpress was developed as the core visualization component for bioinformatics and systems biology analysis at Bristol-Myers Squibb". More about CanvasXpress.
The current capabilities of TrinotateWeb are showcased below (note, the interface is continually improved across software releases):
Populating Trinotate.sqlite with Expression Data
TrinotateWeb currently revolves around using the differential expression analysis pipeline supported by the Trinity suite, as described here, specifically first generating abundance estimates using the RSEM pipeline and then performing the DE analysis using Bioconductor.
After having run the above abundance estimation and DE analysis steps, you can begin loading the expression data like so:
# import the fpkm and DE analysis stuff $TRINOTATE_HOME/util/transcript_expression/import_expression_and_DE_results.pl \ --sqlite Trinotate.sqlite \ --samples_file samples_n_reads_described.txt \ --count_matrix Trinity_trans.counts.matrix \ --fpkm_matrix Trinity_trans.counts.matrix.TMM_normalized.FPKM \ --DE_dir edgeR_trans/ \ --transcript_mode
The above is based on using the transcript-level abundance estimates and corresponding DE analyses. Abundance estimation and DE analysis can be performed at the 'gene' level as well. Optionally, run the above using '--gene_mode' with the corresponding gene-based files. Both transcript-level and gene-level data can be loaded into the single Trinotate.sqlite instance for analysis.
Populate Trinotate.sqlite with Transcript Expression Clusters
The Trinity analysis framework describes methods for clustering transcripts based on related expression patterns here. After defining clusters, load these into the Trinotate.sqlite database like so:
# import transcript clusters $TRINOTATE_HOME/util/transcript_expression/import_transcript_clusters.pl \ --group_name edgeR_DE_analysis \ --analysis_name edgeR_trans/diffExpr.P0.001_C2.matrix.R.all.RData.clusters_fixed_P_20 \ --sqlite Trinotate.sqlite \ edgeR_trans/diffExpr.P0.001_C2.matrix.R.all.RData.clusters_fixed_P_20/*matrix
The step of defining expression clusters can be run multiple times using different parameters according to the Trinity documentation. Each set of clusters can be loaded in separately as above and then separately studied within TrinotateWeb.
Populate Trinotate.sqlite with Annotation Text
Any free-text annotation can be applied to your input transcripts, and these data can then be queried by text searches within TrinotateWeb. This was the simplest way to get text-searching up and running in a flexible way, and more sophisticated methods for querying the data will be included in future releases. If you've used Trinotate to generate annotations (ex. generated the Trinotate_report.xls tab-delimited summary data file), you can simply import that data table back into Trinotate as the textual annotation for the transcripts (Highly Recommended). Alternatively, if you have a file containing a tab delimited format of:
gene_id (tab) transcript_id (tab) annotation text
you can use that file instead. Just realize that the TrinotateWeb text-querying is currently soley based on this 'annotation text'. Load the annotations into the Trinotate.sqlite database like so:
# Load annotations $TRINOTATE_HOME/util/annotation_importer/import_transcript_names.pl Trinotate.sqlite Trinotate_report.xls
A full set of sample data for loading a Trinotate.sqlite database and populating it with expression and annotation data according to the above steps is provided at
Simply './runMe.sh' in that directory to generate the fully populated 'Trinotate.sqlite' database that's ready for exploration using TrinotateWeb.
To run TrinotateWeb, you'll need a light-weight webserver installed. We recommend using lighttpd, which is free and easy to obtain for Linux and Mac. Once installed, you can launch it via:
cd $TRINOTATE_HOME ./run_TrinotateWebserver.pl 8080
and leave it running within your terminal window. To stop it, type cntrl-C or exit the terminal.
The number 8080 is the port at which it will be listening to connections from your web browser. You can use whatever open port is available.
Then, go to your web browser and visit the URL: 'http://localhost:8080/cgi-bin/index.cgi'
You should be prompted to enter in the path to your Trinotate sqlite database.
The overview tab will show basic summary statistics.
Differential Expression Analysis
Differential expression can be explored from interactive volcano plots, MA plots, and heatmaps, starting from the 'Differential Expression' tab.
An example volcano plot for a pairwise comparison between two samples is shown below:
Just double-click on a point to visit that gene or transcript's expression and annotation report. Drag and select a range to zoom in.
Gene/Transcript ID or Keyword Searches
You can search for genes or transcripts via keyword searches. For example, from the keyword search tab, I type in 'transporter' like so:
Submitting the search results in a list of all entries where 'transporter' was found among the annotation text.
Gene/Transcript ID Search
Given a gene or transcript identifier, the feature can be searched for directly via the 'Gene or Transcript ID Search' tab:
Gene/Transcript Expression and Annotation Reports
A given gene or transcript search will lead to a gene/transcript expression and annotation report page:
Annotation information will be displayed below, including a view of the position of the ORF on the transcript and any homology match information.