
BiopiecesGenomeBrowser

Martin Asser Hansen edited this page Oct 1, 2015 · 5 revisions

Motivation

The Biopieces Genome Browser (BGB) was developed for browsing prokaryotic genomes, but it should scale to eukaryotic genomes as well, although some performance optimizations are probably in order.

A genome browser is a must for any genomic researcher, and several genome browsers exist, such as the UCSC Genome Browser, [Ensembl], and [Gbrowse]. However, these browsers are either optimized for eukaryotic organisms or complex to set up, configure, and maintain, and it is painful to install custom genomes and tracks. Recently, [Jbrowse] has been launched as an alternative to [Gbrowse], and while the browsing experience is superior, the installation of new genomes and custom tracks is still a mess.

The BGB is a temporary solution until the day [Jbrowse] hopefully matures and allows easy installation of new genomes, manipulation of custom tracks, and fine-grained permission control of who can browse what.

In BGB the addition of new genomes and manipulation of custom tracks are done using Biopieces.

Features

  • Simple installation of new genomes using BGB_upload.
  • Simple addition of custom tracks using BGB_upload.
  • Simple removal of custom tracks using BGB_delete_track.
  • Intersection of custom tracks using BGB_intersect.
  • All tracks can be listed with BGB_list.
  • Simple browser interface allowing moving, zooming, and centering.
  • Export of features as DNA sequence.
  • Searching for features using REGEXP.
  • No database. The data is located in a file tree allowing the use of UNIX filesystem permission control.
  • Features are stored in [KISS format] or in bytearrays for wiggle type tracks.
  • No index. Indexing of data is reserved for future enhancements; speed is OK for now.

Screenshot

http://farm5.static.flickr.com/4044/4445918896_2b9b1ac988_o.png

Examples

Here are a couple of examples demonstrating the use of BGB:

Installing a new genome:

read_fasta -i genome.fna | BGB_upload -u maasha -c Bacteria -g my_genome -a 2010-03-17 -x

Adding a custom feature track:

read_bed -i data.bed | BGB_upload -u maasha -c Bacteria -g my_genome -a 2010-03-17 -t my_bed_track -x

Adding another feature track:

read_blast_tab -i blast.tab | BGB_upload -u maasha -c Bacteria -g my_genome -a 2010-03-17 -t my_BLAST_track -x

Intersecting two tracks and uploading the result as a new track:

BGB_intersect -c Bacteria -g my_genome -a 2010-03-17 -T my_bed_track -t my_BLAST_track |
BGB_upload -u maasha -c Bacteria -g my_genome -a 2010-03-17 -t my_intersection -x

And the new track can of course be read into a Biopiece stream:

BGB_read_track -u maasha -c Bacteria -g my_genome -a 2010-03-17 -t my_intersection

Adding a wiggle type track:

read_sam -i data.sam | BGB_upload -u maasha -c Bacteria -g my_genome -a 2010-03-17 -t my_wiggle_track -T wiggle -x

Installing BGB

Requirements

  • A working Biopieces installation.
  • An Apache2 httpd server with SSL.

Edit $BP_DIR/bp_conf/bp_httpd.conf and adjust it to your system setup.

Add the following line to your Apache server's main httpd.conf file:

Include $BP_DIR/bp_conf/bp_httpd.conf

Note that you must expand $BP_DIR to the full path.
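One way to obtain the expanded line is to let the shell do the substitution. This is a minimal sketch that assumes BP_DIR is set in the current shell; the fallback path is a placeholder for illustration, not a real default:

```shell
# Build the Include line with $BP_DIR expanded to its full path.
# /path/to/biopieces is a placeholder used only when BP_DIR is unset.
BP_DIR="${BP_DIR:-/path/to/biopieces}"
include_line="Include $BP_DIR/bp_conf/bp_httpd.conf"

# Print the line so it can be pasted into Apache's main httpd.conf.
printf '%s\n' "$include_line"
```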

(stuff about certificates)

Data directory layout

The BGB data is ordered in a file tree with the following levels:

Users > Clade > Genome > Assembly > Contig

  • Users is a directory that holds a subdirectory for each BGB user.
  • Clade is one or more subdirectories, such as Archaea, Bacteria, etc. under the BGB user directories.
  • Genomes are added as subdirectories under each BGB user's Clade directories.
  • Assemblies are added as subdirectories under each Genome directory.
  • Contigs are added as subdirectories under each Assembly directory.

Here is an example of an actual data layout:

$BP_DIR
`-- www
    `-- Data
        `-- Users
            `-- maasha
                |-- Archaea
                `-- Bacteria
                    `-- A.ikkense
                        `-- 2010-03-02
                            `-- NODE_1000_length_12738_cov_16.636206
                                |-- Sequence
                                |   `-- sequence.txt
                                `-- Tracks
                                    |-- 0010_Gaps
                                    |   `-- track_data.kiss
                                    |-- 0020_Solexa_Bowtie_m2
                                    |   `-- track_data.wig
                                    |-- 0030_RAST_annotation
                                    |   `-- track_data.kiss
                                    |-- 0040_Prodigal_gene_finder
                                    |   `-- track_data.kiss
                                    `-- 0050_Self_Self_MegaBlast
                                        `-- track_data.kiss

In the above example the user is maasha, the clade is Bacteria, the genome is A.ikkense, the assembly is 2010-03-02, and the contig is NODE_1000_length_12738_cov_16.636206.
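The five levels map directly onto path components below the Users directory. The sketch below splits such a relative path into its levels; split_contig_path is a hypothetical helper written for illustration and is not part of BGB:

```shell
# Split a contig path (relative to the Users directory) into its five
# levels: user/clade/genome/assembly/contig.
# The function body runs in a subshell so IFS and "set -f" stay local.
split_contig_path() (
    set -f                 # disable globbing while word-splitting
    IFS=/
    set -- $1              # split the path on "/" only
    if [ "$#" -ne 5 ]; then
        echo "expected 5 levels, got $#" >&2
        exit 1
    fi
    printf 'user=%s\nclade=%s\ngenome=%s\nassembly=%s\ncontig=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
)

split_contig_path "maasha/Bacteria/A.ikkense/2010-03-02/NODE_1000_length_12738_cov_16.636206"
```

For the path above this prints one `level=value` line per level, starting with `user=maasha` and ending with the contig name.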

For each Contig there is a subdirectory Sequence containing the contig sequence in the file sequence.txt. The sequence is stored in ASCII with no header, whitespace, or newlines. Optionally, there may be a subdirectory Tracks containing custom tracks. Each custom track is located in a subdirectory prefixed with a four-digit number that indicates the order of the tracks. Inside each track directory there is either a track_data.kiss file for features in KISS format or a track_data.wig file for data in Wiggle format.
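Because the layout is plain files and directories, it can be inspected with ordinary shell tools. The sketch below is illustrative only: contig_length and list_tracks are hypothetical helpers, not BGB commands, and the demo runs on a throwaway directory that mimics the layout described above.

```shell
# Contig length in bases: sequence.txt has no header, whitespace or
# newlines, so its size in bytes equals the sequence length.
contig_length() {
    echo $(( $(wc -c < "$1/Sequence/sequence.txt") ))
}

# List the tracks of a contig as: order <TAB> name <TAB> format.
# Glob results are sorted, so the four-digit prefix gives the track order.
list_tracks() {
    for track_dir in "$1"/Tracks/[0-9][0-9][0-9][0-9]_*; do
        [ -d "$track_dir" ] || continue      # no Tracks directory or no match
        dir_name=${track_dir##*/}            # e.g. 0010_Gaps
        order=${dir_name%%_*}                # 0010
        name=${dir_name#*_}                  # Gaps
        if   [ -f "$track_dir/track_data.kiss" ]; then format=kiss
        elif [ -f "$track_dir/track_data.wig"  ]; then format=wig
        else format=unknown
        fi
        printf '%s\t%s\t%s\n' "$order" "$name" "$format"
    done
}

# Demo on a throwaway contig directory.
contig=$(mktemp -d)
mkdir -p "$contig/Sequence" \
         "$contig/Tracks/0020_Solexa_Bowtie_m2" \
         "$contig/Tracks/0010_Gaps"
printf 'ACGTACGTAC' > "$contig/Sequence/sequence.txt"
: > "$contig/Tracks/0010_Gaps/track_data.kiss"
: > "$contig/Tracks/0020_Solexa_Bowtie_m2/track_data.wig"

contig_length "$contig"
list_tracks "$contig"
```

The demo reports a length of 10 and lists the Gaps track before the Solexa_Bowtie_m2 track, following the numeric prefixes.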

New BGB users are added simply by creating a directory in the Users directory ($BP_DIR/www/Data/Users/ in the layout above). Adding symlinks at the Clade, Genome, or Assembly level yields fine-grained permission control.
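As a sketch of this procedure, the commands below add a user and share one genome through a Genome-level symlink. The user name guest is hypothetical, and the demo works in a throwaway directory; on a real server users_dir would be the Users directory of the BGB installation, assumed here to follow the www/Data/Users layout shown in the example tree.

```shell
# Throwaway stand-in for the Users directory of a BGB installation.
users_dir=$(mktemp -d)/www/Data/Users

# An existing user with one genome and one assembly.
mkdir -p "$users_dir/maasha/Bacteria/A.ikkense/2010-03-02"

# Add a new user ("guest" is a hypothetical name) with a Clade directory.
mkdir -p "$users_dir/guest/Bacteria"

# Share a single genome with the new user via a Genome-level symlink.
ln -s "$users_dir/maasha/Bacteria/A.ikkense" \
      "$users_dir/guest/Bacteria/A.ikkense"

# The new user's tree now exposes the genome and its assemblies.
ls "$users_dir/guest/Bacteria/A.ikkense"
```

Linking at the Clade or Assembly level works the same way with a shorter or longer path. Read access to the linked data is still governed by the UNIX permissions on the link target.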
