This document describes how to install the Generic Genome Browser.
1. PREREQUISITES
GBrowse runs on top of several software packages. These must be
installed and configured before you can run GBrowse. Most
preconfigured Linux systems will have some of these packages installed
already.
A) MySQL -- http://www.mysql.com
The MySQL database is a fast open source relational database
that is widely used for web applications.
B) Apache Web Server -- http://www.apache.org
The Apache web server is the industry standard open source
web server for Unix and Windows systems.
C) Perl 5.005 -- http://www.cpan.org
The Perl language is widely used for web applications.
Version 5.6 is preferred, but 5.00503 or higher will work.
D) Standard Perl modules -- http://www.cpan.org
The following Perl modules must be installed for GBrowse to work.
They can be found on the Comprehensive Perl Archive Network
(CPAN):
GD
DBI
DBD::mysql
Digest::MD5
Text::Shellwords
F) Bioperl version 1.2 or higher -- http://www.bioperl.org
Optional modules:
G) XML::Parser, XML::Writer, XML::Twig, XML::DOM
If these modules are present, the "Sequence Dumper" plugin
will be able to produce GAME and BSML output. They can be
downloaded from CPAN.
H) LWP
Needed to load remote third-party annotations.
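Before going further, it can help to confirm that the modules listed
above are visible to the Perl interpreter you intend to use. The
following is a minimal sketch and not part of the GBrowse distribution:
the function name check_perl_modules is made up for illustration, and
Bio::Seq is used only as a stand-in to detect that Bioperl is installed
(it does not verify the Bioperl version).

```shell
#!/bin/sh
# Report whether each named Perl module can be loaded by the perl on PATH.
# Returns 0 if every module loads, 1 if at least one is missing.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            missing=1
        fi
    done
    return "$missing"
}

# Required modules from the list above. Bio::Seq is an assumption used
# here only to detect that some version of Bioperl is installed.
check_perl_modules GD DBI DBD::mysql Digest::MD5 Text::Shellwords Bio::Seq ||
    echo "Install the modules marked MISSING from CPAN before continuing."
```

The optional modules (XML::Parser, XML::Writer, XML::Twig, XML::DOM and
LWP) can be checked the same way by passing their names to the function.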
2. INSTALLING THE BROWSER
Brief synopsis:
perl Makefile.PL
make
make install
The last step may need to be done as root. There is a test defined
for this package, but you must initialize and load a database before
you can run it.
This will install the software in the default location under
/usr/local/apache. See "Details" to change this.
To further configure GBrowse, see docs/CONFIGURE_HOWTO.txt. To run
GBrowse on top of Oracle and BioSQL databases see
docs/ORACLE_AND_BIOSQL.txt. To run GBrowse on top of Gadfly, see
README-berkeley-gadfly. To install on Microsoft Windows platforms, see
README-windows.txt.
Details:
The browser consists of a CGI script named "gbrowse", a Perl module
that handles some of the gory details, a small number of static image
files, and a configuration directory that contains configuration files
for each data source. By default, these will be installed in the
following locations:
CGI script: /usr/local/apache/cgi-bin/gbrowse
Static images: /usr/local/apache/htdocs/gbrowse
Config files: /usr/local/apache/conf/gbrowse.conf
The module: -standard site-specific Perl library location-
You can change the location of the installation by passing
Makefile.PL one or more NAME=VALUE pairs, like so:
perl Makefile.PL CONF=/etc HTDOCS=/home/html
This will cause the configuration files to be installed in
/etc/gbrowse.conf and the static files to be installed in
/home/html/gbrowse.
The following arguments are recognized:
CONF Configuration file directory
HTDOCS Static files directory
CGIBIN CGI script directory
INSTALLSITELIB Perl site-specific modules directory
PREFIX Base directory for conf, htdocs and cgibin
For example, if you are on a RedHat system, where the default Apache
installation uses /var/www/html for HTML files, /var/www/cgi-bin for
CGI scripts, and /etc/httpd/conf for the configuration files, you
should specify the following configuration:
perl Makefile.PL HTDOCS=/var/www/html \
CONF=/etc/httpd/conf \
CGIBIN=/var/www/cgi-bin
(The backslashes are there to split the command across multiple lines
only). To make it easier when upgrading to new versions of the
software, you can put this command into a shell script.
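A wrapper script along the following lines keeps the settings in one
place. This is only a sketch: the script name configure_gbrowse.sh and
the function name gbrowse_makefile_args are hypothetical, and the
default paths are the RedHat locations from the example above.

```shell
#!/bin/sh
# configure_gbrowse.sh (hypothetical name): rerun Makefile.PL with site settings.
# Defaults are the RedHat paths from the example; override via the environment.
HTDOCS=${HTDOCS:-/var/www/html}
CONF=${CONF:-/etc/httpd/conf}
CGIBIN=${CGIBIN:-/var/www/cgi-bin}

# Print the NAME=VALUE pairs that Makefile.PL recognizes.
gbrowse_makefile_args() {
    printf 'HTDOCS=%s CONF=%s CGIBIN=%s\n' "$HTDOCS" "$CONF" "$CGIBIN"
}

if [ -f Makefile.PL ]; then
    # Left unquoted on purpose so that each pair becomes its own argument
    # (this assumes that none of the paths contains spaces).
    perl Makefile.PL $(gbrowse_makefile_args)
else
    echo "Makefile.PL not found; run this from the GBrowse source directory" >&2
fi
```

Run the script from the top of the unpacked source tree, then continue
with "make" and "make install" as usual.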
As a convenience, you can use the configuration option PREFIX, in
which case the configuration, static and CGI files will be placed into
PREFIX/conf, PREFIX/htdocs and PREFIX/cgi-bin respectively, where
PREFIX is the location you specified:
perl Makefile.PL PREFIX=/home/www
Note that the configuration files are always placed in a subdirectory
named gbrowse.conf. You cannot change this. Similarly, the static
files are placed in a directory named gbrowse. The install script
will detect if there are already configuration files in the selected
directory and not overwrite them if so. The same applies to the
cascading stylesheet file (gbrowse.css) located in the gbrowse
subdirectory. However, neither the GIF files in the "buttons"
subdirectory nor the plugin modules in the gbrowse.conf/plugins
directory are checked before being overwritten, so if you have
modified any of them, copy your versions somewhere safe before
installing.
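One way to protect local changes to the buttons and plugins is to copy
both directories aside before running "make install". The sketch below
assumes the directory layout described in this section; the function
name backup_gbrowse_custom and the backup location are hypothetical.

```shell
#!/bin/sh
# Copy the two directories that "make install" overwrites without checking.
# Usage: backup_gbrowse_custom CONF_DIR HTDOCS_DIR BACKUP_DIR
backup_gbrowse_custom() {
    conf_dir=$1
    htdocs_dir=$2
    backup_dir=$3
    mkdir -p "$backup_dir" || return 1
    for src in "$conf_dir/gbrowse.conf/plugins" "$htdocs_dir/gbrowse/buttons"; do
        # Skip quietly if a directory does not exist (for example, on a
        # first-time install).
        if [ -d "$src" ]; then
            cp -Rp "$src" "$backup_dir/" || return 1
        fi
    done
}
```

With the default install locations this would be called as, for
example, "backup_gbrowse_custom /usr/local/apache/conf
/usr/local/apache/htdocs $HOME/gbrowse-backup".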
You can always manually move the files around after install. See
docs/configuration.txt for details.
When installing the static files, the install script also creates an
empty directory named "tmp". This directory is set to be world
writable so that the GBrowse server can use it to manage temporary image
files that it creates on the fly. If you would prefer not to have a
world writable directory on your system, simply change the ownership
and permissions to allow the web server account to write into it. The
directory is located in /usr/local/apache/htdocs/gbrowse/tmp by
default.
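The ownership change described above can be sketched as follows. The
function name restrict_tmp_dir is hypothetical, and the account name to
pass depends on how your web server is configured ("nobody" is a common
choice, but this is an assumption about your system).

```shell
#!/bin/sh
# Hand the tmp directory to the web server account and remove world write
# access. Changing the owner to another account normally requires root.
# Usage: restrict_tmp_dir DIRECTORY OWNER
restrict_tmp_dir() {
    tmp_dir=$1
    web_user=$2
    chown "$web_user" "$tmp_dir" && chmod 755 "$tmp_dir"
}
```

For the default install location this would be run as root as
"restrict_tmp_dir /usr/local/apache/htdocs/gbrowse/tmp nobody".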
The first time you run Makefile.PL, a file named GGB.def will be
created containing your file path settings. When Makefile.PL is run
again, it will ask you whether you wish to use the settings stored in
this file.
3. POPULATING THE DATABASE
Synopsis:
mysql -uroot -p -e 'create database yeast'
mysql -uroot -p -e 'grant all privileges on yeast.* to me@localhost'
mysql -uroot -p -e 'grant file on *.* to me@localhost'
mysql -uroot -p -e 'grant select on yeast.* to nobody@localhost'
bulk_load_gff.pl -d yeast sample_data/yeast_data.gff
make test
Details:
You will need a MySQL database in order to start using GBrowse. Using the
mysql command line, create a database (called "yeast" in the synopsis
above), and ensure that you have update and file privileges on it.
The example above assumes that you have a username of "me" and that
you will allow updates from the local machine only. It also gives all
privileges to "me". You may be comfortable with a more restricted set
of privileges, but be sure to provide at least SELECT, UPDATE and
INSERT privileges. You will need to provide the administrator's name
and correct password for these commands to succeed.
In addition, grant the "nobody" user the SELECT privilege. The web
server usually runs as nobody, and must be able to make queries on the
database. Modify this as needed if the web server runs under a
different account.
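If your database name or account names differ from the synopsis, the
statements can be generated rather than typed by hand. This is a
sketch under stated assumptions: the function name gbrowse_grants_sql
is hypothetical, and it simply reproduces the four statements from the
synopsis. On recent MySQL versions the accounts may need to be created
with CREATE USER before privileges can be granted.

```shell
#!/bin/sh
# Print the SQL from the synopsis for a given database, loading account
# and web server account.
# Usage: gbrowse_grants_sql DATABASE LOADER_USER WEB_USER
gbrowse_grants_sql() {
    cat <<EOF
CREATE DATABASE $1;
GRANT ALL PRIVILEGES ON $1.* TO $2@localhost;
GRANT FILE ON *.* TO $2@localhost;
GRANT SELECT ON $1.* TO $3@localhost;
EOF
}

# Print the statements for review. To apply them, pipe the output to
# "mysql -uroot -p" and enter the administrator's password when prompted.
gbrowse_grants_sql yeast me nobody
```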
The next step is to load the database with data. This is accomplished
by loading the database from a tab-delimited file containing the
genomic annotations in GFF format. The Bioperl distribution comes
with three tools for loading Bio::DB::GFF databases:
1) bulk_load_gff.pl
This Perl script will initialize a new Bio::DB::GFF database with
a fresh schema, deleting anything that was there before. It will
then load the file. Only suitable for use the very first time
you create a database, or when you want to start from scratch!
2) load_gff.pl
This will incrementally load a database, optionally initializing
it if it does not already exist. This script is slower, but it
allows incremental loading.
3) fast_load_gff.pl
This will incrementally load a database. On UNIX systems, it will
activate a fast loader that makes the speed almost the same as
the bulk loader. Be careful, though, because this is an experimental
piece of software.
You will find these scripts in the Bioperl distribution, in the
subdirectory scripts/Bio-DB-GFF. Earlier versions of the
distribution will have these files directly in the scripts/
subdirectory.
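Because the location of the loaders depends on the Bioperl version, a
small helper can look in both places. This is a sketch; the function
name find_gff_loader is hypothetical, and the script file names in your
copy of Bioperl may differ from the ones used in this document.

```shell
#!/bin/sh
# Print the path of a loader script inside an unpacked Bioperl distribution,
# checking scripts/Bio-DB-GFF first and then scripts/ (older layout).
# Usage: find_gff_loader BIOPERL_DIR SCRIPT_NAME
find_gff_loader() {
    bioperl_dir=$1
    loader=$2
    for candidate in "$bioperl_dir/scripts/Bio-DB-GFF/$loader" \
                     "$bioperl_dir/scripts/$loader"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "$loader not found under $bioperl_dir/scripts" >&2
    return 1
}
```

For example, "perl `find_gff_loader /path/to/bioperl bulk_load_gff.pl`
-d yeast sample_data/yeast_data.gff" runs the bulk loader from wherever
it was found.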
For testing purposes, this distribution includes a GFF file with yeast
genome annotations. The file can be found in the sample_data
subdirectory. If the load is successful, you should see a message
indicating that 13298 features were successfully loaded.
Provided that the yeast load was successful, you may now run "make
test". This invokes a small test script that tests that the database
is accessible by the "nobody" user and that the basic feature
retrieval functions are working.
4. TESTING THE BROWSER
You should now be able to browse the yeast genome. Type the following
URL into your favorite browser:
http://name.of.your.host/cgi-bin/gbrowse?source=yeast
This will display the genome browser instructions and a search field.
Type in "III" to start searching chromosome III, or search for
"glucose" to find a bunch of genes that are involved in glucose
metabolism.
5. LOADING OTHER DATA SETS
Each model organism database has its own flat file format for
representing the data. This package includes four small Perl scripts
that massage the model-specific annotation files into GFF format
suitable for loading:
process_gadfly.pl For FlyBase D. melanogaster flat files
process_ncbi_human.pl For human annotations from NCBI
process_sgd.pl For SGD S. cerevisiae flat files
process_wormbase.pl For WormBase C. elegans flat files
You will find the scripts, along with information on downloading the
current model organism files, in the "bin" subdirectory of this
package. The scripts will also have been copied into your system
binaries directory when you ran "make install". Run the script with the
-h option to get some data-specific help:
% process_gadfly.pl -h
The process_wormbase.pl script requires the AcePerl package, which is
available from CPAN. It is not strictly necessary to run this script
because the unaltered GFF files distributed from WormBase are
compatible with GBrowse. process_wormbase.pl supplements the
information with the physical positions of genetic markers, GenBank
accession numbers and functional descriptions of gene products.
To display the DNA sequence and to run sequence-dependent glyphs such
as the three-frame translation, you will need to load the DNA as well
as the annotations. The DNA must be formatted as a series of one or
more FASTA-format files in which each entry in the file corresponds to
a top-level sequence such as a chromosome pseudomolecule. You can
then run the load_gff.pl or bulk_load_gff.pl script using the -fasta
argument. For example, if the yeast genome is contained in a FASTA
file named yeast.fa, you would run the command:
bulk_load_gff.pl -d yeast -fasta yeast.fa sample_data/yeast_data.gff
Alternatively, you may put several FASTA files into a directory, and
provide the directory name as the argument to -fasta.
(The yeast DNA is too large to be included in this distribution, but
you can get a copy of it from
ftp://genome-ftp.stanford.edu/pub/yeast/)
Run "bulk_load_gff.pl -h" to see usage instructions.
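Before loading DNA, it can be worth checking that the FASTA entry names
line up with the reference sequence names used in the GFF file. The
sketch below assumes, as is usual for Bio::DB::GFF, that each FASTA ID
should match a name in the first column of the GFF file. Both function
names are hypothetical.

```shell
#!/bin/sh
# Print the ID of every FASTA entry: the text between ">" and the first
# whitespace character on each header line.
# Usage: fasta_ids FILE [FILE...]
fasta_ids() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$@"
}

# Print the distinct reference sequence names (column 1) of a GFF file,
# skipping comment lines that start with "#".
# Usage: gff_reference_names FILE
gff_reference_names() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}
```

Comparing the output of "fasta_ids yeast.fa" with that of
"gff_reference_names sample_data/yeast_data.gff" shows at a glance
whether any chromosome is named differently in the two files.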
6. CREATING YOUR OWN GENOME DATABASE
See the file doc/configuration.txt for information on how to create
new databases from scratch, add new browser tracks, and how to get the
browser to dump the DNA from the region currently under display.
7. MAKING THE BROWSER RUN FASTER
Three factors are major contributors to the length of time it takes to
load a gbrowse page:
1) loading the Perl interpreter and parsing BioPerl and all the
other Perl libraries that gbrowse uses.
2) query speed on the database
3) the conversion at the Perl layer of database data into BioPerl
objects for rendering.
To improve (1), I recommend that you install the mod_perl module for
Apache (http://perl.apache.org). If you configure an Apache::Registry
directory and place gbrowse inside it (rather than in the default
cgi-bin directory), the overhead of loading Perl and its libraries is
paid once per server process rather than on every request, which
speeds up the script noticeably.
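An Apache::Registry directory is set up in httpd.conf. The sketch below
prints a typical mod_perl 1.x fragment for review. The function name
modperl_registry_conf, the /perl URL prefix and the directory
/usr/local/apache/perl are assumptions chosen for illustration;
mod_perl 2 uses different handler names, so consult its documentation
if that is what you have installed.

```shell
#!/bin/sh
# Print an httpd.conf fragment that runs the scripts in DIRECTORY under
# Apache::Registry (mod_perl 1.x syntax).
# Usage: modperl_registry_conf DIRECTORY
modperl_registry_conf() {
    cat <<EOF
Alias /perl/ $1/
<Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI
    PerlSendHeader On
</Location>
EOF
}

# Review the output, then add it to httpd.conf and restart Apache.
modperl_registry_conf /usr/local/apache/perl
```

With this fragment in place and the gbrowse script copied into the
directory, the browser would be reached at
http://name.of.your.host/perl/gbrowse instead of under /cgi-bin/.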
Be aware that there is a bad interaction between the Apache::DBI
module (often used to speed up database accesses) and Bio::DB::GFF.
This will cause the GFF dumper plugin to fail intermittently. GBrowse
does not need Apache::DBI to achieve performance increases under
mod_perl and it is suggested that you disable Apache::DBI. If you
cannot do this, then you should remove the file GFFDumper.pm from the
gbrowse.conf/plugins directory.
Database query performance (2) is also a major factor. If you are
using MySQL as the backend, you will see dramatic performance
increases by increasing the amount of memory available to the key
buffer, sort buffer, table cache and other in-memory data structures.
I suggest that you replace the default MySQL configuration file
(usually stored in /etc/my.cnf) with one of the large-memory sample
configuration files provided in the support-files subdirectory of the
MySQL distribution. Of course, if you tell MySQL to use more memory
than you have, then performance will degrade again.
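Swapping in a sample configuration file is safer if the current file is
kept as a backup. This is a sketch: the function name
install_mysql_sample is hypothetical, and the sample file name in the
usage note (my-large.cnf) is one that older MySQL distributions
shipped, so check what your support-files directory actually contains.

```shell
#!/bin/sh
# Install a sample MySQL configuration file, keeping a backup of any
# existing target as TARGET_FILE.bak.
# Usage: install_mysql_sample SAMPLE_FILE TARGET_FILE
install_mysql_sample() {
    sample=$1
    target=$2
    if [ ! -f "$sample" ]; then
        echo "no such sample file: $sample" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$sample" "$target"
}
```

For example, "install_mysql_sample support-files/my-large.cnf
/etc/my.cnf" run as root, followed by a restart of the MySQL server.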
Finally, there is a slowdown when gbrowse converts the results of
database SQL queries into renderable biological objects. This becomes
particularly noticeable when there are lots of multi-segment objects
to be displayed. You can work around this slowdown by using semantic
zooming (see docs/CONFIGURE_HOWTO.txt). Otherwise, there's not much
that can be done about this short of buying a faster machine.
The GMOD team is working hard to reduce this performance hit.
8. MAKING THE BROWSER RUN SAFER
Whenever you are running a server-side Web script using information
provided by a web client, there is a risk that maliciously-formatted
data provided by the user will trick the server-side script into
performing some unintentional action, such as modifying a file on the
server. Perl's "taint" checks are designed to catch places in the
code where such malicious data could cause harm, and GBrowse has been
tested extensively with these taint checks activated.
Because of taint checks' noticeable impact on performance, they have
been turned off in the distributed version of gbrowse. If you wish to
reactivate the extra checking (at the expense of a performance hit),
go to the file "gbrowse" located in the Web scripts directory and edit
the top line of the file to read:
#!/usr/bin/perl -w -T
The -T switch turns on taint checks.
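The edit can also be scripted, which is convenient after each upgrade.
The sketch below appends -T to whatever Perl interpreter line the
script already has instead of hard-coding /usr/bin/perl. The function
name enable_taint_checks is hypothetical, and the check for an existing
-T is a simple pattern match, not a full parse of the switches.

```shell
#!/bin/sh
# Append -T to the "#!" line of a Perl script unless it is already there.
# The file is rewritten in place so that its permissions are preserved.
# Usage: enable_taint_checks SCRIPT_FILE
enable_taint_checks() {
    script=$1
    first=$(head -n 1 "$script")
    case $first in
        '#!'*perl*-T*) return 0 ;;   # taint checks already enabled
        '#!'*perl*) ;;               # a Perl script without -T: continue
        *) echo "$script: no perl interpreter line found" >&2
           return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    { printf '%s -T\n' "$first"; tail -n +2 "$script"; } > "$tmp" &&
        cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With the default install location this would be run as
"enable_taint_checks /usr/local/apache/cgi-bin/gbrowse".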
If you are running GBrowse under mod_perl, add the following line to
the httpd.conf configuration file:
PerlTaintCheck On
This will affect all mod_perl scripts globally.
9. BIOPERL VERSIONS
Some of GBrowse's advanced features (the use of histograms in semantic
zooming; labeling of the tracks in the overview) require features of
the Bioperl Bio::Graphics and Bio::DB::GFF libraries that are
available only in Bioperl version 1.2 or higher.
9. THE GBROWSE_IMG SCRIPT
The gbrowse_img CGI script (a new feature as of version 1.41) is a
stripped-down version of gbrowse which just generates images. It is
suitable for incorporating into <img> tags in order to make a
thumbnail of a region of interest. The thumbnail can then be linked
to the full-featured gbrowse. Here is an example of how this works
using the WormBase site:
<a href="http://www.wormbase.org/db/seq/gbrowse?source=wormbase;name=mec-3">
<img src="http://www.wormbase.org/db/seq/gbrowse_img?source=wormbase;name=mec-3;width=200">
</a>
This will generate a 200-pixel inline image of the region. Clicking
on the image will link to the fully-navigable gbrowse script.
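If you need many such thumbnails, the HTML can be generated. The sketch
below reproduces the WormBase example above from its parts. The
function name gbrowse_thumbnail_html is hypothetical, and no URL
escaping is performed, so the arguments are assumed to be safe to place
in a URL as given.

```shell
#!/bin/sh
# Print a linked thumbnail: an <img> served by gbrowse_img wrapped in a
# link to the full gbrowse display of the same region.
# Usage: gbrowse_thumbnail_html BASE_URL SOURCE NAME WIDTH
gbrowse_thumbnail_html() {
    base=$1
    data_source=$2
    landmark=$3
    width=$4
    query="source=$data_source;name=$landmark"
    printf '<a href="%s/gbrowse?%s">\n' "$base" "$query"
    printf '<img src="%s/gbrowse_img?%s;width=%s">\n' "$base" "$query" "$width"
    printf '</a>\n'
}

gbrowse_thumbnail_html http://www.wormbase.org/db/seq wormbase mec-3 200
```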
You can also use gbrowse_img to superimpose temporary features (like
BLAST hits) on the existing genome features.
Read docs/gbrowse_img.txt for the CGI parameters and other
instructions. A copy of these instructions in HTML form will be
generated when gbrowse_img is called without any arguments. Type
http://your.host/cgi-bin/gbrowse_img into your favorite web browser.
10. PLUGINS
GBrowse has a plugin architecture which makes it easy for third-party
developers to expand its functionality. The plugins are Perl .pm
files located in the directory gbrowse.conf/plugins/. To install
plugins, simply copy them into this directory. To uninstall, remove
them.
11. THE GENBANK/EMBL PROXY
Sample configuration number 5 ("05.embl.conf") corresponds to an
experimental pass-through proxy for Genbank. At least in theory, if
you enter a landmark that isn't recognized, gbrowse will go to EMBL
using the bioperl BioFetch facility, parse the record, and enter it
into the local database. This allows you to browse arbitrary
Genbank/EMBL/Refseq entries.
You are free to experiment with this, but don't expect it to be
entirely reliable. To get it to work, you must:
a) make sure you are using Bioperl 1.02 (or a patched version of
1.01)
b) create a local database named "embl" and initialize it
this way:
perl -MBio::DB::GFF -e "Bio::DB::GFF->new('embl')->initialize(1)"
c) set up permissions for this database so that "nobody@localhost"
has SELECT, INSERT, UPDATE and DELETE privileges
d) if you need to use a proxy to access remote web sites, uncomment the
-proxy line in the conf file, and adjust the URL of the proxy as
appropriate.
e) cross your fingers
12. UPDATING THE BROWSER WITHOUT REENTERING YOUR DIRECTORY SETTINGS
When updating GBrowse to a new version of the software, you can
configure it using your preferred directory settings by making a
backup copy of the file GGB.def that was generated the first time you
installed GBrowse and using `cat GGB.def` as the argument to
Makefile.PL. Here is the recipe:
cp Generic-Genome-Browser-1.40/GGB.def Generic-Genome-Browser-1.41/GGB.def
cd Generic-Genome-Browser-1.41/
perl Makefile.PL `cat GGB.def`
make
make install
13. KNOWN BUGS
Currently there is one known bug. The navigation buttons do not
operate properly when the client is using Internet Explorer 5.1 on a
Macintosh and accessing gbrowse running on a MacOS X server running
Apache. Other combinations of clients and servers work properly.
14. FEATURE WISH LIST (updated July 7, 2002)
- Full DAS support.
- Internationalization and localization support.
- A glyph for representing cytogenetic maps.
- Multiple alignment representations.
- A percent identity (PIP) glyph.
- An XY plot glyph.
- Support for Oracle and PostgreSQL databases.
If you are interested in working on any of these features, please
contact the developers at the address given in the next section.
15. SUPPORT AND BUG REPORTS
Please send requests for help to gmod-gbrowse@lists.sourceforge.net.
There is also a formal bug tracking and feature request system in
place at http://sourceforge.net/projects/gmod/
Have fun!
Lincoln Stein & the GMOD team
lstein@cshl.org