This document describes how to install the Generic Genome Browser.

1. PREREQUISITES

GBrowse runs on top of several software packages. These must be
installed and configured before you can run GBrowse. Most
preconfigured Linux systems will have some of these packages installed
already.

  A) MySQL -- http://www.mysql.com
     The MySQL database is a fast open source relational database
     that is widely used for web applications.

  B) Apache Web Server -- http://www.apache.org
     The Apache web server is the industry standard open source
     web server for Unix and Windows systems.

  C) Perl 5.005 -- http://www.cpan.org
     The Perl language is widely used for web applications.
     Version 5.6 is preferred, but 5.00503 or higher will work.

  D) Standard Perl modules -- http://www.cpan.org
     The following Perl modules must be installed for GBrowse to work.
     They can be found on the Comprehensive Perl Archive Network
     (CPAN):

        GD
        DBI
        DBD::mysql
        Digest::MD5
        Text::Shellwords

  E) Bio::DB::GFF module -- http://www.bioperl.org
     This is the middleware layer that translates between
     the CGI script and the database. It is part of the
     Bioperl package. See section 8, "BIOPERL VERSIONS", for a
     note about Bioperl versions.

  F) Bio::Graphics module -- http://www.gmod.org
     This is the module that renders the information in the
     database into graphics. It is part of the GMOD project
     and can be downloaded from the same location you got this
     package.
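
Before going further, it can help to confirm that Perl can actually
load the modules listed above. The following is a minimal sketch and
is not part of the GBrowse distribution: the function name
check_perl_modules and the PERL override variable are made up for
illustration.

```shell
#!/bin/sh
# Report which of the given Perl modules cannot be loaded.
# check_perl_modules and the PERL variable are illustrative names,
# not something shipped with GBrowse.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module fails to load.
        if "${PERL:-perl}" "-M$module" -e 1 >/dev/null 2>&1; then
            echo "ok      $module"
        else
            echo "MISSING $module"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use (commented out so that sourcing this file has no side effects):
# check_perl_modules GD DBI DBD::mysql Digest::MD5 Text::Shellwords \
#     Bio::DB::GFF Bio::Graphics
```

The function returns a non-zero status if any module is missing, so it
can be used as a guard in a larger install script.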

2. INSTALLING THE BROWSER

Brief synopsis:

perl Makefile.PL
make
make install

The last step may need to be done as root. There is a test defined
for this package, but you must initialize and load a database before
you can run it.

This will install the software in the default location under
/usr/local/apache. See "Details" to change this.

Details:

The browser consists of a CGI script named "gbrowse", a Perl module
that handles some of the gory details, a small number of static image
files, and a configuration directory that contains configuration files
for each data source. By default, these will be installed in the
following locations:

   CGI script: /usr/local/apache/cgi-bin/gbrowse
   Static images: /usr/local/apache/htdocs/gbrowse
   Config files: /usr/local/apache/conf/gbrowse.conf
   The module: -standard site-specific Perl library location-

You can change the location of the installation by passing
Makefile.PL one or more NAME=VALUE pairs, like so:

  perl Makefile.PL CONF=/etc HTDOCS=/home/html

This will cause the configuration files to be installed in
/etc/gbrowse.conf and the static files to be installed in
/home/html/gbrowse.

The following arguments are recognized:

  CONF Configuration file directory
  HTDOCS Static files directory
  CGIBIN CGI script directory
  INSTALLSITELIB Perl site-specific modules directory
  PREFIX Base directory for conf, htdocs and cgibin

For example, if you are on a RedHat system, where the default Apache
installation uses /var/www/html for HTML files, /var/www/cgi-bin for
CGI scripts, and /etc/httpd/conf for the configuration files, you
should specify the following configuration:

  perl Makefile.PL HTDOCS=/var/www/html \
                   CONF=/etc/httpd/conf \
                   CGIBIN=/var/www/cgi-bin

(The backslashes are there only to split the command across multiple
lines.) To make it easier when upgrading to new versions of the
software, you can put this command into a shell script.
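
Such a wrapper script might look like the sketch below, which uses the
RedHat paths from the example above. The function name
configure_gbrowse and the PERL override variable are assumptions made
for illustration.

```shell
#!/bin/sh
# Re-run the GBrowse configuration step with site-specific paths.
# The paths are the RedHat defaults quoted above; adjust them for your site.
HTDOCS=/var/www/html
CONF=/etc/httpd/conf
CGIBIN=/var/www/cgi-bin

# configure_gbrowse is an illustrative name, not part of GBrowse.
configure_gbrowse() {
    # Must be run from the top of the unpacked GBrowse distribution.
    "${PERL:-perl}" Makefile.PL \
        HTDOCS="$HTDOCS" \
        CONF="$CONF" \
        CGIBIN="$CGIBIN"
}

# Typical use, from the distribution directory:
# configure_gbrowse && make && make install
```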

As a convenience, you can use the configuration option PREFIX, in
which case the configuration, static and CGI files will be placed into
PREFIX/conf, PREFIX/htdocs and PREFIX/cgi-bin respectively, where
PREFIX is the location you specified:

  perl Makefile.PL PREFIX=/home/www

Note that the configuration files are always placed in a subdirectory
named gbrowse.conf. You cannot change this. Similarly, the static
files are placed in a directory named gbrowse. The install script
will detect if there are already configuration files in the selected
directory and not overwrite them if so. The same applies to the
cascading stylesheet file (gbrowse.css) located in the gbrowse
subdirectory. However, the GIF files in the "buttons" subdirectory
are overwritten without checking, so if you have modified them, copy
your versions somewhere safe before you install.

You can always manually move the files around after install. See
docs/configuration.txt for details.

When installing the static files, the install script also creates an
empty directory named "tmp". This directory is set to be world
writable so that the GBrowse server can use it to manage temporary image
files that it creates on the fly. If you would prefer not to have a
world writable directory on your system, simply change the ownership
and permissions to allow the web server account to write into it. The
directory is located in /usr/local/apache/htdocs/gbrowse/tmp by
default.
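
For those who prefer not to keep the tmp directory world writable, the
change described above might be sketched as follows. The function
name restrict_tmp_dir is made up for illustration, and the choice of
mode 755 is an assumption; any mode that lets the web server account
write and others read should do.

```shell
#!/bin/sh
# Hand the GBrowse tmp directory to the web server account and drop
# world write access. restrict_tmp_dir is an illustrative name.
restrict_tmp_dir() {
    dir=$1      # e.g. /usr/local/apache/htdocs/gbrowse/tmp
    owner=$2    # account the web server runs as, e.g. nobody
    chown "$owner" "$dir" &&
    chmod 755 "$dir"    # owner may write; everyone else may only read
}

# Typical use (as root):
# restrict_tmp_dir /usr/local/apache/htdocs/gbrowse/tmp nobody
```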


3. POPULATING THE DATABASE

Synopsis:

  mysql -uroot -p -e 'create database yeast'

  mysql -uroot -p -e 'grant all privileges on yeast.* to me@localhost'
  mysql -uroot -p -e 'grant file on *.* to me@localhost'
  mysql -uroot -p -e 'grant select on yeast.* to nobody@localhost'

  bulk_load_gff.pl -d yeast sample_data/yeast_data.gff
  make test

(Given -p with no value attached, mysql prompts for the root password.
Writing "-p password" with a space would make mysql treat "password"
as a database name.)

Details:

You will need a MySQL database in order to start using GBrowse. Using the
mysql command line, create a database (called "yeast" in the synopsis
above), and ensure that you have update and file privileges on it.
The example above assumes that you have a username of "me" and that
you will allow updates from the local machine only. It also gives all
privileges to "me". You may be comfortable with a more restricted set
of privileges, but be sure to provide at least SELECT, UPDATE and
INSERT privileges. You will need to provide the administrator's name
and correct password for these commands to succeed.

In addition, grant the "nobody" user the SELECT privilege. The web
server usually runs as nobody, and must be able to make queries on the
database. Modify this as needed if the web server runs under a
different account.
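
If the database name or the account names differ from those in the
synopsis, it may be convenient to generate the statements rather than
retype them. The sketch below only prints SQL; it mirrors the grants
in the synopsis and does not contact the database. The function name
gbrowse_grants_sql is made up for illustration.

```shell
#!/bin/sh
# Print the SQL that creates a GBrowse database and sets up privileges,
# mirroring the statements in the synopsis.
# gbrowse_grants_sql is an illustrative name. Its arguments are the
# database name, the account that loads data, and the account the web
# server runs as.
gbrowse_grants_sql() {
    db=$1; loader=$2; webuser=$3
    cat <<EOF
CREATE DATABASE $db;
GRANT ALL PRIVILEGES ON $db.* TO $loader@localhost;
GRANT FILE ON *.* TO $loader@localhost;
GRANT SELECT ON $db.* TO $webuser@localhost;
EOF
}

# Typical use: review the output, then pipe it to the mysql client.
# gbrowse_grants_sql yeast me nobody | mysql -uroot -p
```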

The next step is to load the database with data. This is accomplished
by loading the database from a tab-delimited file containing the
genomic annotations in GFF format. The Bioperl distribution comes
with two tools for loading Bio::DB::GFF databases:

  1) bulk_load_gff.pl
     This Perl script will initialize a new Bio::DB::GFF database with
     a fresh schema, deleting anything that was there before. It will
     then load the file. Only suitable for use the very first time
     you create a database, or when you want to start from scratch!

  2) load_gff.pl
     This will incrementally load a database, optionally initializing
     it if it does not already exist. This script is slower, but it
     allows incremental loading.

You will find these scripts in the Bioperl distribution, in the
subdirectory scripts/Bio-DB-GFF. Earlier versions of the
distribution will have these files directly in the scripts/
subdirectory.

For testing purposes, this distribution includes a GFF file with yeast
genome annotations. The file can be found in the test_data
subdirectory. If the load is successful, you should see a message
indicating that 13298 features were successfully loaded.
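
As a rough sanity check on a load, you can count the candidate feature
lines in a GFF file and compare the result with the number the loader
reports. This is only a sketch: the function name count_gff_features
is made up, and it assumes that every line that is neither blank nor a
"#" comment is a feature, which may not match the loader's own count
exactly.

```shell
#!/bin/sh
# Count candidate feature lines in a GFF file: every line that is
# neither blank nor a "#" comment.
# count_gff_features is an illustrative name, not a Bioperl tool.
count_gff_features() {
    grep -v '^#' "$1" | grep -c '[^[:space:]]'
}

# Typical use:
# count_gff_features sample_data/yeast_data.gff
```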

Provided that the yeast load was successful, you may now run "make
test". This invokes a small test script that tests that the database
is accessible by the "nobody" user and that the basic feature
retrieval functions are working.

4. TESTING THE BROWSER

You should now be able to browse the yeast genome. Type the following
URL into your favorite browser:

  http://name.of.your.host/cgi-bin/gbrowse?source=yeast

This will display the genome browser instructions and a search field.
Type in "III" to start searching chromosome III, or search for
"glucose" to find a bunch of genes that are involved in glucose
metabolism.

5. LOADING OTHER DATA SETS

Each model organism database has its own flat file format for
representing the data. This package includes three small perl scripts
that massage the model-specific annotation files into GFF format
suitable for loading:

  process_gadfly.pl For FlyBase D. melanogaster flat files
  process_sgd.pl For SGD S. cerevisiae flat files
  process_wormbase.pl For WormBase C. elegans flat files

You will find the scripts, along with information on downloading the
current model organism files, in the bin subdirectory of this package.
The scripts will also have been copied into your system binaries
directory when you ran "make install". Run the script with the -h
option to get some data-specific help:

  % process_gadfly.pl -h

The process_wormbase.pl script requires the AcePerl package, which is
available from CPAN. It is not strictly necessary to run this script
because the unaltered GFF files distributed from WormBase are
compatible with GBrowse. process_wormbase.pl supplements the
information with the physical positions of genetic markers, GenBank
accession numbers and functional descriptions of gene products.

6. CREATING YOUR OWN GENOME DATABASE

See the file docs/configuration.txt for information on how to create
new databases from scratch, add new browser tracks, and how to get the
browser to dump the DNA from the region currently under display.

7. MAKING THE BROWSER RUN FASTER

If you have mod_perl (http://perl.apache.org), you can install the
gbrowse script as an Apache::Registry script. This will increase the
performance of the script noticeably.

Be aware that there is a bad interaction between the Apache::DBI
module (often used to speed up database accesses) and Bio::DB::GFF.
This will result in the GFF dump feature failing intermittently.
GBrowse does not need Apache::DBI to achieve performance increases
under mod_perl and it is suggested that you disable Apache::DBI.

8. BIOPERL VERSIONS

GBrowse works with Bioperl version 1.0. However, one useful feature --
the ability to use wildcards in the search field -- requires Bioperl
version 1.01 or higher. In addition, there are a number of subtle
display problems present in Bioperl 1.0 that have since been fixed.


At the time this was written, Bioperl 1.01 had not been officially
released. There are two ways to get it:

  a. The Patch method

  This distribution contains a patch file named "bioperl-1.0.patch"
  located in the "extras" subdirectory. Applying it to an unmodified
  Bioperl-1.0 distribution will generate a version 1.01 distribution.
  The steps to follow are these:

  a1. unpack bioperl-1.0.tar.gz:
      gunzip -c bioperl-1.0.tar.gz | tar xvf -

  a2. enter bioperl-1.0:
      cd bioperl-1.0

  a3. apply the patch:
      patch -p1 < ../Generic-Genome-Browser-1.XX/extras/bioperl-1.0.patch

  The last command will need to be modified to indicate the actual
  location of the patch file.

  b. By anonymous CVS.

  You can get Bioperl 1.01 by anonymous CVS using this incantation:

   cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl \
       checkout -r branch-1-0-0 bioperl-live

   When prompted, the password is "cvs". For more details, see
   http://cvs.bioperl.org.

9. THE GBROWSE_IMG SCRIPT

The gbrowse_img CGI script (a new feature as of version 1.41) is a
stripped-down version of gbrowse which just generates images. It is
suitable for incorporating into <img> tags in order to make a
thumbnail of a region of interest. The thumbnail can then be linked
to the full-featured gbrowse. Here is an example of how this works
using the WormBase site:

  <a href="http://www.wormbase.org/db/seq/gbrowse?source=wormbase;name=mec-3">
    <img src="http://www.wormbase.org/db/seq/gbrowse_img?source=wormbase;name=mec-3;width=200">
  </a>

This will generate a 200-pixel inline image of the region. Clicking
on the image will link to the fully-navigable gbrowse script.
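
If you need many such thumbnails, the markup can be generated. The
sketch below prints the same pair of tags as the WormBase example for
any landmark. The function name gbrowse_thumbnail is made up for
illustration, and the sketch assumes the landmark name needs no URL
escaping.

```shell
#!/bin/sh
# Print an HTML thumbnail link for a landmark.
# gbrowse_thumbnail is an illustrative name. "base" is the URL prefix
# under which both gbrowse and gbrowse_img are served.
# Assumes the landmark name contains no characters that need URL escaping.
gbrowse_thumbnail() {
    base=$1; datasource=$2; name=$3; width=$4
    cat <<EOF
<a href="$base/gbrowse?source=$datasource;name=$name">
  <img src="$base/gbrowse_img?source=$datasource;name=$name;width=$width">
</a>
EOF
}

# Typical use:
# gbrowse_thumbnail http://www.wormbase.org/db/seq wormbase mec-3 200
```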

You can also use gbrowse_img to superimpose temporary features (like
BLAST hits) on the existing genome features.

Read docs/gbrowse_img.txt for the CGI parameters and other
instructions. A copy of these instructions in HTML form will be
generated when gbrowse_img is called without any arguments. Type
http://your.host/cgi-bin/gbrowse_img into your favorite web browser.

10. THE GENBANK/EMBL PROXY

Sample configuration number 5 ("05.embl.conf") corresponds to an
experimental pass-through proxy for GenBank. At least in theory, if
you enter a landmark that isn't recognized, gbrowse will go to EMBL
using the Bioperl BioFetch facility, parse the record, and enter it
into the local database. This allows you to browse arbitrary
GenBank/EMBL/RefSeq entries.

You are free to experiment with this, but don't expect it to be
entirely reliable. To get it to work, you must:

   a) get the most up to date version of Bioperl. The patch file
      does *NOT* contain the code needed to support this.

   b) create a local database named "embl" and initialize it
      this way:

      perl -MBio::DB::GFF -e "Bio::DB::GFF->new('embl')->initialize(1)"

   c) set up permissions for this database so that "nobody@localhost"
      has SELECT, INSERT, UPDATE and DELETE privileges

   d) cross your fingers

11. UPDATING THE BROWSER WITHOUT REENTERING YOUR DIRECTORY SETTINGS

When updating GBrowse to a new version of the software, you can
configure it using your preferred directory settings by making a
backup copy of the file GGB.def that was generated the first time you
installed GBrowse and using `cat GGB.def` as the argument to
Makefile.PL. Here is the recipe:

  cp Generic-Genome-Browser-1.40/GGB.def Generic-Genome-Browser-1.41/GGB.def
  cd Generic-Genome-Browser-1.41/
  perl Makefile.PL `cat GGB.def`
  make
  make install
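
The first three steps of the recipe can be wrapped in a small function
that refuses to continue when there are no saved settings to reuse.
This is a sketch only: the function name reuse_gbrowse_settings and
the PERL override variable are made up, and it assumes, as the recipe
implies, that GGB.def holds whitespace-separated NAME=VALUE pairs.

```shell
#!/bin/sh
# Carry saved directory settings from an old GBrowse tree to a new one
# and re-run the configuration step.
# reuse_gbrowse_settings is an illustrative name, not part of GBrowse.
reuse_gbrowse_settings() {
    old=$1; new=$2
    if [ ! -f "$old/GGB.def" ]; then
        echo "no GGB.def in $old; pass settings to Makefile.PL by hand" >&2
        return 1
    fi
    cp "$old/GGB.def" "$new/GGB.def" &&
    # $(cat ...) is deliberately unquoted so that each NAME=VALUE pair
    # becomes a separate argument, as with the backticks in the recipe.
    (cd "$new" && "${PERL:-perl}" Makefile.PL $(cat GGB.def))
}

# Typical use, followed by "make" and "make install" in the new tree:
# reuse_gbrowse_settings Generic-Genome-Browser-1.40 Generic-Genome-Browser-1.41
```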

12. SUPPORT AND BUG REPORTS

Please send requests for help to gmod-devel@lists.sourceforge.net.
There is also a formal bug tracking and feature request system in
place at http://sourceforge.net/projects/gmod/


Have fun!

Lincoln Stein & the GMOD team
lstein@cshl.org
May 5, 2002