README.AWS

GBrowse Amazon EC2 Image -- Early documentation

The development version of GBrowse is located on the private AMI
ami-2e48b347, named "GBrowse 2.40 Master".

To launch it:

  1. Find the image in the AWS Console.
  2. Right-click it and select "Launch Instance":
     - Launch 1 instance.
     - Select either the "micro" or "small" instance size.
     - Enable termination protection (recommended).
     - Select your public/private keypair.
     - Select the WebServer security group (ports 80 and 22 open).
  3. Wait for the instance to boot up, as indicated by "running" state
     in the console instance browser.

To access the web server:

  1. Identify the public DNS name for the running instance, as indicated
     in the console.
  2. Point your web browser to this DNS name, using:
       http://XXXX.amazonaws.com/gb2/gbrowse/elegans/
     or
       http://XXXX.amazonaws.com/gb2/gbrowse/yeast
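
Before opening a browser, you can check from the command line that the
server is answering. The following is only a sketch and is not part of
the GBrowse distribution: the function names gbrowse_url and
check_gbrowse are made up for illustration, XXXX.amazonaws.com stands
for your instance's public DNS name, and curl is assumed to be
installed on your workstation.

```shell
#!/bin/sh
# Build the URL for one of the data sources listed above.
# gbrowse_url is a hypothetical helper, not a GBrowse command.
gbrowse_url() {
    host=$1
    name=$2
    printf 'http://%s/gb2/gbrowse/%s/\n' "$host" "$name"
}

# Print the HTTP status code that the server returns for a data source.
check_gbrowse() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(gbrowse_url "$1" "$2")"
}

# Example (replace XXXX with the public DNS name from the console):
#   check_gbrowse XXXX.amazonaws.com yeast
```

If the request fails outright, the instance may still be booting or the
security group may not have port 80 open.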

To log in:

   1. Identify the public DNS name for the running instance
   2. ssh to the instance using your keypair file and the
       user name "gbrowse":

        ssh -i keypair_file.pem gbrowse@XXXX.amazonaws.com
   3. This should give you a command shell on the remote
        machine.

To launch slave processes:

   1. While logged in to the running instance, create a .eucarc
        file in your home directory. It should contain the
        following:

        EC2_ACCESS_KEY=<your access key here>
        EC2_SECRET_KEY=<your secret key here>

   2. Run the following command:

        ~/GBrowse/bin/gbrowse_attach_slaves.pl <count>

      <count> is the number of rendering slaves you wish to launch.
      The script launches that many GBrowse slave instances and
      attaches them to the running GBrowse process.
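
Because .eucarc holds your secret key, it is worth making the file
readable only by its owner. Here is one way step 1 might be scripted.
This is a sketch under assumptions: write_eucarc is a hypothetical
helper, and the restrictive permissions are a general precaution
rather than something the GBrowse scripts are known to require.

```shell
#!/bin/sh
# Write the two credential lines to a file readable only by its owner.
# write_eucarc is a hypothetical helper; the file format is the one
# described in step 1.
write_eucarc() {
    file=$1
    access_key=$2
    secret_key=$3
    (
        umask 077    # a newly created file gets mode 0600
        printf 'EC2_ACCESS_KEY=%s\nEC2_SECRET_KEY=%s\n' \
            "$access_key" "$secret_key" > "$file"
    )
    chmod 600 "$file"    # also tighten a file that already existed
}

# Example (the slave count of 2 is arbitrary):
#   write_eucarc "$HOME/.eucarc" "<your access key here>" "<your secret key here>"
#   ~/GBrowse/bin/gbrowse_attach_slaves.pl 2
```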

HOW THE SYSTEM WORKS

Filesystem Structure
--------------------

All GBrowse-related infrastructure, including libraries and
configuration files, is mounted on /srv/gbrowse. For example, the
master GBrowse.conf file can be found at
/srv/gbrowse/etc/GBrowse.conf.

Species-specific datasets are mounted at /srv/gbrowse/species/XXXXX,
where XXXXX is the name of the species. Within each species directory,
you will find the following:

  species.conf  -- Contains the data source definition for this species.
  tracks.conf   -- Contains detailed track configuration for this
                      data source.
  dbs/          -- SQLite databases for this data source.
  Source/       -- Source files used to construct the SQLite databases.
  Source/README -- Description of how to get the source and regenerate
                      the SQLite databases (this may be incomplete).
  bin/          -- Scripts possibly used during the collection and
                      processing of source data.

/srv/gbrowse and each of the species mounts occupy distinct EBS
volumes, and each volume has a corresponding snapshot. The idea is
that by mounting and unmounting the volumes, you can control what data
sources are available to GBrowse (and avoid paying for storage for
species you don't care about).

Here is the current mapping between mount points and the snapshots of
their EBS volumes:

   /srv/gbrowse                         snap-c43e21aa
   /srv/gbrowse/species/s_cerevisiae    snap-c23e21ac
   /srv/gbrowse/species/c_elegans       snap-c03e21ae

After mounting or unmounting a species-specific volume, you should
restart GBrowse using /etc/init.d/apache2 restart.
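
As an illustration of the unmount-and-restart sequence, here is a small
sketch. It assumes the account you are logged in as can use sudo, which
this document does not state. detach_species and the RUN variable are
hypothetical names used only here. By default the function only prints
the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the commands that take a species
# volume offline and restart the web server.
# detach_species and RUN are hypothetical names used only in this sketch.
detach_species() {
    species=$1
    mount_point=/srv/gbrowse/species/$species
    for cmd in "umount $mount_point" "/etc/init.d/apache2 restart"; do
        if [ "${RUN:-0}" = 1 ]; then
            sudo $cmd || return 1    # stop if a command fails
        else
            echo "sudo $cmd"         # dry run: show what would be done
        fi
    done
}

# Example:
#   detach_species c_elegans          # show the commands
#   RUN=1 detach_species c_elegans    # actually run them
```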

The gbrowse_attach_slaves.pl Script
-----------------------------------

This script uses the euca2ools command-line tools, which in turn use
Amazon's REST API. The REST API is a lot faster than the SOAP API, so
I prefer it.

The script performs the following steps:

 1. Look up which species volumes are mounted on the currently-running
    master machine. This is done by inspecting the filesystem mount
    tables.
 2. Find out what EBS snapshots correspond to the mounted volumes. This
    is done via a series of euca2ools calls.
 3. Look up the AMI image for the current GBrowse Slave AMI. This is
    currently done by inspecting the file
    /srv/gbrowse/etc/ami_map.txt.
 4. Create a new security group for the slave instances that allows
    network connections between the currently running master instance
    and the slaves.
 5. Launch the desired number of GBrowse slave instances using the
    AMI identified in step (3), the security group created
    in step (4), and the EBS snapshots identified in step (2).
 6. As soon as the instances are running, update the configuration
    file /srv/gbrowse/etc/renderfarm.conf so that the running GBrowse
    process is aware of the slaves.
 7. Restart GBrowse.
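
To make step 1 concrete, here is a shell sketch of how the mounted
species could be read from the mount table. It is not the code in
gbrowse_attach_slaves.pl (which is a Perl script); it only illustrates
the idea. mounted_species is a hypothetical name, and the input is
assumed to be in the /proc/mounts format, where the second field of
each line is the mount point.

```shell
#!/bin/sh
# List the species whose volumes are mounted under /srv/gbrowse/species.
# Reads mount-table lines (device, mount point, ...) on standard input.
# mounted_species is a hypothetical name used only in this sketch.
mounted_species() {
    awk '$2 ~ "^/srv/gbrowse/species/[^/]+$" {
        n = split($2, part, "/")   # last path component is the species
        print part[n]
    }'
}

# Example:
#   mounted_species < /proc/mounts
```

Each name printed corresponds to one volume whose snapshot has to be
looked up in step 2.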

To Do
-----

 1. The gbrowse_attach_slaves.pl script should record the instance IDs
    of the launched instances so that they can be shut down when no
    longer needed.

    Ideally this could be done by attaching a tag to
    the instance -- something like SlaveOf=i-12345, where i-12345 is
    the ID of the currently running master instance. The challenge is
    that euca2ools doesn't currently support the tagging API. I have
    started work on a Perl interface that supports just enough of the
    tagging API to get this done.
 
 2. There should be a gbrowse_detach_slaves.pl script that will use
    this recorded slave instance information to terminate one or more
    of the slaves (or all of them) and deregister them from
    renderfarm.conf.

 3. Using /srv/gbrowse/etc/ami_map.txt to find the slave AMI is a
    bit awkward. It means that every time we update the slave image
    we have to fix ami_map.txt and create a new snapshot of the
    /srv/gbrowse image. It would be better to use the tagging system
    to mark the latest slave.

 4. There should be a web-based interface for mounting and unmounting
    species data volumes, and for controlling the number of attached
    slaves.
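
For item 1, until tagging is available, one interim possibility is to
keep the slave instance IDs in a plain file. The sketch below is not
existing GBrowse code. It assumes that the launch command reports each
instance on a line whose first field is INSTANCE and whose second
field is the instance ID; instance_ids, record_slaves and the state
file name are all hypothetical.

```shell
#!/bin/sh
# Extract instance IDs from launch output read on standard input.
# Assumption: each launched instance appears on a line of the form
#   INSTANCE <instance-id> <ami-id> ...
instance_ids() {
    awk '$1 == "INSTANCE" { print $2 }'
}

# Append the IDs to a state file so that a later script can find them.
# record_slaves and the state file location are hypothetical.
record_slaves() {
    state_file=$1
    instance_ids >> "$state_file"
}

# Example (hypothetical file name), with saved launch output:
#   record_slaves "$HOME/gbrowse_slaves.txt" < launch_output.txt
```

A detach script along the lines of item 2 could then read the same file
to decide which instances to terminate.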