Latest commit 46f4ed7
Jul 12, 2011
|Failed to load latest commit information.|
GBrowse Amazon EC2 Image -- Early documentation The development version of GBrowse is located on the private AMI ami-2e48b347, named "GBrowse 2.40 Master". To launch it: 1. Find the image in the AWS Console. 2. Right click and select "Launch Instance" - Launch 1 instance. - Select either the "micro" or "small" instance size. - Select termination protection (good idea) - Select your public/private keypair - Select the WebServer security group (ports 80 and 22 open) 3. Wait for the instance to boot up, as indicated by "running" state in the console instance browser. To access the web server: 1. Identify the public DNS name for the running instance, as indicated in the console. 2. Point your web browse to this DNS name, using: http://XXXX.amazonaws.com/gb2/gbrowse/elegans/ or http://XXXX.amazonaws.com/gb2/gbrowse/yeast To log in: 1. Identify the public DNS name for the running instance 2. ssh to the instance using your keypair file and the user name "gbrowse": ssh -i keypair_file.pem gbrowse@XXXX.amazonaws.com 3. This should give you a command shell on the remote machine. To launch slave processes: 1. While logged in to the running instance, create a .eucarc file in your home directory. It should contain the following: EC2_ACCESS_KEY=<your access key here> EC2_SECRET_KEY=<your secret key here> 2. Run the following command: ~/GBrowse/bin/gbrowse_attach_slaves.pl <count> Count is the number of rendering slaves you wish to launch. This will launch the indicated number of GBrowse slave instances and attach them to the running GBrowse process. HOW THE SYSTEM WORKS Filesystem Structure -------------------- All GBrowse-related infrastructure, including libraries and configuration files is mounted on /srv/gbrowse. For example, the master GBrowse.conf script can be found at /srv/gbrowse/etc/GBrowse.conf. Species-specific datasets are mounted at /srv/gbrowse/species/XXXXX, where XXXXX is the name of the species. Within each species directory, you will find the following: species.conf -- Contains the data source definition for this species. tracks.conf -- Contains detailed track configuration for this data source. dbs/ -- SQLite databases for this data source. Source/ -- Source files used to construct the SQLite databases Source/README-- Description of how to get the source and regenerate the SQLite databases (this may be incomplete) bin/ -- Scripts possibly used during the collection and processing of source data. /srv/gbrowse and each of the species mounts all occupy distinct EBS volumes and have a corresponding snapshot. The idea is that by mounting and unmounting the volumes, you can control what data sources are available to GBrowse (and avoid paying for storage for species you don't care about). Here is the current mapping between EBS volumes and snapshots: /srv/gbrowse snap-c43e21aa /srv/gbrowse/species/s_cerevisiae snap-c23e21ac /srv/gbrowse/species/c_elegans snap-c03e21ae After mounting or unmounting a species-specific volume, you should restart GBrowse using /etc/init.d/apache2 restart. The gbrowse_attach_slaves.pl Script ----------------------------------- This script uses the euca2ools command-line tools, which in turn uses Amazon's REST API. The REST API is a lot faster than the SOAP API, so I prefer it. 1. Look up which species volumes are mounted on the currently-running master machine. This is done by inspecting the filesystem mount tables. 2. Find out what EBS snapshots correspond to the mounted volumes.This is done via a series of euca2ools calls. 3. Look up the AMI image for the current GBrowse Slave AMI. This is currently done by inspecting the file /srv/gbrowse/etc/ami_map.txt. 4. Create a new security group for the slave instances that allows network connections between the currently running master instance and the slaves. 5. Launch the desired number of GBrowse slave instances using the AMI identified in step (3), the security group created in step (4), and the EBS snapshots identified in step (2). 6. As soon as the instances are running, update the configuration file /srv/gbrowse/etc/renderfarm.conf so that the running GBrowse process is aware of the slaves. 7. Restart gbrowse. To Do ----- 1. The gbrowse_attach_slaves.pl script should record the instanceIds of the launched instances so that they can be shut down when no longer needed. Ideally this could be done by attaching a tag to the instance -- something like SlaveOf=i-12345, where i-12345 is the ID of the currently running master intance. The challenge is that euca2ools doesn't currently support the tagging API. I have started work on a Perl interface that supports just enough of the tagging API to get this done. 2. There should be a gbrowse_detach_slaves.pl script that will use this recorded slave instance information to terminate one or more of the slaves (or all of them) and deregister them from renderfarm.conf. 3. Using /srv/gbrowse/etc/ami_map.txt to find the slave AMI is a bit awkward. It means that every time we update the slave image we have to fix ami_map.txt and create a new snapshot of the /srv/gbrowse image. It would be better to use the tagging system to mark the latest slave. 4. Web-based interface for mounting and unmounting species data volumes, and controlling the number of atached slaves.