Building the QIIME AMI

Jai Ram Rideout edited this page May 26, 2015 · 47 revisions

Building the QIIME AMI (i.e., the QIIME Amazon Virtual Machine)

  1. We begin with the StarCluster base x86_64 AMI. The latest StarCluster AMI can be found on the StarCluster website. For QIIME 1.9.1, we used ami-765b3e1f. This is an Ubuntu 12.04 virtual machine.

  2. Next, we launch an instance of that AMI (the Amazon interface changes often, so we assume that the reader knows how to do this). We use an m1.large instance for the build, which provides sufficient CPU, memory and storage for building and testing QIIME 1.9.1. Log into that instance over ssh.

  3. Add a repository in order to obtain a newer version of R than is available by default on the system:

    echo "deb http://cran.rstudio.com/bin/linux/ubuntu precise/" | sudo tee -a /etc/apt/sources.list 
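
If this step is re-run, `tee -a` appends the same repository line a second time. A minimal sketch of an idempotent alternative follows; `append_line_once` is a hypothetical helper, not part of the original build procedure, and writing to /etc/apt/sources.list would still require root privileges.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so the step can be repeated safely.
append_line_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example call (run as root on the build instance):
# append_line_once "deb http://cran.rstudio.com/bin/linux/ubuntu precise/" /etc/apt/sources.list
```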
    
  4. Update your system to retrieve the latest list of packages (this step is especially important because we added a new package repository in the previous step):

    sudo apt-get -y update
    
  5. Remove OpenBLAS from the image. OpenBLAS produces warnings whenever a QIIME command is run (see QIIME's #1704 for more detail). Also remove matplotlib, which comes pre-installed as a system package (if it is not removed, installing or upgrading matplotlib via pip below results in a broken installation):

    sudo apt-get -y remove libopenblas-base 'python-matplotlib*'
    
  6. Install QIIME's requisite libraries:

    sudo apt-get --force-yes -y install python-dev libncurses5-dev libssl-dev libzmq-dev libgsl0-dev openjdk-6-jdk libxml2 libxslt1.1 libxslt1-dev ant git subversion build-essential zlib1g-dev libpng12-dev libfreetype6-dev mpich2 libreadline-dev gfortran unzip libmysqlclient18 libmysqlclient-dev ghc sqlite3 libsqlite3-dev libc6-i386 libbz2-dev tcl-dev tk-dev r-base r-base-dev libatlas-dev libatlas-base-dev liblapack-dev swig libhdf5-serial-dev
    echo "options(warn=2); install.packages(c('randomForest', 'optparse', 'vegan', 'biom', 'ape', 'RColorBrewer'), repos='http://cran.r-project.org')" | sudo R --slave --vanilla
    echo "options(warn=2); source('http://bioconductor.org/biocLite.R'); biocLite(c('metagenomeSeq', 'DESeq2'))" | sudo R --slave --vanilla
    
  7. Perform a base QIIME install:

    sudo easy_install -U distribute pip
    sudo pip install numpy numexpr h5py 'ipython[all]' --upgrade
    sudo pip install qiime
    
  8. Add a matplotlib config file to specify a non-GUI backend:

    mkdir -p $HOME/.config/matplotlib
    echo 'backend : agg' > $HOME/.config/matplotlib/matplotlibrc
    
  9. Prepare to run qiime-deploy:

    sudo mkdir /qiime_software
    sudo chown $USER /qiime_software
    sudo chgrp $USER /qiime_software
    cd /qiime_software/
    git clone https://github.com/qiime/qiime-deploy.git
    git clone https://github.com/qiime/qiime-deploy-conf.git
    cd qiime-deploy
    
  10. Run qiime-deploy. If this fails, try re-running it, since package downloads sometimes time out:

    python qiime-deploy.py /qiime_software/ -f /qiime_software/qiime-deploy-conf/qiime-1.9.1/qiime.conf --force-remove-failed-dirs --force-remove-previous-repos
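
Re-running by hand can be automated with a small retry loop. This is only a sketch: `retry` and the `RETRY_DELAY` variable are hypothetical names introduced here, and the attempt count and delay are arbitrary choices.

```shell
# Hypothetical retry wrapper: run a command up to N times, stopping at the
# first success. RETRY_DELAY (seconds between attempts) defaults to 30.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        echo "command failed; starting attempt $attempt of $max_attempts" >&2
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example call, using the same qiime-deploy arguments as above:
# retry 3 python qiime-deploy.py /qiime_software/ \
#     -f /qiime_software/qiime-deploy-conf/qiime-1.9.1/qiime.conf \
#     --force-remove-failed-dirs --force-remove-previous-repos
```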
    
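  11. Add sourcing of the QIIME activation script to /etc/profile (the first command deletes the last line of ~/.bashrc, so check that this is the line you intend to remove):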

    sed -i '$ d' ~/.bashrc
    echo -e '\nSOFTWARE_HOME=/qiime_software\n. $SOFTWARE_HOME/activate.sh' | sudo tee -a /etc/profile
    
  12. Make biom-format commands tab-completable (via pyqi):

    mkdir /qiime_software/.bash_completion.d
    pyqi make-bash-completion --command-config-module biom.interfaces.optparse.config --driver-name biom -o /qiime_software/.bash_completion.d/biom
    echo -e '\nfor f in /qiime_software/.bash_completion.d/*;\ndo\n    source $f;\ndone' | sudo tee -a /etc/profile
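
For reference, the `echo -e` command above appends a short `for` loop to /etc/profile. The sketch below writes that loop out as a function so it can be tried on any directory; the function name `source_completions` and the readability guard are additions made for illustration, and `.` is used as the POSIX spelling of `source`.

```shell
# Hypothetical wrapper around the loop that is appended to /etc/profile.
source_completions() {
    dir=$1
    for f in "$dir"/*; do
        # If the directory is empty the glob stays literal, so skip anything
        # that is not a readable file.
        [ -r "$f" ] || continue
        . "$f"
    done
}

# Example call:
# source_completions /qiime_software/.bash_completion.d
```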
    
  13. Modify QIIME config file's temp directory:

    mkdir ~/temp
    echo -e 'temp_dir\t$HOME/temp/' >> /qiime_software/qiime_config
    
  14. Set up IPython Notebook server to be accessible from a web browser (modified from instructions in #1367):

    ipython profile create
    sed -i "s/# c.NotebookApp.ip = 'localhost'/c.NotebookApp.ip = '*'/" ~/.ipython/profile_default/ipython_notebook_config.py
    sed -i "s/# c.NotebookApp.open_browser = True/c.NotebookApp.open_browser = False/" ~/.ipython/profile_default/ipython_notebook_config.py
    sed -i "s/# c.NotebookApp.password = u''/c.NotebookApp.password = u'sha1:8f4908e22921:69d3122a66fba11bf1922b116fbb290c1ffec501'/" ~/.ipython/profile_default/ipython_notebook_config.py
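
Because the hash above is fixed, every instance launched from the AMI starts with the same notebook password. A different hash could be generated along the following lines. This is a sketch under a stated assumption: that the value has the form `sha1:<salt>:<digest>`, where the digest is SHA-1 over the passphrase followed by the salt. `make_notebook_hash` is a hypothetical helper, not an IPython command.

```shell
# Hypothetical helper: print a "sha1:<salt>:<digest>" string.
# Assumption: the digest is SHA-1 over the passphrase followed by the salt.
make_notebook_hash() {
    passphrase=$1
    # An optional second argument fixes the salt; otherwise use 12 random
    # hex digits read from /dev/urandom.
    salt=${2:-$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')}
    digest=$(printf '%s%s' "$passphrase" "$salt" | sha1sum | cut -d' ' -f1)
    printf 'sha1:%s:%s\n' "$salt" "$digest"
}

# Example call; the result would replace the hash in the sed command above:
# make_notebook_hash 'choose-a-passphrase'
```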
    
  15. Set the message of the day, which is printed when users log into a QIIME virtual machine instance. First remove the StarCluster message of the day (we credit StarCluster in ours instead):

    sudo rm /etc/update-motd.d/00-starcluster
    

    Then create a new file, /etc/update-motd.d/00-qiime, with the following contents:

    #!/bin/sh
    cat<<"EOF"
       ___      _____  _____  ____    ____  ________
     .'   `.   |_   _||_   _||_   \  /   _||_   __  |
    /  .-.  \    | |    | |    |   \/   |    | |_ \_|
    | |   | |    | |    | |    | |\  /| |    |  _| _
    \  `-'  \_  _| |_  _| |_  _| |_\/_| |_  _| |__/ |
     `.___.\__||_____||_____||_____||_____||________|
    
    
    QIIME 1.9.1 AMI (derived from the StarCluster Ubuntu 12.04 AMI)
    www.qiime.org
    
    Getting help: help.qiime.org
    QIIME script index: scripts.qiime.org
    QIIME workshops: workshops.qiime.org
    QIIME help videos: videos.qiime.org
    StarCluster (building AWS-based clusters): star.mit.edu/cluster
    IPython, and the IPython Notebook: ipython.org
    Software Carpentry (educational resources for Linux and scientific computing):
        software-carpentry.org
    
    QIIME is powered by scikit-bio: scikit-bio.org
    Qiita, QIIME-powered microbiome data storage and analysis: qiita.microbio.me
    biocore, collaboratively developed bioinformatics software: github.com/biocore
    
    To print configuration and version info for QIIME and its dependencies, run:
        print_qiime_config.py
    
    Current System Stats:
    
    EOF
    
    landscape-sysinfo | grep -iv 'graph this data'
    

    Add execute permissions to the file since it is a script:

    sudo chmod a+x /etc/update-motd.d/00-qiime
    

    Log out and log back in.

  16. Before finalizing the instance and creating the AMI, run print_qiime_config.py to ensure that everything is installed correctly:

    print_qiime_config.py -tf
    

    This command should report only a single failure, for usearch. Be sure that no warnings are printed in the output (we saw warnings at the top of the output when testing the release candidate).

    Next, run scikit-bio's test suite (this caught some installation errors while testing the release candidate):

    nosetests skbio --with-doctest -s -I DONOTIGNOREANYTHING
    

    Finally, run QIIME's full test suite:

    cd ~/temp
    wget https://pypi.python.org/packages/source/q/qiime/qiime-1.9.1.tar.gz
    tar xvf qiime-1.9.1.tar.gz
    

    We recommend running the tests from within a screen session, redirecting all output to a file:

    screen
    python qiime-1.9.1/tests/all_tests.py &> all-tests-output.txt
    

    All tests except for those related to usearch, sfffile, sffinfo, and torque must pass.
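
Checking the output file by eye is error-prone, so a filter along these lines may help. It is a sketch that assumes the output contains unittest-style lines beginning with `FAIL:` or `ERROR:`; the actual format of the all_tests.py output is not shown on this page, and `unexpected_failures` is a hypothetical helper.

```shell
# Hypothetical helper: print FAIL/ERROR lines that do not mention one of the
# components that are allowed to fail. Exits non-zero when nothing is printed.
unexpected_failures() {
    grep -E '^(FAIL|ERROR):' "$1" | grep -Eiv 'usearch|sfffile|sffinfo|torque'
}

# Example call:
# if unexpected_failures all-tests-output.txt; then
#     echo "unexpected failures listed above" >&2
# fi
```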

  17. Remove history (this must be the final step before creating the AMI with StarCluster):

    rm -rf ~/.bash_history ~/.viminfo ~/.lesshst ~/temp/*
    
  18. Log out and create the AMI with starcluster ebsimage. Using StarCluster is important because it cleans up private information that would otherwise be stored on the instance, such as your public key. WARNING: This will prevent you (and anyone else) from ever logging into this instance again!

    starcluster ebsimage <instance ID> qiime-191
    
  19. From the AWS console, terminate the instance.

  20. Tag the new AMI and corresponding snapshot to indicate it is the QIIME 1.9.1 AMI.

  21. Start a new instance using the AMI you just created. Follow the instructions in the QIIME AWS tutorial to start an IPython Notebook server. Make sure that you can log in to the server from a web browser and that running QIIME commands works as expected (e.g., print_qiime_config.py).

  22. From the AWS console, terminate the instance.

  23. Following the instructions here, use StarCluster to start a small cluster (e.g., 3 nodes) with the new AMI.

  24. Test out a few QIIME commands on the cluster, including serial and parallel commands. For example, you could run the commands in the Illumina overview tutorial. Be sure to pass -aO <n> to the parallel commands (e.g., pick_open_reference_otus.py) to enable parallel job execution. Make sure that jobs are being submitted to the queue (e.g., using qstat) and that processes are being run on the worker nodes (e.g., using htop). Test submitting jobs (serial and parallel) to the queue using start_parallel_jobs_sc.py, as well as running commands directly on the master node.

  25. Using StarCluster, terminate the cluster.

  26. Once done testing, make the AMI public (so that it is available under Community AMIs).

  27. Update the QIIME resources page to include the new AMI.

  28. Notify users via email, blog post, forum post, Twitter, etc.

Acknowledgements

This document was initially derived from our AMI release notes and the qiime-deploy README.md.