SMRT Analysis Software Installation v2.1

pb-jlandolin edited this page Mar 17, 2014 · 66 revisions

Important Changes

SMRT Analysis migrated to a completely new directory structure starting with v2.1. Instead of $SEYMOUR_HOME, we are now using $SMRT_ROOT, and you will not need to specify it explicitly. We still recommend that $SMRT_ROOT be set to /opt/smrtanalysis/, but the underlying folders will be as follows (arrows indicate softlinks):

/opt/smrtanalysis/
              admin/
                   bin/
                   log/

              current --> softlink to ../install/smrtanalysis-2.1.0

              install/
                 smrtanalysis-<other versions>/
                 smrtanalysis-2.1.0/

              userdata/  --> softlink to offline storage location
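
As a quick check of this layout, the following sketch reports which installed version the current softlink resolves to. The function name smrt_current_version is illustrative and not part of SMRT Analysis; it assumes only the directory layout shown above.

```shell
# Print the name of the version directory that ROOT/current points to.
# Usage: smrt_current_version /opt/smrtanalysis
smrt_current_version() {
    local root="$1" target
    if [ ! -L "$root/current" ]; then
        echo "error: $root/current is not a softlink" >&2
        return 1
    fi
    target=$(readlink "$root/current") || return 1
    # Keep only the last path component, e.g. smrtanalysis-2.1.0
    basename "$target"
}
```

On a v2.1 installation in the recommended location, smrt_current_version /opt/smrtanalysis would be expected to print smrtanalysis-2.1.0.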

System Requirements

Operating System

  • SMRT® Analysis is only supported on:
    • English-language Ubuntu 12.04, Ubuntu 10.04, Ubuntu 8.04
    • English-language RedHat/CentOS 6.3, RedHat/CentOS 5.6, RedHat/CentOS 5.3
  • If you are using alternate versions of Ubuntu or CentOS (not recommended), download and install the SMRT Analysis executable built for the closest supported release that is older than the OS installed on your system. (For example, if you are running CentOS 6.4, run the CentOS 6.3 executable.) The software assumes a uniform operating system across all compute nodes. If you have different OS versions on your cluster (not recommended), choose the executable that matches the oldest OS on your compute nodes.

  • Check for any library errors when running an initial RS_resequencing analysis job on lambda. Here are some common packages that need to be installed:

    • RedHat/CentOS 5.xxx: Enter sudo yum install mysql-server perl-XML-Parser openssl redhat-lsb
    • RedHat/CentOS 6.xxx: Enter sudo yum install mysql-server perl-XML-Parser openssl098e redhat-lsb
    • Ubuntu 10.xxx: Enter sudo aptitude install mysql-server libxml-parser-perl libssl0.9.8
  • SMRT Analysis cannot be installed on the Mac OS or Windows.
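
The rule for unsupported OS versions (use the executable built for the newest supported release that is not newer than your OS) can be sketched as a small helper. The function names version_le and pick_build are hypothetical, and the sketch assumes versions are written as MAJOR.MINOR with no third component.

```shell
# Return success if version $1 <= version $2 (both in MAJOR.MINOR form).
version_le() {
    local a_major="${1%%.*}" a_minor="${1#*.}"
    local b_major="${2%%.*}" b_minor="${2#*.}"
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -lt "$b_major" ]
    else
        [ "$a_minor" -le "$b_minor" ]
    fi
}

# Print the newest supported release that is not newer than INSTALLED.
# Usage: pick_build INSTALLED SUPPORTED...
pick_build() {
    local installed="$1" best="" v
    shift
    for v in "$@"; do
        if version_le "$v" "$installed"; then
            if [ -z "$best" ] || version_le "$best" "$v"; then
                best="$v"
            fi
        fi
    done
    if [ -z "$best" ]; then
        return 1    # every supported release is newer than INSTALLED
    fi
    printf '%s\n' "$best"
}
```

For example, pick_build 6.4 6.3 5.6 5.3 prints 6.3. On a cluster with mixed OS versions, pass the oldest OS version found on the compute nodes as INSTALLED.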

Running SMRT® Analysis in the Cloud

Users who do not have access to a server with the supported OS can use the public Amazon Machine Image (AMI). For details, see the document Running SMRT Analysis on Amazon.

Software Requirements

  • MySQL 5 (yum install mysql-server; apt-get install mysql-server)
  • bash
  • Perl (v5.10.1)
    • Statistics::Descriptive Perl module: sudo cpan Statistics::Descriptive
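
Before running the installer, it can help to confirm that these prerequisites are present. The sketch below is one way to do that; the function names are hypothetical, and checking for a mysql command assumes the MySQL client is installed alongside the server and is on the PATH.

```shell
# Print each named command that cannot be found on PATH.
# Usage: missing_commands mysql bash perl
missing_commands() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Return success if the given Perl module can be loaded.
# Usage: have_perl_module Statistics::Descriptive
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

An empty result from missing_commands mysql bash perl, together with a successful have_perl_module Statistics::Descriptive, suggests the listed requirements are in place. Neither check verifies the Perl or MySQL version numbers.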

Client web browser:

We recommend using Google Chrome® 21 web browsers to run SMRT Portal for consistent functionality. We also support Apple’s Safari® and Internet Explorer® web browsers; however some features may not be optimized on these browsers.

Client Java:

To run SMRT View, we recommend using Java 7 for Windows (Java 7 64 bit for users with 64 bit OS), and Java 6 for the Mac OS.

Minimum Hardware Requirements

1 head node:

  • Minimum 8 cores, with 2 GB RAM per core.
  • Minimum 250 GB of disk space.

Compute nodes:

  • Minimum 3 compute nodes. We recommend 5 nodes for high utilization focused on de novo assemblies.
  • Minimum 8 cores per node, with 2 GB RAM per core. We recommend 16 cores per node with 4 GB RAM per core.
  • Minimum 250 GB of disk space per node.
  • To perform de novo assembly of large genomes using the Celera® Assembler, one of the nodes will need to have considerably more memory. See the Celera® Assembler home page for recommendations: http://wgs-assembler.sourceforge.net/.
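
The per-node minimums above (8 cores, 2 GB RAM per core) can be checked with a short helper. This is a sketch only: meets_minimum is a hypothetical name, and the commented commands for reading core count and memory assume a Linux node with nproc and /proc/meminfo available.

```shell
# Return success if a node meets the documented minimum:
# at least 8 cores and at least 2 GB of RAM per core.
# Usage: meets_minimum CORES TOTAL_RAM_GB
meets_minimum() {
    local cores="$1" ram_gb="$2"
    [ "$cores" -ge 8 ] && [ "$ram_gb" -ge $((cores * 2)) ]
}

# One possible way to gather the inputs on a Linux node (assumption):
#   cores=$(nproc)
#   ram_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
#   meets_minimum "$cores" "$ram_gb" || echo "below minimum requirements"
```

The memory figure is rounded to the nearest GB because the kernel reports slightly less than the installed amount.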

Notes:

  • It is possible, but not advisable, to install SMRT Analysis on a single-node machine (see the distributed computing section). You will likely be able to submit jobs one SMRT Cell at a time, but the time to completion may be long as the software may not have sufficient resources to complete the job.

  • The RS_ReadsOfInsert protocol can be compute-intensive. If you plan to run it on every SMRT Cell, we recommend adding 3 additional 8-core compute nodes with at least 4 GB of RAM per core.

Data storage:

  • 10 TB (Actual storage depends on usage.)

Network File System Requirement

Please refer to the IT Site Prep guide provided with your instrument purchase for more details.

  1. The SMRT Analysis software directory (We recommend $SMRT_ROOT=/opt/smrtanalysis) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS.

  2. The SMRT Cell input directory (We recommend $SMRT_ROOT/pacbio_instrument_data/) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS. This directory contains data from the instrument and can either be a directory configured by RS Remote during instrument installation, or a directory you created when you received data from a core lab.

  3. The SMRT Analysis output directory (We recommend $SMRT_ROOT/userdata) must have the same path and be writable by the smrtanalysis user across all compute nodes via NFS. This directory is usually soft-linked to a large storage volume.

  4. The SMRT Analysis temporary directory is used for fast I/O operations during runtime. The software accesses this directory through $SMRT_ROOT/tmpdir, which you can softlink manually or with the install script. The directory must be local (not NFS-mounted), must exist independently on every compute node, and must be writable by the smrtanalysis user.
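
The read and write requirements above can be spot-checked on each node by running a small test as the smrtanalysis user. The helper below is a sketch; check_access is a hypothetical name, and INPUT_DIR in the comments stands for whichever SMRT Cell input directory you chose.

```shell
# Return success if DIR exists and the current user has the requested access.
# MODE is r (readable and searchable) or w (writable and searchable).
# Usage: check_access r|w DIR
check_access() {
    case "$1" in
        r) [ -d "$2" ] && [ -r "$2" ] && [ -x "$2" ] ;;
        w) [ -d "$2" ] && [ -w "$2" ] && [ -x "$2" ] ;;
        *) return 2 ;;   # unknown mode
    esac
}

# Example checks, run as the smrtanalysis user on every compute node:
#   check_access r "$SMRT_ROOT"          || echo "software directory not readable"
#   check_access r "$INPUT_DIR"          || echo "input directory not readable"
#   check_access w "$SMRT_ROOT/userdata" || echo "output directory not writable"
#   check_access w "$SMRT_ROOT/tmpdir"   || echo "temporary directory not writable"
```

This confirms permissions only. It does not verify that the paths are identical across nodes or that the temporary directory is on local disk.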

Installation and Upgrade Summary

Please pay close attention as the upgrade procedure has changed.

The following instructions apply to fresh v2.1 installations and v2.0.1 to v2.1 upgrades only.

  • If you are using an older version of SMRT Analysis, you can either perform a fresh installation and manually import old SMRT Cells and jobs, or download and upgrade any intermediate versions (v1.4, v2.0.0, v2.0.1).

Step 1. Decide on a user and an installation directory for the SMRT Analysis software suite.

The SMRT Analysis install directory, $SMRT_ROOT, can be any directory as long as the smrtanalysis user has read, write, and execute permissions in that directory. Historically we have referred to $SMRT_ROOT as /opt/smrtanalysis.

We recommend that a system administrator create a special user called smrtanalysis, who belongs to the smrtanalysis group. This user will own all SMRT Analysis files, daemon processes, and smrtpipe jobs.

Step 2. Download the .run executable to the same level as $SMRT_ROOT and create $SMRT_ROOT.

  • Option 1: The SMRT Analysis user has sudo privileges.

    cd /opt
    wget http://path/to/smrtanalysis-os-version.run
    
    SMRT_ROOT=/opt/smrtanalysis
    sudo mkdir $SMRT_ROOT
    sudo chown smrtanalysis:smrtanalysis $SMRT_ROOT
    
  • Option 2: The SMRT Analysis user does not have sudo privileges. If you do not have sudo privileges, you can install SMRT Analysis as yourself in your home directory or any other directory you wish to use. However, you still must have root login credentials for the mysql database.

    cd /home/<your_username>
    wget http://path/to/smrtanalysis-os-version.run
    
    SMRT_ROOT=/home/<your_username>/smrtanalysis
    mkdir $SMRT_ROOT
    

Step 3. Run the installer or upgrade script and start services.

  • Option 1: If you are performing a fresh installation, run the installation script and start tomcat and kodos. See below for more details.

    cd /opt/
    bash smrtanalysis-2.1.0.Current_Ubuntu-8.04.run --rootdir $SMRT_ROOT
    $SMRT_ROOT/admin/bin/tomcatd start
    $SMRT_ROOT/admin/bin/kodosd start

If you need to rerun the script and have already extracted the file, you can rerun using the --no-extract option:

bash smrtanalysis-2.1.0.Current_Ubuntu-8.04.run --rootdir $SMRT_ROOT --no-extract

You can see all other options by invoking the --help option:

bash smrtanalysis-2.1.0.Current_Ubuntu-8.04.run --help

  • Option 2: Please pay close attention as the upgrade procedure has changed. The new procedure requires running a script called smrtupdater from the old v2.0.1 smrtanalysis directory, which takes the path to the new v2.1 installer as an argument. See below for more details.

IMPORTANT: If $SMRT_ROOT is a pre-existing symbolic link (e.g. /opt/smrtanalysis --> /opt/smrtanalysis-2.0.1), you must manually delete the softlink and create a new directory this time only.

IMPORTANT: Make sure you type SMRT_PATH_ORIG="$PATH" exactly as shown in the command below and do not replace it with a real path. Otherwise, the script will error out because it cannot find bash.

    /opt/smrtanalysis-2.0.1/etc/scripts/kodosd stop
    /opt/smrtanalysis-2.0.1/etc/scripts/tomcatd stop
    
    rm /opt/smrtanalysis
    mkdir /opt/smrtanalysis
    SMRT_PATH_ORIG="$PATH" SMRT_ROOTDIR="/opt/smrtanalysis" bash /opt/smrtanalysis-2.0.1/admin/bin/smrtupdater /opt/smrtanalysis-2.1.0.Current_Ubuntu-8.04.run
    
    /opt/smrtanalysis/admin/bin/tomcatd start
    /opt/smrtanalysis/admin/bin/kodosd start
    

Note: For future upgrades beyond v2.1, we expect the upgrade command to be $SMRT_ROOT/admin/bin/smrtupdater /path/to/smrtanalysis-2.1.0.Current_Ubuntu-8.04.run
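
The softlink replacement required before a v2.0.1 to v2.1 upgrade can be guarded so that it never touches a real directory. The following is a minimal sketch; replace_link_with_dir is a hypothetical helper, not a script shipped with SMRT Analysis.

```shell
# If DIR is a softlink, remove the link (not its target) and create a
# real directory in its place. An existing real directory is left alone.
# Usage: replace_link_with_dir /opt/smrtanalysis
replace_link_with_dir() {
    local dir="${1%/}"   # a trailing slash would make the link look like a directory
    if [ -L "$dir" ]; then
        rm "$dir" && mkdir "$dir"
    elif [ -d "$dir" ]; then
        return 0
    else
        mkdir "$dir"
    fi
}
```

Because rm is called on the link itself, the old installation that the link pointed to (for example /opt/smrtanalysis-2.0.1) stays in place for smrtupdater to read.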

Step 4. New Installations only: Set up distributed computing

Decide on a job management system (JMS). See below for more details.

Step 5. New Installations only: Set up SMRT Portal

Register the administrative user and set up the SMRT Portal GUI. See below for more details.

Step 6. Verify the installation.

Run a sample SMRT Portal job to verify functionality. See below for more details.

Installation and Upgrade Details

Step 3, Option 1 Details: Run the Installation script and turn on services

The installation script attempts to discover inputs when possible, and performs the following:

  • Looks for valid hostnames (DNS) and IP Addresses. You must choose one from the list.
  • Assumes that the user running the script is the designated smrtanalysis user.
  • Installs the Tomcat web server. You will be prompted for:
    • The port number that the tomcat service will run under. (Default: 8080)
    • The port number that the tomcat service will use to shutdown. (Default: 8005)
  • Creates the smrtportal database in mysql. You will be prompted for:
    • The mysql administrative user name. (Default: root)
    • The mysql password. (Default: no password)
    • The mysql port number. (Default: 3306)
  • Attempts to configure the Job Management System (SGE, LSF, PBS, or NONE)
    • The $SGE_ROOT directory
    • The $SGE_CELL directory name
    • The $SGE_BINDIR directory that contains all the q-commands
    • The queue name
    • The parallel environment
  • Creates and configures special directories:
    • The $TMP directory
    • The $USERDATA directory

Step 3, Option 2 Details: Run the Upgrade Script

The upgrade script performs the following:

  • Checks that the same user is running the upgrade script
  • Checks for running services
  • Checks that the OS and hardware requirements are still met
  • Transfers computing configurations from a previous installation
  • Upgrades any references as necessary
  • Preserves SMRT Cells, jobs, and users from a previous installation by updating smrtportal database schema changes as necessary
  • Preserves special directories settings
    • Updates the $SMRT_ROOT/tmpdir softlink
    • Updates the $SMRT_ROOT/userdata softlink
  • The upgrade script does not port over protocols that were defined in previous versions of SMRT Analysis. This is because protocol files can vary a great deal between versions due to rapid code development and change. Please recreate any custom protocols you may have.

Step 4 Details: Set up Distributed Computing

Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), and provides job submission templates for LSF and PBS. You only need to configure the software once, during the initial install.

Configuring Templates

The central components for setting up distributed computing in SMRT Analysis are the Job Management Templates, which provide a flexible format for specifying how SMRT Analysis communicates with the resident Job Management System (JMS). If you are using a non-SGE job management system, you must create or edit the following files:

/opt/smrtanalysis/current/analysis/etc/cluster/<JMS>/start.tmpl
/opt/smrtanalysis/current/analysis/etc/cluster/<JMS>/interactive.tmpl
/opt/smrtanalysis/current/analysis/etc/cluster/<JMS>/kill.tmpl

Specifying the PBS Job Management System

PBS does not have a -sync option, so the interactive.tmpl file runs a script named qsw.py to simulate the functionality. You must edit both interactive.tmpl and start.tmpl.

  1. Change the queue name to one that exists on your system. (This is the -q option.)
  2. Change the parallel environment to one that exists on your system. (This is the -pe option.)
  3. Make sure that interactive.tmpl calls the -PBS option.

Specifying the LSF Job Management System

The LSF equivalent of the SGE -sync option is -K, which should be passed to the bsub command in the interactive.tmpl file.

  1. Change the queue name to one that exists on your system. (This is the -q option.)
  2. Change the parallel environment to one that exists on your system. (This is the -pe option.)
  3. Make sure that interactive.tmpl calls the -K option.

Specifying other Job Management Systems

  1. Create a new directory smrtanalysis/current/analysis/etc/cluster/NEW_JMS.
  2. Edit smrtanalysis/current/analysis/etc/smrtpipe.rc, and change the CLUSTER_MANAGER variable to NEW_JMS.
  3. Once you have a new JMS directory specified, create and edit the interactive.tmpl, start.tmpl, and kill.tmpl files for your particular setup.
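
The directory and file creation in steps 1 and 3 can be sketched as follows. The helper name init_jms_dir is hypothetical. It only creates empty placeholder files, because the template contents depend on your job management system and are not covered here; editing smrtpipe.rc (step 2) is also left as a manual step.

```shell
# Create the directory for a new JMS plus empty template files to fill in.
# Files that already exist are left untouched.
# Usage: init_jms_dir CLUSTER_DIR NEW_JMS
init_jms_dir() {
    local jms_dir="$1/$2" name
    mkdir -p "$jms_dir" || return 1
    for name in interactive.tmpl start.tmpl kill.tmpl; do
        [ -e "$jms_dir/$name" ] || : > "$jms_dir/$name" || return 1
    done
}
```

With the recommended installation path, this would be called as init_jms_dir /opt/smrtanalysis/current/analysis/etc/cluster NEW_JMS.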

Step 5 Details: (New Installations Only) Set Up SMRT® Portal

  1. Use your web browser to start SMRT Portal: http://hostname:port/smrtportal
  2. Click Register at the top right.
  3. Create a user named administrator (all lowercase). This user is special, as it is the only user that does not require activation on creation.
  4. Enter the user name administrator.
  5. Enter an email address. All administrative emails, such as new user registrations, will be sent to this address.
  6. Enter the password and confirm the password.
  7. Select Click Here to access Change Settings.
  8. To set up the mail server, enter the SMTP server information and click Apply. For email authentication, enter a user name and password. You can also enable Transport Layer Security.
  9. To enable automated submission from a PacBio® RS instrument, click Add under the Instrument Web Services URI field. Then, enter the following into the dialog box and click OK:
http://INSTRUMENT_PAP01:8081

INSTRUMENT_PAP01 is the IP address or name (pap01) of the instrument. 8081 is the port for the instrument web service.

  10. Select the new URI, then click Test to check if SMRT Portal can communicate with the instrument service.
  11. (Optional) You can delete the pre-existing instrument entry by clicking Remove.

Step 6: Verify the installation

Create a test job in SMRT Portal using the provided lambda sequence data. This is data from a single SMRT cell that has been down-sampled to reduce overall tarball size. If you are upgrading, this cell will already have been imported into your system, and you can skip to step 10 below.

Open your web browser and clear the browser cache:

  • Google Chrome: Choose Tools > Clear browsing data. Choose the beginning of time from the droplist, then check Empty the cache and click Clear browsing data.
  • Internet Explorer: Choose Tools > Internet Options > General, then under Browsing history, click Delete. Check Temporary Internet files, then click Delete.
  • Firefox: Choose Tools > Options > Advanced, then click the Network tab. In the Cached Web Content section, click Clear Now.

  1. Refresh the current page by pressing F5.
  2. Log into SMRT Portal by navigating to http://HOST:PORT/smrtportal.
  3. Click Design Job.
  4. Click Import and Manage.
  5. Click Import SMRT Cells.
  6. Click Add.
  7. Enter /opt/smrtanalysis/current/common/test/primary, then click OK.
  8. Select the new path and click Scan. You should get a dialog saying "One input was scanned."
  9. Click Design Job.
  10. Click Create New.
  11. Enter a job name and comment.
  12. Select the protocol RS_Resequencing.1.
  13. Under SMRT Cells Available, select a lambda cell and click the right-arrow button.
  14. Click Save on the bottom right, then click Start. The job should complete successfully.
  15. Click the SMRT View button. SMRT View should open with tracks displayed, and the reads displayed in the Details panel.

Optional Configurations

Set up Userdata folders

The userdata folder, $SMRT_ROOT/userdata, expands rapidly because it contains all jobs, references, and drop boxes. We recommend softlinking this folder to an external directory with more storage:

mv /opt/smrtanalysis/userdata /path/to/NFS/mounted/offline_storage
ln -s /path/to/NFS/mounted/offline_storage /opt/smrtanalysis/userdata
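
A slightly more defensive version of the same move-and-link operation is sketched below. The function name relocate_userdata is hypothetical. It refuses to run when userdata is already a softlink or when the destination already exists, since mv would otherwise nest the directory inside the existing one.

```shell
# Move ROOT/userdata to STORAGE and leave a softlink behind.
# Usage: relocate_userdata /opt/smrtanalysis /path/to/NFS/mounted/offline_storage
relocate_userdata() {
    local src="$1/userdata" dest="$2"
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "error: $src is not a regular directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "error: $dest already exists" >&2
        return 1
    fi
    # Link only if the move succeeded
    mv "$src" "$dest" && ln -s "$dest" "$src"
}
```

Running it a second time fails with an error rather than moving anything, because userdata is by then a softlink.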

Bundled with SMRT® Analysis

The following are bundled within the application and should not depend on what is already deployed on the system.

  • Java® 1.7
  • Python® 2.7
  • Tomcat™ 7.0.23

Changes from SMRT® Analysis v2.0.1

See SMRT Analysis Release Notes v2.1 for changes and known issues. The latest version of this document resides on the Pacific Biosciences DevNet site; you can also link to it from the main SMRT Analysis web page.


For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2013, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and the applicable license terms at http://www.pacificbiosciences.com/licenses.html. P/N 100-262-100