SMRT Analysis Software Installation v2.2.0

pb-jlandolin edited this page Jul 17, 2014 · 78 revisions

What's New?

Reorganized Directory Structure

Starting with SMRT Analysis v2.1.0, a new directory structure is being employed. Instead of the environment variable $SEYMOUR_HOME, $SMRT_ROOT is defined as the top-level directory of the SMRT Analysis installation. Please ensure that these variables are not defined explicitly in any setup.sh files or elsewhere, such as in user .bash* files, /etc/profile, or scripts in /etc/profile.d/. Although not a strict requirement, we recommend SMRT_ROOT=/opt/smrtanalysis/.
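One way to look for leftover definitions in the locations named above is a small grep wrapper. This is a minimal sketch; the function name find_smrt_env_defs is hypothetical, and the pattern only catches plain or exported assignments at the start of a line.

```shell
# Print every line in the given files that assigns SEYMOUR_HOME or SMRT_ROOT.
# Output format is file:line_number:line. Missing or unreadable files are skipped.
find_smrt_env_defs() (
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        # /dev/null forces grep to print the file name even for a single file.
        grep -nE '^[[:space:]]*(export[[:space:]]+)?(SEYMOUR_HOME|SMRT_ROOT)=' "$f" /dev/null || true
    done
)

# Example: scan the usual places for the current user and the system.
# find_smrt_env_defs "$HOME"/.bash* /etc/profile /etc/profile.d/*
```

Any line the scan prints is a candidate for removal before installing or upgrading.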

Below is a typical directory hierarchy of $SMRT_ROOT ("->" = symbolic link):

/opt/smrtanalysis
├── admin -> current/admin
│   ├── bin
│   └── log
├── current -> install/smrtanalysis-2.2.0.133377
├── install
│   ├── smrtanalysis-2.1.0.128013
│   ├── smrtanalysis-2.1.1.128514
│   ├── smrtanalysis-2.1.1.128514-patch-0.1
│   ├── smrtanalysis-2.2.0.133377
│   ├── smrtanalysis-2.2.0.133377-patch-1.134216
│   └── smrtanalysis-2.2.0.133377-patch-2.134913
├── README
├── scripts
├── tmpdir -> /tmp
└── userdata -> /path/to/NFS/mounted/offline_storage
    ├── database
    ├── inputs_dropbox
    ├── jobs
    ├── jobs_archive
    ├── jobs_dropbox
    ├── log
    ├── references
    ├── references_dropbox
    ├── runtime
    └── shared_dir
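Because current is a symbolic link into install, the active build can be read straight from the link. The sketch below assumes the layout shown above; current_build is a hypothetical helper name, and readlink is assumed to be available (it is part of GNU coreutils).

```shell
# Print the build directory name that the "current" symlink points to,
# for example smrtanalysis-2.2.0.133377. Fails if "current" is not a symlink.
current_build() (
    root=$1
    [ -L "$root/current" ] || { echo "no 'current' symlink under $root" >&2; exit 1; }
    basename "$(readlink "$root/current")"
)

# Example:
# current_build /opt/smrtanalysis
```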

Embedded SMRT Portal Application Database

Pre-built binaries for MySQL Server are now bundled with the SMRT Analysis suite, providing a standalone, isolated environment, free of external system dependencies for the SMRT Portal application database. This new architecture allows a more seamless installation and upgrade process and provides an additional measure of data security with automated schema backups.

Upon install/upgrade, the default behavior is to embed the database. All data from the remote database will be migrated to the embedded server.

The default behavior can be overridden by using the --no-bundled-db option during install/upgrade. However, users opting to run an external MySQL instance may have limited support options, and are discouraged from doing so.

Bundled Software

SMRT Analysis includes the following third-party software packages bundled within the application; the application should not depend on what is already deployed on the system.

  • Apache Tomcat™ 7.0.23
  • Celera® Assembler 8.1
  • Docutils 0.8.1
  • GMAP (2014-01-21)
  • HMMER 3.1b1 (May 2013)
  • Java™ SE Runtime Environment (build 1.7.0_02-b13)
  • Mono 3.0.7
  • MySQL® 5.1.73
  • Perl v5.8.8
  • Python® 2.7.3
  • SAMtools 0.1.17
  • Scala 2.9.0 RC3

Note: GATK and associated executables are no longer included.

Release Notes

See:

You can find the latest version of this document on the Pacific Biosciences DevNet site; you can also link to it from the main SMRT Analysis web page.

Getting Started

Note: This section contains a summary of the commands used for a quick installation.

  • Use these commands only if you are familiar with the installation/upgrade process.
  • Proceed to the Installation Guide for the more detailed procedure.

Download SMRT Analysis

Download SMRT Analysis from PacBio DevNet (http://www.pacbiodevnet.com):

wget https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.2.0/smrtanalysis-2.2.0.133377.run
wget https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.2.0/smrtanalysis-2.2.0.133377-patch-3.run

Installation Summary

  SMRT_ROOT=/opt/smrtanalysis
  sudo mkdir $SMRT_ROOT
  sudo chown smrtanalysis:smrtanalysis $SMRT_ROOT

  su -l smrtanalysis
  smrtanalysis-2.2.0.133377.run -p smrtanalysis-2.2.0.133377-patch-3.run --rootdir $SMRT_ROOT

  $SMRT_ROOT/admin/bin/smrtportald-initd start
  $SMRT_ROOT/admin/bin/kodosd start

Upgrade Summary

 su -l smrtanalysis
 SMRT_ROOT=/opt/smrtanalysis
 $SMRT_ROOT/admin/bin/smrtportald-initd stop 
 $SMRT_ROOT/admin/bin/smrtupdater -- -p smrtanalysis-2.2.0.133377-patch-3.run smrtanalysis-2.2.0.133377.run
 $SMRT_ROOT/admin/bin/smrtportald-initd start

Once SMRT Portal is installed, proceed to the following sections to complete setup:

  1. Set up SMRT Portal (for new installations only)
  2. Verify the Installation (for new installations and upgrades)

Patch Summary

These two commands must be run as the smrtanalysis user.

  SMRT_ROOT=/opt/smrtanalysis
  $SMRT_ROOT/admin/bin/smrtportald-initd stop

These two commands must be run as root (e.g., using sudo). Skip these commands if the files do not exist.

  sudo rm /tmp/mysql_XXXXX.sock 
  sudo rm $SMRT_ROOT/userdata/database/../../error.log 

These two commands must be run as the smrtanalysis user.

  $SMRT_ROOT/admin/bin/smrtupdater smrtanalysis-2.2.0.133377-patch-3.run
  $SMRT_ROOT/admin/bin/smrtportald-initd start

Installation Guide

System Requirements

Hardware Guidelines

Submit Host

  • Minimum 8 cores, with 2 GB RAM per core.
  • Minimum 250 GB of disk space.

Execution Hosts

  • Minimum of 3 nodes. We recommend 5 nodes for high utilization focused on de novo assemblies.
  • Minimum of 8 cores per node, with 2 GB RAM per core. We recommend 16 cores per node with 4 GB RAM per core.
  • Minimum of 250 GB of disk space per node.
  • To perform de novo assembly of large genomes using Celera® Assembler, one of the nodes will need to have considerably more memory. See the Celera® Assembler home page for recommendations: http://wgs-assembler.sourceforge.net/.

For more information, see What computing infrastructure is compatible with SMRT Analysis?

Notes:

  • It is possible, but not advisable, to install SMRT Analysis on a single-node machine (see the distributed computing section). You will likely be able to submit jobs one SMRT Cell at a time, but the time to completion may be long as the software may not have sufficient resources to complete the job.

  • The RS_ReadsOfInsert protocol can be compute-intensive. If you plan to run it on every SMRT Cell, we recommend adding 3 additional 8-core compute nodes with at least 4 GB of RAM per core.
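To compare a host against the core and RAM-per-core figures above, the numbers can be read from the system and divided. This is a rough sketch with hypothetical helper names; note that Linux reports slightly less memory than is physically installed, so a host with exactly 2 GB per core will read a little under 2048 MB.

```shell
# Print MB of RAM per core, given a core count and total memory in kB.
ram_per_core_mb() {
    echo $(( $2 / 1024 / $1 ))
}

# Print a one-line summary for a host.
describe_host() {
    echo "cores=$1 ram_per_core_mb=$(ram_per_core_mb "$1" "$2")"
}

# Example on a Linux host (nproc is from GNU coreutils; MemTotal is in kB):
# cores=$(nproc)
# mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# describe_host "$cores" "$mem_kb"
```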

Software Prerequisites

Operating Systems

  • SMRT Analysis is supported on:

    • English-language Ubuntu: versions 12.04, 10.04, 8.04
    • English-language RedHat/CentOS: versions 6.3, 5.6, 5.3
  • SMRT Analysis cannot be installed on Mac OS® or Windows® systems.

Software Dependencies

  • Bash
  • Linux Standard Base (LSB)

These are usually installed by default on most systems. If necessary, use the following commands to ensure that these packages are installed.

CentOS:

sudo yum groupinstall "Development Tools"
sudo yum install redhat-lsb

Ubuntu:

sudo apt-get install build-essential lsb-release
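Before installing packages, you may want to see which commands are already present. The sketch below uses a hypothetical helper, missing_commands; it assumes that lsb_release is the command the LSB packages above normally provide.

```shell
# Print the name of each listed command that is not found on PATH.
# The exit status is 0 only if every command was found.
missing_commands() (
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    exit "$status"
)

# Example:
# missing_commands bash lsb_release
```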

Client Web Browser

We recommend using the Google Chrome® 21 web browser to run SMRT Portal for consistent functionality. We also support Apple’s Safari® and Internet Explorer® web browsers; however, some features may not be optimized on these browsers.

Client Java

To run SMRT View, we recommend:

  • Oracle Java: Java Version 7 Update 45 or later for Linux, Windows, and Mac OS X.
  • Apple Java: Java for OS X 2013-004 (1.6.0_51-b11-457-10M4509) or later.

Network Configuration

Please refer to the IT Site Prep guide provided with your instrument purchase for more details.

See also What data storage is compatible with SMRT Analysis?

Data Storage

  • 10 TB (Actual storage depends on usage.)

  • The SMRT Analysis software directory (we recommend $SMRT_ROOT=/opt/smrtanalysis) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS.

  • The SMRT Cell input directory (we recommend $SMRT_ROOT/pacbio_instrument_data/) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS. This directory contains data from the instrument and can either be a directory configured by RS Remote during instrument installation, or a directory you created when you received data from a core lab.

  • The SMRT Analysis output directory (we recommend $SMRT_ROOT/userdata) must have the same path and be writable by the smrtanalysis user across all compute nodes via NFS. This directory is usually softlinked to a large storage volume.

  • The SMRT Analysis temporary directory is used for fast I/O operations during runtime. The software accesses this directory from $SMRT_ROOT/tmpdir and you can softlink this directory manually or using the install script. This directory should be a local directory (not NFS-mounted) and be writable by the smrtanalysis user and exist as independent directories on all compute nodes.
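The storage requirements above can be spot-checked on each compute node while logged in as the smrtanalysis user. This is a minimal sketch with hypothetical helper names; fs_type relies on GNU coreutils stat, and the assumption that NFS mounts report a type beginning with "nfs".

```shell
# Succeed if a filesystem type string begins with "nfs".
is_nfs_type() {
    case $1 in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print the filesystem type that holds a directory (GNU coreutils stat).
fs_type() {
    stat -f -c %T "$1"
}

# Succeed if the directory exists and the current user can write to it.
dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Example, run on every compute node:
# dir_writable "$SMRT_ROOT/userdata" || echo "userdata is not writable"
# is_nfs_type "$(fs_type "$SMRT_ROOT/tmpdir/")" && echo "tmpdir is on NFS; a local disk is expected"
```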

Cluster Configuration

Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), and provides job submission templates for LSF and PBS. You only need to configure the software once during initial install.

Installation Details

Downloading SMRT Analysis

Download SMRT Analysis from PacBio DevNet (http://www.pacbiodevnet.com):

wget https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.2.0/smrtanalysis-2.2.0.133377.run

Download the latest patch available for your version:

wget https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.2.0/smrtanalysis-2.2.0.133377-patch-3.run

Create the SMRT Analysis User

We recommend that a system administrator create a special user called smrtanalysis, who belongs to the smrtanalysis group. This user will own all SMRT Analysis files, daemon processes, and smrtpipe jobs.
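A system administrator might create the account along the following lines. This is only a sketch: the helper names are hypothetical, the groupadd and useradd options shown are one common choice rather than a requirement, and the function prints the commands instead of running them so they can be reviewed first.

```shell
# Succeed if a login account with the given name exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print (without running) commands that would create a user and a
# same-named primary group, unless the user already exists.
print_user_setup() {
    if user_exists "$1"; then
        echo "# user $1 already exists"
    else
        echo "sudo groupadd $1"
        echo "sudo useradd -m -g $1 $1"
    fi
}

# Example:
# print_user_setup smrtanalysis
```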

Create the Installation Path

The SMRT Analysis top-level directory, $SMRT_ROOT, can be any directory as long as the smrtanalysis user has read, write, and execute permissions in that directory. Historically, we referred to $SMRT_ROOT as /opt/smrtanalysis.

If the parent directory of $SMRT_ROOT is not writable by the SMRT Analysis user, the $SMRT_ROOT directory must be pre-created with read/write/execute permissions for the SMRT Analysis user.

Run the Installer

The installation script attempts to discover inputs when possible, and performs the following configurations:

  1. Confirms valid non-root user that will own SMRT Pipe jobs and daemon processes.
  2. Performs system hardware, OS, and software prerequisite check.
  3. Identifies valid host names and IP addresses recognized by DNS.
  4. Configures the Tomcat web server main port and shutdown port numbers.
  5. Creates and verifies symbolic links to TMP and USERDATA directories.
  6. Configures MySQL server settings and initializes the SMRT Portal database.
  7. Configures distributed or non-distributed SMRT Pipe jobs.
  8. Configures the Job Management System and related parameters for queues and parallel environments.

Option 1: The SMRT Analysis user has sudo privileges.

For example, suppose $SMRT_ROOT is /opt/smrtanalysis, /opt is writable only by root, and the SMRT Analysis user is smrtanalysis, belonging to the group smrtanalysis:

  SMRT_ROOT=/opt/smrtanalysis
  sudo mkdir $SMRT_ROOT
  sudo chown smrtanalysis:smrtanalysis $SMRT_ROOT

Option 2: The SMRT Analysis user does not have sudo privileges.

For example, if you do not have sudo privileges, you can install SMRT Analysis as yourself in your home directory.

  SMRT_ROOT=/home/<your_username>/smrtanalysis
  mkdir $SMRT_ROOT
  chown smrtanalysis:smrtanalysis $SMRT_ROOT
  smrtanalysis-2.2.0.133377.run -p smrtanalysis-2.2.0.133377-patch-3.run --rootdir $SMRT_ROOT

If you cancelled out of the install prompt and want to rerun the script without extracting again, you can rerun using the --no-extract option:

  smrtanalysis-2.2.0.133377.run -p smrtanalysis-2.2.0.133377-patch-3.run --rootdir $SMRT_ROOT --no-extract

Apply Patches During Installation

If installing after a patch has been released for the software, you can install both the software and the patch in one command using the -p option:

  smrtanalysis-2.2.0.133377.run -p smrtanalysis-2.2.0.133377-patch-3.run --rootdir $SMRT_ROOT

Set up Distributed Computing

Configuring Job Submission Templates

Distributed computing is configured by editing three template files:

$SMRT_ROOT/current/analysis/etc/cluster/<JMS>/start.tmpl
$SMRT_ROOT/current/analysis/etc/cluster/<JMS>/interactive.tmpl
$SMRT_ROOT/current/analysis/etc/cluster/<JMS>/kill.tmpl

Specifying the SGE Job Management System

The install script will automatically discover the queue name and parallel environment name based on the SGE installed on your system. If you want to configure or add options to the qsub command, you must edit the .tmpl files manually. For example, the default interactive.tmpl looks like the following:

qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
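When editing a template by hand, it can help to confirm which queue and parallel environment it currently names. The sketch below uses a hypothetical helper, template_option, that reads the value following an option such as -q or -pe from a template file.

```shell
# Print the word that follows a given option (for example -q or -pe)
# on the first line of a template file where that option appears.
template_option() (
    opt=$1
    file=$2
    awk -v opt="$opt" '
        { for (i = 1; i < NF; i++) if ($i == opt) { print $(i + 1); exit } }
    ' "$file"
)

# Example:
# tmpl=$SMRT_ROOT/current/analysis/etc/cluster/<JMS>/interactive.tmpl
# template_option -q "$tmpl"
# template_option -pe "$tmpl"
```

On an SGE host, the printed names can be compared with the output of qconf -sql (queues) and qconf -spl (parallel environments).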

If you are assembling large genomes (>100 Mb) and wish to use the job distribution functionality within Celera Assembler, you must make sure the parallel environment is configured to use the $pe_slots allocation rule. For example, the smp parallel environment is configured as follows:

$ qconf -sp smp
pe_name            smp
slots              99999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
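The allocation rule can also be checked from a script by parsing the qconf output shown above. This is a minimal sketch; the function names are hypothetical, and both functions read the qconf -sp output on standard input.

```shell
# Read "qconf -sp <pe>" output on stdin and print the allocation_rule value.
allocation_rule() {
    awk '$1 == "allocation_rule" { print $2; exit }'
}

# Succeed if the parallel environment described on stdin uses $pe_slots.
uses_pe_slots() {
    [ "$(allocation_rule)" = '$pe_slots' ]
}

# Example on a host with SGE installed:
# qconf -sp smp | uses_pe_slots || echo "smp does not use \$pe_slots"
```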

Specifying the PBS Job Management System

PBS does not have a -sync option, so the interactive.tmpl file runs a script named qsw.py to simulate the functionality. You must edit both interactive.tmpl and start.tmpl.

  1. Change the queue name to one that exists on your system. (This is the -q option.)
  2. Change the parallel environment to one that exists on your system. (This is the -pe option.)
  3. Make sure that interactive.tmpl calls the -PBS option.

Specifying the LSF Job Management System

The LSF equivalent of the SGE -sync option is -K, which should be passed to the bsub command in the interactive.tmpl file.

  1. Change the queue name to one that exists on your system. (This is the -q option.)
  2. Change the parallel environment to one that exists on your system. (This is the -pe option.)
  3. Make sure that interactive.tmpl calls the -K option.

Specifying other Job Management Systems

  1. Create a new directory $SMRT_ROOT/current/analysis/etc/cluster/NEW_JMS.
  2. Edit $SMRT_ROOT/current/analysis/etc/smrtpipe.rc, and change the CLUSTER_MANAGER variable to NEW_JMS.
  3. Once you have a new JMS directory specified, create and edit the interactive.tmpl, start.tmpl, and kill.tmpl files for your particular setup.
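One possible way to carry out the directory and template steps above is to start from copies of an existing set of templates and then edit them. This is a sketch under that assumption; scaffold_jms is a hypothetical helper, and the name of the existing JMS directory on your system must be supplied by you. Editing smrtpipe.rc is still a manual step.

```shell
# Create a template directory for a new job management system by copying
# the three templates from an existing one. Refuses to overwrite anything.
scaffold_jms() (
    cluster_dir=$1   # for example $SMRT_ROOT/current/analysis/etc/cluster
    from=$2          # name of an existing JMS directory to copy from
    to=$3            # name of the new JMS directory, for example NEW_JMS
    [ -d "$cluster_dir/$from" ] || { echo "missing $cluster_dir/$from" >&2; exit 1; }
    [ ! -e "$cluster_dir/$to" ] || { echo "$cluster_dir/$to already exists" >&2; exit 1; }
    mkdir "$cluster_dir/$to" || exit 1
    for t in start.tmpl interactive.tmpl kill.tmpl; do
        cp "$cluster_dir/$from/$t" "$cluster_dir/$to/$t" || exit 1
    done
)

# Example:
# scaffold_jms "$SMRT_ROOT/current/analysis/etc/cluster" <JMS> NEW_JMS
```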

Start the SMRT Analysis Services

Start the MySQL and Tomcat Daemons

The following command will start both Tomcat and MySQL. You should use this command to restart services.

$SMRT_ROOT/admin/bin/smrtportald-initd start

MySQL and Tomcat can also be controlled individually for troubleshooting purposes:

$SMRT_ROOT/admin/bin/mysqld start
$SMRT_ROOT/admin/bin/tomcatd start

You can check whether the services are running using the ps command:

ps -ef | grep tomcat
ps -ef | grep mysql

Start the Kodos Daemon

$SMRT_ROOT/admin/bin/kodosd start

You can check whether the service is running using the ps command:

ps -ef | grep kodos

Set Up SMRT Portal

Register the administrative user and set up the SMRT Portal GUI:

  1. Use a web browser to launch SMRT Portal: http://hostname:port/smrtportal
  2. Click Register at the top right.
  3. Create a user named administrator (all lowercase). This user is special, as it is the only user that does not require activation on creation.
  4. Enter the user name administrator.
  5. Enter an email address. All administrative emails, such as new user registrations, are sent to this address.
  6. Enter, then confirm the password.
  7. Select Click Here to access Change Settings.
  8. To set up the mail server, enter the SMTP server information and click Apply. For email authentication, enter a user name and password. You can also enable Transport Layer Security.
  9. To enable automated submission from a PacBio instrument, click Add under the Instrument Web Services URI field. Then, enter the following into the dialog box and click OK: http://INSTRUMENT_PAP01:8081
    • INSTRUMENT_PAP01 is the IP address or name (pap01) of the instrument.
    • 8081 is the port for the instrument web service.
  10. Select the new URI, then click Test to check if SMRT Portal can communicate with the instrument service.
  11. (Optional) You can delete the pre-existing instrument entry by clicking Remove.

Verify the Installation

Create a test job in SMRT Portal using the provided lambda sequence data. This is data from a single SMRT Cell that has been down-sampled to reduce overall tarball size. If you are upgrading, this cell will already have been imported into your system, and you can skip to step 10 below.

Open your web browser and clear the browser cache:

  • Google Chrome: Choose Tools > Clear browsing data. Choose the beginning of time from the droplist, then check Empty the cache and click Clear browsing data.
  • Internet Explorer: Choose Tools > Internet Options > General, then under Browsing history, click Delete. Check Temporary Internet files, then click Delete.
  • Firefox®: Choose Tools > Options > Advanced, then click the Network tab. In the Cached Web Content section, click Clear Now.

  1. Refresh the current page by pressing F5.
  2. Navigate to SMRT Portal at http://HOST:PORT/smrtportal, then log in.
  3. Click Design Job.
  4. Click Import and Manage.
  5. Click Import SMRT Cells.
  6. Click Add.
  7. Enter common/test/primary, then click OK.
  8. Select the new path and click Scan. You should get a dialog saying “One input was scanned.”
  9. Click Design Job.
  10. Click Create New.
  11. Enter a job name and comment.
  12. Select the protocol RS_Resequencing.1.
  13. Under SMRT Cells Available, select a lambda cell and click the right-arrow button.
  14. Click Save on the bottom right, then click Start. The job should complete successfully.
  15. Click the SMRT View button. SMRT View should open with tracks displayed, and the reads displayed in the Details panel.

Optional Configurations

Set Up User Data Directory

The user data folder, $SMRT_ROOT/userdata, expands rapidly because it contains all jobs, references, and drop boxes. We recommend softlinking this folder to an external directory with more storage:

mv $SMRT_ROOT/userdata /path/to/NFS/mounted/offline_storage
ln -s /path/to/NFS/mounted/offline_storage $SMRT_ROOT/userdata
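If the destination path already exists, mv places userdata inside it and the link then points at the wrong directory. A slightly more defensive version of the two commands might look like the sketch below; relocate_userdata is a hypothetical helper, and stopping the SMRT Analysis services before moving the data is assumed to be a sensible precaution.

```shell
# Move a userdata directory to new storage and leave a symlink behind.
# Refuses to act if the source is already a symlink or the destination exists.
relocate_userdata() (
    src=$1    # for example $SMRT_ROOT/userdata
    dest=$2   # for example /path/to/NFS/mounted/offline_storage
    [ -d "$src" ] && [ ! -L "$src" ] || { echo "$src is not a plain directory" >&2; exit 1; }
    [ ! -e "$dest" ] || { echo "$dest already exists" >&2; exit 1; }
    mv "$src" "$dest" && ln -s "$dest" "$src"
)

# Example:
# relocate_userdata "$SMRT_ROOT/userdata" /path/to/NFS/mounted/offline_storage
```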

Upgrade Details

Supported Upgrade Path

  • For SMRT Analysis v2.2.0, only upgrades directly from v2.1.1 or v2.1.0 are supported.

  • SMRT Analysis does not support upgrades from SMRT Analysis v2.0.1 or earlier. The recommended upgrade path is to incrementally upgrade to each version, that is:

    1.4 -> 2.0.0 -> 2.0.1 -> 2.1.0 -> 2.2.0

Alternately, you may opt for a fresh installation of SMRT Analysis v2.2.0 and then manually import old SMRT Cells and jobs to preserve analysis history.
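The supported-path rule above can be expressed as a small check against the installed build name. This is a sketch with hypothetical helper names; build_version assumes directory names of the form smrtanalysis-X.Y.Z.BUILD, as used under install since v2.1.0.

```shell
# Extract X.Y.Z from a build directory name such as smrtanalysis-2.1.1.128514.
build_version() {
    printf '%s\n' "$1" | sed 's/^smrtanalysis-\([0-9]*\.[0-9]*\.[0-9]*\)\..*$/\1/'
}

# Succeed if the given version can be upgraded directly to v2.2.0.
direct_upgrade_supported() {
    case $1 in
        2.1.0|2.1.1) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example:
# v=$(build_version "$(basename "$(readlink "$SMRT_ROOT/current")")")
# direct_upgrade_supported "$v" || echo "v$v cannot be upgraded directly to v2.2.0"
```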

See Official Documentation for upgrading from earlier versions of SMRT Analysis:

Run the Upgrader

Upgrades are handled by the script smrtupdater located in $SMRT_ROOT/admin/bin/smrtupdater. The script performs the following:

  1. Confirms valid non-root user that will own SMRT Pipe jobs and daemon processes.
  2. Checks for running services and stops them if needed.
  3. Performs system hardware, OS, and software prerequisite check.
  4. Transfers computing configurations from the previous installation.
  5. Reference Repository Upgrade Check.
  6. Confirms and validates symbolic links to TMP and USERDATA directories.
  7. MySQL Database Upgrade.

  • The upgrade script does not port over protocols that were defined in previous versions of SMRT Analysis. This is because protocol files can vary a great deal between versions due to rapid code development and change. Please recreate any custom protocols you may have.

    $SMRT_ROOT/admin/bin/smrtupdater smrtanalysis-2.2.0.133377.run
    

Applying Patches During Upgrade

If you are upgrading after a patch has been released for the software, you can upgrade both the software and the patch in one command using the -- -p option. This uses the "-p" option in smrtanalysis-2.2.0.133377.run by passing it via the "--" option in smrtupdater.

 $SMRT_ROOT/admin/bin/smrtupdater -- -p smrtanalysis-2.2.0.133377-patch-3.run smrtanalysis-2.2.0.133377.run

Start the SMRT Analysis Services

Start the MySQL and Tomcat Daemons

$SMRT_ROOT/admin/bin/smrtportald-initd start

Start the Kodos Daemon

$SMRT_ROOT/admin/bin/kodosd start

Known Install Problems and Workarounds

Remote Storage Issues

In several installations, problems have been encountered with the MySQL portion of the install because the MySQL scripts could not change ownership (and possibly permissions) of files in $SMRT_ROOT/userdata/runtime/tmp. In each case, userdata was linked to remote NFS storage where the problem could be demonstrated with simple tests like creating a temporary file and running chown on it. The best method to resolve this problem is to fix the storage issue, but the following workaround can be used instead:


SMRT_ROOT=<customer_specific>
# you can actually put these new directories anywhere on the head node
# local filesystem but these are shown as an example
SMRT_DB=$SMRT_ROOT/../smrtanalysis_db
SMRT_RTTMP=$SMRT_ROOT/../smrtanalysis_runtime_tmp
SAUSER=<smrtanalysis_user>
SAGRP=<smrtanalysis_group>
sudo mkdir $SMRT_DB; sudo chown $SAUSER:$SAGRP  $SMRT_DB
sudo mkdir $SMRT_RTTMP; sudo chown $SAUSER:$SAGRP  $SMRT_RTTMP

# replace old directories with links to these new ones
# note that this is safe to do only because this database
# directory is new with the 2.2 install and you have not
# yet finished a 2.2 install or used 2.2 yet
sudo rm -rf $SMRT_ROOT/userdata/database
sudo rm -rf $SMRT_ROOT/userdata/runtime/tmp

# then as the  <smrtanalysis_user>:
ln -s $SMRT_DB $SMRT_ROOT/userdata/database
ln -s $SMRT_RTTMP $SMRT_ROOT/userdata/runtime/tmp

#From there, you should be able to execute the install or upgrade as shown above.

# the following is probably not necessary due to the way that we resolve paths in our scripts,
# but this will cleanup broken links created during the install
rm $SMRT_ROOT/userdata/database/mysql/log
rm $SMRT_ROOT/userdata/database/mysql/runtime
ln -s $SMRT_ROOT/userdata/log $SMRT_ROOT/userdata/database/mysql/log
ln -s $SMRT_ROOT/userdata/runtime $SMRT_ROOT/userdata/database/mysql/runtime
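The simple storage test mentioned at the start of this section (create a temporary file, then run chown on it) can be sketched as follows. The helper name chown_works is hypothetical; run it as the smrtanalysis user against the directory in question.

```shell
# Succeed if a file created in the given directory can be chown'ed to the
# current user and primary group. The temporary file is removed afterwards.
chown_works() (
    dir=$1
    f=$(mktemp "$dir/chown_test.XXXXXX") || exit 1
    status=0
    chown "$(id -u):$(id -g)" "$f" || status=$?
    rm -f "$f"
    exit "$status"
)

# Example:
# chown_works "$SMRT_ROOT/userdata/runtime/tmp" || echo "chown fails on this storage"
```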

ACL Problems

If you use ACLs in the SMRT_ROOT or any of the linked storage, you may have obscure install or execution problems if the "smrtanalysis" user does not have full permissions. For example, we have seen cases that failed in the middle of an install due to the inability to copy a file with "cp -a" in some of the install scripts. If you suspect ACL-related problems, try disabling them and retrying.
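A quick way to spot paths that carry ACLs is to look at the mode string printed by ls. The sketch below assumes the GNU ls convention of appending "+" to the mode when additional access rules are present; the helper names are hypothetical. Where the getfacl tool is installed, it shows the actual entries.

```shell
# Succeed if an ls-style mode string ends in "+", which GNU ls uses to
# mark a file that has additional access rules such as an ACL.
mode_has_acl() {
    case $1 in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# Succeed if ls reports additional access rules on the given path.
path_has_acl() {
    mode_has_acl "$(ls -ld "$1" | awk '{print $1}')"
}

# Example:
# path_has_acl "$SMRT_ROOT" && echo "$SMRT_ROOT has an ACL"
```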

Advanced Deployment

Using Amazon Web Services

Users wishing to run SMRT Analysis in the cloud can use an Amazon Machine Image (AMI) with SMRT Analysis pre-installed. For details, see:

"Installing" SMRT Portal the easy way - Launching a SMRT Portal AMI.


For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2014, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and the applicable license terms at http://www.pacificbiosciences.com/licenses.html. P/N 100-321-100-03