The R-kinetics Project
This project provides a vignette that describes how to perform kinetic analyses using the R suite of tools provided by Pacific Biosciences. The aim of this project is to introduce users to Pacific Biosciences' data, specifically the kinetics data collected when performing a sequencing experiment. In addition to describing the data structures, we provide instructions on using the pbh5 R API to explore the rich set of information provided.
analysis.pdf: This document provides an overview of the API functionality as well as instructions/guidelines on how to perform a kinetics analysis using PacBio data.
Src: contains example code/scripts to perform modification detection
Data: contains two example datasets which have been generated by Pacific Biosciences with known modifications to facilitate development of detection algorithms. These datasets are identical to those that one would obtain by performing a sequencing run with a PacBio instrument.
ReferenceRepository: contains the two reference sequences used during this analysis.
This setup has been tested on Ubuntu 10.04. Realistically, all this should work with any recent version of Linux or Mac OS X. All instructions here assume that you want to install things system-wide; if that is not the case, then you'll probably need to install/build some things locally.
The pbh5 package, which provides all of the important tools for accessing the data and conducting the analysis, can be run on stock R (>= 2.10) with only the h5r package as a dependency. Therefore, if one wants to get started with stock R, one can install just the h5r and pbh5 packages (steps 2 and 3 below).
1.) Obtaining a Recent Version of R, i.e., R >= 2.11.0
The default R on most package-managed Linux distributions is relatively old. The best way to get a new version is to update R using your package management system. Clear instructions for different versions of Linux can be found here:
For Ubuntu lucid (10.04), add the following line to /etc/apt/sources.list
deb http://cran.fhcrc.org/bin/linux/ubuntu lucid/
sudo apt-get update
sudo apt-get install r-base
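Before moving on, it can help to confirm that the R on your PATH meets the 2.11.0 minimum named in this step. The following is a minimal shell sketch and not part of the project: the helper names version_ge and r_version are made up for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils) and that Rscript was installed alongside R.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this project).

# version_ge A B: succeed when version A >= version B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# r_version: print the installed R version, or nothing if Rscript is not on the PATH.
r_version() {
    command -v Rscript >/dev/null 2>&1 || return 0
    Rscript -e 'cat(as.character(getRversion()))' 2>/dev/null || true
}

installed="$(r_version)"
if [ -z "$installed" ]; then
    echo "R not found on PATH"
elif version_ge "$installed" "2.11.0"; then
    echo "R $installed is recent enough"
else
    echo "R $installed is older than 2.11.0; consider upgrading"
fi
```

A version sort is used rather than a plain string comparison because, compared as strings, "2.9.2" would wrongly rank above "2.11.0".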
2.) Obtaining the HDF5 Libraries
We need to install both the headers and the shared libraries to access HDF5 files.
sudo apt-get install libhdf5-serial-1.8.4 libhdf5-serial-dev
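To check that the HDF5 headers actually landed somewhere a compiler can find them, something like the sketch below may be useful. It is an illustration only: find_header is a hypothetical helper, and the directories searched are assumptions about common layouts (some newer Debian/Ubuntu releases place the serial headers under /usr/include/hdf5/serial), so adjust them for your system.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).

# find_header NAME DIR...: print the first DIR/NAME that exists; fail if none does.
find_header() {
    name="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# The directory list below is an assumption, not an exhaustive search.
if header="$(find_header hdf5.h /usr/include /usr/include/hdf5/serial /usr/local/include)"; then
    echo "HDF5 header found at $header"
else
    echo "hdf5.h not found in the usual locations; is libhdf5-serial-dev installed?"
fi
```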
3.) Installing h5r and pbh5 Packages
The easiest way to install h5r is via the "install.packages" function from within R, i.e., install.packages("h5r").
Following successful installation of the h5r package, we install the pbh5 R package using the following:
wget https://github.com/PacificBiosciences/R-pbh5/zipball/master -O pbh5.zip && unzip pbh5.zip && sudo R CMD INSTALL PacificBiosciences-R-pbh5*
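The trailing `*` in the install command matches whatever directory the zip file unpacked to. If an older download is still lying around, the glob can match more than one directory. The sketch below shows one way to guard against that; single_dir is a hypothetical helper, and the script only prints what it would install rather than running R CMD INSTALL.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).

# single_dir PREFIX: print the one directory whose name starts with PREFIX;
# fail when there is no such directory or more than one.
single_dir() {
    set -- "$1"*/                      # expand the glob; trailing / keeps directories only
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        return 1
    fi
    printf '%s\n' "${1%/}"             # strip the trailing slash
}

# Dry run: report the directory instead of installing it.
if dir="$(single_dir PacificBiosciences-R-pbh5)"; then
    echo "Would run: sudo R CMD INSTALL $dir"
else
    echo "Expected exactly one unpacked PacificBiosciences-R-pbh5* directory here"
fi
```

Because the pattern ends in a slash, the downloaded pbh5.zip file itself is never mistaken for the unpacked directory.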
4.) Installing Additional R Packages
A couple of other packages are necessary to execute the analysis.Rnw document. These are reasonably standard R packages and should install straightforwardly.
> install.packages(c("ggplot2", "xtable"))
> source("http://bioconductor.org/biocLite.R")
> biocLite("Biostrings")
5.) Installing pbutils R Package
Finally, we provide some useful functions that aren't specific to our data structures.
wget https://github.com/PacificBiosciences/R-pbutils/zipball/master -O pbutils.zip && unzip pbutils.zip && sudo R CMD INSTALL PacificBiosciences-R-pbutils*
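With all of the installation steps done, a quick check that R can see every package may save a failed build later. The following is a rough sketch under stated assumptions: r_char_vector and missing_packages are hypothetical helpers, and the installed package names are assumed to be h5r, pbh5, ggplot2, xtable, Biostrings and pbutils (that is, matching the names used in the steps above).

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this project).

# r_char_vector NAME...: print an R character vector literal, e.g. c("a","b").
r_char_vector() {
    out=""
    for name in "$@"; do
        out="${out:+$out,}\"$name\""
    done
    printf 'c(%s)\n' "$out"
}

# missing_packages NAME...: print, one per line, the names that are not
# among the packages R reports as installed.
missing_packages() {
    Rscript -e "pkgs <- $(r_char_vector "$@"); cat(setdiff(pkgs, rownames(installed.packages())), sep='\n')"
}

if command -v Rscript >/dev/null 2>&1; then
    # Package names below are assumed from the steps above.
    missing="$(missing_packages h5r pbh5 ggplot2 xtable Biostrings pbutils || true)"
    if [ -z "$missing" ]; then
        echo "All required R packages are installed"
    else
        echo "Missing R packages:"
        echo "$missing"
    fi
else
    echo "Rscript not found on PATH; cannot check packages"
fi
```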
6.) Running the Document
To generate analysis.pdf, simply run make. This will download the data and then run the code in analysis.Rnw.
At this point, a new analysis.pdf should have been generated. The step most likely to fail is generating the PDF from the intermediate analysis.tex. This step requires pdflatex, which may pull in more packages than a sysadmin is willing to install. In that case, a user can simply step through the document and run the individual code snippets.
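Since a missing tool is the usual reason the build stops partway, a pre-flight check along the following lines may help. This is a sketch only: missing_tools is a hypothetical helper, and the list of tools (make, R, pdflatex) is an assumption about what the Makefile invokes, based on the description above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project).

# missing_tools NAME...: print each command name that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The tool list is assumed, not read from the Makefile.
missing="$(missing_tools make R pdflatex)"
if [ -z "$missing" ]; then
    echo "All tools found; run: make"
else
    echo "Missing tools:"
    echo "$missing"
fi
```

If pdflatex is the only thing missing, running `R CMD Stangle analysis.Rnw` is one way to extract the code chunks into a plain R script that can be stepped through by hand, assuming analysis.Rnw is a standard Sweave document.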