A general toolkit for analyzing sequencing-based 'toeprinting' assays
Bo Li, Akshay Tambe, Sharon Aviran and Lior Pachter.
PROBer is a software tool that quantifies chemical modification profiles for a general set of sequencing-based 'toeprinting' assays.
See INSTALL.md
To prepare the reference sequences, run 'PROBer prepare'. Run 'PROBer prepare --help' to get usage information.
To estimate toeprinting parameters, run 'PROBer estimate'. Run 'PROBer estimate --help' to get usage information.
To allocate multi-mapping reads for iCLIP data, run 'PROBer iCLIP'. Run 'PROBer iCLIP --help' to get usage information.
To simulate reads, run 'PROBer simulate'. Run 'PROBer simulate --help' to get usage information.
PROBer can produce plots assessing the variation of its beta estimates using a two-step procedure: 1) multi-mapping reads are sampled using a collapsed Gibbs sampler; 2) for each transcript, the read counts are bootstrapped and the MAP estimates are re-estimated. For computational reasons, PROBer currently generates a variation plot for only one transcript at a time.
To generate variation plots, turn on the '--run-gibbs <directory>' option when you run 'PROBer estimate'.
Then, for each transcript of interest, first run 'PROBer-bootstrap':
Usage: PROBer-bootstrap reference_name input_dir transcript_name num_trials [--primer-length primer_length(default: 6)] [--size-selection-min min_frag_len(required)] [--size-selection-max max_frag_len(required)] [--read-length read_length] [--gamma-init gamma_init(default: 0.0001)] [--beta-init beta_init(default: 0.0001)] [-p number_of_threads] [--no-control] [--seed seed] [-q]
In the above command, input_dir should be the same as the <directory> given to the '--run-gibbs' option. transcript_name is the name of the transcript you are interested in; it must match exactly the name recorded in the PROBer reference. num_trials is the number of bootstrap trials to perform (50 is recommended). -p sets the number of threads, which should be the same as the number used in 'PROBer estimate'. All other arguments and options have the same meanings as their counterparts in 'PROBer estimate'.
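As a rough illustration of the first two steps, the sketch below prints an estimate command with Gibbs sampling enabled, followed by a matching bootstrap command. It is a dry run by default and only uses options named in this README. The directory name gibbs_out, the transcript name AT1G01010.1 and the reference name are illustrative assumptions, and the small run helper is not part of PROBer.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Set DRY_RUN=0 to execute for real (requires PROBer on your PATH).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper (not part of PROBer): echo or execute a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

REF=arabidopsis/arabidopsis   # illustrative reference name
SAMPLE=test_sample            # illustrative sample name
GIBBS_DIR=gibbs_out           # hypothetical <directory> for --run-gibbs
TRANSCRIPT=AT1G01010.1        # hypothetical transcript name
THREADS=40                    # must match between the two commands
NUM_TRIALS=50                 # recommended number of bootstrap trials

# Step 1: estimate with the Gibbs sampler turned on.
run PROBer estimate -p "$THREADS" --primer-length 6 \
    --size-selection-min 21 --size-selection-max 526 --read-length 37 \
    --bowtie-path /sw/bowtie --run-gibbs "$GIBBS_DIR" \
    "$REF" "$SAMPLE" --reads plus.fq minus.fq

# Step 2: bootstrap one transcript, reusing the same directory,
# thread count and library options as in step 1.
run PROBer-bootstrap "$REF" "$GIBBS_DIR" "$TRANSCRIPT" "$NUM_TRIALS" \
    --primer-length 6 --size-selection-min 21 --size-selection-max 526 \
    --read-length 37 -p "$THREADS"
```

Keeping the shared values in variables makes it harder to pass a different directory or thread count to the second command by accident.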
Lastly, run 'PROBer-generateVariationPlot' to generate plots:
Usage: PROBer-generateVariationPlot transcript_name estimates.beta bootstrap.txt percent start_position(1-based) end_position(1-based) output.pdf
In this command, transcript_name should be identical to the one used in 'PROBer-bootstrap'. estimates.beta should be the sample_name.beta file generated by 'PROBer estimate'. bootstrap.txt should be <directory>/transcript_name.txt. percent is a percentage in [0, 100] that sets the error bars. For example, if percent = 90, the 5th and 95th percentiles of the pooled bootstrap estimates are drawn as the two ends of each error bar. start_position and end_position are two 1-based transcript coordinates; only positions within this interval are plotted. Lastly, output.pdf is the name of the output PDF file.
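The sketch below shows how percent might map to the two percentiles, then prints a plotting command without running it. It assumes the error bars span a central interval, which generalizes the percent = 90 example above and is not confirmed by the PROBer source. The helper percentile_bounds, the directory gibbs_out, the transcript name, the coordinates 1 to 200 and the output file name are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lower and upper percentiles for a given
# percent, assuming a central interval (percent = 90 gives 5 and 95).
percentile_bounds() {
    awk -v p="$1" 'BEGIN {
        lo = (100 - p) / 2
        printf "%g %g\n", lo, 100 - lo
    }'
}

percentile_bounds 90   # prints "5 95"

GIBBS_DIR=gibbs_out      # hypothetical <directory> used with --run-gibbs
TRANSCRIPT=AT1G01010.1   # hypothetical transcript name
SAMPLE=test_sample       # illustrative sample name

# Dry run: print the plotting command for positions 1 to 200.
echo PROBer-generateVariationPlot "$TRANSCRIPT" "${SAMPLE}.beta" \
    "${GIBBS_DIR}/${TRANSCRIPT}.txt" 90 1 200 "${TRANSCRIPT}_variation.pdf"
```

Remove the leading echo to run the command once the bootstrap file exists.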
Run 'PROBer version' to get version information.
Suppose we have the Arabidopsis genome and gene annotation in two files: 'TAIR10_chr_all.fa' and 'TAIR10_GFF3_genes.gff'. We choose 'arabidopsis' as the reference name and are only interested in mRNA and rRNA. The data are single-end reads of length 37bp, with minus channel reads in 'minus.fq' and plus channel reads in 'plus.fq'. The primer length is 6bp and the size selection range is from 21bp to 526bp. We use the Bowtie aligner to align reads and assume the Bowtie executables are under '/sw/bowtie'. We choose 'test_sample' as the sample name and use 40 cores. In the end, we simulate 10M single-end reads with output name 'test_sim'.
The commands are listed below:
PROBer prepare --gff3 TAIR10_GFF3_genes.gff --gff3-RNA-pattern mRNA,rRNA --bowtie --bowtie-path /sw/bowtie TAIR10_chr_all.fa arabidopsis/arabidopsis
PROBer estimate -p 40 --primer-length 6 --size-selection-min 21 --size-selection-max 526 --read-length 37 --bowtie-path /sw/bowtie arabidopsis/arabidopsis test_sample --reads plus.fq minus.fq
PROBer simulate arabidopsis/arabidopsis test_sample.temp/test_sample_minus.config test_sample minus 10000000 test_sim
PROBer simulate arabidopsis/arabidopsis test_sample.temp/test_sample_plus.config test_sample plus 10000000 test_sim
Bo Li wrote PROBer, with substantial technical input from Akshay Tambe, Sharon Aviran and Lior Pachter.
Thanks to Harold Pimentel and Páll Melsted for their help with CMake, the website and the markdown documents.
A small part of this project's code is adopted from RSEM.
This project uses the Boost C++ and samtools libraries.
PROBer is licensed under the GNU General Public License v3.