To quantify aligned reads, they must be counted against a reference transcriptome. This tells you, in relative terms, how much of each transcript is expressed in the system under study. The following sub-module performs this quantification and compiles all sample quantifications into a single data matrix for downstream use.
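The compilation step described above can be pictured with a minimal sketch. It is an illustration only and not XPRESSpipe's implementation: the function name `compile_count_matrix`, the sample names, and the gene names are all hypothetical, and it assumes each sample's counts are already available as a gene-to-count mapping.

```python
def compile_count_matrix(sample_counts):
    """Merge per-sample {gene: count} mappings into one gene-by-sample table.

    Hypothetical helper for illustration; genes missing from a sample get 0.
    """
    samples = list(sample_counts)  # column order follows insertion order
    genes = sorted({gene for counts in sample_counts.values() for gene in counts})
    rows = [
        [gene] + [sample_counts[sample].get(gene, 0) for sample in samples]
        for gene in genes
    ]
    header = ["gene"] + samples
    return header, rows


# Toy input (made-up values): one mapping per quantified sample
sample_counts = {
    "sample_a": {"GENE1": 10, "GENE2": 0},
    "sample_b": {"GENE1": 7, "GENE3": 4},
}

header, rows = compile_count_matrix(sample_counts)
for line in [header] + rows:
    print("\t".join(str(value) for value in line))
```

Each row is a gene and each column a sample, which is the general shape downstream normalization and analysis steps usually expect.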
The help menu can be accessed by calling the following from the command line:
$ xpresspipe count --help
Required Arguments | Description |
---|---|
:data:`-i \<path\>, --input \<path\>` | Path to input directory of SAM files |
:data:`-o \<path\>, --output \<path\>` | Path to output directory |
:data:`-g \</path/transcripts.gtf\>`, :data:`--gtf \</path/transcripts.gtf\>` | Path to the GTF file used for quantification (if a modified GTF was created, provide it here; if you are using Cufflinks and want isoform abundance estimates, do not provide a longest-transcript-only GTF) |
Optional Arguments | Description |
---|---|
:data:`--suppress_version_check` | Suppress version checks and other features that require internet access during processing |
:data:`-e <experiment_name>`, :data:`--experiment <experiment_name>` | Experiment name |
:data:`-c <method>`, :data:`--quantification_method <method>` | Specify quantification method (default: htseq; other option: cufflinks. If using Cufflinks, no downstream sample normalization is required) |
:data:`--feature_type \<feature\>` | Specify feature type (3rd column in GTF file) to be used if quantifying with htseq (default: CDS) |
:data:`--stranded \<fr-unstranded/fr-firststrand` :data:`/fr-secondstrand||no/yes\>` | Specify whether library preparation was stranded (Options before || correspond with Cufflinks inputs, options after correspond with htseq inputs) |
:data:`--deduplicate` | Include flag to quantify reads with de-duplication (will search for files with suffix :data:`_dedupRemoved.bam`) |
:data:`--bam_suffix <suffix>` | Change from default suffix of _Aligned.sort.bam |
:data:`-m <processors>, --max_processors <processors>` | Maximum number of processors to use for tasks (default: no limit) |
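When the command is launched from a script, it can help to assemble the argument list programmatically and check the required arguments first. The following is a sketch under stated assumptions: `build_count_command` is a hypothetical helper, not part of XPRESSpipe, and it uses only flags listed in the tables above (`-i`, `-o`, `-g`, `-e`, `-c`, `-m`). It builds the command and prints it rather than running it.

```python
import shlex

# The two quantification methods named in the options table
VALID_METHODS = {"htseq", "cufflinks"}


def build_count_command(input_dir, output_dir, gtf, experiment=None,
                        method=None, max_processors=None):
    """Return an argument list for `xpresspipe count` (hypothetical helper)."""
    required = (("input_dir", input_dir), ("output_dir", output_dir), ("gtf", gtf))
    for name, value in required:
        if not value:
            raise ValueError(f"{name} is required")

    cmd = ["xpresspipe", "count", "-i", input_dir, "-o", output_dir, "-g", gtf]
    if experiment is not None:
        cmd += ["-e", experiment]
    if method is not None:
        if method not in VALID_METHODS:
            raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")
        cmd += ["-c", method]
    if max_processors is not None:
        cmd += ["-m", str(max_processors)]
    return cmd


cmd = build_count_command(
    "riboseq_out/alignments/",
    "riboseq_out/",
    "se_reference/transcripts_codingOnly_truncated.gtf",
    experiment="se_test",
    method="htseq",
    max_processors=4,
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where XPRESSpipe is installed; keeping the arguments as a list avoids shell-quoting problems with unusual paths.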
- Input points to directory with SAM alignment files that are sorted by name
- An experiment name is provided to name the final data matrix
- Reads are quantified only to coding genes and are not counted if they map to the first x nucleotides of exon 1 of each transcript (x being the truncation value provided when the reference files were initially created)
$ xpresspipe count -i riboseq_out/alignments/ -o riboseq_out/ -r se_reference/ -g se_reference/transcripts_codingOnly_truncated.gtf -e se_test
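To make the truncation behaviour in this example concrete, here is a minimal sketch of how trimming the first x nucleotides from exon 1 changes the interval that reads can be counted against. It is an assumption-laden illustration, not the way XPRESSpipe builds its truncated GTF: the function `truncate_first_exon` is hypothetical, coordinates are taken to be 1-based and inclusive as in GTF, and the sketch does not handle a truncation longer than exon 1.

```python
def truncate_first_exon(start, end, strand, x):
    """Trim x nucleotides from the 5' end of exon 1 (hypothetical helper).

    Coordinates are 1-based and inclusive. On the minus strand the 5' end
    is the higher coordinate, so the trim is applied to `end` instead.
    """
    length = end - start + 1
    if x >= length:
        raise ValueError("truncation spanning past exon 1 is not handled in this sketch")
    if strand == "+":
        return start + x, end
    if strand == "-":
        return start, end - x
    raise ValueError("strand must be '+' or '-'")


def is_countable(position, start, end, strand, x):
    """True if a read position still overlaps exon 1 after truncation."""
    new_start, new_end = truncate_first_exon(start, end, strand, x)
    return new_start <= position <= new_end


# Made-up exon 1 spanning 100-200 with a truncation value of 45
print(truncate_first_exon(100, 200, "+", 45))
print(is_countable(120, 100, 200, "+", 45))
print(is_countable(160, 100, 200, "+", 45))
```

A read at position 120 falls inside the trimmed region and is not counted in this model, while one at position 160 is.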
- Input points to directory with SAM alignment files that are sorted by name
- An experiment name is not provided, so the data matrix is given a default name based on the current datetime
- Reads are quantified to the entire transcriptome (coding and non-coding, no truncation)
$ xpresspipe count -i pe_out/alignments/ -o pe_out/ -r pe_reference/