An Rscript for calculating average RNA Pol II occupancy and false discovery rate (FDR) for RNA Pol II DamID datasets, based on the original algorithms developed by Tony Southall. Modifications to the original method are described in detail in the source code.
The script processes data files in gatc.bedgraph or gatc.gff format, such as those generated by damidseq_pipeline.
As of v1.2, polii.gene.call is multithreaded and should run up to five times faster on multicore systems.
If you find this software useful, please cite:
Marshall OJ and Brand AH. (2015) damidseq_pipeline: an automated pipeline for processing DamID sequencing datasets. Bioinformatics. 31(20):3371-3. doi: 10.1093/bioinformatics/btv386.
Southall TD, Gold KS, Egger B, Davidson CM, Caygill EE, Marshall OJ, Brand AH. (2013) Cell-type-specific profiling of gene expression and chromatin binding without cell isolation: assaying RNA Pol II occupancy in neural stem cells. Dev Cell. 26(1):101-12. doi: 10.1016/j.devcel.2013.05.020.
- R
- A GFF-formatted list of genes. A file for release 6 of the Drosophila genome is provided in the archive; most GFF annotation files should also work. Place this file in an accessible directory and use the --genes.file commandline switch to access it:
polii.gene.call --genes.file=/path/to/my-genes-annotation.gff
To install, copy the polii.gene.call executable into your path.
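On a Unix-like system, the copy step might look like the sketch below. The helper name `install_to_path` and the default target directory `~/bin` are assumptions made for illustration; any directory already listed in your `PATH` should work.

```shell
# Hypothetical helper: copy an executable script into a directory on the PATH
# and make sure it is marked as executable.
install_to_path() {
    src="$1"                      # path to the downloaded polii.gene.call script
    dest_dir="${2:-$HOME/bin}"    # assumed target; must be listed in $PATH
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Example (paths are illustrative):
#   install_to_path ./polii.gene.call "$HOME/bin"
```

If the chosen directory is not yet on your `PATH`, the shell will not find the command until it is added.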
Run polii.gene.call as follows:
polii.gene.call [options] [list of gatc.bedgraph and/or gatc.gff files to process]
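When a directory holds many input files, a small wrapper can collect them and pass them to a single call. The following is a sketch only: the function name `run_polii_batch`, the `POLII_CMD` override and the `*.gatc.bedgraph` / `*.gatc.gff` file naming are assumptions, and only the `--genes.file` and `--FDR` switches described in this document are used.

```shell
# Hypothetical wrapper: run polii.gene.call on every GATC-level file in a directory.
# Set POLII_CMD=echo to preview the command line without running anything.
run_polii_batch() {
    data_dir="$1"      # directory holding the input files
    genes_file="$2"    # GFF annotation passed to --genes.file
    fdr="${3:-0.01}"   # threshold passed to --FDR (0.01 is the documented default)

    # Collect inputs into the positional parameters.
    # The file-name patterns below are an assumption about how inputs are named.
    set --
    for f in "$data_dir"/*.gatc.bedgraph "$data_dir"/*.gatc.gff; do
        if [ -e "$f" ]; then
            set -- "$@" "$f"
        fi
    done

    if [ "$#" -eq 0 ]; then
        echo "no gatc.bedgraph or gatc.gff files found in $data_dir" >&2
        return 1
    fi

    "${POLII_CMD:-polii.gene.call}" --genes.file="$genes_file" --FDR="$fdr" "$@"
}

# Example (paths are illustrative):
#   run_polii_batch ./damid-output /path/to/genes.gff 0.05
```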
Each file will be processed separately, with the output being two files:
- [filename].genes.details.csv
A .csv table of all genes, together with average occupancy and FDR
- [filename].genes
A plain text list of all genes below the FDR threshold (default is 0.01; change with the --FDR= commandline switch). These genes may be considered to represent the significantly transcribed genes within a genome.
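Because the details table lists every gene with its FDR, a gene list at a different cut-off can in principle be derived from it without re-running the analysis. The sketch below assumes a layout that this document does not specify: a comma-separated file with a header row, gene names in the first column, and a column whose header is exactly `FDR`. Check the header of your own output file and adjust accordingly; the function name `genes_below_fdr` is hypothetical.

```shell
# Hypothetical helper: list genes from a details table at a chosen FDR cut-off.
# Assumed layout: header row, gene name in column 1, a column headed "FDR".
genes_below_fdr() {
    details_csv="$1"
    threshold="${2:-0.01}"
    awk -F',' -v thr="$threshold" '
        NR == 1 {
            # Locate the FDR column by its header, ignoring surrounding quotes.
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/"/, "", h)
                if (h == "FDR") col = i
            }
            if (!col) { print "no FDR column found" > "/dev/stderr"; exit 1 }
            next
        }
        {
            name = $1
            fdr = $col
            gsub(/"/, "", name)
            gsub(/"/, "", fdr)
            # Keep genes strictly below the threshold; skip missing values.
            if (fdr != "" && fdr != "NA" && fdr + 0 < thr + 0) print name
        }
    ' "$details_csv"
}

# Example (file name is illustrative):
#   genes_below_fdr sample.genes.details.csv 0.05
```

This simple field splitting does not handle commas inside quoted fields, so a proper CSV parser is the safer choice if gene names or annotations may contain commas.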
For a list of all possible commandline options, use:
polii.gene.call --help
Two different transcriptomes generated through this method may be compared using the polii.correlation.plot Rscript available from the damid_misc repository.
Rscript polii.correlation.plot [file1.genes.details.csv] [file2.genes.details.csv]
The output consists of a graphical plot of differentially expressed genes and a table listing the differences in mean log2(occupancy).
Although RNA Pol II DamID and RNA-seq are both methods for transcriptional profiling, please be aware that their results may differ. In particular, transcript abundance as assessed through RNA-seq will depend on transcript stability, whereas RNA Pol II occupancy may provide a better indication of transcription levels.