owenjm/polii.gene.call

polii.gene.call

An Rscript for calculating average PolII occupancy and false discovery rate (FDR) for RNA Pol II DamID datasets, based on the original algorithms developed by Tony Southall. Modifications to the original method are described in detail in the source code.

The script processes data files in gatc.bedgraph or gatc.gff format, such as those generated by damidseq_pipeline.

Recent changes

As of v1.2, polii.gene.call is multithreaded and should run up to five times faster on multicore systems.

Citation

If you find this software useful, please cite:

Marshall OJ and Brand AH. (2015) damidseq_pipeline: an automated pipeline for processing DamID sequencing datasets. Bioinformatics. 31(20):3371-3. doi: 10.1093/bioinformatics/btv386. (pubmed; full text, open access)

Southall TD, Gold KS, Egger B, Davidson CM, Caygill EE, Marshall OJ, Brand AH. (2013) Cell-type-specific profiling of gene expression and chromatin binding without cell isolation: assaying RNA Pol II occupancy in neural stem cells. Dev Cell. 26(1):101-12. doi: 10.1016/j.devcel.2013.05.020 (pubmed; full text, open access)

Requirements

  1. R

  2. A GFF-formatted list of genes. A file for release 6 of the Drosophila genome is provided in the archive; most GFF annotation files should also work. Place this file in an accessible directory and point to it with the --genes.file command-line switch:

    polii.gene.call --genes.file=/path/to/my-genes-annotation.gff

Installation

To install, copy the polii.gene.call executable into your path.

Usage

Run polii.gene.call as follows:

polii.gene.call [options] [list of gatc.bedgraph and/or gatc.gff files to process]

Each file is processed separately and produces two output files:

  1. [filename].genes.details.csv
A .csv table of all genes, together with average occupancy and FDR.
  2. [filename].genes
A plain-text list of all genes below the FDR threshold (default 0.01; change with the --FDR= command-line switch). These genes may be considered the significantly transcribed genes within the genome.
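The relationship between the two output files can be illustrated with a short Python sketch that re-derives a gene list from a details table at a chosen FDR threshold. This is not the script's own code: the column names `name`, `occupancy` and `FDR`, the toy values, and the helper `significant_genes` are all assumptions made for illustration, so check the header of your own .genes.details.csv before adapting it.

```python
import csv
import io

# Toy stand-in for a [filename].genes.details.csv table.
# ASSUMPTION: the column names "name", "occupancy" and "FDR" are hypothetical;
# the real header written by polii.gene.call may differ.
EXAMPLE_CSV = """name,occupancy,FDR
geneA,1.52,0.0004
geneB,0.10,0.3500
geneC,0.87,0.0090
geneD,0.45,0.0100
"""


def significant_genes(handle, fdr_threshold=0.01, name_col="name", fdr_col="FDR"):
    """Return the names of genes whose FDR is strictly below fdr_threshold."""
    reader = csv.DictReader(handle)
    return [row[name_col] for row in reader if float(row[fdr_col]) < fdr_threshold]


genes = significant_genes(io.StringIO(EXAMPLE_CSV))
print(genes)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` object and set `name_col` and `fdr_col` to match the actual header.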

For a list of all possible command-line options, use:

polii.gene.call --help
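When many datasets need processing, it can help to assemble the command line programmatically. The Python sketch below only builds the argument list, using the two switches mentioned in this README (--genes.file and --FDR=); the helper `build_command` and the sample file names are hypothetical, and nothing is executed.

```python
import shlex


def build_command(data_files, genes_file=None, fdr=None, executable="polii.gene.call"):
    """Assemble a polii.gene.call argument list from the switches named in this README."""
    cmd = [executable]
    if genes_file is not None:
        cmd.append(f"--genes.file={genes_file}")
    if fdr is not None:
        cmd.append(f"--FDR={fdr}")
    # Data files follow the options, as in the usage line above.
    cmd.extend(data_files)
    return cmd


# Hypothetical input file names, for illustration only.
cmd = build_command(
    ["sample1.gatc.bedgraph", "sample2.gatc.gff"],
    genes_file="/path/to/my-genes-annotation.gff",
    fdr=0.05,
)
print(shlex.join(cmd))
```

With polii.gene.call installed in your path, the resulting list could be passed to `subprocess.run(cmd, check=True)`.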

Downstream data processing

Two different transcriptomes generated through this method may be compared using the polii.correlation.plot Rscript available from the damid_misc repository.

	Rscript polii.correlation.plot [file1.genes.details.csv] [file2.genes.details.csv]

The output is both a graphical plot of differentially expressed genes, and a table listing the difference in mean log2(occupancy).
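As a rough illustration of what a per-gene difference table contains, the Python sketch below subtracts the occupancy values of two details tables for the genes they share. It is not the polii.correlation.plot implementation: the column names `name` and `occupancy` are hypothetical, the values are toy data, and the sketch assumes the occupancy column already holds mean log2 values.

```python
import csv
import io

# Toy stand-ins for two .genes.details.csv files.
# ASSUMPTION: the column names and values are hypothetical, and "occupancy"
# is taken to be a mean log2 occupancy per gene.
FILE1 = """name,occupancy
geneA,1.50
geneB,0.25
geneC,-0.50
"""
FILE2 = """name,occupancy
geneA,0.50
geneB,0.25
geneD,2.00
"""


def read_occupancy(handle, name_col="name", value_col="occupancy"):
    """Map each gene name to its occupancy value."""
    return {row[name_col]: float(row[value_col]) for row in csv.DictReader(handle)}


def occupancy_differences(occ1, occ2):
    """Return occ1 - occ2 for every gene present in both tables, sorted by name."""
    shared = sorted(set(occ1) & set(occ2))
    return {gene: occ1[gene] - occ2[gene] for gene in shared}


diffs = occupancy_differences(
    read_occupancy(io.StringIO(FILE1)),
    read_occupancy(io.StringIO(FILE2)),
)
for gene, diff in diffs.items():
    print(f"{gene}\t{diff:+.2f}")
```

Genes present in only one table (geneC and geneD here) are dropped; whether that is appropriate depends on the comparison being made.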

Differences between RNA pol II DamID and RNAseq

Although both are methods for transcriptional profiling, be aware that their results may differ. In particular, transcript abundance as assessed through RNAseq depends on transcript stability, whereas RNA pol II occupancy may provide a better indication of transcription levels.
