Skip to content

fredpdavis/opticlobe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

opticlobe

This package contains code to analyze RNA-seq data and generate figures and tables presented in:

A genetic, genomic, and computational resource for exploring neural circuit function
Davis FP*, Nern A*, Picard S, Reiser MB, Rubin GM, Eddy SR, Henry GL.
bioRxiv 2018

Version: updated Aug 27, 2019 to reflect latest preprint version

Contact fredpdavis@gmail.com with any questions.

Package contents

  • src - code, organized by language, to turn RNA-seq read files into figures.
  • metadata - text tables describing RNA-seq samples
  • data - text data files used by code

Requirements

Data

The data used by this package comes from several sources. We include nearly all data files expected by the R and LSF shell programs, along with README files describing the contents. The only exceptions are large files (eg, genome sequence, gene annotations, transcript sequences, RNA-seq alignment indices), which we do not provide but describe in README files how to obtain or build.

Data Source
RNA-seq FASTQ files GEO accession GSE116969
genome sequence ENSEMBL release 91; based on BDGP6 (dm6)
gene structures ENSEMBL release 91; based on FlyBase 2017_04
FlyBase gene groups FlyBase 2018_02

We also used data from the following papers:

Software

We used the following software on an LSF linux compute cluster.

# Software Source
1 seqtk (2012 Oct 16) https://github.com/lh3/seqtk
2 kallisto 0.43.1 https://pachterlab.github.io/kallisto/
3 STAR 2.5.3c https://github.com/alexdobin/STAR
4 bedtools 2.15.0 http://bedtools.readthedocs.io/
5 samtools 0.1.19 https://github.com/samtools/samtools
6 UCSC bedGraphToBigWig http://hgdownload.cse.ucsc.edu/admin/exe
7 picard 1.9.1 http://broadinstitute.github.io/picard/
8 R v3.3.1 https://www.r-project.org/
9 RStan/Stan http://mc-stan.org/users/interfaces/rstan

Analysis overview

The analysis includes three parts: (1) RNA-seq read processing, (2) modeling gene expression states, and (3) generating figures and tables.

1. RNA-seq read processing

This step is implemented in an LSF script that trims the reads (seqtk), pseudo-aligns them to the transcriptome (kallisto) and aligns them to the genome (STAR), evaluates quality metrics (picard), and generates bigwig tracks for visualization (samtools, ucsc kent tools)

bsub < process_sample.lsf.sh

2. Modeling gene expression states

We implemented this step in an R routine that uses the RStan interface to the Stan statistical inference engine to model gene expression states. This step uses an LSF compute cluster for parallelization.

The routine assume 200 cluster nodes with 4 CPUs each (values specified in the setSpecs() routine). On our cluster, this step takes ~ 3 hours in total.

source("../../src/R/analyzeOpticLobeExpr.R")
dat <- main(returnData = TRUE)
runModel(dat, mode="submit")
dat$pFit <- runModel(dat, mode="merge")
dat <- main(dat,returnData = TRUE)
saveRDS(dat, paste0(dat$specs$outDir,"/full_dataset.rds"))

3. Generating figures and tables

An R routine generates all the figures in final form, except for a few where we manually repositioned labels for clarity

makeFigs(dat)

About

Code accompanying Davis, Nern et al., 2018

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published