GitHub - Guangyu-Yang/JULiP: An efficient model for accurate intron selection from multiple RNA-seq samples

JULiP - An efficient model for accurate intron selection from multiple RNA-seq samples

Includes portions copyright from:

Note: for the negative binomial version of JULiP, including the differential splicing testing, please go to https://github.com/splicebox/JULiP

What is JULiP?

JULiP selects a comprehensive and accurate set of introns simultaneously from multiple RNA-seq samples.

JULiP produces a reliable set of introns in three stages. Stage 1 collects the introns from multiple samples, extracted from spliced read alignments, filtering low quality introns. Stage 2 divides the introns across the genome into 'gene' regions. Stage 3 uses a Lasso-regularized program to determine a set of introns for each gene region.

Usage

Sequential version:

Usage: java -jar julip.jar [options]  
Options:  
 -model VAL       : generateSplicesAndRegions: generate splices & regions
                    generateIntronFiles: generate intron files
                    intronsSelection: perform intron selection 
 -bamFileList VAL : file containing the paths to the bam files
 -clean VAL       : true: delete the temporary files; false: keep the
                    temporary files (default: true)
 -outFile VAL     : file storing the output of JULiP

Parallel version:

Usage: hadoop jar julipForHadoop.jar [options]
Options:
 -intronFileList VAL  : file containing the paths to the intron files

Configure Hadoop:

See this official link for setting up a single node cluster:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
and this link for cluster setup:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

Input/Output

The primary input to JULiP is a set of RNA-seq alignment files, each in BAM format and sorted by chromosome and position, for instance as produced with the program Tophat2.

Given a collection of input alignment file {x}.bam files, JULiP generates three types of intermediate data files in each of the directories hosting the BAM files, namely: {x}.region ('gene' regions), {x}.splice (candidate introns) and {x}.intron (selected introns). Additionally, it generates the file gene.txt, containing all the detected gene regions, and three list files, containing the paths to the .region, .splice and .intron files for each input data set, in the working directory.

The format of the x.region file is:
chrom_id position #_of_reads_on_the_position
The format of the x.splice file is:
chrom_id start_intron_position end_intron_position #_of_supporting_reads strand
The format of the x.intron file is:
chrom_id start_intron_position end_intron_position #_of_supporting_reads strand

The final output, consisting of selected introns.

The format of the output file is:
chrom_id start_intron_position end_intron_position selection_flag gene_region_id beta reads_in_samples

Example

Sequential version:

java -jar julip.jar \
	 -model generateSplicesAndRegions \
	 -bamFileList bamFileList.txt

java -jar julip.jar \
	 -model generateIntronFiles

java -jar julip.jar \
     -model intronsSelection \
     -outFile introns_selection_result.txt

Parallel version:

hadoop fs -mkdir -p /user/hadoop
hadoop jar julipForHadoop.jar -intronFileList intronFileList.txt
hadoop fs -cat  /user/hadoop/splice_results/* > introns.txt.uniq

Support

Contact: gyang22@jhu.edu, florea@jhu.edu

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
samples		samples
LICENSE		LICENSE
README.md		README.md
julip.jar		julip.jar
julipForHadoop.jar		julipForHadoop.jar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

samples

samples

LICENSE

LICENSE

README.md

README.md

julip.jar

julip.jar

julipForHadoop.jar

julipForHadoop.jar

Repository files navigation

JULiP - An efficient model for accurate intron selection from multiple RNA-seq samples

Table of contents

What is JULiP?

Usage

Input/Output

Example

Support

About

Releases

Packages

Contributors 2

License

Guangyu-Yang/JULiP

Folders and files

Latest commit

History

Repository files navigation

JULiP - An efficient model for accurate intron selection from multiple RNA-seq samples

Table of contents

What is JULiP?

Usage

Input/Output

Example

Support

About

Resources

License

Stars

Watchers

Forks