Copyright (C) 2016-2018, and GNU GPL, by Guangyu Yang, Liliana Florea
Includes portions copyright from:
SAMtools - Copyright (C) 2008-2009, Genome Research Ltd, Heng Li
Note: for the negative binomial version of JULiP, including the differential splicing testing, please go to https://github.com/splicebox/JULiP
JULiP selects a comprehensive and accurate set of introns simultaneously from multiple RNA-seq samples.
JULiP produces a reliable set of introns in three stages. Stage 1 collects the introns from multiple samples, extracted from spliced read alignments, filtering low quality introns. Stage 2 divides the introns across the genome into 'gene' regions. Stage 3 uses a Lasso-regularized program to determine a set of introns for each gene region.
Sequential version:
Usage: java -jar julip.jar [options]
Options:
-model VAL : generateSplicesAndRegions: generate splices & regions
generateIntronFiles: generate intron files
intronsSelection: perform intron selection
-bamFileList VAL : file containing the paths to the bam files
-clean VAL : true: delete the temporary files; false: keep the
temporary files (default: true)
-outFile VAL : file storing the output of JULiP
Parallel version:
Usage: hadoop jar julipForHadoop.jar [options]
Options:
-intronFileList VAL : file containing the paths to the intron files
Configure Hadoop:
See this official link for setting up a single node cluster:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
and this link for cluster setup:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
The primary input to JULiP is a set of RNA-seq alignment files, each in BAM format and sorted by chromosome and position, for instance as produced with the program Tophat2.
Given a collection of input alignment file {x}.bam files, JULiP generates three types of intermediate data files in each of the directories hosting the BAM files, namely: {x}.region ('gene' regions), {x}.splice (candidate introns) and {x}.intron (selected introns). Additionally, it generates the file gene.txt, containing all the detected gene regions, and three list files, containing the paths to the .region, .splice and .intron files for each input data set, in the working directory.
-
The format of the x.region file is:
chrom_id position #_of_reads_on_the_position -
The format of the x.splice file is:
chrom_id start_intron_position end_intron_position #_of_supporting_reads strand -
The format of the x.intron file is:
chrom_id start_intron_position end_intron_position #_of_supporting_reads strand
The final output, consisting of selected introns.
- The format of the output file is:
chrom_id start_intron_position end_intron_position selection_flag gene_region_id beta reads_in_samples
Sequential version:
java -jar julip.jar \
-model generateSplicesAndRegions \
-bamFileList bamFileList.txt
java -jar julip.jar \
-model generateIntronFiles
java -jar julip.jar \
-model intronsSelection \
-outFile introns_selection_result.txt
Parallel version:
hadoop fs -mkdir -p /user/hadoop
hadoop jar julipForHadoop.jar -intronFileList intronFileList.txt
hadoop fs -cat /user/hadoop/splice_results/* > introns.txt.uniq
Contact: gyang22@jhu.edu, florea@jhu.edu