GitHub - chelseaju/Fleximer: Accurate Quantification of RNA-Seq via Variable Length k-mers

Fleximer Fleximer: Accurate Quantification of RNA-Seq via Variable Length k-mers

Download

git clone https://github.com/chelseaju/Fleximer.git

Installation

Fleximer is implemented in C++11, and requires compiler g++ 4.7 and above. Please run the following commands to compile the executables:

cd Fleximer
make

Binary files are also provided:

sigmer_generation, sigmer_selection, sigmer_count, EM

Data

Fleximer contains two stages: sig-mer preparation and RNA-Seq quantification. In sig-mer preparation, Fleximer takes a predefined partition of transcriptome. The partition divide transcripts into a set of non-overlapping clusters in a specialized fasta format. A regular fasta format contains a header line of transcript ID follow by one sequence:

>seq1
ACGTACGTA

A specialized fasta format contains a header line of a cluster ID and a set of member transcript IDs, follow by member transcript sequence. Each member is separated by '|':

>clusterX|seq1|seq2
ACGTACGTA|ACGAACGTA

The partition can be based on sequence similarity, or functional annotation. By default, we partition the transcriptome based on sequence similarity as described in RNA-Skim. A human transcriptome partition is provided in /Data/reference/human_60mers.fa.

In RNA-Seq quantification stage, Fleximer takes the RNA-Seq data as input. Several examples simulated from polyester are provided in Data/test.

Sig-mer Preparation

Fleximer relies on the structure and properties of suffix tree to discover all the possible sig-mers. It then constructs sig-mers overlap graphs to guide the selection. sigmer_generation takes the specialized fasta file of transcriptome partition as input, and writes all sig-mers to an intermediate file. sigmer_selection takes the output from sigmer_generation and a desired read length to produces a list of representative sig-mers for the quantification stage.

./sigmer_generation Data/reference/human_60mers.fa Data/sigmers.txt
./sigmer_selection Data/reference/human_60mers.fa Data/sigmers.txt 75 Data/selected_sigmer

RNA-Seq Quantification

Fleximer determines the reads origin through the representative sig-mers. It uses the Aho-Corasick algorithm to determine the sig-mers occurrences for each read. For reads that may come from more than one origin, Fleximer uses the EM algorithm to resolve the ambiguity.

./sigmer_count Data/reference/human_60mers.fa Data/selected_sigmer.txt Data/test/len75_sample_46/eClass_count.txt Data/test_len75_sample_46/sample_01.fq
./EM Data/reference/human_60mers.fa Data/test/len75_sample_46/eClass_count.txt 75 Data/test/len75_sample_46/abundance.txt

The final transcript abundance for each transcript is reported in read count, RPKM, and TPM.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Data		Data
third-party		third-party
EM		EM
EM.cpp		EM.cpp
HELPER.cpp		HELPER.cpp
HELPER.hpp		HELPER.hpp
Makefile		Makefile
README.md		README.md
SEQCLUSTER.cpp		SEQCLUSTER.cpp
SEQCLUSTER.hpp		SEQCLUSTER.hpp
aho01new.cpp		aho01new.cpp
aho01new.hpp		aho01new.hpp
sigmer_count		sigmer_count
sigmer_count.cpp		sigmer_count.cpp
sigmer_generation		sigmer_generation
sigmer_generation.cpp		sigmer_generation.cpp
sigmer_selection		sigmer_selection
sigmer_selection.cpp		sigmer_selection.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Download

Installation

Data

Sig-mer Preparation

RNA-Seq Quantification

About

Releases

Packages

Languages

chelseaju/Fleximer

Folders and files

Latest commit

History

Repository files navigation

Download

Installation

Data

Sig-mer Preparation

RNA-Seq Quantification

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages