`rTea` (RNA Transposable Element Analyzer)

rTea is a computational method to detect transposon-fusion RNA.

Citation: Pan-cancer analysis reveals multifaceted roles of retrotransposon-fusion RNAs

Overview

rTea detects transposable element (TE)-fusion transcripts from short-read RNA-seq data. It uses multiple features of the aligned reads, such as the base quality of clipped sequences, the percentage of multi-mapped reads, and the matching score of reads to TE sequences, to filter out false positives caused by nonspecifically mapped reads.

Demo and result files

Users can try rTea on a demo data set and can check the output at https://gitlab.aleelab.net/junseokpark/rTea-results

Installation

rTea runs on a Linux-based operating system with certain prerequisite software. Here is a list of the software you should install before you start using rTea.

System software for Ubuntu 18.04 LTS

apt-get update && apt-get install -y \
    cmake \
    libxml2-dev \
    libcurl4-openssl-dev \
    libboost-dev \
    gawk \
    libssl-dev \
    pigz \
    htop \
    iputils-ping

Before installing rTea, you'll also need to set up the prerequisite software and environment variables (ENV).
- fastp
- HISAT2 (>= v2.1.0)
- samtools (>= v1.9)
- HTSlib (>= v1.9)
- Scallop (>= v0.10.4)
- bamtools (>= v2.5.1)
```
# bamtools environment
# BAMTOOL_HOME is the directory where bamtools is installed
export PKG_CXXFLAGS="-I$BAMTOOL_HOME/include/bamtools"
export PKG_LIBS="-L$BAMTOOL_HOME/lib -lbamtools"
```
- bwa (>=0.7.17)
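
Before building rTea it can help to confirm that the command-line tools listed above are on `PATH`. The following is a minimal sketch: the helper name `missing_tools` is hypothetical and not part of rTea, and the executable names passed to it are assumed to match how each tool was installed. It does not check version numbers.

```shell
#!/usr/bin/env bash
# missing_tools (hypothetical helper): print each named executable
# that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Executable names are assumptions; adjust them to your installation.
missing=$(missing_tools fastp hisat2 samtools scallop bamtools bwa)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "Missing prerequisites:" $missing >&2
fi
```

An empty result means every listed executable was found; anything printed should be installed before continuing.
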

R (==3.6.2) and the required R packages must also be installed:

R -e "install.packages('XML', repos = 'http://www.omegahat.net/R')"
R -e "install.packages(c( \
       'magrittr', \
       'data.table', \
       'stringr', \
       'optparse', \
       'Rcpp', \
       'BiocManager' \
     ))"

R -e "BiocManager::install(c( \
       'GenomicAlignments', \
       'BSgenome.Hsapiens.UCSC.hg19', \
       'BSgenome.Hsapiens.UCSC.hg38', \
       'EnsDb.Hsapiens.v75', \
       'EnsDb.Hsapiens.v86' \
     ))"

Download the GRCh38 genome_snp_tran index for HISAT2. Its directory is referred to below as `GENOME_SNP_TRAN_DIR`.
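
A quick way to confirm that the download is complete is to count the index files in `$GENOME_SNP_TRAN_DIR`. This sketch assumes the archive unpacks to HISAT2 index files named `genome_snp_tran.1.ht2` through `genome_snp_tran.8.ht2`, which is the usual layout of a prebuilt HISAT2 index but is not stated in this README. `count_index_files` is a hypothetical helper.

```shell
# count_index_files (hypothetical helper): count HISAT2 index files
# with the assumed prefix "genome_snp_tran" in the given directory.
count_index_files() {
    find "$1" -maxdepth 1 -type f -name 'genome_snp_tran.*.ht2' | wc -l | tr -d ' '
}

# Report only when the index directory variable has been set.
if [ -n "${GENOME_SNP_TRAN_DIR:-}" ]; then
    n=$(count_index_files "$GENOME_SNP_TRAN_DIR")
    # Assumption: a complete HISAT2 index consists of 8 .ht2 files.
    if [ "$n" -eq 8 ]; then
        echo "Index looks complete ($n files)"
    else
        echo "Expected 8 index files, found $n" >&2
    fi
fi
```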

Use Docker for Installation

Build a Docker image from the Dockerfile and run rTea in a Docker container.

DOCKER_BUILDKIT=1 docker build -t rtea .

Use Singularity for Installation

After creating a Docker image for rTea, convert it to Singularity.

docker save -o rTea.tar rtea:latest
singularity build rTea.simg docker-archive://rTea.tar

Running `rTea`

If you are using Docker as your runtime environment, start a container from the rTea image with `GENOME_SNP_TRAN_DIR` mounted, and run rTea inside it.

docker run -it -v ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran rtea bash

If the runtime environment is Singularity, open a shell in the Singularity image to run rTea.

singularity shell -B ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran \
    rTea.simg

rTea accepts either paired-end FASTQ files or a BAM file as input. For FASTQ input, use the following command:

rTea.sh \
        $R1_FASTQ_GZ \
        $R2_FASTQ_GZ \
        $SAMPLE_NAME \
        $GENOME_SNP_TRAN_DIR \
        $NUMBER_OF_CORES \
        $OUT_DIR \
        hg38 \
        resume
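
To process several samples, the FASTQ command above can be wrapped in a loop. The sketch below only prints one command per sample so that the commands can be reviewed before anything runs. The tab-separated sample sheet (sample name, R1 path, R2 path), the per-sample output directory, and the function name `print_rtea_commands` are assumptions made for illustration; the argument order simply mirrors the command shown above. Paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# print_rtea_commands (hypothetical helper): read a tab-separated sample
# sheet (sample name, R1 path, R2 path) and print one rTea.sh command per
# sample, mirroring the argument order of the FASTQ command above.
print_rtea_commands() {
    local sheet=$1 index_dir=$2 cores=$3 out_dir=$4
    local sample r1 r2
    while IFS="$(printf '\t')" read -r sample r1 r2; do
        # Skip blank lines and comment lines starting with '#'.
        case "$sample" in ''|\#*) continue ;; esac
        printf 'rTea.sh %s %s %s %s %s %s hg38 resume\n' \
            "$r1" "$r2" "$sample" "$index_dir" "$cores" "$out_dir/$sample"
    done < "$sheet"
}

# Example call (samples.tsv is a hypothetical file name):
#   print_rtea_commands samples.tsv "$GENOME_SNP_TRAN_DIR" 8 results
```

Printing instead of executing keeps the sketch safe to try; once the printed commands look right, they can be saved to a file and run from there.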

For BAM file input, please use the following command:

rnatea_pipeline_from_bam \
        ${BAM} + \
        $SAMPLE_NAME \
        $GENOME_SNP_TRAN_DIR \
        $NUMBER_OF_CORES \
        $OUT_DIR \
        hg38

Output file

After rTea finishes, the results are written to `<SAMPLE_NAME>.rTea.txt` in the rTea directory. The file describes each detected TE fusion and its supporting data in the following columns.

| Column | Description |
| --- | --- |
| chr | Chromosome name |
| pos | Fusion breakpoint position on the chromosome |
| ori | Fusion direction on the chromosome (f, TE\|gene; r, gene\|TE) |
| class | TE class |
| seq | Proximal portion of fusion sequence |
| isPolyA | Whether it is a fusion with polyA sequence |
| posRepFamily | Repeat masked repeat family on the breakpoint position |
| posRep | Repeat masked repeat element on the breakpoint position |
| TEfamily | TE family with highest alignment score when fusion sequence is aligned with consensus TE sequence |
| TEscore | Alignment score of fusion sequence with the consensus TE sequence |
| TEside | Fusion direction on the consensus TE sequence (5, TE\|gene; 3, gene\|TE) |
| TEbreak | Fusion breakpoint position on the consensus TE sequence |
| depth | Number of RNA-seq reads on the breakpoint position |
| matchCnt | Number of fusion-supporting RNA-seq reads |
| polyAcnt | Number of polyA reads |
| baseQual | Median base quality of supporting reads |
| lowMapQual | Number of supporting reads that have low mapping quality |
| mateDist | Minimum distance of mate reads |
| overhang | Distance of breakpoint from splice site |
| gap | Length of nearby intron |
| secondary | Proportion of supporting reads that are from secondary alignment |
| nonspecificTE | Mean alignment score of supporting reads to consensus TE sequence |
| r1pstrand | Proportion of supporting reads that are from positive strand of chromosome |
| fusion_tx_id | Transcript ID of the fusion transcript |
| tx_support_exon | Number of read fragments spanning exonic region of the fusion transcript ID |
| tx_support_intron | Number of read gaps matching the fusion transcript ID |
| strand | Strand of fusion transcript |
| pos_type | Genomic region of breakpoint |
| polyTE | Known non-reference TE on the breakpoint position |
| hardstart | Start position of nearby reference genome where fusion sequence came from |
| hardend | End position of nearby reference genome where fusion sequence came from |
| hardTE | Repeat masked TE subfamily of nearby reference genome where fusion sequence came from |
| hardDist | Distance from fusion breakpoint to nearby reference genome where fusion sequence came from |
| fusion_type | Type of TE fusion |
| fusion_tx_biotype | Biotype of fusion transcript |
| fusion_gene_id | Gene ID of fusion transcript |
| fusion_gene_name | Gene symbol of fusion transcript |
| Filter | Filter reason of low confidence fusion |
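
Because the output has many columns, selecting them by name is less error-prone than selecting them by position. The sketch below assumes that `<SAMPLE_NAME>.rTea.txt` is tab-delimited and that its first line is a header holding the column names listed above; neither detail is stated here, so check your file first. The helper name `select_fusions` and the `matchCnt` cutoff are illustrative choices, not recommended thresholds.

```shell
# select_fusions (hypothetical helper): print chr, pos, TEfamily and matchCnt
# for rows whose matchCnt is at least the given minimum.
# Assumes a tab-delimited file whose first line is the header.
select_fusions() {
    awk -F '\t' -v OFS='\t' -v min="$2" '
        NR == 1 {
            # Map column names to their positions.
            for (i = 1; i <= NF; i++) col[$i] = i
            print "chr", "pos", "TEfamily", "matchCnt"
            next
        }
        $(col["matchCnt"]) + 0 >= min + 0 {
            print $(col["chr"]), $(col["pos"]), $(col["TEfamily"]), $(col["matchCnt"])
        }
    ' "$1"
}

# Example call (the file name and cutoff are placeholders):
#   select_fusions sample1.rTea.txt 3
```

Looking columns up through the header means the same function keeps working if the column order changes between versions.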

Licenses

Contacts

- Junseok Park
- Boram Lee
