Shao-Group/rnabridge-test

Overview

This repository tests the performance of rnabridge-align and rnabridge-denovo. Here we provide scripts to download datasets, run these tools and reproduce the results and figures in the manuscript.

The pipeline involves the following five steps:

  1. Download necessary datasets (data directory).
  2. Download and/or compile necessary programs (programs directory).
  3. Run the methods and produce results regarding rnabridge-align (align directory).
  4. Run the methods and produce results regarding rnabridge-denovo (denovo directory).
  5. Summarize results and produce figures (plots directory).

Datasets

We evaluate these tools on two datasets, simulation80 and encode10. Evaluating reference-based transcript assembly also requires the reference annotation files. The data directory provides metadata for these datasets, along with scripts to download them.

simulation80

The data was simulated with Flux-Simulator. We varied two parameters: the average fragment length (300 and 500) and the read length (75 and 100). For each of the four combinations, we simulated 20 samples, giving 80 samples in total. The reads, ground-truth transcripts, and alignments (produced with STAR) can be downloaded from Penn State Data Commons (https://doi.org/10.26208/b01x-aq20).
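The simulation design described above can be enumerated with a short loop. This is only a sketch: the function name list_simulation80 and the label format are hypothetical and do not reflect the file names used in the published dataset.

```shell
#!/usr/bin/env bash
# Enumerate the simulation80 design:
# 2 fragment lengths x 2 read lengths x 20 samples per combination.
# NOTE: the label format is an assumption made for illustration only.
list_simulation80() {
  local frag read sample
  for frag in 300 500; do
    for read in 75 100; do
      for sample in $(seq 1 20); do
        printf 'frag%s.read%s.sample%02d\n' "$frag" "$read" "$sample"
      done
    done
  done
}

# Print one label per simulated sample.
list_simulation80
```

Piping the output through wc -l is a quick way to confirm that the design covers every combination.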

encode10

This dataset contains 10 human RNA-seq samples downloaded from ENCODE and has also been used in scalloptest. All samples were sequenced with strand-specific, paired-end protocols. We aligned each of the 10 samples with two RNA-seq aligners, STAR and HISAT2. All read alignments can be downloaded from Penn State Data Commons (https://doi.org/10.26208/8c06-w247).

annotations

Use the following script in data to download annotations:

./download.annotation.sh

The downloaded files will appear under data/ensembl.

Programs

Our experiments (used in the manuscript) involve the following programs:

| Program | Version | Description |
|---|---|---|
| rnabridge-align | v1.0.1 | bridging RNA-seq alignments |
| Scallop | v0.10.5 | transcript assembler |
| StringTie | v2.1.4 | transcript assembler |
| gffcompare | v0.11.2 | evaluate assembled transcripts |
| gtfcuff | | a set of utilities for processing RNA-seq data |

You need to download and/or compile them, and then link them into the programs directory. Make sure that the program names in the programs directory are lowercase (i.e., stringtie, scallop, and gffcompare).
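One way to do the linking is sketched below. The helper link_program is hypothetical (it is not part of this repository), and the example path is a placeholder. It creates a symbolic link whose name is the lowercase form of the binary's file name.

```shell
#!/usr/bin/env bash
# Link a compiled binary into a programs directory under a lowercase name.
# NOTE: link_program is a hypothetical helper, not a script from this repository.
link_program() {
  local src="$1" dest_dir="$2"
  local src_dir name
  # Resolve the directory of the binary to an absolute path so the link
  # stays valid regardless of where it is created.
  src_dir="$(cd "$(dirname "$src")" && pwd)" || return 1
  # Lowercase the file name, e.g. StringTie -> stringtie.
  name="$(basename "$src" | tr '[:upper:]' '[:lower:]')"
  mkdir -p "$dest_dir"
  # -s: symbolic link, -f: replace an existing link with the same name.
  ln -sf "$src_dir/$(basename "$src")" "$dest_dir/$name"
}

# Example (the source path is a placeholder):
# link_program /path/to/StringTie programs
```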

Generate Results for Evaluating rnabridge-align

Once the datasets and programs are available, use the following scripts in align to run the experiments:

./run.simulation80.sh
./run.encode10.sh

You can modify each of these scripts to run with different parameters. For each run, you need to specify a run-id, which is used later when collecting the results.

After the experiments finish running, the following script collects the accuracies:

./collect.sh

This will write the results to a directory named results.RUN-ID, which can be used directly by the scripts that generate the figures (below).
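Before moving on to the figures, it can help to confirm that the results directory for a given run-id exists. The sketch below assumes only the results.RUN-ID naming described above. The helper check_results and its optional base-directory argument are hypothetical, and the location where collect.sh places the directory should be checked against the script itself.

```shell
#!/usr/bin/env bash
# Check that the results directory for a run-id exists.
# NOTE: check_results is a hypothetical helper; the base directory
# (default: current directory) is an assumption.
check_results() {
  local run_id="$1" base_dir="${2:-.}"
  local dir="$base_dir/results.$run_id"
  if [ -d "$dir" ]; then
    echo "found $dir"
  else
    echo "missing $dir (run collect.sh for this run-id first)" >&2
    return 1
  fi
}

# Example, using the run-id mentioned for the manuscript results:
# check_results D400
```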

Analyze Results and Reproduce Figures

Once the results have been generated, use the following script in plots to reproduce the figures:

./build.figures.sh

You may need to install the R package tikzDevice. You may also need to modify these scripts to match the run-id(s) you specified. The results used in the manuscript (run-id = D400) have been updated in this repo (including the GTEx dataset), so running the script above directly generates all figures used in the manuscript.
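The scripts named in this README can be chained in order, as in the sketch below. It covers only the steps for which a script is named here, so program setup and the denovo experiments are omitted. The helpers run_in and run_pipeline and the DRY_RUN variable are hypothetical, and the sketch assumes collect.sh lives in align and that it is started from the repository root. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Run the documented scripts in order, each from its own directory.
# DRY_RUN=1 (the default in this sketch) prints the commands instead of running them.
# NOTE: run_in, run_pipeline and DRY_RUN are assumptions, not part of the repository.
DRY_RUN="${DRY_RUN:-1}"

run_in() {
  local dir="$1" script="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $script)"
  else
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && "$script")
  fi
}

run_pipeline() {
  # Stop at the first step that fails.
  run_in data  ./download.annotation.sh &&
  run_in align ./run.simulation80.sh &&
  run_in align ./run.encode10.sh &&
  run_in align ./collect.sh &&
  run_in plots ./build.figures.sh
}

run_pipeline
```

Setting DRY_RUN=0 executes the scripts for real, which requires the datasets and programs to be in place first.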
