From 7f9923680fb8c738fa747954e9be276f462647b4 Mon Sep 17 00:00:00 2001
From: Jaclyn Taroni
Date: Mon, 24 Sep 2018 15:47:08 -0400
Subject: [PATCH] Add info about fasterq-dump (#36)

---
 docs/main_text.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/main_text.md b/docs/main_text.md
index 19d772d..b3275f8 100644
--- a/docs/main_text.md
+++ b/docs/main_text.md
@@ -124,7 +124,9 @@ Some analyses around this platform detection procedure can be found in [this rep
 ![rna-seq-pipeline](https://user-images.githubusercontent.com/15315514/44549339-c86fd680-a6ee-11e8-8d62-419ae7f10a94.png)
 
 We use [Salmon](https://combine-lab.github.io/salmon/) and [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) to process all RNA-seq data in refine.bio.
-We obtain fastq files run on our [supported short-read platforms](https://github.com/AlexsLemonade/refinebio/blob/dev/config/supported_rnaseq_platforms.txt) from Sequence Read Archive.
+We obtain sra files run on our [supported short-read platforms](https://github.com/AlexsLemonade/refinebio/blob/dev/config/supported_rnaseq_platforms.txt) from NCBI Sequence Read Archive and use [`fasterq-dump`](https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump) (with default behavior) to obtain fastq files for use with Salmon.
+Note that any unmated reads from paired experiments are discarded.
+
 We use the library strategy and library source metadata fields to identify RNA-seq experiments.
 It's possible that experiments that are inappropriate for use with Salmon will still appear in refine.bio (e.g., long-read platforms that are labeled incorrectly in the source repository).
 If you find an experiment that you believe is inappropriate for use with Salmon, please [file an issue on GitHub](https://github.com/AlexsLemonade/refinebio/issues) so that we can resolve it.
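
The note about unmated reads follows from `fasterq-dump`'s default split-3 mode: for a paired run it writes mated reads to `<accession>_1.fastq` and `<accession>_2.fastq`, and any unmated reads to `<accession>.fastq`. The sketch below illustrates one way that selection could look. It is not taken from the refine.bio code base; the function name `select_salmon_inputs` and the accession used in the usage example are hypothetical, and the file naming is assumed to be the default.

```python
from pathlib import PurePosixPath


def select_salmon_inputs(accession, fastq_names):
    """Choose which FASTQ files to pass on for one run accession.

    Assumes fasterq-dump's default (split-3) output naming:
      <accession>_1.fastq, <accession>_2.fastq  -> mated reads
      <accession>.fastq                         -> unmated reads of a paired
                                                   run, or all reads of a
                                                   single-end run
    This helper is a hypothetical illustration, not refine.bio's code.
    """
    # Compare on bare file names so callers may pass full paths.
    names = {PurePosixPath(name).name for name in fastq_names}
    mate1 = f"{accession}_1.fastq"
    mate2 = f"{accession}_2.fastq"
    single = f"{accession}.fastq"

    if mate1 in names and mate2 in names:
        # Paired run: keep both mate files. If <accession>.fastq is also
        # present it holds unmated reads, which are deliberately left out.
        return {"layout": "paired", "files": [mate1, mate2]}
    if single in names:
        # Single-end run: the lone file holds every read.
        return {"layout": "single", "files": [single]}
    raise ValueError(f"no usable FASTQ files found for {accession}")
```

For a paired run with a hypothetical accession `SRR0000000`, passing `["SRR0000000_1.fastq", "SRR0000000_2.fastq", "SRR0000000.fastq"]` selects only the two mate files, so the unmated reads never reach Salmon.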