RNAseq_umis

Description

RNA sequencing analysis pipeline with UMIs

Generate read counts from Illumina fastq files that carry unique molecular identifiers (UMIs).

Workflow

Contents of the repository

The bash script rnaseq_pipeline_umis.sh, which produces read counts from fastq files.
The resource folder, which contains two files: polyA.fa.gz (used to trim polyA sequences) and truseq_rna.fa.gz (used to trim adapters).
config_env.yml, the file used to create the working environment.

The output of this bash script includes:

<sample>.read_counts.txt, the read counts (counts folder).
<sample>.read_distribution.txt, the read distribution (counts folder).
<sample>.deduplicated.bam, the deduplicated BAM file ( folder).

Other, less important outputs include fastqc reports, bai index files, trimming results, UMI data, ...
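As a quick sanity check after a run, the sketch below tests whether the two count files listed above exist and are non-empty for one sample. The function name check_counts is hypothetical and not part of the repository; the counts directory default comes from the list above. The deduplicated BAM file is not checked because its folder is not stated here.

```shell
# Hypothetical helper (not part of the repository): report whether the
# count files listed above exist and are non-empty for one sample.
# $1 = sample name, $2 = counts directory (defaults to "counts")
check_counts() {
    sample=$1
    dir=${2:-counts}
    rc=0
    for suffix in read_counts.txt read_distribution.txt; do
        file="$dir/$sample.$suffix"
        if [ -s "$file" ]; then
            echo "ok       $file"
        else
            echo "missing  $file"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_counts S1 prints one line per expected file and returns a non-zero status if either file is missing or empty.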

Pipeline

Step 1: Clone the repo in your home

If you do not have git installed

conda install -c anaconda git

Then clone the entire repository to your local machine

git clone https://github.com/malumbreslab/RNAseq_umis.git
cd RNAseq_umis

Step 2: Update conda and create the environment

conda update --all
conda env create -f config_env.yml

Step 3: Activate the environment

conda activate rseq
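Once the environment is active, it can help to confirm that the tools the pipeline calls are on your PATH. The sketch below is a hypothetical helper, not part of the repository. Which tools config_env.yml actually installs is not listed here, so the names you pass to it are your own assumption.

```shell
# Hypothetical helper: print every tool name given as an argument that
# cannot be found on PATH; returns non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_tools STAR fastqc checks the STAR aligner used in Step 7 and the fastqc tool mentioned among the outputs. An empty result means both were found.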

Step 4: Create folders for data

Place your compressed fastq files in this folder (example: S1.fastq.gz)

mkdir data
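After copying your files into data, you can derive the sample names from the file names. The sketch below assumes every input follows the <sample>.fastq.gz pattern from the example above; list_samples is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print one sample name per line for every
# <sample>.fastq.gz file found in a directory (defaults to "data").
list_samples() {
    dir=${1:-data}
    for file in "$dir"/*.fastq.gz; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        basename "$file" .fastq.gz
    done
}
```

With S1.fastq.gz and S2.fastq.gz in data, list_samples data prints S1 and S2 on separate lines. This works as long as sample names contain no spaces.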

Step 5: Create folders for result files

mkdir counts genome fastqc

Step 6: Download genome reference and annotations files (hg38)

If you do not have wget installed

conda install -c anaconda wget

Then download the files from their URLs and decompress them

cd genome
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/latest/hg38.fa.gz
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
wget https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg38_RefSeq.bed.gz  
gzip -d hg38.fa.gz
gzip -d hg38.ncbiRefSeq.gtf.gz
gzip -d hg38_RefSeq.bed.gz
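Interrupted downloads can leave missing or empty files, so a short check before indexing may save time. The sketch below looks for the three decompressed file names produced by the gzip commands above; check_reference is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: verify that the three decompressed reference
# files from this step exist and are non-empty in a directory.
# $1 = directory to check (defaults to "genome")
check_reference() {
    dir=${1:-genome}
    rc=0
    for name in hg38.fa hg38.ncbiRefSeq.gtf hg38_RefSeq.bed; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

From inside the genome folder, run check_reference . and look for any "missing or empty" message.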

Step 7: Index genome reference (hg38)

cd ..
STAR --runThreadN 16 --runMode genomeGenerate --genomeDir genome --genomeFastaFiles genome/hg38.fa --sjdbGTFfile genome/hg38.ncbiRefSeq.gtf --sjdbOverhang 75
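The --sjdbOverhang value depends on read length: the STAR manual suggests read length minus 1, so 75 fits 76 bp reads. The sketch below is a hypothetical helper, not part of the repository, that derives this value from the longest read among the first 100000 reads of a gzipped fastq file. It assumes the raw read length is the relevant one; trimming or UMI extraction in the pipeline may shorten the reads that are actually aligned.

```shell
# Hypothetical helper: print (longest read length - 1) for a gzipped
# fastq file, the value the STAR manual suggests for --sjdbOverhang.
# Only the first 100000 reads (400000 lines) are scanned to keep it fast.
suggest_overhang() {
    gzip -dc "$1" | head -n 400000 |
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
             END { print max - 1 }'
}
```

For example, suggest_overhang data/S1.fastq.gz prints a single number that you can compare with the 75 used above.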

Step 8a: Run the pipeline locally

Run the script from your terminal, passing the names of the samples you want to analyze after the script name, separated by spaces.

bash rnaseq_pipeline_umis.sh <sample1> <sample2> <sample3>
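A mistyped sample name is easier to catch before the run starts. The sketch below assumes the script reads its inputs from data/<sample>.fastq.gz, as the example in Step 4 suggests; missing_samples is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print each sample name whose fastq file is absent.
# Assumes inputs are stored as <dir>/<sample>.fastq.gz (see Step 4).
# $1 = data directory, remaining arguments = sample names
missing_samples() {
    dir=$1
    shift
    rc=0
    for sample in "$@"; do
        if [ ! -f "$dir/$sample.fastq.gz" ]; then
            echo "$sample"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_samples data S1 S2 && bash rnaseq_pipeline_umis.sh S1 S2 starts the pipeline only if both input files are present.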

Step 8b: Run the pipeline on the CNIO cluster

Outputs:

log.txt is the output file
error.txt is the error file

Parameters:

--mem is memory
-t is the time limit (in minutes)
-J is job name
-c is number of cores
-o is name of output file
-e is name of error file
--wrap is the command which you want to run on the cluster

sbatch --mem=64G -t1440 -c 16 -J name -o log.txt -e error.txt --wrap "bash rnaseq_pipeline_umis.sh <sample1> <sample2> <sample3>"
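If you prefer one job per sample, with separate log files, you can generate the commands first and review them before submitting. The sketch below only prints the sbatch commands and submits nothing; print_sbatch_jobs is a hypothetical helper, not part of the repository. It assumes the pipeline can run for several samples at the same time in the same working directory, which is not confirmed here.

```shell
# Hypothetical helper: print one sbatch command per sample, reusing the
# options shown above with a per-sample job name and log files.
# Nothing is submitted; the commands are only printed for review.
print_sbatch_jobs() {
    for sample in "$@"; do
        printf 'sbatch --mem=64G -t1440 -c 16 -J %s -o %s.log.txt -e %s.error.txt --wrap "bash rnaseq_pipeline_umis.sh %s"\n' \
            "$sample" "$sample" "$sample" "$sample"
    done
}
```

For example, print_sbatch_jobs S1 S2 prints two commands, one per sample, which you can copy to the terminal once they look right.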

Recommendations

The minimum requirements are:

Memory (mem): 64 GB
Number of cores (c): 16
Time (t): 1440 min

For a fastq file with 10 million reads, the pipeline takes about 40 min, not counting the indexing of the reference genome, which can take around 2 hours.
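To get a rough idea of the run time for your own files, you can count the reads and scale the figure above. The sketch below assumes the time grows linearly with the number of reads, which is an extrapolation and not a measured property of the pipeline. Both function names are hypothetical and not part of the repository.

```shell
# Hypothetical helpers for a rough run-time estimate.

# count_reads: number of reads in a gzipped fastq file (4 lines per read).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# estimate_minutes: scale the "40 min per 10 million reads" figure linearly.
# Linear scaling is an assumption, not a measured property of the pipeline.
# $1 = number of reads
estimate_minutes() {
    echo $(( $1 * 40 / 10000000 ))
}
```

For example, estimate_minutes "$(count_reads data/S1.fastq.gz)" prints the estimated number of minutes for sample S1, rounded down.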

Once the reference genome is indexed, you do not need to index it again to align other samples against the same reference.
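To skip indexing when an index is already in place, you can test for the files STAR writes during genome generation. The sketch below checks for three of them (Genome, SA and SAindex) and is a hypothetical helper, not part of the repository. It does not verify that the index was built from the same reference or finished without errors.

```shell
# Hypothetical helper: succeed if a directory already holds the core
# files STAR writes when generating an index (Genome, SA, SAindex).
# $1 = index directory (defaults to "genome")
star_index_exists() {
    dir=${1:-genome}
    for name in Genome SA SAindex; do
        [ -s "$dir/$name" ] || return 1
    done
    return 0
}
```

For example, star_index_exists genome || echo "index not found, run Step 7" tells you whether Step 7 still needs to be run.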
