Pipeline for RNAseq analysis in Bash and scripts for microarray gene expression analysis in R

BioSystemsUM/bRNAsPipe

Pipeline for RNAseq and Microarray analysis

Pipeline and scripts used for raw microarray and RNAseq data analysis in "Tânia Barata, Vítor Vieira, Rúben Rodrigues, Ricardo Pires das Neves, Miguel Rocha, Reconstruction of tissue-specific genome-scale metabolic models for human cancer stem cells, Computers in Biology and Medicine, Volume 142, 2022, 105177, ISSN 0010-4825, https://doi.org/10.1016/j.compbiomed.2021.105177"

RNAseq

The RNAseq pipeline is written in Bash and uses Docker containers. Requirements: a Linux system and Podman. Ensembl genome reference and Ensembl annotation files are recommended.
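A minimal preflight check for the two stated requirements can be sketched in Bash. The helper names is_linux, require_cmds and preflight are hypothetical and not part of bRNAsPipe. The check only confirms that a podman executable is on PATH. It does not confirm that Podman can pull or run the pipeline's containers.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of bRNAsPipe) for the stated
# requirements: a Linux system and Podman.

# Succeeds only when the kernel reported by uname is Linux.
is_linux() {
  [ "$(uname -s)" = "Linux" ]
}

# Succeeds only when every command given as an argument is on PATH;
# each missing command is reported on stderr.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Runs both checks and returns non-zero if either one fails.
preflight() {
  local status=0
  if ! is_linux; then
    echo "expected a Linux system, found: $(uname -s)" >&2
    status=1
  fi
  require_cmds podman || status=1
  return "$status"
}
```

For example, running preflight && echo ready before ./Dirs.sh gives an early, readable failure instead of an error from inside a container step.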

  1. Fill in Studies_RNAseq.txt with the information for your studies. Studies_RNAseq.txt is a tab-delimited file in the data directory with the following columns:
  • Study: the study identifier
  • SampleId: the sample identifier
  • Reads: '1' or '2' to distinguish between forward and reverse reads; single-end studies use 'Unpaired'
  • Link: the link to the fastq.gz file
  • All other columns should be filled with 'NA' when there are no values.
  2. Move to the RNAseq scripts folder: cd scr/bash
  3. Edit the base folder path and the URLs of the genome and annotation files in scr/bash/Edit
  4. Download the genome reference and annotation files with: ./Dirs.sh
  5. Download the files of a study with: ./DownloadFiles.sh <Study>
  6. Check whether the downloads have finished with: ps -e | grep <jobId>. To get the job IDs (process IDs) of the downloads, cd data/<Study>/rawData and run: cat PIDs
  7. After all downloads finish, evaluate raw read quality with: ./GetQCfiles.sh <Study>
  8. Then manually check the FastQC results, decide which contaminants or overrepresented sequences should be removed from each sample, and add them to a file named Seq2RemoveFile in folder data//trimmedData so that Trimmomatic removes those sequences. If no such file is provided, Trimmomatic runs without excluding them. Example of Seq2RemoveFile content:

seqname ACTTTTTTTTTTTTTTTTTTT

  9. To set Trimmomatic parameters for specific samples, for example when the reads of a study need additional trimming, add a file named TrimParams to directory data/trimmedData and change the parameters for each sample there. Otherwise, default parameters are used.
  10. Run the rest of the analysis with: ./RNAseqAnalysis.sh <Study>. Results are written to folders inside directory data/.
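Before any downloads start, the tab-delimited Studies_RNAseq.txt can be sanity-checked against the column rules listed above. The following is a hedged sketch and is not part of the pipeline. The helper name validate_studies is hypothetical. It assumes the first row is a header containing exactly the names Study, SampleId, Reads and Link. Other columns are allowed and are only checked for empty cells, which should hold NA instead.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for data/Studies_RNAseq.txt (not part of bRNAsPipe).
# Assumes a header row naming the columns Study, SampleId, Reads and Link.
# Prints one message per problem and returns non-zero if any were found.
validate_studies() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    # Tolerate Windows line endings (e.g. a file saved from a spreadsheet).
    { sub(/\r$/, "") }
    # Ignore blank lines after the header.
    NR > 1 && NF == 0 { next }
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("Study SampleId Reads Link", req, " ")
      for (k = 1; k <= n; k++)
        if (!(req[k] in col)) { printf "missing column: %s\n", req[k]; bad = 1 }
      if (bad) exit
      ncol = NF
      next
    }
    {
      # Empty cells should contain NA instead.
      for (i = 1; i <= ncol; i++)
        if ($i == "") { printf "line %d: column %d is empty (use NA)\n", NR, i; bad = 1 }
      reads = $(col["Reads"])
      if (reads != "1" && reads != "2" && reads != "Unpaired") {
        printf "line %d: Reads must be 1, 2 or Unpaired, got \"%s\"\n", NR, reads
        bad = 1
      }
      if ($(col["Link"]) !~ /\.fastq\.gz$/) {
        printf "line %d: Link does not point to a .fastq.gz file\n", NR
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

Columns are looked up by header name rather than by position. As a result, the check keeps working if the real file orders its columns differently or adds extra ones.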
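The manual download check (ps -e | grep <jobId>, with the IDs read from the PIDs file) could be automated with a small polling loop. This is a sketch under stated assumptions. It assumes the PIDs file lists one process ID per line, which the README does not specify. The function names downloads_running and wait_for_downloads are hypothetical. It uses kill -0, which sends no signal and only tests whether a process still exists. This works for processes started by the same user.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of bRNAsPipe) for waiting on the downloads
# started by DownloadFiles.sh. Assumes the PIDs file has one process ID per line.

# Returns 0 while at least one PID listed in the file is still running.
downloads_running() {
  local pid
  # The "|| [ -n ... ]" part also reads a last line without a trailing newline.
  while read -r pid || [ -n "$pid" ]; do
    [ -n "$pid" ] || continue
    # Signal 0 delivers nothing; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
  done < "$1"
  return 1
}

# Polls every INTERVAL seconds (default 60) until no listed download is running.
wait_for_downloads() {
  local pidfile="$1" interval="${2:-60}"
  while downloads_running "$pidfile"; do
    sleep "$interval"
  done
}
```

In use, this could run between ./DownloadFiles.sh <Study> and ./GetQCfiles.sh <Study>, pointed at the PIDs file in data/<Study>/rawData, so that quality control only starts once every download has exited.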
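When deciding what to put in Seq2RemoveFile, you can pull the candidate sequences from FastQC's plain-text report instead of reading them off the HTML page. FastQC writes a fastqc_data.txt file inside each sample's _fastqc folder or .zip archive. In that file, the "Overrepresented sequences" module lists one sequence per tab-separated row. The sketch below uses the hypothetical helper name list_overrepresented and only prints those rows. The README does not state where GetQCfiles.sh stores the FastQC output, so the report path is passed in. Choosing which sequences to remove remains a manual decision.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bRNAsPipe): print the rows of the
# "Overrepresented sequences" module from one or more FastQC fastqc_data.txt
# reports as: sequence<TAB>count<TAB>percentage<TAB>possible source.
list_overrepresented() {
  awk -F'\t' '
    # Module sections start with ">>Module name" and end with ">>END_MODULE".
    /^>>Overrepresented sequences/ { inmod = 1; next }
    inmod && /^>>END_MODULE/ { inmod = 0; next }
    # Skip the "#Sequence Count Percentage Possible Source" header line.
    inmod && !/^#/ { print $1 "\t" $2 "\t" $3 "\t" $4 }
  ' "$@"
}
```

For example, list_overrepresented path/to/sample_fastqc/fastqc_data.txt prints nothing when the module found no overrepresented sequences. Otherwise it prints one candidate per line for review.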

Microarray

The microarray scripts run on Windows with R. Study information is stored in Studies_Microarrays.xlsx. Run the script scr/R/MicroarrayNormalize.R; its paths are hard-coded at the beginning of the script, so adjust them before running.
