**01.14.2020** <br>
**RNASeq Processing (fq files to counts table)** <br>
All files were transferred to the local computing cluster O2. <br>
The working directory on the cluster is **/n/scratch2/ajit**; replace it with your own location to reuse this code. <br>

All files were stored inside a folder named **jain_rnaseq** <br>
Full location: /n/scratch2/ajit/jain_rnaseq/raw_files <br>

**Step-1: Unzip all the files**

```
cd /n/scratch2/ajit/jain_rnaseq/raw_files
# Decompress every .gz file in place, including those inside subfolders
find . -name "*.gz" | while read -r filename; do gunzip "$filename"; done
```
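Before moving on, it can help to confirm that decompression actually finished. The sketch below defines a hypothetical helper, `count_by_suffix` (not part of any tool), that counts regular files with a given suffix anywhere under a directory. After Step-1, the `.gz` count should be zero.

```shell
# Hypothetical helper: count regular files under a directory whose names end in a suffix.
count_by_suffix() {
    dir="$1"
    suffix="$2"
    # wc -l pads its output on some systems, so strip spaces before printing
    find "$dir" -type f -name "*${suffix}" | wc -l | tr -d ' '
}
```

For example, running `count_by_suffix . .gz` from inside `raw_files` should print `0`, and `count_by_suffix . .fq` prints how many FASTQ files were produced.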

**Step-2: Move all files out of their subfolders into raw_files and remove the empty folders**

```
# Move .fq files up into raw_files (-mindepth 2 skips files that are already there)
find . -mindepth 2 -name '*.fq' -exec mv {} . \;
find . -depth -type d -empty -exec rmdir {} \;
```
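Moving every `.fq` file into one directory silently overwrites files that share a name across subfolders. A minimal pre-flight check, run from `raw_files` before Step-2, is sketched below; `find_duplicate_fq_names` is a hypothetical helper, and an empty result means the move is safe.

```shell
# Hypothetical pre-flight check for Step-2: print .fq basenames that occur more than once
# in subfolders, since flattening them into one directory would overwrite earlier copies.
find_duplicate_fq_names() {
    dir="$1"
    find "$dir" -mindepth 2 -type f -name '*.fq' -exec basename {} \; | sort | uniq -d
}
```

Usage: `find_duplicate_fq_names .` from inside `raw_files`.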

**Step-3: Create a sample description file, then edit it to add metadata about the samples**

```
cd ..   # back to /n/scratch2/ajit/jain_rnaseq, since Step-1 left us inside raw_files
(echo 'samplename,description'; for f in raw_files/*fq*; do readlink -f "$f" | perl -pe 's/(.*?_(S[0-9]+)_.*)/\1,\2/'; done) > alignment.csv
```

**Step-4: Download the reference genome**

```
mkdir reference
cd reference
wget ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz
gunzip *
```
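A quick way to confirm the downloads decompressed into usable files is to count FASTA records (header lines start with `>`) and GTF gene entries (column 3 of a GTF holds the feature type, such as `gene` or `transcript`). The helper name `summarise_reference` is hypothetical; Ensembl GTF headers begin with `#`, so those lines are skipped.

```shell
# Hypothetical sanity check for Step-4: summarise a transcriptome FASTA and a GTF.
summarise_reference() {
    fasta="$1"
    gtf="$2"
    # Each FASTA record begins with a '>' header line
    n_tx=$(grep -c '^>' "$fasta")
    # Count non-comment GTF lines whose tab-separated feature column is "gene"
    n_genes=$(awk -F'\t' '!/^#/ && $3 == "gene"' "$gtf" | wc -l | tr -d ' ')
    echo "transcripts=${n_tx} genes=${n_genes}"
}
```

Usage, from inside `reference`: `summarise_reference Homo_sapiens.GRCh38.cdna.all.fa Homo_sapiens.GRCh38.96.gtf`.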

**Step-5: Prepare the O2.yaml file**

```
cd ..
vim O2.yaml   # create the file with the contents below

details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      transcriptome_fasta: /n/scratch2/ajit/jain_rnaseq/reference/Homo_sapiens.GRCh38.cdna.all.fa
      transcriptome_gtf: /n/scratch2/ajit/jain_rnaseq/reference/Homo_sapiens.GRCh38.96.gtf
      aligner: hisat2
      strandedness: unstranded
      tools_on: [bcbiornaseq]
      bcbiornaseq:
          organism: homo sapiens
upload:
  dir: ../final
```
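Because bcbio only fails on a bad reference path after the job has started, it can save a queue wait to check the paths in `O2.yaml` first. The sketch below is a hypothetical helper, `check_yaml_paths`, which assumes simple `key: value` lines as in the file above (no quotes, no inline comments, no spaces in paths). It returns a non-zero status if any listed file is missing.

```shell
# Hypothetical pre-flight check for Step-5: confirm that each transcriptome_fasta /
# transcriptome_gtf path in a bcbio YAML file points to an existing file.
check_yaml_paths() {
    yaml="$1"
    status=0
    # Field 1 is the key, field 2 the path (awk splits on runs of whitespace)
    paths=$(awk '$1 == "transcriptome_fasta:" || $1 == "transcriptome_gtf:" { print $2 }' "$yaml")
    for path in $paths; do
        if [ -f "$path" ]; then
            echo "OK      $path"
        else
            echo "MISSING $path"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_yaml_paths O2.yaml` from `/n/scratch2/ajit/jain_rnaseq`.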

**Step-6: Initiate bcbio**

```
module load bcbio/latest
unset PYTHONPATH
bcbio_nextgen.py -w template O2.yaml alignment.csv raw_files/
```
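The template step names its output after the sample CSV, which is why Steps 7 and 8 refer to `alignment/work` and `../config/alignment.yaml`. A hypothetical check, `check_template_output`, run from `/n/scratch2/ajit/jain_rnaseq`, confirms that this layout exists before anything is submitted.

```shell
# Hypothetical check for Step-6: confirm the project layout produced by the template.
check_template_output() {
    # $1 = project name, i.e. the sample CSV's file name without ".csv" (here "alignment")
    project="$1"
    status=0
    for p in "$project/config/$project.yaml" "$project/work"; do
        if [ -e "$p" ]; then
            echo "found   $p"
        else
            echo "missing $p"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_template_output alignment`.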

**Step-7: Create the submission script for O2**

```
vim submit_bcbio.sh

#!/bin/sh
#SBATCH -p medium
#SBATCH -J bcbio_O2_jain
#SBATCH -o run.o
#SBATCH -e run.e
#SBATCH -t 3-00:00
#SBATCH --cpus-per-task=3
#SBATCH --mem=100G
#SBATCH --mail-type=END         # Email notification type: BEGIN, END, FAIL, or ALL
#SBATCH --mail-user=ajitj_nirmal@dfci.harvard.edu   # Email to which notifications will be sent

export PATH=/n/app/bcbio/tools/bin:$PATH
bcbio_nextgen.py ../config/alignment.yaml \
    -n 24 -t ipython -s slurm -q medium -r t=3-00:00 --timeout 2000
```
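Both `#SBATCH -t 3-00:00` and the `-r t=3-00:00` resource passed to bcbio use SLURM's `days-hours:minutes` time format, so each requests 3 days. The minimal sketch below converts that format to minutes, which makes it easier to compare against the partition's wall-time limit (see the O2 documentation for `medium`). The helper name is hypothetical, and it handles only the `D-HH:MM` form used here.

```shell
# Hypothetical helper: convert a SLURM "D-HH:MM" time limit to minutes.
# Any other form (e.g. "HH:MM:SS") is rejected with a non-zero exit status.
slurm_time_to_minutes() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        /^[0-9]+-[0-9]+:[0-9]+$/ { print $1 * 1440 + $2 * 60 + $3; exit 0 }
        { exit 1 }'
}
```

For example, `slurm_time_to_minutes 3-00:00` prints `4320`.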

**Step-8: Submit job to O2 for processing**

```
cp submit_bcbio.sh alignment/work
cd alignment/work
sbatch submit_bcbio.sh
```
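To follow the run, it helps to capture the job ID at submission time. The sketch below assumes the standard `sbatch` confirmation line `Submitted batch job <id>`. The helper `job_id_from_sbatch` is hypothetical, and `sbatch --parsable`, which prints the ID directly, is an alternative.

```shell
# Hypothetical helper: extract the numeric job ID from sbatch's confirmation message.
# Prints nothing if the input is not a "Submitted batch job <id>" line (e.g. an error).
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Usage on O2 (not run here):
#   jobid=$(sbatch submit_bcbio.sh | job_id_from_sbatch)
#   squeue -j "$jobid"   # check that the master job is pending or running
#   tail -f run.e        # stderr file named by "#SBATCH -e run.e"
```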