# Mapping Reads to a Reference Genome

## Shell Variables

In [None]:
# Source the config script
source pe_analysis_config.sh
mkdir -p ${STAR_OUT}

ls $CUROUT

## Mapping with STAR


In [None]:
STAR \
    --runMode alignReads \
    --twopassMode None \
    --genomeDir $GENOME_DIR \
    --readFilesIn $TRIMMED/10_H_S8_L002_R1_001.trim.fastq.gz $TRIMMED/10_H_S8_L002_R2_001.trim.fastq.gz \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix ${STAR_OUT}/10_H_S8_L002_ \
    --quantMode GeneCounts \
    --outSAMtype BAM Unsorted \
    --outSAMunmapped Within \
    --runThreadN 2

We will start with these parameters, but STAR has an extensive list of command-line options, detailed in the [STAR Manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf). It is a good idea to read through the manual and try to understand all of them. We will discuss some more later.

* --runMode alignReads : map reads 
* --twopassMode : run one pass or two? We use `None`, which means a single pass. If two-pass mode is on, STAR tries to discover novel junctions in a first pass, then reruns mapping with these added to the annotation
* --genomeDir : directory containing the genome index
* --readFilesIn : input FASTQ
* --readFilesCommand gunzip -c : use "gunzip -c" to uncompress FASTQ on-the-fly, since it is gzipped
* --outFileNamePrefix : prefix (and path) to use for all output files
* --quantMode GeneCounts : output a table of read counts per gene
* --outSAMtype BAM Unsorted : output an unsorted BAM file
* --outSAMunmapped Within : include unmapped reads in the BAM file
* --runThreadN : number of threads (cores) STAR should use.  I am using 2 so we don't have to wait too long for this to run during class.  It is OK to use multiple cores, but first check that the server is not busy, and even then use a reasonable number of cores.  Abusing multi-threading is inconsiderate of other users and could crash the server.
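
As a rough sketch of the "check before you use multiple cores" advice, the cell below looks at how many cores the machine has and how busy it currently is, then picks a thread count. The `THREADS` and `TOTAL_CORES` variable names and the "at most a quarter of the cores" rule are assumptions made for illustration, not a STAR recommendation; adjust them to your server's etiquette.

```bash
# Total cores on this machine; fall back to 1 if nproc is unavailable
TOTAL_CORES=$(nproc 2>/dev/null || echo 1)

# Show the current load averages so you can judge how busy the server is
uptime 2>/dev/null || true

# Hypothetical policy (an assumption, not a STAR recommendation):
# use at most a quarter of the cores, and never fewer than one
THREADS=$(( TOTAL_CORES / 4 ))
if [ "$THREADS" -lt 1 ]; then
    THREADS=1
fi
echo "Using ${THREADS} of ${TOTAL_CORES} cores"
```

The result could then be passed to STAR as `--runThreadN $THREADS` in place of the hard-coded value.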


### STAR Output
So what happened? Let's take a look . . .

In [None]:
ls ${STAR_OUT}

In [None]:
head ${STAR_OUT}/10_H_S8_L002*

STAR generates several files for each run (here, one sample's pair of FASTQ files), each named with the output prefix:
* Aligned.out.bam : the alignments, as an unsorted BAM file (because we used `--outSAMtype BAM Unsorted`)
* Log.out : lots of details of the run, including all parameters used
* Log.final.out : important summary statistics
* ReadsPerGene.out.tab : count table, the main thing we are interested in
* SJ.out.tab : all splice junctions, including ones from the GTF and novel junctions discovered by STAR
* Log.progress.out : run statistics updated during the run, not so interesting at the end
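
Since ReadsPerGene.out.tab is the file we care most about, here is a minimal sketch for getting a feel for its layout. Per the STAR manual, the first four rows are summary counts (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), and the columns are: gene ID, counts for unstranded data, counts for reads on the same strand as the gene, and counts for reads on the opposite strand. The helper name `sum_star_counts` is made up for this example.

```bash
# Sum each of the three count columns, skipping the four summary rows
# at the top of the file (N_unmapped, N_multimapping, N_noFeature, N_ambiguous).
sum_star_counts() {
    # $1 = path to a ReadsPerGene.out.tab file
    awk -F '\t' 'NR > 4 { u += $2; s1 += $3; s2 += $4 }
        END { printf "unstranded=%.0f strand1=%.0f strand2=%.0f\n", u, s1, s2 }' "$1"
}

COUNTS="${STAR_OUT}/10_H_S8_L002_ReadsPerGene.out.tab"
if [ -f "$COUNTS" ]; then
    head -n 4 "$COUNTS"          # the four summary rows
    sum_star_counts "$COUNTS"    # per-column totals over all genes
fi
```

Comparing the two stranded totals is one common way to guess whether a library is stranded: if one is much larger than the other, the library is probably stranded in that orientation.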

Let's take a closer look at Log.final.out

In [None]:
cat ${STAR_OUT}/10_H_S8_L002_Log.final.out
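
Log.final.out is laid out as `label | value` lines, so individual statistics can be pulled out with standard shell tools. The cell below is a small sketch of that idea; the function name `star_stat` is hypothetical, and the labels are assumed to match the ones printed by your STAR version.

```bash
# Print the value for one statistic from a STAR Log.final.out file.
star_stat() {
    # $1 = path to Log.final.out
    # $2 = label text exactly as it appears before the "|"
    grep -F "$2 |" "$1" | cut -d '|' -f 2 | tr -d ' \t'
}

LOG="${STAR_OUT}/10_H_S8_L002_Log.final.out"
if [ -f "$LOG" ]; then
    star_stat "$LOG" "Number of input reads"
    star_stat "$LOG" "Uniquely mapped reads %"
    star_stat "$LOG" "% of reads mapped to multiple loci"
fi
```

This is handy when you want to compare a single statistic across many samples without reading each log in full.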

### MultiQC
MultiQC also works with STAR reports, so let's try it!

In [None]:
multiqc ${STAR_OUT} --outdir ${STAR_OUT}

Once multiqc is done running, we can view the results by finding the output in the Jupyter file browser. It should be in a file named `multiqc_report.html` in:

In [None]:
echo ${STAR_OUT}