# Working with Loops 

## Shell Variables
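Load the shared shell variables (such as `$RAW_FASTQS`, `$TRIMMED`, and `$STAR_OUT`) into this notebook, and create the output directories if they do not already exist.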

    source bioinf_intro_config.sh
    mkdir -p "$TRIMMED" "$STAR_OUT"
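
Every later cell depends on these variables, so it can help to stop early if one of them is empty. A minimal sketch, assuming the config file is meant to define `TRIMMED` and `STAR_OUT`. The temporary-directory values below are hypothetical stand-ins; in the notebook you would rely on `source bioinf_intro_config.sh` instead.

```shell
# Hypothetical stand-in values; in the notebook, bioinf_intro_config.sh sets these
WORKDIR=$(mktemp -d)
TRIMMED="$WORKDIR/trimmed_fastqs"
STAR_OUT="$WORKDIR/star_out"

# ${VAR:?message} stops the script with "message" if VAR is unset or empty
: "${TRIMMED:?TRIMMED is not set; source the config file first}"
: "${STAR_OUT:?STAR_OUT is not set; source the config file first}"

# -p creates missing parent directories and is not an error if the directory
# already exists, so re-running this cell is harmless
mkdir -p "$TRIMMED" "$STAR_OUT"
mkdir -p "$TRIMMED" "$STAR_OUT"   # second run: no error
echo "ready: $TRIMMED $STAR_OUT"
```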

## A Brief journey into `for` loops
`for` loops take our use of the `$FASTQ` variable to the next level! A loop is like teaching a child to set the table: "FOR each place at the table, put a plate, a napkin, a fork, a spoon, and a knife." At the shell, you could phrase it like this (each `echo` stands in for actually placing an item):

    for PERSON in Alice Bob Carol Dave Eve
    do
        echo "put plate at ${PERSON}'s place"
        echo "put napkin at ${PERSON}'s place"
        echo "put fork at ${PERSON}'s place"
        echo "put spoon at ${PERSON}'s place"
        echo "put knife at ${PERSON}'s place"
    done

Here is a real example, this time using the `FASTQ` variable:

    for FASTQ in A B C D E F
    do
        echo "______${FASTQ}________"
    done

Output:

    ______A________
    ______B________
    ______C________
    ______D________
    ______E________
    ______F________


The `for` loop in Bash works like loops in other programming languages, although the syntax differs: on each pass through the loop, the variable (`FASTQ` here) is set to the next word in the list, and the loop body runs once with that value. The `do` and `done` keywords are essential: `do` must come before the "loop body" (the commands to be repeated) and `done` must come after it.
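
The word list does not have to be typed directly into the `for` line. A minimal sketch, assuming a whitespace-separated list of sample names stored in a hypothetical `SAMPLES` variable: leaving `$SAMPLES` unquoted lets the shell split it into words, and the loop body runs once per word.

```shell
# Hypothetical sample list; the names only need to be separated by whitespace
SAMPLES="sample_A sample_B sample_C"

N_RUNS=0
for FASTQ in $SAMPLES    # unquoted on purpose: the shell splits it into three words
do
    echo "RUNNING FASTQ: ${FASTQ}"
    N_RUNS=$((N_RUNS + 1))
done
echo "Loop body ran ${N_RUNS} times"
```

If you quote it (`"$SAMPLES"`), the whole string becomes a single word and the body runs only once. After the loop ends, `FASTQ` keeps the last value it was given (`sample_C`).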

So let's try something almost useful:

    for FASTQ in 21_2019_P_M1_S21_L002_R1
    do
        echo "RUNNING FASTQ: ${FASTQ}"
    done

Output:

    RUNNING FASTQ: 21_2019_P_M1_S21_L002_R1
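
Typing every sample name by hand does not scale. A common alternative is to let a glob pattern list the input files and strip the fixed suffix to recover each `FASTQ` name. A minimal sketch, assuming raw files follow the `<name>_001.fastq.gz` pattern used in this notebook; the temporary directory and empty files are stand-ins for `$RAW_FASTQS`, and `sample_B` is a hypothetical name.

```shell
# Stand-in for $RAW_FASTQS: a temporary directory with empty files named like the real ones
RAW_DIR=$(mktemp -d)
touch "$RAW_DIR/21_2019_P_M1_S21_L002_R1_001.fastq.gz" \
      "$RAW_DIR/sample_B_001.fastq.gz"

FOUND=""
for FQ_PATH in "$RAW_DIR"/*_001.fastq.gz
do
    # If nothing matches, the pattern itself is left in place; skip it
    [ -e "$FQ_PATH" ] || continue
    # basename drops the directory and the given suffix
    FASTQ=$(basename "$FQ_PATH" _001.fastq.gz)
    echo "RUNNING FASTQ: ${FASTQ}"
    FOUND="$FOUND $FASTQ"
done
```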


## Now for the real thing . . .
### Let's run the pipeline in a loop:
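Notice that the `for` statement now assigns each sample name to the `FASTQ` variable (written without `$` when assigning, and as `$FASTQ` or `${FASTQ}` when reading it), so the trimming and mapping steps below run once per name in the list.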
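
The pipeline builds file names such as `${FASTQ}_001.fastq.gz`. The braces matter: underscores are legal in variable names, so without them the shell reads `$FASTQ_001` as a different (and here unset) variable. A small demonstration with the sample name used in this notebook:

```shell
FASTQ=21_2019_P_M1_S21_L002_R1

# Without braces the shell looks up a variable named FASTQ_001, which is unset (empty)
WRONG="$FASTQ_001.fastq.gz"
# With braces the variable name ends at the closing brace
RIGHT="${FASTQ}_001.fastq.gz"

echo "without braces: $WRONG"
echo "with braces:    $RIGHT"
```

The first `echo` prints only `.fastq.gz`, which is why the commands below write `${FASTQ}_001` and `${FASTQ}_`.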

    for FASTQ in 21_2019_P_M1_S21_L002_R1
    do
        # Trim adapters and low-quality (Phred < 20) bases
        echo "---------------- TRIMMING: $FASTQ ----------------"
        fastq-mcf \
            "$MYINFO/neb_e7600_adapters.fasta" \
            "$RAW_FASTQS/${FASTQ}_001.fastq.gz" \
            -q 20 -x 0.5 \
            -o "$TRIMMED/${FASTQ}_001.trim.fastq.gz"

        # Map the trimmed reads with STAR and count reads per gene
        # (--outSAMtype None: no alignment file is written)
        echo "---------------- MAPPING: $FASTQ ----------------"
        STAR \
            --runMode alignReads \
            --twopassMode None \
            --genomeDir "$GENOME_DIR" \
            --readFilesIn "$TRIMMED/${FASTQ}_001.trim.fastq.gz" \
            --readFilesCommand gunzip -c \
            --outFileNamePrefix "${STAR_OUT}/${FASTQ}_" \
            --quantMode GeneCounts \
            --outSAMtype None \
            --runThreadN 2
    done

Output:

    ---------------- TRIMMING: 21_2019_P_M1_S21_L002_R1 ----------------
    Command Line: /home/jovyan/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts_2019_data/hts2019_pilot_rawdata/21_2019_P_M1_S21_L002_R1_001.fastq.gz -q 20 -x 0.5 -o /home/jovyan/work/scratch/bioinf_intro/trimmed_fastqs/21_2019_P_M1_S21_L002_R1_001.trim.fastq.gz
    Scale used: 2.2
    Phred: 33
    Threshold used: 751 out of 300000
    Adapter Adapter (AGATCGGAAGAGCACACGTCTGAACTCCAGTCA): counted 2515 at the 'end' of '/data/hts_2019_data/hts2019_pilot_rawdata/21_2019_P_M1_S21_L002_R1_001.fastq.gz', clip set to 6
    Files: 1
    Total reads: 2437108
    Too short after clip: 1347
    Clipped 'end' reads: Count: 44977, Mean: 15.55, Sd: 8.27
    Trimmed 288960 reads by an average of 1.70 bases on quality < 20
    ---------------- MAPPING: 21_2019_P_M1_S21_L002_R1 ----------------
    Jun 26 15:35:08 ..... started STAR run
    Jun 26 15:35:08 ..... loading genome
    Jun 26 15:35:09 ..... started mapping
    Jun 26 15:36:32 ..... finished successfully
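
As written, the loop moves on to `STAR` even if `fastq-mcf` fails, and nothing records which sample broke. Command-line tools generally report failure with a non-zero exit status, which `if` and `&&` can test. A minimal sketch of this idea; `trim_step` and `map_step` are hypothetical placeholders for the real commands, and the failure of `sample_B` is simulated.

```shell
# Hypothetical stand-ins for the fastq-mcf and STAR commands. Each returns an
# exit status like a real command (0 = success); map_step "fails" for sample_B.
trim_step() { echo "trimming $1"; }
map_step()  { echo "mapping $1"; [ "$1" != "sample_B" ]; }

FAILED=""
for FASTQ in sample_A sample_B sample_C
do
    # && runs map_step only if trim_step succeeded; the if catches a failure in either step
    if trim_step "$FASTQ" && map_step "$FASTQ"; then
        echo "done: $FASTQ"
    else
        echo "FAILED: $FASTQ" >&2
        FAILED="$FAILED $FASTQ"
    fi
done
echo "failed samples:${FAILED:- none}"
```

A failed sample does not stop the loop; the remaining samples are still processed, and the failures are listed at the end.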


### And let's check the result

    ls "${STAR_OUT}"

Output:

    21_2019_P_M1_S21_L001_R1_short_introns_Aligned.sortedByCoord.out.bam
    21_2019_P_M1_S21_L001_R1_short_introns_Aligned.sortedByCoord.out.bam.bai
    21_2019_P_M1_S21_L001_R1_short_introns_Log.final.out
    21_2019_P_M1_S21_L001_R1_short_introns_Log.out
    21_2019_P_M1_S21_L001_R1_short_introns_Log.progress.out
    21_2019_P_M1_S21_L001_R1_short_introns_ReadsPerGene.out.tab
    21_2019_P_M1_S21_L001_R1_short_introns_SJ.out.tab
    21_2019_P_M1_S21_L002_R1_Aligned.out.bam
    21_2019_P_M1_S21_L002_R1_Log.final.out
    21_2019_P_M1_S21_L002_R1_Log.out
    21_2019_P_M1_S21_L002_R1_Log.progress.out
    21_2019_P_M1_S21_L002_R1_ReadsPerGene.out.tab
    21_2019_P_M1_S21_L002_R1_short_introns_Aligned.sortedByCoord.out.bam
    21_2019_P_M1_S21_L002_R1_short_introns_Aligned.sortedByCoord.out.bam.bai
    21_2019_P_M1_S21_L002_R1_short_introns_Log.final.out
    21_2019_P_M1_S21_L002_R1_short_introns_Log.out
    21_2019_P_M1_S21_L002_R1_short_introns_Log.progress.out
    21_2019_P_M1_S21_L002_R1_short_introns_ReadsPerGene.out.tab
    21_2019_P_M1_S21_L002_R1_short_in
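
With several samples, reading `ls` output by eye gets error-prone, especially when the directory also holds files from other runs (the `short_introns` files above, for example). A minimal sketch that checks each expected count file instead, assuming you reuse the sample list given to the pipeline loop; the directory and sample names are hypothetical stand-ins.

```shell
# Stand-in for $STAR_OUT: count files exist for two of three hypothetical samples
STAR_OUT=$(mktemp -d)
printf 'N_unmapped\t0\t0\t0\n' > "$STAR_OUT/sample_A_ReadsPerGene.out.tab"
printf 'N_unmapped\t0\t0\t0\n' > "$STAR_OUT/sample_B_ReadsPerGene.out.tab"

MISSING=0
for FASTQ in sample_A sample_B sample_C
do
    COUNTS="$STAR_OUT/${FASTQ}_ReadsPerGene.out.tab"
    # -s is true only if the file exists and is not empty
    if [ -s "$COUNTS" ]; then
        echo "ok:      $FASTQ"
    else
        echo "MISSING: $FASTQ"
        MISSING=$((MISSING + 1))
    fi
done
echo "$MISSING sample(s) missing counts"
```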

    head "${STAR_OUT}/21_2019_P_M1_S21_L002_R1_ReadsPerGene.out.tab"

Output:

    N_unmapped	46060	46060	46060
    N_multimapping	35006	35006	35006
    N_noFeature	12466	2145291	18783
    N_ambiguous	203327	820	316
    CNAG_04548	0	0	0
    CNAG_07303	0	0	0
    CNAG_07304	6	0	6
    CNAG_00001	0	0	0
    CNAG_07305	1	0	1
    CNAG_00002	51	0	51
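
`ReadsPerGene.out.tab` starts with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) followed by one row per gene. According to the STAR manual, column 2 holds counts for unstranded data, while columns 3 and 4 hold counts for the first and second read strand aligned with the RNA, respectively. Which column to use therefore depends on the library prep. A minimal sketch that drops the summary rows and keeps one count column per sample; the fixture reuses a few rows from the output above, and `COL=4` is an assumption for illustration, not a recommendation for this dataset.

```shell
# Fixture: a few rows copied from the ReadsPerGene.out.tab shown above
WORKDIR=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
    N_unmapped 46060 46060 46060 \
    N_multimapping 35006 35006 35006 \
    N_noFeature 12466 2145291 18783 \
    N_ambiguous 203327 820 316 \
    CNAG_07304 6 0 6 \
    CNAG_00002 51 0 51 \
    > "$WORKDIR/21_2019_P_M1_S21_L002_R1_ReadsPerGene.out.tab"

COL=4   # assumption: 2 = unstranded, 3 = first read strand, 4 = second read strand

for COUNTS in "$WORKDIR"/*_ReadsPerGene.out.tab
do
    FASTQ=$(basename "$COUNTS" _ReadsPerGene.out.tab)
    # tail -n +5 starts at line 5, skipping the four N_ summary rows;
    # cut keeps the gene ID (column 1) and the chosen count column
    tail -n +5 "$COUNTS" | cut -f 1,"$COL" > "$WORKDIR/${FASTQ}.counts.tsv"
    echo "${FASTQ}: $(wc -l < "$WORKDIR/${FASTQ}.counts.tsv") genes"
done
```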
