# Task 1: Bulk RNA-seq data analysis
## Notebook 1: Quality control (QC) and mapping


Comparative & Regulatory Genomics - [I0U29a] | Task 1 - Bulk RNA-seq data analysis | Antoine Ruzette <b> r0829308 </b> | 19.12.2021

Based on the RNA-seq map count jupyter notebook from Prof. Stein Aerts. 


### Data loading

First, we create a directory named `Assignment` to work in: 

In [8]:
mkdir -p /mnt/storage/$USER/jupyternotebooks/Assignment
cd /mnt/storage/$USER/jupyternotebooks/Assignment

We download the data with `fastq-dump` from the SRA Toolkit. Before downloading, we disable the local cache with `vdb-config`:

In [9]:
vdb-config -s /repository/user/cache-disabled=true

Six data sets were available. For simplicity, four of the six were randomly selected: control 1 (SRR13516821), control 3 (SRR13516823), treatment 1 (SRR13516818) and treatment 2 (SRR13516819).

#### SRR13516821 - Control 1

In [25]:
fastq-dump --split-files SRR13516821

Read 36112182 spots for SRR13516821
Written 36112182 spots for SRR13516821


#### SRR13516823 - Control 3

In [37]:
fastq-dump --split-files SRR13516823

Read 38608574 spots for SRR13516823
Written 38608574 spots for SRR13516823


#### SRR13516818 - Treatment 1

In [3]:
fastq-dump --split-files SRR13516818

Read 33591277 spots for SRR13516818
Written 33591277 spots for SRR13516818


#### SRR13516819 - Treatment 2

In [7]:
fastq-dump --split-files SRR13516819

Read 47956846 spots for SRR13516819
Written 47956846 spots for SRR13516819


In [28]:
ls -lt *.fastq

-rw-r--r-- 1 r0829308 domain users 6600197076 Nov 29 21:57 SRR13516821_1.fastq
-rw-r--r-- 1 r0829308 domain users 8779615252 Nov 29 16:19 SRR13516819_1.fastq
-rw-r--r-- 1 r0829308 domain users 6136350556 Nov 29 14:41 SRR13516818_1.fastq
-rw-r--r-- 1 r0829308 domain users 7059533204 Nov 27 13:10 SRR13516823_1.fastq
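As a sanity check, the number of records in each FASTQ file can be compared with the spot counts reported by `fastq-dump` above. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; the helper name `count_fastq_reads` is hypothetical and not part of any toolkit.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming exactly four lines per record (no wrapped sequences).
count_fastq_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Example use (prints file name and read count, tab-separated):
# for f in *.fastq; do
#     printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
# done
```

If the download completed, the count for `SRR13516821_1.fastq` should equal the 36112182 spots reported for that accession.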


### Quality control using `FastQC`

In [26]:
#!/bin/bash
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    /usr/bin/fastqc -o . $filename
done

Started analysis of SRR13516818_1.fastq
Approx 5% complete for SRR13516818_1.fastq
Approx 10% complete for SRR13516818_1.fastq
Approx 15% complete for SRR13516818_1.fastq
Approx 20% complete for SRR13516818_1.fastq
Approx 25% complete for SRR13516818_1.fastq
Approx 30% complete for SRR13516818_1.fastq
Approx 35% complete for SRR13516818_1.fastq
Approx 40% complete for SRR13516818_1.fastq
Approx 45% complete for SRR13516818_1.fastq
Approx 50% complete for SRR13516818_1.fastq
Approx 55% complete for SRR13516818_1.fastq
Approx 60% complete for SRR13516818_1.fastq
Approx 65% complete for SRR13516818_1.fastq
Approx 70% complete for SRR13516818_1.fastq
Approx 75% complete for SRR13516818_1.fastq
Approx 80% complete for SRR13516818_1.fastq
Approx 85% complete for SRR13516818_1.fastq
Approx 90% complete for SRR13516818_1.fastq
Approx 95% complete for SRR13516818_1.fastq
Analysis complete for SRR13516818_1.fastq
Started analysis of SRR13516819_1.fastq
Approx 5% complete for SRR13516819_1.fastq


In [27]:
#open the zip file to inspect the QC report + insert screenshots in master notebook
ls -lt *.zip

-rw-r--r-- 1 r0829308 domain users 507415 Nov 29 22:22 SRR13516823_1_fastqc.zip
-rw-r--r-- 1 r0829308 domain users 506755 Nov 29 22:21 SRR13516821_1_fastqc.zip
-rw-r--r-- 1 r0829308 domain users 495198 Nov 29 22:19 SRR13516819_1_fastqc.zip
-rw-r--r-- 1 r0829308 domain users 496964 Nov 29 22:17 SRR13516818_1_fastqc.zip
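Besides opening the HTML report, the per-module verdicts can be read from the `summary.txt` file inside each FastQC archive. The sketch below assumes the usual tab-separated layout of that file (status, module name, file name); the helper name `fastqc_flagged_modules` is hypothetical.

```shell
# Hypothetical helper: read a FastQC summary.txt on stdin
# (tab-separated: status, module name, file name) and print
# only the modules whose status is not PASS.
fastqc_flagged_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 "\t" $2 }'
}

# Example use, streaming summary.txt straight out of each archive:
# for z in *_fastqc.zip; do
#     echo "$z"
#     unzip -p "$z" '*/summary.txt' | fastqc_flagged_modules
# done
```

This only lists the modules that FastQC marked WARN or FAIL; the plots in the report are still needed to judge whether a flag matters for RNA-seq data.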


### Mapping to the genome using STAR

#### Running `STAR` on each FASTQ file

In [29]:
#!/bin/bash
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    STAR --genomeDir /mnt/nfs/mfiers/STAR/hg19_star_db \
         --genomeLoad NoSharedMemory \
         --runThreadN 2 \
         --readFilesIn $filename \
         --outFileNamePrefix $filename.
done

Nov 29 22:28:12 ..... started STAR run
Nov 29 22:28:12 ..... loading genome
Nov 29 22:29:16 ..... started mapping
Nov 29 22:34:52 ..... finished successfully
Nov 29 22:34:52 ..... started STAR run
Nov 29 22:34:52 ..... loading genome
Nov 29 22:35:12 ..... started mapping
Nov 29 22:43:09 ..... finished successfully
Nov 29 22:43:09 ..... started STAR run
Nov 29 22:43:09 ..... loading genome
Nov 29 22:43:28 ..... started mapping
Nov 29 22:50:04 ..... finished successfully
Nov 29 22:50:04 ..... started STAR run
Nov 29 22:50:04 ..... loading genome
Nov 29 22:50:24 ..... started mapping
Nov 29 22:56:42 ..... finished successfully


In [30]:
ls -l *.sam
#note the presence of one SAM file per sample (two treatments, two controls)

-rw-r--r-- 1 r0829308 domain users  7162954163 Nov 29 22:34 SRR13516818_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users 10299467879 Nov 29 22:43 SRR13516819_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users  7889097735 Nov 29 22:50 SRR13516821_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users  8446737478 Nov 29 22:56 SRR13516823_1.fastq.Aligned.out.sam
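Because the full FASTQ path is used as the output prefix, every downstream file keeps `.fastq` in its name (for example `SRR13516818_1.fastq.Aligned.out.sam`). A possible alternative, shown here only as a sketch, is to derive a bare sample name with shell parameter expansion; the helper name `sample_name` is hypothetical, and the rest of this notebook keeps the original file names.

```shell
# Hypothetical helper: turn a FASTQ path into a bare sample name,
# e.g. /path/to/SRR13516818_1.fastq -> SRR13516818_1
sample_name() {
    base=${1##*/}                    # drop the directory part
    printf '%s\n' "${base%.fastq}"   # drop the .fastq suffix
}

# Example use inside the STAR loop (not run here):
#   --outFileNamePrefix "$(sample_name "$filename")."
```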


### SAM file structure

In [32]:
head -40 SRR13516818_1.fastq.Aligned.out.sam | grep '^@'

@HD	VN:1.4
@SQ	SN:chrM	LN:16571
@SQ	SN:chr1	LN:249250621
@SQ	SN:chr2	LN:243199373
@SQ	SN:chr3	LN:198022430
@SQ	SN:chr4	LN:191154276
@SQ	SN:chr5	LN:180915260
@SQ	SN:chr6	LN:171115067
@SQ	SN:chr7	LN:159138663
@SQ	SN:chr8	LN:146364022
@SQ	SN:chr9	LN:141213431
@SQ	SN:chr10	LN:135534747
@SQ	SN:chr11	LN:135006516
@SQ	SN:chr12	LN:133851895
@SQ	SN:chr13	LN:115169878
@SQ	SN:chr14	LN:107349540
@SQ	SN:chr15	LN:102531392
@SQ	SN:chr16	LN:90354753
@SQ	SN:chr17	LN:81195210
@SQ	SN:chr18	LN:78077248
@SQ	SN:chr19	LN:59128983
@SQ	SN:chr20	LN:63025520
@SQ	SN:chr21	LN:48129895
@SQ	SN:chr22	LN:51304566
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@PG	ID:STAR	PN:STAR	VN:STAR_2.5.4b	CL:STAR   --runThreadN 2   --genomeDir /mnt/nfs/mfiers/STAR/hg19_star_db   --genomeLoad NoSharedMemory   --readFilesIn /mnt/storage/r0829308/jupyternotebooks/Assignment/SRR13516818_1.fastq      --outFileNamePrefix /mnt/storage/r0829308/jupyternotebooks/Assignment/SRR13516818_1.fastq.
@CO	user command line: STAR --genomeDir /mn

In [33]:
head -40 SRR13516818_1.fastq.Aligned.out.sam | grep -v '^@'

SRR13516818.1	0	chr12	53809466	255	50M	*	0	0	GNTGCAAGTAGTGAGGATTTTGTTGATACCTCTGCTGGGATGTGTGCTTT	F#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFF	NH:i:1	HI:i:1	AS:i:48	nM:i:0
SRR13516818.2	16	chr1	65309809	255	50M	*	0	0	TTCCAAAGCTCCACTTGTCAGCAGCCACACTCAGGTTCTTGGAGTCCTNA	:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF#F	NH:i:1	HI:i:1	AS:i:48	nM:i:0
SRR13516818.3	16	chr14	21820068	255	50M	*	0	0	TAGAAGCGGTACATGAGGCACAACCTGATATATCTTCCTTAGATATATNA	FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF#F	NH:i:1	HI:i:1	AS:i:48	nM:i:0
SRR13516818.4	0	chr9	37126904	255	36M175245N14M	*	0	0	TNAACCTTGTGGGATGTGAAAACTCTGTTACTGAAGGGGAAGATGGTATA	F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:45	nM:i:1
SRR13516818.5	16	chr12	125397694	3	50M	*	0	0	CAGGGTACGACCATCTTCCAGCTGTTTTCCGGCAAAGATCAACCTCTGNT	FFFFFFF,FFFFFFFFFFFFFFF:FFFFFFFFFFFFF:::FFFFFFFF#:	NH:i:2	HI:i:1	AS:i:48	nM:i:0
SRR13516818.5	272	chr12	125397238	3	50M	*	0	0	CAGGGTACGACCATCTTCCAGCTGTTTTCCGGCAAAGATCAACCTCTGNT	FFFFFFF,FFFFF
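Column 6 of each record is the CIGAR string. Most reads above align as `50M`, but read `SRR13516818.4` has `36M175245N14M`: the `N` operation skips 175245 reference bases between two matched blocks, which is how a spliced alignment is represented. The sketch below sums the reference-consuming operations (M, D, N, = and X) to get the span an alignment covers on the reference; the helper name `cigar_ref_span` is hypothetical.

```shell
# Hypothetical helper: number of reference bases spanned by a CIGAR string.
# Only M, D, N, = and X consume the reference; I, S, H and P do not.
cigar_ref_span() {
    printf '%s\n' "$1" | awk '{
        span = 0
        cigar = $0
        # Peel off one <length><operation> pair at a time.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            if (op ~ /[MDN=X]/) span += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        printf "%d\n", span
    }'
}

# cigar_ref_span 36M175245N14M   # prints 175295
# cigar_ref_span 3S47M           # prints 47 (soft-clipped bases are not counted)
```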

#### Convert SAM to sorted BAM files using `samtools`

In [34]:
#!/bin/bash
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    samtools sort -o $filename.bam $filename.Aligned.out.sam
done

[bam_sort_core] merging from 8 files and 1 in-memory blocks...
[bam_sort_core] merging from 12 files and 1 in-memory blocks...
[bam_sort_core] merging from 9 files and 1 in-memory blocks...
[bam_sort_core] merging from 10 files and 1 in-memory blocks...


In [35]:
ls -l *.bam
#note the presence of one BAM file per sample

-rw-r--r-- 1 r0829308 domain users 557561505 Nov 29 23:56 SRR13516818_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users 790438151 Nov 30 00:00 SRR13516819_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users 580905214 Nov 30 00:04 SRR13516821_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users 642591761 Nov 30 00:08 SRR13516823_1.fastq.bam


In [37]:
#!/bin/bash
#generate an index file for each BAM file
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    samtools index $filename.bam
done

In [38]:
ls -l *.bai

-rw-r--r-- 1 r0829308 domain users 2809016 Nov 30 00:14 SRR13516818_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users 3063528 Nov 30 00:14 SRR13516819_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users 2677232 Nov 30 00:15 SRR13516821_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users 2834344 Nov 30 00:15 SRR13516823_1.fastq.bam.bai


In [39]:
#check the presence of the sam/bam/bai files
ls -l *[bs]a[mi]

-rw-r--r-- 1 r0829308 domain users  7162954163 Nov 29 22:34 SRR13516818_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users   557561505 Nov 29 23:56 SRR13516818_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users     2809016 Nov 30 00:14 SRR13516818_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users 10299467879 Nov 29 22:43 SRR13516819_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users   790438151 Nov 30 00:00 SRR13516819_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users     3063528 Nov 30 00:14 SRR13516819_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users  7889097735 Nov 29 22:50 SRR13516821_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users   580905214 Nov 30 00:04 SRR13516821_1.fastq.bam
-rw-r--r-- 1 r0829308 domain users     2677232 Nov 30 00:15 SRR13516821_1.fastq.bam.bai
-rw-r--r-- 1 r0829308 domain users  8446737478 Nov 29 22:56 SRR13516823_1.fastq.Aligned.out.sam
-rw-r--r-- 1 r0829308 domain users   642591761 Nov 30 00:08 SRR13516823_1.fastq.bam
-rw-r--r-- 1 r08

### Inspect the BAM file(s)

In [41]:
#inspect the bam file
samtools view SRR13516818_1.fastq.bam | head -3

SRR13516818.44629	0	chrM	1	255	3S47M	*	0	0	ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATG	FFFFFFFFFFFFFFFFFFFFFFF,FFFFF:FF:FFF:FFFFFFF,FFFFF	NH:i:1	HI:i:1	AS:i:46	nM:i:0
SRR13516818.1375092	0	chrM	1	255	3S47M	*	0	0	ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATG	FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:46	nM:i:0
SRR13516818.2108425	0	chrM	1	255	3S47M	*	0	0	ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:46	nM:i:0
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1


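With `samtools idxstats` we check how many reads map to each chromosome. The output columns are the reference name, its length, the number of mapped read segments and the number of unmapped read segments: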

In [42]:
#!/bin/bash
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    echo $filename
    samtools idxstats $filename.bam
    echo '---------------------------------'
done

/mnt/storage/r0829308/jupyternotebooks/Assignment/SRR13516818_1.fastq
chrM	16571	448238	0
chr1	249250621	4002904	0
chr2	243199373	3167089	0
chr3	198022430	2376001	0
chr4	191154276	1567011	0
chr5	180915260	2286799	0
chr6	171115067	2076256	0
chr7	159138663	1916138	0
chr8	146364022	1290835	0
chr9	141213431	1375984	0
chr10	135534747	1472035	0
chr11	135006516	1900831	0
chr12	133851895	2624576	0
chr13	115169878	868959	0
chr14	107349540	1203185	0
chr15	102531392	1132161	0
chr16	90354753	1507230	0
chr17	81195210	2104733	0
chr18	78077248	578034	0
chr19	59128983	1412524	0
chr20	63025520	1044821	0
chr21	48129895	358809	0
chr22	51304566	791467	0
chrX	155270560	1416151	0
chrY	59373566	99246	0
*	0	0	0
---------------------------------
/mnt/storage/r0829308/jupyternotebooks/Assignment/SRR13516819_1.fastq
chrM	16571	696528	0
chr1	249250621	5759445	0
chr2	243199373	4455162	0
chr3	198022430	3341906	0
chr4	191154276	2157280	0
chr5	180915260	3236984	0
chr6	171115067	2995438	0
chr7	159138663	2766003	0
chr8
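The raw `idxstats` table is easier to interpret as proportions, for example the share of alignments on the mitochondrial genome. The sketch below reads `idxstats` output on stdin and reports the percentage for one reference; the helper name `idxstats_percent` is hypothetical. Note that the third column counts alignment records, including secondary ones: for SRR13516818 it sums to 39022017, the total that `samtools flagstat` reports for the same file.

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin and print
# the percentage of mapped records (column 3) on the given reference.
idxstats_percent() {
    awk -F '\t' -v ref="$1" '
        { total += $3 }
        $1 == ref { hit = $3 }
        END { if (total > 0) printf "%.2f\n", 100 * hit / total }
    '
}

# Example use:
# samtools idxstats SRR13516818_1.fastq.bam | idxstats_percent chrM
```

For the SRR13516818 table above this gives about 1.15% for `chrM` (448238 of 39022017 records).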

Running `samtools flagstat` tells us what the distribution of mapping flags (column 2 in the sam/bam file) is:

    0x0001	p	the read is paired in sequencing
    0x0002	P	the read is mapped in a proper pair
    0x0004	u	the query sequence itself is unmapped
    0x0008	U	the mate is unmapped
    0x0010	r	strand of the query (1 for reverse)
    0x0020	R	strand of the mate
    0x0040	1	the read is the first read in a pair
    0x0080	2	the read is the second read in a pair
    0x0100	s	the alignment is not primary
    0x0200	f	the read fails platform/vendor quality checks
    0x0400	d	the read is either a PCR or an optical duplicate
    0x0800	S	the alignment is supplementary
    
See the [samtools manual](http://www.htslib.org/doc/samtools.html) for a more extensive explanation of the `samtools flagstat` output.
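A flag value is the sum of the bits listed in the table above. For instance, the second record of read `SRR13516818.5` in the SAM excerpt has flag 272 = 256 + 16, i.e. a non-primary alignment on the reverse strand. The sketch below decodes any flag value with shell bit arithmetic; the helper name `decode_sam_flag` is hypothetical, and the descriptions are shortened versions of the table entries.

```shell
# Hypothetical helper: print one line per bit that is set in a SAM flag.
decode_sam_flag() {
    flag=$1
    while read -r bit meaning; do
        if [ $(( flag & bit )) -ne 0 ]; then
            printf '%s\n' "$meaning"
        fi
    done <<'EOF'
1 paired in sequencing
2 mapped in a proper pair
4 query unmapped
8 mate unmapped
16 query on reverse strand
32 mate on reverse strand
64 first read in pair
128 second read in pair
256 not primary alignment
512 fails quality checks
1024 PCR or optical duplicate
2048 supplementary alignment
EOF
}

# decode_sam_flag 272
#   query on reverse strand
#   not primary alignment
```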

In [44]:
#!/bin/bash
for filename in /mnt/storage/$USER/jupyternotebooks/Assignment/*.fastq; 
do
    samtools flagstat $filename.bam
    echo '--------------------------------'
done

39022017 + 0 in total (QC-passed reads + QC-failed reads)
5882764 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
39022017 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
--------------------------------
56072618 + 0 in total (QC-passed reads + QC-failed reads)
8828179 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
56072618 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
--------------------------------
42972999 + 0 in total (QC-passed reads + QC-failed reads)
7293840 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
42972999 + 0 mapped (100.00
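The 100% "mapped" figure should be read with care. With its default settings, STAR does not write unmapped reads to the SAM file, so `flagstat` only sees records that aligned, and the total also includes secondary alignments of multi-mapping reads. A rough estimate of the fraction of sequenced reads that aligned is (total − secondary) divided by the number of spots reported by `fastq-dump`. The sketch below does this arithmetic; the helper name `mapped_read_percent` is hypothetical, and it assumes there are no supplementary alignments, as is the case here.

```shell
# Hypothetical helper: percentage of input reads with a primary alignment.
# Arguments: flagstat total, flagstat secondary, number of input reads.
# Assumes no supplementary alignments, so primary = total - secondary.
mapped_read_percent() {
    awk -v total="$1" -v secondary="$2" -v reads="$3" \
        'BEGIN { printf "%.2f\n", 100 * (total - secondary) / reads }'
}

# SRR13516818: flagstat total, flagstat secondary, fastq-dump spots
# mapped_read_percent 39022017 5882764 33591277   # prints 98.65
```

The `Log.final.out` file written by STAR for each run reports the same kind of statistics in more detail and remains the authoritative source.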

### Reads to gene counts

First, we make a symbolic link to the human annotation.

We use the full GTF file (containing all chromosomes), which can be found in:

    /mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf

In [45]:
ln -sf /mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf .
ls -l *gtf

lrwxrwxrwx 1 r0829308 domain users 56 Nov 30 00:21 gencode.v19.nopseudo.plus.sort.gtf -> /mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf


In [50]:
featureCounts -Q 10 -g gene_name -a /mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf -o all.counts SRR13516818_1.fastq.bam SRR13516819_1.fastq.bam SRR13516821_1.fastq.bam SRR13516823_1.fastq.bam 


        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
	  v1.6.0

||                                                                            ||
||             Input files : 4 BAM files                                      ||
||                           S SRR13516818_1.fastq.bam                        ||
||                           S SRR13516819_1.fastq.bam                        ||
||                           S SRR13516821_1.fastq.bam                        ||
||                           S SRR13516823_1.fastq.bam                        ||
||

In [51]:
#check the presence of two files: all.counts and all.counts.summary
ls -l *count*

-rw-r--r-- 1 r0829308 domain users 30949261 Nov 30 12:36 all.counts
-rw-r--r-- 1 r0829308 domain users      564 Nov 30 12:36 all.counts.summary


In [53]:
head all.counts

# Program:featureCounts v1.6.0; Command:"featureCounts" "-Q" "10" "-g" "gene_name" "-a" "/mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf" "-o" "all.counts" "SRR13516818_1.fastq.bam" "SRR13516819_1.fastq.bam" "SRR13516821_1.fastq.bam" "SRR13516823_1.fastq.bam" 
Geneid	Chr	Start	End	Strand	Length	SRR13516818_1.fastq.bam	SRR13516819_1.fastq.bam	SRR13516821_1.fastq.bam	SRR13516823_1.fastq.bam
MIR1302-11	chr1;chr1;chr1;chr1;chr1;chr1	29554;30267;30366;30564;30976;30976	30039;30667;30503;30667;31097;31109	+;+;+;+;+;+	1021	0	0	0	0
FAM138A	chr1;chr1;chr1;chr1;chr1	34554;35245;35277;35721;35721	35174;35481;35481;36073;36081	-;-;-;-;-	1219	0	0	0	0
OR4F5	chr1	69091	70008	+	918	0	0	0	0
RP11-34P13.7	chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1	89295;92091;92230;110953;112700;112700;112700;120721;120775;129055;129055;129081;133374	91629;92240;92240;111357;112804;112804;112804;120932;120932;129173;129217;129223;133566	-;-;-;-;-;-;-;-;-;-;-;-;-	3569	0	0	0	0
RP11-34P13.8	ch

In [52]:
head all.counts.summary

Status	SRR13516818_1.fastq.bam	SRR13516819_1.fastq.bam	SRR13516821_1.fastq.bam	SRR13516823_1.fastq.bam
Assigned	25737451	36535778	27532411	29345927
Unassigned_Unmapped	0	0	0	0
Unassigned_MappingQuality	8920095	13358916	11211380	12191661
Unassigned_Chimera	0	0	0	0
Unassigned_FragmentLength	0	0	0	0
Unassigned_Duplicate	0	0	0	0
Unassigned_MultiMapping	0	0	0	0
Unassigned_Secondary	0	0	0	0
Unassigned_Nonjunction	0	0	0	0
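The summary file makes it possible to compute, per sample, which fraction of alignments was assigned to a gene. The sketch below reads the summary on stdin and divides the `Assigned` row by the column total; the helper name `assigned_percent` is hypothetical. Since `head` shows only the first ten lines, the function should be run on the complete `all.counts.summary` file so that any categories beyond those shown are included in the total.

```shell
# Hypothetical helper: read a featureCounts summary on stdin and print,
# for each sample column, the percentage of alignments that were assigned.
assigned_percent() {
    awk -F '\t' '
        NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
        {
            for (i = 2; i <= n; i++) {
                total[i] += $i
                if ($1 == "Assigned") assigned[i] = $i
            }
        }
        END {
            for (i = 2; i <= n; i++)
                if (total[i] > 0)
                    printf "%s\t%.2f\n", name[i], 100 * assigned[i] / total[i]
        }
    '
}

# Example use:
# assigned_percent < all.counts.summary
```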


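Note that the first six columns of `all.counts` describe the gene structure (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`), while the remaining columns hold the actual counts, one column per BAM file. We separate these: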

In [54]:
cut -f-6 all.counts  > all.genedata.tsv

In [55]:
head all.genedata.tsv

# Program:featureCounts v1.6.0; Command:"featureCounts" "-Q" "10" "-g" "gene_name" "-a" "/mnt/nfs/data/RNA-seq/gencode.v19.nopseudo.plus.sort.gtf" "-o" "all.counts" "SRR13516818_1.fastq.bam" "SRR13516819_1.fastq.bam" "SRR13516821_1.fastq.bam" "SRR13516823_1.fastq.bam" 
Geneid	Chr	Start	End	Strand	Length
MIR1302-11	chr1;chr1;chr1;chr1;chr1;chr1	29554;30267;30366;30564;30976;30976	30039;30667;30503;30667;31097;31109	+;+;+;+;+;+	1021
FAM138A	chr1;chr1;chr1;chr1;chr1	34554;35245;35277;35721;35721	35174;35481;35481;36073;36081	-;-;-;-;-	1219
OR4F5	chr1	69091	70008	+	918
RP11-34P13.7	chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1	89295;92091;92230;110953;112700;112700;112700;120721;120775;129055;129055;129081;133374	91629;92240;92240;111357;112804;112804;112804;120932;120932;129173;129217;129223;133566	-;-;-;-;-;-;-;-;-;-;-;-;-	3569
RP11-34P13.8	chr1;chr1	89551;90287	90050;91105	-;-	1319
AL627309.1	chr1;chr1	134901;137621	135802;139379	-;-	2661
RP11-34P13.14	chr1;chr1	13979

In [56]:
cut -f1,7- all.counts | grep -v '^#' > all.gene.counts

In [61]:
head all.gene.counts

wc -l all.gene.counts
#41 864 lines in total: one header line plus 41 863 genes

Geneid	SRR13516818_1.fastq.bam	SRR13516819_1.fastq.bam	SRR13516821_1.fastq.bam	SRR13516823_1.fastq.bam
MIR1302-11	0	0	0	0
FAM138A	0	0	0	0
OR4F5	0	0	0	0
RP11-34P13.7	0	0	0	0
RP11-34P13.8	0	0	0	0
AL627309.1	0	0	0	0
RP11-34P13.14	0	0	0	0
RP11-34P13.13	0	0	0	0
RNU6-1100P	0	0	0	0
41864 all.gene.counts
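Raw counts are not directly comparable between samples because the library sizes differ (between roughly 25.7 and 36.5 million assigned reads according to `all.counts.summary`). A simple way to put the samples on a common scale for a first look is counts per million (CPM): each count is divided by its column total and multiplied by 10^6. The sketch below is an illustration only, with a hypothetical helper name `counts_to_cpm`; it is not a substitute for the normalization performed by dedicated differential expression tools.

```shell
# Hypothetical helper: convert a tab-separated count table (header line,
# gene name in column 1, one count column per sample) to counts per million.
# The file is read twice: first to sum each column, then to rescale it.
counts_to_cpm() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) lib[i] += $i; next }
        FNR == 1  { print; next }
        {
            for (i = 2; i <= NF; i++)
                $i = sprintf("%.2f", (lib[i] > 0 ? 1e6 * $i / lib[i] : 0))
            print
        }
    ' "$1" "$1"
}

# Example use:
# counts_to_cpm all.gene.counts > all.gene.cpm
```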


The paper did not mention control genes. However, the authors identified many genes involved in the regulation of mTOR, mitochondrial translation, global translation and ribosome biogenesis. We selected a few genes of interest, including three that encode well-known negative regulators of mTORC1. The MTOR gene itself was also included.  
For more information, please refer to the master report - <i> Introduction and Context </i>. 

In [1]:
#!/bin/bash
#retrieve the counts for the genes of interest
#note: grep -w treats '-' as a word boundary, so MTOR also matches MTOR-AS1
for gene in MTOR RNF152 SESN2 SESN3 FNIP1 RRAGD;
do
    grep -w "$gene" all.gene.counts
done

MTOR	4383	6679	5240	5362
MTOR-AS1	0	0	0	0
RNF152	297	404	94	39
SESN2	3775	6668	1655	2100
SESN3	24	53	0	2
FNIP1	2602	3425	1048	1064
RRAGD	1710	2656	775	1038
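The output above also contains `MTOR-AS1`, because `grep -w` accepts a hyphen as a word boundary. To retrieve only exact matches on the gene name, the first column can be compared as a whole field. The sketch below does this with `awk`; the helper name `gene_counts` is hypothetical.

```shell
# Hypothetical helper: print the rows of a tab-separated count table whose
# first column is exactly one of the requested gene names.
# Usage: gene_counts <count table> <gene> [<gene> ...]
gene_counts() {
    file=$1
    shift
    for gene in "$@"; do
        awk -F '\t' -v g="$gene" '$1 == g' "$file"
    done
}

# Example use:
# gene_counts all.gene.counts MTOR RNF152 SESN2 SESN3 FNIP1 RRAGD
```

The columns follow the order of the BAM files passed to `featureCounts`: treatment 1, treatment 2, control 1, control 3.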
