# Testing featureCounts Behavior

Background: SHAREseq RNA libraries are fr-stranded (read 1 has the same sequence as the original mRNA). To check that featureCounts assigns aligned reads to features correctly, we test the scenarios below using synthetic reads.

Scenarios to test:

1. pos
    - description: read that aligns to the correct strand of an annotation
2. neg
    - description: read that aligns to the opposite strand of an annotation
3. intron
    - description: read that aligns to just an intron
4. splice
    - description: read that aligns across an exon splice junction
5. overlap1
    - description: read that overlaps two transcripts of the same gene
6. overlap2
    - description: read that overlaps transcripts from two different genes on opposite strands (tested in both orientations, as test_overlap2_pos and test_overlap2_neg)

<img src='./test_featureCounts_reads_location.png'/>

## manually create synthetic reads

In [None]:
# copy the SAM header from an existing BAM file
#!module load biology samtools; samtools view -H /mypath/03_rna_star_Aligned.out.bam > /mypath/test_featureCounts/aligned.sam
# then manually add the synthetic read records to the SAM file
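
The manual editing step could also be scripted. The following is a minimal sketch, not the procedure used here: `make_sam_record` is a hypothetical helper, the read sequence is a placeholder rather than real genomic sequence, and only the `NH` and `HI` tags are written. The read names, coordinates, and CIGAR strings are taken from the synthetic reads in this notebook.

```python
import re

def make_sam_record(name, chrom, pos, cigar, seq, reverse=False, mapq=255):
    """Build one SAM alignment line for an unpaired synthetic read (hypothetical helper)."""
    # Bases consumed from the read by the CIGAR must match the sequence length.
    query_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                    if op in "MIS=X")
    if query_len != len(seq):
        raise ValueError(f"CIGAR covers {query_len} bases but SEQ has {len(seq)}")
    flag = 16 if reverse else 0   # 0x10 marks an alignment to the reverse strand
    qual = "F" * len(seq)         # placeholder base qualities
    fields = [name, str(flag), chrom, str(pos), str(mapq), cigar,
              "*", "0", "0", seq, qual, "NH:i:1", "HI:i:1"]
    return "\t".join(fields)

placeholder_seq = "ACGT" * 25  # 100 nt stand-in, not a real genomic sequence
records = [
    make_sam_record("test_pos", "chr12", 130987095, "100M", placeholder_seq),
    make_sam_record("test_neg", "chr12", 130987095, "100M", placeholder_seq, reverse=True),
    make_sam_record("test_splice", "chr12", 130982013, "50M5032N50M", placeholder_seq),
]
print("\n".join(records))
```

The printed lines would be appended below the copied header. The length check is there because a CIGAR string that does not match the sequence length is an easy mistake to make when editing records by hand.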

In [70]:
# show synthetic reads in sam file
!module load biology samtools; samtools view aligned.sam

test_pos	0	chr12	130987095	255	100M	*	0	0	GCCCCTATTGGACTCATGTCCTATTTACATGGAAATCCAAGGAGGGCCTGAAAGTCTACGTCAACGGGACCCTGAGCACCTCTGATCCGAGTGGAAAAGT	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:58	NM:i:3	MD:Z:53T6G0G4
test_neg	16	chr12	130987095	255	100M	*	0	0	GCCCCTATTGGACTCATGTCCTATTTACATGGAAATCCAAGGAGGGCCTGAAAGTCTACGTCAACGGGACCCTGAGCACCTCTGATCCGAGTGGAAAAGT	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:58	NM:i:3	MD:Z:53T6G0G4
test_intron	0	chr12	130982275	255	100M	*	0	0	CTGGGCTCCGTCTGCCCGTGCACTGTGTGCCTGCTGTAGGTTACCTGACCACACTGACCTCAGTGATCACACCTGTGATGTGCAGATGATATTGACAGTA	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NH:i:1	HI:i:1	AS:i:58	NM:i:3	MD:Z:53T6G0G4
test_splice	0	chr12	130982013	255	50M5032N50M	*	0	0	TGTATACGCGGGACAATTCCATGACATGGGAGGCCTCCTTCAGCCCCCCAGCCCCTATTGGACTCATGTCCTATTTACATGGA

## run featureCounts

### assign to gene, include intronic reads, strand specific (current SHAREseq pipeline strategy)

In [67]:
!featureCounts -Q 30 -a /oak/stanford/groups/wjg/bliu/resources/gtf/gencode.v41.annotation.BPfiltered.gtf -t gene -g gene_name -s 1 -o output.genes -R CORE aligned.sam 2> featureCounts.log
!cat aligned.sam.featureCounts

test_pos	Assigned	1	ADGRD1
test_neg	Unassigned_NoFeatures	-1	NA
test_intron	Assigned	1	ADGRD1
test_splice	Assigned	1	ADGRD1
test_overlap1	Assigned	1	ADGRD1
test_overlap2_pos	Assigned	1	ADGRD1
test_overlap2_neg	Assigned	1	ADGRD1-AS1
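
The result above is consistent with a simple strand rule for unpaired reads: with `-s 1` a read counts toward a feature only when its alignment strand matches the feature strand, and with the default `-s 0` the strand is ignored. The sketch below is a simplified model of that rule, not featureCounts' actual implementation. The gene strands (ADGRD1 on `+`, ADGRD1-AS1 on `-`) are inferred from the output above, and the flags assumed for the overlap2 reads (0 and 16) are an assumption because those records are not shown in the SAM listing.

```python
def read_strand(flag):
    """Alignment strand of an unpaired read, taken from SAM flag bit 0x10."""
    return "-" if flag & 0x10 else "+"

def assign(flag, overlapping_genes, s):
    """Toy model: overlapping_genes maps gene name -> strand; s mimics the -s option."""
    strand = read_strand(flag)
    if s == 0:                       # unstranded: every overlapping gene is a hit
        hits = list(overlapping_genes)
    elif s == 1:                     # stranded: read strand must equal gene strand
        hits = [g for g, gs in overlapping_genes.items() if gs == strand]
    elif s == 2:                     # reversely stranded: strands must differ
        hits = [g for g, gs in overlapping_genes.items() if gs != strand]
    else:
        raise ValueError("s must be 0, 1 or 2")
    if not hits:
        return "Unassigned_NoFeatures"
    if len(hits) > 1:
        return "Unassigned_Ambiguity"
    return hits[0]

single = {"ADGRD1": "+"}                       # strand inferred from the output
both = {"ADGRD1": "+", "ADGRD1-AS1": "-"}      # region covered by the overlap2 reads

for s in (1, 0):
    print(f"-s {s}:",
          assign(0, single, s),    # test_pos
          assign(16, single, s),   # test_neg
          assign(0, both, s),      # test_overlap2_pos (flag assumed)
          assign(16, both, s))     # test_overlap2_neg (flag assumed)
```

Under this model, a read over the two antisense genes is resolved by its strand when `-s 1` is set and becomes ambiguous when the strand is ignored.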


### assign to exon, exclude intronic reads, strand specific

In [68]:
!featureCounts -Q 30 -a /oak/stanford/groups/wjg/bliu/resources/gtf/gencode.v41.annotation.BPfiltered.gtf -t exon -g gene_name -s 1 -o output.genes -R CORE aligned.sam 2> featureCounts.log
!cat aligned.sam.featureCounts

test_pos	Assigned	1	ADGRD1
test_neg	Unassigned_NoFeatures	-1	NA
test_intron	Unassigned_NoFeatures	-1	NA
test_splice	Assigned	1	ADGRD1
test_overlap1	Assigned	1	ADGRD1
test_overlap2_pos	Assigned	1	ADGRD1
test_overlap2_neg	Assigned	1	ADGRD1-AS1


### assign to gene, include intronic reads, unstranded

In [72]:
!featureCounts -Q 30 -a /oak/stanford/groups/wjg/bliu/resources/gtf/gencode.v41.annotation.BPfiltered.gtf -t gene -g gene_name -o output.genes -R CORE aligned.sam 2> featureCounts.log
!cat aligned.sam.featureCounts

test_pos	Assigned	1	ADGRD1
test_neg	Assigned	1	ADGRD1
test_intron	Assigned	1	ADGRD1
test_splice	Assigned	1	ADGRD1
test_overlap1	Assigned	1	ADGRD1
test_overlap2_pos	Unassigned_Ambiguity	-1	NA
test_overlap2_neg	Unassigned_Ambiguity	-1	NA


### assign to exon, exclude intronic reads, unstranded

In [75]:
!featureCounts -Q 30 -a /oak/stanford/groups/wjg/bliu/resources/gtf/gencode.v41.annotation.BPfiltered.gtf -t exon -g gene_name -o output.genes -R CORE aligned.sam 2> featureCounts.log
!cat aligned.sam.featureCounts

test_pos	Assigned	1	ADGRD1
test_neg	Assigned	1	ADGRD1
test_intron	Unassigned_NoFeatures	-1	NA
test_splice	Assigned	1	ADGRD1
test_overlap1	Assigned	1	ADGRD1
test_overlap2_pos	Unassigned_Ambiguity	-1	NA
test_overlap2_neg	Unassigned_Ambiguity	-1	NA
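
Since every run writes to the same `aligned.sam.featureCounts` file, comparing runs means capturing each output before the next run overwrites it. Below is a minimal sketch of such a comparison. `parse_core` and `diff_runs` are hypothetical helpers, and the rows are copied from the two gene-level outputs above (strand specific and unstranded). The third column is ignored here.

```python
def parse_core(text):
    """Parse per-read `-R CORE` output into {read_name: (status, target)}."""
    result = {}
    for line in text.strip().splitlines():
        read, status, _count, target = line.split("\t")
        result[read] = (status, target)
    return result

def diff_runs(run_a, run_b):
    """Return the reads whose (status, target) differs between two parsed runs."""
    return {r: (run_a[r], run_b[r]) for r in run_a if run_a[r] != run_b.get(r)}

# Rows copied from the gene-level runs in this notebook.
stranded_gene = "\n".join([
    "test_pos\tAssigned\t1\tADGRD1",
    "test_neg\tUnassigned_NoFeatures\t-1\tNA",
    "test_intron\tAssigned\t1\tADGRD1",
    "test_splice\tAssigned\t1\tADGRD1",
    "test_overlap1\tAssigned\t1\tADGRD1",
    "test_overlap2_pos\tAssigned\t1\tADGRD1",
    "test_overlap2_neg\tAssigned\t1\tADGRD1-AS1",
])
unstranded_gene = "\n".join([
    "test_pos\tAssigned\t1\tADGRD1",
    "test_neg\tAssigned\t1\tADGRD1",
    "test_intron\tAssigned\t1\tADGRD1",
    "test_splice\tAssigned\t1\tADGRD1",
    "test_overlap1\tAssigned\t1\tADGRD1",
    "test_overlap2_pos\tUnassigned_Ambiguity\t-1\tNA",
    "test_overlap2_neg\tUnassigned_Ambiguity\t-1\tNA",
])

changed = diff_runs(parse_core(stranded_gene), parse_core(unstranded_gene))
for read, (a, b) in changed.items():
    print(f"{read}: stranded={a} unstranded={b}")
```

For these two runs the differing reads are `test_neg` and the two overlap2 reads, which are the scenarios where strand information decides the assignment.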
