### Gene Expression
###### Pipeline to compare gene expression of orthologous and non-orthologous genes in two species.

__Versions__  
Bedtools: 2.29.0  
R: 3.6.1  
DESeq: 1.26.0  
GNU Awk: 4.0.2  
GNU grep: 2.20  
GNU Coreutils: 8.22  

```bash
# GET DMEL TAD COLUMNS FOR CHROM, START, AND STOP
cut -f 4-6 /dmel_to_dtri_domains/dtri_lo_dmel_0.9r_final | sort -k1,1 -k2,2n | uniq > t1
cut -f 5-7 /dmel_to_dtri_domains/missing_split | sort -k1,1 -k2,2n | uniq > t2
cut -f 4-6 /dmel_to_dtri_domains/dtri_lo_dmel_NOT0.9_contig | sort -k1,1 -k2,2n | uniq > t3
cut -f 7-9 /dmel_to_dtri_domains/dtri_lo_dmel_tande | sort -k1,1 -k2,2n | uniq > t4
```
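Each extraction above sorts with `sort -k1,1 -k2,2n`, the chromosome-then-numeric-start order conventionally used for BED files. A minimal sketch on a made-up table (the six-column layout and contig names are hypothetical) shows what the `cut | sort | uniq` chain produces, and why a plain `sort` would misorder the start coordinates:

```bash
# Byte-order collation so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical rows: columns 1-3 are D. triangulum coordinates,
# columns 4-6 are the D. melanogaster TAD (chrom, start, stop).
toy=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  contig1 0 500 2L 10000 20000 \
  contig2 0 400 2L 9000 10000 \
  contig3 0 300 2L 10000 20000 \
  contig4 0 200 2R 500 900)

# Keep the dmel TAD columns, sort by chromosome then numeric start, drop exact duplicates.
tads=$(printf '%s\n' "$toy" | cut -f 4-6 | sort -k1,1 -k2,2n | uniq)
printf '%s\n' "$tads"

# Without -k2,2n the starts compare as text, so 10000 sorts before 9000.
lex=$(printf '%s\n' "$toy" | cut -f 4-6 | sort | uniq)
printf '%s\n' "$lex"
```

The two contigs mapping to the same 2L TAD collapse into one line, which is why every extraction ends in `uniq`.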

```bash
# GET RID OF ANY DUPLICATE ENTRIES: KEEP THE CONSERVED TADS AND COMBINE THE 3 NONCONSERVED TAD FILES
sort -k1,1 -k2,2n t1 | uniq > dtri_lo_dmel_0.9r_final.bed
sort -k1,1 -k2,2n t2 t3 t4 | uniq > dtri_lo_dmel_NOT_0.9r_final.bed
```
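The deduplication step pipes into `uniq` rather than using `sort -u`. With GNU `sort`, `-u` combined with `-k` keys treats two lines as duplicates whenever their *keys* match, so distinct intervals that share a chromosome and start would silently collapse. A small sketch with made-up intervals illustrates the difference:

```bash
export LC_ALL=C
# Made-up intervals: two share chrom and start but differ in end; one is an exact duplicate.
intervals=$(printf '2L\t100\t200\n2L\t100\t300\n2L\t100\t200\n')

# As in the pipeline: sort on the BED keys, then uniq drops whole-line duplicates only.
with_uniq=$(printf '%s\n' "$intervals" | sort -k1,1 -k2,2n | uniq)

# With -u, sort compares only the keys, so 2L:100-200 and 2L:100-300 count as duplicates.
with_sort_u=$(printf '%s\n' "$intervals" | sort -k1,1 -k2,2n -u)

printf 'uniq keeps %s lines; sort -u keeps %s\n' \
  "$(printf '%s\n' "$with_uniq" | wc -l)" "$(printf '%s\n' "$with_sort_u" | wc -l)"
```

Plain `sort -u` without `-k` would be safe, but then the output would no longer be in BED order.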

```bash
# GET GENES THAT OVERLAP CONSERVED AND NONCONSERVED DOMAINS
bedtools intersect -u -a dmel-all-r6.21.proteinCoding.MullerIDs.bed -b dtri_lo_dmel_0.9r_final.bed > dtri_lo_dmel_0.9r_final.GENES.bed
bedtools intersect -u -a dmel-all-r6.21.proteinCoding.MullerIDs.bed -b dtri_lo_dmel_NOT_0.9r_final.bed > dtri_lo_dmel_NOT_0.9r_final.GENES.bed
```
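`bedtools intersect -u` reports each `-a` feature once if it overlaps at least one `-b` feature by at least 1 bp. The awk sketch below only illustrates that rule on toy data (half-open BED coordinates, same chromosome); it is a quadratic-time illustration, not a replacement for bedtools, and the file and gene names are hypothetical:

```bash
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical BED files: genes (chrom, start, end, name) and domains (chrom, start, end).
printf 'chr2L\t100\t200\tgeneA\nchr2L\t200\t300\tgeneB\nchr2L\t450\t600\tgeneC\nchr3R\t100\t200\tgeneD\n' > "$tmp/genes.bed"
printf 'chr2L\t0\t200\nchr2L\t500\t1000\n' > "$tmp/domains.bed"

# Report each gene once if it overlaps at least one domain (the -u behaviour).
overlapping=$(awk -F'\t' '
  FILENAME == ARGV[1] { chrom[++n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; next }
  {
    for (i = 1; i <= n; i++) {
      # Half-open intervals overlap when each starts before the other ends,
      # so book-ended features (one ends where the other starts) do not overlap.
      if ($1 == chrom[i] && $2 + 0 < stop[i] && start[i] < $3 + 0) { print; break }
    }
  }' "$tmp/domains.bed" "$tmp/genes.bed")
rm -r "$tmp"
printf '%s\n' "$overlapping"
```

Here `geneB` (200-300) touches the first domain (0-200) only at its boundary and is not reported, while `geneC` is reported once even though the loop stops at its first hit.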

```bash
# MAKE NEW FILE WITH ONLY GENE IDs
cut -f 4 dtri_lo_dmel_0.9r_final.GENES.bed | sort -u > dtri_lo_dmel_0.9r_final.GENES.IDs
cut -f 4 dtri_lo_dmel_NOT_0.9r_final.GENES.bed | sort -u > dtri_lo_dmel_NOT_0.9r_final.GENES.IDs
```
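Because `bedtools intersect -u` is run separately against each domain file, a gene overlapping both a conserved and a nonconserved TAD can end up in both ID lists. Since the lists are sorted, `comm` is one way to check for such genes. This sketch uses hypothetical FlyBase-style IDs; `comm` expects both inputs sorted under the same collation, so run `sort` and `comm` with the same locale:

```bash
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical sorted ID lists standing in for the two *.GENES.IDs files.
printf 'FBgn0000001\nFBgn0000002\nFBgn0000003\n' > "$tmp/conserved.IDs"
printf 'FBgn0000003\nFBgn0000004\n' > "$tmp/nonconserved.IDs"

# -12 keeps lines present in both files; -23 keeps lines found only in the first.
in_both=$(comm -12 "$tmp/conserved.IDs" "$tmp/nonconserved.IDs")
only_conserved=$(comm -23 "$tmp/conserved.IDs" "$tmp/nonconserved.IDs")
rm -r "$tmp"
printf 'in both lists: %s\n' "$in_both"
```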

```bash
# COUNT HOW MANY 1-1 ORTHOLOGS ARE IN CONSERVED VS NONCONSERVED DOMAINS
# (-c: COUNT MATCHING LINES; -w: WHOLE IDS ONLY; -F: IDS ARE FIXED STRINGS, NOT REGEXES)
grep -cwFf dtri_lo_dmel_0.9r_final.GENES.IDs dtri_dmel.1to1_orthologs
grep -cwFf dtri_lo_dmel_NOT_0.9r_final.GENES.IDs dtri_dmel.1to1_orthologs
```
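Plain `grep -f` treats each line of the pattern file as a regular expression that can match anywhere in a line, so a short ID can also match inside a longer one. The IDs below (`CG10`, `CG100`, `CG1000`) are purely illustrative; the sketch shows how `-F` (fixed strings) and `-w` (whole-word match) change the count:

```bash
export LC_ALL=C
patterns=$(mktemp)
# Illustrative only: one ID in the pattern file, three similar IDs in the table.
printf 'CG10\n' > "$patterns"
table=$(printf 'CG10\tortholog_a\nCG100\tortholog_b\nCG1000\tortholog_c\n')

substring_hits=$(printf '%s\n' "$table" | grep -c -f "$patterns")   # regex, matches anywhere
exact_hits=$(printf '%s\n' "$table" | grep -c -wFf "$patterns")     # fixed string, whole word
rm "$patterns"
echo "grep -f: $substring_hits  grep -wFf: $exact_hits"
```

With fixed-length FlyBase IDs the two forms often agree, but `-wF` makes the exact-match intent explicit and is typically faster for long pattern lists.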

```bash
# USE DESEQ OUTPUT TO CREATE TWO FILES: ONE FOR CONSERVED DOMAIN GENES AND ONE FOR NONCONSERVED
# (KEEP THE CSV HEADER, THEN APPEND ROWS WHOSE GENE ID MATCHES EXACTLY)
head -n 1 deseq_output_all.csv > t1
head -n 1 deseq_output_all.csv > t2
grep -wFf dtri_lo_dmel_0.9r_final.GENES.IDs deseq_output_all.csv >> t1
grep -wFf dtri_lo_dmel_NOT_0.9r_final.GENES.IDs deseq_output_all.csv >> t2
```
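The `head`/`grep` pair keeps the CSV header and then appends matching rows. An alternative sketch does the same in a single `awk` pass with an exact lookup on the first column, after stripping the quotes that R's `write.csv` places around row names. It assumes the gene ID is the first CSV field and uses a hypothetical seven-column header with `padj` in column 7, consistent with the `$7` used in the counting step:

```bash
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical inputs: an ID list and a DESeq-style CSV whose first field is the quoted gene ID.
printf 'FBgn0000001\nFBgn0000003\n' > "$tmp/ids"
cat > "$tmp/deseq.csv" <<'EOF'
"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"FBgn0000001",10.5,1.2,0.3,4.0,6e-05,0.001
"FBgn0000002",3.1,-0.4,0.5,-0.8,0.42,0.61
"FBgn0000003",0,NA,NA,NA,NA,NA
EOF

subset=$(awk -F, '
  FILENAME == ARGV[1] { want[$1]; next }                  # first file: load IDs into a lookup table
  FNR == 1 { print; next }                                # second file: always keep the header row
  { id = $1; gsub(/"/, "", id); if (id in want) print }   # exact match on the unquoted ID
' "$tmp/ids" "$tmp/deseq.csv")
rm -r "$tmp"
printf '%s\n' "$subset"
```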

```bash
# COUNT THE NUMBER OF CONSERVED AND NONCONSERVED DIFFERENTIALLY EXPRESSED GENES WITH ADJUSTED PVALUE<=0.05
# (PADJ IS COLUMN 7; SKIP THE HEADER ROW AND ROWS WHERE PADJ IS NA)
awk -F, 'NR > 1 && $7 !~ /NA/ && $7 + 0 <= 0.05' t1 | wc -l
awk -F, 'NR > 1 && $7 !~ /NA/ && $7 + 0 <= 0.05' t2 | wc -l
```
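The Fisher's exact test needs, for each TAD class, a count of DE genes and a total. How the totals 1092 and 4739 were obtained is not stated here; if they are meant to be genes with a non-`NA` `padj`, one `awk` pass can report both numbers. A sketch on hypothetical rows, assuming `padj` is column 7:

```bash
export LC_ALL=C   # also keeps awk's decimal parsing locale-independent
# Hypothetical DESeq-style rows; padj is column 7 and NA marks genes without an adjusted p-value.
csv=$(printf '%s\n' \
  '"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"' \
  '"g1",10.5,1.2,0.3,4.0,6e-05,0.001' \
  '"g2",3.1,-0.4,0.5,-0.8,0.42,0.61' \
  '"g3",8.0,2.0,0.4,5.0,1e-06,1.2e-05' \
  '"g4",0,NA,NA,NA,NA,NA')

# Skip the header and NA rows; count genes with a padj and genes with padj <= 0.05.
counts=$(printf '%s\n' "$csv" | awk -F, '
  NR == 1 || $7 ~ /NA/ { next }
  { tested++; if ($7 + 0 <= 0.05) de++ }
  END { print de + 0, tested + 0 }')
de=${counts% *}
tested=${counts#* }
echo "DE: $de of $tested genes with a padj"
```

Running the same command on `t1` and `t2` would give the two columns of the 2x2 table under that assumption.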

Use Fisher's exact test to compare the proportion of differentially expressed genes in conserved versus nonconserved TADs.

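```r
# 2x2 table filled column-wise. Assumed layout: column 1 = conserved TADs (99 DE of 1092 genes),
# column 2 = nonconserved TADs (501 DE of 4739); row 1 = DE (padj <= 0.05), row 2 = not DE
m <- matrix(c(99, 1092 - 99, 501, 4739 - 501), nrow = 2)
fisher.test(m)
```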
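For reference, the quantity behind `fisher.test` can be written out. `matrix()` fills column-wise, so row 1 holds the first and third numbers. Assuming rows are DE / not DE and columns are conserved / nonconserved, the observed table is

$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} 99 & 501 \\ 1092-99 & 4739-501 \end{pmatrix}
= \begin{pmatrix} 99 & 501 \\ 993 & 4238 \end{pmatrix},
\qquad n = a+b+c+d = 5831 .
$$

With all margins held fixed, the probability of a table with top-left cell $a$ is hypergeometric:

$$
P(a) = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} ,
$$

and the sample odds ratio is

$$
\widehat{\mathrm{OR}} = \frac{ad}{bc} = \frac{99 \times 4238}{501 \times 993} = \frac{419562}{497493} \approx 0.843 .
$$

The DE fractions are $99/1092 \approx 9.1\%$ and $501/4739 \approx 10.6\%$. R's `fisher.test` reports the conditional maximum-likelihood estimate of the odds ratio, which is close to but not identical to this sample ratio. Its two-sided p-value sums $P(a)$ over all tables whose probability is no greater than that of the observed table.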