# CRC Differential Expression - Modification
```
pi:ababaian
files: ~/Crown/data2/crcmod_dge
start: 2017 08 23
complete : YYYY MM DD
```
## Introduction

20170825 Writeup

A fairly simple experiment: in the CRC data there are two classes of cancers, those with 1248U hypo-modification and those with "normal" levels of modification.

The main motivation is to understand the underlying reason or mechanism for this difference. An obvious candidate would be a recurrent deficiency in nucleotide synthesis, methionine (SAM) metabolism, pseudouridine synthases, or something along those lines.

Since the hypo-modification is continuous, the data were separated into two groups by ranking each CRC library. From the 69 CRC sequencing libraries, the top 15 (hypo-modified) and bottom 15 (control) libraries were selected, along with their respective 'normal' tissue controls, for a 2-way analysis (CRC_Hypo vs. CRC_Control & CRC vs. Normal).
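
The selection itself was done by hand in a spreadsheet; as a rough sketch of the same idea, the two ends of a ranked list could be pulled out on the command line. The function name `select_extremes`, the two-column input layout, and the demo library names and values are all hypothetical, and the direction of the ranking (higher 1248U reference allele frequency taken to mean hypo-modification) is an assumption.

```bash
#!/bin/bash
# Sketch: pick the N libraries at either end of a ranking.
# Input (stdin): tab-separated "library_id<TAB>ref_allele_freq", no header.
# ASSUMPTION: higher reference allele frequency = more hypo-modified.

select_extremes() {
    local n="$1" end="$2"
    if [ "$end" = "hypo" ]; then
        # Highest values first
        sort -t$'\t' -k2,2gr | head -n "$n" | cut -f1
    else
        # Any other value of $end: lowest values first (controls)
        sort -t$'\t' -k2,2g | head -n "$n" | cut -f1
    fi
}

# Made-up demo data, not taken from crcmod30_dge.xlsx
ranked=$'libA\t0.95\nlibB\t0.40\nlibC\t0.88\nlibD\t0.35\nlibE\t0.60'

printf '%s\n' "$ranked" | select_extremes 2 hypo   # libA, libC
printf '%s\n' "$ranked" | select_extremes 2 ctrl   # libD, libB
```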



## Objective


## Materials and Methods

#### CRC Libraries
The alignments were taken from the previous hg19 alignments of the CRC libraries made for the LIONS project (by MMK). These libraries are from the Genentech publication.
```
@PG	ID:TopHat	VN:2.0.8	CL:/home/ababaian/software/tophat-2.0.8/tophat -p 6 -r 200 --report-secondary-alignments -o /scratch/epigenomics/mkarimi/human/libs/587221/ /home/ababaian/resources/hg19r/hg19r /scratch/epigenomics/mkarimi/human/Colon/587221_1_1.fastq.gz /scratch/epigenomics/mkarimi/human/Colon/587221_1_2.fastq.gz
```

ERRORS
```
1 587271 SEGMENTATION FAULT
2 587290 ALIGNMENT FILE MISSING -- Hypo
3 587288 ALIGNMENT FILE CORRUPTED -- Hypo
4 587328 ALIGNMENT FILE CORRUPTED -- Ctrl
```

#### Ranking CRC Libraries
In `crcmod30_dge.xlsx` all the libraries were ranked based on their 1248U Reference Allele Frequency. Two crc15_hypo libraries (marked *) had errors in their BAM files and were omitted, so the next two on the ranked list were taken instead.

The respective normal libraries are one number greater than the ones listed below.
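
A minimal sketch of that ID rule, assuming one tumour library ID per line on stdin and that entries marked with a trailing `*` should be skipped. The function name `normal_ids` is hypothetical.

```bash
#!/bin/bash
# Sketch: derive matched normal library IDs (normal = tumour ID + 1),
# skipping blank lines and entries flagged with a trailing '*'.

normal_ids() {
    local id
    while IFS= read -r id; do
        [ -z "$id" ] && continue
        case "$id" in
            *\*) continue ;;   # flagged library, omitted from the analysis
        esac
        echo $(( id + 1 ))
    done
}

printf '%s\n' 587264 '587290*' 587302 | normal_ids
# 587265
# 587303
```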

*crc15_hypo*
```
587264
587302
587384
587290*
587270
587288*
587344
587278
587254
587260
587380
587346
587292
587350
587386
587268
587322
```
*crc15_ctrl*
```
587300
587348
587340
587360
587358
587392
587338
587236
587342
587294
587372
587400
587230
587398
587316
```


`runDiff_crc30.sh`

```
#!/bin/bash

cd /home/ababaian/projects/Colon_DGE/

touch RUNNING

# CuffDiff on all Cancer Samples


cuffdiff -p 8 -o CRCMOD30/ -L norm15_ctrl,norm15_hypo,crc15_ctrl,crc15_hypo \
gencode.v26lift37.basic.annotation.gtf \
$(cat norm15_ctrl.bams) \
$(cat norm15_hypo.bams) \
$(cat crc15_ctrl.bams)  \
$(cat crc15_hypo.bams)
```
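
cuffdiff treats each positional argument after the GTF as one condition, with the replicate BAM files of that condition joined by commas. For the four `-L` labels to line up, each `.bams` file therefore has to expand to a single comma-separated word. The contents of the `.bams` files are not recorded here, so the following is only a sketch of how such a list could be produced; the function name `bam_list` and the `libs/<id>/` directory layout are assumptions (`accepted_hits.bam` is TopHat's default output name).

```bash
#!/bin/bash
# Sketch: build a comma-separated replicate list for one cuffdiff condition.
# ASSUMPTION: each library was aligned into <root>/<library_id>/ by TopHat.

bam_list() {
    local root="$1"; shift
    local id list=""
    for id in "$@"; do
        # Append with a comma separator, except before the first entry
        list="${list:+$list,}${root}/${id}/accepted_hits.bam"
    done
    printf '%s\n' "$list"
}

bam_list libs 587300 587348
# libs/587300/accepted_hits.bam,libs/587348/accepted_hits.bam
```

Redirecting the output to, for example, `crc15_ctrl.bams` would give a file that `$(cat crc15_ctrl.bams)` expands to one argument, provided the paths contain no whitespace.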


`crc30_analysis.r`

```
# crc30_analysis.r

library(cummeRbund)

# CRC Analysis ==============================================================
# Alignments and the lift37 GTF are in hg19 coordinates, so the genome label is hg19
cuff = readCufflinks(dir='./cufflinks/', genome = 'hg19' , gtfFile = "gencode.v26lift37.basic.annotation.gtf")
cuff

#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday
  gene_diff_data <- diffData(genes(cuff))   

#Do gene_diff_data followed by tab to see all the variables in this data object
  sig_gene_data  <- subset(gene_diff_data,
                           #(sample_1 == 'crc15_ctrl'),
                           #(sample_2 == 'crc15_hypo'),
                           (significant ==  'yes'),
                           level='genes')
  
  # CRC ctrl vs. CRC hypo comparison
  crcmod_gene_data = gene_diff_data[(gene_diff_data$sample_1 == 'crc15_ctrl' & gene_diff_data$sample_2 == 'crc15_hypo'),]
  
  crcmod_sig_data =  subset(crcmod_gene_data,
                            (significant ==  'yes'),
                            level='genes')

  write.table(crcmod_sig_data, file = 'crc30_mod.csv', quote = F,sep = '\t')
  
  
#Retrieve significant gene IDs (XLOC) with a pre-specified alpha
#Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes
  diffGeneIDs <- getSig(cuff, level="genes", alpha=0.05)
  diffGenes <-   getGenes(cuff, diffGeneIDs)

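# Add Feature Data
  CRC30 = cbind(featureNames(diffGenes)[,2], sig_gene_data)

# Write the table before removing the object it is built from
  write.table( sig_gene_data, file = 'crc30_diffExp.csv', append = F, quote = F, sep ='\t', row.names = F)

# Clean up; 'cuff' is kept because the Analysis section below still uses it
  rm(sig_gene_data, gene_diff_data, diffGenes, diffGeneIDs)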
  

# Analysis  ==============================================================

#Below command is equivalent to looking at the gene_exp.diff file that we spent a lot of time parsing yesterday
gene_diff_data <- diffData(genes(cuff))   

#Do gene_diff_data followed by tab to see all the variables in this data object
sig_gene_data  <- subset(gene_diff_data, (significant ==  'yes'))

#Retrieve significant gene IDs (XLOC) with a pre-specified alpha
#Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes
diffGeneIDs <- getSig(cuff,level="genes", alpha=0.05)
diffGenes <- getGenes(cuff,diffGeneIDs)

# Plot all pair-wise comparisons    
mySigMat <- sigMatrix(cuff, level="genes", alpha=0.05) # constructs the plot
mySigMat


myGene <- getGene(cuff, "RPPH1")
#  expressionPlot(myGene) # Will collapse replicates and only show gene level FPKM
#  expressionPlot(myGene, replicates=T) # Will show replicate FPKMs
  expressionPlot(myGene, replicates=T, logMode = T ) # Show replicate FPKMs on a log scale
  
#myGene <- getGene(cuff, "SI")
#  expressionPlot(myGene, replicates=T,logMode = T ) # Show replicate FPKMs on a log scale
# 
```
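
The crc15_ctrl vs. crc15_hypo subset taken in the R script can also be pulled straight from cuffdiff's `gene_exp.diff`, which is what `diffData(genes(cuff))` mirrors. The sketch below looks columns up by header name rather than by position; the function name `filter_diff` is hypothetical, and the demo table is made up and reduced to four columns for brevity.

```bash
#!/bin/bash
# Sketch: keep the header plus the significant rows for one pairwise comparison.
# Input (stdin): tab-separated table with a header row containing at least
# the columns sample_1, sample_2 and significant (as in gene_exp.diff).

filter_diff() {
    local s1="$1" s2="$2"
    awk -F'\t' -v s1="$s1" -v s2="$s2" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
        $(col["sample_1"]) == s1 && $(col["sample_2"]) == s2 &&
            $(col["significant"]) == "yes"
    '
}

# Made-up demo rows, not real results
demo=$'gene\tsample_1\tsample_2\tsignificant
geneA\tcrc15_ctrl\tcrc15_hypo\tyes
geneB\tcrc15_ctrl\tcrc15_hypo\tno
geneC\tnorm15_ctrl\tcrc15_ctrl\tyes'

printf '%s\n' "$demo" | filter_diff crc15_ctrl crc15_hypo
# prints the header line and the geneA row
```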


## Results


#### lncRNA genes
```
gene_id	Gene name	Gene description	sample_1	sample_2	status	value_1	value_2	log2_fold_change	test_stat	p_value	q_value	significant
ENSG00000269900	RMRP	RNA component of mitochondrial RNA processing endoribonuclease [Source:HGNC Symbol;Acc:HGNC:10031]	crc15_ctrl	crc15_hypo	OK	681.025	41.9002	-4.02268	-8.96698	5.00E-05	0.000968818	yes
ENSG00000259001	AL355075.4	RNase P -- RPPH1(ENSG00000252678.1)	crc15_ctrl	crc15_hypo	OK	187.579	6.17257	-4.92549	-2.37484	0.00015	0.00249348	yes
	RNU4-2	RNA, U4 small nuclear 2 [Source:HGNC Symbol;Acc:HGNC:10193]	crc15_ctrl	crc15_hypo	OK	743.103	20.466	-5.18226	-69.1121	9.00E-04	0.0107869	yes
	SNORD96A	small nucleolar RNA, C/D box 96A [Source:HGNC Symbol;Acc:HGNC:32758]	crc15_ctrl	crc15_hypo	OK	1126.22	1081.91	-0.0578974	-5.56E-05	0.001	0.0117372	yes
	MT-RNR1	mitochondrially encoded 12S RNA [Source:HGNC Symbol;Acc:HGNC:7470]	crc15_ctrl	crc15_hypo	OK	680.171	281.846	-1.27099	-1.13E+00	0.00135	0.0148756	yes
```

One discriminating gene, *RPPH1*, is expressed much more highly in the CRC-control libraries than in the hypo-modified libraries (187.6 vs. 6.2 FPKM, log2 fold change -4.93). It is a catalytic RNA, the RNA component of RNase P; it may play a role in rRNA processing and definitely plays one in tRNA processing.

![RPPH1 CRC30](../../data2/crcmod_dge/plot/RPPH1_crc30.png)
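
As a quick sanity check, the `log2_fold_change` column in the table above can be re-derived from `value_1` and `value_2` as log2(value_2 / value_1). The helper name `log2fc` is hypothetical; the inputs are copied from the RPPH1 and RMRP rows.

```bash
#!/bin/bash
# Sketch: recompute log2 fold change from the two per-condition values.
# Arguments: value_1 (crc15_ctrl), value_2 (crc15_hypo)

log2fc() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", log(b / a) / log(2) }'
}

log2fc 187.579 6.17257   # RPPH1 row: -4.93
log2fc 681.025 41.9002   # RMRP row:  -4.02
```

A log2 fold change of about -4.9 corresponds to roughly 30-fold lower expression in the hypo-modified group.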

#### DAVID

All the output genes, either upregulated or downregulated, were put into DAVID 6.8 (https://david.ncifcrf.gov/summary.jsp).

Relating to metabolism, as per the original purpose, there is downregulation of 'triglyceride catabolic process' genes in 1248U hypo-modified libraries, notably FABP1, FABP2 and FABP3.
```
Category	Term	Count	%	PValue	Genes	List Total	Pop Hits	Pop Total	Fold Enrichment	Bonferroni	Benjamini	FDR
GOTERM_BP_DIRECT	GO:0019433~triglyceride catabolic process	5	3.448275862	3.18E-05	APOB, FABP3, FABP1, FABP2, CPS1	125	25	16792	26.8672	0.020225308	0.020225308	0.047365501
```
There is also downregulation of retinol metabolism, which is enriched but not significant.
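
For reference, the `Fold Enrichment` value in the triglyceride row above is the fraction of the input list annotated with the term divided by the fraction of the background annotated with it, i.e. (Count / List Total) / (Pop Hits / Pop Total). A small sketch reproducing the number, with the hypothetical helper name `fold_enrichment`:

```bash
#!/bin/bash
# Sketch: DAVID-style fold enrichment.
# Arguments: Count, List Total, Pop Hits, Pop Total

fold_enrichment() {
    awk -v c="$1" -v lt="$2" -v ph="$3" -v pt="$4" \
        'BEGIN { printf "%.4f\n", (c / lt) / (ph / pt) }'
}

fold_enrichment 5 125 25 16792   # 26.8672
```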


Kaplan-Meier survival curve for RPPH1 in COAD, comparing the upper 33% and lower 33% of patients, from oncolnc.org:
http://www.oncolnc.org/kaplan/?lower=33&upper=33&cancer=COAD&gene_id=85495&raw=RPPH1&species=mRNA


## Discussion
