# DESeq2 with miRNA counts table

[Resources](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html)

In [None]:
options(stringsAsFactors = FALSE)

Load required libraries (install packages if required)

In [None]:
library(ggplot2)
library(gplots)
library(data.table)
library(limma)
library(DESeq2)
library(RColorBrewer)
library(apeglm) 

### Creating DESeq2 object

Read in counts file from `Data/Serum_ExoR_Nor_miRNA_counts.csv` and view head of file

In [None]:
counts<-read.csv("FILE",header=TRUE,row.names=1)

In [None]:
head(counts)

**Make DESeq2 object: Define counts matrix, groups, and design**
- countData: matrix of counts
- colData: dataframe with metadata for each sample
- design: formula naming the column in colData that defines the groups we want to compare

In [None]:
countData<-as.matrix(VARIABLE)
ncol(countData)

In [None]:
colData<-data.frame(condition=c(rep("ExoR",times=4),rep("Nor",times=3)))
colData
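
DESeq2 pairs each row of `colData` with the column of `countData` in the same position, so it can help to name the rows and confirm the order before building the object. The following is a minimal base-R sketch, assuming made-up sample names (`ExoR_1` to `Nor_3`) and toy counts that are not taken from the real data:

```r
# Hypothetical sample names and toy counts, for illustration only
sample_names <- c(paste0("ExoR_", 1:4), paste0("Nor_", 1:3))
toy_counts <- matrix(1:21, nrow = 3,
                     dimnames = list(paste0("miR-", 1:3), sample_names))

# Explicit factor levels: the first level ("ExoR") is the reference level
toy_colData <- data.frame(
  condition = factor(c(rep("ExoR", times = 4), rep("Nor", times = 3)),
                     levels = c("ExoR", "Nor")),
  row.names = sample_names
)

# Stop early if metadata rows and count columns are out of step
stopifnot(identical(rownames(toy_colData), colnames(toy_counts)))
table(toy_colData$condition)
```

Listing the levels explicitly makes the reference group visible instead of relying on alphabetical ordering.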

Create the DESeq2 object using `DESeqDataSetFromMatrix()`

In [None]:
dds <- DESeqDataSetFromMatrix(countData = VARIABLE, colData = VARIABLE, design = ~VARIABLE) 
dds
class(dds)

### Normalization

Calculate normalization factors and write out a normalized counts file. DESeq2 does not actually use normalized counts for testing; it uses the raw counts and models the normalization inside the Generalized Linear Model (GLM).

To calculate the normalization factors, we can use the function [`estimateSizeFactors()`](https://rdrr.io/bioc/DESeq2/man/estimateSizeFactors.html), which "estimates the size factors using the 'median ratio method' described by Equation 5 in Anders and Huber (2010)".

In [None]:
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
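
To make the median ratio method concrete, here is a small base-R sketch on a toy matrix (the numbers and the names `toy`, `log_geo_means`, and `size_factors` are made up for illustration). It follows the idea of the method, not DESeq2's full implementation: take the geometric mean of each miRNA across samples, divide each sample by it, and use the per-sample median of those ratios.

```r
# Toy counts: 4 miRNAs x 3 samples; S2 and S3 are 2x and 3x "deeper" than S1
toy <- matrix(c( 10,  20,  30,
                100, 200, 300,
                  5,  10,  15,
                 50, 100, 150),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("miR-", 1:4), paste0("S", 1:3)))

# Log of the per-miRNA geometric mean across samples
log_geo_means <- rowMeans(log(toy))

# Per-sample median of the ratios to the geometric mean,
# skipping rows with zero counts (their log is not finite)
size_factors <- apply(toy, 2, function(cnts) {
  keep <- is.finite(log_geo_means) & cnts > 0
  exp(median((log(cnts) - log_geo_means)[keep]))
})
size_factors

# Dividing each column by its size factor puts samples on a common scale
toy_normalized <- sweep(toy, 2, size_factors, "/")
toy_normalized
```

In this toy case the three normalized columns come out identical, because the samples differ only by sequencing depth.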

These normalized counts will be useful for downstream visualization of results, but they cannot be used as input to DESeq2 or any other tool that performs differential expression analysis using the negative binomial model.

We can extract the normalized counts using `counts()` with the parameter `normalized=TRUE`.

In [None]:
normalized_counts <- counts(VARIABLE, normalized=PARAMETER)
write.table(normalized_counts, file="output/DESeq2normalized_counts_ExoR_Nor.txt", sep="\t", quote=FALSE, col.names=NA)

### Run DESeq2 

In [None]:
#Run DESeq2 
dds<-DESeq(dds)

The [`results()`](https://rdrr.io/bioc/DESeq2/man/results.html) function "extracts a result table from a DESeq analysis giving base means across samples, log2 fold changes, standard errors, test statistics, p-values and adjusted p-values". We want to contrast our two conditions, "ExoR" and "Nor" (with "Nor" as the denominator of the fold change), using an FDR cutoff of 0.05.

"alpha" is the significance cutoff used for optimizing the independent filtering (by default 0.1). If the adjusted p-value cutoff (FDR) will be a value other than 0.1, alpha should be set to that value. Here we have set alpha to 0.05.
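
The adjusted p-values reported by `results()` use the Benjamini-Hochberg (BH) procedure by default. As a rough illustration of what that adjustment does, the base-R sketch below applies `p.adjust()` to five made-up p-values (the vector `p` is hypothetical, and the independent filtering step that DESeq2 also performs is left out):

```r
# Made-up raw p-values, for illustration only
p <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# Benjamini-Hochberg adjustment, the same method name DESeq2 uses by default
padj <- p.adjust(p, method = "BH")
padj

# Compare how many tests pass a 0.05 cutoff before and after adjustment
sum(p < 0.05)
sum(padj < 0.05)
```

Four of the raw p-values are below 0.05, but only two remain below 0.05 after adjustment, which is why the cutoff should be applied to `padj` and not to `pvalue`.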

In [None]:
res<-results(VARIABLE, contrast=c("condition","ExoR","Nor"),alpha=NUM)
head(res)

Count how many miRNAs pass the adjusted p-value cutoff using `table()`, then order the results by adjusted p-value.

In [None]:
table(res$padj<0.05)
res <- res[order(res$padj),]

Merge `res` with `normalized_counts` and write out final table

In [None]:
resdata <- merge(as.data.frame(VARIABLE), as.data.frame(VARIABLE), by="row.names", sort=FALSE)

write.table(resdata,file="output/ExoR_Nor_DESeq2_resultsalpha0.05.txt",sep="\t")
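
Merging on `"row.names"` matches rows by name, not by position, and moves the names into a new first column called `Row.names`. A small base-R sketch with two made-up tables (`res_toy` and `norm_toy` are hypothetical stand-ins for the results and the normalized counts):

```r
# Toy stand-ins; note the two tables list the miRNAs in a different order
res_toy <- data.frame(log2FoldChange = c(1.5, -0.3),
                      padj = c(0.01, 0.80),
                      row.names = c("miR-1", "miR-2"))
norm_toy <- data.frame(S1 = c(10, 50),
                       S2 = c(40, 45),
                       row.names = c("miR-2", "miR-1"))

# Rows are paired by name even though the order differs
merged <- merge(res_toy, norm_toy, by = "row.names", sort = FALSE)

# Give the name column a more descriptive header before writing it out
names(merged)[1] <- "miRNA"
merged
```

Renaming the first column is optional; it only makes the exported table easier to read.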

In [None]:
head(resdata)

## Plots

### MA Plot
In DESeq2, the function `plotMA()` shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples in the DESeqDataSet. Points are highlighted in color (red in older DESeq2 releases, blue in recent ones) when the adjusted p-value is below the significance threshold, which is 0.1 unless `alpha` was set in `results()` or `plotMA()`. Points that fall outside the plotting window are drawn as open triangles pointing either up or down.

Generate an MA plot from `res`.

In [None]:
plotMA(VARIABLE, ylim=c(-9,9))

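Shrunken fold changes are useful for ranking miRNAs by effect size and for visualization. We can compute them using `lfcShrink()`. We need to set the following parameters: `coef="condition_Nor_vs_ExoR"` and `type="apeglm"`. The `coef` value must match one of the names printed by `resultsNames(dds)`. Note that this coefficient compares Nor against ExoR, the opposite direction to the `contrast` used for `res`, so the signs of the log2 fold changes are flipped.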

In [None]:
#Log fold change shrinkage for visualization and ranking
resultsNames(dds)
resLFC <- lfcShrink(dds, coef="PARAMETER",type="PARAMETER")
resLFC

Plot the results from `resLFC` using the shrunken log fold change data.

In [None]:
plotMA(VARIABLE, ylim=c(-5,5))

### PCA Plot
The PCA plot shows the samples in the 2D plane spanned by their first two principal components. It is closely related to the sample-to-sample distance heatmap in the next section, and it is useful for visualizing the overall effect of experimental covariates and batch effects.

First apply a regularized log transformation using `rlog()` to `dds`.

In [None]:
# rlog() returns a SummarizedExperiment-type object that holds the rlog-transformed values in its assay slot
rld <- rlog(VARIABLE)
head(assay(rld))

Then plot the results from `rld` using `plotPCA()`.

In [None]:
#plot PCA
plotPCA(VARIABLE, intgroup = c("condition"))
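
For intuition about what goes into such a plot, the base-R sketch below runs `prcomp()` on a simulated matrix: it keeps the most variable rows (by default `plotPCA()` keeps the 500 most variable), transposes so that samples become observations, and computes the percentage of variance per component. The data, the group labels `A`/`B`, and the cutoff of 10 rows are all assumptions for illustration, not the real miRNA data.

```r
set.seed(1)

# Simulated "transformed" values: 20 features x 6 samples
toy <- matrix(rnorm(20 * 6), nrow = 20,
              dimnames = list(paste0("miR-", 1:20), paste0("S", 1:6)))

# Shift five features in samples S4-S6 to mimic a group effect
toy[1:5, 4:6] <- toy[1:5, 4:6] + 5

# Keep the 10 most variable features
row_var <- apply(toy, 1, var)
top <- order(row_var, decreasing = TRUE)[1:10]

# PCA with samples as rows
pca <- prcomp(t(toy[top, ]))
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Coordinates that a PC1 vs PC2 scatter plot would use
pcs <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                  group = rep(c("A", "B"), each = 3))
pcs
round(percent_var[1:2], 1)
```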

### Heatmap of sample-to-sample distances
A heatmap of the sample-to-sample distance matrix gives an overview of the similarities and dissimilarities between samples.

The [`dist()`](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/dist) function "computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix".

In [None]:
#visualize sample distances
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
head(sampleDistMatrix)
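
Because `dist()` measures distances between rows, the matrix has to be transposed first so that each sample becomes a row. A tiny base-R sketch with made-up values (the samples `A`, `B`, `C` and the matrix `toy_expr` are hypothetical):

```r
# Toy matrix: 2 features (rows) x 3 samples (columns)
toy_expr <- cbind(A = c(0, 0), B = c(3, 4), C = c(6, 8))
rownames(toy_expr) <- c("miR-1", "miR-2")

# Transpose so that dist() compares samples, not features
toy_dists <- as.matrix(dist(t(toy_expr)))
toy_dists

# Without t(), the result would be a 2 x 2 matrix of feature distances
dim(as.matrix(dist(toy_expr)))
```

The default distance is Euclidean, so samples `A` and `B` are 5 units apart and `A` and `C` are 10 units apart.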

Rename the rows and columns so that they contain only the condition.

In [None]:
rownames(sampleDistMatrix) <- paste(rld$condition, sep="-")
colnames(sampleDistMatrix) <- paste(rld$condition, sep="-")
head(sampleDistMatrix)

Plot the `sampleDistMatrix` using `heatmap.2()`

In [None]:
colours = colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
heatmap.2(sampleDistMatrix, trace="none", col=colours)

### Heatmap of miRNAs significantly different between groups
Make a list of differentially expressed (DE) miRNAs and plot them as a heatmap using `heatmap.2()`.

Read in the DE miRNAs from `data/DESeq2_p0.05_80miRNAs.txt` then subset the `normalized_counts` to only get the data for the DE miRNAs.

In [None]:
for_merge<-data.frame(read.delim("FILE",header=TRUE,row.names=1))
miRNAs<-as.list(row.names(for_merge))
miRNAs_norm<-subset(VARIABLE, rownames(normalized_counts) %in% miRNAs)

Plot a heatmap of `miRNAs_norm` using `heatmap.2()`.

In [None]:
mypalette <- brewer.pal(11,"RdYlBu")
morecols <- colorRampPalette(mypalette)
heatmap.2(VARIABLE,col=rev(morecols(50)),trace="none", main="p<0.05 DESeq2 normalized",scale="row",margins=c(9,9), cexCol=0.7)

### Volcano plot
The volcano plot shows, for each tested miRNA, the effect size (log2 fold change, on the x-axis) and the statistical significance (-log10 p-value, on the y-axis) at the same time.

Create a volcano plot using `res`.

In [None]:
#volcano plot
#reset par
par(mfrow=c(1,1))
# Make a basic volcano plot
with(VARIABLE, plot(log2FoldChange, -log10(pvalue), pch=20, main="Volcano plot", xlim=c(-3,3),ylim=c(0,20)))

# Add colored points: blue if padj<0.01, red if padj<0.01 and |log2FC|>2
with(subset(res, padj<.01 ), points(log2FoldChange, -log10(pvalue), pch=20, col="PARAMETER"))
with(subset(res, padj<.01 & abs(log2FoldChange)>2), points(log2FoldChange, -log10(pvalue), pch=20, col="PARAMETER"))
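
The two `subset()` calls above silently drop rows whose `padj` is `NA`. An alternative is to assign one color per row up front, which makes the thresholds and the `NA` handling explicit. A base-R sketch on a made-up table (`toy_res` is hypothetical, and the colors follow the comment above):

```r
# Made-up results, including one miRNA with a missing adjusted p-value
toy_res <- data.frame(
  log2FoldChange = c(0.5, 2.5, -3.0, 1.0, 4.0),
  padj           = c(0.50, 0.001, 0.005, 0.002, NA)
)

# TRUE only when padj is present and below the cutoff
sig <- !is.na(toy_res$padj) & toy_res$padj < 0.01
# TRUE when the absolute log2 fold change exceeds 2
big <- abs(toy_res$log2FoldChange) > 2

# Red: significant and large effect; blue: significant only; black: the rest
point_col <- ifelse(sig & big, "red", ifelse(sig, "blue", "black"))
table(point_col)
```

The resulting vector could then be passed as `col=` in a single `plot()` call.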