# Now we will load DESeq2 and perform PCA and differential expression

You will first need to install DESeq2. DESeq2 is an R package, so we need to install it from within R. First, we will use conda to install R into our anaconda environment. If you run `which R` before installing R with conda, it might point to an R installation that already exists on TSCC. Since we want to be in control of our specific version and installed packages, we will use anaconda to install our own copy.

    conda install -c r r=3.2.2
    
In order to run DESeq2, we need to update our version of gcc. (I found this out by trying to install DESeq2 and getting errors saying that it needs a newer version of this program.) We are going to specify the version to install.

    conda install -c anaconda gcc=4.8.5

Open R in your terminal on TSCC by typing:

    R
    
This will now take you into R where you can do your installation. Note that your command line now has a > rather than your TSCC login. This is specific to the R language. Now you are coding in R, not BASH. Use the following two commands to install DESeq2:

    source("http://bioconductor.org/biocLite.R")
    
    biocLite("DESeq2")
    
It will ask you about updating packages:

    Update all/some/none? [a/s/n]: 
    
UPDATE NONE!!! Type n to update NONE.
    
Did it work?

    library("DESeq2")
    
It should load without any error messages. To get back to a bash command line, quit R with:

    quit()
    
R will ask whether you want to save the workspace image. Type n for no; we do not need to keep anything from this session.
    
Back on the bash command line, we want to use conda to install a few more R packages that we will use later on.

    conda install -c r r-essentials=1.1
    
Now when you open Jupyter, you will have the option to select R as the kernel for a new notebook.

In [None]:
library("DESeq2")

library("ggplot2")

library("RColorBrewer")

In [None]:
counts <- read.csv('/home/ucsd-train01/scratch/projects/lin28b_shrna/all_bams/counts_for_deseq2.csv',
                  header=TRUE, row.names=1)

counts$Length <- NULL

head(counts)

In [None]:
col_data <- read.csv('/home/ucsd-train01/scratch/projects/lin28b_shrna/all_bams/conditions_matrix_deseq2.csv',
                  header=TRUE, row.names=1)

head(col_data)
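
Before building the DESeq2 object, it is worth checking that the two tables describe the same samples in the same order, because DESeq2 expects the columns of the count matrix and the rows of the sample table to line up. The sketch below uses small made-up tables (`toy_counts`, `toy_col_data`, and all sample and gene names are hypothetical stand-ins for the real `counts` and `col_data`).

```r
# Toy stand-ins for the real tables (hypothetical names and values)
toy_counts <- data.frame(
    sampleA = c(10, 0, 25),
    sampleB = c(12, 1, 30),
    sampleC = c(3, 0, 80),
    sampleD = c(5, 2, 75),
    row.names = c("gene1", "gene2", "gene3")
)
toy_col_data <- data.frame(
    condition = c("control", "control", "knockdown", "knockdown"),
    row.names = c("sampleD", "sampleA", "sampleC", "sampleB")
)

# Do both tables contain the same set of samples?
same_samples <- setequal(colnames(toy_counts), rownames(toy_col_data))

# Reorder the sample table so its rows follow the count columns
toy_col_data <- toy_col_data[colnames(toy_counts), , drop = FALSE]

# Are the samples now in the same order?
same_order <- all(rownames(toy_col_data) == colnames(toy_counts))

print(same_samples)
print(same_order)
```

With the real data, the same two checks would be run on `counts` and `col_data` before calling `DESeqDataSetFromMatrix`.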

In [None]:
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = col_data,
                              design = ~ condition)

In [None]:
dds <- dds[ rowSums(counts(dds)) > 4, ]
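
The cell above keeps only genes whose counts, summed across all samples, are greater than 4. The same filter is sketched here on a small made-up matrix in base R (`toy` and its gene and sample names are hypothetical), so you can see which rows are dropped.

```r
# Toy count matrix: 4 genes (rows) by 4 samples (columns), made-up values
toy <- matrix(c( 0,  1, 0,  2,
                10, 12, 9, 15,
                 1,  1, 1,  1,
                 3,  0, 2,  0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4)))

# Total count per gene across all samples
totals <- rowSums(toy)

# Keep genes with a total count greater than 4
keep <- totals > 4
filtered <- toy[keep, ]

print(totals)
print(filtered)
```

Note that gene3 has a total of exactly 4 and is removed, because the comparison is strictly greater than 4.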

In [None]:
dds <- DESeq(dds)

In [None]:
res <- results(dds)

write.csv(as.data.frame(res), file="/home/ucsd-train01/scratch/projects/lin28b_shrna/all_bams/differential_expression.csv")

In [None]:
summary(res)

In [None]:
plotMA(res, main="DESeq2", ylim=c(-2,2))

In [None]:
res05 <- results(dds, alpha=0.05)

plotMA(res05, main="alpha=0.05", ylim=c(-2,2))
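
To count or list the genes that pass the 0.05 cutoff, you can filter on the adjusted p-value column. DESeq2 can report NA in the padj column for some genes, so the comparison has to handle missing values. The sketch below uses a made-up results table (`toy_res` and its values are hypothetical) in place of `res05`.

```r
# Toy results table with made-up values; one gene has a missing padj
toy_res <- data.frame(
    log2FoldChange = c(1.5, -0.2, 2.1, -1.8, 0.1),
    padj = c(0.04, 0.60, 0.001, NA, 0.20),
    row.names = paste0("gene", 1:5)
)

# Number of genes with adjusted p-value below 0.05, ignoring NA
n_sig <- sum(toy_res$padj < 0.05, na.rm = TRUE)

# which() drops NA comparisons, so only real hits are kept
sig <- toy_res[which(toy_res$padj < 0.05), ]

# Sort the significant genes from smallest to largest padj
sig <- sig[order(sig$padj), ]

print(n_sig)
print(sig)
```

The same expressions should work on the real results after converting them with `as.data.frame(res05)`.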

In [None]:
rld <- rlog(dds)
vsd <- varianceStabilizingTransformation(dds)

data <- plotPCA(rld, intgroup="condition", returnData=TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
ggplot(data, aes(PC1, PC2, color=condition)) +
    geom_point(size=3) +
    xlab(paste0("PC1: ",percentVar[1],"% variance")) +
    ylab(paste0("PC2: ",percentVar[2],"% variance"))
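
The axis labels above report the percent of variance explained by each principal component. As a rough illustration of where such numbers come from, the sketch below runs base R's `prcomp` on a random toy matrix. This is not DESeq2's own code: by default `plotPCA` restricts the analysis to the most variable genes, and `toy_expr` and its group shift are made up for the example.

```r
set.seed(1)

# Toy expression matrix: 50 genes (rows) by 6 samples (columns)
toy_expr <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6,
                   dimnames = list(paste0("gene", 1:50),
                                   paste0("sample", 1:6)))

# Shift the first 10 genes in samples 4-6 to mimic a condition effect
toy_expr[1:10, 4:6] <- toy_expr[1:10, 4:6] + 5

# prcomp expects samples in rows, so transpose the matrix
pca <- prcomp(t(toy_expr))

# Fraction of total variance carried by each component
var_fraction <- pca$sdev^2 / sum(pca$sdev^2)
percent_var <- round(100 * var_fraction)

print(percent_var[1:2])
print(pca$x[, 1:2])
```

The columns of `pca$x` are the sample coordinates that would be plotted as PC1 and PC2.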

In [None]:
sampleDists <- dist(t(assay(rld)))


sampleDistMatrix <- as.matrix(sampleDists)

rownames(sampleDistMatrix) <- paste(rld$condition)

colnames(sampleDistMatrix) <- paste(rld$condition)

colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)

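# Base R heatmap() does not accept clustering_distance_rows/cols
# (those are pheatmap arguments), so pass a dendrogram built from
# the precomputed sample distances instead.
heatmap(sampleDistMatrix,
        Rowv = as.dendrogram(hclust(sampleDists)),
        symm = TRUE,
        col = colors)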
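
The call `dist(t(assay(rld)))` transposes the matrix first because `dist` measures distances between rows, and our samples are stored in columns. A tiny worked example with made-up numbers (`toy`, `s1`, `s2`, and `s3` are hypothetical) shows the Euclidean distances it produces.

```r
# Toy matrix: 2 genes (rows) by 3 samples (columns), made-up values
toy <- cbind(s1 = c(0, 0), s2 = c(3, 4), s3 = c(0, 1))
rownames(toy) <- c("gene1", "gene2")

# Transpose so that each sample is a row, then compute distances
toy_dists <- dist(t(toy))

# Convert to a full symmetric matrix for plotting or inspection
toy_dist_matrix <- as.matrix(toy_dists)

print(toy_dist_matrix)
```

Samples s1 and s2 differ by 3 in gene1 and 4 in gene2, so their distance is sqrt(3^2 + 4^2) = 5.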

**Q: How do I make a plot of the counts for the gene that has the smallest adjusted p-value?**
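
One possible approach, sketched here on made-up data: find the position of the smallest adjusted p-value with `which.min`, then pull out that gene's counts together with the condition of each sample. All names and values below (`toy_padj`, `toy_counts`, `toy_condition`) are hypothetical stand-ins for the real results and counts.

```r
# Made-up adjusted p-values, one per gene (gene2 is missing)
toy_padj <- c(gene1 = 0.20, gene2 = NA, gene3 = 0.0004, gene4 = 0.03)

# Made-up counts: 4 genes (rows) by 4 samples (columns)
toy_counts <- matrix(c(10, 12, 11,  9,
                        0,  1,  0,  0,
                       50, 48,  5,  7,
                       20, 22, 30, 35),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(names(toy_padj),
                                     paste0("sample", 1:4)))
toy_condition <- c("control", "control", "knockdown", "knockdown")

# which.min ignores NA values and returns the first minimum
top_index <- which.min(toy_padj)
top_gene <- names(toy_padj)[top_index]

# Counts for that gene, one row per sample, ready to plot by condition
plot_data <- data.frame(count = toy_counts[top_gene, ],
                        condition = toy_condition)

print(top_gene)
print(plot_data)
```

With the real objects from this notebook, DESeq2 provides a `plotCounts` function that draws this kind of plot directly, for example `plotCounts(dds, gene=which.min(res$padj), intgroup="condition")`.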