# DESeq2: Create Data Objects

## Objective: Create a DESeqDataSet object

### Initial note

The first step is to create the countData and colData objects (see `?DESeqDataSet`):

countData: for matrix input: a matrix of non-negative integers

 colData: for matrix input: a ‘DataFrame’ or ‘data.frame’ with at least
          a single column. Rows of colData correspond to columns of
          countData
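
As a minimal sketch of this correspondence, the toy objects below (the names `toy_counts` and `toy_coldata` are hypothetical and not part of the pilot data) show how the sample names are expected to line up:

```r
# Toy count matrix: genes in rows, samples (libraries) in columns
toy_counts <- matrix(
    c(10L, 0L, 5L,
       3L, 7L, 2L),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Toy sample metadata: one row per sample, in the same order as the columns
toy_coldata <- data.frame(
    condition = c("ctrl", "trt", "trt"),
    row.names = c("s1", "s2", "s3"))

# Checks implied by the help page
is.integer(toy_counts)                                  # integer storage
all(toy_counts >= 0)                                    # non-negative counts
identical(rownames(toy_coldata), colnames(toy_counts))  # rows of colData match columns of countData
```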


### Load packages

In [None]:
library(tidyverse)
library(DESeq2)

### Load the 2019 pilot count objects from the image file

In [None]:
curdir <- "/home/jovyan/work/scratch/analysis_output"

imgdir <- file.path(curdir, "img")

imgfile <- file.path(imgdir, "pilotcnt2019.RData")

imgfile

attach(imgfile)

tools::md5sum(imgfile)

### List the objects that have been attached
ls(2)

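### Copy the attached objects into the global environment
cnt2019 <- cnt2019
mtdf2019 <- mtdf2019

### Detach the image file from the search path
detach(pos = 2)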
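
The cell above uses `attach()`, which places the saved objects on the search path until they are detached. One possible alternative, sketched here with a stand-in file created on the fly (`tmpfile` and `toy_cnt` are hypothetical), is to load the image into its own environment:

```r
# Hypothetical stand-in for the image file, written to a temporary location
tmpfile <- tempfile(fileext = ".RData")
toy_cnt <- data.frame(Label = c("s1", "s2"), geneA = c(10L, 3L))
save(toy_cnt, file = tmpfile)
rm(toy_cnt)

# Load into a dedicated environment instead of the search path
img_env <- new.env()
loaded <- load(tmpfile, envir = img_env)
loaded                      # names of the objects that were restored

# Copy the object of interest into the workspace
toy_cnt <- get("toy_cnt", envir = img_env)
```

With this approach nothing needs to be detached afterwards, and `loaded` plays the role of `ls(2)`.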

### Check dimensions of the two objects

In [None]:
dim(cnt2019)
dim(mtdf2019)

In [None]:
mtdf2019 %>% head

In [None]:
cnt2019[,1:5]

### Create columnData object

In [None]:
# columnData --- metadata
mtdf2019 %>%
    DataFrame ->
        columnData2019

### Add the labels as rownames
rownames(columnData2019) <- columnData2019[["Label"]]

columnData2019[, c("Label", "genotype", "condition")] %>% head

In [None]:
### Note that libraries are across rows and genes across columns
### DESeq2 requires the matrix to be transposed so that the gene names become row names
cnt2019[1:4,1:5]

In [None]:
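### Transpose the count matrix (so that libraries are across the columns and genes across rows)
### gather() stacks the gene columns into long format; spread_() then turns the library labels into columns
### Note that as.matrix() converts the tibble to a matrix object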
cnt2019 %>%
    gather(key = gene, value = value, 2:ncol(cnt2019)) %>% 
            spread_(key = names(cnt2019)[1],value = 'value') %>%
                column_to_rownames("gene") %>%
                    as.matrix ->
                        countData2019

countData2019[1:5, 1:6]
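
The same reshaping can be sketched in base R on a toy table. This assumes, as in `cnt2019`, that the first column holds the library labels and the remaining columns hold per-gene counts; `toy_cnt` and `toy_mat` are hypothetical names.

```r
# Toy table: one row per library, one column per gene
toy_cnt <- data.frame(
    Label = c("s2", "s1", "s3"),
    geneA = c(0L, 10L, 5L),
    geneB = c(7L, 3L, 2L))

# Drop the label column, convert to a matrix, and transpose
toy_mat <- t(as.matrix(toy_cnt[, -1]))

# Use the library labels as column names
colnames(toy_mat) <- toy_cnt[["Label"]]

toy_mat
dim(toy_mat)    # 2 genes by 3 libraries
```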




### Reorder the columns of the count matrix according to the order of Label in columnData

In [None]:
### The two sets coincide
setequal(columnData2019[["Label"]], colnames(countData2019))


In [None]:
### but they are not identical, as they follow different orders
identical(columnData2019[["Label"]], colnames(countData2019))

### Reorder the columns of countData 

In [None]:
countData2019 <- countData2019[,columnData2019[["Label"]]]

In [None]:
countData2019[1:4,1:5]
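
The difference between the two checks, and the effect of indexing columns by name, can be seen on a small example. The objects `meta_labels` and `toy_mat` are hypothetical stand-ins for the `Label` column and the count matrix.

```r
meta_labels <- c("s1", "s2", "s3")          # order in the metadata
toy_mat <- matrix(1:6, nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("s2", "s1", "s3")))

setequal(meta_labels, colnames(toy_mat))    # TRUE: same set of labels
identical(meta_labels, colnames(toy_mat))   # FALSE: different order

# Index columns by name to impose the metadata order
toy_mat <- toy_mat[, meta_labels]
identical(meta_labels, colnames(toy_mat))   # TRUE after reordering
```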

### Make sure that the labels match

In [None]:
### The two sets still coincide
setequal(columnData2019[["Label"]], colnames(countData2019))
### and they are now identical, as they follow the same order
identical(columnData2019[["Label"]], colnames(countData2019))

### Make a DESeqDataSet object on the basis of the counts

The design argument lets you specify an additive model (main effects only) or a multiplicative model (main effects plus an interaction term).

Additive model

In [None]:
dds_add <- DESeqDataSetFromMatrix(
    countData2019,                      # Count matrix
    columnData2019,                     # metadata
    ~ condition + genotype)             # design formula

Inspect object

In [None]:
dds_add

In [None]:
slotNames(dds_add)

Check design

In [None]:
dds_add@design

Check column data

In [None]:
dds_add@colData

Get count matrix

In [None]:
counts(dds_add)[1:10,1:10]

Change design: multiplicative model

In [None]:
dds_mult <- DESeqDataSetFromMatrix(
    countData2019,                       # Count matrix
    columnData2019,                      # metadata
    ~ condition + genotype + condition:genotype) # design formula
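
To see which coefficients each formula implies, base R's `model.matrix()` can be applied to a toy metadata table. This is only a sketch: the factor levels (`ctrl`, `trt`, `wt`, `mut`) and the object `toy_coldata` are assumptions for illustration, not the levels in the pilot data.

```r
# Toy metadata with two two-level factors
toy_coldata <- data.frame(
    condition = factor(c("ctrl", "ctrl", "trt", "trt")),
    genotype  = factor(c("wt", "mut", "wt", "mut"), levels = c("wt", "mut")))

# Additive model: intercept plus one column per main effect
mm_add <- model.matrix(~ condition + genotype, data = toy_coldata)
colnames(mm_add)

# Interaction model: the same columns plus a condition-by-genotype term
mm_mult <- model.matrix(~ condition + genotype + condition:genotype,
                        data = toy_coldata)
colnames(mm_mult)
```

The interaction formula adds one extra column, which allows the effect of condition to differ between genotypes.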

In the following demonstration, we will use the additive model. The multiplicative model will be illustrated in the appendix below.

In [None]:
dds2019 <- dds_add

In [None]:
curdir <- "/home/jovyan/work/scratch/analysis_output"
imgdir <- file.path(curdir, "img")

imgfile <- file.path(imgdir, "pilotdds2019.RData")

imgfile

In [None]:
save(dds2019, file = imgfile)
tools::md5sum(imgfile)

In [None]:
sessionInfo()