**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../config/config_sing.R")))
suppressMessages(suppressWarnings(library("DESeq2")))
show_env()

You are in Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/out 
CODE DIRECTORY (FD_CODE): /data/reddylab/Kuei/code 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/code/Proj_CombEffect_ENCODE_FCC 
PATH OF RESULTS (FD_RES): /data/reddylab/Kuei/out/proj_combeffect_encode_fcc 
PATH OF LOG     (FD_LOG): /data/reddylab/Kuei/out/proj_combeffect_encode_fcc/log 


## Import count matrix and metadata

In [2]:
PREFIX = "KS91_K562_ASTARRseq"
FOLDER = "coverage_astarrseq_peak_macs_input"

fdiry = file.path(FD_RES, "results", PREFIX, FOLDER, "summary")

fname = "matrix.raw.count.WGS.tsv"
fpath = file.path(fdiry, fname)
dat_count = read_tsv(fpath, show_col_types = FALSE)

fname = "metadata.raw.WGS.tsv"
fpath = file.path(fdiry, fname)
dat_meta = read_tsv(fpath, show_col_types = FALSE)

**Arrange count matrix and metadata**

In [3]:
dat_col = dat_meta  %>% 
    dplyr::select(Sample, Group) %>% 
    dplyr::rename(condition = Group) %>%
    column_to_rownames(var = "Sample")

dat_cnt = dat_count %>% 
    column_to_rownames(var = "Peak")

### treat missing counts as zero
dat_cnt[is.na(dat_cnt)] = 0

**Show data**

In [4]:
head(dat_cnt)

| | Input.rep1 | Input.rep2 | Input.rep3 | Input.rep4 | Input.rep5 | Input.rep6 | Output.rep1 | Output.rep2 | Output.rep3 | Output.rep4 |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| chr1:100006256-100006880 | 121 | 176 | 180 | 155 | 147 | 152 | 10 | 32 | 24 | 51 |
| chr1:100010437-100010915 | 103 | 122 | 125 | 146 | 123 | 119 | 2 | 9 | 15 | 28 |
| chr1:10002087-10003910 | 399 | 561 | 538 | 546 | 536 | 458 | 28 | 77 | 70 | 127 |
| chr1:100021298-100021629 | 79 | 106 | 121 | 106 | 96 | 92 | 2 | 7 | 12 | 16 |
| chr1:100023727-100023976 | 48 | 72 | 72 | 68 | 79 | 57 | 11 | 11 | 7 | 14 |
| chr1:100027983-100029702 | 480 | 611 | 744 | 697 | 676 | 573 | 37 | 108 | 110 | 165 |


In [5]:
dat_col

| | condition |
|---|---|
| | `<chr>` |
| Input.rep1 | Input |
| Input.rep2 | Input |
| Input.rep3 | Input |
| Input.rep4 | Input |
| Input.rep5 | Input |
| Input.rep6 | Input |
| Output.rep1 | Output |
| Output.rep2 | Output |
| Output.rep3 | Output |
| Output.rep4 | Output |


In [6]:
print(all(rownames(dat_col) %in% colnames(dat_cnt)))
print(all(rownames(dat_col) ==   colnames(dat_cnt)))

[1] TRUE
[1] TRUE
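
DESeq2 expects the columns of the count matrix and the rows of the sample table to be in the same order, so the second check is the stricter one. If it ever returned `FALSE` while the first returned `TRUE`, the count columns could be reordered before building the dataset. The sketch below shows this on toy objects; `toy_cnt` and `toy_col` are hypothetical names and not part of this analysis.

```r
# Toy count matrix whose columns are deliberately out of order (hypothetical data)
toy_cnt = data.frame(
    Output.rep1 = c(3, 7),
    Input.rep1  = c(10, 20),
    Input.rep2  = c(12, 18),
    row.names   = c("peak_a", "peak_b"))

# Toy sample table in the intended order
toy_col = data.frame(
    condition = c("Input", "Input", "Output"),
    row.names = c("Input.rep1", "Input.rep2", "Output.rep1"))

# The same samples are present, but the order differs
print(all(rownames(toy_col) %in% colnames(toy_cnt)))
print(all(rownames(toy_col) ==   colnames(toy_cnt)))

# Reorder the count columns to follow the sample table, then re-check
toy_cnt = toy_cnt[, rownames(toy_col)]
print(all(rownames(toy_col) == colnames(toy_cnt)))
```

Indexing by name rather than by position keeps the reorder correct even if samples are added later.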


## Set up DESeq2

In [7]:
dds = DESeqDataSetFromMatrix(
    countData = dat_cnt, 
    colData   = dat_col, 
    design    = ~condition)

converting counts to integer mode

“some variables in design formula are characters, converting to factors”
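
The warning arises because `condition` is a character column. One way to avoid it, and to fix the reference level up front, is to convert the column to a factor with explicit levels before calling `DESeqDataSetFromMatrix`. A minimal sketch on a toy sample table follows; `toy_col` is a hypothetical name, and the level order is an assumption chosen to put `Input` first.

```r
# Toy sample table with a character condition column (hypothetical data)
toy_col = data.frame(
    condition = c("Input", "Input", "Output", "Output"),
    row.names = c("Input.rep1", "Input.rep2", "Output.rep1", "Output.rep2"),
    stringsAsFactors = FALSE)

# Convert to a factor; the first level acts as the reference in the design
toy_col$condition = factor(toy_col$condition, levels = c("Input", "Output"))

print(levels(toy_col$condition))
```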


**Pre-filtering**

In [8]:
### remove peaks with fewer than 10 reads in total across all samples
cat("Before filter:", nrow(dds), "\n")
dds = dds[rowSums(counts(dds)) >= 10,]
cat("After  filter:", nrow(dds), "\n")

### set control condition as reference
dds$condition <- relevel(dds$condition, ref = "Input")

Before filter: 246852 
After  filter: 246850 
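
The filter keeps a peak when its counts summed over all samples reach at least 10; it does not require 10 reads in any single sample. A small base-R sketch on a toy matrix (`toy` is hypothetical) makes the distinction concrete:

```r
# Toy count matrix: rows are peaks, columns are samples (hypothetical data)
toy = matrix(
    c( 0, 1, 2, 3,    # total  6: dropped
       4, 3, 2, 1,    # total 10: kept, although no single sample has 10 reads
      50, 0, 0, 0),   # total 50: kept
    nrow = 3, byrow = TRUE,
    dimnames = list(
        c("peak_a", "peak_b", "peak_c"),
        c("Input.rep1", "Input.rep2", "Output.rep1", "Output.rep2")))

# Same rule as above: keep rows whose total count is at least 10
keep = rowSums(toy) >= 10
print(keep)

toy_filtered = toy[keep, , drop = FALSE]
print(rownames(toy_filtered))
```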


## Run DESeq2

In [9]:
dds = DESeq(dds)

estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing
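
The first step reported in the log, size-factor estimation, uses the median-of-ratios method by default. The base-R sketch below reproduces that idea on a toy matrix. It illustrates the technique and is not the package's internal code; `toy` and the other names are hypothetical.

```r
# Toy count matrix: s2 is sequenced twice as deep as s1, s3 half as deep
toy = matrix(
    c(100, 200, 50,
       10,  20,  5,
       40,  80, 20),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("peak_a", "peak_b", "peak_c"), c("s1", "s2", "s3")))

# Per-peak geometric mean across samples, computed on the log scale
log_geo_mean = rowMeans(log(toy))

# Per sample: median over peaks of log(count) - log(geometric mean),
# ignoring peaks with a zero count or an undefined geometric mean
size_factor = apply(toy, 2, function(cnt) {
    usable = is.finite(log_geo_mean) & cnt > 0
    exp(median((log(cnt) - log_geo_mean)[usable]))
})
print(size_factor)
```

Because the toy columns are exact multiples of one another, the resulting size factors recover the depth ratios between samples.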



## Get results

In [10]:
resultsNames(dds)

In [11]:
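### default contrast: last level of condition vs. reference (Output vs. Input)
res = results(dds)
### move peak IDs from row names into a regular column
res = as.data.frame(res) %>% rownames_to_column(var = "Peak")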
head(res)

| | Peak | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj |
|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | chr1:100006256-100006880 | 74.39238 | -0.01311792 | 0.15512423 | -0.08456397 | 0.932608 | 0.9510236543 |
| 2 | chr1:100010437-100010915 | 48.25931 | -0.80687947 | 0.22174651 | -3.63874709 | 0.0002739677 | 0.0008947519 |
| 3 | chr1:10002087-10003910 | 224.35927 | -0.34892859 | 0.09743886 | -3.58100041 | 0.0003422811 | 0.0010926173 |
| 4 | chr1:100021298-100021629 | 38.13434 | -1.03799571 | 0.25401013 | -4.08643436 | 4.380534e-05 | 0.0001707736 |
| 5 | chr1:100023727-100023976 | 32.62049 | -0.19100207 | 0.25625467 | -0.74536036 | 0.4560539 | 0.5497617125 |
| 6 | chr1:100027983-100029702 | 290.98442 | -0.18001582 | 0.09610434 | -1.87312902 | 0.06105057 | 0.1032752224 |
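
With `Input` as the reference level, a positive `log2FoldChange` means more normalized signal in Output than in Input. A common follow-up is to label peaks by adjusted p-value and direction. The sketch below applies that to three rows copied from the table above; the 0.05 cutoff, the `toy_res` name, and the `Direction` column are assumptions for illustration and not part of the original analysis.

```r
# Three rows copied from head(res); only the columns needed here
toy_res = data.frame(
    Peak = c("chr1:100006256-100006880",
             "chr1:100010437-100010915",
             "chr1:100021298-100021629"),
    log2FoldChange = c(-0.01311792, -0.80687947, -1.03799571),
    padj           = c(0.9510236543, 0.0008947519, 0.0001707736))

alpha = 0.05  # assumed significance cutoff

# padj can be NA (for example after independent filtering), so treat NA as not significant
toy_res$Direction = ifelse(
    is.na(toy_res$padj) | toy_res$padj >= alpha, "NotSignificant",
    ifelse(toy_res$log2FoldChange > 0, "Up", "Down"))

print(toy_res[, c("Peak", "Direction")])
```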


## Save results

In [12]:
fdiry = file.path(FD_RES, "results", PREFIX, FOLDER, "summary")
fname = "result.Log2FC.raw.deseq.WGS.tsv"
fpath = file.path(fdiry, fname)
print(fpath)

write_tsv(res, fpath)

[1] "/data/reddylab/Kuei/out/proj_combeffect_encode_fcc/results/KS91_K562_ASTARRseq/coverage_astarrseq_peak_macs_input/summary/result.Log2FC.raw.deseq.WGS.tsv"
