**Set environment**

In [1]:
suppressMessages(source("../config_sing.R"))

You are in Singularity: singularity_proj_combeffect 
BASE DIRECTORY:     /mount/work 
PATH OF SOURCE:     /mount/work/source 
PATH OF EXECUTABLE: /mount/work/exe 
PATH OF ANNOTATION: /mount/work/annotation 
PATH OF PROJECT:    /mount/project 
PATH OF RESULTS:    /mount/work/out/proj_combeffect_encode_fcc 


In [2]:
fdiry = file.path(FD_RES, "KS91_K562_ASTARRseq", "coverage")
dir(fdiry)

In [4]:
### init: file directory
fdiry = file.path(FD_RES, "KS91_K562_ASTARRseq", "coverage")

### init:
ctypes = c(col_character(), col_integer(), col_integer())
cnames = c("Chrom", "Loc", "Depth")

### INPUT: set sample group and number of replicates
GROUP   = "Input"
REPLICS = paste0("rep", 1:6)

### INPUT: import data
lst_dat_inp = lapply(REPLICS, function(replic){
    ### get sample file path
    sam   = paste(GROUP, replic, sep="_")
    fglob = paste0("*", sam, "*")
    fpath = Sys.glob(file.path(fdiry, fglob))
    
    ### read data
    dat = read_tsv(
        fpath, 
        col_types = ctypes, 
        col_names = cnames)
    dat$Sample = sam
    return(dat)
})

### OUTPUT: set sample group and number of replicates
GROUP   = "Output"
REPLICS = paste0("rep", 1:4)

### OUTPUT: import data
lst_dat_out = lapply(REPLICS, function(replic){
    ### get sample file path
    sam   = paste(GROUP, replic, sep="_")
    fglob = paste0("*", sam, "*")
    fpath = Sys.glob(file.path(fdiry, fglob))
    
    ### read data
    dat = read_tsv(
        fpath, 
        col_types = ctypes, 
        col_names = cnames)
    dat$Sample = sam
    return(dat)
})

In [3]:
fdiry = file.path(FD_RES, "KS91_K562_ASTARRseq", "fragment")
fname = "library_size.tsv"
dat_lib = read_tsv(file.path(fdiry, fname))
head(dat_lib)

[1mRows: [22m[34m10[39m [1mColumns: [22m[34m4[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m "\t"
[31mchr[39m (3): Sample, Group, Replicate
[32mdbl[39m (1): Size

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


Sample,Group,Replicate,Size
<chr>,<chr>,<chr>,<dbl>
Input_rep1,Input,rep1,358823
Input_rep2,Input,rep2,461577
Input_rep3,Input,rep3,496229
Input_rep4,Input,rep4,464845
Input_rep5,Input,rep5,454013
Input_rep6,Input,rep6,409058


In [5]:
dat_astarr = bind_rows(lst_dat_inp, lst_dat_out) %>% 
    left_join(dat_lib, by="Sample") %>%
    mutate(Depth_Norm = Depth / Size)
head(dat_astarr)

Chrom,Loc,Depth,Sample,Group,Replicate,Size,Depth_Norm
<chr>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>
chrX,47786400,0,Input_rep1,Input,rep1,358823,0
chrX,47786401,0,Input_rep1,Input,rep1,358823,0
chrX,47786402,0,Input_rep1,Input,rep1,358823,0
chrX,47786403,0,Input_rep1,Input,rep1,358823,0
chrX,47786404,0,Input_rep1,Input,rep1,358823,0
chrX,47786405,0,Input_rep1,Input,rep1,358823,0


In [14]:
### summarize the repeats for output and inptu
dat = dat_astarr
dat = dat %>% 
    group_by(Loc, Group) %>% 
    summarise(Depth_Norm = sum(Depth_Norm))

### calculate the ratio
dat = dat %>% 
    spread(Group, Depth_Norm) %>% 
    mutate(Ratio = Output / (Input+1))

head(dat)

[1m[22m`summarise()` has grouped output by 'Loc'. You can override using the `.groups` argument.


Loc,Input,Output,Ratio
<dbl>,<dbl>,<dbl>,<dbl>
47786400,0,0,0
47786401,0,0,0
47786402,0,0,0
47786403,0,0,0
47786404,0,0,0
47786405,0,0,0


In [15]:
fdiry = file.path(FD_RES, "KS91_K562_ASTARRseq", "coverage")
fname = "KS91_K562_hg38_ASTARRseq_Ratio.GATA1.unstranded.perbase.tsv"
fpath = file.path(fdiry, fname)

write_tsv(dat, fpath)