**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
REPO DIRECTORY (FD_REPO): /data/reddylab/Kuei/repo 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/work 
DATA DIRECTORY (FD_DATA): /data/reddylab/Kuei/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log 
PROJECT REF     (FD_REF): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references 



## Prepare

**Set global variables**

In [2]:
vec = c(
    "fcc_astarr_macs_input_overlap",
    "fcc_astarr_macs_input_union"
)
names(vec) = vec

VEC_TXT_FOLDER = vec
for(txt in vec){cat(txt, "\n")}

fcc_astarr_macs_input_overlap 
fcc_astarr_macs_input_union 


In [3]:
TXT_FNAME_ANNOT = "matrix.annotation.fcc_starrmpra_junke.merge_direction.tsv"

**View files**

In [4]:
txt_fdiry = file.path(FD_RES, "region_annotation", "*", "summary")
txt_fname = TXT_FNAME_ANNOT
txt_fglob = file.path(txt_fdiry, txt_fname)

vec = Sys.glob(txt_fglob)
for(txt in vec){cat(txt, "\n")}

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_overlap/summary/matrix.annotation.fcc_starrmpra_junke.merge_direction.tsv 
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_union/summary/matrix.annotation.fcc_starrmpra_junke.merge_direction.tsv 


## Import data

In [5]:
### loop to import data
lst = lapply(VEC_TXT_FOLDER, function(txt_folder){
    ### set file directory
    txt_fdiry = file.path(FD_RES, "region_annotation", txt_folder, "summary")
    txt_fname = TXT_FNAME_ANNOT
    txt_fpath = file.path(txt_fdiry, txt_fname)

    ### read table
    dat = read_tsv(txt_fpath, show_col_types = FALSE)
    return(dat)
})

### assign and show
lst_dat_region_annot_import = lst

res = lapply(lst, dim)
print(res)

dat = lst[[1]]
fun_display_table(head(dat, 3))

$fcc_astarr_macs_input_overlap
[1] 99749    16

$fcc_astarr_macs_input_union
[1] 135016     16



Chrom,ChromStart,ChromEnd,Region,ASTARR_A,ASTARR_AR,ASTARR_R,LMPRA_A,LMPRA_AR,LMPRA_R,TMPRA_A,TMPRA_AR,TMPRA_R,WSTARR_A,WSTARR_AR,WSTARR_R
chr1,10038,10405,chr1:10038-10405,0,0,1,0,0,0,0,0,0,0,0,0
chr1,16025,16338,chr1:16025-16338,0,0,1,0,0,0,0,0,0,0,0,0
chr1,17288,17689,chr1:17288-17689,0,0,1,0,0,0,0,0,0,0,0,0


## Summarize

**Check: Preview the active (`_A`) and repressive (`_R`) label columns**

In [6]:
lst = lst_dat_region_annot_import
dat = lst[[1]]
dat = dat %>% dplyr::select(Region, ends_with("_A")) 
fun_display_table(head(dat, 3))

Region,ASTARR_A,LMPRA_A,TMPRA_A,WSTARR_A
chr1:10038-10405,0,0,0,0
chr1:16025-16338,0,0,0,0
chr1:17288-17689,0,0,0,0


In [7]:
lst = lst_dat_region_annot_import
dat = lst[[1]]
dat = dat %>% dplyr::select(Region, ends_with("_R")) 
fun_display_table(head(dat, 3))

Region,ASTARR_R,LMPRA_R,TMPRA_R,WSTARR_R
chr1:10038-10405,1,0,0,0
chr1:16025-16338,1,0,0,0
chr1:17288-17689,1,0,0,0
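
The two previews above depend on the suffix match being literal: `ends_with("_A")` picks up `ASTARR_A` but not `ASTARR_AR`, and `ends_with("_R")` does not pick up `ASTARR_AR` either, so the `_AR` columns appear in neither view. A minimal base R sketch of the same test, using `endsWith()` on a subset of the column names from the imported table (note that `endsWith()` is case-sensitive, whereas dplyr's `ends_with()` ignores case by default; the column names here are all upper case, so the results should agree):

```r
# Column names copied from the imported annotation table (subset)
cols <- c("Region", "ASTARR_A", "ASTARR_AR", "ASTARR_R",
          "LMPRA_A", "LMPRA_AR", "LMPRA_R")

# endsWith() is a literal suffix test on each name
cols_active     <- cols[endsWith(cols, "_A")]
cols_repressive <- cols[endsWith(cols, "_R")]
cols_ar         <- cols[endsWith(cols, "_AR")]

print(cols_active)
print(cols_repressive)
print(cols_ar)
```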


**Execute: Calculate the assay votes for both active and repressive labels**

In [8]:
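### helper function: sum the 0/1 assay labels across columns for each region
### (input: data frame with regions as row names; output: one row per region)
fun = function(mat){
    vec = rowSums(mat)
    dat = tibble(
        Region    = names(vec),
        Num_Assay = vec
    )
    return(dat)
}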

### init
lst = lst_dat_region_annot_import
vec = c("Chrom", "ChromStart", "ChromEnd", "Region", "Direction_Assay", "Num_Assay")

### loop through each region set
lst = lapply(lst, function(dat){
    ### get active labels (columns ending in "_A")
    dat_region_enhance = dat %>% dplyr::select(Region, ends_with("_A")) 
    
    ### get repressive labels (columns ending in "_R")
    dat_region_repress = dat %>% dplyr::select(Region, ends_with("_R"))
    
    ### combine into list
    tmp = list(
        "Active"     = dat_region_enhance,
        "Repressive" = dat_region_repress
    )
    
    ### loop to get assay count
    tmp = lapply(tmp, function(dat){
        dat = dat %>% tibble::column_to_rownames(var = "Region")
        dat = fun(dat)
        return(dat)
    })
    
    ### arrange count table
    dat = bind_rows(tmp, .id = "Direction_Assay")
    dat = dat %>% 
        dplyr::filter(Num_Assay > 0) %>%
        ### split "chrom:start-end" on ":" and "-" only; convert coordinates to integers
        tidyr::separate(
            Region, 
            into    = c("Chrom", "ChromStart", "ChromEnd"), 
            sep     = "[:-]",
            remove  = FALSE,
            convert = TRUE) %>%
        dplyr::select(!!!vec)
    return(dat)
})

### assign and show
lst_dat_region_annot_result = lst

res = lapply(lst, dim)
print(res)

dat = lst[[1]]
fun_display_table(head(dat, 3))

$fcc_astarr_macs_input_overlap
[1] 109394      6

$fcc_astarr_macs_input_union
[1] 146148      6



Chrom,ChromStart,ChromEnd,Region,Direction_Assay,Num_Assay
chr1,115429,115969,chr1:115429-115969,Active,1
chr1,184091,184563,chr1:184091-184563,Active,1
chr1,605104,605675,chr1:605104-605675,Active,1
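
The result tables have more rows than the imported ones (109394 vs. 99749 for the overlap set) because each region is tallied once per direction: a region with votes in both directions contributes two rows, and a region with no votes contributes none. The following is a minimal base R sketch of that tally on a toy table, not the notebook's own code. The 0/1 values and the helper name `count_votes` are made up for illustration.

```r
# Toy annotation table; the 0/1 values are made up for illustration
dat_toy <- data.frame(
    Region   = c("chr1:100-200", "chr1:300-400", "chr2:500-600", "chr2:700-800"),
    ASTARR_A = c(1, 0, 1, 0),
    LMPRA_A  = c(1, 0, 0, 0),
    ASTARR_R = c(0, 1, 0, 0),
    LMPRA_R  = c(0, 0, 1, 0)
)

# Hypothetical helper: count assay votes for one direction by column suffix
count_votes <- function(dat, suffix, label) {
    cols <- names(dat)[endsWith(names(dat), suffix)]
    data.frame(
        Region          = dat$Region,
        Direction_Assay = label,
        Num_Assay       = unname(rowSums(dat[, cols, drop = FALSE]))
    )
}

# Stack the two directions, as bind_rows(..., .id = "Direction_Assay") does
res_toy <- rbind(
    count_votes(dat_toy, "_A", "Active"),
    count_votes(dat_toy, "_R", "Repressive")
)

# Keep only region/direction pairs with at least one vote
res_toy <- res_toy[res_toy$Num_Assay > 0, ]
rownames(res_toy) <- NULL
print(res_toy)
```

In this toy input, `chr2:500-600` ends up with one `Active` row and one `Repressive` row, while `chr2:700-800` is dropped by the filter.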


**Explore: Count region-direction pairs by number of supporting assays**

In [9]:
lst = lst_dat_region_annot_result
lst = lapply(lst, function(dat){
    dat = dat %>% dplyr::mutate(Note = paste0("N", Num_Assay))
    res = table(dat$Direction_Assay, dat$Note, dnn=c("Direction", "Count"))
    dat = as.data.frame(res)
    return(dat)
})

dat = bind_rows(lst, .id = "Region")
dat = dat %>% tidyr::spread(Count, Freq) %>% dplyr::mutate(Total = N1 + N2 + N3 + N4)
fun_display_table(dat)

Region,Direction,N1,N2,N3,N4,Total
fcc_astarr_macs_input_overlap,Active,27396,12548,4011,38,43993
fcc_astarr_macs_input_overlap,Repressive,63652,1744,5,0,65401
fcc_astarr_macs_input_union,Active,34438,13491,3910,34,51873
fcc_astarr_macs_input_union,Repressive,91407,2862,6,0,94275
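
The `Total = N1 + N2 + N3 + N4` step works here because every vote count from 1 to 4 occurs at least once; if, say, no region had four votes in any set, the `N4` column would be absent and the sum would fail. One way to guard against that, sketched in base R on made-up counts, is to fix the factor levels before calling `table()`. The name `n_assay` is an assumption introduced for this example, set to 4 to match the four assays in the annotation table (ASTARR, LMPRA, TMPRA, WSTARR).

```r
# Toy vote table; values are made up for illustration
dat_toy <- data.frame(
    Direction_Assay = c("Active", "Active", "Active", "Repressive"),
    Num_Assay       = c(1, 1, 2, 1)
)

# Assumption: four assays, so vote counts range from 1 to 4
n_assay <- 4
lvl <- paste0("N", seq_len(n_assay))

# Fixing the factor levels makes table() emit zero-count columns too
note <- factor(paste0("N", dat_toy$Num_Assay), levels = lvl)
tab  <- table(dat_toy$Direction_Assay, note, dnn = c("Direction", "Count"))

# Wide layout with a row total over all vote-count columns
res <- cbind(as.data.frame.matrix(tab), Total = rowSums(tab))
print(res)
```

Even though no toy region has three or four votes, `N3` and `N4` still appear as zero-filled columns, so the total can be computed without special cases.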


## Save results

In [10]:
for (txt_folder in VEC_TXT_FOLDER){

    ### get each table
    dat_region_annot_result = lst_dat_region_annot_result[[txt_folder]]
    
    ### set file directory
    txt_fdiry = file.path(
        FD_RES, 
        "region_annotation", 
        txt_folder,
        "summary"
    )
    
    ### set file path
    txt_fname = "region.annotation.fcc_starrmpra_junke.assayvote.rmAR.tsv"
    txt_fpath = file.path(txt_fdiry, txt_fname)

    ### write table
    dat = dat_region_annot_result
    write_tsv(dat, txt_fpath)
}