**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
REPO DIRECTORY (FD_REPO): /data/reddylab/Kuei/repo 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/work 
DATA DIRECTORY (FD_DATA): /data/reddylab/Kuei/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log 
PROJECT REF     (FD_REF): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references 



## Prepare

**Set global variable**

In [2]:
vec = c(
    "fcc_astarr_macs_input_overlap",
    "fcc_astarr_macs_input_union"
)
names(vec) = vec

VEC_TXT_FOLDER = vec
for(txt in vec){cat(txt, "\n")}

fcc_astarr_macs_input_overlap 
fcc_astarr_macs_input_union 


**View files: FCC region coverage**

In [3]:
txt_folder = VEC_TXT_FOLDER[1]
txt_fdiry  = file.path(FD_RES, "analysis_fcc_comparison", txt_folder)
txt_fname  = "region.annotation.fcc_starrmpra_junke.group.coverage.crispri.tsv"
txt_fglob  = file.path(txt_fdiry, txt_fname)

vec = Sys.glob(txt_fglob)
for (txt in vec) {cat(txt, "\n")}

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/analysis_fcc_comparison/fcc_astarr_macs_input_overlap/region.annotation.fcc_starrmpra_junke.group.coverage.crispri.tsv 
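The cell above checks only the first folder. A minimal sketch of checking every folder at once, assuming the same directory layout. It runs against a temporary directory so it does not depend on the project paths; `fd_res_demo`, `vec_txt_folder`, and `dat_check` are hypothetical names, with `fd_res_demo` standing in for `FD_RES`.

```r
### hypothetical stand-in for FD_RES so the sketch runs anywhere
fd_res_demo = file.path(tempdir(), "results_demo")

vec_txt_folder = c(
    "fcc_astarr_macs_input_overlap",
    "fcc_astarr_macs_input_union"
)
txt_fname = "region.annotation.fcc_starrmpra_junke.group.coverage.crispri.tsv"

### create both folders, but place the file in the first one only
for (txt_folder in vec_txt_folder) {
    dir.create(
        file.path(fd_res_demo, "analysis_fcc_comparison", txt_folder),
        recursive = TRUE, showWarnings = FALSE
    )
}
invisible(file.create(
    file.path(fd_res_demo, "analysis_fcc_comparison", vec_txt_folder[1], txt_fname)
))

### build one path per folder and test whether each file exists
vec_txt_fpath = file.path(
    fd_res_demo, "analysis_fcc_comparison", vec_txt_folder, txt_fname
)
dat_check = data.frame(
    Folder = vec_txt_folder,
    Exists = file.exists(vec_txt_fpath)
)
print(dat_check)
```

In the notebook itself, replacing `fd_res_demo` with `FD_RES` and skipping the folder-creation step would report which of the two folders already holds the coverage table.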


## Import data

In [9]:
### loop to import data
lst = lapply(VEC_TXT_FOLDER, function(txt_folder){
    ### set file directory
    txt_fdiry  = file.path(FD_RES, "analysis_fcc_comparison", txt_folder)
    txt_fname  = "region.annotation.fcc_starrmpra_junke.group.coverage.crispri.tsv"
    txt_fpath = file.path(txt_fdiry, txt_fname)

    ### read table
    dat = read_tsv(txt_fpath, show_col_types = FALSE)
    return(dat)
})

### assign and show
lst_dat_region_merge_import = lst

res = lapply(lst, dim)
print(res)

dat = lst[[1]]
head(dat, 3)

$fcc_astarr_macs_input_overlap
[1] 15460    13

$fcc_astarr_macs_input_union
[1] 15847    13



Chrom,ChromStart,ChromEnd,Region,Group,Label,Score,Zcore,Score_Label,Zcore_Label,Assay_Folder,Assay_Group,Assay_Label
<chr>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>
chr11,4092109,4092511,chr11:4092109-4092511,Distal:Active,Screen:NotSignif,-0.1294364,-0.1294364,Mean(ZScore),Mean(ZScore),CRISPRi_FlowFISH_K562_Riley_JinWoo,CRISPRi-HCRFF,CRISPRi-HCRFF
chr11,4094223,4095304,chr11:4094223-4095304,Proximal:Active,Screen:NotSignif,-0.04849565,-0.04849565,Mean(ZScore),Mean(ZScore),CRISPRi_FlowFISH_K562_Riley_JinWoo,CRISPRi-HCRFF,CRISPRi-HCRFF
chr11,4393045,4394087,chr11:4393045-4394087,Proximal:Active,Screen:NotSignif,-0.08214896,-0.08214896,Mean(ZScore),Mean(ZScore),CRISPRi_FlowFISH_K562_Riley_JinWoo,CRISPRi-HCRFF,CRISPRi-HCRFF


## Calculate the ratio of significant regions for each group

In [11]:
lst = lst_dat_region_merge_import

lst = lapply(lst, function(dat){
    ### select columns
    dat = dat %>% 
        dplyr::select(Region, Group, Label, Assay_Label) %>%
        dplyr::distinct()
    
    ### count regions by group, label, and assay
    dat = dat %>%
        dplyr::group_by(Group, Label, Assay_Label) %>%
        dplyr::summarise(Count = dplyr::n(), .groups = "drop")
    
    ### total regions per group and assay
    dat = dat %>%
        dplyr::group_by(Group, Assay_Label) %>%
        dplyr::mutate(Total = sum(Count)) %>%
        dplyr::ungroup()

    ### calculate ratio and percentage of each label within its group and assay
    dat = dat %>%
        dplyr::mutate(Ratio   = Count / Total) %>%
        dplyr::mutate(Percent = Ratio * 100)
    return(dat)
})

### assign and show
lst_dat_region_merge_ratio = lst

res = lapply(lst, dim)
print(res)

dat = lst[[1]]
fun_display_table(head(dat, 3))

$fcc_astarr_macs_input_overlap
[1] 14  7

$fcc_astarr_macs_input_union
[1] 14  7



|Group         |Label            |Assay_Label    | Count| Total|     Ratio|  Percent|
|:-------------|:----------------|:--------------|-----:|-----:|---------:|--------:|
|Distal:Active |Screen:NotSignif |CRISPRi-Growth |  7953|  8322| 0.9556597| 95.56597|
|Distal:Active |Screen:NotSignif |CRISPRi-HCRFF  |   120|   135| 0.8888889| 88.88889|
|Distal:Active |Screen:Signif    |CRISPRi-Growth |   369|  8322| 0.0443403|  4.43403|
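As a worked check of the numbers shown here, the sketch below recomputes `Total`, `Ratio`, and `Percent` for the two `Distal:Active` / `CRISPRi-Growth` rows, using the counts copied from the output (7953 and 369). The name `dat_demo` is hypothetical, and `dplyr` is assumed to be installed.

```r
library(dplyr)

### counts copied from the output above (Distal:Active, CRISPRi-Growth)
dat_demo = dplyr::tibble(
    Group       = c("Distal:Active", "Distal:Active"),
    Label       = c("Screen:NotSignif", "Screen:Signif"),
    Assay_Label = c("CRISPRi-Growth", "CRISPRi-Growth"),
    Count       = c(7953, 369)
)

### total per group and assay, then ratio and percentage
dat_demo = dat_demo %>%
    dplyr::group_by(Group, Assay_Label) %>%
    dplyr::mutate(Total = sum(Count)) %>%
    dplyr::ungroup() %>%
    dplyr::mutate(Ratio = Count / Total, Percent = Ratio * 100)

print(dat_demo)
```

Because `Total` is computed within each `Group` and `Assay_Label` pair, the ratios of the two labels sum to 1: 7953 / 8322 plus 369 / 8322.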


## Explore: count and ratio of significant regions

**Helper function for showing percentage**

In [12]:
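### round a single percentage for display
fun_inner = function(num){
    ### keep one decimal for values near the extremes (> 99 or < 1)
    if (num > 99 || num < 1) {
        return(round(num, 1))
    }
    ### otherwise round to the nearest integer
    return(round(num))
}

### apply fun_inner to each element of a numeric vector
fun_wrapper = function(vec_num_inp){
    vec_num_out = sapply(vec_num_inp, fun_inner)
    return(vec_num_out)  
}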
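
To illustrate the rounding rule, here is a minimal sketch of a vectorised equivalent built on `ifelse`. The name `fun_round_percent` is hypothetical and not part of the notebook; the first three inputs are percentages from the ratio table and the last two are made-up values chosen to hit the one-decimal branches.

```r
### hypothetical vectorised equivalent of fun_wrapper
fun_round_percent = function(vec_num){
    ### one decimal at the extremes (> 99 or < 1), nearest integer otherwise
    ifelse(vec_num > 99 | vec_num < 1, round(vec_num, 1), round(vec_num))
}

vec_num = c(95.56597, 88.88889, 4.43403, 99.7, 0.44)
vec_out = fun_round_percent(vec_num)
print(vec_out)
### values: 96, 89, 4, 99.7, 0.4
```

One difference worth noting: `if (num > 99)` raises an error when `num` is `NA`, whereas `ifelse` passes `NA` through, so the two versions only agree on complete input.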

**Show ratio of significant regions for CRISPRi-HCRFF**

In [14]:
### get table
idx = "fcc_astarr_macs_input_overlap"
lst = lst_dat_region_merge_ratio
dat = lst[[idx]]

### filter assay
dat = dat %>% dplyr::filter(Assay_Label == "CRISPRi-HCRFF")

### format percentages and spread labels into columns
dat = dat %>% 
    dplyr::select(Group, Total, Label, Percent) %>%
    dplyr::mutate(Percent = fun_wrapper(Percent)) %>%
    dplyr::mutate(Percent = paste0(Percent, "%")) %>%
    tidyr::spread(Label, Percent)

dat %>% kableExtra::kable("markdown")



|Group             | Total|Screen:NotSignif |Screen:Signif |
|:-----------------|-----:|:----------------|:-------------|
|Distal:Active     |   135|89%              |11%           |
|Distal:Repressive |    12|92%              |8%            |
|Proximal:Active   |    73|81%              |19%           |
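
The reshaping step uses `tidyr::spread`, which the tidyr documentation marks as superseded by `pivot_wider`. A minimal sketch of the same reshape with `pivot_wider`, using the values displayed in the table above; `dat_long` and `dat_wide` are hypothetical names, and `tidyr` is assumed to be installed.

```r
library(tidyr)

### long-format values copied from the CRISPRi-HCRFF table above
dat_long = data.frame(
    Group   = rep(c("Distal:Active", "Distal:Repressive", "Proximal:Active"), each = 2),
    Total   = rep(c(135, 12, 73), each = 2),
    Label   = rep(c("Screen:NotSignif", "Screen:Signif"), times = 3),
    Percent = c("89%", "11%", "92%", "8%", "81%", "19%")
)

### one row per group, one column per label
dat_wide = tidyr::pivot_wider(dat_long, names_from = Label, values_from = Percent)
print(dat_wide)
```

Columns not named in `names_from` or `values_from` (here `Group` and `Total`) identify the output rows, which mirrors how `spread` treats the remaining columns.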

**Show ratio of significant regions for CRISPRi-Growth**

In [15]:
### get table
idx = "fcc_astarr_macs_input_overlap"
lst = lst_dat_region_merge_ratio
dat = lst[[idx]]

### filter assay
dat = dat %>% dplyr::filter(Assay_Label == "CRISPRi-Growth")

### format percentages and spread labels into columns
dat = dat %>% 
    dplyr::select(Group, Total, Label, Percent) %>%
    dplyr::mutate(Percent = fun_wrapper(Percent)) %>%
    dplyr::mutate(Percent = paste0(Percent, "%")) %>%
    tidyr::spread(Label, Percent)

dat %>% kableExtra::kable("markdown")



|Group               | Total|Screen:NotSignif |Screen:Signif |
|:-------------------|-----:|:----------------|:-------------|
|Distal:Active       |  8322|96%              |4%            |
|Distal:Repressive   |   745|95%              |5%            |
|Proximal:Active     |  5861|82%              |18%           |
|Proximal:Repressive |   312|95%              |5%            |