**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
REPO DIRECTORY (FD_REPO): /data/reddylab/Kuei/repo 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/work 
DATA DIRECTORY (FD_DATA): /data/reddylab/Kuei/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log 
PROJECT REF     (FD_REF): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references 



## Prepare

**Set global variable**

In [2]:
vec = c(
    "fcc_astarr_macs_input_overlap",
    "fcc_astarr_macs_input_union"
)
names(vec) = vec

VEC_TXT_FOLDER = vec
for(txt in vec){cat(txt, "\n")}

fcc_astarr_macs_input_overlap 
fcc_astarr_macs_input_union 
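Setting `names(vec) = vec` makes every `lapply()` over the folder vector return a list keyed by folder name. Below is a minimal base-R sketch of that behavior. The `nchar` call is only a stand-in for the real per-folder work.

```r
# Folder names as set in the cell above
vec <- c("fcc_astarr_macs_input_overlap", "fcc_astarr_macs_input_union")
names(vec) <- vec

# lapply() copies the names of its input onto the output list,
# so each result can be looked up by folder name
lst <- lapply(vec, nchar)
print(names(lst))
print(lst[["fcc_astarr_macs_input_union"]])
```

`VEC_TXT_FOLDER` already carries these names, so the later `names(lst) = VEC_TXT_FOLDER` assignment is redundant but harmless.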


In [3]:
TXT_FNAME_ANNOT = "region.annotation.fcc_crispri_growth.tsv"

**View files**

In [4]:
txt_fdiry = file.path(FD_RES, "region_annotation", "*", "summary")
txt_fname = TXT_FNAME_ANNOT
txt_fglob = file.path(txt_fdiry, txt_fname)

vec = Sys.glob(txt_fglob)
for(txt in vec){cat(txt, "\n")}

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_overlap/summary/region.annotation.fcc_crispri_growth.tsv 
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_union/summary/region.annotation.fcc_crispri_growth.tsv 
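The globbed paths follow the layout `region_annotation/<folder>/summary/<file>`. Assuming that layout holds, the base-R sketch below recovers the folder name from each path so it can be compared against `VEC_TXT_FOLDER`. Here `vec_expected` is a hypothetical stand-in for that vector.

```r
# Paths as printed by the Sys.glob() cell above
vec_fpath <- c(
  "/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_overlap/summary/region.annotation.fcc_crispri_growth.tsv",
  "/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results/region_annotation/fcc_astarr_macs_input_union/summary/region.annotation.fcc_crispri_growth.tsv"
)

# Drop the file name, then the "summary" level, and keep the folder name
vec_folder <- basename(dirname(dirname(vec_fpath)))
print(vec_folder)

# Hypothetical stand-in for VEC_TXT_FOLDER
vec_expected <- c("fcc_astarr_macs_input_overlap", "fcc_astarr_macs_input_union")
print(setequal(vec_folder, vec_expected))
```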


## Import data

**Read region annotation**

In [5]:
### loop to import data
lst = lapply(VEC_TXT_FOLDER, function(txt_folder){
    ### set file directory
    txt_fdiry = file.path(FD_RES, "region_annotation", txt_folder, "summary")
    txt_fname = TXT_FNAME_ANNOT
    txt_fpath = file.path(txt_fdiry, txt_fname)

    ### read table
    dat = read_tsv(txt_fpath, show_col_types = FALSE)
    return(dat)
})
names(lst) = VEC_TXT_FOLDER

### assign and show
lst_dat_region_annot_import = lst

res = lapply(lst, dim)
print(res)

dat = lst[[1]]
fun_display_table(head(dat, 3))

$fcc_astarr_macs_input_overlap
[1] 4380   10

$fcc_astarr_macs_input_union
[1] 4907   10



|Chrom |ChromStart |ChromEnd |Region               |Annotation_A                  |Annotation_B              |Group          |Label  |Region_Annot         |Region_Count |
|:-----|----------:|--------:|:--------------------|:-----------------------------|:-------------------------|:--------------|:------|:--------------------|------------:|
|chr1  |605104     |605675   |chr1:605104-605675   |fcc_astarr_macs_input_overlap |fcc_crispri_growth_signif |CRISPRi-Growth |Signif |chr1:605550-605627   |1            |
|chr1  |826796     |828040   |chr1:826796-828040   |fcc_astarr_macs_input_overlap |fcc_crispri_growth_signif |CRISPRi-Growth |Signif |chr1:826642-827902   |1            |
|chr1  |1068587    |1070616  |chr1:1068587-1070616 |fcc_astarr_macs_input_overlap |fcc_crispri_growth_signif |CRISPRi-Growth |Signif |chr1:1067929-1070953 |1            |
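As a quick sanity check on the annotation, the sketch below assumes two things: that `Region_Annot` holds the CRISPRi-Growth region overlapping the peak in `Region`, and that coordinates are BED-style half-open intervals. It uses the three preview rows above. `parse_region()` is a hypothetical helper, and the sketch only handles rows with a single region in `Region_Annot`. How rows with `Region_Count` greater than 1 store their regions is not shown here.

```r
# Three rows copied from the annotation preview above
dat <- data.frame(
  Region       = c("chr1:605104-605675", "chr1:826796-828040", "chr1:1068587-1070616"),
  Region_Annot = c("chr1:605550-605627", "chr1:826642-827902", "chr1:1067929-1070953"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "chrom:start-end" into its three parts
parse_region <- function(x) {
  parts <- do.call(rbind, strsplit(x, "[:-]"))
  data.frame(chrom = parts[, 1],
             start = as.numeric(parts[, 2]),
             end   = as.numeric(parts[, 3]),
             stringsAsFactors = FALSE)
}

peak  <- parse_region(dat$Region)
annot <- parse_region(dat$Region_Annot)

# Half-open intervals on the same chromosome overlap
# when each one starts before the other ends
is_overlap <- peak$chrom == annot$chrom &
  peak$start < annot$end &
  annot$start < peak$end
print(is_overlap)
```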


**Read region**

In [6]:
### set file directory
txt_fdiry = file.path(FD_RES, "region", "fcc_crispri_growth", "summary")
txt_fname = "K562.hg38.CRISPRi_Growth.signif.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_region_original = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 6242    9


|Chrom |ChromStart |ChromEnd |Region             |Guide_ID  |Log2FC     |Padj      |Group          |Label  |
|:-----|----------:|--------:|:------------------|:---------|----------:|---------:|:--------------|:------|
|chr1  |605550     |605627   |chr1:605550-605627 |chr1.1.1  |-0.9859338 |0.0       |CRISPRi-Growth |Signif |
|chr1  |826642     |827902   |chr1:826642-827902 |chr1.4.8  |0.1855074  |0.0325051 |CRISPRi-Growth |Signif |
|chr1  |964946     |965136   |chr1:964946-965136 |chr1.41.7 |-1.1466792 |0.0       |CRISPRi-Growth |Signif |


**Check label counts**

In [7]:
lst = lst_dat_region_annot_import
lst = lapply(lst, function(dat){table(dat$Label)})
print(lst)

$fcc_astarr_macs_input_overlap

Signif 
  4380 

$fcc_astarr_macs_input_union

Signif 
  4907 



## Explore: Count table

**Total peaks**

In [8]:
dat = dat_region_original

### count guides
vec = unique(dat$Guide_ID)
print(length(vec))

### count DHS
vec = unique(dat$Region)
print(length(vec))

[1] 6242
[1] 6242
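Both counts equal the 6,242 rows reported by `dim(dat)`. Each row therefore has a distinct `Guide_ID` and a distinct `Region`. The sketch below writes the same reasoning as a check, using the three preview rows from the CRISPRi-Growth table as stand-in data.

```r
# Three preview rows from the CRISPRi-Growth table, used as stand-in data
dat <- data.frame(
  Region   = c("chr1:605550-605627", "chr1:826642-827902", "chr1:964946-965136"),
  Guide_ID = c("chr1.1.1", "chr1.4.8", "chr1.41.7"),
  stringsAsFactors = FALSE
)

n_row    <- nrow(dat)
n_guide  <- length(unique(dat$Guide_ID))
n_region <- length(unique(dat$Region))

# When all three counts agree, Guide_ID and Region are each unique keys,
# so guides and regions pair up one-to-one
is_one_to_one <- n_row == n_guide && n_row == n_region
print(is_one_to_one)
```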


In [9]:
dat = dat_region_original
dat = dat %>% dplyr::select(Region, Group) %>% dplyr::distinct()

res = table(dat$Group, dnn = "Group")
dat = as.data.frame(res)

dat_region_peak_count = dat
fun_display_table(dat)

|Group          |Freq |
|:--------------|----:|
|CRISPRi-Growth |6242 |


**Regions containing significant guides**

In [10]:
lst = lst_dat_region_annot_import
lst = lapply(lst, function(dat){
    dat = dat %>% dplyr::select(Region, Group) %>% dplyr::distinct()
    res = table(dat$Group, dnn = "Group")
    dat = as.data.frame(res)
    return(dat)
})
dat = bind_rows(lst, .id = "Region")

### assign and show
dat_region_annot_count = dat
fun_display_table(dat)

|Region                        |Group          |Freq |
|:-----------------------------|:--------------|----:|
|fcc_astarr_macs_input_overlap |CRISPRi-Growth |4380 |
|fcc_astarr_macs_input_union   |CRISPRi-Growth |4907 |
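The `select()`, `distinct()`, `table()` chain counts each region at most once per group. The base-R sketch below shows the same steps on a hypothetical toy input in which one peak appears twice. In the real annotation tables the row counts already equal the distinct-region counts, so the deduplication step is a safeguard rather than a change.

```r
# Toy input (hypothetical): the first peak appears twice
dat <- data.frame(
  Region = c("chr1:100-200", "chr1:100-200", "chr1:500-600"),
  Group  = c("CRISPRi-Growth", "CRISPRi-Growth", "CRISPRi-Growth"),
  stringsAsFactors = FALSE
)

# Keep one row per (Region, Group) pair, as dplyr::distinct() does
dat_uniq <- unique(dat[, c("Region", "Group")])

# Tabulate distinct regions per group; the argument name "Group"
# becomes the first column name of the resulting data frame
res <- as.data.frame(table(Group = dat_uniq$Group))
print(res)
```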


**Summarize**

In [11]:
tmp = dat_region_peak_count
tmp = tmp %>% dplyr::mutate(Freq = scales::comma(Freq))
colnames(tmp) = c("Group", "Peak (Total)")

dat = dat_region_annot_count
dat = dat %>% 
    dplyr::mutate(Region = fun_str_map_atac(Region)) %>%
    dplyr::mutate(Freq   = scales::comma(Freq)) %>%
    tidyr::spread(Region, Freq)

dat = tmp %>% dplyr::right_join(dat, by = "Group")
dat %>% kableExtra::kable("markdown")



|Group          |Peak (Total) |ATAC (Overlap) |ATAC (Union) |
|:--------------|:------------|:--------------|:------------|
|CRISPRi-Growth |6,242        |4,380          |4,907        |

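**Count peaks by `Region_Count`**

In [12]:
lst = lst_dat_region_annot_import
lst = lapply(lst, function(dat){
    ### count peaks by the number of overlapping annotated regions
    res = table(dat$Region_Count, dnn = "Note")
    dat = as.data.frame(res)
    return(dat)
})

### stack the per-set counts, then widen: one column per Region_Count value
dat = bind_rows(lst, .id = "Region")
dat = dat %>% tidyr::pivot_wider(names_from = Note, values_from = Freq)
#fun_display_table(dat)
dat %>% kableExtra::kable("markdown")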



|Region                        |    1|  2|
|:-----------------------------|----:|--:|
|fcc_astarr_macs_input_overlap | 4325| 55|
|fcc_astarr_macs_input_union   | 4839| 68|
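The rows of this distribution should add up to the per-set peak totals: 4,325 + 55 = 4,380 and 4,839 + 68 = 4,907, matching the summary table above. The base-R sketch below rebuilds the wide table from long-format counts with `xtabs()` and checks those sums. The counts are copied from the output above, and the column names `Note` and `Freq` mirror the notebook.

```r
# Long-format counts copied from the table above (Region_Count = 1 or 2)
dat <- data.frame(
  Region = rep(c("fcc_astarr_macs_input_overlap", "fcc_astarr_macs_input_union"), each = 2),
  Note   = rep(c("1", "2"), times = 2),
  Freq   = c(4325, 55, 4839, 68),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per peak set, one column per Region_Count value
mat <- xtabs(Freq ~ Region + Note, data = dat)
print(mat)

# Row sums should equal the peak totals for each set
print(rowSums(mat))
```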