**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
REPO DIRECTORY (FD_REPO): /data/reddylab/Kuei/repo 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/work 
DATA DIRECTORY (FD_DATA): /data/reddylab/Kuei/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log 
PROJECT REF     (FD_REF): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references 



## Import data

In [2]:
txt_fdiry = file.path(
    FD_RES, 
    "region_annotation", 
    "fcc_astarr_macs_input_overlap",
    "summary"
)
txt_fname = "matrix.annotation.fcc_peak_call.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

dat = read_tsv(txt_fpath, show_col_types = FALSE)

dat_region_annot_fcc = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 100454     15


Chrom,ChromStart,ChromEnd,Region,ASTARR_A,WSTARR_A,LMPRA_A,TMPRA_A,ASTARR_R,WSTARR_R,LMPRA_R,TMPRA_R,CRISPRi-HCRFF,CRISPRi-Growth,CRISPR-E2G
chr1,10038,10405,chr1:10038-10405,0,0,0,0,1,0,0,0,0,0,0
chr1,16025,16338,chr1:16025-16338,0,0,0,0,1,0,0,0,0,0,0
chr1,17288,17689,chr1:17288-17689,0,0,0,0,1,0,0,0,0,0,0


## Enhancer vs Silencer

In [3]:
dat = dat_region_annot_fcc
dat = dat %>% dplyr::select(Region, ends_with("_A"))

dat_region_annot_enh = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 100454      5


Region,ASTARR_A,WSTARR_A,LMPRA_A,TMPRA_A
chr1:10038-10405,0,0,0,0
chr1:16025-16338,0,0,0,0
chr1:17288-17689,0,0,0,0


In [4]:
dat = dat_region_annot_fcc
dat = dat %>% dplyr::select(Region, ends_with("_R"))

dat_region_annot_rep = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 100454      5


Region,ASTARR_R,WSTARR_R,LMPRA_R,TMPRA_R
chr1:10038-10405,1,0,0,0
chr1:16025-16338,1,0,0,0
chr1:17288-17689,1,0,0,0
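
The suffix-based split used in the two cells above can be checked on a small table. The sketch below is illustrative only: the toy values and the helper `fun_select_suffix` are assumptions and not part of the project code. It uses base R `endsWith()`, which is case-sensitive, whereas `dplyr::ends_with()` ignores case by default.

```r
# Toy peak-call matrix; the values are made up for illustration only
toy = data.frame(
    Region   = c("chr1:100-200", "chr1:300-400", "chr2:500-600"),
    ASTARR_A = c(1, 0, 0),
    LMPRA_A  = c(1, 0, 1),
    ASTARR_R = c(0, 0, 1),
    LMPRA_R  = c(0, 0, 1)
)

# Hypothetical helper: keep Region plus every column whose name ends with `suffix`
fun_select_suffix = function(dat, suffix){
    idx = endsWith(colnames(dat), suffix)
    dat[, c("Region", colnames(dat)[idx]), drop = FALSE]
}

toy_enh = fun_select_suffix(toy, "_A")  # activating ("_A") calls
toy_rep = fun_select_suffix(toy, "_R")  # repressive ("_R") calls

print(colnames(toy_enh))
print(colnames(toy_rep))
```

Both subsets keep every region, including rows that are all zero, which matches the unchanged row count (100454) reported for the two real tables.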


## Count assay for each region

In [6]:
### init
lst = list(
    "Enhance" = dat_region_annot_enh,
    "Repress" = dat_region_annot_rep
)

### count assay for each region
lst = lapply(lst, function(dat){
    ### count the number of assays with a peak for each region (row)
    dat = dat %>% tibble::column_to_rownames(var="Region")
    vec = rowSums(dat)

    ### remove regions without any assay peak
    idx = (vec > 0)
    vec = vec[idx]

    ### rearrange results into dataframe
    dat = data.frame(
        Region    = names(vec),
        Num_Assay = vec
    )
    return(dat)
})

### combine enhance and repress counts
dat = bind_rows(lst, .id = "Type")
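### split "chrom:start-end" on ":" and "-" only, so chromosome names containing "_" stay intact
dat = dat %>% tidyr::separate(Region, c("Chrom", "ChromStart", "ChromEnd"), sep = "[:-]", remove = FALSE)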
dat = dat %>% dplyr::select(Chrom, ChromStart, ChromEnd, Region, Type, Num_Assay)

### assign and show
dat_region_fcc_assayvote = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 122228      6


Unnamed: 0,Chrom,ChromStart,ChromEnd,Region,Type,Num_Assay
chr1:115429-115969,chr1,115429,115969,chr1:115429-115969,Enhance,1
chr1:184091-184563,chr1,184091,184563,chr1:184091-184563,Enhance,1
chr1:605104-605675,chr1,605104,605675,chr1:605104-605675,Enhance,1
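
The per-region counting step can be reproduced on a toy matrix. This is a minimal base-R sketch, not the project code: the data and the helper `fun_count_assay` are hypothetical. It passes `row.names = NULL` so that the region names are not carried along as row names, which is what appears as the unnamed first column in the table above.

```r
# Toy enhancer matrix; the values are made up for illustration only
toy_enh = data.frame(
    Region   = c("chr1:100-200", "chr1:300-400", "chr2:500-600"),
    ASTARR_A = c(1, 0, 0),
    LMPRA_A  = c(1, 0, 1)
)

# Hypothetical helper: count assays with a peak per region, drop regions with none
fun_count_assay = function(dat){
    mat = as.matrix(dat[, setdiff(colnames(dat), "Region"), drop = FALSE])
    vec = rowSums(mat)
    names(vec) = dat$Region
    vec = vec[vec > 0]
    data.frame(Region = names(vec), Num_Assay = unname(vec), row.names = NULL)
}

toy_cnt = fun_count_assay(toy_enh)

# Split "chrom:start-end" on ":" and "-" to recover the coordinates
parts = do.call(rbind, strsplit(as.character(toy_cnt$Region), "[:-]"))
toy_cnt$Chrom      = parts[, 1]
toy_cnt$ChromStart = as.integer(parts[, 2])
toy_cnt$ChromEnd   = as.integer(parts[, 3])

print(toy_cnt)
```

The region with no peak in any assay (`chr1:300-400`) is dropped, which is why the combined table can have a different number of rows than the input matrices.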


**Explore results**

In [9]:
dat = dat_region_fcc_assayvote
res = table(dat$Type)
print(res)


Enhance Repress 
  47686   74542 


In [7]:
dat = dat_region_fcc_assayvote
vec = unique(dat$Region)
print(length(vec))

[1] 99749


In [8]:
dat = dat_region_fcc_assayvote
dat = dat %>% dplyr::filter(Num_Assay > 1)
vec = unique(dat$Region)
print(length(vec))

[1] 22517
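
The three counts above can be related to each other. Each region appears at most once per type, so the number of rows minus the number of unique regions gives the regions listed under both types; with the reported numbers that would be 122228 - 99749 = 22479. Note also that `Num_Assay > 1` is evaluated within a type, so a region with one enhancer call and one repressor call is not counted in the last cell. The sketch below illustrates both points on made-up data; `toy_vote` is hypothetical.

```r
# Toy version of the assay-vote table; the values are made up for illustration only
toy_vote = data.frame(
    Region    = c("chr1:100-200", "chr2:500-600", "chr2:500-600", "chr3:700-800"),
    Type      = c("Enhance", "Enhance", "Repress", "Repress"),
    Num_Assay = c(2, 1, 2, 1)
)

# Regions listed under both types, computed two ways
n_both   = nrow(toy_vote) - length(unique(toy_vote$Region))
vec_both = intersect(
    toy_vote$Region[toy_vote$Type == "Enhance"],
    toy_vote$Region[toy_vote$Type == "Repress"]
)

# Unique regions supported by more than one assay within a single type
vec_multi = unique(toy_vote$Region[toy_vote$Num_Assay > 1])

print(n_both)
print(vec_both)
print(vec_multi)
```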
