## Set environment

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



## Import data

In [2]:
txt_region = "fcc_astarr_macs_input_overlap"
txt_fdiry  = file.path(FD_RES, "region_coverage_fcc", txt_region, "summary")
txt_fname  = "result.coverage.zscore.final.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

dat = read_tsv(txt_fpath, show_col_types = FALSE)

dat_region_score = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 432505      9


| Chrom | ChromStart | ChromEnd | Region | Score | Assay_Name | Assay_Type | Assay_Group | Assay_Label |
|---|---|---|---|---|---|---|---|---|
| chr10 | 100729094 | 100729750 | chr10:100729094-100729750 | -0.3065107 | CRISPRi_FlowFISH_K562_Riley_JinWoo | CRISPRi-HCRFF | CRISPRi-HCRFF | CRISPRi-HCR FlowFISH |
| chr10 | 100743501 | 100744571 | chr10:100743501-100744571 | -0.2702473 | CRISPRi_FlowFISH_K562_Riley_JinWoo | CRISPRi-HCRFF | CRISPRi-HCRFF | CRISPRi-HCR FlowFISH |
| chr10 | 100745413 | 100745741 | chr10:100745413-100745741 | 0.1130381 | CRISPRi_FlowFISH_K562_Riley_JinWoo | CRISPRi-HCRFF | CRISPRi-HCRFF | CRISPRi-HCR FlowFISH |


**Check**

In [3]:
dat = dat_region_score
table(dat$Assay_Type)


        ASTARR CRISPRi-Growth  CRISPRi-HCRFF          LMPRA          TMPRA 
        150040          72743            941          61478            823 
        WSTARR 
        146480 

In [4]:
dat = dat_region_score
table(dat$Assay_Group)


   ASTARR_KS91 CRISPRi-Growth  CRISPRi-HCRFF          LMPRA          TMPRA 
        150040          72743            941          61478            823 
        WSTARR 
        146480 

In [5]:
dat = dat_region_score
table(dat$Assay_Type, dat$Assay_Group)

                
                 ASTARR_KS91 CRISPRi-Growth CRISPRi-HCRFF  LMPRA  TMPRA WSTARR
  ASTARR              150040              0             0      0      0      0
  CRISPRi-Growth           0          72743             0      0      0      0
  CRISPRi-HCRFF            0              0           941      0      0      0
  LMPRA                    0              0             0  61478      0      0
  TMPRA                    0              0             0      0    823      0
  WSTARR                   0              0             0      0      0 146480
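
The cross-tabulation above has exactly one non-zero cell in every row and every column, so each `Assay_Type` corresponds to a single `Assay_Group`. A minimal sketch of how that could be checked programmatically is shown below; the toy data frame and the helper name `fun_is_one_to_one` are hypothetical and not part of the project code.

```r
# Toy data standing in for dat_region_score (hypothetical values)
toy = data.frame(
    Assay_Type  = c("ASTARR", "ASTARR", "WSTARR", "LMPRA"),
    Assay_Group = c("ASTARR_KS91", "ASTARR_KS91", "WSTARR", "LMPRA")
)

# Hypothetical helper: TRUE if every row and every column of the
# cross-table has exactly one non-zero cell
fun_is_one_to_one = function(x, y){
    tab = table(x, y)
    all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

print(fun_is_one_to_one(toy$Assay_Type, toy$Assay_Group))
```

On the real table, the same call would take `dat$Assay_Type` and `dat$Assay_Group` as arguments.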

## Arrange STARR/MPRA

In [6]:
dat = dat_region_score
# STARR/MPRA assay types to keep
vec = c("ASTARR", "WSTARR", "LMPRA")
dat = dat %>%
    dplyr::mutate(Assay = Assay_Type) %>%
    dplyr::filter(Assay %in% vec) %>%
    dplyr::select(Chrom, ChromStart, ChromEnd, Region, Assay, Score)

# Long to wide: one column per assay; drop regions missing a score in any assay
dat = dat %>% tidyr::spread(Assay, Score) %>% na.omit

mat_region_score = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 60618     7


| Chrom | ChromStart | ChromEnd | Region | ASTARR | LMPRA | WSTARR |
|---|---|---|---|---|---|---|
| chr1 | 138321 | 139517 | chr1:138321-139517 | 0.0303204 | -0.591964 | 1.420852 |
| chr1 | 267910 | 268557 | chr1:267910-268557 | -1.3684092 | -0.6946085 | -1.731683 |
| chr1 | 778233 | 779389 | chr1:778233-779389 | 2.4204654 | 2.3941532 | 2.873101 |
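
The spread-then-`na.omit` step keeps only regions that have a score in all three assays, which is why 60618 regions remain. A small sketch on toy data (hypothetical regions and scores) illustrates the effect: region `r2` has no LMPRA score, so it is dropped.

```r
library(dplyr)
library(tidyr)

# Toy long-format scores (hypothetical); r2 lacks an LMPRA score
toy = data.frame(
    Region = c("r1", "r1", "r1", "r2", "r2"),
    Assay  = c("ASTARR", "WSTARR", "LMPRA", "ASTARR", "WSTARR"),
    Score  = c(0.1, 0.2, 0.3, 1.0, 1.1)
)

# Long to wide, then drop rows with a missing score in any assay
wide = toy %>% tidyr::spread(Assay, Score) %>% na.omit
print(wide)
```

Only the row for `r1` is left in `wide`, with one column per assay.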


## Ranknorm STARR/MPRA

**Helper function**

In [7]:
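# Rank-normalize a numeric vector: rank divided by length.
# For input without missing values the result lies in (0, 1];
# ties receive the average rank (the default of rank()).
fun_ranknorm = function(x){
    return(rank(x)/length(x))
}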
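
To make the behaviour of the helper concrete, here is a short sketch applying it to a toy vector (hypothetical values). The two tied values share the average rank 2.5, and the largest value maps to 1.

```r
# Same helper as defined above, repeated so the example is self-contained
fun_ranknorm = function(x){
    return(rank(x)/length(x))
}

# Toy scores (hypothetical) with one tie
x   = c(-1.2, 0.3, 0.3, 2.5)
res = fun_ranknorm(x)
print(res)
# 0.250 0.625 0.625 1.000
```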

**Rank normalize**

In [8]:
dat = mat_region_score
# Wide to long: one row per region and assay
dat = dat %>% tidyr::gather(Assay, Score, -Chrom, -ChromStart, -ChromEnd, -Region)

# Rank-normalize the scores within each assay
dat = dat %>%
    dplyr::group_by(Assay) %>%
    dplyr::mutate(RankNorm = fun_ranknorm(Score)) %>%
    dplyr::ungroup()

# Average the rank-normalized scores across assays for each region
dat = dat %>%
    dplyr::group_by(Chrom, ChromStart, ChromEnd, Region) %>%
    dplyr::mutate(Mean = mean(RankNorm)) %>%
    dplyr::ungroup()

dat_region_score_ranknorm = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 181854      8


| Chrom | ChromStart | ChromEnd | Region | Assay | Score | RankNorm | Mean |
|---|---|---|---|---|---|---|---|
| chr1 | 138321 | 139517 | chr1:138321-139517 | ASTARR | 0.0303204 | 0.4011845 | 0.5282837 |
| chr1 | 267910 | 268557 | chr1:267910-268557 | ASTARR | -1.3684092 | 0.0358144 | 0.1025081 |
| chr1 | 778233 | 779389 | chr1:778233-779389 | ASTARR | 2.4204654 | 0.9607872 | 0.9754253 |
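
The long table has 181854 rows, which is 60618 regions times 3 assays. `RankNorm` is computed within each assay, and `Mean` averages `RankNorm` across assays for each region. The sketch below reproduces that logic on toy data with two assays on very different scales; the assay names `A` and `B` and all values are hypothetical.

```r
library(dplyr)

# Toy long-format scores: 4 regions x 2 assays (hypothetical values)
toy = data.frame(
    Region = rep(c("r1", "r2", "r3", "r4"), times = 2),
    Assay  = rep(c("A", "B"), each = 4),
    Score  = c(0.5, -1.0, 2.0, 0.0, 10, 30, 20, 40)
)

out = toy %>%
    # Rank-normalize within each assay, so scales become comparable
    dplyr::group_by(Assay) %>%
    dplyr::mutate(RankNorm = rank(Score)/length(Score)) %>%
    dplyr::ungroup() %>%
    # Average the normalized ranks across assays for each region
    dplyr::group_by(Region) %>%
    dplyr::mutate(Mean = mean(RankNorm)) %>%
    dplyr::ungroup()

print(out)
```

In this toy case `r1` ranks third of four in assay `A` (0.75) and first of four in assay `B` (0.25), so its `Mean` is 0.5 on both of its rows.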


## Export results

In [9]:
txt_region = "fcc_astarr_macs_input_overlap"
txt_fdiry  = file.path(FD_RES, "region_integration", txt_region)
txt_fname  = "result.coverage.ranknorm.starrmpra.whg.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

dat = dat_region_score_ranknorm
write_tsv(dat, txt_fpath)
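
`write_tsv` does not create missing directories, so the export assumes that the `region_integration` output folder already exists. A defensive variant, sketched here on toy data under a temporary directory (all paths, file names, and values are hypothetical), creates the folder first and reads the file back to confirm its shape.

```r
library(readr)

# Hypothetical output location under a temporary directory
txt_fdiry = file.path(tempdir(), "region_integration", "toy_region")
txt_fpath = file.path(txt_fdiry, "toy.ranknorm.tsv")

# Create the output directory if it does not exist yet
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

# Toy table standing in for dat_region_score_ranknorm
toy = data.frame(Region = c("r1", "r2"), RankNorm = c(0.25, 1.0))
write_tsv(toy, txt_fpath)

# Read the file back and compare its dimensions with what was written
chk = read_tsv(txt_fpath, show_col_types = FALSE)
print(dim(chk))
```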