**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variables**

In [2]:
TXT_FOLDER_INP = "fcc_starrmpra_junke"
TXT_FOLDER_OUT = "fcc_table"

## Import data

In [3]:
### set directory
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
dir(txt_fdiry)

In [4]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_cnames = dat
fun_display_table(dat)

Name,Note
Chrom,Name of the chromosome
ChromStart,The starting position of the feature in the chromosome
ChromEnd,The ending position of the feature in the chromosome
Name,Name
Score,Z score based on mean(logFC of all the bins)
Strand,Strand
Group,Assay name
Label,Assay name + direction
Dataset,Assay dataset


In [5]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)

vec_txt_fname = c(
    'K562.hg38.ASTARR.junke.bed.gz',
    'K562.hg38.WSTARR.junke.bed.gz',
    'K562.hg38.LMPRA.junke.bed.gz',
    'K562.hg38.TMPRA.junke.bed.gz'
)
vec_txt_fpath  = file.path(txt_fdiry, vec_txt_fname)

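### read tables (the BED files are read without a header row, so the
### column names come from the description table loaded above)
vec = dat_cnames$Name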
lst = lapply(vec_txt_fpath, function(txt_fpath){
    dat = read_tsv(txt_fpath, col_names = vec, show_col_types = FALSE)
    return(dat)
})
dat = bind_rows(lst)

### assign and show
dat_region_import = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 446934      9


Chrom,ChromStart,ChromEnd,Name,Score,Strand,Group,Label,Dataset
chr1,10010,10430,peak1,-2.520916,.,ASTARR,ASTARR_R,ASTARR_TR
chr1,16220,16340,peak2,-2.338765,.,ASTARR,ASTARR_R,ASTARR_TR
chr1,17230,17440,peak3,-2.34268,.,ASTARR,ASTARR_R,ASTARR_TR


In [6]:
dat = dat_region_import
table(dat$Group)
table(dat$Label)
table(dat$Dataset)


ASTARR  LMPRA  TMPRA WSTARR 
230297  42864   6329 167444 


 ASTARR_A ASTARR_AB  ASTARR_R ASTARR_RB   LMPRA_A  LMPRA_AB   LMPRA_R  LMPRA_RB 
    35505     11680    154337     28775     25648     16603       485       128 
  TMPRA_A  TMPRA_AB   TMPRA_R  TMPRA_RB  WSTARR_A WSTARR_AB  WSTARR_R 
     6017        57       254         1     79738     25505     62201 


  ASTARR_TR LMPRA_Nadav  TMPRA_OL13  TMPRA_OL43  TMPRA_OL45   WSTARR_TR 
     230297       42864         281        2214        3834      167444 

## Filter table

In [7]:
dat = dat_region_import

### build the labels to drop: every <assay>_AB and <assay>_RB
vec = c("ASTARR", "WSTARR", "TMPRA", "LMPRA")
vec = c(
    paste(vec, "AB", sep="_"),
    paste(vec, "RB", sep="_")
)
print(vec)

### keep only the *_A and *_R labels
dat = dat %>% dplyr::filter(!(Label %in% vec))

### assign and show
dat_region_filter = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] "ASTARR_AB" "WSTARR_AB" "TMPRA_AB"  "LMPRA_AB"  "ASTARR_RB" "WSTARR_RB"
[7] "TMPRA_RB"  "LMPRA_RB" 
[1] 364185      9


Chrom,ChromStart,ChromEnd,Name,Score,Strand,Group,Label,Dataset
chr1,10010,10430,peak1,-2.520916,.,ASTARR,ASTARR_R,ASTARR_TR
chr1,16220,16340,peak2,-2.338765,.,ASTARR,ASTARR_R,ASTARR_TR
chr1,17230,17440,peak3,-2.34268,.,ASTARR,ASTARR_R,ASTARR_TR


In [8]:
dat = dat_region_filter
table(dat$Group)
table(dat$Label)
table(dat$Dataset)


ASTARR  LMPRA  TMPRA WSTARR 
189842  26133   6271 141939 


ASTARR_A ASTARR_R  LMPRA_A  LMPRA_R  TMPRA_A  TMPRA_R WSTARR_A WSTARR_R 
   35505   154337    25648      485     6017      254    79738    62201 


  ASTARR_TR LMPRA_Nadav  TMPRA_OL13  TMPRA_OL43  TMPRA_OL45   WSTARR_TR 
     189842       26133         223        2214        3834      141939 
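The row counts before and after filtering can be cross-checked by hand. The sketch below redoes that arithmetic in base R, using only the numbers printed in the tables above; the variable names (`n_import`, `removed`, `n_expected`) are illustrative and not part of the project code.

```r
# Row count of the imported table, as printed by dim() after import
n_import <- 446934

# Counts of the labels that the filter drops, copied from table(dat$Label)
# on the imported data (WSTARR_RB is in the exclusion list but has no rows)
removed <- c(
    ASTARR_AB = 11680, ASTARR_RB = 28775,
    LMPRA_AB  = 16603, LMPRA_RB  = 128,
    TMPRA_AB  = 57,    TMPRA_RB  = 1,
    WSTARR_AB = 25505
)

# Expected number of rows left after the filter
n_expected <- n_import - sum(removed)
print(n_expected)
```

The result agrees with the 364185 rows reported after filtering. The 58 TMPRA rows removed also match the drop in `TMPRA_OL13` from 281 to 223, while the other two TMPRA datasets are unchanged.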

## Arrange table

In [9]:
### map assay codes in Group (e.g. "ASTARR") to display names
fun_get_assay = function(vec){
    dat = tribble(
        ~ColA,    ~ColB,
        "ASTARR", "ATAC-STARR",
        "WSTARR", "WHG-STARR",
        "LMPRA",  "Lenti-MPRA",
        "TMPRA",  "Tiling-MPRA"
    )
    res = fun_str_map_match(vec, dat$ColA, dat$ColB)
    return(res)
}

### map Dataset identifiers (e.g. "TMPRA_OL13") to the source lab
fun_get_source = function(vec){
    dat = tribble(
        ~ColA,    ~ColB,
        "ASTARR_TR",   "Reddy Lab",
        "WSTARR_TR",   "Reddy Lab",
        "LMPRA_Nadav", "Nadav Lab",
        "TMPRA_OL13",  "Tewhey Lab",
        "TMPRA_OL43",  "Tewhey Lab",
        "TMPRA_OL45",  "Tewhey Lab"
    )
    res = fun_str_map_match(vec, dat$ColA, dat$ColB)
    return(res)
}
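
Both functions rely on `fun_str_map_match`, which is loaded by the project config and not defined in this notebook. As a minimal sketch, assuming it is a vectorized key-to-value lookup, it could be written with base R's `match()`. The name `sketch_str_map_match` is hypothetical, and the real helper may handle unmatched values differently.

```r
# Hypothetical stand-in for the project's fun_str_map_match
# (the real definition is not shown in this notebook)
sketch_str_map_match <- function(vec, keys, values) {
    # match() gives the position of each element of vec in keys,
    # or NA when there is no match; indexing by NA yields NA
    values[match(vec, keys)]
}

keys   <- c("ASTARR", "WSTARR", "LMPRA", "TMPRA")
values <- c("ATAC-STARR", "WHG-STARR", "Lenti-MPRA", "Tiling-MPRA")

res <- sketch_str_map_match(c("ASTARR", "TMPRA", "OTHER"), keys, values)
print(res)
```

In this sketch an unknown code such as `"OTHER"` maps to `NA`, so a later `table()` call would silently drop it unless `useNA = "ifany"` is set.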

In [10]:
### get table
dat = dat_region_filter
vec = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

dat = dat %>%
    dplyr::mutate(
        Label   = paste(Label, "junke", sep = ":"),
        Assay   = fun_get_assay(Group),
        Region  = fun_gen_region(Chrom, ChromStart, ChromEnd),
        Target  = NA,
        NLog10P = NA,
        Method  = "Junke",
        Source  = fun_get_source(Dataset)
    ) %>%
    dplyr::select(dplyr::all_of(vec))

dat_region_arrange = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 364185     12


Chrom,ChromStart,ChromEnd,Group,Label,Assay,Region,Target,Score,NLog10P,Method,Source
chr1,10010,10430,ASTARR,ASTARR_R:junke,ATAC-STARR,chr1:10010-10430,,-2.520916,,Junke,Reddy Lab
chr1,16220,16340,ASTARR,ASTARR_R:junke,ATAC-STARR,chr1:16220-16340,,-2.338765,,Junke,Reddy Lab
chr1,17230,17440,ASTARR,ASTARR_R:junke,ATAC-STARR,chr1:17230-17440,,-2.34268,,Junke,Reddy Lab
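
The `Region` column is built by `fun_gen_region`, another helper from the project config that is not shown here. Judging from the output above (`chr1:10010-10430`), it joins the three coordinate columns as `chrom:start-end`. The sketch below is an assumption about that behavior, under the hypothetical name `sketch_gen_region`. It formats the coordinates as integers because pasting a double such as `100000` directly would produce `1e+05`.

```r
# Hypothetical stand-in for the project's fun_gen_region
# (the real definition is not shown in this notebook)
sketch_gen_region <- function(chrom, start, end) {
    # %d with as.integer() avoids scientific notation for values like 1e5
    sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

res <- sketch_gen_region(
    c("chr1", "chr2"),
    c(10010, 100000),
    c(10430, 200000)
)
print(res)
```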


**Check results**

In [11]:
dat = dat_region_arrange
table(dat$Group)
table(dat$Label)
table(dat$Source)


ASTARR  LMPRA  TMPRA WSTARR 
189842  26133   6271 141939 


ASTARR_A:junke ASTARR_R:junke  LMPRA_A:junke  LMPRA_R:junke  TMPRA_A:junke 
         35505         154337          25648            485           6017 
 TMPRA_R:junke WSTARR_A:junke WSTARR_R:junke 
           254          79738          62201 


 Nadav Lab  Reddy Lab Tewhey Lab 
     26133     331781       6271 

## Export results

In [12]:
### set file path
txt_folder = TXT_FOLDER_OUT
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.fcc_starrmpra_junke.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### set table (note: Chrom is sorted as text, so chr10 comes before chr2)
dat = dat_region_arrange
dat = dat %>% dplyr::arrange(Chrom, ChromStart, ChromEnd)

### write table (no header row is written)
write_tsv(dat, txt_fpath, col_names = FALSE)
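
Because the file is written without a header, any downstream reader has to supply the 12 column names itself. The sketch below shows that round trip on a two-row toy table, with values copied from the preview above. It uses base R only (`write.table`, `read.table`) rather than readr so that it runs without the tidyverse, and it writes to a temporary file rather than to the project directory.

```r
# Column order used for the exported table
cnames <- c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

# Toy stand-in for dat_region_arrange (two rows from the preview)
toy <- data.frame(
    Chrom = "chr1",
    ChromStart = c(10010L, 16220L),
    ChromEnd   = c(10430L, 16340L),
    Group = "ASTARR", Label = "ASTARR_R:junke", Assay = "ATAC-STARR",
    Region = c("chr1:10010-10430", "chr1:16220-16340"),
    Target = NA, Score = c(-2.520916, -2.338765), NLog10P = NA,
    Method = "Junke", Source = "Reddy Lab",
    stringsAsFactors = FALSE
)

# Write a gzip-compressed, tab-separated file without a header row
tmp <- tempfile(fileext = ".bed.gz")
con <- gzfile(tmp, "w")
write.table(toy, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)

# Read it back, supplying the column names explicitly
back <- read.table(gzfile(tmp), sep = "\t", header = FALSE,
                   col.names = cnames, stringsAsFactors = FALSE)
print(dim(back))
```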