**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variables**

In [2]:
TXT_FOLDER_INP = "encode_e2g_benchmark"
TXT_FOLDER_OUT = "fcc_table"

## Import data

In [3]:
### set directory
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
dir(txt_fdiry)

In [4]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_cnames = dat
fun_display_table(dat)

| Name | Note |
|---|---|
| Chrom | Name of the chromosome |
| ChromStart | The starting position of the feature in the chromosome |
| ChromEnd | The ending position of the feature in the chromosome |
| Name | Name given to a region; Use '.' if no name is assigned. |
| Score | Effect Size |
| Region | Region coordinate of the Region-Gene pair |
| Target | Gene of the Region-Gene pair |
| NLog10P | MinusLog10PValue; -log10 of P-value |
| Regulated | Regulated or not |
| Source | Reference |


In [5]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.ENCODE_E2G.benchmark.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table (the file has no header row; column names come from the description table)
vec = dat_cnames$Name
dat = read_tsv(txt_fpath, col_names = vec, show_col_types = FALSE)

### assign and show
dat_region_import = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 10375    12


| Chrom | ChromStart | ChromEnd | Name | Score | Region | Target | NLog10P | Regulated | Source | Group | Label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 3774714 | 3775214 | CEP104\|chr1:3691278-3691778:* | -0.2934319 | chr1:3774714-3775214 | CEP104 | 2.395344 | True | Ulirsch2016 | E2G-Benchmark | Regulated:TRUE |
| chr1 | 3774714 | 3775214 | LRRC47\|chr1:3691278-3691778:* | -0.3311781 | chr1:3774714-3775214 | LRRC47 | 2.109514 | True | Ulirsch2016 | E2G-Benchmark | Regulated:TRUE |
| chr1 | 3774714 | 3775214 | SMIM1\|chr1:3691278-3691778:* | -0.4720192 | chr1:3774714-3775214 | SMIM1 | 3.192703 | True | Ulirsch2016 | E2G-Benchmark | Regulated:TRUE |
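
Because the BED file is read without a header, the names passed to `col_names` have to line up with the columns in the file. A minimal base-R sketch of a pre-read check follows; the helper `count_fields_first_line` and the toy file are hypothetical and not part of the project code.

```r
# Hypothetical helper (not part of the project): count the tab-separated
# fields on the first line of a text file, gzipped or not.
# Note: strsplit() drops a trailing empty field, so a line ending in a tab
# would be undercounted by one.
count_fields_first_line = function(fpath) {
    con = gzfile(fpath, open = "rt")   # gzfile() also reads uncompressed files
    on.exit(close(con))
    txt = readLines(con, n = 1)
    length(strsplit(txt, "\t", fixed = TRUE)[[1]])
}

# Toy headerless file standing in for the real input (assumption)
fpath_tmp = tempfile(fileext = ".bed.gz")
con = gzfile(fpath_tmp, open = "wt")
writeLines(c("chr1\t100\t200\tregionA", "chr1\t300\t400\tregionB"), con)
close(con)

vec_names = c("Chrom", "ChromStart", "ChromEnd", "Name")
num_field = count_fields_first_line(fpath_tmp)

# Stop early if the names do not line up with the columns
stopifnot(length(vec_names) == num_field)
```

In the notebook, the same check could be run on `txt_fpath` with `dat_cnames$Name` before calling `read_tsv`.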


## Arrange table

In [10]:
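### get table and define the output column order
dat = dat_region_import
vec = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

### add annotation columns, rebuild the region string, and reorder columns
dat = dat %>% 
    dplyr::mutate(
        Label  = paste("E2G", Label, sep = ":"),   # prefix the label, e.g. "E2G:Regulated:TRUE"
        Assay  = "ENCODE-E2G-Benchmark",
        Region = fun_gen_region(Chrom, ChromStart, ChromEnd),
        Method = "ENCODE-E2G"
    ) %>%
    # keep only the columns listed in vec, in that order (drops Name and Regulated)
    dplyr::select(dplyr::all_of(vec))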

dat_region_arrange = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 10375    12


| Chrom | ChromStart | ChromEnd | Group | Label | Assay | Region | Target | Score | NLog10P | Method | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 3774714 | 3775214 | E2G-Benchmark | E2G:Regulated:TRUE | ENCODE-E2G-Benchmark | chr1:3774714-3775214 | CEP104 | -0.2934319 | 2.395344 | ENCODE-E2G | Ulirsch2016 |
| chr1 | 3774714 | 3775214 | E2G-Benchmark | E2G:Regulated:TRUE | ENCODE-E2G-Benchmark | chr1:3774714-3775214 | LRRC47 | -0.3311781 | 2.109514 | ENCODE-E2G | Ulirsch2016 |
| chr1 | 3774714 | 3775214 | E2G-Benchmark | E2G:Regulated:TRUE | ENCODE-E2G-Benchmark | chr1:3774714-3775214 | SMIM1 | -0.4720192 | 3.192703 | ENCODE-E2G | Ulirsch2016 |
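
The helper `fun_gen_region` is sourced from the project configuration and is not shown in this notebook. Judging only from the output above, it appears to build a `chrom:start-end` string. The sketch below is a hypothetical stand-in under that assumption (the name `gen_region_sketch` is made up here); the real helper may differ.

```r
# Hypothetical stand-in for fun_gen_region; the real helper lives in the
# project configuration and may behave differently.
gen_region_sketch = function(chrom, start, end) {
    # Coordinates read by read_tsv are doubles, and pasting a double such as
    # 100000 directly can yield "1e+05". Converting to integer and formatting
    # with %d avoids scientific notation; genomic coordinates fit in R's
    # 32-bit integer range.
    sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

gen_region_sketch("chr1", 3774714, 3775214)
```

For the first row of the table this gives `chr1:3774714-3775214`, matching the `Region` column shown above.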


## Export results

In [11]:
### set file path
txt_folder = TXT_FOLDER_OUT
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.encode_e2g_benchmark.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

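### set table and sort by genomic position
### (Chrom is sorted as text, so chr10 comes before chr2)
dat = dat_region_arrange
dat = dat %>% dplyr::arrange(Chrom, ChromStart, ChromEnd)

### write table (no header row, BED style; the .gz extension makes readr compress the output)
write_tsv(dat, txt_fpath, col_names = FALSE)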
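
Since the table is exported without a header, any later reader has to supply the column names again. The following is a minimal round-trip sketch, assuming `readr` is installed; the one-row toy table reuses the first row shown above, and the temporary file path stands in for the real output path.

```r
library(readr)

# Column order used when the table was arranged
vec_cnames = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

# Toy one-row table standing in for the exported data (assumption)
dat_toy = data.frame(
    Chrom = "chr1", ChromStart = 3774714, ChromEnd = 3775214,
    Group = "E2G-Benchmark", Label = "E2G:Regulated:TRUE",
    Assay = "ENCODE-E2G-Benchmark", Region = "chr1:3774714-3775214",
    Target = "CEP104", Score = -0.2934319, NLog10P = 2.395344,
    Method = "ENCODE-E2G", Source = "Ulirsch2016"
)

# Write without a header (as in the export cell), then read back with names
txt_fpath_tmp = tempfile(fileext = ".bed.gz")
write_tsv(dat_toy, txt_fpath_tmp, col_names = FALSE)
dat_back = read_tsv(txt_fpath_tmp, col_names = vec_cnames, show_col_types = FALSE)

print(dim(dat_back))
```

Keeping the column order in one shared vector, rather than retyping it where the file is read, reduces the chance that writer and reader drift apart.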