**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variable**

In [2]:
TXT_FOLDER_INP = "fcc_astarr_macs"
TXT_FOLDER_OUT = "fcc_table"

## Import data

In [3]:
### set directory
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
dir(txt_fdiry)

In [4]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_cnames = dat
fun_display_table(dat)

Name,Note
Chrom,Name of the chromosome
ChromStart,The starting position of the feature in the chromosome
ChromEnd,The ending position of the feature in the chromosome


In [5]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.ASTARR.macs.KS91.input.rep_all.max_overlaps.q5.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
vec = dat_cnames$Name
dat = read_tsv(txt_fpath, col_names = vec, show_col_types = FALSE)

### assign and show
dat_region_import = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 150042      3


Chrom,ChromStart,ChromEnd
chr1,10038,10405
chr1,14282,14614
chr1,16025,16338


In [6]:
### set file path
txt_fdiry = file.path(
    FD_RES,
    "region_coverage_fcc",
    "fcc_astarr_macs_input_overlap",
    "STARR_ATAC_K562_Reddy_KS91",
    "overlap_score",
    "summary"
)
txt_fname = "result.coverage.TPM.FPKM.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_region_coverage_astarr = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 150041      8


Chrom,ChromStart,ChromEnd,Region,Input_FPKM,Input_TPM,Output_FPKM,Output_TPM
chr1,10038,10405,chr1:10038-10405,0.0041644,3.940038,0.0007357,0.7181993
chr1,14282,14614,chr1:14282-14614,0.0030033,2.841707,0.0022621,2.2104314
chr1,16025,16338,chr1:16025-16338,0.0040487,3.830812,0.0012867,1.2597204


## Arrange table

In [7]:
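### join peaks with coverage on the coordinate columns
### (inner join: peaks without a coverage row are dropped)
dat = dplyr::inner_join(
    dat_region_import,
    dat_region_coverage_astarr,
    by = c("Chrom", "ChromStart", "ChromEnd")
)

### assign and show
dat_region_merge = dat
print(dim(dat))
fun_display_table(head(dat, 3))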

[1] 150041      8


Chrom,ChromStart,ChromEnd,Region,Input_FPKM,Input_TPM,Output_FPKM,Output_TPM
chr1,10038,10405,chr1:10038-10405,0.0041644,3.940038,0.0007357,0.7181993
chr1,14282,14614,chr1:14282-14614,0.0030033,2.841707,0.0022621,2.2104314
chr1,16025,16338,chr1:16025-16338,0.0040487,3.830812,0.0012867,1.2597204
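
The import table has 150042 rows but the joined table has 150041, which suggests that one peak has no matching coverage row. A minimal sketch of how unmatched peaks could be listed with `dplyr::anti_join`, using small made-up tables (`peaks` and `coverage` are illustrative stand-ins for `dat_region_import` and `dat_region_coverage_astarr`, not the real data):

```r
library(dplyr)

### toy stand-ins (made-up rows, not the real data)
peaks = data.frame(
    Chrom      = c("chr1", "chr1", "chr2"),
    ChromStart = c(100, 500, 100),
    ChromEnd   = c(200, 600, 200)
)
coverage = data.frame(
    Chrom      = c("chr1", "chr2"),
    ChromStart = c(100, 100),
    ChromEnd   = c(200, 200),
    Input_TPM  = c(3.9, 2.8)
)

### rows of `peaks` with no match in `coverage` on the three coordinate columns
dat_missing = dplyr::anti_join(
    peaks,
    coverage,
    by = c("Chrom", "ChromStart", "ChromEnd")
)
print(dat_missing)
```

Running the same call on `dat_region_import` and `dat_region_coverage_astarr` would return whichever peaks the inner join dropped.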


In [8]:
### get table
dat = dat_region_merge

### set output columns and their order
vec = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

### add annotation columns (Score is taken from Input_TPM) and select
dat = dat %>%
    dplyr::mutate(
        Group   = "ATAC",
        Label   = "ATAC",
        Assay   = "ATAC",
        Region  = fun_gen_region(Chrom, ChromStart, ChromEnd),
        Target  = NA,
        Score   = Input_TPM,
        NLog10P = NA,
        Method  = "MACS",
        Source  = "Reddy Lab"
    ) %>%
    dplyr::select(dplyr::all_of(vec))

### assign and show
dat_region_arrange = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 150041     12


Chrom,ChromStart,ChromEnd,Group,Label,Assay,Region,Target,Score,NLog10P,Method,Source
chr1,10038,10405,ATAC,ATAC,ATAC,chr1:10038-10405,,3.940038,,MACS,Reddy Lab
chr1,14282,14614,ATAC,ATAC,ATAC,chr1:14282-14614,,2.841707,,MACS,Reddy Lab
chr1,16025,16338,ATAC,ATAC,ATAC,chr1:16025-16338,,3.830812,,MACS,Reddy Lab
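
`fun_gen_region` comes from the sourced config and its definition is not shown in this notebook. Judging from the `Region` column above, it builds `chrom:start-end` strings. A minimal stand-in under that assumption (`gen_region_sketch` is a hypothetical name, not the project's function):

```r
### hypothetical stand-in for fun_gen_region; assumes the "chrom:start-end" format
gen_region_sketch = function(chrom, start, end) {
    ### %d keeps large coordinates out of scientific notation (e.g. 1e+05)
    sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

txt = gen_region_sketch(c("chr1", "chr1"), c(10038, 100000), c(10405, 200000))
print(txt)
```

Formatting with `sprintf` rather than `paste0` is a deliberate choice in this sketch: `read_tsv` typically parses coordinates as doubles, and pasting a double such as 100000 yields `1e+05`.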


## Export results

In [None]:
### set file path
txt_folder = TXT_FOLDER_OUT
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.atac.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### set table
dat = dat_region_arrange
dat = dat %>% dplyr::arrange(Chrom, ChromStart, ChromEnd)

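### write table (no header row; create the output folder if it is missing)
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)
write_tsv(dat, txt_fpath, col_names = FALSE)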
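
Because the table is written without a header, the column names have to be supplied again when the file is read back. A minimal round-trip sketch with a temporary file and made-up rows (`dat_out`, `vec_cnames`, and `txt_fpath_tmp` are illustrative names). By default `write_tsv` writes missing values as the string `NA`, and `read_tsv` parses that string back to `NA`:

```r
library(readr)

### toy table with an all-NA column, mimicking Target / NLog10P
dat_out = data.frame(
    Chrom      = c("chr1", "chr1"),
    ChromStart = c(10038, 14282),
    ChromEnd   = c(10405, 14614),
    Target     = NA,
    Score      = c(3.940038, 2.841707)
)
vec_cnames = colnames(dat_out)

### write without header to a temporary file (the .gz extension triggers compression)
txt_fpath_tmp = tempfile(fileext = ".bed.gz")
write_tsv(dat_out, txt_fpath_tmp, col_names = FALSE)

### read back, supplying the column names explicitly
dat_in = read_tsv(txt_fpath_tmp, col_names = vec_cnames, show_col_types = FALSE)
print(dim(dat_in))
```

The same pattern is used earlier in this notebook, where `description.tsv` supplies the column names for the headerless input BED file.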