**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variable**

In [2]:
TXT_FOLDER_REGION = "fcc_table"

## Import data

In [3]:
### set file path
txt_folder = TXT_FOLDER_REGION
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.*.bed.gz"
txt_fglob  = file.path(txt_fdiry, txt_fname)

vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

vec = vec_txt_fname
for(txt in vec){cat(txt, "\n")}

K562.hg38.atac.bed.gz 
K562.hg38.encode_e2g_benchmark.bed.gz 
K562.hg38.fcc_astarr_csaw.bed.gz 
K562.hg38.fcc_crispri_growth.bed.gz 
K562.hg38.fcc_crispri_hcrff.bed.gz 
K562.hg38.fcc_starrmpra_junke.bed.gz 
K562.hg38.tss.bed.gz 


In [4]:
### read table
vec_txt_cname = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

lst = lapply(vec_txt_fpath, function(txt_fpath){
    ### the files have no header row; name the 12 columns explicitly
    dat = read_tsv(txt_fpath, col_names = vec_txt_cname, show_col_types = FALSE)
    return(dat)
})
dat = bind_rows(lst)

### assign and show
dat_region_import = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 895792     12


Chrom,ChromStart,ChromEnd,Group,Label,Assay,Region,Target,Score,NLog10P,Method,Source
chr1,10038,10405,ATAC,ATAC,ATAC,chr1:10038-10405,,3.940038,,MACS,Reddy Lab
chr1,14282,14614,ATAC,ATAC,ATAC,chr1:14282-14614,,2.841707,,MACS,Reddy Lab
chr1,16025,16338,ATAC,ATAC,ATAC,chr1:16025-16338,,3.830812,,MACS,Reddy Lab
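
The import step above (glob the files, read each headerless table with explicit column names, then stack them) can be sketched on toy data. This is a minimal illustration using base R only; the temporary directory, the `toy.*.bed.gz` file names, the three-column layout, and the helper `write_toy` are assumptions made for the example and are not part of the project.

```r
# Toy stand-in for the import step: two headerless, gzip-compressed TSV files
vec_cname = c("Chrom", "ChromStart", "ChromEnd")
txt_fdiry = tempfile("toy_region_")
dir.create(txt_fdiry)

# Hypothetical helper: write a data frame as a headerless gzipped TSV
write_toy = function(txt_fname, dat){
    con = gzfile(file.path(txt_fdiry, txt_fname), "w")
    write.table(dat, con, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    close(con)
}
write_toy("toy.a.bed.gz", data.frame("chr1", 10, 20))
write_toy("toy.b.bed.gz", data.frame("chr2", 30, 40))

# Glob the files, read each one with explicit column names, then stack them
vec_fpath = Sys.glob(file.path(txt_fdiry, "toy.*.bed.gz"))
lst = lapply(vec_fpath, function(txt_fpath){
    read.table(txt_fpath, sep = "\t", header = FALSE,
               col.names = vec_cname, stringsAsFactors = FALSE)
})
dat_toy = do.call(rbind, lst)
print(dat_toy)
```

Passing the column names at read time keeps the number of names and the number of columns tied together, so a file with an unexpected layout fails early instead of being silently relabelled.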


## Define columns

In [5]:
### setup column description
dat = tribble(
    ~Name,        ~Note,
    "Chrom",      "Name of the chromosome",
    "ChromStart", "The starting position of the feature in the chromosome",
    "ChromEnd",   "The ending position of the feature in the chromosome",
    "Group",      "Region group",
    "Label",      "Region label",
    "Assay",      "Assay or annotation name",
    "Region",     "Region coordinate",
    "Target",     "Targeted genes or guides",
    "Score",      "Score assigned to a region.",
    "NLog10P",    "-log10 of P-value",
    "Method",     "Method of analysis",
    "Source",     "Dataset or data source"
)

### assign and show
dat_cname = dat
fun_display_table(dat)

Name,Note
Chrom,Name of the chromosome
ChromStart,The starting position of the feature in the chromosome
ChromEnd,The ending position of the feature in the chromosome
Group,Region group
Label,Region label
Assay,Assay or annotation name
Region,Region coordinate
Target,Targeted genes or guides
Score,Score assigned to a region.
NLog10P,-log10 of P-value
Method,Method of analysis
Source,Dataset or data source


**Check**

In [6]:
vec1 = colnames(dat_region_import)
vec2 = dat_cname$Name
all(vec1 == vec2)
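
The comparison `all(vec1 == vec2)` recycles the shorter vector when the lengths differ, so it can be hard to tell what went wrong when it fails. A stricter check that also reports which names are missing or unexpected might look like the sketch below; the helper name `check_colnames` and the toy vectors are hypothetical and not part of the original notebook.

```r
# Hypothetical helper: compare observed column names against the documented
# ones and report which names are missing or unexpected.
check_colnames = function(vec_observed, vec_expected){
    list(
        same       = identical(vec_observed, vec_expected),
        missing    = setdiff(vec_expected, vec_observed),
        unexpected = setdiff(vec_observed, vec_expected)
    )
}

# Toy example with one mismatched name
vec_expected = c("Chrom", "ChromStart", "ChromEnd")
vec_observed = c("Chrom", "ChromStart", "End")
res = check_colnames(vec_observed, vec_expected)
print(res)
```

In the notebook this could be called as `check_colnames(colnames(dat_region_import), dat_cname$Name)`.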

In [10]:
dat = dat_region_import
table(dat$Group)


        ASTARR           ATAC CRISPRi-Growth  CRISPRi-HCRFF  E2G-Benchmark 
        542786         150041           6242            113          10375 
         LMPRA          TMPRA            TSS         WSTARR 
         26133           6271          11892         141939 

## Export results

In [11]:
### set file path
txt_folder = TXT_FOLDER_REGION
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.concat.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### write table
dat = dat_cname
write_tsv(dat, txt_fpath)
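
The write above assumes the `summary` subfolder already exists. If it might not, the folder can be created first. The sketch below shows the idea with base R under a temporary directory; the paths and the toy table are assumptions for illustration only.

```r
# Build an output path under a temporary directory (stand-in for FD_RES)
txt_fdiry = file.path(tempdir(), "region", "fcc_table", "summary")
txt_fpath = file.path(txt_fdiry, "description.concat.tsv")

# Create the folder and any missing parents; do nothing if it already exists
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

# Write a toy description table with a header row, then read it back
dat_toy = data.frame(
    Name = c("Chrom", "ChromStart"),
    Note = c("Name of the chromosome", "Start position"),
    stringsAsFactors = FALSE
)
write.table(dat_toy, txt_fpath, sep = "\t", quote = FALSE, row.names = FALSE)
dat_back = read.delim(txt_fpath, stringsAsFactors = FALSE)
print(dat_back)
```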

In [8]:
### set file path
txt_folder = TXT_FOLDER_REGION
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "fcc_table.concat.starr.mpra.crispri.e2g.atac.tss.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### set table
dat = dat_region_import
dat = dat %>% dplyr::arrange(Chrom, ChromStart, ChromEnd)

### write table
write_tsv(dat, txt_fpath, col_names = FALSE)
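
Sorting by `Chrom` as a character column is a lexicographic sort, so `chr10` is placed before `chr2`; the exact collation used by `dplyr::arrange` can depend on the dplyr version and locale settings. The sketch below reproduces a C-locale ordering with base R on a toy table (the data are invented for the example), which may help when checking that the exported file is ordered the way downstream tools expect.

```r
# Toy table with chromosomes and coordinates in arbitrary order
dat_toy = data.frame(
    Chrom      = c("chr2", "chr10", "chr1", "chr1"),
    ChromStart = c(5L, 1L, 300L, 20L),
    ChromEnd   = c(9L, 4L, 400L, 30L),
    stringsAsFactors = FALSE
)

# method = "radix" sorts character keys in the C locale, independent of the
# session locale; coordinates are sorted numerically within each chromosome
idx = order(dat_toy$Chrom, dat_toy$ChromStart, dat_toy$ChromEnd, method = "radix")
dat_sorted = dat_toy[idx, ]
print(dat_sorted)
```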