**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variables**

In [2]:
TXT_REGION_FOLDER = "encode_chipseq_histone"

## Define column description
The peak files are in ENCODE narrowPeak format: the six standard BED fields followed by four additional fields (BED6+4).

In [3]:
### ENCODE narrowPeak: Narrow (or Point-Source) Peaks format
dat = tribble(
    ~Name,        ~Note,
    "Chrom",      "Name of the chromosome",
    "ChromStart", "The starting position of the feature in the chromosome",
    "ChromEnd",   "The ending position of the feature in the chromosome",
    "Name",       "Name given to a region; Use '.' if no name is assigned.",
    "Score",      "Indicates how dark the peak will be displayed in the browser (0-1000).",
    "Strand",     "+/- to denote strand or orientation. Use '.' if no orientation is assigned.",
    "SignalValue", "Measurement of overall (usually, average) enrichment for the region.",
    "PValue",     "Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.",
    "QValue",     "Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.",
    "Peak",       "Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called."
)

### assign and show
dat_cnames = dat
fun_display_table(dat)

Name,Note
Chrom,Name of the chromosome
ChromStart,The starting position of the feature in the chromosome
ChromEnd,The ending position of the feature in the chromosome
Name,Name given to a region; Use '.' if no name is assigned.
Score,Indicates how dark the peak will be displayed in the browser (0-1000).
Strand,+/- to denote strand or orientation. Use '.' if no orientation is assigned.
SignalValue,"Measurement of overall (usually, average) enrichment for the region."
PValue,Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
QValue,Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
Peak,Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.
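
As a rough illustration of how the column names above might be applied, the sketch below reads a small narrowPeak-style file with base R and derives two convenience columns. The two records are made-up values written to a temporary file, and the names `dat_peak`, `vec_txt_cname`, `PValue_Raw`, and `Summit` are hypothetical and not part of this project.

```r
### hypothetical narrowPeak records (tab-separated, no header); values are made up
txt_lines = c(
    "chr1\t1000\t1500\t.\t850\t.\t12.5\t8.2\t6.1\t240",
    "chr2\t2000\t2300\t.\t400\t.\t4.1\t-1\t-1\t-1"
)
txt_fpath_tmp = tempfile(fileext = ".bed")
writeLines(txt_lines, txt_fpath_tmp)

### column names in the order listed in the table above
vec_txt_cname = c(
    "Chrom", "ChromStart", "ChromEnd", "Name", "Score", "Strand",
    "SignalValue", "PValue", "QValue", "Peak"
)

### read the file and attach the column names
dat_peak = read.table(
    txt_fpath_tmp,
    sep = "\t",
    header = FALSE,
    col.names = vec_txt_cname,
    stringsAsFactors = FALSE
)

### -1 marks "not assigned"; set it to NA before undoing the -log10 transform
dat_peak$PValue_Raw = ifelse(dat_peak$PValue < 0, NA, 10^(-dat_peak$PValue))

### absolute summit position = ChromStart + offset (NA if no point-source called)
dat_peak$Summit = ifelse(dat_peak$Peak < 0, NA, dat_peak$ChromStart + dat_peak$Peak)
```

Treating -1 as missing before any arithmetic avoids reading the placeholder as a real p-value or as a summit one base upstream of the region.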


## Define file labeling

In [4]:
### set directory
txt_folder = TXT_REGION_FOLDER
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fglob  = file.path(txt_fdiry, "*bed*")

### get file names
vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

### init info table
dat = data.frame(
    "Folder" = txt_folder,
    "FName"  = vec_txt_fname
)

### split each file name on "." into its eight fields, then build Label as assay_target_fileaccession
dat = dat %>% tidyr::separate(
        FName, 
        c("Biosample", "Genome", "Index_Experiment", "Index_File", "Assay", "Target", "File_Type", "File_Ext"),
        sep = "\\.",
        remove = FALSE
    ) %>%
    dplyr::mutate(Label = paste(tolower(Assay), Target, Index_File, sep="_")) %>%
    dplyr::select(Folder, FName, Label) 

### assign and show
dat_region_label = dat
fun_display_table(dat)

Folder,FName,Label
encode_chipseq_histone,K562.hg38.ENCSR000AKP.ENCFF544LXB.ChIPseq.H3K27ac.bed.gz,chipseq_H3K27ac_ENCFF544LXB
encode_chipseq_histone,K562.hg38.ENCSR000AKQ.ENCFF323WOT.ChIPseq.H3K27me3.bed.gz,chipseq_H3K27me3_ENCFF323WOT
encode_chipseq_histone,K562.hg38.ENCSR000AKR.ENCFF193ERO.ChIPseq.H3K36me3.bed.gz,chipseq_H3K36me3_ENCFF193ERO
encode_chipseq_histone,K562.hg38.ENCSR000AKS.ENCFF135ZLM.ChIPseq.H3K4me1.bed.gz,chipseq_H3K4me1_ENCFF135ZLM
encode_chipseq_histone,K562.hg38.ENCSR000AKT.ENCFF749KLQ.ChIPseq.H3K4me2.bed.gz,chipseq_H3K4me2_ENCFF749KLQ
encode_chipseq_histone,K562.hg38.ENCSR000AKU.ENCFF689QIJ.ChIPseq.H3K4me3.bed.gz,chipseq_H3K4me3_ENCFF689QIJ
encode_chipseq_histone,K562.hg38.ENCSR000AKV.ENCFF891CHI.ChIPseq.H3K9ac.bed.gz,chipseq_H3K9ac_ENCFF891CHI
encode_chipseq_histone,K562.hg38.ENCSR000AKW.ENCFF462AVD.ChIPseq.H3K9me1.bed.gz,chipseq_H3K9me1_ENCFF462AVD
encode_chipseq_histone,K562.hg38.ENCSR000AKX.ENCFF909RKY.ChIPseq.H4K20me1.bed.gz,chipseq_H4K20me1_ENCFF909RKY
encode_chipseq_histone,K562.hg38.ENCSR000APC.ENCFF213OTI.ChIPseq.H2AFZ.bed.gz,chipseq_H2AFZ_ENCFF213OTI
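
To make the labeling rule concrete, the sketch below applies the same split-and-paste logic to one file name from the table above, using only base R. It is an illustration rather than the notebook's own code path; `txt_fname_demo`, `vec_txt_field`, and `txt_label` are hypothetical names.

```r
### one file name taken from the table above
txt_fname_demo = "K562.hg38.ENCSR000AKP.ENCFF544LXB.ChIPseq.H3K27ac.bed.gz"

### split on literal dots; the name is expected to contain exactly eight fields
vec_txt_field = strsplit(txt_fname_demo, ".", fixed = TRUE)[[1]]
stopifnot(length(vec_txt_field) == 8)
names(vec_txt_field) = c(
    "Biosample", "Genome", "Index_Experiment", "Index_File",
    "Assay", "Target", "File_Type", "File_Ext"
)

### label = lower-case assay, target, and file accession joined by "_"
txt_label = paste(
    tolower(vec_txt_field[["Assay"]]),
    vec_txt_field[["Target"]],
    vec_txt_field[["Index_File"]],
    sep = "_"
)
print(txt_label)
```

The explicit length check is a design choice for the sketch: a file name with extra or missing dots would otherwise shift the fields and silently produce a wrong label.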


## Save results

In [5]:
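### set output directory and file path for the column description
txt_folder = TXT_REGION_FOLDER
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### create the summary directory if needed, then write the table
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)
dat = dat_cnames
write_tsv(dat, txt_fpath)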

In [6]:
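### set output directory and file path for the file labels
txt_folder = TXT_REGION_FOLDER
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "metadata.label.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### create the summary directory if needed, then write the table
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)
dat = dat_region_label
write_tsv(dat, txt_fpath)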