**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



## Helper functions

**Define functions**

```r
GROUPS  = c("Input", "Output")
SAMPLES = c(
    paste0("Input.rep",  1:6),
    paste0("Output.rep", 1:6))

get_info = function(string, patterns){
    idx = str_detect(string = string, pattern = patterns)
    return(patterns[idx])
}

get_group  = function(strings){
    res = sapply(strings, function(string){get_info(string, GROUPS)})
    return(res)
}

get_sample = function(strings){
    res = sapply(strings, function(string){get_info(string, SAMPLES)})
    return(res)
}
```

In [2]:
### patterns used to label each file name by group and by sample
#VEC_TXT_PREFIX = c("ASTARRseq_K562_KS91", "ASTARRseq_K562_KS274")
VEC_TXT_GROUP  = c("Input", "Output")
VEC_TXT_SAMPLE = c(
    paste0("Input.rep",  1:6),
    paste0("Output.rep", 1:6)
)

### 
#get_prefix = function(strings) {
#    vec = fun_str_map_detect(
#        strings, 
#        vec_txt_pattern = VEC_TXT_PREFIX, 
#        vec_txt_replace = VEC_TXT_PREFIX
#    )
#    return(vec)
#} 

### get prefix: the text before the first "." in each file name
get_prefix = function(strings) {
    lst = str_split(strings, "\\.")
    lst = lapply(lst, function(vec){vec[1]})
    vec = unlist(lst)
    return(vec)
}

### get group (Input or Output) from file names
get_group = function(strings) {
    vec = fun_str_map_detect(
        strings, 
        vec_txt_pattern = VEC_TXT_GROUP, 
        vec_txt_replace = VEC_TXT_GROUP
    )
    return(vec)
}

### get sample (group plus replicate, e.g. Input.rep1) from file names
get_sample = function(strings) {
    vec = fun_str_map_detect(
        strings, 
        vec_txt_pattern = VEC_TXT_SAMPLE, 
        vec_txt_replace = VEC_TXT_SAMPLE
    )
    return(vec)
}
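
`fun_str_map_detect` is loaded by the sourced config script and is not shown in this notebook. The sketch below is a hypothetical stand-in, written in base R, for the behavior the helpers above appear to rely on: for each string, return the replacement paired with the first pattern found in it. The name `str_map_detect_sketch`, the literal (non-regex) matching, and the `NA` returned when nothing matches are all assumptions, not the project's implementation.

```r
# Hypothetical stand-in for fun_str_map_detect (the real helper is defined elsewhere).
# Assumption: for each string, return the replacement paired with the first
# pattern found in it, or NA when no pattern is found.
str_map_detect_sketch = function(strings, vec_txt_pattern, vec_txt_replace) {
    vec = vapply(strings, function(string) {
        # fixed = TRUE matches "." literally, so "Input.rep1" is not read as a regex
        vec_lgl = vapply(
            vec_txt_pattern,
            function(pattern) grepl(pattern, string, fixed = TRUE),
            logical(1)
        )
        idx = which(vec_lgl)
        if (length(idx) == 0) NA_character_ else vec_txt_replace[idx[1]]
    }, character(1), USE.NAMES = FALSE)
    return(vec)
}

### try it on one file name
txt = "ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz"
vec_txt_sample_demo = c(paste0("Input.rep", 1:6), paste0("Output.rep", 1:6))
print(str_map_detect_sketch(txt, c("Input", "Output"), c("Input", "Output")))
print(str_map_detect_sketch(txt, vec_txt_sample_demo, vec_txt_sample_demo))
```

Literal matching is enough here because the replicate numbers run from 1 to 6; with ten or more replicates, `rep1` would also match inside `rep10` and the pattern order would start to matter.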

**Test functions**

In [3]:
txt = "ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz"
print(get_prefix(txt))
print(get_sample(txt))
print(get_group(txt))

[1] "ASTARRseq_K562_KS91"
[1] "Input.rep1"
[1] "Input"


## Metadata for ASTARR fragment counts (KS91)

**Get the file names for fragment counts**

In [4]:
### set directory
txt_assay  = "STARR_ATAC_K562_Reddy_KS91"
txt_folder = "fragment_counts"
txt_fdiry  = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder)
txt_fname  = "*WGS*bed.gz"
txt_fglob  = file.path(txt_fdiry, txt_fname)

### get file path and name
vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

### show
for (txt in vec_txt_fname){cat(txt, "\n")}

ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep4.WGS.unstranded.bed.gz 
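
`Sys.glob` returns an empty vector when nothing matches, so a wrong folder name would pass through silently and produce an empty metadata table. A minimal sketch of a guard around the glob step; `glob_or_stop` and the demo file name are hypothetical and not part of the project code.

```r
# Hypothetical helper: glob and fail loudly when nothing matches
glob_or_stop = function(txt_fglob) {
    vec_txt_fpath = Sys.glob(txt_fglob)
    if (length(vec_txt_fpath) == 0) {
        stop("No files match: ", txt_fglob)
    }
    return(vec_txt_fpath)
}

### demo in a temporary directory with one made-up file
txt_fdiry = file.path(tempdir(), "fragment_counts_demo")
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(txt_fdiry, "Demo.hg38.Input.rep1.WGS.unstranded.bed.gz"))

vec_txt_fpath = glob_or_stop(file.path(txt_fdiry, "*WGS*bed.gz"))
print(basename(vec_txt_fpath))
```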


**Get sample info**

In [5]:
### init
dat = data.frame(
    FName  = vec_txt_fname,
    Assay  = txt_assay
)

### get sample info
dat = dat %>%
    dplyr::mutate(
        Prefix = get_prefix(FName),
        Group  = get_group(FName),
        Sample = get_sample(FName)
    ) %>%
    dplyr::arrange(Sample)

### assign and show
dat_meta = dat
fun_display_table(dat)

FName,Assay,Prefix,Group,Sample
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep1
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep2
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep3
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep4
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep5
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Input,Input.rep6
ASTARRseq_K562_KS91.hg38.Output.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Output,Output.rep1
ASTARRseq_K562_KS91.hg38.Output.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Output,Output.rep2
ASTARRseq_K562_KS91.hg38.Output.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Output,Output.rep3
ASTARRseq_K562_KS91.hg38.Output.rep4.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS91,ASTARRseq_K562_KS91,Output,Output.rep4


**Save results**

In [7]:
### set directory
txt_fdiry = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder, "summary")
txt_fname = "metadata.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### create the output directory (no error if it already exists)
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

### write table
dat = dat_meta
write_tsv(dat, txt_fpath)
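
To confirm that a saved table survives the round trip, it can be written and read back. The sketch below uses base R (`write.table` and `read.delim`) on a made-up two-row table in a temporary directory; the demo values and paths are assumptions for illustration only.

```r
### made-up metadata table with the same columns as above
dat_demo = data.frame(
    FName  = c(
        "Demo.hg38.Input.rep1.WGS.unstranded.bed.gz",
        "Demo.hg38.Output.rep1.WGS.unstranded.bed.gz"),
    Assay  = "Demo_Assay",
    Prefix = "Demo",
    Group  = c("Input", "Output"),
    Sample = c("Input.rep1", "Output.rep1"),
    stringsAsFactors = FALSE
)

### write to a temporary summary folder
txt_fdiry = file.path(tempdir(), "summary_demo")
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)
txt_fpath = file.path(txt_fdiry, "metadata.tsv")
write.table(dat_demo, txt_fpath, sep = "\t", quote = FALSE, row.names = FALSE)

### read back and compare with what was written
dat_back = read.delim(txt_fpath, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(dat_demo, dat_back)))
```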

## Metadata for ASTARR fragment counts (KS274)

**Get the file names for fragment counts**

In [8]:
### set directory
txt_assay  = "STARR_ATAC_K562_Reddy_KS274"
txt_folder = "fragment_counts"
txt_fdiry  = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder)
txt_fname  = "*WGS*bed.gz"
txt_fglob  = file.path(txt_fdiry, txt_fname)

### get file path and name
vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

### show
for (txt in vec_txt_fname){cat(txt, "\n")}

ASTARRseq_K562_KS274.hg38.Output.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS274.hg38.Output.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS274.hg38.Output.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz 


**Get sample info**

In [9]:
### init
dat = data.frame(
    FName  = vec_txt_fname,
    Assay  = txt_assay
)

### get sample info
dat = dat %>%
    dplyr::mutate(
        Prefix = get_prefix(FName),
        Group  = get_group(FName),
        Sample = get_sample(FName)
    ) %>%
    dplyr::arrange(Sample)

### assign and show
dat_meta = dat
fun_display_table(dat)

FName,Assay,Prefix,Group,Sample
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep1
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep2
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep3
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep4
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep5
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS91,Input,Input.rep6
ASTARRseq_K562_KS274.hg38.Output.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS274,Output,Output.rep1
ASTARRseq_K562_KS274.hg38.Output.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS274,Output,Output.rep2
ASTARRseq_K562_KS274.hg38.Output.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KS274,ASTARRseq_K562_KS274,Output,Output.rep3


**Save results**

In [11]:
### set directory
txt_fdiry = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder, "summary")
txt_fname = "metadata.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### create the output directory (no error if it already exists)
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

### write table
dat = dat_meta
write_tsv(dat, txt_fpath)

## Metadata for ASTARR fragment counts (KS Merge)

**Get the file names for fragment counts**

In [12]:
### set directory
txt_assay  = "STARR_ATAC_K562_Reddy_KSMerge"
txt_folder = "fragment_counts"
txt_fdiry  = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder)
txt_fname  = "*WGS*bed.gz"
txt_fglob  = file.path(txt_fdiry, txt_fname)

### get file path and name
vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

### show
for (txt in vec_txt_fname){cat(txt, "\n")}

ASTARRseq_K562_KS274.hg38.Output.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS274.hg38.Output.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS274.hg38.Output.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep1.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep2.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep3.WGS.unstranded.bed.gz 
ASTARRseq_K562_KS91.hg38.Output.rep4.WGS.unstranded.bed.gz 


**Get sample info**

In [13]:
### init
dat = data.frame(
    FName  = vec_txt_fname,
    Assay  = txt_assay
)

### get sample info
dat = dat %>%
    dplyr::mutate(
        Prefix = get_prefix(FName),
        Group  = get_group(FName),
        Sample = get_sample(FName)
    ) %>%
    dplyr::mutate(
        Sample = case_when(
            Prefix == "ASTARRseq_K562_KS274" & Sample == "Output.rep1" ~ "Output.rep5",
            Prefix == "ASTARRseq_K562_KS274" & Sample == "Output.rep2" ~ "Output.rep6",
            Prefix == "ASTARRseq_K562_KS274" & Sample == "Output.rep3" ~ "Output.rep7",
            .default = as.character(Sample)
        )
    ) %>%
    dplyr::arrange(Sample)

### assign and show
dat_meta = dat
fun_display_table(dat)

FName,Assay,Prefix,Group,Sample
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep1
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep2
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep3
ASTARRseq_K562_KS91.hg38.Input.rep4.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep4
ASTARRseq_K562_KS91.hg38.Input.rep5.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep5
ASTARRseq_K562_KS91.hg38.Input.rep6.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Input,Input.rep6
ASTARRseq_K562_KS91.hg38.Output.rep1.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Output,Output.rep1
ASTARRseq_K562_KS91.hg38.Output.rep2.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Output,Output.rep2
ASTARRseq_K562_KS91.hg38.Output.rep3.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Output,Output.rep3
ASTARRseq_K562_KS91.hg38.Output.rep4.WGS.unstranded.bed.gz,STARR_ATAC_K562_Reddy_KSMerge,ASTARRseq_K562_KS91,Output,Output.rep4
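
The `case_when` step relabels the three KS274 output replicates as Output.rep5 to Output.rep7, so that they do not collide with the four KS91 output replicates stored in the same folder. A minimal base R sketch of that relabeling, built from the file names listed above; the lookup vector `vec_txt_relabel` and the regular expressions are illustrative assumptions, not project code.

```r
### file names as listed in the merged folder
vec_txt_fname = c(
    paste0("ASTARRseq_K562_KS274.hg38.Output.rep", 1:3, ".WGS.unstranded.bed.gz"),
    paste0("ASTARRseq_K562_KS91.hg38.Input.rep",   1:6, ".WGS.unstranded.bed.gz"),
    paste0("ASTARRseq_K562_KS91.hg38.Output.rep",  1:4, ".WGS.unstranded.bed.gz")
)

### prefix = text before the first "."; sample = "<Group>.rep<N>"
vec_txt_prefix = sub("\\..*$", "", vec_txt_fname)
vec_txt_sample = regmatches(
    vec_txt_fname,
    regexpr("(Input|Output)\\.rep[0-9]+", vec_txt_fname)
)

### before relabeling, three output labels occur twice
vec_txt_dup = unique(vec_txt_sample[duplicated(vec_txt_sample)])
print(vec_txt_dup)

### relabel the KS274 outputs through a named lookup vector
vec_txt_relabel = c(
    "Output.rep1" = "Output.rep5",
    "Output.rep2" = "Output.rep6",
    "Output.rep3" = "Output.rep7"
)
idx = (vec_txt_prefix == "ASTARRseq_K562_KS274") & (vec_txt_sample %in% names(vec_txt_relabel))
vec_txt_sample[idx] = unname(vec_txt_relabel[vec_txt_sample[idx]])

### after relabeling, every sample label is unique
stopifnot(anyDuplicated(vec_txt_sample) == 0)
print(sort(vec_txt_sample))
```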


**Save results**

In [14]:
### set directory
txt_fdiry = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder, "summary")
txt_fname = "metadata.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### create the output directory (no error if it already exists)
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

### write table
dat = dat_meta
write_tsv(dat, txt_fpath)

## Metadata for WSTARR fragment counts

**Get the file names for fragment counts**

In [40]:
### set directory
txt_assay  = "STARR_WHG_K562_Reddy_A001"
txt_folder = "fragment_counts"
txt_fdiry  = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder)
txt_fname  = "*WGS*bed.gz"
txt_fglob  = file.path(txt_fdiry, txt_fname)

### get file path and name
vec_txt_fpath = Sys.glob(txt_fglob)
vec_txt_fname = basename(vec_txt_fpath)

### show
for (txt in vec_txt_fname){cat(txt, "\n")}

WSTARRseq_K562_A001.hg38.Input.rep1.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Input.rep2.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Input.rep3.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Input.rep4.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Output.rep1.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Output.rep2.WGS.unstranded.bed.gz 
WSTARRseq_K562_A001.hg38.Output.rep3.WGS.unstranded.bed.gz 


**Get sample info**

In [41]:
### init
dat = data.frame(
    FName  = vec_txt_fname,
    Assay  = txt_assay
)

### get sample info
dat = dat %>%
    dplyr::mutate(
        Prefix = get_prefix(FName),
        Group  = get_group(FName),
        Sample = get_sample(FName)
    ) %>%
    dplyr::arrange(Sample)

### assign and show
dat_meta = dat
fun_display_table(dat)

FName,Assay,Prefix,Group,Sample
WSTARRseq_K562_A001.hg38.Input.rep1.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Input,Input.rep1
WSTARRseq_K562_A001.hg38.Input.rep2.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Input,Input.rep2
WSTARRseq_K562_A001.hg38.Input.rep3.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Input,Input.rep3
WSTARRseq_K562_A001.hg38.Input.rep4.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Input,Input.rep4
WSTARRseq_K562_A001.hg38.Output.rep1.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Output,Output.rep1
WSTARRseq_K562_A001.hg38.Output.rep2.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Output,Output.rep2
WSTARRseq_K562_A001.hg38.Output.rep3.WGS.unstranded.bed.gz,STARR_WHG_K562_Reddy_A001,WSTARRseq_K562_A001,Output,Output.rep3


**Save results**

In [7]:
### set directory
txt_fdiry = file.path(FD_RES, "assay_fcc", txt_assay, txt_folder, "summary")
txt_fname = "metadata.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### create the output directory (no error if it already exists)
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)

### write table
dat = dat_meta
write_tsv(dat, txt_fpath)
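
The four sections above repeat the same glob, annotate, and sort steps with only the assay name changing. A sketch of how those steps could be folded into one function, assuming file names follow the `<Prefix>.<genome>.<Group>.rep<N>.WGS...` layout seen above. `build_metadata_sketch`, its regular expressions, and the demo files are hypothetical; the sketch uses base R rather than the project's helpers and omits the KSMerge relabeling.

```r
# Hypothetical wrapper, not part of the project code
build_metadata_sketch = function(txt_fdiry, txt_assay, txt_fname = "*WGS*bed.gz") {
    ### get file names
    vec_txt_fname = basename(Sys.glob(file.path(txt_fdiry, txt_fname)))
    if (length(vec_txt_fname) == 0) {
        stop("No files match in: ", txt_fdiry)
    }

    ### parse prefix, sample, and group from each file name
    vec_txt_prefix = sub("\\..*$", "", vec_txt_fname)
    vec_txt_sample = sub(
        "^.*\\.((Input|Output)\\.rep[0-9]+)\\..*$", "\\1", vec_txt_fname
    )
    vec_txt_group  = sub("\\.rep[0-9]+$", "", vec_txt_sample)

    ### assemble and sort by sample
    dat = data.frame(
        FName  = vec_txt_fname,
        Assay  = txt_assay,
        Prefix = vec_txt_prefix,
        Group  = vec_txt_group,
        Sample = vec_txt_sample,
        stringsAsFactors = FALSE
    )
    dat = dat[order(dat$Sample), ]
    rownames(dat) = NULL
    return(dat)
}

### demo on made-up files in a temporary directory
txt_fdiry = file.path(tempdir(), "build_metadata_demo")
dir.create(txt_fdiry, recursive = TRUE, showWarnings = FALSE)
vec_txt_demo = c(
    "Demo_A.hg38.Output.rep1.WGS.unstranded.bed.gz",
    "Demo_A.hg38.Input.rep1.WGS.unstranded.bed.gz",
    "Demo_A.hg38.Input.rep2.WGS.unstranded.bed.gz"
)
invisible(file.create(file.path(txt_fdiry, vec_txt_demo)))

dat_demo = build_metadata_sketch(txt_fdiry, txt_assay = "Demo_Assay")
print(dat_demo)
```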

## Review

**Check results**

In [20]:
txt_fdiry = file.path(FD_RES, "assay_fcc", "STARR*", "fragment_counts", "summary")
txt_fname = "metadata.tsv"
txt_fglob = file.path(txt_fdiry, txt_fname)

vec_txt_fpath = Sys.glob(txt_fglob)
for (txt_fpath in vec_txt_fpath){
    cat(txt_fpath, "\n")
    vec = readLines(txt_fpath)
    for (txt in vec){
        cat(txt, "\n")
    }
    cat("\n")
}

/mount/repo/Proj_ENCODE_FCC/results/assay_fcc/STARR_ATAC_K562_Reddy_KS274/fragment_counts/summary/metadata.tsv 
FName	Assay	Prefix	Group	Sample 
ASTARRseq_K562_KS274.hg38.Output.rep1.WGS.unstranded.bed.gz	STARR_ATAC_K562_Reddy_KS274	ASTARRseq_K562_KS274	Output	Output.rep1 
ASTARRseq_K562_KS274.hg38.Output.rep2.WGS.unstranded.bed.gz	STARR_ATAC_K562_Reddy_KS274	ASTARRseq_K562_KS274	Output	Output.rep2 
ASTARRseq_K562_KS274.hg38.Output.rep3.WGS.unstranded.bed.gz	STARR_ATAC_K562_Reddy_KS274	ASTARRseq_K562_KS274	Output	Output.rep3 

/mount/repo/Proj_ENCODE_FCC/results/assay_fcc/STARR_ATAC_K562_Reddy_KS91/fragment_counts/summary/metadata.tsv 
FName	Assay	Prefix	Group	Sample 
ASTARRseq_K562_KS91.hg38.Input.rep1.WGS.unstranded.bed.gz	STARR_ATAC_K562_Reddy_KS91	ASTARRseq_K562_KS91	Input	Input.rep1 
ASTARRseq_K562_KS91.hg38.Input.rep2.WGS.unstranded.bed.gz	STARR_ATAC_K562_Reddy_KS91	ASTARRseq_K562_KS91	Input	Input.rep2 
ASTARRseq_K562_KS91.hg38.Input.rep3.WGS.unstranded.bed.gz	STARR_ATAC_K562_Red