**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variables**

In [2]:
TXT_FOLDER_INP = "fcc_astarr_csaw"
TXT_FOLDER_OUT = "fcc_table"

## Import data

In [3]:
### set directory
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
dir(txt_fdiry)

In [8]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder, "summary")
txt_fname  = "description.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### assign and show
dat_cnames = dat
fun_display_table(dat)

Name,Note
Chrom,Name of the chromosome
ChromStart,The starting position of the feature in the chromosome
ChromEnd,The ending position of the feature in the chromosome
Name,Name given to a region; Use '.' if no name is assigned.
Score,Score assigned to a region.
Strand,+/- to denote strand or orientation. Use '.' if no orientation is assigned.
Log2FC,"Fold change (normalized output/input ratio, in log2 space)"
Input_CPM,"Input CPM, mean across replicates"
Output_CPM,"Output CPM, mean across replicates"
MinusLog10PValue,-log10 of P-value


In [9]:
### set file path
txt_folder = TXT_FOLDER_INP
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.ASTARR.csaw.KS91.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### read table
vec = dat_cnames$Name
dat = read_tsv(txt_fpath, col_names = vec, show_col_types = FALSE)

### assign and show
dat_region_import = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 352944     14


Chrom,ChromStart,ChromEnd,Name,Score,Strand,Log2FC,Input_CPM,Output_CPM,MinusLog10PValue,MinusLog10QValue,Dataset,Group,Label
chr1,9976,10475,chr1:9976-10475,62,.,-4.184,0.204,0.002,7.358,6.299,KS91,ASTARR,ASTARR_R:csaw:KS91
chr1,14226,14675,chr1:14226-14675,79,.,-6.187,0.209,0.0,9.192,7.917,KS91,ASTARR,ASTARR_R:csaw:KS91
chr1,15976,16525,chr1:15976-16525,42,.,-2.236,0.196,0.013,5.062,4.267,KS91,ASTARR,ASTARR_R:csaw:KS91
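The BED file has no header line, so the `Name` column of the description table supplies the column names at import. The description table printed above lists 10 names while the imported table has 14 columns, so a guard on the name count may be worth adding. The following is a minimal sketch of such a guard on toy data, written in base R rather than with the project helpers; `desc`, `bed_lines`, and `read_named_bed` are hypothetical names introduced only for illustration.

```r
# Toy stand-ins for description.tsv and a headerless BED file (hypothetical data).
desc <- data.frame(
    Name = c("Chrom", "ChromStart", "ChromEnd", "Name"),
    Note = c("Name of the chromosome", "Start position", "End position", "Region name"),
    stringsAsFactors = FALSE
)
bed_lines <- c(
    "chr1\t9976\t10475\tchr1:9976-10475",
    "chr1\t14226\t14675\tchr1:14226-14675"
)

# Read a headerless table and stop early if the number of names
# does not match the number of columns found in the file.
read_named_bed <- function(lines, cnames) {
    dat <- read.delim(text = lines, header = FALSE, sep = "\t", stringsAsFactors = FALSE)
    if (ncol(dat) != length(cnames)) {
        stop(sprintf("expected %d columns, found %d", length(cnames), ncol(dat)))
    }
    names(dat) <- cnames
    dat
}

dat_toy <- read_named_bed(bed_lines, desc$Name)
print(dim(dat_toy))
```

Failing with an explicit error is one possible design choice; it avoids silently auto-generated names for any columns that the description table does not cover.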


In [10]:
dat = dat_region_import
summary(dat$Score)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    4.00   13.00   22.01   29.00  162.00 
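In the three rows displayed at import, the `Score` column equals `floor(10 * MinusLog10QValue)` (62, 79, 42 against 6.299, 7.917, 4.267). Whether this relation holds for the full table is an assumption, inferred only from those rows, and the source does not state how `Score` was computed. A small sketch of how the relation could be checked, using only the values copied from the output above:

```r
# The three rows shown in the import output (values copied from that table).
dat_head <- data.frame(
    Score            = c(62, 79, 42),
    MinusLog10QValue = c(6.299, 7.917, 4.267)
)

# Assumed relation, inferred from these rows only: Score = floor(10 * -log10(q)).
score_expected <- floor(10 * dat_head$MinusLog10QValue)
print(all(score_expected == dat_head$Score))
```

The same comparison could be run on `dat_region_import` to test the assumption on all rows. Note that the arrange step later replaces `Score` with `Log2FC`, so the exported `Score` is a different quantity from the one summarized here.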

## Arrange table

In [14]:
### get table
dat = dat_region_import
vec = c(
    "Chrom", "ChromStart", "ChromEnd", "Group", "Label",
    "Assay", "Region", "Target", "Score", "NLog10P",
    "Method", "Source"
)

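### derive the standardized columns, then keep them in the order given by vec
dat = dat %>%
    dplyr::mutate(
        Assay   = "ATAC-STARR",
        Region  = fun_gen_region(Chrom, ChromStart, ChromEnd),
        Target  = NA,                # left empty
        Score   = Log2FC,            # overwrites the imported Score with Log2FC
        NLog10P = MinusLog10PValue,
        Method  = "CSAW",
        Source  = "Reddy Lab"
    ) %>%
    dplyr::select(dplyr::all_of(vec))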

dat_region_arrange = dat
print(dim(dat))
fun_display_table(head(dat, 3))

[1] 352944     12


Chrom,ChromStart,ChromEnd,Group,Label,Assay,Region,Target,Score,NLog10P,Method,Source
chr1,9976,10475,ASTARR,ASTARR_R:csaw:KS91,ATAC-STARR,chr1:9976-10475,,-4.184,7.358,CSAW,Reddy Lab
chr1,14226,14675,ASTARR,ASTARR_R:csaw:KS91,ATAC-STARR,chr1:14226-14675,,-6.187,9.192,CSAW,Reddy Lab
chr1,15976,16525,ASTARR,ASTARR_R:csaw:KS91,ATAC-STARR,chr1:15976-16525,,-2.236,5.062,CSAW,Reddy Lab
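The helper `fun_gen_region()` comes from the project configuration and its definition is not shown in this notebook. Judging from the `Region` values in the output above (`chr1:9976-10475`), it appears to build a `chrom:start-end` label. The sketch below is a hypothetical stand-in named `gen_region_sketch`, not the project's implementation; it formats coordinates without scientific notation, since R's default number-to-text conversion can turn values such as 100000 into exponent form.

```r
# Hypothetical stand-in for fun_gen_region(); the project's own definition is not shown.
gen_region_sketch <- function(chrom, start, end) {
    # Format coordinates as plain digits, without padding or exponent notation.
    txt_start <- format(start, scientific = FALSE, trim = TRUE)
    txt_end   <- format(end,   scientific = FALSE, trim = TRUE)
    paste0(chrom, ":", txt_start, "-", txt_end)
}

# Example with the first region from the table and a round-number coordinate.
print(gen_region_sketch(c("chr1", "chr2"), c(9976, 100000), c(10475, 200000)))
```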


## Export results

In [18]:
### set file path
txt_folder = TXT_FOLDER_OUT
txt_fdiry  = file.path(FD_RES, "region", txt_folder)
txt_fname  = "K562.hg38.fcc_astarr_csaw.bed.gz"
txt_fpath  = file.path(txt_fdiry, txt_fname)

### set table
dat = dat_region_arrange
dat = dat %>% dplyr::arrange(Chrom, ChromStart, ChromEnd)

### write table
write_tsv(dat, txt_fpath, col_names = FALSE)
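The export writes a gzip-compressed, tab-separated file without a header, so a reader of that file must supply the column names again. The following sketch reproduces the export on toy data in base R and reads the result back. It is an illustration under stated assumptions, not the notebook's own code: the table values, the temporary output folder, and the names `dat_toy`, `out_dir`, `out_path`, and `dat_back` are hypothetical.

```r
# Toy table standing in for dat_region_arrange (hypothetical values).
dat_toy <- data.frame(
    Chrom      = c("chr2", "chr10", "chr1"),
    ChromStart = c(500L, 100L, 9976L),
    ChromEnd   = c(900L, 400L, 10475L),
    stringsAsFactors = FALSE
)

# Create the output folder if it is missing (a temporary folder here).
out_dir  <- file.path(tempdir(), "fcc_table")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_path <- file.path(out_dir, "toy.bed.gz")

# Sort as the notebook does: Chrom is compared as text, then by coordinates.
dat_toy <- dat_toy[order(dat_toy$Chrom, dat_toy$ChromStart, dat_toy$ChromEnd), ]

# Write a gzip-compressed, headerless, tab-separated file.
con <- gzfile(out_path, "w")
write.table(dat_toy, con, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
close(con)

# Read it back, supplying the column names that the file itself does not carry.
dat_back <- read.delim(
    gzfile(out_path), header = FALSE,
    col.names = names(dat_toy), stringsAsFactors = FALSE
)
print(dat_back$Chrom)
```

Because `Chrom` is sorted as text, `chr10` is placed before `chr2`. Tools that expect chromosomes in numeric order may need a different sort key.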