## Set environment

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



## Import metadata from reference file
Read the table with the file accession numbers of the K562 cCRE and ChromHMM annotations.

In [2]:
### set file path
txt_fdiry = file.path(FD_REF, "encode_chromatin_states")
txt_fname = "ENCODE_K562_hg38_chromatin_states.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### show and assign
dat_metadata = dat
fun_display_table(dat)

| Assay | Biosample | Index_Experiment | Index_Process | Index_File | File_Type | Output_Type | Genome | Encyclopedia version | Lab | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| cCREs | K562 | ENCSR913HQX | Lab custom GRCh38 (ENCAN130HDM) processed data | ENCFF286VQG | bed bed9+ | candidate Cis-Regulatory Elements | hg38 | ENCODE v4 | Zhiping Weng, Umass | candidate regulatory elements for GRCh38 in K562 |
| ChromHMM | K562 | ENCSR365YNI | Lab custom GRCh38 (ENCAN395TNA) processed data | ENCFF106BGJ | bed bed9 | semi-automated genome annotation | hg38 | ENCODE v4 | Zhiping Weng, Umass | ChromHMM 15-state model of K562 |
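The later cells rely on the columns `Assay`, `Biosample`, `Genome`, `Index_Experiment`, and `Index_File`. A minimal sketch of a header check that could be run on the TSV before building commands is shown below; the helper name `require_columns` is hypothetical and not part of the notebook, and the demo header is a made-up stand-in for the real file.

```shell
#!/bin/bash
# require_columns FILE COL...
# Succeeds only if every COL appears in the tab-separated header line of FILE.
# (Hypothetical helper for illustration; not part of the original notebook.)
require_columns() {
    local file="$1" col header
    shift
    # Put each header field on its own line so fields with spaces stay intact
    header=$(head -n 1 "$file" | tr '\t' '\n')
    for col in "$@"; do
        if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
            echo "missing column: $col" >&2
            return 1
        fi
    done
}

# Demo on a temporary file with an assumed header
demo=$(mktemp)
printf 'Assay\tBiosample\tIndex_Experiment\tIndex_File\tGenome\n' > "$demo"
require_columns "$demo" Assay Biosample Genome Index_Experiment Index_File && echo "header OK"
rm -f "$demo"
```

On the real table, the call would take the path to `ENCODE_K562_hg38_chromatin_states.tsv` as its first argument.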


## Generate download commands
Each file is downloaded with a `wget` command of the form:
```
wget -O FILE URL
```

In [3]:
### init
dat = dat_metadata

### setup download file name and wget command
dat = dat %>%
    dplyr::mutate(
        File_Name = paste(
            Biosample, 
            Genome, 
            Index_Experiment, 
            Index_File,
            Assay,
            "bed.gz", 
            sep=".")
    ) %>%
    dplyr::mutate(
        File_URL_Download = file.path(
            "https://www.encodeproject.org/files",
            Index_File,
            "@@download",
            paste(Index_File, "bed.gz", sep = ".")
        )
    ) %>%
    dplyr::mutate(
        CMD = paste("wget", "--append-output=run_download.log.txt", "-O", File_Name, File_URL_Download)
    )

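### add shebang and initial command
### (the first row resets the log file; the column name "#!/bin/bash"
###  becomes the first line of the script when the table is written)
#dat = dat %>% dplyr::select(Assay, Biosample, Index_Experiment, Index_File, File_Name, CMD)
dat = dat %>% dplyr::select(CMD)
dat = rbind('echo -n "" > run_download.log.txt', dat)
colnames(dat) = "#!/bin/bash"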

### assign and show
dat_cmd = dat
fun_display_table(dat)

#!/bin/bash
echo -n "" > run_download.log.txt
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR913HQX.ENCFF286VQG.cCREs.bed.gz https://www.encodeproject.org/files/ENCFF286VQG/@@download/ENCFF286VQG.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR365YNI.ENCFF106BGJ.ChromHMM.bed.gz https://www.encodeproject.org/files/ENCFF106BGJ/@@download/ENCFF106BGJ.bed.gz
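The file names and URLs in these commands follow a fixed pattern built from the metadata columns. As an illustration only, the same pattern can be reproduced in plain shell. The helper names `make_file_name` and `make_download_url` are hypothetical and do not appear in the notebook.

```shell
#!/bin/bash
# Hypothetical helpers that mirror the naming scheme used in the R cell.

# Output name: Biosample.Genome.Index_Experiment.Index_File.Assay.bed.gz
make_file_name() {
    local biosample="$1" genome="$2" experiment="$3" file="$4" assay="$5"
    printf '%s.%s.%s.%s.%s.bed.gz\n' "$biosample" "$genome" "$experiment" "$file" "$assay"
}

# Download URL: .../files/<Index_File>/@@download/<Index_File>.bed.gz
make_download_url() {
    local file="$1"
    printf 'https://www.encodeproject.org/files/%s/@@download/%s.bed.gz\n' "$file" "$file"
}

# Values taken from the cCREs row of the metadata table
FILE=$(make_file_name K562 hg38 ENCSR913HQX ENCFF286VQG cCREs)
URL=$(make_download_url ENCFF286VQG)

# Only print the command; nothing is downloaded here
echo "wget --append-output=run_download.log.txt -O ${FILE} ${URL}"
```

The printed line matches the cCREs command generated by the notebook.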


## Save to script
Write the command from each row to a bash script, one command per line.

In [4]:
### set output path
txt_fdiry = file.path(FD_DAT, "external", "encode_chromatin_states")
txt_fname = "run_download.sh"
txt_fpath = file.path(txt_fdiry, txt_fname)

### arrange table
dat = dat_cmd
colnames(dat) = "#!/bin/bash"

### save table (create parent directories if they do not exist yet)
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)
write_tsv(dat, txt_fpath)
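
The generated commands write the log file and the downloaded files to relative paths, so the script is presumably meant to be run from the directory it was saved in. A rough sketch of running it and then checking the downloads follows. The helper `check_downloads` is hypothetical, and the directory in the usage comment is inferred from `FD_DAT` in the environment listing.

```shell
#!/bin/bash
# check_downloads DIR
# Tests the integrity of every *.bed.gz file in DIR with `gzip -t`.
# Returns non-zero if no files are found or any file fails the test.
# (Hypothetical helper for illustration; not part of the original notebook.)
check_downloads() {
    local dir="$1" status=0 f
    for f in "$dir"/*.bed.gz; do
        # If the glob matched nothing, the literal pattern is left in $f
        [ -e "$f" ] || { echo "no .bed.gz files in $dir" >&2; return 1; }
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    return "$status"
}

# Usage (directory assumed from FD_DAT in the environment listing):
#   cd /mount/repo/Proj_ENCODE_FCC/data/external/encode_chromatin_states
#   bash run_download.sh
#   check_downloads .
```

A truncated or failed download typically shows up as `BROKEN`; the `wget` messages for that file can then be looked up in `run_download.log.txt`.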