**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



## Import metadata from reference file

In [3]:
txt_fdiry = file.path(FD_REF, "encode_chipseq_flagship")
dir(txt_fdiry)

In [4]:
### set file path
txt_fdiry = file.path(FD_REF, "encode_chipseq_flagship")
txt_fname = "ENCODE4 TF Accessions for Flagship_updated221025.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### show and assign
dat_metadata_import = dat
print(dim(dat))
fun_display_table(head(dat))

[1] 3092    6


Experiment Accession,Peak Accession,Biosample,Target,Lab,RFA
ENCSR753RME,ENCFF917CYG,Homo sapiens testis tissue male adult (37 years),CTCF,"Bradley Bernstein, Broad",ENCODE4
ENCSR992XTY,ENCFF112GJQ,Homo sapiens WTC11,CTCF,"Bradley Bernstein, Broad",ENCODE4
ENCSR934GQS,ENCFF483KVM,Homo sapiens ovary tissue female adult (41 years),CTCF,"Bradley Bernstein, Broad",ENCODE4
ENCSR164ASX,ENCFF238PEB,Homo sapiens upper lobe of right lung tissue male adult (60 years),CTCF,"Bradley Bernstein, Broad",ENCODE4
ENCSR000VGD,ENCFF870ZRR,Homo sapiens right lobe of liver tissue female adult (47 years),CTCF,"Bradley Bernstein, Broad",ENCODE4
ENCSR856TKC,ENCFF072KJO,Homo sapiens natural killer cell male adult (33 years),CTCF,"Bradley Bernstein, Broad",ENCODE4
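
Before filtering, it can help to confirm that the imported table has the columns the later cells rely on. The following is a minimal base R sketch, not part of the project code: `check_columns` is a hypothetical helper, and `dat_example` is a two-row excerpt of the table above standing in for `dat_metadata_import`.

```r
### hypothetical helper: stop if any required column is missing
check_columns = function(dat, required) {
    vec_missing = setdiff(required, colnames(dat))
    if (length(vec_missing) > 0) {
        stop("Missing columns: ", paste(vec_missing, collapse = ", "))
    }
    invisible(TRUE)
}

### excerpt of the imported table (first two rows shown above)
dat_example = data.frame(
    `Experiment Accession` = c("ENCSR753RME", "ENCSR992XTY"),
    `Peak Accession`       = c("ENCFF917CYG", "ENCFF112GJQ"),
    Biosample              = c(
        "Homo sapiens testis tissue male adult (37 years)",
        "Homo sapiens WTC11"
    ),
    Target                 = c("CTCF", "CTCF"),
    check.names = FALSE  # keep the column names with spaces as they are
)

### columns used by the cells below
vec_required = c("Experiment Accession", "Peak Accession", "Biosample", "Target")
check_columns(dat_example, vec_required)
```

The call returns silently when all columns are present and stops with a message naming the missing ones otherwise.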


## Arrange data

**Keep K562 samples only**

In [6]:
dat = dat_metadata_import
dat = dat %>% dplyr::filter(str_detect(Biosample, "K562"))

dat_metadata = dat
print(dim(dat))
head(dat)

[1] 735   6


Experiment Accession,Peak Accession,Biosample,Target,Lab,RFA
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
ENCSR805SIA,ENCFF500BWO,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens PURB,PURB,"Richard Myers, HAIB",ENCODE4
ENCSR125RFR,ENCFF863ZFH,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens ATF6,ATF6,"Richard Myers, HAIB",ENCODE4
ENCSR841GLE,ENCFF515LWL,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens ZNF217,ZNF217,"Richard Myers, HAIB",ENCODE4
ENCSR014ARU,ENCFF121HYT,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens ATF2,ATF2,"Richard Myers, HAIB",ENCODE4
ENCSR172XJS,ENCFF169QYL,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens ZNF165,ZNF165,"Michael Snyder, Stanford",ENCODE4
ENCSR676KDJ,ENCFF528IDR,Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens ARID4B,ARID4B,"Michael Snyder, Stanford",ENCODE4
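
The filter matches `K562` anywhere in the `Biosample` string, so every biosample description containing `K562` is kept, including the CRISPR-modified lines shown above. The sketch below illustrates the same matching rule in base R, using `grepl` as a stand-in for `str_detect`; the biosample strings are copied from the tables above.

```r
### biosample descriptions copied from the tables above
vec_biosample = c(
    "Homo sapiens K562 genetically modified (insertion) using CRISPR targeting H. sapiens PURB",
    "Homo sapiens WTC11",
    "Homo sapiens testis tissue male adult (37 years)"
)

### TRUE where the description contains the literal substring "K562"
vec_keep = grepl("K562", vec_biosample, fixed = TRUE)
print(vec_keep)
```

Only the first description is retained; the WTC11 and testis tissue rows are dropped.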


**Show target names**

In [7]:
dat = dat_metadata
vec = unique(dat$Target)
print(head(vec))
print(tail(vec))

[1] "PURB"   "ATF6"   "ZNF217" "ATF2"   "ZNF165" "ARID4B"
[1] "HDAC6"           "CHD7"            "POLR2AphosphoS5" "SAP30"          
[5] "KDM5B"           "CBX8"           
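
The head and tail give only a glimpse of the targets. If a target appears in more than one experiment, a tabulation makes that visible. This is a sketch on a toy vector; the values are illustrative and do not reflect the counts in the full table.

```r
### illustrative values only, not taken from the full table
vec_target = c("PURB", "ATF6", "PURB", "ZNF217")

### number of rows per target, most frequent first
tab = sort(table(vec_target), decreasing = TRUE)
print(tab)

### targets that occur more than once
vec_dup = names(tab)[tab > 1]
print(vec_dup)
```

In the notebook, the same calls could be applied to `dat_metadata$Target`.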


## Generate download commands
Each row of the table is turned into one `wget` command of the form:
```
wget -O FILE URL
```

In [8]:
### init
dat = dat_metadata

### set up download file name and wget command
dat = dat %>%
    dplyr::mutate(
        File_Name = paste(
            "K562", 
            "hg38", 
            `Experiment Accession`, 
            `Peak Accession`,
            "ChIPseq",
            Target,
            "bed.gz", 
            sep=".")
    ) %>%
    dplyr::mutate(
        File_URL_Download = file.path(
            "https://www.encodeproject.org/files",
            `Peak Accession`,
            "@@download",
            paste(`Peak Accession`, "bed.gz", sep = ".")
        )
    ) %>%
    dplyr::mutate(
        CMD = paste("wget", "--append-output=run_download.log.txt", "-O", File_Name, File_URL_Download)
    )

### prepend a command that empties the log file, and use the
### column name as the shebang line (it is written as the header on save)
dat = dat %>% dplyr::select(CMD)
dat = rbind('echo -n "" > run_download.log.txt', dat)
colnames(dat) = "#!/bin/bash"

### assign and show
dat_cmd = dat
print(dim(dat))
fun_display_table(head(dat))

[1] 736   1


#!/bin/bash
"echo -n """" > run_download.log.txt"
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR805SIA.ENCFF500BWO.ChIPseq.PURB.bed.gz https://www.encodeproject.org/files/ENCFF500BWO/@@download/ENCFF500BWO.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR125RFR.ENCFF863ZFH.ChIPseq.ATF6.bed.gz https://www.encodeproject.org/files/ENCFF863ZFH/@@download/ENCFF863ZFH.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR841GLE.ENCFF515LWL.ChIPseq.ZNF217.bed.gz https://www.encodeproject.org/files/ENCFF515LWL/@@download/ENCFF515LWL.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR014ARU.ENCFF121HYT.ChIPseq.ATF2.bed.gz https://www.encodeproject.org/files/ENCFF121HYT/@@download/ENCFF121HYT.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR172XJS.ENCFF169QYL.ChIPseq.ZNF165.bed.gz https://www.encodeproject.org/files/ENCFF169QYL/@@download/ENCFF169QYL.bed.gz
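
The commands are pasted together unquoted, which works here because the accessions and the targets shown contain no spaces or shell metacharacters. If a target name ever did, quoting the arguments would be safer. The following is a minimal sketch of that idea, assuming a hypothetical helper `make_wget_cmd` (not in the notebook) and base R's `shQuote`.

```r
### hypothetical helper: build one wget command with shell-quoted arguments
make_wget_cmd = function(exp_acc, peak_acc, target, log = "run_download.log.txt") {
    ### output file name, same pattern as in the cell above
    fname = paste("K562", "hg38", exp_acc, peak_acc, "ChIPseq", target, "bed.gz", sep = ".")
    ### ENCODE download URL for the peak file
    furl = file.path(
        "https://www.encodeproject.org/files",
        peak_acc,
        "@@download",
        paste(peak_acc, "bed.gz", sep = ".")
    )
    ### type = "sh" forces Bourne-shell single quoting on every platform
    paste(
        "wget", paste0("--append-output=", log),
        "-O", shQuote(fname, type = "sh"),
        shQuote(furl, type = "sh")
    )
}

txt_cmd = make_wget_cmd("ENCSR805SIA", "ENCFF500BWO", "PURB")
cat(txt_cmd, "\n")
```

The result is the first `wget` line above with the file name and URL wrapped in single quotes.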


## Save to script
Write the commands, one per row, to a bash script. The column name `#!/bin/bash` is written as the header and so becomes the first line of the script.

In [9]:
### set output path
txt_fdiry = file.path(FD_DAT, "external", "encode_chipseq_flagship")
txt_fname = "run_download.sh"
txt_fpath = file.path(txt_fdiry, txt_fname)

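### arrange table (the column name doubles as the shebang line)
dat = dat_cmd
colnames(dat) = "#!/bin/bash"

### save table; recursive = TRUE also creates missing parent directories
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)
write_tsv(dat, txt_fpath)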
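
Writing the script through `write_tsv` relies on the column name serving as the shebang line and on the writer leaving the `echo` row, which contains double quotes, unquoted; this may depend on the readr version. A more direct alternative, sketched here with base R only, writes the lines verbatim with `writeLines` and marks the file executable. The temporary path and the two-command vector are placeholders for `txt_fpath` and the contents of `dat_cmd`.

```r
### placeholder commands: the log reset plus the first wget line shown above
vec_cmd = c(
    'echo -n "" > run_download.log.txt',
    paste(
        "wget --append-output=run_download.log.txt",
        "-O K562.hg38.ENCSR805SIA.ENCFF500BWO.ChIPseq.PURB.bed.gz",
        "https://www.encodeproject.org/files/ENCFF500BWO/@@download/ENCFF500BWO.bed.gz"
    )
)

### placeholder output path (a temporary directory instead of txt_fpath)
txt_fpath_tmp = file.path(tempdir(), "run_download.sh")

### write the shebang followed by the commands, one per line, without quoting
writeLines(c("#!/bin/bash", vec_cmd), txt_fpath_tmp)

### make the script executable (has no effect on Windows file systems)
Sys.chmod(txt_fpath_tmp, mode = "0755")

### read back to confirm the first lines
vec_lines = readLines(txt_fpath_tmp)
print(vec_lines[1:2])
```

In the notebook, `vec_cmd` would correspond to the single column of `dat_cmd`, for example `dat_cmd[[1]]`.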