**Set environment**

In [1]:
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()

You are working on        Singularity 
BASE DIRECTORY (FD_BASE): /mount 
REPO DIRECTORY (FD_REPO): /mount/repo 
WORK DIRECTORY (FD_WORK): /mount/work 
DATA DIRECTORY (FD_DATA): /mount/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /mount/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /mount/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /mount/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /mount/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /mount/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /mount/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /mount/repo/Proj_ENCODE_FCC/log 
PROJECT APP     (FD_APP): /mount/repo/Proj_ENCODE_FCC/app 
PROJECT REF     (FD_REF): /mount/repo/Proj_ENCODE_FCC/references 



**Set global variable**

In [2]:
TXT_FOLDER = "encode_rnaseq"

## Import metadata from reference file

In [3]:
txt_folder = TXT_FOLDER
txt_fdiry  = file.path(FD_REF, txt_folder)
dir(txt_fdiry)

In [4]:
### set file path
txt_folder = TXT_FOLDER
txt_fdiry  = file.path(FD_REF, txt_folder)
txt_fname = "ENCODE_K562_hg38_RNAseq.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### show and assign
dat_metadata = dat
fun_display_table(dat)

| Assay | Biosample | Index_Experiment | Index_Process | Index_File | File_Type | Output_Type | Genome | Lab |
|---|---|---|---|---|---|---|---|---|
| RNA-seq (total RNA-seq) | K562 | ENCSR615EEK | ENCODE4 v1.2.1 GRCh38 V29 (ENCAN412LMP) processed data | ENCFF421TJX | tsv | gene quantifications | hg38 | Barbara Wold, Caltech |
| RNA-seq (total RNA-seq) | K562 | ENCSR615EEK | ENCODE4 v1.2.1 GRCh38 V29 (ENCAN412LMP) processed data | ENCFF585HTZ | bigWig | plus strand signal of unique reads | hg38 | Barbara Wold, Caltech |
| RNA-seq (total RNA-seq) | K562 | ENCSR615EEK | ENCODE4 v1.2.1 GRCh38 V29 (ENCAN412LMP) processed data | ENCFF876JOV | bigWig | minus strand signal of unique reads | hg38 | Barbara Wold, Caltech |


## Generate download commands
Each file is fetched with a `wget` command of the following form, where `FILE` is the local output name and `URL` is the ENCODE download link:
```
wget -O FILE URL
```

In [5]:
### map the ENCODE file type to the local file extension (bed -> bed.gz, bigWig -> bw)
fun_get_file_ext1 = function(txt) {
    vec1 = c("bed",    "bigWig")
    vec2 = c("bed.gz", "bw")
    res  = fun_str_map_detect(txt, vec1, vec2, .default=txt)
    return(res)
}

### map the ENCODE file type to the extension used in the download URL (bed -> bed.gz)
fun_get_file_ext2 = function(txt) {
    vec1 = c("bed")
    vec2 = c("bed.gz")
    res  = fun_str_map_detect(txt, vec1, vec2, .default=txt)
    return(res)
}

### map the ENCODE output type to a short label used in the file name
fun_get_file_label = function(txt){
    vec1 = c("quantifications", "plus strand", "minus strand")
    vec2 = c("total", "total.strand_pos", "total.strand_neg")
    res  = fun_str_map_detect(txt, vec1, vec2, .default=txt)
    return(res)
}

dat  = dat_metadata
vec1 = dat$File_Type
vec2 = dat$Output_Type
dat  = data.frame(
    File_Type    = vec1,
    File_Ext1    = fun_get_file_ext1(vec1),
    File_Ext2    = fun_get_file_ext2(vec1),
    Output_Type  = vec2,
    Output_Label = fun_get_file_label(vec2)
)
dat

| File_Type | File_Ext1 | File_Ext2 | Output_Type | Output_Label |
|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| tsv | tsv | tsv | gene quantifications | total |
| bigWig | bw | bigWig | plus strand signal of unique reads | total.strand_pos |
| bigWig | bw | bigWig | minus strand signal of unique reads | total.strand_neg |
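
The helper `fun_str_map_detect` is loaded by the project configuration and is not defined in this notebook. Judging only from the table above, it appears to return the replacement paired with the first pattern found in each string and to fall back to `.default` otherwise. A minimal base-R sketch of that behavior, under the hypothetical name `str_map_detect_sketch` (an assumption for illustration, not the project's implementation), might look like this:

```r
# Hypothetical stand-in for fun_str_map_detect (assumed behavior, base R only).
# For each element of txt, return the replacement paired with the first
# pattern it contains; elements with no match keep the .default value.
str_map_detect_sketch = function(txt, patterns, replacements, .default = txt) {
    res     = rep_len(.default, length(txt))
    matched = rep(FALSE, length(txt))
    for (i in seq_along(patterns)) {
        # fixed = TRUE treats the pattern as a literal substring, not a regex
        hit      = !matched & grepl(patterns[i], txt, fixed = TRUE)
        res[hit] = replacements[i]
        matched  = matched | hit
    }
    res
}

# Usage with the file types seen in the metadata table
ext = str_map_detect_sketch(
    c("tsv", "bigWig", "bed"),
    c("bed", "bigWig"),
    c("bed.gz", "bw")
)
print(ext)
```

Because unmatched inputs fall through unchanged, an unexpected `Output_Type` would be copied into the file name verbatim, spaces included.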


In [6]:
### init
dat = dat_metadata

### setup download file name and wget command
dat = dat %>%
    dplyr::mutate(
        File_Ext1  = fun_get_file_ext1(File_Type),
        File_Ext2  = fun_get_file_ext2(File_Type),
        File_Assay = "RNAseq",
        File_Label = fun_get_file_label(Output_Type)
    ) %>%
    dplyr::mutate(
        File_Name = paste(
            Biosample, 
            Genome, 
            Index_Experiment, 
            Index_File,
            File_Assay,
            File_Label,
            File_Ext1, 
            sep=".")
    ) %>%
    dplyr::mutate(
        File_URL_Download = file.path(
            "https://www.encodeproject.org/files",
            Index_File,
            "@@download",
            paste(Index_File, File_Ext2, sep = ".")
        )
    ) %>%
    dplyr::mutate(
        CMD = paste("wget", "--append-output=run_download.log.txt", "-O", File_Name, File_URL_Download)
    )

### add shebang and initial commands (the column name becomes the script's first line)
#dat = dat %>% dplyr::select(Assay, Biosample, Index_Experiment, Index_File, File_Name, CMD)
dat = dat %>% dplyr::select(CMD)
dat = rbind('echo -n "" > run_download.log.txt', dat)
colnames(dat) = "#!/bin/bash"

### assign and show
dat_cmd = dat
fun_display_table(dat)

#!/bin/bash
echo -n "" > run_download.log.txt
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq.total.tsv https://www.encodeproject.org/files/ENCFF421TJX/@@download/ENCFF421TJX.tsv
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR615EEK.ENCFF585HTZ.RNAseq.total.strand_pos.bw https://www.encodeproject.org/files/ENCFF585HTZ/@@download/ENCFF585HTZ.bigWig
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR615EEK.ENCFF876JOV.RNAseq.total.strand_neg.bw https://www.encodeproject.org/files/ENCFF876JOV/@@download/ENCFF876JOV.bigWig
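
The commands above are safe to run unquoted because none of the generated file names contain spaces. Since `fun_get_file_label` falls back to the raw `Output_Type` (for example "gene quantifications") when no pattern matches, a new output type could introduce spaces into `File_Name`. One possible safeguard, sketched here with a hypothetical helper `make_wget_cmd_sketch` that is not part of the project code, is to shell-quote each argument with base R's `shQuote`:

```r
# Hypothetical helper: build one wget command with shell-quoted arguments.
# The log file name default mirrors the one used in the notebook.
make_wget_cmd_sketch = function(file_name, url, log_file = "run_download.log.txt") {
    paste(
        "wget",
        paste0("--append-output=", shQuote(log_file, type = "sh")),
        "-O", shQuote(file_name, type = "sh"),
        shQuote(url, type = "sh")
    )
}

cmd = make_wget_cmd_sketch(
    "K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq.total.tsv",
    "https://www.encodeproject.org/files/ENCFF421TJX/@@download/ENCFF421TJX.tsv"
)
cat(cmd, "\n")
```

Passing `type = "sh"` explicitly keeps the quoting style the same regardless of the platform the notebook runs on.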


## Save to script
Write each row's command on its own line of a bash script.

In [7]:
### set output path
txt_folder = TXT_FOLDER
txt_fdiry  = file.path(FD_DAT, "external", txt_folder)
txt_fname  = "run_download.sh"
txt_fpath  = file.path(txt_fdiry, txt_fname)

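### save table; the column name "#!/bin/bash" is written as the header line
### recursive = TRUE also creates the parent "external" folder if it is missing
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)
dat = dat_cmd
write_tsv(dat, txt_fpath)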
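
Writing the script through a table writer works because the single column is named `#!/bin/bash`, but it leaves the result dependent on how that writer quotes fields. As an alternative sketch, the same script could be written line by line with base R. The helper name `write_bash_script_sketch` and the temporary output path below are assumptions for illustration only:

```r
# Hypothetical helper: write commands to an executable bash script.
write_bash_script_sketch = function(cmds, path) {
    # create the output folder, including any missing parent folders
    dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
    # first line is the shebang, followed by one command per line
    writeLines(c("#!/bin/bash", cmds), con = path)
    # mark the script as executable (has no effect on Windows)
    Sys.chmod(path, mode = "0755")
    invisible(path)
}

cmds = c(
    'echo -n "" > run_download.log.txt',
    paste(
        "wget --append-output=run_download.log.txt",
        "-O K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq.total.tsv",
        "https://www.encodeproject.org/files/ENCFF421TJX/@@download/ENCFF421TJX.tsv"
    )
)

# write to a temporary folder so the example does not touch project data
out_path = file.path(tempdir(), "encode_rnaseq", "run_download.sh")
write_bash_script_sketch(cmds, out_path)
script_lines = readLines(out_path)
print(script_lines)
```

Here `writeLines` emits each string exactly as given, so the `echo -n ""` line reaches the script without any added quoting.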