# ESGF replication request form

![form-submission](../fig/form-submission.png)

Use this form to request that data from other ESGF nodes be replicated and made
available locally in the DKRZ CMIP data pool.

A requested data collection is specified by the search facets that describe it. These facets correspond directly to the search categories you use to find data in one of the ESGF portals (e.g. https://esgf-data.dkrz.de/).
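
In other words, a selection file holds one `facet="values"` line per facet, with several values for the same facet separated by blanks. A minimal sketch of this mapping, assuming the facet choices from a portal search are kept in a Python dict; `to_selection_text` is a hypothetical helper, not part of synda or `dkrz_forms`:

```python
# Hypothetical helper: turn facet choices (as noted from an ESGF portal
# search) into the lines of a synda-style selection file.
def to_selection_text(facets):
    """Render {facet: [values]} as facet="v1 v2" lines, one facet per line."""
    lines = []
    for facet, values in facets.items():
        # several values for one facet are separated by blanks
        lines.append(f'{facet}="{" ".join(values)}"')
    return "\n".join(lines) + "\n"


# Facet values taken from the example selection file used in this form
facets = {
    "project": ["CMIP5"],
    "model": ["CNRM-CM5"],
    "experiment": ["historical", "amip"],
    "ensemble": ["r1i1p1"],
    "variable[atmos][mon]": ["tas"],
}
print(to_selection_text(facets))
```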

## Specification of ESGF data to be replicated
To automate the data replication process as far as possible, we recommend the following steps, which this form supports. If you have problems with this approach, please contact us directly by mail (esgf-replication 'at' dkrz.de).

- **Step 1:** define your data request based on the search facets you need to characterize the data collection in one of the ESGF portals.
- **Step 2:** write down your facet selection in the format supported by the [synda replication tool](http://prodiguer.github.io/synda/):
   - The specification is based on so-called [selection files](https://github.com/Prodiguer/synda/blob/master/sdt/doc/selection_file.md); see these [examples](https://github.com/Prodiguer/synda/tree/master/sdt/selection/sample)
   - put your selection files into this form (see the section "Step 2: Edit and store your replica selection file(s)" below)
- **Step 3:** Test your selection file(s) for correctness (see the section "Step 3: Check your selection file(s)" below)
- **Step 4:** Test your selection file(s) for the data volume they address (see the section "Step 4: Check the data volume associated with your requests" below)
- **Step 5:** Submit your replication request

**General remarks:**

- If you have recurring needs for data to be made available at DKRZ, we recommend installing the synda application at your lab; this way you can perform steps 1 to 5 there.
- We recommend splitting your request into a set of small, well-defined selection files instead of writing one complex file that describes all your data needs.
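
A sketch of such a split, assuming one file per model/experiment combination is a sensible granularity for your request (any other facet could serve instead). `split_request` and its file-naming scheme are hypothetical:

```python
from itertools import product


def split_request(project, models, experiments, extra):
    """Return {file_name: selection_text}, one entry per model/experiment pair.

    `extra` holds the facet lines shared by all files,
    e.g. {"ensemble": "r1i1p1", "variable[atmos][mon]": "tas psl"}.
    """
    files = {}
    for model, experiment in product(models, experiments):
        facets = {"project": project, "model": model, "experiment": experiment}
        facets.update(extra)  # shared facets follow the varying ones
        text = "\n".join(f'{key}="{value}"' for key, value in facets.items())
        name = f"{project.lower()}_{model}_{experiment}.txt"
        files[name] = text + "\n"
    return files


requests = split_request(
    "CMIP5",
    ["CNRM-CM5", "CSIRO-Mk3-6-0"],
    ["historical", "amip"],
    {"ensemble": "r1i1p1", "variable[atmos][mon]": "tasmin tas psl"},
)
for name in requests:
    print(name)
```

Each resulting file stays small enough to be checked on its own in steps 3 and 4.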


### Please provide your last name and the password for this form

In [1]:
MY_LAST_NAME = "kindermann"   # e.g. MY_LAST_NAME = "schulz"
#-------------------------------------------------
from dkrz_forms import form_handler, form_widgets
form_info = form_widgets.check_pwd(MY_LAST_NAME)
sf = form_handler.init_form(form_info)
form = sf.sub.entity_out.report

## Step 2: Edit and store your replica selection file(s)

Please provide the facet values characterizing your data request. You can find the appropriate settings
- by using an ESGF portal and noting down your search facets, or
- by experimenting with the cells below until your request is fully specified, or
- by installing the synda tool at your lab and using it directly.

An [example selection file](http://prodiguer.github.io/synda/sdt/selection_file.html) looks like this:

     project="CMIP5"
     model="CNRM-CM5 CSIRO-Mk3-6-0"
     experiment="historical amip"
     ensemble="r1i1p1"
     variable[atmos][mon]="tasmin tas psl"
     variable[ocean][fx]="areacello sftof"
     variable[land][mon]="mrsos,nppRoot,nep"
     variable[seaIce][mon]="sic evap"
     variable[ocnBgchem][mon]="dissic fbddtalk"
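
The format is simple enough to read back programmatically, for example to review a file before submitting it. A minimal sketch of a parser for lines like those above; it assumes values may be separated by blanks or commas (both appear in the example) and treats lines starting with `#` as comments, which is an assumption rather than documented synda behaviour. `parse_selection` is hypothetical, not a synda function:

```python
import re

# facet="values", where the facet may carry [realm][frequency] qualifiers
LINE_RE = re.compile(r'^\s*([A-Za-z_]\w*)((?:\[[^\]]+\])*)\s*=\s*"([^"]*)"\s*$')


def parse_selection(text):
    """Parse selection-file text into a list of (facet, qualifiers, values)."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        facet, quals, values = match.groups()
        qualifiers = re.findall(r"\[([^\]]+)\]", quals)
        # values are split on blanks or commas
        entries.append((facet, qualifiers, re.split(r"[\s,]+", values.strip())))
    return entries


example = '''project="CMIP5"
model="CNRM-CM5 CSIRO-Mk3-6-0"
variable[land][mon]="mrsos,nppRoot,nep"
'''
for entry in parse_selection(example):
    print(entry)
```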

You can store a selection file from one of the cells below by adding `%%writefile selection/myfilename.txt` as its first line. Choose "myfilename" carefully so that you can later remember which data set the file characterizes, e.g. `%%writefile selection/erich_cmip5_atmos_vars_for_exp1.txt`.
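
If you prefer plain Python over the `%%writefile` cell magic, a selection file can also be written with `pathlib`. A minimal sketch; `store_selection` is a hypothetical helper, and the folder name `selection` follows the convention used in this form:

```python
from pathlib import Path


def store_selection(name, text, folder="selection"):
    """Write `text` to <folder>/<name>.txt and return the resulting path."""
    directory = Path(folder)
    # create the folder first in case it does not exist yet
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.txt"
    path.write_text(text)
    return path


path = store_selection(
    "erich_cmip5_atmos_vars_for_exp1",
    'project="CMIP5"\nmodel="CNRM-CM5"\nvariable[atmos][mon]="tas"\n',
)
print(path)
```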


### Store your selection files in the cells below

- fill in the following cell and evaluate it with Shift-Enter to store your selection file under the specified name.
- add new cells for additional selection files using "Insert" --> "Insert Cell Below" in the top navigation bar

In [23]:
%%writefile selection/tst.txt
project="CMIP5"
model="CNRM-CM5"
experiment="historical amip"
ensemble="r1i1p1"
variable[atmos][mon]="tas"


In [31]:
# provide the list of selection file names

sel_file_list = ["cmip5_a_c_d","cmip5_e_f_g"]

In [47]:
from IPython.display import display
import ipywidgets as widgets

# one text box per selection file name listed above
layout = widgets.Layout(height='100px', width='500px')
text_widgets = {}
for name in sel_file_list:
    text_widgets[name] = widgets.Textarea(
        value='# put selection file info for selection file: ' + name + '.txt below',
        placeholder='Type something',
        description="selection:",
        disabled=False,
        layout=layout
    )


In [48]:
# show the text boxes so they can be filled in
for name, area in text_widgets.items():
    display(area)
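
The text boxes above are not written to disk by themselves. A minimal sketch for saving their contents, assuming each box should become `selection/<name>.txt`; `save_texts` is hypothetical, and it drops lines starting with `#` so that the pre-filled hint does not end up in the file. The demo call writes to a separate `selection_demo` folder so that it cannot overwrite real selection files:

```python
from pathlib import Path


def save_texts(texts, folder="selection"):
    """Write each {name: text} entry to <folder>/<name>.txt; return the paths.

    Lines starting with '#' (such as the hint pre-filled in each text box)
    are dropped before writing.
    """
    Path(folder).mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in texts.items():
        kept = [line for line in text.splitlines()
                if not line.lstrip().startswith("#")]
        path = Path(folder) / f"{name}.txt"
        path.write_text("\n".join(kept).strip() + "\n")
        written.append(path)
    return written


# In the notebook, collect the current text of each Textarea widget:
# paths = save_texts({name: area.value for name, area in text_widgets.items()})
demo = save_texts({"cmip5_a_c_d": '# hint\nproject="CMIP5"\nmodel="CNRM-CM5"'},
                  folder="selection_demo")
print(demo[0].read_text())
```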

In [None]:
%%writefile selection/.....txt 

...please fill .. 

In [28]:
# show the text currently entered for the first selection file
text_widgets[sel_file_list[0]].value

## Step 3: Check your selection file(s)

You can use the cells below to check the correctness of your selection files. Specify your selection file name in the `synda search -s` command below. The first cell lists the datasets matching your selection file, the second extracts their dataset identifiers from this listing, and the third runs `synda check selection`.

In [10]:


file_list = !synda search -s selection/tst.txt


In [20]:
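# each line holds a status (e.g. "new") followed by the dataset identifier
result_list = []
for entry in file_list:
    if not entry.strip():
        continue  # skip empty lines
    parts = entry.split(' ')
    print(parts)
    result_list.append(parts[-1])  # the identifier is the last field
print(result_list)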

['new', '', 'cmip5.output1.CNRM-CERFACS.CNRM-CM5.historical.mon.atmos.Amon.r1i1p1.v20110901']
['new', '', 'cmip5.output1.CNRM-CERFACS.CNRM-CM5.amip.mon.atmos.Amon.r1i1p1.v20111006']
['cmip5.output1.CNRM-CERFACS.CNRM-CM5.historical.mon.atmos.Amon.r1i1p1.v20110901', 'cmip5.output1.CNRM-CERFACS.CNRM-CM5.amip.mon.atmos.Amon.r1i1p1.v20111006']
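
Each identifier in `result_list` follows the CMIP5 dataset naming scheme (project, product, institute, model, experiment, frequency, realm, CMOR table, ensemble, version, joined by dots). A sketch that splits an identifier back into these facets, so you can compare the matches with what you intended to request; the field names are our own labels, and the sketch assumes no field contains a dot:

```python
# CMIP5 dataset identifier fields, in order (labels chosen for this sketch)
CMIP5_FIELDS = ["project", "product", "institute", "model", "experiment",
                "time_frequency", "realm", "cmor_table", "ensemble", "version"]


def split_dataset_id(dataset_id):
    """Map a CMIP5 dataset identifier onto its facets."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP5_FIELDS):
        raise ValueError(f"unexpected identifier: {dataset_id}")
    return dict(zip(CMIP5_FIELDS, parts))


# In the notebook: facets_list = [split_dataset_id(d) for d in result_list]
for dataset_id in [
    "cmip5.output1.CNRM-CERFACS.CNRM-CM5.historical.mon.atmos.Amon.r1i1p1.v20110901",
    "cmip5.output1.CNRM-CERFACS.CNRM-CM5.amip.mon.atmos.Amon.r1i1p1.v20111006",
]:
    facets = split_dataset_id(dataset_id)
    print(facets["experiment"], facets["version"])  # historical v20110901, then amip v20111006
```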


In [6]:
%%bash 
synda check selection -s selection/test.txt

Checking cordex_cnrm_req27506_20170103.txt..
Checking cordex_dhmz_req27539_20170110.txt..
Checking cordex_hms_req27511_20170105.txt..
Checking cordex_hms_req27511_20170117.txt..
Checking cordex_knmi_req_01_2017.txt..
Checking input4MIPs.txt..
Checking ms_file_select20170905.txt..
Checking test.txt..
Checking test1.txt..
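
synda remains the authority on whether a selection file is valid. As a quick local pre-check, a sketch like the following can flag lines that do not have the `facet="values"` shape; the accepted line shapes are an assumption based on the example in Step 2, not synda's actual grammar, and `suspicious_lines` is hypothetical:

```python
import re
from pathlib import Path

# Accepted shapes (assumption): facet="..." or facet[a][b]="..."
VALID_LINE = re.compile(r'^[A-Za-z_]\w*(?:\[[^\]]+\])*\s*=\s*"[^"]*"$')


def suspicious_lines(path):
    """Return (line_number, line) pairs that do not look like facet="values"."""
    problems = []
    for number, line in enumerate(Path(path).read_text().splitlines(), start=1):
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and not VALID_LINE.match(stripped):
            problems.append((number, stripped))
    return problems


# Demo on a throw-away file with a missing pair of quotes in line 2
demo = Path("selection_demo")
demo.mkdir(exist_ok=True)
(demo / "bad.txt").write_text('project="CMIP5"\nmodel=CNRM-CM5\n')
print(suspicious_lines(demo / "bad.txt"))
```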


## Step 4: Check the data volume associated with your requests

Evaluating the cell below shows the data volume associated with the request(s) specified in
your selection file(s). This is only an indication, meant to help you make sure that your
request has not become too big through underspecification. The part that actually needs to be
replicated, because it is not yet in the DKRZ pool, will be communicated to you after submission.

After the data volume is printed, an error is reported. This is intentional, since the data should not actually be downloaded. An example output looks like this:

    47 file(s) will be added to the download queue.
    Once downloaded, 4.6 GB of additional disk space will be used.
    Do you want to continue? [Y/n]
    *** Error occured at 2017-08-04 15:05:42.932972 ***
    ... error message ....
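
When several selection files are checked, reading the numbers off by eye gets tedious. A minimal sketch that extracts the file count and volume from text shaped like the example output above; it assumes the message wording stays exactly as shown and that sizes are given in B, KB, MB, GB, TB or PB. `parse_volume` is hypothetical. In the notebook you might capture the text with `out = !synda install -s selection/tst.txt` and pass `"\n".join(out)`, although it is not guaranteed that the prompt lines end up in the captured output:

```python
import re


def parse_volume(output):
    """Extract (number_of_files, size_text) from synda's download prompt text."""
    files = re.search(r"(\d+) file\(s\) will be added", output)
    size = re.search(r"([\d.]+ [KMGTP]?B) of additional disk space", output)
    if files is None or size is None:
        return None  # wording not recognised
    return int(files.group(1)), size.group(1)


sample = """47 file(s) will be added to the download queue.
Once downloaded, 4.6 GB of additional disk space will be used.
Do you want to continue? [Y/n]"""
print(parse_volume(sample))  # (47, '4.6 GB')
```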

In [8]:
%%bash
synda install -s selection/tst.txt


## Step 5: Submit your data replication request 

Please provide the file names of the selection files you tested above and which you now want to submit to the DKRZ data managers. 

In [None]:
form.selection_files = ["selection/file1.txt", "selection/file2.txt"]  # your file names as specified above
form_handler.save_form(sf, "..my comment..")  # add a comment to remember this specific submission


form_handler.email_form_info(sf)  # do not change
form_handler.form_submission(sf)  # do not change
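
Before submitting, it can help to confirm that every listed selection file actually exists. A small sanity check you could run before the submission cell, assuming the paths in `form.selection_files` are relative to the notebook's working directory; `missing_files` is hypothetical:

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Hypothetical use in the notebook, before calling form_handler.form_submission:
# assert not missing_files(form.selection_files), missing_files(form.selection_files)
print(missing_files(["selection/file1.txt", "selection/file2.txt"]))
```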


## Appendix: Example synda calls

Use the cells below to experiment with synda.

### Explore metadata

Example synda calls to search and explore metadata:

In [None]:
%%bash 

# synda dump tas GFDL-ESM2M -F line -f -C size,filename
synda variable tas
# synda search cmip5 MOHC HadGEM2-A amip4
# synda search cmip5 mon atmos -l 1000xCO2 mon atmos Amon r1i1p1
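
The same calls can also be made from plain Python instead of a `%%bash` cell, for example to post-process the output. A minimal sketch using `subprocess` (Python 3.7 or later for `capture_output`); it only builds the `synda <subcommand> -s <file>` form already used in this form, and `synda_command` and `run_synda` are hypothetical helpers:

```python
import shutil
import subprocess


def synda_command(subcommand, selection_file):
    """Build an argument list such as ['synda', 'search', '-s', 'selection/tst.txt']."""
    return ["synda", subcommand, "-s", selection_file]


def run_synda(subcommand, selection_file):
    """Run synda and return its standard output as a list of lines."""
    if shutil.which("synda") is None:
        raise RuntimeError("synda is not installed or not on PATH")
    result = subprocess.run(synda_command(subcommand, selection_file),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


print(synda_command("search", "selection/tst.txt"))
# With synda installed: lines = run_synda("search", "selection/tst.txt")
```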

In [None]:
%%bash 

synda -h