# DKRZ CMIP6 data submission form for ESGF publication

![form-submission](../fig/form-submission.png)

### Overview

Do you want to store and publish CMIP6 data at DKRZ? This form provides background information and guides you through the process. To organize data ingest and data publication, we need some information about the specific CMIP6 subset you want to transfer to DKRZ. The form first gives a generic overview of the preconditions you need to be aware of before submitting data, and then collects general properties of the CMIP6 subset you want to provide. The form has to be completed before the publication process can start. If you have questions, please contact data-ingest@dkrz.de.

#### Preconditions for your data submission

You need to be aware of a set of technical requirements that have to be met before CMIP6 data submission to DKRZ and ESGF data publication are possible. They are collected at the official [WCRP CMIP Phase 6 (CMIP6) site](https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6) in the [Guide to CMIP6 Participation](https://pcmdi.llnl.gov/CMIP6/Guide/). The key prerequisites are summarized below:

* Your institution as well as your model has to be registered on the [WCRP-CMIP github site](https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/.github/RegistrationGuidance.md)
* Contact and citation information has to be registered in the [citation GUI](http://cera-www.dkrz.de/citeXA) (see the [documentation of the GUI](http://cera-www.dkrz.de/docs/pdf/CMIP6_Citation_Userguide.pdf?id=37))

* Your data conforms to the [CMIP6 specifications for file names, directory structures and CMIP6 Data Reference Syntax (DRS)](http://goo.gl/v1drZl) 
   * Directory structure:
   <pre><code>
    &lt;mip_era&gt;/&lt;activity_id&gt;/&lt;institution_id&gt;/&lt;source_id&gt;/
        &lt;experiment_id&gt;/&lt;member_id&gt;/&lt;table_id&gt;/&lt;variable_id&gt;/&lt;grid_label&gt;/&lt;version&gt;
   </code>
   </pre>
                     
   * File naming convention: 
   <pre><code>   &lt;variable_id&gt;_&lt;table_id&gt;_&lt;source_id&gt;_&lt;experiment_id&gt;_&lt;member_id&gt;
            _&lt;grid_label&gt;[_&lt;time_range&gt;].nc
   </code>
   </pre>  

   * Please make sure your data is quality checked before submission to a data center. Two tools for checking are recommended:
      * CMOR/PREPARE checker (minimal check): 
        * github: https://github.com/PCMDI/cmor
        * documentation: https://cmor.llnl.gov/mydoc_cmip6_validator/
      * DKRZ_QA checker (includes CMOR/PREPARE checker optionally):
        * github: https://github.com/IS-ENES-Data/QA-DKRZ
        * documentation: http://qa-dkrz.readthedocs.io/en/latest/
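
The directory and file name templates above can be assembled mechanically from the facet values. The following is a minimal sketch, not part of the `dkrz_forms` package: the function names and all facet values in `example` are hypothetical and only illustrate the facet order.

```python
from typing import Optional

# Facet order taken from the directory template above.
DRS_DIR_FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
]
# Facet order taken from the file name template above.
DRS_FILE_FACETS = [
    "variable_id", "table_id", "source_id",
    "experiment_id", "member_id", "grid_label",
]


def drs_directory(facets: dict) -> str:
    """Join the facet values into a DRS directory path."""
    return "/".join(facets[name] for name in DRS_DIR_FACETS)


def drs_filename(facets: dict, time_range: Optional[str] = None) -> str:
    """Build the file name; the time range part is optional."""
    parts = [facets[name] for name in DRS_FILE_FACETS]
    if time_range:
        parts.append(time_range)
    return "_".join(parts) + ".nc"


# Hypothetical facet values, for illustration only.
example = {
    "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "MPI-M",
    "source_id": "MPI-ESM1-2-LR", "experiment_id": "historical",
    "member_id": "r1i1p1f1", "table_id": "Amon", "variable_id": "tas",
    "grid_label": "gn", "version": "v20190710",
}
print(drs_directory(example))
print(drs_filename(example, "185001-201412"))
```

Comparing the generated strings with the actual paths of your files is a quick way to spot a missing or misplaced facet before running the full quality checkers.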

# Start submission procedure
The submission is based on this interactive document consisting of "cells" you can modify and then evaluate.
Evaluation of cells is done by selecting the cell and pressing the keys "Shift" + "Enter".
<br /> Please fill in and evaluate the following cell to initialize your form based on the information provided during form generation (name, email, etc.).

In [None]:
MY_LAST_NAME = "...."   # e.g. MY_LAST_NAME = "schulz" 

#-------------------------------------------------

from dkrz_forms import form_handler, form_widgets, checks
form_info = form_widgets.check_pwd(MY_LAST_NAME)
sf = form_handler.init_form(form_info)
form = sf.sub.entity_out.report

## Step 1: provide generic data submission related information 

### Type of submission
Please specify the type of this data submission:
- "initial_version" for a first submission of data
- "new_version" for a re-submission of previously submitted data
- "retract" for a request to retract previously submitted data

In [None]:
form.submission_type = "..."  # example: form.submission_type = "initial_version"
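
Because only three values are meaningful here, a small check can catch misspellings before the form is submitted. This is an illustrative sketch only: `check_submission_type` is a hypothetical helper, not a `dkrz_forms` function, and it assumes the second option is spelled `new_version` without a space.

```python
# Allowed values as listed above (assumed spelling: "new_version").
ALLOWED_SUBMISSION_TYPES = ("initial_version", "new_version", "retract")


def check_submission_type(value: str) -> str:
    """Return the value unchanged if it is allowed, otherwise raise ValueError."""
    if value not in ALLOWED_SUBMISSION_TYPES:
        allowed = ", ".join(ALLOWED_SUBMISSION_TYPES)
        raise ValueError(f"unknown submission type {value!r}; expected one of: {allowed}")
    return value


print(check_submission_type("initial_version"))
```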

In [None]:
### to be discussed

- contact information
- technical info, e.g. used cmor version 
- compliance with CMIP6 dir structure and naming convention
- ....

## Step 2: provide generic model related information 

- generic model related facets characterizing the submission
  - institute_id
  - model_id 
   ....
- es-doc known and filled, or filling planned?


In [None]:
form.institution = "..." # example: form.institution = "Alfred Wegener Institute"

##### institute_id
The value of this field has to equal the value of the global NetCDF attribute 'institute_id' 
in the data files and must equal the 4th directory level. It is needed before the publication 
process is started so that the value can be added to the relevant CORDEX list of CV1 
if it is not there yet. Note that 'institute_id' has to be the first part of 'model_id'.

In [None]:
form.institute_id = "..." # example: form.institute_id = "AWI"

##### model_id
The value of this field has to be the value of the global NetCDF attribute 'model_id' 
in the data files. It is needed before the publication process is started so that 
the value can be added to the relevant CORDEX list of CV1 if it is not there yet.
Note that it must be composed of the 'institute_id' followed by the RCM CORDEX model name, 
separated by a dash. It is part of the file name and the directory structure.

In [None]:
form.model_id = "..." # example: form.model_id = "AWI-HIRHAM5"
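
The composition rule for 'model_id' (institute_id, a dash, then the model name) can be checked with a few lines of Python. The helper below is a hypothetical sketch and not part of `dkrz_forms`; it only tests the prefix rule and does not consult any controlled vocabulary.

```python
def split_model_id(model_id: str, institute_id: str) -> str:
    """Return the model name part of model_id, or raise ValueError
    if model_id does not start with '<institute_id>-'."""
    prefix = institute_id + "-"
    if not model_id.startswith(prefix) or len(model_id) == len(prefix):
        raise ValueError(
            f"model_id {model_id!r} must be {institute_id!r}, a dash, and the model name"
        )
    return model_id[len(prefix):]


# Values taken from the examples in this form.
print(split_model_id("AWI-HIRHAM5", "AWI"))
```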

## Step 3: Provide specific information for this submission

- variables, grid, calendar, ... 
- example file name
- ..

#### experiment_id and time_period
Experiment has to equal the value of the global NetCDF attribute 'experiment_id' 
in the data files. Time_period gives the period of data for which the publication 
request is submitted. If you intend to submit data from multiple experiments you may 
add one line for each additional experiment or send in additional publication request sheets.

In [None]:
form.experiment_id = "..."  # example: form.experiment_id = "evaluation"
                          # ["value_a","value_b"] in case of multiple experiments
form.time_period = "..." # example: form.time_period = "197901-201412" 
                       # ["time_period_a","time_period_b"] in case of multiple values
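
The example value "197901-201412" suggests a `yyyymm-yyyymm` layout for the time period. Assuming that layout, a minimal sketch for validating a single value could look as follows; `parse_time_period` is a hypothetical helper, not a `dkrz_forms` function.

```python
import re
from typing import Tuple

YearMonth = Tuple[int, int]


def parse_time_period(period: str) -> Tuple[YearMonth, YearMonth]:
    """Parse 'yyyymm-yyyymm' into ((start_year, start_month), (end_year, end_month))."""
    match = re.fullmatch(r"(\d{4})(\d{2})-(\d{4})(\d{2})", period)
    if match is None:
        raise ValueError(f"expected 'yyyymm-yyyymm', got {period!r}")
    start_year, start_month, end_year, end_month = map(int, match.groups())
    for month in (start_month, end_month):
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month {month:02d} in {period!r}")
    start, end = (start_year, start_month), (end_year, end_month)
    if start > end:
        raise ValueError(f"start is after end in {period!r}")
    return start, end


print(parse_time_period("197901-201412"))
```

For a list of periods (multiple experiments), the same function can be applied to each entry in turn.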

#### Example file name 
Please provide an example file name of a file in your data collection, 
this name will be used to derive the other 

In [None]:
form.example_file_name = "..." # example: form.example_file_name = "tas_AFR-44_MPI-M-MPI-ESM-LR_rcp26_r1i1p1_MPI-CSC-REMO2009_v1_mon_yyyymm-yyyymm.nc"

In [None]:
# Please run this cell as it is to check your example file name structure 
# to_do: implement submission_form_check_file function - output result (attributes + check_result)
form_handler.cordex_file_info(sf, form.example_file_name)
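
To see what such a file name check involves, the sketch below splits a CORDEX-style file name at the underscores. It is not the implementation of `form_handler.cordex_file_info`; the function and the facet labels in `CORDEX_NAME_PARTS` are assumptions chosen for illustration, following the order of the parts in the example file name above.

```python
# Assumed labels for the underscore-separated parts, in file name order.
CORDEX_NAME_PARTS = [
    "variable", "domain", "driving_model", "experiment",
    "ensemble_member", "rcm_model", "rcm_version", "frequency",
]


def parse_cordex_file_name(file_name: str) -> dict:
    """Split a CORDEX-style file name into labelled parts.

    The trailing time range is optional, so 8 or 9 parts are accepted.
    """
    if not file_name.endswith(".nc"):
        raise ValueError(f"{file_name!r} does not end with '.nc'")
    parts = file_name[:-3].split("_")
    if len(parts) not in (8, 9):
        raise ValueError(f"expected 8 or 9 parts, found {len(parts)} in {file_name!r}")
    info = dict(zip(CORDEX_NAME_PARTS, parts))
    info["time_range"] = parts[8] if len(parts) == 9 else None
    return info


name = "tas_AFR-44_MPI-M-MPI-ESM-LR_rcp26_r1i1p1_MPI-CSC-REMO2009_v1_mon_yyyymm-yyyymm.nc"
for key, value in parse_cordex_file_name(name).items():
    print(f"{key}: {value}")
```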

#### information on the grid_mapping

Give the NetCDF/CF name of the data grid ('rotated_latitude_longitude', 'lambert_conformal_conic', etc.), 
i.e. either that of the native model grid, or 'latitude_longitude' for the regular -XXi grids.

In [None]:
form.grid_mapping_name = "..." # example: form.grid_mapping_name = "rotated_latitude_longitude"

If the native grid is 'rotated_pole', does the grid configuration exactly follow the specifications in ADD2 (Table 1)? 
If not, comment on the differences; otherwise write 'yes' or 'N/A'. If the data is not delivered on the computational grid, this has to be noted here as well.

In [None]:
form.grid_as_specified_if_rotated_pole = "..." # example: form.grid_as_specified_if_rotated_pole = "yes"

## Variable list
List of variables submitted (support tooling is probably needed; to be defined):

In [None]:

form.variable_list_day = [
"clh","clivi","cll","clm","clt","clwvi",
"evspsbl","evspsblpot",
"hfls","hfss","hurs","huss","hus850",
"mrfso","mrro","mrros","mrso",
"pr","prc","prhmax","prsn","prw","ps","psl",
"rlds","rlus","rlut","rsds","rsdt","rsus","rsut",
"sfcWind","sfcWindmax","sic","snc","snd","snm","snw","sund",
"tas","tasmax","tasmin","tauu","tauv","ta200","ta500","ta850","ts",
"uas","ua200","ua500","ua850",
"vas","va200","va500","va850","wsgsmax",
"zg200","zg500","zmla"
]

form.variable_list_mon = [
"clt",
"evspsbl",
"hfls","hfss","hurs","huss","hus850",
"mrfso","mrro","mrros","mrso",
"pr","psl",
"rlds","rlus","rlut","rsds","rsdt","rsus","rsut",
"sfcWind","sfcWindmax","sic","snc","snd","snm","snw","sund",
"tas","tasmax","tasmin","ta200",
"ta500","ta850",
"uas","ua200","ua500","ua850",
"vas","va200","va500","va850",
"zg200","zg500"
]
form.variable_list_sem = [
"clt",
"evspsbl",
"hfls","hfss","hurs","huss","hus850",
"mrfso","mrro","mrros","mrso",
"pr","psl",
"rlds","rlus","rlut","rsds","rsdt","rsus","rsut",
"sfcWind","sfcWindmax","sic","snc","snd","snm","snw","sund",
"tas","tasmax","tasmin","ta200","ta500","ta850",
"uas","ua200","ua500","ua850",
"vas","va200","va500","va850",
"zg200","zg500"  
]

form.variable_list_fx = [
"areacella",
"mrsofc",
"orog",
"rootd",
"sftgif","sftlf"   
]

#### Exclude variable list

In each CORDEX file there may be only one variable that is published and searchable at the ESGF portal (the target variable). To facilitate publication, all non-target variables are included in a list that the publisher uses to exclude them from publication. The known non-target variables are [time, time_bnds, lon, lat, rlon, rlat, x, y, z, height, plev, Lambert_Conformal, rotated_pole]. Please enter any other non-target variables below if applicable (e.g. grid description variables); otherwise write 'N/A'.



In [None]:
form.exclude_variables_list = "..." # example: form.exclude_variables_list = ["bnds", "vertices"]
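
One way to find the entries for this list is to compare the variable names in a file with the target variable and the known non-target variables. The sketch below assumes you already have the variable names of a file as a list of strings (for example from `ncdump -h`); `extra_exclude_variables` is a hypothetical helper, not part of `dkrz_forms`.

```python
# Known non-target variables as listed above.
KNOWN_NON_TARGET = {
    "time", "time_bnds", "lon", "lat", "rlon", "rlat", "x", "y", "z",
    "height", "plev", "Lambert_Conformal", "rotated_pole",
}


def extra_exclude_variables(file_variables, target_variable):
    """Return the variables that are neither the target nor already known non-targets."""
    return sorted(
        name for name in file_variables
        if name != target_variable and name not in KNOWN_NON_TARGET
    )


# Hypothetical variable names of one file, for illustration only.
variables_in_file = ["tas", "time", "time_bnds", "lon", "lat", "rotated_pole", "vertices", "bnds"]
print(extra_exclude_variables(variables_in_file, "tas"))
```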

#### Uniqueness of tracking_id and creation_date
If any of your files replaces a file that is already published, it must not have the same tracking_id or 
the same creation_date as the file it replaces. 
Did you make sure that this is not the case? 
Reply 'yes'; otherwise adapt the new file versions.


In [None]:
form.uniqueness_of_tracking_id = "..." # example: form.uniqueness_of_tracking_id = "yes"
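
The rule can be expressed as a comparison of the two global attributes of the old and the new file. The sketch below works on plain dictionaries of global attributes (how you read them from the NetCDF files is up to you); the helper name and all attribute values are hypothetical.

```python
def replacement_is_distinct(old_attrs: dict, new_attrs: dict) -> bool:
    """True if the new file differs from the old one in both
    tracking_id and creation_date."""
    return all(
        old_attrs[key] != new_attrs[key]
        for key in ("tracking_id", "creation_date")
    )


# Hypothetical attribute values, for illustration only.
old_file = {"tracking_id": "id-0001", "creation_date": "2015-01-10T08:00:00Z"}
new_file = {"tracking_id": "id-0002", "creation_date": "2016-03-02T12:30:00Z"}
print(replacement_is_distinct(old_file, new_file))
```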

## Step 3: provide information on data quality, terms of use etc. 

Choose among the following options for the quality status:
* 'unchecked': no quality checks performed on data
* 'PREPARE': CMOR/PREPARE tool checked 
* 'QA-DKRZ': QA-DKRZ tool checked 
* 'other': checked by other means, please provide additional information ..

In [None]:
form.data_qc_status = "..."  # example: form.data_qc_status = "PREPARE"
form.data_qc_comment = "..." # any comment on the quality status of the files

### Terms of use
Please give the terms of use that shall be assigned to the data.
The options are 'unrestricted' and 'non-commercial only'.
For the full text 'Terms of Use' of CORDEX data refer to
http://cordex.dmi.dk/joomla/images/CORDEX/cordex_terms_of_use.pdf 

In [None]:
form.terms_of_use = "..." # example: form.terms_of_use = "unrestricted"

## Step 4: provide information on data storage and data access 
(and other information needed for data transport and data publication)


If the directory structure deviates from the CORDEX standard in any way, please specify the deviation here. 
Otherwise enter 'compliant'. Please note that deviations MAY imply that data cannot be accepted.

In [None]:
form.directory_structure = "..." # example: form.directory_structure = "compliant"

Give the path where the data reside, for example:
blizzard.dkrz.de:/scratch/b/b364034/. If not applicable write N/A and give data access information in the data_information string

In [None]:
form.data_path = "..."        # example: form.data_path = "mistral.dkrz.de:/mnt/lustre01/work/bm0021/k204016/CORDEX/archive/"
form.data_information = "..." # any information on where the data can be accessed and how it can be transferred to the data center
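
The example paths use a `host:/absolute/path` layout. Assuming that layout, the following sketch separates the host from the directory and passes values such as 'N/A' through unchanged; `split_data_path` is a hypothetical helper and not part of `dkrz_forms`.

```python
from typing import Optional, Tuple


def split_data_path(data_path: str) -> Tuple[Optional[str], str]:
    """Split 'host:/path' into (host, path); return (None, data_path)
    if the value does not follow that layout."""
    host, sep, path = data_path.partition(":")
    if sep and path.startswith("/"):
        return host, path
    return None, data_path


print(split_data_path("blizzard.dkrz.de:/scratch/b/b364034/"))
print(split_data_path("N/A"))
```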

## Step 5: Check your form before submission

In [None]:
# simple consistency check report for your submission form
res = form_handler.check_submission(sf)
form.sub['status_flag_validity'] = res['valid_submission']
form_handler.DictTable(res)

## Step 6: Save and review your form

Your form will be stored (the form name consists of your last name plus your keyword).

In [None]:
form_handler.form_save(sf)

In [None]:
# evaluate this cell if you want a reference (provided by email)
# (only available if you access this form via the DKRZ hosting service)
form_handler.email_form_info(sf)

## Step 7: officially submit your form
The form will be submitted to the DKRZ team for processing.
You will also receive a confirmation email with a reference to your online form for future modifications.

In [None]:
form_handler.email_form_info(sf)
form_handler.form_submission(sf)