# DKRZ data ingest workflow information update

(Disclaimer: This demo notebook is for data managers only!)

Information related to the data ingest workflow (e.g. quality assurance information or data publication related information) should be updated in a well-structured way, based on well-defined steps.

Each of these steps updates a consistent set of information for a specific workflow action (e.g. data publication).

The submission_forms package provides a collection of components to support these activities.

A consistent update step normally consists of
* an update on who did what, and when (e.g. data manager A quality checked data B at time C ..)
* an update with additional information on the activity (e.g. adding the quality assurance record)
* an update of the status of the individual workflow step (open, paused, action-required, closed etc.)
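
The three parts of such an update step can be sketched in plain Python. This is an illustration only: the `UpdateStep` dataclass, its field names and `make_update` are hypothetical and are not part of the submission_forms package.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UpdateStep:
    """Hypothetical record of one consistent update (not a package class)."""
    agent: str                                 # who did it
    action: str                                # what was done
    timestamp: str                             # when it was done
    info: dict = field(default_factory=dict)   # additional activity information
    status: str = "0:open"                     # status of the workflow step


def make_update(agent, action, info, status):
    # Bundle who/what/when, the extra information and the new status in one record
    return UpdateStep(agent, action, datetime.now().isoformat(), dict(info), status)


step = make_update("data manager A", "quality check of data B",
                   {"qa_record": "PASS"}, "4:closed")
print(step.agent, step.status)
```

Keeping all three parts in one record makes it harder to update the status without also recording who changed it and why.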

The following generic status states are defined: 

* ACTIVITY_STATUS = "0:open, 1:in-progress ,2:action-required, 3:paused,4:closed"          
* ERROR_STATUS = "0:open,1:ok,2:error"
* ENTITY_STATUS = "0:open,1:stored,2:submitted,3:re-opened,4:closed"
* CHECK_STATUS = "0:open,1:warning, 2:error,3:ok"
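
The status definitions above are plain `code:label` lists, so they can be turned into a lookup table and used to validate a value before it is written to a form. The sketch below copies the `ACTIVITY_STATUS` string from the list above; `parse_status_options` and `is_valid_status` are hypothetical helpers, not functions of the package.

```python
ACTIVITY_STATUS = "0:open, 1:in-progress ,2:action-required, 3:paused,4:closed"


def parse_status_options(spec):
    """Split a 'code:label' list into a {code: label} dict, tolerating stray spaces."""
    options = {}
    for item in spec.split(","):
        code, label = item.strip().split(":", 1)
        options[int(code)] = label.strip()
    return options


def is_valid_status(value, spec):
    """Return True if value (e.g. '4:closed') is one of the options in spec."""
    allowed = {f"{code}:{label}" for code, label in parse_status_options(spec).items()}
    return value.strip() in allowed


activity_options = parse_status_options(ACTIVITY_STATUS)
print(activity_options[2])
print(is_valid_status("4:closed", ACTIVITY_STATUS))
```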

In [11]:
# import necessary packages
from dkrz_forms import form_handler, utils, wflow_handler, checks
from datetime import datetime
from pprint import pprint

## demo examples - step by step

The following examples can be adapted to the data manager's needs, e.g. by creating targeted Jupyter notebooks or Python scripts. Data managers have two separate application scenarios for data ingest information management:

### Step 1: find and load a specific data ingest activity related form

* Alternative A)
   * check out the git repo https://gitlab.dkrz.de/DKRZ-CMIP-Pool/data_forms_repo
   * this repo contains all completed submission forms
   * all data manager related changes are also committed there
   * subdirectories in this repo relate to the individual projects (e.g. CMIP6, CORDEX, ESGF_replication, ..)
   * each entry there contains the last name of the data submission originator  
   
* Alternative B) (not yet documented, only prototype) 
   * use the search interface and API of the search index on all submission forms
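
For Alternative A, locating a form in a local checkout of the repository can be sketched with the standard library. The file name pattern `<project>_<last_name>_<n>.json` is an assumption inferred from the example file `CORDEX_aa_11.json` used in this notebook, and `find_forms` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def find_forms(repo_dir, project, last_name):
    """List the JSON forms of one project whose file name contains the last name.

    Assumes a layout like <repo_dir>/<project>/<project>_<last_name>_<n>.json.
    """
    project_dir = Path(repo_dir) / project
    return sorted(project_dir.glob(f"{project}_{last_name}_*.json"))
```

A call such as `find_forms("/opt/jupyter/notebooks/form_directory", "CORDEX", "aa")` would then return the candidate files to pass to `utils.load_workflow_form`.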
  
  

In [2]:
# load workflow form object
info_file = "/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json"
my_form = utils.load_workflow_form(info_file)

In [3]:
# show the workflow steps for this form (long-name, short-name) 
# to select a specific action, you can use the long name, e.g. 'data_ingest' or the related short name e.g. 'ing'
wflow_dict = wflow_handler.get_wflow_description(my_form)
pprint(wflow_dict)

{'data_ingest': 'ing',
 'data_publication': 'pub',
 'data_quality_assurance': 'qua',
 'data_submission': 'sub',
 'data_submission_review': 'rev'}
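
Since a workflow step can be addressed by either its long or its short name, a small lookup can normalise the two spellings. The sketch below reuses the mapping printed above; `resolve_step` is a hypothetical helper and not part of `wflow_handler`.

```python
# Mapping as printed above (long name -> short name)
wflow_dict = {
    'data_ingest': 'ing',
    'data_publication': 'pub',
    'data_quality_assurance': 'qua',
    'data_submission': 'sub',
    'data_submission_review': 'rev',
}


def resolve_step(name, mapping):
    """Return (long_name, short_name) for either spelling of a workflow step."""
    if name in mapping:
        return name, mapping[name]
    for long_name, short_name in mapping.items():
        if name == short_name:
            return long_name, short_name
    raise KeyError(f"unknown workflow step: {name!r}")


print(resolve_step("ing", wflow_dict))
```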


### Step 2: indicate who is working on which workflow step

In [4]:
# 'start_action' updates the form with information on who is currently working on the form 
# internal information on this (timestamp, status information) is automatically set ..
# the resulting 'working version' of the form is committed to the work repository

wflow_handler.start_action('data_submission_review',my_form,"stephan kindermann")



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master 7e2c7a3] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## stephan kindermann: rev started
 1 file changed, 5 insertions(+), 5 deletions(-)


DKRZ Form object 

### Step 3: indicate the update and closure of a specific workflow step

In [5]:
review_report = {}
review_report['comment'] = 'needed to change and correct submission form'
review_report['additional_info'] = "mail exchange with a@b with respect to question ..."

my_form = wflow_handler.finish_action('data_submission_review',my_form,"stephan kindermann",review_report)



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master 659cb5c] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## stephan kindermann: rev finished
 1 file changed, 12 insertions(+), 9 deletions(-)


### interactive "help": use ?form.part and tab completion:

In [9]:
my_form.rev.entity_out.report

{'additional_info': 'mail exchange with a@b with respect to question ...',
 'comment': 'needed to change and correct submission form'}

## Display status of report

In [12]:
report = checks.check_report(my_form,"sub")
checks.display_report(report)

Vocalbulary test: CV_CORDEX, experiment_id
Vocalbulary test: CV_CORDEX,institute_id
Vocalbulary test: CV_CORDEX,institution
Vocalbulary test: CV_CORDEX,model_id


0,1,2
Value,Check Result,Check Comment
data_information,NOT OK,Warning: string not set
data_path,NOT OK,Warning: string not set
data_qc_comment,NOT OK,Warning: string not set
data_qc_status,NOT OK,Warning: string not set
directory_structure,NOT OK,Warning: string not set
example_file_name,NOT OK,Warning: string not set
exclude_variables_list,NOT OK,Warning: string not set
experiment_id,OK,CV check not yet implemented
grid_as_specified_if_rotated_pole,NOT OK,Warning: string not set
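
To get a quick overview of such a report, the rows can be tallied by check result. The sketch below assumes the report is available as comma-separated text like the output above (three rows are copied from it); `summarize_report` is a hypothetical helper and not part of the `checks` module.

```python
import csv
from collections import Counter
from io import StringIO

# Header and three rows copied from the report output above
report_text = """Value,Check Result,Check Comment
data_information,NOT OK,Warning: string not set
data_path,NOT OK,Warning: string not set
experiment_id,OK,CV check not yet implemented
"""


def summarize_report(report_csv):
    """Count rows per check result and list the values that are not OK."""
    rows = list(csv.DictReader(StringIO(report_csv)))
    counts = Counter(row["Check Result"] for row in rows)
    failed = [row["Value"] for row in rows if row["Check Result"] != "OK"]
    return counts, failed


counts, failed = summarize_report(report_text)
print(dict(counts))
print(failed)
```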


In [13]:
my_form.rev.entity_in.check_status




## Display status of form

In [14]:
my_form.sub.activity.ticket_url

'https://dm-rt.dkrz.de/Ticket/Display.html?id='

In [17]:
part = checks.check_step_form(my_form,"sub")
checks.display_check(part,"sub")


Workflow step:  sub
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
email,OK,mandatory string not empty
first_name,OK,mandatory string not empty
keyword,OK,mandatory string not empty
last_name,OK,mandatory string not empty
responsible_person,OK,mandatory string not empty


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
form_dir,OK,mandatory string not empty
form_path,OK,mandatory string not empty
source_path,NOT OK,Error: mandatory string not set
tag,OK,Warning: optional parameter not set
version,OK,Warning: optional parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,OK,ok: Valid option setting
checks_done,NOT OK,Error: mandatory string not set
commit_message,OK,ok: parameter is optional
form_json,OK,mandatory string not empty
form_name,OK,mandatory string not empty
form_repo,OK,mandatory string not empty
form_repo_path,OK,mandatory string not empty
pwd,OK,mandatory string not empty
status,OK,ok: Valid option setting


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
commit_hash,OK,Warning: optional parameter not set
end_time,OK,Warning: optional parameter not set
error_status,NOT OK,Error: option parameter not set
keyword,OK,mandatory string not empty
method,NOT OK,Error: mandatory string not set
pwd,NOT OK,Error: mandatory string not set
start_time,OK,mandatory string not empty
status,OK,ok: Valid option setting


In [18]:
## global check
res  = checks.check_generic_form(my_form)
checks.display_checks(my_form,res)


Workflow step:  data_submission
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
email,OK,mandatory string not empty
first_name,OK,mandatory string not empty
keyword,OK,mandatory string not empty
last_name,OK,mandatory string not empty
responsible_person,OK,mandatory string not empty


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
form_dir,OK,mandatory string not empty
form_path,OK,mandatory string not empty
source_path,NOT OK,Error: mandatory string not set
tag,OK,Warning: optional parameter not set
version,OK,Warning: optional parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,OK,ok: Valid option setting
checks_done,NOT OK,Error: mandatory string not set
commit_message,OK,ok: parameter is optional
form_json,OK,mandatory string not empty
form_name,OK,mandatory string not empty
form_repo,OK,mandatory string not empty
form_repo_path,OK,mandatory string not empty
pwd,OK,mandatory string not empty
status,OK,ok: Valid option setting


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
commit_hash,OK,Warning: optional parameter not set
end_time,OK,Warning: optional parameter not set
error_status,NOT OK,Error: option parameter not set
keyword,OK,mandatory string not empty
method,NOT OK,Error: mandatory string not set
pwd,NOT OK,Error: mandatory string not set
start_time,OK,mandatory string not empty
status,OK,ok: Valid option setting


Workflow step:  data_submission_review
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
responsible_person,OK,mandatory string not empty


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
checks_done,NOT OK,Error: mandatory string not set
commit_message,OK,Warning: optional parameter not set
form_json,NOT OK,Error: mandatory string not set
form_name,NOT OK,Error: mandatory string not set
form_repo,NOT OK,Error: mandatory string not set
form_repo_path,NOT OK,Error: mandatory string not set
pwd,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,OK,ok: Valid option setting
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,OK,ok: Valid option setting
tag,OK,Warning: optional parameter not set


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
end_time,OK,ok: parameter is optional
error_status,OK,ok: Valid option setting
start_time,OK,mandatory string not empty
status,OK,ok: Valid option setting
ticket_id,NOT OK,Error: mandatory string not set
ticket_url,NOT OK,Error: mandatory string not set
timestamp,OK,ok: parameter is optional


Workflow step:  data_ingest
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
responsible_person,NOT OK,Error: mandatory string not set


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
end_time,OK,Warning: optional parameter not set
error_status,NOT OK,Error: option parameter not set
start_time,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
ticket_id,NOT OK,Error: mandatory string not set
timestamp,OK,Warning: optional parameter not set


Workflow step:  data_quality_assurance
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
responsible_person,NOT OK,Error: mandatory string not set


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
end_time,OK,Warning: optional parameter not set
error_status,NOT OK,Error: option parameter not set
follow_up_ticket,OK,Warning: optional parameter not set
qua_tool_version,NOT OK,Error: mandatory string not set
start_time,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
ticket_id,NOT OK,Error: mandatory string not set


Workflow step:  data_publication
PROV entity:  agent


0,1,2
Value,Check Result,Check Comment
responsible_person,NOT OK,Error: mandatory string not set
trigger,NOT OK,Error: option parameter not set


PROV entity:  entity_in


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  entity_out


0,1,2
Value,Check Result,Check Comment
check_status,NOT OK,Error: option parameter not set
comment,OK,Warning: optional parameter not set
date,NOT OK,Error: mandatory string not set
search_string,OK,Warning: optional parameter not set
status,NOT OK,Error: option parameter not set
tag,OK,Warning: optional parameter not set


PROV entity:  activity


0,1,2
Value,Check Result,Check Comment
comment,OK,Warning: optional parameter not set
end_time,OK,Warning: optional parameter not set
error_status,NOT OK,Error: option parameter not set
follow_up_ticket,OK,Warning: optional parameter not set
start_time,NOT OK,Error: mandatory string not set
status,NOT OK,Error: option parameter not set
ticket_id,NOT OK,Error: mandatory string not set
timestamp,OK,Warning: optional parameter not set


In [19]:
print(my_form.sub.entity_out.status)
print(my_form.rev.entity_in.form_json)
print(my_form.sub.activity.ticket_id)

0:open
mandatory
optional


In [21]:
pprint(my_form.workflow)

[['sub', 'data_submission'],
 ['rev', 'data_submission_review'],
 ['ing', 'data_ingest'],
 ['qua', 'data_quality_assurance'],
 ['pub', 'data_publication']]
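
If the order of this list reflects the order in which the steps are processed (an assumption, not confirmed by the package), the step that follows a given one can be looked up as sketched below. `next_step` is a hypothetical helper.

```python
# Workflow list as printed above: [short name, long name] pairs
workflow = [['sub', 'data_submission'],
            ['rev', 'data_submission_review'],
            ['ing', 'data_ingest'],
            ['qua', 'data_quality_assurance'],
            ['pub', 'data_publication']]


def next_step(short_name, workflow):
    """Return the short name of the following step, or None after the last one."""
    names = [short for short, _long in workflow]
    index = names.index(short_name)  # raises ValueError for unknown names
    return names[index + 1] if index + 1 < len(names) else None


print(next_step("rev", workflow))
```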


### Appendix

Sometimes it is necessary to modify specific information directly rather than rely on the generic steps described above.
Here are some examples:

In [None]:
workflow_form = utils.load_workflow_form(info_file)
   
review = workflow_form.rev

# any additional information keys can be added,
# yet they are invisible to generic information management tools ..
workflow_form.status = "review"

review.activity.status = "1:in-review"
review.activity.start_time = str(datetime.now())
review.activity.review_comment = "data volume check to be done"
review.agent.responsible_person = "sk"

sf = form_handler.save_form(workflow_form, "sk: review started")

review.activity.status = "3:accepted"
review.activity.ticket_id = "25389"
review.activity.end_time = str(datetime.now())

review.entity_out.comment = "This submission is related to submission abc_cde"
review.entity_out.tag = "sub:abc_cde"  # tags are used to relate different forms to each other
review.entity_out.report = {'x':'y'}   # result of validation in a dict (self defined properties)

# ToDo: test and document save_form for data managers (config setting for repo)   
sf = form_handler.save_form(workflow_form, "kindermann: form_review()")
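
The cell above stores start and end times as `str(datetime.now())`. Assuming both values keep that format, the duration of a workflow step can be recovered with the standard library (Python 3.7 or later); `step_duration` is a hypothetical helper, and the two timestamps are made-up sample values.

```python
from datetime import datetime


def step_duration(start_time, end_time):
    """Return end - start as a timedelta for two str(datetime) timestamps."""
    return datetime.fromisoformat(end_time) - datetime.fromisoformat(start_time)


# Sample values in the format produced by str(datetime.now())
duration = step_duration("2017-03-01 10:00:00.250000", "2017-03-01 12:30:00.250000")
print(duration.total_seconds())
```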

### add data ingest step related information

__Comment:__ Alternatively, tools can assign workflow step related information
directly via dictionaries. This is only recommended for data managers
who make sure that the structure is consistent with
the preconfigured one given in config/project_config.py.
* example: validation.activity.\__dict\__ = data_manager_generated_dict
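
One way to guard such a direct assignment is to compare the keys of the new dictionary with the attributes already present before replacing them. The sketch below uses a stand-in class instead of a real form part; `Part` and `assign_checked` are hypothetical and not part of the package.

```python
class Part:
    """Stand-in for a preconfigured workflow part (hypothetical)."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def assign_checked(target, new_values):
    """Replace target.__dict__ only if new_values has exactly the same keys."""
    expected = set(vars(target))
    given = set(new_values)
    if expected != given:
        raise ValueError(
            f"missing keys: {sorted(expected - given)}, "
            f"unexpected keys: {sorted(given - expected)}"
        )
    target.__dict__ = dict(new_values)


activity = Part(status="0:open", start_time="", comment="")
assign_checked(activity, {"status": "1:in-progress",
                          "start_time": "2017-03-01 10:00:00",
                          "comment": ""})
print(activity.status)
```

A dictionary with a missing or misspelled key is rejected with a `ValueError` instead of silently changing the structure of the form.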

In [22]:
workflow_form = utils.load_workflow_form(info_file)
   
ingest = workflow_form.ing

In [None]:
?ingest.entity_out

In [23]:
# agent related info
workflow_form.status = "ingest"

ingest.activity.status = "started"
ingest.agent.responsible_person = "hdh"
ingest.activity.start_time = str(datetime.now())

# activity related info

ingest.activity.comment = "data pull: credentials needed for remote site"
sf = form_handler.save_form(workflow_form, "kindermann: form_review()")



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master cc66608] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## kindermann: form_review()
 1 file changed, 6 insertions(+), 6 deletions(-)


In [24]:
ingest.activity.status = "completed"
ingest.activity.end_time = str(datetime.now())

# report of the ingest process (entity_out of ingest workflow step)
ingest_report = ingest.entity_out
ingest_report.tag = "a:b:c"  # tag structure to be defined
ingest_report.status = "completed"
# free entries for detailed report information
ingest_report.report.remote_server = "gridftp.awi.de://export/data/CMIP6/test"
ingest_report.report.server_credentials = "in server_cred.krb keypass"
ingest_report.report.target_path = ".."
sf = form_handler.save_form(workflow_form, "kindermann: form_review()")



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master 9729fc4] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## kindermann: form_review()
 1 file changed, 12 insertions(+), 8 deletions(-)


In [None]:
ingest_report.report.

### workflow step: data quality assurance

In [26]:
from datetime import datetime
workflow_form = utils.load_workflow_form(info_file)
   
qua = workflow_form.qua

In [27]:
workflow_form.status = "quality assurance"
qua.agent.responsible_person = "hdh"

qua.activity.status = "starting" 
qua.activity.start_time = str(datetime.now())

sf = form_handler.save_form(workflow_form, "hdh: qa start")





Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master b4c89d6] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## hdh: qa start
 1 file changed, 5 insertions(+), 5 deletions(-)


In [28]:
qua.entity_out.status = "completed"
qua.entity_out.report = {
    "QA_conclusion": "PASS",
    "project": "CORDEX",
    "institute": "CLMcom",
    "model": "CLMcom-CCLM4-8-17-CLM3-5",
    "domain": "AUS-44",
    "driving_experiment":  [ "ICHEC-EC-EARTH"],
    "experiment": [ "history", "rcp45", "rcp85"],
    "ensemble_member": [ "r12i1p1" ],
    "frequency": [ "day", "mon", "sem" ],
    "annotation":
    [
        {
            "scope": ["mon", "sem"],
            "variable": [ "tasmax", "tasmin", "sfcWindmax" ],
            "caption": "attribute <variable>:cell_methods for climatologies requires <time>:climatology instead of time_bnds",
            "comment": "due to the format of the data, climatology is equivalent to time_bnds",
            "severity": "note"
        }
    ]
}
sf = form_handler.save_form(workflow_form, "hdh: qua complete")




Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master f47665c] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## hdh: qua complete
 1 file changed, 43 insertions(+), 5 deletions(-)
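
Because the form is stored in a JSON transfer format, it seems reasonable to assume that a report dictionary should itself be JSON-serializable. The sketch below checks this before a report is attached and saved; `is_json_serializable` is a hypothetical helper, not a package function.

```python
import json
from datetime import datetime


def is_json_serializable(report):
    """Return True if the report can be written with json.dumps."""
    try:
        json.dumps(report)
    except (TypeError, ValueError):
        return False
    return True


good_report = {"QA_conclusion": "PASS", "frequency": ["day", "mon", "sem"]}
bad_report = {"checked_at": datetime(2017, 3, 1)}  # datetime objects are not JSON types

print(is_json_serializable(good_report))
print(is_json_serializable(bad_report))
```

Converting timestamps with `str(...)` first, as done elsewhere in this notebook, avoids the second case.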


### workflow step: data publication

In [29]:
workflow_form = utils.load_workflow_form(info_file)

workflow_form.status = "publishing"

pub = workflow_form.pub
pub.agent.responsible_person = "katharina"
pub.activity.status = "starting"
pub.activity.start_time = str(datetime.now())

sf = form_handler.save_form(workflow_form, "kb: publishing")



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master bc31ef3] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## kb: publishing
 1 file changed, 5 insertions(+), 5 deletions(-)


In [30]:
pub.activity.status = "completed"
pub.activity.comment = "..."
pub.activity.end_time = ".."
pub.activity.report = {'model':"MPI-M"}   # activity related report information

pub.entity_out.report = {'model':"MPI-M"} # the report of the publication action - all info characterizing the publication
sf = form_handler.save_form(workflow_form, "kb: published")




Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master 5f236e9] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## kb: published
 1 file changed, 12 insertions(+), 8 deletions(-)


In [31]:
sf = form_handler.save_form(workflow_form, "kindermann: form demo run 1")



Form Handler - save form status message:
entity_in.form_path and entity_out.form_repo_path
/home/testuser/CORDEX/CORDEX_aa_11.ipynb
/opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.ipynb
 --- form stored in transfer format in: /opt/jupyter/notebooks/form_directory/CORDEX/CORDEX_aa_11.json
 
 --- commit message:[master 6526994] Form Handler: submission form for user aa saved using prefix CORDEX_aa_11 ## kindermann: form demo run 1
 1 file changed, 3 insertions(+), 3 deletions(-)


In [32]:
sf.sub.activity.commit_hash


'optional'