# Basic workflow composition with Ophidia

This notebook provides a very simple example of how to build an Ophidia workflow and submit it by reusing some of the concepts shown during the demo. 

A workflow allows users to code a set of data processing and analytics steps into reusable documents. Moreover, the workflow manager can optimize the execution by running independent operations (tasks) concurrently.
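
To make the idea of running independent tasks concurrently more concrete, the following minimal sketch groups the tasks of a dependency graph into batches whose members do not depend on each other. It uses only the Python standard library (`graphlib`, Python 3.9+); the graph and task names are hypothetical, and this is not how the Ophidia workflow manager is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the set of tasks it depends on.
# "Subset A" and "Subset B" both depend only on "Import", so they are independent.
dependencies = {
    "Import": set(),
    "Subset A": {"Import"},
    "Subset B": {"Import"},
    "Merge": {"Subset A", "Subset B"},
}


def concurrent_batches(graph):
    """Return a list of batches; tasks in the same batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = concurrent_batches(dependencies)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

In this toy graph the two subset tasks end up in the same batch, which is the kind of situation a workflow manager can exploit.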

<hr style="height:7px;border-top:2px solid #0000FF" />

### 1. Preliminary steps

As a first step, let's connect to the Ophidia Server. The **PyOphidia** module and the **ESDM-PAV Client** module will be used to submit the workflow to the Ophidia workflow manager for the computation.

In [None]:
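# Classes used to define and submit the workflow
from esdm_pav_client import Workflow, Experiment, Task
import sys
from PyOphidia import cube, client

# Connect to the Ophidia Server using the connection parameters read from the environment
cube.Cube.setclient(read_env=True)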

<hr style="height:1px;border-top:1px solid #0000FF" />

In this example we will use a single NetCDF file produced by the CMCC-CESM model, containing the tasmin variable for the period 2096-2100. The file is located under ```/home/ophidia/notebooks/```.

In [None]:
import glob
glob.glob('/home/ophidia/notebooks/tasmin_*.nc')

<hr style="height:7px;border-top:2px solid #0000FF" />

### 2. Workflow composition

The workflow in this example consists of four sequential tasks:

1. Creation of a container for the datacubes 
2. Import of the NetCDF file
3. Extraction of a multi-dimensional subset
4. Computation of the average over the time series

The overall workflow structure is the following:
    
<img src="../imgs/Example_workflow.svg" alt="Example_workflow">

<hr style="height:1px;border-top:1px solid #0000FF" />

#### Define global arguments

First of all, we need to define the global arguments of the workflow, in particular its ```name``` and its ```exec_mode```, as shown in the following code.

Setting ```exec_mode``` to ```sync``` means that, when the workflow is submitted, the call blocks until the execution is completed.

In [None]:
e1 = Experiment(
    name= "Example workflow",
    author= "CMCC",
    abstract= "Example workflow",
    exec_mode= "sync",
    ncores="1"
)
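
As a generic illustration of what a blocking call means, the sketch below contrasts it with a non-blocking submission using only the Python standard library. The `run_workflow` function is a hypothetical stand-in for a long-running execution; it does not reproduce the behaviour of the ESDM-PAV Client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_workflow(name):
    """Hypothetical stand-in for a workflow execution that takes some time."""
    time.sleep(0.2)
    return f"{name}: completed"


# Blocking ("sync"-style) call: control returns only when the work is finished.
sync_result = run_workflow("Example workflow")

# Non-blocking call: submit() returns a Future immediately,
# and result() waits for the work to finish only when it is called.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(run_workflow, "Example workflow")
    async_result = future.result()

print(sync_result)
print(async_result)
```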

<hr style="height:1px;border-top:1px solid #0000FF" />

#### Define the tasks

The first task of the workflow uses the *oph_createcontainer* operator to create an empty container that organizes all the datacubes imported and produced during the workflow execution.

The ```on_error``` argument is set to ```skip``` so that the task is simply skipped in case of error, for example if a container with the same name already exists.

In [None]:
t1 = e1.newTask(name="Create container",
                type="ophidia",
                operator='oph_createcontainer',
                on_error='skip',
                arguments={'container': 'example',
                           'dim': 'lat|lon|time',
                           'dim_type': 'double|double|double',
                           'hierarchy': 'oph_base|oph_base|oph_time'})

<hr style="height:1px;border-top:1px solid #0000FF" />

The second task uses the *oph_importnc* operator to load data from a NetCDF file into an Ophidia datacube.

In this case we use the ```$1``` and ```$2``` variables to define the operator ```input``` and ```measure``` arguments at runtime. The same could be applied to any other argument.

A dependency on the previous task is set, so that the data import runs only after the container has been created.

The ```on_error``` argument is set to ```abort``` so that the whole workflow stops if the task fails. This is the default value, so it does not need to be specified for each task.

In [None]:
t2 = e1.newTask(name="Import",
                type="ophidia",
                operator='oph_importnc',
                on_error='abort',
                arguments={'measure': '$2',
                          'container': 'example',
                          'import_metadata': 'yes',
                          'imp_dim': 'time', 
                          'imp_concept_level': 'd',
                          'hierarchy': 'oph_base|oph_base|oph_time',
                          'description': 'Imported cube', 
                          'input': '$1'},
                dependencies={t1:''})
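
The difference between the ```skip``` and ```abort``` policies can be sketched with a toy sequential runner. This is only an illustration under simplifying assumptions: the `run_tasks` function, the task dictionaries and the simulated failures are hypothetical and are not part of Ophidia or the ESDM-PAV Client.

```python
def run_tasks(tasks):
    """Run tasks in order, honouring a per-task 'on_error' policy.

    Each task is a dict with 'name', 'action' (a callable) and an optional
    'on_error' key ('abort' by default, or 'skip').
    """
    log = []
    for task in tasks:
        try:
            task["action"]()
            log.append((task["name"], "completed"))
        except Exception:
            if task.get("on_error", "abort") == "skip":
                log.append((task["name"], "skipped"))
                continue  # ignore the failure and move on to the next task
            log.append((task["name"], "failed"))
            break  # 'abort': stop the whole run
    return log


def fail():
    raise RuntimeError("simulated failure")


def succeed():
    pass


log = run_tasks([
    {"name": "Create container", "action": fail, "on_error": "skip"},
    {"name": "Import", "action": fail},      # defaults to 'abort'
    {"name": "Subset", "action": succeed},   # never reached
])
print(log)
```

Here the failure of the first task is tolerated, while the failure of the second one stops the run before the third task starts.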

<hr style="height:1px;border-top:1px solid #0000FF" />

The third task uses the *oph_subset* operator to extract a portion of the imported datacube.

A data dependency on the import task is defined in order to use the output cube from the previous task as input to this one.

In [None]:
t3 = e1.newTask(name="Subset",
                type="ophidia",
                operator='oph_subset', 
                arguments={'subset_filter': '30:70|-20:40', 
                           'subset_dims': 'lat|lon', 
                           'subset_type': 'coord', 
                           'description': 'Subsetted cube'},
                dependencies={t2:'cube'})
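
To read the ```subset_dims``` and ```subset_filter``` values used above, it helps to pair each dimension with its range. The helper below is a hypothetical sketch that handles only the simple `start:end` form appearing in this example; it is not the parser used by Ophidia.

```python
def parse_subset(subset_dims, subset_filter):
    """Pair each dimension name with its (start, end) coordinate range."""
    dims = subset_dims.split("|")
    ranges = subset_filter.split("|")
    if len(dims) != len(ranges):
        raise ValueError("subset_dims and subset_filter must have the same number of fields")
    parsed = {}
    for dim, value_range in zip(dims, ranges):
        start, end = value_range.split(":")
        parsed[dim] = (float(start), float(end))
    return parsed


selection = parse_subset("lat|lon", "30:70|-20:40")
print(selection)
```

With these values, the subset covers latitudes from 30 to 70 and longitudes from -20 to 40.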

<hr style="height:1px;border-top:1px solid #0000FF" />

Finally, the *oph_reduce* task is added to compute the average over the time series of the *Subsetted cube*.

Again, a data dependency is defined in order to use the output cube from the previous task as input to this one.

In [None]:
t4 = e1.newTask(name="Reduce",
                type="ophidia",
                operator='oph_reduce', 
                arguments={'operation': 'avg', 
                           'dim': 'time',
                           'description': 'Reduced cube'},
                dependencies={t3:'cube'})
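
Conceptually, an ```avg``` reduction over ```time``` collapses each time series to a single value while keeping the other dimensions. The following plain-Python sketch shows this on a tiny made-up array indexed as `data[lat][lon][time]`; it only illustrates the operation and does not reflect how Ophidia stores or processes datacubes.

```python
# Toy cube indexed as data[lat][lon][time]; values are made up for illustration.
data = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 20.0, 30.0]],
]


def average_over_time(cube):
    """Collapse the innermost (time) axis to its mean, keeping lat and lon."""
    return [[sum(series) / len(series) for series in row] for row in cube]


reduced = average_over_time(data)
print(reduced)  # [[2.0, 5.0], [8.0, 20.0]]
```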

<hr style="height:1px;border-top:1px solid #0000FF" />

We can check the workflow structure before submitting it.

In [None]:
e1.check(visual=True)

<hr style="height:7px;border-top:2px solid #0000FF" />

### 3. Submit the workflow

Now the workflow can be submitted to the workflow manager, which in turn takes care of executing the various tasks. This is done by wrapping the experiment in a ```Workflow``` object and calling ```w1.submit()```.

At this stage we need to set the values for the ```$1``` and ```$2``` variables defined for the *Import* task.

In [None]:
w1 = Workflow(e1)
w1.submit("/home/ophidia/notebooks/tasmin_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc", "tasmin")
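
The way positional values fill the ```$1``` and ```$2``` placeholders can be sketched as a simple substitution over the task arguments. The `substitute` helper and the file path below are hypothetical and only illustrate the idea; the actual substitution is performed by the workflow manager.

```python
import re


def substitute(arguments, *values):
    """Replace $1, $2, ... placeholders in the argument values with positional values."""
    def lookup(match):
        index = int(match.group(1))
        if index < 1 or index > len(values):
            raise IndexError(f"no value supplied for ${index}")
        return values[index - 1]

    return {key: re.sub(r"\$(\d+)", lookup, text) for key, text in arguments.items()}


# Arguments as they might appear in a task definition (placeholder path is made up)
template = {"input": "$1", "measure": "$2", "container": "example"}
resolved = substitute(template, "/path/to/file.nc", "tasmin")
print(resolved)
```

The first value passed at submission time replaces ```$1``` and the second replaces ```$2```, while arguments without placeholders are left unchanged.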

<hr style="height:1px;border-top:1px solid #0000FF" />

We can check the experiment execution graph:

In [None]:
w1.monitor(frequency=1, iterative=True, visual_mode=True)

<hr style="height:1px;border-top:1px solid #0000FF" />

At the end of its output, the function reports the overall status of the workflow. If the execution was completed successfully, this should be ```OPH_STATUS_COMPLETED```.

We can check the three datacubes created with the ```list``` PyOphidia function:

In [None]:
cube.Cube.list(level=2)

<hr style="height:7px;border-top:2px solid #0000FF" />

### 4. Final remarks

You've completed your first workflow with Ophidia. If you would like more technical information about the workflow features provided by Ophidia, check the documentation at [**http://ophidia.cmcc.it/documentation/users/workflow/index.html**](http://ophidia.cmcc.it/documentation/users/workflow/index.html).

Before moving to the other hands-on notebooks, clear the cube space:

In [None]:
cube.Cube.deletecontainer(container="example",force='yes')

<hr style="height:1px;border-top:1px solid #0000FF" />

You can now move to the last hands-on notebook [**2-Summer_days_workflow**](2-Summer_days_workflow.ipynb).

If you are interested in running other workflow examples, check the [**Examples**](../Examples/) folder.