# Basic workflow composition with Ophidia

This notebook provides a simple example of how to compose an Ophidia workflow that follows the workflow JSON schema and how to submit it, reusing some of the concepts shown during the demo.

A workflow allows users to encode a set of data processing and analytics steps in a reusable document. Moreover, the workflow manager can optimize the execution by running independent operations (tasks) concurrently.

### 1. Preliminary steps

As a first step, let's connect to the Ophidia Server. The **PyOphidia** module will be used to submit the workflow to the Ophidia workflow manager for the computation.

In [None]:
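# PyOphidia provides the Python bindings for the Ophidia framework
from PyOphidia import cube, client
# Connect to the Ophidia Server, reading the connection parameters from the environment
cube.Cube.setclient(read_env=True)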

In this example we will use a single NetCDF file produced by the CMCC-CESM model, containing the tasmin variable for the period 2096-2100. The file is located under ```/home/ophidia/notebooks/```.

In [None]:
import glob
glob.glob('/home/ophidia/notebooks/tasmin_*.nc')

### 2. Workflow composition

The workflow in this example consists of four sequential tasks:

1. Creation of a container for the datacubes 
2. Import of the NetCDF file
3. Extraction of a multi-dimensional subset
4. Computation of the average over the time series

The overall workflow structure is the following:
    
<img src="../imgs/Example_workflow.svg" alt="Example_workflow">

#### Define global arguments

First of all, we need to define the global arguments of the workflow, in particular its ```name``` and its ```exec_mode```, as shown in the JSON code.

```sync``` means that when submitting the workflow the function will block until the execution is completed. 

In [None]:
example_workflow = """{
        "name": "Example workflow",
        "author": "CMCC",
        "abstract": "Example workflow",
        "exec_mode": "sync",
        "ncores": "1",
        "cwd": "/",
        "tasks":
        [
"""
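The cells in this notebook build the JSON document by concatenating string fragments. As a minimal sketch of an alternative (not the approach used in the rest of this notebook), the same global arguments can be kept in a Python dictionary and serialized with the standard `json` module, which guarantees well-formed output. The variable names `workflow` and `workflow_json` are chosen here for illustration only.

```python
import json

# Global arguments, mirroring the values used in the header cell above
workflow = {
    "name": "Example workflow",
    "author": "CMCC",
    "abstract": "Example workflow",
    "exec_mode": "sync",
    "ncores": "1",
    "cwd": "/",
    "tasks": [],  # task dictionaries would be appended to this list
}

# Serialize the dictionary to a JSON string
workflow_json = json.dumps(workflow, indent=4)
print(workflow_json)
```

With this layout each task is a plain dictionary appended to `workflow["tasks"]`, so commas and brackets never need to be balanced by hand.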

#### Define the tasks

The first task of the workflow is the *oph_createcontainer* operator to create an empty container to organize all the datacubes imported and produced during the workflow execution.

The ```on_error``` argument is set to ```skip``` so that the task is simply skipped in case of error, for example if a container with the same name already exists.

In [None]:
example_workflow += """
                {
                        "name": "Create container",
                        "operator": "oph_createcontainer",
                        "arguments": 
                        [
                                "container=example",
                                "dim=lat|lon|time",
                                "dim_type=double|double|double",
                                "hierarchy=oph_base|oph_base|oph_time"
                        ],
                        "on_error": "skip"
                },
"""

The second task is the *oph_importnc* operator, which loads data from a NetCDF file into an Ophidia datacube.

In this case we use the ```$1``` and ```$2``` variables to set the operator's ```src_path``` and ```measure``` arguments at runtime. The same could be applied to any other argument.

A flow dependency with respect to the previous task is set, in order to run the data import only after the container has been created. 

The ```on_error``` argument is set to ```break``` so that the whole workflow stops if the task fails. This is the default value, so it does not need to be specified for each task.

In [None]:
example_workflow += """
                {
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$1",                            
                                "measure=$2",
                                "container=example",
                                "imp_dim=time",
                                "description=Imported cube"
                        ],
                        "dependencies": [
                                { "task": "Create container"}
                        ],
                        "on_error": "break"
                },   
"""
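To make the role of ```$1``` and ```$2``` concrete, the following sketch mimics positional substitution on the task arguments. It is only an illustration of the idea: the real substitution is performed by Ophidia when the workflow is submitted, and the helper `substitute_params` and the path `/data/file.nc` are hypothetical.

```python
import re

def substitute_params(text, *params):
    """Replace $1, $2, ... in text with the matching positional value (illustrative only)."""
    def repl(match):
        index = int(match.group(1))
        # Placeholders without a matching value are left untouched
        if 1 <= index <= len(params):
            return str(params[index - 1])
        return match.group(0)
    return re.sub(r"\$(\d+)", repl, text)

arguments = ["src_path=$1", "measure=$2", "container=example"]
resolved = [substitute_params(arg, "/data/file.nc", "tasmin") for arg in arguments]
print(resolved)
```

The first value passed at submission time fills ```$1```, the second fills ```$2```, and arguments without placeholders are unchanged.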

The third task is an *oph_subset* operator, which extracts a portion of the imported datacube.

A data dependency of ```"type": "single"``` on the import task is defined in order to use the output cube from the previous task as input to this one.

In [None]:
example_workflow += """
                {
                        "name": "Subset",
                        "operator": "oph_subset",
                        "arguments":
                        [
                                "subset_filter=30:70|-20:40",
                                "subset_dims=lat|lon",
                                "subset_type=coord",
                                "description=Subsetted cube"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single" }
                        ]
                },
"""
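The ```subset_filter``` and ```subset_dims``` arguments are paired position by position: the first range applies to ```lat``` and the second to ```lon```. The sketch below only illustrates that pairing in plain Python; it is not how Ophidia implements subsetting. The helper `parse_subset`, the sample latitude values, and the assumption that range bounds are inclusive are all introduced here for illustration.

```python
def parse_subset(subset_dims, subset_filter):
    """Pair each dimension name with its (low, high) coordinate range."""
    bounds = {}
    for dim, rng in zip(subset_dims.split("|"), subset_filter.split("|")):
        low, high = rng.split(":")
        bounds[dim] = (float(low), float(high))
    return bounds

bounds = parse_subset("lat|lon", "30:70|-20:40")
print(bounds)

# Made-up latitude coordinates, filtered with the lat range (bounds assumed inclusive)
lats = [10.0, 35.0, 50.0, 75.0]
low, high = bounds["lat"]
selected = [value for value in lats if low <= value <= high]
print(selected)
```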

Finally, an *oph_reduce* task is added to compute the average over the time series of the *Subsetted cube*.

Again a data dependency of ```"type": "single"``` is defined in order to use the output cube from the previous task as input to this one.

In [None]:
example_workflow += """
                {
                        "name": "Reduce",
                        "operator": "oph_reduce",
                        "arguments":
                        [
                                "operation=avg",
                                "description=Reduced cube"
                        ],
                        "dependencies": [
                                { "task": "Subset", "type": "single" }
                        ]
                }                
        ]
}
"""
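Since the workflow is assembled from string fragments, a missing comma or bracket is easy to introduce. As a hedged sketch, the following check parses a workflow string with the standard `json` module, verifies that every dependency names a defined task, and derives an order in which the tasks could run. The function `check_workflow` and the shortened `sample` document are hypothetical and stand in for the full workflow built above; this is not the scheduling logic of the Ophidia workflow manager.

```python
import json

def check_workflow(text):
    """Parse a workflow JSON string and return the task names in a valid execution order."""
    wf = json.loads(text)  # raises json.JSONDecodeError if the JSON is malformed
    deps = {}
    for task in wf["tasks"]:
        deps[task["name"]] = [d["task"] for d in task.get("dependencies", [])]
    # Every dependency must refer to a task defined in the workflow
    for name, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"Task '{name}' depends on unknown task '{parent}'")
    # Repeatedly pick the tasks whose dependencies are all satisfied
    order, done = [], set()
    while len(order) < len(deps):
        ready = [n for n in deps if n not in done and all(p in done for p in deps[n])]
        if not ready:
            raise ValueError("Cyclic dependency between tasks")
        order.extend(ready)
        done.update(ready)
    return order

sample = """{
    "name": "Sample", "exec_mode": "sync",
    "tasks": [
        {"name": "Create container", "operator": "oph_createcontainer", "arguments": []},
        {"name": "Import", "operator": "oph_importnc", "arguments": ["src_path=$1"],
         "dependencies": [{"task": "Create container"}]},
        {"name": "Subset", "operator": "oph_subset", "arguments": [],
         "dependencies": [{"task": "Import", "type": "single"}]}
    ]
}"""
print(check_workflow(sample))
```

In the notebook, the same function could be called on ```example_workflow``` before submission to catch formatting mistakes early.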

### 3. Submit the workflow

Now the ```example_workflow``` workflow can be submitted to the Ophidia workflow manager, which in turn will take care of executing the various tasks. This can be done with the ```cube.Cube.client.wsubmit``` method.

At this stage we need to set the values for the ```$1``` and ```$2``` variables defined for the *Import* task.

In [None]:
cube.Cube.client.wsubmit(example_workflow, 
                         "/home/ophidia/notebooks/tasmin_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc", 
                         "tasmin")

The top part of the output reports the overall status of the workflow. If the execution completed successfully, this should be ```OPH_STATUS_COMPLETED```.

The next table summarizes the total number of tasks in the workflow that were executed and how many of them completed successfully.

The third table reports the status of each of the tasks composing the workflow.

Finally, the execution time is reported at the end of the output.

We can check the three datacubes created with the ```list``` PyOphidia function:

In [None]:
cube.Cube.list(level=2)

### 4. Final remarks

You've completed your first workflow with Ophidia. For more technical information about the workflow features provided by Ophidia, check the documentation: [**http://ophidia.cmcc.it/documentation/users/workflow/index.html**](http://ophidia.cmcc.it/documentation/users/workflow/index.html).

Before moving to the other hands-on notebooks, clear the cube space:

In [None]:
cube.Cube.deletecontainer(container="example",force='yes')

You can now move to the last hands-on notebook [**2-Summer_days_workflow**](2-Summer_days_workflow.ipynb).

If you are interested in running other workflow examples, check the [**Examples**](../Examples/) folder.