## Practical examples of simple workflow building and integration with PyOphidia module

First of all, import the **PyOphidia** module and set up a connection to the **Ophidia Server**.

In [None]:
import sys
from PyOphidia import cube,client
cube.Cube.setclient(read_env=True)

### 1. Basic workflow: import + subset + export

The following JSON object is an example of a workflow with 3 tasks:
- one independent task: *Import*, which imports a NetCDF file into a datacube
- two dependent tasks: *Subset* and *Export*, which perform a subsetting operation along the dimensions of the datacube and export the result into a new NetCDF file.


<img src="1_Basic_workflow.svg" alt="basic_workflow" width="100">


In [None]:
import os
# Create the output directory if it does not exist yet
os.makedirs("/data/output", exist_ok=True)

In [None]:
workflow = """{
        "name": "Basic workflow",
        "author": "CMCC",
        "abstract": "Perform some basics operations using workflows",
        "exec_mode": "sync",
        "ncores": "2",
        "on_exit": "oph_delete",
        "cwd": "/",
        "tasks":
        [
                {
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$1",
                                "measure=$2",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp"
                        ]
                },        
                {
                        "name": "Subset",
                        "operator": "oph_subset",
                        "arguments":
                        [
                                "subset_filter=JJA",
                                "subset_dims=time",
                                "subset_type=coord",
                                "description=JJA"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single", "argument":"cube" }
                        ]
                },
                {
                        "name": "Export",
                        "operator": "oph_exportnc2",
                        "arguments": [
                            "output_name=JJA",
                            "output_path=/data/output"
                        ],
                        "dependencies": [
                            { "task": "Subset", "type": "single"}
                        ]
                }
        ]
}"""

Let's start building the first workflow step by step!

First of all, we define several **global attributes**, which include metadata and default parameter values common to all the tasks.
Some of these keywords are mandatory:
- **name**: the title of the workflow
- **author**: the author’s name
- **abstract**: a short description of the workflow

The **exec_mode** parameter specifies the execution mode, synchronous or asynchronous, and applies to the entire workflow, not to single tasks. In synchronous mode the workflow is executed in a blocking way, so the submitter has to wait until it finishes before the results are displayed. In asynchronous mode the workflow is processed in a non-blocking way (like a batch job), so the submitter regains control immediately and can submit other requests in the meantime. After sending an asynchronous request, the user can retrieve the workflow output (if available) with the **view** command.

Since a lot of tasks could be launched in parallel, an important parameter is the number of cores per task (**ncores**) which specifies the default value to be associated with all the workflow’s tasks. This value can be overridden with another one tailored to a task with a different behaviour.

With the **on_exit** parameter the user can select the cubes that will be deleted when a workflow ends.
The default value of *on_exit* can be set as a global attribute. Allowed values are:
- oph_delete: remove the output cube
- oph_deletecontainer: remove the output container (valid only for OPH_CREATECONTAINER)
- nop: no operation has to be applied (default).
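
The global attributes described above can also be assembled as a Python dictionary and serialized with `json.dumps`, which avoids hand-written quoting mistakes. The sketch below is only an illustration: the helper `make_workflow_header` and its mandatory-key check are assumptions of this example, not part of PyOphidia.

```python
import json

# Keys listed above as mandatory global attributes
MANDATORY_KEYS = ("name", "author", "abstract")

def make_workflow_header(**attributes):
    """Return a workflow skeleton (hypothetical helper, not a PyOphidia API)."""
    missing = [key for key in MANDATORY_KEYS if key not in attributes]
    if missing:
        raise ValueError("missing mandatory attributes: " + ", ".join(missing))
    header = dict(attributes)
    header["tasks"] = []  # tasks are appended later, one dictionary per task
    return header

header = make_workflow_header(
    name="Basic workflow",
    author="CMCC",
    abstract="Perform some basic operations using workflows",
    exec_mode="sync",
    ncores="2",
    on_exit="oph_delete",
    cwd="/",
)
print(json.dumps(header, indent=4))
```

The resulting string has the same shape as the hand-written header used in this notebook, with an empty `tasks` array.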

In [None]:
workflow = """{
        "name": "Basic workflow",
        "author": "CMCC",
        "abstract": "Perform some basics operations using workflows",
        "exec_mode": "sync",
        "ncores": "2",
        "on_exit": "oph_delete",
        "cwd": "/",
        "tasks":
        [
"""        

Each task is uniquely identified within the workflow by its **name** and it is related to a specific Ophidia Operator set as **operator**. According to that operator, the user can optionally insert a JSON array of key-value pairs (**arguments**) in order to call the operator with the appropriate arguments.

The first task of the workflow is related to the *oph_importnc* operator. According to the operator specification, we need to specify the mandatory arguments *src_path* and *measure*. In this example, the two argument values, $1 and $2, will be replaced with the workflow input parameters before the request is sent to the Ophidia Server.

In addition, we can specify other arguments such as:
- *import_metadata=yes* to import also metadata from the input NetCDF file
- *imp_dim=time* to arrange data in order to operate on time series
- *imp_concept_level=d* to set the concept level to *day*

In [None]:
workflow += """

{
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$1",
                                "measure=$2",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp"
                        ]
                }, 

"""

The second task is related to the *oph_subset* operator. 

For example, we can consider the whole spatial domain and specify a subset only on the time range, as indicated by the *subset_dims* parameter.
We can select a particular season by using the corresponding code for the *subset_filter* argument:
- DJF for winter
- MAM for spring
- JJA for summer
- SON for autumn
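
Each season code listed above is made of the initials of three consecutive months (for example, JJA is June, July and August). As a plain-Python illustration, and not the way Ophidia evaluates the filter, the mapping can be written as follows; the names `SEASON_MONTHS` and `season_of` are introduced only for this sketch.

```python
# Months (1-12) covered by each season code
SEASON_MONTHS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}

def season_of(month):
    """Return the season code that contains the given month number."""
    for code, months in SEASON_MONTHS.items():
        if month in months:
            return code
    raise ValueError(f"month must be between 1 and 12, got {month}")

print(season_of(7))
```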

The input cube is the output cube of the previous task (*Import*), so we have to specify a dependency between these two tasks.

The dependencies can be specified by a JSON array put in the section **dependencies** of the dependent task (child). Each item of this array is a JSON object related to a specific parent task which the child depends on. 
In general, it reports: the name of the **parent task** and the **type** of the dependency:
- *single*: if the child task exploits only one output of parent task;
- *all*: if the child task processes all the outputs of parent task (e.g. dependencies between massive operations);
- *embedded*: to specify a simple flow dependency (the child task has to begin only after the parent task has finished), with no dependency on the outputs of the parent task

By default, for the first two options, the **argument** parameter of a dependency is set to **cube**, so it can usually be omitted.
It is also possible to name a different operator argument whose value depends on the output produced by another task.

In the example below, we set
- **task** equal to the name of the parent task (i.e. Import);
- **type** equal to **single** since the child task will use only one output from the parent task.

We could omit **argument**, since by default the *cube* parameter of the *oph_subset* operator is set to the PID of the cube imported by the *Import* task; the cell below nevertheless states it explicitly.
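
Because dependencies refer to parent tasks by name, a misspelled name is an easy mistake to make. The sketch below checks, on the client side, that every dependency points to an existing task and derives an execution order that respects the dependencies. It assumes Python 3.9 or later (for `graphlib`); the function `execution_order` is a hypothetical helper and does not reproduce the scheduling logic of the Ophidia Workflow manager.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def execution_order(tasks):
    """Return task names in an order that respects the declared dependencies."""
    names = {task["name"] for task in tasks}
    graph = {}
    for task in tasks:
        parents = [dep["task"] for dep in task.get("dependencies", [])]
        unknown = [parent for parent in parents if parent not in names]
        if unknown:
            raise ValueError(f"{task['name']} depends on unknown task(s): {unknown}")
        graph[task["name"]] = parents  # node -> tasks that must finish first
    return list(TopologicalSorter(graph).static_order())

# Reduced version of the basic workflow, deliberately listed out of order
tasks = [
    {"name": "Export", "dependencies": [{"task": "Subset", "type": "single"}]},
    {"name": "Subset", "dependencies": [{"task": "Import", "type": "single"}]},
    {"name": "Import"},
]
print(execution_order(tasks))
```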


In [None]:
workflow+= """

{
                        "name": "Subset",
                        "operator": "oph_subset",
                        "arguments":
                        [
                                "subset_filter=JJA",
                                "subset_dims=time",
                                "subset_type=coord",
                                "description=JJA"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single", "argument": "cube" }
                        ]
                },

"""

Finally, we can export the subsetted cube by using the *oph_exportnc2* operator.

In this case we have to:
- provide arguments for the oph_exportnc2 operator: *output_name* and *output_path*
- set a **'single'** dependency from the previous (i.e. Subset) task

In [None]:
workflow+= """

{
                        "name": "Export",
                        "operator": "oph_exportnc2",
                        "arguments": [
                            "output_name=JJA",
                            "output_path=/data/output"
                        ],
                        "dependencies": [
                            { "task": "Subset", "type": "single"}
                        ]
                }
        ]
}

"""

The JSON object related to our first workflow is ready!
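
Since the workflow string is built by concatenating fragments, a stray comma or a missing bracket is easy to introduce. A quick local check with the standard `json` module can catch such problems before submission. This is a minimal sketch: `check_workflow` is a hypothetical helper, and the short `fragment` string only stands in for the full workflow.

```python
import json

def check_workflow(text):
    """Parse a workflow string and return its task names, or raise ValueError."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as error:
        raise ValueError(f"invalid workflow JSON: {error}") from error
    return [task["name"] for task in document.get("tasks", [])]

# Tiny stand-in assembled from fragments, in the same way as the cells above
fragment = '{ "name": "Demo", "author": "CMCC", "abstract": "Demo", "tasks": ['
fragment += '{ "name": "Import", "operator": "oph_importnc", "arguments": [] }'
fragment += '] }'
print(check_workflow(fragment))
```

In this notebook, the same check could be applied to the `workflow` string just before submitting it.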

We have to define the workflow input arguments:
- path to nc file
- nc filename 
- variable to be imported

In [None]:
path="/data/"
file="tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc"
variable="tasmax"

To submit the workflow within the notebook we can use the **wsubmit** PyOphidia method, providing:
- the workflow (here, the JSON string built above)
- the input parameters

In our first example the workflow has two parameters:
- the source path used in the oph_importnc operator to load the NetCDF file
- the variable name used to set the measure argument in the import operator

The output of the workflow is a report with the final status of each task. By default three objects are included:
 - a text object **Workflow Status**, which reports the workflow status;
 - a table object **Workflow Progress**, which reports the total number of tasks and the number of completed tasks;
 - a table object **Workflow Task List**, which reports a list with some information about each task: job identifier, Marker ID, Task Name, Task Type (simple or massive) and Task Status.

In [None]:
cube.Cube.client.wsubmit(workflow,path+file, variable)

Check the exported file:

In [None]:
! ls -lh /data/output | grep "\.nc"

The workspace is already empty (we have no datacubes) because we've used **"on_exit": "oph_delete"** as global attribute.

In [None]:
cube.Cube.list(level=2)

We just need to remove the container automatically created by the *oph_importnc* operator. By default, the container name is equal to the name of the imported file.

In [None]:
cube.Cube.deletecontainer(container=file,force='yes')

### 2. Workflows: iterative and parallel interfaces

Let's now consider a slightly more complex workflow, in which we are going to:
- import and subset multiple NetCDF files in parallel
- merge all the subsetted datacubes
- perform a reduction (avg, max, min, ...) operation
- export the output datacube

<img src="2_Iterative_parallel.svg" alt="iterative_parallel" width="800">

As input files, we can use the daily NetCDF files produced by the CMCC-CM model and related to the *tasmax* variable for the years 2011-2013. 

In [None]:
! ls /data

The JSON object associated to the workflow consists of several tasks. In addition to the Ophidia Data Import/Export & Analysis operators, we are going to exploit the **OPH_FOR** and **OPH_ENDFOR** flow control operators to implement a "for" loop. 

Unlike other Ophidia operators, they do not operate on data or metadata; instead, they set flow control rules for the Workflow manager. In particular, they begin and end a sub-section that has to be executed several times.

In [None]:
workflow_loop = """{
        "name": "Loop operations",
        "author": "CMCC",
        "abstract": "Perform some basics operations using workflows",
        "exec_mode": "sync",
        "ncores": "1",
        "on_exit": "oph_delete",
        "cwd": "/",
        "tasks":
        [
                {
                        "name": "Create container",
                        "operator": "oph_createcontainer",
                        "arguments": 
                        [
                                "container=workflow",
                                "dim=lat|lon|time",
                                "dim_type=double|double|double",
                                "hierarchy=oph_base|oph_base|oph_time"
                        ]
                },
                {
                        "name": "Start loop",
                        "operator": "oph_for",
                        "arguments": 
                        [
                                "key=year",
                                "values=2011|2012|2013",
                                "parallel=yes"
                        ],
                        "dependencies": [
                                { "task": "Create container"}
                        ]
                },
                {
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=/data/tasmax_day_CMCC-CM_rcp85_r1i1p1_@{year}0101-@{year}1231.nc",                           
                                "measure=$1",
                                "container=workflow",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp @{year}"
                        ],
                        "dependencies": [
                                { "task": "Start loop"}
                        ]
                },   
                {
                        "name": "Subset",
                        "operator": "oph_subset",
                        "arguments":
                        [
                                "subset_filter=JJA",
                                "subset_dims=time",
                                "subset_type=coord",
                                "description=JJA @{year}"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single" },
                                { "task": "Start loop"}
                        ]
                        
                },
                {
                        "name": "End loop year",
                        "operator": "oph_endfor",
                        "arguments": [],
                        "dependencies": [
                                { "task": "Subset", "type": "all" }
                        ]
                },
                {
                        "name": "Merge",
                        "operator": "oph_mergecubes",
                        "arguments":
                        [
                                "description=Merged cube"
                        ],
                        "dependencies": [
                                { "task": "End loop year", "type": "all", "argument": "cubes" }
                        ]
                        
                },
                {
                    "name": "Reduce",
                    "operator": "oph_reduce",
                    "arguments": [
                        "operation=avg",
                        "description=Reduced cube",
                        "dim=time"
                    ],
                    "dependencies": [
                        { "task": "Merge", "type": "single" }
                    ]
                },
                {
                        "name": "Export",
                        "operator": "oph_exportnc2",
                        "arguments": [
                            "output_name=avg_JJA",
                            "output_path=/data/output"
                        ],
                        "dependencies": [
                            { "task": "Reduce", "type": "single"}
                        ]
                },
                {
                        "name": "Delete container",
                        "operator": "oph_deletecontainer",
                        "arguments": [
                                "container=workflow",
                                "force=yes"
                        ],
                        "dependencies": [
                                { "task": "Export", "type": "embedded" }
                        ]
                }
                
        ]
}"""

Let's build the workflow step by step.

In the following cell we define some global attributes as in the previous example.

In [None]:
workflow_loop = """{
        "name": "Loop operations",
        "author": "CMCC",
        "abstract": "Perform some basics operations using workflows",
        "exec_mode": "sync",
        "ncores": "1",
        "on_exit": "oph_delete",
        "cwd": "/",
        "tasks":
        [
"""

**Create a new container**

This is the first task, so it has no dependencies. We just have to provide the proper arguments to the *oph_createcontainer* operator:
- the container name
- the name and the type of the dimensions allowed
- the concept hierarchy name of the dimensions

In [None]:
workflow_loop += """
{
                        "name": "Create container",
                        "operator": "oph_createcontainer",
                        "arguments": 
                        [
                                "container=workflow",
                                "dim=lat|lon|time",
                                "dim_type=double|double|double",
                                "hierarchy=oph_base|oph_base|oph_time"
                        ]
                },
"""

**FOR statement**

The OPH_FOR operator is used to configure the iterative block and, in particular, to set the number N of iterations to be executed. To this end, we have to provide an ordered list of N labels, one per iteration, so that each iteration can be distinguished from the others. The list is assigned to the **values** parameter, with values separated by |.

In our example, we provide a list of years in order to import the corresponding NetCDF file in the next task.

A name has to be associated with the list of values by setting the **key** parameter (e.g. *year*), which is used in the inner tasks in the form **@{key_name}** to access the current value of the counter/label.

We set **parallel** to *"yes"* for parallel processing.
If this option is enabled for an OPH_FOR, the engine transforms the workflow, before executing it, into an equivalent version in which iterative blocks are expanded into N independent sub-workflows, where N is the number of iterations. The new workflow is then executed following the usual rules based on task dependencies.
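
The expansion described above can be pictured with a small client-side sketch that copies the inner tasks once per label and replaces `@{year}` in their arguments. It only illustrates the idea and is not the engine's actual algorithm; the function `expand_parallel_for` and the naming scheme of the copies (for example `Import (2011)`) are assumptions of this example.

```python
import copy

def expand_parallel_for(templates, key, values):
    """Return one copy of each template task per value, with @{key} replaced."""
    placeholder = "@{" + key + "}"
    expanded = []
    for value in values:
        for template in templates:
            task = copy.deepcopy(template)
            # Hypothetical naming scheme to keep the copies distinguishable
            task["name"] = f"{template['name']} ({value})"
            task["arguments"] = [
                argument.replace(placeholder, value)
                for argument in template["arguments"]
            ]
            expanded.append(task)
    return expanded

# Reduced versions of the two inner tasks of the loop
templates = [
    {"name": "Import", "arguments": ["description=Max Temp @{year}"]},
    {"name": "Subset", "arguments": ["description=JJA @{year}"]},
]
expanded = expand_parallel_for(templates, "year", ["2011", "2012", "2013"])
for task in expanded:
    print(task["name"], task["arguments"])
```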

Finally, we define a simple flow dependency (*type=embedded*), since this task has to begin only after the previous "Create container" task has finished.


In [None]:
workflow_loop += """
{
                        "name": "Start loop",
                        "operator": "oph_for",
                        "arguments": 
                        [
                                "key=year",
                                "values=2011|2012|2013",
                                "parallel=yes"
                        ],
                        "dependencies": [
                                { "task": "Create container", "type":"embedded"}
                        ]
                },
"""

**Import and subset multiple datacubes in parallel**

The two inner tasks to be repeated (import and subset) have to depend on OPH_FOR either directly or indirectly, that is, through other tasks in the iterative block.

When setting the parameters of these tasks, the user can use the value of the label associated with the current iteration.

*IMPORT task*

The **src_path** and **description** parameters of the *oph_importnc* operator are parametrized so that they take the current value of the **year** key at each iteration.
This task has a simple flow dependency on the "Start loop" task, so that it starts after "Start loop" and retrieves the right value of the label associated with the current iteration.

In [None]:
workflow_loop += """
{
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=/data/tasmax_day_CMCC-CM_rcp85_r1i1p1_@{year}0101-@{year}1231.nc",                           
                                "measure=$1",
                                "container=workflow",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp @{year}"
                        ],
                        "dependencies": [
                                { "task": "Start loop", "type":"embedded"}
                        ]
                }, 
"""

*SUBSET task*

This task has two dependencies:
- a *flow* dependency from the "Start loop" task to get the current value of the label, which is used in the *description* parameter
- a *single* dependency from the "Import" task since each subset operation has to be performed on the corresponding datacube imported at the previous import step, so the output from the *Import* task is the input for the *Subset*

In [None]:
workflow_loop += """
{
                        "name": "Subset",
                        "operator": "oph_subset",
                        "arguments":
                        [
                                "subset_filter=JJA",
                                "subset_dims=time",
                                "subset_type=coord",
                                "description=JJA @{year}"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single" },
                                { "task": "Start loop", "type":"embedded"}
                        ]
                        
                },
"""

*End loop*

The OPH_ENDFOR operator ends an iterative block, has no arguments and depends on the inner tasks.

In our example, it depends on the "Subset" task and the dependency type is **all**. In this way, it can gather PIDs of all cubes generated by the (subset) inner task and transfer them to next tasks.

In [None]:
workflow_loop += """
{
                        "name": "End loop",
                        "operator": "oph_endfor",
                        "arguments": [],
                        "dependencies": [
                                { "task": "Subset", "type": "all" }
                        ]
                },
"""

**Merge all the subsetted datacubes into a single datacube**

All the subsetted datacubes can be now merged into a single datacube by using the **oph_mergecubes** operator: the resulting datacube will contain the JJA subset for each of the imported years. 

As for the previous task, we need to specify an **all** dependency to get all the datacubes PIDs from the previous task. In addition, we have to set the *argument* parameter to *cubes* so that the value of the *cubes* parameter for the *oph_mergecubes* operator will be set to a list of pipe-separated PIDs retrieved from the "End loop" task.

In [None]:
workflow_loop += """
{
                        "name": "Merge",
                        "operator": "oph_mergecubes",
                        "arguments":
                        [
                                "description=Merged cube"
                        ],
                        "dependencies": [
                                { "task": "End loop", "type": "all", "argument": "cubes" }
                        ]
                        
                },
"""

**Perform a reduction operation**

Starting from the merged datacube, we can perform a reduction operation with respect to the implicit dimension (time).

We just need to define a *single* dependency between the **Reduce** task and the previous **Merge** task.

The reduced cube will contain the average value for the tasmax variable over the 2011-2013 JJA period for each point in the spatial domain.
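
Conceptually, an *avg* reduction along time collapses each time series into a single value per spatial point. The sketch below shows this with plain Python on made-up numbers; the coordinates and values are toy data for illustration and say nothing about how Ophidia stores or processes the datacube.

```python
from statistics import mean

# Toy data (made up): one short time series per (lat, lon) point
series = {
    (45.0, 10.0): [300.0, 302.0, 304.0],
    (46.0, 10.0): [298.0, 299.0, 300.0],
}

# Average along time: one value is left for each spatial point
reduced = {point: mean(values) for point, values in series.items()}
for point, value in reduced.items():
    print(point, value)
```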

In [None]:
workflow_loop += """
{
                    "name": "Reduce",
                    "operator": "oph_reduce",
                    "arguments": [
                        "operation=avg",
                        "description=Reduced cube",
                        "dim=time"
                    ],
                    "dependencies": [
                        { "task": "Merge", "type": "single" }
                    ]
                },
"""

**Export the averaged datacube**

In a similar way, we can define an *"Export"* task that depends on the *"Reduce"* task to export data into a single NetCDF file.

In [None]:
workflow_loop += """
{
                        "name": "Export",
                        "operator": "oph_exportnc2",
                        "arguments": [
                            "output_name=avg_JJA",
                            "output_path=/data/output"
                        ],
                        "dependencies": [
                            { "task": "Reduce", "type": "single"}
                        ]
                },
"""

**Empty workspace**

In [None]:
workflow_loop += """
{
                        "name": "Delete container",
                        "operator": "oph_deletecontainer",
                        "arguments": [
                                "container=workflow",
                                "force=yes"
                        ],
                        "dependencies": [
                                { "task": "Export", "type": "embedded" }
                        ]
                }
                
        ]
}"""

**Run workflow**

In [None]:
cube.Cube.client.wsubmit(workflow_loop,variable)

In [None]:
! ls -lh /data/output | grep "\.nc"

###  3. Workflows: Selection interface

The Selection interface provides further flexibility by enabling the Workflow manager to execute one or more tasks based on boolean conditions that could be checked at run-time and depend on input parameters, data, metadata, etc.

The development of the Selection interface involved the design of new Ophidia operators:
 - OPH_IF
 - OPH_ELSEIF
 - OPH_ELSE
 - OPH_ENDIF

Similarly to other flow control operators, they do not process data or metadata directly, but they can be used to enable (or skip) the execution of a set of tasks based on run-time conditions.

In the following workflow, we'll consider a selection statement with two selection blocks.

<img src="3_Selection_Interface.svg" alt="selection_interface" width="800">



In [None]:
workflow_if = """{
        "name": "Selection Interface",
        "author": "CMCC",
        "abstract": "Selection statement with two selection blocks",
        "exec_mode": "sync",
        "ncores": "1",
        "tasks":
        [
                {
                        "name": "IF block",
                        "operator": "oph_if",
                        "arguments": [ "condition=$1" ]
                },
                {
                        "name": "Import and subset",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$2",
                                "measure=$3",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp imported and subsetted",
                                "subset_dims=lat|lon|time",
                                "subset_filter=$4",
                                "subset_type=coord"
                        ],
                        "dependencies":
                        [
                                { "task": "IF block" }
                        ]
                },
                {
                        "name": "ELSE block",
                        "operator": "oph_else",
                        "arguments": [  ],
                        "dependencies":
                        [
                                { "task": "IF block" }
                        ]
                },
                {
                        "name": "Import data",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$2",
                                "measure=$3",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp imported"
                        ],
                        "dependencies":
                        [
                                { "task": "ELSE block" }
                        ]
                },
                {
                        "name": "Subset data",
                        "operator": "oph_subset",
                        "arguments":
                        [
                               "subset_dims=lat|lon|time",
                               "subset_filter=$4",
                               "subset_type=coord",
                               "description=Max Temp subsetted" 
                        ],
                        "dependencies":
                        [
                                { "task": "Import data",
                                  "type": "single" }
                        ]
                },
                {
                        "name": "Selection block end",
                        "operator": "oph_endif",
                        "arguments": [ ],
                        "dependencies":
                        [
                                { "task": "Subset data"},
                                { "task": "Import and subset" }
                        ]
                }
        ]
}"""

**Workflow global attributes**

In [None]:
workflow_if = """{
        "name": "Selection Interface",
        "author": "CMCC",
        "abstract": "Selection statement with two selection blocks",
        "exec_mode": "sync",
        "ncores": "1",
        "tasks":
        [
"""

**IF block**

The selection interface is used to code two possible implementations of a task that imports data into the Ophidia platform from an external source:
 
     A) import only the subset from the input file
     B) import all the dataset and then extract a data subset

The actual implementation to be adopted is selected by means of the input parameter $1: a numerical non-zero value for option A, 0 for option B.
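
The rule stated above (non-zero selects option A, zero selects option B) can be mirrored locally to see which tasks would run for a given value of `$1`. This sketch covers only the numeric case described here; `tasks_to_run` is a hypothetical helper, and the task names are those used in this workflow.

```python
def tasks_to_run(condition):
    """Return the task names executed for a numeric condition value."""
    if float(condition) != 0:
        # Option A: import only the subset from the input file
        return ["Import and subset"]
    # Option B: import the whole dataset, then extract the subset
    return ["Import data", "Subset data"]

print(tasks_to_run(1))
print(tasks_to_run(0))
```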

In [None]:
workflow_if += """
{
                        "name": "IF block",
                        "operator": "oph_if",
                        "arguments": [ "condition=$1" ]
                },
"""

**CASE A: Import only the subset from the input file**

In general, the set of tasks on the branch that starts at OPH_IF and ends at OPH_ENDIF is the sub-workflow executed when the condition set for OPH_IF is satisfied.

In our example, there is only one such task, named *"Import and subset"*, which uses the *oph_importnc* operator and has a flow dependency on the "IF block" task.

The *src_path* and *measure* arguments are set from the second and third workflow input arguments.

To import only a subset of the input file, we also have to specify the following parameters:
- **subset_dims**: the names of the dimensions used for the subsetting
- **subset_type=coord**: the filter is applied to dimension values
- **subset_filter**: pipe-separated list of filters, one for each dimension specified in *subset_dims* (set from the fourth workflow input argument)
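
The pairing between *subset_dims* and *subset_filter* can be illustrated with a short sketch. It is not part of PyOphidia: `pair_filters` is a hypothetical helper, and it assumes every filter is a simple `start:end` range, as in the filter string used later in this notebook.

```python
# Illustration only: pair each dimension in subset_dims with its filter.
subset_dims = "lat|lon|time"
subset_filter = "-50:10|20:140|150:240"  # value passed later as $4


def pair_filters(dims, filters):
    """Return {dimension: (start, end)} from two pipe-separated strings.

    Assumes every filter has the simple form start:end.
    """
    dim_names = dims.split("|")
    ranges = filters.split("|")
    if len(dim_names) != len(ranges):
        raise ValueError("subset_dims and subset_filter need the same number of entries")
    pairs = {}
    for name, rng in zip(dim_names, ranges):
        start, end = rng.split(":")
        pairs[name] = (float(start), float(end))
    return pairs


pairs = pair_filters(subset_dims, subset_filter)
print(pairs)
```

The filters are matched to the dimensions by position, so the order in *subset_filter* has to follow the order in *subset_dims*.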



In [None]:
workflow_if += """
{
                        "name": "Import and subset",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$2",
                                "measure=$3",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp imported and subsetted",
                                "subset_dims=lat|lon|time",
                                "subset_filter=$4",
                                "subset_type=coord"
                        ],
                        "dependencies":
                        [
                                { "task": "IF block" }
                        ]
                },
"""

**ELSE block**

The task with the OPH_ELSE operator has to be a child of the task with the OPH_IF operator. It takes no arguments: it simply starts the last sub-block of an "if" selection block.

In [None]:
workflow_if += """
{
                        "name": "ELSE block",
                        "operator": "oph_else",
                        "arguments": [  ],
                        "dependencies":
                        [
                                { "task": "IF block" }
                        ]
                },
"""

**CASE B: import all the dataset and then extract a data subset**

The set of tasks on the branch that starts at OPH_ELSE and ends at OPH_ENDIF is the sub-workflow executed when the condition set for OPH_IF is NOT satisfied.

In our example, we have two tasks:
- the first one, **"Import data"**, uses the *oph_importnc* operator and is a child of the task with the "OPH_ELSE" operator.
- the second one, **"Subset data"**, uses the *oph_subset* operator and has a "single" dependency on the "Import data" task, since the datacube to be subsetted is the one generated by the import task.

In [None]:
workflow_if += """
{
                        "name": "Import data",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=$2",
                                "measure=$3",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp imported"
                        ],
                        "dependencies":
                        [
                                { "task": "ELSE block" }
                        ]
                },
"""

In [None]:
workflow_if += """
{
                        "name": "Subset data",
                        "operator": "oph_subset",
                        "arguments":
                        [
                               "subset_dims=lat|lon|time",
                               "subset_filter=$4",
                               "subset_type=coord",
                               "description=Max Temp subsetted" 
                        ],
                        "dependencies":
                        [
                                { "task": "Import data",
                                  "type": "single" }
                        ]
                },

"""

**ENDIF block**

The *oph_endif* operator simply closes an "if" selection block.

To gather the PID of the output datacube produced by either branch, we have to specify a dependency on the final task of each sub-workflow (*"Subset data"* and *"Import and subset"*).

In [None]:
workflow_if += """
{
                        "name": "Selection block end",
                        "operator": "oph_endif",
                        "arguments": [ ],
                        "dependencies":
                        [
                                { "task": "Subset data"},
                                { "task": "Import and subset" }
                        ]
                }
        ]
}
"""

Define the input arguments:
- path to the NetCDF file
- NetCDF file name
- variable to be imported
- subset filter (lat|lon|time)
- flag evaluated by the IF..ELSE statement

In [None]:
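path = "/data/"
file = "tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc"
variable = "tasmax"
lat_lon_time = "-50:10|20:140|150:240"  # one filter per dimension: lat|lon|time
import_and_subset = 0  # 0 selects option B, any non-zero value selects option A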
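
Because the workflow string is assembled from several cells, a stray comma or a misspelled task name is easy to introduce. The sketch below is a plain-Python sanity check, not a PyOphidia feature: `check_workflow` is a hypothetical helper, and `sample` is a reduced stand-in for the workflow (operator arguments omitted) so that the block runs on its own.

```python
import json


def check_workflow(text):
    """Parse a workflow string and report dependencies on unknown tasks."""
    wf = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    names = [task["name"] for task in wf["tasks"]]
    missing = []
    for task in wf["tasks"]:
        for dep in task.get("dependencies", []):
            if dep["task"] not in names:
                missing.append((task["name"], dep["task"]))
    return names, missing


# Reduced stand-in for the selection workflow (arguments omitted).
sample = """{
    "name": "Selection Interface",
    "tasks": [
        {"name": "IF block", "operator": "oph_if", "arguments": ["condition=$1"]},
        {"name": "Import and subset", "operator": "oph_importnc", "arguments": [],
         "dependencies": [{"task": "IF block"}]},
        {"name": "ELSE block", "operator": "oph_else", "arguments": [],
         "dependencies": [{"task": "IF block"}]},
        {"name": "Import data", "operator": "oph_importnc", "arguments": [],
         "dependencies": [{"task": "ELSE block"}]},
        {"name": "Subset data", "operator": "oph_subset", "arguments": [],
         "dependencies": [{"task": "Import data", "type": "single"}]},
        {"name": "Selection block end", "operator": "oph_endif", "arguments": [],
         "dependencies": [{"task": "Subset data"}, {"task": "Import and subset"}]}
    ]
}"""

names, missing = check_workflow(sample)
print(names)
print("unknown dependencies:", missing)
```

In this notebook the same check could be applied to the assembled string with `check_workflow(workflow_if)`.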

Run the workflow

In [None]:
cube.Cube.client.wsubmit(workflow_if, import_and_subset, path + file, variable, lat_lon_time)
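
The values passed after the workflow string fill the placeholders `$1` to `$4` by position. The sketch below only illustrates that mapping; it is not PyOphidia's implementation, and `substitute_params` is a hypothetical helper.

```python
import re


def substitute_params(template, *params):
    """Replace $1, $2, ... in template with the matching positional value."""
    def repl(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(params):
            return str(params[index])
        return match.group(0)  # leave placeholders without a value untouched

    return re.sub(r"\$(\d+)", repl, template)


template = "condition=$1 src_path=$2 measure=$3 subset_filter=$4"
result = substitute_params(
    template,
    0,  # import_and_subset
    "/data/" + "tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc",
    "tasmax",
    "-50:10|20:140|150:240",
)
print(result)
```

With these values the flag ends up in `condition`, the full file path in `src_path`, the variable name in `measure`, and the filter string in `subset_filter`.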

Check the produced datacubes. Note that:
- if **import_and_subset** is non-zero (e.g. **1**), the datacube is imported and subsetted in a single step
- otherwise (**0**), the datacube is first imported, then subsetted
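
The two outcomes can be summarised with a small sketch of the selection logic. It only restates the branch structure described in this notebook, using the task names of the workflow; `tasks_to_run` is a hypothetical helper and does not query the Ophidia Server.

```python
def tasks_to_run(condition):
    """Return the task names expected to run for a given value of $1 (sketch)."""
    if condition != 0:
        # Option A: import and subset in a single task.
        branch = ["Import and subset"]
    else:
        # Option B: import the whole dataset, then subset it.
        branch = ["ELSE block", "Import data", "Subset data"]
    return ["IF block"] + branch + ["Selection block end"]


print(tasks_to_run(1))
print(tasks_to_run(0))
```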
    

In [None]:
cube.Cube.list(level=2)

Inspect the subsetted datacube

In [None]:
subsetted_cube = cube.Cube(pid='...')
subsetted_cube.info()

In [None]:
cube.Cube.deletecontainer(container=file, force='yes')