# Example of a workflow operating on multiple files

This notebook performs the same analysis (computing the **Icing Days index**) on several input datasets and then combines the resulting datacubes into a single datacube for a final statistical analysis. It includes:

- a set of key-value pairs as additional global attributes shared between all the tasks
- the task list as a JSON array
- some information regarding task dependencies 
- the use of parallel for loops

Starting from the daily maximum temperature TX, the Icing Days index is the number of days where $TX < T$ (T is a reference temperature, here 0°C, i.e. 273.15 K).
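
To make the definition concrete, here is a minimal NumPy sketch of the index on a tiny synthetic array. It is not the Ophidia implementation: the array values, the grid size and the name `T_REF` are assumptions made up for illustration.

```python
import numpy as np

# Synthetic daily maximum temperature in kelvin: 6 days on a 1x2 grid.
# Shape is (time, lat, lon); the values are invented for illustration.
tx = np.array([
    [[270.0, 275.0]],
    [[272.5, 274.0]],
    [[273.15, 271.0]],
    [[280.0, 269.9]],
    [[268.0, 273.2]],
    [[274.0, 276.0]],
])

T_REF = 273.15  # 0 °C expressed in kelvin (hypothetical constant name)

# Mask: 1 where TX < T, 0 elsewhere (strict inequality, as in the definition)
mask = (tx < T_REF).astype(int)

# Icing Days index: number of flagged days along the time axis
icing_days = mask.sum(axis=0)
print(icing_days)  # [[3 2]]
```

Note that a day with TX exactly equal to the threshold is not counted, because the inequality is strict.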

As a first step, let's connect to the Ophidia server.

In [None]:
import sys
from PyOphidia import cube,client
cube.Cube.setclient(read_env=True)

The JSON object associated to the workflow is shown in the cell below.

The task list includes the following tasks:

1. **Create container**
 - the *oph_createcontainer* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_CREATECONTAINER.html) is used to create a working container for the workflow datacubes
 - the task has no dependencies

2. **Start loop**
 - the *oph_for* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_FOR.html) starts a for loop whose iterations run concurrently thanks to the ```parallel=yes``` argument. The loop index is named by the ```key=year``` argument and takes its values from ```values=$1```, the first workflow input parameter, which is supplied at runtime
 - the task has an **embedded** dependency on the **Create container** task, since it only needs to wait for that task to complete

3. **Import**
 - the input NetCDF dataset located at ```src_path``` (the file name is built from the loop index through the **@{year}** keyword) is imported into the Ophidia platform, with maximum temperature in K (see http://ophidia.cmcc.it/documentation/users/operators/OPH_IMPORTNC.html)
 - the ```measure``` is set to ```tasmax```
 - data is arranged so that the analysis operates on time series (as indicated by the ```imp_dim``` parameter)
 - the task has an **embedded** dependency on the **Start loop** task, since it only needs to wait for that task to complete
 
4. **Icing Days mask**
 - the *oph_apply* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_APPLY.html) is used to identify the icing days: $\{day \mid TX(day) < 273.15\,K\}$
 - we are basically creating a mask by using the *oph_predicate* primitive (see http://ophidia.cmcc.it/documentation/users/primitives/OPH_PREDICATE.html)
 - the task has a **single** dependency on the **Import** task, since it uses only one output of the parent task
 
5. **Count icing days**
 - count the days below the given threshold on a yearly basis
 - the *oph_reduce2* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_REDUCE2.html) is used with ```operation=sum``` and ```concept_level=y```
 - **single** dependency on **Icing Days mask**

6. **End loop**
 - *oph_endfor* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_ENDFOR.html) to end the parallel for loop
 - **all** dependency on the **Count icing days** task, so that the datacubes produced by every branch of the loop are passed to the tasks outside the loop

7. **Merge datacubes**
 - *oph_mergecubes2* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_MERGECUBES2.html) to merge multiple datacubes and create a new dimension (```dim=new_dim```)
 - **all** dependency on the **End loop** task, so that all the datacubes from the for loop are passed as the ```cubes``` argument of the operator
 
8. **Final average**
 - *oph_reduce2* operator (see http://ophidia.cmcc.it/documentation/users/operators/OPH_REDUCE2.html) to compute the average (```operation=avg```) along the new dimension (```dim=new_dim```)
 - **single** dependency on **Merge datacubes**
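
The last two steps (merging along a new dimension, then averaging over it) can be pictured with a rough NumPy analogue. This is only an illustration of the data flow, not what the Ophidia operators do internally; the dictionary `yearly_counts` and its values are hypothetical.

```python
import numpy as np

# Hypothetical yearly icing-day counts on a 2x2 (lat, lon) grid, one array
# per loop iteration; the numbers are invented for illustration.
yearly_counts = {
    2096: np.array([[10, 0], [35, 2]]),
    2097: np.array([[12, 1], [30, 0]]),
    2098: np.array([[8, 2], [31, 1]]),
}

# "Merge datacubes": stack the per-year arrays along a new leading axis,
# which plays the role of the new_dim dimension.
merged = np.stack([yearly_counts[y] for y in sorted(yearly_counts)], axis=0)

# "Final average": reduce the new axis with a mean.
final = merged.mean(axis=0)

print(merged.shape)  # (3, 2, 2)
print(final)
```

The result has the same (lat, lon) shape as each yearly array, with one averaged value per grid cell.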
  
<img src="../imgs/Multi_year_Icing_Days.svg" alt="Multi_year_Icing_Days">

The tasks in the central box (the for loop) will be executed in parallel thanks to the ```parallel=yes``` option.

Note the use of the **@{year}** keyword to refer to the value of the loop index in a given iteration of the for loop.
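
The effect of the keyword can be mimicked with plain Python string replacement. The sketch below is only an illustration: the substitution is actually performed by the workflow engine, the helper `expand` is hypothetical, and the two year values are examples.

```python
# File-name template taken from the Import task; "@{year}" is the placeholder
# for the loop index.
template = "tasmax_day_CMCC-CESM_rcp85_r1i1p1_@{year}0101-@{year}1231.nc"


def expand(template, key, values):
    """Return one string per pipe-separated value, replacing every @{key}."""
    placeholder = "@{" + key + "}"
    return [template.replace(placeholder, value) for value in values.split("|")]


files = expand(template, "year", "2096|2097")
for name in files:
    print(name)
# tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-20961231.nc
# tasmax_day_CMCC-CESM_rcp85_r1i1p1_20970101-20971231.nc
```

Each iteration therefore imports a different file, while the task definition is written only once.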

In [None]:
multifile_workflow = """{
        "name": "Multi-year Icing Days",
        "author": "CMCC",
        "abstract": "Stats for Icing Days index over several years",
        "exec_mode": "sync",
        "ncores": "1",
        "cwd": "/",
        "tasks":
        [
                {
                        "name": "Create container",
                        "operator": "oph_createcontainer",
                        "arguments": 
                        [
                                "container=icing_days",
                                "dim=lat|lon|time",
                                "dim_type=double|double|double",
                                "hierarchy=oph_base|oph_base|oph_time"
                        ],
                        "on_error": "skip"
                },
                {
                        "name": "Start loop",
                        "operator": "oph_for",
                        "arguments": 
                        [
                                "key=year",
                                "values=$1",
                                "parallel=yes"
                        ],
                        "dependencies": [
                                { "task": "Create container"}
                        ]
                },
                {
                        "name": "Import",
                        "operator": "oph_importnc",
                        "arguments":
                        [
                                "src_path=/home/ophidia/notebooks/tasmax_day_CMCC-CESM_rcp85_r1i1p1_@{year}0101-@{year}1231.nc",                            
                                "measure=tasmax",
                                "container=icing_days",
                                "import_metadata=yes",
                                "imp_dim=time",
                                "imp_concept_level=d",
                                "vocabulary=CF",
                                "hierarchy=oph_base|oph_base|oph_time",
                                "description=Max Temp @{year}"
                        ],
                        "dependencies": [
                                { "task": "Start loop"}
                        ]
                },   
                {
                        "name": "Icing Days mask",
                        "operator": "oph_apply",
                        "arguments":
                        [
                                "measure_type=auto",
                                "query=oph_predicate(measure,'x-273.15','<0','1','0')",
                                "description=Icing days mask @{year}"
                        ],
                        "dependencies": [
                                { "task": "Import", "type": "single" }
                        ]
                },
                {
                        "name": "Count icing days",
                        "operator": "oph_reduce2",
                        "arguments":
                        [
                                "operation=sum",
                                "dim=time",
                                "concept_level=y",
                                "description=Icing days count @{year}"
                        ],
                        "dependencies": [
                                { "task": "Icing Days mask", "type": "single" }
                        ]
                },
                {
                        "name": "End loop",
                        "operator": "oph_endfor",
                        "arguments": [],
                        "dependencies": [
                                { "task": "Count icing days", "type": "all"}
                        ]
                },
                {
                        "name": "Merge datacubes",
                        "operator": "oph_mergecubes2",
                        "arguments": 
                        [
                                "dim=new_dim",
                                "description=Merged cube"
                        ],
                        "dependencies": [
                                { "task": "End loop", "type": "all", "argument": "cubes" }
                        ]
                },
                {
                        "name": "Final average",
                        "operator": "oph_reduce2",
                        "arguments": [
                                "operation=avg",
                                "dim=new_dim",
                                "description=Final cube"
                        ],
                        "dependencies": [
                                { "task": "Merge datacubes", "type": "single" }
                        ]
                }                
        ]
}"""
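
Before submitting, it can be handy to check the dependency chain encoded in the JSON. The following is a small sketch using only the standard library; the helper `summarize_dependencies` and the abbreviated `sample` string are hypothetical, and a dependency without a ```type``` key is reported as embedded, following the task descriptions above.

```python
import json


def summarize_dependencies(workflow_json):
    """Return (task name, operator, [(parent, dependency type), ...]) tuples."""
    workflow = json.loads(workflow_json)
    summary = []
    for task in workflow["tasks"]:
        parents = [
            # No "type" key is reported as "embedded" (assumption based on
            # the task descriptions in this notebook).
            (dep["task"], dep.get("type", "embedded"))
            for dep in task.get("dependencies", [])
        ]
        summary.append((task["name"], task["operator"], parents))
    return summary


# Abbreviated stand-in for the workflow string: names and dependencies only.
sample = """{
    "name": "sample",
    "tasks": [
        {"name": "Create container", "operator": "oph_createcontainer",
         "arguments": []},
        {"name": "Start loop", "operator": "oph_for", "arguments": [],
         "dependencies": [{"task": "Create container"}]},
        {"name": "Import", "operator": "oph_importnc", "arguments": [],
         "dependencies": [{"task": "Start loop"}]},
        {"name": "Icing Days mask", "operator": "oph_apply", "arguments": [],
         "dependencies": [{"task": "Import", "type": "single"}]}
    ]
}"""

for name, operator, parents in summarize_dependencies(sample):
    print(name, operator, parents)
```

In the notebook, the same helper could be called on the full definition with `summarize_dependencies(multifile_workflow)`.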

Once the workflow is defined, it can be executed very easily on different years by changing the related argument.

Let's define the workflow input arguments for the example

In [None]:
year_index="2096|2097|2098|2099|2100"

Submit the workflow. Note that, even though only 8 tasks are defined, the workflow management system will actually execute 20 tasks with the current ```year_index``` definition, because the three tasks inside the loop are run once for each of the five years.
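
One way to arrive at that figure is sketched below, assuming that the loop delimiters (**Start loop** and **End loop**) are each counted once and that only the tasks between them are replicated. The two lists are written by hand from the workflow definition above.

```python
year_values = "2096|2097|2098|2099|2100"

# Tasks executed once, outside the replicated part of the loop
tasks_outside_loop = ["Create container", "Start loop", "End loop",
                      "Merge datacubes", "Final average"]
# Tasks replicated for every value of the loop index
tasks_inside_loop = ["Import", "Icing Days mask", "Count icing days"]

iterations = len(year_values.split("|"))
total = len(tasks_outside_loop) + iterations * len(tasks_inside_loop)
print(total)  # 5 + 5 * 3 = 20
```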

In [None]:
cube.Cube.client.wsubmit(multifile_workflow, year_index)

We can plot a map of the result by using the PID of the 'Final cube' datacube, which can be found in the listing below.

In [None]:
cube.Cube.list(level=2)

In [None]:
# Paste the PID of 'Final cube' taken from the listing above
firstyear = cube.Cube(pid='http://127.0.0.1/ophidia/.../...')

In [None]:
%matplotlib inline
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from cartopy.mpl.geoaxes import GeoAxes
from cartopy.util import add_cyclic_point
import numpy as np
import warnings
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(15, 6), dpi=100)

#Add Geo axes to the figure with the specified projection (PlateCarree)
projection = ccrs.PlateCarree()
ax = plt.axes(projection=projection)

#Draw coastline and gridlines
ax.coastlines()

gl = ax.gridlines(crs=projection, draw_labels=True, linewidth=1, color='black', alpha=0.9, linestyle=':')
# top_labels/right_labels replace the deprecated xlabels_top/ylabels_right (cartopy >= 0.18)
gl.top_labels = False
gl.right_labels = False

data = firstyear.export_array(show_time='yes')
lat = data['dimension'][0]['values'][ : ]
lon = data['dimension'][1]['values'][ : ]
var = data['measure'][0]['values'][ : ]
var = np.reshape(var, (len(lat), len(lon)))

#Wraparound points in longitude
var_cyclic, lon_cyclic = add_cyclic_point(var, coord=np.asarray(lon))
x, y = np.meshgrid(lon_cyclic,lat)

#Define color levels for color bar
levStep = (np.nanmax(var)-np.nanmin(var))/20
clevs = np.arange(np.nanmin(var),np.nanmax(var)+levStep,levStep)

#Set filled contour plot
cnplot = ax.contourf(x, y, var_cyclic, clevs, transform=projection,cmap=plt.cm.Oranges)
plt.colorbar(cnplot,ax=ax)

ax.set_aspect('auto', adjustable=None)

plt.title('Icing Days Average')
plt.show()

Before running the other examples, empty the workspace by deleting the container and check that no datacubes are left.

In [None]:
cube.Cube.deletecontainer(container='icing_days',force='yes')

In [None]:
cube.Cube.list(level=2)