# Introduction to PyOphidia

This notebook guides you through the implementation and execution of a sample climate indicator exploiting the **PyOphidia** module, as shown in the demo.

The goal of this training is to give an overview of the features while implementing a real indicator from the *extreme climate indices* set. The indicator that you're going to implement is the *Daily temperature range (DTR)*, i.e. the monthly mean difference between the maximum and minimum daily temperatures. The full list of indices is provided at [http://etccdi.pacificclimate.org/list_27_indices.shtml](http://etccdi.pacificclimate.org/list_27_indices.shtml).
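
As a quick illustration of the definition above, here is a minimal plain-Python sketch that computes the DTR for a single grid point from a handful of made-up daily values. The helper name `monthly_dtr` and the sample numbers are assumptions introduced only for this example; they are not part of PyOphidia or of the dataset used later.

```python
from collections import defaultdict
from statistics import mean


def monthly_dtr(days):
    """Return {(year, month): mean(tasmax - tasmin)}.

    `days` is an iterable of (iso_date, tasmax, tasmin) tuples.
    Hypothetical helper, written only to illustrate the DTR definition.
    """
    ranges = defaultdict(list)
    for iso_date, tmax, tmin in days:
        year, month, _ = iso_date.split("-")
        # Daily temperature range for this day
        ranges[(int(year), int(month))].append(tmax - tmin)
    # Monthly mean of the daily ranges
    return {key: mean(values) for key, values in ranges.items()}


# Made-up daily temperatures (kelvin) for one grid point
sample = [
    ("2096-01-01", 280.0, 274.0),
    ("2096-01-02", 282.0, 272.0),
    ("2096-02-01", 285.0, 280.0),
]
dtr = monthly_dtr(sample)
print(dtr)
```

The rest of the notebook performs the same two steps (a difference followed by a monthly average) on full datacubes, with the computation running on the Ophidia server.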

Before starting the actual implementation of the indicator, let's play a little with the basic features of PyOphidia.

## 1. Getting started with PyOphidia

PyOphidia is a Python package for interacting with the Ophidia Framework. It provides a convenient way to submit requests to an Ophidia server or to develop your own application in Python. It runs on Python 2 or 3 and provides two main modules:

* client.py: low-level class to submit any type of request (simple tasks and workflows);
* cube.py: high-level cube-oriented class to interact directly with cubes.

This tutorial mainly uses the cube class.

Before running any other operation, you must establish a new session with the Ophidia Server. Run the following code cell to set up a new connection by pressing the **play** button on the top bar or the **[shift + enter]** keys.

In [None]:
from PyOphidia import cube
cube.Cube.setclient(read_env=True)

If successful, the output will show something like:

```text
Current cdd is /
Current session is https://127.0.0.1/ophidia/sessions/456546436462436547544775644646/experiment
Current cwd is /
The last produced cube is https://127.0.0.1/ophidia/1/1
```

<hr style="height:1px;border-top:1px solid #0000FF" />

Once the connection has been established, it's possible to run the actual data management and analytics operators. 

The first operator to test is *list*, which provides a graphical (ASCII-based) view of the data available in the user's space. The option ```level=2``` represents the level of verbosity. 

If this is the first experiment you're running, your space should be empty; otherwise you'll see the Ophidia containers and datacubes created in previous sessions. Try it yourself by running the following line.

In [None]:
cube.Cube.list(level=2)

<hr style="height:1px;border-top:1px solid #0000FF" />

Now it's time to load the first dataset into your space with the *importnc* operator. Run the following command to load a CMIP5 NetCDF (*.nc*) dataset, produced by [*CMCC Foundation*](https://www.cmcc.it) with the *CESM model*, into a new datacube. It should take a few seconds.

As you can see, the method takes several arguments to load the data. The two most important are:

* *src_path*, the path of the file to be imported
* *measure*, the variable to be imported (*tasmax*, the maximum daily temperature)

If you want to learn more about all the parameters available in the *importnc* operator, you can check the [documentation page](http://ophidia.cmcc.it/documentation/users/operators/OPH_IMPORTNC.html).

In [None]:
%%time
tasmax = cube.Cube.importnc(
    src_path='/home/ophidia/notebooks/tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc',
    measure='tasmax',
    imp_dim='time',
    ncores=1,
    description='Max Temperatures',
    )

<hr style="height:1px;border-top:1px solid #0000FF" />

You can now import the second dataset related to the *tasmin* variable, i.e. the minimum daily temperature. 

Note the different value used for the **ncores** parameter. The Ophidia Framework provides an environment for running parallel data analytics that exploits the underlying cluster. This time the operator will run the import with 4 parallel processes, so the execution time should be shorter.

In [None]:
%%time
tasmin = cube.Cube.importnc(
    src_path='/home/ophidia/notebooks/tasmin_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc',
    measure='tasmin',
    imp_dim='time',
    ncores=4,
    description='Min Temperatures',
    )

<hr style="height:1px;border-top:1px solid #0000FF" />

At this stage you should have at least 2 containers and 2 datacubes inside your space. You can run the *list* operator again to verify this yourself. Datacubes are identified by a string that looks like: `https://127.0.0.1/ophidia/1/1`

In [None]:
cube.Cube.list(level=2)

<hr style="height:1px;border-top:1px solid #0000FF" />

If you look carefully at the cells executed so far, you'll notice that *importnc* and *list* are called in slightly different ways.

In fact, all operators that create a new datacube in the user's space, like *importnc*, are categorized as *data operators* and return a Python object holding the information about that datacube. In this way, operators can be applied directly to the cube object without having to refer to the datacube identifier.

On the other hand, operators that don't create a datacube, such as *list*, are categorized as *metadata operators*: they are class methods that simply produce a visual output and don't return a cube object.
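
The distinction can be sketched with a toy class. Note that `ToyCube`, `list_cubes` and `scale` are hypothetical names invented for this illustration; they only mimic the calling pattern and are not the PyOphidia API.

```python
class ToyCube:
    """Hypothetical stand-in for a cube class (NOT the PyOphidia API)."""

    _catalog = []  # pretend "user space" shared by all instances

    def __init__(self, pid, values):
        self.pid = pid
        self.values = values
        ToyCube._catalog.append(pid)

    @classmethod
    def list_cubes(cls):
        """Metadata-style operator: prints the catalog, returns nothing."""
        for pid in cls._catalog:
            print(pid)

    def scale(self, factor):
        """Data-style operator: builds and returns a new cube object."""
        new_pid = "toy/{}".format(len(ToyCube._catalog) + 1)
        return ToyCube(new_pid, [v * factor for v in self.values])


first = ToyCube("toy/1", [1.0, 2.0])
second = first.scale(2)        # called on an object, returns a new object
result = ToyCube.list_cubes()  # called on the class, only prints
```

Because `scale` returns an object, its result can be stored in a variable and reused, whereas `list_cubes` leaves nothing to assign.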

You're now ready to run some analytical operations on the newly imported datacubes. The next section will guide you through some basic data analysis operations required for the DTR indicator.

<hr style="height:7px;border-top:2px solid #0000FF" />

## 2. Running data analytics operations

You can run different types of operations on the datacubes available in your space. Ophidia provides around 50 data and metadata operators supporting operations such as data aggregation, complex mathematical operations, predicate evaluation, subsetting, datacube intercomparison, metadata management, and import and export of datacubes ([check this page for the full list](http://ophidia.cmcc.it/documentation/users/operators/index.html)).

Operators applied to datacubes require a Python object referencing that cube. In an earlier code cell we created the **tasmin** cube object, so now we can apply other operations to this datacube.

The following code cell performs a simple data reduction (the average over the whole time range) on the **tasmin** cube object and produces another cube object called **testCube1**. The arguments specify the operation to be performed. In particular:

* *concept_level*: the time granularity at which values are aggregated (*A* stands for all, i.e. the whole time range)
* *operation*: the reduction operation (*avg*, the average)

The last two commands in the cell, *info* and *explore*, show the information related to the newly created datacube and a portion of its content (with the option ```limit_filter=1```). Note that they don't create any new datacube.

In [None]:
testCube1 = tasmin.reduce2(
    dim='time',
    concept_level='A',
    operation='avg',
    description="Overall average tasmin",
    ncores=2    
)

testCube1.info()
testCube1.explore(limit_filter=1)

<hr style="height:1px;border-top:1px solid #0000FF" />

You can also compute other types of statistical values over different time ranges. The *reduce2* operator [documentation page](http://ophidia.cmcc.it/documentation/users/operators/OPH_REDUCE2.html) provides the full description of the alternatives implemented by the operator.

Try rerunning the code above with the following argument values to get the maximum temperature on a yearly basis, and check how the *Dimension Information* section changes.
* concept_level='y'
* operation='max'
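
To see why the size of the time dimension changes with the concept level, here is a plain-Python sketch of the grouping idea. It is only an illustration under simplifying assumptions: the function `reduce_by_period` and the sample values are made up, only the three concept levels used in this notebook (`A`, `y`, `M`) are handled, and the real operator runs server-side on whole datacubes.

```python
def reduce_by_period(samples, level, operation):
    """Group (iso_date, value) pairs by period and reduce each group.

    level: 'A' (all samples), 'y' (year) or 'M' (month).
    Hypothetical helper for illustration only.
    """
    # Number of leading characters of 'YYYY-MM-DD' that identify the period
    prefix_len = {"A": 0, "y": 4, "M": 7}[level]
    reducers = {"avg": lambda v: sum(v) / len(v), "max": max, "min": min}
    groups = {}  # dicts keep insertion order in Python 3.7+
    for iso_date, value in samples:
        groups.setdefault(iso_date[:prefix_len], []).append(value)
    return [reducers[operation](values) for values in groups.values()]


daily = [
    ("2096-01-01", 270.0),
    ("2096-01-02", 274.0),
    ("2096-02-01", 278.0),
    ("2097-01-01", 272.0),
]
overall_avg = reduce_by_period(daily, "A", "avg")  # one value in total
yearly_max = reduce_by_period(daily, "y", "max")   # one value per year
monthly_avg = reduce_by_period(daily, "M", "avg")  # one value per month
print(len(overall_avg), len(yearly_max), len(monthly_avg))
```

The coarser the concept level, the fewer values remain along time: a single value for `A`, one per year for `y`, and one per month for `M`.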

Operations can also be chained in a single line of code, as in the following line, which computes the average over all datacube dimensions. The resulting **testCube2** object will reference the final datacube created by the sequence of operations.

In [None]:
testCube2 = tasmin.reduce(operation='avg', ncores=2).merge().aggregate(operation='avg', description="Mean tasmin")
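
The numerical effect of this chain can be sketched in plain Python on a toy structure. This is a simplified illustration with made-up values: it assumes one row per grid point with a time series in each row, and it does not model the *merge* step, which is assumed here to only regroup the data without changing any value.

```python
# Toy "cube": one row per grid point, each holding a short time series
rows = {
    ("lat0", "lon0"): [270.0, 272.0],
    ("lat0", "lon1"): [280.0, 284.0],
}

# Step 1: average along time within each row (the role played by reduce)
per_point = {key: sum(series) / len(series) for key, series in rows.items()}

# Step 2: average across the rows (the role played by aggregate)
overall = sum(per_point.values()) / len(per_point)
print(per_point, overall)
```

After the two steps a single value remains, which is what the description "Mean tasmin" refers to.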

<hr style="height:1px;border-top:1px solid #0000FF" />

Most Ophidia data operators are applied to a cube object and produce another cube object. However, some operators, such as *intercube* (intercomparison), require more than one input datacube. In this case the additional datacubes must be passed as arguments.

Let's get back to the implementation of the DTR indicator using the concepts that you've just learned. The following code computes the daily temperature range with the *intercube* operator, i.e. the difference between the **tasmax** datacube and the **tasmin** datacube, creating a new cube object called **dailyDTR**. The second datacube is specified in the *cube2* argument, while the difference is selected with ```operation='sub'```.

The second operator, *reduce2*, computes the monthly mean values from the daily temperature ranges (with ```concept_level='M'``` and ```operation='avg'```), creating a new object called **monthlyDTR**.

In [None]:
dailyDTR = tasmax.intercube(
    cube2=tasmin.pid,
    operation='sub',
    description="Daily DTR",
    measure='dtr',
    ncores=2
    )

monthlyDTR = dailyDTR.reduce2(
    dim='time',
    concept_level='M',
    operation='avg',
    description="Monthly DTR",
    ncores=2    
    )

monthlyDTR.explore(limit_filter=5)

<hr style="height:7px;border-top:2px solid #0000FF" />

## 3. Extracting the results of the computation

Ophidia can export a datacube as a NetCDF file. Thanks to its integration with the Python environment, it can also export the datacube to Python-friendly structures, which can then be plotted with well-known Python modules such as Matplotlib and Cartopy.

The datacube created in the previous step (**monthlyDTR**) contains the data for several months in the time range [2096-2100]. We would like to plot the data on a map so, for the sake of simplicity, let's extract a single time step (in the example *January 2096*) using the *subset* operator with the following code.

In [None]:
firstMonthDTR = monthlyDTR.subset(
    subset_dims='time',
    subset_filter='2096-01',
    description="Subset Monthly DTR",
    subset_type='coord',
    ncores=4
)

data = firstMonthDTR.export_array()

<hr style="height:1px;border-top:1px solid #0000FF" />

The last line of the cell above exports the data of the **firstMonthDTR** datacube into a Python-friendly structure, which can then be used as input for the plotting libraries.

You can explore the content of this structure with the following command. As you can see, the structure contains an array of values for each dimension (i.e. *lat*, *lon*, *time*) and variable (*dtr*) belonging to the datacube.

In [None]:
from IPython.lib.pretty import pprint
pprint(data)
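
The sketch below shows one way to read such a structure by name rather than by position. It works on a small mock dictionary, not on real output: the `dimension`, `measure` and `values` keys are the ones used later in this notebook, while the `name` key, the flat row-major layout of the measure and the helper `values_of` are assumptions made for this example, so check them against the output of the `pprint` call above.

```python
# Mock of the exported structure with made-up values; the 'name' key is an
# assumption, so verify it against the real pprint(data) output.
mock = {
    "dimension": [
        {"name": "lat", "values": [-45.0, 45.0]},
        {"name": "lon", "values": [0.0, 120.0, 240.0]},
    ],
    "measure": [
        {"name": "dtr", "values": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    ],
}


def values_of(entries, name):
    """Return the 'values' list of the entry whose 'name' matches."""
    for entry in entries:
        if entry["name"] == name:
            return entry["values"]
    raise KeyError(name)


lat = values_of(mock["dimension"], "lat")
lon = values_of(mock["dimension"], "lon")
flat = values_of(mock["measure"], "dtr")

# Rebuild the (lat, lon) grid from the flat list, one latitude row at a time
grid = [flat[i * len(lon):(i + 1) * len(lon)] for i in range(len(lat))]
print(grid)
```

Looking entries up by name avoids relying on the order in which the dimensions happen to be listed.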

<hr style="height:1px;border-top:1px solid #0000FF" />

We can also use the *to_dataset* method in order to export the datacube into an **Xarray dataset**. 

In [None]:
dataset = firstMonthDTR.to_dataset()

<hr style="height:1px;border-top:1px solid #0000FF" />

We can then explore the dataset:

In [None]:
dataset

<hr style="height:1px;border-top:1px solid #0000FF" />

Finally, let's create a simple map with the DTR data extracted so far. The following code creates a map with the Matplotlib and Cartopy libraries, showing the DTR at each grid point on the globe.

You're free to change the properties to update the graphical layout. Check the Cartopy documentation for additional examples ([https://scitools.org.uk/cartopy/docs/latest/gallery/index.html](https://scitools.org.uk/cartopy/docs/latest/gallery/index.html)).

Note how the values from the dimensions *lat* and *lon* are used to define the map grid.

To plot the map, we can use the array obtained from the *export_array()* method ...

In [None]:
%matplotlib inline
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from cartopy.mpl.geoaxes import GeoAxes
from cartopy.util import add_cyclic_point
import numpy as np
import warnings
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(15, 6), dpi=100)

#Add Geo axes to the figure with the specified projection (PlateCarree)
projection = ccrs.PlateCarree()
ax = plt.axes(projection=projection)

#Draw coastline and gridlines
ax.coastlines()

gl = ax.gridlines(crs=projection, draw_labels=True, linewidth=1, color='black', alpha=0.9, linestyle=':')
# Hide top/right labels (Cartopy >= 0.18; older releases use xlabels_top / ylabels_right)
gl.top_labels = False
gl.right_labels = False

lat = data['dimension'][0]['values'][ : ]
lon = data['dimension'][1]['values'][ : ]
var = data['measure'][0]['values'][ : ]
var = np.reshape(var, (len(lat), len(lon)))

#Wraparound points in longitude
var_cyclic, lon_cyclic = add_cyclic_point(var, coord=np.asarray(lon))
x, y = np.meshgrid(lon_cyclic,lat)

#Define color levels for color bar
levStep = (np.nanmax(var)-np.nanmin(var))/20
clevs = np.arange(np.nanmin(var),np.nanmax(var)+levStep,levStep)

#Set filled contour plot
cnplot = ax.contourf(x, y, var_cyclic, clevs, transform=projection,cmap=plt.cm.YlOrRd)
plt.colorbar(cnplot,ax=ax)

ax.set_aspect('auto', adjustable=None)

plt.title('DTR')
plt.show()

... or the dataset obtained from the *to_dataset()* method.

In [None]:
%matplotlib inline
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from cartopy.mpl.geoaxes import GeoAxes
from cartopy.util import add_cyclic_point
import numpy as np
import warnings
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(15, 6), dpi=100)

#Add Geo axes to the figure with the specified projection (PlateCarree)
projection = ccrs.PlateCarree()
ax = plt.axes(projection=projection)

#Draw coastline and gridlines
ax.coastlines()

gl = ax.gridlines(crs=projection, draw_labels=True, linewidth=1, color='black', alpha=0.9, linestyle=':')
# Hide top/right labels (Cartopy >= 0.18; older releases use xlabels_top / ylabels_right)
gl.top_labels = False
gl.right_labels = False

lat = dataset['lat'].values
lon = dataset['lon'].values
var = dataset['dtr'].values
var = np.reshape(var, (len(lat), len(lon)))

#Wraparound points in longitude
var_cyclic, lon_cyclic = add_cyclic_point(var, coord=np.asarray(lon))
x, y = np.meshgrid(lon_cyclic,lat)

#Define color levels for color bar
levStep = (np.nanmax(var)-np.nanmin(var))/20
clevs = np.arange(np.nanmin(var),np.nanmax(var)+levStep,levStep)

#Set filled contour plot
cnplot = ax.contourf(x, y, var_cyclic, clevs, transform=projection,cmap=plt.cm.YlOrRd)
plt.colorbar(cnplot,ax=ax)

ax.set_aspect('auto', adjustable=None)

plt.title('DTR')
plt.show()
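
Both plotting cells call `add_cyclic_point` before drawing the contours. The following plain-Python sketch shows what that step does conceptually for a regular global grid: it repeats the first longitude column at the end so that the filled contours have no gap at the wrap-around meridian. The helper `add_cyclic` and the sample values are made up for this illustration and are not the Cartopy implementation.

```python
def add_cyclic(values_2d, lons):
    """Append a copy of the first longitude column at lons[0] + 360.

    Hypothetical helper mimicking the idea for a regular global grid.
    """
    wrapped_values = [row + [row[0]] for row in values_2d]
    wrapped_lons = list(lons) + [lons[0] + 360.0]
    return wrapped_values, wrapped_lons


grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # shape (lat, lon)
lons = [0.0, 120.0, 240.0]
grid_c, lons_c = add_cyclic(grid, lons)
print(grid_c, lons_c)
```

Each row gains one extra value and the longitude list gains one extra coordinate, which is why the plotting code builds the mesh from `lon_cyclic` instead of `lon`.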

<hr style="height:7px;border-top:2px solid #0000FF" />

## 4. Final remarks

Congrats! You've completed this training on some basic operations that can be performed with PyOphidia.

If you want to clear your user space before running other notebooks, run the following commands:

In [None]:
cube.Cube.deletecontainer(container='tasmin_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc', force='yes')
cube.Cube.deletecontainer(container='tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc', force='yes')

<hr style="height:1px;border-top:1px solid #0000FF" />

You can now move to the second tutorial notebook [**Summer Days**](2-Summer_Days.ipynb).