![](../nci-logo.png)

-------
# Data Calculations and Visualisations Notebooks

### In this notebook:

- Using iPython Notebooks with NetCDF data within the VDI
    - <a href='#part1'>Launch Jupyter Notebook</a>  
    - <a href='#part2'>Setup and Load Data</a>  
    - <a href='#part3'>Common Operations</a> 
    - <a href='#part4'>Resampling and Rolling Mean</a> 
    - <a href='#part5'>Climatologies</a>
    - <a href='#part6'>Plot - Extras</a>
---------


<a id='part1'></a> 
## Launch the Jupyter Notebook application

#### Using the public hh5 conda environment managed by CLEX

Many Python modules are available in the hh5 conda environment maintained by CLEX, including additional modules such as CleF, which was used in the previous examples. This environment is publicly available: it was developed to serve CLEX users, but it also supports use cases from the wider community.
```
    $ module use /g/data3/hh5/public/modules
    $ module load conda/analysis3
```  

Launch the Jupyter Notebook application:
```
    $ jupyter notebook
``` 

<div class="alert alert-info">
<b>NOTE: </b> This will launch the <b>Notebook Dashboard</b> within a new web browser window. 
</div>

<a id='part2'></a> 
## Setup and Load Data

#### Load the required modules

In [None]:
import xarray as xr
import netCDF4 as nc

#### Opening Multiple Files at Once

xarray's `open_mfdataset` allows multiple files to be opened at once and combined into a single dataset.

In [None]:
!ls /g/data/oi10/replicas/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/tas/gr1/v20180701
path = '/g/data/oi10/replicas/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/tas/gr1/v20180701/*'
f_hist = xr.open_mfdataset(path)
tas = f_hist.tas - 273.15

<a id='part3'></a> 
## Common Operations

Xarray supports many of the statistical operations commonly performed on climate data arrays. Examples include:

    Mean
    Standard Deviation
    Minimum
    Maximum

These operations use the numpy-like syntax that many users are familiar with: `.mean()`, `.std()`, etc. For example, a simple mean is calculated with:


In [None]:
tas.mean()

The above `mean()` returned very quickly because of lazy loading, which we talked about before: the calculation is deferred until the values are actually needed. The result is still an xarray DataArray.

### Xarray Operations and Missing Values

With plain numpy, a user generally needs to mask out the missing or NaN values of an array before computing statistics. In xarray, as long as the missing values are represented as NaN in the data array, operations such as `mean()` exclude those values by default.

In [None]:
import numpy as np
nan_test=xr.DataArray([1,3,np.nan])
nan_test.mean()
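
The difference from plain numpy can be seen in a small sketch. It uses a toy array rather than the CMIP6 data, and the variable names are illustrative only. The `skipna` argument controls whether NaN values are ignored.

```python
import numpy as np
import xarray as xr

# Toy values with one missing entry (illustrative, not from the dataset)
values = np.array([1.0, 3.0, np.nan])
nan_demo = xr.DataArray(values)

# Plain numpy: the NaN propagates unless a NaN-aware function is used
numpy_mean = np.mean(values)
numpy_nanmean = np.nanmean(values)

# xarray: NaN values are skipped by default for float data
xarray_mean = float(nan_demo.mean())

# Passing skipna=False makes the NaN propagate, as in np.mean
xarray_strict_mean = float(nan_demo.mean(skipna=False))

print(numpy_mean, numpy_nanmean, xarray_mean, xarray_strict_mean)
```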

### Xarray Operations on Subsets

As with data subsetting, one of the advantages of xarray is that operations can be applied along dimensions specified by name, either a single dimension label or a list of labels.

For example, to calculate the mean over time:

In [None]:
tas.mean(dim='time')

To calculate the mean over lat and lon you will need to take into account the size of the grid cells and weight the mean by the area of those grid cells.

In CMIP6 the cell areas are included under the CMIP table 'fx' for atmosphere and land parameters and 'Ofx' for ocean parameters. In this case we will use the area of the atmospheric grid cells, `areacella`.

In [None]:
file_area = xr.open_dataset('/g/data/oi10/replicas/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/\
r1i1p1f1/fx/areacella/gr1/v20180701/areacella_fx_GFDL-CM4_historical_r1i1p1f1_gr1.nc')
area = file_area.areacella
area

We can now calculate the area weighted mean using the above variable and xarray operations.

In [None]:
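# Weight each grid cell by its area, then normalise by the total area
tas_area = tas * area
tas_wmean = tas_area.sum(['lat', 'lon']) / area.sum()
# Anomaly relative to the mean of the whole time series
tas_anom = tas_wmean - tas_wmean.mean()
tas_anom.plot()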
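
As a cross-check, the same calculation can be sketched on a small synthetic grid and compared with xarray's built-in `.weighted()` method (available in recent xarray versions). Everything here is an assumption for illustration: the data are random, and the cosine-of-latitude weights are only a stand-in for `areacella`.

```python
import numpy as np
import xarray as xr

# Synthetic (time, lat, lon) field; not real model output
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 120.0, 240.0])
rng = np.random.default_rng(0)
demo = xr.DataArray(
    rng.normal(15.0, 5.0, size=(4, lat.size, lon.size)),
    coords={"time": np.arange(4), "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# Stand-in for areacella: weights proportional to cos(latitude)
area_values = np.cos(np.deg2rad(lat))[:, None] * np.ones((1, lon.size))
demo_area = xr.DataArray(
    area_values, coords={"lat": lat, "lon": lon}, dims=("lat", "lon")
)

# Manual area-weighted mean, as in the cell above
manual_wmean = (demo * demo_area).sum(["lat", "lon"]) / demo_area.sum()

# The same quantity using xarray's weighted reductions
builtin_wmean = demo.weighted(demo_area).mean(["lat", "lon"])

print(np.allclose(manual_wmean, builtin_wmean))
```

The two results agree when the data contain no missing values. Where NaN values are present, `.weighted()` also excludes the corresponding weights from the normalisation, which the manual version does not.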

### Exercise

Find the minimum temperature in a box spanning the coordinates

    lon: 80 - 120
    lat: 10N - 20N

<a href="#ans1" data-toggle="collapse">Answer</a>
<div class="collapse" id="ans1">
<pre><code>
tas.sel(lat=slice(10,20),lon=slice(80,120)).min()
</code></pre>
</div>

<a id='part4'></a> 
## Resampling and Rolling Mean

The previous time-series plot was rather noisy. We probably want to resample in time or take a rolling mean to remove the shorter time scales from the data and see the longer-term trend.

xarray supports both rolling means and resampling through the `.rolling` and `.resample` operations. 

Starting with a rolling mean, we'll use a 1-year window (time=12, as the data is monthly).

In [None]:
tas_roll = tas_anom.rolling(time=12,center=True).mean()
tas_anom.plot()
tas_roll.plot()
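
One detail worth checking is what happens at the ends of the series. The sketch below uses a synthetic 24-step series (not the temperature data) to show that a full 12-step window leaves gaps at the edges, and that the `min_periods` argument of `.rolling` relaxes this.

```python
import numpy as np
import xarray as xr

# Synthetic "monthly" series of 24 steps; values are illustrative only
demo = xr.DataArray(np.arange(24, dtype=float), dims="time")

# By default a value is returned only where the full window is available,
# so the ends of the series are filled with NaN
demo_roll = demo.rolling(time=12, center=True).mean()
n_missing = int(demo_roll.isnull().sum())

# min_periods=1 allows partial windows at the edges
demo_roll_partial = demo.rolling(time=12, center=True, min_periods=1).mean()
n_missing_partial = int(demo_roll_partial.isnull().sum())

print(n_missing, n_missing_partial)
```

Partial windows average fewer values, so the smoothed series is less reliable near its ends.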

Resampling can work in a similar way to the rolling mean. It works on time data (a dimension whose coordinate holds datetime64 values) and allows you to change the time frequency of your data. If you have a monthly dataset and you want it as a yearly dataset (or a 6-hourly dataset that you want as daily, etc.), you can use the resample command and take the mean over each resampling period.

In the example below we have defined the new resampling period to be annual 'A' and have taken the mean over the monthly data in each year to produce a yearly average time series. Compare it to the rolling mean case above.

In [None]:
tas_res=tas_anom.resample(time='A').mean(dim='time')
tas_roll.plot()
tas_res.plot()
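
Annual means can also be obtained by grouping on the year. The sketch below uses a synthetic monthly series and illustrative names. It may be a useful alternative because the frequency alias strings accepted by `resample` (such as 'A') have changed between pandas versions, and `groupby('time.year')` does not depend on them.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic monthly data (values 0..23, illustrative only)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
demo = xr.DataArray(np.arange(24, dtype=float), coords={"time": time}, dims="time")

# Group the months by calendar year and average within each year
annual = demo.groupby("time.year").mean(dim="time")

print(annual.year.values, annual.values)
```

Unlike `resample`, the result is indexed by an integer `year` coordinate rather than by timestamps.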

### Exercise

Try rolling or resampling at other frequencies. Look at smaller subsets to see the difference.

<a id='part5'></a> 
## Climatologies

With `.groupby()` you can easily create a climatology and calculate anomalies. In this example, working with monthly data, a new dimension will be created that holds the months of the year. 

Let's create a climatology from pre-1900 data:

In [None]:
climatology = tas.sel(time=slice('1800','1900')).groupby('time.month').mean(dim='time')
climatology
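
Anomalies relative to a monthly climatology can be computed by subtracting the climatology from the grouped data. The following is a minimal sketch on a synthetic series (a seasonal cycle plus a linear trend), with illustrative names. The same pattern should apply to `tas` and `climatology` above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three years of synthetic monthly data: seasonal cycle plus a linear trend
time = pd.date_range("2000-01-01", periods=36, freq="MS")
step = np.arange(36)
values = 10.0 * np.sin(2 * np.pi * step / 12) + 0.1 * step
demo = xr.DataArray(values, coords={"time": time}, dims="time")

# Monthly climatology: one value per calendar month
demo_clim = demo.groupby("time.month").mean(dim="time")

# Subtract the matching month's climatology from every time step
demo_anom = demo.groupby("time.month") - demo_clim

print(demo_anom.dims, demo_anom.sizes["time"])
```

The seasonal cycle is removed and only the trend remains in `demo_anom`, which keeps the original `time` dimension.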

### Exercise

Instead of grouping by month, group by 'season'. 
Plot one of the seasons.

<a href="#ans2" data-toggle="collapse">Answer</a>
<div class="collapse" id="ans2">
<pre><code>
climatology_seas = tas.sel(time=slice('1800','1900')).groupby('time.season').mean(dim='time')
climatology_seas.sel(season='DJF').plot()
</code></pre>
</div>

<a id='part6'></a> 
## Plot - Extras

### Subplots

In the seasonal climatology exercise above we created a climatology with four periods in time corresponding to the seasons. It is very simple to plot all the seasons in a single figure, simply by specifying which variable defines the `row` and how many columns to produce with `col_wrap` (we have also defined the colourbar limits using `vmin` and `vmax`):

In [None]:
climatology_seas.plot(row='season',col_wrap=2,vmin=-10,vmax=40)

We can plot xarray data on map projections using the cartopy library.

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
ax = plt.axes(projection=ccrs.Orthographic(central_longitude=140))
ax.coastlines()

climatology_seas.sel(season='DJF').plot(transform=ccrs.PlateCarree(), vmin=-2, vmax=30,
            cbar_kwargs={'shrink': 0.8})

### Exercise

Make your own plot using your favorite projection: https://scitools.org.uk/cartopy/docs/latest/crs/projections.html