# Simple Xarray demo
This notebook provides a simple example of using Xarray. See https://docs.xarray.dev for more examples and detailed documentation.

Choose **Help -> Show Contextual Help** to get interactive help.

In [None]:
import xarray as xr 

## Manipulations requiring only NetCDF metadata access
The commands in this section run quickly because they use only the metadata in the NetCDF files, not the full volume of numerical data (which is much larger).

### Create a virtual dataset by concatenating many files along time axis

Define the file paths to open. Note the wildcard (`*`): this pattern matches 61 individual NetCDF files.

In [None]:
files = '/g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle1/output*/ocean/ocean_month.nc'

Create an Xarray dataset by concatenating these files along the time axis:

In [None]:
ds = xr.open_mfdataset(files, parallel=True)  # concatenate many NetCDF files into one virtual dataset object

Display a representation of the resulting dataset. Click the little triangles to see details.
Note that the dataset contains over 100 data variables. Click the icons to the right of each variable for metadata and information on the data each one contains.

In [None]:
ds

### Extract and subset temperature data
Any variable can be extracted by name using this dot notation:

In [None]:
t = ds.pot_temp  # select 4D potential temperature dataarray from dataset

Display a representation of the variable. Note that it is 4-dimensional, of size 732x50x300x360. This would be over 14GB, but hasn't been loaded into memory yet.

From the `time` axis you can see it contains 732 monthly means (61 years, from 1958 to 2018).

Note: The spatial dimensions are 50x300x360, as this is from a coarse 1° model. 0.1° model data is 135 times larger (spatially 75x2700x3600), so 61 years would be about 2000GB. This is more than can fit into a node's memory, but it could still be handled with this code because only the metadata is needed at this stage (the data itself doesn't need to be loaded).

In [None]:
t
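The sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes each value is stored as a 4-byte `float32` and reports sizes in GiB (2**30 bytes). Neither assumption is read from the files.

```python
import math

# Dimensions quoted in the text: time x depth x latitude x longitude
shape_1deg = (732, 50, 300, 360)
bytes_per_value = 4  # assumption: float32 storage

# Size of the full 1 degree potential temperature array, in GiB
full_gib = math.prod(shape_1deg) * bytes_per_value / 2**30

# Ratio of spatial grid sizes, 0.1 degree versus 1 degree
spatial_ratio = (75 * 2700 * 3600) / (50 * 300 * 360)

# The same 61 years at 0.1 degree resolution
tenth_deg_gib = full_gib * spatial_ratio

print(f"1 degree:   {full_gib:.1f} GiB")
print(f"ratio:      {spatial_ratio:.0f}x")
print(f"0.1 degree: {tenth_deg_gib:.0f} GiB")
```

The same numbers are available on a real array without any arithmetic via `t.nbytes`, which is also computed from metadata only.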

Now we select a subset of the data, specified by calendar dates rather than indices. (The requested date range extends beyond the data range, so only a subset of these dates is returned.) Note that the memory requirement has dropped to about 2GB. 

In [None]:
t = t.sel(time=slice('2010-01-01', '2022-01-01'))  # subset on time axis

In [None]:
t
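To see how a date slice behaves when it extends past the end of the data, here is a small self-contained sketch on a synthetic array. The `demo` array and its month-start timestamps are made up for illustration; the model output may use different time stamps within each month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time axis covering 1958 to 2018 (732 months)
time = pd.date_range("1958-01-01", "2018-12-01", freq="MS")
demo = xr.DataArray(np.arange(time.size), coords={"time": time}, dims="time")

# The slice end (2022) lies beyond the data, so only the overlap is returned
subset = demo.sel(time=slice("2010-01-01", "2022-01-01"))

print(demo.sizes["time"], "months in total")
print(subset.sizes["time"], "months selected")
print("last month selected:", subset.time.values[-1])
```

No error is raised for the out-of-range end date: label-based slices simply return whatever falls inside the requested range.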

The data is global, but let's also restrict the spatial range to focus on the region near Australia. Again, we can specify this using physical units rather than grid indices.

In [None]:
t = t.sel(xt_ocean=slice(-270, -150)).sel(yt_ocean=slice(-60, 20))

The data volume is further reduced.

In [None]:
t
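The same kind of coordinate-based selection can be tried on a small synthetic grid. In this sketch the regular 1° grid is an assumption made for illustration; the model's own latitude grid has 300 points and is not evenly spaced. It also shows that both slices can be passed in a single `sel` call.

```python
import numpy as np
import xarray as xr

# Hypothetical regular 1 degree grid; longitudes follow the -280 to 80 convention
xt = np.arange(-279.5, 80.0, 1.0)   # 360 longitudes
yt = np.arange(-89.5, 90.0, 1.0)    # 180 latitudes
demo = xr.DataArray(
    np.zeros((yt.size, xt.size)),
    coords={"yt_ocean": yt, "xt_ocean": xt},
    dims=("yt_ocean", "xt_ocean"),
)

# Both slices in one call
region = demo.sel(xt_ocean=slice(-270, -150), yt_ocean=slice(-60, 20))

# Chaining two .sel calls gives the same result
chained = demo.sel(xt_ocean=slice(-270, -150)).sel(yt_ocean=slice(-60, 20))

print(dict(region.sizes))
print("identical:", region.identical(chained))
```

Slice bounds need to be given in the same order as the coordinate (ascending here); reversing them on an ascending axis returns an empty selection.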

Let's now just select data from 50m depth. Note that the vertical grid doesn't have data at exactly that level (check this by clicking the cylindrical icon for `st_ocean` above), so we interpolate. Use interactive help to see what other interpolation methods are available.

In [None]:
t = t.interp(st_ocean=50, method='linear')  # extract 2D interpolated data at 50m depth

In [None]:
t
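As a rough illustration of what linear interpolation to 50m means, the sketch below does the calculation by hand for a single water column using NumPy's `np.interp`. The depth levels and temperatures are invented for the example and are not the model's vertical grid; this shows the arithmetic only, not how Xarray implements `interp` internally.

```python
import numpy as np

# Hypothetical depth levels (m) and temperatures (K) for one water column
st_ocean = np.array([5.0, 15.0, 30.0, 45.0, 60.0, 80.0])
temp = np.array([300.0, 299.0, 297.0, 294.0, 291.0, 288.0])

target_depth = 50.0
on_grid = target_depth in st_ocean  # False: no level sits at exactly 50 m

# Linear interpolation between the neighbouring levels (45 m and 60 m)
t50 = np.interp(target_depth, st_ocean, temp)

# The same value written out explicitly as a weighted average
weight = (target_depth - 45.0) / (60.0 - 45.0)
t50_by_hand = (1 - weight) * 294.0 + weight * 291.0

print("50 m on grid:", on_grid)
print("interpolated:", t50, "by hand:", t50_by_hand)
```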

We can get a monthly climatology (the average of every January, the average of every February, and so on) by using `groupby` and then `mean`:

In [None]:
t = t.groupby('time.month').mean()  # monthly climatology: mean over all years for each calendar month

Now we have a `month` dimension instead of `time`.

In [None]:
t
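To check what `groupby('time.month').mean()` computes, here is a small sketch on synthetic data where the answer is easy to work out by hand. The three-year `demo` series is made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three years of synthetic monthly values: 0, 1, 2, ..., 35
time = pd.date_range("2010-01-01", "2012-12-01", freq="MS")
values = np.arange(time.size, dtype=float)
demo = xr.DataArray(values, coords={"time": time}, dims="time")

# Group by calendar month, then average each group over the years
clim = demo.groupby("time.month").mean()

# January by hand: the Januaries are at positions 0, 12 and 24
jan_by_hand = values[[0, 12, 24]].mean()

print(clim.dims, clim.sizes["month"])
print("January climatology:", float(clim.sel(month=1)), "by hand:", jan_by_hand)
```

The `time` dimension of length 36 has been replaced by a `month` dimension of length 12, with coordinate values 1 to 12.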

We can then select the first month to get a January average, and convert from K to °C:

In [None]:
t = t.sel(month=1) - 273.15   # January mean, converted to °C

This is now a really small amount of data.

In [None]:
t

## Plotting result: requires data access and calculation
None of the manipulation and calculation we've specified so far (subsetting, interpolation, grouping, averaging) has actually taken place: this is **lazy evaluation**. Now that we want to plot the result, the deferred calculation has to be carried out. It's still quick, though, because only the required subset of data actually needs to be read.

We're just doing a simple plot here, but there are [more sophisticated things you can do](https://docs.xarray.dev/en/latest/user-guide/plotting.html).

In [None]:
%%time
t.plot()  # plot – this is when data access and calculation occur

## Exercise
Extract the salinity variable and plot the 1980 average in the Atlantic Ocean at 100m depth.