# Multi-dimensional data access with Xarray
Henrik Andersson, 2021-01-15

In [None]:
%pip install netcdf4

NCEP reanalysis data is used as an example of multi-dimensional data.

For this example we will use 2m temperature, but the full list of variables is [here](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.surface.html).

The first step is to download the data (later we will try an alternative where data subsets are downloaded on demand).

In [None]:
!curl -O ftp://ftp2.psl.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface_gauss/air.2m.gauss.2020.nc 

In [None]:
!ls

In [None]:
import xarray as xr

In [None]:
ds = xr.open_dataset("air.2m.gauss.2020.nc")
ds

Printing the `ds` object reveals lots of useful information.
* Dimensions
* Coordinates
* Variables
* Metadata
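Each of these pieces can also be inspected programmatically. The sketch below is a minimal illustration on a small synthetic dataset (the name `demo`, the grid and the values are made up for the example and are not the NCEP data), so it runs without downloading anything.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the reanalysis file: 3 days on a 4 x 5 grid (assumed values)
time = pd.date_range("2020-01-01", periods=3, freq="D")
lat = np.linspace(60.0, 30.0, 4)
lon = np.linspace(0.0, 20.0, 5)
values = 280.0 + np.random.default_rng(0).normal(size=(3, 4, 5))

demo = xr.Dataset(
    data_vars={"air": (("time", "lat", "lon"), values, {"units": "degK"})},
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"title": "synthetic example"},
)

print(dict(demo.sizes))      # dimension names and lengths
print(list(demo.coords))     # coordinate names
print(list(demo.data_vars))  # variable names
print(demo.attrs)            # dataset-level metadata
print(demo["air"].attrs)     # per-variable metadata
```

The same attributes are available on the `ds` object opened from the downloaded file.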

One of the strengths of `xarray` is the simple syntax to select variables and timesteps.

Selecting the air temperature variable at a specific time and plotting it with labeled axes and a colorbar takes only one line of code.

In [None]:
ds.air.sel(time="2020-04-01").plot()
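To make the selection step itself more concrete, here is a small sketch on synthetic daily data (the array `air` below is invented for illustration). It compares label-based selection with `sel`, position-based selection with `isel`, and a label slice, which includes both end points.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Five synthetic daily fields on a 2 x 3 grid (assumed values, not real temperatures)
time = pd.date_range("2020-03-30", periods=5, freq="D")
air = xr.DataArray(
    np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [50.0, 40.0], "lon": [0.0, 10.0, 20.0]},
    name="air",
)

one_day = air.sel(time="2020-04-01")   # select by label
same_day = air.isel(time=2)            # select by position (third timestep)
few_days = air.sel(time=slice("2020-03-31", "2020-04-02"))  # label slice, both ends included

print(one_day.values)
print(few_days.time.values)
```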

# DHI data

OK, now we know that xarray can be used to read public data.

But it can also be useful for our own data, as an alternative to storing data in dfs files.

Some benefits of the NetCDF format:
 * Standard format, suitable for distribution to customers unfamiliar with dfs
 * Support for multiple time axes
 * Support for multiple spatial axes, including non-equidistant ones
 * Keep all relevant data in a single file
 * Support for arbitrary metadata
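As a rough illustration of several of these points, the sketch below builds an in-memory dataset with two temporal axes (`date` and `time`), a non-equidistant `latitude` axis and free-form metadata. All names and values are assumptions made for the example; they only mimic the kind of layout described here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two independent temporal axes: daily dates and 6-hourly timestamps
date = pd.date_range("2020-02-10", periods=2, freq="D")
time = pd.Timestamp("2020-02-10") + pd.to_timedelta(np.arange(8) * 6, unit="h")

# Spatial axes do not have to be equidistant
latitude = [58.0, 58.5, 59.5, 61.0]
longitude = [0.0, 1.0, 3.0]

rng = np.random.default_rng(42)
demo = xr.Dataset(
    data_vars={
        "probability": (("date", "latitude", "longitude"), rng.uniform(size=(2, 4, 3))),
        "Wind speed": (("time", "latitude", "longitude"), rng.uniform(0, 25, size=(8, 4, 3))),
    },
    coords={"date": date, "time": time, "latitude": latitude, "longitude": longitude},
    attrs={"source": "synthetic example"},  # arbitrary dataset-level metadata
)
demo["Wind speed"].attrs["units"] = "m/s"   # arbitrary per-variable metadata

print(demo)
```

A dataset like this could then be written to a single file with `demo.to_netcdf(...)`, which requires a NetCDF backend such as `netcdf4` to be installed.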

 
Let's take a look at some output from the Wave Hunter project.

In [None]:
! curl -O https://wavehunter.z6.web.core.windows.net/wave_hunter_2020021000.nc

In [None]:
ds = xr.open_dataset("wave_hunter_2020021000.nc")
ds

The main output from the Wave Hunter project was a map of probability for the coming two days. I find it very helpful to be able to label this temporal axis clearly as a date rather than a datetime, so `probability` uses the `date` dimension.

The probability for the first date in this forecast:

In [None]:
ds.probability.isel(date=0).plot()

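Since probabilities always lie between 0 and 1, it is better to use a fixed color scale from 0 to 1, which makes maps for different dates directly comparable.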

In [None]:
# Fix the color scale to [0, 1]
ds.probability.isel(date=0).plot(vmin=0.0, vmax=1.0)

The underlying data used in the calculation are also useful to store for retrospective inspection. These are stored with `time` as the temporal dimension.

In [None]:
ds["Wind speed"].isel(time=0).plot()

The same data can also be selected by label, passing the datetime as a string.

In [None]:
ds["Wind speed"].sel(time="2020-02-10 00:00").plot()

To get a time series, we can select a single point in the two spatial dimensions.

In [None]:
ds["Wind speed"].sel(latitude=59.0,longitude=0.0).plot()
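By default, `sel` expects coordinate values that exist exactly on the grid. When the point of interest falls between grid nodes, `method="nearest"` picks the closest one. The sketch below shows this on a synthetic array (the grid and values are assumptions for illustration only).

```python
import numpy as np
import xarray as xr

# Synthetic wind speed on a coarse 3 x 3 grid with 4 timesteps (assumed values)
wind = xr.DataArray(
    np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": np.arange(4),
        "latitude": [58.0, 59.0, 60.0],
        "longitude": [-1.0, 0.0, 1.0],
    },
    name="Wind speed",
)

# Exact match: both values must be present in the coordinates
exact = wind.sel(latitude=59.0, longitude=0.0)

# Off-grid point: fall back to the nearest grid node
nearest = wind.sel(latitude=59.2, longitude=0.3, method="nearest")

print(exact.dims)               # only the time dimension remains
print(float(nearest.latitude))  # latitude of the grid node actually used
```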

Calculations work much like they do in NumPy. The result is a `DataArray`.

In [None]:
ds.probability.isel(date=0).max()

To get the result as a NumPy array, use the `.values` property.

In [None]:
maxp = ds.probability.isel(date=0).max().values

type(maxp)

In [None]:
maxp
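Note that a reduction over all dimensions gives a zero-dimensional result, so `.values` returns a zero-dimensional NumPy array rather than a plain number. A minimal sketch on made-up data, also showing `.item()` as one way to get a Python scalar:

```python
import numpy as np
import xarray as xr

# Small synthetic probability map (assumed values)
prob = xr.DataArray(
    np.array([[0.1, 0.7], [0.4, 0.2]]),
    dims=("latitude", "longitude"),
)

reduced = prob.max()       # zero-dimensional DataArray
as_array = reduced.values  # zero-dimensional numpy.ndarray
as_float = reduced.item()  # plain Python float

print(type(reduced), type(as_array), type(as_float))
print(as_array.ndim)
```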

I hope this has gotten you excited to use the xarray package.

The [xarray documentation](http://xarray.pydata.org/en/stable/index.html) is excellent and contains a lot of examples.