## 3.2 NetCDF Data - How to get it, read it and understand it

### 3.2.1 Importing the Modules and some Options

Before we look at our data, we always need to import our three most important modules: `matplotlib`, `numpy` and `xarray`. We also set a few defaults so that plots and printed arrays are well sized in the notebook.

In [1]:
# Display the plots in the notebook:
%matplotlib inline

In [2]:
# Import the tools we are going to need today:
import matplotlib.pyplot as plt  # plotting library
import numpy as np  # numerical library
import xarray as xr  # netCDF library
import cartopy  # Map projections library, needed to display world maps
import cartopy.crs as ccrs  # Projections list
# Some defaults:
plt.rcParams['figure.figsize'] = (12, 5)  # Default plot size
np.set_printoptions(threshold=20)  # avoid printing very large arrays on screen
# The commands below suppress all warnings to keep the output tidy.
import warnings
warnings.filterwarnings('ignore')

### 3.2.2 Get NetCDF Data

The data we are going to use in this chapter comes from ECMWF. We will use a dataset downloaded with the methods explained in Chapter 02.

The dataset contains the monthly means of daily means of the 2 meter temperature and of the total precipitation. You can either get the data [here]() or download it yourself by following the instructions of the previous chapter!

### 3.2.3 Read NetCDF Data

NetCDF files are binary files, which means that you can't just open them in a text editor. You need a dedicated reader for them.
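To make "binary" a bit more concrete: such a file starts with a short byte signature rather than readable text. The sketch below inspects the first bytes of a file's content. The helper name `sniff_netcdf_format` is made up for this illustration, and it only recognises the classic NetCDF signature (`CDF`) and the HDF5 signature that NetCDF-4 files carry.

```python
# Signature at the start of every HDF5 file; NetCDF-4 files are HDF5 underneath.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_netcdf_format(first_bytes: bytes) -> str:
    """Guess the on-disk format from the first bytes of a file (hypothetical helper)."""
    if first_bytes.startswith(b"CDF"):
        return "NetCDF classic"
    if first_bytes.startswith(HDF5_SIGNATURE):
        return "HDF5-based (e.g. NetCDF-4)"
    return "unknown"


# For a real file you would read the bytes first, e.g.:
#     with open(path, "rb") as f:
#         first_bytes = f.read(8)
print(sniff_netcdf_format(b"CDF\x01\x00\x00\x00\x00"))
print(sniff_netcdf_format(HDF5_SIGNATURE))
print(sniff_netcdf_format(b"plain text"))
```

This is only meant to show why a text editor displays garbage; for actual work the reader library takes care of the format.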

To handle NetCDF data, Python offers the third-party module [xarray](http://xarray.pydata.org/en/stable/). We already imported it above as `xr`! Xarray provides a lot of useful functions. The one to read NetCDF files is `xr.open_dataset(dataDIR)`, where `dataDIR` is the path to the file. See the example below to understand how it works!

In [3]:
dataDIR = './data/ERA5-LowRes-MonthlyAvg-t2m_tp.nc' 
# the path to which you saved the netCDF file in the step before
# Here I downloaded the file into the "data" folder, which I placed in the same folder as the notebook.
# The dot "." at the beginning means "look in the current folder".

ds = xr.open_dataset(dataDIR) # the data of the netCDF File will be stored in "ds" (dataset)

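**Note**: you'll have to give an absolute or relative path to the file for this to work, for example ``'C:\PATH\TO\FILE\ERA5-MonthlyAvg-2tm_tp-75-rolled.nc'`` on Windows. In Python code, prefix such a Windows path with `r` (a raw string) so that the backslashes are not interpreted as escape characters.

Let's see what our `ds` object looks like!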
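A path built with `pathlib` from the standard library works on Windows, Linux and macOS alike. A minimal sketch, assuming the file sits in a `data` folder next to the notebook; the variable name `nc_path` is only illustrative:

```python
from pathlib import Path

# Relative path: it is resolved against the current working directory.
nc_path = Path("data") / "ERA5-LowRes-MonthlyAvg-t2m_tp.nc"

print(nc_path)            # pathlib inserts the separator of your operating system
print(nc_path.resolve())  # the corresponding absolute path
print(nc_path.exists())   # if False, check the path and the working directory
```

`xr.open_dataset` accepts such a `Path` object as well as a plain string.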

In [4]:
ds

Alright, let's go through this step by step.

Our `ds` is of the type `xarray.Dataset` (convince yourself above). An `xarray.Dataset` generally consists of the following key properties:

* **Dimensions**: the dimension names for each axis (in our case: 'latitude', 'longitude', 'time') together with the number of elements along each dimension, e.g. `time: 488`.
* **Coordinates**: a container of arrays (coordinates) that label each point, e.g. 1-dimensional arrays of numbers (like a coordinate vector), DateTime objects (for time labeling), or strings. On the right-hand side you can see the actual values of the coordinates.
* **Variables**: a numpy.ndarray holding the array’s values; this is where the actual data is stored! In our case, we can expect each data variable to be an array of size [241, 480, 488].
* **Attributes**: a container that holds arbitrary metadata (attributes) rather than data, such as the title of the dataset or additional information about it.

An `xarray.Dataset` is a collection of `xarray.DataArray`s. Each NetCDF file contains such a Dataset.

So what is an `xarray.DataArray`?
It is a multi-dimensional array with labeled or named dimensions. DataArray objects add metadata such as dimension names, coordinates, and attributes (described above) to underlying “unlabeled” data structures such as plain numpy arrays.

In our example above, each `xarray.DataArray` corresponds to one of the listed data variables, e.g. `t2m`, the 2 meter temperature. Together, these DataArrays form an `xarray.Dataset`, which in turn is stored in our downloaded netCDF file.
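The relationship between dimensions, coordinates, variables and attributes can be reproduced by building a tiny Dataset by hand. This is only a sketch with random numbers: the grid, the values and the attribute texts are invented for illustration and are not taken from the ERA5 file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Coordinates: one 1-dimensional array of labels per dimension (toy values).
time = pd.date_range("1979-01-01", periods=3, freq="MS")  # three month starts
latitude = np.array([90.0, 0.0, -90.0])
longitude = np.array([0.0, 90.0, 180.0, 270.0])

dims = ("time", "latitude", "longitude")
shape = (len(time), len(latitude), len(longitude))
rng = np.random.default_rng(0)

toy = xr.Dataset(
    # Each data variable is given as (dimension names, values, attributes).
    data_vars={
        "t2m": (dims, 273.15 + 10 * rng.random(shape),
                {"units": "K", "long_name": "2 metre temperature"}),
        "tp": (dims, 0.005 * rng.random(shape),
               {"units": "m", "long_name": "Total precipitation"}),
    },
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    attrs={"title": "Toy dataset for illustration"},  # dataset-level metadata
)

print(toy)                 # same kind of summary as for the real dataset
print(type(toy["t2m"]))    # picking one variable yields a DataArray
```

Printing `toy` shows the same four blocks (Dimensions, Coordinates, Data variables, Attributes) as the summary of a Dataset read from a file.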


The **xarray logo** gives us a visual impression of what an xarray Dataset looks like:

![4D-Data: Data of a specific area dependent on height z and time](xarraylogo.png "")

![](dataset-diagram.png "")

For us, the most interesting two properties will be the coordinates and the variables. Let's have a closer look at them! 

#### Coordinates

You can address the different properties of the `xarray.Dataset` via the dot `.` expression:

In [5]:
ds.time # address the coordinate 'time'

**Time** goes from 1979 to 2019 and has a resolution of one month. You can read this from the values listed to the right of `time` under the coordinates. The values are of type `datetime64`.

The **spatial coordinates** are just as easy to understand:
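The length of a monthly time axis can be cross-checked by hand. A small sketch with plain numpy, assuming that the record starts in January 1979 (the text above only gives the year): a run of 488 monthly steps then ends in August 2019, which fits the stated range.

```python
import numpy as np

# Assumption: first time step is January 1979. The stop value is exclusive,
# so "2019-09" makes August 2019 the last month.
months = np.arange("1979-01", "2019-09", dtype="datetime64[M]")

print(months.size)             # number of monthly time steps
print(months[0], months[-1])   # first and last month
print(months.dtype)            # numpy's datetime type with monthly resolution
```

On the real dataset, `ds.time.values[0]` and `ds.time.values[-1]` give the actual first and last time stamps.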

In [6]:
ds.latitude

Latitude goes from 90 to -90 and has the unit 'degrees north'. The spatial resolution of this dataset is 0.75°, as you can easily see from the given values. 
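Instead of reading the resolution off the printed values, it can be computed from the differences between neighbouring coordinates. The sketch below uses a stand-in array for `ds.latitude.values`, assuming a regular grid of 241 points from 90 to -90:

```python
import numpy as np

# Stand-in for ds.latitude.values (assumed regular grid, north to south).
lat = np.linspace(90.0, -90.0, 241)

steps = np.diff(lat)                 # spacing between neighbouring latitudes
print(steps.min(), steps.max())      # identical if the grid is regular
print(np.allclose(steps, -0.75))     # negative because latitude decreases

# At the same resolution, a full circle of longitude needs this many points:
print(360 / 0.75)
```

The 241 latitudes and 480 longitudes are consistent with the array sizes quoted for the data variables.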

In [7]:
ds.longitude

#### Variables

As with coordinates, variables can be accessed directly from the dataset via the `.` syntax! By doing this, you extract one DataArray from the whole Dataset. Let's try it for the `t2m` variable.

In [8]:
ds.t2m

The **attributes** of a variable are extremely important: they carry the *metadata* and must be specified by the data provider. Here we can read in which units the variable is defined (K for Kelvin), as well as a description of the variable (the "long_name" attribute), and sometimes also the valid value range (not here).

From the description at the top, "... values with dtype=float32", we can also see the data type of our values: in our case "float32", a floating point number with 32 bits.
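Attributes and the data type can also be queried programmatically. A minimal sketch on a hand-made DataArray; the values and attribute texts are placeholders chosen to resemble the `t2m` variable, not copied from the file:

```python
import numpy as np
import xarray as xr

# Toy stand-in for ds.t2m: a 2 x 3 grid of 32-bit floats with two attributes.
t2m = xr.DataArray(
    np.full((2, 3), 280.0, dtype=np.float32),
    dims=("latitude", "longitude"),
    name="t2m",
    attrs={"units": "K", "long_name": "2 metre temperature"},
)

print(t2m.dtype)                # data type of the stored values
print(t2m.attrs)                # all attributes as a dictionary
print(t2m.attrs["units"])       # one attribute, dictionary style
print(t2m.long_name)            # one attribute, dot style
# A missing attribute is best queried with .get() and a fallback value:
print(t2m.attrs.get("valid_range", "not provided"))
```

The dictionary style is the safer choice in scripts, because the dot style fails with an `AttributeError` when the provider did not set the attribute.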

In [9]:
ds.tp

The total precipitation is given in m! Since mm is the common unit for precipitation data, we need to convert the values by multiplying them by 1000 (1 m = 1000 mm).

In [10]:
tp_mm = ds.tp * 1000  # convert from m to mm
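When converting units it is good practice to update the `units` attribute as well, so that the metadata matches the numbers. A minimal sketch on toy values (not taken from the dataset); whether arithmetic carries attributes over depends on the xarray version and settings, so the attributes are set explicitly here:

```python
import numpy as np
import xarray as xr

# Toy precipitation values in metres (placeholders for ds.tp).
tp = xr.DataArray(
    np.array([0.001, 0.0025, 0.0]),
    dims=("time",),
    name="tp",
    attrs={"units": "m", "long_name": "Total precipitation"},
)

tp_mm = tp * 1000                           # 1 m = 1000 mm
tp_mm.attrs = {**tp.attrs, "units": "mm"}   # copy the metadata, fix the unit

print(tp_mm.values)
print(tp_mm.attrs)
```

The original `tp` array and its attributes stay untouched, so both versions can be used side by side.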

In [11]:
ds.tp.long_name # address the attributes of a variable via a second '.'!

'Total precipitation'