**Agenda**

- HadCRUT4
- netCDF
- CF Conventions
- netCDF4-python
- xarray

**Install required libraries**

`conda install netCDF4 xarray numpy matplotlib`

# Introduction

We have learned NumPy, a low-level tool for working with multidimensional arrays. Today we will see how higher-level tools build on NumPy to work with multidimensional data in the context of meteorology.

So far we have worked with NumPy arrays in memory, and these arrays disappear when the program ends. We will see how to persist multidimensional data using netCDF, but don't forget that NumPy arrays can also be saved to disk using [numpy.save](https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html). The binary format is described [here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.format.html#module-numpy.lib.format).
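
As a minimal sketch of that round trip, the cell below saves a small array with `numpy.save` and reads it back with `numpy.load`. The array values and the file name `example.npy` are made up for illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import numpy as np

# A small 2-D array standing in for real data (illustrative values only).
arr = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.npy")  # hypothetical file name
    np.save(path, arr)      # write the array in NumPy's .npy binary format
    loaded = np.load(path)  # read it back into memory

# The round trip preserves shape, dtype, and values.
print(loaded.shape, loaded.dtype)
print(np.array_equal(arr, loaded))
```

Note that a `.npy` file stores only the raw array: dimension names, units, and coordinates are not kept, which is the gap netCDF fills.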

## HadCRUT4 Dataset

HadCRUT4 is a global temperature dataset, providing gridded temperature anomalies across the world as well as averages for the hemispheres and the globe as a whole. CRUTEM4 and HadSST3 are the land and ocean components of this overall dataset, respectively.

These datasets have been developed by the Climatic Research Unit (University of East Anglia) in conjunction with the Hadley Centre (UK Met Office), apart from the sea surface temperature (SST) dataset, which was developed solely by the Hadley Centre. These datasets will be updated at roughly monthly intervals into the future. Hemispheric and global averages as monthly and annual values are available as separate files.

- [HadCRUT4 Dataset](https://crudata.uea.ac.uk/cru/data/temperature/)
- [CRUTEM4 Dataset](https://www.metoffice.gov.uk/hadobs/crutem4/)
- [HadSST3 Hadley Centre SST Dataset](https://www.metoffice.gov.uk/hadobs/hadsst3/)

## [netCDF](https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_introduction.html)

The Network Common Data Form, or netCDF, is an interface to a library of data access functions for storing and retrieving data in the form of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...) rectangular structure containing items which all have the same data type (e.g., 8-bit character, 32-bit integer). A scalar (simple single value) is a 0-dimensional array.

NetCDF is an abstraction that supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface. Array values may be accessed directly, without knowing details of how the data are stored. Auxiliary information about the data, such as what units are used, may be stored with the data. Generic utilities and application programs can access netCDF datasets and transform, combine, analyze, or display specified fields of the data. The development of such applications has led to improved accessibility of data and improved re-usability of software for array-oriented data management, analysis, and display.

In [None]:
!ncdump -h 'absolute.nc'

**Answer the following questions related to `absolute.nc`**

- How many dimensions does the dataset have? What are their names?



- How many variables does the dataset have? What are their names?



- How many coordinate variables does the dataset have? What are their names?



- Which units are used for measuring temperature?



- How many temperature values (`tem` variable) does `absolute.nc` contain?



- Why are there only 12 values in the time variable if the data is monthly?




In [None]:
!ncdump -h 'HadCRUT.4.6.0.0.median.nc'

[Chunking](https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters)

### Remote netCDF

See [OPeNDAP (Data Access Protocol)](https://www.unidata.ucar.edu/software/netcdf/docs/dap2.html)

In [None]:
!ncdump -h 'https://thredds.ucar.edu/thredds/dodsC/nws/metar/ncdecoded/files/Surface_METAR_20191007_0000.nc'

## CF conventions

[CF conventions](http://cfconventions.org/) are designed to promote the processing and sharing of files created with the NetCDF API. The CF conventions are increasingly gaining acceptance and have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.
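
One piece of metadata the conventions standardise is the `units` attribute of a time coordinate, written as a step size and a reference date. The sketch below decodes such values by hand with NumPy. The `units` string and the offsets are assumptions chosen for illustration, not values read from a file, and only the `days` step is handled.

```python
import numpy as np

# Assumed CF-style metadata for a time coordinate (illustrative, not read from a file).
units = "days since 1850-01-01 00:00:00"
offsets_in_days = np.array([15.5, 45.0])

# Split the units string into the step size and the reference date.
step, reference = units.split(" since ")
if step != "days":
    raise ValueError("this sketch only handles offsets expressed in days")
epoch = np.datetime64(reference.replace(" ", "T"))

# Convert fractional days to whole seconds, then add them to the reference date.
seconds = np.round(offsets_in_days * 86400).astype("int64")
timestamps = epoch + seconds.astype("timedelta64[s]")
print(timestamps)
```

Libraries that understand the conventions perform this decoding automatically, so in practice there is rarely a need to do it manually.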

## netCDF4-python

[netcdf4-python](http://unidata.github.io/netcdf4-python/netCDF4/index.html) is a Python interface to the netCDF C library.

netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

In [None]:
import netCDF4 as nc4
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
ds = nc4.Dataset('HadCRUT.4.6.0.0.median.nc')
print(ds)

In [None]:
print(ds.variables)

In [None]:
ta = ds.variables['temperature_anomaly']
print(ta)

In [None]:
plt.pcolormesh(ta[-1,:,:])

In [None]:
ta_data = ta[:].data
print(f'Type: {type(ta_data)}')
print(f'Shape: {ta_data.shape}')
print(f'Ndim: {ta_data.ndim}')
print(f'Dtype: {ta_data.dtype}')
print(f'Flags:\n{ta_data.flags}')
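
By default netCDF4-python returns a NumPy masked array when a variable declares a fill or missing value, and `.data` exposes the underlying buffer with the mask dropped. The sketch below uses plain `numpy.ma` on invented numbers to show the difference; the fill value of `-1e30` is an assumption for illustration, so check the variable's attributes for the real one.

```python
import numpy as np

fill_value = -1e30  # assumed placeholder for missing data (illustrative)
raw = np.array([0.5, fill_value, 2.5, -1.0])

# Mask every element equal to the fill value.
masked = np.ma.masked_values(raw, fill_value)

print(masked.min())           # masked entries are skipped
print(masked.data.min())      # .data includes the fill value as an ordinary number
print(masked.filled(np.nan))  # replace masked entries with NaN for plain NumPy use
```

Statistics computed on `.data` can therefore be dominated by the fill value, so it is worth checking which of the two arrays a result comes from.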

**Exercise**

- Print the value of the maximum anomaly
- Print the indices of the numpy array where the value is located (check with `ta_data[X,Y,Z]`)
- Convert from Kelvin to Celsius and print the maximum anomaly (`C=K-273.15`)

## xarray

[xarray](http://xarray.pydata.org/en/stable/index.html) (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.

Xarray is inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing.

In [None]:
import xarray as xr

In [None]:
hadcrut4 = xr.open_dataset('HadCRUT.4.6.0.0.median.nc')
print(hadcrut4)

### Dataset

[Dataset](http://xarray.pydata.org/en/stable/data-structures.html#dataset) is xarray’s multi-dimensional equivalent of a DataFrame. It is a dict-like container of labeled arrays (DataArray objects) with aligned dimensions. It is designed as an in-memory representation of the data model from the netCDF file format.
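
To make the dict-like behaviour concrete, here is a small sketch that builds such a container from scratch instead of reading it from a file. The variable names, the 2 x 3 grid, and the values are all synthetic and chosen only for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic values on a tiny 2 x 3 grid (illustrative only).
toy = xr.Dataset(
    data_vars={
        "tem": (("latitude", "longitude"), np.zeros((2, 3))),
        "anomaly": (("latitude", "longitude"), np.ones((2, 3))),
    },
    coords={"latitude": [-45.0, 45.0], "longitude": [0.0, 120.0, 240.0]},
)

print(list(toy.data_vars))  # dict-like access to the variable names
print(toy["tem"].dims)      # each entry is a DataArray
print(dict(toy.sizes))      # dimensions shared by all variables
```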

### DataArray

[DataArray](http://xarray.pydata.org/en/stable/data-structures.html#dataarray) is xarray’s implementation of a labeled, multi-dimensional array. It has several key properties:

- values: a numpy.ndarray holding the array’s values
- dims: dimension names for each axis (e.g., ('x', 'y', 'z'))
- coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
- attrs: dict to hold arbitrary metadata (attributes)
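
The four properties listed above can be inspected on a hand-made array. This is a minimal sketch with synthetic values; the name `toy` and the `units` attribute are assumptions for illustration.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("latitude", "longitude"),
    coords={"latitude": [-45.0, 45.0], "longitude": [0.0, 120.0, 240.0]},
    attrs={"units": "K"},
    name="toy",  # hypothetical variable name
)

print(type(da.values))               # the underlying numpy.ndarray
print(da.dims)                       # names of the axes, in order
print(da.coords["latitude"].values)  # labels along one axis
print(da.attrs)                      # free-form metadata
```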

In [None]:
print(type(hadcrut4))
print(type(hadcrut4.latitude))
print(type(hadcrut4.longitude))
print(type(hadcrut4.time))
print(type(hadcrut4.temperature_anomaly))

In [None]:
print(hadcrut4.time) # Note how xarray interprets time values!

In [None]:
print(hadcrut4.temperature_anomaly)

In [None]:
print(hadcrut4.temperature_anomaly.values) # numpy array

### matplotlib integration

In [None]:
x = hadcrut4.temperature_anomaly.sel(time=hadcrut4.time[-1])
print(x)

In [None]:
x.plot()
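
The selection step relies on labels instead of positions. As a sketch of how label-based and positional indexing compare, the cell below uses a synthetic 3 x 2 x 2 array; the dates, coordinates, and values are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=3, freq="MS")
da = xr.DataArray(
    np.arange(3 * 2 * 2, dtype=float).reshape(3, 2, 2),
    dims=("time", "latitude", "longitude"),
    coords={"time": times, "latitude": [-45.0, 45.0], "longitude": [-90.0, 90.0]},
)

last_by_position = da.isel(time=-1)       # positional, like ta[-1, :, :]
last_by_label = da.sel(time=da.time[-1])  # label-based, using the coordinate value

# method="nearest" picks the closest grid point to the requested location.
nearest = da.sel(latitude=40.0, longitude=100.0, method="nearest")

print(last_by_position.equals(last_by_label))
print(nearest.values)
```

Because the result of `sel` or `isel` is still a DataArray with its coordinates attached, calling `.plot()` on it can label the axes without extra arguments.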

### Global temperature anomaly

In [None]:
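# Average over the grid (unweighted), then over each year, to get one value per year
x = hadcrut4.temperature_anomaly.mean(dim=['latitude', 'longitude']).resample(time='1Y').mean()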
print(type(x))

In [None]:
x.plot()
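
On a regular latitude-longitude grid, cells shrink toward the poles, so a global average is often weighted by the cosine of the latitude. The following sketch compares a plain mean with a weighted one on synthetic data. It assumes an xarray version that provides `DataArray.weighted`, and the grid and values are invented for illustration.

```python
import numpy as np
import xarray as xr

lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 180.0])
# Synthetic anomalies: 2.0 at +/-60 degrees, 0.0 at the equator (illustrative values).
data = np.array([[2.0, 2.0], [0.0, 0.0], [2.0, 2.0]])
da = xr.DataArray(
    data,
    dims=("latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
)

# Weights proportional to grid-cell area on a regular grid.
weights = np.cos(np.deg2rad(da.latitude))

plain = da.mean(dim=("latitude", "longitude"))
weighted = da.weighted(weights).mean(dim=("latitude", "longitude"))

print(float(plain), float(weighted))
```

The weighted value is lower here because the high-latitude cells, which hold the larger anomalies, count for less area.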

# Exercises

## 1 - Get the date for the min and max anomalies

**Solution**

(numpy.datetime64('1862-12-31T00:00:00.000000000'),
 numpy.datetime64('2016-12-31T00:00:00.000000000'))

## 2 - Plot anomalies for the northern and southern poles (Hint: [where](http://xarray.pydata.org/en/stable/indexing.html#masking-with-where))

## 3 - Get the temperature values used to calculate the anomalies in 2018 as a NumPy array (`.values`)

**Look at temperature units**

In [None]:
absolute = xr.open_dataset('absolute.nc')
print(absolute)

In [None]:
print(absolute.tem)

In [None]:
# t2018 is the NumPy array obtained in exercise 3
print(t2018[~np.isnan(t2018)].sum(), t2018[~np.isnan(t2018)].max()) # 6214262.5, 310.1098