## What is netCDF?

netCDF (Network Common Data Form), developed by Unidata, is a data format used to create, access, and share array-oriented scientific data. In earth science, most data centers publish their datasets in this format, for several reasons:

* A netCDF file includes both the data and the information about the data (metadata), making it *self-describing*.

* A netCDF file can be accessed on different types of computers, and it can be read by multiple processes while one process writes to it, so it is *portable* and *sharable*.

* Subsets of the data can be extracted easily, and additional data can be appended to an existing file as long as the data structure matches, so it is a *scalable* and *appendable* data format.

* Earlier versions of netCDF data remain readable by current and future versions of most software, so it is also a good option for *data archiving*.
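
The *appendable* property can be illustrated in memory with xarray, the library used later in this chapter. This is only a minimal sketch: the helper `make_block`, the variable name `temp` and all values are made up for illustration, and `xr.concat` stands in for appending records to a file on disk.

```python
import numpy as np
import xarray as xr


def make_block(times):
    """Build a small (time, lat, lon) dataset for the given time steps."""
    lat = np.array([10.0, 0.0, -10.0])
    lon = np.array([0.0, 90.0, 180.0, 270.0])
    data = np.zeros((len(times), lat.size, lon.size), dtype="float32")
    return xr.Dataset(
        {"temp": (("time", "lat", "lon"), data)},
        coords={"time": times, "lat": lat, "lon": lon},
    )


first = make_block(np.array([0.0, 24.0]))
second = make_block(np.array([48.0, 72.0, 96.0]))

# Both blocks share the same lat/lon structure, so the second block
# can be appended to the first one along the time dimension
combined = xr.concat([first, second], dim="time")
print(combined["temp"].shape)
```

The two blocks line up because their `lat` and `lon` coordinates are identical; only the `time` dimension grows.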

A netCDF file can be difficult to interpret, especially for those who are new to this data format. So let's walk through the content structure of a netCDF file together in this chapter!

## netCDF Content Structure

A netCDF file typically includes the **data** itself and the information about the data (also called **metadata**). In earth science, the data are often physical variables, e.g. sea water temperature, earth surface temperature or wind speed. The measurements of these variables commonly span multiple longitudes and latitudes, as well as time and/or vertical dimensions such as altitude, depth or pressure level. Therefore, the data stored in a netCDF file are usually multi-dimensional.
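
To make "multi-dimensional" concrete, the following sketch builds a tiny three-dimensional array with named dimensions. The name `temperature`, the grid and the values are assumptions chosen for illustration; they are not taken from any real dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical values: 2 time steps on a 3 x 4 latitude-longitude grid
values = np.arange(24, dtype="float32").reshape(2, 3, 4)

temperature = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": [0.0, 24.0],
        "lat": [10.0, 0.0, -10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="temperature",
)

print(temperature.dims)   # names of the dimensions
print(temperature.shape)  # number of values along each dimension
```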

The metadata, on the other hand, provide useful attribute information about the data. In a standardized netCDF file, the metadata should describe each dimension and each measured variable, as well as the dataset as a whole. Important questions such as what the dataset is about, which variables are measured and in which units they are measured should be answered by the metadata, so that data users can interpret the story told by the dataset.

To take a closer look at the content structure of a standard netCDF file, we use the NCEP-NCAR reanalysis [dataset](https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis/Dailies/surface_gauss/) of daily mean air temperature at 2 meters above the surface for the year 1948 as an example.

In [1]:
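import xarray as xr

# Open the file without CF decoding so that the raw values and
# attributes are shown as they are stored in the file
ds = xr.open_dataset("/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/air.2m.gauss.1948.nc",
                     decode_cf=False)
# Print a summary of the dimensions, variables and attributes
ds.info()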

xarray.Dataset {
dimensions:
	lat = 94 ;
	lon = 192 ;
	time = 366 ;
	nbnds = 2 ;

variables:
	float32 lat(lat) ;
		lat:units = degrees_north ;
		lat:actual_range = [ 88.542 -88.542] ;
		lat:long_name = Latitude ;
		lat:standard_name = latitude ;
		lat:axis = Y ;
	float32 lon(lon) ;
		lon:units = degrees_east ;
		lon:long_name = Longitude ;
		lon:actual_range = [  0.    358.125] ;
		lon:standard_name = longitude ;
		lon:axis = X ;
	float64 time(time) ;
		time:long_name = Time ;
		time:delta_t = 0000-00-01 00:00:00 ;
		time:avg_period = 0000-00-01 00:00:00 ;
		time:standard_name = time ;
		time:axis = T ;
		time:units = hours since 1800-01-01 00:00:0.0 ;
		time:actual_range = [1297320. 1306080.] ;
		time:coordinate_defines = start ;
	float32 air(time, lat, lon) ;
		air:long_name = mean Daily Air temperature at 2 m ;
		air:units = degK ;
		air:precision = 2 ;
		air:GRIB_id = 11 ;
		air:GRIB_name = TMP ;
		air:var_desc = Air temperature ;
		air:dataset = NCEP Reanalysis Daily Averages ;
		

This file defines four dimensions: `lon`, `lat`, `time` and `nbnds`.

`lon`, `lat` and `time` are also the names of coordinate variables, which label the `lon`, `lat` and `time` dimensions.

`air` is the data variable. It holds the core data, the air temperature, and spans the `time`, `lat` and `lon` dimensions.

`time_bnds` holds the time boundaries, i.e. the start and end of each time step.

In the declaration `lat(lat)`, the `lat` on the left is the name of the variable, while the `lat` in brackets refers to the dimension `lat`. In other words, the variable `lat` is a function of the dimension `lat` and depends only on that dimension. Such a variable is a *coordinate variable*: it labels the corresponding dimension and carries the name of the dimension it is related to. In the example, the variables `lat`, `lon` and `time` are all coordinate variables.

A coordinate variable must be one-dimensional, must have the same name as the dimension it is bound to, must be monotonically increasing or decreasing, and must not contain missing values. A variable that provides coordinate information but does not fulfil all of these requirements is not a coordinate variable but an *auxiliary coordinate variable*. Unlike a coordinate variable, an auxiliary coordinate variable can be multi-dimensional, and there is no relationship between its name and the name(s) of its dimension(s). Its values do not have to be monotonic or unique, and missing values are allowed.
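
The requirements above can be written down as a small check. This is a sketch rather than an official validator: the function name `is_coordinate_variable` is hypothetical, missing values are assumed to show up as NaN, "monotonic" is interpreted as strictly increasing or strictly decreasing, and the dataset is a toy stand-in for the example file.

```python
import numpy as np
import xarray as xr


def is_coordinate_variable(ds, name):
    """Return True if variable `name` meets the requirements listed above."""
    var = ds[name]
    # Requirement 1: one-dimensional and named after its dimension
    if var.dims != (name,):
        return False
    values = np.asarray(var.values, dtype="float64")
    # Requirement 2: no missing values (assumed to be encoded as NaN)
    if np.isnan(values).any():
        return False
    # Requirement 3: strictly increasing or strictly decreasing
    steps = np.diff(values)
    return bool(np.all(steps > 0) or np.all(steps < 0))


# Toy dataset with made-up values
ds_toy = xr.Dataset(
    {
        "air": (("time", "lat"), np.zeros((2, 3))),
        "time_bnds": (("time", "nbnds"), np.array([[0.0, 24.0], [24.0, 48.0]])),
    },
    coords={"time": [0.0, 24.0], "lat": [10.0, 0.0, -10.0]},
)

for name in ["time", "lat", "time_bnds", "air"]:
    print(name, is_coordinate_variable(ds_toy, name))
```

In this toy dataset, `time` and `lat` pass the check, while `time_bnds` and `air` fail at the first requirement because they are two-dimensional.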

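In our example, `time_bnds` is two-dimensional and is not named after a dimension, so it cannot be a coordinate variable; here we treat it as an auxiliary coordinate variable.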

The air temperature data themselves are stored in the *data variable* `air`. The declaration `air(time, lat, lon)` indicates that this variable is a three-dimensional array: it contains air temperature values on a grid defined by latitudes and longitudes, for multiple times specified by the time coordinate.
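
What the three dimensions mean in practice can be seen by fixing some of them. The sketch below uses a toy stand-in for `air(time, lat, lon)` with made-up values and a much smaller grid than the real file, so that it runs without the data file.

```python
import numpy as np
import xarray as xr

# Toy stand-in for air(time, lat, lon); the values are made up
air = xr.DataArray(
    np.arange(24, dtype="float32").reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": [0.0, 24.0],
        "lat": [10.0, 0.0, -10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="air",
)

# Fix the time step by position: a 2-D (lat, lon) map remains
first_map = air.isel(time=0)

# Fix the location by coordinate value: a 1-D time series remains
series = air.sel(lat=0.0, lon=90.0)

print(first_map.dims, series.dims)
print(series.values)
```

`isel` selects by integer position along a dimension, while `sel` selects by the values of the coordinate variables, which is where the coordinate variables become useful.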




In [6]:
ds.time.data

array([1297320., 1297344., 1297368., 1297392., 1297416., 1297440.,
       1297464., 1297488., 1297512., 1297536., 1297560., 1297584.,
       1297608., 1297632., 1297656., 1297680., 1297704., 1297728.,
       1297752., 1297776., 1297800., 1297824., 1297848., 1297872.,
       1297896., 1297920., 1297944., 1297968., 1297992., 1298016.,
       1298040., 1298064., 1298088., 1298112., 1298136., 1298160.,
       1298184., 1298208., 1298232., 1298256., 1298280., 1298304.,
       1298328., 1298352., 1298376., 1298400., 1298424., 1298448.,
       1298472., 1298496., 1298520., 1298544., 1298568., 1298592.,
       1298616., 1298640., 1298664., 1298688., 1298712., 1298736.,
       1298760., 1298784., 1298808., 1298832., 1298856., 1298880.,
       1298904., 1298928., 1298952., 1298976., 1299000., 1299024.,
       1299048., 1299072., 1299096., 1299120., 1299144., 1299168.,
       1299192., 1299216., 1299240., 1299264., 1299288., 1299312.,
       1299336., 1299360., 1299384., 1299408., 1299432., 12994

In [42]:
ds.time_bnds.data

array([[1297320., 1297344.],
       [1297344., 1297368.],
       [1297368., 1297392.],
       [1297392., 1297416.],
       [1297416., 1297440.],
       [1297440., 1297464.],
       [1297464., 1297488.],
       [1297488., 1297512.],
       [1297512., 1297536.],
       [1297536., 1297560.],
       [1297560., 1297584.],
       [1297584., 1297608.],
       [1297608., 1297632.],
       [1297632., 1297656.],
       [1297656., 1297680.],
       [1297680., 1297704.],
       [1297704., 1297728.],
       [1297728., 1297752.],
       [1297752., 1297776.],
       [1297776., 1297800.],
       [1297800., 1297824.],
       [1297824., 1297848.],
       [1297848., 1297872.],
       [1297872., 1297896.],
       [1297896., 1297920.],
       [1297920., 1297944.],
       [1297944., 1297968.],
       [1297968., 1297992.],
       [1297992., 1298016.],
       [1298016., 1298040.],
       [1298040., 1298064.],
       [1298064., 1298088.],
       [1298088., 1298112.],
       [1298112., 1298136.],
       [129813
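
The raw numbers printed above are offsets in the unit given by `time:units`, i.e. hours since 1800-01-01. As a rough sketch, they can be converted to calendar dates with Python's standard library. The helper name `hours_to_datetime` is hypothetical, and the standard Gregorian calendar is assumed because no calendar attribute is shown in the output.

```python
from datetime import datetime, timedelta

# Reference date taken from time:units = "hours since 1800-01-01 00:00:0.0"
REFERENCE = datetime(1800, 1, 1)


def hours_to_datetime(hours):
    """Convert an offset in hours since the reference date to a datetime."""
    return REFERENCE + timedelta(hours=hours)


# First and last values, as listed in time:actual_range
first = hours_to_datetime(1297320.0)
last = hours_to_datetime(1306080.0)
print(first)  # 1948-01-01 00:00:00
print(last)   # 1948-12-31 00:00:00

# First row of time_bnds: start and end of the first daily interval
start, end = (hours_to_datetime(h) for h in (1297320.0, 1297344.0))
print(end - start)  # 1 day, 0:00:00
```

Each row of `time_bnds` therefore brackets one day, and the start of each interval equals the corresponding value of `time`.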

Variable attributes provide information on individual variables, such as what the data in a variable mean and which units they have.

Global attributes provide information on the dataset as a whole, for instance its title and description. Here, the attribute `Conventions` is also given. It indicates which metadata convention is applied to this dataset. A metadata convention standardizes how the metadata are formulated, and is therefore a crucial component in making a dataset fulfil the [FAIR principles](https://www.nature.com/articles/sdata201618). We will dive deeper into metadata conventions in the next chapter.
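
In xarray, both kinds of attributes are exposed as dictionaries through `.attrs`. The sketch below uses a toy dataset so that it runs without the example file; all attribute values in it, including the convention name, are illustrative and not read from the NCEP-NCAR file.

```python
import numpy as np
import xarray as xr

# Toy dataset; all attribute values are illustrative assumptions
ds_attrs = xr.Dataset(
    {
        "air": (
            ("time",),
            np.array([271.5, 272.0], dtype="float32"),
            {"long_name": "air temperature", "units": "degK"},
        )
    },
    coords={"time": ("time", [0.0, 24.0], {"units": "hours since 1800-01-01"})},
    attrs={"title": "toy example", "Conventions": "CF-1.8"},
)

# Variable attributes describe a single variable
print(ds_attrs["air"].attrs["units"])
print(ds_attrs["time"].attrs["units"])

# Global attributes describe the dataset as a whole
print(ds_attrs.attrs.get("Conventions", "not given"))
```

Using `.get` with a default avoids an error for files that do not declare a convention.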

In this chapter, we walked through the core content structure of a standardized netCDF file. Besides the dimensions, we distinguished three types of variables:
* Coordinate Variable
* Auxiliary Coordinate Variable
* Data Variable

There are two kinds of attributes:
* Variable Attribute
* Global Attribute

Together, they make up the metadata of a netCDF file.

In the next chapter, we'll discuss the CF Conventions, a broadly adopted metadata convention in Earth System Science.

In [32]:
ds.to_dataarray()[1][0][0][0]