# CF Conventions 101

In the previous chapter, we learned about the most widely used data format in earth system science, the netCDF file, and its core content structure. Metadata are clearly essential to making a netCDF file self-describing. In this chapter, we are going to learn about a widely implemented metadata standard for the netCDF data format, the **Climate and Forecast Conventions (CF Conventions)**. We are interested in a metadata standard mostly because it makes scientific datasets more findable, accessible, interoperable, and reusable (the FAIR principles).

The CF Conventions is not the only metadata standard for netCDF files. Its predecessor, the COARDS Conventions, is also widely implemented, and some big data providers have their own metadata standards. However, thanks to its compatibility and flexibility, the CF Conventions is gaining popularity. For instance, the COARDS Conventions places a rigid restriction on the order of dimensions, while the CF Conventions doesn't. And as the successor of the COARDS Conventions, the CF Conventions is *backward compatible* with COARDS, which means programs that can process CF-conforming datasets should generally be able to process COARDS-conforming datasets too.
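To make the dimension-order restriction concrete, here is a minimal sketch, assuming the COARDS rule that time (T), height or depth (Z), latitude (Y), and longitude (X) dimensions must appear in that relative order. The helper name `follows_coards_order` is hypothetical and not part of any library.

```python
# Relative order COARDS expects for the axes a variable uses: T, Z, Y, X.
COARDS_ORDER = ["T", "Z", "Y", "X"]


def follows_coards_order(axes):
    """Return True if the axis letters appear in T, Z, Y, X relative order.

    `axes` lists the axis type of each dimension of a variable, in the
    order the dimensions are declared, e.g. ["T", "Y", "X"].
    Hypothetical helper for illustration only.
    """
    ranks = [COARDS_ORDER.index(axis) for axis in axes]
    return ranks == sorted(ranks)


print(follows_coards_order(["T", "Y", "X"]))  # True
print(follows_coards_order(["X", "Y", "T"]))  # False
```

A variable declared as (time, lat, lon) passes this check, whereas one declared as (lon, lat, time) would violate COARDS but is still permitted by the CF Conventions.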

Now let's take a closer look at how the CF Conventions formulates the metadata in a netCDF file with an example [dataset](https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc), containing sea surface temperatures collected by [PCMDI](https://en.wikipedia.org/wiki/Program_for_Climate_Model_Diagnosis_and_Intercomparison).

```python
import xarray as xr

ds = xr.open_dataset("/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/tos_O1_2001-2002.nc",
                     decode_cf=False)
ds.info()
```

```
xarray.Dataset {
dimensions:
	lon = 180 ;
	bnds = 2 ;
	lat = 170 ;
	time = 24 ;

variables:
	float64 lon(lon) ;
		lon:standard_name = longitude ;
		lon:long_name = longitude ;
		lon:units = degrees_east ;
		lon:axis = X ;
		lon:bounds = lon_bnds ;
		lon:original_units = degrees_east ;
	float64 lon_bnds(lon, bnds) ;
	float64 lat(lat) ;
		lat:standard_name = latitude ;
		lat:long_name = latitude ;
		lat:units = degrees_north ;
		lat:axis = Y ;
		lat:bounds = lat_bnds ;
		lat:original_units = degrees_north ;
	float64 lat_bnds(lat, bnds) ;
	float64 time(time) ;
		time:standard_name = time ;
		time:long_name = time ;
		time:units = days since 2001-1-1 ;
		time:axis = T ;
		time:calendar = 360_day ;
		time:bounds = time_bnds ;
		time:original_units = seconds since 2001-1-1 ;
	float64 time_bnds(time, bnds) ;
	float32 tos(time, lat, lon) ;
		tos:standard_name = sea_surface_temperature ;
		tos:long_name = Sea Surface Temperature ;
		tos:units = K ;
		tos:cell_methods = time: mean (interval: 30 
```

In the CF Conventions, metadata are divided into "required" and "optional".

For all kinds of variables, `units` and at least one of `long_name` and `standard_name` are required. Both `long_name` and `standard_name` label a variable; `standard_name` must be taken from the controlled vocabulary defined in the ["CF Standard Name Table"](https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html), while `long_name` lets data providers compose their own description. If a variable has a `standard_name`, its canonical `units` can be looked up in the "CF Standard Name Table" as well.
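The rule above can be sketched as a small check over a variable's attributes. This is only an illustration of the rule as stated in this chapter, not an official CF checker. The function name `missing_metadata` is hypothetical, and the attributes are given as a plain dictionary.

```python
def missing_metadata(attrs):
    """List which of the required attributes are missing.

    `attrs` is a dictionary of one variable's attributes.
    Hypothetical helper for illustration only.
    """
    problems = []
    if "units" not in attrs:
        problems.append("units")
    if "standard_name" not in attrs and "long_name" not in attrs:
        problems.append("standard_name or long_name")
    return problems


# Attributes copied from the `tos` variable of the example dataset.
tos_attrs = {
    "standard_name": "sea_surface_temperature",
    "long_name": "Sea Surface Temperature",
    "units": "K",
}

print(missing_metadata(tos_attrs))                  # []
print(missing_metadata({"long_name": "my field"}))  # ['units']
```

With xarray, the dictionary for a variable is available as `ds["tos"].attrs`, so the same function could be applied to every variable in a dataset.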

As you can see, the example dataset contains three coordinate variables: `lon`, `lat`, and `time`. As regulated by the CF Conventions, a variable representing the longitude must literally have "longitude" as `standard_name` and "degrees_east" as `units`; similarly, the `standard_name` and `units` for a latitude variable must be "latitude" and "degrees_north". The time variable must have "time" as `standard_name` and a string of the form "[time-interval] since YYYY-MM-DD hh:mm:ss" as `units`. In our example, the `units` of the time variable is `days since 2001-1-1` (the `original_units` attribute records that it was originally `seconds since 2001-1-1`). "seconds", "minutes", "hours", and "days" are the most commonly used time intervals; using "months" or "years" is not recommended, as the length of these intervals can vary.
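To illustrate how such a time `units` string is interpreted, here is a minimal sketch that splits the string and converts a day offset into a date on the `360_day` calendar used by the example dataset, in which every month has 30 days. The function names are hypothetical, and the offsets 15 and 375 are illustrative values rather than values read from the file. Any time-of-day part of the reference date is ignored.

```python
def parse_time_units(units):
    """Split a CF time units string into (interval, reference date)."""
    interval, reference = units.split(" since ")
    return interval.strip(), reference.strip()


def decode_360_day(days, reference):
    """Convert a day offset to (year, month, day) on the 360_day calendar.

    `reference` is a date string such as "2001-1-1". Fractional days are
    truncated. Hypothetical helper for illustration only.
    """
    year, month, day = (int(part) for part in reference.split()[0].split("-"))
    # Count whole days since year 0, with twelve 30-day months per year.
    total = year * 360 + (month - 1) * 30 + (day - 1) + int(days)
    year, rest = divmod(total, 360)
    month, day = divmod(rest, 30)
    return year, month + 1, day + 1


interval, reference = parse_time_units("days since 2001-1-1")
print(interval, reference)             # days 2001-1-1
print(decode_360_day(15, reference))   # (2001, 1, 16)
print(decode_360_day(375, reference))  # (2002, 1, 16)
```

In practice, xarray performs this decoding itself when a dataset is opened with `decode_cf=True` (the default); the sketch only shows the arithmetic behind it.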



If a variable is not pre-defined in the "CF Standard Name Table", data providers need to describe it in `long_name` and provide `units` that are parsable by the UDUNITS library.
