# Introduction to the NCAS CF Data Tools, `cf-python` and `cf-plot`

## Context and learning objectives

**What are the NCAS CF Data Tools and why do they all have 'cf' in the name?**

The NCAS CF Data Tools are a suite of Python libraries designed to facilitate working with data for research in the earth sciences and aligned domains. The two most relevant to the average user, and to anyone wanting to process, analyse and visualise atmospheric data, are cf-python (https://ncas-cms.github.io/cf-python/) and cf-plot (https://ncas-cms.github.io/cf-plot/build/). We will focus on using cf-python and cf-plot today.

The 'cf' in the names of the NCAS CF Data Tools refers to the CF Conventions, a metadata standard. The tools are built around this standard through their use of the CF Data Model which, alongside performance, is considered a 'unique selling point' of the tools.


**What are the CF Conventions?**

The CF Conventions are a metadata standard for describing data which is becoming the de facto convention across geoscience, making data sharing and intercomparison simpler. See https://cfconventions.org/ for more information.


**What are we going to learn in this session?**

Our learning aim is to be able to use the NCAS CF Data Tools Python libraries, namely cf-python and cf-plot, to process, analyse and visualise netCDF and PP datasets, whilst appreciating the context and 'unique selling point' of the libraries: they are built to use the CF Conventions, a metadata standard for earth science data, and work on top of a Data Model for CF, which makes it simpler to do what you want to do with the datasets.

We have six distinct objectives, matching the sections in this notebook (except that one objective is split across two sections so that we can introduce plotting earlier on). By the end of this lesson you should be familiar with, and have practised:

* Using cf-python to read dataset(s) and view the (meta)data at different detail levels;
* Using cf-python to edit the data and write out the edited data to file;
* Reducing datasets using cf-python: subspacing and collapsing (part 1, basic, and part 2, more advanced);
* Visualising datasets using cf-plot: contour and vector plots;
* Data analysis using cf-python and cf-plot: arithmetic, statistics and plots of trends;
* Changing the underlying grid of data using cf-python: regridding.


***

## Setting up

**In this section we set up the notebook, import the libraries and check which datasets we will work with.**

* Set up the notebook for tidy output (only needed in a Jupyter Notebook, not in interactive Python or a script)

In [4]:
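# Show matplotlib (and therefore cf-plot) figures inline in the notebook
%matplotlib inline

# Suppress warnings to keep the cell outputs tidy
import warnings
warnings.filterwarnings('ignore')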

* Import cf-python and cf-plot and inspect the versions

In [5]:
import cfplot as cfp
import cf

In [6]:
print("cf-python version is", cf.__version__)
print("cf-plot version is", cfp.__version__)
print("CF Conventions version is", cf.CF())

cf-python version is 3.17.0
cf-plot version is 3.3.0
CF Conventions version is 1.11


* See what datasets we have to explore

In [10]:
# Note that in IPython, ! precedes a shell command
!ls -1 ../ncas_data

aaaaoa.pmh8dec.pp
alpine_precip_DJF_means.nc
data1.nc
data2.nc
data3.nc
data5.nc
IPSL-CM5A-LR_r1i1p1_tas_n96_rcp45_mnth.nc
land.nc
model_precip_DJF_means_low_res.nc
model_precip_DJF_means.nc
precip_1D_monthly.nc
precip_1D_yearly.nc
precip_2010.nc
precip_DJF_means.nc
qbo.nc
regions.nc
ta.nc
tripolar.nc
ua.nc
u_n216.nc
u_n96.nc
vaAMIPlcd_DJF.nc
va.nc
wapAMIPlcd_DJF.nc


***

## Using cf-python to read dataset(s) and view the (meta)data at different detail levels

**In this section we look at the basic use of cf-python, reading in one or more datasets from file and inspecting the data and the metadata at different levels of detail to suit the amount of information you want to see.**

In [14]:
# Read a chosen data file
fieldlist = cf.read('../ncas_data/data1.nc')

In [15]:
# See the 'fieldlist' that cf-python interprets from the data read in
fieldlist

[<CF Field: long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1>,
 <CF Field: air_temperature(time(1), pressure(23), latitude(160), longitude(320)) K>,
 <CF Field: eastward_wind(time(1), pressure(23), latitude(160), longitude(320)) m s**-1>,
 <CF Field: northward_wind(time(1), pressure(23), latitude(160), longitude(320)) m s**-1>]

In [20]:
# Select a particular field of interest from the fieldlist, by index as for a Python list [TODO explain 'field' concept as cell, with diagram?]
fieldlist[0]

<CF Field: long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1>

In [21]:
f = fieldlist[0]

In [22]:
# Minimal detail
f

<CF Field: long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1>

In [25]:
# Medium level of detail with 'print'
print(f)

Field: long_name=Potential vorticity (ncvar%PV)
-----------------------------------------------
Data            : long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1
Dimension coords: time(1) = [1964-01-21 00:00:00]
                : pressure(23) = [1000.0, ..., 1.0] mbar
                : latitude(160) = [89.14151763916016, ..., -89.14151763916016] degrees_north
                : longitude(320) = [0.0, ..., 358.875] degrees_east


In [26]:
# Maximal detail using 'dump()'
f.dump()

-----------------------------------------------
Field: long_name=Potential vorticity (ncvar%PV)
-----------------------------------------------
Conventions = 'CF-1.7'
_FillValue = 2e+20
date = '21/01/64'
history = 'Sun Sep 16 11:26:16 BST 2012 - CONVSH V1.92 16-February-2006'
long_name = 'Potential vorticity'
missing_value = 2e+20
name = 'PV'
source = 'GRIB data'
time = '00:00'
title = 'Potential vorticity'
units = 'K m**2 kg**-1 s**-1'
valid_max = 0.018913519
valid_min = -0.008174051

Data(time(1), pressure(23), latitude(160), longitude(320)) = [[[[1.3371172826737165e-06, ..., -0.0072057610377669334]]]] K m**2 kg**-1 s**-1

Domain Axis: latitude(160)
Domain Axis: longitude(320)
Domain Axis: pressure(23)
Domain Axis: time(1)

Dimension coordinate: time
    long_name = 't'
    standard_name = 'time'
    time_origin = '21-JAN-1964:00:00:00'
    units = 'days since 1964-01-21 00:00:00'
    Data(time(1)) = [1964-01-21 00:00:00]

Dimension coordinate: pressure
    long_name = 'p'
    positi

In [29]:
# To look at a particular metadata construct, say the latitude coordinate, select it by its identity
l = f.construct("latitude")

In [30]:
print(l)

latitude(160) degrees_north


In [31]:
l.dump()

Dimension coordinate: latitude
    long_name = 'latitude'
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(160) = [89.14151763916016, ..., -89.14151763916016] degrees_north


In [33]:
# Inspect the data array of the field
d = f.data

In [34]:
d

<CF Data(1, 23, 160, 320): [[[[1.3371172826737165e-06, ..., -0.0072057610377669334]]]] K m**2 kg**-1 s**-1>

In [35]:
print(d)

[[[[1.3371172826737165e-06, ..., -0.0072057610377669334]]]] K m**2 kg**-1 s**-1


In [36]:
d.dump()

Data.shape = (1, 23, 160, 320)
Data.first_datum = 1.3371172826737165e-06
Data.last_datum  = -0.0072057610377669334
Data.fill_value = 2e+20
Data.Units = <Units: K m**2 kg**-1 s**-1>


***

## Using cf-python to edit the data and write out the edited data to file

**In this section we demonstrate how to change the data that has been read in from file, both the data arrays and the metadata that describe them, and then how to write the data back out to a file with a chosen name, so that you can see how cf-python can be used to edit data or to make new data.**

***

## Reducing datasets using cf-python (part 1): basic subspacing and collapsing

**In this section we show how multi-dimensional data can be tamed using cf-python to give a reduced form that can be analysed or plotted. We do this either by selecting a subset of point(s) along the axes (subspacing) or by collapsing axes down according to some statistic such as the mean or an extremum (collapsing).**

***

## Visualising datasets using cf-plot: contour and vector plots

**In this section we demonstrate how to use cf-plot to plot the data we have read and then processed and/or analysed using cf-python, notably by creating contour plots and vector plots as examples of some of the options possible with cf-plot.**

***

## Data analysis using cf-python and cf-plot: arithmetic, statistics and plots of trends

**In this section we demonstrate how to do some data analysis including performing arithmetic and statistical calculations on the data, showing how cf-python's CF Conventions metadata awareness means that the metadata is automatically updated to account for the operations that are performed.**

***

## Reducing datasets using cf-python (part 2): more options for subspacing and collapsing

**In this section we continue our study of ways to reduce multi-dimensional data by subspacing and collapsing, this time using more compound or advanced options such as datetime subspacing and grouped and weighted collapses.**

***

## Changing the underlying grid of data using cf-python: regridding

**In this section we demonstrate how to change the underlying grid of the data to another grid, which could be a higher- or lower-resolution one or a completely different grid. This is called regridding or interpolation, and we indicate various options cf-python supports for doing it.**

***

## Conclusion and recap of learning objectives

The NCAS CF Data Tools are a suite of Python libraries designed to facilitate working with data for research in the earth sciences and aligned domains. Today we learnt about cf-python (https://ncas-cms.github.io/cf-python/) and cf-plot (https://ncas-cms.github.io/cf-plot/build/). The 'cf' in the names of the NCAS CF Data Tools refers to the CF Conventions, a metadata standard for describing data which is becoming the de facto convention across geoscience, making sharing and intercomparison simpler.

Our learning aim was to be able to use the NCAS CF Data Tools Python libraries, namely cf-python and cf-plot, to process, analyse and visualise netCDF and PP datasets, whilst appreciating the context and 'unique selling point' of the libraries: they are built to use the CF Conventions, a metadata standard for earth science data, and work on top of a Data Model for CF, which makes it simpler to do what you want to do with the datasets. We practised:

* Using cf-python to read dataset(s) and view the (meta)data at different detail levels;
* Using cf-python to edit the data and write out the edited data to file;
* Reducing datasets using cf-python: subspacing and collapsing (part 1, basic, and part 2, more advanced);
* Visualising datasets using cf-plot: contour and vector plots;
* Data analysis using cf-python and cf-plot: arithmetic, statistics and plots of trends;
* Changing the underlying grid of data using cf-python: regridding.


***

## Where to find more information and resources on the NCAS CF Data Tools

Useful links relating to this training:

* cf-python code home (GitHub), including Issue Tracker to report queries or questions: https://github.com/NCAS-CMS/cf-python
* cf-python documentation: https://ncas-cms.github.io/cf-python/
* cf-plot code home (GitHub), including Issue Tracker to report queries or questions: https://github.com/NCAS-CMS/cf-plot
* cf-plot documentation: https://ncas-cms.github.io/cf-plot/build/
* Technical presentation about the NCAS CF Data Tools: https://hps.vi4io.org/_media/events/2020/summer-school-cfnetcdf.pdf
* CF Conventions homepage: https://cfconventions.org/
* CF Conventions training homepage: https://cfconventions.org/Training/
* This training is hosted at: https://github.com/NCAS-CMS/cf-tools-training
* [ TODO add link to ISC course page ]

If you have any questions later, either use the Issue Trackers above or you can email me at: sadie.bartholomew@ncas.ac.uk.

***