# T1. Reading dataset(s) and viewing the (meta)data at different detail levels

## Teaching Notebook 1 (of 6) for *Intro to the NCAS CF Data Tools, cf-python and cf-plot*

**In this section we look at a basic use of cf-python, reading in one or more datasets from file and inspecting the data and the metadata at different levels of detail to suit the amount of information you want to see.**

***

## Context and learning objectives

### What are the NCAS CF Data Tools and why do they all have 'cf' in the name?

The _NCAS CF Data Tools_ are a suite of complementary Python libraries which are designed to facilitate working with data for research in the earth sciences and aligned domains. The two that are of most relevance to the average user, and those wanting to process, analyse and visualise atmospheric data, are *cf-python* (https://ncas-cms.github.io/cf-python/) and *cf-plot* (https://ncas-cms.github.io/cf-plot/build/). We will be focusing on use of cf-python and cf-plot today.

The 'cf' in the names of the NCAS CF Data Tools refers to the _CF Conventions_, a metadata standard. The tools are built around this standard through their use of the CF Data Model, which, alongside performance, is considered a 'unique selling point' of the tools.


### What are the CF Conventions?

The _CF Conventions_, usually referred to in this way but also known by the full name of the **C**limate and **F**orecast (CF) metadata conventions, are a metadata standard which is becoming the de facto convention for describing geoscientific data, so that sharing and intercomparison are simpler. See https://cfconventions.org/ for more information.


### What are we going to learn in this session?

Our **learning aim** is to be able to use the NCAS CF Data Tools Python libraries, namely cf-python and cf-plot, to process, analyse and visualise netCDF and PP datasets. Alongside this, we want you to appreciate the context and 'unique selling point' of the libraries: they are built on a Data Model for CF and so use the CF Conventions, a metadata standard for earth science data, to make it simpler to do what you want to do with the datasets.

We have **six distinct objectives**, matching the sections in this notebook and in the practical notebook you will work through. By the end of this lesson you should be familiar with, and have practised, using cf-python and cf-plot to:

1. read dataset(s) and view the (meta)data at different detail levels;
2. edit the (meta)data and write out the edited version to file;
3. reduce datasets by subspacing and collapsing;
4. visualise datasets as contour and vector plots;
5. analyse data: applying mathematical and statistical operations and plotting trends;
6. change the underlying grid of data through regridding.

<div class="alert alert-block alert-info">
<i>Note:</i> much of what you can do with cf-python you can also do with the xarray library. Use whichever approach, the cf-python/cf-plot way or the xarray way, works best for you! However, we want to emphasise that the NCAS CF Data Tools are built around the CF Conventions whereas xarray is not, so cf-python and cf-plot offer better metadata awareness than xarray, which could be a core advantage of our approach for users in/from geoscience. (If you have suggestions for how we can improve cf-python and/or cf-plot for you or your work, please let us know through the Issue Trackers linked at the end of this Notebook.)
</div>

***

## Where to find more information and resources on the NCAS CF Data Tools

Here are some links relating to the NCAS CF Data Tools and this training. The **first two are the official documentation pages**, which could be useful to consult while doing the practicals, if you would like further information or if you get stuck:

* **The cf-python documentation lives at https://ncas-cms.github.io/cf-python/.**
* **The cf-plot documentation lives at https://ncas-cms.github.io/cf-plot/build/.**
* This training, with further material, is hosted online and there are instructions for setting up the environment so you can work through it in your own time: https://github.com/NCAS-CMS/cf-tools-training.
* The cf-python code lives on GitHub at https://github.com/NCAS-CMS/cf-python. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-python/issues.
* The cf-plot code lives on GitHub at https://github.com/NCAS-CMS/cf-plot. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-plot/issues.
* There is a technical presentation about the NCAS CF Data Tools available from https://hps.vi4io.org/_media/events/2020/summer-school-cfnetcdf.pdf.
* The website of the CF Conventions can be found at https://cfconventions.org/.
* The landing page for training on the CF Conventions is found within the website above: https://cfconventions.org/Training/.

If you have any queries after this course (or during it, if you are not being taught by us in person and so cannot ask us questions there and then), please either use the Issue Trackers linked above or email Sadie at: sadie.bartholomew@ncas.ac.uk.

***

## Setting up

**In this section we set up this Notebook, import the libraries and check the data we will work with, ready to use the libraries within this notebook.**

Run some set up for nice outputs in this Jupyter Notebook (not required in interactive Python or a script):

In [None]:
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

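Note that `warnings.filterwarnings("ignore")` silences every warning for the rest of the session, which is convenient for teaching but can hide useful messages in your own work. A minimal sketch of a narrower alternative, using only the Python standard library, is to suppress warnings just around one call. The function `noisy_step` below is a hypothetical stand-in for any call that emits a warning; it is not part of cf-python or cf-plot.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this is only an example warning", UserWarning)
    return 42


# Suppress warnings only inside this block; the previous warning
# filters are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```
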
Import cf-python and cf-plot:

In [None]:
import cfplot as cfp
import cf

Inspect the versions of cf-python and cf-plot and the version of the CF Conventions those are matched to:

In [None]:
print("cf-python version is:", cf.__version__)
print("cf-plot version is:", cfp.__version__)
print("CF Conventions version is:", cf.CF())

<div class="alert alert-block alert-info">
<i>Note:</i> you can work with data compliant with any other version of the CF Conventions, or without (much) compliance, but the CF Conventions version reported here is the most recent version whose features these versions of the tools understand.
</div>

Finally, see what datasets we have to explore:

<div class="alert alert-block alert-info">
<i>Note:</i> in a Jupyter Notebook, '!' precedes a shell command, so this is a terminal command and not Python.
</div>

In [None]:
!ls ../ncas_data

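Outside a notebook, for example in a plain Python script, the `!` shortcut is not available. A small sketch of an equivalent listing using the standard library's `pathlib` follows. The helper name `list_datasets` and the choice of suffixes (`.nc` for netCDF, `.pp` for PP) are assumptions made for illustration.

```python
from pathlib import Path


def list_datasets(directory, suffixes=(".nc", ".pp")):
    """Return the sorted names of files in `directory` with the given suffixes.

    An empty list is returned if `directory` does not exist.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )


# Print one dataset name per line, much like `ls` would
for name in list_datasets("../ncas_data"):
    print(name)
```
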
***

## 1. Reading dataset(s) and viewing the (meta)data at different detail levels

<div class="alert alert-block alert-info">
<i>Note:</i> In cf-python and when discussing related code and datasets, we use terminology from the CF Data Model (for more detail see: <a href="https://ncas-cms.github.io/cf-python/cf_data_model.html">https://ncas-cms.github.io/cf-python/cf_data_model.html</a>). For example, cf-python methods are named in relation to concepts from this data model. We don't have time to cover this in detail but for this session it is useful to know the following terms:

<ul>
    <li><b>field</b>: a self-contained cf-python object corresponding to a netCDF data variable with all of its (CF) metadata attached;</li>
    <li><b>field list</b>: a list of fields (see above), stored as its own cf-python object 'FieldList' which is similar to a Python list;</li>
    <li><b>coordinate</b>: a (CF) metadata concept which corresponds to netCDF coordinate variables. One or more coordinates are defined on every field as either 'dimension' or 'auxiliary' coordinate objects in cf-python.</li>
</ul>
</div>

The examples from this section should help you to familiarise yourself with these terms and their practical usage.

### a) Reading in data and extracting the _field_ of interest

Read a chosen data file. Sometimes datasets have descriptive names but this one doesn't, so let's find out what it is!

In [None]:
fieldlist = cf.read("../ncas_data/data1.nc")

See the 'fieldlist' that cf-python interprets from the data read in:

In [None]:
fieldlist

Select a particular field of interest from the fieldlist:

In [None]:
field = fieldlist[0]

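Because a field list behaves much like a Python list, you can take its length, index it and loop over it. The sketch below wraps a loop in a small helper. The name `summarise_fields` is hypothetical, and the helper assumes only that each item offers the `identity()` method and `shape` attribute that cf-python fields provide.

```python
def summarise_fields(fields):
    """Return one (index, identity, shape) tuple per field in `fields`."""
    return [
        (index, field.identity(), field.shape)
        for index, field in enumerate(fields)
    ]


# In this notebook you could then run, for example:
#     for row in summarise_fields(fieldlist):
#         print(row)
```
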
### b) Inspecting the _field_ of interest with different amounts of detail

View the field with **minimal detail**, i.e. a one-line summary:

In [None]:
field

Or you can view it with a **medium level of detail** with the Python built-in `print` function:

In [None]:
print(field)

A final option is to view it with **maximal detail** using the `dump()` method:

In [None]:
field.dump()

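These three views map onto Python's `repr()`, Python's `str()` (which `print` uses) and the object's own `dump()` method. As a rough sketch, they can be gathered behind one helper that returns the text rather than displaying it. The name `describe` is hypothetical, and the use of `dump(display=False)` to get a string back is an assumption based on the documented behaviour of cf-python fields.

```python
def describe(construct, detail="minimal"):
    """Return a description of `construct` at the requested level of detail.

    `detail` must be one of "minimal", "medium" or "maximal".
    """
    if detail == "minimal":
        return repr(construct)  # the one-line summary
    if detail == "medium":
        return str(construct)  # the text that print() shows
    if detail == "maximal":
        # Assumption: dump(display=False) returns the description as a
        # string instead of printing it.
        return construct.dump(display=False)
    raise ValueError(f"unknown detail level: {detail!r}")


# In this notebook you could then run, for example:
#     print(describe(field, "medium"))
```
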
### c) Inspecting a metadata _construct_ e.g. _coordinate_ from the _field_ of interest

Use the same approach to view a particular metadata aspect, for example the latitude coordinate:

In [None]:
lat = field.coordinate("latitude")

In [None]:
lat

In [None]:
print(lat)

In [None]:
lat.dump()

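If you are unsure which coordinates a field has, the medium-detail `print(field)` view lists them. As a sketch, their names can also be collected programmatically. The helper name `coordinate_identities` is hypothetical; it assumes that the field's `coordinates()` method returns a mapping of construct keys to coordinate constructs, each with an `identity()` method, as in cf-python.

```python
def coordinate_identities(field):
    """Return the sorted identities of all coordinate constructs of `field`."""
    return sorted(
        coordinate.identity()
        for coordinate in field.coordinates().values()
    )


# In this notebook you could then run, for example:
#     coordinate_identities(field)
```
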
### d) Inspecting a data array of interest

Likewise, the same approach works to view the data itself in the field (i.e. the underlying arrays). First, grab the data from the field with the `data` attribute:

In [None]:
data = field.data

Then view it in a chosen level of detail as with the above objects:

In [None]:
data

In [None]:
print(data)

In [None]:
data.dump()

If you want to see more of the data array itself, you can access it with the `array` attribute. Beware, for real-life datasets:

* this will be large and Python will likely truncate it so your screen isn't spammed with sub-arrays of values!
* accessing the underlying data array is computationally intensive if it is large, especially if it is multi-dimensional, so your computer may have to work hard to produce it. Use the `array` attribute sparingly (only when needed)!

In [None]:
data.array

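One way to judge whether accessing the full array is sensible is to estimate its size first. The following is a minimal sketch using only the standard library; the helper name `estimated_mib` is hypothetical, and the commented usage assumes that the data object exposes `shape` and a NumPy-style `dtype`, as cf-python data objects do.

```python
import math


def estimated_mib(shape, itemsize):
    """Estimate the in-memory size, in MiB, of an array.

    `shape` is a tuple of dimension sizes and `itemsize` is the number
    of bytes per element (for example 8 for 64-bit floats).
    """
    n_elements = math.prod(shape)
    return n_elements * itemsize / 2**20


# A 1024 x 1024 array of 64-bit floats occupies exactly 8 MiB
print(estimated_mib((1024, 1024), 8))

# In this notebook you could then run, for example:
#     estimated_mib(data.shape, data.dtype.itemsize)
# and, if the result is large, convert only a slice, e.g. data[0].array
```
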
***