# 1. Reading dataset(s) and viewing the (meta)data at different detail levels

## Practical 1 (of 6) in 'Intro to the NCAS CF Data Tools, cf-python and cf-plot'

**In this section we look at a basic use of cf-python, reading in one or more datasets from file and inspecting the data and the metadata at different levels of detail to suit the amount of information you want to see.**

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> throughout the practical Notebooks, these green boxes provide instructions and tips about completing the practical (blue boxes are the same as in the teaching notebook and provide useful information). As guidance and for reference, the following are provided below before the practical material starts:
<ul>
    <li>the context and learning objectives from the main/presented Notebook below - you are advised to re-read this as a reminder;</li>
    <li>a copy of the final section from the main Notebook which provides links to further information - you might find the documentation links especially useful here;</li>
    <li>the note on terminology from section one is included also in the reminder as a guide to terms used throughout - read this if useful.</li>
</ul>
</div>

## A reminder: context, learning objectives and guidance links

### What are the NCAS CF Data Tools and why do they all have 'cf' in the name?

The _NCAS CF Data Tools_ are a suite of complementary Python libraries which are designed to facilitate working with data for research in the earth sciences and aligned domains. The two that are of most relevance to the average user, and those wanting to process, analyse and visualise atmospheric data, are *cf-python* (https://ncas-cms.github.io/cf-python/) and *cf-plot* (https://ncas-cms.github.io/cf-plot/build/). We will be focusing on use of cf-python and cf-plot today.

The 'cf' in the names of the NCAS CF Data Tools refers to the _CF Conventions_, a metadata standard. The tools are built around this standard by way of the CF Data Model, which, alongside performance, is considered a 'unique selling point' of the tools.


### What are the CF Conventions?

The _CF Conventions_, usually referred to in this way but also known by the full name of the **C**limate and **F**orecast (CF) metadata conventions, are a metadata standard which is becoming the de facto convention for describing geoscientific data, so that sharing and intercomparison are simpler. See https://cfconventions.org/ for more information.


### What are we going to learn in this session?

Our **learning aim** is to be able to use the NCAS CF Data Tools Python libraries, namely cf-python and cf-plot, to process, analyse and visualise netCDF and PP datasets. Alongside this, we aim to appreciate the context and 'unique selling point' of the libraries: they are built to use the CF Conventions, a metadata standard for earth science data, and work on top of a Data Model for CF, which makes it simpler to do what you want to do with the datasets.

We have **six distinct objectives**, matching the sections in this notebook and in the practical notebook you will work through. By the end of this lesson you should be familiar with, and have practised using, cf-python and cf-plot to:

1. read dataset(s) and view the (meta)data at different detail levels;
2. edit the (meta)data and write out the edited version to file;
3. reduce datasets by subspacing and collapsing;
4. visualise datasets as contour and vector plots;
5. analyse data: applying mathematical and statistical operations and plotting trends;
6. change the underlying grid of data through regridding.

### Guidance: where to find more information and resources on the NCAS CF Data Tools

Here are some links relating to the NCAS CF Data Tools and this training.

* This training, with further material, is hosted online and there are instructions for setting up the environment so you can work through it in your own time: https://github.com/NCAS-CMS/cf-tools-training.
* **The cf-python documentation lives at https://ncas-cms.github.io/cf-python/.**
* The cf-python code lives on GitHub at https://github.com/NCAS-CMS/cf-python. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-python/issues.
* **The cf-plot documentation lives at https://ncas-cms.github.io/cf-plot/build/.**
* The cf-plot code lives on GitHub at https://github.com/NCAS-CMS/cf-plot. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-plot/issues.
* There is a technical presentation about the NCAS CF Data Tools available from https://hps.vi4io.org/_media/events/2020/summer-school-cfnetcdf.pdf.
* The website of the CF Conventions can be found at https://cfconventions.org/.
* The landing page for training on the CF Conventions is found within the website above: https://cfconventions.org/Training/.

If you have any queries after this course, please either use the Issue Trackers linked above or you can email me at: sadie.bartholomew@ncas.ac.uk.

<div class="alert alert-block alert-info">
<i>Note:</i> In cf-python and when discussing related code and datasets, we use terminology from the CF Data Model (for more detail see: <a href="https://ncas-cms.github.io/cf-python/cf_data_model.html">https://ncas-cms.github.io/cf-python/cf_data_model.html</a>). For example, cf-python methods are named in relation to concepts from this data model. We don't have time to cover this in detail, but for this session it is useful to know the following terms:

<ul>
    <li><b>field</b>: a self-contained cf-python object corresponding to a netCDF data variable with all of its (CF) metadata attached;</li>
    <li><b>field list</b>: a list of fields (see above), stored as its own cf-python object 'FieldList' which is similar to a Python list;</li>
    <li><b>coordinate</b>: a (CF) metadata concept which corresponds to netCDF coordinate variables. One or more coordinates are defined on every field as either 'dimension' or 'auxiliary' coordinate objects in cf-python.</li>
</ul>
</div>

***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> run all of the cells in this section to do the set up.
</div>

## Setting up

**In this section we set up this Notebook, import the libraries and check the data we will work with, ready to use the libraries within this notebook.**

Run some set up for nice outputs in this Jupyter Notebook (not required in interactive Python or a script):

In [None]:
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')
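
As an aside on the filter above: `warnings.filterwarnings('ignore')` silences every Python warning for the rest of the session. The following is a minimal standard-library sketch of that effect, together with a way of keeping the suppression local. The `noisy` function is hypothetical and stands in for any library call that emits a warning.

```python
import warnings


def noisy():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("example warning", UserWarning)
    return 42


# Record warnings with a filter that lets everything through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()
n_shown = len(caught)

# Repeat the call with the 'ignore' filter in place. The previous
# filters are restored when the 'with' block ends.
with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings("ignore")
    noisy()
n_ignored = len(caught)

print(n_shown, n_ignored)
```

Wrapping a call in `warnings.catch_warnings()` limits the suppression to that block, which can be useful when working outside this course and a warning might be informative.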

Import cf-python and cf-plot:

In [None]:
import cfplot as cfp
import cf

Inspect the versions of cf-python and cf-plot and the version of the CF Conventions those are matched to:

In [None]:
print("cf-python version is:", cf.__version__)
print("cf-plot version is:", cfp.__version__)
print("CF Conventions version is:", cf.CF())

<div class="alert alert-block alert-info">
<i>Note:</i> you can work with data compliant with any other version of the CF Conventions, or without (much) compliance, but the CF Conventions version reported above is the latest version whose features these versions of the tools understand.
</div>

Finally, see what datasets we have to explore:

<div class="alert alert-block alert-info">
<i>Note:</i> in a Jupyter Notebook, '!' precedes a shell command, so this is a terminal command and not Python.
</div>

In [None]:
!ls ../ncas_data
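
In a plain Python script, where the '!' syntax is not available, a similar listing can be produced with the standard library. This is a minimal sketch: the helper name `list_datasets` is hypothetical, and unlike `ls` it also includes hidden files.

```python
from pathlib import Path


def list_datasets(directory):
    """Return the sorted entry names in `directory`, or [] if it is missing."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


# Same relative path as used in the shell command above.
for name in list_datasets("../ncas_data"):
    print(name)
```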

***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> now we can start the practical. We will follow the same sectioning as in the teaching notebook, so please consult the notes in the matching section there for guidance. You can also consult the cf-python and cf-plot documentation linked above.
</div>

## 1. Reading dataset(s) and viewing the (meta)data at different detail levels

### a) Reading in data and extracting the _field_ of interest

**1.a.1)** Use `cf` to read in the netCDF dataset `qbo.nc` which is found (as shown at the end of the section above) under the directory `../ncas_data`, assigning it to a variable called 'fieldlist'.


In [None]:
fieldlist = cf.read("../ncas_data/qbo.nc")

**1.a.2)** Use the standard Python function `len` to see how long the read-in fieldlist is.

In [None]:
len(fieldlist)

**1.a.3)** Access the first field in the fieldlist and assign it to the variable name 'field'.

In [None]:
field = fieldlist[0]

### b) Inspecting the _field_ of interest with different amounts of detail

**1.b.1)** View the field from (1.a.3) above in minimal detail.

In [None]:
field

**1.b.2)** Now try viewing the field from (1.a.3) above at a medium detail level.

In [None]:
print(field)

**1.b.3)** OK, finally let's see it in its full glory - with maximal detail. Take a minute or two to compare these outputs and familiarise yourself with the formats of the different views and how they present the metadata (and preview of the data) of a field.

In [None]:
field.dump()
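
The three views map onto standard Python mechanisms: a bare name at the end of a notebook cell displays the object's `repr`, `print` uses its `str`, and `dump` is an ordinary method. Here is a toy sketch of that pattern. The `ToyField` class is hypothetical and for illustration only; it is not how cf-python is implemented.

```python
class ToyField:
    """Hypothetical stand-in for a field, for illustration only."""

    def __init__(self, name, size, units, properties):
        self.name = name
        self.size = size
        self.units = units
        self.properties = properties

    def __repr__(self):
        # Minimal detail: a one-line summary.
        return f"<ToyField: {self.name}({self.size}) {self.units}>"

    def __str__(self):
        # Medium detail: a few labelled lines.
        return "\n".join([
            f"Field: {self.name}",
            f"Data : {self.size} values in {self.units}",
        ])

    def dump(self, display=True):
        # Full detail: the medium view plus every property.
        lines = [str(self)]
        lines += [f"{key} = {value!r}" for key, value in self.properties.items()]
        text = "\n".join(lines)
        if display:
            print(text)
            return None
        return text


toy = ToyField(
    "air_pressure", 3, "hPa",
    {"long_name": "pressure", "source": "toy example"},
)
print(repr(toy))  # what a notebook shows for a bare `toy`
print(toy)        # print() uses __str__
toy.dump()        # prints the fullest view
```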

### c) Inspecting a metadata _construct_ e.g. _coordinate_ from the _field_ of interest

**1.c.1)** Let's assume we want to know about a specific metadata construct: in this case we are interested in the pressure. Assign to a new variable called 'pressure' the pressure coordinate of the field stored in the variable 'field' from section (1a), as just inspected in section (1b).

In [None]:
pressure = field.coordinate("pressure")

**1.c.2)** View this coordinate with minimal detail level.

In [None]:
pressure

**1.c.3)** Now use the standard approach to view it with medium detail level.

In [None]:
print(pressure)

**1.c.4)** Finally, let's use the approach for full detail level and see everything about this coordinate.

In [None]:
pressure.dump()

### d) Inspecting a data array of interest

**1.d.1)** Access the underlying data of the pressure coordinate from the previous sub-section, (1c), assigning it to a variable called 'pressure_data'.

In [None]:
pressure_data = pressure.data

**1.d.2)** Inspect the pressure coordinate data with minimal detail, noticing the units.

In [None]:
pressure_data

**1.d.3)** Access the data array of the pressure coordinate. Note that, because it is small, it is not computationally expensive to access this and similarly with other metadata data arrays, but accessing the underlying data array of the whole field (i.e. its main stored variable) could be intensive because for datasets in real usage the data can be very large and/or multi-dimensional.

In [None]:
pressure_array = pressure_data.array
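
To see why size matters here, the memory an array needs can be estimated from its shape before loading it. The sketch below assumes 8 bytes per value (64-bit floats). The helper `array_nbytes` and both shapes are hypothetical and are not taken from `qbo.nc`.

```python
import math


def array_nbytes(shape, bytes_per_value=8):
    """Estimate the memory needed for an array of the given shape."""
    return math.prod(shape) * bytes_per_value


# Hypothetical shapes, chosen only to illustrate the difference in scale.
coordinate_bytes = array_nbytes((20,))           # e.g. 20 pressure levels
field_bytes = array_nbytes((120, 20, 360, 720))  # e.g. time, pressure, lat, lon

print(f"coordinate: {coordinate_bytes} bytes")
print(f"field: {field_bytes / 1024**3:.2f} GiB")
```

A coordinate of this size fits in a few hundred bytes, whereas the hypothetical field would need several gibibytes once fully loaded.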

**1.d.4)** Use the standard Python `print` function to view the pressure array.

In [None]:
print(pressure_array)

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> this is the end of the section. Please check your work, review the material and then move on to Practical 2 (see the Notebook 'cf_data_tools_practical_02.ipynb').
</div>

***