# 3. Reducing datasets by subspacing and collapsing

## Practical 3 (of 6) in 'Intro to the NCAS CF Data Tools, cf-python and cf-plot'

**In this section we show how multi-dimensional data can be tamed using cf-python to give a reduced form that can be analysed or plotted. The dimensions are reduced either by selecting a subset of points along the axes or by collapsing along them according to some statistic, such as the mean or an extremum.**

***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> run all of the cells in this section to do the set up.
</div>

## Setting up

**In this section we set up this Notebook, import the libraries and check the data we will work with.**

Run some set up for nice outputs in this Jupyter Notebook (not required in interactive Python or a script):

In [None]:
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

Import cf-python and cf-plot:

In [None]:
import cfplot as cfp
import cf

Inspect the versions of cf-python and cf-plot and the version of the CF Conventions those are matched to:

In [None]:
print("cf-python version is:", cf.__version__)
print("cf-plot version is:", cfp.__version__)
print("CF Conventions version is:", cf.CF())

<div class="alert alert-block alert-info">
<i>Note:</i> you can work with data that complies with any other version of the CF Conventions, or that has little or no compliance, but the CF Conventions version reported here is the latest version whose features these versions of the tools understand.
</div>

Finally, see what datasets we have to explore:

<div class="alert alert-block alert-info">
<i>Note:</i> in a Jupyter Notebook, '!' precedes a shell command, so this is a terminal command and not Python.
</div>

In [None]:
!ls ../ncas_data

***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> now we can start the practical. We will follow the same sectioning as in the teaching notebook, so please consult the notes there in the matching section for guidance and you can also consult the cf-python and cf-plot documentation linked above.
</div>

## 3. Reducing datasets by subspacing and collapsing

### a) Subspacing using metadata conditions

**3.a.1)** Read in the file `ggas2014121200_00-18.nc` which is under `../ncas_data` and save the corresponding FieldList to a variable called `fieldlist_3`. Inspect it with medium level of detail.

In [None]:
fieldlist_3 = cf.read("../ncas_data/ggas2014121200_00-18.nc")
print(fieldlist_3)

**3.a.2)** Extract the field representing the `cloud_area_fraction` to a variable which we will call `cloud_field`. Inspect that also with medium level of detail.

In [None]:
cloud_field = fieldlist_3[4]
print(cloud_field)

**3.a.3)** Save the data of the `cloud_field` to a new variable we will call `cloud_field_data`. Do a `print` on the `shape` attribute of this to confirm the shape of the data, and compare this to the inspection report from the previous cell to see the same information represented in different ways.

In [None]:
cloud_field_data = cloud_field.data
print(cloud_field_data.shape)

**3.a.4)** Make a subspace of the `cloud_field` from the cells above to subspace on the *first* time point in order. Note: doing date-time subspaces requires an extra step due to the nature of specifying dates and times which can be ambiguous otherwise: you need to wrap a quoted datetime string in the call `cf.dt()` to notify cf-python that you are providing a valid datetime string, e.g. `field1.subspace(time=cf.dt("2020-01-01 12:15:00"))`.

Assign the subspace operation resulting field to a variable `cloud_field_subspace1` and inspect it with medium detail.

*Extra task, for those who have studied section 4 before doing this practical: make a contour plot of this subspace of the field to see what it looks like.*

In [None]:
cloud_field_subspace1 = cloud_field.subspace(time=cf.dt("2014-12-12 00:00:00"))
print(cloud_field_subspace1)
print(cloud_field_subspace1.data.shape)
# Extra part: cfp.con(cloud_field_subspace1)
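
To make the idea of 'subspacing by metadata' concrete, here is a minimal NumPy-only sketch of the underlying logic: a condition on coordinate *values* is turned into positions along the axis, and those positions are then used to pick out the data. The arrays `time_hours`, `latitude` and `data` are hypothetical stand-ins, not the contents of the practical's file, and this is an analogy rather than how cf-python is implemented.

```python
import numpy as np

# Hypothetical stand-ins for a field's coordinates and data (not the real file).
time_hours = np.array([0, 6, 12, 18])        # four time points
latitude = np.linspace(89.5, -89.5, 5)       # five latitudes, north to south
data = np.arange(4 * 5).reshape(4, 5)        # shape (time, latitude)

# Select by coordinate *value*: find the positions where the condition holds,
# then index with them so that the time axis is kept, with size 1.
time_positions = np.flatnonzero(time_hours == 0)
subset = data[time_positions, :]

print(subset.shape)  # (1, 5)
```

The selected axis is still present with size 1, which mirrors what the shape print in the cell above reports for the subspaced field.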

**3.a.5)** Make a subspace of the `cloud_field` from the cells above to subspace on the *last* point on the latitude axis.

Assign the subspace operation resulting field to a variable `cloud_field_subspace2` and inspect it with medium detail.

*Extra task, for those who have studied section 4 before doing this practical: make a contour plot of this subspace of the field to see what it looks like.*

In [None]:
cloud_field_subspace2 = cloud_field.subspace(latitude=-89.46282196044922)
print(cloud_field_subspace2)
print(cloud_field_subspace2.data.shape)
# Extra part: cfp.con(cloud_field_subspace2)
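
A value-based condition only matches when the number given agrees with the stored coordinate value, which is why the latitude above is written out to so many decimal places. The NumPy-only sketch below illustrates this with a hypothetical three-point `latitude` array: reading the value from the coordinate matches, a hand-rounded literal does not, and a tolerance-based comparison is a more forgiving alternative.

```python
import numpy as np

# Hypothetical latitude coordinate, north to south (not the real file's values).
latitude = np.array([89.46282196044922, 0.0, -89.46282196044922])

# Reading the value from the coordinate avoids retyping it by hand.
last_value = float(latitude[-1])
exact_hits = np.flatnonzero(latitude == last_value)

# A hand-rounded literal does not compare equal to the stored value.
rounded_hits = np.flatnonzero(latitude == -89.46282)

# Comparing within an absolute tolerance finds the point again.
close_hits = np.flatnonzero(np.isclose(latitude, -89.46282, rtol=0.0, atol=1e-4))

print(exact_hits, rounded_hits, close_hits)  # [2] [] [2]
```

In cf-python, a range query such as `cf.wi(lower, upper)` passed as the condition plays a similar role to the tolerance here.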

### b) Subspacing using indexing, including equivalency to the above

**3.b.1)** Take the cloud field from (3.a.2) which we have been subspacing in the previous cells and make a subspace which takes the first time point, leaving all other axes unchanged, but this time do it using indexing. Use the `equals` method of a field to check that the result is the same as that derived from the 'subspacing by metadata' approach in section (3.a.4).

TODO: fix up the units so that `equals` becomes true; the unusual units confuse cfunits.

In [None]:
cloud_field_subspace1_by_index = cloud_field[0, :, :, :]
print(cloud_field_subspace1_by_index)
cloud_field_subspace1_by_index.equals(cloud_field_subspace1)
# or with the fields swapped i.e. cloud_field_subspace1.equals(cloud_field_subspace1_by_index) is also correct
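
One detail worth noticing when comparing with plain NumPy: indexing a cf-python field with a single integer keeps that axis with size 1, whereas NumPy drops it. The sketch below uses a hypothetical four-axis array (the axis order is an assumption made for illustration) to show the two behaviours side by side, with a length-one slice standing in for the cf-python result.

```python
import numpy as np

# Hypothetical four-axis array standing in for the field's data.
data = np.arange(4 * 1 * 5 * 6).reshape(4, 1, 5, 6)

numpy_style = data[0, :, :, :]   # NumPy drops the axis indexed by an integer
cf_style = data[0:1, :, :, :]    # a length-one slice keeps it, as field indexing does

print(numpy_style.shape)  # (1, 5, 6)
print(cf_style.shape)     # (1, 1, 5, 6)
```

Keeping the size-1 axis is what allows the indexed field to retain its time coordinate, so that it can be compared like-for-like with the metadata-based subspace.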

**3.b.2)** Now make a subspace on the original `cloud_field`, leaving all other axes unchanged, to subspace on the *last* point on the latitude axis, like before, but this time use subspacing by indexing. Use the `equals` method of a field to check that the result is the same as that derived from the 'subspacing by metadata' approach in section (3.a.5).

In [None]:
cloud_field_subspace2_by_index = cloud_field[:, :, -1, :]
print(cloud_field_subspace2_by_index)
cloud_field_subspace2_by_index.equals(cloud_field_subspace2)
# or with the fields swapped i.e. cloud_field_subspace2.equals(cloud_field_subspace2_by_index) is also correct

**3.b.3)** Using indexing, do both of the subspaces from the previous sub-questions in one call on the original cloud field.

Extra: do the same operation using the 'subspace by metadata' approach and use the `equals` method to show that the results are the same.

In [None]:
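cloud_field_subspace3_by_index = cloud_field[0, :, -1, :]
print(cloud_field_subspace3_by_index)
# Extra part: the same two conditions given by metadata in one call
cloud_field_subspace3 = cloud_field.subspace(time=cf.dt("2014-12-12 00:00:00"), latitude=-89.46282196044922)
cloud_field_subspace3_by_index.equals(cloud_field_subspace3)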

**3.b.4)** Do a single subspace on the original cloud field that takes the first 100 latitude and the first 200 longitude values. Use whichever method (subspacing by metadata, or indexing) you prefer, in order to do so.

In [None]:
cloud_field[:, :, :100, :200]

### c) Statistical collapses

**3.c.1)** Take the original `cloud_field` from (3.a.2) and do a collapse over the time axis to reduce it down to the minimum value. Assign that to the variable name `cloud_field_collapse1`.

In [None]:
cloud_field_collapse1 = cloud_field.collapse("time: minimum")
print(cloud_field_collapse1)

**3.c.2)** Take the original `cloud_field` from (3.a.2) and do a collapse over the latitude axis to reduce it down to the mean value. Assign that to the variable name `cloud_field_collapse2`.

In [None]:
cloud_field_collapse2 = cloud_field.collapse("latitude: mean")
print(cloud_field_collapse2)
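
A mean over latitude can be taken with or without weights, and on a regular latitude-longitude grid the two generally differ because cells near the poles cover less area. The NumPy-only sketch below compares an unweighted mean with one weighted by the cosine of latitude, using hypothetical values. To our understanding the `collapse` call above is unweighted unless its `weights` keyword is set, so treat the weighted variant here as an illustration of the idea rather than a description of what that cell computes.

```python
import numpy as np

# Hypothetical latitudes (degrees) and data of shape (time, latitude).
latitude = np.array([-60.0, 0.0, 60.0])
values = np.array([[1.0, 2.0, 6.0],
                   [2.0, 2.0, 2.0]])

# Unweighted mean along the latitude axis, keeping it with size 1.
unweighted = values.mean(axis=1, keepdims=True)

# Cosine-of-latitude weights approximate the relative area of each band.
weights = np.cos(np.deg2rad(latitude))
weighted = (values * weights).sum(axis=1, keepdims=True) / weights.sum()

print(unweighted.ravel())
print(weighted.ravel())
```

For the first row the unweighted mean is 3.0 while the weighted mean is about 2.75, because the largest value sits at a high latitude and so counts for less.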

**3.c.3)** Take the original `cloud_field` from (3.a.2) and do a collapse over the longitude axis to reduce it down to the maximum value. Assign that to the variable name `cloud_field_collapse3`.

In [None]:
cloud_field_collapse3 = cloud_field.collapse("longitude: maximum")
print(cloud_field_collapse3)

**3.c.4)** Finally, take the original `cloud_field` from (3.a.2) again and do a collapse over all horizontal space via the pair of horizontal spatial axes, latitude and longitude, to reduce them down to the standard deviation value. Assign that to the variable name `cloud_field_collapse4`.

In [None]:
cloud_field_collapse4 = cloud_field.collapse("longitude: latitude: standard_deviation")
print(cloud_field_collapse4)
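
The collapses in this section can be pictured as array reductions that keep each collapsed axis with size 1. The NumPy-only sketch below uses a hypothetical random four-axis array (the axis order is an assumption) to show the resulting shapes, and also that a standard deviation over two axes at once is not the same as collapsing one axis and then the other. Note that NumPy's default divisor convention (`ddof=0`) may differ from cf-python's default, so this is for comparing shapes and ideas, not numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with axes (time, level, latitude, longitude).
data = rng.random((4, 1, 5, 6))

# Analogue of a "time: minimum" collapse: the time axis shrinks to size 1.
time_min = data.min(axis=0, keepdims=True)

# Analogue of collapsing latitude and longitude together in one operation.
area_std = data.std(axis=(2, 3), keepdims=True)

# Collapsing longitude first and then latitude gives a different statistic.
two_step = data.std(axis=3, keepdims=True).std(axis=2, keepdims=True)

print(time_min.shape)                   # (1, 1, 5, 6)
print(area_std.shape)                   # (4, 1, 1, 1)
print(np.allclose(area_std, two_step))  # False
```

This is why the two horizontal axes are named together in a single collapse string above, rather than collapsed in two separate calls.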

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> this is the end of the section. Please check your work, review the material and then move on to Practical 4 (see the Notebook 'cf_data_tools_practical_04.ipynb').
</div>

***