# T2. Editing the (meta)data and writing out the edited version to file

## Teaching Notebook 2 (of 6) for *Intro to the NCAS CF Data Tools, cf-python and cf-plot*

**In this section we demonstrate how to change data that has been read in from file, both the data arrays and the metadata that describes them, and then how to write the result back out to a file with a chosen name. This shows how cf-python can be used to edit existing data or to create new data.**

***

## Setting up

**In this short prelude we set up this Notebook: we import the libraries and check the data we will work with (exactly as in the first Notebook's setup, but in a single cell for quick execution).**

In [27]:
# Set up for inline plots - only needed inside a Notebook environment - and to ignore some repeating warnings
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

# Import the two CF Data Tools libraries and inspect the versions
import cfplot as cfp
import cf
print("--- Version report: ---")
print("cf-python version is:", cf.__version__)
print("cf-plot version is:", cfp.__version__)
print("CF Conventions version is:", cf.CF())

# See what datasets we have to explore within the data directory we use throughout this course
print("--- Datasets available from the path '../ncas_data': ---")
# Note that in a Jupyter Notebook, '!' precedes a shell command - so this is a command, not Python
!ls ../ncas_data

--- Version report: ---
cf-python version is: 3.18.1
cf-plot version is: 3.4.0
CF Conventions version is: 1.12
--- Datasets available from the path '../ncas_data': ---
160by320griddata.nc			   precip_2010.nc
aaaaoa.pmh8dec.pp			   precip_DJF_means.nc
alpine_precip_DJF_means.nc		   qbo.nc
data1.nc				   regions.nc
data1-updated.nc			   rgp.nc
data2.nc				   sea_currents_backup.nc
data3.nc				   sea_currents.nc
data5.nc				   ta.nc
ggas2014121200_00-18.nc			   tripolar.nc
IPSL-CM5A-LR_r1i1p1_tas_n96_rcp45_mnth.nc  two_fields.nc
land.nc					   ua.nc
model_precip_DJF_means_low_res.nc	   u_n216.nc
model_precip_DJF_means.nc		   u_n96.nc
n2o_emissions.nc			   vaAMIPlcd_DJF.nc
POLCOMS_WAM_ZUV_01_16012006.nc		   va.nc
precip_1D_monthly.nc			   wapAMIPlcd_DJF.nc
precip_1D_yearly.nc


***

## 2. Editing the (meta)data and writing out the edited version to file

Using the same data file as in the previous section, let's say we want to change its data and metadata. As it stands, the field and its data are:

In [5]:
# Required from Step 1
fieldlist = cf.read("../ncas_data/data1.nc")
print("Field List is:\n\n", fieldlist)
field = fieldlist[0]
print("Field is:\n\n", field)
data = field.data
print("Data is:\n\n", data)

Field List is:

 [<CF Field: long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1>,
 <CF Field: air_temperature(time(1), pressure(23), latitude(160), longitude(320)) K>,
 <CF Field: eastward_wind(time(1), pressure(23), latitude(160), longitude(320)) m s**-1>,
 <CF Field: northward_wind(time(1), pressure(23), latitude(160), longitude(320)) m s**-1>]
Field is:

 Field: long_name=Potential vorticity (ncvar%PV)
-----------------------------------------------
Data            : long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1
Dimension coords: time(1) = [1964-01-21 00:00:00]
                : pressure(23) = [1000.0, ..., 1.0] mbar
                : latitude(160) = [89.14151763916016, ..., -89.14151763916016] degrees_north
                : longitude(320) = [0.0, ..., 358.875] degrees_east
Data is:

 [[[[1.3371172826737165e-06, ..., -0.0072057610377669334]]]] K m**2 kg**-1 s**-1


### a) Changing the underlying data

To change the data, assign to the relevant index or indices. For example, to change all values we can use the special ellipsis index (`...`), in this case setting every element to the same scalar value:

In [6]:
data[...] = 10.0
data

<CF Data(1, 23, 160, 320): [[[[10.0, ..., 10.0]]]] K m**2 kg**-1 s**-1>

In [7]:
print(data)

[[[[10.0, ..., 10.0]]]] K m**2 kg**-1 s**-1


Or we could be more specific and change just one sub-array to a different value:

In [8]:
data[0, 0, 0] = 3.0
data.array

array([[[[ 3.,  3.,  3., ...,  3.,  3.,  3.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         ...,
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.]],

        [[10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         ...,
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.]],

        [[10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         ...,
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.]],

        ...,

        [[10., 10., 10., ..., 10., 10., 10.],
         [10., 10., 10., ..., 10., 10., 10.],
         [10., 10.

Instead of setting the whole sub-array to one value, you can set each of its elements to your precise specification, for example:

In [9]:
data[0, 0, 0] = range(320)
data.array

array([[[[  0.,   1.,   2., ..., 317., 318., 319.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         ...,
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.]],

        [[ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         ...,
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.]],

        [[ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         ...,
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.],
         [ 10.,  10.,  10., ...,  10.,  10.,  10.]],

        ...,
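
The three assignments above follow the familiar array-indexing pattern. As a minimal sketch, here is the same sequence on a small NumPy array used as a stand-in for the field's data. The shape `(1, 2, 3, 4)` and the names `arr`, `row_after_scalar` and `row_after_sequence` are made up for illustration and are not part of the course data:

```python
import numpy as np

# Small stand-in for a (time, pressure, latitude, longitude) array.
# The shape (1, 2, 3, 4) is an illustrative assumption, not the real grid.
arr = np.zeros((1, 2, 3, 4))

# The ellipsis selects every element, so the scalar is broadcast everywhere.
arr[...] = 10.0

# Three integer indices select the 1-D run along the last (longitude) axis
# at the first time, first pressure level and first latitude.
arr[0, 0, 0] = 3.0
row_after_scalar = arr[0, 0, 0].copy()

# A sequence of matching length sets each element of that run individually.
arr[0, 0, 0] = range(4)
row_after_sequence = arr[0, 0, 0].copy()

print(row_after_scalar)    # the run after the scalar assignment
print(row_after_sequence)  # the run after the sequence assignment
print(arr[0, 0, 1])        # a neighbouring run is left untouched at 10.0
```

The point to take away is that only the selected run changes; everything else keeps the value set by the earlier ellipsis assignment.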

  

### b) Changing some metadata

To change metadata, first get the metadata you want to change as an object. One of the most flexible ways to do so is to use the `construct` method and as an argument specify the name of the coordinate you are interested in:

In [10]:
pressure = field.construct("pressure")

In [11]:
print(pressure)
print(pressure.data)

pressure(23) mbar
[1000.0, ..., 1.0] mbar


You can inspect the units specifically using the `units` attribute:

In [12]:
print(pressure.units)

mbar


Let's change the units to an equivalent but different unit, the `bar` (1 bar = 1000 mbar), as an example:

In [13]:
pressure.units = "bar"
print(pressure.units)

bar


Notice how the data has been converted to account for the new units - cf-python's metadata awareness makes contextual changes like this so we don't have to do it manually!

In [14]:
print(pressure.data)

[1.0, ..., 0.0010000000474974513] bar
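
The arithmetic behind this conversion can be checked by hand. The sketch below divides the first and last pressure values by 1000 and then rounds the result to 32-bit precision. That rounding step is an assumption: the trailing digits in `0.0010000000474974513` are consistent with the coordinate being held as 32-bit floats, but the Notebook does not state the storage type. The helper `as_float32` is a hypothetical name introduced here.

```python
import struct

MBAR_PER_BAR = 1000.0


def as_float32(value):
    """Round a Python float to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# First and last pressure values as printed before the unit change, in mbar.
first_mbar, last_mbar = 1000.0, 1.0

first_bar = first_mbar / MBAR_PER_BAR
last_bar = last_mbar / MBAR_PER_BAR

print(first_bar, last_bar)   # values in bar at full (64-bit) precision
print(as_float32(last_bar))  # the same last value after rounding to 32 bits
```

Under that assumption, the extra digits are a floating-point representation effect rather than an error in the conversion.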


Note how the pressure units are changed in the field too, since we edited the same object in a Pythonic sense:

In [15]:
print(field)

Field: long_name=Potential vorticity (ncvar%PV)
-----------------------------------------------
Data            : long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1
Dimension coords: time(1) = [1964-01-21 00:00:00]
                : pressure(23) = [1.0, ..., 0.0010000000474974513] bar
                : latitude(160) = [89.14151763916016, ..., -89.14151763916016] degrees_north
                : longitude(320) = [0.0, ..., 358.875] degrees_east
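
"The same object in a Pythonic sense" means that the name `pressure` and the field both refer to one object, so a change made through either is visible through both. The following is a minimal sketch of that behaviour in plain Python. The classes `Coordinate` and `Container` are hypothetical stand-ins and are not cf-python classes:

```python
import copy


class Coordinate:
    """Hypothetical stand-in for a coordinate construct."""

    def __init__(self, name, units):
        self.name = name
        self.units = units


class Container:
    """Hypothetical stand-in for a field that holds coordinates by name."""

    def __init__(self, coordinates):
        self._coordinates = {c.name: c for c in coordinates}

    def construct(self, name):
        # Return the stored object itself, not a copy of it.
        return self._coordinates[name]


holder = Container([Coordinate("pressure", "mbar")])

# Editing through the retrieved reference also changes what the holder sees.
pressure_ref = holder.construct("pressure")
pressure_ref.units = "bar"
print(holder.construct("pressure").units)

# Editing a deep copy leaves the holder's object unchanged.
independent = copy.deepcopy(holder.construct("pressure"))
independent.units = "Pa"
print(holder.construct("pressure").units)
```

If you want to experiment without touching the field, work on a copy instead; cf-python constructs provide a `copy` method for this.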


### c) Writing a (list of) fields out to a file

We changed some metadata (units) and the data itself from our dataset read in from file. Let's write the new data out as a new file and read it back in to show that it has changed relative to the original. You write files out to disk using the `write` function, with an argument giving the path, including the file name, at which you want to create the file (it can be _just_ the name, to write the file to the current working directory):

In [16]:
cf.write(field, "../ncas_data/data1-updated.nc")

See that it was written out to the directory we specified:

<div class="alert alert-block alert-info">
<i>Note:</i> in a Jupyter Notebook, '!' precedes a shell command, so this is a terminal command and not Python
</div>

In [17]:
!ls ../ncas_data

160by320griddata.nc			   precip_2010.nc
aaaaoa.pmh8dec.pp			   precip_DJF_means.nc
alpine_precip_DJF_means.nc		   qbo.nc
data1.nc				   regions.nc
data1-updated.nc			   rgp.nc
data2.nc				   sea_currents_backup.nc
data3.nc				   sea_currents.nc
data5.nc				   ta.nc
ggas2014121200_00-18.nc			   tripolar.nc
IPSL-CM5A-LR_r1i1p1_tas_n96_rcp45_mnth.nc  two_fields.nc
land.nc					   ua.nc
model_precip_DJF_means_low_res.nc	   u_n216.nc
model_precip_DJF_means.nc		   u_n96.nc
n2o_emissions.nc			   vaAMIPlcd_DJF.nc
POLCOMS_WAM_ZUV_01_16012006.nc		   va.nc
precip_1D_monthly.nc			   wapAMIPlcd_DJF.nc
precip_1D_yearly.nc
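
Outside a Notebook the `!ls` shortcut is not available, so the same check can be done in plain Python. This is a minimal sketch using only the standard library; the helper name `list_netcdf_files` is hypothetical, and the demonstration runs on a throwaway temporary directory with empty placeholder files so that it does not depend on the course data:

```python
import tempfile
from pathlib import Path


def list_netcdf_files(directory):
    """Return the sorted names of the '.nc' files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.nc"))


# Demonstrate on a temporary directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data1.nc", "data1-updated.nc", "aaaaoa.pmh8dec.pp"):
        (Path(tmp) / name).touch()
    found = list_netcdf_files(tmp)

print(found)  # only the '.nc' names, in sorted order
```

In this course's layout, calling `list_netcdf_files("../ncas_data")` would be the analogue of the shell listing above, restricted to netCDF files.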


To check it wrote out the edited version from this Notebook, we can read the file back in and inspect it again:

In [18]:
updated_fieldlist = cf.read("../ncas_data/data1-updated.nc")
reread_field = updated_fieldlist[0]

See what `reread_field` is by medium-detail inspection:

In [19]:
print(reread_field)
print(reread_field.data)

Field: long_name=Potential vorticity (ncvar%PV)
-----------------------------------------------
Data            : long_name=Potential vorticity(time(1), pressure(23), latitude(160), longitude(320)) K m**2 kg**-1 s**-1
Dimension coords: time(1) = [1964-01-21 00:00:00]
                : pressure(23) = [1.0, ..., 0.0010000000474974513] bar
                : latitude(160) = [89.14151763916016, ..., -89.14151763916016] degrees_north
                : longitude(320) = [0.0, ..., 358.875] degrees_east
[[[[0.0, ..., --]]]] K m**2 kg**-1 s**-1


Notice the pressure coordinate units are 'bar' as per our change, and the first data array item is `0.0` as per our change. The final item is displayed as `--`, meaning it is masked in the re-read data, rather than as the `10.0` we assigned.
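
The `--` in the printed data above is the usual way a masked (missing) element is displayed. As a minimal sketch of what that means, assuming the marker carries the same meaning as in NumPy masked arrays, the example below builds a tiny masked array with made-up values and inspects it:

```python
import numpy as np

# Made-up values: the last element is flagged as masked (missing).
values = np.ma.array([0.0, 1.0, 10.0], mask=[False, False, True])

print(values)                       # the masked element is displayed as '--'
print(values.count())               # number of unmasked elements
print(values[-1] is np.ma.masked)   # True when the last element is masked
print(values.sum())                 # masked elements are left out of the sum
```

A masked element still occupies a position in the array but is excluded from calculations, which is why it is worth checking for masking when a value read back from file does not match what was written.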

***