
# <b>Tutorial 1: Accessing and exploring CSSP China 20CR datasets</b>



## Learning Objectives:

1. How to load data into xarray format
2. How to convert xarray data into iris cube format
3. How to perform basic cube operations

## Contents

1. [Use Xarray to access monthly data](#access_zarr)
2. [Retrieve single (or list of) variables](#get_vars)
3. [Convert datasets to iris cube](#to_iris)
4. [Explore cube attributes and coordinates](#explore_iris)
5. [Exercises](#exercise)

<div class="alert alert-block alert-warning">
<b>Prerequisites</b><br> 
- Basic programming skills in python<br>
- Familiarity with python libraries Numpy and Matplotlib<br>
- Basic understanding of climate data<br>
</div>

___

## 1. Use Xarray to access monthly data<a id='access_zarr'></a>

### 1.1 Import libraries.
Import the necessary libraries. The current datasets are in zarr format, so we need the zarr and xarray libraries to access the data.

In [None]:
import os
import sys
import numpy as np
import xarray as xr
import zarr

sys.path.append(os.path.abspath("../"))
from scripts.xarray_iris_coord_system import XarrayIrisCoordSystem as xics

xi = xics()
xr.set_options(
    display_style="text"
)  # Work around for AML bug that won't display HTML output.

### 1.2 Set up authentication for the Azure blob store

The data for this course is held online in Azure Blob Storage. To access it we use a SAS (shared access signature) token. You should have been given the credentials for this service before the course; if not, please ask your instructor. We use the getpass module here to avoid putting the token into the public domain. Run the cell below, enter your SAS token in the box and press Return. This stores the token in the variable SAS.

In [None]:
import getpass

# SAS WITHOUT leading '?'
SAS = getpass.getpass()
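
The comment in the cell above asks for the token without its leading '?'. As a minimal sketch, a small helper could tidy up a pasted token before it is used. The function name `normalise_sas` and the sample token are hypothetical and not part of the course material.

```python
def normalise_sas(token: str) -> str:
    """Strip surrounding whitespace and one leading '?' from a SAS token."""
    token = token.strip()
    return token[1:] if token.startswith("?") else token


# Made-up placeholder token, for illustration only
example = normalise_sas("  ?sv=2020-01-01&sig=PLACEHOLDER\n")
print(example)
```

In the notebook this could be applied as `SAS = normalise_sas(getpass.getpass())`.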

We now use the Zarr library to connect to this storage. This is a little like opening a file on a local file system, but it works without downloading the data because it makes use of the Azure Blob Storage service. Calling zarr.ABSStore returns a zarr.storage.ABSStore object, which we can use to access the Zarr data in the same way we would use a local file. If you have Zarr data on a local file system, you can skip this step and instead pass the path to the Zarr data when opening the dataset below.

In [None]:
store = zarr.ABSStore(
    container="metoffice-20cr-ds",
    prefix="monthly/",
    account_name="metdatasa",
    blob_service_kwargs={"sas_token": SAS},
)
type(store)

### 1.3 Read monthly data
A Dataset consists of coordinates and data variables. Let's use xarray's **open_zarr()** method to read all our zarr data into a dataset object and display its metadata.

In [None]:
# use the open_zarr() method to read in the whole dataset metadata
dataset = xr.open_zarr(store)
# print out the metadata
dataset

<div class="alert alert-block alert-info">
<b>Note:</b> The dataset lists coordinates and data variables.
</div>
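
To make the split between coordinates and data variables concrete, here is a small sketch built on a synthetic dataset. The names mirror those used in this tutorial, but the sizes and values are made up.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in; sizes and values are illustrative only
demo = xr.Dataset(
    data_vars={
        "air_temperature_mean": (("time", "grid_latitude"), np.full((3, 4), 280.0)),
        "precipitation_flux": (("time", "grid_latitude"), np.zeros((3, 4))),
    },
    coords={"time": np.arange(3), "grid_latitude": np.linspace(-1.0, 1.0, 4)},
)

coord_names = list(demo.coords)   # names of the coordinates
var_names = list(demo.data_vars)  # names of the data variables
sizes = dict(demo.sizes)          # length of each dimension

print(coord_names)
print(var_names)
print(sizes)
```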


We can also access and print a list of all the variables in our dataset.

In [None]:
# display all the variables in our dataset
dataset.data_vars

---

## 2. Retrieve single (or list of) variables<a id='get_vars'></a>

### 2.1 Read mean air temperature at 2 m 
Access and print just a single variable, i.e. the mean air temperature at 2 m.


<div class="alert alert-block alert-info">
<b>Note:</b> The DataArrays in our dataset can be accessed either as attributes or indexed by name
</div>

In [None]:
# Access the variable by indexing it with its name
t2m_mean = dataset["air_temperature_mean"]
# print the metadata
t2m_mean

In [None]:
# Access the variable like an attribute
t2m_mean = dataset.air_temperature_mean
# print the metadata
t2m_mean
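
One caveat worth knowing: attribute access only works when the variable name does not clash with an existing Dataset attribute or method, whereas indexing by name always works. The sketch below uses a synthetic dataset in which one variable is deliberately (and hypothetically) named "mean" to show the difference.

```python
import xarray as xr

# Synthetic dataset; the variable called "mean" is a contrived example
demo = xr.Dataset(
    {
        "air_temperature_mean": ("time", [280.0, 281.0, 282.0]),
        "mean": ("time", [0.0, 0.0, 0.0]),
    }
)

by_index = demo["air_temperature_mean"]
by_attr = demo.air_temperature_mean
same = by_index.identical(by_attr)  # both forms give the same DataArray

# "mean" clashes with the Dataset.mean() method, so attribute access
# returns the method, while indexing still returns the variable.
attr_is_variable = isinstance(demo.mean, xr.DataArray)
index_is_variable = isinstance(demo["mean"], xr.DataArray)

print(same, attr_is_variable, index_is_variable)
```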

### 2.2 Read list of variables 
We can also create a smaller dataset containing a subset of our variables

In [None]:
# creating a list containing a subset of our variables
varlist = [
    "relative_humidity_mean",
    "relative_humidity_at_pressure_mean",
    "specific_humidity",
    "surface_temperature",
]

# extracting the list of variables from dataset
mini_ds = dataset[varlist]

# print the metadata
mini_ds

<div class="alert alert-block alert-success">
    <b>Task:</b><br><ul>
        <li>Access "cloud_area_fraction" using both the index and the attribute method in the cell below and save it in a variable named <b>caf</b></li>
        <li>Create a dataset <b>pres_ds</b> containing all the pressure variables, <i>(hint: use a for loop)</i></li>
    </ul>
</div>

In [None]:
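# Retrieve "cloud_area_fraction" by indexing with its name
caf = dataset["cloud_area_fraction"]
# The attribute form returns the same DataArray
caf = dataset.cloud_area_fraction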
# print the metadata
caf

In [None]:
# Retrieve all the pressure variables
pres_vars = [name for name in dataset.data_vars if "pressure" in name]
pres_ds = dataset[pres_vars]
pres_ds

---

## 3. Convert datasets to iris cube<a id='to_iris'></a>

### 3.1 Convert a variable to an Iris Cube
We now convert the mean air temperature variable that we accessed in section 2.1 into an iris cube. This can be done simply using the method **DataArray.to_iris()**.


In [None]:
# Use the method to_iris() to convert the xarray data array into an iris cube
cube_t2m_mean = t2m_mean.to_iris()
cube_t2m_mean

### 3.2 Convert whole Dataset to an Iris Cubelist
Instead of converting the variables into iris cubes one by one, we can convert the whole dataset (or a subset of the dataset) into an iris cubelist.

<div class="alert alert-block alert-info">
<b>Note:</b> This is not as simple as for the single variable above, but it is straightforward with the <b>dataset.apply()</b> method. It will take a bit longer to complete!
</div>

In [None]:
# first import the Iris library
import iris

In [None]:
# create an empty list to hold the iris cubes
cubelist = iris.cube.CubeList([])

# use Dataset.apply() to convert the dataset to an Iris CubeList
dataset.apply(lambda da: cubelist.append(xi.to_iris(da)))

# print out the cubelist
cubelist

<div class="alert alert-block alert-info">
    <b>Note:</b> By clicking on any variable above, you can see its dimension coordinates and metadata
</div>

<div class="alert alert-block alert-success">
    <b>Task:</b><br><ul>
        <li>Convert the <b>caf</b> variable into an iris cube named <b>caf_cube</b></li>
        <li>Create a cube list containing the pressure variables only</li>
        <li>Can you note the difference between a cube and a cubelist?</li>
    </ul>
</div>

In [None]:
## convert caf into iris cube
caf_cube = xi.to_iris(caf)
caf_cube

In [None]:
## convert pressure dataset into iris cube list
pres_cubelist = iris.cube.CubeList([])
pres_ds.apply(lambda da: pres_cubelist.append(xi.to_iris(da)))

pres_cubelist

___

## 4. Explore cube attributes and coordinates<a id='explore_iris'></a>

### 4.1 Accessing cube from cubelist
Now that we have our variables in a cubelist, we can extract any variable by its name. For instance, the following code extracts the **precipitation_flux** variable.

In [None]:
# let's extract and print the Precipitation Flux variable
precipitation_cube = cubelist.extract_strict("precipitation_flux")
precipitation_cube

<div class="alert alert-block alert-info">
<b>Note:</b> We can see that we have <b>time</b>, <b>grid_latitude</b> and <b>grid_longitude</b> dimensions, and a cell method of <b>mean</b>: time (1 hour), which means that the cube contains monthly mean Precipitation Flux data.
</div>


### 4.2 Cube attributes
We can explore the cube information further

In [None]:
# we can print its shape
precipitation_cube.shape

In [None]:
# we can print its number of dimensions
precipitation_cube.ndim

In [None]:
# we can print all of the data values (takes a bit of time as it is a large dataset!)
precipitation_cube.data

In [None]:
# We can also print the maximum, minimum and mean value in data
print("Maximum value: ", precipitation_cube.data.max())
print("Minimum value: ", precipitation_cube.data.min())
print("Mean value: ", precipitation_cube.data.mean())

In [None]:
# we can print cube's name
precipitation_cube.name()

In [None]:
# we can print the unit of data
precipitation_cube.units

In [None]:
# we can also print cube's general attributes
precipitation_cube.attributes

### 4.3 Rename the cube
Rename the precipitation_flux cube

<div class="alert alert-block alert-info">
<b>Note:</b> The <b>name</b>, <b>standard_name</b>, <b>long_name</b> and to an extent <b>var_name</b> are all attributes to describe the phenomenon that the cube represents.
    
<b>standard_name</b> is restricted to be a CF standard name (see the <a href="http://cfconventions.org/standard-names.html">CF standard name table</a>).  

If there is not a suitable CF standard name, <b>cube.standard_name</b> is set to <b>None</b> and the <b>long_name</b> is used instead.  
<b>long_name</b> is less restrictive and can be set to be any string.
</div>

In [None]:
print(precipitation_cube.standard_name)
print(precipitation_cube.long_name)
print(precipitation_cube.var_name)
print(precipitation_cube.name())

In [None]:
# changing the cube name to 'pflx' using "rename" method
precipitation_cube.rename("pflx")

In [None]:
print(precipitation_cube.standard_name)
print(precipitation_cube.long_name)
print(precipitation_cube.var_name)
print(precipitation_cube.name())

We see that, because 'pflx' is not a valid CF standard name, standard_name is set to None and long_name is set to 'pflx' instead; rename() also resets var_name to None. The cube.name() method first tries standard_name, then long_name, then var_name, then the STASH attribute, before falling back to the value of default (which itself defaults to 'unknown').

We can also set one specific name of the cube. Suppose we only want to change standard_name.

In [None]:
precipitation_cube.standard_name = "precipitation_flux"

In [None]:
print(precipitation_cube.standard_name)
print(precipitation_cube.long_name)
print(precipitation_cube.var_name)
print(precipitation_cube.name())

Similarly, we can set long_name and var_name directly, without using the rename method.

### 4.3 Change the cube units
Change the units of precipitation_cube from kg m-2 s-1 to kg m-2 day-1.

<div class="alert alert-block alert-info">
<b>Note:</b> The units attribute on a cube tells us the units of the numbers held in the data array. To convert to 'kg m-2 day-1', we could just multiply the raw data by 86400 seconds, but a clearer way is to use the <b>convert_units()</b> method with the name of the units we want to convert the data into. It will automatically update the data array.
</div>

In [None]:
# inspect the current unit and maximum data value
print(precipitation_cube.units)
print(precipitation_cube.data.max())

In [None]:
# convert the units to 'kg m-2 day-1' using convert_units method
precipitation_cube.convert_units("kg m-2 day-1")

In [None]:
# inspect the current unit and maximum data value after the conversion
print(precipitation_cube.units)
print(precipitation_cube.data.max())

### 4.4 Add or remove the attributes
In section 4.2 we saw how to access the cube attributes. In this section we will add and remove attributes.

Let's add a new attribute to the precipitation_flux cube.
We want to keep a record of the cube's original units, and the best way to do that is to store it as an attribute.
We define the new attribute as a key-value pair and add it using the **update** method.

In [None]:
# defining new attribute
new_attr = {"original_units": "kg m-2 s-1"}

In [None]:
# List the attributes
precipitation_cube.attributes

In [None]:
# add new attribute using .update() method
precipitation_cube.attributes.update(new_attr)

# now print the attributes to check that the new attribute has been added
precipitation_cube.attributes

We now have 'original_units' in the attributes list.

We can also delete any specific attribute. For example, suppose we do not need the 'source' attribute of precipitation_cube; we can delete it.

In [None]:
del precipitation_cube.attributes["source"]
precipitation_cube.attributes

### 4.5 Accessing cube coordinates
Access the cube's coordinates and explore their attributes.

<div class="alert alert-block alert-info">
<b>Note:</b> 
    <ul>
        <li>Cubes need coordinate information to help us describe the underlying phenomenon. Typically a cube's coordinates are accessed with the coords or coord methods. The latter must return exactly one coordinate for the given parameter filters, where the former returns a list of matching coordinates.</li>
        <li>The coordinate interface is very similar to that of a cube. The attributes that exist on both cubes and coordinates are: <b>standard_name</b>, <b>long_name</b>, <b>var_name</b>, <b>units</b>, <b>attributes</b> and <b>shape</b>. </li>
        <li>A coordinate does not have data; instead it has points and bounds (bounds may be None), so we access the actual values through its points.</li>
    </ul>    

</div>

In [None]:
# let's print out all cube's coordinates
print([coord.name() for coord in precipitation_cube.coords()])

In [None]:
# let's access the 'grid_latitude' coordinate and display its summary
grid_latitude = precipitation_cube.coord("grid_latitude")
grid_latitude

In [None]:
# print the maximum and minimum value of 'grid_latitude' coordinate
print(grid_latitude.points.max())
print(grid_latitude.points.min())

<div class="alert alert-block alert-success">
    <b>Task:</b><br><ul>
        <li> Inspect the following attributes of <b>caf_cube</b> you created in previous task</li>
            <ul>
                <li>name (standard_name)</li>
                <li>Number of dimensions (ndim)</li>
                <li>units</li>
                <li>mean of data</li>
            </ul>
        <li> Print all the coordinates of <b>caf_cube</b>, <i>(hint: use for loop)</i></li>
        <li> Explore attributes of "grid_latitude"</li>
           <ul>
                <li>name (standard_name)</li>
                <li>shape</li>
                <li>units</li>
           </ul>
    </ul>
</div>

In [None]:
## Inspect attributes
print(caf_cube.standard_name)
print(caf_cube.ndim)
print(caf_cube.units)
print(caf_cube.data.mean())

In [None]:
## Inspect coordinates
print([coord.name() for coord in caf_cube.coords()])

lat = caf_cube.coord("grid_latitude")

print(lat.standard_name)
print(lat.shape)
print(lat.units)

___

## 5. Exercise<a id='exercise'></a>

In this exercise we will explore the variables and attributes of hourly data.

### Exercise 1: Load hourly data
Load hourly data into an xarray dataset and display all variables


In [None]:
store = zarr.ABSStore(
    container="metoffice-20cr-ds",
    prefix="hourly/",
    account_name="metdatasa",
    blob_service_kwargs={"sas_token": SAS},
)
type(store)

dataset = xr.open_zarr(store)
# print out the metadata
dataset

### Exercise 2: Convert to iris cubelist
Convert the dataset into an iris cubelist and display the cubelist


In [None]:
cubelist = iris.cube.CubeList([])
dataset.apply(lambda da: cubelist.append(xi.to_iris(da)))

cubelist

### Exercise 3: Extract variable
Extract x_wind variable from cubelist and display the cube

In [None]:
xwind = cubelist.extract_strict("x_wind")
xwind

### Exercise 4: Explore cube attributes 
Using the Iris cube from the previous exercise, explore its attributes as follows:
- print out the number of dimensions
- print out its shape
- print out its coordinates names
- print out the maximum and minimum values of latitude and longitude


In [None]:
xwind.ndim

In [None]:
xwind.shape

In [None]:
coords = [coord.name() for coord in xwind.coords()]
coords

In [None]:
print(
    "latitude: [",
    xwind.coord("grid_latitude").points.min(),
    ", ",
    xwind.coord("grid_latitude").points.max(),
    "]",
)
print(
    "longitude: [",
    xwind.coord("grid_longitude").points.min(),
    ", ",
    xwind.coord("grid_longitude").points.max(),
    "]",
)

### Exercise 5: Change units and add the original units to attributes list 

- change the units of x_wind to km/hr
- add the original units to the attributes list
- print out the attributes to see if the new attribute has been added successfully



In [None]:
orig_units = xwind.units
print("original units: ", orig_units)

# Changing the units to km/hr
xwind.convert_units("km h-1")

print("New units: ", xwind.units)

In [None]:
# store the original units as a plain string rather than a Unit object
new_attr = {"original_units": str(orig_units)}

xwind.attributes.update(new_attr)

In [None]:
xwind.attributes

___

<div class="alert alert-block alert-success">
<b>Summary</b><br> 
    In this session we learned how:<br>
    <ul>
        <li>to load data from a zarr store into an xarray dataset and explore its metadata</li>
        <li>to convert an xarray dataset into iris cubes and explore their metadata</li>
        <li>to further explore an iris cube's attributes through simple operations</li>
    </ul>

</div>


