# **Handling CMIP6 Data**

An important part of CMIP is to make the multi-model output publicly available in a standardized format for analysis by the wider climate community and users. The standardization of the model output in a specified format, and the collection, archival, and access of the model output through the Earth System Grid Federation ([ESGF](https://esgf.llnl.gov/)) data replication centres have facilitated multi-model analyses.

Useful links:
* [ESGF CMIP6 data search and download portal](https://esgf-node.llnl.gov/search/cmip6/)
* [CMIP6 data request home page](https://clipc-services.ceda.ac.uk/dreq/index.html)
* [CMIP6 data frequency list](https://clipc-services.ceda.ac.uk/dreq/index/miptable.html)
* [CMIP6 variables list](https://clipc-services.ceda.ac.uk/dreq/index/var.html)
* [CMIP6 data request overview paper: Juckes et al., 2020](https://gmd.copernicus.org/articles/13/201/2020/)

## CMIP6 data access in the course JupyterHub

CMIP6 model data (except for NorESM output) can be found in the **shared-tacco-ns1004k-cmip-betzy** folder. 
- folder structure: shared-tacco-ns1004k-cmip-betzy/model-institute/model-name/experiment/ensemble-member/data-frequency/variable/grid/data-version

NorESM model data are stored in the **shared-tacco-ns1004k-cmroot** folder.
- folder structure: shared-tacco-ns1004k-cmroot/model-name/experiment/data-version/

Data filename format: variable_data-frequency_model-name_experiment_ensemble-member_grid_time-period.nc 
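
As a rough illustration of the filename format above, the following sketch splits a filename into its components. The function name `parse_cmip6_filename`, the field labels and the example filename are made up for this illustration. Files without a time period (e.g. time-invariant fields) would not match the seven-field pattern assumed here.

```python
from pathlib import Path

# Field labels follow the filename format described above (labels are our own choice).
FIELDS = ("variable", "frequency", "model", "experiment", "member", "grid", "period")


def parse_cmip6_filename(filename):
    """Split a CMIP6-style filename into its underscore-separated components."""
    stem = Path(filename).name
    if stem.endswith(".nc"):
        stem = stem[:-3]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {filename}")
    return dict(zip(FIELDS, parts))


# Hypothetical example filename, for illustration only
example = "tas_Amon_NorESM2-LM_historical_r1i1p1f1_gn_185001-185912.nc"
info = parse_cmip6_filename(example)
print(info["variable"], info["model"], info["period"])
```

The same dictionary keys can then be used to filter a list of files, for instance by variable or experiment.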

**Tasks:**
1. Familiarize yourself with the folder structure.
2. Have a look at the different CMIP6 models.
3. Have a look at the [data frequency list](https://clipc-services.ceda.ac.uk/dreq/index/miptable.html) and [variables](https://clipc-services.ceda.ac.uk/dreq/index/var.html).
4. $\color{red}{\text{find the file ?? of NorESM...?}}$

## netCDF files

The CMIP6 model output is stored in netCDF files.<br>

**What is a netCDF file?**
- netCDF stands for “Network Common Data Form” and is a data format (and set of libraries) which is commonly used to store and share large, array-oriented scientific data. 
- It is self-describing, portable, metadata-friendly and supported by many languages (including Python, R, Fortran, C/C++, MATLAB, NCL, etc.).
- It is commonly used for gridded model output and is related to the HDF and CDF formats, which you may have come across.

**How to inspect a netCDF file?**
- As stated above, you can read the content of netCDF files with many programming languages. Because netCDF is a binary format and the files are often large, you cannot simply open them in your favorite text editor.
- If you only want to inspect a netCDF file, you can use e.g. *ncdump* (a simple browser), *ncview* (a visual browser) or *Panoply* (a data viewer).

[ncdump](http://www.bic.mni.mcgill.ca/users/sean/Docs/netcdf/guide.txn_79.html#:~:text=The%20ncdump%20tool%20generates%20an,variable%20data%20in%20the%20file.&text=Thus%20ncdump%20and%20ncgen%20can,between%20binary%20and%20ASCII%20representations.) can be used as a simple browser for netCDF data files, to display the dimension names and sizes; variable names, types, and shapes; attribute names and values; and optionally, the values of data for all variables or selected variables in a netCDF file.

**Task:**
1. Open a terminal in JupyterHub (File->New->Terminal) and inspect the header of a CMIP6 data file using ncdump (the `-h` option prints the header only).
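
A similar header summary can also be printed from Python. The sketch below is not a replacement for ncdump. It builds a small synthetic dataset (the name `ds_demo` and all values are made up) and uses xarray's `Dataset.info()`, which writes an ncdump-like listing of dimensions, variables and attributes.

```python
import io

import numpy as np
import xarray as xr

# Small synthetic dataset standing in for a CMIP6 file (values are random)
ds_demo = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(2, 3, 4))},
    coords={
        "time": [0, 1],
        "lat": [-60.0, 0.0, 60.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    attrs={"title": "synthetic demo dataset"},
)
ds_demo["tas"].attrs["units"] = "K"

# Write the ncdump-like summary into a string buffer and print it
buf = io.StringIO()
ds_demo.info(buf=buf)
header = buf.getvalue()
print(header)
```

For a real file, the same call could be applied to the dataset returned by `xr.open_dataset`.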

## Visualizing CMIP6 data with Python

**Task:**
1. Visualize a 3D field $\color{red}{\text{(give NorESM example)}}$ using the skeleton code below. To do so, fill the gaps marked with << >>.

First, import the Python packages that make working with labelled multi-dimensional arrays simple and efficient:

In [4]:
import xarray as xr
import pandas as pd
import matplotlib as mpl

%matplotlib inline

Set the path to the file you want to visualize

In [None]:
path = << >>
filename = path + << >>
print(filename)

Load the netCDF file into an xarray dataset. \
Display the dataset to see the dimensions, coordinates and data variables that can be addressed when reading the dataset.


In [None]:
ds = xr.open_dataset(filename)
ds

Now we can address the variable of interest via *xarray.Dataset* and use the [isel()](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.isel.html) method to choose the time step by its integer position in the dataset.

In [None]:
timestep = 0
ds.<< >>.isel(time=timestep).plot()

... or use the [sel()](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sel.html) method to select the time explicitly by its label:

In [None]:
timestring = '<< >>'
ds.<< >>.sel(time=timestring).plot()
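
To make the difference between the two methods concrete, here is a minimal sketch on synthetic data. The dataset `ds_demo`, the variable name `tas` and all values are made up for illustration. `isel` picks the time step by integer position, while `sel` picks it by its coordinate label.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data: 3 time steps on a 2 x 2 grid
time = pd.date_range("2000-01-01", periods=3, freq="MS")
tas = xr.DataArray(
    np.arange(3 * 2 * 2, dtype=float).reshape(3, 2, 2),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [-45.0, 45.0], "lon": [0.0, 180.0]},
    name="tas",
)
ds_demo = tas.to_dataset()

# Second time step, selected by position ...
by_position = ds_demo.tas.isel(time=1)
# ... and the same time step, selected by its label
by_label = ds_demo.tas.sel(time="2000-02-01")

print(by_position.values)
print(by_label.values)
```

Both selections return the same 2D field here. Appending `.plot()` to either one would draw it, as in the cells above.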

**Task:** <br>
1. Plot a (slice of a) **4D-field** (e.g. temperature at a specific pressure level instead of near-surface air temperature): <br>

In [3]:
# set the path to the corresponding data file, read the file as above and display its content. Make yourself familiar with the dimensions and variables in the dataset.



In [None]:
# address and plot a slice of your 4D dataset by specifying the vertical dimension, e.g. the pressure level, in isel (in addition to the time specification)



# **Customize your Maps**

## Set Figure Size
To adjust the figure size, we can use the matplotlib package:

In [None]:
#import python package
import matplotlib as mpl

#adjust figure size
mpl.rcParams['figure.figsize'] = [10., 8.]
ds.<< >>.isel(time=<< >>).plot()

## Change the Map Projection
We can use the Python package Cartopy to produce maps and do other geospatial data analyses. We will also use pyplot, a collection of functions that make plotting simpler.

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection=ccrs.Miller())

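# NOTE: load_cmap is not defined in this section; it is assumed to be a helper
# made available elsewhere in the course environment.
ds.<< >>.isel(plev=<< >>, time=<< >>).plot(ax=ax,
           transform=ccrs.PlateCarree(),
           cmap=load_cmap('broc')
          )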

As you can see in the figure above, axis labels disappear when you change the map projection. You can add coastlines to make the map easier to read:

In [None]:
ax.coastlines()

On our JupyterHub, an older version of cartopy is installed, where not all projections support lat/lon labelling. The *PlateCarree* and *Mercator* projections do support lat/lon labelling:

In [None]:
fig = plt.figure(figsize=(15,10))
ax = plt.axes(projection=ccrs.Mercator())

ds.<< >>.isel(plev=-1,time=0).plot(ax=ax, 
           transform=ccrs.PlateCarree(),
           cmap=load_cmap('broc') 
          )

ax.coastlines()

# Add gridlines with labels
gl = ax.gridlines(color='lightgrey', linestyle='-', draw_labels=True)

# Do not draw labels on the top and right of the map.
# (Older cartopy API; newer cartopy versions use gl.top_labels and gl.right_labels instead.)
gl.xlabels_top = False
gl.ylabels_right = False

# **Georeferenced Latitude-Vertical Plots**

## 2D plot for one longitude point
Use the sel() method to select the data along one longitude and at one time:

In [None]:
ds.<< >>.sel(lon=<< >>,time= << >>).plot(cmap=load_cmap('broc'))

With pressure as the vertical coordinate, we need to invert the vertical axis so that the lowest pressure values are at the top and the highest values at the bottom:

In [None]:
ds.<< >>.sel(lon=<< >>,time= << >>).plot(cmap=load_cmap('broc'))
plt.ylim(plt.ylim()[::-1])

For pressure levels, we usually use hPa instead of Pa. Change the pressure levels from Pa to hPa:

In [None]:
# change pressure level from Pa to hPa within your dataset
hpa=ds.<< >>.plev/100
ds['plev']=hpa
ds.<< >>.plev.attrs['units'] = 'hPa'
ds.<< >>.plev.attrs['standard_name'] = 'air_pressure'
ds.<< >>.plev

# plot
ds.<< >>.sel(lon=<< >>, time=<< >>).plot(cmap=load_cmap('broc'))
plt.ylim(plt.ylim()[::-1])
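
The unit conversion can also be tried out on synthetic data first. The sketch below is one possible variant, not the only way to do it: it uses `assign_coords` to return a new dataset instead of modifying the original in place. The names `ds_demo` and `ds_hpa` and all values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic latitude-pressure field with pressure levels given in Pa
plev_pa = np.array([100000.0, 85000.0, 50000.0, 10000.0])
ds_demo = xr.Dataset(
    {"ta": (("plev", "lat"), np.zeros((4, 3)))},
    coords={
        "plev": ("plev", plev_pa, {"units": "Pa", "standard_name": "air_pressure"}),
        "lat": [-60.0, 0.0, 60.0],
    },
)

# Replace the coordinate by its value in hPa; ds_demo itself is left untouched
ds_hpa = ds_demo.assign_coords(plev=ds_demo["plev"] / 100)

# Arithmetic drops attributes by default, so set them again on the new coordinate
ds_hpa["plev"].attrs.update(units="hPa", standard_name="air_pressure")

print(ds_hpa["plev"].values, ds_hpa["plev"].attrs["units"])
```

Keeping the original dataset unchanged makes it harmless to re-run the cell, whereas an in-place division would divide by 100 again on each run.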

We can also adjust the top of the figure. When the vertical axis is pressure, a logarithmic scale often makes the plot easier to read:

In [None]:
ds.<< >>.sel(lon=<< >>,time= << >>).plot(cmap=load_cmap('broc'))
plt.ylim(plt.ylim()[::-1])
plt.ylim(top=0.001)
plt.yscale('log')
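
The axis handling can be tested independently of the model data. The following sketch uses plain matplotlib with a synthetic latitude-pressure field (all values are made up). It uses `ax.set_yscale` and `ax.invert_yaxis` as an alternative to reversing the limits by hand. The `matplotlib.use("Agg")` line is only there so that the sketch runs outside a notebook and can be omitted in JupyterHub.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside the notebook
import matplotlib.pyplot as plt
import numpy as np

lat = np.linspace(-90.0, 90.0, 19)
plev = np.array([10.0, 50.0, 100.0, 250.0, 500.0, 700.0, 850.0, 1000.0])  # hPa

# Synthetic field that varies smoothly with latitude and pressure
field = np.cos(np.deg2rad(lat))[np.newaxis, :] * np.log10(plev)[:, np.newaxis]

fig, ax = plt.subplots()
contours = ax.contourf(lat, plev, field)
ax.set_yscale("log")  # logarithmic pressure axis
ax.invert_yaxis()     # highest pressure (near the surface) at the bottom
ax.set_xlabel("latitude (degrees north)")
ax.set_ylabel("pressure (hPa)")
fig.colorbar(contours, ax=ax)
```

Setting the scale before inverting the axis avoids having to think about which limit is currently the top one.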

## 2D plot over averaged longitudes

Instead of selecting one longitude, we can also average over all longitudes (a zonal mean), using the mean() method:

In [None]:
ds.<< >>.sel(time=<< >>).mean(dim='lon').plot(cmap=load_cmap('broc'))
plt.ylim(plt.ylim()[::-1])
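
To check what the averaging does to the dimensions, here is a small sketch on synthetic data. The array `ta` and its values are made up, chosen so that each grid point holds its own longitude value. Averaging over `lon` removes that dimension and leaves a pressure-latitude field.

```python
import numpy as np
import xarray as xr

lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
plev = np.array([1000.0, 500.0])

# Each grid point holds its longitude, so the mean along lon is easy to verify
data = np.broadcast_to(lon, (plev.size, lat.size, lon.size)).copy()
ta = xr.DataArray(
    data,
    dims=("plev", "lat", "lon"),
    coords={"plev": plev, "lat": lat, "lon": lon},
    name="ta",
)

# Unweighted mean along the longitude dimension
zonal_mean = ta.mean(dim="lon")
print(zonal_mean.dims)
print(zonal_mean.values)
```

Note that this is an unweighted mean, which is adequate along longitude on a regular latitude-longitude grid.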