# Tutorial 1: Numerical weather data

**Content creators**: Fabian Emmerich

In this tutorial, you will get an insight into the data of a numerical weather model. You will learn how the data are structured and which variables they contain. To do so, we will use the [`xarray` Python package](https://xarray.pydata.org), which is designed for labeled, multi-dimensional data.

## Exercise 1: Create a JupyterLab and select the required Jupyter kernel

> **Note:** We will use Apptainer images as a Jupyter kernel. To be able to run Apptainer images, you will have to accept the user agreement for Apptainer images on JuDoor.

To complete the following tutorials, you will use a Jupyter kernel which has all required packages installed. A Jupyter kernel provides a software environment to run your notebooks in. For Python, a kernel may provide a set of pre-installed packages. However, it is also possible to use kernels with other software or even for other programming languages that allow interactive computing.

1. Launch a JupyterLab on a login node via JupyterJSC and open a terminal. Continue this notebook from there.
1. To create a custom kernel, you have to create a config file (`kernel.json`) in a subdirectory of `~/.local/share/jupyter/kernels`, which Jupyter scans for custom kernel configurations.
   1. Create a folder named `maelstrom-bootcamp` in the required path.

In [None]:
import a6
import pathlib
import xarray as xr

path = pathlib.Path("/p/project/training2330/a6/data/ecmwf_era5/nc")
pattern = "*.nc"
paths = a6.utils.list_files(path=path, pattern=pattern)
print(paths)
ds = xr.open_mfdataset(
    paths,
    engine="netcdf4",
    concat_dim="time",
    combine="nested",
    coords="minimal",
    data_vars="minimal",
    compat="override",
    parallel=False,
)

In [None]:
ds.to_netcdf(
    "/p/project/training2330/a6/data/ecmwf_era5/era5_pl_1964_2023_2.nc"
)

In [None]:
!mkdir -p ~/.local/share/jupyter/kernels/maelstrom-bootcamp

   1. Create a `kernel.json` file in the previously created folder by executing the cell below. The `%%file <file path>` magic command at the top of the cell writes the content of that cell to the given file.

In [None]:
%%file ~/.local/share/jupyter/kernels/maelstrom-bootcamp/kernel.json
{
 "argv": [
   "apptainer",
   "exec",
   "--cleanenv",
   "-B /usr/:/host/usr/,/etc/slurm:/etc/slurm,/usr/lib64:/host/usr/lib64,/opt/parastation:/opt/parastation,/usr/lib64/slurm:/usr/lib64/slurm,/opt/jsc:/opt/jsc",
   "--nv",
   "/p/project/training2330/a6/jupyter-kernel.sif",
   "python",
   "-m",
   "ipykernel",
   "-f",
   "{connection_file}"
 ],
 "language": "python",
 "display_name": "maelstrom-bootcamp-2023"
}

The `display_name` field in the above JSON structure is the name under which the kernel appears in JupyterLab.

The `argv` array contains the command that Jupyter executes when the kernel is started. Here, we run a command inside an Apptainer image (`apptainer exec [...] jupyter-kernel.sif`) that starts an IPython kernel (`python -m ipykernel`). The remaining arguments allow us to use software of the host system (e.g. Slurm) from within the kernel.
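The way such an `argv` list becomes a concrete command can be sketched in a few lines of Python. This is only an illustration, not Jupyter's actual launcher code: the spec is a shortened copy of the one above (bind mounts and `--nv` omitted), and the connection file path is a made-up placeholder, since Jupyter generates the real one when it starts the kernel.

```python
import json
import shlex

# Shortened copy of the kernel spec above (bind mounts and --nv omitted)
kernel_spec_text = """
{
 "argv": [
   "apptainer", "exec", "--cleanenv",
   "/p/project/training2330/a6/jupyter-kernel.sif",
   "python", "-m", "ipykernel", "-f", "{connection_file}"
 ],
 "language": "python",
 "display_name": "maelstrom-bootcamp-2023"
}
"""

spec = json.loads(kernel_spec_text)

# Hypothetical path for illustration; Jupyter creates the real file at launch
connection_file = "/tmp/kernel-example.json"

# Fill in the placeholder in every argument of the command
command = [
    arg.replace("{connection_file}", connection_file) for arg in spec["argv"]
]

print(spec["display_name"])
print(shlex.join(command))
```

Each element of `argv` is passed to the program as a separate argument, which is why the command is stored as a list rather than as a single string.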

Apptainer, formerly known as Singularity, is a container runtime that was designed for use on high-performance computing systems. Containers make it possible to create software environments that can run on any host but are separated from the host's operating system. Such containers allow you to install any software and run it on any system without further prerequisites, except for Apptainer itself, of course. As a consequence, they provide a high degree of reproducibility. The Apptainer image we use here provides a Python environment with the set of packages that you will need to complete the rest of the tutorials.

You can check whether setting up the kernel was successful via `jupyter kernelspec list`, which returns a list of all available kernels and their paths. This list should include a kernel with the path from above (`/p/home/jusers/<user>/juwels/.local/share/jupyter/kernels/maelstrom-bootcamp`).

You can execute this command directly from this notebook (see the cell below) by prefixing it with `!`, which executes the given command in the underlying shell (terminal) of the system (e.g. bash).

In [None]:
!jupyter kernelspec list
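To illustrate what this listing is based on, here is a minimal sketch that scans a kernels directory for `kernel.json` files using only the standard library. It is a rough approximation, not how `jupyter kernelspec list` is implemented (Jupyter also searches system-wide and environment-specific locations). The helper `find_kernel_specs` is a hypothetical name, and the demo runs on a temporary directory so that it does not depend on your home directory.

```python
import json
import pathlib
import tempfile


def find_kernel_specs(kernels_dir):
    """Map each kernel folder name to the display name in its kernel.json."""
    specs = {}
    for spec_file in sorted(pathlib.Path(kernels_dir).glob("*/kernel.json")):
        with spec_file.open() as f:
            specs[spec_file.parent.name] = json.load(f).get("display_name")
    return specs


# Build a throwaway kernels directory that mimics the layout created above
with tempfile.TemporaryDirectory() as tmp:
    kernel_dir = pathlib.Path(tmp) / "maelstrom-bootcamp"
    kernel_dir.mkdir()
    (kernel_dir / "kernel.json").write_text(
        json.dumps(
            {
                "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
                "language": "python",
                "display_name": "maelstrom-bootcamp-2023",
            }
        )
    )
    found = find_kernel_specs(tmp)

print(found)
```

To inspect your own user-level kernels, you could call the helper with `pathlib.Path.home() / ".local/share/jupyter/kernels"` instead of the temporary directory.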

1. Now, in the top menu bar, navigate to `Kernel > Change Kernel...`
   <img src="images/jupyterlab-kernel.png" width="60%" height="60%">
1. From the popup's dropdown, select the kernel `maelstrom-bootcamp-2023` (the `display_name` set in `kernel.json`).
1. Once the kernel is loaded, you will see it on the top right of the notebook.

   <img src="./images/jupyterlab-kernel-status.png" width="60%" height="60%">
   
   Hovering over the circle to the right of the name will show you the status of the kernel.
   Clicking on the field that contains the name of the kernel also allows you to switch the kernel as in step 1.
   
## Exercise 2: First insight into data of a numerical weather model

In this tutorial, we will use data from the ECMWF ERA5 data set. ECMWF, the European Centre for Medium-Range Weather Forecasts, is a multi-national research institute for numerical weather prediction and climate. ECMWF has created a set of numerical weather models for different purposes. For a general overview see [here](https://www.ecmwf.int/en/forecasts/documentation-and-support). The models differ from one another in several aspects, e.g.:

- Forecast horizon
- Spatial resolution
- Number of vertical levels (altitudes)
- Model complexity
- Physical output quantities

The [ECMWF ERA5 data](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) are reanalysis data of the Earth's entire atmosphere. They are available in different output formats:

- Single level (surface level)
- Pressure levels
- Model levels
- Potential vorticity levels

We will now make use of `xarray` to load some of the data from the model levels and take a look at the data structure. 

Several data formats are used in meteorology. Two of the most common ones are GRIB and NetCDF. Here, we use data stored in the NetCDF format, which is supported by xarray.
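To give a feel for the labeled, multi-dimensional structure that xarray works with, the following sketch builds a small synthetic dataset and inspects it. All names, coordinates and values here (`t`, `level`, the grid, the random temperatures) are made up for illustration and are not taken from the ERA5 files, which may be organized differently.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a weather dataset; all names and values are made up
times = np.array(["2020-01-01T00", "2020-01-01T12"], dtype="datetime64[ns]")
levels = [500, 850]
lats = np.linspace(60.0, 50.0, 5)
lons = np.linspace(0.0, 10.0, 6)

rng = np.random.default_rng(seed=0)
temperature = 250.0 + 10.0 * rng.random((2, 2, 5, 6))

ds = xr.Dataset(
    data_vars={
        "t": (
            ("time", "level", "latitude", "longitude"),
            temperature,
            {"units": "K", "long_name": "Temperature"},
        )
    },
    coords={"time": times, "level": levels, "latitude": lats, "longitude": lons},
    attrs={"source": "synthetic example"},
)

# Variables, dimension sizes and per-variable metadata
print(list(ds.data_vars))
print(dict(ds.sizes))
print(ds["t"].attrs["units"])

# Pick the second time step by position and one level by its label
field = ds["t"].isel(time=1).sel(level=850)
print(field.dims, field.shape)

# Step size of a coordinate: difference between neighbouring values
lat_step = float(ds["latitude"].diff("latitude")[0])
print(lat_step)
```

A two-dimensional selection such as `field` can be plotted with `field.plot()`, provided matplotlib is installed.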

### Tasks

1. Take a look at the folder located at `/p/project/training2330/a6/data/ecmwf_era5/`. Which files are located there and what pattern can you recognize from their names?
1. Load one of the data files with xarray, which provides a method [`xarray.open_dataset`](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html) that allows reading files of different formats, including netCDF.
1. The above method returns a [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), which is a data object that allows accessing the underlying data in a quite handy way. Take some time to investigate the dataset. Take a look at the documentation of xarray to find the required methods for the following tasks:
   1. What coordinates does the dataset use and in what order? What are their value ranges and resolution (step size)?
   1. Does the dataset contain any metadata?
   1. Which variables are contained in the given dataset?
   1. Take a deeper look at one or more of these variables. How can you access them? Which physical quantity do they represent and what is their unit? What is their shape?
   1. Make a plot of one level and time step of one of these variables. 
   
      *Hint:* xarray has builtin methods that make it easy to plot variables!
1. Load another data file. Take a look at the time stamps of this and the previous dataset.

In [None]:
# CODE HERE