# How to work with ERA5 single levels on Earth Data Hub
### Climatological analysis of temperature in Germany

***
This notebook shows how to access and use the `reanalysis-era5-single-levels.zarr` dataset on Earth Data Hub.

The first goal is to compute the 2 metre temperature anomaly for the month of October 2023, in the Germany area, against the 1981-2010 reference period. 

The second goal is to compute the 2 metre temperature climatology (monthly means and standard deviations) in Berlin for the same reference period and compare it with the monthly averages of 2023.
***

## What you will learn:

* how to access and preview the dataset
* how to select and reduce the data
* how to plot the results

## Data access and preview
***

Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store, the data is not immediately downloaded and loaded in memory. Instead, Dask constructs a task graph that represents the operations to be performed. You should therefore reduce the amount of data to be downloaded as much as possible before the computation is triggered (e.g., when the `.compute()` or `.plot()` methods are called).
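The lazy behaviour described above can be seen without touching EDH at all. The following is a minimal sketch on a synthetic Dask-backed array; the array, its dimensions and the name `demo` are assumptions made for illustration and are not part of the EDH dataset.

```python
import dask.array as da
import xarray as xr

# Synthetic stand-in for a remote variable: 1000 x 1000 values in 100 x 100 chunks.
data = da.ones((1000, 1000), chunks=(100, 100))
arr = xr.DataArray(data, dims=("latitude", "longitude"), name="demo")

# Selection and reduction only extend the task graph; nothing is computed yet.
lazy_mean = arr.isel(latitude=slice(0, 100)).mean()
print(type(lazy_mean.data))  # still a Dask array

# .compute() executes the graph and returns an in-memory result.
result = lazy_mean.compute()
print(float(result))
```

Because the selection is applied before `.compute()`, only the chunks that overlap the first 100 rows take part in the computation.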

To preview the data, only the dataset metadata must be downloaded. Xarray does this automatically:

***

In [None]:
import xarray as xr

# your `~/.netrc` file MUST contain your credentials for earthdatahub.com
#
# machine earthdatahub.com
#   login {your_username}
#   password {your_password}

ds = xr.open_dataset(
    "https://earthdatahub.com/stores/ecmwf-era5-single-levels/reanalysis-era5-single-levels.zarr",
    chunks={},
    engine="zarr",
    storage_options={"client_kwargs": {"trust_env": True}},
)
ds

## Working with data

Datasets on EDH are typically very large and remotely hosted. Typical use involves a selection of the data followed by one or more reduction steps, performed in a local or distributed Dask environment.

The structure of a workflow that uses EDH data looks like this:
1. data selection
2. (optional) data reduction
3. (optional) visualization

## 2 metre temperature: average October 2023 in Germany

### 1. Data selection

First, we perform a geographical selection corresponding to the Germany area. This greatly reduces the amount of data that will be downloaded from EDH. Also, we convert the temperature to `°C`.

In [None]:
t2m = ds.t2m.astype("float32") - 273.15
t2m.attrs["units"] = "C"
t2m_germany_area = t2m.sel(**{"latitude": slice(55, 47), "longitude": slice(5, 16)})
t2m_germany_area

!NB: At this point, no data has been downloaded yet, nor loaded in memory.
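The latitude slice above lists the northern bound first, which suggests that latitude is stored in descending order in this dataset. As a minimal sketch on a synthetic coordinate (the `demo` array is an assumption for illustration), this is how label-based slicing behaves on a descending index:

```python
import numpy as np
import xarray as xr

# Synthetic descending latitude axis: 60, 59, ..., 40.
lat = np.arange(60.0, 39.0, -1.0)
demo = xr.DataArray(np.zeros(lat.size), coords={"latitude": lat}, dims="latitude")

# Bounds given in the same order as the coordinate select the expected band.
north_to_south = demo.sel(latitude=slice(55, 47))
# Bounds given in the opposite order select nothing.
south_to_north = demo.sel(latitude=slice(47, 55))

print(north_to_south.sizes["latitude"])
print(south_to_north.sizes["latitude"])
```

If a selection unexpectedly comes back empty, checking the order of the coordinate values is a good first step.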

Second, we narrow the selection down to October 2023. This is, again, a lazy operation:

In [None]:
t2m_germany_area_october_2023 = t2m_germany_area.sel(valid_time="2023-10")
t2m_germany_area_october_2023

At this point the selection is small enough to call `.compute()` on it. This will trigger the download of data from EDH and load it in memory. 
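One way to judge whether a lazy selection is small enough is to look at its `nbytes` attribute, which is derived from the shape and dtype and does not trigger any download. The sketch below uses a hypothetical array whose shape assumes hourly data on a 0.25° grid over the Germany area; the actual shape of the EDH selection may differ.

```python
import dask.array as da
import xarray as xr

# Hypothetical lazy array: 744 hourly steps (31 days) on a 33 x 45 grid, float32.
lazy = xr.DataArray(
    da.zeros((744, 33, 45), chunks=(744, 33, 45), dtype="float32"),
    dims=("valid_time", "latitude", "longitude"),
)

# nbytes is computed from metadata only, so the array stays lazy.
size_mb = lazy.nbytes / 1e6
print(f"{size_mb:.1f} MB")
```

The same attribute is available on any lazy selection, so it can serve as a sanity check before calling `.compute()`.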

We can measure the time it takes:

In [None]:
%%time

t2m_germany_area_october_2023 = t2m_germany_area_october_2023.compute()

The data was very small. This didn't take long!

### 2. Data reduction

Now that the data is loaded in memory, we can easily compute the October 2023 monthly mean:

In [None]:
t2m_germany_area_october_2023_monthly_mean = t2m_germany_area_october_2023.mean(dim="valid_time")
t2m_germany_area_october_2023_monthly_mean

### 3. Visualization
Finally, we can plot the October 2023 monthly mean on a map:

In [None]:
import display
import matplotlib.pyplot as plt

In [None]:
display.map(t2m_germany_area_october_2023_monthly_mean, vmax=None, cmap="YlOrRd", title="Mean Surface Temperature, Oct 2023")

## 2 metre temperature: October 2023 anomaly in Germany

Following the same workflow, we can compute the 2 metre temperature anomaly for the month of October 2023 against the 1981-2010 reference period, once again in Germany.

We first select the Octobers in the reference period:

In [None]:
t2m_germany_area_octobers_1981_2010 = t2m_germany_area.sel(valid_time=t2m_germany_area["valid_time.month"] == 10).sel(valid_time=slice("1981", "2010"))
t2m_germany_area_octobers_1981_2010

This is small enough to be computed in a reasonable time:

In [None]:
%%time

t2m_germany_area_octobers_1981_2010 = t2m_germany_area_octobers_1981_2010.compute()

Now that the data is loaded in memory, we can easily compute the mean over the 1981-2010 Octobers:

In [None]:
t2m_germany_area_octobers_1981_2010_mean = t2m_germany_area_octobers_1981_2010.mean(dim="valid_time")

And finally the anomaly:

In [None]:
anomaly = t2m_germany_area_october_2023_monthly_mean - t2m_germany_area_octobers_1981_2010_mean
anomaly

We can plot the anomaly on a map:

In [None]:
display.map(
    anomaly,
    vmax=None, 
    cmap="YlOrRd", 
    title="Mean Surface Temperature anomaly (ref 1981-2010), Oct 2023"
)

## 1981-2010 climatology vs 2023 monthly mean

We will now compute the 2 metre temperature climatology (monthly mean and standard deviation) in Berlin, over the reference period 1981-2010, and compare it with the 2023 monthly means.

We first select the grid point closest to Berlin:

In [None]:
%%time

t2m_Berlin = t2m.sel(**{"latitude": 52.5, "longitude": 13.4}, method="nearest")
t2m_Berlin

This is already small enough to be computed:

In [None]:
%%time

t2m_Berlin = t2m_Berlin.compute()

Now that the data is loaded in memory we can easily compute the climatology for the reference period 1981-2010:

In [None]:
t2m_Berlin_climatology_mean = t2m_Berlin.sel(valid_time=slice("1981", "2010")).groupby("valid_time.month").mean(dim="valid_time")
t2m_Berlin_climatology_std = t2m_Berlin.sel(valid_time=slice("1981", "2010")).groupby("valid_time.month").std(dim="valid_time")
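To make the result of the grouped reduction easier to picture, here is a minimal sketch on a synthetic time series. The series is an assumption for illustration: every timestamp simply holds its own month number, so the expected monthly means are 1 to 12 and the standard deviations are zero.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series over two full years; each value is its month number.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
series = xr.DataArray(
    time.month.values.astype("float32"),
    coords={"valid_time": time},
    dims="valid_time",
)

# Group all timestamps by calendar month, then reduce over time within each group.
grouped = series.groupby("valid_time.month")
clim_mean = grouped.mean(dim="valid_time")
clim_std = grouped.std(dim="valid_time")

print(clim_mean.dims, clim_mean.sizes["month"])
```

The result no longer has a `valid_time` dimension. It is indexed by a new `month` coordinate instead, which is what the plotting code uses on the x axis.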

We also compute the monthly means for the year 2023:

In [None]:
t2m_Berlin_2023_monthly_means = t2m_Berlin.sel(valid_time="2023").resample(valid_time="1M").mean(dim="valid_time")
t2m_Berlin_2023_monthly_means

We can finally plot the climatology in Berlin for the 1981-2010 reference period against the 2023 monthly means:

In [None]:
plt.figure(figsize=(10, 5))
t2m_Berlin_climatology_mean.plot(label="Mean", color="#3498db")
plt.errorbar(
    t2m_Berlin_climatology_mean.month, 
    t2m_Berlin_climatology_mean, 
    yerr=t2m_Berlin_climatology_std, 
    fmt="o", 
    label="Standard Deviation",
    color="#a9a9a9"
)
# plot every available 2023 monthly mean in a single call
plt.scatter(
    t2m_Berlin_2023_monthly_means["valid_time.month"].values,
    t2m_Berlin_2023_monthly_means.values,
    color="#ff6600",
    label="2023",
)
plt.title("Surface Temperature climatology in Berlin (DE), 1981-2010")
plt.xticks(t2m_Berlin_climatology_mean.month)
plt.xlabel("Month")
plt.ylabel("Surface Temperature [C]")
plt.legend()
plt.grid(alpha=0.3)
plt.show()