# ERA5 reanalysis on single levels on Earth Data Hub

***
This notebook provides guidance on how to access and use the `reanalysis-era5-single-levels.zarr` dataset on Earth Data Hub (EDH).

The first goal is to compute the 2 metre temperature (t2m) anomaly for October 2023 over Germany, relative to the 1981-2010 reference period.

The second goal is to compute the t2m climatology (monthly means and standard deviations) in Berlin for the same reference period and compare it with the monthly averages of 2023.
***

## Key points in this notebook:

* import the necessary dependencies
* declare workflow parameters
* preview the data
* select and reduce the data
* plot the results

### Import the necessary dependencies
***


In [None]:
import display  # helper module expected alongside this notebook; provides display.map()
import matplotlib.pyplot as plt
import xarray as xr

### Declare workflow parameters
***


In [None]:
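AREA = {"latitude": slice(55, 47), "longitude": slice(5, 16)}  # Germany; latitude is stored north to south
DATASET = "s3://ecmwf-era5-single-levels/reanalysis-era5-single-levels.zarr"
LOCATION = {"latitude": 52.5, "longitude": 13.4}  # Berlin
REFERENCE_PERIOD = slice("1981", "2010")  # climatological reference period
MONTH = "2023-10"  # month compared against the reference period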
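ERA5 stores latitude from north to south, which is why the `AREA` latitude slice runs from 55 down to 47. The sketch below uses a small synthetic grid (not the real dataset; the array, its extent and its 0.25° spacing are assumptions for illustration) to show that reversing the slice on a descending coordinate silently returns no latitudes at all.

```python
import numpy as np
import xarray as xr

# Hypothetical 0.25° grid with latitude stored north -> south, as in ERA5
lat = np.linspace(60.0, 40.0, 81)
lon = np.linspace(0.0, 20.0, 81)
grid = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

# Slice bounds must follow the coordinate order: north first, then south
germany = grid.sel(latitude=slice(55, 47), longitude=slice(5, 16))

# The reversed (ascending) slice matches nothing on a descending axis
empty = grid.sel(latitude=slice(47, 55), longitude=slice(5, 16))

print(germany.sizes["latitude"], germany.sizes["longitude"])  # 33 45
print(empty.sizes["latitude"])  # 0
```

An empty selection raises no error, so it is worth checking the sizes of `t2m_area` after changing `AREA`.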

### Preview, select, reduce and plot the data
***
The typical structure of a workflow that uses EDH data looks like this:
1. data preview
2. data selection
3. data reduction
4. plot

Besides letting the user preview the data and plan the operations before running them, these steps keep the amount of data downloaded and processed small: selection and reduction are only planned at first, and when the computation is triggered only the data needed for the result is fetched.
***
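To put a number on the savings from selecting before reducing, the sketch below builds a synthetic global grid with an assumed 0.25° spacing (721 × 1440 points, float32, a single time step) and compares the size of the full field with the size of the Germany `AREA`. It only illustrates the arithmetic; the amount actually fetched from EDH also depends on how the zarr store is chunked.

```python
import numpy as np
import xarray as xr

# Synthetic global 0.25° field for one time step (values are irrelevant here)
lat = np.linspace(90.0, -90.0, 721)
lon = np.linspace(0.0, 359.75, 1440)
global_field = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

area = {"latitude": slice(55, 47), "longitude": slice(5, 16)}
germany_field = global_field.sel(**area)

print(global_field.nbytes)   # 4152960 bytes (721 * 1440 * 4)
print(germany_field.nbytes)  # 5940 bytes (33 * 45 * 4)
print(round(global_field.nbytes / germany_field.nbytes))  # 699
```

Under these assumptions, the Germany box is roughly 700 times smaller than the global field for every time step.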

#### 1. Data preview
Xarray and Dask work lazily: operations are not performed when they are called, but only when their result is actually needed. For instance, the computation is triggered when the `.compute()` or `.plot()` method is called.

To preview the data, only the dataset metadata needs to be downloaded. Xarray does this automatically:


In [None]:
# chunks={} opens the store lazily with Dask, using the chunking stored in the zarr store
ds = xr.open_dataset(DATASET, chunks={}, engine="zarr").astype("float32")
ds

#### 2. Data selection

To reduce the amount of data to download, we select the geographical area corresponding to Germany (`AREA`). Before that, we convert the temperature from Kelvin to °C:

In [None]:
t2m = ds.t2m - 273.15  # Kelvin -> degrees Celsius (still lazy)
t2m.attrs["units"] = "°C"
t2m_area = t2m.sel(**AREA)
t2m_area
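A minimal sketch of the same conversion on a toy array (the values are made up): subtracting 273.15 converts Kelvin to degrees Celsius element by element. The units attribute is set explicitly afterwards because, depending on xarray's `keep_attrs` setting, arithmetic may or may not carry the original attributes over, and if it did they would still say Kelvin.

```python
import numpy as np
import xarray as xr

# Toy 2 metre temperature values in Kelvin
t2m_kelvin = xr.DataArray(
    np.array([273.15, 283.15, 293.15], dtype="float32"),
    dims="valid_time",
    attrs={"units": "K"},
)

# Element-wise conversion to degrees Celsius; the result is a new DataArray
t2m_celsius = t2m_kelvin - 273.15
# Overwrite the units: any attributes carried over would still say "K"
t2m_celsius.attrs["units"] = "°C"

print(t2m_celsius.values)  # approximately [0, 10, 20]
```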

#### Average October 2023 2 metre temperature in Germany

From the dataset above, we further select October 2023:

In [None]:
t2m_area_month = t2m_area.sel(valid_time=MONTH)
t2m_area_month
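Passing a partial date string such as `"2023-10"` to `.sel` selects every time step that falls within that month. The toy example below assumes an hourly time axis like ERA5's and uses placeholder values; October has 31 days, so 744 hourly steps are selected.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical hourly time axis covering September to November 2023
valid_time = pd.date_range("2023-09-01", "2023-11-30 23:00", freq=pd.Timedelta(hours=1))
series = xr.DataArray(
    np.zeros(valid_time.size, dtype="float32"),
    coords={"valid_time": valid_time},
    dims="valid_time",
)

# Partial string indexing: every time step in October 2023
october = series.sel(valid_time="2023-10")

print(october.sizes["valid_time"])  # 744 (31 days x 24 hours)
print(october.valid_time.values[0], october.valid_time.values[-1])
```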

#### 3. Data reduction

We can now compute the monthly mean, reducing the data to a two-dimensional (latitude, longitude) field:

In [None]:
t2m_area_month_mean = t2m_area_month.mean(dim="valid_time")
t2m_area_month_mean
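Averaging over `valid_time` removes that dimension and leaves one value per grid point. The sketch below shows the shape change on a small random cube (the sizes and values are arbitrary stand-ins for `t2m_area_month`, not real data) and checks that the xarray mean matches a plain NumPy mean along the time axis.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Stand-in for t2m_area_month: 744 hourly steps on a 33 x 45 grid
cube = xr.DataArray(
    rng.normal(10.0, 3.0, size=(744, 33, 45)).astype("float32"),
    dims=("valid_time", "latitude", "longitude"),
)

# Reduce over time: one mean value per grid point
monthly_mean = cube.mean(dim="valid_time")

print(cube.dims)          # ('valid_time', 'latitude', 'longitude')
print(monthly_mean.dims)  # ('latitude', 'longitude')
```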

***
! Remember: so far nothing has been downloaded or computed. The computation is actually performed only when `.compute()` (or a method that needs the values, such as `.plot()`) is called.
***


In [None]:
%%time
%%capture

t2m_area_month_mean = t2m_area_month_mean.compute()

#### 4. Plot
Finally, we can plot the results on a map:

In [None]:
display.map(t2m_area_month_mean, vmax=None, cmap="YlOrRd", title="Mean 2 metre temperature, Oct 2023")
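`display.map` comes from the helper module imported at the top. If that module is not available, a plain xarray plot is a reasonable fallback for a quick look, although it draws no coastlines or map projection. The sketch uses a random field on an assumed 0.25° Germany grid as a stand-in for `t2m_area_month_mean`; with the real data you would call `.plot()` on that array directly.

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Stand-in for t2m_area_month_mean on the Germany AREA (values are random)
lat = np.linspace(55.0, 47.0, 33)
lon = np.linspace(5.0, 16.0, 45)
field = xr.DataArray(
    np.random.default_rng(1).normal(10.0, 2.0, size=(lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="t2m",
    attrs={"units": "°C"},
)

fig, ax = plt.subplots(figsize=(6, 5))
# For a 2D DataArray, .plot() draws a pcolormesh with a colorbar
mesh = field.plot(ax=ax, cmap="YlOrRd")
ax.set_title("Mean 2 metre temperature, Oct 2023 (synthetic data)")
```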

#### October 2023 anomaly

Following the same scheme of first selecting and then reducing the data, we can compute the average October 2 metre temperature over the reference period.

Again, we only consider the Germany area.

In [None]:
%%time

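# Select the reference period, keep only the October time steps, then average over time
t2m_area_ref = t2m_area.sel(valid_time=REFERENCE_PERIOD)
t2m_area_month_mean_ref = (
    t2m_area_ref.sel(valid_time=t2m_area_ref["valid_time.month"] == 10)
    .mean(dim="valid_time")
    .compute()
)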
t2m_area_month_mean_ref
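The same select-then-reduce chain can be checked on a toy monthly series (assumed values, not ERA5 data) where the answer is known in advance: each value is `year - 1980 + month`, so the October values in 1981-2010 run from 11 to 40 and their mean is 25.5.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy monthly series, 1979-2023, with value = (year - 1980) + month
times = pd.date_range("1979-01-01", "2023-12-01", freq="MS")
values = (times.year - 1980) + times.month
t2m_series = xr.DataArray(
    np.asarray(values, dtype="float64"),
    coords={"valid_time": times},
    dims="valid_time",
)

# Same chain as above: reference period, then October only, then mean
ref = t2m_series.sel(valid_time=slice("1981", "2010"))
october_ref = ref.sel(valid_time=ref["valid_time.month"] == 10)
october_ref_mean = october_ref.mean(dim="valid_time")

print(october_ref.sizes["valid_time"])  # 30
print(float(october_ref_mean))          # 25.5
```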

After that, the anomaly computation is straightforward:

In [None]:
anomaly = t2m_area_month_mean - t2m_area_month_mean_ref
anomaly
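Written out per grid point $(\phi, \lambda)$, the anomaly computed above is the October 2023 mean minus the mean of all October time steps in the reference period:

$$
\Delta T(\phi, \lambda) =
\frac{1}{N_{2023}} \sum_{t \in \mathrm{Oct\ 2023}} T(t, \phi, \lambda)
\;-\;
\frac{1}{N_{\mathrm{ref}}} \sum_{t \in \mathrm{Oct\ 1981\text{-}2010}} T(t, \phi, \lambda)
$$

Assuming ERA5's hourly time step, $N_{2023} = 31 \times 24 = 744$ and $N_{\mathrm{ref}} = 30 \times 744 = 22320$. Because every October has the same number of time steps, the second term is also the average of the 30 individual October means.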

We can plot it on a map:

In [None]:
display.map(
    anomaly,
    vmax=None,
    cmap="YlOrRd",
    title="Mean 2 metre temperature anomaly (ref 1981-2010), Oct 2023",
)

#### 1981-2010 climatology vs 2023 monthly means

We now compute the 2 metre temperature climatology (monthly mean and standard deviation) in Berlin over the reference period, and compare it with the 2023 monthly means.

We first select the grid point closest to Berlin:

In [None]:
%%time

t2m_loc = t2m.sel(**LOCATION, method="nearest")
t2m_loc
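With `method="nearest"`, `.sel` returns the grid point closest to the requested coordinates instead of interpolating. On an assumed 0.25° grid, Berlin's longitude of 13.4 falls between 13.25 and 13.5, and the closer point, 13.5, is selected; inspecting `t2m_loc.latitude` and `t2m_loc.longitude` shows which point the real dataset returns.

```python
import numpy as np
import xarray as xr

# Hypothetical global 0.25° grid (latitude north -> south)
lat = np.linspace(90.0, -90.0, 721)
lon = np.linspace(0.0, 359.75, 1440)
grid = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

# Nearest-neighbour lookup for Berlin
point = grid.sel(latitude=52.5, longitude=13.4, method="nearest")

print(float(point.latitude), float(point.longitude))  # 52.5 13.5
```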

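We then compute the climatology for the 1981-2010 reference period. Both statistics are taken over all (hourly) time steps of each calendar month across the 30 years, so the standard deviation reflects hour-to-hour and day-to-day variability as well as year-to-year variability: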

In [None]:
t2m_climatology_mean = t2m_loc.sel(valid_time=REFERENCE_PERIOD).groupby("valid_time.month").mean(dim="valid_time").compute()
t2m_climatology_std = t2m_loc.sel(valid_time=REFERENCE_PERIOD).groupby("valid_time.month").std(dim="valid_time").compute()
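`groupby("valid_time.month")` gathers all time steps of the same calendar month across the selected years before reducing them, so the result has a `month` dimension with 12 entries. The toy daily series below uses assumed values (the month number, plus 1 in odd years) so that the expected January result is easy to verify: 15 Januaries at 1 and 15 at 2 give a mean of 1.5 and a standard deviation of 0.5.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy daily series over the reference period: value = month + (1 in odd years)
valid_time = pd.date_range("1981-01-01", "2010-12-31", freq="D")
values = valid_time.month + valid_time.year % 2
series = xr.DataArray(
    np.asarray(values, dtype="float64"),
    coords={"valid_time": valid_time},
    dims="valid_time",
)

# Group by calendar month, then reduce each group over time
grouped = series.groupby("valid_time.month")
clim_mean = grouped.mean(dim="valid_time")
clim_std = grouped.std(dim="valid_time")  # population std (ddof=0) by default

print(clim_mean.month.values)  # [ 1  2 ... 12]
print(float(clim_mean.sel(month=1)), float(clim_std.sel(month=1)))  # 1.5 0.5
```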

We also compute the monthly means for 2023:

In [None]:
# "1MS" = calendar-month bins labelled by their first day ("1M" is deprecated in recent pandas)
t2m_monthly_mean = t2m_loc.sel(valid_time="2023").resample(valid_time="1MS").mean(dim="valid_time").compute()
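For a single year, resampling to calendar months and grouping by month put the same time steps in each bin, so both give the same 12 means; they differ only in the coordinate (month-start `valid_time` timestamps versus an integer `month`). A quick check on a toy daily series for 2023, whose values are simply the day of the year (an assumption for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy daily series for 2023: value = day of year
valid_time = pd.date_range("2023-01-01", "2023-12-31", freq="D")
series = xr.DataArray(
    np.asarray(valid_time.dayofyear, dtype="float64"),
    coords={"valid_time": valid_time},
    dims="valid_time",
)

# Two ways to get monthly means for one year
by_resample = series.resample(valid_time="1MS").mean()
by_month = series.groupby("valid_time.month").mean()

print(by_resample.valid_time.values[:2])
print(bool(np.allclose(by_resample.values, by_month.values)))  # True
```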

Finally, we plot the 1981-2010 climatology in Berlin against the 2023 monthly means:

In [None]:
plt.figure(figsize=(10, 5))
t2m_climatology_mean.plot(label="Mean", color="#3498db")
plt.errorbar(
    t2m_climatology_mean.month, 
    t2m_climatology_mean, 
    yerr=t2m_climatology_std, 
    fmt="o", 
    label="Standard Deviation",
    color="#a9a9a9"
)
# 2023 monthly means: one point for each month available in the data
plt.scatter(
    t2m_monthly_mean["valid_time.month"],
    t2m_monthly_mean,
    color="#ff6600",
    label="2023",
)
plt.title("2 metre temperature climatology in Berlin (DE), 1981-2010")
plt.xticks(t2m_climatology_mean.month)
plt.xlabel("Month")
plt.ylabel("2 metre temperature [°C]")
plt.legend()
plt.grid(alpha=0.3)
plt.show()