# How to work with Climate Adaptation Digital Twin data on Earth Data Hub

***
This notebook provides guidance on how to access and use the `SSP3-7.0-IFS-NEMO-sfc-standard-v0.zarr` dataset on Earth Data Hub (EDH). This is a sample dataset for the DestinE Climate Adaptation Digital Twin.

Our first goal is to plot the mean 2 metre temperature in January 2030 over Germany.

Our second goal is to compute the 2 metre temperature climatology (monthly means and standard deviations) in Berlin for the 2020-2030 reference period.
***

## What you will learn:

* how to access and preview the dataset
* how to select and reduce the data
* how to plot the results

## Data access and preview
***

Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store, the data is not immediately downloaded and loaded in memory. Instead, Dask constructs a task graph that represents the operations to be performed. A smart user will reduce the amount of data that needs to be downloaded before the computation takes place (e.g., when the `.compute()` or `.plot()` methods are called).
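
The lazy principle can be seen without any remote data. The following is a minimal sketch on a synthetic Dask array (the shape and chunk sizes are arbitrary assumptions, not taken from the EDH dataset): operations only extend the task graph, and the arithmetic runs when `.compute()` is called.

```python
import dask.array as da

# Build a chunked array; nothing is allocated or computed yet.
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Each operation only adds tasks to the graph.
y = (x + 1).mean()
print(type(y).__name__)  # still a lazy Dask array, not a number

# The work happens only here.
result = float(y.compute())
print(result)
```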

To preview the data, only the dataset metadata must be downloaded. Xarray does this automatically:

***

In [None]:
import xarray as xr

# your `~/.netrc` file MUST contain your credentials for earthdatahub.com
#
# machine data.earthdatahub.com
#   login {your_username}
#   password {your_password}

ds = xr.open_dataset(
    "https://data.earthdatahub.com/destine-climate-dt/SSP3-7.0-IFS-NEMO-sfc-standard-v0.zarr",
    chunks={},
    engine="zarr",
    storage_options={"client_kwargs": {"trust_env": True}},
)
ds

## Working with data

Datasets on EDH are typically very large and remotely hosted. Typical use implies a selection of the data followed by one or more reduction steps, performed in a local or distributed Dask environment.

The structure of a workflow that uses EDH data looks like this:
1. data selection
2. (optional) data reduction
3. (optional) visualization
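
The first two steps can be sketched on a small synthetic array. This is only an illustration under stated assumptions: the 1-degree grid, the random values, and the variable names are made up here and do not come from the EDH dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a remote dataset: daily values on a 1-degree grid.
time = pd.date_range("2030-01-01", "2030-02-28", freq="D")
lat = np.arange(40.0, 61.0)  # 40..60, ascending
lon = np.arange(0.0, 21.0)   # 0..20
rng = np.random.default_rng(0)
values = 280.0 + rng.normal(0.0, 2.0, size=(time.size, lat.size, lon.size))
field = xr.DataArray(
    values,
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
    name="t2m",
)

# 1. data selection: a region and a single month
subset = field.sel(latitude=slice(47, 55), longitude=slice(5, 16), time="2030-01")

# 2. data reduction: average over the time dimension
monthly_mean = subset.mean(dim="time")

print(subset.shape, monthly_mean.shape)
```

The third step would be a call such as `monthly_mean.plot()`, which is left out here to keep the sketch free of plotting dependencies.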

## 2 metre temperature: average January 2030 in Germany

### 1. Data selection

First, we perform a geographical selection covering Germany. This greatly reduces the amount of data that will be downloaded from EDH. We also convert the temperature to `°C`.

In [None]:
t2m = ds.t2m.astype("float32") - 273.15
t2m.attrs["units"] = "C"
t2m_germany_area = t2m.sel(latitude=slice(47, 55), longitude=slice(5, 16))
t2m_germany_area

NB: At this point, no data has been downloaded or loaded in memory.
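
One way to judge whether a lazy selection is small enough to load is to look at its `nbytes` attribute, which is derived from the shape and dtype and does not trigger any computation. The sketch below uses a synthetic Dask-backed array; the 0.5-degree grid, the 365 time steps, and the chunk sizes are assumptions for illustration only.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical global grid, backed by a lazy Dask array.
lat = np.arange(-90.0, 90.5, 0.5)
lon = np.arange(0.0, 360.0, 0.5)
lazy = xr.DataArray(
    da.zeros((365, lat.size, lon.size), chunks=(30, 100, 100), dtype="float32"),
    coords={"latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
)

# A regional selection is still lazy, but its size is already known.
region = lazy.sel(latitude=slice(47, 55), longitude=slice(5, 16))

print(f"full array: {lazy.nbytes / 1e6:.1f} MB")
print(f"selection:  {region.nbytes / 1e6:.1f} MB")
```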

Second, we further select January 2030. This is again a lazy operation:

In [None]:
t2m_germany_area_january_2030 = t2m_germany_area.sel(time="2030-01")
t2m_germany_area_january_2030

At this point the selection is small enough to call `.compute()` on it. This will trigger the download of data from EDH and load it in memory. 

We can measure the time it takes:

In [None]:
%%time

t2m_germany_area_january_2030 = t2m_germany_area_january_2030.compute()

The selection was very small, so this did not take long.

### 2. Data reduction

Now that the data is loaded in memory, we can easily compute the January 2030 monthly mean:

In [None]:
t2m_germany_area_january_2030_monthly_mean = t2m_germany_area_january_2030.mean(dim="time")
t2m_germany_area_january_2030_monthly_mean

### 3. Visualization
Finally, we can plot the January 2030 monthly mean on a map:

In [None]:
# NOTE: `display` is assumed to be a local helper module (display.py) shipped with this notebook
import display
import matplotlib.pyplot as plt

In [None]:
display.map(t2m_germany_area_january_2030_monthly_mean, vmax=None, cmap="YlOrRd", title="Mean Surface Temperature, Jan 2030")

## 2020-2030 climatology

We will now compute the 2 metre temperature climatology (monthly mean and standard deviation) in Berlin, over the reference period 2020-2030.

We first select the grid point closest to Berlin:

In [None]:
%%time

t2m_Berlin = t2m.sel(latitude=52.5, longitude=13.4, method="nearest")
t2m_Berlin
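
With `method="nearest"`, the selected point takes the coordinates of the grid cell, not the requested ones, so it can be worth checking them. The sketch below uses a hypothetical 0.25-degree grid (an assumption for illustration; the actual dataset resolution may differ).

```python
import numpy as np
import xarray as xr

# Hypothetical regular grid with 0.25-degree spacing.
lat = np.arange(45.0, 60.25, 0.25)
lon = np.arange(0.0, 20.25, 0.25)
grid = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

# Request Berlin's approximate coordinates; xarray snaps to the closest grid point.
point = grid.sel(latitude=52.5, longitude=13.4, method="nearest")

print(float(point.latitude), float(point.longitude))
```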

This is already small enough to be computed:

In [None]:
%%time

t2m_Berlin = t2m_Berlin.compute()

Now that the data is loaded in memory we can easily compute the climatology for the reference period 2020-2030:

In [None]:
t2m_Berlin_climatology_mean = t2m_Berlin.groupby("time.month").mean(dim="time")
t2m_Berlin_climatology_std = t2m_Berlin.groupby("time.month").std(dim="time")
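
To see what the `groupby("time.month")` reductions return, here is a minimal sketch on a synthetic daily series. The idealised seasonal cycle and the two-year period are assumptions for illustration and do not represent Climate DT data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily values with an idealised seasonal cycle.
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
day_of_year = time.dayofyear.to_numpy()
values = 10.0 - 10.0 * np.cos(2 * np.pi * day_of_year / 365.25)
series = xr.DataArray(values, coords={"time": time}, dims=("time",), name="t2m")

# Group all time steps by calendar month, then reduce each group.
clim_mean = series.groupby("time.month").mean(dim="time")
clim_std = series.groupby("time.month").std(dim="time")

print(clim_mean.sizes)
print(clim_mean.month.values)
```

Both results have a single `month` dimension with one value per calendar month, which is why they can be passed directly to `plt.errorbar`.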

Finally, we can plot the climatology in Berlin for the 2020-2030 reference period:

In [None]:
plt.figure(figsize=(10, 5))
t2m_Berlin_climatology_mean.plot(label="Mean", color="#3498db")
plt.errorbar(
    t2m_Berlin_climatology_mean.month, 
    t2m_Berlin_climatology_mean, 
    yerr=t2m_Berlin_climatology_std, 
    fmt="o", 
    label="Standard Deviation",
    color="#a9a9a9"
)

plt.title("Surface Temperature climatology in Berlin (DE), 2020-2030")
plt.xticks(t2m_Berlin_climatology_mean.month)
plt.xlabel("Month")
plt.ylabel("Surface Temperature [C]")
plt.legend()
plt.grid(alpha=0.3)
plt.show()