# How to work with Opendata real-time forecast on Earth Data Hub
### Surface temperature forecast in Rome

***
This notebook shows how to access and use the `s3://hedp/ecmwf-forecasts/opendata-oper-20231016T00.zarr` dataset on Earth Data Hub.

The goal is to plot the surface temperature forecast for October 16th.
***

## What you will learn:

* how to access and preview the dataset
* how to select and reduce the data
* how to plot the results

## Data access and preview
***

Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store, the data is not immediately downloaded and loaded into memory. Instead, Dask constructs a task graph that represents the operations to be performed. It pays to reduce the amount of data that needs to be downloaded before the computation takes place (e.g., when the `.compute()` or `.plot()` methods are called).
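
The lazy behaviour can be illustrated without any remote data. The sketch below is only an illustration on a synthetic Dask-backed array (the shape, chunk sizes and dimension names are assumptions, not the layout of the EDH dataset): selecting and reducing only builds the task graph, and values are produced when `.compute()` is called.

```python
import dask.array as da
import xarray as xr

# Synthetic stand-in for a remote variable (assumed shape and chunking).
field = xr.DataArray(
    da.ones((4, 180, 360), chunks=(1, 90, 90)),
    dims=("step", "latitude", "longitude"),
    name="dummy",
)

# Selection and reduction only extend the task graph; nothing is computed yet.
point_mean = field.isel(latitude=0, longitude=0).mean()
print(type(point_mean.data))  # still a Dask array

# .compute() executes the graph and returns in-memory values.
value = point_mean.compute()
print(float(value))
```

Because the selection happens before `.compute()`, only the chunks that contain the selected point need to be processed.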

To preview the data, only the dataset metadata must be downloaded. Xarray does this automatically:

***

In [1]:
import xarray as xr

# your `~/.netrc` file MUST contain your credentials for earthdatahub.com
#
# machine data.earthdatahub.com
#   login {your_username}
#   password {your_password}

ds = xr.open_dataset(
    "https://data.earthdatahub.com/hedp/ecmwf-forecasts/opendata-oper-20231016T00.zarr",
    chunks={},
    engine="zarr",
    storage_options={"client_kwargs": {"trust_env": True}},
)
ds

FileNotFoundError: No such file or directory: 'https://data.earthdatahub.com/hedp/ecmwf-forecasts/opendata-oper-20231016T00.zarr'
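
The cell above relies on credentials stored in `~/.netrc`. As a rough sketch, the standard-library `netrc` module can be used to confirm that an entry for `data.earthdatahub.com` exists before opening the dataset. The helper name `has_credentials` is hypothetical and is not part of any EDH tooling.

```python
import netrc
from pathlib import Path


def has_credentials(host, netrc_path=None):
    """Return True if the netrc file holds a login and password for `host`.

    `has_credentials` is a hypothetical helper written for this notebook.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(host)
    except (netrc.NetrcParseError, OSError):
        return False
    # authenticators() returns (login, account, password) or None.
    return entry is not None and bool(entry[0]) and bool(entry[2])


print(has_credentials("data.earthdatahub.com"))
```

A `False` result only tells you that no usable entry was found for that host; it does not validate the credentials themselves.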

## Working with data

Datasets on EDH are typically very large and remotely hosted. Typical use implies a selection of the data followed by one or more reduction steps, performed in a local or distributed Dask environment.

The structure of a workflow that uses EDH data looks like this:
1. data selection
2. (optional) data reduction
3. (optional) visualization

## Rome 2 metre temperature forecast

### 1. Data selection
We select the 2 metre temperature variable from the dataset and convert it from `K` to `°C`. Forecast steps are given in `ns`, so we convert them to `h`.

In [None]:
t2m = ds["2t"] - 273.15
t2m.attrs["units"] = "C"
t2m["step"] = t2m.step.astype('datetime64[ns]').astype('float64') / 1e9 / 3600
t2m
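
The two conversions can be checked in isolation on a few made-up values. This is a minimal NumPy sketch, assuming the steps are stored as `timedelta64[ns]`; the sample numbers are invented and do not come from the dataset. Dividing by a one-hour `timedelta64` is one way to obtain the step in hours as floats.

```python
import numpy as np

# Made-up forecast steps, stored as nanosecond timedeltas (assumption).
steps_ns = np.array([0, 3, 6, 12], dtype="timedelta64[h]").astype("timedelta64[ns]")

# Dividing by a one-hour timedelta yields plain float hours.
steps_h = steps_ns / np.timedelta64(1, "h")
print(steps_h)

# Made-up temperatures in kelvin converted to degrees Celsius.
kelvin = np.array([273.15, 293.15])
celsius = kelvin - 273.15
print(celsius)
```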

We then select the grid point closest to Rome:

In [None]:
t2m_Rome = t2m.sel(latitude=41.9, longitude=12.5, method="nearest")
t2m_Rome
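
To see what `method="nearest"` does, the selection can be tried on a small synthetic grid. The sketch below assumes a regular 0.25° latitude/longitude grid purely for illustration; the actual grid of the EDH dataset may differ.

```python
import numpy as np
import xarray as xr

# Synthetic regular grid (0.25 degree spacing is an assumption).
latitude = np.linspace(90, -90, 721)
longitude = np.linspace(-180, 179.75, 1440)
grid = xr.DataArray(
    np.zeros((latitude.size, longitude.size)),
    coords={"latitude": latitude, "longitude": longitude},
    dims=("latitude", "longitude"),
)

# The requested coordinates need not lie on the grid:
# xarray picks the closest available grid point along each dimension.
point = grid.sel(latitude=41.9, longitude=12.5, method="nearest")
print(float(point.latitude), float(point.longitude))
```

The coordinates attached to the result are those of the grid point that was actually selected, which is worth checking when the grid is coarse.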

At this point, no data has been downloaded yet, nor loaded in memory. However, the selection is small enough to call `.compute()` on it. This will trigger the download of data from EDH and load it in memory.

We can measure the time it takes:

In [None]:
%%time

t2m_Rome = t2m_Rome.compute()

The selection was very small, so this did not take long.
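
`%%time` is an IPython cell magic, so it is only available in notebooks. In a plain Python script, a similar measurement can be sketched with `time.perf_counter`. The `timed` helper below is hypothetical, and `sum(range(...))` merely stands in for the `.compute()` call.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in the notebook this would be `t2m_Rome.compute`.
result, elapsed = timed(sum, range(1_000_000))
print(f"took {elapsed:.3f} s")
```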

### 2. Visualization

Now that the data is loaded in memory, we can easily plot the temperature forecast.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

t2m_Rome.plot()
plt.title("Surface Temperature forecast in Rome (IT)")
plt.ylabel("Surface Temperature [C]")
reference_time = np.datetime_as_string(t2m_Rome.time.values)
plt.xlabel(f"time since {reference_time[:-10]} [hr]")
plt.show()
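
The x-axis label above trims the nanosecond part of the reference time by slicing the string. As a possible alternative, `np.datetime_as_string` accepts a `unit` argument that controls the precision directly. The timestamp below is an assumed example value, not one read from the dataset.

```python
import numpy as np

# Assumed example reference time with nanosecond precision.
reference = np.datetime64("2023-10-16T00:00:00", "ns")

# Slicing removes the trailing ".000000000" (10 characters).
sliced = np.datetime_as_string(reference)[:-10]

# Requesting second precision gives the same string without slicing.
by_unit = np.datetime_as_string(reference, unit="s")

print(sliced, by_unit)
```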