## Data from different sources

In this notebook, we will compare global temperature anomalies from two datasets:
- [ERA5 reanalysis](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=download) from **ECMWF**, available through the Climate Data Store.
- [HADCRUT5](https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.1.0.0/download.html), a dataset of global historical surface temperature anomalies from the **UK Met Office**, available on their website as a direct download.

We will then compare the anomalies from each of these datasets for **July 2024** - one of the hottest months on record globally.

### The bigger picture

This notebook demonstrates that we can get data:
- From two different institutions (ECMWF and the UK Met Office)
- From two different sources (the CDS and a direct URL)
- In two different formats (GRIB and netCDF)

The earthkit ecosystem treats them as "equal citizens": the same earthkit tools, with the same API, work in both cases.

### Components of earthkit

This tutorial uses the following earthkit components - click any logo to open the package documentation:

<div align="center">
  <br>
  <a href="https://earthkit-data.readthedocs.io/en/latest/" target="_blank" style="display:inline-block; margin: 0 15px;">
    <img src="https://raw.githubusercontent.com/ecmwf/logos/refs/heads/main/logos/earthkit/earthkit-data-light.svg" alt="earthkit-data" width="200">
  </a>
  <a href="https://earthkit-transforms.readthedocs.io/en/latest/" target="_blank" style="display:inline-block; margin: 0 15px;">
    <img src="https://raw.githubusercontent.com/ecmwf/logos/refs/heads/main/logos/earthkit/earthkit-transforms-light.svg" alt="earthkit-transforms" width="200">
  </a>
  <a href="https://earthkit-plots.readthedocs.io/en/latest/" target="_blank" style="display:inline-block; margin: 0 15px;">
    <img src="https://raw.githubusercontent.com/ecmwf/logos/refs/heads/main/logos/earthkit/earthkit-plots-light.svg" alt="earthkit-plots" width="200">
  </a>
</div>

By importing `earthkit`, we get access to all of these tools with a single import.

In [None]:
import earthkit as ek

### 1. Getting the data

Let's compare the temperature anomalies for **July 2024** from ERA5 and HADCRUT5.

#### 1.1 ERA5

To get temperature anomalies for July 2024 from ERA5, we need to access the [ERA5 monthly averaged reanalysis](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=download) dataset from the CDS.

>To access ERA5 reanalysis data, you will need an account on the Copernicus Climate Data Store (CDS). If you do not have one, please visit [the CDS website](https://cds.climate.copernicus.eu/#!/home) and register. Then [follow these instructions](https://cds.climate.copernicus.eu/how-to-api) (step 1 only) to set up your API key.

HADCRUT5 data is expressed as an anomaly against the 1961-1990 average, so to build a comparable ERA5 anomaly we need to retrieve:
- Data for every July from 1961 to 1990. This is our **reference period**.
- Data for July 2024, to calculate the anomaly.

In [None]:
era5_reference_data = ek.data.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": list(range(1961, 1991)),
        "month": "07",
        "time": "00:00",
    },
)

era5_2024_data = ek.data.from_source(
    "cds", "reanalysis-era5-single-levels-monthly-means",
    {
        "product_type": "monthly_averaged_reanalysis",
        "variable": "2m_temperature",
        "year": 2024,
        "month": "07",
        "time": "00:00",
    },
)
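
The `year` list in the reference request relies on Python's `range` excluding its end value. A quick check, using only the standard library, shows which years are actually requested:

```python
# Years requested for the reference period, written as in the request above.
reference_years = list(range(1961, 1991))

# range() stops before its end value, so 1991 itself is not included.
first_year = reference_years[0]
last_year = reference_years[-1]
n_years = len(reference_years)

print(f"{first_year}-{last_year} ({n_years} years)")
```

The request therefore covers 30 Julys, ending with July 1990.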

#### 1.2 HADCRUT5

The HADCRUT5 dataset is available on the [Met Office's dedicated website](https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.1.0.0/download.html). We can access the gridded monthly dataset directly from a URL. 

In [None]:
HADCRUT5_URL = "https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/HadCRUT.5.0.2.0.analysis.anomalies.ensemble_mean.nc"
hadcrut5_data = ek.data.from_source("url", HADCRUT5_URL)

### 2. Data analysis

#### 2.1 ERA5

Now we need to do some analysis with earthkit-transforms to calculate the ERA5 temperature anomalies.

First, let's convert the GRIB data that we retrieved from the CDS to xarray.

In [None]:
reference = era5_reference_data.to_xarray()
july_2024 = era5_2024_data.to_xarray(ensure_dims="forecast_reference_time")

Now let's calculate a climatology from the reference data, and use it to calculate an anomaly for July 2024.

In [None]:
climatology = ek.transforms.climatology.mean(reference, frequency="month")
climatology

An anomaly is the difference between the data at a given point in time and the long-term average. Our *climatology* is that long-term average, and earthkit-transforms provides an `anomaly` function for conveniently calculating the difference between the two.

In [None]:
era5_july_anomaly = ek.transforms.climatology.anomaly(july_2024, climatology)
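
To make the arithmetic concrete, here is a minimal pure-Python sketch of the same idea at a single grid point. The temperature values are made up for illustration, and the helper names `climatological_mean` and `anomaly` are hypothetical. This is not the earthkit-transforms implementation, which operates on whole xarray objects.

```python
from statistics import fmean


def climatological_mean(reference_values):
    """Long-term average of the reference-period values (one per year)."""
    return fmean(reference_values)


def anomaly(value, climatology):
    """Departure of a single value from the long-term average."""
    return value - climatology


# Hypothetical July mean 2m temperatures (in kelvin) at one grid point,
# one value per reference year; these are not real ERA5 data.
reference_julys = [288.0, 288.5, 287.5, 288.2, 287.8]
july_2024_value = 289.1  # hypothetical

point_climatology = climatological_mean(reference_julys)
point_anomaly = anomaly(july_2024_value, point_climatology)

print(f"climatology: {point_climatology:.2f} K, anomaly: {point_anomaly:+.2f} K")
```

A positive anomaly means the month was warmer than the reference-period average at that point; a negative one means it was cooler.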

#### 2.2 HADCRUT5

The HADCRUT5 data is already expressed as anomalies, so we just need to extract July 2024.

In [None]:
hadcrut5 = hadcrut5_data.to_xarray()
hadcrut5

In [None]:
hadcrut5_july_anomaly = hadcrut5.tas_mean.sel(time="2024-07-16")
hadcrut5_july_anomaly
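
The date `2024-07-16` suggests that this file labels each month by its midpoint rather than its first day. That is an assumption inferred from the selection above, so check the `time` coordinate in the dataset printout. Under that assumption, a small standard-library helper (the name `mid_month` is hypothetical) can work out the label for any other month:

```python
from datetime import datetime


def mid_month(year, month):
    """Return the midpoint between the start of a month and the start of the next."""
    start = datetime(year, month, 1)
    if month == 12:
        # Roll over to January of the following year.
        next_start = datetime(year + 1, 1, 1)
    else:
        next_start = datetime(year, month + 1, 1)
    return start + (next_start - start) / 2


july_label = mid_month(2024, 7)
print(july_label.date().isoformat())
```

The resulting date string could then be passed to `.sel(time=...)` in place of the hard-coded value. If the dataset does not follow the midpoint convention, `.sel(time=..., method="nearest")` is an alternative.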

### 3. Visualisation

We can use earthkit-plots to visualise these two datasets using the same principles.

First, let's design a suitable style for these anomalies.

In [None]:
style = ek.plots.styles.Contour(
    colors=[
        "#1B2C62", "#1F4182", "#2355A1", "#3978BB", "#519BD2", "#71B8E4",
        "#91D1F2", "#B0E1F8", "#CBEBF9", "#E3F4FB", "#F5FBFE", "#FEFBEA",
        "#FDF2BC", "#FCE18A", "#FDC659", "#FDA731", "#F9872D", "#F26429",
        "#E34128", "#D01F27", "#B31A21", "#921519",
    ],
    levels=range(-7, 8),
    ticks=range(-7, 8),
    extend="both",
    # The data is in kelvin, but we want to show °C in the legend.
    # Setting `units` would attempt to convert the values to °C;
    # `units_label` only overrides the label.
    units_label="°C",
)
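
Overriding only the label is enough here because the kelvin and Celsius scales differ by a constant offset, which cancels when two temperatures are subtracted. A short check with made-up values:

```python
import math

KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin

# Hypothetical temperatures in kelvin, for illustration only.
observed_k = 289.1
baseline_k = 288.0

difference_in_kelvin = observed_k - baseline_k
difference_in_celsius = (observed_k - KELVIN_OFFSET) - (baseline_k - KELVIN_OFFSET)

# The offset cancels, so both differences agree up to floating-point rounding.
same_size = math.isclose(difference_in_kelvin, difference_in_celsius, abs_tol=1e-9)
print(same_size)
```

Converting the anomaly values themselves from kelvin to °C would instead subtract 273.15 from every difference, which is why the style avoids the `units` key.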

Now we can plot them!

In [None]:
import cartopy.crs as ccrs

figure = ek.plots.Figure(crs=ccrs.Robinson(), rows=1, columns=2, size=(8, 5.5))

# We can throw both datasets at the figure and it will iterate over subplots to plot them
figure.pcolormesh([hadcrut5_july_anomaly, era5_july_anomaly], style=style)

figure.coastlines(resolution="low")
figure.borders(resolution="low")

figure.legend(label="temperature anomaly ({units})")

# Add titles
figure[0].title("HADCRUT5")
figure[1].title("ERA5")
# We can use the "time" key once here as it should have the same value for ERA5 and HADCRUT5
figure.title("Global temperature anomaly during {time:%B %Y}", fontsize=15)

# Add shading to emphasise the missing data in HADCRUT5
# We can directly access the underlying matplotlib objects to do this
x = [-180, -180, 180, 180, -180]
y = [-90, 90, 90, -90, -90]
figure[0].ax.fill(x, y, transform=ccrs.PlateCarree(), hatch="///////", fill=False, zorder=0)

figure.show()

### Exercises

1. Now that you have compared July 2024 between HADCRUT5 and ERA5, can you do the same for another month and/or year?
1. Can you show a zoomed-in version of this comparison over Europe?