# Compute global temperature average for observational datasets

This notebook follows the guidelines of the _State of the Global Climate 2021_ ([link]) report to compute observational time series of the global annual mean temperature (land + ocean), to be used in warming level computations. The same datasets are used, except ERA5, for which I didn't have the data at hand when writing this.

Where possible, the data is downloaded directly within this notebook.

Most of these datasets are given as anomalies relative to a dataset-specific reference period. The WMO guidelines give a workflow to compute anomalies relative to 1850-1900. However, the xscen dataset stores absolute values, not anomalies, so we add an estimate of the 1850-1900 mean temperature to the observational anomalies to obtain compatible values. The estimate of 13.79°C was computed from Berkeley Earth's time series. The xscen dataset should nonetheless always be used by computing anomalies from it, not by reading its absolute values.
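
The re-basing described above amounts to two shifts: subtract the series' own mean over the WMO reference period, then add the reference-to-preindustrial delta and the preindustrial estimate. Here is a minimal sketch using only the standard library and made-up anomaly values. The helper name `to_absolute` and the sample series are illustrative and not part of xscen or the WMO workflow. The 1981-2010 period and the 0.69°C delta are the values set in the Preparation section of this notebook.

```python
from statistics import mean

REF_PERIOD = (1981, 2010)  # WMO reference period used in this notebook
REF_DELTA = 0.69           # °C between the reference period and 1850-1900
PREIND_ABS = 13.79         # °C, estimate of the 1850-1900 mean


def to_absolute(anoms, ref_period=REF_PERIOD, ref_delta=REF_DELTA, preind_abs=PREIND_ABS):
    """Map {year: anomaly vs. any baseline} to {year: absolute-looking °C}."""
    start, end = ref_period
    # Mean of the input over the reference period (bounds inclusive).
    ref_mean = mean(v for y, v in anoms.items() if start <= y <= end)
    # Anomaly vs. reference period -> anomaly vs. 1850-1900 -> absolute value.
    return {y: v - ref_mean + ref_delta + preind_abs for y, v in anoms.items()}


# Made-up anomalies relative to an arbitrary baseline, for illustration only.
sample = {year: 0.01 * (year - 1950) for year in range(1950, 2021)}
absolute = to_absolute(sample)
ref_years = [absolute[y] for y in range(1981, 2011)]
```

Because the input's mean over the reference period is subtracted first, the result does not depend on the baseline the source dataset originally used: by construction, the mean of `ref_years` equals `REF_DELTA + PREIND_ABS`.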

[link]: https://library.wmo.int/idurl/4/56300

## Preparation

In [None]:
import os
import re
import sys

os.environ["ESMFMKFILE"] = os.path.join(sys.prefix, "lib", "esmf.mk")
from datetime import datetime, timedelta

import dask
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import xarray as xr
from dask.diagnostics import ProgressBar

import xscen as xs

dask.config.set(num_workers=12)

In [None]:
# WMO's reference period
ref_period = [1981, 2010]
# Temperature difference between the reference period and 1850-1900
# See source. Computed according to the IPCC's AR6 WG1.
ref_delta = 0.69

# Estimate of the 1850-1900 mean, computed from Berkeley Earth.
# It is used to get series that look like real temperatures.
preind_abs = 13.79


def clean(tas):
    """Convert anomalies to absolute-looking temperatures.

    Anomalies relative to any period are re-expressed relative to 1850-1900,
    then the estimate of the 1850-1900 mean is added.
    """
    # Re-express tas relative to the reference period, then shift to 1850-1900.
    tas_pre = tas - tas.sel(year=slice(*ref_period)).mean() + ref_delta
    # Pad the years from 1850 up to the start of the data (1899 at the latest) with 0s
    first_year = tas_pre.year.values[0]
    last_0_year = min(1900, first_year)
    zeros = xr.DataArray(
        [0] * (last_0_year - 1850),
        dims=("year",),
        coords={"year": np.arange(1850, last_0_year)},
    )
    # Concat over years and add the estimate to get absolute values
    return xr.concat([zeros, tas_pre], "year") + preind_abs


# The output, a list of DataArrays with a year dim and a "source" singleton dim
temps = []

### Berkeley Earth
Rohde, R. A.; Hausfather, Z. The Berkeley Earth Land/Ocean Temperature Record. Earth System Science Data 2020, 12, 3469–3479. https://doi.org/10.5194/essd-12-3469-2020.

The annual summary of the Global Monthly Averages from 1850 to present is available online as a text file. It comes as an anomaly relative to the 1951-1980 average, which is given directly in the header of the text file. We have the choice of using the temperature above or below sea ice for the ocean component. I didn't find anything in the WMO recommendation about this, so we will use the temperature above.

For this dataset, we will make two versions. One version will use the 0.69°C delta to compute anomalies relative to 1850-1900, as done in the WMO report. The other will simply use the 1850-1900 values directly, as they are available for this dataset.
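
Reading the 1951-1980 mean from the file header can be sketched in isolation as follows. The header excerpt is illustrative: its layout mimics the summary file, but the exact wording and numbers are assumptions made for this example, and `read_reference_mean` is a hypothetical helper. The value on the line right after the marker is taken, which matches the choice of the temperature above sea ice.

```python
import re

# Illustrative header excerpt; wording and numbers are assumptions for this example.
SAMPLE_HEADER = """\
% Estimated Jan 1951-Dec 1980 global mean temperature (C)
%   Using air temperature above sea ice:   14.105 +/- 0.021
%   Using water temperature below sea ice: 14.700 +/- 0.021
%
"""

MARKER = "% Estimated Jan 1951-Dec 1980 global mean temperature (C)"


def read_reference_mean(lines, marker=MARKER):
    """Return the first decimal number on the line following the marker line."""
    it = iter(lines)
    for line in it:
        if marker in line:
            following = next(it, "")
            # The dot is escaped so that it only matches a decimal point.
            match = re.search(r"(\d+\.\d+)", following)
            if match is None:
                raise ValueError("no number found after the marker line")
            return float(match.group(1))
    raise ValueError("marker line not found")


ref_avg = read_reference_mean(SAMPLE_HEADER.splitlines())
```

Raising an explicit error when the marker or the number is missing makes a change in the header layout visible immediately, instead of failing later on an undefined variable.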

In [None]:
# Get data
res = requests.get(
    "https://berkeley-earth-temperature.s3.us-west-1.amazonaws.com/Global/Land_and_Ocean_summary.txt"
)
res.raise_for_status()  # Fail early instead of writing an error page to disk
with open("Berkeley_data.txt", "wb") as f:
    f.write(res.content)

df = pd.read_table(
    "Berkeley_data.txt",
    skiprows=58,
    usecols=[0, 1],
    names=["year", "temp"],
    sep=r"\s+",
    index_col="year",
)
da = df.temp.to_xarray().assign_attrs(units="°C")

# Get global average for the reference period of the data
with open("Berkeley_data.txt") as f:
    for line in f:
        if "% Estimated Jan 1951-Dec 1980 global mean temperature (C)" in line:
            data = re.search(r"(\d{2}\.\d{3})", next(f))
            break
refAvg = float(data.groups()[0])
refAvg

daAbs = da + refAvg

daWMO = clean(da)

temps.append(daAbs.expand_dims(source=["Berkeley-Raw"]))
temps.append(daWMO.expand_dims(source=["Berkeley"]))

In [None]:
# A figure to look at it
fig, ax = plt.subplots(figsize=(10, 3))
(daAbs - daAbs.sel(year=slice(1850, 1900)).mean()).plot(ax=ax, label="Raw")
daWMO.plot(ax=ax, label="WMO")
ax.set_title(
    "Global Average Temperature according to Berkeley - anomalies vs 1850-1900"
)
ax.set_xlabel("years")
ax.set_ylabel("[°C]")
ax.legend()

### GISTEMP v4
GISTEMP Team, 2023: GISS Surface Temperature Analysis (GISTEMP), version 4. NASA Goddard Institute for Space Studies. Dataset accessed 2023-12-06 at https://data.giss.nasa.gov/gistemp/.

This dataset comes as a CSV file of anomalies relative to 1951-1980.


In [None]:
df = pd.read_csv(
    "https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv",
    usecols=["Year", "J-D"],
    skiprows=1,
    index_col="Year",
    na_values="***",
)
da = df["J-D"].to_xarray().rename(Year="year").rename("temp").assign_attrs(units="°C")

daWMO = clean(da)

temps.append(daWMO.expand_dims(source=["GISTEMPv4"]))

### HadCRUT5
Morice, C.P., J.J. Kennedy, N.A. Rayner, J.P. Winn, E. Hogan, R.E. Killick, R.J.H. Dunn, T.J. Osborn, P.D. Jones and I.R. Simpson (in press) An updated assessment of near-surface temperature change from 1850: the HadCRUT5 dataset. Journal of Geophysical Research (Atmospheres) doi:10.1029/2019JD032361 (supporting information). 

The CSV file contains anomalies relative to 1961-1990.

In [None]:
df = pd.read_csv(
    "https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/diagnostics/HadCRUT.5.0.2.0.analysis.summary_series.global.annual.csv",
    usecols=[0, 1],
    index_col=0,
)
da = (
    df["Anomaly (deg C)"]
    .to_xarray()
    .rename(Time="year")
    .rename("temp")
    .assign_attrs(units="°C")
)

daWMO = clean(da)

temps.append(daWMO.expand_dims(source=["HadCRUT5"]))

### NOAAGlobalTemp v5
R. S. Vose, B. Huang, X. Yin, D. Arndt, D. R. Easterling, J. H. Lawrimore, M. J. Menne, A. Sanchez-Lugo, and H. M. Zhang (2022): NOAA Global Surface Temperature Dataset (NOAAGlobalTemp), Version 5.1 [indicate subset used]. NOAA National Centers for Environmental Information. doi.org/10.25921/2tj4-0e21

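The data is available as a text file of anomalies relative to 1971-2000. The files are updated each month and only the latest one is kept online, so the date stamp in the URL (`202311` here) has to be updated when rerunning this notebook.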
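
Since the file name changes every month, the URL can be built from a year and month instead of being hard-coded. This is a sketch only: the pattern is copied from the URL used in this notebook, `noaa_url` and `previous_month` are hypothetical helpers, and the idea that the stamp is the month preceding the access date is a guess that should be checked against the server's file listing.

```python
from datetime import date

# Pattern copied from the URL used in this notebook; only the YYYYMM stamp varies.
NOAA_URL = (
    "https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5.1/access/"
    "timeseries/aravg.ann.land_ocean.90S.90N.v5.1.0.{stamp}.asc"
)


def noaa_url(year, month):
    """Build the time series URL for a given year and month stamp."""
    return NOAA_URL.format(stamp=f"{year:04d}{month:02d}")


def previous_month(today):
    """Return (year, month) of the month preceding `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


# Assumption: the stamp is the month before the access date.
url = noaa_url(*previous_month(date(2023, 12, 6)))
```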

In [None]:
df = pd.read_table(
    "https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5.1/access/timeseries/aravg.ann.land_ocean.90S.90N.v5.1.0.202311.asc",
    sep=r"\s+",
    usecols=[0, 1],
    index_col=0,
    names=["year", "temp"],
)

da = df.temp.to_xarray().assign_attrs(units="°C")

daWMO = clean(da)

temps.append(daWMO.expand_dims(source=["NOAAGlobalTempv5"]))

### ERA5
Hersbach, H.; Bell, B.; Berrisford, P. et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 2020, 146 (730), 1999–2049. https://doi.org/10.1002/qj.3803.

TODO

In [None]:
# cat = xs.DataCatalog('/tank/scenario/catalogues/reconstruction.json')

In [None]:
# cat.search(source='ERA5', variable='tas', xrfreq='MS').unique()

### JRA-55
Ebita, A. et al., 2011, The Japanese 55-year Reanalysis "JRA-55": an interim report, SOLA, 7, 149-152.
Kobayashi, S. et al., 2015; The JRA-55 Reanalysis: General Specifications and Basic Characteristics, to be published on JMSJ.

Here the data is a gridded 6-hourly global dataset, so we need to resample yearly and average spatially.
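
The two reductions can be illustrated on a toy grid with the standard library only. This is not xscen's implementation of `spatial_mean`; it is a sketch of cosine-latitude weighting on a regular latitude-longitude grid, with hypothetical helper names and made-up values.

```python
import math
from collections import defaultdict


def cos_lat_mean(field, lats):
    """Mean of a [lat][lon] grid with each row weighted by cos(latitude)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        weighted_sum += w * sum(row)
        weight_total += w * len(row)
    return weighted_sum / weight_total


def yearly_mean(samples):
    """Average (year, value) samples into {year: mean value}."""
    groups = defaultdict(list)
    for year, value in samples:
        groups[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(groups.items())}


# Made-up 3x2 grid: rows at 60°S, the equator and 60°N.
lats = [-60.0, 0.0, 60.0]
field = [[10.0, 10.0], [20.0, 20.0], [10.0, 10.0]]
global_mean = cos_lat_mean(field, lats)

# Made-up sub-annual global means, tagged with their year.
annual = yearly_mean([(2000, 14.0), (2000, 15.0), (2001, 16.0)])
```

On a regular grid, cells shrink towards the poles, so the equatorial row counts twice as much as the rows at 60° and `global_mean` is pulled towards its value. With complete data and fixed weights, both reductions are linear, so averaging in time first (as in the cell below) or in space first gives the same result.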

In [None]:
ds = xr.open_mfdataset(
    "/tank/scenario/netcdf/jra/jra55/analysis/tas/*.nc",
    coords="minimal",
    data_vars="minimal",
)

da = xs.spatial_mean(
    ds.tas.resample(time="YS").mean(), method="cos-lat", region="global"
)

da = da.assign_coords(time=da.time.dt.year).rename(time="year")

with ProgressBar():
    daWMO = clean(da).load()
temps.append(daWMO.expand_dims(source=["JRA-55"]))

## Combine all

In [None]:
ds = xr.concat(temps, "source")
ds

In [None]:
# A figure to look at it
fig, ax = plt.subplots(figsize=(10, 3))
ds.plot(ax=ax, hue="source")
ax.set_title("Global Average Temperature - Obs")
ax.set_xlabel("years")
ax.set_ylabel("[°C]")

In [None]:
db = xr.open_dataset("xscen/data/IPCC_annual_global_tas.nc", engine="h5netcdf")
db

In [None]:
ds2 = (
    ds.assign_coords(year=pd.to_datetime(ds.year, format="%Y"))
    .rename(year="time")
    .drop_vars("source")
    .rename(source="simulation")
    .assign_coords(
        source=(("simulation",), ds.source.values),
        data_source=(
            ("simulation",),
            ["Computed following WMO guidelines"] * ds.source.size,
        ),
        mip_era=(("simulation",), ["obs"] * ds.source.size),
        experiment=(("simulation",), ["obs"] * ds.source.size),
        member=(("simulation",), [""] * ds.source.size),
    )
    .rename("tas")
    .to_dataset()
)
ds2

In [None]:
db2 = xr.concat([db, ds2], "simulation")
db2.attrs["description"] = (
    db.attrs["description"]
    + " Observational datasets were also added following the WMO guidelines."
)

In [None]:
db2.to_netcdf("xscen/data/IPCC_annual_global_tas_withObs.nc")

In [None]:
db2.tas.plot(hue="simulation", add_legend=False)