# Prepare HadEX3 TXx dataset


Prepare the annual maximum of daily maximum temperature (TXx) dataset from Dunn et al. ([2020](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019JD032263)).

Dunn et al. (2020), Development of an updated global land in-situ-based dataset of temperature and precipitation extremes: HadEX3.

In [None]:
import warnings

import pooch
import xarray as xr

from statistics import theil_ufunc

In [None]:
logger = pooch.get_logger()
logger.setLevel("WARNING")

### Download and cache the file

We do not set a `known_hash`: the file may change upstream, but this does not matter because it is only used as an example.

In [None]:
name = "HadEX3_TXx_ANN.nc"

file = pooch.retrieve(
    f"https://www.metoffice.gov.uk/hadobs/hadex3/data/{name}.gz",
    known_hash=None,
    path="./rawdata/HadEX3",
    fname=f"{name}.gz",
    processor=pooch.Decompress(name=name),
)
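
If the download should be pinned after all, a checksum can be computed once from the cached archive and passed as `known_hash`. The following is a minimal sketch using only the standard library; the helper name `sha256_for_pooch` is hypothetical and not part of pooch. It assumes that pooch accepts hashes written as `"sha256:<hexdigest>"`.

```python
import hashlib


def sha256_for_pooch(path, chunk_size=65536):
    """Return the SHA-256 of a file as a 'sha256:<hexdigest>' string.

    Hypothetical helper (not part of pooch). The file is read in chunks
    so that large archives do not have to fit into memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```

Calling it on the downloaded `.gz` file in `./rawdata/HadEX3` would give a string that can replace `known_hash=None`. The assumption here is that the hash refers to the downloaded archive and not to the decompressed file.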

In [None]:
# we would get 13 warnings when reading HadEX3 data
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message="variable '.*' has multiple fill values")
    ds = xr.open_dataset(file)

# rename some dimensions
ds = ds.rename(latitude="lat", longitude="lon")

# use only data from 1950 onwards (the slice includes 1950)
ds = ds.sel(time=slice("1950", None))

## Calculate regression slope

The slope is computed with a Theil-Sen estimator. Not all gridpoints have full coverage, so we need to mask gridpoints that do not have enough data.

Plot the fraction of valid timesteps:

In [None]:
TXx = ds.TXx

fraction_valid = TXx.notnull().sum("time") / len(TXx.time)

fraction_valid.plot()

Keep only gridpoints where more than 66 % of the timesteps are valid:

In [None]:
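# mask gridpoints with insufficient coverage (set them to NaN);
# reassign to `TXx` so that the masked data is used for the trend below
TXx = TXx.where(fraction_valid > 0.66)
TXx.isel(time=-1).plot()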
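
To make the effect of the coverage mask concrete, here is a small sketch on toy data. The values and the dimension name `cell` are made up for illustration. It shows that `where` keeps the shape of the array and sets every timestep of an under-sampled gridpoint to NaN.

```python
import numpy as np
import xarray as xr

# toy data: 3 timesteps, 2 gridpoints (hypothetical values)
toy = xr.DataArray(
    [[30.0, np.nan], [31.0, np.nan], [np.nan, 29.0]],
    dims=("time", "cell"),
    coords={"time": [1950, 1951, 1952]},
)

# fraction of non-missing timesteps per gridpoint: 2/3 and 1/3
fraction = toy.notnull().sum("time") / toy.sizes["time"]

# the first gridpoint (2/3 > 0.66) is kept, the second becomes all-NaN
masked = toy.where(fraction > 0.66)
```

Note that the mask is broadcast along `time`, so a gridpoint is either kept entirely or dropped entirely.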

In [None]:
trend, is_significant = theil_ufunc(TXx)
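
`theil_ufunc` comes from a local `statistics` module that is not shown here. As an illustration of the underlying idea only, the sketch below computes a Theil-Sen slope for a single time series as the median of all pairwise slopes, skipping missing values. The function name `theil_sen_slope` and the toy series are assumptions, and the significance test returned by `theil_ufunc` is not reproduced.

```python
import numpy as np


def theil_sen_slope(y, x=None):
    """Median of all pairwise slopes (Theil-Sen), ignoring NaNs.

    Illustrative sketch, not the implementation behind `theil_ufunc`.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float) if x is None else np.asarray(x, dtype=float)

    # drop missing values
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    if y.size < 2:
        return np.nan

    # all index pairs (i, j) with i < j
    i, j = np.triu_indices(y.size, k=1)
    dx = x[j] - x[i]
    keep = dx != 0  # avoid division by zero for repeated x values
    if not keep.any():
        return np.nan
    return float(np.median((y[j] - y[i])[keep] / dx[keep]))


# toy series: 0.5 °C / year trend with one gap and one outlier
years = np.arange(1950, 1960)
txx = 0.5 * (years - 1950) + 1.0
txx[3] = np.nan
txx[7] = 100.0

slope = theil_sen_slope(txx, years)
```

Because the median is taken over pairwise slopes, the single outlier does not change the estimate, which is the main reason to prefer this estimator over an ordinary least-squares fit for gappy station-based data.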

In [None]:
ds = ds.assign(trend=trend, is_significant=is_significant)

ds.trend.attrs["long_name"] = "TXx_trend"
ds.trend.attrs["units"] = "°C / year"
ds.trend.attrs["comment"] = "Theil-Sen slope estimate"

### Save

In [None]:
ds.to_netcdf("HadEX3_TXx_ANN.nc")