# Exploring Land Surface Temperature

This notebook uses the [ESA Land Surface Temperature Climate Change Initiative (LST_cci): Monthly Multisensor Infra-Red (IR) Low Earth Orbit (LEO) land surface temperature (LST) time series level 3 supercollated (L3S) global product (1995-2020), version 2.00](https://catalogue.ceda.ac.uk/uuid/785ef9d3965442669bff899540747e28).


In [None]:
import typing
import warnings
from datetime import datetime

import geopandas as gpd
import numpy as np
import pooch
import pandas as pd
import regionmask

# rioxarray is not directly referenced, but its `rio` extension of `xarray` is
import rioxarray
import xarray as xr
import xrspatial.zonal
from shapely.errors import ShapelyDeprecationWarning

warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning)

## Fetch and Open Land Surface Temperature Data File

Define helpers to fetch the monthly Land Surface Temperature (LST) data file for
a specific year and month from the
[Centre for Environmental Data Analysis Archive](https://archive.ceda.ac.uk/)
(CEDA Archive) and open it. We subset the data to our area of interest further below.

In [None]:
P = typing.ParamSpec("P")


def global_lst_file(date: datetime) -> str:
    """Fetch and cache the global monthly land surface temperature data file.

    Only the year and month of ``date`` are used, because the product is monthly.
    Return path to locally cached file.
    """
    y = date.year
    m = date.month

    # The status of the CEDA core archives can be checked at
    # https://stats.uptimerobot.com/vZPgQt7YnO (at the time of writing, `dap` was down).

    lst_data_url = (
        "https://dap.ceda.ac.uk/neodc/esacci/land_surface_temperature/data/"
        f"MULTISENSOR_IRCDR/L3S/0.01/v2.00/monthly/{y}/{m:02d}/"
        f"ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-{y}{m:02d}01000000-fv2.00.nc"
        # Must add `#mode=bytes` to the end.
        # See https://github.com/Unidata/netcdf4-python/issues/1043
        "#mode=bytes"
    )

    return pooch.retrieve(lst_data_url, None)


def open_global_lst_dataset(
    date: datetime,
    open_dataset: typing.Callable[typing.Concatenate[str, P], xr.Dataset],
    *args: P.args,
    **kwargs: P.kwargs,
) -> xr.Dataset:
    """Open land surface temperature dataset for a particular month.

    Return 2D ``xarray.Dataset`` of the data in CRS EPSG:4326 by calling the
    ``open_dataset`` callable with the path to the LST data file for the specified
    ``date`` as the first argument, followed by all additional ``args`` and ``kwargs``,
    if provided.

    Args
    ----
    date:
        date within the desired month of LST data (only year and month are used)
    open_dataset:
        callable to use to open the dataset (e.g., ``xarray.open_dataset``), which
        must accept a source file (``str``) as its first positional argument, and
        return an ``xarray.Dataset`` when called
    *args:
        additional positional arguments (following the path to the LST data file) to
        pass to ``open_dataset``
    **kwargs:
        keyword arguments to pass to ``open_dataset``
    """
    from rioxarray.raster_dataset import RasterDataset

    global_lst_ds = open_dataset(global_lst_file(date), *args, **kwargs)
    global_lst_rio = typing.cast(RasterDataset, global_lst_ds.rio)

    return global_lst_rio.write_crs("EPSG:4326").squeeze(drop=True)
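
As a minimal sketch of how the monthly file URL depends only on the year and month, the hypothetical helper `monthly_lst_url` below rebuilds the same URL pattern and rejects years outside the 1995-2020 range given in the product title. Whether every month in that range actually has a file on the archive is an assumption, and the `#mode=bytes` fragment is left out for brevity.

```python
from datetime import datetime

# Same directory layout as used in `global_lst_file` above.
BASE_URL = (
    "https://dap.ceda.ac.uk/neodc/esacci/land_surface_temperature/data/"
    "MULTISENSOR_IRCDR/L3S/0.01/v2.00/monthly"
)


def monthly_lst_url(date: datetime) -> str:
    """Build the URL of the monthly LST file for the month containing ``date``.

    Hypothetical helper for illustration; the year range comes from the product
    title (1995-2020) and is not checked against the archive itself.
    """
    if not 1995 <= date.year <= 2020:
        raise ValueError(f"year {date.year} is outside the product range 1995-2020")

    stamp = f"{date.year}{date.month:02d}"

    return (
        f"{BASE_URL}/{date.year}/{date.month:02d}/"
        f"ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-{stamp}01000000-fv2.00.nc"
    )


# Any day within the same month maps to the same file.
url_a = monthly_lst_url(datetime(2020, 11, 1))
url_b = monthly_lst_url(datetime(2020, 11, 30))
print(url_a == url_b)
```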

## Obtain Geometries for Cholera Outbreak Regions

We'll select the outbreaks with an "admin2" spatial scale, obtain their distinct 
regions, and join them with our shapefile to obtain their geometries.


First, let's read our outbreak data and select all outbreaks in admin2 regions:

In [None]:
admin2_outbreaks_df = (
    pd.read_csv("data/outbreak_data.csv", parse_dates=["start_date", "end_date"])
    .assign(
        start_year=lambda df: df.start_date.dt.year,
        start_month=lambda df: df.start_date.dt.month,
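        # Approximate a month as 365.2425 / 12 days (the mean Gregorian month),
        # because the ambiguous "M" timedelta unit may be rejected by recent pandas.
        duration_in_months=lambda df: np.ceil(
            (df.end_date - df.start_date) / pd.Timedelta(days=365.2425 / 12)
        ).astype(int),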
    )
    .query("spatial_scale == 'admin2'")
)

admin2_outbreaks_df
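
To make the derived columns concrete, here is a small sketch on made-up rows (the values in `toy_df` are invented for illustration and are not from `data/outbreak_data.csv`). It assumes an average month length of 365.2425 / 12 days, since the ambiguous `"M"` timedelta unit may not be accepted by recent pandas versions.

```python
import numpy as np
import pandas as pd

# Assumption: one "month" is the mean Gregorian month length in days.
AVG_DAYS_PER_MONTH = 365.2425 / 12

# Invented sample rows, using the same column names as the outbreak data.
toy_df = pd.DataFrame(
    {
        "location_period_id": [1, 2, 3],
        "spatial_scale": ["admin2", "admin1", "admin2"],
        "start_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-15"]),
        "end_date": pd.to_datetime(["2020-01-20", "2020-05-01", "2020-06-15"]),
    }
)

toy_admin2_df = toy_df.assign(
    start_year=lambda df: df.start_date.dt.year,
    start_month=lambda df: df.start_date.dt.month,
    # Round partial months up, so a 19-day outbreak counts as 1 month.
    duration_in_months=lambda df: np.ceil(
        (df.end_date - df.start_date) / pd.Timedelta(days=AVG_DAYS_PER_MONTH)
    ).astype(int),
).query("spatial_scale == 'admin2'")

print(toy_admin2_df.duration_in_months.tolist())  # -> [1, 4]
```

Note that rounding up means the 92-day outbreak (just over three average months) is counted as 4 months.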

Select the distinct location period IDs, so we can select their geometries from
our shapefile:

In [None]:
admin2_location_period_id_df = admin2_outbreaks_df[
    ["location_period_id"]
].drop_duplicates()

admin2_location_period_id_df

Read our shapefile:

In [None]:
location_period_id_gdf = gpd.read_file(
    "data/AfricaShapefiles/total_shp_0427.shp"
).rename(columns={"lctn_pr": "location_period_id"})

location_period_id_gdf

Merge our distinct location period IDs with the shapefile to obtain the
geometries for only our distinct admin2 outbreak regions:

In [None]:
admin2_gdf = typing.cast(
    gpd.GeoDataFrame,
    location_period_id_gdf.merge(
        admin2_location_period_id_df,
        how="inner",
        on="location_period_id",
    ),
)

admin2_gdf
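
An inner merge silently drops any location period ID that has no matching geometry. As a hedged sketch of how one might check for that, the example below uses plain `pandas` frames with invented IDs (`shapes_df` and `outbreak_ids_df` are hypothetical stand-ins for the shapefile and the distinct IDs) and the `indicator` option of `merge`:

```python
import pandas as pd

# Invented stand-ins: IDs 10, 30 and 40 have geometries, ID 20 does not.
shapes_df = pd.DataFrame(
    {
        "location_period_id": [10, 30, 40],
        "geometry": ["poly_a", "poly_b", "poly_c"],
    }
)
outbreak_ids_df = pd.DataFrame({"location_period_id": [10, 20, 30]})

# A left merge keeps every outbreak ID; `_merge` records where each row came from.
checked_df = outbreak_ids_df.merge(
    shapes_df, how="left", on="location_period_id", indicator=True
)

unmatched_ids = checked_df.loc[
    checked_df["_merge"] == "left_only", "location_period_id"
].tolist()

print(unmatched_ids)  # -> [20]
```

An empty list here would mean every outbreak region has a geometry, in which case the inner merge loses nothing.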

In [None]:
display(admin2_gdf.geometry[0])
display(admin2_gdf.geometry[1])

In [None]:
admin2_outbreaks_df.query("location_period_id in [4754, 8540]")

In [None]:
display(admin2_gdf.crs)
admin2_gdf.boundary.plot()

In [None]:
# If you want to look at the entire continent, and compute zonal stats for it,
# uncomment the lines below, which will reassign admin2_gdf to all admin2
# regions in Africa, instead of only the admin2 regions from our outbreak data.

# admin2_gdf = typing.cast(
#     gpd.GeoDataFrame,
#     gpd.read_file(
#         pooch.retrieve(
#             "https://geoportal.icpac.net/geoserver/ows?service=WFS"
#             "&version=1.0.0"
#             "&request=GetFeature"
#             "&typename=geonode%3Aafr_g2014_2013_2"
#             "&outputFormat=json"
#             "&srs=EPSG%3A4326"
#             "&srsName=EPSG%3A4326",
#             None,
#         )
#     ),
# )

# display(admin2_gdf)
# display(admin2_gdf.crs)

# admin2_gdf.boundary.plot()

## Compute Zonal Mean Land Surface Temperatures

To compute zonal statistics, we first need to compute the zones themselves.

**WARNING:** Computing the zones takes several minutes!

Pick a date for land surface temperatures:

In [None]:
# Arbitrarily using day 1 of month, because only year and month matter, but day
# is a required argument.
lst_date = datetime(2020, 11, 1)

Read land surface temperatures for a specific date using `xarray`:


In [None]:
global_ds = open_global_lst_dataset(lst_date, xr.open_dataset)
global_ds

Select only the data for our area of interest:

In [None]:
minx, miny, maxx, maxy = admin2_gdf.total_bounds

africa_ds = global_ds.sel(
    lon=slice(minx, maxx),
    lat=slice(miny, maxy),
)

africa_ds
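
Label-based slicing with `.sel` depends on the order of the coordinate values: `slice(miny, maxy)` selects nothing if latitude happens to be stored in descending order. The sketch below, on a tiny synthetic dataset with descending latitudes, shows a hypothetical helper `sel_bbox` that checks the orientation first. Whether the LST file stores latitude ascending or descending is not assumed here.

```python
import numpy as np
import xarray as xr


def sel_bbox(ds: xr.Dataset, minx, miny, maxx, maxy) -> xr.Dataset:
    """Subset ``ds`` to a bounding box, whatever the coordinate order.

    Hypothetical helper for illustration; assumes coordinates named
    ``lon`` and ``lat`` that are monotonic.
    """
    lon_descending = bool(ds.lon[0] > ds.lon[-1])
    lat_descending = bool(ds.lat[0] > ds.lat[-1])

    return ds.sel(
        lon=slice(maxx, minx) if lon_descending else slice(minx, maxx),
        lat=slice(maxy, miny) if lat_descending else slice(miny, maxy),
    )


# Synthetic grid: latitude descending (40 to -40), longitude ascending (-20 to 60).
lat = np.arange(40.0, -41.0, -10.0)
lon = np.arange(-20.0, 61.0, 10.0)
toy_ds = xr.Dataset(
    {"lst": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

subset_ds = sel_bbox(toy_ds, minx=0, miny=-20, maxx=30, maxy=10)
naive_ds = toy_ds.sel(lon=slice(0, 30), lat=slice(-20, 10))

print(dict(subset_ds.sizes))
print(dict(naive_ds.sizes))
```

On this synthetic grid the naive selection returns an empty latitude axis, while `sel_bbox` keeps the four latitudes and four longitudes inside the box.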

Extract the `lst` data variable, convert from Kelvin to Celsius, and plot:

In [None]:
africa_lst_celsius_da = africa_ds.lst - 273.15
africa_lst_celsius_da.plot(cmap="coolwarm")

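Compute the zones by rasterizing the admin2 geometries onto the LST grid with
`regionmask`. Each grid cell is labeled with the index (in `admin2_gdf`) of the
region that contains it, or `NaN` where no region does:

In [None]: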
zones_da = typing.cast(
    xr.DataArray,
    regionmask.mask_geopandas(
        admin2_gdf,
        africa_ds.lon,
        africa_ds.lat,
    ),
)

zones_da

In [None]:
zones_da.values[~np.isnan(zones_da.values)].min()

In [None]:
zones_da.plot()  # type: ignore

Compute zonal means for LST:

In [None]:
pd.set_option("display.max_rows", None)

mean_lst_df: pd.DataFrame = (
    xrspatial.zonal.stats(zones_da, africa_lst_celsius_da, stats_funcs=["mean"])
    .set_index("zone")  # type: ignore
    .rename(columns={"mean": "mean_lst"})  # type: ignore
)
mean_lst_df
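
Conceptually, a zonal mean groups the value cells by their zone label and averages each group. The NumPy sketch below illustrates that idea on a tiny invented grid. It is not how `xrspatial.zonal.stats` is implemented, and ignoring `NaN` values is an assumption made for this illustration.

```python
import numpy as np


def zonal_means(zones: np.ndarray, values: np.ndarray) -> dict:
    """Return the mean of ``values`` for each zone ID found in ``zones``.

    Illustrative only: cells with a NaN zone or a NaN value are ignored.
    """
    valid = ~np.isnan(zones) & ~np.isnan(values)
    zone_ids = zones[valid].astype(int)

    # Sum and count the valid cells per zone ID, then divide.
    sums = np.bincount(zone_ids, weights=values[valid])
    counts = np.bincount(zone_ids)

    return {int(z): float(sums[z] / counts[z]) for z in np.unique(zone_ids)}


# Invented 3x3 grid: zone 2 covers no cell at all, so it gets no mean.
toy_zones = np.array(
    [
        [0, 0, np.nan],
        [1, 1, 1],
        [np.nan, 3, 3],
    ],
    dtype=float,
)
toy_values = np.array(
    [
        [20.0, 22.0, 99.0],
        [30.0, np.nan, 34.0],
        [99.0, 25.0, 27.0],
    ]
)

toy_means = zonal_means(toy_zones, toy_values)
print(toy_means)  # -> {0: 21.0, 1: 32.0, 3: 26.0}
```

A zone that never appears in the zones raster (zone 2 here) produces no row, which is one way the number of means can end up smaller than the number of geometries.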

Join the means to the geometries and plot the zonal means. Note that there can
be fewer means than geometries (6212 means versus 6226 geometries were observed,
possibly because some regions contain no grid cell centers), so we set `"zone"`
as the index on `mean_lst_df` to align each mean with the correct row of
`admin2_gdf`:

In [None]:
mean_lst_gdf = admin2_gdf.join(mean_lst_df, how="inner").dropna()
mean_lst_gdf
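
The following sketch, using plain `pandas` frames with invented values, shows why the index matters for this join. `regions_df` and `means_df` are hypothetical stand-ins for `admin2_gdf` and the zonal statistics result, with zone 2 missing from the means:

```python
import pandas as pd

# Invented stand-ins: four regions, but no mean for zone 2.
regions_df = pd.DataFrame({"name": ["A", "B", "C", "D"]})
means_df = pd.DataFrame({"zone": [0, 1, 3], "mean": [21.0, 32.0, 26.0]})

# Joining on the zone ID pairs each mean with the region it belongs to.
aligned_df = regions_df.join(
    means_df.set_index("zone").rename(columns={"mean": "mean_lst"}),
    how="inner",
)

# Joining on the default 0..2 row index pairs the third mean with region C,
# although it was computed for zone 3 (region D).
misaligned_df = regions_df.join(
    means_df[["mean"]].rename(columns={"mean": "mean_lst"}),
    how="inner",
)

print(aligned_df["name"].tolist())  # -> ['A', 'B', 'D']
print(misaligned_df["name"].tolist())  # -> ['A', 'B', 'C']
```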

Comparing the row counts, we see that 479 of the 494 districts have LST data for this month: the join and `.dropna()` have removed the zones without a mean LST value.

In [None]:
len(mean_lst_df)

In [None]:
mean_lst_gdf.plot("mean_lst", cmap="coolwarm", legend=True)  # type: ignore

Overlaying the district boundaries on the mean monthly LST data lets us visually identify the districts where data is missing.

In [None]:
ax = admin2_gdf.boundary.plot(alpha=0.15, color="black")
mean_lst_gdf.plot("mean_lst", ax=ax, zorder=-1, cmap="coolwarm", legend=True)