# Compute Zonal Means

Compute the monthly zonal means for the following environmental parameters, from
3 months prior to the start month of the earliest outbreak in our dataset
through the start month of the latest outbreak:

1. Soil moisture (sm)
2. Precipitation (precip)
3. Land Surface Temperature (LST, daytime)

Note that we obtain LST data through CEDA via their OPeNDAP server, which
requires registering for an account at
https://services.ceda.ac.uk/cedasite/register/info/, and then specifying the
username and password for your account within your `.env` file (see `README.md`
for creating your `.env` file).

However, you may run this notebook without creating a CEDA account (and thus
without specifying your credentials in your `.env` file).  In this case, the
computation of zonal means for LST will simply be skipped, and only the soil
moisture and precipitation zonal means will be computed.

Beyond optionally registering for a CEDA account, running this notebook requires
no prerequisites.

In [None]:
%load_ext dotenv
%dotenv

import os
from importlib import reload

import pandas as pd

from cholera import ceda, lst, outbreaks, precip, sm, xarray_ as xr_

# Reload modules to avoid the need to restart the notebook kernel after making
# code changes within the modules.
reload(lst)
reload(precip)
reload(sm)
reload(xr_)
# Evaluate to None so the cell does not display the return value of the last
# reload call.
None

Convenience function for writing zonal means to an appropriately named CSV file,
based upon the environmental parameter variable and the monthly time span
covered:

In [None]:
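def zonal_means_to_csv(zonal_means: pd.DataFrame) -> str:
    """Write zonal means to a CSV file and return the file's path.

    The file name combines the name of the first column (the environmental
    parameter) with the first and last (year, month) found in the index, which
    is expected to contain 3-element tuples starting with year and month.
    """
    start_year, start_month, _ = zonal_means.index.min()
    end_year, end_month, _ = zonal_means.index.max()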
    path = (
        f"../data/zonal-means-{zonal_means.columns[0]}-"
        f"{start_year}{start_month:02d}-{end_year}{end_month:02d}.csv"
    )
    zonal_means.to_csv(path)

    return path
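
As a quick illustration of the naming scheme, the following sketch applies the same file name logic to a small, made-up data frame. The three-level (year, month, region) index, its level names, and the values are assumptions for illustration only; the actual frames come from the `zonal_means` functions used later in this notebook.

```python
import pandas as pd

# Hypothetical zonal means: the level names and values are invented for illustration.
index = pd.MultiIndex.from_product(
    [[2010], [1, 2, 3], ["region-a", "region-b"]],
    names=["year", "month", "region"],
)
toy_zonal_means = pd.DataFrame(
    {"sm": [0.25, 0.30, 0.27, 0.31, 0.22, 0.29]}, index=index
)

# Same logic as zonal_means_to_csv, minus the directory and the file write.
start_year, start_month, _ = toy_zonal_means.index.min()
end_year, end_month, _ = toy_zonal_means.index.max()
filename = (
    f"zonal-means-{toy_zonal_means.columns[0]}-"
    f"{start_year}{start_month:02d}-{end_year}{end_month:02d}.csv"
)
print(filename)
```

The `:02d` format pads single-digit months with a leading zero, so file names sort chronologically.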

If you have registered for a CEDA account, and you have configured your account
credentials in your `.env` file, then the following code block will ensure that
you have a valid certificate in place to allow you to retrieve LST data via
CEDA's OPeNDAP server.

However, if you haven't done so, then the cell that computes LST zonal means
will simply skip the computation.

In [None]:
username = os.environ.get("CEDA_USERNAME")
password = os.environ.get("CEDA_PASSWORD")
ceda_cert_expiration = (
    ceda.auth(username=username, password=password) if username and password else None
)

print(
    f"CEDA OPeNDAP certificate is valid through {ceda_cert_expiration.isoformat()} UTC"
    if ceda_cert_expiration
    else "Skipping CEDA auth check since a username and password were not configured"
)

Obtain the outbreak region geometries, and compute the monthly time span
starting 3 months prior to the start month of the earliest outbreak in our
dataset through the start month of the latest outbreak:

In [None]:
regions = outbreaks.regions()
start_months = outbreaks.start_month_range()

# Entire monthly range of outbreaks, with a 3-month prelude, and setting the
# end date to the end of the month.
all_months = slice(
    start_months[0] - pd.offsets.MonthBegin(3),
    start_months[-1] + pd.offsets.MonthEnd(),
)
print(all_months)
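
To make the offset arithmetic concrete, here is a small sketch using invented dates. It assumes, as the code above implies, that the start months are timestamps falling on the first day of a month; the specific dates are hypothetical and are not taken from the outbreak dataset.

```python
import pandas as pd

# Hypothetical earliest and latest outbreak start months (first day of each month).
earliest = pd.Timestamp("2010-04-01")
latest = pd.Timestamp("2019-12-01")

# Step back 3 month starts for the prelude, and roll the end forward to month end.
span = slice(earliest - pd.offsets.MonthBegin(3), latest + pd.offsets.MonthEnd())
print(span.start.date(), span.stop.date())
```

If a start timestamp did not fall on the first of the month, subtracting `MonthBegin(3)` would count the snap back to the start of that month as the first step, so the span would begin only two months before that month.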

Compute the soil moisture (sm) zonal means and write them to a CSV file:

In [None]:
sm_zonal_means = sm.zonal_means(regions, time=all_months)
sm_csv = zonal_means_to_csv(sm_zonal_means)
print(f"Wrote soil moisture zonal means to {sm_csv}")
sm_zonal_means

Compute the precipitation (precip) zonal means and write them to a CSV file:

In [None]:
precip_zonal_means = precip.zonal_means(regions, time=all_months)
precip_csv = zonal_means_to_csv(precip_zonal_means)
print(f"Wrote precipitation zonal means to {precip_csv}")
precip_zonal_means
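
For downstream use, a CSV file written this way can be loaded back with its multi-level index intact. The sketch below is only an illustration under stated assumptions: it uses a made-up frame with a hypothetical (year, month, region) index, and a temporary directory rather than this notebook's `../data` directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical frame standing in for the output of a zonal_means function.
index = pd.MultiIndex.from_product(
    [[2010], [1, 2], ["region-a", "region-b"]],
    names=["year", "month", "region"],
)
toy = pd.DataFrame({"precip": [1.5, 2.0, 0.5, 3.25]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "zonal-means-precip-201001-201002.csv"
    toy.to_csv(path)
    # Treat the first three columns as index levels when reading the file back.
    restored = pd.read_csv(path, index_col=[0, 1, 2])

print(restored.loc[(2010, 2, "region-b"), "precip"])
```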

If you registered for a CEDA account and specified your credentials in your
`.env` file, then compute the land surface temperature (LST) zonal means and save
them to a CSV file (otherwise, skip the computation).

**IMPORTANT:** The CEDA OPeNDAP server appears to be flaky, at least when
handling many concurrent requests.  You may therefore see several errors, along
with multiple automatic retry attempts, while the following cell computes LST
zonal means.  The retries do not always succeed, so if the computation
ultimately fails, rerun the cell, possibly multiple times, until it succeeds.
This computation also takes _much_ longer than the computations above: perhaps
20-30 minutes, or even more, depending upon your computer's resources.

In [None]:
if ceda_cert_expiration:
    lst_zonal_means = lst.zonal_means(regions, time=all_months)
    lst_csv = zonal_means_to_csv(lst_zonal_means)
    print(f"Wrote land surface temperature zonal means to {lst_csv}")
    display(lst_zonal_means)
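
If rerunning the cell by hand becomes tedious, one option is to wrap the call in a small retry loop. The helper below is a hypothetical sketch, not part of the `cholera` package: the name `retry`, its parameters, and the choice to catch every `Exception` are assumptions, since the exact errors raised by failed OPeNDAP requests are not documented here. The flaky function merely simulates a call that fails twice before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, delay_seconds: float = 0.0) -> T:
    """Call fn until it succeeds, re-raising the last error after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as error:  # Assumption: any failure is worth retrying.
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying")
            time.sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_computation() -> str:
    """Simulated stand-in for a computation that fails on its first two calls."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated server error")
    return "zonal means"


result = retry(flaky_computation, attempts=5)
print(result)
```

In this notebook, the wrapped call might look like `retry(lambda: lst.zonal_means(regions, time=all_months), attempts=5, delay_seconds=30)`, with the attempt count and delay chosen to suit your situation.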