# Using waterbodies for basin and national scale assessment

* **Products used:** 
[ls8_sr](https://explorer.digitalearth.africa/ls8_sr), 
[wofs_ls_summary_annual](https://explorer.digitalearth.africa/wofs_ls_summary_annual)
* **Special requirements:** An _optional_ description of any special requirements
* **Prerequisites:** An _optional_ list of any notebooks that should be run or content that should be understood prior to launching this notebook


## Background
An *optional* overview of the scientific, economic or environmental management issue or challenge being addressed by Digital Earth Africa. 
For `Beginners_Guide` or `Frequently_Used_Code` notebooks, this may include information about why the particular technique or approach is useful or required. 
If you need to cite a scientific paper or link to a website, use a persistent DOI link if possible and link in-text (e.g. [Dhu et al. 2017](https://doi.org/10.1080/20964471.2017.1402490)).

## Description
A _compulsory_ description of the notebook, including a brief overview of how Digital Earth Africa helps to address the problem set out above.
It can be good to include a run-down of the tools/methods that will be demonstrated in the notebook:

1. First we do this
2. Then we do this
3. Finally we do this

***

## Getting started

Provide any particular instructions that the user might need, e.g. To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages
Import Python packages that are used for the analysis.

Use standard import commands; some are shown below. 
Begin with any `IPython` magic commands, followed by standard Python packages, then any additional functionality you need from the `Tools` package.

In [1]:
import matplotlib.pyplot as plt
import datacube
import dask
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.animation as animation

from deafrica_tools.plotting import display_map
from IPython.display import HTML
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster

from deafrica_tools.waterbodies import get_waterbodies, get_time_series

### Connect to the datacube

Connect to the datacube so we can access DE Africa data.
The `app` parameter is a unique name for the analysis which is based on the notebook file name.

In [2]:
dc = datacube.Datacube(app="Waterbody-basin")

In [3]:
create_local_dask_cluster()

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: /user/mickwelli@bigpond.com/proxy/8787/status
Workers: 1
Total threads: 2
Total memory: 11.21 GiB
Status: running
Using processes: True


### Analysis parameters

An *optional* section to inform the user of any parameters they'll need to configure to run the notebook:

* `param_name_1`: Simple description (e.g. `example_value`). Advice about appropriate values to choose for this parameter.
* `param_name_2`: Simple description (e.g. `example_value`). Advice about appropriate values to choose for this parameter.


In [4]:
basins = gpd.read_file("../Supplementary_data/Waterbodies/HydroBasins_L3/hybas_lake_af_lev03_v1c.geojson")

In [5]:
zambezi = basins[basins["SORT"] == 6]

In [6]:
bbox = zambezi.bounds
bbox = (bbox.minx.values[0], bbox.miny.values[0], bbox.maxx.values[0], bbox.maxy.values[0])
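The bounding box above is read from the first row of `zambezi.bounds`. If a filter ever matched more than one basin polygon, an envelope over all rows would be needed instead. The following is a minimal sketch of that idea, using a plain pandas DataFrame as a stand-in for the `.bounds` table; the coordinates are made up for illustration.

```python
import pandas as pd

# Stand-in for `zambezi.bounds`: one row per geometry, four columns.
# The coordinate values are illustrative only.
bounds = pd.DataFrame(
    {
        "minx": [18.0, 22.0],
        "miny": [-20.5, -18.0],
        "maxx": [30.0, 36.5],
        "maxy": [-9.0, -11.0],
    }
)

# Envelope covering every row, in (xmin, ymin, xmax, ymax) order.
bbox = (
    float(bounds["minx"].min()),
    float(bounds["miny"].min()),
    float(bounds["maxx"].max()),
    float(bounds["maxy"].max()),
)
print(bbox)
```

On a real GeoDataFrame, the `total_bounds` attribute returns the same four values as an array.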

In [7]:
zambezi_waterbodies = get_waterbodies(bbox, crs="EPSG:4326").clip(zambezi)

In [None]:
#zambezi_waterbodies.explore()

In [128]:
zambezi_waterbodies

,id,wb_id,area_m2,length_m,uid,perim_m,last_obs_date,last_valid_obs_date,last_valid_obs,last_attrs_update_date,geometry
42085,DEAfrica_Waterbodies.ks8wp41gvn,271954,17100.0,180.000244,ks8wp41gvn,600.0,2025-01-09,2024-12-25,0.0,2025-01-20,"POLYGON ((23.5137 -18.6204, 23.5137 -18.6206, ..."
22653,DEAfrica_Waterbodies.ks8wp42x1p,271955,91800.0,688.118412,ks8wp42x1p,2040.0,2025-01-09,2024-12-25,0.0,2025-01-20,"POLYGON ((23.514 -18.6169, 23.514 -18.6179, 23..."
22655,DEAfrica_Waterbodies.ks8wp4gvcy,271956,24300.0,295.255814,ks8wp4gvcy,1020.0,2025-01-09,2024-12-25,0.0,2025-01-20,"POLYGON ((23.5156 -18.6159, 23.5156 -18.6162, ..."
22652,DEAfrica_Waterbodies.ks8wnghgdr,271953,21600.0,270.000000,ks8wnghgdr,900.0,2025-01-09,2024-12-25,0.0,2025-01-20,"POLYGON ((23.5062 -18.6147, 23.5062 -18.6149, ..."
27474,DEAfrica_Waterbodies.ks8wnge1sk,271952,5400.0,90.000000,ks8wnge1sk,300.0,2025-01-09,2024-12-25,0.0,2025-01-20,"POLYGON ((23.5044 -18.6129, 23.5044 -18.6137, ..."
...,...,...,...,...,...,...,...,...,...,...,...
94831,DEAfrica_Waterbodies.kw0bp4vzdd,347840,6300.0,120.000000,kw0bp4vzdd,360.0,2025-01-16,2024-12-16,0.0,2025-01-20,"POLYGON ((23.8706 -11.2332, 23.8706 -11.2339, ..."
10149,DEAfrica_Waterbodies.kw10dp6xbk,348434,22500.0,282.897100,kw10dp6xbk,840.0,2025-01-09,2024-12-16,0.0,2025-01-20,"POLYGON ((23.9984 -11.1201, 23.9984 -11.1203, ..."
10150,DEAfrica_Waterbodies.kw10dp91qe,348435,8100.0,216.088217,kw10dp91qe,600.0,2025-01-09,2024-12-16,0.0,2025-01-20,"POLYGON ((23.9956 -11.1201, 23.9956 -11.1203, ..."
10152,DEAfrica_Waterbodies.kw10f0x67b,348438,10800.0,180.000000,kw10f0x67b,540.0,2025-01-09,2024-12-16,0.0,2025-01-20,"POLYGON ((24.0047 -11.1143, 24.0047 -11.1155, ..."


In [8]:
zambezi_waterbodies.uid.iloc[0]

'ks8wp41gvn'

In [9]:
zambezi_ts = get_time_series(zambezi_waterbodies.uid.iloc[0])
zambezi_ts['uid']=zambezi_waterbodies.uid.iloc[0]

In [10]:
zambezi_ts

date,area_wet_m2,percent_wet,area_dry_m2,percent_dry,area_invalid_m2,percent_invalid,area_observed_m2,percent_observed,percent_wet_rolling_median,uid
1984-06-02,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,,ks8wp41gvn
1984-09-06,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,,ks8wp41gvn
1984-09-22,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
1984-10-08,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
1986-12-17,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
...,...,...,...,...,...,...,...,...,...,...
2024-10-30,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
2024-11-15,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
2024-12-01,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
2024-12-09,0.0,0.0,17100.0,100.0,0.0,0.0,17100.0,100.0,0.0,ks8wp41gvn
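The first two `percent_wet_rolling_median` values in the table above are empty. That is consistent with a three-observation rolling window, such as the `rolling(3).median()` call that appears (commented out) in the helper code later in this notebook. A small sketch with made-up percentages shows the behaviour:

```python
import pandas as pd

# Made-up percent_wet values for five consecutive observations.
percent_wet = pd.Series([0.0, 10.0, 20.0, 90.0, 30.0])

# A window of 3 needs three observations, so the first two results are NaN.
rolling_median = percent_wet.rolling(3).median()
print(rolling_median.tolist())
```

Because the median is taken over each window, the single spike of 90 does not dominate the smoothed series.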


In [95]:
df = []
uid = zambezi_waterbodies.uid.iloc[0]
ts = get_time_series(uid)[["area_wet_m2", "area_invalid_m2"]].loc["2000-01-01":]
df.append(ts.groupby(pd.Grouper(freq="M")).mean().assign(uid=uid))

# 2000 to 2024 spans 25 years, so up to 25 * 12 = 300 monthly rows per waterbody

In [133]:
"""
Loading and processing DE Africa Water Bodies data.
Last modified: November 2023
"""

from datetime import datetime

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from matplotlib.patches import Patch
from owslib.etree import etree
from owslib.fes import PropertyIsEqualTo
from owslib.wfs import WebFeatureService

# URL for the DE Africa Water Bodies data on PROD Geoserver.
WFS_ADDRESS = "https://geoserver.digitalearth.africa/geoserver/wfs"
WFS_LAYER = "waterbodies:DEAfrica_Waterbodies"
API_ADDRESS = "https://api.digitalearth.africa/waterbodies/"


def get_waterbody(geohash: str) -> gpd.GeoDataFrame:
    """Gets a waterbody polygon and metadata by geohash.

    Parameters
    ----------
    geohash : str
        The geohash/uid for a waterbody in DE Africa Water Bodies.

    Returns
    -------
    gpd.GeoDataFrame
        A GeoDataFrame with the polygon.
    """

    wfs = WebFeatureService(url=WFS_ADDRESS, version="1.1.0")
    filter_ = PropertyIsEqualTo(propertyname="uid", literal=geohash)
    filterxml = etree.tostring(filter_.toXML()).decode("utf-8")
    response = wfs.getfeature(
        typename=WFS_LAYER,
        filter=filterxml,
        outputFormat="json",
    )
    wb_gpd = gpd.read_file(response)
    return wb_gpd


def get_waterbodies(bbox: tuple, crs: str = "EPSG:4326") -> gpd.GeoDataFrame:
    """Gets the polygons and metadata for multiple water bodies by bbox.

    Parameters
    ----------
    bbox : (xmin, ymin, xmax, ymax)
        Bounding box.
    crs : str
        Optional CRS for the bounding box.

    Returns
    -------
    gpd.GeoDataFrame
        A GeoDataFrame with the polygons and metadata.
    """

    wfs = WebFeatureService(url=WFS_ADDRESS, version="1.1.0")
    response = wfs.getfeature(
        typename=WFS_LAYER,
        bbox=tuple(bbox) + (crs,),
        outputFormat="json",
    )
    wb_gpd = gpd.read_file(response)
    return wb_gpd


def get_geohashes(bbox: tuple = None, crs: str = "EPSG:4326") -> list[str]:
    """Gets all waterbody geohashes.

    Parameters
    ----------
    bbox : (xmin, ymin, xmax, ymax)
        Optional bounding box.
    crs : str
        Optional CRS for the bounding box.

    Returns
    -------
    [str]
        A list of geohashes.
    """

    wfs = WebFeatureService(url=WFS_ADDRESS, version="1.1.0")
    if bbox is not None:
        bbox = bbox + (crs,)
    response = wfs.getfeature(
        typename=WFS_LAYER,
        propertyname="uid",
        outputFormat="json",
        bbox=bbox,
    )
    wb_gpd = gpd.read_file(response)
    return list(wb_gpd["uid"])


def get_time_series(
    geohash: str = None,
    waterbody: pd.Series = None,
    start_date: str = "1984-01-01",
    end_date: str = datetime.now().strftime("%Y-%m-%d"),
) -> pd.DataFrame:
    """Gets the time series for a waterbody. Specify either a GeoDataFrame row or a geohash.

    Parameters
    ----------
    geohash : str
        The geohash/uid for a waterbody in DE Africa Water Bodies.
    waterbody : pd.Series
        One row of a GeoDataFrame representing a waterbody.
    start_date : str
        Start date for the time range to filter the timeseries to.
    end_date : str
        End date for the time range to filter the timeseries to.
    Returns
    -------
    pd.DataFrame
        A time series for the waterbody.
    """
    if waterbody is not None and geohash is not None:
        raise ValueError("One of waterbody and geohash must be None")
    if waterbody is None and geohash is None:
        raise ValueError("One of waterbody and geohash must be specified")

    if geohash is not None:
        wb = get_waterbody(geohash)
        wb_id = wb.wb_id.item()
    else:
        wb_id = waterbody.wb_id.item()
    url = (
        API_ADDRESS
        + f"waterbody/{wb_id}/observations/csv?start_date={start_date}&end_date={end_date}"
    )
    wb_timeseries = pd.read_csv(
        url,
        usecols=["date", "area_wet_m2", "area_invalid_m2"],
        index_col="date",
        engine="pyarrow",
    )
    # Tidy up the dataframe.
    #wb_timeseries = wb_timeseries.set_index("date")
    wb_timeseries.index = pd.to_datetime(wb_timeseries.index)
    # Create a rolling median for the wet time series
    #wb_timeseries["percent_wet_rolling_median"] = wb_timeseries["percent_wet"].rolling(3).median()

    return wb_timeseries
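The request URL in `get_time_series` is assembled from the API address, the `wb_id`, and the two date parameters. The sketch below isolates that step so the resulting URL can be inspected without making a request. The helper name `observations_url` is hypothetical, and the `wb_id` is taken from the first row of the waterbodies table shown earlier.

```python
from urllib.parse import urlencode

API_ADDRESS = "https://api.digitalearth.africa/waterbodies/"


def observations_url(wb_id: int, start_date: str, end_date: str) -> str:
    """Build the CSV observations URL for one waterbody (hypothetical helper)."""
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"{API_ADDRESS}waterbody/{wb_id}/observations/csv?{query}"


url = observations_url(271954, "2000-01-01", "2024-12-31")
print(url)
```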

In [134]:
%%time

df = []
# Fetch the first 50 waterbodies and reduce each series to monthly means.
for uid in zambezi_waterbodies.uid.iloc[:50]:
    monthly = (
        get_time_series(uid, start_date="2000-01-01")
        .groupby(pd.Grouper(freq="M"))
        .mean()
        .assign(uid=uid)
    )
    df.append(monthly)


CPU times: user 4.74 s, sys: 569 ms, total: 5.31 s
Wall time: 38.5 s
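The timing above shows about 5 s of CPU time against 38.5 s of wall time, which suggests most of the loop is spent waiting on network responses. One possible way to overlap that waiting is a thread pool from the standard library. This is a sketch of the pattern only, not the approach used in this notebook: `fetch_monthly` is a hypothetical stand-in that sleeps instead of calling the API, and the identifiers are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_monthly(uid: str) -> dict:
    """Hypothetical stand-in for one get_time_series request."""
    time.sleep(0.05)  # simulate network latency
    return {"uid": uid}


uids = [f"wb{i:02d}" for i in range(20)]  # made-up identifiers

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() yields results in the same order as the input uids.
    results = list(pool.map(fetch_monthly, uids))
elapsed = time.perf_counter() - start

print(f"{len(results)} requests in {elapsed:.2f} s")
```

Keeping `max_workers` small limits the number of simultaneous requests sent to the service.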


In [132]:
pd.concat(df)[["area_wet_m2", "area_invalid_m2"]].groupby(
        pd.Grouper(freq='M')).sum()

date,area_wet_m2,area_invalid_m2
2000-01-31,1.280700e+06,2700.000000
2000-02-29,0.000000e+00,0.000000
2000-03-31,0.000000e+00,0.000000
2000-04-30,1.269450e+06,217800.000000
2000-05-31,6.345000e+05,145800.000000
...,...,...
2024-09-30,4.247245e+06,2207.142857
2024-10-31,3.927150e+06,1028.571429
2024-11-30,6.292500e+04,7200.000000
2024-12-31,3.765887e+06,2580.000000
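The aggregation above works in two steps: each waterbody is first reduced to a monthly mean, then those monthly means are summed across waterbodies to give a basin total. The toy sketch below reproduces that logic on made-up values. As a substitution made here, it groups on `to_period("M")` rather than `pd.Grouper(freq='M')`, since the month-end offset alias changed in recent pandas releases.

```python
import pandas as pd

# Toy observations for two hypothetical waterbodies (areas in square metres).
obs = pd.DataFrame(
    {
        "uid": ["a", "a", "a", "b", "b"],
        "area_wet_m2": [100.0, 300.0, 500.0, 1000.0, 2000.0],
    },
    index=pd.to_datetime(
        ["2000-01-05", "2000-01-21", "2000-02-06", "2000-01-10", "2000-02-11"]
    ),
)

# Step 1: monthly mean per waterbody.
month = obs.index.to_period("M").rename("month")
per_waterbody = obs.groupby(["uid", month])["area_wet_m2"].mean()

# Step 2: sum the monthly means across waterbodies.
basin_total = per_waterbody.groupby(level="month").sum()
print(basin_total)
```

Taking the mean first avoids counting a waterbody several times in months where it was observed more than once.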


In [41]:
!python -V

Python 3.12.3


In [22]:
enumerate(zambezi_waterbodies.uid[0:100])

TypeError: 'enumerate' object is not subscriptable

In [None]:
dask.compute(*df)

In [None]:
# `df` is a list, which has no `.compute` attribute; compute its items, then concatenate.
pd.concat(dask.compute(*df))

## Heading 1
Use headings to break up key steps/stages of the notebook.

Use markdown text for detailed, descriptive text explaining what the code below does and why it is needed.

> **Note:** Use this markdown format (sparingly) to draw particular attention to an important point or caveat

In [None]:
# Use code comments for low-level documentation of code
a = 1

### Subheading 1
Use subheadings to break up steps within a single section.

In [None]:
# Use code comments for low-level documentation of code
b = 2

## Heading 2
Use markdown text for detailed, descriptive text explaining what the code below does and why it is needed.

In [None]:
# Use code comments for low-level documentation of code
c = 3

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:** 

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')