# Mapping water extent and rainfall using Sentinel-1 and CHIRPS

* **Products used:** 
[s1_rtc](https://explorer.digitalearth.africa/products/s1_rtc),
[rainfall_chirps_monthly](https://explorer.digitalearth.africa/products/rainfall_chirps_monthly)

## Background
The United Nations have prescribed 17 "Sustainable Development Goals" (SDGs). This notebook attempts to monitor SDG Indicator 6.6.1 - change in the extent of water-related ecosystems. Indicator 6.6.1 has 4 sub-indicators:
>    i. The spatial extent of water-related ecosystems <br>
>    ii. The quantity of water contained within these ecosystems <br>
>    iii. The quality of water within these ecosystems <br>
>    iv. The health or state of these ecosystems <br>

This notebook primarily focuses on the first sub-indicator - spatial extents.

Two instructive papers on Lake Chad:

- [The Lake Chad hydrology under current climate change](https://www.nature.com/articles/s41598-020-62417-w)
- [Recent Surface Water Extent of Lake Chad from Multispectral Sensors and GRACE](https://www.mdpi.com/1424-8220/18/7/2082)

## Description

The notebook demonstrates how to:

1. Load Sentinel-1 data over the water body of interest
2. Calculate the water index SWI
3. Resample the SWI time series to monthly maxima
4. Generate an animation of the water extent time-series
5. Calculate and plot a time series of seasonal water extent (in square kilometres)
6. Compare two nominated time-periods, and plot where the water-body extent has changed.
7. Compare the water extent area from SWI to the Rainfall CHIRPS data

***

## Getting started
To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages
Import Python packages that are used for the analysis.

In [None]:
%matplotlib inline

import datacube
import matplotlib.pyplot as plt
import numpy as np
import sys
import xarray as xr
import geopandas as gpd

from IPython.display import Image
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch
from skimage.filters import threshold_li

from deafrica_tools.datahandling import load_ard
from deafrica_tools.plotting import display_map, xr_animation
from deafrica_tools.dask import create_local_dask_cluster
from datacube.utils.aws import configure_s3_access

configure_s3_access(aws_unsigned=True, cloud_defaults=True)

### Connect to the datacube

Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.

In [None]:
dc = datacube.Datacube(app='water_extent')

## Set up a Dask cluster

Dask can be used to better manage memory use and conduct the analysis in parallel. 
For an introduction to using Dask with Digital Earth Africa, see the [Dask notebook](../Beginners_guide/08_parallel_processing_with_dask.ipynb).

>**Note**: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the *Dask dashboard in DE Africa* section of the [Dask notebook](../Beginners_guide/08_parallel_processing_with_dask.ipynb).

To activate Dask, set up the local computing cluster using the cell below.

In [None]:
create_local_dask_cluster()

### Analysis parameters

The following cell sets the parameters, which define the area of interest and the length of time to conduct the analysis over.

The parameters are:

* `vector_file`: The path to the shapefile or geojson that will define the analysis area of the study.
* `time_range` : The date range to analyse (e.g. `('2018', '2020')`).
* `resolution` : The pixel resolution of the satellite data.
* `dask_chunks`: Chunk sizes to use for dask.

**If running the notebook for the first time**, keep the default settings below.
This will demonstrate how the analysis works and provide meaningful results.
The default example covers Lake Chad, using the extent stored in `data/lake_chad_extent.geojson`.

In [None]:
# Define the area of interest from the vector file.
vector_file = 'data/lake_chad_extent.geojson'
gdf = gpd.read_file(vector_file)

# total_bounds is ordered (min x, min y, max x, max y).
bbox = list(gdf.total_bounds)
lon_range = (bbox[0], bbox[2])
lat_range = (bbox[1], bbox[3])

# # Alternatively, define the area of interest from a central point and a buffer.
# lat = 15.3066
# lon = -3.8041

# lat_buffer = 0.075
# lon_buffer = 0.075

# # Combine central lat,lon with buffer to get area of interest.
# lat_range = (lat - lat_buffer, lat + lat_buffer)
# lon_range = (lon - lon_buffer, lon + lon_buffer)

# Define the start year and end year.
time_range = ("2018", "2021")

# Define the resolution to load the datasets in.
resolution = (-20, 20)

# Define the dask chunks to be used.
dask_chunks = {"time": 1, "x": 1500, "y": 1500}
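The bounding-box step above relies on the ordering of `total_bounds`, which is (min x, min y, max x, max y). The sketch below shows the same unpacking on plain numbers; the helper name `bounds_to_ranges` and the example bounds are hypothetical and used only for illustration.

```python
def bounds_to_ranges(bounds):
    """Split (min_x, min_y, max_x, max_y) bounds into x and y ranges."""
    min_x, min_y, max_x, max_y = bounds
    return (min_x, max_x), (min_y, max_y)


# Hypothetical bounds in degrees, for illustration only.
example_bounds = [13.0, 12.2, 15.4, 14.6]
lon_range, lat_range = bounds_to_ranges(example_bounds)
print(lon_range, lat_range)
```

Keeping longitude as `x` and latitude as `y` matches how the ranges are passed to the query later in the notebook.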

## View the area of Interest on an interactive map
The next cell will display the selected area on an interactive map.
The red border represents the area of interest of the study.
Zoom in and out to get a better understanding of the area of interest.
Clicking anywhere on the map will reveal the latitude and longitude coordinates of the clicked point.

In [None]:
display_map(x=lon_range, y=lat_range)

## Load the Sentinel-1 data
The code below will create a query dictionary for our region of interest, and then load the Sentinel-1 satellite data.
For more information on loading data, see the [Loading data notebook](../Beginners_guide/03_Loading_data.ipynb).

In [None]:
# Create a query object.
query = {
    "x": lon_range,
    "y": lat_range,
    "time": time_range,
    "resolution": resolution,
    "output_crs": "EPSG:6933",
    "dask_chunks": dask_chunks,
    "group_by": "solar_day",
}

In [None]:
# Load Sentinel-1 data.
ds_S1 = load_ard(dc=dc, products=["s1_rtc"], measurements=["vv", "vh"], **query)

print(ds_S1)

## Convert the Sentinel-1 digital numbers to dB

While Sentinel-1 backscatter is provided as linear intensity, it is often useful to convert the backscatter to decibels (dB) for analysis. 
Backscatter in dB has a more symmetric noise profile and a less skewed value distribution, which makes statistical evaluation easier.

The Sentinel-1 backscatter data is converted from digital number (DN) to backscatter in decibels (dB) using the function:

\begin{equation}
10 * \log_{10}(\text{DN})
\end{equation}

In [None]:
# Convert DN to dB values.
# np.log10 works directly on xarray objects; xr.ufuncs is not available in all xarray versions.
ds_S1["vv_db"] = 10 * np.log10(ds_S1.vv)
ds_S1["vh_db"] = 10 * np.log10(ds_S1.vh)
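As a small worked example of the conversion, the sketch below applies the same formula to a plain NumPy array. The helper `to_db` is hypothetical (it is not part of `deafrica_tools`), and masking non-positive values as NaN is an assumption made here because the logarithm is undefined for them.

```python
import numpy as np


def to_db(linear):
    """Convert linear backscatter to dB, returning NaN where the input is not positive."""
    linear = np.asarray(linear, dtype=float)
    out = np.full(linear.shape, np.nan)
    valid = linear > 0
    out[valid] = 10 * np.log10(linear[valid])
    return out


# Illustrative linear values: each factor of 10 corresponds to a 10 dB step.
example_db = to_db([1.0, 10.0, 0.1, 0.0])
print(example_db)
```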

## Calculate the SWI water index

The Sentinel-1A water index (SWI) is calculated as follows:

\begin{equation} 
\text{SWI} = 0.1747 * \beta _{vv} + 0.0082 * \beta _{vh} * \beta _{vv} + 0.0023 * \beta _{vv}^{2} - 0.0015 * \beta _{vh}^{2} + 0.1904
\end{equation}

where $\beta_{vh}$ and $\beta_{vv}$ represent the backscattering coefficient in VH polarization and VV polarization, respectively ([Tian et al., 2017](https://doi.org/10.3390/rs9060521)). 

In [None]:
# Calculate the Sentinel-1A Water Index (SWI).
swi = (
    (0.1747 * ds_S1.vv_db)
    + (0.0082 * ds_S1.vh_db * ds_S1.vv_db)
    + (0.0023 * ds_S1.vv_db ** 2)
    - (0.0015 * ds_S1.vh_db ** 2)
    + 0.1904
)

print(swi)
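To get a feel for how the index behaves, the sketch below evaluates the same polynomial for two single pixels. The function name `swi_index` and the dB values are illustrative assumptions (low backscatter standing in for smooth open water, higher backscatter standing in for land); they are not measurements from this dataset.

```python
def swi_index(vv_db, vh_db):
    """Sentinel-1 water index using the coefficients given in the equation above."""
    return (
        0.1747 * vv_db
        + 0.0082 * vh_db * vv_db
        + 0.0023 * vv_db ** 2
        - 0.0015 * vh_db ** 2
        + 0.1904
    )


# Illustrative backscatter values in dB (assumed, not taken from the data).
swi_water_like = swi_index(vv_db=-20.0, vh_db=-25.0)
swi_land_like = swi_index(vv_db=-8.0, vh_db=-14.0)
print(swi_water_like, swi_land_like)
```

With these inputs the water-like pixel scores higher than the land-like pixel, which is why a threshold on SWI can be used to separate the two classes.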

## Resample time series

Factors such as speckle make the data gappy and noisy. Here, we will resample the data to ensure we are working with a consistent time-series.

To do this, we resample the data to monthly time-steps using `max`. This will show us the maximum value of SWI for each month for each pixel, or in other words, the maximum water extent as measured by the SWI index.


In [None]:
%%time
# Resample using the monthly maximum.
print("calculating SWI monthly maximum...")
swi = swi.resample(time="1M").max().compute()
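The idea behind the monthly maximum can be sketched without xarray: group the observations by calendar month and keep the largest value per pixel, ignoring gaps. The helper `monthly_max` and the tiny array below are hypothetical and only illustrate the grouping; the notebook itself uses `resample`.

```python
from datetime import date

import numpy as np


def monthly_max(dates, values):
    """Return sorted (year, month) keys and the per-pixel maximum for each month."""
    values = np.asarray(values, dtype=float)
    months = sorted({(d.year, d.month) for d in dates})
    maxima = []
    for key in months:
        idx = [i for i, d in enumerate(dates) if (d.year, d.month) == key]
        # nanmax skips missing observations within the month.
        maxima.append(np.nanmax(values[idx], axis=0))
    return months, np.stack(maxima)


# Three observations of a 1 x 2 pixel scene; NaN marks a gap.
obs_dates = [date(2019, 1, 3), date(2019, 1, 15), date(2019, 2, 9)]
obs_values = [[[0.1, np.nan]], [[0.5, 0.2]], [[-0.3, 0.4]]]
months, monthly = monthly_max(obs_dates, obs_values)
print(months)
print(monthly)
```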

### Plot the SWI water extent 

In [None]:
# swi.isel(time=1).plot.imshow(cmap="RdBu", figsize=(6,6));

## Determine a threshold to define water extent

There are several ways to determine the threshold. Here, we use the `threshold_li` function implemented in the `skimage` package to determine the threshold from SWI automatically.

In [None]:
threshold = threshold_li(swi.values)
print(threshold)
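For intuition about what the automatic threshold does, the sketch below implements a simplified version of Li's iterative minimum cross-entropy update. It is an illustration only: the function `li_threshold_sketch` is hypothetical, the input is synthetic, and the `skimage` implementation includes additional safeguards, so the two are not guaranteed to return identical values.

```python
import numpy as np


def li_threshold_sketch(values, tolerance=1e-6, max_iter=100):
    """Simplified Li threshold: iterate t = (mb - mf) / (log(mb) - log(mf))."""
    values = np.asarray(values, dtype=float).ravel()
    values = values[np.isfinite(values)]
    if values.min() == values.max():
        return values.min()  # constant input, nothing to separate
    # The update needs non-negative data, so shift the minimum to zero.
    offset = values.min()
    shifted = values - offset
    t = shifted.mean()  # initial guess
    for _ in range(max_iter):
        foreground = shifted > t
        mean_fore = shifted[foreground].mean()
        mean_back = shifted[~foreground].mean()
        if mean_back == 0:
            break
        t_next = (mean_back - mean_fore) / (np.log(mean_back) - np.log(mean_fore))
        converged = abs(t_next - t) < tolerance
        t = t_next
        if converged:
            break
    return t + offset


# Synthetic bimodal values standing in for "not water" and "water" pixels.
rng = np.random.default_rng(0)
swi_like = np.concatenate([rng.normal(0.0, 0.2, 500), rng.normal(3.0, 0.2, 500)])
sketch_threshold = li_threshold_sketch(swi_like)
print(sketch_threshold)
```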

### Visualise threshold

To check whether the chosen threshold reasonably divides the two distributions, we can add the threshold to a histogram plot.


In [None]:
fig, ax = plt.subplots(figsize=(15, 3))
swi.plot.hist(bins=5000, label="SWI")
plt.xlim(-1, 4)
ax.axvspan(xmin=-1, xmax=threshold, alpha=0.25, color="red", label="Not Water")
ax.axvspan(xmin=threshold,
           xmax=4,
           alpha=0.25,
           color="green",
           label="Water")
plt.legend()
plt.xlabel("SWI")
plt.title("Effect of the classifier")
plt.show()

## Calculating the extent of open water

First we need to determine the area each pixel covers, then we apply the automatically determined threshold to delineate water extent.

In [None]:
pixel_length = query["resolution"][1]  # in metres
m_per_km = 1000  # conversion from metres to kilometres
area_per_pixel = pixel_length ** 2 / m_per_km ** 2

In [None]:
water_swi = xr.where(swi > threshold, 1, 0)
ds_valid_water_area_swi = water_swi.sum(dim=["x", "y"]) * area_per_pixel
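The arithmetic in the two steps above can be checked on a toy example: a 20 m pixel covers 20 x 20 = 400 m², which is 0.0004 km², and the water area at each time step is the number of water pixels multiplied by that value. The mask below is made up for illustration.

```python
import numpy as np

pixel_length_m = 20  # same pixel size as the default resolution
area_per_pixel_km2 = pixel_length_m ** 2 / 1000 ** 2

# Toy water mask with dimensions (time, y, x); 1 = water, 0 = not water.
toy_mask = np.array(
    [
        [[1, 1, 0], [0, 1, 0]],
        [[1, 0, 0], [0, 0, 0]],
    ]
)

# Count water pixels per time step, then convert the count to km^2.
toy_area_km2 = toy_mask.sum(axis=(1, 2)) * area_per_pixel_km2
print(area_per_pixel_km2, toy_area_km2)
```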

## Export time-series as csv

In [None]:
# The column is named "swi" by the first rename, so the second rename must use the same key.
ds_valid_water_area_swi.rename("swi").to_dataframe().drop("spatial_ref", axis=1).rename(
    {"swi": "Area of waterbodies from SWI (km2)"}, axis=1
).to_csv(f"results/SWI_water_extent_{time_range[0]}_to_{time_range[1]}.csv")

## Plot a time series of open water area

In [None]:
plt.figure(figsize=(18, 4))
ds_valid_water_area_swi.plot(marker="o", color="#9467bd")
plt.title(f"Observed Area of Water from {time_range[0]} to {time_range[1]} from SWI")
plt.xlabel("Dates")
plt.ylabel("Waterbody area (km$^2$)")
plt.tight_layout()
plt.savefig(f"results/SWI_water_extent_{time_range[0]}_to_{time_range[1]}.png")

## Compare water extent between two periods
 
* `baseline_time` : The baseline date for the analysis (a month-end date in the resampled time series)
* `analysis_time` : The date to compare to the baseline date

In [None]:
baseline_time = "2019-05-31"
analysis_time = "2019-08-31"

Create a new `DataArray` to store the baseline and analysis dates.

In [None]:
time_xr_swi = xr.DataArray([baseline_time, analysis_time], dims=["time"])

## Plotting

Plot water extent of the SWI product for the two chosen periods.

In [None]:
water_swi.sel(time=time_xr_swi).plot(
    col="time",
    col_wrap=2,
    robust=True,
    figsize=(10, 5),
    cmap="viridis",
    add_colorbar=False,
);

## Calculating the change for the two nominated periods
The cell below identifies where water appeared, disappeared, or remained stable between the two periods.

In [None]:
# Extract the baseline and analysis time steps from the water mask.
ds_selected_swi = water_swi.where(water_swi == 1, 0).sel(time=time_xr_swi)

analyse_total_value_swi = ds_selected_swi[1]
change_swi = analyse_total_value_swi - ds_selected_swi[0]

water_appeared_swi = change_swi.where(change_swi == 1)
permanent_water_swi = change_swi.where(
    (change_swi == 0) & (analyse_total_value_swi == 1)
)
permanent_land_swi = change_swi.where(
    (change_swi == 0) & (analyse_total_value_swi == 0)
)
water_disappeared_swi = change_swi.where(change_swi == -1)

The cell below calculates the total area and the area of water gained, water lost, and permanent water.

In [None]:
total_area_swi = analyse_total_value_swi.count().values * area_per_pixel
water_apperaed_area_swi = water_appeared_swi.count().values * area_per_pixel
permanent_water_area_swi = permanent_water_swi.count().values * area_per_pixel
water_disappeared_area_swi = water_disappeared_swi.count().values * area_per_pixel
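The change logic above can be illustrated with two small binary masks. Subtracting the baseline from the analysis mask gives +1 where water appeared, -1 where it disappeared, and 0 where nothing changed; the unchanged pixels are then split into water and land. The helper `classify_change` and the masks are hypothetical.

```python
import numpy as np


def classify_change(baseline, analysis, area_per_pixel_km2):
    """Return the area (km^2) of each change class between two binary water masks."""
    baseline = np.asarray(baseline, dtype=int)
    analysis = np.asarray(analysis, dtype=int)
    change = analysis - baseline
    masks = {
        "appeared": change == 1,
        "disappeared": change == -1,
        "permanent_water": (change == 0) & (analysis == 1),
        "permanent_land": (change == 0) & (analysis == 0),
    }
    return {name: float(mask.sum()) * area_per_pixel_km2 for name, mask in masks.items()}


# Toy masks: 1 = water, 0 = not water.
toy_baseline = [[1, 1, 0], [0, 0, 1]]
toy_analysis = [[1, 0, 1], [0, 1, 1]]
change_areas = classify_change(toy_baseline, toy_analysis, area_per_pixel_km2=0.0004)
print(change_areas)
```

Because every pixel falls into exactly one class, the four areas add up to the total area of the scene, which is a quick consistency check on the results.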

## Plotting
The water variables are plotted to visualise the result.

In [None]:
water_appeared_color = "Green"
water_disappeared_color = "Yellow"
stable_color = "Blue"
land_color = "Brown"

fig, ax = plt.subplots(1, 1, figsize=(10, 10))

ds_selected_swi[1].plot.imshow(
    cmap="Pastel1", add_colorbar=False, add_labels=False, ax=ax
)
water_appeared_swi.plot.imshow(
    cmap=ListedColormap([water_appeared_color]),
    add_colorbar=False,
    add_labels=False,
    ax=ax,
)
water_disappeared_swi.plot.imshow(
    cmap=ListedColormap([water_disappeared_color]),
    add_colorbar=False,
    add_labels=False,
    ax=ax,
)
permanent_water_swi.plot.imshow(
    cmap=ListedColormap([stable_color]), add_colorbar=False, add_labels=False, ax=ax
)

# Provide one legend handle per label so the handles and labels stay aligned.
plt.legend(
    [
        Patch(facecolor=stable_color),
        Patch(facecolor=water_disappeared_color),
        Patch(facecolor=water_appeared_color),
    ],
    [
        f"Water to Water: {round(permanent_water_area_swi, 2)} km$^2$",
        f"Water to No Water: {round(water_disappeared_area_swi, 2)} km$^2$",
        f"No Water to Water: {round(water_apperaed_area_swi, 2)} km$^2$",
    ],
    loc="lower left",
)

plt.title("Change in water extent using SWI: " + baseline_time + " to " + analysis_time);

## CHIRPS

In [None]:
# Define the catchment area.
# catchment_vector_file = 'data/Lake_Chad.geojson'
# gdf_catchment = gpd.read_file(catchment_vector_file)
# catchment_bbox=list(gdf_catchment.total_bounds)
# lon_range = (catchment_bbox[0], catchment_bbox[2])
# lat_range = (catchment_bbox[1], catchment_bbox[3])

In [None]:
# Create the catchment_query dictionary.
catchment_query = {
    "x": lon_range,
    "y": lat_range,
    "time": time_range,
    "resolution": (-5000, 5000),
    "output_crs": "EPSG:6933",
    "dask_chunks": dask_chunks,
}

# Load the Rainfall CHIRPS data.
ds_rf = dc.load(product="rainfall_chirps_monthly", **catchment_query)

print(ds_rf)

In [None]:
# Compare the monthly total precipitation for the catchment area with the open water area from SWI.
fig, ax1 = plt.subplots(figsize=(17, 6))

rf_plot = ds_rf["rainfall"].mean(["y", "x"])

# plt.subplot(2,1,1)
rf_plot.plot(
    marker="^",
    markersize=4,
    linewidth=1,
    ax=ax1,
    linestyle="dashed",
    label="Precipitation",
)

plt.ylabel("%s (%s)" % ("Total Precipitation", ds_rf["rainfall"].attrs["units"]))
plt.title("")

ax2 = ax1.twinx()
ds_valid_water_area_swi.plot(
    color="red", marker="^", markersize=4, linewidth=1, ax=ax2, label="Waterbody Area"
)
plt.title("")
plt.ylabel("Waterbody area (km$^2$)", color="red")
plt.xlabel("")
plt.yticks(color="red")

fig.legend(loc="upper left", bbox_to_anchor=(0.05, 0.93))
fig.suptitle(
    f"Evolution of Lake surface area from SWI, compared to catchment rainfall (CHIRPS) over time from {time_range[0]} to {time_range[1]}"
)
fig.tight_layout();
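The plot above compares the two series visually. One possible way to put a number on that comparison, sketched here as an assumption rather than as part of the original analysis, is to correlate rainfall with water area at several monthly lags. The function `lagged_correlation` and the synthetic series are hypothetical; the water area is constructed to follow rainfall by two months.

```python
import numpy as np


def lagged_correlation(rain, area, max_lag=6):
    """Pearson correlation between rainfall and water area, with area lagging by 0..max_lag steps."""
    rain = np.asarray(rain, dtype=float)
    area = np.asarray(area, dtype=float)
    results = {}
    for lag in range(max_lag + 1):
        # Pair rainfall at step i with water area at step i + lag.
        r = rain if lag == 0 else rain[:-lag]
        a = area if lag == 0 else area[lag:]
        results[lag] = float(np.corrcoef(r, a)[0, 1])
    return results


# Synthetic monthly series: four years of seasonal rainfall, area delayed by two months.
month_index = np.arange(48)
toy_rain = 50 + 40 * np.sin(2 * np.pi * month_index / 12)
toy_area = 1000 + 5 * np.roll(toy_rain, 2)

correlations = lagged_correlation(toy_rain, toy_area)
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations[best_lag])
```

On real data the two series would first need to share the same monthly time steps, and a high correlation at some lag would only suggest, not prove, a rainfall-driven response.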

## Next steps


Return to the "Analysis parameters" section, modify some values (e.g. `vector_file`, `time_range`) and re-run the analysis.
To analyse an area around a central point instead, uncomment the `lat`, `lon` and buffer lines in that section. You can use the interactive map in the "View the area of Interest on an interactive map" section to find new central latitude and longitude values by panning and zooming, and then clicking on the area you wish to extract location values for.
You can also use Google Maps to search for a location you know, then return the latitude and longitude values by clicking the map.

You can also change the dates in the "Compare water extent between two periods" section (`baseline_time`, `analysis_time`) and re-run the analysis.

---

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:**

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')