# Lab 11: Water balance from remote sensing

**Purpose:** In this chapter, you will learn simple water balance calculations using remote-sensing-derived products related to precipitation and evapotranspiration. You will learn to work at the river basin scale and perform time-series analysis using the Earth Engine platform.

In [None]:
%pylab inline

In [None]:
# import ee api and geemap package
import ee
import math
import geemap
import pandas as pd
from geemap import colormaps as cmaps

In [None]:
# try to initialize an ee session
# if not authenticated, run the auth workflow and then initialize
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

## Domain setup

A hydrological system, also referred to as a river basin or drainage basin, is any land area where precipitation collects and drains off into a common outlet. Within a basin, the hydrological processes upstream and downstream are interconnected.

In this case we will use the basin of the Mun River in Thailand. It is a significant tributary system of the Mekong River and supports most of Thailand's agricultural production. The basin contains a good number of small reservoirs for irrigation systems, and some seasonal flooding occurs along the river.

For our time domain, we will use the 11 years from 2010 through 2020 to analyze the monthly water balance. The satellite data records go back further, but for this example we will keep the period moderately constrained.

We begin by defining our area of interest and time domain for analysis:

In [None]:
# import global basin dataset
basins = ee.FeatureCollection("WWF/HydroSHEDS/v1/Basins/hybas_4")

# filter for the Mun river basin in Thailand
basin = basins.filter(ee.Filter.eq("HYBAS_ID",4041108580))

In [None]:
# Visualize the basin
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')

Map.addLayerControl()

Map

Next, we define the time bounds. Because we will be calculating a monthly water balance, we define both the years *and* the months to iterate over.

In [None]:
# set start and end year
start_year = 2010
end_year = 2020

# create two date objects for start and end
# (filterDate treats the end date as exclusive, so use January 1 of the following year)
start_date = ee.Date.fromYMD(start_year, 1, 1)
end_date = ee.Date.fromYMD(end_year + 1, 1, 1)

# make a list with years
years = ee.List.sequence(start_year, end_year)
# make a list with months
months = ee.List.sequence(1, 12)

## Calculating monthly precipitation

Precipitation has been measured for many centuries ([Strangeways 2010](https://doi.org/10.1002/wea.548)). The traditional method is point measurement, which was standardized in the previous century to make measurements comparable in space and time. Although statistical methods exist to calculate area averaged rainfall from weather stations, the limited number of data points remains a constraint, especially in developing countries and sparsely populated regions where the density of weather stations is low. Satellites can fill this information gap (but also come with their own challenges), as they observe the planet at a regular interval with calibrated sensors. 

The Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) data is a quasi-global rainfall dataset ([Funk et al. 2015](https://doi.org/10.1038/sdata.2015.66)) covering more than 35 years. We import the CHIRPS ImageCollection and select the imagery for the relevant dates. Note that we use the pentad time series, in which each image contains the accumulated rainfall for five days. A daily product is also available in Earth Engine, but we use the pentad dataset to reduce the number of computations needed to aggregate the data.


In [None]:
# import the CHIRPS dataset
chirps = (
    ee.ImageCollection("UCSB-CHG/CHIRPS/PENTAD")
    # filter for relevant time period
    .filterDate(start_date,end_date)
)
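To get a feel for how much the pentad product reduces the aggregation work, the sketch below counts pentad images versus daily images for one month. It assumes the usual pentad convention (pentads start on days 1, 6, 11, 16, 21, and 26, with the sixth pentad absorbing the remaining days of the month); the helper name `pentad_windows` is hypothetical and is not part of Earth Engine or CHIRPS.

```python
import calendar

def pentad_windows(year, month):
    """Return (start_day, n_days) for the six pentads of a month.

    Assumption: pentads start on days 1, 6, 11, 16, 21 and 26, and the
    sixth pentad covers whatever days remain in the month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    starts = [1, 6, 11, 16, 21, 26]
    windows = []
    for i, start in enumerate(starts):
        # last day of this pentad: day before the next start, or month end
        end = starts[i + 1] - 1 if i < len(starts) - 1 else days_in_month
        windows.append((start, end - start + 1))
    return windows

windows = pentad_windows(2012, 2)  # February of a leap year
n_daily = sum(n_days for _, n_days in windows)
print(windows)
print(len(windows), "pentad images versus", n_daily, "daily images")
```

Under this assumption, each monthly sum touches six images instead of roughly thirty.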

The year and month lists are used in the function below to calculate the monthly rainfall. We use a server-side nested loop where we first iterate over the years (2010, 2011, … 2020) and then iterate over the months (1, 2, … 12). This returns an image with the total rainfall for each month. We set the year, month, and timestamp (`system:time_start`) for each image and flatten the nested list of images so that it can be converted into a single `ImageCollection`.

In [None]:
# define a nested loop for calculating monthly precip
def monthly_accumulation(yr):
    def _month_loop(mon):
        mondate = ee.Date.fromYMD(yr, mon, 1)
        
        accum = (
            chirps
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .sum()
        )
        return (accum
            .set('year', yr)
            .set('month', mon)
            .set('system:time_start', mondate.millis())
        )

    return months.map(_month_loop)

# apply the monthly accumulation function
# flatten the nested lists
# convert to image collection
monthly_precip = ee.ImageCollection.fromImages(years.map(monthly_accumulation).flatten())
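The nested map-and-flatten pattern used above can be hard to picture because it runs on the server. The following plain-Python sketch mirrors the same structure on the client, using dictionaries in place of images; the record keys and the helper `month_records` are illustrative only.

```python
from datetime import datetime, timezone

start_year, end_year = 2010, 2020
years = range(start_year, end_year + 1)
months = range(1, 13)

def month_records(year):
    """Inner loop: build one record per month of the given year."""
    records = []
    for month in months:
        first_day = datetime(year, month, 1, tzinfo=timezone.utc)
        records.append({
            "year": year,
            "month": month,
            # milliseconds since the Unix epoch, like system:time_start
            "time_start": int(first_day.timestamp() * 1000),
        })
    return records

# outer loop: one list per year (a list of lists)
nested = [month_records(year) for year in years]
# flatten the list of lists into a single list of monthly records
flat = [record for year_list in nested for record in year_list]

print(len(nested), len(flat))  # 11 132
print(flat[0])
```

The flattened list has one entry per month of the study period, which is what `ee.ImageCollection.fromImages` receives after the `flatten()` call.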


In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')
# add layer with monthly mean. note that we clip for the basin of interest
Map.addLayer(monthly_precip.mean().clip(basin),{"min":50,"max":200,"palette":cmaps.get_palette("YlGnBu")},"Mean Monthly Precipitation");

Map.addLayerControl()

Map


## Calculating ET

Different methods exist to map ET from remote sensing data, including simple empirical models that relate spectral reflectance with ET, vegetation index models, energy budget, and deterministic models ([Courault et al. 2005](https://doi.org/10.1007/s10795-005-5186-0)).

Several ET products derived from remotely sensed data are readily available. We will use the MODIS ET product (MOD16), which is available as an asset on Earth Engine. The MOD16 algorithm ([Mu et al., 2011](https://doi.org/10.1016/j.rse.2011.02.019)) is based on the logic of the Penman-Monteith equation, which uses daily meteorological reanalysis data and eight-day remotely sensed vegetation property dynamics from MODIS as inputs.


In [None]:
# import the mod16 dataset
mod16 = (
    ee.ImageCollection("MODIS/006/MOD16A2")
    .select("ET")
    # filter for relevant time period
    .filterDate(start_date,end_date)
)

In [None]:
# define a nested loop for calculating monthly ET
def monthly_evapotranspiration(yr):
    def _month_loop(mon):
        mondate = ee.Date.fromYMD(yr, mon, 1)

        accum = (
            mod16
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .sum()
            .multiply(0.1) # apply scale factor to data
        )
        return (accum
            .set('year', yr)
            .set('month', mon)
            .set('system:time_start', mondate.millis())
        )

    return months.map(_month_loop)

# apply the monthly accumulation function
# flatten the nested lists
# convert to image collection
monthly_evap = ee.ImageCollection.fromImages(years.map(monthly_evapotranspiration).flatten())
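As a small worked example of the aggregation above, the sketch below sums a handful of raw composite values for one pixel and applies the same 0.1 scale factor. The raw values are made up for illustration. Note that, because the images are filtered by their start time, a composite that straddles a month boundary is counted in the month in which it starts, so the monthly totals are approximate.

```python
SCALE_FACTOR = 0.1  # same factor as applied in the cell above

# hypothetical raw ET values for the composites that start within one month
raw_composites = [312, 298, 305, 290]

# sum the composites first, then scale to physical units (mm)
monthly_et_mm = sum(raw_composites) * SCALE_FACTOR
print(round(monthly_et_mm, 1))
```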

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')
# add layer with monthly mean. note that we clip for the basin of interest
Map.addLayer(monthly_precip.mean().clip(basin),{"min":50,"max":200,"palette":cmaps.get_palette("YlGnBu")},"Mean Monthly Precipitation");
Map.addLayer(monthly_evap.mean().clip(basin),{"min":10,"max":100,"palette":"red,orange,yellow,lightblue,blue"},"Mean Monthly Evapotranspiration");


Map.addLayerControl()

Map

## Calculating runoff

Runoff is an important part of water systems as it is the lateral transport of water over time. It is practically impossible to calculate from remote sensing observations alone and is typically estimated using models. With satellite missions such as the [Surface Water Ocean Topography (SWOT) mission](https://swot.jpl.nasa.gov/mission/overview/), we will be able to observe discharge within the river channel. Overland runoff, however, will still evade us...


For this exercise, we are going to cheat and bring in modeled runoff data, since we cannot acquire runoff from remote sensing observations:


In [None]:
# import the ERA5-Land monthly aggregated model data
era5_q = (
    ee.ImageCollection("ECMWF/ERA5_LAND/MONTHLY")
    # select only the runoff band
    .select("runoff")
    # filter for relevant time period
    .filterDate(start_date,end_date)
)

In [None]:
# define a nested loop for calculating monthly runoff
def monthly_runoff(yr):
    def _month_loop(mon):
        # get date info
        mondate = ee.Date.fromYMD(yr, mon, 1)
        # calculate number of days in given month
        # needed to convert average daily runoff to total runoff for the month
        ndays = mondate.advance(1, "month").difference(mondate, "day")
        # get runoff for month and calculate the total
        accum = (
            era5_q
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .first()
            .multiply(1000) # convert m to mm
            .multiply(ndays)
        )
        return (accum
            .set('year', yr)
            .set('month', mon)
            .set('system:time_start', mondate.millis())
        )

    return months.map(_month_loop)

# apply the monthly accumulation function
# flatten the nested lists
# convert to image collection
monthly_q = ee.ImageCollection.fromImages(years.map(monthly_runoff).flatten())
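The unit conversion in the runoff function can be checked by hand. The sketch below reproduces it for a single value, assuming (as the code above does) that the runoff band holds a mean daily depth in meters; the input value and the helper name `monthly_runoff_mm` are hypothetical.

```python
import calendar

def monthly_runoff_mm(mean_daily_runoff_m, year, month):
    """Convert a mean daily runoff depth (m) to a monthly total (mm)."""
    ndays = calendar.monthrange(year, month)[1]  # days in the month
    return mean_daily_runoff_m * 1000 * ndays    # m -> mm, then day -> month

# hypothetical value: 2 mm/day on average during February 2012 (29 days)
total = monthly_runoff_mm(0.002, 2012, 2)
print(round(total, 2))
```

A mean of 2 mm per day over a 29-day month gives a total of about 58 mm, which is directly comparable to the monthly precipitation and ET totals.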

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')
# add layer with monthly mean. note that we clip for the basin of interest
Map.addLayer(monthly_precip.mean().clip(basin),{"min":75,"max":200,"palette":cmaps.get_palette("YlGnBu")},"Mean Monthly Precipitation");
Map.addLayer(monthly_evap.mean().clip(basin),{"min":10,"max":100,"palette":"red,orange,yellow,lightblue,blue"},"Mean Monthly Evapotranspiration");
Map.addLayer(monthly_q.mean().clip(basin),{"min":0,"max":100,"palette":cmaps.get_palette("Blues")},"Mean Monthly Runoff");


Map.addLayerControl()

Map

## Monthly water balance

The water balance is the key concept in understanding the availability of water resources in a hydrological system. The water balance includes both input and extractions of water and can be defined by the following equation:

$P - (E + Q) = \Delta S$

Inputs to the hydrological system are defined by precipitation (P; rainfall and snow). Extractions from the system are runoff (Q) and evapotranspiration (E), with evapotranspiration denoting the sum of evaporation from the land surface plus transpiration from plants. Changes in groundwater and soil storage are indicated by ΔS.
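As a quick numeric illustration of the equation above, the sketch below evaluates the storage change for two made-up months, one wet and one dry. All values are hypothetical and in millimeters.

```python
def storage_change(p_mm, et_mm, q_mm):
    """Monthly storage change: precipitation minus (ET plus runoff)."""
    return p_mm - (et_mm + q_mm)

# hypothetical wet-season month: inputs exceed extractions
wet = storage_change(p_mm=250.0, et_mm=110.0, q_mm=60.0)
# hypothetical dry-season month: extractions exceed inputs
dry = storage_change(p_mm=10.0, et_mm=45.0, q_mm=5.0)

print(wet, dry)  # 80.0 -40.0
```

A positive value means water is added to storage during the month, and a negative value means storage is being drawn down.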



We have calculated monthly precipitation, evapotranspiration, and runoff. Now we can calculate the change in storage for each month using the straightforward equation above:

In [None]:
# define a nested loop for calculating the monthly water balance
def monthly_waterbalance(yr):
    def _month_loop(mon):
        # get date info
        mondate = ee.Date.fromYMD(yr, mon, 1)

        # extract out precipitation for month
        p = (
            monthly_precip
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .first()
            .resample() # change resample method to bilinear for smooth results
        )

        # extract out ET for month
        et = (
            monthly_evap
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .first()
        )

        # extract out runoff for month
        q = (
            monthly_q
            .filter(ee.Filter.calendarRange(yr, yr, 'year'))
            .filter(ee.Filter.calendarRange(mon, mon, 'month'))
            .first() 
            .resample() # change resample method to bilinear for smooth results
        )

        # calculate storage
        storage = p.subtract(et.add(q)).rename("storage")

        return (storage
            .set('year', yr)
            .set('month', mon)
            .set('system:time_start', mondate.millis())
        )

    return months.map(_month_loop)

# apply the monthly water balance function
# flatten the nested lists
# convert to image collection
monthly_storage = ee.ImageCollection.fromImages(years.map(monthly_waterbalance).flatten())

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')
# add layer with monthly mean. note that we clip for the basin of interest
Map.addLayer(monthly_precip.mean().clip(basin),{"min":75,"max":200,"palette":cmaps.get_palette("cubehelix")},"Mean Monthly Precipitation");
Map.addLayer(monthly_evap.mean().clip(basin),{"min":10,"max":100,"palette":"red,orange,yellow,lightblue,blue"},"Mean Monthly Evapotranspiration");
Map.addLayer(monthly_q.mean().clip(basin),{"min":0,"max":100,"palette":cmaps.get_palette("YlGnBu")},"Mean Monthly Runoff");
Map.addLayer(monthly_storage.mean().clip(basin),{"min":-75,"max":75,"palette":cmaps.get_palette("RdYlBu")},"Mean Monthly Storage");


Map.addLayerControl()

Map

### Water balance timeseries
A map of water storage often doesn't tell the whole story, because many changes occur within a year and between years. Here we calculate the basin-average water balance for each month to see how it changes in time:

In [None]:
def storage_timeseries(image):
    # reduction function
    ds = image.reduceRegion(
        reducer = ee.Reducer.mean(),
        geometry = basin.geometry(),
        scale = 500
    )

    # set the result as a metadata property in the image
    return image.set(ds)

# apply the function and keep only images with a valid basin mean
# (images that were entirely masked have no "storage" property)
monthly_storage_ts = monthly_storage.map(storage_timeseries).filter(ee.Filter.notNull(["storage"]))
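Conceptually, the basin mean computed above is an average over all unmasked pixels in the basin, and an image with no valid pixels yields no value at all. The plain-Python sketch below illustrates that idea under the assumption that each pixel carries a weight (for example, the fraction of the pixel inside the basin); `basin_mean` is a hypothetical helper and the numbers are made up.

```python
def basin_mean(values, weights):
    """Weighted mean that ignores masked (None) pixels.

    Returns None when every pixel is masked, which mirrors an image that
    ends up without a usable basin-average value.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight

# four pixels: one masked, one only half inside the basin
mean_value = basin_mean([10.0, None, 30.0, 20.0], [1.0, 1.0, 0.5, 1.0])
all_masked = basin_mean([None, None], [1.0, 1.0])
print(mean_value, all_masked)  # 18.0 None
```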


In [None]:
# extract out the timeseries information from the collection
timeseries = monthly_storage_ts.aggregate_array("storage").getInfo()
timestamp = monthly_storage_ts.aggregate_array("system:time_start").getInfo()

In [None]:
# convert the data into a pandas Series indexed by date
# system:time_start is in milliseconds since the Unix epoch
dates = pd.to_datetime(timestamp, unit="ms")
storage_series = pd.Series(timeseries, index=dates, name="d Storage")
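The timestamps returned from Earth Engine are plain integers, so it helps to verify the conversion on a couple of known values. This sketch uses only the standard library and two hand-picked timestamps (midnight UTC on 1 January and 1 February 2010).

```python
from datetime import datetime, timezone

# milliseconds since the Unix epoch, as stored in system:time_start
timestamps_ms = [1262304000000, 1264982400000]

# divide by 1000 to get seconds, and keep everything in UTC
dates = [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for ms in timestamps_ms]
labels = [d.strftime("%Y-%m") for d in dates]
print(labels)  # ['2010-01', '2010-02']
```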

In [None]:
# plot the basin-average storage change as a bar chart
ax = storage_series.plot.bar(figsize=(10,7));

# label only the January bars to keep the x-axis readable
ax.set_xticklabels([x.strftime("%Y-%m") if x.month == 1 else "" for x in storage_series.index ], rotation=45);
ax.set_ylabel("Average Storage [mm]");


## Trends in storage

As we can see in the plot above, there is strong seasonality in the water balance. It is often beneficial to calculate trends in the water balance (or other parameters), but this is difficult in the presence of a seasonal signal. One way to handle this is [seasonal adjustment](https://en.wikipedia.org/wiki/Seasonal_adjustment). A simple way to correct for a seasonal component is to use differencing; here we subtract the long-term mean of each calendar month from every image.

In [None]:
# define a function to calculate the monthly mean from all years
def calc_monthly_mean(month):
    month_mean = (monthly_storage
        .filter(ee.Filter.calendarRange(month, month, 'month'))
        .mean()
    )
    return month_mean.set("month", month)

# apply function
monthly_mean_storage = ee.ImageCollection.fromImages(months.map(calc_monthly_mean))

In [None]:
# preprocessing function to remove seasonality and add time bands
def trend_preprocess(image):
    month = image.get("month")

    # get the monthly mean corresponding to image 
    month_mean = monthly_mean_storage.filter(ee.Filter.eq("month",month)).first()

    # get the time value as fractional years since 1970-01-01
    date_val = image.date().difference(ee.Date("1970-01-01"),"year")

    return (image.subtract(month_mean) # remove seasonal mean
        .addBands(ee.Image.constant(date_val).float().rename("t")) # add time band
        .copyProperties(image,["system:time_start","month","year"]) # set time property
    )

# apply function to calculate deviation from seasonal mean
monthly_storage_deviation = monthly_storage.map(trend_preprocess)
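The same deseasonalizing step can be sketched on a plain list of values, which makes the arithmetic easy to check. The series below is synthetic, and `remove_monthly_mean` is a hypothetical helper rather than part of the workflow above.

```python
from collections import defaultdict

def remove_monthly_mean(records):
    """Subtract each calendar month's long-term mean from its values.

    records: list of (month, value) tuples in time order.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    # long-term mean per calendar month (the seasonal component)
    climatology = {m: sum(vals) / len(vals) for m, vals in by_month.items()}
    return [(month, value - climatology[month]) for month, value in records]

# synthetic series: two Januaries (dry) and two Augusts (wet)
series = [(1, -40.0), (8, 90.0), (1, -30.0), (8, 110.0)]
anomalies = remove_monthly_mean(series)
print(anomalies)  # [(1, -5.0), (8, -10.0), (1, 5.0), (8, 10.0)]
```

After the subtraction, the large wet-season versus dry-season contrast is gone and only the departures from a typical January or August remain.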

Another way to extract the monthly means corresponding to the images is to apply a [join](https://developers.google.com/earth-engine/guides/joins_save_all) to the collection. However, we would still need to map a function over the result to remove the seasonal signal, as the join in this case only identifies which monthly mean image corresponds to each image.

Now that we have our image collection that has the monthly means removed we can get our time series to see how far off each month is from expected. We will use the same function from above that calculates the basin average values:

In [None]:
# apply function that applies reducer and sets time series info
# keep only images with a valid basin mean
monthly_storage_deviation_ts = monthly_storage_deviation.map(storage_timeseries).filter(ee.Filter.notNull(["storage"]))

In [None]:
# extract out the timeseries information from the collection
timeseries = monthly_storage_deviation_ts.aggregate_array("storage").getInfo()
timestamp = monthly_storage_deviation_ts.aggregate_array("system:time_start").getInfo()

In [None]:
# convert the data into a pandas Series indexed by date
# system:time_start is in milliseconds since the Unix epoch
dates = pd.to_datetime(timestamp, unit="ms")
storage_deviation_series = pd.Series(timeseries, index=dates, name="d Storage")

In [None]:
ax = storage_deviation_series.plot.bar(figsize=(10,7));

ax.set_xticklabels([x.strftime("%Y-%m") if x.month == 1 else "" for x in storage_deviation_series.index ], rotation=45);
ax.set_ylabel("Storage from normal [mm]");

Just for fun, let's calculate the storage trends in space to see where within the basin we expect to see increases or decreases in storage. We will use a [Sen's slope](https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator) reducer to calculate the trend in time. (Note: Sen's slope is a popular robust linear trend estimator for time series.)

To apply the Sen's slope reducer in Earth Engine, we select the time band and the band we are trying to estimate a trend for, then call the reducer:

In [None]:
# apply Sen's slope reduction in time
# the reducer expects the independent variable (t) first, then the dependent variable
trend_coefs = monthly_storage_deviation.select(["t","storage"]).reduce(ee.Reducer.sensSlope())
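For intuition about what a Sen's slope estimate is, the sketch below implements the textbook Theil-Sen estimator for a single time series: the slope is the median of the slopes between all pairs of points. The intercept here is taken as the median of `y - slope * t`, which is one common choice and is an assumption; it is not necessarily how Earth Engine defines its "offset" band. The function name and data are illustrative.

```python
from itertools import combinations
from statistics import median

def sens_slope(t, y):
    """Theil-Sen estimate: median of all pairwise slopes, plus an intercept."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]  # skip pairs with identical time values
    ]
    slope = median(slopes)
    # assumption: intercept as the median residual about the fitted slope
    offset = median(yi - slope * ti for ti, yi in zip(t, y))
    return slope, offset

# a line y = 2t + 1 with one large outlier in the last value
t = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 40]
slope, offset = sens_slope(t, y)
print(slope, offset)  # 2.0 1.0
```

The outlier barely affects the result, which is why this estimator is attractive for noisy hydrological time series.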

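This will result in an image with two bands: "slope" and "offset". The slope tells us how the monthly storage deviation changes per unit of the time band, which here is per year because `t` is expressed in years. Before adding the slope to a map, we also compute the mean storage deviation for an example wet period and an example dry period for comparison: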

In [None]:
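# mean storage deviation for an example wet period (May through October 2011)
wet_yr = monthly_storage_deviation.filterDate("2011-05-01","2011-11-01").mean()
# mean storage deviation for an example dry period (November 2014 through April 2015)
dry_yr = monthly_storage_deviation.filterDate("2014-11-01","2015-05-01").mean()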

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(basin,7)

Map.addLayer(basin, {}, 'AOI')
# add layer with monthly mean. note that we clip for the basin of interest
Map.addLayer(monthly_storage.mean().clip(basin),{"min":-75,"max":75,"palette":cmaps.get_palette("RdYlBu")},"Mean Monthly Storage");

Map.addLayer(wet_yr.clip(basin),{"bands":"storage", "min":-75,"max":75,"palette":cmaps.get_palette("RdYlBu")},"Wet Yr Mean Monthly Storage");
Map.addLayer(dry_yr.clip(basin),{"bands":"storage", "min":-25,"max":25,"palette":cmaps.get_palette("RdYlBu")},"Dry Yr Mean Monthly Storage");

Map.addLayer(trend_coefs.clip(basin),{"bands":"slope","min":-3,"max":3,"palette":cmaps.get_palette("plasma")},"Monthly Storage Trend");



Map.addLayerControl()

Map