# Lab 10: Remote sensing of water quality

**Purpose:** The purpose of this lab is to walk through an example of using remote sensing datasets for water quality monitoring. Students will gain experience using Earth Engine for spatial-temporal data sampling as well as regression analysis for estimating a water quality parameter.

In [None]:
%pylab inline

In [None]:
# import ee api and geemap package
import ee
import math
import geemap
import numpy as np
import pandas as pd
from geemap import colormaps as cmaps

In [None]:
# try to initialize an ee session
# if not authenticated then run auth workflow and initialize
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

## Background

Remote sensing tools can provide spatial and temporal resolution for monitoring water quality in reservoirs and large rivers that is not available from traditional in situ measurements. The retrieval of water quality parameters from remote sensing systems relies on the optical properties (transmittance, absorption, and scattering) of water and of the dissolved and suspended constituents in the water. Suspended solids are responsible for most of the scattering in an aquatic system, whereas chlorophyll-a (chl-a) and colored dissolved matter are mainly responsible for absorption ([Myint and Walker, 2002](https://doi.org/10.1080/01431160110104700)). There is a wide body of research on assessing water quality through analytical optical modeling using in situ inherent optical properties ([Cox et al., 2009](https://doi.org/10.1080/07438149809354347)). Methods used to relate in situ data to satellite observations through statistical relationships include simple linear regression, non-linear regression, principal component analysis, and neural networks.

In this notebook, we will explore a straightforward empirical approach that uses a linear regression to estimate a water quality parameter for lakes and reservoirs in Utah.

## Estimating water quality

For this example, we are going to estimate [Secchi depth](https://en.wikipedia.org/wiki/Secchi_disk) from remote sensing data. Secchi depth is an optical property of water that is related to turbidity and other water quality parameters ([Lavender et al., 2017](https://doi.org/10.1371/journal.pone.0186092)). Of course, to estimate this parameter we need field measurements that we can relate to what is observed by satellites. The data used in this notebook were collected from the [Water Quality Data Viewer App](https://tethys-staging.byu.edu/apps/lake/data/) and uploaded to Earth Engine.

To start, we will do some pre-processing on the water quality sample table and the remote sensing data:

In [None]:
# load in our data set
utah_secchi = ee.FeatureCollection("users/kmarkert/BYUCE594/Utah_Lake_Secchi_depth")
gsl_secchi = ee.FeatureCollection("users/kmarkert/BYUCE594/GSL_Secchi_depth")
dc_secchi = ee.FeatureCollection("users/kmarkert/BYUCE594/Deer_Creek_Secchi_depth")

wq_samples = utah_secchi.merge(gsl_secchi).merge(dc_secchi)

In [None]:
print(f"Total number of samples: {wq_samples.size().getInfo()}")

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(wq_samples, 9); 

Map.addLayer(wq_samples,{},"Water Quality Samples")

Map.addLayerControl()

Map

In [None]:
# define function to format date information for samples
def format_sample_date(feature):
    # extract date info and convert to milliseconds since 1970
    collection_time = ee.Date.parse("MM-dd-YYYY", feature.get("Activity Start Date")).millis()
    return feature.set("collection_date",collection_time)

wq_samples = wq_samples.map(format_sample_date)
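For intuition, the conversion that `format_sample_date` requests from the server can be mimicked on the client with the standard library. This is only a sketch: the helper name `to_epoch_millis` and the example date string are hypothetical, and it assumes the dates are interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(date_string):
    """Convert a month-day-year string to milliseconds since 1970 (UTC assumed)."""
    parsed = datetime.strptime(date_string, "%m-%d-%Y").replace(tzinfo=timezone.utc)
    # integer division by one millisecond avoids floating point rounding
    return (parsed - EPOCH) // timedelta(milliseconds=1)

# hypothetical sample date, not taken from the dataset
print(to_epoch_millis("07-15-2015"))
```

Storing the date as a plain number makes it straightforward to filter samples by an exact date later on.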

In [None]:
# display the information from first feature
wq_samples.first().getInfo()

Next we need remote sensing data. For this example we will use Landsat, as it provides a long time series of satellite observations. We will have to do some preprocessing before using the imagery and then combine the collections from the different sensors.

In [None]:
# load historical surface water occurrence information to constrain water masks
water_occurrence = ee.Image("JRC/GSW1_3/GlobalSurfaceWater").select("occurrence")
water_constrain = water_occurrence.gt(50)

In [None]:
# QA mask function
def qa_mask(image):
    # Bits 3, 4, and 5 are cloud shadow, snow, and cloud, respectively.
    cloudShadowBitMask = 1 << 3
    cloudsBitMask = 1 << 5
    snowBitMask = 1 << 4

    # get the pixel QA band
    qa = image.select("pixel_qa")

    # apply each bit mask; a result of 0 means the flag is not set
    cloud_shadow_qa = qa.bitwiseAnd(cloudShadowBitMask).eq(0)
    snow_qa = qa.bitwiseAnd(snowBitMask).eq(0)
    cloud_qa = qa.bitwiseAnd(cloudsBitMask).eq(0)

    # bit 2 flags water
    waterBitMask = 1 << 2
    # keep pixels flagged as water and constrain to where we know there is water
    water_qa = qa.bitwiseAnd(waterBitMask).neq(0).updateMask(water_constrain)

    # combine qa mask layers to one final mask
    mask = cloud_shadow_qa.And(snow_qa).And(cloud_qa).And(water_qa)

    # apply mask and return original image
    return image.updateMask(mask)
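
The bit tests in `qa_mask` can be tried on plain integers to see what each flag does. The following is a minimal client-side sketch, assuming the same bit positions as the function above; the helper name `is_clear_water` and the example values are hypothetical, and the historical water occurrence constraint is left out.

```python
# bit positions as used in qa_mask
WATER_BIT = 1 << 2
CLOUD_SHADOW_BIT = 1 << 3
SNOW_BIT = 1 << 4
CLOUD_BIT = 1 << 5

def is_clear_water(qa_value):
    """Return True when the water bit is set and no cloud, shadow, or snow bit is set."""
    flagged = qa_value & (CLOUD_SHADOW_BIT | SNOW_BIT | CLOUD_BIT)
    is_water = (qa_value & WATER_BIT) != 0
    return is_water and flagged == 0

# hypothetical QA values built directly from the bit positions
for value in (WATER_BIT, WATER_BIT | CLOUD_BIT, SNOW_BIT):
    print(f"{value:08b} -> {is_clear_water(value)}")
```

A pixel is kept only when every condition holds, which mirrors the chain of `And` calls in the server-side mask.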


In [None]:
# load in the Landsat 5 collection
l5_collection = (
    ee.ImageCollection('LANDSAT/LT05/C01/T1_SR')
    # filter by sample locations
    .filterBounds(wq_samples)
    # apply qa mask
    .map(qa_mask)
    # select the spectral bands and rename
    .select(
        ["B1","B2","B3","B4","B5","B7"],
        ["blue","green","red","nir","swir1","swir2"]
    )
)

In [None]:
# load in the Landsat 7 collection
l7_collection = (
    ee.ImageCollection('LANDSAT/LE07/C01/T1_SR')
    # filter by sample locations
    .filterBounds(wq_samples)
    # apply qa mask
    .map(qa_mask)
    # select the spectral bands and rename
    .select(
        ["B1","B2","B3","B4","B5","B7"],
        ["blue","green","red","nir","swir1","swir2"]
    )
)

In [None]:
# load in the Landsat 8 collection
l8_collection = (
    ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    # filter by sample locations
    .filterBounds(wq_samples)
    # apply qa mask
    .map(qa_mask)
    # select the spectral bands and rename
    .select(
        ["B2","B3","B4","B5","B6","B7"],
        ["blue","green","red","nir","swir1","swir2"]
    )
)

In [None]:
# merge all of the collections together for long time series
ls_collection = l5_collection.merge(l7_collection).merge(l8_collection)

In [None]:
ls_composite = ls_collection.median()
ls_count = ls_collection.select("blue").count()
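
To make the two reductions concrete, here is a small per-pixel sketch of what a median composite and an observation count summarize. It is an illustration only: the function name `composite_pixel` and the values are hypothetical, and `None` stands in for a masked observation.

```python
from statistics import median

def composite_pixel(values):
    """Return (median, count) of the unmasked values for one pixel."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None, 0
    return median(valid), len(valid)

# hypothetical time series of "blue" values for one pixel
print(composite_pixel([310, None, 290, 305, None]))
```

Masked observations contribute to neither the median nor the count, which is why the count layer shows how many usable observations each pixel has.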

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(wq_samples, 9); 

Map.addLayer(ls_composite, {"bands":"red,green,blue", "min": 0, "max": 3300,"gamma":1.3}, 'Landsat Composite');
Map.addLayer(ls_count, {"min": 0, "max": 1000,"palette":cmaps.get_palette("magma")}, 'Observation Count');

Map.addLayer(wq_samples,{},"Water Quality Samples")

Map.addLayerControl()

Map

### Sampling coincident data

Here we are going to do the unthinkable: use a for loop! But we are going to try to be smart about how we set this up. First we identify all unique dates on which samples were collected (this limits the number of iterations), and then we sample from our imagery using all samples from each date. Additionally, the loop runs on the client side and wraps a separate request for each date, so the iteration itself does not happen on the server side.

In [None]:
# get list of unique dates for samples
dates = (
    wq_samples
    .aggregate_array("collection_date") # get the date information 
    .map(lambda x: ee.Date(x).format("YYYY-MM-dd")) # convert the date object to ISO string
    .distinct() # only get unique date values
)

In [None]:
# get how many dates we have to loop through
n = dates.size().getInfo()

print(f"Number of collection dates: {n}")

We must be careful when sampling with this approach: it is very easy to get a `Maximum recursion depth exceeded` error, which means that too many requests were nested. To avoid the error, we can set the recursion limit higher than the default system setting:

In [None]:
import sys
sys.setrecursionlimit(3000)
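
Why the nesting grows can be seen with a toy model. The sketch below is not Earth Engine code and does not reproduce how the client library serializes requests; `merge` and `depth` are hypothetical stand-ins that only show how merging inside a loop wraps each new request around the previous one.

```python
def merge(left, right):
    """Toy stand-in for a lazy merge: just records its two inputs."""
    return ("merge", left, right)

def depth(node):
    """Nesting depth of the toy expression tree."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(depth(child) for child in node[1:])

# merging inside a loop nests each new request inside the previous one
nested = "empty"
n_dates = 50  # hypothetical number of sampling dates
for i in range(n_dates):
    nested = merge(nested, f"samples_{i}")

print(depth(nested))
```

In this toy model the depth grows by one with every iteration, so a long list of dates produces a deeply nested structure that a recursive traversal has to walk.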

Now we are ready to run the sampling:

In [None]:
# create an empty featurecollection to append samples to
rs_wq_samples = ee.FeatureCollection([])

# coincident tolerance in days
# controls how many days on either side of sample collection to check for RS data
tolerance = 1

# start looping over dates
for i in range(n):
    collection_date = ee.Date(dates.get(i))

    # get time bounds to filter imagery
    t1 = collection_date.advance(-tolerance,"day")
    t2 = collection_date.advance(tolerance+1,"day")

    # get the samples from the date of interest
    samples_date = wq_samples.filter(ee.Filter.eq("collection_date",collection_date.millis()))

    # filter imagery for date and mosaic
    sample_img = ls_collection.filterDate(t1,t2).mosaic()

    # sample pixels using the sample points
    spectra_samples = sample_img.sampleRegions(
        collection=samples_date,
        scale = 30, 
        tileScale = 4, 
        geometries = True
    )

    # append samples from date to larger collection
    rs_wq_samples = rs_wq_samples.merge(spectra_samples)

# keep only samples that have a valid (non-null) spectral observation
rs_wq_samples = rs_wq_samples.filter(ee.Filter.notNull(["blue"]))
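
The time window built in the loop can be checked with plain dates. This is a small sketch under the assumption that the end of a date filter is exclusive, as it is for `filterDate`; the function name `coincident_window` and the example date are hypothetical.

```python
from datetime import date, timedelta

def coincident_window(collection_date, tolerance):
    """Return the (start, end) dates used to filter imagery; end is exclusive."""
    start = collection_date - timedelta(days=tolerance)
    end = collection_date + timedelta(days=tolerance + 1)
    return start, end

# hypothetical sampling date
start, end = coincident_window(date(2015, 7, 15), tolerance=1)
n_days = (end - start).days
print(start, end, n_days)
```

With a tolerance of one day the window spans the day before, the day of, and the day after sample collection. The extra `+1` on the end date is what keeps the last day inside the exclusive bound.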

Because sampling is a computationally expensive process (we have to pre-process all of the imagery and then find coincident observations), we typically export this intermediate result to load in later. We can theoretically continue using the sampled collection in interactive mode, but it is very likely we will run into a `Too many concurrent aggregations` error.

To export as an asset and as a CSV in Google Drive, we can run the following code:

In [None]:
# Export to asset code
userid = geemap.ee_user_id()
asset_task = ee.batch.Export.table.toAsset(
    collection = rs_wq_samples,
    description = "UT_Lake_WQ_LS_Samples",
    assetId = f"{userid}/UT_Lake_WQ_LS_Samples"
)
asset_task.start()

# Export to drive code
drive_task = ee.batch.Export.table.toDrive(
    collection = rs_wq_samples,
    description = "UT_Lake_WQ_LS_Samples"
)
drive_task.start()

### Statistical analysis

We are going to use Earth Engine to perform some *basic* statistical analysis for estimating Secchi depth. Ideally you would perform a more robust analysis outside of EE (which is why we exported to Drive). However, this provides an example of doing the analysis with Earth Engine:

In [None]:
# read in a pre-exported collection to make computations run quicker
table = ee.FeatureCollection("users/kmarkert/UT_Lake_WQ_LS_Samples")

In [None]:
# define which columns to test correlations with
x_cols = ee.List(["blue","green","red","nir"])
y_col = "Result Value"

In [None]:
# define a function to calculate correlations between different
# predictor variables and the response
def get_correlation(x):
    r = table.reduceColumns(ee.Reducer.pearsonsCorrelation(),[x,y_col])
    return r.get("correlation")

# get correlations
cor_list = x_cols.map(get_correlation)

In [None]:
cor_list.getInfo()

In [None]:
# determine which column has the best correlations
max_cor = ee.Array(cor_list).abs().argmax().get(0)
x_col = ee.String(x_cols.get(max_cor))
print(f"Best correlated: '{x_col.getInfo()}'")
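
The band selection step can be reproduced on the client to see what the reducer and the `abs().argmax()` chain compute. The sketch below uses synthetic numbers, not values from the sample table, and the helper names `pearson` and `best_predictor` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_predictor(columns, response):
    """Pick the column whose correlation with the response is largest in magnitude."""
    correlations = {name: pearson(values, response) for name, values in columns.items()}
    best = max(correlations, key=lambda name: abs(correlations[name]))
    return best, correlations

# synthetic values for illustration only
secchi = [0.5, 1.0, 1.5, 2.0]
bands = {
    "blue": [400, 380, 390, 370],
    "green": [900, 700, 500, 300],
}
best, correlations = best_predictor(bands, secchi)
print(best, correlations)
```

Taking the absolute value matters because a strongly negative correlation is just as useful for prediction as a strongly positive one.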

In [None]:
# define function to add predictor/response variables 
# to the featurecollection
def add_vars(feature):
    # extract out cols and apply log transform
    y = ee.Number(feature.get(y_col)).log10()
    x = ee.Number(feature.get(x_col)).log10()

    # pack the info to key-value pairs
    var_dict = ee.Dictionary({
        "x":x,
        "constant": 1,
        "y":y
    })

    # assign new column info
    return feature.set(var_dict)

# apply function
table = table.map(add_vars)

In [None]:
table.first().getInfo()

In [None]:
# apply regression on the table
regression = table.reduceColumns(ee.Reducer.linearRegression(numX=2,numY=1),["constant","x","y"])
# extract out the coefficients as a list
coefficients = ee.Array(regression.get("coefficients")).project([0]).toList()

In [None]:
coefficients.getInfo()
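
The two coefficients can be read as an intercept and a slope in log-log space, in the same order as the `"constant"` and `"x"` inputs to the reducer. As a rough illustration, the sketch below fits the same model form with ordinary least squares on synthetic data; the function name `fit_log_log` and the numbers are hypothetical.

```python
from math import log10

def fit_log_log(x_values, y_values):
    """Least-squares fit of log10(y) = b0 + b1 * log10(x); returns [b0, b1]."""
    lx = [log10(x) for x in x_values]
    ly = [log10(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept, slope]

# synthetic data following y = 2 * x ** -0.5
x_demo = [100.0, 400.0, 900.0, 1600.0]
y_demo = [2 * x ** -0.5 for x in x_demo]
b0, b1 = fit_log_log(x_demo, y_demo)
print(b0, b1)
```

A linear fit in log-log space corresponds to a power law in the original units, with `10 ** b0` as the multiplier and `b1` as the exponent.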

### Applying regression on imagery

Now that we have coefficients for estimating Secchi depth from remote sensing data, we can apply them to all of the imagery. This is beneficial because it provides spatial estimates of the parameter and can be applied to each acquisition (provided there are no clouds).

In [None]:
# define function to calculate the secchi depth from coefficients
def apply_regression(img):
    # extract out band for prediction
    # and apply log transform
    log_g = img.select(x_col).log10()
    # apply regression
    log_secchi_depth = log_g.polynomial(coefficients)
    # inverse log transform
    secchi_depth = ee.Image.constant(10).pow(log_secchi_depth)

    return secchi_depth.rename("secchi_depth").copyProperties(img,["system:time_start"])
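
For a single pixel value, `apply_regression` amounts to the arithmetic below. This is a sketch with hypothetical coefficients and a hypothetical reflectance value, assuming the coefficient list is ordered as intercept then slope.

```python
from math import log10

def predict_secchi(reflectance, coefficients):
    """Apply a log-log regression to one reflectance value and undo the log transform."""
    intercept, slope = coefficients
    log_depth = intercept + slope * log10(reflectance)
    return 10 ** log_depth

# hypothetical coefficients, not the fitted values from the table
demo_coefficients = [log10(2), -0.5]
print(predict_secchi(400.0, demo_coefficients))
```

The inverse transform returns the estimate to the units of the field measurements, so it can be compared directly with observed Secchi depth.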

In [None]:
# apply function to calculate secchi depth
secchi_depth_collection = ls_collection.map(apply_regression)

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(wq_samples, 9); 

Map.addLayer(ls_collection.median(), {"bands":"red,green,blue", "min": 0, "max": 3300,"gamma":1.3}, 'Landsat Composite');
Map.addLayer(secchi_depth_collection.mean(), {"min": 0, "max": 2,"palette":cmaps.get_palette("viridis_r")}, 'Secchi depth');

Map.addLayer(wq_samples,{},"Water Quality Samples")

Map.addLayerControl()

Map

### Quick visualization of analysis

Our example on Earth Engine leaves much to be desired in terms of visualizing data for the analysis. So, to illustrate the same analysis on the client-side using Python, we run the regression and plot the results of fitting.

In [None]:
# get the column arrays from earth engine
x_obs = np.array(table.aggregate_array(x_col).getInfo())
y_obs = np.array(table.aggregate_array(y_col).getInfo())

In [None]:
# apply log transform to the data
x = np.log10(x_obs)
y = np.log10(y_obs)

In [None]:
# apply linear fit
z = np.polyfit(x,y,1)
p = np.poly1d(z)

In [None]:
# create an array of x values and predict for visualization
x_line = np.log10(np.arange(x_obs.min(), x_obs.max(),1))
y_line = p(x_line)

In [None]:
# visualize samples and regression
plot(x, y, "C0o", alpha=0.3)
plot(x_line,y_line,"C1",lw=3);
xlabel(f"Log spectra ({x_col.getInfo()})")
ylabel("Log Secchi depth");

In [None]:
# apply prediction on real data
# and apply inverse log transform
y_hat = 10 ** p(x)

In [None]:
# visualize predicted vs observed
plot(y_obs, y_hat, "C0o", alpha=0.3)
plot([0,y_hat.max()],[0,y_hat.max()],"k--")
xlabel("Observed [m]")
ylabel("Predicted [m]");
xlim(0,8)
ylim(0,8)
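
The agreement shown in a predicted-versus-observed plot can also be summarized with a couple of numbers. The following is a minimal sketch using synthetic values; the helper names `rmse` and `mean_bias` are hypothetical, and these two metrics are only one possible choice.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))

def mean_bias(observed, predicted):
    """Average of predicted minus observed; positive means overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

# synthetic depths in meters, for illustration only
obs_demo = [1.0, 2.0, 3.0]
pred_demo = [1.5, 2.0, 2.5]
print(rmse(obs_demo, pred_demo), mean_bias(obs_demo, pred_demo))
```

The same functions could be called with `y_obs` and `y_hat` from the cells above to put a number on the scatter around the one-to-one line.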

After visualizing the results, we can see that the model performs reasonably well for the level of effort put into it. Additional statistical analysis would be needed to achieve a better model for estimating Secchi depth; however, the goal of this notebook was not to get a perfectly accurate result but rather to demonstrate the process.