# Lab 13: Remote sensing of soil moisture

**Purpose:** In this chapter, you will work with a soil moisture dataset on Earth Engine and combine multiple datasets in an attempt to predict higher-resolution soil moisture values. You will practice managing multiple datasets, applying joins, and building a machine learning workflow.

In [None]:
%pylab inline

In [None]:
# import ee api and geemap package
import ee
import math
import geemap
import pandas as pd
from geemap import colormaps as cmaps

In [None]:
# try to initialize an ee session
# if not authenticated then run auth workflow and initialize
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

## Background

Soil moisture is an important variable in the hydrologic system which controls the exchange of water, energy, and carbon fluxes between the land surface and the atmosphere. Traditionally, soil moisture is retrieved from microwave remote sensing data but these data are typically not suitable for regional hydrological and agricultural applications such as irrigation management and flood predictions due to their coarse spatial resolution ([Peng et al., 2017](https://doi.org/10.1002/2016RG000543)).

Many methods have been developed to downscale coarse soil moisture data. Some of these methods rely on coincident observations and physical models of how soil moisture interacts with what we can observe, whereas other methods are more statistical in nature ([Mishra et al., 2018](https://doi.org/10.1016/j.jag.2018.02.005)). For this lab we will implement a purely statistical "downscaling" approach to estimate soil moisture at higher spatial resolution using other datasets.



## Soil moisture downscaling using ML

As mentioned above, we will be using a statistical approach to estimate soil moisture at higher spatial resolution. The approach relies on multiple disparate datasets that may or may not be collected at the same times. The bulk of this notebook focuses on how to manage and use these different datasets together efficiently for machine learning.

For this methodology, we will test using NDVI, Land Surface Temperature, Precipitation, and a soil classification to estimate soil moisture.

### Data gathering

First off, we need data. Here we will load the datasets that we will use for the ML estimation of soil moisture and apply the necessary pre-processing:

In [None]:
# define our start/end time to filter the data collections
start_time = "2016-01-01"
end_time = "2022-01-01"

In [None]:
# define a function to do the bit shifting for us for QA processing
def extract_bits(image, start, end=None, new_name=None):
    """Function to convert qa bits to binary flag image

    args:
        image (ee.Image): qa image to extract bit from
        start (int): starting bit for flag
        end (int | None, optional): ending bit for flag, if None then will only use start bit. default = None
        new_name (str | None, optional): output name of resulting image, if None name will be {start}_bits. default = None

    returns:
        ee.Image: image with extract bits
    """

    newname = new_name if new_name is not None else f"{start}_bits"

    if (start == end) or (end is None):
        # perform a bit shift with bitwiseAnd
        return image.select([0], [newname]).bitwiseAnd(1 << start)
    else:
        # Compute the bits we need to extract.
        pattern = 0
        # the end bit is inclusive, so iterate through end + 1
        for i in range(start, end + 1):
            pattern += 1 << i

        # Return a single band image of the extracted QA bits, giving the band
        # a new name.
        return image.select([0], [newname]).bitwiseAnd(pattern).rightShift(start)
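
To make the bit arithmetic concrete, here is a minimal sketch of the same idea on a plain Python integer instead of an `ee.Image`. The helper name `extract_bits_int` and the QA value are hypothetical, and the sketch assumes the `end` bit is inclusive.

```python
def extract_bits_int(value, start, end=None):
    """Extract bits start..end (inclusive) from a plain Python integer."""
    end = start if end is None else end
    # build a mask with ones in positions start..end
    pattern = sum(1 << i for i in range(start, end + 1))
    # keep only those bits, then shift them down to the lowest positions
    return (value & pattern) >> start


# hypothetical QA value: bits 2-3 hold 0b10 and bits 0-1 hold 0b01
qa_value = 0b1001

print(extract_bits_int(qa_value, start=2, end=3))  # 2
print(extract_bits_int(qa_value, start=0, end=1))  # 1
```

A flag value of 0 in the extracted bits is what the preprocessing functions below test for with `.eq(0)`.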

In [None]:
# function to preprocess the modis LST data
def lst_preprocess(image):
    # get QA band
    qa_band = image.select("QC_Day")

    # extract QA bits
    mask = extract_bits(qa_band, start=2, end=3).eq(0)

    # select the daytime LST band, rescale to Celsius, and apply mask
    return (
        image.select("LST_Day_1km")
        .multiply(0.02)
        .subtract(273.15)
        .updateMask(mask)
        .copyProperties(image, ["system:time_start", "system:time_end"])
    )
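
As a quick check of the rescaling, the sketch below applies the same scale factor (0.02) and Kelvin offset (273.15) used above to a single number. The function name `lst_to_celsius` and the stored value are hypothetical and only meant for illustration.

```python
def lst_to_celsius(raw_value, scale=0.02, kelvin_offset=273.15):
    """Convert a stored LST value to degrees Celsius."""
    # scale the stored integer to Kelvin, then shift to Celsius
    return raw_value * scale - kelvin_offset


raw = 15000  # hypothetical stored pixel value
print(round(lst_to_celsius(raw), 2))  # 26.85
```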

In [None]:
# function to preprocess the modis surface reflectance data
def sr_preprocess(image):
    # get QA band
    qa_band = image.select("QC_250m")

    # extract QA bits
    mask = extract_bits(qa_band, start=0, end=1).eq(0)

    # calculate NDVI
    ndvi = image.normalizedDifference(["sur_refl_b02", "sur_refl_b01"]).rename("ndvi")

    # add ndvi band and apply mask
    return image.addBands(ndvi).updateMask(mask).copyProperties(image,["system:time_start","system:time_end"])
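
For reference, `normalizedDifference` computes (first band - second band) / (first band + second band), which with the band order above is (NIR - red) / (NIR + red). A minimal sketch on two hypothetical reflectance values:

```python
def normalized_difference(nir, red):
    """Compute (nir - red) / (nir + red), returning None when undefined."""
    total = nir + red
    if total == 0:
        return None  # avoid division by zero
    return (nir - red) / total


# hypothetical reflectance values for a vegetated pixel
ndvi = normalized_difference(nir=4000, red=1000)
print(ndvi)  # 0.6
```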

In [None]:
# Load MODIS lst image collection
# filter by date and apply preprocessing
modis_lst = ee.ImageCollection("MODIS/006/MYD11A2").filterDate(start_time,end_time).map(lst_preprocess)

In [None]:
# Load MODIS surface reflectance image collection
# filter by date and apply preprocessing
modis_sr = ee.ImageCollection("MODIS/061/MOD09GQ").filterDate(start_time, end_time).map(sr_preprocess)

In [None]:
# import the CHIRPS dataset
chirps = ee.ImageCollection("UCSB-CHG/CHIRPS/PENTAD")

# no filtering or preprocessing needed


In [None]:
# load static images for the model

# load in a soil classification
soil_class = ee.Image("OpenLandMap/SOL/SOL_TEXTURE-CLASS_USDA-TT_M/v02").select(["b0"],["surface_soil_class"])

# load in DEM
dem = ee.Image("USGS/GTOPO30").select("elevation")

We now have all of the features needed for training a model. Next we need information on what the observed/modeled soil moisture is. Here we will use the [NASA-USDA Enhanced SMAP Global Soil Moisture Data](https://developers.google.com/earth-engine/datasets/catalog/NASA_USDA_HSL_SMAP10KM_soil_moisture), which is a combination of observed and modeled soil moisture ([Bolton and Crow, 2012](https://doi.org/10.1029/2012GL053470)).

In [None]:
# load in the soil moisture dataset
soil_moisture = ee.ImageCollection("NASA_USDA/HSL/SMAP10KM_soil_moisture").filterDate(start_time, end_time)

### Co-locating datasets using joins

Now that we have all of our data, we need to store it in a way that lets us use it together. Since we need to sample all of the data at once, we need a single image containing all of the bands. Getting all of the bands together is not trivial because we need observations from the same time. To efficiently gather all of the data in one place, we will use [joins](https://developers.google.com/earth-engine/guides/joins_intro), which combine elements from different collections.

The other way to combine elements is to map over the collections and apply filters but that may not scale to large image collections very well.



We are mostly concerned with temporal filters/joins, so we will need to define a filter to identify which elements we should join:

In [None]:
# Define an allowable time difference: one day in milliseconds.
one_day_millis = 24 * 60 * 60 * 1000

# Create a time filter to define a match as overlapping timestamps.
time_filter = ee.Filter.Or(
    # use max difference filter to specify only one day difference
    # checks one day on either side of observation
    ee.Filter.maxDifference(
        difference= one_day_millis,
        leftField= 'system:time_start',
        rightField= 'system:time_end'
    ),
    ee.Filter.maxDifference(
        difference= one_day_millis,
        leftField='system:time_end',
        rightField= 'system:time_start'
    )
)
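
The logic of this filter can be sketched in plain Python on timestamps. The function name and the example time ranges below are hypothetical. The sketch assumes each element has a start and end time in epoch milliseconds and that a pair matches when either cross-difference is at most one day.

```python
ONE_DAY_MS = 24 * 60 * 60 * 1000


def within_one_day(left, right, max_diff=ONE_DAY_MS):
    """Return True if the two time ranges satisfy either difference test."""
    # left start compared with right end
    start_vs_end = abs(left["start"] - right["end"]) <= max_diff
    # left end compared with right start
    end_vs_start = abs(left["end"] - right["start"]) <= max_diff
    return start_vs_end or end_vs_start


# hypothetical time ranges, expressed as day offsets for readability
soil_obs = {"start": 10 * ONE_DAY_MS, "end": 13 * ONE_DAY_MS}
near_img = {"start": 12 * ONE_DAY_MS, "end": 13 * ONE_DAY_MS}
far_img = {"start": 20 * ONE_DAY_MS, "end": 21 * ONE_DAY_MS}

print(within_one_day(soil_obs, near_img))  # True
print(within_one_day(soil_obs, far_img))   # False
```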


Now that we have a filter, we need to define our join. This specifies what the results will be so we know what to look for later.

In [None]:
# Define the join.
# this is "saveBest" which will give us the image closest in time to what we want
ndvi_join = ee.Join.saveBest(
  matchKey= 'ndvi', # this will be the name of the result in the collection
  measureKey= 'timeDiff'
)
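
To illustrate what a save-best join produces, here is a minimal plain-Python sketch. It is not how Earth Engine implements the join: the function `save_best`, the single `time` field, and the records are all hypothetical simplifications. Each primary element keeps only the closest matching secondary element, stored under the match key, and primary elements without any match are dropped.

```python
def save_best(primary, secondary, match_key, max_diff):
    """Attach the closest-in-time secondary record to each primary record."""
    joined = []
    for p in primary:
        # candidates within the allowed time difference
        candidates = [s for s in secondary if abs(p["time"] - s["time"]) <= max_diff]
        if not candidates:
            continue  # unmatched primary records are dropped (inner join)
        best = min(candidates, key=lambda s: abs(p["time"] - s["time"]))
        joined.append({**p, match_key: best})
    return joined


# hypothetical records with times in arbitrary units
soil = [{"id": "sm_a", "time": 10}, {"id": "sm_b", "time": 50}]
reflectance = [{"id": "sr_1", "time": 9}, {"id": "sr_2", "time": 12}, {"id": "sr_3", "time": 30}]

result = save_best(soil, reflectance, match_key="ndvi", max_diff=3)
print([(r["id"], r["ndvi"]["id"]) for r in result])  # [('sm_a', 'sr_1')]
```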

Lastly, we need to apply the join!

In [None]:
# Apply the join.
# uses soil_moisture as the collection to join to and applies filter on surface reflectance data
sm_ndvi = ndvi_join.apply(soil_moisture, modis_sr, time_filter)

In [None]:
# check our result to see if it worked
sm_ndvi.first().getInfo()

Now that we have verified that the join worked, we can do the same for the other collections. Here we define the join for the LST data and apply.

Note: we apply the join on the result of the last join so we can keep things together.

In [None]:
# Define the join.
lst_join = ee.Join.saveBest(
  matchKey= 'lst',
  measureKey= 'timeDiff'
)

# Apply the join.
sm_ndvi_lst = lst_join.apply(sm_ndvi, modis_lst, time_filter)

We are going to mix things up for joining the precipitation data. Theoretically, precipitation can affect soil moisture days after a rain event, so we want to account for that with a different filter. We will check for the difference from the soil moisture observation time and keep only the precipitation images from the days leading up to it.

In [None]:
# specify number of days we want to keep
lag_days = 15

# create our filter which keeps last n days of observations
lag_filter = ee.Filter.And(
    # filter for difference from observation on either side
    ee.Filter.maxDifference(
        difference= 1000 * 60 * 60 * 24 * lag_days,
        leftField= "system:time_start",
        rightField= "system:time_start"
    ),
    # require the observation (left) to be later than the precipitation
    # image (right), so we only keep precipitation from before the observation
    ee.Filter.greaterThan(
        leftField= "system:time_start",
        rightField= "system:time_start"
    )
)
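
The combined condition can be checked on plain timestamps. In this minimal sketch (the function name and the test times are hypothetical), a precipitation record passes only if it is within the lag window and strictly earlier than the observation:

```python
DAY_MS = 24 * 60 * 60 * 1000


def in_lag_window(obs_time, precip_time, lag_days=15):
    """Return True if precip_time falls in the lag_days before obs_time."""
    close_enough = abs(obs_time - precip_time) <= lag_days * DAY_MS
    strictly_before = obs_time > precip_time
    return close_enough and strictly_before


obs = 16 * DAY_MS  # hypothetical observation time
print(in_lag_window(obs, 15 * DAY_MS))  # True: one day earlier
print(in_lag_window(obs, 17 * DAY_MS))  # False: after the observation
print(in_lag_window(obs, 0 * DAY_MS))   # False: 16 days earlier, outside the window
```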

Now we have our filter that checks for days before the observation. Next we want to apply it but keep all of the matching observations, so that we can calculate the precipitation sum for the days leading up to each soil moisture image. To do so, we define the `saveAll` join and apply it.

In [None]:
# save all join to save every image that meets our criteria
lag_join = ee.Join.saveAll(
    matchesKey= 'precip',
    measureKey= 'delta_t',
    ordering= "system:time_start",
    ascending= True, # Sort chronologically
)

# Apply the join.
sm_ndvi_lst_precip = lag_join.apply(sm_ndvi_lst, chirps, lag_filter)
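
Conceptually, the save-all join attaches a list of every matching precipitation image to each soil moisture image, and that list can later be summed. A minimal plain-Python sketch, with a hypothetical function name, field names, and precipitation values:

```python
DAY_MS = 24 * 60 * 60 * 1000


def save_all_and_sum(observations, precip_records, lag_days=15):
    """Attach all precipitation records from the lag window, plus their sum."""
    joined = []
    for obs in observations:
        matches = [
            rec for rec in precip_records
            if 0 < obs["time"] - rec["time"] <= lag_days * DAY_MS
        ]
        if not matches:
            continue  # like an inner join, unmatched observations are dropped
        matches.sort(key=lambda rec: rec["time"])  # chronological order
        total = sum(rec["precip_mm"] for rec in matches)
        joined.append({**obs, "precip": matches, "precip_sum": total})
    return joined


# hypothetical five-day precipitation totals in mm
precip_records = [
    {"time": day * DAY_MS, "precip_mm": mm}
    for day, mm in [(0, 4.0), (5, 0.0), (10, 12.5), (15, 3.0), (20, 8.0)]
]
observations = [{"time": 16 * DAY_MS, "ssm": 11.2}]

result = save_all_and_sum(observations, precip_records)
print(len(result[0]["precip"]))   # 3
print(result[0]["precip_sum"])    # 15.5
```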

In [None]:
# filter out elements with null values in the join properties
# just to ensure we don't have any missing data
sm_ndvi_lst_precip = sm_ndvi_lst_precip.filter(
    ee.Filter.notNull(["ndvi", "lst", "precip"])
)

In [None]:
# recast to image collection
# sometimes with joins/filters it gets converted to an ee.Collection
# so we just want to make sure EE knows it has images
sm_ndvi_lst_precip = ee.ImageCollection(sm_ndvi_lst_precip)

### Sampling data

In the final collection from the co-locating process, there are the soil moisture images with the NDVI, LST, and precipitation data as properties. We want to sample points from the data to then get a dataset for machine learning.

This is a little complex because there is not just one soil moisture observation to sample but multiple observations in time. So, we will get all available dates to sample from, pick a few, and then sample from those iteratively.

In [None]:
# get a list of dates to sample from
dates = sm_ndvi_lst_precip.aggregate_array("system:time_start").getInfo()

In [None]:
# specify the number of dates we would like to sample
n_days = 50

# randomly select n dates to sample
# replace=False avoids drawing the same date more than once
sample_dates = np.random.choice(dates, size=n_days, replace=False)
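
The values in `dates` are epoch timestamps in milliseconds, which are hard to read. The sketch below uses only the standard library and hypothetical timestamps to show how a selection without repeats could be drawn and converted to calendar dates for inspection:

```python
import random
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000

# hypothetical timestamps: every 3 days starting 2016-01-01 UTC
dates_ms = [1451606400000 + i * 3 * DAY_MS for i in range(10)]

# draw 4 distinct dates; a fixed seed makes the draw repeatable
rng = random.Random(0)
picked = rng.sample(dates_ms, k=4)

# convert milliseconds to readable UTC dates
readable = [
    datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    for ms in sorted(picked)
]
print(readable)
```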

We now have the dates that we would like to sample. Next we define a geographic region to sample. Here we will sample over all of CONUS:

In [None]:
# this loads in a global vector file of countries
# filter by country of interest
conus = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(
    ee.Filter.eq("country_na","United States")
)

# get a simple bounding box
sample_region = conus.geometry(1e4).bounds(1e4)

Now we are ready to sample! We will loop through the randomly selected dates, grab our image for that date, combine all of the bands together, and then finally sample.

We are using a for loop here to queue up multiple requests:

In [None]:
# define empty feature collection to append samples to
training_samples = ee.FeatureCollection([])

# loop over our dates
for i,date in enumerate(sample_dates):
    # get a time range to filter for date
    t1 = ee.Date(int(date))
    t2 = t1.advance(1,"day")

    # get our image for date
    img = ee.Image(sm_ndvi_lst_precip.filterDate(t1,t2).first())

    # combine all of the images from joins into one with multiple bands
    # notice we are fetching the property we specified in the join
    # we also add our static images (soil class and DEM) to each image 
    img = (
        img
        .addBands(img.get("ndvi")) # add ndvi
        .addBands(img.get("lst")) # add lst
        # get precip as collection and reduce to image
        .addBands(ee.ImageCollection.fromImages(img.get('precip')).sum()) 
        # add static bands
        .addBands(soil_class)
        .addBands(dem)
    )

    # run sampling
    samples = img.sample(
        region = sample_region,
        numPixels = 200, 
        scale = 10000,
        seed = i,
    )

    # append samples to collection
    training_samples = training_samples.merge(samples)

In [None]:
# check total number of samples
training_samples.size().getInfo()

In [None]:
# check to make sure our features have all of the information we want
training_samples.first().getInfo()

### Training/testing the model

Now that we have our training data, we are ready to train a model. We will split the data into training and testing sets and do a quick check on how well the model performs.

In [None]:
# specify which bands will be used as features and which one will be the label
features = ["ndvi","LST_Day_1km","elevation","precipitation","surface_soil_class"]

label = "ssm" # surface soil moisture in mm

In [None]:
# add a random column to the collection to randomly split
training_samples = training_samples.randomColumn(seed=5)

# split into train/test datasets using 70-30 split
training = training_samples.filter(ee.Filter.lte("random", 0.7))
testing = training_samples.filter(ee.Filter.gt("random", 0.7))
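
The same splitting idea, sketched in plain Python: each record gets a uniform random number, and the threshold decides which set it lands in. The function name and records are hypothetical, and the random numbers will not match the ones Earth Engine generates for the same seed.

```python
import random


def random_split(records, train_fraction=0.7, seed=5):
    """Assign each record a random number and split on a threshold."""
    rng = random.Random(seed)
    train, test = [], []
    for rec in records:
        r = rng.random()  # uniform value in [0, 1)
        target = train if r <= train_fraction else test
        target.append({**rec, "random": r})
    return train, test


# hypothetical sample records
records = [{"id": i} for i in range(1000)]
train, test = random_split(records)

# the split is only approximately 70-30 because it is random
print(len(train), len(test))
```

Because the assignment is random, the realized proportions vary slightly around the requested fraction, which is also true of the `randomColumn` approach above.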

In [None]:
# instantiate our model and train
# note here we set the output to regression so it knows how to handle the outputs
rf = (
    ee.Classifier.smileRandomForest(numberOfTrees=50, bagFraction=0.8)
    .setOutputMode("REGRESSION")
    .train(training, label, features)
)

In [None]:
# apply model on test dataset
y_test = testing.classify(rf, "test")

In [None]:
# get the predicted and observed values
y_pred = np.array(y_test.aggregate_array("test").getInfo())
y_true = np.array(y_test.aggregate_array(label).getInfo())

In [None]:
# make a quick scatter plot of the results
plot(y_pred,y_true, "C0o",alpha=0.3)
plot([0,25],[0,25],"k--")

xlabel("Predicted Soil Moisture [mm]")
ylabel("Observed Soil Moisture [mm]")

show()

In [None]:
# calculate RMSE quickly
rmse = np.sqrt(np.mean((y_pred - y_true)**2))

print(f"RMSE: {rmse}")
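
RMSE alone does not show whether the model is systematically too wet or too dry. As an optional extension, the sketch below computes RMSE together with mean bias and the coefficient of determination (R²) in plain Python. The function name and the example values are hypothetical.

```python
import math


def regression_metrics(y_pred, y_true):
    """Return RMSE, mean bias, and R^2 for paired predictions and observations."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# hypothetical predictions and observations
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics["rmse"])  # 1.0
print(metrics["bias"])  # -0.5
```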

### Apply the model

Let's assume we are very happy with our model and we now would like to apply it to our whole collection. To do so is pretty straightforward: we will define a function to map over the whole image collection, combine the bands, and apply the model.

In [None]:
# define function that applies model inference to image
def apply_model(img):
    # combine all of the images from joins into one with multiple bands
    # notice we are fetching the property we specified in the join
    # we also add our static images (soil class and DEM) to each image 
    img = (
        img
        .addBands(img.get("ndvi")) # add ndvi
        .addBands(img.get("lst")) # add lst
        # get precip as collection and reduce to image
        .addBands(ee.ImageCollection.fromImages(img.get('precip')).sum()) 
        # add static bands
        .addBands(soil_class)
        .addBands(dem)
    )

    # apply inference
    pred = img.classify(rf,label+"_pred")

    # return the image now with the estimated high resolution soil moisture
    return img.addBands(pred)

In [None]:
# apply the prediction to the image collection
sm_ndvi_lst_rf = sm_ndvi_lst_precip.map(apply_model)

In [None]:
# extract one image for us to visually inspect
view_img = sm_ndvi_lst_rf.first()#.clip(sample_region)

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(sample_region, 5)

Map.addLayer(sample_region, {}, "Sample region")

Map.addLayer(view_img.select("ndvi"), {"min": 0, "max": 1, "palette": cmaps.get_palette("viridis")}, "NDVI")
Map.addLayer(view_img.select("LST_Day_1km"), {"min": -5, "max": 30, "palette": cmaps.get_palette("inferno")}, "LST")
Map.addLayer(view_img.select(label), {"min": 0, "max": 25, "palette": cmaps.get_palette("plasma_r")}, "Original SM")
Map.addLayer(view_img.select(label + "_pred"), {"min": 0, "max": 25, "palette": cmaps.get_palette("plasma_r")}, "Downscaled SM")

Map.addLayerControl()

Map
