# Fractional Cover vs. field data validation  <img align="right" src="../../../Supplementary_data/dea_logo.jpg">

* **This notebook exists to document the process used to check the reprocessed Fractional Cover (FC) against field data:** It is not expected to be run again unless needed. Before running it, you need to know which coefficients to apply.
* **Compatibility:** Notebook currently compatible with the `NCI` `DEA Sandbox` environment only
* **Products used:** 
[ga_ls_fc_3](https://explorer.sandbox.dea.ga.gov.au/products/ga_ls_fc_3),
[ga_ls_wo_3](https://explorer.sandbox.dea.ga.gov.au/products/ga_ls_wo_3),
* **Additional Data:** SLATS star transect field data

## Background

This notebook was created to check that the FC reprocessing of February 2022 would improve the alignment of Landsat 8 FC with Landsat 5 and 7 FC. Discrepancies were observed after the first full DEA Collection 3 Landsat Vegetation Fractional Cover processing.

> * We needed to recalculate Fractional Cover with updated coefficients and compare it to the previous data to demonstrate that the fix was successful.
> * The updated FC coefficients are applied after the first FC calculation, which is performed using the existing FC module. A linear correction of the form `band * scale + intercept` is good enough, e.g. `bs * 0.9499 + 2.45`.
> * See `extra_coefficients` in https://github.com/GeoscienceAustralia/fc/pull/48/files
> * Field measurements of ground cover are taken using the SLATS Star transect protocol (https://www.researchgate.net/publication/236022381_Field_measurement_of_fractional_ground_cover) and stored in the TERN SLATS database (https://portal.tern.org.au/slats-star-transects-field-sites/23207).
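
The linear correction described above can be sketched as follows. This is a minimal illustration, assuming FC values are percentages; the helper name `apply_linear_correction` and the sample values are hypothetical, while the coefficients are the ones used later in this notebook.

```python
import numpy as np

# [intercept, scale] pairs, copied from the coefficients used in this notebook
extra_coefficients = {
    "bs": [2.45, 0.9499],
    "pv": [2.77, 0.9481],
    "npv": [-0.73, 0.9578],
}


def apply_linear_correction(values, band, coefficients=extra_coefficients):
    """Hypothetical helper: return values * scale + intercept, clipped at zero."""
    intercept, scale = coefficients[band]
    corrected = np.asarray(values, dtype=float) * scale + intercept
    return np.clip(corrected, 0, None)


# Made-up bare soil percentages, for illustration only
bs_before = np.array([0.0, 50.0, 100.0])
bs_after = apply_linear_correction(bs_before, "bs")
print(bs_after)
```

Clipping at zero mirrors the `.clip(min=0)` used in `load_fc`; it matters for NPV, whose intercept is negative.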

# Description

1. Find field data; this field data is the Star transects from [the JRSRP geoserver wfs service](https://field-geoserver.jrsrp.com/geoserver/aus/wfs?service=wfs&version=1.1.0&request=GetFeature&typeNames=aus:star_transects&outputFormat=csv) which can be visualised through [the TERN Landscapes-JRSRP Field Data Portal](https://field.jrsrp.com/) and is available as a csv. The field data we used in this notebook came from the C2-C3 update set, and is not the most current version, so it is available in this folder as "star_transects.csv"
2. Load corresponding surface reflectance from datacube or save file
3. Calculate FC and compare to field data using scikit-learn
***

# This notebook exists for documentation purposes only
This notebook was created for the implementation of a fix to FC. You probably don't want to run it, as once the FC is updated, this will no longer produce an "improved" FC set, but one with the fix applied again.

## Load packages

In [2]:
%matplotlib inline

import sys
import warnings
from datetime import datetime, timedelta
from functools import partial
from itertools import groupby

import datacube
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
from datacube.testutils.io import native_geobox, native_load
from datacube.utils.geometry import CRS, GeoBox, Geometry
from matplotlib import gridspec
from matplotlib import pyplot as plt
from odc.algo import keep_good_only
from odc.algo._masking import _fuse_mean_np, _xr_fuse
from odc.algo.io import load_with_native_transform
from odc.stats.utils import fuse_ds
from scipy.stats import kendalltau, pearsonr, spearmanr
from shapely import wkt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# suppress library warnings to keep the notebook output readable
warnings.simplefilter("ignore")


# instantiate a datacube
dc = datacube.Datacube(app="FC_field_data_val")

### Define a function to compute the fractional covers as viewed by the satellite for the site


In [2]:
# Function to compute the fractional covers as viewed by the satellite for the site (field sites)
# Requires a site properties object (one row of the field data)

def fractionalCoverSatView(siteProperties):
    '''equations to calculate fractional cover from the csv data'''
    nTotal = siteProperties['num_points']
    
    # Canopy Layer
    nCanopyBranch = siteProperties['over_b'] * nTotal / 100.0
    nCanopyDead = siteProperties['over_d'] * nTotal / 100.0
    nCanopyGreen = siteProperties['over_g'] * nTotal / 100.0
    
    # Midstory Layer
    nMidBranch = siteProperties['mid_b'] * nTotal / 100.0
    nMidGreen = siteProperties['mid_g'] * nTotal / 100.0
    nMidDead = siteProperties['mid_d'] * nTotal / 100.0
    
    # Ground Layer
    nGroundDeadLitter = (siteProperties['dead'] + siteProperties['litter']) * nTotal / 100.0
    nGroundCrustDistRock = (siteProperties['crust'] + siteProperties['dist'] + siteProperties['rock']) * nTotal / 100.0
    nGroundGreen = siteProperties['green'] * nTotal / 100.0
    nGroundCrypto = siteProperties['crypto'] * nTotal / 100.0
    
    # Work out the canopy elements as viewed from above
    canopyFoliageProjectiveCover = nCanopyGreen / (nTotal - nCanopyBranch)
    canopyDeadProjectiveCover = nCanopyDead / (nTotal - nCanopyBranch)
    canopyBranchProjectiveCover = nCanopyBranch / nTotal * (1.0 - canopyFoliageProjectiveCover - canopyDeadProjectiveCover)
    canopyPlantProjectiveCover = (nCanopyGreen+nCanopyDead + nCanopyBranch) / nTotal
    
    # Work out the midstorey fractions
    midFoliageProjectiveCover = nMidGreen / nTotal
    midDeadProjectiveCover = nMidDead / nTotal
    midBranchProjectiveCover = nMidBranch / nTotal
    midPlantProjectiveCover = (nMidGreen + nMidDead + nMidBranch) / nTotal
    
    # Work out the midstorey  elements as viewed by the satellite using a gap fraction method
    satMidFoliageProjectiveCover = midFoliageProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidDeadProjectiveCover = midDeadProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidBranchProjectiveCover = midBranchProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidPlantProjectiveCover = midPlantProjectiveCover * (1 - canopyPlantProjectiveCover)
    
    # Work out the groundcover fractions as seen by the observer
    groundPVCover = nGroundGreen / nTotal
    groundNPVCover = nGroundDeadLitter / nTotal
    groundBareCover = nGroundCrustDistRock / nTotal
    groundCryptoCover = nGroundCrypto / nTotal
    groundTotalCover = (nGroundGreen + nGroundDeadLitter + nGroundCrustDistRock) / nTotal
    
    # Work out the ground cover proportions as seen by the satellite
    satGroundPVCover = groundPVCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundNPVCover = groundNPVCover * ( 1- midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundBareCover = groundBareCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundCryptoCover = groundCryptoCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundTotalCover = groundTotalCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    
    # Final total covers calculated using gap probabilities through all layers
    totalPVCover = canopyFoliageProjectiveCover + satMidFoliageProjectiveCover + satGroundPVCover
    totalNPVCover = canopyDeadProjectiveCover + canopyBranchProjectiveCover + satMidDeadProjectiveCover + satMidBranchProjectiveCover + satGroundNPVCover
    totalBareCover = satGroundBareCover
    totalCryptoCover = satGroundCryptoCover
    
    return np.array([totalPVCover,totalNPVCover+totalCryptoCover,totalBareCover])
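
The gap fraction logic used in `fractionalCoverSatView` can be illustrated with a simplified single-fraction case. This is a sketch only: the helper `cover_seen_from_above` and the input numbers are hypothetical, and it leaves out the branch correction that the function applies to the canopy layer.

```python
def cover_seen_from_above(ground_cover, mid_plant_cover, canopy_plant_cover):
    """Fraction of a ground cover type visible through the mid and canopy layers.

    All inputs are fractions between 0 and 1. A ground element is visible
    only where both overlying layers have a gap.
    """
    return ground_cover * (1 - mid_plant_cover) * (1 - canopy_plant_cover)


# Hypothetical site: 60% green ground cover, 20% midstorey, 50% canopy
visible_green = cover_seen_from_above(0.6, 0.2, 0.5)
print(round(visible_green, 2))
```

With no midstorey and no canopy, the function returns the ground cover unchanged, which is the expected limit for an open site.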

In [3]:
# this block is to load and mask FC in native projection
def _native_tr(xx):
    """
    Masks FC in its native projection using the WOfS band. It performs the following:

    1. Set the high terrain slope flag (bit 4) of the WOfS band to 0
    2. Flag pixels as dry where no other WOfS bit is set
    3. Drop the WOfS band
    4. Set all FC pixels that are not clear and dry to NODATA (255)
    """
    water = xx.water & 0b1110_1111
    dry = water == 0
    xx = xx.drop_vars(["water"])
    xx = keep_good_only(xx, dry, nodata=255)
    return xx

def _fuser(xx):
    xx = _xr_fuse(xx, partial(_fuse_mean_np, nodata=255), '')
    return xx

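def filter_groups(groups, size=2):
    """Yield only the groups that contain exactly `size` datasets."""
    for _, ds_group in groups:
        ds_group = tuple(ds_group)
        if len(ds_group) == size:
            yield ds_group

def ds_align(datasets):
    """Pair datasets by time and region, then fuse each pair into one dataset."""
    datasets.sort(key=lambda ds: (ds.center_time, ds.metadata.region_code))
    paired_dss = groupby(datasets, key=lambda ds: (ds.center_time, ds.metadata.region_code))
    # keep only complete pairs (expected to be one FC and one WOfS dataset)
    paired_dss = filter_groups(paired_dss)
    map_fuse_func = lambda x: fuse_ds(*x)
    dss = map(map_fuse_func, paired_dss)
    return dss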
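
The masking step in `_native_tr` can be illustrated on plain integers. This sketch assumes the `ga_ls_wo_3` bit layout in which bit 4 (value 16) is the high slope flag, bit 6 (value 64) is cloud and bit 7 (value 128) is water; check the product definition before relying on it. The sample values are made up.

```python
import numpy as np

HIGH_SLOPE_CLEARED = 0b1110_1111  # same mask as in _native_tr: clears bit 4 only

# Made-up WOfS values: 0 = clear and dry, 16 = high slope only,
# 64 = cloud (assumed), 128 = water (assumed), 144 = water + high slope
water = np.array([0, 16, 64, 128, 144], dtype="uint8")

masked = water & HIGH_SLOPE_CLEARED
dry = masked == 0  # True where no flag other than high slope is set
print(dry.tolist())
```

Only the first two pixels count as clear and dry, so only their FC values would survive `keep_good_only`.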

In [4]:
def load_fc(query, platform="landsat-8", coef=None, nday=True):
    """
        Load FC by query and platform
        inputs:
            query: a row of the query dataframe (obs_time, start_time, end_time, query_poly)
            platform: the sensor to load data from
            coef: the regression coefficients applied to LS8, as {band: [intercept, scale]}
            nday: average over the days (True) or select the closest day (False)
        outputs:
            a numpy array of mean FC values (pv, bs, npv)
    """
    print('query time', query.obs_time)
    c3_query = {'geopolygon': Geometry(query.query_poly, crs=CRS('EPSG:3577'))}
    c3_query['time'] = (query.start_time, query.end_time)
    geobox = GeoBox.from_geopolygon(c3_query['geopolygon'], (-30, 30), crs='epsg:3577')
    c3_ls8_datasets = dc.find_datasets(product=['ga_ls_fc_3', 'ga_ls_wo_3'], **c3_query,
                 platform=platform, group_by="solar_day")
    c3_ls8_datasets = ds_align(c3_ls8_datasets)
    try:
        c3_ls8 = load_with_native_transform(
            c3_ls8_datasets,
            bands=["water", "pv", "bs", "npv"],
            geobox=geobox,
            native_transform=_native_tr,
            fuser=_fuser,
            groupby="solar_day",
            resampling="bilinear",
            chunks={'y': -1, 'x': -1},
        )
    except ValueError as e:
        print(e)
        return np.array([np.nan]*3)
    c3_ls8 = c3_ls8.where(c3_ls8 < 255)
    if coef is not None:
        for var in c3_ls8.data_vars:
            print(f"apply coef {coef[var]} on {var}")
            c3_ls8[var] = (c3_ls8[var] * coef[var][1] + coef[var][0]).clip(min=0)
    if not nday:
        c3_ls8 = c3_ls8.isel(dict(spec=np.argmin(np.abs(np.datetime64(query.obs_time) - c3_ls8.solar_day.data))))
    return c3_ls8.drop('spatial_ref').mean().compute().to_array().data
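
When `nday=False`, `load_fc` keeps the observation whose date is closest to the field visit. The selection can be sketched with NumPy alone; the dates below are made up for illustration.

```python
import numpy as np

obs_time = np.datetime64("2014-01-21")  # hypothetical field visit date

# Hypothetical satellite acquisition dates inside the query window
solar_days = np.array(
    ["2014-01-08", "2014-01-24", "2014-02-01"], dtype="datetime64[D]"
)

# Absolute time difference to each acquisition, then the index of the smallest
closest_index = int(np.argmin(np.abs(obs_time - solar_days)))
print(solar_days[closest_index])
```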
    

### C3 Reprocessing coefficients for updated FC C3 computation for Landsat 8

In [5]:
# Extra FC coefficients, updated as per https://github.com/GeoscienceAustralia/dea-config/blob/master/prod/services/alchemist/ga_ls_fc_3/ga_ls_fc_3.alchemist.yaml
# Each entry is [intercept, scale]. Do not call this variable 'fc_coefficients': that name ran into a namespace issue.
extra_coefficients = {'bs': [2.45, 0.9499],
                      'pv': [2.77, 0.9481],
                      'npv': [-0.73, 0.9578]}

### Load field data in from csv

In [6]:
# Load star_transects field data 
field = pd.read_csv('star_transects.csv')

In [7]:
# parse the WKT geometries in the 'field' dataframe and create a geopandas GeoDataFrame of all the points
field = field.rename(columns={'geom': 'geometry'})
field['geometry'] = field.geometry.apply(wkt.loads)
field = gpd.GeoDataFrame(field)

In [8]:
# field data comes in as WGS84
field = field.set_crs('EPSG:4326')
# transform to Australian Albers Equal Area
field = field.to_crs('EPSG:3577')

In [9]:
# Filter data by date - get dates later than the first observation of the satellite
field['obs_time'] = pd.to_datetime(field.obs_time)
field = field.loc[field['obs_time'] > np.datetime64('2013-06-01')]

### Calculate field measured fractions

In [10]:
# Calculate field measured fractions
field = field.merge(
    field.apply(fractionalCoverSatView, axis=1, result_type= 'expand').rename(
        columns = {0:'total_pv',1:'total_npv',2:'total_bs'}),
    left_index=True, right_index=True)
field = field[field.apply(lambda x: x['total_pv']+x['total_npv']+x['total_bs'], axis=1) >0.95]
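
The last line of the cell above keeps only sites whose three field fractions sum to more than 0.95. A minimal pandas sketch of the same filter, using made-up fractions for three hypothetical sites, is:

```python
import pandas as pd

# Made-up field fractions for three hypothetical sites
sites = pd.DataFrame(
    {
        "total_pv": [0.30, 0.10, 0.50],
        "total_npv": [0.40, 0.20, 0.25],
        "total_bs": [0.30, 0.30, 0.25],
    }
)

# Row-wise sum of the three fractions; keep sites where it exceeds 0.95
fraction_sum = sites[["total_pv", "total_npv", "total_bs"]].sum(axis=1)
kept = sites[fraction_sum > 0.95]
print(kept.index.tolist())
```

The vectorised `sum(axis=1)` gives the same result as the row-wise `apply` with a lambda, and is usually faster on large tables.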

### Calculate fractional cover for satellite observations

In [11]:
query = pd.DataFrame({'obs_time': field.obs_time,
                      'start_time': field.obs_time - timedelta(days=15),
                      'end_time': field.obs_time + timedelta(days=15),
                      'query_poly': field.geometry.buffer(50, cap_style=3)})
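
Each query covers 15 days either side of the field visit and a square footprint built by buffering the site point by 50 m. The window arithmetic can be sketched as below with a made-up date; the footprint size is simple geometry, assuming coordinates in metres as in EPSG:3577.

```python
from datetime import timedelta

import pandas as pd

obs_time = pd.Series(pd.to_datetime(["2014-01-21"]))  # hypothetical visit date

window = pd.DataFrame(
    {
        "obs_time": obs_time,
        "start_time": obs_time - timedelta(days=15),
        "end_time": obs_time + timedelta(days=15),
    }
)
print(window.iloc[0].tolist())

# A 50 m buffer with square caps around a point should give a box of
# 100 m x 100 m, a little over three 30 m pixels on each side.
buffer_m = 50
side_m = 2 * buffer_m
```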

In [12]:
fc_loaded_ls7 = query.apply(load_fc, platform='landsat-7', nday=False, axis=1, result_type= 'expand')
fc_loaded_ls7 = fc_loaded_ls7.rename(columns={0: 'pv_ls7', 1: 'bs_ls7', 2: 'npv_ls7'})

query time 2014-01-21 00:00:00


CPLReleaseMutex: Error = 1 (Operation not permitted)


query time 2014-06-25 00:00:00
query time 2014-01-23 00:00:00
query time 2014-06-24 00:00:00
query time 2014-05-29 00:00:00
query time 2014-05-01 00:00:00
query time 2014-01-21 00:00:00
query time 2014-06-25 00:00:00
query time 2014-04-30 00:00:00
query time 2014-05-29 00:00:00
query time 2014-05-01 00:00:00
query time 2014-01-20 00:00:00
query time 2014-01-21 00:00:00
query time 2014-05-30 00:00:00
query time 2014-01-22 00:00:00
query time 2014-05-29 00:00:00
query time 2014-04-24 00:00:00
query time 2014-01-21 00:00:00
query time 2014-04-30 00:00:00
query time 2014-06-24 00:00:00
query time 2014-05-30 00:00:00
query time 2014-05-29 00:00:00
query time 2014-06-25 00:00:00
query time 2014-01-21 00:00:00
query time 2014-06-23 00:00:00
query time 2014-05-02 00:00:00
query time 2014-04-28 00:00:00
query time 2014-01-22 00:00:00
query time 2014-05-30 00:00:00
query time 2014-01-22 00:00:00
query time 2014-06-25 00:00:00
query time 2014-06-24 00:00:00
query time 2014-06-24 00:00:00
query ti

In [13]:
fc_loaded = query.apply(load_fc, nday=False, axis=1, result_type= 'expand')
fc_loaded = fc_loaded.rename(columns={0: 'pv_o', 1: 'bs_o', 2: 'npv_o'})

query time 2014-01-21 00:00:00
query time 2014-06-25 00:00:00
query time 2014-01-23 00:00:00
query time 2014-06-24 00:00:00
query time 2014-05-29 00:00:00
query time 2014-05-01 00:00:00
query time 2014-01-21 00:00:00
query time 2014-06-25 00:00:00
query time 2014-04-30 00:00:00
query time 2014-05-29 00:00:00
query time 2014-05-01 00:00:00
query time 2014-01-20 00:00:00
query time 2014-01-21 00:00:00
query time 2014-05-30 00:00:00
query time 2014-01-22 00:00:00
query time 2014-05-29 00:00:00
query time 2014-04-24 00:00:00
query time 2014-01-21 00:00:00
query time 2014-04-30 00:00:00
query time 2014-06-24 00:00:00
query time 2014-05-30 00:00:00
query time 2014-05-29 00:00:00
query time 2014-06-25 00:00:00
query time 2014-01-21 00:00:00
query time 2014-06-23 00:00:00
query time 2014-05-02 00:00:00
query time 2014-04-28 00:00:00
query time 2014-01-22 00:00:00
query time 2014-05-30 00:00:00
query time 2014-01-22 00:00:00
query time 2014-06-25 00:00:00
query time 2014-06-24 00:00:00
query ti

In [14]:
fc_loaded_with_coef = query.apply(load_fc, nday=False, coef=extra_coefficients, axis=1, result_type= 'expand')
fc_loaded_with_coef = fc_loaded_with_coef.rename(columns={0: 'pv', 1: 'bs', 2: 'npv'})

query time 2014-01-21 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-06-25 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-01-23 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-06-24 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-05-29 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-05-01 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-01-21 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
apply coef [-0.73, 0.9578] on npv
query time 2014-06-25 00:00:00
apply coef [2.77, 0.9481] on pv
apply coef [2.45, 0.9499] on bs
ap

In [15]:
# join everything into a huge dataframe
# not really necessary other than recycling the plot function
field = field.merge(fc_loaded_with_coef, how = 'inner', left_index=True, right_index=True)
field = field.merge(fc_loaded, how = 'inner', left_index=True, right_index=True)
field = field.merge(fc_loaded_ls7, how = 'inner', left_index=True, right_index=True)
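
Because each merge uses `how='inner'` on the index, a site is kept only if it has a row in every table. A small sketch with made-up values shows the effect:

```python
import pandas as pd

# Made-up tables sharing a site index; site 2 is missing from the second table
field_fractions = pd.DataFrame({"total_pv": [0.3, 0.5, 0.2]}, index=[0, 1, 2])
satellite_fc = pd.DataFrame({"pv": [28.0, 55.0]}, index=[0, 1])

merged = field_fractions.merge(
    satellite_fc, how="inner", left_index=True, right_index=True
)
print(merged.index.tolist())
```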

### `field` is a large GeoDataFrame of results, which we can write to a shapefile to preserve them
Within this GeoDataFrame, the columns `bs, pv, npv` are from LS8 FC after applying the treatment,
`bs_o, pv_o, npv_o` are from LS8 FC before applying the treatment, and `bs_ls7, pv_ls7, npv_ls7` are from LS7 FC.

The columns `total_pv, total_npv, total_bs` are calculated from the field-measured fractions.

In [5]:
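# sensor_name is not defined elsewhere in this notebook; 'LS8' is assumed here to match the plot title used below
sensor_name = 'LS8'
field.to_file('field_reprocessed_fc_%s.shp'%''.join(sensor_name.split()))
field = gpd.read_file('field_reprocessed_fc_%s.shp'%''.join(sensor_name.split()))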

### Plot field and satellite data comparison

In [None]:
def validate(field_all, title=None):
    bands = ['pv','npv','bs']
    columns = ['total_%s'%s for s in bands] + bands
    field_ls8 = field_all[columns][(field_all[bands]>=0.).all(axis=1)]
    columns = ['total_%s'%s for s in bands] + ['%s_o'%s for s in bands]
    field_ls8_o =  field_all[columns][(field_all[['%s_o'%s for s in bands]]>=0.).all(axis=1)]
    columns = ['total_%s'%s for s in bands] + ['%s_ls7'%s for s in bands]
    field_ls7 =  field_all[columns][(field_all[['%s_ls7'%s for s in bands]]>=0.).all(axis=1)]
    print("# of validation points:", len(field_all), '\n')
    
    regr = linear_model.RANSACRegressor() # create a linear regression model; RANSAC makes the fit robust to noisy points
    
    #set up plot for results
    f = plt.figure(figsize=(20,20))
    gs = gridspec.GridSpec(2,2)
    xedges=yedges=list(np.arange(0,102,2))
    X, Y = np.meshgrid(xedges, yedges)
    cmname='YlGnBu'
    if title: plt.suptitle(title)
    ax1 = plt.subplot(gs[0])
    field_all.plot(markersize=10, ax= ax1, color='r')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.set_title('Field Sites')
    ax1.text(0.05, 0.05, "%d points"%len(field_all), transform=ax1.transAxes)
    
    for band_id, band in enumerate(bands): 

        sr = spearmanr((field_ls8['total_%s'%band].to_numpy() * 100).reshape(-1, 1),
                 (field_ls8[band].to_numpy()).reshape(-1, 1))[0]
        rmse = np.sqrt(mean_squared_error((field_ls8['total_%s'%band].to_numpy() * 100).reshape(-1, 1),
                 (field_ls8[band].to_numpy()).reshape(-1, 1)))
        
        ax1 = plt.subplot(gs[band_id+1])
        ax1.scatter(field_ls8['total_%s'%band].to_numpy() * 100, field_ls8[band].to_numpy(), s=20,
                    facecolors='darkorange', edgecolor='face', alpha=0.5, label='After')
        ax1.scatter(field_ls8_o['total_%s'%band].to_numpy() * 100, field_ls8_o["%s_o"%band].to_numpy(), s=20,
                    facecolors='SteelBlue', edgecolor='face', marker='d', alpha=0.5, label='Before')
        ax1.scatter(field_ls7['total_%s'%band].to_numpy() * 100, field_ls7["%s_ls7"%band].to_numpy(), s=20,
                    facecolors='brown', edgecolor='face', marker='s', alpha=0.5, label='LS7')
        ax1.set_title(band)
        
        ax1.plot([0,100],[0,100])
        regr.fit((field_ls8['total_%s'%band].to_numpy() * 100).reshape(-1, 1),
                 (field_ls8[band].to_numpy()).reshape(-1, 1)) # plot the linear regression fit
        ax1.plot(np.arange(0,110,10), regr.predict(np.arange(0,110,10)[:,np.newaxis]), 
                 '--', linewidth=2, color='red', label='After')
        regr.fit((field_ls8_o['total_%s'%band].to_numpy() * 100).reshape(-1, 1), 
                 (field_ls8_o["%s_o"%band].to_numpy()).reshape(-1, 1)) # plot the linear regression fit
        ax1.plot(np.arange(0,110,10), regr.predict(np.arange(0,110,10)[:,np.newaxis]),
                 '--', linewidth=2, color='blue', label='Before')      
        regr.fit((field_ls7['total_%s'%band].to_numpy() * 100).reshape(-1, 1), 
                 (field_ls7["%s_ls7"%band].to_numpy()).reshape(-1, 1)) # plot the linear regression fit
        ax1.plot(np.arange(0,110,10), regr.predict(np.arange(0,110,10)[:,np.newaxis]),
                 '--', linewidth=2, color='black', label='LS7')    
        ax1.text(5, 95, 'spearmanr = {0:.2f}'.format(sr))
        ax1.text(5, 90, 'rmse = {0:.2f}'.format(rmse))
        ax1.set_xlabel('Field Measured')
        ax1.set_ylabel('%s FC'%title.upper())
        ax1.set_xlim((0,100))
        ax1.set_ylim((0,100))
    plt.tight_layout()
    handles, labels = ax1.get_legend_handles_labels()
    f.legend(handles, labels, loc='upper left', ncol=2)
    
    f.savefig('validate_reprocessed_%s.png'%''.join(title.split()),  bbox_inches='tight')

In [None]:
validate(field, title='LS8')
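
The two statistics printed on each panel are the Spearman rank correlation and the root mean square error between the field fractions (scaled to percent) and the satellite FC. The sketch below computes both with NumPy and pandas only, using made-up values; it computes Spearman as the Pearson correlation of the ranks rather than calling `scipy.stats.spearmanr` as the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up values: field fractions (0 to 1) and satellite FC (percent)
field_pv = pd.Series([0.10, 0.25, 0.40, 0.60, 0.80])
satellite_pv = pd.Series([14.0, 22.0, 45.0, 55.0, 70.0])

field_percent = field_pv * 100  # put both series on the same 0 to 100 scale

# Spearman correlation is the Pearson correlation of the ranked values
spearman = np.corrcoef(field_percent.rank(), satellite_pv.rank())[0, 1]

# Root mean square error between the two series
rmse = float(np.sqrt(np.mean((field_percent - satellite_pv) ** 2)))
print(round(spearman, 2), round(rmse, 2))
```

A high rank correlation with a large RMSE would indicate that the satellite product orders the sites correctly but is offset or scaled relative to the field measurements.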

## Results
The plot of the validation against the field data shows:
* All the data points shown in the plot are valid, i.e. cloud, shadow and water have been masked out
* The red dashed line is LS8 after treatment, the blue dashed line is LS8 before treatment, and the black dashed line is LS7
* After applying the treatment, LS8-after is closer to LS7 (red dashed line vs black dashed line) on all bands
* On PV, LS7 and LS8-after (black and red dashed lines) are closer to the "ground truth"; on NPV and BS, not so much
* The field data are strongly biased towards NPV and BS

__We successfully applied the reprocessing to make LS8 FC more similar to LS7 FC__

<img align="left" 
src="validate_reprocessed_LS8.png">

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/GeoscienceAustralia/dea-notebooks).

**Last modified:** March 2022

**Compatible datacube version:** 

In [3]:
print(datacube.__version__)

1.8.6


In [None]:
## Tags
Browse all available tags on the DEA User Guide's [Tags Index](https://docs.dea.ga.gov.au/genindex.html)