# Compare Fractional Cover to the reprocessed FC 

# Background

2022 Update:
We are running this notebook to check whether the FC reprocessed in 02/2022 improves the alignment of Landsat 8 FC with Landsat 5 and 7 FC. Discrepancies were observed after the first full DEA Collection 3 Landsat Vegetation Fractional Cover processing.


- This requires us to recalculate the FC to demonstrate the fix

- This notebook will ideally demonstrate that the reprocessed FC aligns better with the field data and with LS 5 and LS 7 FC than the original FC C3 data did. 

- For reprocessed FC, take coefficients from here: https://github.com/GeoscienceAustralia/dea-config/blob/master/prod/services/alchemist/ga_ls_fc_3/ga_ls_fc_3.alchemist.yaml 

- The updated FC coefficients are applied after the first FC calculation, which is performed using the existing FC module. A linear rescaling, `band * scale + intercept`, will be good enough, e.g. `bs * 0.9499 + 2.45`

- See https://github.com/GeoscienceAustralia/fc/pull/48/files for the `fc_coefficients` (listed as intercept, then scale):
  - `bs`: 2.45, 0.9499
  - `pv`: 2.77, 0.9481
  - `npv`: -0.73, 0.9578
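A minimal sketch of the rescaling described above, assuming each coefficient pair is `[intercept, scale]` (as in `bs * 0.9499 + 2.45`) and that FC values are percentages. The function name `rescale_fc` and the example pixel values are hypothetical:

```python
# Reprocessing coefficients from the list above, stored as [intercept, scale]
FC_COEFFICIENTS = {'bs': [2.45, 0.9499],
                   'pv': [2.77, 0.9481],
                   'npv': [-0.73, 0.9578]}

def rescale_fc(fractions, coefficients=FC_COEFFICIENTS):
    """Apply value * scale + intercept to each FC band (values in percent)."""
    return {band: value * coefficients[band][1] + coefficients[band][0]
            for band, value in fractions.items()}

# hypothetical pixel from the first FC calculation
original = {'pv': 30.0, 'npv': 30.0, 'bs': 40.0}
rescaled = rescale_fc(original)
print({band: round(value, 3) for band, value in rescaled.items()})
# {'pv': 31.213, 'npv': 28.004, 'bs': 40.446}
```

Because each band has its own intercept and scale, the rescaled fractions need not sum to exactly 100 (here they sum to 99.663).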

# Description

- Find field data; this field data is the Star transects from [the JRSRP geoserver wfs service](https://field-geoserver.jrsrp.com/geoserver/aus/wfs?service=wfs&version=1.1.0&request=GetFeature&typeNames=aus:star_transects&outputFormat=csv) which can be visualised through [the TERN Landscapes-JRSRP Field Data Portal](https://field.jrsrp.com/) and is available as a csv
- Load the corresponding surface reflectance from the datacube or from a saved file
- Calculate FC and compare to field data using scikit-learn

## Install additional packages before running this notebook the first time

The FC repository is used in this notebook to calculate DEA Fractional Cover. 
- I cloned the fractional cover repository https://github.com/GeoscienceAustralia/fc into ~/dev/fc on 07/02/2022, before PR #48 was approved (Emma's fix of the LS8 FC coefficients to account for changes in the upgrade of DEA Landsat data from Collection 2 to Collection 3).
- To install the fractional cover module, I ran `#!pip install --extra-index-url=https://packages.dea.ga.gov.au/ fc` on 07/02/2022 (prior to PR #48 being approved)
- If you have not installed the fractional cover module, you will need to do so before running this notebook, either by cloning the FC repository as above or by running the `pip install` command in the cell below (as used in this notebook, following the instructions in the `~/dev/fc/README.rst` file).

In [None]:
#!pip install --extra-index-url=https://packages.dea.ga.gov.au/ fc

In [None]:
#!pip install dea-tools

# Getting started


To run this analysis, set `sensor_name` in the next cell to the satellite you are comparing to the field data, e.g. `sensor_name = 'Landsat 8'`.
Then run all the cells in the notebook, starting with the "Define sensor name" cell.

## Define sensor name
Choose which satellite to compare to field data

In [None]:
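# one of the keys of sensor_config: 'Landsat 7', 'Landsat 8', 'Landsat 8 noscaling', 'Landsat 8 c2', 'Sentinel 2A', 'Sentinel 2B'
sensor_name = 'Landsat 8'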

## Load packages

In [None]:
%matplotlib inline 

import datacube
from datacube.utils import masking  # used by ls_good below
from fc.fractional_cover import compute_fractions

import warnings; warnings.simplefilter('ignore')
import sys

#import dea-tools module - this should have been pip-installed earlier
#sys.path.insert(1, '../../../Tools/') #this may need moving later
from dea_tools.datahandling import load_ard

import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
from matplotlib import pyplot as plt
from matplotlib import gridspec 
from shapely import wkt

from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
from scipy.stats import pearsonr, spearmanr, kendalltau

# instantiate a datacube
dc = datacube.Datacube()

### Define a function to compute the fractional covers as viewed by the satellite for the site


In [None]:
# Function to compute the fractional covers as viewed by the satellite for the site
# Requires a site-properties record (one row of the star transect field data)

def fractionalCoverSatView(siteProperties):
    '''equations to calculate fractional cover from the csv data'''
    nTotal = siteProperties['num_points']
    
    # Canopy Layer
    nCanopyBranch = siteProperties['over_b'] * nTotal / 100.0
    nCanopyDead = siteProperties['over_d'] * nTotal / 100.0
    nCanopyGreen = siteProperties['over_g'] * nTotal / 100.0
    
    # Midstory Layer
    nMidBranch = siteProperties['mid_b'] * nTotal / 100.0
    nMidGreen = siteProperties['mid_g'] * nTotal / 100.0
    nMidDead = siteProperties['mid_d'] * nTotal / 100.0
    
    # Ground Layer
    nGroundDeadLitter = (siteProperties['dead'] + siteProperties['litter']) * nTotal / 100.0
    nGroundCrustDistRock = (siteProperties['crust'] + siteProperties['dist'] + siteProperties['rock']) * nTotal / 100.0
    nGroundGreen = siteProperties['green'] * nTotal / 100.0
    nGroundCrypto = siteProperties['crypto'] * nTotal / 100.0
    
    # Work out the canopy elements as viewed from above
    canopyFoliageProjectiveCover = nCanopyGreen / (nTotal - nCanopyBranch)
    canopyDeadProjectiveCover = nCanopyDead / (nTotal - nCanopyBranch)
    canopyBranchProjectiveCover = nCanopyBranch / nTotal * (1.0 - canopyFoliageProjectiveCover - canopyDeadProjectiveCover)
    canopyPlantProjectiveCover = (nCanopyGreen+nCanopyDead + nCanopyBranch) / nTotal
    
    # Work out the midstorey fractions
    midFoliageProjectiveCover = nMidGreen / nTotal
    midDeadProjectiveCover = nMidDead / nTotal
    midBranchProjectiveCover = nMidBranch / nTotal
    midPlantProjectiveCover = (nMidGreen + nMidDead + nMidBranch) / nTotal
    
    # Work out the midstorey elements as viewed by the satellite using a gap fraction method
    satMidFoliageProjectiveCover = midFoliageProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidDeadProjectiveCover = midDeadProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidBranchProjectiveCover = midBranchProjectiveCover * (1 - canopyPlantProjectiveCover)
    satMidPlantProjectiveCover = midPlantProjectiveCover * (1 - canopyPlantProjectiveCover)
    
    # Work out the groundcover fractions as seen by the observer
    groundPVCover = nGroundGreen / nTotal
    groundNPVCover = nGroundDeadLitter / nTotal
    groundBareCover = nGroundCrustDistRock / nTotal
    groundCryptoCover = nGroundCrypto / nTotal
    groundTotalCover = (nGroundGreen + nGroundDeadLitter + nGroundCrustDistRock) / nTotal
    
    # Work out the ground cover proportions as seen by the satellite
    satGroundPVCover = groundPVCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundNPVCover = groundNPVCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundBareCover = groundBareCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundCryptoCover = groundCryptoCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    satGroundTotalCover = groundTotalCover * (1 - midPlantProjectiveCover) * (1 - canopyPlantProjectiveCover)
    
    # Final total covers calculated using gap probabilities through all layers
    totalPVCover = canopyFoliageProjectiveCover + satMidFoliageProjectiveCover + satGroundPVCover
    totalNPVCover = canopyDeadProjectiveCover + canopyBranchProjectiveCover + satMidDeadProjectiveCover + satMidBranchProjectiveCover + satGroundNPVCover
    totalBareCover = satGroundBareCover
    totalCryptoCover = satGroundCryptoCover
    
    return np.array([totalPVCover,totalNPVCover+totalCryptoCover,totalBareCover])
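The layering in `fractionalCoverSatView` follows a gap-fraction idea: a lower layer is only seen through the gaps in the layers above it. A minimal sketch of that weighting, using hypothetical cover fractions (the helper names `visible_ground_fraction` and `visible_mid_fraction` are illustrative only):

```python
def visible_ground_fraction(ground_cover, mid_cover, canopy_cover):
    """Fraction of the site where a ground component is visible from above.

    All inputs are fractions (0-1). The ground is seen only through the gaps
    in the midstorey (1 - mid_cover) and in the canopy (1 - canopy_cover),
    mirroring e.g. satGroundPVCover in fractionalCoverSatView.
    """
    return ground_cover * (1 - mid_cover) * (1 - canopy_cover)

def visible_mid_fraction(mid_cover, canopy_cover):
    """Midstorey cover seen through the gaps in the canopy."""
    return mid_cover * (1 - canopy_cover)

# hypothetical site: 20 % canopy, 10 % midstorey, 50 % green ground cover
print(round(visible_ground_fraction(0.5, 0.1, 0.2), 3))  # 0.36
print(round(visible_mid_fraction(0.1, 0.2), 3))          # 0.08
```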

## Regression coefficients for FC computation
These are the `Landsat 8 Fudge Factor` coefficients used to make the spectrally different sensor on Landsat 8 perform similarly to Landsats 5 and 7 in the Fractional Cover algorithm. They are incorporated into the Collection 3 config as `regression_coefficients` in the [C3 product definition yaml](https://github.com/opendatacube/datacube-alchemist/blob/C3_Processing/examples/c3_config_fc.yaml)

In [None]:
# coefficients for FC compute

# these were used in Collection 2 but are not correct: the intercepts are not multiplied by 10000 (1e4)
ls8_coefficients_c2 = {'blue': [0.00041, 0.9747], 'green': [0.00289, 0.99779], 'red': [0.00274, 1.00446], 
                       'nir': [4e-05, 0.98906], 'swir1': [0.00256, 0.99467], 'swir2': [-0.00327, 1.02551]}

# use these ones: the intercepts are multiplied by 1e4 because the algorithm expects reflectance scaled 0-10000, not 0-1
ls8_coefficients = {'blue': [0.00041*1e4, 0.9747], 'green': [0.00289*1e4, 0.99779], 'red': [0.00274*1e4, 1.00446], 
                    'nir': [4e-05*1e4, 0.98906], 'swir1': [0.00256*1e4, 0.99467], 'swir2': [-0.00327*1e4, 1.02551]}

s2_coefficients = {'blue':[-0.0022*1e4, 0.9551],
                   'green':[0.0031*1e4, 1.0582],
                   'red':[0.0064*1e4, 0.9871],
                   'nir':[0.012*1e4, 1.0187],
                   'swir1':[0.0079*1e4, 0.9528],
                   'swir2':[-0.0042*1e4, 0.9688]}
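A minimal sketch of why the intercepts are multiplied by `1e4`, assuming each band's pair is applied as `reflectance * slope + intercept` with the intercept first (the same ordering as the FC `fc_coefficients`). How the `fc` module applies them internally is not shown here; `adjust_band` and the reflectance value are hypothetical:

```python
def adjust_band(reflectance, coefficients):
    """Linear cross-sensor adjustment: reflectance * slope + intercept."""
    intercept, slope = coefficients
    return reflectance * slope + intercept

green_c2 = [0.00289, 0.99779]        # intercept in 0-1 reflectance units
green_c3 = [0.00289 * 1e4, 0.99779]  # intercept rescaled to 0-10000 units

green = 1000.0  # hypothetical NBART green reflectance, i.e. 0.1 on a 0-1 scale
print(round(adjust_band(green, green_c2), 2))  # 997.79: intercept has almost no effect
print(round(adjust_band(green, green_c3), 2))  # 1026.69: intercept adds about 29 units
```

With reflectance on a 0-10000 scale, an intercept left in 0-1 units is effectively ignored, which is why the Collection 2 values are flagged as incorrect.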

### C3 Reprocessing coefficients for updated FC C3 computation for Landsat 8

In [None]:
#fc coefficients updated as per https://github.com/GeoscienceAustralia/dea-config/blob/master/prod/services/alchemist/ga_ls_fc_3/ga_ls_fc_3.alchemist.yaml
fc_coefficients = {'bs':[2.45, 0.9499],
                   'pv':[2.77, 0.9481],
                   'npv':[-0.73, 0.9578]}                   

In [None]:
# compute FC; note that the reprocessing fc_coefficients are not applied inside this function
def compute_fc(input_ds, regression_coefficients):
    '''takes input dataset and multiplies by regression coefficients to compute fractional cover.
    returns an xarray DataArray with pv,npv,bs,ue bands'''
    
    input_data = input_ds.to_array().data
    is_valid_array= (input_data >0).all(axis=0)
    # Set nodata to 0                                                       
    input_data[:, ~is_valid_array] = 0
    # compute fractional_cover
    output_data = compute_fractions(input_data, regression_coefficients)
    output_data[:, ~is_valid_array] = -1
    return xr.DataArray(output_data, dims=('band','y','x'),
                        coords={'x':input_ds.x, 'y':input_ds.y, 'band':['pv', 'npv', 'bs', 'ue']})

In [None]:
# select good pixels using pixel quality
def ls_good(pq):
    return masking.make_mask(pq, cloud_acca = "no_cloud", cloud_fmask = "no_cloud",
                             cloud_shadow_acca = "no_cloud_shadow",
                             cloud_shadow_fmask = "no_cloud_shadow",
                             contiguous = True)
def s2_good(pq):
    return pq == 1

## Set up the query for each sensor 

In [None]:
# spectral bands used for fractional cover calculation

#this is correct for collection3  
ls_bands = ['nbart_green','nbart_red','nbart_nir','nbart_swir_1','nbart_swir_2']

#this is correct for collection 3
s2_bands = ['nbart_green','nbart_red','nbart_nir_1','nbart_swir_2','nbart_swir_3']

# sensor-specific configurations - note no Landsat 5, as it doesn't overlap with the site surveys

sensor_config = {'Landsat 7':{'startdate':'1999-05-01', #updated for collection3
                              'product':'ga_ls7e_ard_3', 
                              'bands':ls_bands,
                              'resolution':(-30,30), 
                              'fc_coefficients': None}, 
                 
                 'Landsat 8 noscaling':{'startdate':'2013-03-01',
                                        'product':'ga_ls8c_ard_3', 
                                        'bands':ls_bands,
                                        'resolution':(-30,30), 
                                        'fc_coefficients': None}, 
                 
                 'Landsat 8':{'startdate':'2013-03-01', 
                              'product':'ga_ls8c_ard_3', 
                              'bands':ls_bands,
                              'resolution':(-30,30), 
                              'fc_coefficients':ls8_coefficients}, 
                 
                 'Landsat 8 c2':{'startdate':'2013-03-01', 
                                      'product':'ga_ls8c_ard_3', 
                                      'bands':ls_bands,
                                      'resolution':(-30,30), 
                                      'fc_coefficients':ls8_coefficients_c2}, 
                 
                 'Sentinel 2A':{'startdate':'2015-07-01', 
                                'product':'s2a_ard_granule', 
                                'bands':s2_bands,
                                'resolution':(-10,10), 
                                'fc_coefficients':s2_coefficients},
                 
                'Sentinel 2B':{'startdate':'2017-06-01', 
                               'product':'s2b_ard_granule', 
                               'bands':s2_bands,
                               'resolution':(-10,10), 
                               'fc_coefficients':s2_coefficients}
                }

### Load field data in from csv

In [None]:
# Load star_transects field data 
field = pd.read_csv('star_transects_short.csv')
field['geometry'] = field.geom.apply(wkt.loads)

# field data comes in WGS84 (EPSG:4326)
field = gpd.GeoDataFrame(field, geometry='geometry', crs='EPSG:4326')

# transform to Australian Albers Equal Area (EPSG:3577, units in metres)
field = field.to_crs('EPSG:3577')

In [None]:
# Filter data by date - get dates later than the first observation of the satellite
field = field.loc[field['obs_time'] > sensor_config[sensor_name]['startdate']]

### Calculate field measured fractions

In [None]:
# Calculate field measured fractions
field = field.merge(
    field.apply(fractionalCoverSatView, axis=1, result_type= 'expand').rename(
        columns = {0:'total_pv',1:'total_npv',2:'total_bs'}),
    left_index=True, right_index=True)
field = field[field.apply(lambda x: x['total_pv']+x['total_npv']+x['total_bs'], axis=1) >0.95]

### Match to albers tiles to check distribution

In [None]:
# Match to albers tiles to check distribution
albers_tiles = gpd.read_file('../Validation_against_field_data/Albers_Australia_Coast_and_Islands.shp')
albers_tiles = albers_tiles.set_crs('EPSG:3577', allow_override=True)
matched = gpd.sjoin(field, albers_tiles, how='inner', predicate='intersects')
tile_counts = matched.groupby('label')['FID'].count().sort_values(ascending=False).to_frame('count').reset_index()
field_tiles = albers_tiles.merge(tile_counts, on='label', how='right')
print("Total number of data points is", len(field))
print("Largest number of data points in a tile is", field_tiles['count'].max())

### Visual check of field data distribution

In [None]:
# Visual check of field data distribution
f, axes = plt.subplots(1, 2, figsize=(10,3))
plt.suptitle('Density of field data (star transects per albers tile)', size =14)
ax0 = field.plot(markersize=1, ax=axes[0])
albers_tiles.plot(alpha=0.2, ax = ax0)
levels = list(range(0,300,100))
ax1 =field_tiles.plot(column='count', cmap = 'viridis', ax=axes[1], legend=True)
albers_tiles.plot(alpha=0.2, ax = ax1)
None

### calculate fractional cover

In [None]:
# calculate fractional cover from surface reflectance
def fractionalCover(row, sensor_config = sensor_config[sensor_name], plot_rad = 50, window_days = 15):
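    '''Find satellite data within `window_days` days and within `plot_rad` metres in x and y (EPSG:3577 units)
    of a field observation, and return the mean and standard deviation of FC for the closest acquisition.'''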

    # nodata default to return in case no data matches the chosen window around the field observation
    fc_dict = {'fc_time': '', 'pv': -1, 'npv': -1, 'bs': -1, 'pv_std': -1, 'npv_std': -1, 'bs_std': -1 }

    # define search - grab near observations in space and time
    x = row.geometry.x - plot_rad, row.geometry.x + plot_rad
    y = row.geometry.y - plot_rad, row.geometry.y + plot_rad
    time = (str(np.datetime64(row.obs_time) - np.timedelta64(window_days,'D')),
            str(np.datetime64(row.obs_time) + np.timedelta64(window_days,'D'))
           )

    #use a reusable query dictionary
    query = {'measurements':sensor_config['bands'],
             'group_by':'solar_day',
             'x' : x, 
             'y' : y,  
             'time' : time, 
             'crs' : 'EPSG:3577', #this defines the crs of the input query
             'resolution' : sensor_config['resolution'],
             'output_crs' : 'EPSG:3577'}          

    try: 
        # load_ard raises a ValueError if no data match the query, so catch it and return the nodata values
        nbart = load_ard(dc,
            products = [sensor_config['product']], #need square brackets to deliver as a list
            min_gooddata=1, #need all valid pixels for the comparison
            fmask_categories=['valid'], #don't want snow or water in fractional cover comparison
            mask_pixel_quality=True, #this could give us some issues with dtype later on
            mask_contiguity='nbart_contiguity',
            ls7_slc_off=True,
            **query)
        
    except ValueError:
        print(f"No data found at {x},{y},{time}")
        return fc_dict
    
    # If there aren't any results, this function will return nodata values.
    if len(nbart.time) == 0: return fc_dict

    ### choose the closest clear timestep to keep

    # only keep closest time
    nbart = nbart.isel(time=[np.abs(nbart.time-np.datetime64(row.obs_time)).argmin()])

    # compute FC
    fc = nbart.groupby('time').apply(compute_fc,
                                     regression_coefficients = sensor_config['fc_coefficients'],
                                    ).to_dataset(dim='band')

    # take the spatial mean and standard deviation of the valid (>= 0) FC pixels
    fc_mean = fc.where(fc>=0).groupby('time').mean(dim=['x','y'])
    fc_std = fc.where(fc>=0).groupby('time').std(dim=['x','y'])

    fc_dict['fc_time'] = fc.time.values[0].astype(str)
    for var_name in fc_mean.data_vars:
        fc_dict[var_name.lower()] = fc_mean[var_name].values[0]
        fc_dict[var_name.lower()+'_std'] = fc_std[var_name].values[0]
        
    return fc_dict
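The space and time matching in `fractionalCover` can be sketched on its own with NumPy: a box of ±`plot_rad` metres (EPSG:3577 units are metres) and ±`window_days` days around the field observation, after which the acquisition closest in time is kept. The helper names and the acquisition dates below are hypothetical:

```python
import numpy as np

def search_window(x, y, obs_time, plot_rad=50, window_days=15):
    """Return (x_range, y_range, time_range) around a field observation.

    x, y are EPSG:3577 coordinates in metres; obs_time is an ISO date string.
    """
    t = np.datetime64(obs_time)
    dt = np.timedelta64(window_days, 'D')
    return ((x - plot_rad, x + plot_rad),
            (y - plot_rad, y + plot_rad),
            (str(t - dt), str(t + dt)))

def closest_acquisition(times, obs_time):
    """Index of the acquisition nearest in time to the field observation."""
    times = np.asarray(times, dtype='datetime64[ns]')
    return int(np.abs(times - np.datetime64(obs_time, 'ns')).argmin())

x_range, y_range, time_range = search_window(1_000_000.0, -3_000_000.0, '2015-06-10')
print(time_range)  # ('2015-05-26', '2015-06-25')
print(closest_acquisition(['2015-06-01', '2015-06-09', '2015-06-20'], '2015-06-10'))  # 1
```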

### Compute Fractional Cover satellite data that matches the field site data

In [None]:
%%time
# fractionalCover returns nodata (-1) where no satellite data match a field site; those rows are dropped below
fractions = field.apply(fractionalCover, axis=1, result_type = 'expand')
field = field.merge(fractions, how = 'inner', left_index=True, right_index=True)
field = field[field['pv']>=0]

In [None]:
pd.set_option('display.max_columns', None)
field.head()

In [None]:
#field.to_file('field_full_with_fc_%s.shp'%''.join(sensor_name.split()))

In [None]:
#field = gpd.read_file('field_full_with_fc_%s.shp'%''.join(sensor_name.split()))

In [None]:
### Regress 

### `field` is a large GeoDataFrame of results, which we can write to a shapefile to preserve them
Within this GeoDataFrame, the columns `bs, pv, npv, pv_std, npv_std, bs_std, ue, ue_std` are calculated from surface reflectance time-matched to our field data observations.

The columns `total_pv, total_npv, total_bs` are calculated from the field-measured fractions.


In [None]:
def bs_fn(bs):
    '''takes bare soil result and scales and adds value. coefficients from: https://github.com/GeoscienceAustralia/fc/pull/48/files''' 
    bs = (bs * 0.9499) + 2.45
    return bs

def pv_fn(pv):
    '''takes pv result and scales and adds value. coefficients from: https://github.com/GeoscienceAustralia/fc/pull/48/files''' 
    pv = (pv * 0.9481) + 2.77
    return pv

def npv_fn(npv):
    '''takes npv result and scales and adds value. coefficients from: https://github.com/GeoscienceAustralia/fc/pull/48/files'''
    npv = (npv * 0.9578) - 0.73
    return npv

In [None]:
# sanity check (to confirm with Emma): a bare soil value of 0 should map to the intercept, 2.45
bs_fn(0)

In [None]:
# Create columns in the results 'field' table with the transformed (reprocessed) FC values
for band in ['pv', 'npv', 'bs']:
    intercept, slope = fc_coefficients[band]
    field['new_%s' % band] = field[band] * slope + intercept

In [None]:
field.head()


### Fix this section

In [None]:
# field.to_file('field_reprocessed_fc_%s.shp'%''.join(sensor_name.split()))
# field = gpd.read_file('field_reprocessed_fc_%s.shp'%''.join(sensor_name.split()))

In [None]:
def validate(field_all, title=None):
    field = field_all[(field_all[['new_pv','new_npv','new_bs']]>=0.).all(axis=1)]
    field = field[(field[['pv_std','npv_std','bs_std']]<=10.).all(axis=1)]  # FIXME: haven't changed this to the rescaled values yet
    field = field[field['ue'] <25]
    print("# of validation points:", len(field))
    
    regr = linear_model.LinearRegression(fit_intercept=False)    

    f = plt.figure(figsize=(12,12))
    gs = gridspec.GridSpec(2,2)

    xedges=yedges=list(np.arange(0,102,2))
    X, Y = np.meshgrid(xedges, yedges)
    cmname='YlGnBu'
    if title: plt.suptitle(title)
    
    ax1 = plt.subplot(gs[0])
    field.plot(markersize=1, ax= ax1, color='r')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.set_title('Field Sites')
    ax1.text(0.05, 0.05, "%d points"%len(field), transform=ax1.transAxes)
    
    rmses = []
    for band_id, band in enumerate(['BS','PV','NPV']):
        arr1 = field['total_%s'%band.lower()].values.ravel()*100.
        arr2 = field['new_%s'%band.lower()].values.ravel()
        regr.fit(arr1[:,np.newaxis], arr2[:,np.newaxis])
        
        print('Band:{0}, slope={1}, r2={2}'.format(band, regr.coef_[0][0],
                                                regr.score(arr1[:,np.newaxis], arr2[:,np.newaxis])))
        sr = spearmanr(arr1, arr2)[0]
        print('Correlations (Pearson, Spearman, Kendall):', pearsonr(arr1, arr2)[0], sr, kendalltau(arr1, arr2)[0])
        rmse = np.sqrt(mean_squared_error(arr1, arr2))
        print('RMSE:',rmse)
        rmses.append(rmse)

        ax1 = plt.subplot(gs[band_id+1])
        ax1.scatter(arr1, arr2, s=3)
        ax1.set_title(band)
        
        ax1.plot([0,100],[0,100])
        ax1.plot(np.arange(0,100,10), regr.predict(np.arange(0,100,10)[:,np.newaxis]), ':')
        ax1.text(5, 95, 'spearmanr = {0:.2f}'.format(sr))
        ax1.text(5, 90, 'rmse = {0:.2f}'.format(rmse))
        ax1.set_xlabel('Field Measured')
        ax1.set_ylabel('%s FC'%sensor_name.upper())
        ax1.set_xlim((0,100))
        ax1.set_ylim((0,100))
    
    f.savefig('validate_reprocessed_%s.png'%''.join(sensor_name.split()))


validate(field, title=sensor_name)
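`validate` fits `LinearRegression(fit_intercept=False)`, i.e. a line through the origin. For that model the least-squares slope has the closed form `b = sum(x * y) / sum(x * x)`, so the printed slope can be checked by hand. A NumPy-only sketch of that slope and of the RMSE, with hypothetical field and satellite percentages (the helper names are illustrative):

```python
import numpy as np

def slope_through_origin(field_pct, sat_pct):
    """Least-squares slope b for sat = b * field (no intercept term)."""
    x = np.asarray(field_pct, dtype=float)
    y = np.asarray(sat_pct, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

def rmse(field_pct, sat_pct):
    """Root-mean-square difference, equivalent to sqrt(mean_squared_error(...))."""
    diff = np.asarray(sat_pct, dtype=float) - np.asarray(field_pct, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

field_pct = [10.0, 20.0, 30.0]   # hypothetical field-measured cover (%)
sat_pct = [12.0, 22.0, 33.0]     # hypothetical satellite FC (%)
print(round(slope_through_origin(field_pct, sat_pct), 4))  # 1.1071
print(round(rmse(field_pct, sat_pct), 4))                  # 2.3805
```

A slope above 1 means the satellite FC over-estimates the field values on average; the 1:1 line drawn in each panel is the reference for that comparison.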
