# Accuracy Assessment of Water Observations from Space (WOfS) Product in Africa


## Description
Now that the WOfS classification has been run for each agro-ecological zone (AEZ) in Africa, it's time to conduct an accuracy assessment. The data used for assessing the accuracy was collected previously and set aside. It is stored in the Results folder: `Results/WOfS_Assessment/Point_Based/Intermediate_Per_AEZ`.

Accuracy assessment for the WOfS product in Africa involves generating a confusion (error) matrix for the binary WOFL classification.
The inputs for estimating the accuracy of the WOfS-derived product are a binary WOFL classification layer showing water/non-water and a shapefile containing validation points collected with the [Collect Earth Online](https://collect.earth/) tool. The validation points are the ground truth (actual) data, while the value extracted from the WOFL at each location is the predicted value.

This notebook explains how to perform an accuracy assessment for WOfS using the collected ground truth dataset. It outputs a confusion (error) matrix containing the overall, producer's and user's accuracy, along with the F1 score for each class.
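
As a rough illustration of how these metrics relate to the confusion matrix, the sketch below recomputes them with NumPy from the four counts printed for the Sahel region in the output further down. The variable names (`cm`, `producers`, `users`, `overall`, `f_score`) are chosen for this example only and are not part of the notebook's own code.

```python
import numpy as np

# Rows are ACTUAL (NoWater, Water); columns are PREDICTION (NoWater, Water).
# Counts copied from the Sahel table printed later in this notebook.
cm = np.array([[390, 120],
               [80, 634]])

correct = np.diag(cm)  # correctly classified points per class

# Producer's accuracy: correct / row total (how much of each actual class was found)
producers = correct / cm.sum(axis=1) * 100

# User's accuracy: correct / column total (how reliable each predicted class is)
users = correct / cm.sum(axis=0) * 100

# Overall accuracy: all correct points / all points
overall = correct.sum() / cm.sum() * 100

# F-score: harmonic mean of user's and producer's accuracy, rescaled to 0-1
f_score = 2 * users * producers / (users + producers) / 100

print("Producer's (NoWater, Water):", np.round(producers, 2))
print("User's (NoWater, Water):    ", np.round(users, 2))
print("Overall:                    ", round(float(overall), 2))
print("F-score (NoWater, Water):   ", np.round(f_score, 2))
```

Rounded to two decimals, these values should agree with the Producer's, User's and F-score entries of the Sahel table.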


## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

### Load packages
Import Python packages that are used for the analysis.

In [1]:
%matplotlib inline

import sys
import os
import rasterio
import xarray
import glob
import numpy as np
import pandas as pd
import seaborn as sn
import geopandas as gpd
import matplotlib.pyplot as plt
import scipy, scipy.ndimage
import warnings
warnings.filterwarnings("ignore") #this will suppress the warnings for multiple UTM zones in your AOI 

from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score
from deafrica_tools.plotting import map_shapefile,display_map, rgb
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.datahandling import wofs_fuser, mostcommon_crs,load_ard,deepcopy
from deafrica_tools.dask import create_local_dask_cluster

### Load the datasets
Ground truth points 

In [2]:
# Read the ground truth data: one CSV file per AEZ
file_path = '../02_Validation_results/WOfS_Assessment/wofs_ls/'
validation_files = glob.glob(os.path.join(file_path, '*.csv'))
validation_files

['../02_Validation_results/WOfS_Assessment/wofs_ls/Sahel_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Northern_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Eastern_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Central_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Indian_ocean_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Western_wofs_ls_valid.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Southern_wofs_ls_valid.csv']
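
The next cell cleans each region's table before building its confusion matrix. The sketch below shows, on a small made-up table, what that cleaning and the cross-tabulation might look like. The rows are hypothetical and only reuse the column names from the validation CSVs. For simplicity the prediction is stored as an integer here, whereas the notebook stores it as a string first.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, NOT real validation data; only the column names match the CSVs
toy = pd.DataFrame(
    {
        "LAT": [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        "LON": [10.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
        "MONTH": [1, 1, 1, 2, 3, 4, 5, 6, 7],
        "ACTUAL": [0, 1, 1, 2, 1, 0, 0, 1, 0],
        "CLEAR_OBS": [3.0, 3.0, 2.0, 1.0, 0.0, np.nan, 4.0, 2.0, 1.0],
        "CLASS_WET": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0],
    }
)

# keep=False drops every row of a repeated (LAT, LON, MONTH) group, not just the extras
toy = toy.drop_duplicates(["LAT", "LON", "MONTH"], keep=False)

# Drop labels above 1 and rows without any clear observation
unusable = (toy["ACTUAL"] > 1) | (toy["CLEAR_OBS"] == 0.0) | (toy["CLEAR_OBS"].isna())
toy = toy[~unusable]

# A point counts as predicted water when CLASS_WET is at least 1
toy["PREDICTION"] = (toy["CLASS_WET"] >= 1).astype(int)

# margins=True adds the "All" row and column holding the totals
table = pd.crosstab(toy["ACTUAL"], toy["PREDICTION"], margins=True)
print(table)
```

Of the nine toy rows, the two duplicates and the three unusable rows are dropped, leaving four points, one in each cell of the 2x2 matrix.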

In [3]:
t=[]
for v in validation_files:
    df = pd.read_csv(v, delimiter=",").rename(columns={"WATERFLAG": "ACTUAL"})
    # region (AEZ) name, e.g. "Sahel" from ".../Sahel_wofs_ls_valid.csv"
    aez = os.path.basename(v)[: -len("_wofs_ls_valid.csv")]
    # a point is predicted as water ("1") when CLASS_WET is at least 1
    df["PREDICTION"] = df["CLASS_WET"].apply(lambda x: "1" if x >= 1 else "0")

    # Remove plots that appear more than once for the same location and month
    # (labelled 0, 1, 2 or 3 more than once); keep=False drops every copy.
    df = df.drop_duplicates(["LAT", "LON", "MONTH"], keep=False)

    # Filter out rows labelled with a value greater than 1, or with no clear WOfS/SCL observations
    indexNames = df[
        (df["ACTUAL"] > 1) | (df["CLEAR_OBS"] == 0.0) | (df["CLEAR_OBS"].isna())
    ].index
    df = df.drop(indexNames)
    
    #how many samples in total
    t.append(len(df))
    
    # create a confusion matrix (rows: ACTUAL, columns: PREDICTION, with totals);
    # named cm so it does not shadow sklearn.metrics.confusion_matrix
    cm = pd.crosstab(
        df["ACTUAL"],
        df["PREDICTION"],
        rownames=["ACTUAL"],
        colnames=["PREDICTION"],
        margins=True,
    )

    # producer's accuracy: correct / row total (%), per actual class
    cm["Producer's"] = [
        cm.loc[0, "0"] / cm.loc[0, "All"] * 100,
        cm.loc[1, "1"] / cm.loc[1, "All"] * 100,
        np.nan,
    ]

    # user's accuracy: correct / column total (%), per predicted class
    users_accuracy = pd.Series(
        [
            cm.loc[0, "0"] / cm.loc["All", "0"] * 100,
            cm.loc[1, "1"] / cm.loc["All", "1"] * 100,
        ]
    ).rename("User's")

    # add the user's accuracy as a new row
    cm = pd.concat(
        [cm.rename({"0": 0, "1": 1}, axis=1), users_accuracy.to_frame().T]
    ).rename_axis("ACTUAL")

    # overall accuracy, stored in the bottom-right cell of the table
    cm.loc["User's", "Producer's"] = (
        (cm.loc[0, 0] + cm.loc[1, 1]) / cm.loc["All", "All"] * 100
    )
    df["PREDICTION"] = df["PREDICTION"].astype(int)

    # F-score: harmonic mean of user's and producer's accuracy (scaled to 0-1)
    fscore = pd.Series(
        [
            2
            * cm.loc["User's", 0]
            * cm.loc[0, "Producer's"]
            / (cm.loc["User's", 0] + cm.loc[0, "Producer's"])
            / 100,
            f1_score(df["ACTUAL"], df["PREDICTION"]),
        ]
    ).rename("F-score")

    # tidy confusion matrix
    cm = pd.concat([cm, fscore.to_frame().T]).rename_axis("ACTUAL")
    cm = cm.round(decimals=2)
    cm = cm.rename(
        columns={0: "NoWater", 1: "Water", "All": "Total"},
        index={0: "NoWater", 1: "Water", "All": "Total"},
    )
    print('\n')
    print('n samples for', aez, ':', len(df))
    print(cm)
    # save the confusion matrix
    cm.to_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/ConfusionMatrix/' + aez + '_confusion_matrix.csv')



n samples for Sahel : 1224
         NoWater   Water   Total  Producer's
ACTUAL                                      
NoWater   390.00  120.00   510.0       76.47
Water      80.00  634.00   714.0       88.80
Total     470.00  754.00  1224.0         NaN
User's     82.98   84.08     NaN       83.66
F-score     0.80    0.86     NaN         NaN


n samples for Northern : 1125
         NoWater   Water   Total  Producer's
ACTUAL                                      
NoWater   393.00   71.00   464.0       84.70
Water     113.00  548.00   661.0       82.90
Total     506.00  619.00  1125.0         NaN
User's     77.67   88.53     NaN       83.64
F-score     0.81    0.86     NaN         NaN


n samples for Eastern : 2669
         NoWater    Water   Total  Producer's
ACTUAL                                       
NoWater   651.00    63.00   714.0       91.18
Water     277.00  1678.00  1955.0       85.83
Total     928.00  1741.00  2669.0         NaN
User's     70.15    96.38     NaN       87.26
F-

In [4]:
print('Total number of samples across all regions:', sum(t))

Total number of samples across all regions: 10323
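
As an optional cross-check on the hand-built tables, the per-class metrics can also be obtained from scikit-learn, because producer's accuracy corresponds to recall and user's accuracy to precision. The sketch below uses a handful of synthetic labels, not the validation data, and the `results` dictionary is a name introduced only for this example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Synthetic labels for illustration only: 1 = water, 0 = non-water
actual = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
predicted = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

results = {}
for label, name in [(0, "NoWater"), (1, "Water")]:
    # pos_label selects which class the binary metric is computed for
    producers = recall_score(actual, predicted, pos_label=label) * 100
    users = precision_score(actual, predicted, pos_label=label) * 100
    f = f1_score(actual, predicted, pos_label=label)
    results[name] = (producers, users, f)
    print(f"{name}: producer's={producers:.2f} user's={users:.2f} F-score={f:.2f}")

overall = accuracy_score(actual, predicted) * 100
print(f"overall={overall:.2f}")
```

Running the same functions on the cleaned `ACTUAL` and `PREDICTION` columns of a region should give values close to those in its saved confusion matrix, up to rounding.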


***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** January 2020

**Compatible datacube version:** 
