# Seasonal Accuracy Assessment of Water Observations from Space (WOfS) Product in Africa<img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">

## Description
Now that we have run the WOfS classification for each agro-ecological zone (AEZ) in Africa, it's time to conduct a seasonal accuracy assessment for each AEZ. The validation points for each AEZ are already compiled and stored in the following folder: `Results/WOfS_Assessment/Point_Based/ValidPoints_Per_AEZ`.

Accuracy assessment for the WOfS product in Africa involves generating a confusion (error) matrix for the binary WOFL classification.
The inputs for estimating the accuracy of the WOfS-derived product are a binary WOFL classification layer showing water/non-water and a shapefile containing validation points collected with the [Collect Earth Online](https://collect.earth/) tool. The validation points are the ground truth (actual) data, while the value extracted from the WOFL at each location is the predicted value.

This notebook explains how to perform a seasonal accuracy assessment for WOfS, starting with the `Western` AEZ, using the collected ground truth dataset. It outputs a confusion matrix containing overall, producer's and user's accuracy, along with the F1 score for each class.
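To make the actual/predicted pairing concrete, here is a minimal sketch on a few made-up points. The `toy` table and its values are hypothetical and are not taken from the validation dataset; only the column names `ACTUAL` and `PREDICTION` follow the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data: 1 = water, 0 = non-water
toy = pd.DataFrame({
    "ACTUAL":     [1, 1, 1, 0, 0, 0, 0, 1],   # labels from the analyst
    "PREDICTION": [1, 1, 0, 0, 0, 1, 0, 1],   # labels extracted from the WOFL
})

# Rows are the reference labels, columns are the WOfS labels
toy_matrix = pd.crosstab(toy["ACTUAL"], toy["PREDICTION"],
                         rownames=["ACTUAL"], colnames=["PREDICTION"])
print(toy_matrix)
```

The diagonal cells hold the points where WOfS agrees with the analyst; the off-diagonal cells hold the disagreements.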

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

### Load packages
Import Python packages that are used for the analysis.

In [1]:
%matplotlib inline

import sys
import os
import rasterio
import xarray
import glob
import numpy as np
import pandas as pd
import seaborn as sn
import geopandas as gpd
import matplotlib.pyplot as plt
import scipy, scipy.ndimage
import warnings
warnings.filterwarnings("ignore") #this will suppress the warnings for multiple UTM zones in your AOI 

sys.path.append("../Scripts")
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
from sklearn.metrics import confusion_matrix, accuracy_score 
from sklearn.metrics import f1_score  #plot_confusion_matrix was removed in scikit-learn 1.2 and is not used in this notebook
from deafrica_tools.plotting import map_shapefile,display_map, rgb
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.datahandling import wofs_fuser, mostcommon_crs,load_ard,deepcopy
from deafrica_tools.dask import create_local_dask_cluster

### Analysis Parameters 

- CEO : path to the ground truth table of valid points in each AEZ, containing the WOfS-assigned classes, the WOfS clear observations and the labels identified by the analyst for each calendar month
- input_data : dataframe used for further analysis and accuracy assessment

### Load the Dataset

Load the validation points that are valid for the AEZ being assessed (here, `Western`).

In [2]:
#Read the valid ground truth data 
CEO = '../Results/WOfS_Assessment/Beta/Point_Based/ValidPoints_Per_AEZ/ValidationPoints_Western.csv'

df = pd.read_csv(CEO,delimiter=",")

In [3]:
#explore the dataframe
df.columns

In [4]:
#drop the unused index column and rename WATERFLAG to ACTUAL
input_data = df.drop(['Unnamed: 0'], axis=1)
input_data = input_data.rename(columns={'WATERFLAG':'ACTUAL'})

In [5]:
#The table contains the calendar month as well as the CEO and WOfS labels for each validation point
input_data

In [6]:
#Counting the number of unique validation plots by keeping the last row for each PLOT_ID
count = input_data.groupby('PLOT_ID',as_index=False,sort=False).last()

In [7]:
count

From the table, select the rows that fall in the wet season and those that fall in the dry season, and save them in separate tables.

In [8]:
#setting the months that are identified as wet in the AEZ using Climatology dataset  
WetMonth = [5,6,7,8,9,10]

In [9]:
#identifying the points that are in wet season and counting their numbers 
Wet_Season = input_data[input_data['MONTH'].isin(WetMonth)]
count_Wet_Season = Wet_Season.groupby('PLOT_ID',as_index=False,sort=False).last()
count_Wet_Season

In [10]:
#setting the months that are identified as dry in the AEZ using Climatology dataset then counting the points that are in dry season 
Dry_Season = input_data[~input_data['MONTH'].isin(WetMonth)]
count_Dry_Season = Dry_Season.groupby('PLOT_ID',as_index=False,sort=False).last()
count_Dry_Season

The wet and dry season counts add up to more than the total number of plots because some points have observations in both seasons.
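One way to check which plots appear in both seasons is to compare the sets of plot IDs. The sketch below uses a hypothetical `records` table with made-up plot IDs and months; only the column names `PLOT_ID` and `MONTH` and the wet-month list mirror the cells above.

```python
import pandas as pd

# Hypothetical monthly records: plot 1 is observed in both seasons
records = pd.DataFrame({
    "PLOT_ID": [1, 1, 2, 3, 3],
    "MONTH":   [2, 7, 6, 1, 12],
})
wet_months = [5, 6, 7, 8, 9, 10]

# Split the plot IDs by season
is_wet = records["MONTH"].isin(wet_months)
wet_plots = set(records.loc[is_wet, "PLOT_ID"])
dry_plots = set(records.loc[~is_wet, "PLOT_ID"])

# Plots with observations in both seasons
both = wet_plots & dry_plots
print(sorted(both))
```

The number of unique plots equals the wet count plus the dry count minus the size of `both`.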

### Create a Confusion Matrix 

In [11]:
confusion_matrix = pd.crosstab(Wet_Season['ACTUAL'],Wet_Season['PREDICTION'],rownames=['ACTUAL'],colnames=['PREDICTION'],margins=True)
confusion_matrix

`Producer's Accuracy` is the map-maker's accuracy: the probability that a site of a given class on the ground is classified as that class on the map. It is the complement of the omission error (producer's accuracy = 100% - omission error).
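As a small worked example, assuming a hypothetical 2x2 matrix of counts (the numbers are made up for illustration), producer's accuracy divides each diagonal cell by its row total:

```python
import pandas as pd

# Hypothetical counts; rows = ACTUAL, columns = PREDICTION
toy_matrix = pd.DataFrame([[40, 10],
                           [5, 45]], index=[0, 1], columns=[0, 1])

# Producer's accuracy: correct count divided by the row (reference) total
producers = {cls: toy_matrix.loc[cls, cls] / toy_matrix.loc[cls].sum() * 100
             for cls in toy_matrix.index}

# Omission error is the complement of producer's accuracy
omission = {cls: 100 - value for cls, value in producers.items()}
print(producers, omission)
```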

In [12]:
confusion_matrix["Producer's"] = [confusion_matrix.loc[0][0] / confusion_matrix.loc[0]['All'] * 100, confusion_matrix.loc[1][1] / confusion_matrix.loc[1]['All'] *100, np.nan]
confusion_matrix

`User's Accuracy` is the map-user's accuracy: how often a class shown on the map is actually present on the ground, which is a measure of the map's reliability. It is calculated as the number of correctly classified sites for a class divided by the total number of sites classified as that class, and is the complement of the commission error.
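A minimal sketch of the same idea on hypothetical counts (made up for illustration): user's accuracy divides each diagonal cell by its column total rather than its row total.

```python
import pandas as pd

# Hypothetical counts; rows = ACTUAL, columns = PREDICTION
toy_matrix = pd.DataFrame([[40, 10],
                           [5, 45]], index=[0, 1], columns=[0, 1])

# User's accuracy: correct count divided by the column (mapped) total
users = {cls: toy_matrix.loc[cls, cls] / toy_matrix[cls].sum() * 100
         for cls in toy_matrix.columns}
print(users)
```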

In [13]:
users_accuracy = pd.Series([confusion_matrix[0][0] / confusion_matrix[0]['All'] * 100,
                                confusion_matrix[1][1] / confusion_matrix[1]['All'] * 100]).rename("User's")

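#DataFrame.append was removed in pandas 2.0, so the new row is added with pd.concat
confusion_matrix = pd.concat([confusion_matrix, users_accuracy.to_frame().T])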
confusion_matrix 

`Overall Accuracy` is the proportion of all reference (actual) sites that were mapped correctly.
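In matrix terms, this is the sum of the diagonal divided by the total number of sites. A short sketch, assuming a hypothetical array of counts:

```python
import numpy as np

# Hypothetical counts; rows = ACTUAL (0, 1), columns = PREDICTION (0, 1)
counts = np.array([[40, 10],
                   [5, 45]])

# Correctly mapped sites lie on the diagonal of the matrix
overall_accuracy = np.trace(counts) / counts.sum() * 100
print(overall_accuracy)
```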

In [14]:
confusion_matrix.loc["User's", "Producer's"] = (confusion_matrix[0][0] + confusion_matrix[1][1]) / confusion_matrix['All']['All'] * 100
confusion_matrix

In [15]:
#make sure the WOfS predictions are stored as integers (0/1)
input_data['PREDICTION'] = input_data['PREDICTION'].astype(str).astype(int)

The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0. Here precision corresponds to user's accuracy and recall to producer's accuracy, so for each class it is calculated as:

In [16]:
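#F-score for class 0 from user's and producer's accuracy; F-score for class 1 from the wet season points, to match the matrix above
fscore = pd.Series([(2*(confusion_matrix.loc["User's"][0]*confusion_matrix.loc[0]["Producer's"]) / (confusion_matrix.loc["User's"][0] + confusion_matrix.loc[0]["Producer's"])) / 100,
                   f1_score(Wet_Season['ACTUAL'],Wet_Season['PREDICTION'].astype(int))]).rename("F-score")
#DataFrame.append was removed in pandas 2.0, so the new row is added with pd.concat
confusion_matrix = pd.concat([confusion_matrix, fscore.to_frame().T])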
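As a sanity check on the formula, the sketch below computes F1 by hand from precision and recall and compares it with `sklearn.metrics.f1_score`. The label arrays are hypothetical and are not taken from the validation data.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = water, 0 = non-water
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])

precision = precision_score(y_true, y_pred)  # user's accuracy for water, as a fraction
recall = recall_score(y_true, y_pred)        # producer's accuracy for water, as a fraction

# Harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual, f1_score(y_true, y_pred))
```

By default `f1_score` reports the score for the positive class (label 1), which is why the class 0 score in the cell above is derived from the matrix instead.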

In [17]:
confusion_matrix

In [18]:
confusion_matrix = confusion_matrix.round(decimals=2)

In [19]:
confusion_matrix = confusion_matrix.rename(columns={'0':'NoWater','1':'Water', 0:'NoWater',1:'Water','All':'Total'},index={'0':'NoWater','1':'Water',0:'NoWater',1:'Water','All':'Total'})

In [20]:
confusion_matrix

In [21]:
confusion_matrix.to_csv('../Results/WOfS_Assessment/Beta/Point_Based/ConfusionMatrix/Western_WetSeason_confusion_matrix.csv')
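The cells above tabulate the wet season only. One possible way to repeat the same steps for another subset, such as the dry season points, is to wrap them in a small helper. The function `accuracy_summary` and the `toy_dry` table below are hypothetical and are not part of the notebook or of `deafrica_tools`; this is a sketch that assumes 0/1 integer labels in the `ACTUAL` and `PREDICTION` columns.

```python
import pandas as pd

def accuracy_summary(points):
    """Return producer's, user's and overall accuracy (%) for a 0/1 table."""
    matrix = pd.crosstab(points["ACTUAL"], points["PREDICTION"])
    # Make sure both classes are present even if one was never observed
    matrix = matrix.reindex(index=[0, 1], columns=[0, 1], fill_value=0)
    correct = pd.Series({0: matrix.loc[0, 0], 1: matrix.loc[1, 1]})
    return {
        "producers": correct / matrix.sum(axis=1) * 100,  # per reference class
        "users": correct / matrix.sum(axis=0) * 100,      # per mapped class
        "overall": correct.sum() / matrix.values.sum() * 100,
    }

# Hypothetical dry season records
toy_dry = pd.DataFrame({"ACTUAL":     [0, 0, 0, 1, 1],
                        "PREDICTION": [0, 0, 1, 1, 1]})
summary = accuracy_summary(toy_dry)
print(summary["overall"])
```

Under these assumptions, passing a real subset in place of `toy_dry` would give the corresponding dry season figures.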

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** January 2020

**Compatible datacube version:** 

## Tags
Browse all available tags on the DE Africa User Guide's [Tags Index](https://) (placeholder as this does not exist yet)