# Accuracy Assessment of Water Observations from Space (WOfS) Product in Africa


The cleaned validation samples from the previous step, `02b_Convert_Institution_to_AEZ.ipynb`, are ingested here to create a confusion matrix for each agro-ecological zone (AEZ) and one for the entire continent.


**Input data**: `<AEZ>_wofs_ls_validation_points.csv`

**Output data**: `<AEZ>_confusion_matrix.csv`
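The file names follow a fixed pattern, so the AEZ name can be recovered from an input file name and reused to build the matching output file name. The block below is a minimal sketch. It assumes every input file ends with `_wofs_ls_validation_points.csv`, as in the file listing in this notebook. `aez_from_path` and `output_name` are hypothetical helpers and are not part of the notebook.

```python
import os

# Assumed suffix shared by every input file (from the <AEZ>_wofs_ls_validation_points.csv pattern)
INPUT_SUFFIX = "_wofs_ls_validation_points.csv"


def aez_from_path(path):
    """Hypothetical helper: return the AEZ name encoded in an input file path."""
    name = os.path.basename(path)
    if not name.endswith(INPUT_SUFFIX):
        raise ValueError(f"unexpected input file name: {name}")
    return name[: -len(INPUT_SUFFIX)]


def output_name(aez):
    """Hypothetical helper: return the confusion-matrix file name for an AEZ."""
    return f"{aez}_confusion_matrix.csv"


aez = aez_from_path("../02_Validation_results/WOfS_Assessment/wofs_ls/Indian_ocean_wofs_ls_validation_points.csv")
print(aez, output_name(aez))  # Indian_ocean Indian_ocean_confusion_matrix.csv
```

Splitting on the known suffix works for multi-word AEZ names such as `Indian_ocean`. It is also less brittle than slicing the path at fixed character positions.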

Last modified: 04/02/2022


### Load packages
Import Python packages that are used for the analysis.

```python
%matplotlib inline

import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Only f1_score is used below (plot_confusion_matrix was removed in scikit-learn 1.2)
from sklearn.metrics import f1_score
```

## Load the datasets
Load the ground truth points for each AEZ.

```python
# Read the ground truth data for each AEZ
file_path = '../02_Validation_results/WOfS_Assessment/wofs_ls/'
validation_files = glob.glob(os.path.join(file_path, '*.csv'))
validation_files
```

```
['../02_Validation_results/WOfS_Assessment/wofs_ls/Northern_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Central_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Sahel_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Western_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Indian_ocean_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Southern_wofs_ls_validation_points.csv',
 '../02_Validation_results/WOfS_Assessment/wofs_ls/Eastern_wofs_ls_validation_points.csv']
```

## Create a continental validation dataset

```python
# Pool the validation points from every AEZ into one continental dataset
continental = pd.concat([pd.read_csv(f) for f in validation_files], ignore_index=True)
```
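The continental dataset simply pools the points. Its confusion-matrix counts are therefore the cell-by-cell sum of the AEZ counts, so AEZs with more validation points carry more weight in the continental accuracies. The sketch below shows this on two tiny, made-up samples (not real validation data). Passing `keys=` to `pd.concat` is an optional extra that records which AEZ each row came from.

```python
import pandas as pd

# Two tiny, made-up AEZ samples (0 = no water, 1 = water); not real validation data
aez_a = pd.DataFrame({"ACTUAL": [0, 0, 1, 1], "PREDICTION": [0, 1, 1, 1]})
aez_b = pd.DataFrame({"ACTUAL": [0, 1, 1], "PREDICTION": [0, 0, 1]})

# keys= adds an outer index level recording which AEZ each row came from
pooled = pd.concat([aez_a, aez_b], keys=["A", "B"], names=["AEZ", "row"])


def counts(df):
    """Confusion-matrix counts: reference labels in rows, predicted labels in columns."""
    return pd.crosstab(df["ACTUAL"], df["PREDICTION"])


# The pooled counts equal the cell-by-cell sum of the per-AEZ counts
summed = counts(aez_a).add(counts(aez_b), fill_value=0)
print((counts(pooled).to_numpy() == summed.to_numpy()).all())  # True
print(pooled.groupby(level="AEZ").size().to_dict())  # {'A': 4, 'B': 3}
```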

## Function for creating a confusion matrix

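```python
def create_confusion_matrix(df, aez):
    """Print and save a 2x2 WOfS confusion matrix (0 = NoWater, 1 = Water) with accuracy metrics."""
    # Work on a copy so the caller's DataFrame is not modified, and make sure the
    # predicted labels are integers before cross-tabulating and scoring
    df = df.copy()
    df["PREDICTION"] = df["PREDICTION"].astype(str).astype(int)

    # Create a confusion matrix: reference labels in rows, predictions in columns, with totals
    confusion_matrix = pd.crosstab(
        df["ACTUAL"],
        df["PREDICTION"],
        rownames=["ACTUAL"],
        colnames=["PREDICTION"],
        margins=True,
    )

    # Producer's accuracy: correctly classified samples / reference (row) total
    confusion_matrix["Producer's"] = [
        confusion_matrix.loc[0, 0] / confusion_matrix.loc[0, "All"] * 100,
        confusion_matrix.loc[1, 1] / confusion_matrix.loc[1, "All"] * 100,
        np.nan,
    ]

    # User's accuracy: correctly classified samples / predicted (column) total
    users_accuracy = pd.Series(
        [
            confusion_matrix.loc[0, 0] / confusion_matrix.loc["All", 0] * 100,
            confusion_matrix.loc[1, 1] / confusion_matrix.loc["All", 1] * 100,
        ],
        name="User's",
    )
    # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
    confusion_matrix = pd.concat([confusion_matrix, users_accuracy.to_frame().T])

    # Overall accuracy, stored where the User's row meets the Producer's column
    confusion_matrix.loc["User's", "Producer's"] = (
        (confusion_matrix.loc[0, 0] + confusion_matrix.loc[1, 1])
        / confusion_matrix.loc["All", "All"]
        * 100
    )

    # F-score on a 0-1 scale: harmonic mean of user's and producer's accuracy for
    # NoWater (class 0), and sklearn's f1_score for Water (class 1, the positive class)
    users_0 = confusion_matrix.loc["User's", 0]
    producers_0 = confusion_matrix.loc[0, "Producer's"]
    fscore = pd.Series(
        [
            2 * users_0 * producers_0 / (users_0 + producers_0) / 100,
            f1_score(df["ACTUAL"], df["PREDICTION"]),
        ],
        name="F-score",
    )

    # Tidy the confusion matrix
    confusion_matrix = pd.concat([confusion_matrix, fscore.to_frame().T])
    confusion_matrix.index.name = "ACTUAL"
    confusion_matrix = confusion_matrix.round(decimals=2)
    labels = {"0": "NoWater", "1": "Water", 0: "NoWater", 1: "Water", "All": "Total"}
    confusion_matrix = confusion_matrix.rename(columns=labels, index=labels)

    # Replace the nonsensical values in the table with '--'
    # (these two columns are cast to object first so they can hold strings)
    for col in ["Total", "Producer's"]:
        confusion_matrix[col] = confusion_matrix[col].astype(object)
    confusion_matrix.loc["User's", "Total"] = "--"
    confusion_matrix.loc["Total", "Producer's"] = "--"
    confusion_matrix.loc["F-score", "Total"] = "--"
    confusion_matrix.loc["F-score", "Producer's"] = "--"

    print("\n")
    print("n samples for", aez, ":", len(df))
    print(confusion_matrix)

    # Save the confusion matrix, creating the output folder if it does not exist
    out_dir = "../02_Validation_results/WOfS_Assessment/wofs_ls/ConfusionMatrix/"
    os.makedirs(out_dir, exist_ok=True)
    confusion_matrix.to_csv(os.path.join(out_dir, aez + "_confusion_matrix.csv"))
```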
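The quantities computed by the function can be written compactly as follows. Here $n_{ij}$ is the number of points with reference class $i$ (`ACTUAL`) and predicted class $j$ (`PREDICTION`), $n_{i+}$ and $n_{+j}$ are the row and column totals, $n$ is the total number of points, and $k \in \{0, 1\}$. This notation is introduced only for illustration. The last equality shows that the F-score depends only on the diagonal count and the two totals for class $k$.

$$
\begin{aligned}
\text{Producer's}_k &= \frac{n_{kk}}{n_{k+}} \times 100, \qquad
\text{User's}_k = \frac{n_{kk}}{n_{+k}} \times 100, \qquad
\text{Overall} = \frac{n_{00} + n_{11}}{n} \times 100, \\
F_k &= \frac{2\,\text{User's}_k\,\text{Producer's}_k}{\text{User's}_k + \text{Producer's}_k} \cdot \frac{1}{100}
= \frac{2\,n_{kk}}{n_{k+} + n_{+k}}
\end{aligned}
$$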

## AEZ confusion matrices

```python
for v in validation_files:
    df = pd.read_csv(v, delimiter=",")
    # AEZ name from the file name, e.g. 'Northern_wofs_ls_validation_points.csv' -> 'Northern'
    aez = os.path.basename(v).replace("_wofs_ls_validation_points.csv", "")
    create_confusion_matrix(df, aez)
```

```
n samples for Northern : 1180
         NoWater   Water   Total Producer's
ACTUAL                                     
NoWater   387.00   63.00   450.0       86.0
Water     109.00  621.00   730.0      85.07
Total     496.00  684.00  1180.0         --
User's     78.02   90.79      --      85.42
F-score     0.82    0.88      --         --


n samples for Central : 590
         NoWater   Water  Total Producer's
ACTUAL                                    
NoWater    66.00   13.00   79.0      83.54
Water     108.00  403.00  511.0      78.86
Total     174.00  416.00  590.0         --
User's     37.93   96.88     --      79.49
F-score     0.52    0.87     --         --


n samples for Sahel : 1236
         NoWater   Water   Total Producer's
ACTUAL                                     
NoWater   409.00  119.00   528.0      77.46
Water      85.00  623.00   708.0      87.99
Total     494.00  742.00  1236.0         --
User's     82.79   83.96      --       83.5
F-score     0.80    0.86      --         --
```
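As a check on the table arithmetic, the Northern AEZ figures can be recomputed by hand from the four cell counts. The block below is a minimal sketch that uses a hypothetical `accuracy_from_counts` helper. It mirrors the formulas in `create_confusion_matrix` rather than calling the function.

```python
def accuracy_from_counts(n00, n01, n10, n11):
    """Hypothetical helper. Counts are n[actual][predicted]; class 0 = NoWater, class 1 = Water."""
    n = n00 + n01 + n10 + n11
    producers = (n00 / (n00 + n01) * 100, n11 / (n10 + n11) * 100)  # row-wise (reference totals)
    users = (n00 / (n00 + n10) * 100, n11 / (n01 + n11) * 100)  # column-wise (predicted totals)
    overall = (n00 + n11) / n * 100
    # Harmonic mean of user's and producer's accuracy, rescaled to 0-1
    fscore = tuple(2 * u * p / (u + p) / 100 for u, p in zip(users, producers))
    return producers, users, overall, fscore


# Northern AEZ counts from the table above
producers, users, overall, fscore = accuracy_from_counts(387, 63, 109, 621)
print([round(x, 2) for x in producers], [round(x, 2) for x in users],
      round(overall, 2), [round(x, 2) for x in fscore])
# [86.0, 85.07] [78.02, 90.79] 85.42 [0.82, 0.88]
```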

## Continental confusion matrix

```python
# Reuse the pooled dataset created above
create_confusion_matrix(continental, 'Continental')
```

```
n samples for Continental : 10845
         NoWater    Water    Total Producer's
ACTUAL                                       
NoWater  2673.00   356.00   3029.0      88.25
Water    1602.00  6214.00   7816.0       79.5
Total    4275.00  6570.00  10845.0         --
User's     62.53    94.58       --      81.95
F-score     0.73     0.86       --         --
```
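The same figures can be cross-checked with scikit-learn. For each class, producer's accuracy is the recall and user's accuracy is the precision. The sketch below rebuilds label arrays from the continental counts in the table above, assuming those counts are exact. After rounding, the scikit-learn values should match the table.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Continental counts from the table: (actual, predicted) -> number of points
cell_counts = {(0, 0): 2673, (0, 1): 356, (1, 0): 1602, (1, 1): 6214}
actual = np.concatenate([np.full(n, a) for (a, _), n in cell_counts.items()])
predicted = np.concatenate([np.full(n, p) for (_, p), n in cell_counts.items()])

# Producer's accuracy = recall, user's accuracy = precision, per class (0 = NoWater, 1 = Water)
for label in (0, 1):
    print(
        label,
        round(recall_score(actual, predicted, pos_label=label) * 100, 2),
        round(precision_score(actual, predicted, pos_label=label) * 100, 2),
        round(f1_score(actual, predicted, pos_label=label), 2),
    )
print("overall", round(accuracy_score(actual, predicted) * 100, 2))
```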


***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** January 2020

**Compatible datacube version:** 