# Accuracy Assessment

## Installing Required Libraries

- geopandas: Handles geographic vector data (such as shapefiles).
- rasterio: Reads and analyzes raster data (such as satellite imagery).

In [1]:
pip install geopandas

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install rasterio

Note: you may need to restart the kernel to use updated packages.


## Importing Libraries

- geopandas: Alias gpd is used to work with vector data (shapefiles).
- rasterio: Reads raster data (e.g., satellite images).
- numpy: Alias np, provides efficient array handling.
- sklearn.metrics: For calculating accuracy metrics like overall accuracy, Kappa Coefficient, and confusion matrix.

In [1]:
# Import required libraries
import geopandas as gpd
import rasterio
from rasterio import features
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

## Loading Vector Data (Ground Truth)

- What it does: Loads the vector data (shapefile) containing ground truth points.
    - gpd.read_file(): Reads the shapefile, which contains points representing ground truth locations (e.g., actual land cover type).
    - vector_data: The loaded shapefile is stored in a GeoDataFrame. It contains attributes like geometry (the point locations) and other columns (e.g., ground truth labels).



NOTE: In this case, I created the ground truth data by generating 40 random sampling points in the area of interest: 20 points for the non-water class (class = 0) and 20 points for the water class (class = 1).

In [2]:
# Load vector data (ground truth points)
vector_data = gpd.read_file("/home/jovyan/shared/Arissara/ALOS-2/sample-points/Indo_Random_sampling_points.shp")


## Loading the Raster File

- What it does: Opens and reads the raster file (e.g., satellite image).
    - rasterio.open(): Opens the raster file for reading.
    - src.read(1): Reads the first band of the raster data (many satellite images contain multiple bands).
    - raster_data: The pixel values (as an array) of the raster are stored in this variable.

### Sampling Raster Values at Vector Point Locations
- What it does: Extracts the raster values at the exact locations of the ground truth points.
    - vector_data.geometry.x and vector_data.geometry.y: Accesses the X and Y coordinates of the vector points.
    - coords: Creates a list of tuples representing the coordinates (x, y) for each ground truth point.
    - src.sample(coords): Samples (extracts) the raster values at the given coordinates.
    - raster_values: Contains the sampled pixel values from the raster at the ground truth locations.


In [3]:
# Load raster file
with rasterio.open("/home/jovyan/shared/Arissara/ALOS-2/Output/rm-noise_lee_calib_N01E102_sl_HH_water_extent_mask.tif") as src:
    # Read raster data
    raster_data = src.read(1)
    # Get raster values at ground truth points
    coords = [(x, y) for x, y in zip(vector_data.geometry.x, vector_data.geometry.y)]
    raster_values = [val[0] for val in src.sample(coords)]
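
To make the sampling step more concrete, the sketch below shows how a map coordinate can be turned into a row/column index for a north-up raster. The origin, pixel size, toy array, and the helper `sample_at` are made-up values and names for illustration only; they are not taken from the ALOS-2 mask. rasterio performs an equivalent lookup internally, using the raster's own affine transform.

```python
import numpy as np

# Hypothetical north-up raster: 3 rows x 4 columns, 10 m pixels.
# x_origin / y_origin are the map coordinates of the upper-left corner.
x_origin, y_origin = 500000.0, 100030.0
pixel_size = 10.0
toy_raster = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=np.uint8)

def sample_at(x, y):
    """Return the pixel value under map coordinate (x, y). No bounds checking."""
    col = int(np.floor((x - x_origin) / pixel_size))
    row = int(np.floor((y_origin - y) / pixel_size))  # y decreases going down the rows
    return int(toy_raster[row, col])

# Two made-up points: one in the top-left pixel, one in the bottom-right pixel.
points = [(500005.0, 100025.0), (500035.0, 100005.0)]
sampled = [sample_at(x, y) for x, y in points]
print(sampled)  # [0, 1]
```

This lookup only makes sense when the points and the raster share the same coordinate reference system. If they differ, the points would need to be reprojected first, for example with `GeoDataFrame.to_crs`.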

## Extracting Ground Truth Labels

- What it does: Extracts the ground truth labels from the shapefile (vector data).
    - vector_data['class']: Assumes the ground truth labels (e.g., 0 = non-water, 1 = water) are stored in a column named class.
    - .values: Converts the column into a NumPy array.


In [4]:
# Extract ground truth labels (assuming 'class' column holds 0/1 labels)
ground_truth_labels = vector_data['class'].values

## Ensuring Arrays are NumPy Arrays

- What it does: Converts the sampled raster values (a Python list) to a NumPy array. ground_truth_labels is already a NumPy array from .values, so the second call only makes a copy.
    - Why: Ensures both inputs have the same array type before they are passed to the metrics functions (like accuracy_score).

In [7]:
# Ensure both arrays are numpy arrays for correct comparison
raster_values = np.array(raster_values)
ground_truth_labels = np.array(ground_truth_labels)
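
Before computing any metrics, it can help to confirm that the two arrays line up and contain only the expected classes. The sketch below uses small made-up arrays in place of the real data, and it assumes 255 as a nodata value purely for illustration; the actual mask may use a different value or none at all.

```python
import numpy as np

# Hypothetical arrays standing in for raster_values and ground_truth_labels.
# 255 plays the role of a nodata value (an assumption, not read from the mask).
raster_values = np.array([0, 1, 1, 255, 0, 1])
ground_truth_labels = np.array([0, 1, 0, 1, 0, 1])

# Both arrays must describe the same points, in the same order.
assert raster_values.shape == ground_truth_labels.shape

# Keep only points whose sampled value is a valid class (0 or 1).
valid = np.isin(raster_values, [0, 1])
raster_valid = raster_values[valid]
truth_valid = ground_truth_labels[valid]

# Count how many points remain in each ground truth class.
classes, counts = np.unique(truth_valid, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 3, 1: 2}
```

Without such a filter, any sampled value other than 0 or 1 would be treated by the metrics functions as an additional class.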

## Computing Overall Accuracy and Kappa Coefficient

- What it does: Calculates two important accuracy metrics:
    - accuracy_score(): Compares the ground truth labels to the sampled raster values and computes the overall accuracy (i.e., the proportion of correct predictions).
    - cohen_kappa_score(): Measures the agreement between the ground truth and raster values while correcting for the agreement expected by chance, so it is less inflated than overall accuracy when one class dominates the samples.
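
As a rough illustration of how the two metrics relate, the sketch below computes both by hand using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (overall accuracy) and p_e is the agreement expected by chance. The ten labels are made up for the example and are not the tutorial's 40 sample points.

```python
import numpy as np

# Hypothetical labels for ten points (not the tutorial's data).
truth = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])

# Observed agreement: this is the overall accuracy.
p_o = np.mean(truth == pred)

# Agreement expected by chance, from the class proportions in each array.
classes = np.union1d(truth, pred)
p_e = sum(np.mean(truth == c) * np.mean(pred == c) for c in classes)

kappa = (p_o - p_e) / (1 - p_e)
print(f"p_o={p_o:.2f}, p_e={p_e:.2f}, kappa={kappa:.2f}")
# p_o=0.70, p_e=0.50, kappa=0.40
```

Here an overall accuracy of 0.70 corresponds to a kappa of only 0.40, because half of the agreement would be expected by chance alone.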
 

## Computing Confusion Matrix
- What it does: Computes the confusion matrix, which gives a detailed breakdown of:
    - True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
    - confusion_matrix(): Compares the ground truth and raster values and returns a matrix whose rows are the ground truth classes and whose columns are the raster (predicted) classes. For labels 0 and 1, the layout is [[TN, FP], [FN, TP]].

In [6]:
# Compute accuracy metrics
overall_accuracy = accuracy_score(ground_truth_labels, raster_values)
kappa = cohen_kappa_score(ground_truth_labels, raster_values)

print(f"Overall Accuracy: {overall_accuracy}")
print(f"Kappa Coefficient: {kappa}")

# Compute confusion matrix
conf_matrix = confusion_matrix(ground_truth_labels, raster_values)
print("Confusion Matrix:")
print(conf_matrix)

Overall Accuracy: 1.0
Kappa Coefficient: 1.0
Confusion Matrix:
[[20  0]
 [ 0 20]]


Here's what it means, reading the matrix row by row (rows are ground truth classes, columns are raster values):

- 20 (True Negative, TN): 20 points were correctly classified as non-water (0).
- 0 (False Positive, FP): No points were incorrectly classified as water (1) when they should have been non-water (0).
- 0 (False Negative, FN): No points were incorrectly classified as non-water (0) when they should have been water (1).
- 20 (True Positive, TP): 20 points were correctly classified as water (1).
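
The four counts can also be pulled out of the matrix programmatically. The sketch below rebuilds the matrix printed above as a NumPy array and derives per-class rates for the water class; in remote sensing these are often called user's accuracy (precision) and producer's accuracy (recall). The variable names are illustrative.

```python
import numpy as np

# Confusion matrix as printed above: rows = ground truth, columns = raster values.
conf_matrix = np.array([[20, 0],
                        [0, 20]])

# For a 2x2 matrix with labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = conf_matrix.ravel()

# Rates for the water class (1).
precision = tp / (tp + fp)  # share of pixels mapped as water that are truly water
recall = tp / (tp + fn)     # share of true water points that were mapped as water
overall_accuracy = (tp + tn) / conf_matrix.sum()

print(f"precision={precision:.2f}, recall={recall:.2f}, accuracy={overall_accuracy:.2f}")
# precision=1.00, recall=1.00, accuracy=1.00
```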

## Summary

- The code samples raster data at ground truth locations from the shapefile.
- It compares the sampled raster values with the ground truth labels to compute accuracy metrics (Overall Accuracy, Kappa Coefficient) and a confusion matrix.
- This provides insight into how well the raster data (predicted values) matches the ground truth.
