# 04. Validation and Verification

This notebook checks the integrity of the analysis results with sanity checks and summary statistics.

**Checks Performed:**
1. **Completeness**: Are all forest pixels classified?
2. **Consistency**: Do vector areas match raster areas?
3. **Plausibility**: Are the statistics reasonable?

```python
import os

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import rasterio

# Paths: the sample land-cover raster (swap in the actual file if one was saved)
# and the forest connectivity polygons produced by Notebook 02.
raster_path = "data/sample_lulc.tif"
vector_path = "outputs/vectors/forest_connectivity.geojson"
```

## 1. Check Completeness
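Every pixel identified as forest in the input should receive a class in the output. This cell counts the forest pixels in the input (codes 3 and 4), which is the reference the classified output must account for.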

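```python
if os.path.exists(raster_path):
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        # Pixel size in CRS units; the hectare figure below is only meaningful
        # if the raster uses a projected CRS in metres.
        pixel_area_ha = abs(src.res[0] * src.res[1]) / 10_000

    # Rebuild the forest mask from the input codes (3 and 4 are taken to be the
    # forest classes) to get the reference count the output must account for.
    forest_mask = np.isin(data, [3, 4])
    total_forest_pixels = int(forest_mask.sum())
    print(f"Total Forest Pixels in Input: {total_forest_pixels}")
    print(f"Total Forest Area in Input: {total_forest_pixels * pixel_area_ha:.2f} ha")
else:
    print("Input raster not found. Skipping completeness check.")
```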
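The cell above only counts forest pixels in the input; it never inspects the output. A minimal sketch of the actual completeness test follows. It assumes the classified output is a raster on the same grid as the input and that `0` marks an unclassified pixel; the helper `count_unclassified_forest` and the toy arrays are hypothetical, not part of the pipeline.

```python
import numpy as np


def count_unclassified_forest(input_data, output_classes, forest_codes=(3, 4), unclassified=0):
    """Count input forest pixels that received no class in the output.

    Hypothetical helper: assumes both arrays share the same shape and grid,
    and that `unclassified` is the value written for pixels with no class.
    """
    if input_data.shape != output_classes.shape:
        raise ValueError("Input and output rasters must have the same shape")
    forest_mask = np.isin(input_data, forest_codes)
    missing = forest_mask & (output_classes == unclassified)
    return int(missing.sum())


# Toy 3x3 example: four forest pixels (codes 3 and 4), one left unclassified.
input_data = np.array([[3, 4, 1],
                       [1, 3, 2],
                       [4, 1, 1]])
output_classes = np.array([[1, 2, 0],
                           [0, 3, 0],
                           [0, 0, 0]])
print(count_unclassified_forest(input_data, output_classes))  # → 1
```

With real data, `output_classes` would be read from the saved output raster (for example with `src.read(1)`); any non-zero result means some forest pixels were dropped during classification.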

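## 2. Check Vector Integrity

Load the connectivity polygons, flag invalid geometries, and total the `area_ha` attribute so it can be compared with the raster forest area.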

```python
if os.path.exists(vector_path):
    gdf = gpd.read_file(vector_path)
    print(f"Loaded {len(gdf)} polygons.")

    # Invalid geometries (e.g. self-intersections) can distort area sums and overlays.
    invalid = gdf[~gdf.is_valid]
    if len(invalid) > 0:
        print(f"⚠️ Found {len(invalid)} invalid geometries!")
    else:
        print("✅ All geometries are valid.")

    # Total area from the precomputed 'area_ha' attribute.
    total_vector_area = gdf["area_ha"].sum()
    print(f"Total Vector Area: {total_vector_area:.2f} ha")
else:
    print("Vector output not found. Run Notebook 02 first.")
```
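Check 2 in the list at the top asks whether vector areas match raster areas, but the cells so far compute the raster pixel count and the vector area separately without comparing them. A minimal sketch of that comparison follows, assuming both totals are in hectares and the raster uses a projected CRS in metres. The function names, the 30 m pixel size in the example, and the 1% tolerance are illustrative choices, not values from the pipeline.

```python
def raster_area_ha(pixel_count, pixel_width_m, pixel_height_m):
    """Area covered by `pixel_count` pixels, in hectares (1 ha = 10,000 m²).

    Assumes a projected CRS whose units are metres.
    """
    return pixel_count * abs(pixel_width_m * pixel_height_m) / 10_000


def areas_consistent(raster_ha, vector_ha, rel_tol=0.01):
    """Return (ok, relative_difference) comparing vector area to raster area."""
    if raster_ha == 0:
        # No forest in the raster: only an empty vector layer is consistent.
        if vector_ha == 0:
            return True, 0.0
        return False, float("inf")
    rel_diff = abs(vector_ha - raster_ha) / raster_ha
    return rel_diff <= rel_tol, rel_diff


# Example: 1,000 forest pixels of 30 m x 30 m against 89.5 ha of polygons.
raster_ha = raster_area_ha(1_000, 30, 30)
ok, diff = areas_consistent(raster_ha, 89.5)
print(f"raster={raster_ha:.1f} ha, diff={diff:.2%}, consistent={ok}")
# → raster=90.0 ha, diff=0.56%, consistent=True
```

In the notebook, the inputs would be `total_forest_pixels`, the raster's pixel size (`src.res`), and `total_vector_area`. Polygons vectorised from the raster should match it closely; a larger gap may point to simplification, reprojection, or polygons lost during export.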

## 3. Class Distribution

Count polygons per connectivity class as a quick plausibility check.

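```python
if os.path.exists(vector_path) and not gdf.empty:
    # Sort by class name so the bar order (and therefore the colour each class
    # receives) is the same on every run, rather than following the counts.
    class_counts = gdf["class_name"].value_counts().sort_index()
    print(class_counts)

    class_counts.plot(kind="bar", color=["green", "orange", "red"])
    plt.title("Polygon Count by Class")
    plt.ylabel("Count")
    plt.tight_layout()
    plt.show()
```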
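The plausibility check above relies on reading the bar chart by eye. A few automatic rules can back it up. The sketch below is illustrative: the rules (positive, finite areas; class names from an expected set; no polygon larger than the total forest area) are assumptions, and the class names `core`, `edge`, and `isolated` are hypothetical placeholders for whatever labels the `class_name` column actually holds.

```python
import math


def plausibility_issues(areas_ha, class_names, expected_classes, total_forest_ha=None):
    """Return human-readable problems found in per-polygon areas and class labels.

    The rules here are illustrative sanity checks, not thresholds from the pipeline.
    """
    issues = []
    for i, (area, name) in enumerate(zip(areas_ha, class_names)):
        if not math.isfinite(area) or area <= 0:
            issues.append(f"polygon {i}: non-positive or non-finite area {area}")
        if name not in expected_classes:
            issues.append(f"polygon {i}: unexpected class {name!r}")
        if total_forest_ha is not None and area > total_forest_ha:
            issues.append(f"polygon {i}: area {area} ha exceeds total forest area")
    return issues


# Hypothetical class names; replace with the labels in the 'class_name' column.
EXPECTED = {"core", "edge", "isolated"}
areas = [12.5, 0.0, 3.2]
names = ["core", "edge", "patch"]
for issue in plausibility_issues(areas, names, EXPECTED, total_forest_ha=20.0):
    print(issue)
# → polygon 1: non-positive or non-finite area 0.0
# → polygon 2: unexpected class 'patch'
```

Applied to the notebook's data, the call would be `plausibility_issues(gdf["area_ha"].tolist(), gdf["class_name"].tolist(), EXPECTED)`; an empty list means no rule fired.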