# Comparing two hydrological basin rasters

This notebook compares two basin raster products (e.g. derived from different DEMs,
stream thresholds or processing choices) on a **pixel‑by‑pixel** basis.

The workflow:

1. Load the two input basin rasters.
2. Reproject / resample one raster so both are on the same grid.
3. For each basin ID in the reference raster, find the best‑matching basin in the comparison raster.
4. Classify each pixel into agreement classes and save a difference GeoTIFF.


## 1. Load, align and compare basin rasters

This cell does the full comparison:

- Reads `basin1` (reference) and `basin2` (comparison) using `rasterio`.
- Uses `reproject` so that `basin2` is aligned to the grid of `basin1`.
- Loops over each basin ID in `basin1` to find overlapping basins in `basin2`.
- For each reference basin, the **best‑overlap** basin in `basin2` is identified.
- A `agree` raster is built with the following classes:

  * `0` – background / nodata.
  * `1` – pixels where the reference basin and comparison basin **agree** (same dominant basin, overlap ≥ 50%).  
  * `2` – pixels where the reference basin is **split or changed** in the comparison (no clear dominant match, or overlap shared among several basins / outside mask).
  * `3` – pixels belonging to basins that exist **only in one product**  
    (reference basin with no overlap in comparison, or basin present only in `basin2`).

- The result is written to `out` as a `uint8` GeoTIFF that can be styled in QGIS with a categorical color table.


In [5]:
import numpy as np
import rasterio
from rasterio.warp import reproject, Resampling
from tqdm import tqdm

# basin1 = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\prodem_19\basins_hydro.tif"
# basin2 = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\prodem_24\basins_hydro.tif"
# out   = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\prodem_diff.tif"
basin1 = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\rasmus_code_s_thres_500\basins_ocean.tif"
basin2 = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\anna_code_s_thres_500\basins_hydro.tif"
out   = r"E:\Rasmus\DTU\Cryo\4DGreenland\Basins_serious\ras_anna_500_diff.tif"

# --- Load & align rasters ---
with rasterio.open(basin1) as h:
    A  = h.read(1)
    meta = h.meta.copy()
    meta.update(count=1, dtype='uint8', nodata=0, compress='LZW')
    A_nodata = h.nodata if h.nodata is not None else 0
    dst_transform, dst_crs = h.transform, h.crs

with rasterio.open(basin2) as o:
    Bsrc = o.read(1)
    B_nodata = o.nodata if o.nodata is not None else 0
    B = np.full_like(A, B_nodata)
    reproject(
        source=Bsrc, destination=B,
        src_transform=o.transform, src_crs=o.crs,
        dst_transform=dst_transform, dst_crs=dst_crs,
        resampling=Resampling.nearest,
        src_nodata=B_nodata, dst_nodata=B_nodata,
    )

maskA = (A != 0) & (A != A_nodata)
maskB = (B != 0) & (B != B_nodata)
agree = np.zeros_like(A, dtype=np.uint8)

# --- Loop over all basins in A (reference) ---
basin_ids = np.unique(A[maskA])
basin_ids = basin_ids[basin_ids != 0]

for bid in tqdm(basin_ids, desc="Comparing basins (split-aware)"):
    a_mask = (A == bid)
    area_a = np.count_nonzero(a_mask)
    if area_a == 0:
        continue

    # Find overlapping basin IDs in B
    b_ids, b_counts = np.unique(B[a_mask & maskB], return_counts=True)
    if len(b_ids) == 0:
        agree[a_mask] = 3  # no overlap at all
        continue

    # Identify best matching basin in B (max overlap)
    best_b = b_ids[np.argmax(b_counts)]
    overlap_best = np.count_nonzero((A == bid) & (B == best_b))
    overlap_ratio = overlap_best / area_a

    # Classify all pixels of A
    if overlap_ratio >= 0.5:
        # mark pixels where A and best B overlap → class 1
        agree[(A == bid) & (B == best_b)] = 1
        # any other overlapping B basins are splits → class 2
        for bsub in b_ids:
            if bsub == best_b:
                continue
            agree[(A == bid) & (B == bsub)] = 2
        # and any remaining pixels of A not covered by B at all
        agree[(A == bid) & (~maskB)] = 2
    else:
        # no dominant match
        agree[a_mask] = 2

# --- Add basins that exist only in B ---
onlyB = maskB & (~maskA)
agree[onlyB] = 3

# --- Save output ---
with rasterio.open(out, 'w', **meta) as dst:
    dst.write(agree, 1)

print("✓ Wrote split/merge-aware comparison map:", out)


ValueError: dst_nodata must be in valid range for destination dtype