# Hydro Basins Catchment QA

**Objective**
Ingest GRDC `stationbasins.geojson` files, consolidate their metadata, and deliver quick-look visuals for basin selection and QA.

**Data Requirements & Methods**
- Populate `pre-analysis/open-data/input/` (or subfolders) with one or more `stationbasins.geojson` exports from GRDC.
- Ensure `geopandas`, `folium`, `matplotlib`, and `pandas` are installed and that you have write access to `pre-analysis/open-data/output/`.
- The notebook crawls the input tree, merges the GeoJSON layers, builds an interactive Folium map with optional rivers, and plots comparative basin areas.

**Overview of Steps**
1. Import the required geospatial stack.
2. Discover and merge available GRDC catchments.
3. Generate an interactive Folium map with metadata-rich popups.
4. Plot and optionally export basin area comparisons.
5. Use the outputs for scoping and collaboration.



### Dataset Context
Catchment polygons are provided as GeoJSON files (an open standard for geospatial features). Coverage is incomplete for stations that entered the GRDC catalog after 2011, so expect some gaps.

**Outputs**
- `output/catchment_map.html`: interactive Folium map with station metadata in the popups.
- `output/catchment_areas.png`: static comparison of calculated catchment areas for quick QA checks.

### 📌 Key Field Descriptions

| Field Name   | Meaning                                                                 |
|--------------|-------------------------------------------------------------------------|
| `lat_org`    | Original (official) latitude of the gauging station (from GRDC).        |
| `long_org`   | Original longitude of the station.                                      |
| `lat_pp`     | Latitude of the pour point (where flow exits the catchment).            |
| `long_pp`    | Longitude of the pour point.                                            |
| `grdc_no`    | Unique GRDC station identifier.                                         |
| `river`      | Name of the river where the station is located.                         |
| `station`    | Name of the gauging station.                                            |
| `area`       | Reported catchment area in square kilometers.                           |
| `altitude`   | Elevation of the station in meters above sea level.                     |
| `dist_km`    | Distance (in km) between original station location and pour point.      |
| `area_calc`  | Calculated catchment area based on geometry (for verification).         |
| `quality`    | Quality rating of the catchment delineation (e.g., high, medium, low).  |
| `type`       | Type of catchment delineation or data source.                           |
| `comment`    | Additional notes or metadata about the station.                         |
| `source`     | Data source (e.g., GRDC, custom delineation, etc.).                     |
| `geometry`   | Polygon geometry of the catchment area.                                 |
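
The `area` and `area_calc` fields lend themselves to a simple consistency check. The sketch below is one possible approach and is not part of the notebook: the helper name `flag_area_mismatch` and the 10% tolerance are assumptions chosen for illustration, and the sample rows are made up.

```python
import pandas as pd


def flag_area_mismatch(df, tolerance=0.10):
    """Return rows whose calculated area deviates from the reported area.

    `tolerance` is a hypothetical relative threshold (10% by default).
    Rows with missing or non-numeric areas are not flagged.
    """
    out = df.copy()
    reported = pd.to_numeric(out["area"], errors="coerce")
    calculated = pd.to_numeric(out["area_calc"], errors="coerce")
    # Relative difference with respect to the reported area.
    out["area_rel_diff"] = (calculated - reported).abs() / reported
    return out[out["area_rel_diff"] > tolerance]


# Made-up values for illustration only.
sample = pd.DataFrame(
    {
        "station": ["A", "B", "C"],
        "area": [1000.0, 500.0, 250.0],
        "area_calc": [1020.0, 650.0, 251.0],
    }
)
flagged = flag_area_mismatch(sample)
print(flagged[["station", "area_rel_diff"]])
```

Because a GeoDataFrame is a pandas DataFrame subclass, the same helper should accept the merged basin data directly. A suitable tolerance depends on the dataset and is left to the analyst.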


## Step 1 - Import libraries
Load GeoPandas, Folium, Matplotlib, and supporting utilities for spatial analysis.



In [None]:
# Spatial analysis and visualization stack used across the notebook.
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
import random
import os
import pandas as pd
from folium import FeatureGroup


## Step 2 - Discover and merge station basins
Walk the `input/` tree, load every `stationbasins.geojson`, and combine them into one GeoDataFrame.
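
For reference, the discovery part of this step can also be sketched with `pathlib` instead of `os.walk`. This is an alternative illustration, not the notebook's implementation: the helper name `find_basin_files` and the folder names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def find_basin_files(base_dir, filename="stationbasins.geojson"):
    """Return the sorted paths of every matching file below `base_dir`."""
    return sorted(Path(base_dir).rglob(filename))


# Demonstration on a throwaway directory tree (hypothetical folder names).
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("region_a", "region_b/nested"):
        folder = Path(tmp, sub)
        folder.mkdir(parents=True)
        (folder / "stationbasins.geojson").write_text("{}")
    found = find_basin_files(tmp)
    relative = [p.relative_to(tmp).as_posix() for p in found]

print(relative)
```

Sorting the paths keeps the load order stable between runs, which can make the log output easier to compare.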



In [None]:
# Discover and merge every `stationbasins.geojson` file under the local input directory.
base_dir = 'input'
output_dir = 'output'

os.makedirs(output_dir, exist_ok=True)

# Hold each GeoDataFrame before concatenation.
gdfs = []
merged_gdf = None

for root, dirs, files in os.walk(base_dir):
    if 'stationbasins.geojson' in files:
        file_path = os.path.join(root, 'stationbasins.geojson')
        try:
            gdf = gpd.read_file(file_path)
            gdfs.append(gdf)
            print(f"Loaded: {file_path} with {len(gdf)} features.")
        except Exception as exc:
            print(f"Failed to load {file_path}: {exc}")

if gdfs:
    merged_gdf = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs=gdfs[0].crs)
    print(f"Total merged features: {len(merged_gdf)}")
    if 'station' in merged_gdf.columns:
        print(f"Unique stations covered: {merged_gdf['station'].nunique()}")
    else:
        print('Warning: `station` column not found in the merged data.')
else:
    raise FileNotFoundError('No `stationbasins.geojson` files found under the input directory.')
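
If several exports overlap, the same station may appear more than once after the merge. The sketch below shows one way to detect and drop such rows using `grdc_no` as the key. It uses plain pandas DataFrames with made-up values; treating the first occurrence as the one to keep is an assumption, not a rule from GRDC.

```python
import pandas as pd

# Made-up attribute rows standing in for two overlapping exports.
export_a = pd.DataFrame({"grdc_no": [1001, 1002], "station": ["Alpha", "Beta"]})
export_b = pd.DataFrame({"grdc_no": [1002, 1003], "station": ["Beta", "Gamma"]})

combined = pd.concat([export_a, export_b], ignore_index=True)

# Count how many rows repeat an identifier before deciding what to drop.
duplicate_count = int(combined.duplicated(subset="grdc_no").sum())

# Keep the first occurrence of each station identifier.
deduplicated = combined.drop_duplicates(subset="grdc_no", keep="first").reset_index(drop=True)

print(f"Duplicates found: {duplicate_count}")
print(deduplicated)
```

The same `duplicated` and `drop_duplicates` calls should apply to `merged_gdf`, since a GeoDataFrame inherits them from pandas.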


## Step 3 - Build the interactive catchment map
Encapsulate the Folium logic so any merged GeoDataFrame can be visualized with optional river overlays.



In [None]:
def plot_catchment_map(gdf, river_geojson_path=None):
    """Build an interactive Folium map with polygons and station metadata.

    Parameters
    ----------
    gdf : GeoDataFrame
        Combined basin polygons with station attributes.
    river_geojson_path : str, optional
        Path to a supplementary river layer to provide regional context.
    """
    if gdf is None or gdf.empty:
        raise ValueError('GeoDataFrame is empty. Load data before plotting the map.')

    def add_river_layer(map_object, river_geojson_path):
        """Overlay an optional river network for context."""
        rivers = gpd.read_file(river_geojson_path)
        folium.GeoJson(
            rivers,
            name='Rivers',
            style_function=lambda x: {
                'color': 'blue',
                'weight': 2,
                'opacity': 0.8
            }
        ).add_to(map_object)

    def random_color():
        """Generate a random HEX color for polygon styling."""
        return "#{:06x}".format(random.randint(0, 0xFFFFFF))

    # Set the initial map view at the average station location.
    center_lat = gdf['lat_org'].mean()
    center_lon = gdf['long_org'].mean()
    m = folium.Map(location=[center_lat, center_lon], zoom_start=6, tiles='CartoDB positron')

    # Add a contextual river layer when available.
    if river_geojson_path and os.path.exists(river_geojson_path):
        try:
            add_river_layer(m, river_geojson_path)
            print(f"River layer added from {river_geojson_path}.")
        except Exception as exc:
            print(f"Could not add river layer ({river_geojson_path}): {exc}")

    # Draw each catchment polygon with a detailed popup.
    for _, row in gdf.iterrows():
        color = random_color()
        fg = FeatureGroup(name=row.get('station', 'Catchment'))
        popup_html = f"""
        <b>Station:</b> {row.get('station', 'n/a')}<br>
        <b>River:</b> {row.get('river', 'n/a')}<br>
        <b>GRDC No:</b> {row.get('grdc_no', 'n/a')}<br>
        <b>Area (Reported):</b> {row.get('area', 'n/a')} km²<br>
        <b>Area (Calculated):</b> {row.get('area_calc', 'n/a')} km²<br>
        <b>Altitude:</b> {row.get('altitude', 'n/a')} m<br>
        <b>Lat (Original):</b> {row.get('lat_org', 'n/a')}<br>
        <b>Lon (Original):</b> {row.get('long_org', 'n/a')}<br>
        <b>Lat (Pour Point):</b> {row.get('lat_pp', 'n/a')}<br>
        <b>Lon (Pour Point):</b> {row.get('long_pp', 'n/a')}<br>
        <b>Distance to Pour Point:</b> {row.get('dist_km', 'n/a')} km<br>
        <b>Quality:</b> {row.get('quality', 'n/a')}<br>
        <b>Type:</b> {row.get('type', 'n/a')}<br>
        <b>Source:</b> {row.get('source', 'n/a')}<br>
        <b>Comment:</b> {row.get('comment', 'n/a')}
        """

        folium.GeoJson(
            row['geometry'],
            name=row.get('station', 'Catchment'),
            tooltip=row.get('station', 'Catchment'),
            popup=folium.Popup(popup_html, max_width=300),
            style_function=lambda feature, col=color: {
                'fillColor': col,
                'color': col,
                'weight': 2,
                'fillOpacity': 0.5
            }
        ).add_to(fg)

        folium.Marker(
            location=[row['lat_org'], row['long_org']],
            popup=folium.Popup(popup_html, max_width=350),
            tooltip=row.get('station', 'Catchment'),
            icon=folium.Icon(color='blue', icon='tint', prefix='fa')
        ).add_to(fg)
        fg.add_to(m)

    folium.LayerControl(collapsed=False).add_to(m)
    return m

catchment_map = None
if 'merged_gdf' in locals():
    catchment_map = plot_catchment_map(merged_gdf)
    catchment_map_path = os.path.join(output_dir, 'catchment_map.html')
    catchment_map.save(catchment_map_path)
    print(f"Interactive catchment map saved to {catchment_map_path}")
else:
    print('`merged_gdf` is not defined. Run the data aggregation step first.')
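
The popup text interpolates metadata values straight into HTML, so a field such as `comment` that happens to contain `<` or `&` could render incorrectly. A minimal sketch of escaping the values with the standard library follows. The helper name `build_popup_html` and the sample record are hypothetical and not part of the notebook.

```python
from html import escape


def build_popup_html(row, fields):
    """Build popup HTML from (label, key) pairs, escaping every value.

    `row` is any mapping with a `.get` method, such as a dict or a pandas Series.
    """
    lines = []
    for label, key in fields:
        value = row.get(key, "n/a")
        lines.append(f"<b>{escape(label)}:</b> {escape(str(value))}")
    return "<br>\n".join(lines)


# Made-up record for illustration only.
record = {"station": "Alpha <upper>", "river": "Blue & White"}
popup = build_popup_html(
    record,
    [("Station", "station"), ("River", "river"), ("Quality", "quality")],
)
print(popup)
```

The returned string could be passed to `folium.Popup` in place of the hand-written template, at the cost of losing the unit suffixes unless they are added to the labels.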


## Step 4 - Plot catchment areas
Create comparison charts of calculated basin areas and export the PNG if desired.



In [None]:
def plot_catchment_areas(gdf, output_path=None):
    """Create a bar chart that compares calculated catchment areas.

    Parameters
    ----------
    gdf : GeoDataFrame
        Dataset containing `station` and `area_calc` columns.
    output_path : str, optional
        If provided, the chart is also saved to this path.
    """
    if gdf is None or gdf.empty:
        raise ValueError('GeoDataFrame is empty. Load data before plotting catchment areas.')

    gdf = gdf.copy()
    gdf['area_km2'] = pd.to_numeric(gdf['area_calc'], errors='coerce')
    gdf_clean = gdf.dropna(subset=['area_km2'])

    if gdf_clean.empty:
        raise ValueError('No numeric `area_calc` values available to visualize.')

    gdf_sorted = gdf_clean.sort_values(by='area_km2', ascending=False)
    x_pos = range(len(gdf_sorted))

    fig, ax = plt.subplots(figsize=(12, 6))
    bars = ax.bar(x_pos, gdf_sorted['area_km2'], color='skyblue')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(gdf_sorted['station'], rotation=45, ha='right')
    ax.set_ylabel('Catchment Area (km²)')
    ax.set_title('Catchment Areas by Station (Calculated)')
    ax.grid(axis='y', linestyle='--', alpha=0.5)
    fig.tight_layout()

    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2, height, f"{height:.0f}", ha='center', va='bottom', fontsize=8)

    if output_path:
        fig.savefig(output_path, bbox_inches='tight')
        print(f"Catchment area chart saved to {output_path}")

    plt.show()
    return fig

if 'merged_gdf' in locals():
    plot_catchment_areas(
        merged_gdf,
        output_path=os.path.join(output_dir, 'catchment_areas.png')
    )
else:
    print('`merged_gdf` is not defined. Run the data aggregation step first.')
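
When many stations are merged, a single bar chart can become crowded. One option, sketched here under the assumption that only the largest basins matter for a quick look, is to subset the data before plotting. The helper name `top_n_by_area` and the default of 20 are illustrative choices, and the sample values are made up.

```python
import pandas as pd


def top_n_by_area(df, n=20, column="area_calc"):
    """Return the `n` rows with the largest numeric values in `column`."""
    out = df.copy()
    # Coerce text or malformed entries to NaN so they can be dropped.
    out[column] = pd.to_numeric(out[column], errors="coerce")
    return out.dropna(subset=[column]).nlargest(n, column)


# Made-up values for illustration only.
sample = pd.DataFrame(
    {"station": ["A", "B", "C", "D"], "area_calc": ["120.5", "not available", 980, 45]}
)
largest = top_n_by_area(sample, n=2)
print(largest)
```

The result could then be passed to `plot_catchment_areas` in place of the full `merged_gdf`.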


## Step 5 - Use the outputs
The two exports support basin scoping and collaboration: `output/catchment_map.html` for interactive inspection and `output/catchment_areas.png` for quick size comparisons. Rerun the workflow whenever new GRDC files become available.



### How to Use the Outputs
- Open `output/catchment_map.html` in any browser to explore station-level metadata interactively.
- Share or embed `output/catchment_areas.png` when you need a quick visual comparison of basin sizes.
- Re-run this notebook whenever new GRDC catchment files are added under `input/` to refresh both products.
