WSG-scale floodplain pipeline: multi-AOI iteration + memory + intermediate storage #35

@NewGraphEnvironment

Problem

flooded currently operates on a single in-memory DEM and a single stream network in one pass. wedzin's pipeline (restoration_wedzin_kwa_2024/scripts/floodplain_lcc/02_floodplain_model.R:133-180) runs one watershed (Neexdzii Kwa, FWA bladindex hardcoded at line 45) and loops over scenarios, which is manageable for a small AOI.

Scaling to FWCP peace WSG-wide (Pars to start, then more) means:

  • Multiple AOIs (per WSG, then sub-basins per WSG via fresh::frs_watershed_split driven by break_points.csv)
  • Each AOI's DEM may be too large to hold in memory and process in a single pass
  • Intermediate rasters need an on-disk storage strategy (per-AOI dirs? naming convention?)
  • Per-scenario output must remain attributable to AOI

Wedzin uses terra::terraOptions(threads = 12) (line 40) for within-call parallelism. No outer multi-AOI loop, no chunking inside fl_valley_confine(), no resumability. Manual spatial cropping to stream extent + 2 km buffer (lines 105-107) is the only memory mitigation.

Proposed Solution

Decisions called out as options where the right shape isn't obvious yet:

a. API shape

Lean: vignette pattern + small helpers, not a god-function.

  • fl_aoi_iter() — iterator over a list of AOIs, yields (aoi, output_dir) tuples
  • fl_output_path() — path resolver: <root>/floodplain/<wsg>/<sub_basin_id>/<scenario_id>.tif
  • Vignette: WSG-scale workflow that composes existing flooded fns inside the iterator
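A minimal sketch of the path resolver, assuming the argument names `root`, `wsg`, `sub_basin_id`, `scenario_id`, `ext` and `create_dir` (all hypothetical; the issue only fixes the directory layout). It rejects `NA`, empty, or slash-containing IDs so a single bad ID cannot silently write outside its sub-basin directory.

```r
# Hypothetical sketch of fl_output_path(); argument names are assumptions.
# Builds <root>/floodplain/<wsg>/<sub_basin_id>/<scenario_id>.<ext> for one
# AOI/scenario combination.
fl_output_path <- function(root, wsg, sub_basin_id, scenario_id,
                           ext = c("tif", "gpkg"), create_dir = FALSE) {
  ext <- match.arg(ext)
  parts <- c(wsg = wsg, sub_basin_id = sub_basin_id, scenario_id = scenario_id)
  stopifnot(length(parts) == 3)  # one path per call
  # Reject NA, empty, or slash-containing IDs so a bad ID cannot escape its directory
  bad <- is.na(parts) | !nzchar(parts) | grepl("[/\\\\]", parts)
  if (any(bad)) {
    stop("invalid path component(s): ", paste(names(parts)[bad], collapse = ", "))
  }
  dir <- file.path(root, "floodplain", parts[["wsg"]], parts[["sub_basin_id"]])
  if (create_dir) dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  file.path(dir, paste0(parts[["scenario_id"]], ".", ext))
}

fl_output_path("/data", "PARS", "sb01", "s1")                # "/data/floodplain/PARS/sb01/s1.tif"
fl_output_path("/data", "PARS", "sb01", "s1", ext = "gpkg")  # "/data/floodplain/PARS/sb01/s1.gpkg"
```

Keeping the resolver free of I/O (unless `create_dir = TRUE`) lets both the run loop and report-side code call it and agree on paths.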

A single fn (fl_valley_confine_aoi_list()) was considered and rejected — overfits the wedzin shape, blocks per-project customization (e.g., FWCP peace may need different sub-basin definitions than wedzin's break_points pattern).
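One possible shape for the iterator, sketched under the assumption that AOIs arrive as a data.frame (or sf object) with one row per AOI and `wsg` / `sub_basin_id` columns; the real signature of `fl_aoi_iter()` is still open. R has no generators, so this sketch materializes the (aoi, output_dir) pairs as a list and leaves the heavy work (cropping DEMs, calling fl_valley_confine()) to the caller's loop body, which keeps per-project customization in the caller as argued above.

```r
# Hypothetical sketch of fl_aoi_iter(); the signature is an assumption.
# `aois`: data.frame (or sf object) with one row per AOI and at least
# `wsg` and `sub_basin_id` columns. Returns a list of
# list(aoi = <one-row AOI>, output_dir = <path>) pairs.
fl_aoi_iter <- function(aois, root, create_dirs = TRUE) {
  stopifnot(is.data.frame(aois), all(c("wsg", "sub_basin_id") %in% names(aois)))
  lapply(seq_len(nrow(aois)), function(i) {
    aoi <- aois[i, , drop = FALSE]
    # Same layout as the proposed convention: <root>/floodplain/<wsg>/<sub_basin_id>
    output_dir <- file.path(root, "floodplain",
                            as.character(aoi$wsg), as.character(aoi$sub_basin_id))
    if (create_dirs) dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
    list(aoi = aoi, output_dir = output_dir)
  })
}

# Usage: two hypothetical sub-basins in the Pars WSG
aois <- data.frame(wsg = c("PARS", "PARS"), sub_basin_id = c("sb01", "sb02"))
for (item in fl_aoi_iter(aois, root = tempfile("wsg_run_"))) {
  # Compose existing flooded functions here: crop the DEM and streams to
  # item$aoi, run fl_valley_confine() per scenario, write into item$output_dir.
  message("AOI ", item$aoi$sub_basin_id, " -> ", item$output_dir)
}
```

Each element carries only IDs and a directory, so materializing the whole list stays cheap even for many sub-basins; DEMs are read only inside the loop body.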

b. Memory mgmt

  • Document when fl_valley_confine holds rasters fully in memory vs uses terra raster proxies
  • Caller controls chunking via terra::terraOptions(memmax = ...)
  • No new in-flooded chunking — leverage terra's existing facilities
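Before choosing a `memmax`, it helps to estimate how large an AOI's rasters would be if held fully in memory. The sketch below assumes 8 bytes per cell (double precision) and uses a crude `copies` multiplier for intermediates held at the same time; the helper name and the DEM dimensions in the example are hypothetical. For an actual SpatRaster, `terra::mem_info()` reports terra's own estimate.

```r
# Hypothetical helper: rough RAM needed to hold a raster fully in memory,
# assuming 8 bytes per cell (double precision). `copies` is a crude
# multiplier for intermediate rasters held at the same time.
fl_raster_mem_gb <- function(nrow, ncol, nlyr = 1, copies = 1) {
  nrow * ncol * nlyr * copies * 8 / 1e9
}

# Illustrative (hypothetical) DEM sizes:
fl_raster_mem_gb(4000, 4000)                # 100 km x 100 km at 25 m: ~0.13 GB
fl_raster_mem_gb(20000, 20000)              # 20 km x 20 km at 1 m: ~3.2 GB
fl_raster_mem_gb(20000, 20000, copies = 4)  # four such layers at once: ~12.8 GB

# If the estimate exceeds the RAM terra should use, cap it so terra processes
# in chunks via temporary files, e.g. terra::terraOptions(memmax = 8)
# (memmax is in GB).
```

The gap between the two grid sizes is the point: a coarse WSG-wide DEM fits comfortably, while a few high-resolution intermediates can exceed a workstation's RAM, which is where a caller-set `memmax` matters.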

c. Output organization

Codify the convention: <root>/floodplain/<wsg>/<sub_basin_id>/<scenario_id>.tif + matching .gpkg. Report-side code can then find outputs without per-project glue.
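A sketch of what finding outputs "without per-project glue" could look like on the report side: walk `<root>/floodplain/`, keep `.tif` / `.gpkg` files exactly three levels deep, and split each relative path back into its components. `fl_output_index()` is a hypothetical name; only base R and the bundled `tools` package are used.

```r
# Hypothetical report-side helper: index outputs written under
# <root>/floodplain/<wsg>/<sub_basin_id>/<scenario_id>.{tif,gpkg}
fl_output_index <- function(root) {
  base <- file.path(root, "floodplain")
  rel <- list.files(base, pattern = "\\.(tif|gpkg)$", recursive = TRUE)
  # Keep only paths exactly three levels deep: wsg/sub_basin_id/file
  parts <- strsplit(rel, "/", fixed = TRUE)
  keep <- lengths(parts) == 3
  parts <- parts[keep]
  if (length(parts) == 0) {
    return(data.frame(wsg = character(), sub_basin_id = character(),
                      scenario_id = character(), ext = character(),
                      path = character()))
  }
  file <- vapply(parts, `[`, character(1), 3)
  data.frame(
    wsg = vapply(parts, `[`, character(1), 1),
    sub_basin_id = vapply(parts, `[`, character(1), 2),
    scenario_id = tools::file_path_sans_ext(file),
    ext = tools::file_ext(file),
    path = file.path(base, rel[keep])
  )
}

# Usage against a throwaway directory tree
root <- tempfile("fl_idx_")
d <- file.path(root, "floodplain", "PARS", "sb01")
dir.create(d, recursive = TRUE)
invisible(file.create(file.path(d, c("s1.tif", "s1.gpkg", "s2.tif", "s1.tif.aux.xml"))))
fl_output_index(root)
```

Sidecar files that GDAL may write next to a GeoTIFF (e.g. `.aux.xml`) do not match the extension pattern, so they are not indexed.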

d. Resumability

skip_existing = TRUE arg on the iterator/orchestrator: if <output_path> exists, skip that AOI/scenario combo, so interrupted long runs can resume.
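A minimal sketch of `skip_existing` for one AOI/scenario combination, assuming a hypothetical `fl_run_one()` wrapper and a caller-supplied `run_fn(path)` that writes the result (in practice a fl_valley_confine() call plus a raster write). One caveat with "skip if the file exists": a run killed mid-write can leave a truncated file that the next run would then skip. Writing to a temporary name in the same directory and renaming on success avoids that.

```r
# Hypothetical wrapper for one AOI/scenario combo; `run_fn(path)` is a
# caller-supplied function that writes its result to `path`.
fl_run_one <- function(output_path, run_fn, skip_existing = TRUE) {
  if (skip_existing && file.exists(output_path)) {
    return(invisible("skipped"))
  }
  out_dir <- dirname(output_path)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # Write under a temporary name in the same directory (keeping the extension),
  # then rename on success, so an interrupted run cannot leave a partial file
  # at output_path for a later resume to mistake as finished.
  tmp <- file.path(out_dir, paste0(".partial_", basename(output_path)))
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)
  run_fn(tmp)
  if (!file.rename(tmp, output_path)) {
    stop("could not move result to ", output_path)
  }
  invisible("ran")
}

# Usage with a stub writer standing in for the real raster write
out <- file.path(tempfile("fl_demo_"), "floodplain", "PARS", "sb01", "s1.tif")
stub <- function(path) writeLines("stub", path)
fl_run_one(out, stub)  # first call runs and writes `out`
fl_run_one(out, stub)  # second call finds `out` and skips
```

The temporary name keeps the original extension (`.partial_s1.tif`), which matters when the writer infers the file type from the extension, as `terra::writeRaster()` does when `filetype` is not given. The same wrapper would apply to the matching `.gpkg`.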

Acceptance

  • Vignette: WSG-scale workflow on 2+ AOIs (test data, can reuse Neexdzii + a peace AOI)
  • fl_aoi_iter() and fl_output_path() helpers, tested
  • Output path convention documented in vignette + helpers
  • skip_existing behaviour with test
  • Memory considerations section in vignette
  • Bench: Pars WSG bull trout run completes within reasonable time/memory bounds (numbers TBD; record as baseline)

Out of scope

  • Distributed compute (multi-machine fan-out) — that's rtj's compute-fan-out runbook (rtj#84), not flooded
  • Land cover co-analysis — drift's domain, deferred per user
  • Species-specific habitat classification — fresh's domain (network extraction)
  • Single mega-function — explicitly rejected; we want composable helpers + vignette

References

  • wedzin scenario loop: restoration_wedzin_kwa_2024/scripts/floodplain_lcc/02_floodplain_model.R:133-180
  • wedzin AOI scoping (single watershed): restoration_wedzin_kwa_2024/scripts/floodplain_lcc/01_network_extract.R:40-46
  • existing helper to build on: fl_scenarios() (reads/validates scenario tables)

Depends on companion issue (DEM source helpers — file first)
Relates to NewGraphEnvironment/rtj#84 (compute-fan-out runbook)

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
