GRIME — Garbage River Interception and Modeling Engine

Video - https://www.youtube.com/watch?v=GsYtVGVTPWM

1. Executive Summary

GRIME is a multi-parameter optimization system that identifies optimal locations for deploying trash interception barriers ("nets") on urban waterways. It evaluates candidate sites across 28 geospatial parameters organized into 6 parameter families, producing 4 sub-scores that combine into a single composite ranking per candidate location.

Built for: 2026 SmathHacks hackathon

What makes it technically interesting:

A two-level weighted scoring architecture (parameters → sub-scores → composite) that is both interpretable and tunable
Hydrological pipeline built on real DEM data: pit-filling → depression-filling → flat resolution → D8 flow direction → flow accumulation → stream extraction
Manning's equation applied to estimate flow velocity from DEM slope and channel geometry, with feasibility gates that eliminate sites where deployment is physically impossible
Monte Carlo sensitivity analysis via Dirichlet-perturbed weight vectors to assess ranking robustness
Three-phase candidate placement algorithm: spatial constraint satisfaction → full-parameter scoring → population-scaled risk-percentile filtering
On-demand real waterway geometry from OpenStreetMap's Overpass API for 108,772 cities across 240 countries

Scope: Site selection modeling and scoring. GRIME does not design the physical trap, predict trash composition, or model individual debris trajectories.

Non-goals: Real-time sensor integration, computer vision trash classification, trap mechanical design, economic cost optimization.

Visualizations

All surfaces generated in the Wolfram Language from real USGS elevation data and GRIME's scoring functions. See full documentation for derivations and code.

DEM Terrain — Durham, NC

3D elevation surface from USGS 3DEP at 10m resolution. Stream valleys visible as low-elevation grooves define where GRIME extracts candidate net sites.

Composite Score Surface

The final composite score as a function of Generation (trash input) and Impact (downstream consequence), with Flow and Feasibility held constant. The diagonal ridge shows that high scores require both trash presence and downstream consequence — neither alone is sufficient.

Feasibility Score — Width x Velocity

Deployment feasibility as a function of channel width and flow velocity. The green plateau marks the sweet spot: narrow, moderate-velocity channels where nets can be spanned and anchored. Red zones are eliminated by hard gates.

Manning's Equation — V(Slope, Roughness)

Flow velocity estimated from Manning's equation across slope and roughness parameter space. Steep, smooth channels (high slope, low roughness) produce dangerous velocities; flat, rough channels produce stagnant conditions.

Environmental Justice Index

Environmental justice burden across a synthetic metro region. Peaks identify overburdened communities where trash interception has the highest equity value — GRIME weights these areas higher in the Impact sub-score.

Dirichlet Sensitivity — Weight Perturbations

500 weight vectors sampled from a Dirichlet distribution (κ=10) projected onto a ternary diagram. The red dot is the baseline. GRIME recomputes rankings under each perturbation to test whether top-ranked sites are robust to weight assumptions.

2. Problem Statement

The problem

Urban waterways accumulate trash from stormwater runoff, illegal dumping, combined sewer overflows, and bridge crossings. Deploying interception devices (nets, booms, trash traps) requires choosing locations that maximize debris captured per device while remaining physically feasible to install and maintain.

Why it is hard

A naive approach — placing traps at the largest rivers — fails because:

Large rivers are too wide (>30m) for stationary nets; debris passes around or damages the device
High-traffic waterways (navigable canals, shipping channels) cannot be obstructed
Upstream tributaries with high impervious surface and population density generate more trash per unit area than rural mainstems
Downstream impact varies: a trap upstream of a drinking water intake has orders of magnitude more public health value than one upstream of an industrial canal
Physical feasibility (road access, bank slope, flow velocity, land ownership) eliminates many otherwise-optimal locations

Constraints and assumptions

All data sources must be free and require no API keys (census, EPA, USGS, OSM)
The model assumes stationary barrier-style traps deployable in channels ≤30m wide (~100 ft)
Scoring weights are set by informed heuristic and literature, not supervised learning (no ground-truth dataset of "correct" trap placements exists at scale)
The system must produce results in <15 seconds per city for interactive demo use

3. System Overview

Architecture

graph TD
    A[DEM Raster - USGS 3DEP] --> B[Hydrological Conditioning - pysheds]
    B --> C[Flow Direction - D8 Algorithm]
    C --> D[Flow Accumulation]
    D --> E[Stream Network Extraction]
    E --> F[Candidate Site Generation]

    G[EPA APIs - TRI ECHO EJSCREEN SDWIS FRS] --> H[Parameter Computation - 28 params x 6 families]
    I[USGS APIs - NWIS StreamStats PAD-US] --> H
    J[Census APIs - ACS TIGER] --> H
    K[OSM - Overpass API] --> H
    F --> H

    H --> L[Hard Gate Filtering]
    L --> M[MinMax Normalization]
    M --> N[Weighted Sub-scores x4]
    N --> O[Composite Score]
    O --> P[Risk-Percentile Filtering]
    P --> Q[Ranked Deployment Sites]

    Q --> R[FastAPI Backend]
    R --> S[Mapbox GL JS Dashboard]
    K --> S

Subsystems

Subsystem	Location	Purpose
DEM Pipeline	`core/pipeline.py`	Fetch elevation data, extract stream network, generate candidate points
Generation Params	`core/generation.py`	Compute trash generation indicators (population, land use, industrial sources)
Flow Params	`core/flow.py`	Compute hydraulic transport parameters (discharge, velocity, flood frequency)
Impact Params	`core/impact.py`	Compute downstream consequence indicators (drinking water, EJ, protected areas)
Feasibility Params	`core/feasibility.py`	Compute deployment constraint parameters (road access, channel width, slope)
Scoring Engine	`core/scoring.py`	Normalize, weight, composite, sensitivity analysis
API Server	`api/main.py`	REST + WebSocket endpoints serving scored GeoJSON
Dashboard	`dashboard/index.html`	Interactive map with on-demand OSM waterway fetching and client-side scoring
Places Database	`mock_data/places.json`	108,772 city/town records across 240 countries (7MB compact JSON)

Data flow

Two execution modes exist:

Mode 1 — Full Python pipeline (research/validation): DEM fetch → pysheds hydrology → stream extraction → candidate generation → API-based parameter computation → composite scoring → GeoJSON output

Mode 2 — Dashboard on-demand (demo/interactive): User clicks city → Overpass API returns real waterway geometry → client-side JS generates candidate positions with spatial constraints → client-side scoring using simplified parameter model → Mapbox GL renders results

Mode 2 exists because Mode 1 takes 3–5 minutes per watershed and requires installing pysheds (which has C dependencies that fail on some Windows machines). Mode 2 runs in <5 seconds anywhere with a browser.

4. Core Technical Ideas

4.1 Two-level weighted scoring

The 28 raw parameters are not directly comparable (population density in persons/km² vs flow velocity in m/s vs a binary land ownership flag). The system handles this through two-level aggregation:

Parameter level: Each raw parameter is MinMax-normalized to [0, 1] independently within the candidate set, then multiplied by its within-family weight. The weighted sum, scaled by 100, produces a sub-score in [0, 100].
Sub-score level: The four sub-scores are combined via a second set of weights into the composite score in [0, 100].

This two-level structure has a specific advantage: it makes the model interpretable at the sub-score level. A judge or engineer can look at a candidate and immediately see "high generation, low feasibility" without needing to parse 28 individual numbers.

4.2 Hard gates vs soft scoring

Some parameters act as binary disqualifiers rather than continuous scores. A channel wider than 50m cannot hold a net regardless of how much trash flows through it. These are implemented as hard gates that remove candidates before scoring, separate from the soft scoring that ranks survivors:

Gate	Condition	Rationale
Velocity	V > 3.0 m/s	Trap will be damaged or torn loose
Channel width	W > 50m or W < 0.5m	Too wide to span or too narrow for meaningful accumulation
Land ownership	Confirmed private, no permission	Legal barrier to deployment

4.3 Placement as constraint satisfaction + optimization

Candidate placement is not random scatter. It is a three-phase algorithm:

Constraint satisfaction: Generate all positions that pass spatial, width, and traffic constraints
Full scoring: Evaluate every surviving position on the composite model
Risk-percentile selection: Keep only the top N% by score, where N scales with city population

This separates "can we physically put a net here?" (phase 1) from "should we?" (phases 2–3).

4.4 Population-scaled risk thresholds

A city of 20 million people needs more nets than a town of 10,000, but not linearly more. The risk percentile threshold scales in steps:

Population	Percentile kept	Rationale
>10M	Top 35%	Mega-cities have extensive waterway networks; more sites are genuinely high-risk
>1M	Top 30%	Large cities still have substantial catchments
>100K	Top 25%	Mid-size cities, moderate network complexity
<100K	Top 20%	Small towns, fewer waterways, tighter selection

A minimum floor of 5 deployed sites ensures the model always produces enough output to demonstrate ranking behavior.
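
The step function and the floor can be sketched as follows. The function names are illustrative, and capping the cutoff at the number of scored candidates is an assumption added here so the result never exceeds what exists.

```python
import math


def population_scaled_percentile(pop: int) -> float:
    """Fraction of scored candidates to keep, per the table above."""
    if pop > 10_000_000:
        return 0.35
    if pop > 1_000_000:
        return 0.30
    if pop > 100_000:
        return 0.25
    return 0.20


def deployment_cutoff(n_scored: int, pop: int, floor: int = 5) -> int:
    """Number of top-ranked sites to keep: percentile share, but at least `floor`."""
    wanted = max(floor, math.ceil(n_scored * population_scaled_percentile(pop)))
    return min(wanted, n_scored)  # cannot keep more sites than were scored
```

For example, a city of 278,993 people with 40 scored candidates falls in the 25% band and keeps 10 sites, while a small town with 12 candidates is lifted to the floor of 5.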

5. Mathematical Foundations

5.1 Composite scoring function

Let x ∈ ℝ²⁸ be the raw parameter vector for a candidate site. The composite score S(x) is:

S(x) = Σ(k=1..4) ωk · Gk(x)

where ω = [0.30, 0.25, 0.30, 0.15] are the sub-score weights and each sub-score Gk is:

Gk(x) = 100 · Σ(j ∈ Fk) wj · x̂j

where Fk is the set of parameter indices belonging to family k, wj is the within-family weight for parameter j (renormalized to sum to 1 after filtering unavailable parameters), and x̂j is the MinMax-normalized value:

x̂j = (xj - min(xj)) / (max(xj) - min(xj))

For distance-based parameters where lower is better (estuary distance, beach distance), the normalized value is inverted after scaling: x̂j ← 1 − x̂j.

Important implementation detail: Normalization is computed across the candidate set, not against a global reference. This means scores are relative rankings, not absolute measures. A score of 80 means "top of this candidate pool," not "80% of some theoretical maximum."
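
The two-level formula can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code in `core/scoring.py`: the function names, the dict-of-lists layout, and the parameter keys in the usage note are made up for the example. The 0.5 fallback for identical values follows the behavior listed under failure modes.

```python
from typing import Dict, List

# Sub-score weights from the text: Generation, Flow, Impact, Feasibility.
SUBSCORE_WEIGHTS = {"generation": 0.30, "flow": 0.25, "impact": 0.30, "feasibility": 0.15}


def minmax(values: List[float], invert: bool = False) -> List[float]:
    """Scale values to [0, 1] across the candidate set; 0.5 everywhere if all are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def subscore(params: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """G_k: 100 x weighted sum of normalized parameters.

    Weights are renormalized over the parameters actually present in `params`.
    """
    available = [name for name in weights if name in params]
    total_w = sum(weights[name] for name in available)
    n = len(next(iter(params.values())))
    out = [0.0] * n
    for name in available:
        for i, x_hat in enumerate(minmax(params[name])):
            out[i] += 100.0 * (weights[name] / total_w) * x_hat
    return out


def composite(subscores: Dict[str, List[float]]) -> List[float]:
    """S: weighted sum of the four sub-scores for every candidate."""
    n = len(next(iter(subscores.values())))
    return [
        sum(SUBSCORE_WEIGHTS[k] * subscores[k][i] for k in SUBSCORE_WEIGHTS)
        for i in range(n)
    ]
```

Calling `subscore({"population_density": [100, 500, 900], "impervious_pct": [10, 60, 35]}, {"population_density": 0.18, "impervious_pct": 0.20, "road_density": 0.10})` renormalizes the two available weights to sum to 1 and scores the first candidate at 0, since it holds the minimum on both parameters. That is the "relative ranking" property described above.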

5.2 Manning's velocity equation

Flow velocity at a candidate site is estimated via Manning's equation:

V = (1/n) · R^(2/3) · S^(1/2)

where:

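n is Manning's roughness coefficient (treated as a dimensionless lookup value here; strictly it carries units of s/m^(1/3) in the SI form of the equation), selected by channel type: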
- Clean straight: 0.030
- Winding with pools (typical urban creek): 0.040
- Sluggish, weedy: 0.070
- Urban concrete-lined: 0.015
- (Source: Chow, V.T., 1959, Open-Channel Hydraulics)
R is the hydraulic radius (m) = A_cross / P_wetted, approximated as rectangular channel: R = (W × D) / (W + 2D), where depth D ≈ 0.3W (bankfull approximation)
S is the channel slope (dimensionless), computed from DEM as elevation difference over a 100m reach: S = (Z_here − Z_downstream) / 100, clamped to minimum 0.0001

A continuity cross-check is performed: V_continuity = Q / A_cross, where Q is the USGS-measured discharge converted to m³/s. The final velocity estimate is the geometric mean of Manning's and continuity estimates:

V_final = sqrt(V_Manning · V_continuity)

This hedges against errors in either the DEM slope (which can be noisy at 10m resolution) or the channel geometry assumption (rectangular approximation).
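
A worked sketch of the velocity estimate, assuming the rectangular channel with D ≈ 0.3W described above. The function names are illustrative (the project's version is `compute_flow_velocity()` in `core/flow.py`), and the cfs conversion uses 1 ft³ = 0.0283168 m³.

```python
import math

CFS_TO_M3S = 0.0283168  # cubic feet per second -> cubic meters per second


def manning_velocity(width_m: float, slope: float, n: float = 0.040) -> float:
    """V = (1/n) * R^(2/3) * S^(1/2) for a rectangular channel with depth 0.3 * width."""
    depth = 0.3 * width_m
    area = width_m * depth
    wetted_perimeter = width_m + 2.0 * depth
    hydraulic_radius = area / wetted_perimeter
    s = max(slope, 0.0001)  # clamp flat or noisy DEM slopes
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(s)


def continuity_velocity(q_cfs: float, width_m: float) -> float:
    """V = Q / A using the same rectangular cross-section."""
    area = width_m * (0.3 * width_m)
    return (q_cfs * CFS_TO_M3S) / area


def final_velocity(width_m: float, slope: float, q_cfs: float, n: float = 0.040) -> float:
    """Geometric mean of the Manning and continuity estimates."""
    return math.sqrt(manning_velocity(width_m, slope, n) * continuity_velocity(q_cfs, width_m))
```

For a 10 m wide channel at slope 0.001 with n = 0.040, the hydraulic radius is 30 / 16 = 1.875 m and the Manning estimate comes out at roughly 1.2 m/s.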

5.3 Velocity feasibility function

The velocity feasibility score is a piecewise function mapping velocity to a deployment viability multiplier:

f(V) =
  0.3   if V < 0.05 m/s     (stagnant — debris doesn't concentrate)
  0.7   if 0.05 ≤ V < 0.30  (slow but workable)
  1.0   if 0.30 ≤ V ≤ 1.50  (optimal interception range)
  0.5   if 1.50 < V ≤ 2.50  (fast — heavy anchoring needed)
  0.1   if V > 2.50          (too fast — trap damage likely)

Sites with V > 3.0 m/s are removed entirely by the hard gate before scoring.
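
The piecewise function translates directly into code. This sketch assumes the hard gate has already removed anything above 3.0 m/s, so the last branch covers the remaining fast sites; the function name is illustrative.

```python
def velocity_feasibility(v: float) -> float:
    """Deployment viability multiplier for a flow velocity in m/s."""
    if v < 0.05:
        return 0.3  # stagnant: debris does not concentrate
    if v < 0.30:
        return 0.7  # slow but workable
    if v <= 1.50:
        return 1.0  # optimal interception range
    if v <= 2.50:
        return 0.5  # fast: heavy anchoring needed
    return 0.1      # too fast: trap damage likely
```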

5.4 Runoff coefficient estimation

The rational method runoff coefficient C is estimated from impervious surface percentage via a linear model:

C = 0.05 + 0.009 · I

where I is the NLCD impervious surface percentage [0, 100]. This yields C ∈ [0.05, 0.95], ranging from forest (≈5% runoff) to fully paved (≈95% runoff). This is the same linearization used in the WaterGate methodology.

5.5 Inverse distance scoring

Several parameters (CSO proximity, Superfund proximity) use an inverse-distance kernel to compute influence from point sources:

score = Σ(i=1..N) 1 / (1 + (di / h)²)

where di is the Euclidean distance (in UTM meters) from the candidate to source i, and h is the half-decay distance (500m default). This is a Cauchy kernel that gives full weight at distance 0 and half weight at distance h.
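
A minimal sketch of the kernel, assuming the candidate and the sources are already projected to UTM meters. The function name and argument layout are made up for this example.

```python
import math
from typing import Iterable, Tuple


def inverse_distance_score(
    candidate_xy: Tuple[float, float],
    sources_xy: Iterable[Tuple[float, float]],
    half_decay_m: float = 500.0,
) -> float:
    """Sum of Cauchy-kernel weights 1 / (1 + (d / h)^2) over all point sources."""
    cx, cy = candidate_xy
    return sum(
        1.0 / (1.0 + (math.hypot(sx - cx, sy - cy) / half_decay_m) ** 2)
        for sx, sy in sources_xy
    )
```

A source on top of the candidate contributes 1.0 and a source exactly 500 m away contributes 0.5, so the two together score 1.5.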

5.6 Drinking water intake scoring

Proximity to downstream drinking water intakes uses an exponential decay:

score = Σ(i=1..N) exp(-di / 10)

where di is the distance in km. An intake at 10 km contributes weight ≈0.37, at 5 km ≈0.61, and at 1 km ≈0.90. Intakes beyond 50 km are ignored.
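
The decay and the 50 km cutoff can be sketched as below. The function and parameter names are illustrative.

```python
import math
from typing import Iterable


def intake_score(distances_km: Iterable[float], scale_km: float = 10.0,
                 cutoff_km: float = 50.0) -> float:
    """Sum of exp(-d / 10) over downstream intakes, ignoring those past the cutoff."""
    return sum(math.exp(-d / scale_km) for d in distances_km if d <= cutoff_km)
```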

5.7 Sensitivity analysis via Dirichlet perturbation

To assess whether the top-ranked sites are robust to weight uncertainty, the system performs Monte Carlo sensitivity analysis:

Sample N = 50 perturbed weight vectors from a Dirichlet distribution: ω' ~ Dir(α), where α = 10 × [0.30, 0.25, 0.30, 0.15]
The α scaling factor (×10) controls perturbation magnitude — higher α concentrates samples closer to the baseline weights
For each perturbed weight vector, recompute composite scores and record which sites appear in the top 5
The robustness percentage for each site is: (count of times in top 5) / N × 100%

A site with robustness > 80% is ranked highly regardless of reasonable weight changes. A site at 30% is sensitive to weight assumptions.

5.8 Haversine distance (placement spacing)

The minimum-spacing constraint uses the haversine formula for geodesic distance:

a = sin²(Δφ/2) + cos(φ₁) · cos(φ₂) · sin²(Δλ/2)
d = 2R · atan2(√a, √(1−a))

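where R = 6,371,000 m (mean Earth radius). Haversine is used instead of Euclidean distance on raw latitude/longitude because the candidate set can span several kilometers and a degree of longitude shrinks with cos(latitude), so treating degrees as planar coordinates introduces meaningful error, especially at high latitudes.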
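
The formula in Python, as a small self-contained sketch (the function name is illustrative):

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

One degree of longitude is about 111 km at the equator but only about 56 km at 60° N, which is the distortion a planar degrees-based distance would miss.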

5.9 Environmental justice composite

The EJ priority score combines three EPA EJSCREEN percentiles:

EJ = (0.4 · P_discharge + 0.3 · P_minority + 0.3 · P_income) / 100

where P_discharge is the wastewater discharge EJ percentile, P_minority is the minority population percentile, and P_income is the low-income percentile. All are [0, 100] percentiles from EJSCREEN. The result is [0, 1].

6. Algorithms

6.1 DEM Hydrological Conditioning Pipeline

Purpose: Convert raw elevation data into a hydrologically consistent surface from which flow direction and stream networks can be extracted.

Input: DEM raster from USGS 3DEP (10m resolution)

Output: Flow direction grid, flow accumulation grid, stream network GeoJSON

Steps:

FUNCTION condition_dem(dem):
    pit_filled    ← fill_pits(dem)              // remove single-cell sinks
    flooded       ← fill_depressions(pit_filled) // fill multi-cell depressions
    inflated      ← resolve_flats(flooded)       // assign gradient to flat areas
    flow_dir      ← D8_flowdir(inflated)         // each cell → 1 of 8 neighbors
    accumulation  ← flow_accumulation(flow_dir)  // count upstream cells per cell
    RETURN flow_dir, accumulation

Stream extraction: Cells where accumulation exceeds a threshold (default 500 cells = 500 × 10m × 10m = 0.05 km²) are classified as stream cells. Connected stream cells are vectorized into LineString geometries.

Complexity: O(n) for each step where n is the number of DEM cells. For the Ellerbe Creek bbox at 10m resolution: approximately 3000 × 1500 = 4.5M cells. Total conditioning time: ~30–60 seconds.

Why pysheds: It operates entirely in-memory on NumPy arrays without requiring ArcGIS or GRASS GIS. The D8 algorithm assigns each cell exactly one of 8 cardinal/diagonal flow directions based on steepest descent, which is the standard approach for stream extraction in computational hydrology.
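
To make the D8 and accumulation steps concrete, here is a toy pure-Python version on a 3 × 3 grid. It is a teaching sketch, not pysheds: it assumes the DEM is already conditioned (no pits or flats), uses unit cell spacing, and counts each cell itself plus everything upstream of it.

```python
import math
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
# Row/column offsets to the 8 neighbors.
NEIGHBORS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def d8_flow_direction(dem: List[List[float]]) -> List[List[Optional[Cell]]]:
    """For each cell, the neighbor with the steepest downhill slope (None for outlets)."""
    rows, cols = len(dem), len(dem[0])
    flow_to: List[List[Optional[Cell]]] = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope, best = 0.0, None
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbors are sqrt(2) cells away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope, best = slope, (nr, nc)
            flow_to[r][c] = best
    return flow_to


def flow_accumulation(dem, flow_to) -> List[List[int]]:
    """Cells draining through each cell, visiting cells from highest to lowest."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]  # every cell counts itself
    cells = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: dem[rc[0]][rc[1]],
        reverse=True,
    )
    for r, c in cells:
        target = flow_to[r][c]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


# A small valley sloping toward the bottom-center cell.
dem = [[9, 8, 9], [7, 6, 7], [5, 4, 5]]
flow_to = d8_flow_direction(dem)
acc = flow_accumulation(dem, flow_to)
stream_cells = [(r, c) for r in range(3) for c in range(3) if acc[r][c] > 3]
```

Visiting cells from highest to lowest means every upstream total is final before it is passed downstream, which works here because flow is strictly downhill.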

Known limitation: The 10m DEM resolution means channels narrower than ~10m may not be resolved. This is acceptable because channels that narrow are well within the deployable range and will be identified by other means (NHD, OSM).

6.2 Candidate Placement Algorithm (Client-side)

Purpose: Given a set of waterway geometries and city metadata, produce a set of spatially valid, risk-ranked candidate sites for trap deployment.

Input: Array of stream geometries (from Overpass API), city population, country code

Output: Ranked array of candidate objects with scores and parameters

FUNCTION generate_candidates(streams, pop, country):
    // ── Phase 1: Constraint satisfaction ──
    MIN_SPACE ← 120m
    MAX_WIDTH ← 30m
    placed ← []
    valid_positions ← []

    FOR EACH stream IN streams:
        IF stream.width > MAX_WIDTH: CONTINUE
        cap ← traffic_capacity(stream)
        count_on_stream ← 0
        dist_since_last ← MIN_SPACE          // lets the first point qualify
        previous_point ← NONE

        FOR EACH point IN stream.coords:
            IF count_on_stream >= cap: BREAK  // stream is full; stop walking it
            IF previous_point ≠ NONE:
                dist_since_last += haversine(previous_point, point)
            previous_point ← point
            IF dist_since_last < MIN_SPACE: CONTINUE
            IF any p in placed where haversine(p, point) < MIN_SPACE: CONTINUE

            placed.add(point)
            count_on_stream++
            dist_since_last ← 0
            valid_positions.add(point with metadata)

    // ── Phase 2: Score every valid position ──
    scored ← []
    FOR EACH pos IN valid_positions:
        compute 28 parameters (simplified model)
        compute 4 sub-scores
        compute composite
        scored.add(pos with scores)

    // ── Phase 3: Risk-percentile selection ──
    scored.sort_by(composite, descending)
    pctile ← population_scaled_percentile(pop)
    cutoff ← max(5, ceil(len(scored) * pctile))
    RETURN scored[0:cutoff]

Complexity: Phase 1 is O(n × m) where n is total coordinate points and m is placed candidates. Phase 2 is O(k) where k is valid positions. Phase 3 is O(k log k) for sorting. In practice k < 200 and the entire function runs in <100ms.
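
Phase 1 for a single stream can be sketched in Python as follows. This mirrors the pseudocode rather than the dashboard's JavaScript `generateCandidates()`; the function names, the `(lat, lon)` tuple layout, and the default `cap` are assumptions for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon) in decimal degrees
EARTH_RADIUS_M = 6_371_000


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(q[1] - p[1])
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def place_on_stream(coords: List[Point], placed: List[Point],
                    min_space_m: float = 120.0, cap: int = 10) -> List[Point]:
    """Walk one stream's vertices and keep those that respect both spacing rules.

    `placed` is shared across streams and is updated in place, so nets on
    different streams also stay at least `min_space_m` apart.
    """
    accepted: List[Point] = []
    dist_since_last = min_space_m  # lets the first vertex qualify
    prev = None
    for pt in coords:
        if prev is not None:
            dist_since_last += haversine_m(prev, pt)
        prev = pt
        if len(accepted) >= cap:
            break  # per-stream capacity reached
        if dist_since_last < min_space_m:
            continue  # too close to the previous net along this stream
        if any(haversine_m(p, pt) < min_space_m for p in placed):
            continue  # too close to a net on any stream
        placed.append(pt)
        accepted.append(pt)
        dist_since_last = 0.0
    return accepted
```

With vertices about 56 m apart along a meridian, every third vertex is kept: two steps cover about 111 m, just under the 120 m minimum, and three steps cover about 167 m.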

6.3 Sensitivity Analysis (Dirichlet Monte Carlo)

FUNCTION sensitivity_analysis(candidates, n_perturbations=50):
    baseline ← compute_composite_score(candidates)
    top5_counts ← zeros(len(candidates))

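    α ← 10 × [0.30, 0.25, 0.30, 0.15]           // = [3.0, 2.5, 3.0, 1.5], fixed across iterations
    REPEAT n_perturbations TIMES:
        ω' ← sample_dirichlet(α)
        // gen, flow, impact, feas are the per-candidate sub-score vectors
        composite' ← ω'[0]·gen + ω'[1]·flow + ω'[2]·impact + ω'[3]·feas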
        top5 ← indices of 5 highest composite'
        top5_counts[top5] += 1

    robustness ← top5_counts / n_perturbations × 100
    RETURN baseline with robustness column
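
A standard-library sketch of the same loop. It is not the project's `sensitivity_analysis()`: the function names and the row-per-candidate input layout are assumptions, and the Dirichlet draw uses the usual construction of normalizing independent Gamma(αᵢ, 1) samples.

```python
import random
from typing import List, Sequence

BASE_WEIGHTS = [0.30, 0.25, 0.30, 0.15]  # Generation, Flow, Impact, Feasibility


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def top_k_robustness(subscores: List[Sequence[float]], n_perturbations: int = 50,
                     kappa: float = 10.0, k: int = 5, seed: int = 0) -> List[float]:
    """Percentage of perturbed weightings under which each site lands in the top k.

    `subscores` holds one row per candidate: [gen, flow, impact, feas].
    """
    rng = random.Random(seed)
    alpha = [kappa * w for w in BASE_WEIGHTS]
    counts = [0] * len(subscores)
    for _ in range(n_perturbations):
        w = sample_dirichlet(alpha, rng)
        composite = [sum(wi * si for wi, si in zip(w, row)) for row in subscores]
        order = sorted(range(len(subscores)), key=lambda i: composite[i], reverse=True)
        for i in order[:k]:
            counts[i] += 1
    return [100.0 * c / n_perturbations for c in counts]
```

A site that leads on all four sub-scores stays on top under every draw, since the sampled weights are always positive and sum to 1. Sites that lead on only one sub-score are the ones whose robustness drops.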

6.4 Bayesian Weight Optimization (Optional Enhancement)

When ground-truth trap locations are available, weights can be optimized via scikit-optimize:

FUNCTION optimize_weights(candidates, known_good_sites):
    FUNCTION objective(weights):
        w ← normalize(weights)
        scored ← recompute with w
        penalty ← sum of ranks of known_good_sites in scored
        RETURN penalty

    search_space ← [Real(0.05, 0.60)] × 4
    result ← gp_minimize(objective, search_space, n_calls=50)
    RETURN normalize(result.x)

Status: Implemented in core/scoring.py but not yet run against real ground-truth data.
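
The objective inside the optimizer can be sketched without scikit-optimize. This version assumes one row of four sub-scores per candidate and known-good sites given as row indices; both are illustrative choices, not the layout used in `core/scoring.py`.

```python
from typing import List, Sequence


def rank_sum_penalty(weights: Sequence[float], subscores: List[Sequence[float]],
                     known_good: Sequence[int]) -> int:
    """Sum of the 1-based ranks of known-good sites under the given weights.

    Lower is better: a perfect weighting puts every known-good site at the top.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    composite = [sum(wi * si for wi, si in zip(w, row)) for row in subscores]
    order = sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)
    rank = {site: pos + 1 for pos, site in enumerate(order)}
    return sum(rank[s] for s in known_good)
```

A function of this shape, with the candidate data bound in, is what a minimizer such as `gp_minimize` would call repeatedly with trial weight vectors.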

7. Architecture and Design Decisions

ADR-1: Two-level scoring instead of flat weighted sum

Decision: Aggregate 28 parameters into 4 sub-scores, then combine sub-scores into a composite.

Context: A flat 28-weight sum is opaque — changing one weight has a non-obvious effect.

Alternatives considered: (1) Flat weighted sum. (2) PCA dimensionality reduction. (3) Random forest classifier.

Chosen approach: Two-level weighted sum. Sub-scores map to real questions ("how much trash?", "how does it move?", "does it matter?", "can we deploy?").

Consequences: Interpretable and tunable, but assumes linear parameter contributions within each family. Non-linear interactions are not captured.

ADR-2: MinMax normalization instead of Z-score or rank

Decision: Use MinMax scaling to [0, 1] per parameter across the candidate set.

Alternatives considered: Z-score, percentile rank, log-transform + MinMax.

Chosen approach: MinMax. Simple, bounded, interpretable.

Consequences: Outliers dominate. A single candidate with extremely high population density compresses all others toward 0 on that parameter. Acknowledged limitation.

ADR-3: Client-side scoring in the dashboard

Decision: The dashboard computes scores in JavaScript, not by calling the Python backend.

Context: pysheds/rasterio have C dependencies that fail on Windows. Dashboard must work by opening one HTML file.

Chosen approach: Simplified JS scoring model that parallels the Python implementation but uses OSM-estimated widths and seeded random parameter generation rather than real API data.

Consequences: Dashboard scores are approximate. Python pipeline is the authoritative scoring implementation.

ADR-4: OpenStreetMap Overpass for waterway geometry

Decision: Fetch real waterway geometry from the Overpass API on each city click.

Alternatives considered: Pre-generated GeoJSON per city (storage), procedural random-walk rivers (alignment), NHD (US-only).

Chosen approach: Overpass API with 12s timeout and procedural fallback.

Consequences: Requires internet. Coverage varies globally. Fallback doesn't align with terrain.

8. Data Model and Schemas

Candidate site (GeoJSON Feature)

{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-78.898, 35.994]},
  "properties": {
    "id": 0,
    "city": "durham",
    "city_name": "Durham, NC",
    "stream_name": "Ellerbe Creek",
    "composite_score": 47.08,
    "generation_score": 42.5,
    "flow_score": 38.1,
    "impact_score": 35.2,
    "feasibility_score": 82.0,
    "population_density": 1424.3,
    "impervious_pct": 51.8,
    "usgs_mean_q_cfs": 37.0,
    "flow_velocity_ms": 1.377,
    "strahler_order": 5,
    "catchment_area_km2": 63.5,
    "channel_width_m": 13.1,
    "ej_index": 0.595,
    "road_access_m": 366.4,
    "bank_slope_deg": 19.6,
    "robustness_pct": 72.6,
    "rank": 1
  }
}

Places database record (compact JSON)

{"n":"Durham","c":"US","p":278993,"la":35.994,"lo":-78.8986}

Fields: n=name, c=ISO country code, p=population, la=latitude, lo=longitude.
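
A small sketch of expanding one compact record into named fields. The `Place` class and `parse_place` helper are hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str      # ISO country code
    population: int
    lat: float
    lon: float


def parse_place(record: dict) -> Place:
    """Map the short keys (n, c, p, la, lo) onto descriptive field names."""
    return Place(
        name=record["n"],
        country=record["c"],
        population=int(record["p"]),
        lat=float(record["la"]),
        lon=float(record["lo"]),
    )


durham = parse_place(json.loads('{"n":"Durham","c":"US","p":278993,"la":35.994,"lo":-78.8986}'))
```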

Parameter taxonomy (all 28)

#	Parameter	Unit	Family	Default Weight	Data Source
1	Population density	persons/km²	Generation	0.18	US Census ACS
2	Impervious surface %	%	Generation	0.20	NLCD 2021
3	Road density	km/km²	Generation	0.10	Census TIGER / OSMnx
4	EPA TRI facility count	facilities/km²	Generation	0.18	EPA TRI API
5	NPDES discharge points	count	Generation	0.12	EPA ECHO API
6	CSO/storm outfall density	points/km²	Generation	0.12	EPA ECHO
7	Litter complaint density	reports/km²	Generation	0.10	Durham 311 / local GIS
8	USGS mean discharge Q	cfs	Flow	0.22	USGS NWIS
9	Flow velocity	m/s	Flow	0.16	Manning's eq from DEM
10	Strahler stream order	ordinal	Flow	0.14	Computed from topology
11	Catchment area A	km²	Flow	0.18	pysheds DEM analysis
12	Flood return period Q10	cfs	Flow	0.14	USGS StreamStats
13	Seasonal flow variability	CV	Flow	0.10	USGS NWIS annual stats
14	Runoff coefficient C	dimensionless	Flow	0.06	NLCD k-means
15	Drinking water intake proximity	exp(-d/10)	Impact	0.22	EPA SDWIS / ECHO
16	Protected area proximity	score	Impact	0.16	USGS PAD-US
17	Environmental justice index	[0,1]	Impact	0.18	EPA EJSCREEN
18	Ocean/estuary proximity	km (inverted)	Impact	0.14	NHD terminus
19	Recreational beach proximity	km (inverted)	Impact	0.12	EPA BEACH Program
20	Tourism/recreation value	amenity count	Impact	0.10	OSM amenity density
21	Superfund site proximity	score	Impact	0.08	EPA FRS/CERCLIS
22	Road access distance	m	Feasibility	0.25	OSMnx routing
23	Channel width	m	Feasibility	0.20	NHD VAA + NBI span
24	Flow velocity (penalty)	m/s	Feasibility	0.20	Manning's eq
25	Land ownership	binary	Feasibility	0.15	USGS PAD-US
26	Bank slope stability	degrees	Feasibility	0.10	DEM gradient
27	Bridge/structure proximity	bonus	Feasibility	0.10	FHWA NBI

9. Codebase Structure

grime/
├── core/                       # Python scoring pipeline (core deliverable)
│   ├── __init__.py             # Constants, safe_call(), helpers
│   ├── pipeline.py             # DEM → pysheds → stream extraction → candidates
│   ├── generation.py           # 7 trash generation parameters + API integrations
│   ├── flow.py                 # 7 flow parameters, Manning's equation, USGS data
│   ├── impact.py               # 7 downstream impact parameters, EJ scoring
│   ├── feasibility.py          # 6 deployment feasibility parameters, hard gates
│   └── scoring.py              # Normalization, weighting, composite, sensitivity
├── api/
│   └── main.py                 # FastAPI: REST + WebSocket + static serving
├── dashboard/
│   └── index.html              # Mapbox map, Overpass integration, client-side scoring
├── mock_data/
│   └── places.json             # 108,772 cities, 240 countries (7MB)
├── scripts/
│   └── generate_mock.py        # Builds places.json from geonamescache
├── notebooks/
│   └── validate_pipeline.ipynb # Pipeline validation notebook
├── requirements.txt
├── start.sh
└── README.md

Where critical logic lives:

Logic	File	Function
Composite scoring formula	`core/scoring.py`	`compute_composite_score()`
Manning's velocity	`core/flow.py`	`compute_flow_velocity()`
Hard gate filtering	`core/scoring.py`	`apply_hard_gates()`
Sensitivity analysis	`core/scoring.py`	`sensitivity_analysis()`
Client-side placement	`dashboard/index.html`	`generateCandidates()`
Overpass waterway fetch	`dashboard/index.html`	`fetchRealStreams()`
Sub-score normalization	`core/scoring.py`	`compute_subscore()`

10. Execution Flow

Python pipeline (research mode)

User runs: python -m core.pipeline --bbox "-79.05,35.90,-78.75,36.05"
py3dep.get_map('DEM', bbox, resolution=10) fetches 3DEP raster
pysheds: fill_pits → fill_depressions → resolve_flats → flowdir → accumulation
extract_river_network(threshold=500) → stream GeoJSON
generate_candidates(spacing=200m) → candidate points along streams
For each candidate: compute pixel coords, elevation, catchment area from DEM
Output: mock_data/candidates.geojson

Dashboard (demo mode)

Browser opens dashboard/index.html
Fetches places.json (7MB) → parses 108,772 cities
Mapbox GL JS renders clustered city markers
User clicks city → openCity(idx) fires
POST to Overpass API → returns waterway geometry as JSON
fetchRealStreams() filters: tidal=no, width≤30m, top 15–25 by length
generateCandidates() runs 3-phase placement + scoring
Mapbox renders: stream lines (cyan) + candidate dots (color-coded by score)
User clicks candidate → sidebar shows score breakdown + parameters

11. APIs and Interfaces

REST endpoints

Method	Path	Parameters	Response
GET	`/api/candidates`	`?min_score=N` `?top_n=N` `?subscore=field`	GeoJSON FeatureCollection
GET	`/api/candidates/{id}`	—	Score breakdown with 4 sub-score parameter trees
GET	`/api/weights`	—	All parameter and sub-score weights
GET	`/api/stats`	—	Count, score range, mean, top 5
GET	`/map`	—	Serves dashboard HTML
WS	`/ws`	—	Real-time candidate updates

External APIs consumed

API	Auth	Rate Limit	Timeout	Fallback
Overpass API	None	Informal	12s	Procedural generation
USGS NWIS	None	None published	30s	Hardcoded Ellerbe Creek stats
EPA ECHO	None	None published	30s	Empty GeoDataFrame
EPA EJSCREEN	None	None published	30s	0.5 (neutral)
Census ACS	None	None published	30s	Durham average (500/km²)
USGS 3DEP	None	None published	60s	Fatal — no fallback

Every external API call is wrapped in safe_call() with a default fallback value.
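
The fallback pattern looks roughly like the sketch below. The real `safe_call()` lives in `core/__init__.py` and its signature is not shown in this document, so the name `safe_call_sketch`, the keyword-only `default` argument, and the logging behavior are all assumptions.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("grime.sketch")


def safe_call_sketch(fn: Callable[..., Any], *args: Any, default: Any = None, **kwargs: Any) -> Any:
    """Run `fn` and return its result, or `default` if it raises anything."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:  # network errors, timeouts, bad payloads, ...
        logger.warning("%s failed (%s); using fallback %r",
                       getattr(fn, "__name__", fn), exc, default)
        return default
```

For example, wrapping an EJSCREEN lookup with `default=0.5` would yield the neutral fallback from the table when the request times out, instead of aborting the whole scoring run.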

12. Configuration

Setting	Location	Default	Notes
`MAPBOX_TOKEN`	`dashboard/config.js` (gitignored) and `.env`	Placeholder	Must replace — free at mapbox.com. Copy `dashboard/config.example.js` → `dashboard/config.js`, and `.env.example` → `.env`
`ELLERBE_BBOX`	`core/__init__.py`	`(-79.05, 35.90, -78.75, 36.05)`	Ellerbe Creek watershed
`ELLERBE_GAUGE`	`core/__init__.py`	`"02086849"`	USGS gauge site number
`UTM_CRS`	`core/__init__.py`	`"EPSG:32617"`	UTM zone 17N (Durham, NC)
DEM resolution	`core/pipeline.py`	10m	Passed to py3dep
Accumulation threshold	`core/pipeline.py`	500 cells	Stream extraction sensitivity
Candidate spacing	`core/pipeline.py`	200m	Along-stream distance
Composite weights	`core/scoring.py`	[0.30, 0.25, 0.30, 0.15]	Gen, Flow, Impact, Feas
Min spacing (dashboard)	`dashboard/index.html`	120m	Haversine between nets
Max width (dashboard)	`dashboard/index.html`	30m	Skip channels wider
Risk percentile	`dashboard/index.html`	20–35%	Population-scaled
Min deploy floor	`dashboard/index.html`	5	Always at least this many

13. Installation and Setup

First-time setup (macOS / Linux)

macOS only ships python3 (no bare python). The cleanest setup is a venv, which gives you a local python + pip that won't collide with system Python or other projects.

cd ~/Downloads/GRIME

# 1. Create + activate the venv (you'll see (.venv) in your prompt after activation)
python3 -m venv .venv
source .venv/bin/activate

# 2. Install web-server deps (FastAPI, uvicorn, websockets, etc.)
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

# 3. Configure secrets (both files are gitignored)
cp dashboard/config.example.js dashboard/config.js   # then paste your Mapbox pk.* token
cp .env.example .env                                  # then set MAPBOX_TOKEN=...

# 4. Generate the places database (one-time)
python scripts/generate_mock.py

# 5. Start the API (this also serves the dashboard)
python -m uvicorn api.main:app --reload --port 8000

Open http://localhost:8000/ — landing page. /explore for the map app. /api/swagger for API docs.

Run again next session

Once setup is done, every subsequent run is just three commands:

cd ~/Downloads/GRIME
source .venv/bin/activate
python -m uvicorn api.main:app --reload --port 8000

Without a venv

If you'd rather skip the venv, substitute python3 and python3 -m pip everywhere:

python3 -m pip install -r requirements.txt
python3 -m uvicorn api.main:app --reload --port 8000

Windows / full hydrology pipeline

For the full Python pipeline (rasterio, fiona, pysheds), use conda — pip wheels for these often fail on Windows:

conda install -c conda-forge rasterio fiona geopandas pysheds
pip install -r requirements-full.txt

Common gotchas

Error	Cause	Fix
`command not found: python`	macOS only has `python3`	Activate the venv, or use `python3`
`No module named uvicorn`	Deps not installed in the active Python	`python -m pip install -r requirements.txt` (note `python -m pip`, not bare `pip`)
`Mapbox token not configured` (500 from `/api/config`)	`.env` missing or `MAPBOX_TOKEN=` empty	`cp .env.example .env` and fill in the token
Port 8000 in use	Old uvicorn still running	`lsof -i :8000` to find PID, or pass `--port 8001`
Empty candidates / blank map	Mock data not generated	`python scripts/generate_mock.py`

14. Usage

Dashboard demo (90-second path)

Open map → 108,772 cities visible as clustered dots
Search "Durham" → click → fly to Durham, NC
Wait 3s → real Ellerbe Creek geometry appears with scored candidate dots
Click top-ranked site → score breakdown panel shows 4 sub-scores
Toggle "Waterways" → OSM overlay confirms alignment
Toggle "Light theme" → clean beige presentation mode
Explain the 28-parameter model and show weight sensitivity

Python pipeline on custom watershed

python -m core.pipeline --bbox "-79.05,35.90,-78.75,36.05" --resolution 10 --threshold 500

API queries

curl "http://localhost:8000/api/candidates?top_n=10"
curl http://localhost:8000/api/candidates/3
curl http://localhost:8000/api/weights
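
The same endpoints can be called from Python with only the standard library. This is a sketch that assumes the API is running locally on port 8000 as described in setup; `api_url` and `get_json` are hypothetical helpers, and the shape of the JSON responses is not assumed here:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes the local uvicorn server from setup


def api_url(path: str, **params) -> str:
    """Build a full API URL, URL-encoding any query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def get_json(path: str, **params):
    """GET an endpoint and decode the JSON body."""
    with urlopen(api_url(path, **params), timeout=10) as resp:
        return json.load(resp)


# With the server running:
#   get_json("/api/candidates", top_n=10)
#   get_json("/api/candidates/3")
#   get_json("/api/weights")
```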

15. Performance Characteristics

Operation	Time	Bottleneck
Dashboard initial load	~3s	Parsing 7MB places.json
Overpass API query	2–8s	Network + OSM server
Client-side placement + scoring	<100ms	Haversine collision checks
Python DEM pipeline (full)	30–90s	py3dep HTTP fetch + pysheds
Python parameter computation	3–5 min	Sequential EPA/USGS API calls

No formal benchmarks have been run. Times above are observed during development.
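
For readers unfamiliar with the Haversine collision check named in the table, here is a Python sketch of the idea. It assumes a greedy pass that keeps a point only if it is at least 120m (the spacing threshold listed under Limitations) from every point already kept. The dashboard's client-side implementation is not shown in this README and may differ; `haversine_m` and `place_with_spacing` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
MIN_SPACING_M = 120  # spacing threshold mentioned under Limitations


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def place_with_spacing(points, min_spacing_m=MIN_SPACING_M):
    """Greedily keep (lat, lon) points at least min_spacing_m from all kept points."""
    kept = []
    for lat, lon in points:
        if all(haversine_m(lat, lon, klat, klon) >= min_spacing_m for klat, klon in kept):
            kept.append((lat, lon))
    return kept


# Three points along one meridian: the second is roughly 56 m from the first
# and is dropped; the third is roughly 222 m away and is kept.
sites = place_with_spacing([(36.0, -78.9), (36.0005, -78.9), (36.002, -78.9)])
```

The pairwise check is quadratic in the number of kept points, which is consistent with it being named as the bottleneck of an otherwise sub-100ms step.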

16. Reliability and Failure Modes

Failure	Impact	Mitigation
Overpass API down/slow	No real waterway data	Procedural fallback auto-activates
EPA/USGS API timeout	Missing parameter values	`safe_call()` returns fallback defaults
Mapbox token missing	Map blank	Fatal for dashboard; replace token
pysheds DEM fetch fails	No stream network	Fatal for Python pipeline; mock data works
MinMax with identical values	Division by zero	Returns 0.5 for all candidates
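
The `safe_call()` mitigation in the table can be pictured as a thin wrapper that converts any exception into a default value. The real signature is not shown in this README, so the version below is an assumption for illustration only:

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("grime.sketch")


def safe_call(fn: Callable[..., T], *args, default: T, **kwargs) -> T:
    """Call fn(*args, **kwargs); on any exception, log it and return `default`."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:  # broad on purpose: timeouts, HTTP errors, bad JSON
        name = getattr(fn, "__name__", repr(fn))
        logger.warning("%s failed (%s); using default %r", name, exc, default)
        return default


def flaky_lookup(site_id: int) -> float:
    """Stand-in for an EPA/USGS request that times out."""
    raise TimeoutError("simulated API timeout")


# The simulated timeout is swallowed and the fallback default is returned.
value = safe_call(flaky_lookup, 3, default=0.0)
```

One consequence of this pattern is that a fallback default is indistinguishable from a real measurement downstream unless the warning log is checked.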

17. Security Considerations

No authentication on the API — local development only
Mapbox token is a publishable (pk.*) client-side token. It lives in dashboard/config.js (gitignored) and in .env (also gitignored). Use dashboard/config.example.js and .env.example as templates. Restrict the token by URL in the Mapbox dashboard before deploying anywhere public.
Overpass queries use numeric interpolation only — no injection risk
places.json contains only public geographic data, no PII
Apart from Mapbox, all external APIs are read-only and keyless
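
To illustrate the numeric-interpolation point, here is a sketch of building an Overpass QL query in which only float-coerced coordinates reach the query string. The exact query GRIME sends is not shown in this README; the `way["waterway"]` filter and the function name are assumptions. Overpass bounding boxes are ordered (south, west, north, east).

```python
def overpass_waterway_query(south, west, north, east, timeout_s=25):
    """Return an Overpass QL query for waterway ways inside a bounding box."""
    # float() raises ValueError on anything that is not a number,
    # so arbitrary user text never reaches the query string.
    s, w, n, e = (float(v) for v in (south, west, north, east))
    # NaN and infinity pass float() but fail these range comparisons.
    if not (-90 <= s < n <= 90 and -180 <= w < e <= 180):
        raise ValueError("bounding box out of range")
    return (
        f"[out:json][timeout:{int(timeout_s)}];"
        f'way["waterway"]({s:.6f},{w:.6f},{n:.6f},{e:.6f});'
        "out geom;"
    )


query = overpass_waterway_query(35.90, -79.05, 36.05, -78.75)
```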

18. Testing Strategy

Current state: No automated tests. Validation is manual:

notebooks/validate_pipeline.ipynb — manual pipeline verification
Visual inspection of stream networks in geojson.io
Dashboard visual QA — verify rivers align with satellite

Recommended if continuing: Unit tests for compute_subscore(), Manning's velocity, hard gates. Property test: composite always in [0, 100].
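
A starting point for those tests, written against stand-in functions because the signatures of the real scoring code are not shown here. `manning_velocity` implements the standard SI form of Manning's equation; `composite_score` is a hypothetical weighted mean with normalized weights and may not match GRIME's actual composite.

```python
import math
import random
import unittest


def manning_velocity(n: float, hydraulic_radius_m: float, slope: float) -> float:
    """Manning's equation in SI units: V = (1/n) * R^(2/3) * S^(1/2)."""
    return (1.0 / n) * hydraulic_radius_m ** (2.0 / 3.0) * math.sqrt(slope)


def composite_score(subscores, weights):
    """Hypothetical composite: weighted mean of sub-scores, weights normalized."""
    total = sum(weights)
    return sum(s * w / total for s, w in zip(subscores, weights))


class ScoringSketchTests(unittest.TestCase):
    def test_manning_known_value(self):
        # n = 0.035, R = 1 m, S = 0.01  ->  (1 / 0.035) * 1 * 0.1
        self.assertAlmostEqual(manning_velocity(0.035, 1.0, 0.01), 0.1 / 0.035)

    def test_composite_stays_in_range(self):
        # Randomized property check: sub-scores in [0, 100] and positive
        # weights must give a composite in [0, 100] (small float tolerance).
        rng = random.Random(0)
        for _ in range(1000):
            subscores = [rng.uniform(0, 100) for _ in range(4)]
            weights = [rng.uniform(0.01, 1.0) for _ in range(4)]
            score = composite_score(subscores, weights)
            self.assertGreaterEqual(score, 0.0)
            self.assertLessEqual(score, 100.0 + 1e-9)
```

Saved as a test module, this runs with `python -m unittest`. A property-testing library could replace the seeded random loop if the project adopts one.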

19. Limitations

Scores are relative, not absolute. MinMax normalization is applied within each candidate set, so a score in one city is not comparable to the same score in another.
Client-side scoring is approximate. Dashboard uses heuristics; Python pipeline uses real API data.
Weight values are heuristic. Not optimized against ground truth. Bayesian scaffold exists but hasn't been run.
OSM coverage varies. Excellent in US/Europe, variable in developing nations.
Channel width is estimated. Fewer than 5% of OSM waterways carry a width tag, so width is inferred from a waterway-type heuristic.
No temporal modeling. Scores are static snapshots, not seasonal.
US-centric data sources for the Python pipeline. Dashboard bypasses this with population-scaled heuristics globally.
120m spacing threshold is arbitrary — tuned by inspection, not engineering spec.
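
The first limitation follows directly from how MinMax scaling works. The sketch below uses made-up raw values to show the same raw measurement landing on different normalized scores in two cities, and also includes the identical-values fallback of 0.5 listed under failure modes. `minmax_normalize` is a hypothetical name, not necessarily the function in the codebase.

```python
def minmax_normalize(values, degenerate=0.5):
    """Scale values to [0, 1]; if all values are equal, return `degenerate` for each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoids division by zero when every candidate has the same value.
        return [degenerate] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw values for one parameter in two cities.
city_a = [10.0, 20.0, 30.0]
city_b = [20.0, 40.0, 120.0]

norm_a = minmax_normalize(city_a)  # the raw value 20.0 maps to 0.5 here
norm_b = minmax_normalize(city_b)  # the same raw value 20.0 maps to 0.0 here
```

Making scores comparable across cities would require a shared reference range rather than per-city minima and maxima, which is what the multi-city normalized scoring item under Future Work points at.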

20. Future Work

Ground-truth validation against actual trap deployment locations (Durham Stormwater Services)
Bayesian weight optimization with real data
Temporal scoring with USGS real-time discharge
Computer vision trash detection from satellite/drone imagery
Multi-city normalized scoring for global prioritization
Cost modeling (deployment cost, maintenance frequency)
Research publication: extend WaterGate paper, contact Wolfram Research authors
University collaboration: Duke Environmental Engineering

21. Appendix

A. References

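Chow, V.T. (1959). Open-Channel Hydraulics. McGraw-Hill.
Leopold, L.B., Wolman, M.G., & Miller, J.P. (1964). Fluvial Processes in Geomorphology. W.H. Freeman.
Manning, R. (1891). On the flow of water in open channels and pipes. Transactions of the Institution of Civil Engineers of Ireland, 20, 161–207.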

B. Glossary

Term	Definition
Candidate site	A point on a waterway evaluated for trap deployment
Composite score	Final weighted combination of 4 sub-scores (0–100)
CSO	Combined Sewer Overflow
D8	Deterministic 8-direction flow algorithm
DEM	Digital Elevation Model
EJ	Environmental Justice
Hard gate	Binary feasibility check that eliminates a candidate
NPDES	National Pollutant Discharge Elimination System
NBI	National Bridge Inventory
Strahler order	Stream classification: order 1 = headwater, higher = larger
Sub-score	Intermediate score (0–100) for one parameter family
TRI	Toxic Release Inventory

C. License

MIT.

GRIME · 28 parameters · 6 families · 108,772 cities · 240 countries
