Make id_segment globally unique (position-derived, bcfp-style) — id_segment-alone persist joins cartesian across WSGs

## Problem

link's `id_segment` is a **per-WSG row index**, unique only within the persist primary key `(id_segment, watershed_group_code)` — not globally. In province-wide persist (`fresh.streams`): **1,542,427 rows but only 80,555 distinct `id_segment`** — each value repeats ~19× across WSGs. So any join between persist tables on `id_segment` **alone** is a ~19–22× cartesian blow-up.

Active damage found 2026-05-24 (during #175):
- `lnk_compare_rollup` joined `streams_habitat_<sp>` on `id_segment` alone against province-wide persist → PARS BT `spawning_km` **36,819.8 vs correct 1,681.1 (~22×)**. The km parity numbers from that path are garbage whenever persist holds >1 WSG. **Tactically fixed in #175** (full-PK joins).
- `lnk_compare_mapping_code` had the same latent join — also fixed in #175.

Safe today (join the *working* schema = single WSG = id unique): `lnk_compare_wsg` rollup, `lnk_pipeline_persist`.

## Root cause + proposed fix

bcfp's `segmented_stream_id` is **position-derived** — it encodes `blue_line_key` + `downstream_route_measure` (e.g. `359371372.3868298`), so it is globally unique by construction. link should do the same: derive `id_segment` deterministically from `(blue_line_key, downstream_route_measure)` so it is globally unique.

Benefits:
1. Kills the whole `id_segment`-alone-join bug class (defense in depth, not just per-join PK discipline).
2. Bonus: link `id_segment` == bcfp `segmented_stream_id` → compares can join **directly on the id** (no `blue_line_key + round(downstream_route_measure)` position join), and it becomes a stable cross-build / cross-system key.

## Blast radius (scope carefully)

`id_segment` is generated upstream (fresh segmentation / link persist — confirm during exploration) and is a PK component + join key across `streams`, `streams_habitat_<sp>`, `streams_access`, `streams_mapping_code`, `barriers`. Changing its derivation touches generation + every consumer. Decide: determinism, uniqueness guarantee, type (bcfp uses a float position-encoding — weigh numeric vs text vs bigint), and migration of existing persist data.

## Interim discipline (until root fix)

All persist joins MUST use the full PK `(id_segment, watershed_group_code)`. Periodic audit:
```
grep -rnE "id_segment *= *[a-z]+\.id_segment|USING \(id_segment\)" R/ data-raw/ | grep -v watershed_group_code
```

Relates to #175. SRED NewGraphEnvironment/sred-2025-2026#24.


## Verified (2026-05-24): bcfp segmented_stream_id

- **Globally unique** — `fresh.streams_vw_bcfp`: 4,230,118 rows = 4,230,118 distinct `segmented_stream_id` (no `watershed_group_code` needed in the key).
- **Position-derived from `blue_line_key` + `downstream_route_measure`**, per bcfp's data dictionary (`docs/05_data_dictionary.rst`): *"segmented_stream_id is based on FWA blue_line_key and downstream_route_measure for a given segment. When the segmentation by bcfishpass modelling changes, the segmented_stream_id will also change."*
- **Type: text** (`character varying`). Integer part = `blue_line_key` (verified for every sampled row); fractional part encodes the rounded measure — e.g. measure `120.287972` → `…903.120288`, `48.3061357` → `…361.48306` (≈ `blue_line_key || '.' || digits-of round(downstream_route_measure, 3)`). It is **not** a meaningful float (a fixed `blk + measure/scale` does NOT fit — the fraction-digit count varies); treat it as an opaque deterministic position key.
- **Segmentation-dependent** — regenerated each bcfp build; only comparable within the same model run (already handled: we pin the build via `s3://fresh-bc/bcfishpass/log.json`).

Implication for the fix: adopting bcfp's exact text construction for link's `id_segment` would make it globally unique AND `id_segment == segmented_stream_id` (direct id-join in compares). Pin the exact generating SQL from bcfp source (likely fwapg `FWA_*` / the bcfp segmentation step) during implementation — the `AS segmented_stream_id` expression wasn't located by code-search at `@e12c1a5`, so trace it from the segmentation/load step.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make id_segment globally unique (position-derived, bcfp-style) — id_segment-alone persist joins cartesian across WSGs #203

Problem

Root cause + proposed fix

Blast radius (scope carefully)

Interim discipline (until root fix)

Verified (2026-05-24): bcfp segmented_stream_id

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Make id_segment globally unique (position-derived, bcfp-style) — id_segment-alone persist joins cartesian across WSGs #203

Description

Problem

Root cause + proposed fix

Blast radius (scope carefully)

Interim discipline (until root fix)

Verified (2026-05-24): bcfp segmented_stream_id

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions