telco_network_recovery: harden the template + refresh expected results #71

Merged: cafzal merged 2 commits into main from telco-template-learnings on May 20, 2026

@cafzal (Collaborator) commented May 20, 2026

Summary

Hardens the telco_network_recovery template against two crash bugs, bumps the SDK pin, fixes doc accuracy, and refreshes every expected-result figure to a verified RAI 1.4.2 run. Learnings propagated from the internal summit-demo refactor of the same chain.

Changes

Crash fixes (telco_network_recovery.py)

  • reset_index(drop=True) on the train/val/test split DataFrames. The val split is an .iloc slice starting mid-frame, so model.data()'s df[col][0] type-inference lookup raised KeyError: 0 — a hard crash before Stage 1 on current SDK releases.
  • MIP solve wrapped in a Gurobi→HiGHS fallback. A customer whose prescriptive engine isn't Gurobi-licensed previously crashed at the payoff stage; it now falls back to the bundled open-source HiGHS solver automatically.
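
The first crash can be reproduced with plain pandas. The sketch below is not the template's code: the column names and split boundaries are made up for illustration, and it only mimics the `df[col][0]` lookup described above rather than calling `model.data()`.

```python
import pandas as pd

# Hypothetical frame standing in for the template's tower data.
df = pd.DataFrame({"tower_id": range(10), "failed": [0, 1] * 5})

# Positional slices keep the original index labels (6 and 7 for val).
train = df.iloc[:6]
val = df.iloc[6:8]
test = df.iloc[8:]

# df[col][0] is a label lookup on an integer index, so it fails on val:
# label 0 is not present in val's index.
try:
    val["tower_id"][0]
    raised_on_slice = False
except KeyError:
    raised_on_slice = True

# The fix from this PR: renumber each split from 0 and drop the old labels.
val_fixed = val.reset_index(drop=True)
first_value = val_fixed["tower_id"][0]

print(raised_on_slice, first_value)
```

The train split happens to work without the reset because it starts at label 0, which is why only the mid-frame slices trigger the error.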

Dependency

  • pyproject.toml: relationalai 1.2.2 → 1.4.2.

Doc accuracy (runbook.md)

  • "8 concepts" → "8 source-data concepts" (the script defines more).
  • Stage 2 response separates advised-MODEL coverage (572/1500, 38.1%) from the at-risk label rate (597/1500, 39.8%) — two distinct metrics that were conflated.
  • Stage 8: RestorePlan.binding_constraint is a single String and chosen rows are marked via TowerUpgradeOption.is_selected_upgrade — corrected from the stale "list" + "SelectedUpgrade view" wording.
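
For reference, the two Stage 2 percentages follow directly from the counts quoted above. This is only an arithmetic check; the variable names are illustrative and do not come from the runbook.

```python
TOTAL = 1500        # denominator quoted in the PR for both metrics
advised = 572       # numerator of the advised-MODEL coverage figure
at_risk = 597       # numerator of the at-risk label rate

coverage = advised / TOTAL
label_rate = at_risk / TOTAL

# Format to one decimal place, as in the runbook figures.
print(f"coverage:   {coverage:.1%}")
print(f"label rate: {label_rate:.1%}")
print(f"difference: {at_risk - advised} of {TOTAL}")
```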

Expected-results refresh (README.md + runbook.md)

  • Every headline figure updated to a verified end-to-end run on RAI 1.4.2: 142 critical-restore towers, 36 selected, 207 Gbps, $4,997,992 / $5M binding, 194 install-weeks, 17B/15S/4G.
  • Added a stochasticity note — the GNN is stochastic, so exact figures shift run to run while the structural outcome (all 5 regions, budget binding, ~200 Gbps, ~36 towers) reproduces.
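
Because exact figures shift between runs, a check on the structural outcome is more useful than an exact-match comparison. The following is a minimal sketch of such a check, not part of the template: `structure_holds`, the dictionary keys, and both tolerances are assumptions chosen for illustration, and the assumption that tier counts sum to the selected-tower count is inferred from the figures above (17 + 15 + 4 = 36). It does not cover the "all 5 regions" condition, since per-region figures are not listed here.

```python
BUDGET_USD = 5_000_000

# Figures from the verified RAI 1.4.2 run reported in this PR.
run = {
    "selected": 36,
    "gbps": 207,
    "spend_usd": 4_997_992,
    "tiers": {"BRONZE": 17, "SILVER": 15, "GOLD": 4},
}


def structure_holds(run, towers_ref=36, gbps_ref=200,
                    rel_tol=0.15, budget_slack=0.01):
    """Return True if a run matches the expected structure.

    rel_tol and budget_slack are illustrative tolerances, not values
    taken from the template.
    """
    within_budget = run["spend_usd"] <= BUDGET_USD
    # Treat the budget as binding if unspent money is under budget_slack.
    binding = BUDGET_USD - run["spend_usd"] <= budget_slack * BUDGET_USD
    towers_ok = abs(run["selected"] - towers_ref) <= rel_tol * towers_ref
    gbps_ok = abs(run["gbps"] - gbps_ref) <= rel_tol * gbps_ref
    # Assumed invariant: every selected tower has exactly one tier.
    tiers_ok = sum(run["tiers"].values()) == run["selected"]
    return within_budget and binding and towers_ok and gbps_ok and tiers_ok


print(structure_holds(run))
```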

Verification

Ran telco_network_recovery.py end-to-end on RAI 1.4.2 — pipeline completed all 4 stages, MIP reached OPTIMAL, RestorePlan materialized. No KeyError, no traceback. The refreshed figures are this run's output.

Test plan

  • Re-run python telco_network_recovery.py on a fresh checkout to confirm it completes (figures will differ — GNN is stochastic — but structure holds).
  • Confirm the Gurobi→HiGHS fallback path on an engine without a Gurobi license.
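
The fallback path can be exercised without an unlicensed engine by simulating the failure. The sketch below shows the general try-then-fall-back pattern only; the template's actual solve call and the SDK's solver API are not shown in this PR, so `solve_with_fallback`, `fake_solve`, and the solver-name strings are hypothetical stand-ins.

```python
def solve_with_fallback(solve, solvers=("gurobi", "highs")):
    """Try each solver in order and return (solver_name, result).

    `solve` is any callable taking a solver name; it stands in for the
    template's real MIP solve, which is not reproduced here.
    """
    errors = []
    for name in solvers:
        try:
            return name, solve(name)
        except Exception as exc:  # broad on purpose: license errors vary
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all solvers failed: " + "; ".join(errors))


def fake_solve(name):
    """Simulated solve: behaves like an engine with no Gurobi license."""
    if name == "gurobi":
        raise RuntimeError("no Gurobi license on this engine")
    return {"status": "OPTIMAL"}


used, result = solve_with_fallback(fake_solve)
print(used, result["status"])
```

Catching a broad exception keeps the sketch short; a real implementation would likely narrow it to the licensing error so that genuine model errors are not masked by a silent retry.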

cafzal added 2 commits May 19, 2026 20:16
Learnings propagated from the summit-demo refactor of the same chain.

Script (telco_network_recovery.py):
- reset_index(drop=True) on the train/val/test split DataFrames. The
  val split is an .iloc slice starting mid-frame, so model.data()'s
  df[col][0] type-inference lookup raised KeyError: 0 — a hard crash
  before Stage 1 fit on newer SDK releases.
- MIP solve wrapped in a Gurobi→HiGHS fallback. A customer whose
  prescriptive engine isn't Gurobi-licensed previously hit a hard
  crash at the payoff stage; now it falls back to the bundled
  open-source HiGHS solver automatically.

pyproject.toml: pin relationalai 1.2.2 -> 1.4.2.

runbook.md doc accuracy:
- "8 concepts" -> "8 source-data concepts" (the script defines more —
  TowerFailureScore, RestorePlan, GNN task tables).
- Stage 2 response now separates advised-MODEL coverage (572/1500,
  38.1%) from the at-risk label rate (597/1500, 39.8%) — two distinct
  metrics that were conflated.
- Stage 8: RestorePlan.binding_constraint is a single String, and the
  chosen rows are marked via TowerUpgradeOption.is_selected_upgrade —
  corrected from the stale "list" + "SelectedUpgrade view" wording.

Verified end-to-end against RAI 1.4.2.
…2 run

The headline figures in the README and runbook were captured from an
older run. Refreshed every expected-result number to the end-to-end
verification run on RAI 1.4.2 (the version this template now pins):

  142 critical-restore towers (was 166) · 36 selected (was 39)
  207 Gbps restored (was 214) · $4,997,992 / $5M binding (was 4,999,671)
  194 install-weeks (was 195) · 17 BRONZE / 15 SILVER / 4 GOLD
  GNN: failure_intensity median 2.92, 139/190 towers > 1.5

Added a stochasticity note to both the README expected-output block and
the runbook intro: the equipment-failure GNN is stochastic, so the exact
figures shift run to run while the structural outcome (all 5 regions,
budget binding, ~200 Gbps, ~36 towers) reproduces. The README heading is
now "Representative output (one run on RAI 1.4.2)" rather than implying
seed-exact reproducibility.
@cafzal cafzal marked this pull request as ready for review May 20, 2026 03:33
@cafzal cafzal merged commit 47f08dd into main May 20, 2026
3 checks passed
@cafzal cafzal deleted the telco-template-learnings branch May 20, 2026 03:33
@github-actions

The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-r4vexnyv3-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/Cryw7J4AYpZ21hcsGmvyDeXMVPxd
