
memory_supply_allocation template + templates library index restructure #81

Closed
cafzal wants to merge 13 commits into main from add-memory-supply-allocation-template



cafzal commented Jun 8, 2026

Collaborator

Two related bodies of work on this branch.

1. memory_supply_allocation template

A multi-reasoner template: monthly rolling-horizon allocation of constrained memory-chip supply across customers with strategic supplier dependencies, named foundries, and raw-material inputs. It is a four-reasoner chain: predicted supplier capability feeds the optimization, customer-to-customer dependency paths surface single points of failure, and two what-if scenarios trace supplier-offline and input-shortage cascades. Includes a paste-testable runbook.md walkthrough.

2. Templates library index restructure

  • Root README.md: the Repository layout section becomes a Templates section containing the per-folder contents list (now including runbook.md: ordered prompts to recreate or adapt a template with a coding agent + RAI skills) plus a generated, collapsible-by-industry index of the current (v1) templates with each template's reasoners and description.
  • scripts/generate_version_indexes.py: now reads industry and reasoning_types from front matter and writes the collapsible, industry-grouped index to both the root README (between <!-- BEGIN/END TEMPLATE INDEX --> markers) and each version README.
  • Industry normalization: the fragmented ~25 ad-hoc industry labels are consolidated into 8 coarse, mutually-exclusive buckets across all version dirs (Financial Services, Healthcare & Life Sciences, Manufacturing, Retail & Consumer, Technology & Telecom, Energy & Utilities, Supply Chain & Logistics, Cross-Industry).
  • CI/tooling: CONTRIBUTING.md, .pre-commit-config.yaml, and the verify-version-indexes workflow updated to cover the root README and the additional front-matter fields.
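A minimal sketch of the generator's render-and-splice step described above. It assumes each template's front matter has already been parsed into a dict; the `name` and `description` keys, the function names, and the exact `<details>` markup are assumptions for illustration, not the script's actual output.

```python
from collections import defaultdict

BEGIN = "<!-- BEGIN TEMPLATE INDEX -->"
END = "<!-- END TEMPLATE INDEX -->"


def render_index(templates):
    """Render one collapsible <details> block per industry, templates sorted by name."""
    by_industry = defaultdict(list)
    for t in templates:
        by_industry[t["industry"]].append(t)
    blocks = []
    for industry in sorted(by_industry):
        rows = [
            f"- **{t['name']}** ({', '.join(t['reasoning_types'])}): {t['description']}"
            for t in sorted(by_industry[industry], key=lambda t: t["name"])
        ]
        # A blank line after <summary> lets the list inside render as Markdown.
        blocks.append(
            f"<details>\n<summary>{industry} ({len(rows)})</summary>\n\n"
            + "\n".join(rows)
            + "\n\n</details>"
        )
    return "\n\n".join(blocks)


def splice_index(readme_text, index_md):
    """Replace only the text between the BEGIN/END markers; the markers themselves stay."""
    start = readme_text.index(BEGIN) + len(BEGIN)
    end = readme_text.index(END, start)
    return readme_text[:start] + "\n" + index_md + "\n" + readme_text[end:]
```

Because the splice rewrites only the span between the markers, running it twice yields the same README, which is what makes a regenerate-and-compare CI check meaningful.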

python scripts/generate_version_indexes.py --check passes. industry is a free-form string in the docs templates:sync validator (only reasoning_types is enum-gated), so the new buckets require no docs-repo change.
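One plausible shape for the `--check` gate, assuming the script regenerates every target README in memory and then either writes it or compares it. The function names and the exit-code convention here are assumptions, not taken from `generate_version_indexes.py`.

```python
import sys
from pathlib import Path


def check_or_write(path: Path, expected: str, check: bool) -> bool:
    """Return True if `path` already holds `expected`; otherwise write it unless in check mode."""
    current = path.read_text(encoding="utf-8") if path.exists() else None
    if current == expected:
        return True
    if not check:
        path.write_text(expected, encoding="utf-8")
    return False


def run(targets: dict, check: bool) -> int:
    """targets maps each README path to its freshly generated content; returns an exit code."""
    stale = [p for p, text in targets.items() if not check_or_write(p, text, check)]
    if check and stale:
        for p in stale:
            print(f"stale index: {p} (re-run the generator without --check)", file=sys.stderr)
        return 1
    return 0
```

In check mode nothing is written, so a CI job can fail on a nonzero exit without mutating the checkout.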

cafzal added 13 commits May 28, 2026 18:16
…ed RCA, multi-reasoner)

Four-reasoner chain on one ontology: Predictive (pre-computed supplier
capability forecast) -> Rules (customer-customer dependency yield + elevated
floor) -> Prescriptive (monthly rolling-horizon LP across 36 months with two
disruption reveals) -> Graph (paths library for dependency-chain enumeration
+ RCA + two what-if scenarios).

Demonstrates a planner workflow under structural scarcity: HBM3E demand
exceeds capacity, and hyperscaler customers are willing to yield part of their
allocation so the equipment-maker customers they depend on stay supplied. As
disruptions surface (Orion Foundry downtime at month 5, helium shortage at
month 13), the LP re-solves the remaining horizon; plan-diff between
iterations exposes who absorbs each shock.

- Stage 1 (Rules): max_declared_yield_pct, elevated_floor_pct, depends_on
  graph edge, is_dependency_spof flag, all ontology-resident.
- Stage 2 (Predictive): SupplierCapabilityForecast loaded as a regression
  target per (supplier, month). Clean upgrade path to rai-predictive-modeling
  GNN node regression if richer features are available.
- Stage 3 (Prescriptive): revenue-max LP with effective_capacity per (product,
  period) = sum_suppliers(nominal x capability_pct) x product of
  (1 - intensity x (1 - availability)) across raw-material inputs. 3-step
  rolling horizon with disruption reveals; fresh Problem per iteration with
  populate=False extraction.
- Stage 4 (Graph / paths): model.path(Customer.depends_on.repeat(1, 3))
  .all_paths() enumerates variable-length chains. Two what-if branches
  ablate one supplier and one input at a time, ranked by cascade footprint
  on (product, period) cells.
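The Stage 3 capacity formula, written out as plain Python for a single (product, period) cell. This is a sketch of the arithmetic only, assuming per-supplier nominal capacity and capability_pct and per-input intensity and availability are already known; it is not the script's `compute_effective_capacity`, which works against the ontology.

```python
from math import prod


def effective_capacity(supplier_terms, input_terms):
    """Effective capacity for one (product, period) cell.

    supplier_terms: iterable of (nominal_capacity, capability_pct), one per supplier.
    input_terms: iterable of (intensity, availability), one per raw-material input.
    """
    # Supplier side: nominal capacity discounted by each supplier's predicted capability.
    supply = sum(nominal * capability for nominal, capability in supplier_terms)
    # Input side: each input scales capacity by 1 - intensity * (1 - availability),
    # so a fully available input (availability = 1) leaves capacity unchanged.
    multiplier = prod(1.0 - intensity * (1.0 - availability) for intensity, availability in input_terms)
    return supply * multiplier
```

For example, two suppliers contributing 100 × 0.95 and 50 × 0.90 units plus one input at intensity 0.5 and availability 0.8 give 140 × 0.9 = 126 (up to floating-point rounding). With no inputs, `prod` of an empty sequence is 1, so capacity is just the supplier sum.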

Run output:
- iter 0 baseline (months 1-36): OPTIMAL, $47.09B margin
- iter 1 Orion downtime revealed (months 5-36): OPTIMAL, $41.96B;
  hyperscalers absorb the disruption, equipment-maker customers stay pinned
  at their elevated floors via the dependency mechanic
- iter 2 helium shortage revealed (months 13-36): OPTIMAL, $30.19B;
  hyperscalers absorb a further $4.5B, equipment makers still protected
- Apex Photonic Components flagged as the lone dependency-graph SPOF via
  ontology query; Orion Foundry has the widest supplier-offline impact;
  Helium has the widest input-shortage impact (180 affected cells)
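The reveal schedule in the run output (baseline from month 1, Orion revealed at month 5, helium at month 13, each re-solving through month 36) can be sketched as a loop. The `solve` callback and the return shape are assumptions for illustration; per the commit message, the template builds a fresh Problem for each iteration, which is what a real `solve` would wrap.

```python
def rolling_horizon(horizon, reveals, solve):
    """Re-solve the remaining horizon each time a disruption is revealed.

    horizon: last month of the planning horizon (months are 1-based).
    reveals: dict mapping reveal month -> disruption name.
    solve: callable(start, end, known_disruptions) -> margin for months start..end.
    Returns one (start_month, known_disruptions, margin) tuple per iteration.
    """
    known = []
    # Iteration 0: plan the full horizon with no disruptions known yet.
    results = [(1, tuple(known), solve(1, horizon, tuple(known)))]
    for month in sorted(reveals):
        known.append(reveals[month])
        # Months before the reveal are treated as already executed; only the tail is re-planned.
        results.append((month, tuple(known), solve(month, horizon, tuple(known))))
    return results
```

With `horizon=36` and `reveals={5: "Orion downtime", 13: "helium shortage"}`, the loop makes three solves covering months 1-36, 5-36 and 13-36, matching the three iterations listed above.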
…gy bindings

- Banner style: switch from `# ---- Stage N: ... ----` (single line) to the
  box-comment format used by `telco_network_recovery.py` and
  `energy_grid_planning.py`. Drop the "Stage 0" prefix from the ontology-load
  section to match the reference template (no stage number for the load
  block; stages 1-4 are the reasoner stages).
- Stage 4 ontology bindings: persist the what-if structural conclusions
  back to the ontology so a post-hoc analyst can query supplier-offline
  risk and input-shortage risk without re-running the pipeline:
    Supplier.offline_impact_cells (Integer)
    Supplier.offline_max_cap_drop_pct (Float)
    Input.shortage_impact_cells (Integer)
  Closes the "every reasoner stage binds output back to the ontology"
  checklist gap.
- Remove unused HELIUM_INPUT_ID constant.
- Reorder + format display-only columns so printed output matches the
  README expected-output block (max_cap_drop_% column appears between
  n_affected_cells and affected_products, formatted as percentage).
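If, as in the Stage 3 formula, the raw-material multiplier scales every supplier's contribution within a cell equally, the supplier-offline what-if reduces to a share calculation: taking a supplier offline removes exactly its share of that cell's capacity. A sketch that derives values shaped like `Supplier.offline_impact_cells` and `Supplier.offline_max_cap_drop_pct` from per-cell contributions; the input shape and function name are hypothetical, and this is not the script's ablation code.

```python
def supplier_offline_impact(contributions):
    """contributions: {(product, period): {supplier_id: capacity_contribution}}.

    Returns {supplier_id: (impact_cells, max_cap_drop_pct)}, ranked by impact_cells
    descending, then by max_cap_drop_pct descending.
    """
    impact = {}
    for cell, by_supplier in contributions.items():
        total = sum(by_supplier.values())
        if total <= 0:
            continue
        for supplier, contrib in by_supplier.items():
            if contrib <= 0:
                continue
            cells, max_drop = impact.get(supplier, (0, 0.0))
            # Ablating this supplier removes its share of the cell's capacity.
            drop_pct = 100.0 * contrib / total
            impact[supplier] = (cells + 1, max(max_drop, drop_pct))
    return dict(sorted(impact.items(), key=lambda kv: (-kv[1][0], -kv[1][1])))
```

Ranking by affected-cell count first mirrors the "cascade footprint" ordering; a sole-source cell shows up as a 100% drop.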

No behavior changes to the LP, the paths analysis, or the disruption
mechanics. Run reproduces identical margins ($47.09B / $41.96B / $30.19B)
and identical Headline (Apex SPOF, Orion widest supplier impact, Helium
widest input impact, $16.9B margin erosion).

Ten-step walkthrough of the four-reasoner chain. Each step has a Prompt
(skill-prefixed, question-shaped, with definitions inline and scope stated
explicitly) and a Response block whose numbers match the script's actual
print output. Sequential cascade: prompts inherit ontology state from
earlier steps via property and concept names, never via step numbers.

- Lead-in + chain ASCII: 11 customers / 5 SKUs / 36 months / 6 suppliers /
  3 inputs; baseline $47.09B margin headline, 24-month-tail $30.19B,
  Apex SPOF, Orion widest supplier impact, Helium widest input impact.
- Stages 1-4 each show: ontology enrichments written, row counts, and the
  business answer the named skill produces.
- Numerical cohesion against the script: per-customer service levels
  (Hyperion 76%, Aether 83.7%, equipment makers 95-99%), iter 1/iter 2
  margins ($41.96B / $30.19B), plan diffs (Aether -$369.62M / Hyperion
  -$280.36M at iter 1; Hyperion -$2,149.37M / Aether -$1,839.17M at iter 2),
  what-if cascade rankings (Orion 72 cells / 60.9% max; Helium 180 cells).
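The plan diffs quoted above compare two iterations customer by customer. A minimal sketch, assuming each plan is available as a value (revenue or margin) keyed by (customer, month), which is a hypothetical shape. It restricts the comparison to months the later iteration re-plans, so the shorter re-solve horizon is not counted as a loss.

```python
from collections import defaultdict


def plan_diff(before, after, start_month):
    """before/after: {(customer, month): value}. Compares only months >= start_month.

    Returns [(customer, delta)] sorted most-negative first, i.e. who absorbed the shock.
    """
    delta = defaultdict(float)
    for (customer, month), value in before.items():
        if month >= start_month:
            delta[customer] -= value
    for (customer, month), value in after.items():
        if month >= start_month:
            delta[customer] += value
    return sorted(delta.items(), key=lambda kv: kv[1])
```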

Closes the multi-reasoner-templates-have-a-runbook gap flagged in the
dev-templates-review pass.

Real paste-test against fresh /rai-* skill sessions verified the runbook
against the bundled ontology + data; this commit applies the wording
fixes the test surfaced. No script changes.

What reproduced exactly:
- Step 6 baseline LP margin: $47,089,150,341 to the dollar
- Step 7 iter 1 margin: $41,960,554,872 exactly
- Step 9 cascade rankings: Orion 72 cells / 60.9% max drop, Helium
  180 cells / 35% HBM3E avg drop, bit-exact
- Step 5 forecast stats and Step 4 SPOF flag (Apex Photonic) match

What needed clarification:
- Step 2 + chain ASCII: "4 customers declare yield" was wrong;
  Photonic Lithography is also a downstream yielder (to Apex) AND
  an upstream receiver (from Hyperion/Aether) -- the bridge node
  that creates both 2-hop chains. Now reported as 5 yielding
  customers with Photonic's dual role called out.
- Step 4: rules-authoring response now explicitly states the
  coalesce-to-default requirement; sparse derived properties
  silently break downstream solver constraints (the symptom is
  alloc grossly exceeding demand, not INFEASIBLE), so this is
  worth flagging in the prompt response.
- Step 6: per-customer service levels are LP-degenerate -- multiple
  optimal allocations exist on the feasible face with identical
  total margin. Response now hedges to representative ranges
  (hyperscalers 76-84%, equipment makers 95-99%) and leads with
  the invariant margin headline.
- Step 7: in the script, input disruptions persist through the end of
  the horizon, while supplier disruptions DO respect their window.
  Response now states this explicitly and notes that window-only
  helium semantics would yield iter 2 margin ~$31.94B instead.
  Iter 1 reproduced exactly; iter 2 has a documented dependency
  on this semantic choice.
- Step 8: model.path(...).all_paths() requires explicit role
  short_names on the same-type two-slot relationship. Prompt now
  includes the exact declaration so the paths library can resolve
  the depends_on edge.
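Conceptually, `repeat(1, 3)` over `depends_on` asks for every chain of one to three edges. The plain-Python enumeration below is not the RAI paths API; treating chains as simple paths that never revisit a customer is an assumption. It runs on a toy edge list modeled on the Photonic Lithography bridge described above.

```python
def all_paths(edges, min_hops=1, max_hops=3):
    """Enumerate every simple path of min_hops..max_hops edges in a directed graph.

    edges: iterable of (src, dst) pairs. Returns a list of node tuples.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    paths = []

    def extend(path):
        hops = len(path) - 1
        if hops >= min_hops:
            paths.append(tuple(path))
        if hops == max_hops:
            return
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # keep paths simple: no revisited customer
                extend(path + [nxt])

    for start in list(adjacency):
        extend([start])
    return paths


# Toy edges (customer -> customer it depends on); illustrative only, not the bundled data.
toy_edges = [("Hyperion", "Photonic"), ("Aether", "Photonic"), ("Photonic", "Apex")]
chains = all_paths(toy_edges)
```

On this edge list the function returns five chains, two of them multi-hop (Hyperion and Aether each reach Apex through Photonic). That is why the bridge node is what creates the 2-hop chains.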

New "Reproducibility notes" section at the end of the runbook
collects these findings explicitly so future analysts know what's
deterministic and what isn't. Forecast range corrected from
[0.91, 0.99] to [0.93, 0.99] (the actual data range).

Out of scope, surfaced as follow-ups:
- rai-graph-analysis skill needs a paths-library section
- Input vs supplier disruption-window asymmetry in the script
- A pitfall card in rai-rules-authoring on sparse-derived-property
  -> ungrounded-solver-constraint symptom
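The coalesce-to-default fix amounts to making a sparse derived property total over its concept before the solver reads it. If a constraint is only instantiated where the property has a value, it silently drops out for every other entity. The helper name, property values, default of 0.0, and constraint form below are hypothetical. In the template, the equivalent happens in the rule definitions rather than with a dict.

```python
def coalesce(derived, keys, default):
    """Return a value for every key, using `default` wherever the rule derived nothing."""
    return {k: derived.get(k, default) for k in keys}


# Hypothetical example: the rule only derives a floor for some customers.
customers = ["cust_a", "cust_b", "cust_c"]
elevated_floor_pct = {"cust_c": 0.95}
floor = coalesce(elevated_floor_pct, customers, default=0.0)

# Every customer now has a value, so a (hypothetical) constraint such as
# alloc[c] >= floor[c] * demand[c] is instantiated for all of them,
# not just the customers the rule happened to fire on.
```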
Re-ran paste-test against the trimmed runbook (2026-05-29) in an isolated
dir with a fresh subagent that built walkthrough.py from the prompts alone.
Verified the fixes from the prior commit:

- Step 2: yield count (5 with Photonic as bridge) -- PASS, naturally
  reproduced via the named skill.
- Step 4: coalesce-to-default guidance in the Response -- PASS, the rules
  derivation grounded correctly downstream.
- Step 6: hedged per-customer service levels -- PASS, margin reproduced
  EXACTLY ($47,089,150,341 to the dollar).
- Step 7: persistent-helium semantics in the prompt -- PASS, both iter
  margins reproduced bit-exactly ($41,960,554,872 and $30,188,075,056,
  resolving the $1.75B discrepancy from the prior test).
- Step 8: PARTIAL on first re-test. Setup note covered the relationship
  signature (role short_names) but not the entity-binding population
  pattern. The subagent's natural FK-Property-navigation population
  (Customer.depends_on(Dependency.downstream, Dependency.upstream))
  triggered TyperError on model.path(...).all_paths(). Setup note now
  includes the working Customer.ref()-based population recipe explicitly.

Naturalness sweep also passed -- prompts read as analyst questions
(Steps 2/4/6/7 unchanged; Step 8's prompt body is question-shaped with
the implementation hint isolated in the Setup note above the prompt
block, consistent with the dev-templates-review setup-vs-analyst
distinction).

- Template structure tree + What's included now list runbook.md (was missing
  after the runbook.md addition in 8e03a4b).
- Stage 2 forecast-range claim corrected from [0.85, 1.00] (the data-generator
  clip bounds) to [0.93, 0.99] (actual range across the 216 bundled rows);
  per-supplier mean 0.95-0.97 added for sanity-check granularity. Matches the
  range already corrected in runbook.md after paste-test verification.
- "How it works" section subheadings de-duped: was '### 1. Stage 1: ...'
  through '### 4. Stage 4: ...' (numbers and Stage labels both numbered the
  subsections). Now '### Stage 1: ...' through '### Stage 4: ...'.
- Stage 2 snippet: added the pd.read_csv line so the bind makes sense
  standalone.
- Stage 3 compute_effective_capacity snippet: replaced the abbreviated
  excerpt (ending in '# ... product-level multiplier, then multiply by
  per-product sum') with the verbatim function body. Satisfies the
  templates-review "How it works code snippets copied verbatim from script
  (no cleanup)" rule and prevents drift between README and script.
- Stage 3 problem.satisfy snippet: added the name= kwarg dropped in the
  earlier excerpt; matches the actual constraint declaration.
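The "copied verbatim from script" rule is mechanically checkable. A sketch of one such check, which flags README Python snippets whose text no longer appears in the script. The function name and the fence regex are assumptions; nothing here says the repo runs this check.

```python
import re

# Matches fenced python blocks; `{3} avoids writing three literal backticks here.
FENCE = re.compile("`{3}python\n(.*?)`{3}", re.DOTALL)


def drifted_snippets(readme_text: str, script_text: str) -> list:
    """Return README python snippets that do not appear verbatim in the script."""
    return [s for s in FENCE.findall(readme_text) if s.strip() not in script_text]
```

Only leading and trailing whitespace is stripped, so any edit inside an excerpt (a dropped kwarg, an abbreviated body) is reported as drift.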

No content changes to the headline narrative, expected output, customization
list, or troubleshooting.

… iteration)

Stage 2 (Predictive) now trains a real node-regression GNN on Supplier
predicting capability_pct per (supplier, month). The previous pre-computed
CSV path is preserved behind USE_PRECOMPUTED_FORECAST=True for fast
iteration / offline reproducibility.

New sample data (deterministic from seed=42):
- supplier_features.csv (6 rows × 5 features): equipment_age_months,
  geopolitical_exposure_score, region, process_node_nm, workforce_size_k.
- supplier_observations_historical.csv (144 rows = 6 suppliers × 24 months
  of past observations, periods -23..0). GNN training labels.

GNN architecture (Stage 2):
- SupplierObservation source concept (identify_by={supplier_id, period_id}).
- Heterogeneous graph: SupplierObservation -> Supplier (216 edges) plus
  same-region Supplier -> Supplier edges so feature signal flows between
  similar fabs.
- PropertyTransformer: 5 features per Supplier as category/continuous/
  integer; PKs/FKs and the target column dropped to prevent leakage.
- Temporal split: train on periods -23..-4 (120 rows), val on -3..0
  (24 rows), test = future periods 1..36 (216 rows).
- task_type="regression", eval_metric="rmse", n_epochs=30, lr=0.005.
- Predictions extracted via Source.predictions.predicted_value pattern,
  clipped to [0.85, 1.00], bound into the existing SupplierCapabilityForecast
  concept so Stage 3/4 see the same ontology surface either path.
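The split boundaries above can be sanity-checked against the quoted row counts. A sketch with plain dicts; the row shape is an assumption, and this only illustrates the boundaries and the [0.85, 1.00] clip, not how the template passes them to the GNN.

```python
def temporal_split(rows, train_end=-4, val_end=0):
    """Split rows by period_id: train <= train_end < val <= val_end < test."""
    train = [r for r in rows if r["period_id"] <= train_end]
    val = [r for r in rows if train_end < r["period_id"] <= val_end]
    test = [r for r in rows if r["period_id"] > val_end]
    return train, val, test


def clip(value, lo=0.85, hi=1.00):
    """Clamp a predicted capability_pct into [lo, hi]."""
    return min(max(value, lo), hi)


# Shape check: 6 suppliers x periods -23..36 (24 historical + 36 future months).
rows = [{"supplier_id": s, "period_id": p} for s in range(6) for p in range(-23, 37)]
train, val, test = temporal_split(rows)
```

Periods -23..-4 are 20 months, -3..0 are 4, and 1..36 are 36, so with 6 suppliers the split yields 120 / 24 / 216 rows, matching the counts above.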

Stage 2 setup adds Snowflake prerequisites (EXP_DATABASE + EXP_SCHEMA + 4
RAI Native App grants + GPU-sized predictive reasoner) -- documented in
README Prerequisites with the SQL block. Defaults: MEMORY_SUPPLY /
EXPERIMENTS. EXP_DATABASE/EXP_SCHEMA/GNN_DEVICE/GNN_N_EPOCHS/GNN_LR/GNN_SEED
are labeled top-of-file constants.

Run output (GNN-default, end-to-end verified against the bundled
ontology and Snowflake predictive reasoner test_gpu, 79.9s training):
- iter 0 baseline:  OPTIMAL  margin=$45,488,032,436
- iter 1 Orion:     OPTIMAL  margin=$40,523,678,803
- iter 2 helium:    OPTIMAL  margin=$28,972,506,958
- Margin erosion:   $16,515,525,478
- Stage 4 headline reproduces: Apex SPOF, Orion 72 cells / 60.0% drop,
  Helium 180 cells widest input impact, 9 dependency paths / 2 multi-hop.
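The margin-erosion line is the baseline margin minus the final iteration's margin. Below is a quick arithmetic check against the printed numbers, with a per-shock split added for illustration (the split is not part of the script's output). The iterations plan different horizons (months 1-36, 5-36 and 13-36), so each per-shock figure mixes the disruption's cost with the shorter remaining horizon.

```python
# Iteration margins as printed by the GNN-default run above (whole dollars).
margins = {
    "iter 0 baseline": 45_488_032_436,
    "iter 1 Orion": 40_523_678_803,
    "iter 2 helium": 28_972_506_958,
}

# Margin erosion: baseline margin minus the final iteration's margin.
erosion = margins["iter 0 baseline"] - margins["iter 2 helium"]

# Illustrative per-shock split: drop relative to the previous iteration.
shock_costs = {
    "Orion": margins["iter 0 baseline"] - margins["iter 1 Orion"],
    "helium": margins["iter 1 Orion"] - margins["iter 2 helium"],
}
```

This gives `erosion == 16_515_525_478`, matching the printed $16,515,525,478, split as $4,964,353,633 (Orion) plus $11,551,171,845 (helium).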

CSV-fallback path (USE_PRECOMPUTED_FORECAST=True) yields ~$47.09B /
$41.96B / $30.19B (the GNN learns slightly lower capability_pct values
from features than the synthetic forecast). README expected output now
shows the GNN-default numbers as headline; both paths' numbers are
documented in runbook Reproducibility notes.

README + runbook updates: Stage 2 "How it works" rewrites for GNN;
template structure tree + What's Included list the two new CSVs;
Prerequisites adds the Snowflake setup block; chain ASCII forecast
range + Step 5 prompt/response rewritten; Reproducibility notes
clarify margins are path-dependent and within-path invariant.

py_compile + ruff clean.

Previously the CSV-fallback path used a feature-driven synthetic forecast
(0.93-0.99 range) while the GNN-default path learned slightly lower
values (0.92-0.93). Two different code paths produced two different
downstream margin numbers, which required a "divergence" caveat in
README + runbook.

This commit aligns the paths by treating data/supplier_capability_forecast.csv
as a snapshot of the GNN's output rather than a parallel synthetic file.
The snapshot is refreshed by dev_temp/snapshot_gnn_forecast.py (a
developer tool, not template runtime). The customer-facing template
itself stays read-only with respect to bundled data -- no write-back
to data/ from memory_supply_allocation.py.

Verified bit-identical downstream behavior:
- USE_PRECOMPUTED_FORECAST=False (GNN-default): $45,488,032,436 /
  $40,523,678,803 / $28,972,506,958 with margin erosion $16,515,525,478
- USE_PRECOMPUTED_FORECAST=True (CSV path):     same numbers, same headline
- Step 9 cascade rankings (Orion 72 cells / 60.0%; Helium 180 cells)
  identical on both paths

README + runbook: drop the GNN-vs-CSV divergence language; reproducibility
notes simplified to "invariant across paths." Data generator note
clarified that the forecast CSV is GNN-snapshot-managed.

…anup

- pyproject.toml: relationalai==1.0.14 -> relationalai[gnn]==1.5.0. The
  [gnn] extra pulls in the predictive-reasoner dependencies needed for
  Stage 2's GNN training; 1.5.0 is current and ships the paths library
  and the predictive-reasoner submodule.
- README Prerequisites: matching version text update.
- memory_supply_allocation.py: lift the deferred Stage-2 imports
  (Any / Graph / GNN / PropertyTransformer) to the top of the file
  alongside the other relationalai.semantics imports. This avoids the
  inside-conditional import pattern; the dev-templates-review checklist's
  import-ordering rule prefers top-level imports.
- README Stage 2 "How it works" GNN snippet: add the stream_logs=False
  and seed=GNN_SEED kwargs that were missing from the README excerpt,
  bringing it back to verbatim parity with the script. Drift safety.
- v1/README.md index regenerated to include memory_supply_allocation
  in the v1 catalog (CI gate fix).

Address PR #77 review: drop H1, tighten 'What this template is for' to a
value-focused intro, spell out linear programming, list outcomes in
'What you'll build', fold GNN setup into Prerequisites so Quickstart
follows Tools, trim Quickstart output to a confirmation snippet, add
Start-here pointer, Sample data, Model overview (per-concept tables),
H3-grouped Customize, and Learn more / Support sections.

- Restructure root README: replace "Repository layout" with a "Templates"
  section (per-folder contents list, now including runbook.md) plus a
  generated, collapsible-by-industry index
- generate_version_indexes.py now extracts industry + reasoning_types and
  writes the index to both the root README (between markers) and each
  version README, grouped by industry
- Normalize industry frontmatter into 8 coarse, mutually-exclusive buckets
  across all version dirs
- Update CONTRIBUTING, pre-commit, and the verify-version-indexes workflow
  to cover the root README and the new front-matter fields
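Mutually exclusive buckets mean each template's `industry` resolves to exactly one of the eight names. A sketch of a normalization pass that enforces this and fails loudly on unmapped labels. The legacy labels in the mapping are hypothetical examples, since the PR does not list the original ~25.

```python
BUCKETS = (
    "Financial Services",
    "Healthcare & Life Sciences",
    "Manufacturing",
    "Retail & Consumer",
    "Technology & Telecom",
    "Energy & Utilities",
    "Supply Chain & Logistics",
    "Cross-Industry",
)

# Hypothetical legacy labels (lower-cased) -> bucket; not the PR's actual mapping.
LEGACY_TO_BUCKET = {
    "banking": "Financial Services",
    "insurance": "Financial Services",
    "pharma": "Healthcare & Life Sciences",
    "automotive": "Manufacturing",
    "telecommunications": "Technology & Telecom",
    "utilities": "Energy & Utilities",
    "logistics": "Supply Chain & Logistics",
}


def normalize_industry(label: str) -> str:
    """Map a front-matter industry label onto exactly one of the 8 buckets."""
    if label in BUCKETS:
        return label
    try:
        return LEGACY_TO_BUCKET[label.strip().lower()]
    except KeyError:
        raise ValueError(f"unmapped industry label: {label!r}") from None
```

Raising on unknown labels, rather than defaulting to Cross-Industry, keeps a new ad-hoc label from slipping into the index unnoticed.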

cafzal commented Jun 8, 2026

Collaborator Author

Superseded by #82. memory_supply_allocation is already on main, so this branch's commits for it were redundant; #82 carries just the library-index work, cleanly based on current main.

cafzal closed this Jun 8, 2026
cafzal deleted the add-memory-supply-allocation-template branch June 8, 2026 17:45