Skip to content

Forbid randomness inside variable formulas#500

Merged
MaxGhenis merged 2 commits into
masterfrom
feature/forbid-randomness-in-formulas
Jun 4, 2026
Merged

Forbid randomness inside variable formulas#500
MaxGhenis merged 2 commits into
masterfrom
feature/forbid-randomness-in-formulas

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Why

A rules-engine formula must be a pure, deterministic function of its inputs — identical inputs must always produce identical outputs. Calling an RNG inside a formula breaks that contract and makes whole datasets non-reproducible.

This is not hypothetical: a property_purchased assignment in policyengine-uk-data built on unseeded np.random intermittently spiked the UK first income decile's effective tax rate to 251%, failing the data build's sanity test and blocking all UK data releases for ~2 weeks. The data-side fixes (seed the draw; flip the model default) address that instance; this PR makes the whole class impossible at the engine level.

What

While a formula runs, forbid_randomness replaces the public callables of numpy.random and the stdlib random module with functions that raise NonDeterministicFormulaError.

  • Seeding doesn't exempt younp.random.default_rng(0).random() inside a formula still raises. Stochastic inputs must be precomputed deterministically when building the dataset and stored as input variables.
  • The guard is re-entrant (nested formula evaluation handled) and restores the namespaces once the outermost formula returns.
  • Wired into the single formula-invocation chokepoint, Simulation._run_formula.

Also removes the per-variable np.random.seed(...) previously applied at the top of every calculate(). It existed only to make formula-level randomness reproducible — now moot — and it conflicted with the guard (it fires during nested formula execution). Core-internal seeding in __init__ (governing legitimate non-formula randomness, e.g. axes) is unchanged.

Impact

Enforcement of an already-held invariant: no formula in policyengine-uk or policyengine-us uses randomness (grep across both variables/ trees returns zero).

Test plan

  • New tests/core/test_randomness_guard.py: numpy random, stdlib random, and seeded-generator calls inside a formula all raise; deterministic formulas unaffected; namespaces restored afterwards; re-entrancy correct.
  • Full core suite: 580 passed, 5 skipped, 1 xfailed.

MaxGhenis added 2 commits June 4, 2026 11:27
A rules-engine formula must be a pure, deterministic function of its
inputs: identical inputs must always produce identical outputs. Calling a
random number generator inside a formula breaks that contract and makes
datasets non-reproducible (a property_purchased assignment built on
unseeded np.random recently spiked a UK income decile's tax rate and
blocked data releases for ~2 weeks).

While a formula runs, forbid_randomness replaces the public callables of
numpy.random and the stdlib random module with functions that raise
NonDeterministicFormulaError. Seeding does not make randomness acceptable
in a formula; stochastic inputs must be precomputed deterministically
when building the dataset and stored as inputs. The guard is re-entrant
and restores the namespaces once the outermost formula returns.

Removes the per-variable np.random.seed previously applied at the top of
calculate(), which existed only to make formula-level randomness
reproducible and now conflicts with the guard. Core-internal seeding in
__init__ (for non-formula randomness) is unchanged.

No formula in policyengine-uk or policyengine-us uses randomness, so this
is enforcement of an already-held invariant. Full core suite passes
(580 tests).
Independent review of the determinism guard surfaced two interception
gaps and missing exception-safety coverage:

- A generator hoisted to module scope (rng = np.random.default_rng(0))
  and used inside a formula is not caught: numpy's generator classes are
  immutable C extension types, so their bound methods cannot be patched.
  (Building a generator inside the formula IS caught, since the
  np.random.default_rng constructor is patched.)
- A drawing function imported by name before the guard installs
  (from numpy.random import random) is not caught.

Neither is closeable without heavy machinery, and no PolicyEngine formula
uses randomness, so document both accurately in the module docstring and
pin them with tests so the boundary cannot drift silently. Corrects an
earlier draft docstring that wrongly claimed the generator classes were
patched.

Adds tests:
- randomness namespaces are restored after a formula raises a non-RNG
  exception (pins _run_formula guarded-block exception safety),
- constructing a generator inside a formula raises,
- the two known gaps behave as documented.
@MaxGhenis MaxGhenis merged commit a15c188 into master Jun 4, 2026
23 checks passed
@MaxGhenis MaxGhenis deleted the feature/forbid-randomness-in-formulas branch June 4, 2026 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant