Symptom
When a `Microsimulation` is constructed from a dataset that supplies an ETERNITY-defined variable as input, `calculate`/`calculate_dataframe` returns the variable's default value for every row instead of the dataset-provided values, as soon as anything triggers `Simulation._invalidate_all_caches` (e.g. `apply_reform`, `subsample`).
We hit this in `policyengine-us-data` after upgrading to `policyengine-us 1.674.1` / `policyengine-core 3.25.3`. `is_household_head` (a bool ETERNITY variable on `Person`) came back all-`False` from `acs.calculate_dataframe(["is_household_head", ...])`, even though the ACS H5 dataset has it correctly populated. Filtering by `is_household_head` then yielded zero rows and downstream code crashed.
Suspected mechanism
In `policyengine_core/simulations/simulation.py`:
- `Holder.set_input` records `(variable_name, branch_name, period)` into `simulation._user_input_keys` using the user-supplied `period` (e.g. `Period(2024)` from a dataset's `time_period`).
- For an ETERNITY-defined variable, `Holder._set` canonicalizes storage to `period=ETERNITY`, so `_memory_storage._arrays` is keyed by `"default:eternity"`.
- `_invalidate_all_caches` then iterates `_user_input_keys` and looks up `f"{branch_name}:{period}"` (`"default:2024"`), which misses the canonicalized `"default:eternity"` entry.
- The preserve-loop therefore doesn't re-add the array, the holder's `_arrays` is wiped at the iteration over `population._holders`, and subsequent `calculate` calls fall back to the variable's `default_value`.
Affected versions: introduced sometime in 3.24.x (the `_invalidate_all_caches` + `_user_input_keys` machinery is in 3.24.0/3.24.4/3.25.x; not present in 3.23.6).
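To make the suspected mismatch concrete, here is a toy model of it using plain dicts. It does not import or reproduce policyengine-core; `ToyHolder`, `invalidate_all_caches`, and the string periods are hypothetical stand-ins that only mirror the record-then-canonicalize pattern described above.

```python
# Toy model of the suspected key mismatch. Nothing here is policyengine-core
# code: ToyHolder and invalidate_all_caches are hypothetical stand-ins.
ETERNITY = "eternity"


class ToyHolder:
    def __init__(self, definition_period):
        self.definition_period = definition_period
        self.arrays = {}  # stands in for _memory_storage._arrays

    def set_input(self, name, branch, period, values, user_input_keys):
        # The bookkeeping key uses the period the caller supplied...
        user_input_keys.add((name, branch, period))
        # ...but storage is canonicalized for ETERNITY-defined variables.
        stored = ETERNITY if self.definition_period == ETERNITY else period
        self.arrays[f"{branch}:{stored}"] = values


def invalidate_all_caches(holders, user_input_keys):
    # Preserve-loop: look up each recorded user input by its recorded period.
    preserved = {}
    for name, branch, period in user_input_keys:
        key = f"{branch}:{period}"
        if key in holders[name].arrays:
            preserved[(name, key)] = holders[name].arrays[key]
    # Wipe every holder, then restore only what the preserve-loop found.
    for holder in holders.values():
        holder.arrays.clear()
    for (name, key), values in preserved.items():
        holders[name].arrays[key] = values


holders = {"flag": ToyHolder(ETERNITY)}
keys = set()
holders["flag"].set_input("flag", "default", "2024", [True, False], keys)
before = dict(holders["flag"].arrays)  # stored under "default:eternity"
invalidate_all_caches(holders, keys)
after = dict(holders["flag"].arrays)   # lookup used "default:2024" and missed
print(before, after)
```

In this toy, the input survives `set_input` but is gone after invalidation, which is the same shape as the all-default result reported above.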
Minimal repro sketch
```python
import numpy as np
from policyengine_core.data import Dataset
from policyengine_core.simulations import Simulation

# ... a tiny tax-benefit system `tbs` with one ETERNITY bool variable `flag` ...
# `some_reform` below is a placeholder for any reform.


class Tiny(Dataset):
    data_format = Dataset.ARRAYS
    time_period = 2024  # any non-ETERNITY period

    def generate(self):
        # Write {"flag": [True, False], "person_id": [1, 2], ...}
        ...


sim = Simulation(dataset=Tiny, tax_benefit_system=tbs)
print(sim.calculate("flag"))  # [True, False]
sim.apply_reform(some_reform)  # triggers _invalidate_all_caches
print(sim.calculate("flag"))  # [False, False] ← bug
```
Suggested fix directions
In `_user_input_keys` accounting (in `Holder.set_input` and `Simulation.set_input`), normalize the recorded `period` to `ETERNITY` whenever `variable.definition_period == ETERNITY`, so the preserve-loop's `f"{branch_name}:{period}"` lookup matches the canonicalized storage key. Applying the same normalization in the preserve-loop's lookup instead would also work.
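A minimal sketch of the normalization idea, again as a standalone toy rather than a patch against policyengine-core. The helper `normalized_key`, the `definition_periods` mapping, and the `rent` variable with a `"year"` period are all hypothetical names introduced only for illustration.

```python
# Sketch of normalizing the recorded period before it goes into the key set.
# All names are hypothetical; this is not the library's actual code.
ETERNITY = "eternity"


def normalized_key(variable_name, branch_name, period, definition_periods):
    """Return the tuple to record, canonicalizing ETERNITY variables."""
    if definition_periods[variable_name] == ETERNITY:
        period = ETERNITY
    return (variable_name, branch_name, period)


# Assumed definition periods for two illustrative variables.
definition_periods = {"flag": ETERNITY, "rent": "year"}

# Storage as it would look after canonicalization on write.
storage = {
    "flag": {"default:eternity": [True, False]},
    "rent": {"default:2024": [900, 1200]},
}

# Both inputs arrive with the dataset's period, "2024".
recorded = {
    normalized_key("flag", "default", "2024", definition_periods),
    normalized_key("rent", "default", "2024", definition_periods),
}

# The preserve-loop style lookup now hits for both variables.
lookup_hits = {
    name: f"{branch}:{period}" in storage[name]
    for name, branch, period in recorded
}
print(lookup_hits)
```

Normalizing at record time keeps the lookup unchanged; the alternative of normalizing inside the lookup would leave already-recorded keys valid, which may matter if keys are persisted or copied across branches.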
Impact
Any country package that loads ETERNITY variables from a `Dataset` whose `time_period != ETERNITY` and that subsequently calls `apply_reform` or `subsample` will silently see those inputs replaced with default values. In `policyengine-us-data`, this took down the CPS data build's rent imputation step.