Pandas UI for initial conditions and Markov transition probabilities. #272

Merged
hmgaudecker merged 96 commits into main from pandas-ui
Mar 19, 2026

Member

@hmgaudecker hmgaudecker commented Mar 8, 2026

Fix #242.
Fix #259.

Summary

Add pandas interoperability utilities and simplify the simulation API so users work
with DataFrames and a single initial_conditions dict instead of raw JAX arrays.

Public API changes

New module (lcm.pandas_utils, re-exported from lcm):

  • initial_conditions_from_dataframe(df, model=) — convert a DataFrame with a
    "regime" column to the initial_conditions dict expected by simulate() and
    solve_and_simulate(). Discrete state labels and the "regime" column are
    mapped to integer codes.
  • transition_probs_from_series(series, model=, regime_name=, ...): build
    transition probability arrays from a pandas Series with a named MultiIndex.
    Works for both state and regime transitions (inferred from next_<x>);
    automatically reorders index levels to match the function signature.
    Uses "age" (not "period") as the time-dimension level name.
  • validate_transition_probs(probs_array, ...) — validate shape, bounds, and row-sums
    of transition probability arrays.
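
To make the last two helpers more concrete, here is a minimal sketch of the underlying idea: reorder labelled probabilities to match a target level order, lay them out densely, and check each row. It uses only the standard library so it runs anywhere. The names `dense_probs` and `check_rows`, the level names, and the labels are assumptions for illustration; this is not lcm's implementation, which operates on a pandas Series.

```python
import math


def dense_probs(data, index_names, target_order, categories):
    """Arrange labelled probabilities as table[row][col], ordered like target_order."""
    # Where each target level sits inside the incoming key tuples.
    positions = [index_names.index(name) for name in target_order]
    row_labels, col_labels = (categories[name] for name in target_order)
    table = [[0.0] * len(col_labels) for _ in row_labels]
    for key, prob in data.items():
        row_label, col_label = (key[pos] for pos in positions)
        table[row_labels.index(row_label)][col_labels.index(col_label)] = prob
    return table


def check_rows(table):
    """Raise ValueError if a row has out-of-bounds entries or does not sum to 1."""
    for i, row in enumerate(table):
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError(f"row {i} has a probability outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError(f"row {i} sums to {sum(row)}, expected 1")


# Hypothetical input: keys are (next_health, health), i.e. not in signature order.
index_names = ("next_health", "health")
data = {
    ("good", "good"): 0.9,
    ("bad", "good"): 0.1,
    ("good", "bad"): 0.3,
    ("bad", "bad"): 0.7,
}
categories = {"health": ["bad", "good"], "next_health": ["bad", "good"]}

# Rows follow the current state, columns follow the next state.
probs = dense_probs(data, index_names, ("health", "next_health"), categories)
check_rows(probs)
```

Here `probs[i][j]` is the probability of moving from health category `i` to category `j`, regardless of the order in which the levels were supplied.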

Simulation API simplification:

  • model.simulate() and model.solve_and_simulate() now take a single
    initial_conditions: Mapping[str, Array] instead of separate initial_states +
    initial_regimes. The dict includes all state arrays plus "regime_id" (integer
    codes from model.regime_names_to_ids).
  • Internal simulate() and validate_initial_conditions() updated accordingly.
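
For callers migrating from the old two-argument form, the change amounts to folding the regime names into the state mapping as integer codes. A minimal sketch, assuming plain lists in place of arrays and a made-up `regime_names_to_ids` mapping (the helper `merge_initial_conditions` is hypothetical and not part of lcm):

```python
def merge_initial_conditions(initial_states, initial_regimes, regime_names_to_ids):
    """Combine per-state values and regime names into one initial_conditions dict."""
    n_subjects = len(initial_regimes)
    for name, values in initial_states.items():
        if len(values) != n_subjects:
            raise ValueError(
                f"state {name!r} has {len(values)} entries, expected {n_subjects}"
            )
    conditions = dict(initial_states)
    # Regime names become integer codes under the "regime_id" key.
    conditions["regime_id"] = [regime_names_to_ids[r] for r in initial_regimes]
    return conditions


# Hypothetical mapping; in lcm it would come from model.regime_names_to_ids.
regime_names_to_ids = {"working": 0, "retired": 1}

initial_conditions = merge_initial_conditions(
    {"wealth": [10.0, 50.0, 30.0], "age": [25.0, 25.0, 25.0]},
    ["working", "working", "retired"],
    regime_names_to_ids,
)
```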

@categorical now requires ordered=True/False. This is a breaking change to the decorator
syntax (@categorical → @categorical(ordered=False)). The flag flows through to
pd.CategoricalDtype in simulation output, so ordered categoricals (e.g. education
levels) render correctly in pandas. This gives a 1:1 mapping with pandas categorical
dtypes.
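
To illustrate why the bare form stops working, here is a rough sketch of a decorator factory with a required keyword argument. `categorical_sketch` and the `_ordered` attribute are hypothetical stand-ins written for this example; lcm's actual decorator may be implemented differently.

```python
from dataclasses import dataclass


def categorical_sketch(*, ordered: bool):
    """Hypothetical stand-in: a decorator factory that requires `ordered`."""

    def decorate(cls):
        cls = dataclass(frozen=True)(cls)
        # Remember the flag so it can later be passed on to a pandas dtype.
        cls._ordered = ordered
        return cls

    return decorate


@categorical_sketch(ordered=True)
class Education:
    low: int = 0
    high: int = 1


# The old bare syntax passes the class positionally, which the factory rejects.
bare_use_error = None
try:

    @categorical_sketch
    class Broken:
        a: int = 0

except TypeError as err:
    bare_use_error = str(err)
```

The stored flag corresponds to the `ordered` argument that `pd.CategoricalDtype(categories=..., ordered=...)` accepts.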

Internal changes

  • AST-based probs_array validation (error_handling.py): At model init, inspects
    transition function source to verify probs_array[...] indexing matches the parameter
    declaration order. Warns when the pattern is non-trivial (computed indices, multiple
    subscripts).
  • Ordered category merging (regime_processing.py): _merge_ordered_categories()
    validates consistent ordering across regimes and merges category sets via topological
    sort.
  • Terminal-period validation (#277): Non-terminal regimes active in the terminal
    period now raise at model init.
  • Simulation cleanup: Removed unused next_age parameter; moved regime transition
    probability validation from simulation to model init.
  • Shared test models: tests/test_models/basic_discrete.py (discrete + continuous
    states, no stochastic transitions) and tests/test_models/regime_markov.py
    (MarkovTransition on regime transitions).
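
The AST-based check in the first bullet can be sketched roughly as follows. This is a simplified illustration, not the code in error_handling.py: the function name `subscript_order_matches` is made up, it assumes Python 3.9+ (where `ast.Subscript.slice` is a plain expression node), and it assumes every parameter other than `probs_array` is expected as an index.

```python
import ast
import textwrap
from typing import Optional


def subscript_order_matches(source: str, array_name: str = "probs_array") -> Optional[bool]:
    """Return True/False if one plain subscript can be checked, else None."""
    func = ast.parse(textwrap.dedent(source)).body[0]
    if not isinstance(func, ast.FunctionDef):
        return None
    params = [a.arg for a in func.args.args if a.arg != array_name]
    subscripts = [
        node
        for node in ast.walk(func)
        if isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == array_name
    ]
    if len(subscripts) != 1:
        return None  # no subscript or several: cannot validate automatically
    index = subscripts[0].slice
    elements = index.elts if isinstance(index, ast.Tuple) else [index]
    if not all(isinstance(e, ast.Name) for e in elements):
        return None  # computed indices: cannot validate automatically
    return [e.id for e in elements] == params


good = """
def next_health(period, health, probs_array):
    return probs_array[period, health]
"""
swapped = """
def next_health(period, health, probs_array):
    return probs_array[health, period]
"""
computed = """
def next_health(period, health, probs_array):
    return probs_array[period + 1, health]
"""

results = [subscript_order_matches(src) for src in (good, swapped, computed)]
```

A `None` result corresponds to the "non-trivial pattern" case, where a warning is more appropriate than an error.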

Documentation

  • New user guide: Working with DataFrames and Series (pandas_interop.md)
  • New explanation notebook: Stochastic Transitions (stochastic_transitions.ipynb)
  • Updated Solving and Simulating: initial_conditions is now the primary API
  • Updated tiny example and all example docs to use initial_conditions

Infrastructure

  • Type-checking environment split: ty now runs in its own pixi environment
  • AI instructions consolidated into .ai-instructions submodule

hmgaudecker and others added 30 commits March 4, 2026 06:35
The compute_gridpoints / compute_transition_probs methods and _mixture_cdf
accept scalar distribution params, not arrays. Narrow annotations to float,
make _mixture_cdf params keyword-only, and add assert isinstance(val, float)
in interfaces.py to narrow the type from all_params. Fix FGP page references
and use DECIMAL_PRECISION in test assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove 4 ty:ignore comments in ages.py via assert narrowing
- Fix greedy next_ prefix removal in simulation/utils.py
- Set tmp=None after atomic replace in result.py
- Fix error message formatting in model.py
- Replace stale bug-claiming docstrings with regression guard refs (#230)
- Convert rST to MyST in grids.py docstring
- Remove stale docstring param in simulation/utils.py
- Add PD011 to docs per-file-ignores with corrected comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ith dags.tree utilities

Convert Model class Attributes section to PEP 257 inline field docstrings.
Replace f-string concatenation and .split() on QNAME_DELIMITER with
qname_from_tree_path / tree_path_from_qname across 7 modules.

Closes #254.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename Model.params_template to Model._params_template so users interact
with the mutable dict returned by get_params_template() instead of an
opaque MappingProxyType. Update docs, tests, and CLAUDE.md accordingly.
Allow SLF001 in test per-file-ignores for legitimate private access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`precise_values` was confusing — unclear how it differs from `values`.
`exact_values` makes the distinction clear: `values` are JAX floats (lossy),
`exact_values` are `int | Fraction` (lossless). Also renames
`precise_step_size` → `exact_step_size`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s, focusing-on-the-economics guide

- B3: Fix regime transitions text in regimes.ipynb (states and actions, not just state)
- B4: Note alternative aggregation functions (e.g. Epstein–Zin) in regimes.ipynb
- B5a: Add function-composition explanation to tiny_example.ipynb
- B5b: New user guide page "Focusing on the Economics" with DAG explanation
- Port regimes.ipynb from removed branch, updated for current API
  (no RegimeTransition/MarkovRegimeTransition wrappers, exact_values, get_params_template)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docs:
- Rename "Focusing on the Economics" → "Write Economics, Not Glue Code" and
  move it first in the user guide
- Add explanatory cells between code blocks in function-composition notebook
- Include borrowing_constraint in the first example; show dict assembly with
  utility_working → "utility" rename
- Use AgeGrid(start=60, stop=85, step="5Y") instead of N_PERIODS
- Fix mermaid dark mode with neutral theme
- Remove stale #262 per-boundary transition language from regimes notebook
- Use t/t+1 notation instead of prime in H function
- Normalize params_workflow.ipynb cell sources to list-of-strings format

Code:
- Rename key → qname, parts/path → tree_path in params_processing.py and
  model.py for consistency with dags terminology
- Simplify AgeGrid overloads: omit unused params instead of None=None
- Restructure AgeGrid.__init__ to avoid assert narrowing
- Replace assert with cast in interfaces.py
- Render params template types as short strings (float not <class 'float'>)
- Update MutableParamsTemplate to use str values
- Use "compatible with" instead of "matching" for params docstrings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add real content to 6 User Guide pages (defining_models, grids, shocks,
parameters, solving_and_simulating) and installation.md. Remove empty
dispatchers.ipynb from explanations index. Update user_guide/index.md
and getting_started/index.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rework write_economics.ipynb with DAG-first didactical approach: show the
math, the dependency graph, traditional code, then pylcm. Trim regimes.ipynb
to focus on regime-specific topics (terminal/non-terminal, stochastic
transitions, active predicates). Both updated to current API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add annotated consumption-saving example walkthrough, development guide
with setup/testing/conventions, and examples index. Add cross-links
between User Guide pages and examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create explanations/dispatchers.ipynb explaining productmap, vmap_1d, and
simulation_spacemap with worked examples and plotly visualizations. Restyle
function_representation.ipynb: plotly plots, simplified model setup, tighter
prose. Add dispatchers to TOC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Enable per-target dict transitions to carry independent parameters per target
regime (to_{target}_next_{state} template keys instead of collapsing under a
single base name), and allow state_transitions to reference states that only
exist in target regimes.

Key changes:
- Relax validation to allow extra keys in state_transitions (except None)
- Add second pass in _collect_state_transitions for target-only states
- Use to_ prefix in params template for per-target transitions
- Pass target regime name (not source) to weight function builders in Q_and_F
- Extend markov detection to cover target-only stochastic states

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Raise ModelInitializationError when a single (non-per-target-dict) transition
is used for a DiscreteGrid state whose categories differ across regimes. This
prevents JAX from silently clipping out-of-bounds indices and producing wrong
continuation values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename "working" → "working_life" and "retired" → "retirement" for regime
names, "WorkingStatus" → "LaborSupply" and "labor_supply" → "work" for
action names. Update all docs, examples, test models, and test files.
Regenerate regression and shock test data pickles with new names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate the two near-identical consumption-savings test models into one
configurable module with support for Normal GH, Rouwenhorst, and Tauchen
shocks plus configurable wealth grid type and interest rate. Add FGP (RED
2019) reference to docstring. Add Tauchen to economic validation tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hmgaudecker and others added 13 commits March 12, 2026 10:52
Scope state grids to the current regime instead of iterating all model
regimes, since a regime's transition function can only reference its own
states/actions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rminal period. (#277)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Catches silent bugs where probs_array[health, period] doesn't match the
function signature order (period, health). Warns when indexing uses computed
expressions or multiple branches that can't be validated automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… target regime-specific transitions in transition_probs_from_series and validation.
@hmgaudecker hmgaudecker marked this pull request as ready for review March 15, 2026 15:37
@hmgaudecker
Member Author

Code review

Found 1 issue:

  1. _collect_all_state_names includes shock states despite docstring claiming otherwise. The function does state_names.update(regime.states.keys()) without filtering out _ShockGrid instances. For any model with shock states (Normal, Uniform, Tauchen, Rouwenhorst), initial_states_from_dataframe will incorrectly require those shock columns in the DataFrame — but shock states are drawn automatically and should never be user-supplied. The error message at line 78 even says "All non-shock states must be provided", confirming the intent. The fix is to filter with isinstance(grid, _ShockGrid) as done elsewhere in the codebase (e.g. regime.py:346).

def _collect_all_state_names(
    regimes: Mapping[str, Regime],
    initial_regimes: list[str],
) -> set[str]:
    """Collect all non-shock state names from regimes present in initial_regimes."""
    state_names: set[str] = set()
    for regime_name in set(initial_regimes):
        regime = regimes[regime_name]
        state_names.update(regime.states.keys())
    # Always include age
    state_names.add("age")
    return state_names

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

  1. Shock states excluded via isinstance(grid, _ShockGrid) filter
  2. Section separators removed from tests/test_grids.py
  3. Stale docstrings updated in result.py and simulate.py
  4. Test model _make_regime_markov_model fixed: active=lambda age: age < 62
  5. Docstring Dict → dict in pandas_utils.py
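
Item 1 above (filtering out shock states) might look roughly like the sketch below. The classes `Grid`, `ShockGrid`, and `RegimeSketch` are simplified stand-ins invented for this example; in lcm the check would use `_ShockGrid` and the real `Regime` type, and the merged code may differ in detail.

```python
from dataclasses import dataclass, field


class Grid:
    """Stand-in for an ordinary state grid."""


class ShockGrid(Grid):
    """Stand-in for lcm's _ShockGrid; shock states are drawn automatically."""


@dataclass
class RegimeSketch:
    states: dict = field(default_factory=dict)


def collect_non_shock_state_names(regimes, initial_regimes):
    """Collect state names the user must supply, skipping shock states."""
    state_names = set()
    for regime_name in set(initial_regimes):
        for name, grid in regimes[regime_name].states.items():
            if not isinstance(grid, ShockGrid):
                state_names.add(name)
    # Always include age
    state_names.add("age")
    return state_names


regimes = {
    "working": RegimeSketch({"wealth": Grid(), "income_shock": ShockGrid()}),
    "retired": RegimeSketch({"wealth": Grid()}),
}
required = collect_non_shock_state_names(regimes, ["working", "retired", "working"])
```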
@hmgaudecker hmgaudecker requested a review from timmens March 15, 2026 20:15
Comment thread src/lcm/simulation/validation.py Outdated
Member

@timmens timmens left a comment

Very very nice! No showstoppers, only a few comments on the interface.

Comment on lines +19 to +33
from lcm import initial_conditions_from_dataframe

df = pd.DataFrame({
    "regime": ["working", "working", "retired"],
    "wealth": [10.0, 50.0, 30.0],
    "health": ["good", "bad", "good"],
    "age": [25.0, 25.0, 25.0],
})

initial_conditions = initial_conditions_from_dataframe(df, model=model)

result = model.solve_and_simulate(
    params=params,
    initial_conditions=initial_conditions,
)
Member

We could also allow model.solve_and_simulate to accept a pandas.DataFrame as the initial conditions. And if it is a dataframe we call initial_conditions_from_dataframe?

The user would get a very similar error, and it would probably feel even more natural?

Alternatively, this could be made a model method.

Member Author

Shoot, a comment of mine here did not make it through yesterday, it seems. Sorry about that.

We went down that route of allowing all kinds of different inputs in ttsim and it is a pain. While I agree that would be nice, I really want to keep the interface so that it accepts exactly one thing here; code maintenance is hard enough. We can expect users to be well-versed enough to run the pre-processing steps themselves.

Comment thread src/lcm/simulation/validation.py Outdated
Comment thread src/lcm_examples/mahler_yum_2024/_model.py Outdated
Comment thread src/lcm_examples/mahler_yum_2024/_model.py Outdated
Comment thread tests/test_models/deterministic/discrete.py
Comment thread src/lcm/pandas_utils.py
) -> Array: ...


def transition_probs_from_series(
Member

Consider making this a method on Model and inferring state_name from the Series' MultiIndex — the "next_" prefix on the outcome level already identifies the state (or "next_regime" for regime transitions). That would reduce the signature to:

model.transition_probs_from_series(series, regime_name="working")

if I am not mistaken.
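
The inference suggested here could be sketched as below, working on a plain list of index level names. `infer_target_name` is a hypothetical helper for illustration only; it assumes exactly one level carries the "next_" prefix and requires Python 3.9+ for `str.removeprefix`.

```python
def infer_target_name(index_names):
    """Return the state (or "regime") named by the single next_<x> level."""
    outcome_levels = [name for name in index_names if name.startswith("next_")]
    if len(outcome_levels) != 1:
        raise ValueError(
            f"expected exactly one 'next_' level, found {outcome_levels}"
        )
    # removeprefix strips the prefix once, so "next_next_x" keeps its inner "next_".
    return outcome_levels[0].removeprefix("next_")


state_target = infer_target_name(["age", "health", "next_health"])
regime_target = infer_target_name(["age", "next_regime"])
```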

Member Author

See above for the model-method part. I am investigating the interface change right now!

@hmgaudecker hmgaudecker merged commit b4fb1fc into main Mar 19, 2026
9 checks passed
@hmgaudecker hmgaudecker deleted the pandas-ui branch March 19, 2026 07:47

Development

Successfully merging this pull request may close these issues.

ENH: DiscreteMarkovGrid transition as pd.DataFrame
ENH: Add a .to_pd_categorical_dtype() to dataclasses decorated with @categorical

3 participants