Add a DataFrame front door to populace-fit with explicit-weights contract#290

Open
MaxGhenis wants to merge 1 commit into main from fit-dataframe-front-door

@MaxGhenis

Contributor

What

RegimeGatedQRF.fit (and the ConditionalModel protocol, plus the populace.fit.fit convenience) now accept a plain pandas DataFrame as well as a Frame, mirroring the union predict already takes. This is the standalone front door for using the canonical imputer outside a populace stack.

The weight contract

A bare DataFrame has no typed weights, so the operator's defining rule — no silent unweighted fit — cannot ride on the "design" default. On the DataFrame path the rule inverts from safe default to no default:

  • weights=<column name> — a numeric weight column of the DataFrame (refused if it is also a predictor or target),
  • weights=<1-D vector> — array/Series read positionally, validated (length, finiteness, non-negativity, positive mass),
  • weights="none" — the only unweighted path, stated deliberately.

Omitting weights (or passing a typed-kind spelling like "design") raises with an actionable message. This makes the unweighted-training failure mode that produced the eCPS point-mass "landmines" unrepresentable at the API: you cannot forget weights, you can only decline them in writing.
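The contract above can be sketched as a standalone resolver. This is a minimal illustration under stated assumptions, not the code in model.py: the name `resolve_weights_sketch`, its signature, and the error messages are hypothetical, and the handling of typed-kind spellings (refused here simply because no such column exists) is a guess at one way to implement the refusal. Only the rules being enforced come from the description.

```python
import numpy as np
import pandas as pd

_MISSING = object()  # sentinel so an omitted argument is distinguishable


def resolve_weights_sketch(df, predictors, targets, weights=_MISSING):
    """Hypothetical illustration of the DataFrame weight contract.

    Returns a 1-D float array of row weights, or None for a deliberate
    unweighted fit. Raises ValueError for anything else.
    """
    if weights is _MISSING or weights is None:
        raise ValueError(
            "DataFrame fits have no default weights: pass a weight column "
            "name, a 1-D weight vector, or weights='none'."
        )

    if isinstance(weights, str):
        if weights == "none":
            return None  # the only unweighted path, stated deliberately
        if weights not in df.columns:
            # A typed-kind spelling such as "design" lands here in this sketch.
            raise ValueError(f"{weights!r} is not a column of the DataFrame.")
        if weights in predictors or weights in targets:
            raise ValueError(f"{weights!r} is also a predictor or target.")
        values = df[weights].to_numpy()
    else:
        # Arrays and Series are read positionally; a Series index is ignored.
        values = np.asarray(weights)

    if values.ndim != 1 or len(values) != len(df):
        raise ValueError("weights must be 1-D with one entry per row.")
    if not np.issubdtype(values.dtype, np.number):
        raise ValueError("weights must be numeric.")
    values = values.astype(float)
    if not np.isfinite(values).all():
        raise ValueError("weights must be finite.")
    if (values < 0).any():
        raise ValueError("weights must be non-negative.")
    if values.sum() <= 0:
        raise ValueError("weights must have positive total mass.")
    return values


# Illustrative data; the column names are made up for this example.
df = pd.DataFrame(
    {"age": [30, 45, 60], "income": [1.0, 2.0, 3.0], "w": [2.0, 1.0, 1.0]}
)
by_column = resolve_weights_sketch(df, ["age"], ["income"], weights="w")
unweighted = resolve_weights_sketch(df, ["age"], ["income"], weights="none")
```

Using a private sentinel rather than `None` as the default is what lets the sketch tell "omitted" apart from every value a caller could pass, so the error fires on omission rather than falling through to an unweighted fit.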

resolve_dataframe_fit_weights / dataframe_fit_columns live in model.py beside resolve_fit_weights / predictors_targets_entity, keeping the weight rule enforced in one module for both front doors.

Behavior

  • Past weight resolution the two paths are the same model. A parity test pins this: same rows + same weights + same seed ⇒ bit-identical draws from a Frame fit and a DataFrame fit, both weighted and "none".
  • A DataFrame-fitted model has entity=None; predict(Frame) on it raises with guidance (predicting for DataFrames works as before, index preserved).
  • Frame path is unchanged (fit's first parameter is renamed frame → frame_or_df; the repo has no keyword callers).
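The shape of the parity assertion can be shown with a toy stand-in. This is only a sketch: `toy_draws` is a hypothetical weighted resampler, not RegimeGatedQRF, and the data are invented. What it illustrates is the form of the check, exact equality of seeded draws when the same weights arrive by two routes, rather than approximate closeness.

```python
import numpy as np
import pandas as pd


def toy_draws(y, weights, seed, n_draws=5):
    """Toy stand-in for fit-then-draw: seeded weighted resampling of y."""
    rng = np.random.default_rng(seed)
    # None means a deliberate unweighted (uniform) draw.
    p = None if weights is None else weights / weights.sum()
    return rng.choice(y, size=n_draws, p=p)


df = pd.DataFrame({"y": [10.0, 20.0, 30.0, 40.0], "w": [1.0, 3.0, 0.5, 2.0]})
y = df["y"].to_numpy()

# Route A: weights read from a column of the table.
draws_a = toy_draws(y, df["w"].to_numpy(), seed=0)
# Route B: the same weights supplied as a separate vector.
draws_b = toy_draws(y, np.array([1.0, 3.0, 0.5, 2.0]), seed=0)

# Bit-identical, not merely close: array_equal instead of allclose.
assert np.array_equal(draws_a, draws_b)

# The deliberate unweighted case is pinned the same way.
assert np.array_equal(toy_draws(y, None, seed=0), toy_draws(y, None, seed=0))
```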

Tests

18 new tests in test_dataframe_fit.py: the explicit-weights requirement, typed-kind refusal, column/vector/Series equivalence, Frame↔DataFrame parity, "none" reservedness, weight and column validation matrix, entity-less predict behavior, and the convenience wrapper. Full workspace suite passes locally (3.14).

Why now

First of three steps making populace-fit reusable outside populace (per the imputation-paper plan): DataFrame front door → PyPI publication of populace-frame/populace-fit → standalone quickstart. The paper's software section will document this API as the external-use path.

🤖 Generated with Claude Code

…ract

RegimeGatedQRF.fit (and the ConditionalModel protocol) now accept a plain
pandas DataFrame as well as a Frame, for standalone use outside a populace
stack. A bare DataFrame has no typed weights, so the operator's no-silent-
unweighted-fit rule inverts from "safe default" to "no default": the caller
must state weights explicitly — a weight column name, a 1-D weight vector,
or weights="none" — and omitting them raises instead of silently fitting
unweighted. Past weight resolution the two paths are the same model, pinned
by a bit-for-bit parity test against the Frame path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>