v0.5.0

@sipemu released this 28 May 14:27 · 2 commits to main since this release · 2441ded

polars-statistics 0.5.0

Major expansion of the public API — adds a Rust-callable rlib path, three new model wrappers, and a full diagnostics toolkit. Wheel + sdist publish to PyPI automatically.

Highlights

Hybrid crate (Rust + Python) — issue #13

`polars-statistics` now builds as both a `cdylib` (the Python plugin) and an `rlib` (a Rust dependency). Every Polars expression has a public `_fit` Rust entry point in `polars_statistics::expressions` that downstream Rust crates can call directly — no Python boundary, no FFI overhead. See the new "Use from Rust" section in the README and `examples/rust_wls.rs`.

```toml
[dependencies]
polars-statistics = { version = "0.5", default-features = false }
```

New model wrappers

  • `Huber` (M-estimator regression, robust to outliers) — class + `huber()` expression. (#14)
  • `LogisticRegression` — sklearn-style API with `predict_proba`, `decision_function`, `score`, `penalty="l2"`, `C` kwarg. Complementary to the existing `Logistic` wrapper. (#14)
  • `PLS` (Partial Least Squares) — class + `pls()` expression, plus `transform()` for the latent space. (#19)
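For readers unfamiliar with M-estimation, the sketch below shows the Huber loss and the reweighting factor it implies, which is what makes this kind of regression robust to outliers. It is a generic, standard-library illustration and not the code behind `Huber` or `huber()`; the function names and the threshold `delta = 1.345` (a conventional tuning constant for standardized residuals) are assumptions made for the example.

```python
def huber_loss(residual: float, delta: float = 1.345) -> float:
    """Quadratic for small residuals, linear for large ones.

    `delta` is an assumed, conventional threshold, not a documented
    default of polars-statistics.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def huber_weight(residual: float, delta: float = 1.345) -> float:
    """Weight an iteratively reweighted least squares step would assign.

    Points inside the threshold keep full weight; points outside are
    down-weighted in proportion to how far out they lie.
    """
    a = abs(residual)
    return 1.0 if a <= delta else delta / a


if __name__ == "__main__":
    for r in (0.5, 1.0, 3.0, 10.0):
        print(r, huber_loss(r), huber_weight(r))
```

Because the weight decays like `delta / |residual|`, a single extreme observation contributes a bounded gradient instead of dominating the fit as it would under squared error.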

GLM improvements

  • Penalized IRLS via a new `lambda_` kwarg on `PyLogistic`, `PyPoisson`, `PyNegativeBinomial`, `PyTweedie`, `PyProbit`, `PyCloglog` (the expression layer already supported it). (#15)
  • ALM expression parity with `PyALM` — all 25 distributions now reachable from `ps.alm(...)`, plus the `loss`, `link`, `role_trim` and `extra_parameter` kwargs. (#16)
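To make "penalized IRLS" concrete, here is a minimal sketch of ridge-penalized iteratively reweighted least squares for a one-coefficient logistic model with no intercept. It assumes the penalty is `(lam / 2) * beta**2` added to the negative log-likelihood; how `lambda_` is scaled inside the library is not stated in these notes, so treat the function name and the scaling as hypothetical.

```python
from math import exp


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + exp(-t))


def ridge_logistic_irls(x, y, lam: float, n_iter: int = 50) -> float:
    """Fit y ~ sigmoid(beta * x) by IRLS with an L2 penalty on beta.

    Assumed objective: -loglik(beta) + (lam / 2) * beta**2.
    Each iteration solves a weighted least squares problem on the
    working response z, with `lam` added to the normal-equation term.
    """
    beta = 0.0
    for _ in range(n_iter):
        num = 0.0
        den = lam  # the penalty enters only the denominator
        for xi, yi in zip(x, y):
            p = sigmoid(beta * xi)
            w = p * (1.0 - p)                # IRLS weight
            z = beta * xi + (yi - p) / w     # working response
            num += w * xi * z
            den += w * xi * xi
        beta = num / den
    return beta


if __name__ == "__main__":
    # Illustrative, non-separable toy data.
    x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    print(ridge_logistic_irls(x, y, lam=0.0))
    print(ridge_logistic_irls(x, y, lam=10.0))
```

With `lam = 0` this reduces to ordinary IRLS; increasing `lam` shrinks the coefficient toward zero, which is the practical effect of the penalty.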

Summary / predict completeness — #18

Added `*_summary` / `*_predict` expressions for the regression families that previously only had a base fit:

  • `quantile_summary`, `quantile_predict`
  • `isotonic_predict`
  • `lm_dynamic_predict`

Diagnostics toolkit — #17 + #27

A full diagnostics suite wrapped as Polars expressions. Every diagnostic also has a public `*_fit` Rust entry point.

Multicollinearity

  • `vif`, `generalized_vif`, `high_vif_predictors`
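As a reminder of what a variance inflation factor measures, the sketch below computes it for the special case of exactly two predictors, where the R² of regressing one predictor on the other equals their squared Pearson correlation, so VIF = 1 / (1 - r²). This is a standard-library illustration of the definition, not the implementation behind `vif`; the helper names are made up for the example.

```python
from math import sqrt


def pearson_r(a, b) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def vif_two_predictors(x1, x2) -> float:
    """VIF shared by both predictors in a two-predictor model.

    With one other predictor (plus an intercept), the auxiliary
    regression's R^2 is the squared correlation between x1 and x2.
    """
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)


if __name__ == "__main__":
    print(vif_two_predictors([-1, 0, 1, 0], [0, -1, 0, 1]))      # uncorrelated
    print(vif_two_predictors([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]))  # nearly collinear
```

With more than two predictors the auxiliary R² comes from a multiple regression rather than a single correlation, but the 1 / (1 - R²) form is the same.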

OLS residual battery

  • `standardized_residuals`, `studentized_residuals`, `externally_studentized_residuals`, `residual_outliers`

GLM residuals (logistic + Poisson)

  • `*_pearson_residuals`, `*_deviance_residuals`, `*_working_residuals` for each family

Influence / leverage

  • `leverage`, `cooks_distance`, `dffits`, `influential_cooks`, `influential_dffits`, `high_leverage_points`
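The quantities behind these names can be sketched for simple linear regression (one predictor plus an intercept), where the hat-matrix diagonal has a closed form. The code below uses the textbook definitions of leverage and Cook's distance; it is an assumption-laden illustration with a hypothetical function name and toy data, not the library's implementation or its flagging thresholds.

```python
def simple_ols_influence(x, y):
    """Return (leverage, cooks_distance) lists for y ~ a + b * x.

    Leverage:  h_i = 1/n + (x_i - mean(x))^2 / Sxx
    Cook's D:  D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, with p = 2
    """
    n = len(x)
    p = 2  # intercept + slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate

    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [
        (e * e) / (p * s2) * h / (1.0 - h) ** 2
        for e, h in zip(resid, leverage)
    ]
    return leverage, cooks


if __name__ == "__main__":
    # Toy data: the last point sits far from the others in x.
    x = [1, 2, 3, 4, 5, 10]
    y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
    lev, cooks = simple_ols_influence(x, y)
    for xi, h, d in zip(x, lev, cooks):
        print(xi, round(h, 3), round(d, 3))
```

The leverages always sum to the number of fitted parameters, which is a quick sanity check on any implementation; a point is influential when it combines high leverage with a non-trivial residual.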

Goodness of fit

  • `pearson_chi_squared_logistic`, `pearson_chi_squared_poisson`

Dependency bumps

  • `anofox-regression` 0.5.2 → 0.5.4 (introduces `HuberRegressor` and the new sklearn-style `LogisticRegression`)
  • `anofox-statistics` 0.4.0 → 0.4.1

Test counts

The bundled test suites grew from 366 to 457 Python tests, plus 12 Rust integration tests, covering every new expression on both API paths.

Backwards compatibility

All changes are additive. `PyLogistic` (the `BinomialRegressor` wrapper) is unchanged; the new sklearn-style `LogisticRegression` is a separate class. The `alm()` expression's input contract grew, but all existing keyword-only callers continue to work because the new kwargs default to `None` / `"likelihood"`.

Full PR list

#20, #21, #22, #23, #24, #25, #26, #28, #29, #30, #31, #32, #33.