polars-statistics 0.5.0
Major expansion of the public API: adds a Rust-callable rlib path, three new model wrappers, and a full diagnostics toolkit. The wheel and sdist are published to PyPI automatically.
Highlights
Hybrid crate (Rust + Python) — issue #13
`polars-statistics` now builds as both a `cdylib` (the Python plugin) and an `rlib` (a Rust dependency). Every Polars expression has a public `*_fit` Rust entry point in `polars_statistics::expressions` that downstream Rust crates can call directly, with no Python boundary and no FFI overhead. See the new "Use from Rust" section in the README and `examples/rust_wls.rs`.
```toml
[dependencies]
polars-statistics = { version = "0.5", default-features = false }
```
New model wrappers
- `Huber` (M-estimator regression, robust to outliers) — class + `huber()` expression. (#14)
- `LogisticRegression` — sklearn-style API with `predict_proba`, `decision_function`, `score`, `penalty="l2"`, `C` kwarg. Complementary to the existing `Logistic` wrapper. (#14)
- `PLS` (Partial Least Squares) — class + `pls()` expression, plus `transform()` for the latent space. (#19)
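
As background for the `Huber` wrapper, the sketch below shows the weight function that a Huber M-estimator typically applies to scaled residuals inside an iteratively reweighted least squares loop. It is a generic illustration, not the library's code: the function name `huber_weight` and the tuning constant `k = 1.345` (a common textbook default) are assumptions, and the constant used by `anofox-regression` may differ.

```python
def huber_weight(scaled_residual, k=1.345):
    """IRLS weight for a Huber M-estimator (illustrative, not the library API).

    Residuals within k of zero keep full weight; larger residuals are
    down-weighted in proportion to their size, which limits the pull
    of outliers on the fit.
    """
    magnitude = abs(scaled_residual)
    if magnitude <= k:
        return 1.0
    return k / magnitude


# A small residual keeps full weight, a large one is shrunk.
weights = [huber_weight(r) for r in (0.5, -1.0, 2.69, -10.0)]
```

Observations with large residuals still contribute to the fit, only with reduced weight, which is what distinguishes this approach from hard outlier removal.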
GLM improvements
- Penalized IRLS `lambda_` kwarg on `PyLogistic`, `PyPoisson`, `PyNegativeBinomial`, `PyTweedie`, `PyProbit`, `PyCloglog` (the expression layer already supported it). (#15)
- ALM expression parity with `PyALM` — all 25 distributions now reachable from `ps.alm(...)`, plus the `loss`, `link`, `role_trim` and `extra_parameter` kwargs. (#16)
Summary / predict completeness — #18
Added `*_summary` / `*_predict` expressions for the regression families that previously only had a base fit:
- `quantile_summary`, `quantile_predict`
- `isotonic_predict`
- `lm_dynamic_predict`
Diagnostics toolkit — #17 + #27
A full diagnostics suite wrapped as Polars expressions. Every diagnostic also has a public `*_fit` Rust entry point.
Multicollinearity
- `vif`, `generalized_vif`, `high_vif_predictors`
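
To make the quantity behind `vif` concrete, here is a minimal pure-Python sketch for the special case of exactly two predictors, where the variance inflation factor of each predictor reduces to 1 / (1 - r²), with r the Pearson correlation between them. The helper names `pearson_r` and `vif_two_predictors` are hypothetical and are not part of `polars-statistics`; the general case regresses each predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(x1, x2)  # correlation is 0.8, so VIF = 1 / 0.36
```

A VIF of 1 means the predictors are uncorrelated; the value grows without bound as the correlation approaches ±1.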
OLS residual battery
- `standardized_residuals`, `studentized_residuals`, `externally_studentized_residuals`, `residual_outliers`
GLM residuals (logistic + Poisson)
- `*_pearson_residuals`, `*_deviance_residuals`, `*_working_residuals` for each family
Influence / leverage
- `leverage`, `cooks_distance`, `dffits`, `influential_cooks`, `influential_dffits`, `high_leverage_points`
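
For readers unfamiliar with these measures, the sketch below computes leverage and Cook's distance by hand for a simple linear regression (intercept plus one slope), using the standard textbook formulas. It only illustrates the quantities that `leverage` and `cooks_distance` are named after; the function `simple_ols_influence` is hypothetical, and the library's own implementation and conventions are not shown here.

```python
def simple_ols_influence(x, y):
    """Leverage and Cook's distance for y = a + b*x fitted by OLS."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate

    # Leverage: diagonal of the hat matrix for simple regression.
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    # Cook's distance: squared residual scaled by leverage.
    cooks = [e * e / (p * s2) * h / (1.0 - h) ** 2
             for e, h in zip(residuals, lev)]
    return lev, cooks


x = [1.0, 2.0, 3.0, 4.0, 10.0]  # the last point sits far from the others
y = [1.1, 1.9, 3.2, 3.9, 12.0]
lev, cooks = simple_ols_influence(x, y)
```

The leverages always sum to the number of fitted coefficients (2 here), and the isolated point at `x = 10.0` receives most of that total.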
Goodness of fit
- `pearson_chi_squared_logistic`, `pearson_chi_squared_poisson`
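
The Pearson chi-squared statistic is the sum of squared Pearson residuals, which ties these goodness-of-fit expressions to the GLM residuals listed earlier. The sketch below uses the standard definitions for the Poisson and Bernoulli (logistic) variance functions. The function names are hypothetical, and the fitted means are made-up inputs rather than output from the library.

```python
import math


def poisson_pearson_residuals(y, mu):
    """(y - mu) / sqrt(mu): the Poisson variance equals the mean."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]


def logistic_pearson_residuals(y, prob):
    """(y - p) / sqrt(p * (1 - p)): Bernoulli variance is p * (1 - p)."""
    return [(yi - pr) / math.sqrt(pr * (1.0 - pr)) for yi, pr in zip(y, prob)]


def pearson_chi_squared(residuals):
    """Sum of squared Pearson residuals."""
    return sum(r * r for r in residuals)


# Illustrative counts and fitted means (not produced by a real fit).
y = [0, 2, 3, 7]
mu = [1.0, 2.0, 4.0, 4.0]
chi2 = pearson_chi_squared(poisson_pearson_residuals(y, mu))
```

Dividing the statistic by the residual degrees of freedom gives a common rough check for overdispersion in count models.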
Dependency bumps
- `anofox-regression` 0.5.2 → 0.5.4 (introduces `HuberRegressor` and the new sklearn-style `LogisticRegression`)
- `anofox-statistics` 0.4.0 → 0.4.1
Test counts
The bundled test suites grew from 366 to 457 Python tests + 12 Rust integration tests, covering every new expression on both API paths.
Backwards compatibility
Additive. `PyLogistic` (the `BinomialRegressor` wrapper) is unchanged — the new sklearn-style `LogisticRegression` is a separate class. The `alm()` expression's input contract grew but all existing keyword-only callers continue to work because the new kwargs default to None / "likelihood".
Full PR list
#20, #21, #22, #23, #24, #25, #26, #28, #29, #30, #31, #32, #33.