Initial Release
High-performance statistical testing and regression for Polars (https://pola.rs/) DataFrames, powered by Rust.
Highlights
- Native Polars Integration: Full support for group_by, over, and lazy evaluation
- Rust-Powered Performance: Zero-copy data transfer with SIMD-optimized linear algebra via faer (https://github.com/sarah-ek/faer-rs)
- Comprehensive API: 80+ functions covering statistical tests, regression models, and predictions
- R-Style Formula Syntax: Polynomial and interaction effects with per-group centering
Statistical Tests
Parametric Tests
- ttest_ind - Independent samples t-test
- ttest_paired - Paired samples t-test
- brown_forsythe - Brown-Forsythe test for equality of variances
- yuen_test - Yuen's trimmed mean test
Non-Parametric Tests
- mann_whitney_u - Mann-Whitney U test
- wilcoxon_signed_rank - Wilcoxon signed-rank test
- kruskal_wallis - Kruskal-Wallis H test
- brunner_munzel - Brunner-Munzel test
Distributional Tests
- shapiro_wilk - Shapiro-Wilk normality test
- dagostino - D'Agostino-Pearson normality test
Forecast Comparison Tests
- diebold_mariano - Diebold-Mariano test
- clark_west - Clark-West test
- spa_test - Superior Predictive Ability test
- model_confidence_set - Model Confidence Set
- mspe_adjusted - MSPE-adjusted test
- permutation_t_test - Permutation t-test
Modern Statistical Tests
- energy_distance - Energy distance test
- mmd_test - Maximum Mean Discrepancy test
Regression Models
Linear Models
| Model | Expression | Formula | Summary | Predict |
|---|---|---|---|---|
| OLS | ols | ols_formula | ols_summary | ols_predict |
| Ridge | ridge | ridge_formula | ridge_summary | ridge_predict |
| Elastic Net | elastic_net | elastic_net_formula | elastic_net_summary | elastic_net_predict |
| WLS | wls | wls_formula | wls_summary | wls_predict |
| RLS | rls | rls_formula | rls_summary | rls_predict |
| BLS | bls | bls_formula | bls_summary | bls_predict |
| NNLS | nnls | nnls_formula | - | nnls_predict |
Generalized Linear Models (GLM)
| Model | Expression | Formula | Summary | Predict |
|---|---|---|---|---|
| Logistic | logistic | logistic_formula | logistic_summary | logistic_predict |
| Poisson | poisson | poisson_formula | poisson_summary | poisson_predict |
| Negative Binomial | negative_binomial | negative_binomial_formula | negative_binomial_summary | negative_binomial_predict |
| Tweedie | tweedie | tweedie_formula | tweedie_summary | tweedie_predict |
| Probit | probit | probit_formula | probit_summary | probit_predict |
| Cloglog | cloglog | cloglog_formula | cloglog_summary | cloglog_predict |
Augmented Linear Model (ALM)
- 24+ error distributions, including Normal, Laplace, Student-t, Cauchy, Gamma, Exponential, and Poisson
- Access via alm, alm_formula, alm_summary, alm_predict
Model Classes
Direct model access outside Polars expressions:
Regression Models
OLS, Ridge, ElasticNet, WLS, RLS, BLS, Logistic, Poisson, NegativeBinomial, Tweedie, Probit, Cloglog, ALM
Statistical Test Models
TTestInd, TTestPaired, BrownForsythe, YuenTest, MannWhitneyU, WilcoxonSignedRank, KruskalWallis, BrunnerMunzel, ShapiroWilk, DAgostino
Bootstrap Methods
StationaryBootstrap, CircularBlockBootstrap
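For readers unfamiliar with block resampling, here is a minimal standard-library sketch of how a circular block bootstrap draws indices. It illustrates the general technique only. The function name and parameters are hypothetical and are not the CircularBlockBootstrap API, whose constructor arguments are not listed in these notes.

```python
import random


def circular_block_indices(n, block_length, rng):
    """Draw n resampled indices from blocks that wrap around the series end.

    Hypothetical helper for illustration; not part of polars-statistics.
    """
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)  # uniform random block start
        # Consecutive positions, wrapping past the last observation.
        indices.extend((start + k) % n for k in range(block_length))
    return indices[:n]  # trim the final block to the series length


rng = random.Random(0)
resampled = circular_block_indices(10, 3, rng)
```

Keeping observations together in blocks preserves short-range dependence, which is why block schemes are the usual choice for time series. A stationary bootstrap differs mainly in drawing random block lengths instead of a fixed one.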
Features
Prediction with Confidence/Prediction Intervals
ps.ols_predict("y", "x1", "x2", interval="prediction", level=0.95)
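As a reminder of what the two interval types usually mean for OLS, the textbook formulas are given below. This is standard theory, not a statement about the library's internals. The symbols are assumptions introduced here: $X$ is the $n \times p$ design matrix, $x_0$ a new observation, $s^2$ the residual variance, and $1 - \alpha$ corresponds to level.

```latex
\hat{y}_0 = x_0^\top \hat{\beta}, \qquad
s^2 = \frac{\mathrm{RSS}}{n - p}

% Confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\, n-p} \; s \sqrt{x_0^\top (X^\top X)^{-1} x_0}

% Prediction interval for a new observation at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\, n-p} \; s \sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
```

The extra 1 under the square root accounts for the noise in a single new observation, so a prediction interval is always wider than the confidence interval at the same level.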
Unique Prediction Column Names
Prediction outputs use model-type prefixes to avoid naming conflicts:
- Default: ols_prediction, ols_lower, ols_upper
- Custom: ps.ols_predict(..., name="model1") → model1_prediction, model1_lower, model1_upper
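The naming rule above can be sketched in a few lines of plain Python. The helper below is hypothetical and only mirrors the convention described in this section. It is not a function exported by the package.

```python
from typing import List, Optional


def prediction_columns(model: str, name: Optional[str] = None) -> List[str]:
    """Return the output column names for a prediction call.

    Hypothetical helper: the prefix is the custom name if one is given,
    otherwise the model type (for example "ols").
    """
    prefix = name if name is not None else model
    return [f"{prefix}_prediction", f"{prefix}_lower", f"{prefix}_upper"]


default_cols = prediction_columns("ols")
custom_cols = prediction_columns("ols", name="model1")
```

Because each model type or custom name yields a distinct prefix, several prediction outputs can sit side by side in one DataFrame without clashing.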
Tidy Coefficient Summaries
ps.ols_summary("y", "x1", "x2") # Returns: term, estimate, std_error, statistic, p_value
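To make the tidy layout concrete, here is a standard-library sketch that builds the same kind of rows for a one-predictor regression. It is an illustration under assumptions, not the library's implementation. The function name and the term labels "intercept" and "x" are made up here, and p_value is left out because the standard library has no Student-t distribution.

```python
from math import sqrt
from statistics import fmean


def simple_ols_summary(x, y):
    """Tidy rows (term, estimate, std_error, statistic) for y = a + b*x.

    Hypothetical helper; assumes at least three points and a non-perfect fit,
    so the residual variance and standard errors are strictly positive.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance with two fitted parameters
    estimates = {"intercept": intercept, "x": slope}
    std_errors = {
        "intercept": sqrt(s2 * (1 / n + x_bar ** 2 / sxx)),
        "x": sqrt(s2 / sxx),
    }
    return [
        {
            "term": term,
            "estimate": est,
            "std_error": std_errors[term],
            "statistic": est / std_errors[term],  # t statistic
        }
        for term, est in estimates.items()
    ]


rows = simple_ols_summary([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```

One row per term keeps the output easy to filter, join, or stack across groups.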
R-Style Formula Syntax
ps.ols_formula("y ~ x1 * x2") # Interactions
ps.ols_formula("y ~ poly(x, 2)") # Polynomials (centered per group)
ps.ols_formula("y ~ x1 + I(x^2)") # Transforms
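The sketch below shows, in plain Python, which columns these formula terms conventionally stand for. In R-style formulas, x1 * x2 means both main effects plus their product. For the polynomial case the sketch uses raw powers of values centered on each group's mean. That choice is an assumption: R's own poly() defaults to orthogonal polynomials, and these notes do not say which basis the library builds. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def interaction_columns(x1, x2):
    """Columns implied by `x1 * x2`: both main effects and their product."""
    return {
        "x1": list(x1),
        "x2": list(x2),
        "x1:x2": [a * b for a, b in zip(x1, x2)],
    }


def centered_poly(x, groups, degree=2):
    """Raw powers of x after subtracting each group's own mean (assumed basis)."""
    by_group = defaultdict(list)
    for value, g in zip(x, groups):
        by_group[g].append(value)
    means = {g: fmean(values) for g, values in by_group.items()}
    centered = [value - means[g] for value, g in zip(x, groups)]
    return {f"x^{d}": [c ** d for c in centered] for d in range(1, degree + 1)}


inter = interaction_columns([1, 2], [3, 4])
poly = centered_poly([1.0, 3.0, 10.0, 20.0], ["a", "a", "b", "b"])
```

Centering within each group reduces the correlation between the linear and squared terms, which tends to stabilize per-group fits.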
Null Handling Policies
- null_policy="drop" - Drop rows with null values
- null_policy="drop_y_zero_x" - Drop nulls for fitting, predict for all rows
Installation
pip install polars-statistics
Requirements:
- Python 3.9+
- Polars 1.0.0+