StatOxide is a modern, high-performance statistical computing library written in Rust, with comprehensive Python bindings via PyO3. It is designed for data scientists, statisticians, and researchers who need both performance and productivity.
- Series: Columnar data with metadata (name, dtype, levels)
- DataFrame: Tabular data structure with column operations
- Formula: R-style formula parsing for model specification
- Descriptive Statistics: Mean, variance, skewness, kurtosis, quantiles
- Probability Distributions: 12 continuous + 6 discrete distributions
- Statistical Tests: t-test, chi-square, ANOVA, correlation tests
- Correlation Measures: Pearson, Spearman, Kendall tau
- Linear Models: OLS, Ridge, Lasso, Elastic Net with proper inference
- Generalized Linear Models: Logistic, Poisson, Gamma, Negative Binomial regression
- Mixed Effects Models: Linear mixed models and GLMMs, estimated with the EM algorithm
- Robust Statistics: M-estimators, S-estimators, MM-estimators
- Nonparametric Methods: Kernel regression, local regression, smoothing splines
- Core Structures: `TimeSeries` with datetime indexing
- ARIMA Models: AR, MA, ARMA, ARIMA, SARIMA
- GARCH Models: ARCH, GARCH for volatility modeling
- Decomposition: STL, moving averages, Hodrick-Prescott filter
- Forecasting: Point forecasts, prediction intervals
- Linear Algebra: Matrix operations, solvers, decompositions
- Random Generation: Distributions, bootstrap, train-test split
- Data Validation: Type checking, missing value detection
- Numerical Methods: Softmax, standardization, normalization
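
For orientation, a few of the quantities named in the feature list can be pinned down on a tiny data set. The sketch below is plain Python and does not call StatOxide; the helper names (`sample_std`, `pearson`, `ols_simple`) are illustrative and are not StatOxide's API. It reuses the numbers from the Python example in this README and assumes that the first column of the regression design matrix there is an intercept column, so the printed values can serve as a rough cross-check.

```python
import math

# Same data as the README's Python example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Standard deviation with the n - 1 (Bessel) denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def ols_simple(xs, ys):
    """Intercept and slope of the least-squares line ys ~ 1 + xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


print(f"mean(x)    = {mean(x):.2f}")        # 3.00
print(f"std(x)     = {sample_std(x):.2f}")  # 1.58
print(f"corr(x, y) = {pearson(x, y):.3f}")  # 0.775

# Regression data from the README: x = 1, 2, 3 and y = 5, 8, 11.
intercept, slope = ols_simple([1.0, 2.0, 3.0], [5.0, 8.0, 11.0])
print(f"OLS fit    = {intercept:.1f} + {slope:.1f} * x")  # 2.0 + 3.0 * x
```

If a library result differs from these values, the degrees-of-freedom convention for the standard deviation is the first thing to check.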
StatOxide provides a complete Python interface through PyO3 bindings:
import statoxide
import statoxide.core as soc
import statoxide.stats as sos
# Core data structures
df = soc.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 5.0, 4.0, 5.0]
})
series = df.get_column("x")
print(f"Mean of x: {series.mean():.2f}")
print(f"Std of x: {series.std(1.0):.2f}")
# Statistical functions
corr = sos.correlation(df.get_column("x").to_list(), df.get_column("y").to_list())
print(f"Correlation: {corr:.3f}")
summary = sos.descriptive_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"Summary: {summary}")
# Formula parsing
formula = soc.Formula("y ~ x + x^2")
print(f"Formula variables: {formula.variables()}")
# Models
import statoxide.models as som
result = som.linear_regression([[1, 1], [1, 2], [1, 3]], [5, 8, 11])
print(f"Regression coefficients: {result['coefficients']}")
# Mixed effects models (`data` is not defined in this snippet; supply your own data with columns y, x, and group)
mixed_results = som.mixed_effects("y ~ x + (1 | group)", data)
print(f"Random effect variance: {mixed_results.random_variances}")
# Time series
import statoxide.tsa as sot
arima_result = sot.fit_arima([1.0, 2.0, 3.0, 4.0, 5.0], 1, 0, 1)
print(f"ARIMA AIC: {arima_result['aic']}")
# Utilities
import statoxide.utils as sou
train, test = sou.train_test_split([1.0, 2.0, 3.0, 4.0, 5.0], 0.2)
print(f"Train: {train}, Test: {test}")

StatOxide is organized as a multi-crate Rust workspace:
statoxide/
├── Cargo.toml              # Workspace configuration
├── crates/
│   ├── so-core/            # Core data structures & formula parsing
│   ├── so-linalg/          # Linear algebra abstraction
│   ├── so-stats/           # Statistical functions & distributions
│   ├── so-models/          # Statistical models (regression, GLM, mixed effects, etc.)
│   ├── so-tsa/             # Time series analysis
│   ├── so-utils/           # Utility functions
│   └── so-python/          # Python bindings (PyO3)
├── assets/logo.png         # Project logo
├── LICENSE-MIT             # MIT license
└── LICENSE-APACHE-2.0      # Apache 2.0 license
- Rust Toolchain: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
- Python Development Files:
  - Ubuntu/Debian: `sudo apt-get install python3-dev python3.11-dev`
  - macOS: `brew install python@3.11`
- Maturin (recommended): `pip install maturin`
# Clone the repository
git clone https://github.com/EthanNOV56/StatOxide.git
cd StatOxide
# Build Python bindings with maturin
cd crates/so-python
maturin develop # Editable install for development
# or
maturin build --release # Build wheel for distribution

cd /path/to/statoxide
export PYO3_PYTHON=python3.11
cargo build --release --package so-python

The shared library will be at target/release/libso_python.so.

cargo test --all

After installation:

python -c "import statoxide; print(statoxide.version())"
python crates/so-python/test_api.py # API demonstration

- API Reference: Run `cargo doc --all --no-deps --open` for Rust documentation
- Python Docstrings: All Python functions include detailed docstrings
- Examples: See `crates/so-python/test_api.py` for usage examples
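
The post-installation check (`python -c "import statoxide; print(statoxide.version())"`) can also be wrapped in a small script, for instance as a smoke test in CI. This is a minimal sketch: `statoxide.version()` is taken from that command, while the helper name `installed_version` is hypothetical and not part of StatOxide.

```python
import importlib


def installed_version(module_name="statoxide"):
    """Return the module's reported version, or None if it cannot be imported.

    Hypothetical helper (not part of StatOxide). It assumes the module
    exposes a version() function, as statoxide.version() does in the
    command-line check.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return module.version()


version = installed_version()
if version is None:
    print("statoxide is not importable; build and install the bindings first")
else:
    print(f"statoxide {version} is installed")
```

Returning `None` instead of raising keeps the script usable as a quick yes/no probe before running the longer API demonstration.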
- Performance: Leverage Rust's zero-cost abstractions and LLVM optimizations
- Safety: Memory safety guarantees without garbage collection
- Interoperability: Seamless Python integration with minimal overhead
- Modularity: Independent crates for clear separation of concerns
- API Consistency: Familiar interfaces inspired by R, pandas, and statsmodels
| Module | Status | Notes |
|---|---|---|
| so-core | ✅ Complete | Data structures, formula parsing |
| so-linalg | ✅ Complete | Linear algebra abstraction |
| so-stats | ✅ Complete | Statistical functions & distributions |
| so-models | ✅ Complete | Regression, GLM, mixed effects, robust, nonparametric |
| so-tsa | ✅ Complete | ARIMA, GARCH, decomposition, forecasting |
| so-utils | ✅ Complete | Random generation, validation, numerical methods |
| so-python | ✅ Complete | Full Python bindings implemented |
StatOxide is dual-licensed under both:
- MIT License: See LICENSE-MIT for details
- Apache License 2.0: See LICENSE-APACHE-2.0 for details
You may use StatOxide under either license at your option.
- R and statsmodels for statistical API inspiration
- pandas for DataFrame design patterns
- PyO3 team for excellent Rust-Python interop
- ndarray and faer for numerical computing foundations
Contributions are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: `cargo test --all`
- Submit a pull request
- Issues: GitHub Issues
- Repository: GitHub Repository
High-performance statistics meets Python productivity
