Run statistical analyses in seconds — no license fees, no GUI required.
OLS, logit, survival analysis, panel data, and 260+ more commands, right from your terminal.
Note: statika is an independent, community-driven open-source project. It is not affiliated with, endorsed by, or connected to StataCorp LLC or any commercial statistical software vendor.
pip install statika
statika repl
That is all. No virtual environment required, no license server, no installer wizard.
pip install "statika[excel]" # Excel (.xlsx) import/export
pip install "statika[stata]" # Stata .dta import/export
pip install "statika[survival]" # Survival analysis (lifelines)
pip install "statika[all]" # Everything above
$ statika repl
statika v1.0.0 — Open-source statistical analysis tool
Type help for commands, quit to exit.
statika> load examples/data.csv
Loaded 50 rows x 7 columns from examples/data.csv
statika> summarize age income score
┌──────────┬────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
│ Variable │ N │ Mean │ SD │ Min │ P25 │ P50 │ P75 │ Max │
├──────────┼────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┤
│ age │ 50 │ 34.6600 │ 8.7634 │ 21.00 │ 27.2500 │ 34.0000 │ 42.5000 │ 53.0000 │
│ income │ 50 │ 49840.0 │ 17547.2 │ 26000 │ 34000.0 │ 47000.0 │ 66000.0 │ 88000.0 │
│ score │ 50 │ 7.4280 │ 1.2844 │ 4.90 │ 6.4750 │ 7.5000 │ 8.5500 │ 9.4000 │
└──────────┴────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┘
statika> ols score ~ age + income --robust
┌──────────┬────────┬─────────┬───────┬────────┬────────────┬─────────────┐
│ Variable │ Coef │ Std.Err │ t/z │ P>|t| │ [95% CI L] │ [95% CI H] │
├──────────┼────────┼─────────┼───────┼────────┼────────────┼─────────────┤
│ _cons │ 2.1435 │ 0.4521 │ 4.741 │ 0.0000 │ 1.2343 │ 3.0527 │
│ age │ 0.0312 │ 0.0187 │ 1.668 │ 0.1018 │ -0.0066 │ 0.0690 │
│ income │ 0.0001 │ 0.0000 │ 5.234 │ 0.0000 │ 0.0000 │ 0.0001 │
└──────────┴────────┴─────────┴───────┴────────┴────────────┴─────────────┘
N = 50 | R² = 0.5481 | Adj.R² = 0.5289 | F(2, 47) = 28.52 (p=0.0000)
statika> margins
Average marginal effects computed.
statika> estimates table
Model comparison table generated.
statika> quit
Bye!
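The fit statistics on the summary line of the OLS output follow from N, R², and the number of predictors. The sketch below recomputes them with the classical textbook formulas; the helper names are illustrative and not part of statika. With --robust, statika may compute the F statistic differently, and the reported R² is rounded, so a small difference from the reported 28.52 is expected.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² for a model with k predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Classical overall F statistic with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values taken from the demo output: N = 50, R² = 0.5481, two predictors.
adj = adjusted_r2(0.5481, n=50, k=2)
f_val = f_statistic(0.5481, n=50, k=2)
print(f"Adj.R² = {adj:.4f}, F(2, 47) = {f_val:.2f}")
```

The adjusted R² reproduces the 0.5289 shown in the demo, and the F value lands close to the reported one.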
statika run analysis.ost # Run an .ost script
statika run analysis.ost --strict # Stop on first error (useful in CI)
| Feature | Stata | R | SPSS | statika |
|---|---|---|---|---|
| Price | $595/yr | Free | $99/mo | Free |
| Familiar CLI syntax | Yes | No | No | Yes |
| Scripting | Yes | Yes | No | Yes |
| Python ecosystem | No | No | No | Yes |
| No eval / safe DSL | — | — | — | Yes |
| Interactive REPL | No | Partial | No | Yes |
| Polars backend | No | No | No | Yes |
statika is designed for researchers and data scientists who want the muscle memory of a CLI workflow without paying for it, and who want scripted, reproducible analyses that fit into version-controlled projects.
statika distinguishes between a stable core and experimental modules:
- Stable core: data loading, transformation, descriptive statistics, core regression models, hypothesis tests, plotting, reporting, scripting.
- Experimental: panel data, survival analysis, survey-weighted estimation, SEM, network analysis, spatial statistics, and advanced ML commands.
Help and tab completion default to stable commands. To inspect the experimental surface:
statika> help --list --experimental
statika> load survey.csv
statika> describe
statika> summarize age income education
statika> tabulate region
statika> crosstab gender employed
statika> corr age income score
statika> load data.csv
statika> ols income ~ age + education + experience --robust
statika> predict yhat
statika> residuals resid
statika> vif
statika> estat all
statika> latex results/model.tex
statika> logit employed ~ age + income + education
statika> margins
statika> margins --at=means
statika> ols employed ~ age + income + education
statika> estimates table
statika> groupby region summarize mean(income) sd(income) count()
statika> ttest income by employed
statika> anova score by region
statika> chi2 region employed
Create analysis.ost:
# analysis.ost — reproducible wage regression
load data/wages.csv
describe
summarize wage age education experience
derive log_wage = log(wage)
encode region as region_code
ols log_wage ~ age + education + experience --robust
predict yhat
residuals resid
vif
estat all
bootstrap n=1000 ci=95
latex outputs/wage_table.tex
report outputs/wage_report.md
save outputs/wages_modeled.parquet
Run it:
statika run analysis.ost --strict
Data Management (8 commands)
| Command | Description | Example |
|---|---|---|
| load <path> | Load CSV, Parquet, Stata (.dta), Excel (.xlsx) | load survey.csv |
| save <path> | Save data to any supported format | save results.parquet |
| describe | Show dataset structure (types, nulls) | describe |
| head [N] | Show first N rows (default: 10) | head 20 |
| tail [N] | Show last N rows | tail 5 |
| count | Row and column count | count |
| merge <path> on <key> [how=...] | Join with another file | merge scores.csv on id how=left |
| undo | Undo last data change (multi-level) | undo |
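Multi-level undo can be pictured as a bounded stack of dataset snapshots. The sketch below is an illustration only, not statika's implementation: the class name UndoStack is hypothetical, the depth limit mirrors the max_undo_stack setting from the configuration file, and the memory cap (max_undo_memory_mb) is not modeled.

```python
from collections import deque


class UndoStack:
    """Bounded snapshot history; the oldest snapshot is dropped when full."""

    def __init__(self, max_depth: int = 20) -> None:
        self._snapshots: deque = deque(maxlen=max_depth)

    def push(self, snapshot) -> None:
        """Record the state of the data before a change is applied."""
        self._snapshots.append(snapshot)

    def undo(self):
        """Return the most recent snapshot, or None if nothing is left."""
        return self._snapshots.pop() if self._snapshots else None


history = UndoStack(max_depth=2)
rows = [1, 2, 3]
for change in ([1, 2], [1], []):
    history.push(list(rows))  # snapshot taken before each change
    rows = change

restored = history.undo()  # state just before the last change
```

With a depth of 2, the first snapshot has already been discarded by the time the third change is applied, so only two undo steps are available.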
Data Transformation (18 commands)
| Command | Description | Example |
|---|---|---|
| filter <expr> | Filter rows with expressions | filter age > 30 and income < 50000 |
| select <cols> | Keep specific columns | select age income score |
| derive <col> = <expr> | Create new variables | derive bmi = weight / (height ** 2) |
| dropna [cols] | Drop missing values | dropna age income |
| fillna <col> <strategy> | Fill missing values | fillna income median |
| sort <col> [--desc] | Sort dataset | sort income --desc |
| rename <old> <new> | Rename a column | rename income salary |
| cast <col> <type> | Cast column type | cast age float |
| encode <col> [as <new>] | Label-encode strings | encode region as region_code |
| recode <col> old=new ... | Recode values | recode region North=N South=S |
| replace <col> <old> <new> | Replace values | replace region North Norte |
| sample <N\|N%> | Random sample | sample 100 or sample 10% |
| duplicates [drop] [cols] | Find or drop duplicates | duplicates drop |
| unique <col> | List unique values | unique region |
| lag <col> [N] | Lag variable (shift down) | lag price 2 |
| lead <col> [N] | Lead variable (shift up) | lead price |
| pivot <val> by <col> | Reshape to wide format | pivot score by subject over name |
| melt <ids>, <vals> | Reshape to long format | melt name, math eng |
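The "shift down" and "shift up" wording for lag and lead can be made concrete with plain lists. This is a sketch of the usual semantics, assuming rows that have no source value become missing (None here); how statika represents those rows is not stated above.

```python
from typing import Optional, Sequence


def lag(values: Sequence[float], n: int = 1) -> list[Optional[float]]:
    """Shift values down by n rows; the first n rows become None."""
    n = min(n, len(values))
    return [None] * n + list(values[: len(values) - n])


def lead(values: Sequence[float], n: int = 1) -> list[Optional[float]]:
    """Shift values up by n rows; the last n rows become None."""
    n = min(n, len(values))
    return list(values[n:]) + [None] * n


price = [10.0, 11.0, 12.5, 12.0]
print(lag(price, 2))  # [None, None, 10.0, 11.0]
print(lead(price))    # [11.0, 12.5, 12.0, None]
```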
Descriptive Statistics (5 commands)
| Command | Description | Example |
|---|---|---|
| summarize [cols] | Summary statistics (N, Mean, SD, quartiles) | summarize age income |
| tabulate <col> | Frequency table (top 50 values) | tabulate education |
| crosstab <row> <col> | Two-way contingency table with row percentages | crosstab gender status |
| corr [cols] | Pearson correlation matrix | corr age income score |
| groupby <cols> summarize <aggs> | Group-by with aggregations | groupby region summarize mean(income) count() |
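For readers who want to check a summarize row by hand, the same columns can be computed with the Python standard library. This is a sketch under two assumptions that are not confirmed above: quartiles use linear interpolation (method="inclusive"), and missing values are dropped before computing anything.

```python
import statistics


def summarize(values):
    """Return the summarize columns for one variable, ignoring None."""
    clean = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "N": len(clean),
        "Mean": statistics.fmean(clean),
        "SD": statistics.stdev(clean),  # sample SD (n - 1 denominator)
        "Min": min(clean),
        "P25": q1,
        "P50": q2,
        "P75": q3,
        "Max": max(clean),
    }


stats = summarize([1, 2, 3, 4, 5, None])
print(stats["N"], stats["Mean"], stats["P25"], stats["P50"], stats["P75"])
```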
Statistical Models (6 commands)
| Command | Description | Example |
|---|---|---|
| ols y ~ x1 + x2 | OLS linear regression | ols score ~ age + income --robust |
| logit y ~ x1 + x2 | Logistic regression (binary) | logit employed ~ age + income |
| probit y ~ x1 + x2 | Probit regression (binary) | probit employed ~ age + income |
| poisson y ~ x1 + x2 | Poisson regression (counts) | poisson visits ~ age --exposure=time |
| negbin y ~ x1 + x2 | Negative Binomial (overdispersed) | negbin claims ~ age + gender |
| quantreg y ~ x1 + x2 | Quantile regression | quantreg wage ~ edu + exp tau=0.9 |
All models support:
- --robust: heteroscedasticity-robust standard errors (HC1)
- --cluster=col: cluster-robust standard errors
- --weight=col: frequency/analytic weights
Formula syntax:
- y ~ x1 + x2 (standard predictors)
- y ~ x1*x2 (full factorial; expands to x1 + x2 + x1:x2)
- y ~ x1:x2 (interaction term only)
- y ~ x1*x2*x3 (three-way interaction)
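The expansion rule for the * operator can be written down in a few lines. The sketch below is not statika's parser; expand_term and expand_formula are hypothetical helper names, and the ordering of terms (main effects first, then higher-order interactions) is an assumption.

```python
from itertools import combinations


def expand_term(term: str) -> list[str]:
    """Expand 'a*b*c' into main effects plus every interaction among them."""
    factors = [f.strip() for f in term.split("*")]
    expanded = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            expanded.append(":".join(combo))
    return expanded


def expand_formula(formula: str) -> tuple[str, list[str]]:
    """Split 'y ~ rhs' and expand each '+'-separated term on the right."""
    response, rhs = (part.strip() for part in formula.split("~", 1))
    terms: list[str] = []
    for raw in rhs.split("+"):
        for term in expand_term(raw.strip()):
            if term not in terms:  # drop duplicates, keep first position
                terms.append(term)
    return response, terms


print(expand_formula("y ~ x1*x2"))  # ('y', ['x1', 'x2', 'x1:x2'])
```

A term written with : only, such as x1:x2, contains no * and therefore passes through unchanged.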
Post-Estimation (9 commands)
| Command | Description | Example |
|---|---|---|
| predict [name] | Predicted values from last model | predict yhat |
| residuals [name] | Residuals + diagnostic plots | residuals resid |
| vif | Variance Inflation Factor | vif |
| margins [--at=means\|average] | Marginal effects (dy/dx) | margins --at=average |
| bootstrap [n=N] [ci=N] | Bootstrap confidence intervals | bootstrap n=1000 ci=95 |
| estat <sub> | Post-estimation diagnostics | estat all |
| estimates table | Side-by-side model comparison | estimates table |
| stepwise y ~ x1 + ... | Stepwise variable selection | stepwise y ~ x1 + x2 --backward |
| latex [path.tex] | Export model as LaTeX table | latex results.tex |
estat subcommands: hettest, ovtest, linktest, ic, all
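The idea behind bootstrap n=1000 ci=95 is easiest to see on a simple statistic. The sketch below builds a percentile bootstrap interval for a sample mean; statika applies the command to the last fitted model, and whether it uses the percentile method is not stated here, so treat this as an illustration of the technique rather than of the command's internals.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n=1000, ci=95, seed=0):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    size = len(values)
    # Resample with replacement n times and sort the replicate statistics.
    replicates = sorted(stat(rng.choices(values, k=size)) for _ in range(n))
    alpha = (100 - ci) / 200  # tail probability on each side
    lo_idx = int(alpha * (n - 1))
    hi_idx = int((1 - alpha) * (n - 1))
    return replicates[lo_idx], replicates[hi_idx]


scores = [4.9, 6.1, 6.5, 7.2, 7.5, 7.9, 8.4, 8.6, 9.0, 9.4]
low, high = bootstrap_ci(scores, n=1000, ci=95)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```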
Hypothesis Tests (5 commands)
| Command | Description | Example |
|---|---|---|
| ttest <col> | One-sample t-test | ttest score mu=7 |
| ttest <col> by <group> | Two-sample Welch t-test | ttest income by employed |
| ttest <col> paired <col2> | Paired t-test | ttest before paired after |
| chi2 <col1> <col2> | Chi-square independence test | chi2 region employed |
| anova <col> by <group> | One-way ANOVA (F-test) | anova score by region |
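The two-sample test is a Welch t-test, which does not assume equal variances in the two groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the function name welch_t is illustrative, and the p-value is omitted because the Student t distribution is not in the Python standard library (SciPy provides it).

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error contributed by each group.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

Unlike the pooled-variance test, the degrees of freedom here are generally not an integer.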
Visualization (7 commands)
| Command | Description | Example |
|---|---|---|
| plot hist <col> | Histogram | plot hist age |
| plot scatter <y> <x> | Scatter plot | plot scatter score income |
| plot line <y> <x> | Line plot | plot line score age |
| plot box <col> [by <g>] | Box plot (optionally grouped) | plot box income by region |
| plot bar <col> [by <g>] | Bar chart | plot bar income by region |
| plot heatmap [cols] | Correlation heatmap | plot heatmap age income score |
| plot diagnostics | Residual diagnostic plots | plot diagnostics |
Reporting and Utilities (4 commands)
| Command | Description | Example |
|---|---|---|
| report <path> | Generate Markdown report | report analysis.md |
| help [cmd] | Show help (all or specific command) | help ols |
| esttab | Publication-style coefficient table | esttab |
| quit / exit / q | Exit REPL | quit |
The expression language used by filter and derive is evaluated by a safe recursive-descent parser. No Python eval() is used anywhere in statika.
# Arithmetic
statika> derive income_k = income / 1000
statika> derive bmi = weight / (height ** 2)
# Comparisons and boolean logic
statika> filter age > 30 and income < 50000
statika> filter not is_null(score) and region == "North"
# Functions
statika> derive log_income = log(income)
statika> derive name_upper = upper(name)
statika> derive score_clean = fill_null(score, 0)
| Category | Functions |
|---|---|
| Math | log(x), log10(x), sqrt(x), abs(x), exp(x), round(x, n) |
| String | upper(x), lower(x), len_chars(x), strip(x), contains(x, "pat") |
| Null | is_null(x), is_not_null(x), fill_null(x, value) |
| Type | cast_float(x), cast_int(x), cast_str(x) |
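To show why a recursive-descent design avoids eval(), here is a compact evaluator for a subset of the expression language: numbers, column names, arithmetic, one comparison per clause, and/or/not, and single-argument function calls from a whitelist. It is a sketch, not statika's parser; the class name Evaluator, the row-as-dict model, and the three whitelisted functions are assumptions, and strings and multi-argument functions are left out.

```python
import math
import operator
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|\*\*|[<>=!]=|[-+*/()<>])")
FUNCS = {"log": math.log, "sqrt": math.sqrt, "abs": abs}  # explicit whitelist
OR = {"or": lambda a, b: bool(a) or bool(b)}
AND = {"and": lambda a, b: bool(a) and bool(b)}
COMPARE = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split text into tokens; input outside the grammar is rejected."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Evaluator:
    """Evaluates one expression against one row (a dict of column values)."""

    def __init__(self, text, row):
        self.tokens, self.pos, self.row = tokenize(text), 0, row

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or expected not in (None, token):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def chain(self, operand, ops):
        """Left-associative rule: operand (op operand)*."""
        value = operand()
        while self.peek() in ops:
            value = ops[self.take()](value, operand())
        return value

    def evaluate(self):
        value = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        return self.chain(lambda: self.chain(self.negation, AND), OR)

    def negation(self):
        if self.peek() == "not":
            self.take()
            return not self.negation()
        left = self.arith()
        if self.peek() in COMPARE:  # at most one comparison, no chaining
            return COMPARE[self.take()](left, self.arith())
        return left

    def arith(self):
        return self.chain(lambda: self.chain(self.unary, MUL), ADD)

    def unary(self):
        if self.peek() == "-":
            self.take()
            return -self.unary()
        base = self.atom()
        if self.peek() == "**":
            self.take()
            return base ** self.unary()  # right-associative exponent
        return base

    def atom(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            self.take(")")
            return value
        if token[0].isdigit():
            return float(token)
        if self.peek() == "(":  # function call: the name must be whitelisted
            if token not in FUNCS:
                raise ValueError(f"unknown function {token!r}")
            self.take("(")
            value = FUNCS[token](self.expr())
            self.take(")")
            return value
        if token not in self.row:
            raise ValueError(f"unknown column {token!r}")
        return self.row[token]
```

Calling Evaluator("age > 30 and income < 50000", {"age": 35, "income": 42000}).evaluate() returns True. Input such as __import__('os') never reaches Python: the tokenizer rejects the quote character, and an unlisted function name would be rejected by the whitelist check.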
Aggregation functions for groupby ... summarize:
| Function | Description |
|---|---|
| mean(col) | Arithmetic mean |
| sd(col) | Standard deviation (sample) |
| sum(col) | Sum |
| min(col) | Minimum |
| max(col) | Maximum |
| median(col) | Median |
| count() | Row count per group |
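What groupby ... summarize computes can be reproduced on a list of dictionaries. The sketch below is a plain-Python illustration, not the Polars-based implementation; groupby_summarize is a hypothetical helper, and it always adds the per-group row count.

```python
import statistics
from collections import defaultdict

AGGREGATIONS = {
    "mean": statistics.fmean,
    "sd": statistics.stdev,  # sample standard deviation (n - 1)
    "sum": sum,
    "min": min,
    "max": max,
    "median": statistics.median,
}


def groupby_summarize(rows, key, column, aggs):
    """Group rows by `key` and aggregate `column` with the named functions."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    result = {}
    for group, values in groups.items():
        summary = {name: AGGREGATIONS[name](values) for name in aggs}
        summary["count"] = len(values)
        result[group] = summary
    return result


rows = [
    {"region": "North", "income": 40000},
    {"region": "North", "income": 50000},
    {"region": "South", "income": 30000},
    {"region": "South", "income": 34000},
]
result = groupby_summarize(rows, "region", "income", ["mean", "sd"])
print(result["North"])
```

Note that sd needs at least two rows per group, because the sample standard deviation is undefined for a single observation.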
Every model automatically checks for common problems:
- Multicollinearity: condition number > 30 triggers a warning
- Heteroscedasticity: Breusch-Pagan test; suggests --robust if p < 0.05
- Autocorrelation: Durbin-Watson statistic far from 2.0
- Convergence: warns if logit/probit MLE did not converge
- Missing values: reports how many observations were dropped
- Low sample size: warns when the observation-to-predictor ratio is low
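Two of these checks are simple enough to compute by hand. The sketch below shows the Durbin-Watson statistic and an observations-per-predictor check; the function names are illustrative, the threshold of 5 mirrors min_obs_per_predictor from the configuration file, and the strict less-than comparison is an assumption.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares.

    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive and values toward 4 negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def low_sample_warning(n_obs, n_predictors, min_ratio=5):
    """True when there are fewer than min_ratio observations per predictor."""
    return n_obs / n_predictors < min_ratio


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (alternating signs)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0 (no change at all)
print(low_sample_warning(50, 2))              # False
```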
| Format | Import | Export | Notes |
|---|---|---|---|
| CSV | Yes | Yes | Built-in |
| Parquet | Yes | Yes | Built-in |
| Stata (.dta) | Yes | Yes | pip install "statika[stata]" |
| Excel (.xlsx) | Yes | Yes | pip install "statika[excel]" |
statika repl # Interactive REPL
statika run script.ost # Run an .ost script
statika run script.ost --strict # Stop on first error (exit code 1)
statika --verbose repl # Verbose logging
statika --debug repl # Debug logging
statika --version # Show version
Logs are written to ~/.statika/logs/openstat.log.
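The effect of --strict on statika run can be pictured with a small script runner. This is a sketch of the control flow only: run_script and the execute callback are hypothetical, and the exit code of 0 in non-strict mode is an assumption (only the strict-mode exit code of 1 is documented above).

```python
def run_script(lines, execute, strict=False):
    """Run commands in order and return a process-style exit code."""
    for number, line in enumerate(lines, start=1):
        command = line.strip()
        if not command or command.startswith("#"):
            continue  # skip blank lines and comments
        try:
            execute(command)
        except Exception as exc:
            print(f"line {number}: {exc}")
            if strict:
                return 1  # stop at the first failing command
    return 0  # assumption: non-strict runs report success after continuing


executed = []


def fake_execute(command):
    """Stand-in for the command dispatcher; fails on one known command."""
    if command == "bad":
        raise ValueError("unknown command")
    executed.append(command)


code = run_script(["# demo", "describe", "bad", "vif"], fake_execute, strict=True)
print(code, executed)  # 1 ['describe']
```

In CI, the non-zero exit code is what makes the job fail, which is why --strict is the safer default for automated runs.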
Create ~/.statika/config.toml to customize defaults:
[data]
output_dir = "outputs"
csv_separator = ","
[display]
tabulate_limit = 50
head_default = 10
[undo]
max_undo_stack = 20
max_undo_memory_mb = 500
[plotting]
plot_dpi = 150
plot_figsize_w = 8.0
plot_figsize_h = 5.0
[model]
condition_threshold = 30
min_obs_per_predictor = 5
bootstrap_iterations = 1000
| Component | Library | Notes |
|---|---|---|
| Data engine | Polars | Rust-powered, zero-copy, 10-100x faster than pandas |
| Statistics | statsmodels | OLS, GLM, quantile regression |
| Scientific | SciPy | Hypothesis tests, distributions |
| Plotting | matplotlib | Publication-quality figures |
| CLI | Typer | Type-annotated CLI |
| Terminal UI | Rich | Tables and formatted output |
| REPL | prompt-toolkit | Tab completion, history |
Contributions are welcome. Whether you are fixing a typo, stabilizing an experimental module, or adding a new command, the process is the same:
- Fork the repository on GitHub
- Create a feature branch: git checkout -b feature/your-feature
- Write code and tests
- Confirm tests pass and lint is clean: pytest and ruff check src/
- Open a pull request with a clear description
- Stable-core hardening — CLI/REPL behavior, error handling, command metadata
- Experimental stabilization — panel, survival, survey, IV, mixed models
- New commands — any useful data manipulation or analysis command
- Expression language — new DSL functions
- Plot types — new visualization types
- File formats — SAS, SPSS, JSON, and others
- Documentation — tutorials, examples, translations
- Bug reports — open an issue on GitHub
New to open source? Look for issues labeled good first issue. See CONTRIBUTING.md for the full setup guide.
- OLS, Logit, Probit, Poisson, Negative Binomial, Quantile regression
- Interaction terms (x1*x2, x1:x2, three-way)
- Robust and cluster-robust standard errors
- Frequency and analytic weight support
- Marginal effects (average, at-means, for OLS/logit/probit)
- Bootstrap confidence intervals (parallelized)
- Post-estimation diagnostics (estat, vif, residuals)
- Model comparison tables (estimates table)
- Stepwise variable selection (forward/backward)
- Safe expression language (no eval)
- Tab completion and multi-level undo in REPL
- LaTeX and Markdown report export
- CSV, Parquet, Stata .dta, Excel import/export
- Configuration file support
- CI/CD with GitHub Actions (1173 tests, 91% coverage)
- Stabilize experimental estimators (panel, survival, survey, IV)
- Replace remaining pandas paths in large-data workflows
- Publish full documentation site
- Improve backend abstraction (shared engine contract for load/query/model/export)
- SAS and SPSS file format support
statika is built on top of excellent open-source libraries:
- Polars — for reimagining what a DataFrame library can be
- statsmodels — for bringing professional-grade statistics to Python
- SciPy — for decades of scientific computing
- Rich — for making terminal output readable
- prompt-toolkit — for the interactive REPL foundation
MIT License. See LICENSE for the full text.
Not affiliated with StataCorp LLC, IBM SPSS, or SAS Institute Inc.
statika is an independent open-source project.