baristiran/statika

statika

Run statistical analyses in seconds — no license fees, no GUI required.
OLS, logit, survival analysis, panel data, and 260+ more commands, right from your terminal.

Note: statika is an independent, community-driven open-source project. It is not affiliated with, endorsed by, or connected to StataCorp LLC or any commercial statistical software vendor.


Installation

pip install statika
statika repl

That is all. No virtual environment required, no license server, no installer wizard.

Optional extras

pip install "statika[excel]"    # Excel (.xlsx) import/export
pip install "statika[stata]"    # Stata .dta import/export
pip install "statika[survival]" # Survival analysis (lifelines)
pip install "statika[all]"      # Everything above

30-Second Demo

$ statika repl
statika v1.0.0 — Open-source statistical analysis tool
Type help for commands, quit to exit.

statika> load examples/data.csv
Loaded 50 rows x 7 columns from examples/data.csv

statika> summarize age income score
┌──────────┬────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
│ Variable │ N  │ Mean    │ SD      │ Min   │ P25     │ P50     │ P75     │ Max     │
├──────────┼────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┤
│ age      │ 50 │ 34.6600 │  8.7634 │ 21.00 │ 27.2500 │ 34.0000 │ 42.5000 │ 53.0000 │
│ income   │ 50 │ 49840.0 │ 17547.2 │ 26000 │ 34000.0 │ 47000.0 │ 66000.0 │ 88000.0 │
│ score    │ 50 │  7.4280 │  1.2844 │  4.90 │  6.4750 │  7.5000 │  8.5500 │  9.4000 │
└──────────┴────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┘

statika> ols score ~ age + income --robust
┌──────────┬────────┬─────────┬───────┬────────┬────────────┬─────────────┐
│ Variable │ Coef   │ Std.Err │ t/z   │ P>|t|  │ [95% CI L] │ [95% CI H]  │
├──────────┼────────┼─────────┼───────┼────────┼────────────┼─────────────┤
│ _cons    │ 2.1435 │ 0.4521  │ 4.741 │ 0.0000 │ 1.2343     │ 3.0527      │
│ age      │ 0.0312 │ 0.0187  │ 1.668 │ 0.1018 │ -0.0066    │ 0.0690      │
│ income   │ 0.0001 │ 0.0000  │ 5.234 │ 0.0000 │ 0.0000     │ 0.0001      │
└──────────┴────────┴─────────┴───────┴────────┴────────────┴─────────────┘
N = 50  |  R² = 0.5481  |  Adj.R² = 0.5289  |  F(2, 47) = 28.52 (p=0.0000)

statika> margins
Average marginal effects computed.

statika> estimates table
Model comparison table generated.

statika> quit
Bye!

Run a script instead

statika run analysis.ost           # Run an .ost script
statika run analysis.ost --strict  # Stop on first error (useful in CI)

Why statika?

| Feature             | Stata   | R       | SPSS   | statika |
|---------------------|---------|---------|--------|---------|
| Price               | $595/yr | Free    | $99/mo | Free    |
| Familiar CLI syntax | Yes     | No      | No     | Yes     |
| Scripting           | Yes     | Yes     | No     | Yes     |
| Python ecosystem    | No      | No      | No     | Yes     |
| No eval / safe DSL  |         |         |        | Yes     |
| Interactive REPL    | No      | Partial | No     | Yes     |
| Polars backend      | No      | No      | No     | Yes     |

statika is designed for researchers and data scientists who want the muscle memory of a CLI statistics workflow without the license fees, and who want scripted, reproducible analyses that live in version-controlled projects.


Stable vs Experimental

statika distinguishes between a stable core and experimental modules:

  • Stable core: data loading, transformation, descriptive statistics, core regression models, hypothesis tests, plotting, reporting, scripting.
  • Experimental: panel data, survival analysis, survey-weighted estimation, SEM, network analysis, spatial statistics, and advanced ML commands.

Help and tab completion default to stable commands. To inspect the experimental surface:

statika> help --list --experimental

Quick Examples

1. Basic data exploration

statika> load survey.csv
statika> describe
statika> summarize age income education
statika> tabulate region
statika> crosstab gender employed
statika> corr age income score

2. OLS regression with post-estimation

statika> load data.csv
statika> ols income ~ age + education + experience --robust
statika> predict yhat
statika> residuals resid
statika> vif
statika> estat all
statika> latex results/model.tex

3. Logit with marginal effects and model comparison

statika> logit employed ~ age + income + education
statika> margins
statika> margins --at=means
statika> ols employed ~ age + income + education
statika> estimates table

4. Grouped analysis and hypothesis tests

statika> groupby region summarize mean(income) sd(income) count()
statika> ttest income by employed
statika> anova score by region
statika> chi2 region employed

5. Scripted reproducible analysis (.ost file)

Create analysis.ost:

# analysis.ost — reproducible wage regression
load data/wages.csv
describe
summarize wage age education experience

derive log_wage = log(wage)
encode region as region_code

ols log_wage ~ age + education + experience --robust
predict yhat
residuals resid
vif
estat all
bootstrap n=1000 ci=95

latex outputs/wage_table.tex
report outputs/wage_report.md
save outputs/wages_modeled.parquet

Run it:

statika run analysis.ost --strict

Command Reference

Data Management (8 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `load <path>` | Load CSV, Parquet, Stata (.dta), Excel (.xlsx) | `load survey.csv` |
| `save <path>` | Save data to any supported format | `save results.parquet` |
| `describe` | Show dataset structure (types, nulls) | `describe` |
| `head [N]` | Show first N rows (default: 10) | `head 20` |
| `tail [N]` | Show last N rows | `tail 5` |
| `count` | Row and column count | `count` |
| `merge <path> on <key> [how=...]` | Join with another file | `merge scores.csv on id how=left` |
| `undo` | Undo last data change (multi-level) | `undo` |

Data Transformation (18 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `filter <expr>` | Filter rows with expressions | `filter age > 30 and income < 50000` |
| `select <cols>` | Keep specific columns | `select age income score` |
| `derive <col> = <expr>` | Create new variables | `derive bmi = weight / (height ** 2)` |
| `dropna [cols]` | Drop missing values | `dropna age income` |
| `fillna <col> <strategy>` | Fill missing values | `fillna income median` |
| `sort <col> [--desc]` | Sort dataset | `sort income --desc` |
| `rename <old> <new>` | Rename a column | `rename income salary` |
| `cast <col> <type>` | Cast column type | `cast age float` |
| `encode <col> [as <new>]` | Label-encode strings | `encode region as region_code` |
| `recode <col> old=new ...` | Recode values | `recode region North=N South=S` |
| `replace <col> <old> <new>` | Replace values | `replace region North Norte` |
| `sample <N\|N%>` | Random sample | `sample 100` or `sample 10%` |
| `duplicates [drop] [cols]` | Find or drop duplicates | `duplicates drop` |
| `unique <col>` | List unique values | `unique region` |
| `lag <col> [N]` | Lag variable (shift down) | `lag price 2` |
| `lead <col> [N]` | Lead variable (shift up) | `lead price` |
| `pivot <val> by <col>` | Reshape to wide format | `pivot score by subject over name` |
| `melt <ids>, <vals>` | Reshape to long format | `melt name, math eng` |

Descriptive Statistics (5 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `summarize [cols]` | Summary statistics (N, Mean, SD, quartiles) | `summarize age income` |
| `tabulate <col>` | Frequency table (top 50 values) | `tabulate education` |
| `crosstab <row> <col>` | Two-way contingency table with row percentages | `crosstab gender status` |
| `corr [cols]` | Pearson correlation matrix | `corr age income score` |
| `groupby <cols> summarize <aggs>` | Group-by with aggregations | `groupby region summarize mean(income) count()` |

Statistical Models (6 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `ols y ~ x1 + x2` | OLS linear regression | `ols score ~ age + income --robust` |
| `logit y ~ x1 + x2` | Logistic regression (binary) | `logit employed ~ age + income` |
| `probit y ~ x1 + x2` | Probit regression (binary) | `probit employed ~ age + income` |
| `poisson y ~ x1 + x2` | Poisson regression (counts) | `poisson visits ~ age --exposure=time` |
| `negbin y ~ x1 + x2` | Negative Binomial (overdispersed) | `negbin claims ~ age + gender` |
| `quantreg y ~ x1 + x2` | Quantile regression | `quantreg wage ~ edu + exp tau=0.9` |

All models support:

  • --robust — heteroscedasticity-robust standard errors (HC1)
  • --cluster=col — cluster-robust standard errors
  • --weight=col — frequency/analytic weights
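The `--robust` flag is documented as HC1. For intuition, here is a minimal NumPy sketch of the HC1 sandwich estimator on simulated data; this illustrates the standard formula, not statika's actual implementation:

```python
import numpy as np

def hc1_se(X, y):
    """OLS coefficients plus HC1 (heteroscedasticity-robust) standard errors.

    X: (n, k) design matrix including an intercept column.
    y: (n,) response vector.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n/(n-k)
    meat = (X * resid[:, None] ** 2).T @ X
    cov = n / (n - k) * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
beta, se = hc1_se(X, y)
```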

Formula syntax:

  • y ~ x1 + x2 — standard predictors
  • y ~ x1*x2 — full factorial (expands to x1 + x2 + x1:x2)
  • y ~ x1:x2 — interaction term only
  • y ~ x1*x2*x3 — three-way interaction
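The `*` expansion rule above can be pictured as a small rewrite step. The helper below is a hypothetical sketch of that rule (full factorial from `*`-joined factors), not statika's formula parser:

```python
from itertools import combinations

def expand_factorial(term: str) -> list[str]:
    """Expand "x1*x2" into main effects plus all interactions,
    mirroring the rule that a*b = a + b + a:b."""
    factors = [f.strip() for f in term.split("*")]
    out = []
    for r in range(1, len(factors) + 1):
        for combo in combinations(factors, r):
            out.append(":".join(combo))  # interactions use the ':' notation
    return out

expand_factorial("x1*x2")  # ['x1', 'x2', 'x1:x2']
```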

Post-Estimation (9 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `predict [name]` | Predicted values from last model | `predict yhat` |
| `residuals [name]` | Residuals + diagnostic plots | `residuals resid` |
| `vif` | Variance Inflation Factor | `vif` |
| `margins [--at=means\|average]` | Marginal effects (dy/dx) | `margins --at=average` |
| `bootstrap [n=N] [ci=N]` | Bootstrap confidence intervals | `bootstrap n=1000 ci=95` |
| `estat <sub>` | Post-estimation diagnostics | `estat all` |
| `estimates table` | Side-by-side model comparison | `estimates table` |
| `stepwise y ~ x1 + ...` | Stepwise variable selection | `stepwise y ~ x1 + x2 --backward` |
| `latex [path.tex]` | Export model as LaTeX table | `latex results.tex` |

estat subcommands: hettest, ovtest, linktest, ic, all
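The percentile-bootstrap idea behind `bootstrap n=1000 ci=95` can be sketched generically. The helper below is a hypothetical pure-Python illustration (resampling a sample mean), not statika's parallelized implementation:

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n=1000, ci=95, seed=42):
    """Percentile bootstrap CI: resample with replacement, recompute
    the statistic, then take the central `ci`% of the estimates."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n)
    )
    alpha = (100 - ci) / 200  # e.g. 0.025 on each tail for a 95% CI
    lo = reps[int(alpha * n)]
    hi = reps[int((1 - alpha) * n) - 1]
    return lo, hi

lo, hi = bootstrap_ci([4.9, 6.5, 7.4, 7.5, 8.6, 9.4])
```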

Hypothesis Tests (5 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `ttest <col>` | One-sample t-test | `ttest score mu=7` |
| `ttest <col> by <group>` | Two-sample Welch t-test | `ttest income by employed` |
| `ttest <col> paired <col2>` | Paired t-test | `ttest before paired after` |
| `chi2 <col1> <col2>` | Chi-square independence test | `chi2 region employed` |
| `anova <col> by <group>` | One-way ANOVA (F-test) | `anova score by region` |
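The two-sample form uses Welch's unequal-variance test. A stdlib sketch of the t statistic and Welch-Satterthwaite degrees of freedom follows (the p-value step, which needs the t distribution's CDF, is omitted; this is the textbook formula, not statika's code):

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb                 # squared standard error of the difference
    t = (ma - mb) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```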

Visualization (7 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `plot hist <col>` | Histogram | `plot hist age` |
| `plot scatter <y> <x>` | Scatter plot | `plot scatter score income` |
| `plot line <y> <x>` | Line plot | `plot line score age` |
| `plot box <col> [by <g>]` | Box plot (optionally grouped) | `plot box income by region` |
| `plot bar <col> [by <g>]` | Bar chart | `plot bar income by region` |
| `plot heatmap [cols]` | Correlation heatmap | `plot heatmap age income score` |
| `plot diagnostics` | Residual diagnostic plots | `plot diagnostics` |

Reporting and Utilities (4 commands)

| Command | Description | Example |
|---------|-------------|---------|
| `report <path>` | Generate Markdown report | `report analysis.md` |
| `help [cmd]` | Show help (all or specific command) | `help ols` |
| `esttab` | Publication-style coefficient table | `esttab` |
| `quit` / `exit` / `q` | Exit REPL | `quit` |

Expression Language

The expression language used by filter and derive is a safe, recursive-descent parser. No Python eval() is used anywhere in statika.

# Arithmetic
statika> derive income_k = income / 1000
statika> derive bmi = weight / (height ** 2)

# Comparisons and boolean logic
statika> filter age > 30 and income < 50000
statika> filter not is_null(score) and region == "North"

# Functions
statika> derive log_income = log(income)
statika> derive name_upper = upper(name)
statika> derive score_clean = fill_null(score, 0)

| Category | Functions |
|----------|-----------|
| Math | `log(x)`, `log10(x)`, `sqrt(x)`, `abs(x)`, `exp(x)`, `round(x, n)` |
| String | `upper(x)`, `lower(x)`, `len_chars(x)`, `strip(x)`, `contains(x, "pat")` |
| Null | `is_null(x)`, `is_not_null(x)`, `fill_null(x, value)` |
| Type | `cast_float(x)`, `cast_int(x)`, `cast_str(x)` |
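The "no eval()" approach can be illustrated with a generic whitelist evaluator. The sketch below uses Python's `ast` module rather than a hand-written recursive-descent parser, so it is an analogy for the technique, not statika's actual code:

```python
import ast
import operator

# Whitelisted binary operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow}

def safe_eval(expr: str, variables: dict) -> float:
    """Evaluate a small arithmetic expression without Python eval().
    Only numbers, named variables, and whitelisted operators pass."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        raise ValueError(f"disallowed syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

result = safe_eval("weight / (height ** 2)", {"weight": 70, "height": 1.75})
# result is about 22.86; function calls like __import__ raise ValueError
```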

Aggregation functions for groupby ... summarize:

| Function | Description |
|----------|-------------|
| `mean(col)` | Arithmetic mean |
| `sd(col)` | Standard deviation (sample) |
| `sum(col)` | Sum |
| `min(col)` | Minimum |
| `max(col)` | Maximum |
| `median(col)` | Median |
| `count()` | Row count per group |

Automatic Model Diagnostics

Every model automatically checks for common problems:

  • Multicollinearity — Condition number > 30 triggers a warning
  • Heteroscedasticity — Breusch-Pagan test; suggests --robust if p < 0.05
  • Autocorrelation — Durbin-Watson statistic far from 2.0
  • Convergence — Warns if logit/probit MLE did not converge
  • Missing values — Reports how many observations were dropped
  • Low sample size — Warns when the observation-to-predictor ratio is low
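Two of these checks are simple enough to state directly. The NumPy sketch below shows the standard Durbin-Watson statistic and the design-matrix condition number (the textbook formulas, not statika's diagnostics module):

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: ~2 means no first-order autocorrelation;
    values toward 0 or 4 indicate positive/negative autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def condition_number(X):
    """Ratio of largest to smallest singular value of the design
    matrix; > 30 is the warning threshold mentioned above."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return s.max() / s.min()

dw = durbin_watson([1.0, -1.0] * 10)  # strictly alternating residuals: 3.8
```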

File Formats

| Format | Import | Export | Notes |
|--------|--------|--------|-------|
| CSV | Yes | Yes | Built-in |
| Parquet | Yes | Yes | Built-in |
| Stata (.dta) | Yes | Yes | `pip install "statika[stata]"` |
| Excel (.xlsx) | Yes | Yes | `pip install "statika[excel]"` |

CLI Reference

statika repl                     # Interactive REPL
statika run script.ost           # Run an .ost script
statika run script.ost --strict  # Stop on first error (exit code 1)
statika --verbose repl           # Verbose logging
statika --debug repl             # Debug logging
statika --version                # Show version

Logs are written to ~/.statika/logs/openstat.log.


Configuration

Create ~/.statika/config.toml to customize defaults:

[data]
output_dir = "outputs"
csv_separator = ","

[display]
tabulate_limit = 50
head_default = 10

[undo]
max_undo_stack = 20
max_undo_memory_mb = 500

[plotting]
plot_dpi = 150
plot_figsize_w = 8.0
plot_figsize_h = 5.0

[model]
condition_threshold = 30
min_obs_per_predictor = 5
bootstrap_iterations = 1000

Technology Stack

| Component | Library | Notes |
|-----------|---------|-------|
| Data engine | Polars | Rust-powered, zero-copy, 10-100x faster than pandas |
| Statistics | statsmodels | OLS, GLM, quantile regression |
| Scientific | SciPy | Hypothesis tests, distributions |
| Plotting | matplotlib | Publication-quality figures |
| CLI | Typer | Type-annotated CLI |
| Terminal UI | Rich | Tables and formatted output |
| REPL | prompt-toolkit | Tab completion, history |

Contributing

Contributions are welcome. Whether you are fixing a typo, stabilizing an experimental module, or adding a new command, the process is the same:

  1. Fork the repository on GitHub
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Write code and tests
  4. Confirm tests pass and lint is clean: pytest and ruff check src/
  5. Open a pull request with a clear description

What to contribute

  • Stable-core hardening — CLI/REPL behavior, error handling, command metadata
  • Experimental stabilization — panel, survival, survey, IV, mixed models
  • New commands — any useful data manipulation or analysis command
  • Expression language — new DSL functions
  • Plot types — new visualization types
  • File formats — SAS, SPSS, JSON, and others
  • Documentation — tutorials, examples, translations
  • Bug reports — open an issue on GitHub

New to open source? Look for issues labeled good first issue. See CONTRIBUTING.md for the full setup guide.


Roadmap

Completed

  • OLS, Logit, Probit, Poisson, Negative Binomial, Quantile regression
  • Interaction terms (x1*x2, x1:x2, three-way)
  • Robust and cluster-robust standard errors
  • Frequency and analytic weight support
  • Marginal effects (average, at-means, for OLS/logit/probit)
  • Bootstrap confidence intervals (parallelized)
  • Post-estimation diagnostics (estat, vif, residuals)
  • Model comparison tables (estimates table)
  • Stepwise variable selection (forward/backward)
  • Safe expression language (no eval)
  • Tab completion and multi-level undo in REPL
  • LaTeX and Markdown report export
  • CSV, Parquet, Stata .dta, Excel import/export
  • Configuration file support
  • CI/CD with GitHub Actions (1173 tests, 91% coverage)

Planned

  • Stabilize experimental estimators (panel, survival, survey, IV)
  • Replace remaining pandas paths in large-data workflows
  • Publish full documentation site
  • Improve backend abstraction (shared engine contract for load/query/model/export)
  • SAS and SPSS file format support

Acknowledgements

statika is built on top of excellent open-source libraries:

  • Polars — for reimagining what a DataFrame library can be
  • statsmodels — for bringing professional-grade statistics to Python
  • SciPy — for decades of scientific computing
  • Rich — for making terminal output readable
  • prompt-toolkit — for the interactive REPL foundation

License

MIT License. See LICENSE for the full text.



Not affiliated with StataCorp LLC, IBM SPSS, or SAS Institute Inc.
statika is an independent open-source project.
