A comprehensive Python tool for performing regression analysis with an interactive text-based terminal interface
A comprehensive Python tool for performing regression analysis with an interactive curses-based terminal interface. Designed for statistical analysts needing exploratory regression analysis with publication-quality outputs and detailed statistical interpretation.
Version: 3.1-full-curses (Updated: 2026-02-14)
- Multiple Data Loading Methods:
  - Download data from URLs (CSV or JSON format) with CTRL-V paste support
  - Interactive curses-based file browser for local files
  - Support for CSV files with or without headers
  - Automatic format detection and validation
- Interactive Data Cleaning:
  - Inspect data with pagination and statistics view
  - Create custom filters (missing values, thresholds, ranges, outliers, custom queries)
  - View and manage active filters with preview
  - Transform columns to datetime format with validation
- Regression Analysis:
  - Simple Linear Regression (OLS)
  - Multiple Regression (OLS)
  - Logistic Regression with ROC curves
- Comprehensive Visualizations:
  - Q-Q plots for normality assessment
  - Residual plots with histograms
  - Correlation heatmaps
  - Influence plots (Cook's distance and leverage)
  - Simple regression plots (scatter, histograms, box plots, bar charts) for 2-variable analysis
  - ROC curves and confusion matrices for logistic regression
  - Prediction scatter plots
- Detailed Statistical Reports:
  - Full statsmodels output
  - Interpretation of 25+ key statistics:
    - R², Adjusted R², F-statistic
    - Coefficients and standardized beta weights
    - P-values with significance assessment
    - Durbin-Watson (autocorrelation)
    - Skewness/Kurtosis with convention choice
    - VIF and Condition Number (multicollinearity)
    - Cook's distance and leverage (influential observations)
    - Jarque-Bera and Omnibus (normality tests)
    - Breusch-Pagan (heteroscedasticity)
    - AIC/BIC (model comparison)
    - ROC AUC and classification metrics (logistic)
  - Reports saved to text files with timestamps
- Advanced Options:
  - Kurtosis convention selection (Excess/Standard)
  - Robust standard errors (HC0, HC1, HC3)
  - Confidence level selection (90%, 95%, 99%)
  - Classification threshold control (0.3, 0.5, 0.7)
- Session Management: Save and resume analysis sessions with full state preservation
- Install the dependencies
  ```sh
  pip install pandas statsmodels matplotlib seaborn numpy scipy requests
  ```
- Clone the repo
  ```sh
  git clone https://github.com/derezed88/stats.git
  ```
- Navigate to the project directory
  ```sh
  cd stats
  ```
- Make the script executable
  ```sh
  chmod +x regression.py
  ```
- Run the tool
  ```sh
  python regression.py
  # or
  ./regression.py
  ```

The tool will launch an interactive terminal-based interface with a main menu.
- Choose option 1 from main menu
- Enter a session name (optional, auto-generated if blank)
Option A: From URL (Option 4)
- Paste URL to CSV or JSON file
- Review data metrics (rows, columns, memory usage)
- Confirm to proceed
- Choose filename to save locally
Option B: From File (Option 5)
- Choose between the interactive file browser and manual path entry
- File Browser Mode: Navigate directories with arrow keys, select files with Enter
- Manual Mode: Type or paste file path
- Review data metrics
- Optionally copy to data directory (skipped if already in data/)
CSV Without Headers Support:
- Automatically detects if CSV lacks header row
- Prompts you to specify column names interactively
- Creates virtual headers that persist through analysis
- Session metadata tracks user-specified headers
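For reference, the headerless-CSV behavior can be approximated in plain pandas. This is a minimal sketch with made-up data and column names, not the tool's actual code:

```python
import io

import pandas as pd

# Hypothetical headerless CSV with three columns
raw = io.StringIO("1.2,3.4,0\n2.1,4.3,1\n0.9,2.8,0\n")

# header=None tells pandas the first row is data, not column names;
# `names` supplies the virtual headers the tool would prompt you for.
df = pd.read_csv(raw, header=None, names=["x1", "x2", "y"])
print(list(df.columns))  # ['x1', 'x2', 'y']
```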
- Choose specific columns for analysis, or 'all' for all columns
- These columns will be available for filtering and regression
Data Inspection:
- Interactive curses-based viewer with immediate response
- Navigate with arrow keys
- Toggle between data and statistics views
- Jump to specific rows
Add Filters:
- Remove missing values
- Filter by thresholds (>, <, ==)
- Filter by ranges
- Remove outliers (beyond N standard deviations)
- Custom pandas queries
Transform Columns to Datetime:
- Select column to transform
- Specify new column name (default: {column}_datetime)
- Optional format string (e.g., '%Y-%m', '%Y-%m-%d')
- Automatic validation with rollback on failure
- Persists through filter operations
- Plots show human-readable dates instead of Unix timestamps
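The validate-with-rollback pattern can be sketched with `pd.to_datetime` (the sample data and column names here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"month": ["2024-01", "2024-02", "not-a-date"]})

# errors="coerce" turns unparseable values into NaT instead of raising,
# so validation can happen before the new column is committed.
parsed = pd.to_datetime(df["month"], format="%Y-%m", errors="coerce")
if parsed.isna().any():
    # "rollback": leave the original DataFrame untouched
    print("validation failed, column not transformed")
else:
    df["month_datetime"] = parsed
```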
Filter Management:
- View active filters
- Remove individual filters
- Clear all filters
Step 1: Select Variables
- Choose dependent variable (Y)
- Choose independent variable(s) (X) - comma-separated
Step 2: Choose Regression Type
- Linear Regression (OLS)
- Multiple Regression (OLS)
- Logistic Regression
Step 3: Configure Advanced Options (if desired)
- Kurtosis convention (Excess vs Standard)
- Robust standard errors (None, HC0, HC1, HC3)
- Confidence level (90%, 95%, 99%) - for OLS
- Classification threshold (0.3, 0.5, 0.7) - for Logistic
Step 4: View Results
- Results displayed on screen
- Automatically generates:
  - OLS Regression:
    - Q-Q plot (normality assessment)
    - Residual plots (fitted vs residuals, histogram)
    - Correlation heatmap
    - Influence plot (Cook's distance, leverage)
    - Simple regression plots (for 2-variable analysis):
      - Scatter plot with regression line and confidence/prediction bands
      - Distribution histograms with KDE
      - Box plots for outlier detection
      - Bar plot of mean Y by binned X values
  - Logistic Regression:
    - ROC curve with AUC
    - Confusion matrix with classification metrics
    - Prediction scatter plot
  - Comprehensive text report with interpretations
- OLS Regression:
- View statistical output
- Read plain-English interpretations of:
- Model Fit: R², Adjusted R², Pseudo R² (logistic)
- Overall Significance: F-statistic, LLR p-value (logistic)
- Coefficients: Effect size, direction, and standardized beta weights
- Statistical Significance: P-values with assessment
- Assumptions:
- Autocorrelation (Durbin-Watson)
- Normality (Jarque-Bera, Omnibus, Skewness, Kurtosis)
- Heteroscedasticity (Breusch-Pagan)
- Multicollinearity: VIF, Condition Number
- Influential Observations: Cook's distance, Leverage
- Model Comparison: AIC, BIC
- Classification Performance (logistic): ROC AUC, Accuracy, Precision, Recall, F1-Score, Specificity
- Save your work to resume later
- All settings, selected columns, filters, and results are preserved
- Pickled session files stored in the sessions/ directory
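Pickling round-trips arbitrary Python state, which is what makes full session preservation possible. A sketch with a hypothetical session dict (the tool's real session structure is not shown here):

```python
import pickle

# Hypothetical session state
session = {"name": "demo", "filters": ["Age > 18"], "columns": ["Age", "Income"]}

blob = pickle.dumps(session)     # the bytes written to sessions/<name>.pkl
restored = pickle.loads(blob)    # what loading a session recovers
print(restored == session)
```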
Choose between two reporting standards:
- Excess (Fisher's) - Normal distribution = 0 (scipy default, recommended)
- Standard (Pearson's) - Normal distribution = 3 (textbook convention)
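The two conventions correspond to scipy's `fisher` flag, as this sketch on a large normal sample illustrates:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)

excess = stats.kurtosis(sample, fisher=True)     # Excess (Fisher's): ~0 for normal
standard = stats.kurtosis(sample, fisher=False)  # Standard (Pearson's): ~3 for normal
print(round(excess, 2), round(standard, 2))
```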
Adjust for heteroscedasticity:
- None - Classical OLS standard errors (default)
- HC0 (White) - Basic heteroscedasticity-consistent
- HC1 - HC0 with degrees-of-freedom correction
- HC3 - Conservative, best for small samples
Reports show both classical and robust SE/p-values when enabled.
Select for confidence and prediction bands in simple regression plots:
- 90% - Wider acceptance region
- 95% - Standard significance level (default)
- 99% - Stricter, wider bands
Control positive class prediction threshold:
- 0.3 - More sensitive, predicts more positives
- 0.5 - Standard balanced threshold (default)
- 0.7 - More specific, predicts fewer positives
Affects confusion matrix and all classification metrics.
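The threshold simply cuts the predicted probabilities into classes, so raising it trades recall for precision. A sketch with hypothetical probabilities from a fitted logistic model:

```python
import numpy as np

# Hypothetical predicted probabilities
probs = np.array([0.2, 0.35, 0.55, 0.72, 0.9])

for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(threshold, int(preds.sum()), "positives")
```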
- Transform text columns to proper datetime format
- Specify format string or use auto-detection
- Creates new column preserving original
- Plots automatically format datetime axes with human-readable labels
- Supports various datetime formats (year, year-month, full dates, timestamps)
Data Inspection Viewer:
- ↑ or k - Previous page
- ↓ or K - Next page
- Page Up/Down - Scroll 5 pages at a time
- Home/End - Jump to first/last page
- s - Toggle statistics view
- d - Toggle data view
- j - Jump to specific row (opens input dialog)
- q - Quit inspection

File Browser:
- ↑/↓ - Navigate up/down
- Enter - Select file or enter directory
- ←/Backspace - Go up one directory
- a - Toggle showing all files (default: only CSV/JSON)
- Home/End - Jump to first/last item
- Page Up/Down - Scroll by page
- q - Quit browser

Menus:
- ↑/↓ - Navigate options
- Home/End - Jump to first/last
- Page Up/Down - Scroll menu
- Enter - Select option
- q - Cancel (if allowed)

List Selection:
- ↑/↓ or k/j - Navigate list
- Space - Toggle selection (multi-select mode)
- a - Select all
- n - Deselect all
- Enter - Confirm selection
- q - Cancel
The tool automatically creates these directories:
./
├── data/ # Downloaded data files
├── plots/ # Generated PNG plots (300 DPI)
├── reports/ # Statistical reports (TXT)
└── sessions/ # Saved session files (PKL)
reports/regression_report_YYYYMMDD_HHMMSS.txt
reports/shape_report_YYYYMMDD_HHMMSS.txt
Contains:
- Session metadata
- Applied filters
- Full statsmodels summary
- Statistical interpretations
- All diagnostic test results
OLS Regression:
plots/qq_plot_YYYYMMDD_HHMMSS.png
plots/residual_plot_YYYYMMDD_HHMMSS.png
plots/correlation_YYYYMMDD_HHMMSS.png
plots/influence_plot_YYYYMMDD_HHMMSS.png
plots/YYYYMMDD_HHMMSS_simple_regression_plots.png # 2 variables only
Logistic Regression:
plots/roc_curve_YYYYMMDD_HHMMSS.png
plots/confusion_matrix_YYYYMMDD_HHMMSS.png
plots/prediction_scatter_YYYYMMDD_HHMMSS.png
Simple Regression Plots (generated only when analyzing 2 variables):
- Top-left (Scatter): Relationship between X and Y with fitted regression line, confidence bands, and prediction bands
- Top-right (Histograms): Distribution of both variables with KDE on dual y-axes
- Bottom-left (Box plots): Side-by-side box plots for outlier detection
- Bottom-right (Bar chart): Mean Y value for binned X ranges with sample counts
Test with public datasets:
CSV Examples:
- https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv
- https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv
- https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv
JSON Examples:
- Any REST API returning JSON arrays
Remove missing values:
Filter type: 1
Column: Age

Filter by threshold:
Filter type: 2
Column: Price
Threshold: 100

Remove outliers:
Filter type: 6
Column: Income
Standard deviations: 3

Custom query:
Filter type: 7
Query: Age > 18 and Income < 100000
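The filters above correspond to ordinary pandas operations, roughly as in this sketch (toy data; the tool's exact implementation is not shown):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [17, 25, np.nan, 40],
    "Price": [50, 150, 120, 90],
    "Income": [30_000, 45_000, 500_000, 60_000],
})

missing_removed = df.dropna(subset=["Age"])        # remove missing values
above_threshold = df[df["Price"] > 100]            # threshold filter
z = (df["Income"] - df["Income"].mean()) / df["Income"].std()
no_outliers = df[z.abs() <= 3]                     # outlier filter (3 SDs)
custom = df.query("Age > 18 and Income < 100000")  # custom pandas query
print(len(missing_removed), len(above_threshold), len(no_outliers), len(custom))
```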
"Could not parse as CSV or JSON"
- Verify URL is accessible
- Check that URL points directly to CSV/JSON file
- Ensure proper file format
"No valid data points after removing missing values"
- Review your filters - they may be too restrictive
- Use data inspection to check for missing values
- Consider removing or relaxing filters
"High multicollinearity"
- Independent variables are highly correlated
- Check VIF values in the report
- Consider removing some predictors
- Use VIF analysis to identify problematic variables
Analysis errors
- Check for non-numeric data in regression variables
- Ensure sufficient data points (at least 30 recommended, 10-20 per predictor)
- Verify dependent variable has variance
- Start with data inspection before creating filters to understand your data
- Transform datetime columns early if you have date/time data in text format
- Apply filters incrementally - add one at a time and inspect results
- Save sessions frequently to preserve your work
- Check Q-Q plots to verify normality assumptions
- Review VIF values to detect multicollinearity
- Use robust SEs if Breusch-Pagan test suggests heteroscedasticity
- Compare AIC/BIC when trying different model specifications
- Examine Cook's distance to identify influential observations
- For logistic regression, try different classification thresholds to optimize for your use case
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Open source - feel free to modify and extend!
Mark Jimenez - @properTweetment - xb12pilot@gmail.com
Project Link: https://github.com/derezed88/stats