Our project aims to create Betas, a simple and convenient visualization tool that lets data scientists and data analysts analyze model performance in Python. With a single line of code, users can generate custom plots for diagnosing the assumptions of a linear regression model, computing model scores in binary classification, and evaluating the performance of principal component analysis (PCA) and clustering algorithms. The tool also helps users fit machine learning models to datasets without a detailed understanding of how the models work. The Betas package is pip-installable and easy to use by following our example IPython notebooks, which use the Spam and College datasets for demonstration. In addition, we provide two interactive web dashboards for model diagnostics in linear regression and binary classification.
Joel Stremmel
Yiming Liu
Cathy Jia
Mengying Bi
Arjun Singh
Data Set 1: The Spam data (Source)
Data Set 2: The Breast Cancer data (Source)
Data Set 3: The College data (Source)
Data Set 4: The Auto data (Source)
Programming Languages
Python Packages
numpy >= 1.13.1
pandas >= 0.23.1
matplotlib >= 2.0.2
seaborn >= 0.9.0
scipy <= 1.2.0
scikit-learn >= 0.20.2
statsmodels >= 0.9.0
dash >= 0.43.0
bokeh >= 1.0.4
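To see whether an existing environment already meets these version bounds, something like the following sketch could be used. It relies only on the Python standard library (`importlib.metadata`, Python 3.8+). The helper names `parse_version`, `satisfies`, and `check` are hypothetical and not part of Betas, and the version comparison is deliberately simplified: it compares numeric components only and ignores pre-release suffixes.

```python
import operator
import re
from importlib import metadata

# Version bounds copied from the package list above.
REQUIREMENTS = [
    "numpy >= 1.13.1",
    "pandas >= 0.23.1",
    "matplotlib >= 2.0.2",
    "seaborn >= 0.9.0",
    "scipy <= 1.2.0",
    "scikit-learn >= 0.20.2",
    "statsmodels >= 0.9.0",
    "dash >= 0.43.0",
    "bokeh >= 1.0.4",
]

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def parse_version(text):
    """Turn '1.13.1' into (1, 13, 1); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return _OPS[op](a, b)


def check(requirements=REQUIREMENTS):
    """Return {package name: status string} for each 'name op version' line."""
    report = {}
    for line in requirements:
        name, op, required = line.split()
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = satisfies(installed, op, required)
        report[name] = f"{installed} ({'ok' if ok else 'needs ' + op + ' ' + required})"
    return report
```

Calling `check()` returns a dictionary mapping each package name to its installed version and whether it falls within the listed bound. For strict, standards-compliant version comparison, a dedicated version-parsing library would be a better fit than this simplified tuple comparison.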
This package has the following structure. See the Betas library documentation for details.
betas/
  |- betas/
     |- README.md
     |- __init__.py
     |- binary_score_diagnostics.py
     |- binary_score_plot.py
     |- clustering_evaluate.py
     |- download.js
     |- pca_evaluate.py
     |- regression_analysis_plot.py
     |- regression_diagnostics.py
     |- setup.cfg
     |- test_analysis_plot.py
     |- test_binary_score_plot.py
     |- test_clustering_evaluate.py
     |- test_pca_evaluate.py
  |- data/
     |- college.csv
     |- spam.data.txt
     |- spam.traintest.txt
     |- spam_score_label.csv
  |- dist/
     |- betas-v1.3.tar.gz
  |- docs/
     |- Component_Specification.pdf
     |- Final_Presentation.pdf
     |- Functional_Specification.pdf
     |- Project_Summary.pdf
     |- Technology_Review.pdf
     |- logo_black.png
     |- logo_white.png
  |- examples/
     |- demo_regression_analysis_plot.ipynb
     |- demo_binary_score_plot.ipynb
     |- demo_clustering_evaluate.ipynb
     |- demo_pca_evaluate.ipynb
  |- LICENSE.txt
  |- README.md
  |- environment.yml
  |- requirements.txt
  |- setup.py
pip install betas
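After running the command above, one quick way to confirm that the package can be found is sketched below. It assumes the import name is `betas`, matching the inner `betas/` package directory in this repository; `is_installed` is a hypothetical helper for illustration, not part of Betas.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name is on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import name; adjust if the installed package exposes a different one.
    print("betas importable:", is_installed("betas"))
```

Using `find_spec` only locates the module without importing it, so the check does not load the plotting dependencies.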