Automated Learning for Insightful Comparison and Evaluation (ALICE) merges conventional feature selection and the concept of inter-rater agreeability in a simple, user-friendly manner to seek insights into black-box Machine Learning models. It currently supports (and has been tested on) Scikit-Learn and Keras models.
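The core idea of pairing two models as "raters" and measuring how far they agree can be illustrated with a minimal sketch. This is a concept demo only, not ALICE's actual API (which lives in the `alice` modules and is demonstrated in the notebooks); it uses Cohen's kappa as one common inter-rater agreement statistic, on synthetic data:

```python
# Sketch: treat two classifiers as "raters" and score their
# beyond-chance agreement on the same test set with Cohen's kappa.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rfc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# kappa near 1: the models make near-identical predictions;
# kappa near 0: they agree no more than chance would predict.
kappa = cohen_kappa_score(logit.predict(X_te), rfc.predict(X_te))
print(f"Cohen's kappa between Logit and RFC: {kappa:.3f}")
```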
Authors: Bachana Anasashvili, Vahidin Jeleskovic
Paper: arXiv
Results included from the repository are from three experiments on the Telco Customer Churn dataset:
- Multi-Layer Perceptron (MLP) vs. Logistic Regression (Logit)
- Multi-Layer Perceptron (MLP) vs. Random Forest Classifier (RFC)
- Random Forest Classifier (RFC) vs. Logistic Regression (Logit)
Notebooks
customer_churn_test.ipynb
- Jupyter Notebook for experiments and use demonstration / instructions
results_analysis.ipynb
- Jupyter Notebook demonstrating experiment results and plots
customer_churn_dataprocessing.ipynb
- Jupyter Notebook for transparency of data cleaning and manipulation
Folders
alice
- Code modules for the framework
clean_data
- Saved train-test sets
test_results
- Saved experiment results
test_results/experiment_results_20240301_1/experiment_results_20240301_1.json
- MLP vs. Logit Experiment
test_results/experiment_results_20240302_1/experiment_results_20240302_1.json
- MLP vs. RFC Experiment
test_results/experiment_results_20240302_2/experiment_results_20240302_2.json
- RFC vs. Logit Experiment
Files
class_telco.pkl
- Processed and cleaned Telecom customer churn dataset for classification
reg_telco.pkl
- Processed and cleaned Telecom customer churn dataset for regression
Telco_customer_churn.xlsx
- Raw data
requirements.txt
- Required Python libraries and their versions
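The `.pkl` datasets can be loaded with `pandas.read_pickle`, assuming they are pickled DataFrames (an assumption; check the data-processing notebook to confirm). The round trip below uses a toy frame so the snippet runs without the repository files; the same `read_pickle` call works for `class_telco.pkl` and `reg_telco.pkl`:

```python
import pandas as pd

# Toy stand-in for the real pickled datasets (hypothetical columns).
toy = pd.DataFrame({"tenure": [1, 24, 60], "churn": [1, 0, 0]})
toy.to_pickle("toy_telco.pkl")

# Same call for the repository files, e.g. pd.read_pickle("class_telco.pkl")
df = pd.read_pickle("toy_telco.pkl")
print(df.shape)
```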
Note that the results may not be exactly reproducible due to the stochastic nature of neural networks, random forests, and their optimization.
For re-running the experiments, or testing the framework:
- Make sure to set up a virtual Python environment and install the required packages
! pip install -r requirements.txt
- Run the customer_churn_test.ipynb notebook.
- Re-running sections 1, 2, 3, and 4 is mandatory to be able to re-run the experiments.
- Experiments are contained under section 5. Given the computationally costly nature of the models, section 7 includes two simpler models, Logistic Regression and a Decision Tree Classifier, for those who want to quickly test the functionalities of the framework.
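The quick-test idea from section 7 — substituting two cheap models for the expensive ones — can be sketched standalone as follows. This is a hedged illustration, not the notebook's actual code; ALICE's real experiment loop adds feature selection on top of the comparison:

```python
# Compare a Logistic Regression and a Decision Tree Classifier on the
# same split: individual accuracy plus their mutual agreement (kappa).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

logit = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)

p_logit, p_tree = logit.predict(X_te), tree.predict(X_te)
acc_logit = accuracy_score(y_te, p_logit)
acc_tree = accuracy_score(y_te, p_tree)
agreement = cohen_kappa_score(p_logit, p_tree)
print(f"Logit acc: {acc_logit:.3f}  Tree acc: {acc_tree:.3f}  kappa: {agreement:.3f}")
```

Both models fit in well under a second, which makes this pairing convenient for smoke-testing the framework before committing to the MLP or RFC experiments.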