multiversePy

Setup

  1. Install the requirements: pip install -r requirements.txt
  2. Download the datasets into their respective folders in /data/
  3. Run the preprocessing script on the downloaded datasets: python preprocess.py
  4. Run the main program: python main.py

Todo

  • Clean the datasets
  • Save clean versions of datasets
  • Look into Compute Canada
  • UCI data downloading
  • Scale features if too big
  • Create configs for datasets (start with Huseyin's 18 datasets)
  • Run two-level RF analysis on the datasets
  • Record entropy for the major poisoning levels
  • Record performance (AUC, bias, log loss) of the first-level RF trained on test data
  • Use the fewest neurons and neural-network layers needed to reach RF performance (or use the same number of neurons and layers for all datasets and compare performance results?)
  • Based on the RF breaking point and NN simplicity, explain the data in global terms (global explanations) or in terms of salient data points (local explanations). Both are open research problems.
  • Global explanations could be managed by using functional data depth on the entropy curves? By reporting breaking points in performance w.r.t. the poisoning rate?
  • Local explanations (which data points' removal causes the biggest drop in performance)
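The poisoning/entropy items above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the label-flipping poisoning scheme, the entropy definition, and the synthetic dataset are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split


def poison_labels(y, rate, rng):
    """Hypothetical poisoning: flip a fraction `rate` of binary labels at random."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y


def prediction_entropy(proba):
    """Mean Shannon entropy (bits) of the RF's predicted class distributions."""
    eps = 1e-12  # avoid log(0)
    return float(-(proba * np.log2(proba + eps)).sum(axis=1).mean())


rng = np.random.default_rng(0)
# Stand-in for one of the preprocessed datasets in /data/.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Record entropy and performance at each major poisoning level.
for rate in (0.0, 0.1, 0.2, 0.3):
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_tr, poison_labels(y_tr, rate, rng))
    proba = rf.predict_proba(X_te)
    print(f"rate={rate:.1f}  auc={roc_auc_score(y_te, proba[:, 1]):.3f}  "
          f"logloss={log_loss(y_te, proba):.3f}  "
          f"entropy={prediction_entropy(proba):.3f}")
```

Plotting the recorded entropy against the poisoning rate per dataset would give the "entropy lines" on which functional data depth could be applied.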

Dataset Info

Dataset Links

Ignored Datasets

Dataset               Reason
haberman              too few features
temp of major cities  not classification
wisconsin cancer      duplicate

Other Links
