AIM-Project/AIM-Manuscript

Project Description:

This repository contains the code, data, supplementary figures, and documents used in preparing the manuscript "Defining the AIM: An Abstraction for Improving Machine Learning Prediction". We illustrate the need for an abstraction that describes Machine Learning pipelines, in order to facilitate the comparison, improvement, and study of ML results, by focusing on the well-known ALL/AML dataset [1]. We define an abstraction layer for leaderboard-style competitions to improve ML results.

Repository Contents:

  • LiteratureSearch folder:
    This folder contains two notebooks: one gives the results of our literature analysis (LiteratureSearchResults.ipynb), and the other presents the ML pipelines used in the articles (SummaryofMLpipelines.ipynb).
  • ReproducingMLpipelines folder:
    This folder contains 12 notebooks: 5 for the articles we studied in depth (one per article), 5 for the comparison of the articles' methods (Table 1 in the manuscript), and 2 for comparison summaries. The folder also includes the intermediate .Rdata file we created.
  • See the ReproducingMLpipeline example folder for reproducible containers (Singularity and Docker) to run the pipeline.

Data and Associated Repos:

  • Data in the Golub et al. paper [1]: the datasets used in [1], with a training dataset (38 by 7129) and a testing dataset (34 by 7129).
  • Data Version 2: the leukemia data in the R package spikeslab (72 by 3571). We show that this is a transformed version of the original data.
  • Data Version 3: the 'golub' data in the R package multtest, in which 'golub' is the training dataset (38 by 3051) and 'golub.cl' is the test dataset (34 by 3051). We also show that this is another transformed version of the original data.
  • We use the data in [1] (also here and in the LiteratureSearch folder) to reproduce results in the papers.
  • Associated Repos
    Previous work
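
Because the three data versions differ only in their dimensions and transformations, a quick shape check can help confirm which version has been loaded. The following is a minimal sketch, not code from this repository: the dictionary keys, the helper names `shape_of` and `check_shape`, and the placeholder table are all hypothetical, and the shapes are simply the (samples by genes) figures listed above.

```python
# Expected (samples, genes) shapes, copied from the data list in this README.
# The key names are hypothetical labels chosen for this sketch.
EXPECTED_SHAPES = {
    "golub_train": (38, 7129),
    "golub_test": (34, 7129),
    "spikeslab_leukemia": (72, 3571),
    "multtest_train": (38, 3051),
    "multtest_test": (34, 3051),
}


def shape_of(rows):
    """Return (n_rows, n_cols) for a rectangular list of rows."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("rows have unequal lengths")
    return (n_rows, n_cols)


def check_shape(name, rows):
    """Return True if the table matches the shape listed for `name`."""
    return shape_of(rows) == EXPECTED_SHAPES[name]


# Placeholder table standing in for a loaded expression matrix (assumption:
# rows are samples and columns are genes, as in the list above).
toy_train = [[0.0] * 7129 for _ in range(38)]
print(check_shape("golub_train", toy_train))
```

In practice the placeholder would be replaced by the table read from the data files; the sketch only checks dimensions and says nothing about how the values were transformed.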

If you have any questions, please contact us at vcs@stodden.net or xwu64@illinois.edu.
