Project Description:

We present the code, data, and supplementary figures and documents used in preparing the manuscript "Defining the AIM: An Abstraction for Improving Machine Learning Prediction". Focusing on the well-known ALL/AML dataset [1], we illustrate the need for an abstraction that describes machine learning (ML) pipelines, so that ML results can be compared, improved, and studied more easily. We define an abstraction layer for leaderboard-style competitions to improve ML results.

Repository Contents:

  • LiteratureSearch folder:
    This folder contains two notebooks, one giving the results of our literature analysis (LiteratureSearchResults.ipynb) and the other presenting ML pipelines for the articles (SummaryofMLpipelines.ipynb).
  • ReproducingMLpipelines folder:
    This folder contains 12 notebooks: 5 covering the articles we studied in depth (one per article), 5 comparing the articles' methods (Table 1 in the manuscript), and 2 summarizing the comparisons. The folder also includes the intermediate .Rdata file we created.
  • See the ReproducingMLpipeline example folder for reproducible containers (Singularity and Docker) to run the pipeline.

Data and Associated Repos:

  • Data in the Golub et al. paper [1]: the datasets used in [1], with a training dataset (38 by 7129) and a testing dataset (34 by 7129).
  • Data Version 2: the leukemia data in the R package spikeslab (72 by 3571). We have shown that this data is a transformed version of the original data.
  • Data Version 3: the 'golub' data in the R package multtest, in which 'golub' is the training dataset (38 by 3051) and '' is the test dataset (34 by 3051). We have also shown that this data is another transformed version of the original data.
  • We use the data in [1] (also here and in the LiteratureSearch folder) to reproduce results in the papers.
  • Associated Repos
    Previous work
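
The dimensions listed above can serve as a quick sanity check after loading any of the three data versions. The following is a minimal sketch, not code from this repository: the dictionary keys, the function names, and the samples-as-rows orientation are all assumptions made for illustration, and loading the files is left to the reader.

```python
# Expected (samples, genes) shapes, copied from the dataset list in this README.
# The key names are hypothetical labels chosen for this sketch.
EXPECTED_SHAPES = {
    "original_train": (38, 7129),
    "original_test": (34, 7129),
    "v2_leukemia": (72, 3571),
    "v3_train": (38, 3051),
    "v3_test": (34, 3051),
}


def shape_of(matrix):
    """Return (rows, columns) of a list-of-lists matrix, rejecting ragged input."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("matrix rows have unequal lengths")
    return (n_rows, n_cols)


def check_shape(name, matrix, allow_transposed=False):
    """Check a loaded matrix against the shape listed in the README.

    Assumes samples are rows. Set allow_transposed=True to also accept a
    genes-as-rows layout. Returns the expected (samples, genes) shape.
    """
    expected = EXPECTED_SHAPES[name]
    actual = shape_of(matrix)
    if actual == expected:
        return expected
    if allow_transposed and actual == (expected[1], expected[0]):
        return expected
    raise ValueError(f"{name}: expected {expected}, got {actual}")


def samples_add_up():
    """Training plus testing samples should equal the 72 samples in Version 2."""
    n_train = EXPECTED_SHAPES["original_train"][0]
    n_test = EXPECTED_SHAPES["original_test"][0]
    return n_train + n_test == EXPECTED_SHAPES["v2_leukemia"][0]
```

A call such as check_shape("v3_train", matrix, allow_transposed=True) accepts either orientation, which can help when a data source stores genes as rows rather than samples.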

If you have any questions, please contact us and


This repo contains supporting documents for the manuscript "Defining the AIM: An Abstraction for Improving Machine Learning Prediction"






