
Supervised Learning Algorithm Comparisons


This project aims to replicate and extend the comprehensive evaluation of supervised learning algorithms, inspired by the study conducted by Rich Caruana and Alexandru Niculescu-Mizil (CNM06). While the original study compared ten algorithms, this project focuses on three popular ones.

Background

The last extensive evaluation of supervised learning algorithms before CNM06 was conducted in the 1990s. Since then, the landscape of machine learning has evolved significantly. This project revisits the topic, drawing inspiration from the CNM06 study.

Objective

Replicate the results of the CNM06 study, focusing on three algorithms: k-nearest neighbors, logistic regression, and decision trees. Each algorithm is evaluated using several performance metrics.

Original Study

An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics (Empirical Comparison)

Data Sets Used

Methodology

The methodology closely follows the original Cornell study (Empirical Comparison). Three datasets from the UCI Machine Learning Repository were chosen. Each dataset underwent preprocessing as described in the CNM06 paper. For each classifier-dataset combination, three trials were conducted, totaling 27 trials. The hyperparameter tuning process and specific settings for each algorithm are detailed in the CNM06 paper.
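As a rough illustration of this setup, the sketch below simply enumerates the grid of runs: three classifiers, three datasets, and three trials per combination, for 27 trials in total. The dataset labels here are placeholders, not the project's actual UCI dataset names.

```python
from itertools import product

# Hypothetical labels for the grid of runs; the actual UCI datasets are
# listed under "Data Sets Used" in this README.
classifiers = ["knn", "logistic_regression", "decision_tree"]
datasets = ["dataset_1", "dataset_2", "dataset_3"]
trials_per_combination = 3

runs = list(product(classifiers, datasets, range(trials_per_combination)))
print(len(runs))  # 3 classifiers x 3 datasets x 3 trials = 27 trials
```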

Experiment

For each dataset, a training set of 5,000 examples was drawn and 5-fold cross-validation was performed on it. Each algorithm was wrapped in a pipeline, and the best hyperparameters were selected based on cross-validation performance. The accuracy of each tuned model was computed for each trial, and overall performance was determined by averaging the accuracies across trials. Detailed performance metrics and comparisons can be found in the provided charts.
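A minimal sketch of one such trial with scikit-learn, assuming a generic feature matrix `X` and label vector `y`. The hyperparameter grids below are illustrative placeholders; the settings actually used follow the CNM06 paper.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative hyperparameter grids (not the project's exact settings).
models = {
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [1, 5, 11, 21, 51]}),
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.01, 0.1, 1, 10, 100]}),
    "tree": (DecisionTreeClassifier(), {"clf__max_depth": [None, 5, 10, 20]}),
}

def run_trial(X, y, seed):
    """One trial: 5,000-example training set, 5-fold CV tuning, test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=5000, random_state=seed, stratify=y
    )
    scores = {}
    for name, (estimator, grid) in models.items():
        pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
        search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
        search.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, search.predict(X_test))
    return scores

# Per the README, each combination is run for three trials (different splits)
# and the reported accuracy is the average across trials, e.g.:
# trial_scores = [run_trial(X, y, seed) for seed in range(3)]
# mean_acc = {m: sum(s[m] for s in trial_scores) / 3 for m in models}
```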

Results

How to Run Locally

  1. Visit Google Colab (an internet connection and a Google account are required).
  2. In the dialog that opens, select the "GitHub" tab and paste this repository's URL.
  3. Download the desired dataset(s) and upload them to Google Colab using the "Files" icon in the left sidebar.
  4. Click "Runtime" and select "Run all".
