
JeffT13/binaryclassification


binaryclassification

This is a project done for DS-GA 1001 on the MAGIC Gamma Telescope dataset.

Collaborators: Jeffrey Tumminia

Goal: Build a classifier for identifying gamma rays from noisy simulated data.

Learning Objective: Compare and contrast different supervised classification algorithms in terms of effectiveness and efficiency.

Challenges:

  • Difficult classification dataset
  • Limited computational resources
  • Limited subject-matter knowledge for feature analysis

Procedure:

DataView

  1. Process the CSV dataset and create stratified training/validation/test splits.
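A stratified split keeps the gamma/background class ratio the same in every subset. The sketch below is a dependency-free illustration, not the project's code: the function name `stratified_split`, the 60/20/20 fractions, and the `g`/`h` labels are assumptions. With scikit-learn, the usual route is calling `train_test_split(..., stratify=y)` twice.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists whose class proportions match `labels`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    # Group row indices by class label so each class is split separately.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(len(idx) * fractions[0])
        n_val = round(len(idx) * fractions[1])
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test

# Hypothetical usage on a 65/35 label vector.
train_idx, val_idx, test_idx = stratified_split(["g"] * 65 + ["h"] * 35)
```

Splitting per class, rather than shuffling all rows at once, is what guarantees each subset keeps the original class balance.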

FeatureAnalysis

  1. Compare the relevance and information gain of features using a decision tree (DT), univariate ROC analysis, and a correlation matrix.
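The three analyses can be illustrated with small, standard-library helpers. This is a hedged sketch, not the repository's code. The per-feature AUC uses the rank-sum (Mann-Whitney U) identity, and the threshold-split information gain stands in for the DT-based relevance measure. All function names here are hypothetical, and `"g"` is assumed to be the positive label.

```python
import math
from collections import Counter

def univariate_auc(values, labels, positive="g"):
    """ROC AUC of one feature used directly as a score (rank-sum / Mann-Whitney U identity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at i; give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == positive)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`, as a decision-tree node would."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping feature name -> list of values."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}
```

An AUC far from 0.5 in either direction signals an informative feature, so ranking by `max(auc, 1 - auc)` treats inversely related features fairly.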

FeatureEngineering

  1. Build non-linear combinations of features and run the same feature analysis on the new features.
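The README does not list which combinations were tried. The following is an assumed example that generates pairwise products, pairwise ratios, and `log1p(|x|)` transforms. The helper name and the `eps` guard are hypothetical. `fLength` and `fWidth` are used only as sample column names from the MAGIC data description.

```python
import itertools
import math

def pairwise_nonlinear_features(columns, eps=1e-9):
    """Build product, ratio, and log features from a dict mapping feature name -> list of floats."""
    new = {}
    for a, b in itertools.combinations(sorted(columns), 2):
        xa, xb = columns[a], columns[b]
        new[f"{a}*{b}"] = [u * v for u, v in zip(xa, xb)]
        # Denominators with magnitude <= eps are replaced by eps so the ratio stays finite.
        new[f"{a}/{b}"] = [u / (v if abs(v) > eps else eps) for u, v in zip(xa, xb)]
    for name in sorted(columns):
        # log1p(|x|) compresses heavy right tails; the sign of x is discarded.
        new[f"log1p(|{name}|)"] = [math.log1p(abs(v)) for v in columns[name]]
    return new

feats = pairwise_nonlinear_features({"fLength": [2.0, 4.0], "fWidth": [1.0, 2.0]})
```

With n input features, this produces n(n-1)/2 products, n(n-1)/2 ratios, and n log features. The count grows quickly, which matters under limited compute.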

Baseline Models

  1. Build decision tree (DT), random forest (RF), logistic regression (LR), SVM, gradient boosting, and k-nearest neighbors (KNN) models as baselines for comparison.
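To compare baselines on both effectiveness and efficiency, it helps to record accuracy and fit/predict time for every model. The harness below is a sketch that assumes each model exposes scikit-learn-style `fit`/`predict` methods. Estimators such as `DecisionTreeClassifier`, `RandomForestClassifier`, `LogisticRegression`, `SVC`, `GradientBoostingClassifier`, and `KNeighborsClassifier` follow that interface, but the README does not confirm which library was used. The two toy models here are stand-ins so the example runs without dependencies.

```python
import time
from collections import Counter

class MajorityClass:
    """Stand-in baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class OneNearestNeighbor:
    """Stand-in for KNN with k=1, using squared Euclidean distance."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in self.X_]
            preds.append(self.y_[dists.index(min(dists))])
        return preds

def compare_baselines(models, X_train, y_train, X_test, y_test):
    """Fit each model; record test accuracy plus fit and predict wall-clock seconds."""
    results = {}
    for name, model in models.items():
        t0 = time.perf_counter()
        model.fit(X_train, y_train)
        t1 = time.perf_counter()
        preds = model.predict(X_test)
        t2 = time.perf_counter()
        acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        results[name] = {"accuracy": acc, "fit_s": t1 - t0, "predict_s": t2 - t1}
    return results
```

A majority-class model is a useful floor: any real baseline should beat the accuracy it gets from class imbalance alone.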

CrossVal & Parameter Tuning

  1. Tune hyperparameters for DT, RF, and LR with cross-validation.
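Cross-validated tuning scores each candidate hyperparameter by its mean held-out accuracy over k folds and then keeps the best one. The sketch tunes the `k` of a tiny k-NN as a self-contained stand-in. For DT, RF, and LR, the tuned parameters would instead be something like tree depth, number of trees, or regularization strength, and scikit-learn's `GridSearchCV` automates the loop. All names below are hypothetical, and the folds are not stratified, unlike `StratifiedKFold`.

```python
import random
from collections import Counter

def kfold_indices(n, n_folds, seed=0):
    """Shuffle range(n) and deal it into n_folds folds of near-equal size (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training points nearest to x (squared Euclidean distance)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(x, X_train[i]))
    nearest = sorted(range(len(X_train)), key=dist)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of k-NN across the folds."""
    scores = []
    for held_out in kfold_indices(len(X), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        correct = sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def grid_search_cv(X, y, grid, n_folds=5):
    """Score every candidate k; return (best_k, {k: mean CV accuracy})."""
    results = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    best_k = max(results, key=results.get)  # ties resolve to the earliest candidate in grid
    return best_k, results
```

The test split stays untouched during this loop. Only the training data is folded, so the final test score remains an unbiased estimate.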

Tuned+Voter

  1. Show the performance of the tuned DT and LR models, and build a voting classifier to improve overall classification.
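A hard-voting ensemble predicts, for each sample, the label chosen by most of its member models, optionally with a weight per model. The sketch below assumes hard voting over already-computed predictions. The README does not say whether hard or soft voting was used. scikit-learn's `VotingClassifier(voting="soft")` would instead average predicted probabilities. The function name and the tie rule are illustrative choices.

```python
from collections import Counter

def hard_vote(prediction_lists, weights=None):
    """Combine per-model predictions by (optionally weighted) majority vote."""
    weights = weights or [1.0] * len(prediction_lists)
    combined = []
    for votes in zip(*prediction_lists):
        tally = Counter()
        for label, w in zip(votes, weights):
            tally[label] += w
        # Ties go to the label first voted for by the earliest model in the list.
        combined.append(max(tally, key=tally.get))
    return combined

# Hypothetical predictions from three tuned models on four samples.
dt = ["g", "g", "h", "h"]
lr = ["g", "h", "h", "g"]
knn = ["h", "g", "h", "g"]
ensemble = hard_vote([dt, lr, knn])
```

Voting helps most when members make different mistakes. Three copies of the same model would gain nothing.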

See the paper for more details and references:

About

Using the MAGIC dataset to test supervised binary classification models
