Binary Classification with Apache Spark / HDFS

data source

The goal of the competition is to predict which parts will fail quality control.

My goal is to use the Hadoop ecosystem to handle a large dataset and establish a machine learning pipeline.

munge :

  • Aggregate columns using RDD transformations
  • Create a column that flags which of those column aggregations are outliers
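
The two munge steps above might look roughly like the sketch below. It is not the repository's actual code: the row layout `(part_id, v1, v2, ...)`, the choice of mean and max as aggregations, and the "more than k standard deviations from the global mean" outlier rule are all assumptions made for illustration. The helper names `aggregate_row`, `flag_outlier`, and `munge` are hypothetical.

```python
def aggregate_row(row):
    """Collapse one part's measurements into (part_id, mean, max).

    Assumed row layout: (part_id, v1, v2, ...), with None for missing values.
    """
    part_id = row[0]
    values = [v for v in row[1:] if v is not None]
    if not values:
        return (part_id, None, None)
    return (part_id, sum(values) / len(values), max(values))


def flag_outlier(agg, mu, sigma, k=3.0):
    """Append 1 if the row mean lies more than k * sigma from mu, else 0."""
    part_id, row_mean, row_max = agg
    is_outlier = (
        row_mean is not None and sigma > 0 and abs(row_mean - mu) > k * sigma
    )
    return (part_id, row_mean, row_max, int(is_outlier))


def munge(rdd, k=3.0):
    """Aggregate each row, then add an outlier indicator column.

    `rdd` is assumed to be a PySpark RDD of tuples in the layout above.
    """
    aggregated = rdd.map(aggregate_row).cache()
    # Global statistics over the per-row means, ignoring empty rows.
    means = aggregated.map(lambda r: r[1]).filter(lambda m: m is not None)
    mu, sigma = means.mean(), means.stdev()
    return aggregated.map(lambda r: flag_outlier(r, mu, sigma, k))
```

Keeping the per-row logic in plain functions means it can be checked without a cluster, while `munge` only wires those functions into `map` calls.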

fit_predict :

  • Model the data with Spark's machine learning package
  • Predict on test data
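
As a rough illustration of the fit and predict steps, the sketch below builds a `pyspark.ml` pipeline. The README does not say which estimator the project uses, so logistic regression is an assumption here, as are the column names and the hypothetical function name `fit_predict`. It also assumes the feature columns are numeric and contain no nulls.

```python
def fit_predict(train_df, test_df, feature_cols, label_col="label"):
    """Fit a binary classifier on train_df and score test_df.

    Both arguments are assumed to be Spark DataFrames; train_df must
    contain `label_col` with values 0 or 1.
    """
    # Imported lazily so this module loads without a Spark install.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Spark ML estimators expect a single vector column of features.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    # Logistic regression is an assumed choice, not the repository's.
    classifier = LogisticRegression(featuresCol="features", labelCol=label_col)

    model = Pipeline(stages=[assembler, classifier]).fit(train_df)
    # transform() appends "prediction" and "probability" columns.
    return model.transform(test_df)
```

Wrapping the assembler and the estimator in one `Pipeline` keeps the feature preparation identical between training and prediction.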

munge_fit_predict :

  • Run this as-is to try the pipeline on the toy dataset example
