I took part in an afternoon hacking challenge with two teammates in which we had to build a model that predicts salary information from 1994 US Census data. We were allowed to use only 3 features. After 4 hours, we had an XGBoost model that scored 84% accuracy.

thedatasleuth/Fast-Good-Cheap

Title: Data Science Venn Diagram
Type: exercise
Creator: Alexander Combs (NYC)

Data Science Venn Diagram

This afternoon we are going to have a team-based competition. The goal is to create the best-performing model on a hold-out sample of data. Simple, right?

Well, there is a catch.

This will be a constrained optimization. To understand what that means, let's take a look at the Project Management Venn Diagram, below.

The idea is that for any project you can have any two of these. You can have good work done cheap, but it will take a long time. You can have good work done fast, but it won't be cheap. Or you can have work done fast and on the cheap, but it won't be good.

Today we will apply this concept to data science.

You will be given a dataset and teams will be randomly assigned to one constraint: samples, features or algorithm.

Team Samples

  • Your choice of algorithm
  • Your choice of features
  • The class's sample constraint

Team Features

  • Your choice of algorithm
  • The class's feature constraint
  • Your choice of samples

Team Algorithm

  • The class's algorithm constraint
  • Your choice of features
  • Your choice of samples

Be aware that the class's constraint, while not the worst-case scenario, will be highly unfavorable to you.

Your team will have until 5:30pm to build the best model possible under those constraints.

The feature data can be downloaded here and the target can be downloaded here. Descriptions of the data can be found here and here.

At 5:10, you will be given a hold-out test set. Return your predictions for each row in that set via Slack (@adam.blomberg). The file should be a CSV with a single column of 1s and 0s and no header.
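As a minimal sketch of the submission format described above, the following writes one prediction per line with no header. The function name `write_predictions` and the file name in the usage note are illustrative assumptions, not part of the exercise.

```python
import csv
from pathlib import Path


def write_predictions(predictions, path):
    """Write one 0/1 prediction per row to a single-column CSV with no header.

    `predictions` is any iterable of values equal to 0 or 1 (ints, bools,
    or NumPy integers). This helper is a hypothetical illustration.
    """
    rows = []
    for p in predictions:
        # Reject anything that is not exactly 0 or 1 (e.g. probabilities).
        if p not in (0, 1):
            raise ValueError(f"prediction must be 0 or 1, got {p!r}")
        rows.append([int(p)])

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return Path(path)
```

For example, `write_predictions([1, 0, 1], "predictions.csv")` produces a three-line file containing `1`, `0`, and `1`. Rejecting non-binary values up front guards against accidentally submitting predicted probabilities instead of class labels.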

The task is to predict if a person's income is in excess of $50,000 given certain profile information.
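To make the target concrete, here is a small sketch of how an income value maps to the 1/0 label and how predictions could be scored. The exercise does not state the scoring metric; accuracy is assumed here because the repository description reports an accuracy figure. The names `to_label` and `accuracy` are hypothetical, and if the downloaded target is already encoded as 0/1, only the scoring function applies.

```python
def to_label(income, threshold=50_000):
    """Return 1 if income is strictly in excess of the threshold, else 0."""
    return 1 if income > threshold else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Illustrative values only, not taken from the census data.
incomes = [32_000, 50_000, 75_000]
y_true = [to_label(x) for x in incomes]   # [0, 0, 1]
y_pred = [0, 1, 1]
score = accuracy(y_true, y_pred)          # 2 of 3 correct
```

Note that an income of exactly $50,000 is labeled 0 here, since the task asks for income "in excess of" that amount.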

Good luck!
