Predict how many medals a country will win at the Olympics based on past performance using automated feature engineering


Investigating Medals at the Olympic Games using Featuretools


Goals | Installation | Featuretools Basics | Baselines using Featuretools | Results

Overview of Results

We predict the medals won at various points throughout history. A baseline that uses only the average number of medals won achieves an average AUC score of 0.74. With automated feature engineering, we can generate hundreds of features and improve the average score to 0.95. Because the model is so accurate, we can see clear evidence of historical events that occur outside of our data.
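For readers unfamiliar with the metric, here is a minimal sketch of how an AUC score can be computed from predicted scores and true labels. It uses the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive example receives the higher score). The function name `roc_auc` and the sample data are made up for illustration; the notebooks may well compute the score with a library routine instead.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 means "won more than 10 medals", 0 means "did not".
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the jump from 0.74 to 0.95 is substantial.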

Goals

Featuretools is a framework to perform automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning.

The notebooks here show how Featuretools:

  • Simplifies data science-related code
  • Enables us to ask innovative questions
  • Avoids classic label-leakage problems
  • Exhaustively generates hundreds of features
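To make the label-leakage point concrete, the following is a rough sketch, in plain Python rather than Featuretools itself, of computing several aggregate features for a country using only medals awarded before a cutoff year. The `Medal` record, the `features_at_cutoff` helper, the sample rows, and the dictionary keys are all hypothetical; Featuretools automates this kind of time-aware aggregation and produces far more features.

```python
from collections import namedtuple

# Hypothetical record type: one row per medal won.
Medal = namedtuple("Medal", ["country", "year", "sport", "athlete"])


def features_at_cutoff(medals, country, cutoff_year):
    """Aggregate a country's medals using only rows strictly before the cutoff."""
    past = [m for m in medals if m.country == country and m.year < cutoff_year]
    years = {m.year for m in past}
    return {
        "COUNT(medals)": len(past),
        "NUM_UNIQUE(medals.sport)": len({m.sport for m in past}),
        "NUM_UNIQUE(medals.year)": len(years),
        "MAX(medals.year)": max(years) if years else None,
    }


# Made-up sample data.
medals = [
    Medal("USA", 1996, "Swimming", "A"),
    Medal("USA", 1996, "Athletics", "B"),
    Medal("USA", 2000, "Swimming", "C"),
    Medal("FRA", 2000, "Fencing", "D"),
]

# Features for predicting the 2000 Games must not see the 2000 medals.
feats = features_at_cutoff(medals, "USA", 2000)
print(feats)
```

Because the filter excludes every row at or after the cutoff, the medal we are trying to predict can never leak into the features.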

We do so by investigating the medals won by each country at each historical Olympic Games (dataset pulled from Kaggle). The dataset contains each medal won at each Olympic Games, including the medaling athlete, their gender, and their country and sport.

I'll build a model using Featuretools that predicts whether or not a country will win more than 10 medals at the next Olympics. While it's possible to have some predictive accuracy without machine learning, feature engineering is necessary to improve the score.
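As a rough illustration of the prediction target, the sketch below builds a binary label for every (country, year) pair from a list of medal rows. The `make_labels` helper, the input layout, and the sample rows are assumptions made for this example; the notebooks may construct labels differently.

```python
from collections import Counter

THRESHOLD = 10  # "more than 10 medals", as stated above


def make_labels(medal_rows, threshold=THRESHOLD):
    """medal_rows: iterable of (country, year) tuples, one tuple per medal won.

    Returns {(country, year): True if the country won more than `threshold` medals}.
    Countries with no medals at a given Games get a False label for that year.
    """
    counts = Counter(medal_rows)
    countries = {country for country, _ in counts}
    years = sorted({year for _, year in counts})
    return {(c, y): counts[(c, y)] > threshold for c in countries for y in years}


# Made-up sample data.
rows = [("USA", 2000)] * 12 + [("FRA", 2000)] * 3 + [("FRA", 2004)] * 11
labels = make_labels(rows)
print(labels)
```

Note the strict inequality: a country with exactly 10 medals is labeled False.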

Installation

pip install -r requirements.txt

The Olympic Games dataset is available from Kaggle. Copy the three CSV files into the data/olympic_games_data directory.

Detailed Description of Notebooks

Featuretools Basics: FeaturetoolsPredictiveModeling.ipynb

In this notebook, I'll explain how to use out-of-the-box methods from Featuretools to transform the raw Olympics dataset into a machine-learning-ready feature matrix. Along the way, I'll build a machine learning model and explore which features were the most predictive.

Baselines using Featuretools: BaselineSolution.ipynb

Machine learning performance scores should never be taken at face value. To have any merit, they must be compared against a simple baseline model to see how much improvement they produced. In this notebook, I'll construct a baseline solution leveraging Featuretools to easily build a custom feature.
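One simple baseline in this spirit scores each country by the mean number of medals it won at earlier Games. The sketch below is a plain-Python illustration under that assumption, not the notebook's actual implementation; `baseline_score`, the `history` layout, and the medal counts are hypothetical.

```python
def baseline_score(history, country, year):
    """Mean medal count over Games strictly before `year`; 0.0 with no history.

    history: {(country, year): medal_count}
    """
    past = [n for (c, y), n in history.items() if c == country and y < year]
    return sum(past) / len(past) if past else 0.0


# Made-up medal counts.
history = {
    ("USA", 1992): 100,
    ("USA", 1996): 110,
    ("USA", 2000): 90,
    ("FRA", 1996): 8,
    ("FRA", 2000): 12,
}

score_usa = baseline_score(history, "USA", 2000)  # uses 1992 and 1996 only
score_fra = baseline_score(history, "FRA", 2004)  # uses 1996 and 2000
print(score_usa, score_fra)
```

These scores can be ranked against the true labels to obtain a baseline AUC, which gives the machine learning model a concrete number to beat.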

Feature Labs


Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to help@featurelabs.com.