Predict how many medals a country will win at the Olympics based on past performance using automated feature engineering

Investigating Medals at the Olympic Games using Featuretools


Goals | Installation | Featuretools Basics | Baselines using Featuretools | Results

Overview of Results

We make predictions for the medals won at various points throughout history. A baseline that uses only the average number of medals a country has won achieves an average AUC score of 0.74. With automated feature engineering, we can generate hundreds of features and improve the average score to 0.95. Because the model is so accurate, we can see clear evidence of historical events that occur outside of our data.
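For readers unfamiliar with the metric, here is a minimal sketch of what an AUC score measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are made up for illustration; they are not taken from the notebooks, which may compute the metric differently (for example, with a library routine).

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): 1 means the country won more than 10 medals.
labels = [1, 1, 0, 0, 1, 0]
scores = [25.0, 12.0, 3.0, 14.0, 9.0, 1.0]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the move from 0.74 to 0.95 is a substantial improvement.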


Featuretools is a framework to perform automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning.

The notebooks here show how Featuretools:

  • Simplifies data science-related code
  • Enables us to ask innovative questions
  • Avoids classic label-leakage problems
  • Exhaustively generates hundreds of features

We do so by investigating the medals won by each country at each historical Olympic Games (dataset pulled from Kaggle). The dataset contains each medal won at each Olympic Games, including the medaling athlete, their gender, and their country and sport.

I'll generate a model using Featuretools that predicts whether or not a country will win more than 10 medals at the next Olympics. While it's possible to have some predictive accuracy without machine learning, feature engineering is necessary to improve the score.
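To make the prediction target concrete, here is a rough sketch of how a "more than 10 medals" label could be derived from per-medal records. The record layout, the country codes `AAA` and `BBB`, and the helper `make_labels` are hypothetical; the notebooks build their labels from the Kaggle CSV files and may count medals differently (for instance, for team events).

```python
from collections import Counter

# Hypothetical per-medal records: (games_year, country_code, athlete, sport).
medals = (
    [(1996, "AAA", f"athlete_{i}", "Athletics") for i in range(12)]
    + [(1996, "BBB", f"athlete_{i}", "Rowing") for i in range(3)]
    + [(2000, "BBB", f"athlete_{i}", "Rowing") for i in range(11)]
)


def make_labels(medal_rows, threshold=10):
    """Map (games_year, country) to True when the medal count exceeds threshold."""
    counts = Counter((year, country) for year, country, *_ in medal_rows)
    return {key: n > threshold for key, n in counts.items()}


labels = make_labels(medals)
print(labels)
```

Note that a country with no medals at a given Games never appears in the records, so in this sketch it would need to be added separately as a negative example.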


pip install -r requirements.txt

The Olympic Games dataset is found here. Copy the three CSV files into the data/olympic_games_data directory.

Detailed Description of Notebooks

Featuretools Basics: FeaturetoolsPredictiveModeling.ipynb

In this notebook, I'll explain how to use out-of-the-box methods from Featuretools to transform the raw Olympics dataset into a machine-learning-ready feature matrix. Along the way, I'll build a machine learning model and explore which features were the most predictive.

Baselines using Featuretools: BaselineSolution.ipynb

Machine learning performance scores should never be taken at face value. To have any merit, they must be compared against a simple baseline model to see how much improvement they produced. In this notebook, I'll construct a baseline solution leveraging Featuretools to easily build a custom feature.
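As an illustration of what such a baseline might look like, the sketch below scores each country by its mean medal count at earlier Games only, so that nothing from the Games being predicted leaks into the feature. It uses plain Python rather than Featuretools, and the `history` data and the `baseline_score` helper are hypothetical; the notebook's actual custom feature may be defined differently.

```python
from statistics import mean

# Hypothetical medal totals per country, keyed by Games year.
history = {
    "AAA": {1988: 14, 1992: 9, 1996: 16},
    "BBB": {1988: 2, 1992: 4, 1996: 3},
}


def baseline_score(country_history, games_year):
    """Mean medals at Games strictly before games_year; 0.0 if there is no history."""
    past = [n for year, n in country_history.items() if year < games_year]
    return mean(past) if past else 0.0


# Score each country for the 1996 Games using only the 1988 and 1992 results.
scores_1996 = {country: baseline_score(h, 1996) for country, h in history.items()}
print(scores_1996)
```

Ranking countries by this single number and comparing the result against the true labels gives the reference point that any machine learning model has to beat.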

Feature Labs


Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.


Any questions can be directed to