
Predict how many medals a country will win at the Olympics using automated feature engineering


We will investigate the medals won by each country at each historical Olympic Games (dataset pulled from Kaggle). The dataset records every medal won at each Olympic Games, along with the medal-winning athlete, their gender, their country, and their sport.

We will generate a model using Featuretools that predicts whether or not a country will win more than 10 medals at the next Olympics. While it's possible to get some predictive accuracy without machine learning, feature engineering is necessary to improve the score.
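As a rough illustration of the prediction target, the sketch below turns per-medal records into a "more than 10 medals at the next Games" label for each country. The record layout (a `(year, country)` pair per medal), the sample values, and the helper name `make_labels` are assumptions made for this example; they are not taken from the notebooks or the Kaggle files.

```python
from collections import Counter

# Hypothetical records: one (year, country) pair per medal won.
# The real dataset has more columns (athlete, gender, sport, ...).
medal_records = (
    [(1996, "USA")] * 12
    + [(1996, "CUB")] * 3
    + [(2000, "USA")] * 11
    + [(2000, "CUB")] * 10
)


def make_labels(records, next_year, threshold=10):
    """Label each country True if it wins more than `threshold` medals in `next_year`."""
    counts = Counter(records)  # (year, country) -> number of medals
    countries = sorted({country for _, country in records})
    return {c: counts[(next_year, c)] > threshold for c in countries}


labels_2000 = make_labels(medal_records, next_year=2000)
```

The comparison is strict, so a country with exactly 10 medals gets a negative label.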


  • We make predictions for the medals won at various points throughout history. Using just the average number of medals won as the predictor gives an average AUC score of 0.79.
  • Using automated feature engineering to generate hundreds of features improves the score to 0.95 on average.
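To make the baseline concrete, here is a minimal sketch of scoring countries by their average past medal count and measuring that score with AUC. The country codes and medal counts are invented for illustration, and `roc_auc` is a small pairwise implementation written for this example; the notebooks may compute the metric differently (for instance with scikit-learn).

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical medal counts: past Games (before the cutoff) and the next Games.
history = {
    "AAA": [30, 25, 28],
    "BBB": [12, 9, 14],
    "CCC": [2, 0, 1],
    "DDD": [8, 15, 10],
}
next_games = {"AAA": 27, "BBB": 8, "CCC": 3, "DDD": 12}

countries = sorted(history)
# Baseline score: the average number of medals won at past Games.
baseline_scores = [sum(history[c]) / len(history[c]) for c in countries]
# Target: did the country win more than 10 medals at the next Games?
labels = [next_games[c] > 10 for c in countries]

baseline_auc = roc_auc(labels, baseline_scores)
```

Repeating this at several cutoff points and averaging the resulting AUC values is one way to arrive at a single "average AUC" figure like the ones quoted above.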

Running the tutorial

  1. Clone the repo

    git clone
  2. Install the requirements

    pip install -r requirements.txt

    You will also need to install graphviz for this demo. Please install graphviz according to the instructions in the Featuretools Documentation

  3. Download the data

    You can download the data directly from Kaggle.

    After downloading the data, copy the three CSV files into the data/olympic_games_data/ directory in the root of this repository.

  4. Run the notebooks:

    jupyter notebook

Feature Labs


Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.


Any questions can be directed to








