Making predictions from a DataShop dataset


In this tutorial, we show how to predict whether a student will successfully answer a problem using a dataset from CMU DataShop. While online courses are logistically efficient, their structure can make it more difficult for a teacher to understand how students in the class are learning. To try to fill in those gaps, we can apply machine learning.

However, building an accurate machine learning model requires extracting information called features. Finding the right features is a crucial component of both finding a satisfactory answer and of interpreting the dataset as a whole. The process of feature engineering is made simple by Featuretools.

If you're running the notebook yourself, please download the geometry dataset into the data folder in this repository. You will only need the .txt file. The infrastructure in the notebook will work with any DataShop dataset, but you will need to change the filename to that of the dataset you'd like to load.


  • Show how to import a DataShop dataset into Featuretools
  • Demonstrate efficacy of automatic feature generation by training a machine learning model
  • Give an example of how Featuretools can reveal and help answer interesting questions

Here is a plot of two automatically generated features:

Example image

The plot shows the average time spent on a problem against the success rate on that problem. There is an interactive version of this plot which lets you hover over individual points to see the problem and problem step. Notice that, in this dataset, problems that take longer have a consistently lower success rate.
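To make the two plotted quantities concrete, here is a minimal sketch of how a per-problem average time and success rate could be computed by hand from tab-delimited rows. It is not the Featuretools code used in the notebook. The column names (`Problem Name`, `Outcome`, `Duration (sec)`) and the sample rows are assumptions made up for illustration, so check them against the header of your own export.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; the column names are assumed and are
# not taken from the geometry dataset.
SAMPLE = (
    "Problem Name\tStep Name\tOutcome\tDuration (sec)\n"
    "P1\tS1\tCORRECT\t10\n"
    "P1\tS2\tINCORRECT\t30\n"
    "P2\tS1\tCORRECT\t5\n"
    "P2\tS2\tCORRECT\t15\n"
)


def summarize_problems(handle):
    """Return {problem: (mean duration in seconds, fraction correct)}."""
    totals = defaultdict(lambda: {"n": 0, "time": 0.0, "correct": 0})
    for row in csv.DictReader(handle, delimiter="\t"):
        stats = totals[row["Problem Name"]]
        stats["n"] += 1
        stats["time"] += float(row["Duration (sec)"])
        # A boolean adds as 0 or 1, so this counts the correct attempts.
        stats["correct"] += row["Outcome"] == "CORRECT"
    return {
        problem: (s["time"] / s["n"], s["correct"] / s["n"])
        for problem, s in totals.items()
    }


summary = summarize_problems(io.StringIO(SAMPLE))
for problem, (mean_time, success_rate) in sorted(summary.items()):
    print(f"{problem}: mean time {mean_time:.1f}s, success rate {success_rate:.2f}")
```

Each resulting pair corresponds to one point on a plot like the one above: mean time on one axis, success rate on the other.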

Running the tutorial

  1. Clone the repo

    git clone
  2. Install the requirements

    pip install -r requirements.txt

    You will also need to install graphviz for this demo. Please install it according to the instructions in the Featuretools documentation.

  3. Download the data

    You can download the geometry dataset from the DataShop website (free account required). Follow these instructions to download the data. Take the .txt file from the zipped download and place it in the data folder in this repository.

  4. Run the Tutorial notebook:

    jupyter notebook

    Note: The notebook relies on a datashop_to_entityset function which is described in depth in the entityset_function notebook.
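Before launching the notebook, it can help to confirm that the export landed where step 3 says it should. The sketch below is a hypothetical helper, not part of this repository: `find_datashop_export` and `read_header` are illustrative names, and the code assumes the export is a tab-delimited text file with a header row. The demo runs against a temporary directory with a made-up header so that it is self-contained; point it at your own data folder instead.

```python
import csv
import tempfile
from pathlib import Path


def find_datashop_export(data_dir):
    """Return the only .txt file in data_dir; raise if there is not exactly one."""
    candidates = sorted(Path(data_dir).glob("*.txt"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one .txt file in {data_dir}, found {len(candidates)}"
        )
    return candidates[0]


def read_header(path):
    """Return the column names from the first line of a tab-delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"))


# Demo on a throwaway directory with an invented header (assumed columns).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "example_export.txt"
    demo.write_text("Anon Student Id\tProblem Name\tOutcome\n", encoding="utf-8")
    export = find_datashop_export(tmp)
    columns = read_header(export)

print(export.name, columns)
```

If the header prints as a single long string rather than separate column names, the file is probably not tab-delimited and the delimiter assumption needs adjusting.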

Feature Labs


Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.


Any questions can be directed to


Predict whether a student will correctly answer a problem based on past performance using automated feature engineering
