Predict whether a student will correctly answer a problem based on past performance using automated feature engineering
Making predictions from a DataShop dataset

In this tutorial, we show how to predict whether a student will successfully answer a problem using a dataset from CMU DataShop. While online courses are logistically efficient, their structure can make it harder for a teacher to understand how students in the class are learning. To help fill in those gaps, we can apply machine learning.

However, building an accurate machine learning model requires extracting information called features. Finding the right features is crucial both for reaching a satisfactory answer and for interpreting the dataset as a whole. Featuretools makes the process of feature engineering simple.
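
To make the idea of a feature concrete, here is a minimal hand-written sketch of one: a student's success rate over their earlier attempts only, which is the kind of "past performance" signal the prediction relies on. The function name and the `(student_id, correct)` row layout are assumptions made for illustration; this is not code from the notebook, and Featuretools generates features like this automatically rather than by hand.

```python
from collections import defaultdict


def prior_success_rate(attempts):
    """For each attempt, return the student's success rate on earlier attempts.

    `attempts` is a list of (student_id, correct) tuples in chronological
    order. A student's first attempt has no history, so its feature is None.
    """
    seen = defaultdict(int)     # number of attempts so far, per student
    correct = defaultdict(int)  # number of correct attempts so far, per student
    features = []
    for student, is_correct in attempts:
        if seen[student]:
            features.append(correct[student] / seen[student])
        else:
            features.append(None)
        # Update the history only after the feature is computed, so the
        # current outcome never leaks into its own feature value.
        seen[student] += 1
        correct[student] += int(is_correct)
    return features


attempts = [("s1", True), ("s1", False), ("s2", False), ("s1", True), ("s2", True)]
print(prior_success_rate(attempts))
```

Updating the counters after computing each value keeps the feature restricted to information that was available before the attempt it describes.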

If you're running the notebook yourself, please download the geometry dataset into the data folder in this repository. You only need the .txt file. The infrastructure in the notebook works with any DataShop dataset, but you will need to change the filename to that of the dataset you'd like to load.

Highlights

  • Show how to import a DataShop dataset into Featuretools
  • Demonstrate efficacy of automatic feature generation by training a machine learning model
  • Give an example of how Featuretools can reveal and help answer interesting questions

Here is a plot of two automatically generated features:

Example image

This plot shows the average time spent on a problem versus the success rate on that problem. There is an interactive version of this plot which lets you hover over individual points to see the problem and problem step. Notice that, in this dataset, problems that take longer have consistently lower success rates.
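
The two plotted quantities are per-problem aggregates. As a rough sketch of what they represent, the snippet below computes a mean duration and a success rate per problem from a handful of made-up rows. The row layout `(problem, duration_seconds, correct)` and the function name are assumptions for illustration, and the sample values are invented; the notebook derives the actual features with Featuretools.

```python
from collections import defaultdict


def problem_summary(rows):
    """Return {problem: (mean_duration_seconds, success_rate)}."""
    durations = defaultdict(list)
    outcomes = defaultdict(list)
    for problem, duration, correct in rows:
        durations[problem].append(duration)
        outcomes[problem].append(1 if correct else 0)
    return {
        problem: (
            sum(durations[problem]) / len(durations[problem]),
            sum(outcomes[problem]) / len(outcomes[problem]),
        )
        for problem in durations
    }


# Invented sample rows: (problem, duration in seconds, answered correctly?)
rows = [
    ("P1", 10.0, True),
    ("P1", 20.0, True),
    ("P2", 40.0, False),
    ("P2", 60.0, True),
]
summary = problem_summary(rows)
for problem, (mean_duration, success_rate) in summary.items():
    print(problem, mean_duration, success_rate)
```

Each resulting pair is one point in a scatter plot like the one above, with mean duration on one axis and success rate on the other.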

Running the tutorial

  1. Clone the repo

    git clone https://github.com/Featuretools/predict-correct-answer.git
    
  2. Install the requirements

    pip install -r requirements.txt
    

    You will also need to install Graphviz for this demo. Please install Graphviz according to the instructions in the Featuretools Documentation.

  3. Download the data

    You can download the geometry dataset from the DataShop website (free account required). Follow these instructions to download the data. Take the .txt file from the zipped download and place it in the data folder in this repository.

  4. Run the Tutorial notebook:

    jupyter notebook
    

    Note: The notebook relies on a datashop_to_entityset function which is described in depth in the entityset_function notebook.
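
As a rough illustration of the loading step, the sketch below parses a small tab-delimited sample and pulls the distinct problems into a separate table, which is the kind of normalization an entity set is built from. It is not the repository's `datashop_to_entityset` function. The tab delimiter, the column names (`Anon Student Id`, `Problem Name`, `Outcome`), the helper names, and the sample rows are all assumptions; check them against the header of the .txt file you downloaded.

```python
import csv
import io


def read_datashop(stream):
    """Parse a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(stream, delimiter="\t"))


def split_problems(rows, key="Problem Name"):
    """Collect the distinct values of `key` into a small parent table."""
    names = sorted({row[key] for row in rows})
    return [{"problem_id": i, key: name} for i, name in enumerate(names)]


# Invented sample standing in for the downloaded .txt file.
sample = io.StringIO(
    "Anon Student Id\tProblem Name\tOutcome\n"
    "stu_1\tTRIANGLE-AREA\tCORRECT\n"
    "stu_2\tTRIANGLE-AREA\tINCORRECT\n"
    "stu_1\tCIRCLE-AREA\tCORRECT\n"
)
rows = read_datashop(sample)
problems = split_problems(rows)
print(len(rows), "rows;", len(problems), "distinct problems")
```

To try this on a real file, pass an open file handle (opened with `newline=""`) to `read_datashop` in place of the in-memory sample.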

Feature Labs

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to help@featurelabs.com
