Predict whether a student will correctly answer a problem based on past performance using automated feature engineering

bukosabino/predict-correct-answer

 
 

Making predictions from a DataShop dataset

In this tutorial, we show how to predict whether a student will successfully answer a problem using a dataset from CMU DataShop. While online courses are logistically efficient, their structure can make it more difficult for a teacher to understand how students in the class are learning. To try to fill in those gaps, we can apply machine learning.

However, building an accurate machine learning model requires extracting information called features. Finding the right features is crucial both for reaching a satisfactory answer and for interpreting the dataset as a whole. Featuretools simplifies the process of feature engineering.

If you're running the notebook yourself, please download the geometry dataset into the data folder in this repository. You will only need the .txt file. The infrastructure in the notebook will work with any DataShop dataset, but you will need to change the filename to that of the dataset you'd like to load.
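
Before running the notebook, it can help to peek at the export file. The sketch below is not the notebook's loading code. It assumes the DataShop .txt export is tab-delimited with a header row, and the column names in the sample are illustrative assumptions, so check them against your own file.

```python
import csv
import io

# Tiny in-memory stand-in for a DataShop export.
# The column names here are assumptions; a real export may differ.
SAMPLE_EXPORT = (
    "Anon Student Id\tProblem Name\tStep Name\tOutcome\tDuration (sec)\n"
    "stu_1\tPROBLEM_A\tstep_1\tCORRECT\t12.5\n"
    "stu_1\tPROBLEM_A\tstep_2\tINCORRECT\t30.0\n"
    "stu_2\tPROBLEM_B\tstep_1\tCORRECT\t8.0\n"
)


def read_datashop_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = read_datashop_export(io.StringIO(SAMPLE_EXPORT))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

To inspect a real download, open the .txt file you placed in the data folder (with `newline=""`, as the `csv` module recommends) and pass the file handle to `read_datashop_export` instead of the sample string. All values come back as strings, so durations need converting before any arithmetic.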

Highlights

  • Show how to import a DataShop dataset into Featuretools
  • Demonstrate efficacy of automatic feature generation by training a machine learning model
  • Give an example of how Featuretools can reveal and help answer interesting questions

Here is a plot of two automatically generated features:

Example image

This plot shows the average time spent on a problem against the success rate for that problem. There is an interactive version of this plot which lets you hover over individual points to see the problem and problem step. Notice that, in this dataset, problems that take longer consistently have a lower success rate.
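
For intuition, the two quantities in the plot can be computed by hand as a per-problem mean. The sketch below is a hand-written aggregation on made-up rows, not Featuretools output; the function name `summarize_by_problem` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (problem name, seconds spent, answered correctly?)
attempts = [
    ("PROBLEM_A", 10.0, True),
    ("PROBLEM_A", 20.0, True),
    ("PROBLEM_B", 40.0, False),
    ("PROBLEM_B", 60.0, True),
]


def summarize_by_problem(rows):
    """Return the mean time and the fraction of correct answers per problem."""
    grouped = defaultdict(list)
    for problem, seconds, correct in rows:
        grouped[problem].append((seconds, correct))

    summary = {}
    for problem, values in grouped.items():
        summary[problem] = {
            "mean_seconds": mean(s for s, _ in values),
            # Treat each correct answer as 1.0 so the mean is a success rate.
            "success_rate": mean(1.0 if c else 0.0 for _, c in values),
        }
    return summary


summary = summarize_by_problem(attempts)
for problem, stats in summary.items():
    print(problem, stats)
```

Each problem then becomes one point in a scatter plot, with `mean_seconds` on one axis and `success_rate` on the other.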

Running the tutorial

  1. Clone the repo

    git clone https://github.com/Featuretools/predict-correct-answer.git
    
  2. Install the requirements

    pip install -r requirements.txt
    

    You will also need to install graphviz for this demo. Please install it according to the instructions in the Featuretools documentation.

  3. Download the data

    You can download the geometry dataset from the DataShop website (free account required). Follow these instructions to download the data. Take the .txt file from the zipped download and place it in the data folder in this repository.

  4. Run the Tutorial notebook:

    jupyter notebook
    

    Note: The notebook relies on a datashop_to_entityset function which is described in depth in the entityset_function notebook.
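
If you prefer to script step 3, the sketch below copies the .txt files out of the zipped download into the data folder. It is a minimal example using only the standard library; the function name `extract_txt_files` and the archive filename in the usage note are hypothetical, and it assumes the export is an ordinary zip archive.

```python
import zipfile
from pathlib import Path


def extract_txt_files(zip_path, data_dir="data"):
    """Copy every .txt member of the archive into data_dir, ignoring subfolders."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Skip directory entries and anything that is not a .txt file.
            if member.endswith("/") or not member.lower().endswith(".txt"):
                continue
            # Keep only the file name so the file lands directly in data_dir.
            target = data_dir / Path(member).name
            target.write_bytes(archive.read(member))
            extracted.append(target)
    return extracted
```

For example, `extract_txt_files("my_datashop_download.zip")` (a placeholder filename) returns the list of paths written under `data/`. Files with the same name in different subfolders of the archive would overwrite each other, which is acceptable here because only one .txt file is needed.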

Feature Labs

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to help@featurelabs.com
