Making predictions from a DataShop dataset
In this tutorial, we show how to predict whether a student will successfully answer a problem using a dataset from CMU DataShop. While online courses are logistically efficient, their structure can make it more difficult for a teacher to understand how students in the class are learning. To try to fill in those gaps, we can apply machine learning.
However, building an accurate machine learning model requires extracting information called features. Finding the right features is a crucial component of both finding a satisfactory answer and of interpreting the dataset as a whole. The process of feature engineering is made simple by Featuretools.
If you're running the notebook yourself, please download the geometry dataset into the data folder in this repository. You will only need the .txt file. The infrastructure in the notebook will work with any DataShop dataset, but you will need to change the filename to that of the dataset you'd like to load.
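For a quick look at the file before opening the notebook, here is a minimal sketch that reads a tab-delimited text export using only the Python standard library. It is not the notebook's own loading code. The helper name `load_datashop_rows`, the sample rows, and the column names are assumptions made for illustration, so check them against the header of the file you actually downloaded.

```python
import csv
import io


def load_datashop_rows(handle):
    """Read a tab-delimited export into a list of dicts, one per row.

    Hypothetical helper for illustration; not part of this repository.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


# Made-up sample standing in for the real export. The column names are
# assumptions and may differ from those in your download.
sample = io.StringIO(
    "Anon Student Id\tProblem Name\tStep Name\tOutcome\n"
    "S1\tP1\tstep-a\tCORRECT\n"
    "S2\tP1\tstep-a\tINCORRECT\n"
)

rows = load_datashop_rows(sample)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

To read a real file, open the .txt file from the data folder (for example with `open(path, newline="")`) and pass the handle to the same function, so that changing datasets only means changing the path.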
- Show how to import a DataShop dataset into Featuretools
- Demonstrate the efficacy of automatic feature generation by training a machine learning model
- Give an example of how Featuretools can reveal and help answer interesting questions
Here is a plot of two automatically generated features:
The plot shows the average time spent on a problem versus the success rate on that problem. There is an interactive version of this plot which lets you hover over individual points to see the problem and problem step. Notice that, for this dataset, problems that take longer have a consistently lower success rate.
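To make the two plotted quantities concrete, here is a hand-written sketch of the same kind of aggregation: the mean time and the fraction of correct answers, grouped by problem. Featuretools generates features like these automatically. This version only illustrates what they measure and is not the notebook's code. The column names `Problem Name`, `Step Duration (sec)`, and `Outcome`, the value `CORRECT`, and the sample rows are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def problem_features(rows):
    """Compute mean duration and success rate per problem.

    Hypothetical helper; column names are assumed, not taken from the repo.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Problem Name"]].append(row)

    features = {}
    for problem, group in grouped.items():
        features[problem] = {
            # Average time spent on the problem's steps, in seconds
            "mean_duration": mean(float(r["Step Duration (sec)"]) for r in group),
            # Fraction of attempts marked correct
            "success_rate": mean(
                1.0 if r["Outcome"] == "CORRECT" else 0.0 for r in group
            ),
        }
    return features


# Made-up rows for illustration only
rows = [
    {"Problem Name": "P1", "Step Duration (sec)": "10", "Outcome": "CORRECT"},
    {"Problem Name": "P1", "Step Duration (sec)": "20", "Outcome": "INCORRECT"},
    {"Problem Name": "P2", "Step Duration (sec)": "5", "Outcome": "CORRECT"},
]

features = problem_features(rows)
for problem, values in features.items():
    print(problem, values)
```

Plotting `mean_duration` against `success_rate` for every problem would give a scatter of the same shape as the one described above.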
Running the tutorial
Clone the repo
git clone https://github.com/Featuretools/predict-correct-answer.git
Install the requirements
pip install -r requirements.txt
You will also need to install graphviz for this demo. Please install it according to the instructions in the Featuretools Documentation.
Download the data
You can download the geometry dataset from the DataShop website (free account required). Follow these instructions to download the data. Take the .txt file from the zipped download and place it in the data folder in this repository.
Run the Tutorial notebook:
Note: The notebook relies on a datashop_to_entityset function, which is described in depth in the entityset_function notebook.
Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Any questions can be directed to email@example.com