Automated vs Manual Feature Engineering Comparison. Implemented using Featuretools.

Manual vs Automated Feature Engineering Comparison

The traditional process of manual feature engineering requires building one feature at a time by hand, informed by domain knowledge. This is tedious, time-consuming, error-prone, and, perhaps most importantly, specific to each dataset, which means the code has to be rewritten for each problem.

Automated feature engineering with Featuretools allows one to create thousands of features automatically from a set of related tables using a framework that can be easily applied to any problem.
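To make the idea of generating many features from related tables concrete, here is a minimal plain-Python sketch. It does not use the Featuretools API. The tables, column names, and the `build_features` helper are hypothetical, and the feature names are only loosely modeled on the style Featuretools uses. It applies a fixed set of aggregations to every numeric column of a child table, grouped by the parent key.

```python
from statistics import mean

# Hypothetical parent table: one row per client.
clients = [{"client_id": 1}, {"client_id": 2}]

# Hypothetical child table: many loans per client.
loans = [
    {"client_id": 1, "amount": 1000.0, "rate": 0.05},
    {"client_id": 1, "amount": 3000.0, "rate": 0.07},
    {"client_id": 2, "amount": 500.0, "rate": 0.10},
]

# Aggregation functions applied to every numeric child column.
AGGREGATIONS = {"MEAN": mean, "MAX": max, "MIN": min, "SUM": sum}


def build_features(parents, children, key, child_name, columns):
    """Return one feature dict per parent row, aggregating its child rows."""
    feature_rows = []
    for parent in parents:
        related = [c for c in children if c[key] == parent[key]]
        features = {key: parent[key], f"COUNT({child_name})": len(related)}
        for column in columns:
            values = [c[column] for c in related]
            for agg_name, agg in AGGREGATIONS.items():
                # Parents with no child rows get None rather than an error.
                features[f"{agg_name}({child_name}.{column})"] = (
                    agg(values) if values else None
                )
        feature_rows.append(features)
    return feature_rows


feature_matrix = build_features(
    clients, loans, "client_id", "loans", ["amount", "rate"]
)
```

Four aggregations over two columns, plus a count, already give nine features per client. Adding more tables, columns, or aggregation functions multiplies that number, which is how a feature set can grow into the thousands.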



Featuretools offers us the following benefits:

  1. Up to 10x reduction in development time
  2. Better predictive performance
  3. Interpretable features with real-world significance
  4. Fits into existing machine learning pipelines
  5. Ensures data is valid in time-series problems

Automated feature engineering will change the way you do machine learning by allowing you to develop better predictive models in a fraction of the time required by the traditional approach.


For the highlights of the project, check out "Why Automated Feature Engineering Will Change the Way You Do Machine Learning" on Towards Data Science.


Each of the three projects in this repository demonstrates a different benefit of automated feature engineering.

  1. Loan Repayment Prediction: Build Better Models Faster

Given a dataset of 58 million rows spread across 7 tables and the task of predicting whether a client will default on a loan, Featuretools delivered a better predictive model in a fraction of the time required by manual feature engineering. The features built by Featuretools are also human-interpretable and can give us insight into the problem.

  2. Retail Spending Prediction: Ensure Models Use Valid Data

When we have time-series data, we traditionally have to be extremely careful to make sure our model trains only on valid data. Often, a model will work in development only to fail completely in deployment because the training data was not properly filtered by time. Featuretools can take care of time filters automatically, allowing us to focus on other aspects of the machine learning pipeline and deliver better overall predictive models.
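The time filtering described above can be illustrated with a small plain-Python sketch. It is not the Featuretools implementation. The purchase log, the per-customer cutoff times, and the `total_spent_before` helper are hypothetical, and treating a purchase made exactly at the cutoff as valid is an assumption of this sketch. The point is that each training example may only use rows recorded up to the moment its prediction is made.

```python
from datetime import datetime

# Hypothetical purchase log: (customer_id, purchase_time, amount).
purchases = [
    ("a", datetime(2018, 1, 5), 20.0),
    ("a", datetime(2018, 2, 10), 35.0),
    ("a", datetime(2018, 3, 15), 50.0),
    ("b", datetime(2018, 1, 20), 10.0),
    ("b", datetime(2018, 3, 1), 80.0),
]

# One cutoff per training example: the moment the prediction is made.
cutoffs = {"a": datetime(2018, 3, 1), "b": datetime(2018, 2, 1)}


def total_spent_before(customer_id, cutoff, rows):
    """Sum a customer's purchases made at or before the cutoff time."""
    return sum(
        amount
        for cid, when, amount in rows
        if cid == customer_id and when <= cutoff
    )


# Valid features: only data available at prediction time is used.
features = {
    cid: total_spent_before(cid, cutoff, purchases)
    for cid, cutoff in cutoffs.items()
}

# Leaky features: ignoring the cutoff lets future purchases slip in.
leaky = {
    cid: sum(amount for c, _, amount in purchases if c == cid)
    for cid in cutoffs
}
```

Here `features` holds 55.0 for customer "a" and 10.0 for customer "b", while `leaky` holds 105.0 and 90.0. A model trained on the leaky totals would see information that cannot exist at prediction time, which is the failure mode the paragraph above warns about.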

  3. Engine Life Prediction: Automatically Create Meaningful Features

In this problem of predicting how long an engine will run until it fails, Featuretools creates meaningful features that can inform our thinking about real-world problems, as the most important features show.

Scaling with Dask

For an example of how Featuretools can scale, either on a single machine or on a cluster, see the Featuretools on Dask notebook.

Feature Labs


Featuretools was created by the developers at Feature Labs. If building impactful data science pipelines is important to you or your business, please get in touch.


Any questions can be directed to