An open source Python library for automated feature engineering

Featuretools

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to Know about Machine Learning

Featuretools is a Python library for automated feature engineering. See the documentation for more information.

Installation

Install with pip:

python -m pip install featuretools

or from the conda-forge channel with conda:

conda install -c conda-forge featuretools
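
After installing, it can be useful to confirm that the package is visible to the interpreter you plan to use. The snippet below is a minimal sketch, not part of Featuretools: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether featuretools is installed in the current environment.
version = installed_version("featuretools")
if version is None:
    print("featuretools is not installed in this environment")
else:
    print(f"featuretools {version} is installed")
```

If the first message is printed even though the install succeeded, pip or conda probably installed into a different environment than the one running the script.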

Add-ons

You can install add-ons individually or all at once. To install all of them, run:

python -m pip install featuretools[complete]

Update checker - Receive automatic notifications of new Featuretools releases

python -m pip install featuretools[update_checker]

TSFresh Primitives - Use 60+ primitives from tsfresh within Featuretools

python -m pip install featuretools[tsfresh]

Example

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

>>> import featuretools as ft
>>> es = ft.demo.load_mock_customer(return_entityset=True)
>>> es.plot()

Featuretools can automatically create a single table of features for any "target entity":

>>> feature_matrix, features_defs = ft.dfs(entityset=es, target_entity="customers")
>>> feature_matrix.head(5)
            zip_code  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount) MODE(sessions.device)  MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                                                                                                                                                                                  ...
1              60091                  131               10                  10236.77               desktop                      5.60                    149.95             2008                   0.070041               1                   ...                                                     169.77                                 0.610052                                   41.95                               791.976505                              175.939423                                 9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2              02139                  122                8                   9118.81                mobile                      5.81                    149.15             2008                   0.028647              20                   ...                                                     114.85                                 0.492531                                   42.96                               596.243506                              230.333502                                10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3              02139                   78                5                   5758.24               desktop                      6.78                    147.73             2008                   0.070814              10                   ...                                                      64.98                                 0.645728                                   21.77                               369.770121                              471.048551                                 9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4              60091                  111                8                   8205.28               desktop                      5.73                    149.56             2008                   0.087986              30                   ...                                                      83.53                                 0.516262                                   17.27                               584.673126                              322.883448                                13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5              02139                   58                4                   4571.37                tablet                      5.91                    148.17             2008                   0.085883              19                   ...                                                      73.09                                 0.830112                                   27.46                               313.448942                              198.522508                                 8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 69 columns]

We now have a feature vector for each customer that can be used for machine learning. See the documentation on Deep Feature Synthesis for more examples.
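
To make the column names in the feature matrix easier to read, the sketch below computes a few features of the same shape by hand with the standard library only. It is an illustration of what names such as `COUNT(sessions)` and `SUM(sessions.MIN(transactions.amount))` mean, not how Featuretools computes them. The toy data and the function `build_feature_rows` are hypothetical and unrelated to the mock customer dataset.

```python
from collections import defaultdict

# Hypothetical toy data: each transaction belongs to a session,
# and each session belongs to a customer.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 5.0},
    {"session_id": 3, "amount": 20.0},
    {"session_id": 3, "amount": 40.0},
]


def build_feature_rows(sessions, transactions):
    """Hand-compute a few DFS-style features for each customer."""
    # Group transaction amounts by their parent session.
    amounts_by_session = defaultdict(list)
    for t in transactions:
        amounts_by_session[t["session_id"]].append(t["amount"])

    # Group session ids by their parent customer.
    sessions_by_customer = defaultdict(list)
    for s in sessions:
        sessions_by_customer[s["customer_id"]].append(s["session_id"])

    rows = {}
    for customer_id, session_ids in sessions_by_customer.items():
        # All transaction amounts reachable from this customer via its sessions.
        all_amounts = [
            amount
            for sid in session_ids
            for amount in amounts_by_session.get(sid, [])
        ]
        # Stacked feature: aggregate per session first (MIN), then per customer (SUM).
        session_minimums = [
            min(amounts_by_session[sid])
            for sid in session_ids
            if amounts_by_session.get(sid)
        ]
        rows[customer_id] = {
            "COUNT(sessions)": len(session_ids),
            "COUNT(transactions)": len(all_amounts),
            "SUM(transactions.amount)": sum(all_amounts),
            "SUM(sessions.MIN(transactions.amount))": sum(session_minimums),
        }
    return rows


feature_rows = build_feature_rows(sessions, transactions)
for customer_id, row in sorted(feature_rows.items()):
    print(customer_id, row)
```

The last feature shows the "deep" part of DFS: an aggregation over sessions is applied to a value that is itself an aggregation over each session's transactions.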

Demos

Predict Next Purchase

Repository | Notebook

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

For more examples of how to use Featuretools, check out our demos page.

Testing & Development

The Featuretools community welcomes pull requests. Instructions for testing and development are available here.

Support

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

  1. For usage questions, use Stack Overflow with the featuretools tag.
  2. For bugs, issues, or feature requests, start a GitHub issue.
  3. For discussion regarding development on the core library, use Slack.
  4. For everything else, the core developers can be reached by email at help@featuretools.com.

Citing Featuretools

If you use Featuretools, please consider citing the following paper:

James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015.

BibTeX entry:

@inproceedings{kanter2015deep,
  author    = {James Max Kanter and Kalyan Veeramachaneni},
  title     = {Deep feature synthesis: Towards automating data science endeavors},
  booktitle = {2015 {IEEE} International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015},
  pages     = {1--10},
  year      = {2015},
  organization = {IEEE}
}

Feature Labs

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.