.. currentmodule:: featuretools


What is Featuretools?


Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

.. ipython:: python

    import featuretools as ft


Load Mock Data

.. ipython:: python

    data = ft.demo.load_mock_customer()


Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

* customers: unique customers who had sessions
* sessions: unique sessions and associated attributes
* transactions: list of events in each session

.. ipython:: python

    customers_df = data["customers"]
    customers_df

    sessions_df = data["sessions"]
    sessions_df.sample(5)

    transactions_df = data["transactions"]
    transactions_df.sample(5)

First, we specify a dictionary with all the entities in our dataset.

.. ipython:: python

    entities = {
       "customers" : (customers_df, "customer_id"),
       "sessions" : (sessions_df, "session_id", "session_start"),
       "transactions" : (transactions_df, "transaction_id", "transaction_time")
    }
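
Judging from the dictionary above, each value is a tuple of the form ``(dataframe, index)`` or ``(dataframe, index, time_index)``. Since the index is meant to identify each row, it can be worth checking it before running DFS. The following is a minimal sketch in plain pandas; the miniature dataframes and the ``check_entities`` helper are hypothetical stand-ins and are not part of Featuretools.

.. code-block:: python

    import pandas as pd

    # Hypothetical miniature tables standing in for the mock data.
    customers_df = pd.DataFrame({"customer_id": [1, 2]})
    sessions_df = pd.DataFrame({
        "session_id": [10, 11, 12],
        "customer_id": [1, 1, 2],
        "session_start": pd.to_datetime(
            ["2014-01-01", "2014-01-02", "2014-01-03"]),
    })

    entities = {
        "customers": (customers_df, "customer_id"),
        "sessions": (sessions_df, "session_id", "session_start"),
    }


    def check_entities(entities):
        """Return a list of problems found in an entities dictionary."""
        problems = []
        for name, spec in entities.items():
            df, index = spec[0], spec[1]
            # The third element (the time index) is optional.
            time_index = spec[2] if len(spec) > 2 else None
            if index not in df.columns:
                problems.append(f"{name}: missing index column {index!r}")
            elif not df[index].is_unique:
                problems.append(f"{name}: index column {index!r} has duplicates")
            if time_index is not None and time_index not in df.columns:
                problems.append(
                    f"{name}: missing time index column {time_index!r}")
        return problems


    problems = check_entities(entities)

An empty ``problems`` list means every index column exists and is unique, and every declared time index column is present.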


Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the "one" entity the "parent entity" and the "many" entity the "child entity". A relationship between a parent and child is defined like this:

``(parent_entity, parent_variable, child_entity, child_variable)``

In this dataset, we have two relationships:

.. ipython:: python

    relationships = [("sessions", "session_id", "transactions", "session_id"),
                     ("customers", "customer_id", "sessions", "customer_id")]
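
A relationship tuple only makes sense if every key in the child column also appears in the parent column. The sketch below shows one way to check that with plain pandas before handing the data to DFS. The small dataframes, the ``frames`` lookup, and the ``orphaned_children`` helper are assumptions made for illustration, not part of Featuretools.

.. code-block:: python

    import pandas as pd

    # Hypothetical miniature tables standing in for the mock data.
    sessions_df = pd.DataFrame({
        "session_id": [10, 11],
        "customer_id": [1, 2],
    })
    transactions_df = pd.DataFrame({
        "transaction_id": [100, 101, 102],
        "session_id": [10, 10, 11],
    })
    frames = {"sessions": sessions_df, "transactions": transactions_df}

    relationships = [("sessions", "session_id", "transactions", "session_id")]


    def orphaned_children(frames, relationship):
        """Return child rows whose key has no matching parent row."""
        parent, parent_var, child, child_var = relationship
        parent_keys = frames[parent][parent_var]
        child_df = frames[child]
        return child_df[~child_df[child_var].isin(parent_keys)]


    orphans = orphaned_children(frames, relationships[0])

If ``orphans`` is empty, every transaction points at a session that exists in the parent table.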


.. note::

    To manage setting up entities and relationships, we recommend using the :class:`EntitySet <featuretools.EntitySet>` class, which offers convenient APIs for managing data like this. See :doc:`loading_data/using_entitysets` for more information.

Run Deep Feature Synthesis

The minimal input to DFS is a set of entities, a list of relationships, and the "target_entity" to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let's first create a feature matrix for each customer in the data:

.. ipython:: python

    feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                     relationships=relationships,
                                                     target_entity="customers")
    feature_matrix_customers

We now have dozens of new features to describe a customer's behavior.
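
To give a rough sense of what such features represent, the sketch below computes two aggregations by hand with pandas: a count of sessions per customer, and a "stacked" feature that first sums transaction amounts per session and then averages those sums per customer. The miniature tables are hypothetical, and the column labels only imitate the style of DFS feature names; this is an illustration of the idea, not how Featuretools computes features internally.

.. code-block:: python

    import pandas as pd

    # Hypothetical miniature tables standing in for the mock data.
    sessions_df = pd.DataFrame({
        "session_id": [10, 11, 12],
        "customer_id": [1, 1, 2],
    })
    transactions_df = pd.DataFrame({
        "transaction_id": [100, 101, 102, 103],
        "session_id": [10, 10, 11, 12],
        "amount": [5.0, 7.0, 3.0, 10.0],
    })

    # Depth 1: aggregate transactions up to their parent session.
    session_totals = transactions_df.groupby("session_id")["amount"].sum()
    sessions = sessions_df.join(
        session_totals.rename("SUM(transactions.amount)"), on="session_id")

    # Depth 2: aggregate the session-level feature up to the customer.
    features = sessions.groupby("customer_id").agg(**{
        "COUNT(sessions)": ("session_id", "count"),
        "MEAN(sessions.SUM(transactions.amount))":
            ("SUM(transactions.amount)", "mean"),
    })

Here ``features`` has one row per customer. DFS automates this kind of stacking across all the relationships in the dataset.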

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, we can build features for sessions instead:

.. ipython:: python

    feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
                                                    relationships=relationships,
                                                    target_entity="sessions")
    feature_matrix_sessions.head(5)


What's next?

Table of contents

.. toctree::
   :maxdepth: 1
   :caption: Getting Started

   self
   getting_started/install
   loading_data/using_entitysets
   automated_feature_engineering/afe
   automated_feature_engineering/primitives
   automated_feature_engineering/handling_time

.. toctree::
   :maxdepth: 1
   :caption: Guides

   guides/tuning_dfs
   guides/specifying_primitive_options
   guides/performance
   guides/parallel
   guides/deployment
   guides/advanced_custom_primitives

.. toctree::
   :maxdepth: 1
   :caption: Resources and References

   frequently_asked_questions
   help
   featuretools_enterprise
   usage_tips/limitations
   usage_tips/glossary
   ecosystem
   api_reference
   changelog


.. toctree::
   :maxdepth: 1
   :caption: Hide from Table of Contents
   :hidden:

   feature_engineering_language/feature-types
   guides/save_progress_example

Other links
