What is Featuretools?

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

python

import featuretools as ft

Load Mock Data

python

data = ft.demo.load_mock_customer()

Prepare data

In this toy dataset, there are 3 tables. Each table is called an entity in Featuretools.

  • customers: unique customers who had sessions
  • sessions: unique sessions and associated attributes
  • transactions: events that occurred within each session

python

customers_df = data["customers"]
customers_df

sessions_df = data["sessions"]
sessions_df.sample(5)

transactions_df = data["transactions"]
transactions_df.sample(5)

First, we specify a dictionary with all the entities in our dataset.

python

entities = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}

Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the "one" entity the "parent entity" and the "many" entity the "child entity". A relationship between a parent and child is defined like this:

(parent_entity, parent_variable, child_entity, child_variable)

In this dataset, we have two relationships:

python

relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]
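A relationship tuple behaves much like a foreign key: it is only meaningful if every value of the child variable also appears in the parent variable. The sketch below checks that on a few toy rows in plain Python. The rows and the `find_orphans` helper are hypothetical illustrations and are not part of Featuretools or the mock dataset.

```python
# Toy tables as lists of dicts (hypothetical rows, not the mock dataset).
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 10, "customer_id": 1},
        {"session_id": 11, "customer_id": 2},
    ],
    "transactions": [
        {"transaction_id": 100, "session_id": 10},
        {"transaction_id": 101, "session_id": 11},
        {"transaction_id": 102, "session_id": 12},  # no such session
    ],
}

# Same tuple layout as above:
# (parent_entity, parent_variable, child_entity, child_variable)
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def find_orphans(tables, relationship):
    """Return child key values that have no matching row in the parent."""
    parent, parent_var, child, child_var = relationship
    parent_keys = {row[parent_var] for row in tables[parent]}
    child_keys = {row[child_var] for row in tables[child]}
    return sorted(child_keys - parent_keys)


orphans = {rel: find_orphans(tables, rel) for rel in relationships}
for rel, missing in orphans.items():
    print(rel[2], "->", rel[0], "orphaned keys:", missing)
```

Running a check like this before feature generation can surface child rows that would otherwise not contribute to any parent.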

Note

To set up and manage entities and relationships, we recommend the EntitySet class (featuretools.EntitySet), which offers convenient APIs for managing data like this. See loading_data/using_entitysets for more information.

Run Deep Feature Synthesis

The minimal input to DFS is a set of entities, a list of relationships, and the "target_entity" to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let's first create a feature matrix for each customer in the data:

python

feature_matrix_customers, features_defs = ft.dfs(
    entities=entities,
    relationships=relationships,
    target_entity="customers",
)

feature_matrix_customers

We now have dozens of new features to describe a customer's behavior.
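To give a feel for where such features come from, here is a minimal plain-Python sketch of the underlying idea: aggregate child rows up to their parent, then aggregate those results again one level higher. It does not use Featuretools. The toy rows, the `aggregate` helper, and the `total` column are hypothetical, and the feature names in the comments only imitate the naming style of generated features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows; the real mock dataset has more columns and rows.
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 1},
    {"session_id": 12, "customer_id": 2},
]
transactions = [
    {"transaction_id": 100, "session_id": 10, "amount": 20.0},
    {"transaction_id": 101, "session_id": 10, "amount": 30.0},
    {"transaction_id": 102, "session_id": 11, "amount": 10.0},
    {"transaction_id": 103, "session_id": 12, "amount": 40.0},
]


def aggregate(rows, key, value, func):
    """Group rows by `key` and apply `func` to each group's `value` list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: func(v) for k, v in groups.items()}


# Depth 1: one value per session, in the spirit of SUM(transactions.amount).
session_totals = aggregate(transactions, "session_id", "amount", sum)

# Attach the depth-1 value to each session row.
enriched = [
    dict(s, total=session_totals.get(s["session_id"], 0.0)) for s in sessions
]

# Depth 2: one value per customer, in the spirit of
# MEAN(sessions.SUM(transactions.amount)).
customer_features = aggregate(enriched, "customer_id", "total", mean)
print(customer_features)
```

Stacking simple aggregations across relationships in this way is what lets a small set of building blocks yield many distinct features.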

Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, to build features for sessions instead, we only need to change the target entity:

python

feature_matrix_sessions, features_defs = ft.dfs(
    entities=entities,
    relationships=relationships,
    target_entity="sessions",
)

feature_matrix_sessions.head(5)
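When sessions is the target, each row can draw on both directions of the relationships: aggregations over its child transactions, and attributes inherited from its parent customer. The plain-Python sketch below illustrates that idea on hypothetical toy rows. It does not call Featuretools, and the column `zip_code` and the feature names used as dictionary keys are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy rows (not the mock dataset).
customers = {1: {"zip_code": "60091"}, 2: {"zip_code": "13244"}}
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 1},
    {"session_id": 12, "customer_id": 2},
]
transactions = [
    {"transaction_id": 100, "session_id": 10},
    {"transaction_id": 101, "session_id": 10},
    {"transaction_id": 102, "session_id": 12},
]

# Child direction: count the transactions that belong to each session.
counts = Counter(t["session_id"] for t in transactions)

# Parent direction: look up an attribute of the session's customer.
session_features = {
    s["session_id"]: {
        "COUNT(transactions)": counts[s["session_id"]],
        "customers.zip_code": customers[s["customer_id"]]["zip_code"],
    }
    for s in sessions
}

for session_id, row in session_features.items():
    print(session_id, row)
```

Session 11 has no transactions in the toy data, so its count is zero while it still inherits its customer's attribute.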

What's next?

  • Learn about loading_data/using_entitysets
  • Apply automated feature engineering with automated_feature_engineering/afe
  • Explore runnable demos based on real world use cases
  • Can't find what you're looking for? Ask for help

Table of contents

  • self
  • getting_started/install
  • loading_data/using_entitysets
  • automated_feature_engineering/afe
  • automated_feature_engineering/primitives
  • variables
  • automated_feature_engineering/handling_time

  • guides/tuning_dfs
  • guides/specifying_primitive_options
  • guides/performance
  • guides/using_dask_entitysets
  • guides/deployment
  • guides/advanced_custom_primitives

  • frequently_asked_questions
  • help
  • usage_tips/limitations
  • usage_tips/glossary
  • ecosystem
  • api_reference
  • Primitive Reference <https://primitives.featurelabs.com/>
  • changelog

  • feature_engineering_language/feature-types
  • guides/save_progress_example

Other links

  • genindex
  • search