featuretools
Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.
Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.
python
import featuretools as ft
python
data = ft.demo.load_mock_customer()
This toy dataset contains three tables. Featuretools calls each table an entity.
- customers: unique customers who had sessions
- sessions: unique sessions and their associated attributes
- transactions: the events that occurred within each session
python
customers_df = data["customers"]
customers_df
sessions_df = data["sessions"]
sessions_df.sample(5)
transactions_df = data["transactions"]
transactions_df.sample(5)
First, we specify a dictionary with all the entities in our dataset.
python
entities = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}
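Each tuple above names the table, its index column, and optionally a time index. As a rough illustration of what those roles imply, the sketch below checks that an index column uniquely identifies rows and that a time index is present on every row. It uses plain lists of dicts in place of DataFrames, and `check_entity` and the toy rows are hypothetical helpers for this example, not part of Featuretools.

```python
def check_entity(name, spec):
    """Validate one (rows, index[, time_index]) tuple; rows is a list of dicts."""
    rows, index, *rest = spec
    time_index = rest[0] if rest else None

    # The index column must uniquely identify each row.
    ids = [row[index] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError(f"{name}: index {index!r} has duplicate values")

    # If a time index is given, every row must carry it.
    if time_index is not None and any(time_index not in row for row in rows):
        raise ValueError(f"{name}: time index {time_index!r} is missing from some rows")
    return True


# Made-up rows standing in for the real tables.
toy_entities = {
    "customers": ([{"customer_id": 1}, {"customer_id": 2}], "customer_id"),
    "sessions": (
        [{"session_id": 10, "session_start": "2014-01-01 00:00:00"}],
        "session_id",
        "session_start",
    ),
}

checks = {name: check_entity(name, spec) for name, spec in toy_entities.items()}
```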
Second, we specify how the entities are related. When two entities have a one-to-many relationship, we call the "one" entity the "parent entity" and the "many" entity the "child entity". A relationship between a parent and child is defined like this:
(parent_entity, parent_variable, child_entity, child_variable)
In this dataset we have two relationships:
python
relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]
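To make the tuple order concrete, here is a minimal sketch that reads each `(parent_entity, parent_variable, child_entity, child_variable)` tuple and looks for child rows whose key has no match in the parent. The toy rows and the `find_orphans` helper are assumptions made for this illustration and are not part of Featuretools.

```python
# Made-up rows standing in for the three tables.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 10, "customer_id": 1},
        {"session_id": 11, "customer_id": 2},
    ],
    "transactions": [
        {"transaction_id": 100, "session_id": 10},
        {"transaction_id": 101, "session_id": 10},
        {"transaction_id": 102, "session_id": 11},
    ],
}

relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def find_orphans(tables, relationship):
    """Return child rows whose key has no matching row in the parent table."""
    parent, parent_var, child, child_var = relationship
    parent_keys = {row[parent_var] for row in tables[parent]}
    return [row for row in tables[child] if row[child_var] not in parent_keys]


# An empty list for a relationship means every child row has a parent.
orphans = {rel: find_orphans(tables, rel) for rel in relationships}
```

An empty result for both relationships suggests the keys line up the way the one-to-many definition expects.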
Note
To manage setting up entities and relationships, we recommend using the EntitySet class (featuretools.EntitySet), which offers convenient APIs for managing data like this. See loading_data/using_entitysets for more information.
A minimal input to DFS is a set of entities, a list of relationships, and the "target_entity" to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.
Let's first create a feature matrix for each customer in the data:
python
feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                 relationships=relationships,
                                                 target_entity="customers")
feature_matrix_customers
We now have dozens of new features to describe a customer's behavior.
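To give a feel for where such features come from, the sketch below hand-computes two customer-level features by aggregating along the relationships: transactions are first summed per session, and those per-session results are then aggregated per customer. This is only an illustration under assumed toy data. The rows, the `amount` column values, and the feature labels are chosen for the example, and the code does not call Featuretools or reproduce its internals.

```python
from collections import defaultdict

# Made-up rows; only the columns needed for the illustration.
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 1},
    {"session_id": 12, "customer_id": 2},
]
transactions = [
    {"transaction_id": 100, "session_id": 10, "amount": 20.0},
    {"transaction_id": 101, "session_id": 10, "amount": 5.0},
    {"transaction_id": 102, "session_id": 11, "amount": 10.0},
    {"transaction_id": 103, "session_id": 12, "amount": 7.5},
]

# Depth 1: aggregate transactions up to their parent session.
session_sum = defaultdict(float)
for t in transactions:
    session_sum[t["session_id"]] += t["amount"]

# Depth 2: collect the per-session results under the parent customer.
per_customer = defaultdict(list)
for s in sessions:
    per_customer[s["customer_id"]].append(session_sum[s["session_id"]])

# One row per customer, one column per hand-built feature.
feature_matrix = {
    customer_id: {
        "COUNT(sessions)": len(sums),
        "MEAN(sessions.SUM(transactions.amount))": sum(sums) / len(sums),
    }
    for customer_id, sums in per_customer.items()
}
```

Stacking one aggregation on top of another is what lets a single customer row summarize data that lives two tables away.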
One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. For example, we can build features for sessions by changing the target entity:
python
feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
                                                relationships=relationships,
                                                target_entity="sessions")
feature_matrix_sessions.head(5)
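When the target sits in the middle of the relationship chain, features can flow in from both directions: aggregations over the child rows (transactions) and values copied down from the parent row (customers). The sketch below hand-builds one session-level row of each kind. The toy rows, the zip codes, and the feature labels are assumptions for illustration only, and the code does not call Featuretools.

```python
# Made-up rows; customers are keyed by customer_id for easy lookup.
customers = {1: {"zip_code": "60091"}, 2: {"zip_code": "13244"}}
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 2},
]
transactions = [
    {"transaction_id": 100, "session_id": 10, "amount": 20.0},
    {"transaction_id": 101, "session_id": 10, "amount": 5.0},
    {"transaction_id": 102, "session_id": 11, "amount": 10.0},
]

session_features = {}
for s in sessions:
    # Child side: gather this session's transaction amounts.
    amounts = [t["amount"] for t in transactions if t["session_id"] == s["session_id"]]
    session_features[s["session_id"]] = {
        "COUNT(transactions)": len(amounts),
        "SUM(transactions.amount)": sum(amounts),
        # Parent side: copy an attribute from the session's customer.
        "customers.zip_code": customers[s["customer_id"]]["zip_code"],
    }
```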
- Learn about loading_data/using_entitysets
- Apply automated feature engineering with automated_feature_engineering/afe
- Explore runnable demos based on real world use cases
- Can't find what you're looking for? Ask for help