.. featuretools documentation main file, created by
   sphinx-quickstart on Thu May 19 20:40:30 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. currentmodule:: featuretools


What is Featuretools?
=====================



.. image:: _static/images/featuretools_nav2.svg
   :width: 500 px
   :alt: Featuretools
   :align: center

**Featuretools** is a framework for automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.


.. _quick-start:

5 Minute Quick Start
--------------------

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [4]:
import featuretools as ft

Load Mock Data
~~~~~~~~~~~~~~

In [5]:
data = ft.demo.load_mock_customer()

Prepare data
~~~~~~~~~~~~


In this toy dataset, there are 3 DataFrames.

- **customers**: unique customers who had sessions
- **sessions**: unique sessions and associated attributes
- **transactions**: events that occurred within each session


In [6]:
customers_df = data["customers"]
customers_df

,customer_id,zip_code,join_date,date_of_birth
0,1,60091,2011-04-17 10:48:33,1994-07-18
1,2,13244,2012-04-15 23:31:04,1986-08-18
2,3,13244,2011-08-13 15:42:34,2003-11-21
3,4,60091,2011-04-08 20:08:14,2006-08-15
4,5,60091,2010-07-17 05:27:50,1984-07-28


In [7]:
sessions_df = data["sessions"]
sessions_df.sample(5)

,session_id,customer_id,device,session_start
13,14,1,tablet,2014-01-01 03:28:00
6,7,3,tablet,2014-01-01 01:39:40
1,2,5,mobile,2014-01-01 00:17:20
28,29,1,mobile,2014-01-01 07:10:05
24,25,3,desktop,2014-01-01 05:59:40


In [8]:
transactions_df = data["transactions"]
transactions_df.sample(5)

,transaction_id,session_id,transaction_time,product_id,amount
74,232,5,2014-01-01 01:20:10,1,139.2
231,27,17,2014-01-01 04:10:15,2,90.79
434,36,31,2014-01-01 07:50:10,3,62.35
420,56,30,2014-01-01 07:35:00,3,72.7
54,444,4,2014-01-01 00:58:30,4,43.59


First, we specify a dictionary with all the DataFrames in our dataset. Each DataFrame is passed as a tuple together with its index column and, if it has one, its time index column.

In [10]:
dataframes = {
   "customers" : (customers_df, "customer_id"),
   "sessions" : (sessions_df, "session_id", "session_start"),
   "transactions" : (transactions_df, "transaction_id", "transaction_time")
}
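
The tuple convention above can be unpacked mechanically. The sketch below is illustrative only: ``split_entry`` is a hypothetical helper, not part of Featuretools, and plain strings stand in for the real DataFrames so that it runs without any dependencies.

```python
def split_entry(entry):
    """Name the parts of a (dataframe, index[, time_index]) tuple."""
    dataframe, index, *rest = entry
    # The time index is optional; tables without one get None.
    time_index = rest[0] if rest else None
    return {"dataframe": dataframe, "index": index, "time_index": time_index}


# Strings stand in for the DataFrames used in this guide (assumption for illustration).
dataframes_sketch = {
    "customers": ("<customers_df>", "customer_id"),
    "sessions": ("<sessions_df>", "session_id", "session_start"),
    "transactions": ("<transactions_df>", "transaction_id", "transaction_time"),
}

for name, entry in dataframes_sketch.items():
    parts = split_entry(entry)
    print(f"{name}: index={parts['index']}, time_index={parts['time_index']}")
```

Here only ``customers`` has no time index, which matches the two-element tuple used for it in the dictionary.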

Second, we specify how the DataFrames are related. When two DataFrames have a one-to-many relationship, we call the "one" DataFrame the "parent DataFrame" and the "many" DataFrame the "child DataFrame". A relationship between a parent and child is defined like this:
    
    (parent_dataframe, parent_column, child_dataframe, child_column)

In this dataset we have two relationships:

In [11]:
relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]
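
Before handing relationships to DFS, it can help to sanity-check them. The following is a minimal sketch, not Featuretools functionality: ``check_relationship`` is a hypothetical helper, the column lists are copied from the tables shown earlier, and the rule that the parent column must be the parent's index is stated here as an assumption.

```python
# Column names copied from the tables shown earlier in this guide.
columns = {
    "customers": ["customer_id", "zip_code", "join_date", "date_of_birth"],
    "sessions": ["session_id", "customer_id", "device", "session_start"],
    "transactions": ["transaction_id", "session_id", "transaction_time",
                     "product_id", "amount"],
}
# Index column of each table, as declared in the `dataframes` dictionary.
indexes = {
    "customers": "customer_id",
    "sessions": "session_id",
    "transactions": "transaction_id",
}

relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def check_relationship(rel):
    """Return a list of problems found in one relationship tuple (hypothetical helper)."""
    parent_df, parent_col, child_df, child_col = rel
    problems = []
    for df_name, col in ((parent_df, parent_col), (child_df, child_col)):
        if df_name not in columns:
            problems.append(f"unknown dataframe {df_name!r}")
        elif col not in columns[df_name]:
            problems.append(f"{df_name!r} has no column {col!r}")
    # Assumption: the parent side of a one-to-many link is the parent's index.
    if parent_df in indexes and indexes[parent_df] != parent_col:
        problems.append(f"{parent_col!r} is not the index of {parent_df!r}")
    return problems


for rel in relationships:
    print(rel, "->", check_relationship(rel) or "ok")
```

Both relationships in this guide pass the check; swapping the parent column for a non-index column such as ``zip_code`` would be reported as a problem.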

### Run Deep Feature Synthesis

A minimal input to DFS is a dictionary of DataFrames, a list of relationships, and the name of the target DataFrame to calculate features for (``target_dataframe_name``). The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let's first create a feature matrix for each customer in the data:

In [13]:
feature_matrix_customers, features_defs = ft.dfs(dataframes=dataframes,
                                                 relationships=relationships,
                                                 target_dataframe_name="customers")
feature_matrix_customers

customer_id,zip_code,COUNT(sessions),MODE(sessions.device),NUM_UNIQUE(sessions.device),COUNT(transactions),MAX(transactions.amount),MEAN(transactions.amount),MIN(transactions.amount),MODE(transactions.product_id),NUM_UNIQUE(transactions.product_id),...,STD(sessions.SKEW(transactions.amount)),STD(sessions.SUM(transactions.amount)),SUM(sessions.MAX(transactions.amount)),SUM(sessions.MEAN(transactions.amount)),SUM(sessions.MIN(transactions.amount)),SUM(sessions.NUM_UNIQUE(transactions.product_id)),SUM(sessions.SKEW(transactions.amount)),SUM(sessions.STD(transactions.amount)),MODE(transactions.sessions.device),NUM_UNIQUE(transactions.sessions.device)
1,60091,8,mobile,3,126,139.43,71.631905,5.81,4,5,...,0.589386,279.510713,1057.97,582.193117,78.59,40,-0.476122,312.745952,mobile,3
2,13244,7,desktop,3,93,146.81,77.422366,8.73,4,5,...,0.509798,251.609234,931.63,548.905851,154.6,35,-0.27764,258.700528,desktop,3
3,13244,6,desktop,3,93,149.15,67.06043,5.89,1,5,...,0.429374,219.02142,847.63,405.237462,66.21,29,2.286086,257.299895,desktop,3
4,60091,8,mobile,3,109,149.95,80.070459,5.73,2,5,...,0.387884,235.992478,1157.99,649.657515,131.51,37,0.002764,356.125829,mobile,3
5,60091,6,mobile,3,79,149.02,80.375443,7.55,5,5,...,0.415426,402.775486,839.76,472.231119,86.49,30,0.014384,259.873954,mobile,3


We now have dozens of new features to describe a customer's behavior.
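
Some of these column names, such as ``SUM(sessions.MAX(transactions.amount))``, stack one aggregation on top of another. To make that concrete, here is a hand-rolled sketch of what such a stacked feature means. It does not use Featuretools, and the rows are hypothetical toy values rather than the mock dataset.

```python
from collections import defaultdict

# Hypothetical toy rows, NOT taken from the mock dataset.
sessions = [
    {"session_id": 1, "customer_id": "A"},
    {"session_id": 2, "customer_id": "A"},
    {"session_id": 3, "customer_id": "B"},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 25.0},
    {"session_id": 2, "amount": 40.0},
    {"session_id": 3, "amount": 5.0},
    {"session_id": 3, "amount": 15.0},
]

# Step 1: MAX(transactions.amount) for every session.
max_amount_per_session = defaultdict(lambda: float("-inf"))
for t in transactions:
    sid = t["session_id"]
    max_amount_per_session[sid] = max(max_amount_per_session[sid], t["amount"])

# Step 2: SUM(sessions.MAX(transactions.amount)) for every customer.
sum_of_max_per_customer = defaultdict(float)
for s in sessions:
    sum_of_max_per_customer[s["customer_id"]] += max_amount_per_session[s["session_id"]]

print(dict(sum_of_max_per_customer))  # {'A': 65.0, 'B': 15.0}
```

Customer ``A`` has session maxima of 25.0 and 40.0, which sum to 65.0. DFS builds features of this shape automatically by following the relationships from transactions to sessions to customers.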


Change target dataframe
~~~~~~~~~~~~~~~~~~~~~~~
One of the reasons DFS is so powerful is that it can create a feature matrix for *any* DataFrame in our EntitySet. For example, we could build features for sessions instead.

In [17]:
dataframes = {
   "customers" : (customers_df.copy(), "customer_id"),
   "sessions" : (sessions_df.copy(), "session_id", "session_start"),
   "transactions" : (transactions_df.copy(), "transaction_id", "transaction_time")
}

In [18]:
feature_matrix_sessions, features_defs = ft.dfs(dataframes=dataframes,
                                                relationships=relationships,
                                                target_dataframe_name="sessions")
feature_matrix_sessions.head(5)

session_id,customer_id,device,COUNT(transactions),MAX(transactions.amount),MEAN(transactions.amount),MIN(transactions.amount),MODE(transactions.product_id),NUM_UNIQUE(transactions.product_id),SKEW(transactions.amount),STD(transactions.amount),...,customers.STD(transactions.amount),customers.SUM(transactions.amount),customers.DAY(date_of_birth),customers.DAY(join_date),customers.MONTH(date_of_birth),customers.MONTH(join_date),customers.WEEKDAY(date_of_birth),customers.WEEKDAY(join_date),customers.YEAR(date_of_birth),customers.YEAR(join_date)
1,2,desktop,16,141.66,76.813125,20.91,3,5,0.295458,41.600976,...,37.705178,7200.28,18,15,8,4,0,6,1986,2012
2,5,mobile,10,135.25,74.696,9.32,5,5,-0.16055,45.893591,...,44.09563,6349.66,28,17,7,7,5,5,1984,2010
3,4,mobile,15,147.73,88.6,8.7,1,5,-0.324012,46.240016,...,45.068765,8727.68,15,8,8,4,1,4,2006,2011
4,1,mobile,25,129.0,64.5572,6.29,5,5,0.234349,40.187205,...,40.442059,9025.62,18,17,7,4,0,6,1994,2011
5,4,mobile,11,139.2,70.638182,7.43,5,5,0.336381,48.918663,...,45.068765,8727.68,15,8,8,4,1,4,2006,2011
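
Columns prefixed with ``customers.`` are looked up from each session's parent customer. As a rough sketch of that idea, the standard-library code below reproduces the date-based values for session 1, which belongs to customer 2. It is not how Featuretools computes them internally, and the Monday-is-zero weekday convention is an assumption that happens to agree with the table.

```python
from datetime import datetime

# Customer 2, copied from the customers table shown earlier.
customers = {
    2: {
        "join_date": datetime(2012, 4, 15, 23, 31, 4),
        "date_of_birth": datetime(1986, 8, 18),
    },
}
# Session 1 belongs to customer 2 (first row of the feature matrix above).
session = {"session_id": 1, "customer_id": 2}

# Follow the child-to-parent link, then apply simple date transforms.
parent = customers[session["customer_id"]]
direct_features = {}
for column in ("date_of_birth", "join_date"):
    value = parent[column]
    direct_features[f"customers.DAY({column})"] = value.day
    direct_features[f"customers.MONTH({column})"] = value.month
    direct_features[f"customers.WEEKDAY({column})"] = value.weekday()  # Monday == 0
    direct_features[f"customers.YEAR({column})"] = value.year

for name, value in direct_features.items():
    print(name, "=", value)
```

The printed values (18, 8, 0, 1986 for the date of birth and 15, 4, 6, 2012 for the join date) match the last eight columns of the first row in the sessions feature matrix.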


Understanding Feature Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In general, Featuretools references generated features through the feature name. In order to make features easier to understand, Featuretools offers two additional tools, :func:`featuretools.graph_feature` and :func:`featuretools.describe_feature`, to help explain what a feature is and the steps Featuretools took to generate it. Let's look at an example feature:


In [None]:
    feature = features_defs[18]
    feature



Feature lineage graphs
""""""""""""""""""""""
Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and intermediate features generated to create the final feature.
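
The same step-by-step reading can be approximated from a feature name alone. The sketch below is a hypothetical helper, unrelated to how ``ft.graph_feature`` works internally, and it only understands names of the simple nested form ``PRIMITIVE(dataframe.inner)``.

```python
import re

# Matches names such as "SUM(sessions.MAX(transactions.amount))".
PATTERN = re.compile(r"^(?P<prim>[A-Z_]+)\((?P<df>\w+)\.(?P<inner>.+)\)$")


def lineage(name):
    """List the steps behind a nested feature name, innermost first."""
    steps = []
    while True:
        match = PATTERN.match(name)
        if match is None:
            # Nothing left to peel off: treat the remainder as a base column.
            steps.append(("column", name))
            break
        steps.append((match["prim"], match["df"]))
        name = match["inner"]
    return steps[::-1]


for kind, target in lineage("SUM(sessions.MAX(transactions.amount))"):
    if kind == "column":
        print(f"start from column {target!r}")
    else:
        print(f"apply {kind} over rows of {target!r}")
```

For the example name this prints three steps: start from ``amount``, apply ``MAX`` over ``transactions``, then apply ``SUM`` over ``sessions``.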


In [None]:
    ft.graph_feature(feature)


.. graphviz:: getting_started/graphs/demo_feat.dot

Feature descriptions
""""""""""""""""""""
Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help to explain what a feature is, and can be further improved by including manually defined custom definitions. See :doc:`/guides/feature_descriptions` for more details on how to customize automatically generated feature descriptions.


In [None]:
    ft.describe_feature(feature)


.. Technical problems it solves
.. ----------------------------

.. * Automatically creates features that require human intuition and expertise. Read more in :ref:`deep-feature-synthesis`
.. * Carefully handles time for predictive analytics use cases. Read more in :ref:`handling-time`.
.. * Creating feature engineering primitives that can be reused across datasets. Read more in :ref:`primitives`.





.. Featuretools can automatically identifying the best transformations, as well as dealing with time.
.. * It can be customized to address feature engineering use cases and is general enough to work across domains. It structures the process of transforming raw data into feature vectors ready for machine learning.
.. * It enables quick iteration through a unified interface to define prediction problems and feature transformations. It supports binary, multi-class, and regression predictions, as well as unsupervised learning approaches such as clustering or anomaly detection.
.. with and without experience building predictive models. Most functions in the library have sensible defaults to make it easy to run end to end with little configuration, but it does not intend to be a black box. The lower level API of Featuretools is a great way for those new to predictive modeling to learn how to build models from raw data, while enabling experts to maintain full control of how the framework handles their data.


What's next?
------------

* Learn about :doc:`getting_started/using_entitysets`
* Apply automated feature engineering with :doc:`getting_started/afe`
* Explore `runnable demos <https://www.featuretools.com/demos>`__ based on real-world use cases
* Can't find what you're looking for? Ask for :doc:`resources/help`




Table of contents
-----------------

.. toctree::
   :maxdepth: 1

   install

.. toctree::
   :maxdepth: 2

   getting_started/getting_started_index
   guides/guides_index

.. toctree::
   :maxdepth: 1
   :caption: Resources and References

   resources/resources_index
   api_reference
   Primitives <https://primitives.featurelabs.com/>
   release_notes

Other links
------------
* :ref:`genindex`
* :ref:`search`
