<a href="https://colab.research.google.com/github/featuretools/colab_notebook/blob/master/getting_started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is Featuretools?
<img src="https://docs.featuretools.com/_images/featuretools-logo.png" alt="Featuretools" width="500"/>

**Featuretools** is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.


## Using Google Colab with Featuretools

* Google [Colaboratory](http://colab.research.google.com) is a cloud service based on Jupyter Notebooks.
* Using [Featuretools](http://www.featuretools.com) is as simple as installing it with pip and upgrading to the latest version. We also need to make sure that compatible versions of pandas and numpy are installed.

In [None]:
!pip install featuretools --upgrade

### You will now need to restart the Colab runtime from the toolbar.
* **Runtime -> Restart Runtime...**

In [None]:
import featuretools as ft
import pandas as pd
import numpy as np

print('featuretools == %s' % ft.__version__)
print('pandas == %s' % pd.__version__)
print('numpy == %s' % np.__version__)

* Verify `featuretools` version is >= 0.6.0
* Verify `pandas` version is >= 0.23.0
* Verify `numpy` version is >= 1.13.3
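
The checks above can also be done programmatically rather than by eye. The following is a minimal sketch using only the standard library; the helper names `version_tuple`, `meets_minimum`, and `check_versions` are hypothetical and not part of Featuretools. It compares only the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re

# Minimum versions taken from the list above.
MINIMUMS = {"featuretools": "0.6.0", "pandas": "0.23.0", "numpy": "1.13.3"}


def version_tuple(version):
    """Turn '0.23.4' or '1.16.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.23' and '0.23.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_versions(installed):
    """Map each package name to True/False given a dict of installed versions."""
    return {name: meets_minimum(installed[name], minimum)
            for name, minimum in MINIMUMS.items()}
```

In the notebook, this could be called as `check_versions({"featuretools": ft.__version__, "pandas": pd.__version__, "numpy": np.__version__})`. Comparing integer tuples instead of raw strings matters because `"0.10.0" >= "0.6.0"` is false as a string comparison.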

## 5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

In [None]:
import featuretools as ft

## Load Mock Data

In [None]:
data = ft.demo.load_mock_customer()

## Prepare data
In this toy dataset, there are 4 tables. Each table is called an ``entity`` in Featuretools.

* **customers**: unique customers who had sessions
* **sessions**: unique sessions and associated attributes
* **transactions**: list of events in each session
* **products**: list of products involved in the transactions

In [None]:
customers_df = data["customers"]
customers_df

In [None]:
sessions_df = data["sessions"]
sessions_df.sample(5)

In [None]:
transactions_df = data["transactions"]
transactions_df.sample(5)

In [None]:
products_df = data['products']
products_df

First, we specify a name for the EntitySet.

In [None]:
es = ft.EntitySet(id="transactions")

Second, we specify the entity dataframes, along with the applicable attributes, such as the index for the entity.

In [None]:
es = es.entity_from_dataframe(entity_id="transactions",
                              dataframe=transactions_df,
                              index="transaction_id",
                              time_index="transaction_time",
                              variable_types={"product_id": ft.variable_types.Categorical})

es = es.entity_from_dataframe(entity_id="products",
                              dataframe=products_df,
                              index="product_id")

es = es.entity_from_dataframe(entity_id="sessions",
                              dataframe=sessions_df,
                              index="session_id",
                              time_index="session_start")

es = es.entity_from_dataframe(entity_id="customers",
                              dataframe=customers_df,
                              index="customer_id",
                              time_index="join_date",
                              variable_types={"zip_code": ft.variable_types.Categorical})

Third, we specify how the entities are related. When two entities have a one-to-many relationship, we call the "one" entity the "parent entity" and the "many" entity the "child entity". A relationship between a parent and child is defined like this:
```python
(parent_entity, child_entity)
```
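
To make the one-to-many condition concrete, here is a small sketch in plain Python. The function `is_one_to_many` and the toy rows are hypothetical illustrations, not part of Featuretools or the mock dataset. It checks that each parent key appears once and that every child row points at an existing parent.

```python
def is_one_to_many(parent_rows, child_rows, parent_key, child_key):
    """Check that parent keys are unique and every child row references a parent."""
    parent_ids = [row[parent_key] for row in parent_rows]
    parent_id_set = set(parent_ids)
    unique_parents = len(parent_ids) == len(parent_id_set)
    all_children_match = all(row[child_key] in parent_id_set for row in child_rows)
    return unique_parents and all_children_match


# Hypothetical toy rows: two customers, three sessions.
customers = [{"customer_id": 1}, {"customer_id": 2}]
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 1},
    {"session_id": 12, "customer_id": 2},
]

# customers is a valid parent of sessions; the reverse direction is not.
print(is_one_to_many(customers, sessions, "customer_id", "customer_id"))  # True
print(is_one_to_many(sessions, customers, "customer_id", "customer_id"))  # False
```

The second call fails because `customer_id` repeats in the sessions rows, so sessions cannot play the "one" side of the relationship.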

In this dataset we have three relationships:

In [None]:
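# Each relationship is (parent variable, child variable).
relationships = [ft.Relationship(es["products"]["product_id"], es["transactions"]["product_id"]),
                 ft.Relationship(es["sessions"]["session_id"], es["transactions"]["session_id"]),
                 ft.Relationship(es["customers"]["customer_id"], es["sessions"]["customer_id"])]

# Register the relationships with the EntitySet so DFS can traverse them.
es = es.add_relationships(relationships)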

## Run Deep Feature Synthesis

The minimal input to DFS is an EntitySet and the "target_entity" to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let's first create a feature matrix for each customer in the data:

In [None]:
feature_matrix_customers, features_defs = ft.dfs(entityset=es,
                                                 target_entity="customers")
feature_matrix_customers

We now have dozens of new features to describe a customer's behavior.
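
To give a feel for what such features represent, the sketch below computes two of them by hand on hypothetical toy rows (not the mock dataset). It is only an illustration of stacking aggregations across relationships, not how Featuretools computes them internally; the feature names are modeled on the style of names that appear in the feature matrix.

```python
from collections import defaultdict

# Hypothetical toy rows for illustration only.
sessions = [
    {"session_id": 1, "customer_id": "a"},
    {"session_id": 2, "customer_id": "a"},
    {"session_id": 3, "customer_id": "b"},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 20.0},
    {"session_id": 2, "amount": 30.0},
    {"session_id": 3, "amount": 5.0},
]

# Depth 1: aggregate transactions up to their parent session.
session_total = defaultdict(float)
for t in transactions:
    session_total[t["session_id"]] += t["amount"]

# Depth 2: collect the session-level totals under each parent customer.
per_customer = defaultdict(list)
for s in sessions:
    per_customer[s["customer_id"]].append(session_total[s["session_id"]])

# One row of features per customer, built from the stacked aggregations.
features = {
    customer: {
        "SUM(transactions.amount)": sum(totals),
        "MEAN(sessions.SUM(transactions.amount))": sum(totals) / len(totals),
    }
    for customer, totals in per_customer.items()
}
print(features["a"])
```

Each feature follows the relationships from child to parent: transaction amounts are first summed per session, and those session totals are then summed or averaged per customer.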


## Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for *any* entity in our data. For example, to build features for sessions, we only need to change the target entity.

In [None]:
feature_matrix_sessions, features_defs = ft.dfs(entityset=es,
                                                target_entity="sessions")
feature_matrix_sessions.head(5)

## What's next?


* Learn about [Representing Data with EntitySets](https://docs.featuretools.com/loading_data/using_entitysets.html)
* Apply automated feature engineering with [Deep Feature Synthesis](https://docs.featuretools.com/automated_feature_engineering/afe.html)
* Can't find what you're looking for? Ask for [Help](https://docs.featuretools.com/help.html)