In [1]:
import featuretools as ft

### Representing data with EntitySets
Representing data with EntitySets is a fundamental concept in feature engineering with the Featuretools library. An EntitySet is a data structure that holds multiple tables (DataFrames) and captures the relationships between them, giving you one place to organize and manage your data for feature engineering.

#### EntitySet
An EntitySet is a collection of dataframes and the relationships between them. EntitySets are useful for preparing raw, structured datasets for feature engineering. Although many Featuretools functions accept dataframes and relationships as separate arguments, creating an EntitySet is recommended because it makes your data easier to manipulate as needed.

### The Raw Data
Below are two tables of data (represented as pandas DataFrames) related to customer transactions. The first merges the transactions, sessions, and customers tables, so the result resembles something you might see in a log file:

In [4]:
# Load the mock customer data and merge it into a single log-style table
data = ft.demo.load_mock_customer()
transaction_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
transaction_df.head()

|   | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | date_of_birth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 298 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
| 1 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
| 2 | 308 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
| 3 | 116 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
| 4 | 371 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |


The second dataframe lists the products involved in those transactions.

In [5]:
products_df = data["products"]
products_df.head()

|   | product_id | brand |
|---|---|---|
| 0 | 1 | B |
| 1 | 2 | B |
| 2 | 3 | B |
| 3 | 4 | B |
| 4 | 5 | A |


### Creating an EntitySet
First, we initialize an EntitySet. If you’d like to give it a name, you can optionally provide an id to the constructor.

In [None]:
es = ft.EntitySet(id="customer_data")

### Adding dataframes

To get started, we add the transactions dataframe to the EntitySet. In the call to add_dataframe, we specify three important parameters:
* The index parameter specifies the column that uniquely identifies rows in the dataframe.
* The time_index parameter names the column recording when each row's data became known, which tells Featuretools when the data was created.
* The logical_types parameter indicates that “product_id” should be interpreted as a Categorical column, even though it is just an integer in the underlying data.

In [10]:
# Logical types live in the woodwork.logical_types module (note the plural)
from woodwork.logical_types import Categorical, PostalCode