# Transitioning to Featuretools Version 1.0

Featuretools version 1.0 incorporates many significant changes that impact the way EntitySets are created, how primitives are defined, and in some cases the resulting feature matrix that is created. This document provides an overview of the significant changes, helping existing Featuretools users transition to version 1.0.

## Background and Introduction

### Why make these changes?
Over time it became clear that several open source libraries were defining custom type systems to define variable types for columns in a dataframe. Having custom type systems in individual libraries makes it difficult to communicate typing information between the libraries.

To help resolve this problem, a new library called [Woodwork](https://woodwork.alteryx.com/en/stable/) was developed to help implement a standard typing system that can be shared by multiple libraries. These changes are being made to replace the existing custom type system defined in older versions of Featuretools with the typing system defined by Woodwork. The use of Woodwork in Featuretools will allow typing in a feature matrix created by Featuretools to be used seamlessly for creating machine learning models using [EvalML](https://evalml.alteryx.com/en/stable/), which also uses Woodwork to manage typing information.

### What has changed?
Previous releases of Featuretools used a custom type system to define the types for each column in a dataframe. This typing information was stored in a `Variable`, and the types for all columns in a dataframe were stored in an `Entity`. The biggest change with the integration of Woodwork into Featuretools is that both the `Entity` and `Variable` classes have been removed from Featuretools.

The previous `Entity` class has been replaced by a Woodwork dataframe. The Woodwork dataframe now stores all of the column typing information that was previously stored in the `Entity`.

Column typing information which was previously stored as a Featuretools `Variable` is now stored on the Woodwork dataframe as a `ColumnSchema` object. The typing information for the column is stored as a combination of a Woodwork `LogicalType` and one or more `semantic_tags` inside the `ColumnSchema` object. For more information on how Woodwork manages typing information, refer to the [Woodwork Understanding Types and Tags](https://woodwork.alteryx.com/en/stable/guides/logical_types_and_semantic_tags.html) guide.

### What do these changes mean for users?
The removal of these classes required several methods that were previously called from an `Entity` object to be moved to the `EntitySet` object instead. This change also impacts the way relationships, features and primitives are defined, requiring different parameters than were previously required. Also, in some cases, because the Woodwork typing system is not identical to the old Featuretools typing system, the feature matrix that is returned can be slightly different as a result of columns being identified as different types.

All of these changes, and more, will be reviewed in more detail throughout this document, providing examples of both the old and new API where possible.

## Removal of `Entity` Class and Updates to `EntitySet`

In previous versions of Featuretools, an EntitySet was created by adding multiple entities and then defining relationships between variables (columns) in different entities. Starting in Featuretools version 1.0, EntitySets are created by adding multiple dataframes and defining relationships between columns in the dataframes. While conceptually similar, there are some minor differences in the process.

### Adding dataframes to an EntitySet

When adding dataframes to an EntitySet, users can pass in a Woodwork dataframe, or a regular dataframe without Woodwork typing information. As before, Featuretools supports creating EntitySets from pandas, Dask and Koalas dataframes. If users supply a dataframe that has Woodwork typing information initialized, Featuretools will simply use this typing information directly. If users supply a dataframe without Woodwork initialized, Featuretools will initialize Woodwork on the dataframe, performing type inference for any column that does not have typing information specified.

Below are some examples to illustrate this process. First we will create two small dataframes to use for the example.

In [3]:
import featuretools as ft
import pandas as pd
import woodwork as ww

In [4]:
orders_df = pd.DataFrame({
    'order_id': [0, 1, 2],
    'order_date': ['2021-01-02', '2021-01-03', '2021-01-04']
})
items_df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'order_id': [0, 1, 1, 2, 2],
    'item_price': [29.95, 4.99, 10.25, 20.50, 15.99],
    'on_sale': [False, True, False, True, False]
})

With older versions of Featuretools, users would first create an EntitySet object, and then add dataframes to the EntitySet by calling `entity_from_dataframe` as shown below.

```python
es = ft.EntitySet('old_es')

es.entity_from_dataframe(dataframe=orders_df,
                         entity_id='orders',
                         index='order_id',
                         time_index='order_date')
es.entity_from_dataframe(dataframe=items_df,
                         entity_id='items',
                         index='id')
es
```

```
Entityset: old_es
  Entities:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 3]
  Relationships:
    No relationships
```

With Featuretools v1.0, the steps for adding a dataframe to an EntitySet are the same, but some of the details have changed. First, create an EntitySet as before. To add the dataframe, call `EntitySet.add_dataframe` in place of the previous `EntitySet.entity_from_dataframe` call. Note that the name of the dataframe is specified in the `dataframe_name` argument, which was previously called `entity_id`.

In [5]:
es = ft.EntitySet('new_es')

es.add_dataframe(dataframe=orders_df,
                 dataframe_name='orders',
                 index='order_id',
                 time_index='order_date')

Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
  Relationships:
    No relationships

You can also define the name, index and time index by first initializing Woodwork on the dataframe and then passing the Woodwork initialized dataframe directly to the `add_dataframe` call. For this example we will initialize Woodwork on `items_df`, setting the dataframe name as `items` and specifying that the index should be the `id` column.

In [6]:
items_df.ww.init(name='items', index='id')
items_df.ww

| Column | Physical Type | Logical Type | Semantic Tag(s) |
| --- | --- | --- | --- |
| id | int64 | Integer | ['index'] |
| order_id | int64 | Integer | ['numeric'] |
| item_price | float64 | Double | ['numeric'] |
| on_sale | bool | Boolean | [] |


With Woodwork initialized, we no longer need to specify values for the `dataframe_name` or `index` arguments when calling `add_dataframe` as Featuretools will simply use the values that were already specified when Woodwork was initialized.

In [7]:
es.add_dataframe(dataframe=items_df)

Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 4]
  Relationships:
    No relationships

### Accessing column typing information

Previously, column variable type information could be accessed for an entire entity through `Entity.variable_types` or for an individual column by selecting the individual column first through `es['entity_id']['col_id']`.

```python
es['items'].variable_types
```
```
{'id': featuretools.variable_types.variable.Index,
 'order_id': featuretools.variable_types.variable.Numeric,
 'item_price': featuretools.variable_types.variable.Numeric}
```
```python
es['items']['item_price']
```
```
<Variable: item_price (dtype = numeric)>
```

With the updated version of Featuretools, the logical types and semantic tags for all of the columns in a single dataframe can be viewed through the `.ww` namespace on the dataframe. First, select the dataframe from the entityset with `es['dataframe_name']` and then access the typing information by chaining a `.ww` call on the end as shown below.

In [8]:
es['items'].ww

| Column | Physical Type | Logical Type | Semantic Tag(s) |
| --- | --- | --- | --- |
| id | int64 | Integer | ['index'] |
| order_id | int64 | Integer | ['numeric'] |
| item_price | float64 | Double | ['numeric'] |
| on_sale | bool | Boolean | [] |


The logical type and semantic tags for a single column can be obtained from the Woodwork columns dictionary stored on the dataframe, returning a `Woodwork.ColumnSchema` object that stores the typing information:

In [9]:
es['items'].ww.columns['item_price']

<ColumnSchema (Logical Type = Double) (Semantic Tags = ['numeric'])>

### Type inference and updating column types

Featuretools will attempt to infer types for any columns that do not have types defined by the user. Prior to version 1.0, Featuretools implemented custom type inference code to determine what variable type should be assigned to each column. You could see the inferred variable types by viewing the contents of the `Entity.variable_types` dictionary.

Starting in Featuretools 1.0, column type inference is handled by Woodwork. Any columns that do not have a logical type assigned by the user when adding a dataframe to an entityset will have their logical types inferred by Woodwork. As before, type inference can be skipped for any columns in a dataframe by passing the appropriate logical types in a dictionary when calling `EntitySet.add_dataframe`.

As an example, we can create a new dataframe and add it to an entityset, specifying the logical type for the user's full name as the Woodwork `PersonFullName` logical type.

In [10]:
users_df = pd.DataFrame({
    'id': [0, 1, 2],
    'name': ['John Doe', 'Rita Book', 'Teri Dactyl']
})

In [11]:
es.add_dataframe(dataframe=users_df,
                 dataframe_name='users',
                 index='id',
                 logical_types={'name': 'PersonFullName'})

es['users'].ww

| Column | Physical Type | Logical Type | Semantic Tag(s) |
| --- | --- | --- | --- |
| id | int64 | Integer | ['index'] |
| name | string | PersonFullName | [] |


Looking at the typing information above, we can see that the logical type for the `name` column was set to `PersonFullName` as we specified.

Type inference will sometimes assign the wrong logical type to a column. In these situations, the logical type can be updated using the Woodwork `set_types` method. Let's say we want the `order_id` column of the `items` dataframe to have a `Categorical` logical type instead of the `Integer` type that was inferred. Previously, this would have been accomplished through the `Entity.convert_variable_type` method.

```python
from featuretools.variable_types import Categorical

es['items'].convert_variable_type(variable_id='order_id', new_type=Categorical)
```

Now, we can perform this same update using Woodwork:

In [12]:
es['items'].ww.set_types(logical_types={'order_id': 'Categorical'})
es['items'].ww

| Column | Physical Type | Logical Type | Semantic Tag(s) |
| --- | --- | --- | --- |
| id | int64 | Integer | ['index'] |
| order_id | category | Categorical | ['category'] |
| item_price | float64 | Double | ['numeric'] |
| on_sale | bool | Boolean | [] |


#### Mapping from old Featuretools variable types to Woodwork ColumnSchemas

Types defined by Woodwork differ from the old variable types that were defined by Featuretools prior to version 1.0. While there is not a direct mapping from the old variable types to the new Woodwork types defined by `ColumnSchema` objects, the approximate mappings are shown below.

**Featuretools Variable** -> **Woodwork Column Schema**
- `Boolean` -> `ColumnSchema(logical_type=Boolean)` or `ColumnSchema(logical_type=BooleanNullable)`
- `Discrete` -> `ColumnSchema(semantic_tags={'category'})`
- `Categorical` -> `ColumnSchema(logical_type=Categorical)`
- `CountryCode` -> `ColumnSchema(logical_type=CountryCode)`
- `Id` -> `ColumnSchema(semantic_tags={'foreign_key'})`
- `SubRegionCode` -> `ColumnSchema(logical_type=SubRegionCode)`
- `ZIPCode` -> `ColumnSchema(logical_type=PostalCode)`
- `Ordinal` -> `ColumnSchema(logical_type=Ordinal)`
- `Datetime` -> `ColumnSchema(logical_type=Datetime)`
- `DateOfBirth` -> `ColumnSchema(logical_type=Datetime, semantic_tags={'date_of_birth'})`
- `TimeIndex` -> `ColumnSchema(semantic_tags={'time_index'})`
- `DatetimeTimeIndex` -> `ColumnSchema(logical_type=Datetime, semantic_tags={'time_index'})`
- `NumericTimeIndex` -> `ColumnSchema(logical_type=Integer, semantic_tags={'time_index'})` or `ColumnSchema(logical_type=Double, semantic_tags={'time_index'})`
- `EmailAddress` -> `ColumnSchema(logical_type=EmailAddress)`
- `FilePath` -> `ColumnSchema(logical_type=Filepath)`
- `FullName` -> `ColumnSchema(logical_type=PersonFullName)`
- `IPAddress` -> `ColumnSchema(logical_type=IPAddress)`
- `Index` -> `ColumnSchema(semantic_tags={'index'})`
- `LatLong` -> `ColumnSchema(logical_type=LatLong)`
- `NaturalLanguage` -> `ColumnSchema(logical_type=NaturalLanguage)`
- `Numeric` -> `ColumnSchema(semantic_tags={'numeric'})`
- `PhoneNumber` -> `ColumnSchema(logical_type=PhoneNumber)`
- `Timedelta` -> `ColumnSchema(logical_type=Timedelta)`
- `URL` -> `ColumnSchema(logical_type=URL)`
- `Unknown` -> `ColumnSchema(logical_type=Unknown)`
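
When migrating many primitives or type dictionaries, it can help to keep part of this mapping in a small lookup table. The following is a minimal sketch in plain Python that only manipulates names as strings and does not import Woodwork. The `MAPPING` table covers a subset of the list above, and the `describe_schema` helper is hypothetical, not part of Featuretools or Woodwork.

```python
# Subset of the approximate mapping listed above, stored as
# old Variable name -> (logical type name or None, set of semantic tags).
# None means the old type maps to semantic tags only.
MAPPING = {
    "Categorical": ("Categorical", set()),
    "Id": (None, {"foreign_key"}),
    "ZIPCode": ("PostalCode", set()),
    "DateOfBirth": ("Datetime", {"date_of_birth"}),
    "DatetimeTimeIndex": ("Datetime", {"time_index"}),
    "FullName": ("PersonFullName", set()),
    "Index": (None, {"index"}),
    "Numeric": (None, {"numeric"}),
}


def describe_schema(old_name):
    """Return a ColumnSchema-style description string for an old Variable name."""
    logical_type, tags = MAPPING[old_name]
    parts = []
    if logical_type is not None:
        parts.append(f"logical_type={logical_type}")
    if tags:
        parts.append(f"semantic_tags={sorted(tags)}")
    return "ColumnSchema(" + ", ".join(parts) + ")"


for name in ("DateOfBirth", "Numeric", "ZIPCode"):
    print(name, "->", describe_schema(name))
```

Old types with more than one possible replacement, such as `Boolean` and `NumericTimeIndex`, are left out of this sketch because choosing between the alternatives depends on the data.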

### Adding interesting values

Interesting values can be added to all dataframes in an entityset, a single dataframe in an entityset, or to a single column of a dataframe in an entityset.

To add interesting values for all of the dataframes in an entityset, simply call `EntitySet.add_interesting_values`, optionally specifying the maximum number of values to add for each column. This remains unchanged from older versions of Featuretools to the 1.0 release.

Adding values for a single dataframe or for a single column has changed, however. Previously, to add interesting values for an entity, users would call `Entity.add_interesting_values()`:
```python
es['items'].add_interesting_values()
```

Now, in order to specify interesting values for a single dataframe, you call `add_interesting_values` on the entityset, and pass the name of the dataframe for which you want interesting values added:

In [13]:
es.add_interesting_values(dataframe_name='items')

Previously, to manually add interesting values for a column, you would simply assign them to the variable:

```python
es['items']['order_id'].interesting_values = [1, 2]
```

Now, this is done through `EntitySet.add_interesting_values`, passing in the name of the dataframe and a dictionary mapping column names to the interesting values to assign for that column. For example, to assign the interesting values of `[1, 2]` to the `order_id` column of the `items` dataframe, use the following approach:

In [14]:
es.add_interesting_values(dataframe_name='items',
                          values={'order_id': [1, 2]})

Interesting values for multiple columns in the same dataframe can be assigned by adding more entries to the dictionary passed to the `values` parameter.

Accessing interesting values has changed as well. Previously, interesting values could be viewed from the variable:
```python
es['items']['order_id'].interesting_values
```

Interesting values are now stored in the Woodwork metadata for the columns in a dataframe:

In [15]:
es['items'].ww.columns['order_id'].metadata['interesting_values']

[1, 2]

### Setting a secondary time index

In earlier versions of Featuretools, a secondary time index could be set on an Entity by calling `Entity.set_secondary_time_index`. 
```python
es_flight = ft.demo.load_flight(nrows=100)

arr_time_columns = ['arr_delay', 'dep_delay', 'carrier_delay', 'weather_delay',
                    'national_airspace_delay', 'security_delay',
                    'late_aircraft_delay', 'canceled', 'diverted',
                    'taxi_in', 'taxi_out', 'air_time', 'dep_time']
es_flight['trip_logs'].set_secondary_time_index({'arr_time': arr_time_columns})
```

Since the `Entity` class has been removed in Featuretools 1.0, this now needs to be done through the entityset instead:

In [16]:
es_flight = ft.demo.load_flight(nrows=100)

arr_time_columns = ['arr_delay', 'dep_delay', 'carrier_delay', 'weather_delay',
                    'national_airspace_delay', 'security_delay',
                    'late_aircraft_delay', 'canceled', 'diverted',
                    'taxi_in', 'taxi_out', 'air_time', 'dep_time']
es_flight.set_secondary_time_index(dataframe_name='trip_logs',
                                   secondary_time_index={'arr_time': arr_time_columns})

Downloading data ...


Previously, the secondary time index could be accessed directly from the entity with `es_flight['trip_logs'].secondary_time_index`. Starting in Featuretools 1.0, the secondary time index and the associated columns are stored in the Woodwork dataframe metadata and can be accessed as shown below.

In [17]:
es_flight['trip_logs'].ww.metadata['secondary_time_index']

{'arr_time': ['arr_delay',
  'dep_delay',
  'carrier_delay',
  'weather_delay',
  'national_airspace_delay',
  'security_delay',
  'late_aircraft_delay',
  'canceled',
  'diverted',
  'taxi_in',
  'taxi_out',
  'air_time',
  'dep_time',
  'arr_time']}

### Defining and adding relationships

In earlier versions of Featuretools, relationships were defined by creating a `Relationship` object, which took two `Variables` as inputs. To define a relationship between the orders entity and the items entity, we would first create a `Relationship` object and then add it to the entityset:

```python
relationship = ft.Relationship(es['orders']['order_id'], es['items']['order_id'])
es.add_relationship(relationship)
```

With Featuretools 1.0, the process is similar, but there are two different ways to add the relationship to the entityset. One way is to pass the dataframe and column names to `EntitySet.add_relationship`, and another is to pass a previously created relationship object to the `relationship` keyword argument. Both approaches are demonstrated below.

In [18]:
# Undo change from above and change child column logical type to match parent
es['items'].ww.set_types(logical_types={'order_id': 'Integer'})

es.add_relationship(parent_dataframe_name='orders',
                    parent_column_name='order_id',
                    child_dataframe_name='items',
                    child_column_name='order_id')

Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 4]
    users [Rows: 3, Columns: 2]
  Relationships:
    items.order_id -> orders.order_id

In [19]:
es.relationships = []

Alternatively, we can first create a `Relationship` object and pass that to `EntitySet.add_relationship`. When defining a relationship object we need to pass in the entityset to which it belongs along with the names for the parent dataframe and parent column and the name of the child dataframe and child column.

In [20]:
relationship = ft.Relationship(entityset=es,
                               parent_dataframe_name='orders',
                               parent_column_name='order_id',
                               child_dataframe_name='items',
                               child_column_name='order_id')
es.add_relationship(relationship=relationship)

Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 4]
    users [Rows: 3, Columns: 2]
  Relationships:
    items.order_id -> orders.order_id

### Updating data for a dataframe in an EntitySet

Previously, to update the data associated with an entity, users could call `Entity.update_data` and pass in the new dataframe. As an example, let's update the data in our `users` entity:
```python
new_users_df = pd.DataFrame({
    'id': [3, 4],
    'name': ['Anne Teak', 'Art Decco']
})

es['users'].update_data(df=new_users_df)
```

To accomplish this task with Featuretools 1.0, we will use the `EntitySet.update_dataframe` method instead:

In [21]:
new_users_df = pd.DataFrame({
    'id': [0, 1],
    'name': ['Anne Teak', 'Art Decco']
})

es.update_dataframe(dataframe_name='users', df=new_users_df)
es['users']

|  | id | name |
| --- | --- | --- |
| 0 | 0 | Anne Teak |
| 1 | 1 | Art Decco |


## Defining features

The syntax for defining features has changed slightly in Featuretools 1.0. Previously, identity features could be defined simply by passing in the variable that should be used to build the feature.

```python
feature = ft.Feature(es['items']['item_price'])
```

Starting with Featuretools 1.0, a similar syntax can be used, but because `es['items']` will now return a Woodwork dataframe instead of an `Entity`, we need to update the syntax slightly to access the Woodwork column. To update, simply add `.ww` between the dataframe name selector and the column selector as shown below.

In [22]:
feature = ft.Feature(es['items'].ww['item_price'])

Exception: Unrecognized feature initialization

## Defining primitives

In earlier versions of Featuretools, primitive input and return types were defined by specifying the appropriate `Variable` class. Starting in version 1.0, the input and return types are defined by Woodwork `ColumnSchema` objects.

To illustrate this change, let's look closer at the `Age` transform primitive. This primitive takes a datetime representing a date of birth and returns a numeric value corresponding to a person's age. In previous versions of Featuretools, the input type was defined by specifying the `DateOfBirth` variable type and the return type was specified by the `Numeric` variable type:

```python
input_types = [DateOfBirth]
return_type = Numeric
```

Woodwork does not have a specific `DateOfBirth` logical type, but rather identifies a column as a date of birth column by specifying the logical type as `Datetime` with a semantic tag of `date_of_birth`. There is also no `Numeric` logical type in Woodwork, but rather Woodwork identifies all columns that can be used for numeric operations with the semantic tag of `numeric`. With these in mind, we can redefine the `Age` input types and return types with `ColumnSchema` objects as follows:

```python
from woodwork.column_schema import ColumnSchema
from woodwork.logical_types import Datetime
input_types = [ColumnSchema(logical_type=Datetime, semantic_tags={'date_of_birth'})]
return_type = ColumnSchema(semantic_tags={'numeric'})
```

Aside from changing the way input and return types are defined, the rest of the process for defining primitives remains unchanged.

There are many differences between Featuretools `Variable` types and the corresponding types defined by Woodwork. For a complete overview of how to redefine `Variable` objects as `ColumnSchema` objects, see the mappings provided earlier in this document.

## Changes to Deep Feature Synthesis and Calculate Feature Matrix

The argument names for both `featuretools.dfs` and `featuretools.calculate_feature_matrix` have changed slightly in Featuretools 1.0. In prior versions, users could generate a list of features using the default primitives and options like this:

```python
features = ft.dfs(entityset=es,
                  target_entity='items',
                  features_only=True)
```

In Featuretools 1.0, the `target_entity` argument has been renamed to `target_dataframe_name`, but otherwise this basic call remains the same.



In [23]:
features = ft.dfs(entityset=es,
                  target_dataframe_name='items',
                  features_only=True)
features

[<Feature: order_id>,
 <Feature: item_price>,
 <Feature: on_sale>,
 <Feature: orders.COUNT(items)>,
 <Feature: orders.MAX(items.item_price)>,
 <Feature: orders.MEAN(items.item_price)>,
 <Feature: orders.MIN(items.item_price)>,
 <Feature: orders.PERCENT_TRUE(items.on_sale)>,
 <Feature: orders.SKEW(items.item_price)>,
 <Feature: orders.STD(items.item_price)>,
 <Feature: orders.SUM(items.item_price)>,
 <Feature: orders.DAY(order_date)>,
 <Feature: orders.MONTH(order_date)>,
 <Feature: orders.WEEKDAY(order_date)>,
 <Feature: orders.YEAR(order_date)>]

In addition, the `dfs` argument `ignore_entities` was renamed to `ignore_dataframes` and `ignore_variables` was renamed to `ignore_columns`. Similarly, if specifying primitive options, all references to `entities` should be replaced with `dataframes` and references to `variables` should be replaced with `columns`. For example, the primitive option of `include_groupby_entities` is now `include_groupby_dataframes` and `include_variables` is now `include_columns`.
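
For codebases that build `dfs` arguments programmatically, the renames can be applied mechanically. The sketch below is plain Python and does not call Featuretools. The `rename_keys` helper and the two rename tables are hypothetical and cover only the names mentioned in this section, so any other renamed option would need to be added by hand.

```python
# Renames for top-level dfs arguments named in this section.
DFS_ARGUMENT_RENAMES = {
    "target_entity": "target_dataframe_name",
    "ignore_entities": "ignore_dataframes",
    "ignore_variables": "ignore_columns",
}

# Renames for primitive option keys named in this section.
PRIMITIVE_OPTION_RENAMES = {
    "include_groupby_entities": "include_groupby_dataframes",
    "include_variables": "include_columns",
}


def rename_keys(mapping, renames):
    """Return a copy of mapping with old key names replaced by new ones."""
    return {renames.get(key, key): value for key, value in mapping.items()}


old_kwargs = {
    "target_entity": "items",
    "ignore_variables": {"items": ["on_sale"]},
    "features_only": True,
}
new_kwargs = rename_keys(old_kwargs, DFS_ARGUMENT_RENAMES)

# Primitive options are nested one level deeper: primitive name -> options.
old_options = {"mean": {"include_variables": {"items": ["item_price"]}}}
new_options = {
    primitive: rename_keys(options, PRIMITIVE_OPTION_RENAMES)
    for primitive, options in old_options.items()
}

print(new_kwargs)
print(new_options)
```

Keys that are not in the rename tables, such as `features_only`, pass through unchanged.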

The basic call to `featuretools.calculate_feature_matrix` remains unchanged if passing in an entityset along with a list of features to calculate. However, users calling `calculate_feature_matrix` by passing in a list of `entities` and `relationships` should note that the `entities` argument has been renamed to `dataframes` and the values in the dictionary should now include Woodwork logical types instead of Featuretools `Variable` classes.

In [25]:
feature_matrix = ft.calculate_feature_matrix(features=features, entityset=es)
feature_matrix

| id | order_id | item_price | on_sale | orders.COUNT(items) | orders.MAX(items.item_price) | orders.MEAN(items.item_price) | orders.MIN(items.item_price) | orders.PERCENT_TRUE(items.on_sale) | orders.SKEW(items.item_price) | orders.STD(items.item_price) | orders.SUM(items.item_price) | orders.DAY(order_date) | orders.MONTH(order_date) | orders.WEEKDAY(order_date) | orders.YEAR(order_date) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 29.95 | False | 1 | 29.95 | 29.95 | 29.95 | 0.0 | NaN | NaN | 29.95 | 2 | 1 | 5 | 2021 |
| 1 | 1 | 4.99 | True | 2 | 10.25 | 7.62 | 4.99 | 0.5 | NaN | 3.719382 | 15.24 | 3 | 1 | 6 | 2021 |
| 2 | 1 | 10.25 | False | 2 | 10.25 | 7.62 | 4.99 | 0.5 | NaN | 3.719382 | 15.24 | 3 | 1 | 6 | 2021 |
| 3 | 2 | 20.5 | True | 2 | 20.5 | 18.245 | 15.99 | 0.5 | NaN | 3.189052 | 36.49 | 4 | 1 | 0 | 2021 |
| 4 | 2 | 15.99 | False | 2 | 20.5 | 18.245 | 15.99 | 0.5 | NaN | 3.189052 | 36.49 | 4 | 1 | 0 | 2021 |


In addition to the changes in argument names, there are a couple of other changes to the returned feature matrix that users should be aware of. First, because of slight differences in the way Woodwork defines column types compared to how the prior Featuretools implementation did, there can be some differences in the features that are generated between old and new versions. The most notable impact is in the way foreign key columns are handled. Previously, Featuretools treated all foreign key (previously `Id`) columns as categorical columns, and would generate appropriate features from these columns. Starting in version 1.0, foreign key columns are not constrained to be categorical, and if they are another type such as `Integer`, features will not be generated from these columns.

Also, because Woodwork's type inference process differs from the previous Featuretools type inference process, an entityset may have column types identified differently. This difference in column types could impact the features that are generated. If it is important to have the same set of features, check all of the logical types in the entityset dataframes and update any columns that have been inferred as unexpected types.
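
One way to run this check is to compare the logical types you expect against the ones that were inferred. The following is a minimal sketch that works on plain dictionaries of column name to logical type name. The `find_type_mismatches` helper and the example dictionaries are hypothetical; in practice the inferred names would be read from the typing information shown by the `.ww` namespace for each dataframe.

```python
def find_type_mismatches(expected, inferred):
    """Return {column: (expected_type, inferred_type)} for columns that differ.

    Columns missing from inferred are reported with an inferred type of None.
    Columns that appear only in inferred are ignored.
    """
    return {
        column: (expected_type, inferred.get(column))
        for column, expected_type in expected.items()
        if inferred.get(column) != expected_type
    }


# Hypothetical expectations for the items dataframe used in this guide.
expected_types = {"id": "Integer", "order_id": "Categorical", "item_price": "Double"}
inferred_types = {
    "id": "Integer",
    "order_id": "Integer",
    "item_price": "Double",
    "on_sale": "Boolean",
}

mismatches = find_type_mismatches(expected_types, inferred_types)
print(mismatches)  # {'order_id': ('Categorical', 'Integer')}
```

Any column reported here can then be corrected with the Woodwork `set_types` method described earlier in this document.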

Finally, the feature matrix calculated by Featuretools will now have Woodwork initialized. This means that users can view feature matrix column typing information through the Woodwork namespace as follows.

In [26]:
feature_matrix.ww

WoodworkNotInitError: Woodwork not initialized for this DataFrame. Initialize by calling DataFrame.ww.init


- Feature matrix will now be returned as a dataframe with Woodwork initialized. Each column will contain a designation indicating whether the feature was engineered or in the original data in the ColumnAccessor.origin attribute.

Featuretools now labels features by whether they were originally in the dataframes, or whether they were created by Featuretools. This information is stored in the Woodwork `origin` attribute for the column. Columns that were in the original data will be labeled with `source` and features that were created by Featuretools will be labeled with `engineered`.

As a demonstration of how to access this information, let's look at the `item_price` feature which was in the original data as well as the `orders.MEAN(items.item_price)` which was created by Featuretools.

In [27]:
feature_matrix.ww['item_price'].ww.origin

WoodworkNotInitError: Woodwork not initialized for this DataFrame. Initialize by calling DataFrame.ww.init

In [28]:
feature_matrix.ww['orders.MEAN(items.item_price)'].ww.origin

WoodworkNotInitError: Woodwork not initialized for this DataFrame. Initialize by calling DataFrame.ww.init

## Other changes

In addition to the changes outlined above, there are several other smaller changes in Featuretools 1.0 of which existing users should be aware.

Column ordering of a dataframe in an EntitySet might be different than it was before. Previously, Featuretools would reorder the columns such that the index column would always be the first column in the dataframe. This behavior has been removed, and the index column is no longer guaranteed to be the first column in the dataframe. Now the index column will remain in the position it was in when the dataframe was added to the entityset.
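
Code that assumed the index was the first column may need to select it by name instead. The following is a minimal sketch of the difference using plain lists of column names. The two helper functions are hypothetical and only mimic the behavior described above; they are not Featuretools functions.

```python
def old_column_order(columns, index):
    """Mimic the pre-1.0 behavior: the index column is moved to the front."""
    return [index] + [name for name in columns if name != index]


def new_column_order(columns, index):
    """Mimic the 1.0 behavior: columns stay in the order they were provided."""
    return list(columns)


columns = ["order_date", "order_id"]
print(old_column_order(columns, index="order_id"))  # ['order_id', 'order_date']
print(new_column_order(columns, index="order_id"))  # ['order_date', 'order_id']
```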

For `LatLong` columns, older versions of Featuretools would replace single `nan` values in the columns with a tuple `(nan, nan)`. This is no longer the case, and single `nan` values will now remain in the `LatLong` column. Based on the behavior in Woodwork, any values of `(nan, nan)` in a `LatLong` column will be replaced with a single `nan` value.
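
To make the new `LatLong` behavior concrete, the sketch below mimics it on a plain Python list. The `normalize_latlong` helper is hypothetical and is not how Woodwork implements the conversion; it only illustrates that a `(nan, nan)` pair collapses to a single `nan` while other values are left alone.

```python
import math


def is_nan(value):
    """Return True only for a float nan value."""
    return isinstance(value, float) and math.isnan(value)


def normalize_latlong(value):
    """Collapse a (nan, nan) pair to a single nan; leave other values unchanged."""
    if isinstance(value, tuple) and len(value) == 2 and all(is_nan(v) for v in value):
        return float("nan")
    return value


values = [(40.7, -74.0), float("nan"), (float("nan"), float("nan"))]
normalized = [normalize_latlong(value) for value in values]
print(normalized)
```

Code that previously checked for missing locations by looking for a `(nan, nan)` tuple would need to check for a single `nan` value instead.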

Since Featuretools no longer defines `Variable` objects with relationships between them, the `featuretools.variable_types.graph_variable_types` function has been removed.

Similarly, the `featuretools.variable_types.list_variable_types` utility function has been deprecated and replaced with two corresponding Woodwork functions: `woodwork.list_logical_types` and `woodwork.list_semantic_tags`. Starting in Featuretools 1.0, the Woodwork utility functions should be used to obtain information on the logical types and semantic tags that can be applied to dataframe columns.