# Transitioning to Featuretools Version 1.0.0

Featuretools version 1.0.0 incorporates many significant changes that impact the way EntitySets are created, how primitives are defined, and in some cases the resulting feature matrix that is created. This document provides an overview of the significant changes, helping existing Featuretools users transition to version 1.0.0.

### Why make these changes?
Over time it became clear that several open source libraries were defining custom type systems to define variable types for columns in a dataframe. Having custom type systems in individual libraries makes it difficult to communicate typing information between the libraries.

To help resolve this problem, a new library called [Woodwork](https://woodwork.alteryx.com/en/stable/) was developed to implement a standard typing system that can be shared by multiple libraries. These changes replace the custom type system defined in older versions of Featuretools with the typing system defined by Woodwork. The use of Woodwork in Featuretools allows the typing in a feature matrix created by Featuretools to be used seamlessly for creating machine learning models with [EvalML](https://evalml.alteryx.com/en/stable/), which also uses Woodwork to manage typing information.

### What has changed?
Previous releases of Featuretools used a custom type system to define the types for each column in a dataframe. This typing information was stored in a `Variable`, and the types for all columns in a dataframe were stored in an `Entity`. The biggest change with the integration of Woodwork into Featuretools is that both the `Entity` and `Variable` classes have been removed from Featuretools.

The previous `Entity` class has been replaced by a Woodwork dataframe. The Woodwork dataframe now stores all of the column typing information that was previously stored in the `Entity`.

Column typing information which was previously stored as a Featuretools `Variable` is now stored on the Woodwork dataframe as a `ColumnSchema` object. The typing information for the column is stored as a combination of a Woodwork `LogicalType` and one or more `semantic_tags` inside the `ColumnSchema` object. For more information on how Woodwork manages typing information, refer to the [Woodwork Understanding Types and Tags](https://woodwork.alteryx.com/en/stable/guides/logical_types_and_semantic_tags.html) guide.

### What do these changes mean for users?
The removal of these classes required several methods that were previously called from an `Entity` object to be moved to the `EntitySet` object instead. This change also impacts the way relationships, features and primitives are defined, requiring different parameters than were previously required. Also, in some cases, because the Woodwork typing system is not identical to the old Featuretools typing system, the feature matrix that is returned can be slightly different as a result of columns being identified as different types.

All of these changes, and more, will be reviewed in more detail throughout this document, providing examples of both the old and new API where possible.

## Removal of `Entity` Class and Updates to `EntitySet`

In previous versions of Featuretools, an EntitySet was created by adding multiple entities and then defining relationships between variables (columns) in different entities. Starting in Featuretools version 1.0.0, EntitySets are created by adding multiple dataframes and defining relationships between columns in the dataframes. While conceptually similar, there are some minor differences in the process.

### Adding dataframes to an EntitySet

When adding dataframes to an EntitySet, users can pass in a Woodwork dataframe or a regular dataframe without Woodwork typing information. As before, Featuretools supports creating EntitySets from pandas, Dask and Koalas dataframes. If users supply a dataframe that has Woodwork typing information initialized, Featuretools will simply use this typing information directly. If users supply a dataframe without Woodwork initialized, Featuretools will initialize Woodwork on the dataframe, performing type inference for any column that does not have typing information specified.

Below are some examples to illustrate this process. First we will create two small dataframes to use for the example.

```python
import featuretools as ft
import pandas as pd
import woodwork as ww
```

```python
orders_df = pd.DataFrame({
    'order_id': [0, 1, 2],
    'order_date': ['2021-01-02', '2021-01-03', '2021-01-04']
})
items_df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'order_id': [0, 1, 1, 2, 2],
    'item_price': [29.95, 4.99, 10.25, 20.05, 15.99]
})
```

With older versions of Featuretools, users would first create an EntitySet object and then add dataframes to the EntitySet by calling `entity_from_dataframe`, as shown below.

```python
es = ft.EntitySet('old_es')

es.entity_from_dataframe(dataframe=orders_df,
                         entity_id='orders',
                         index='order_id',
                         time_index='order_date')
es.entity_from_dataframe(dataframe=items_df,
                         entity_id='items',
                         index='id')
es
```

```
Entityset: old_es
  Entities:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 3]
  Relationships:
    No relationships
```

With Featuretools v1.0.0, the steps for adding a dataframe to an EntitySet are the same, but some of the details have changed. First, create an EntitySet as before. To add the dataframe, call `EntitySet.add_dataframe` in place of the previous `EntitySet.entity_from_dataframe` call. Note that the name of the dataframe is specified in the `dataframe_name` argument, which was previously called `entity_id`.

```python
es = ft.EntitySet('new_es')

es.add_dataframe(dataframe=orders_df,
                 dataframe_name='orders',
                 index='order_id',
                 time_index='order_date')
```

```
Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
  Relationships:
    No relationships
```
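When migrating many existing calls, the `entity_id` to `dataframe_name` rename can be applied mechanically. The helper below is a hypothetical convenience function, not part of Featuretools, and it only handles the single rename described above; other arguments of the old call may also need attention.

```python
def entity_kwargs_to_dataframe_kwargs(old_kwargs):
    """Return a copy of ``old_kwargs`` with ``entity_id`` renamed to ``dataframe_name``.

    Hypothetical migration helper: it covers only the rename discussed in
    this guide and leaves every other keyword untouched.
    """
    new_kwargs = dict(old_kwargs)  # copy so the caller's dict is not modified
    if 'entity_id' in new_kwargs:
        new_kwargs['dataframe_name'] = new_kwargs.pop('entity_id')
    return new_kwargs


# Keyword arguments as they would have been passed to entity_from_dataframe
old_call = {
    'entity_id': 'orders',
    'index': 'order_id',
    'time_index': 'order_date',
}
new_call = entity_kwargs_to_dataframe_kwargs(old_call)
print(new_call)
```

The resulting dictionary could then be unpacked into the new call, for example `es.add_dataframe(dataframe=orders_df, **new_call)`.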

You can also define the name, index and time index by first initializing Woodwork on the dataframe and then passing the Woodwork-initialized dataframe directly to the `add_dataframe` call. For this example we will initialize Woodwork on `items_df`, setting the dataframe name to `items` and specifying that the index should be the `id` column.

```python
items_df.ww.init(name='items', index='id')
items_df.ww
```

| Column | Physical Type | Logical Type | Semantic Tag(s) |
|---|---|---|---|
| id | int64 | Integer | ['index'] |
| order_id | int64 | Integer | ['numeric'] |
| item_price | float64 | Double | ['numeric'] |


With Woodwork initialized, we no longer need to specify values for the `dataframe_name` or `index` arguments when calling `add_dataframe`, because Featuretools will simply use the values that were already specified when Woodwork was initialized.

```python
es.add_dataframe(dataframe=items_df)
```

```
Entityset: new_es
  DataFrames:
    orders [Rows: 3, Columns: 2]
    items [Rows: 5, Columns: 3]
  Relationships:
    No relationships
```

### Accessing column typing information
