Suppose we have an entityset with a parent entity E1 and a child entity E2, and we're building features on E2 with Deep Feature Synthesis. If E1 has a categorical variable, it seems that the direct feature of that categorical is generated automatically and included in the resulting feature matrix. However, that feature is never used as an input to any stacked feature.
This becomes a problem when we want a primitive of multiple variables from different tables. The following example shows an entityset where a user might expect the feature CAT_PRIMITIVE(values_row_1, transactions.categorical) to be generated.
import pandas as pd
import featuretools as ft
import featuretools.variable_types as vtypes

datadict = {'values_row_1': [1, 1, 2],
            'transaction_id': [1, 2, 3],
            'categorical': ['cat', 'cat', 'lion']}
data = pd.DataFrame(datadict)
variable_types = {'values_row_1': vtypes.Numeric,
                  'transaction_id': vtypes.Categorical,
                  'categorical': vtypes.Categorical
                  }

es = ft.EntitySet('mock_entityset')
es.entity_from_dataframe(entity_id='values',
                         dataframe=data,
                         index='my_index',
                         variable_types=variable_types,
                         )
es.normalize_entity(base_entity_id='values',
                    new_entity_id='transactions',
                    index='transaction_id',
                    additional_variables=['categorical']
                    )

from featuretools.primitives import make_trans_primitive

def cat_primitive(value, categorical):
    return [x for x in categorical]

Prim = make_trans_primitive(cat_primitive,
                            input_types=[vtypes.Numeric, vtypes.Categorical],
                            return_type=vtypes.Numeric)

fm, features = ft.dfs(entityset=es,
                      target_entity='values',
                      agg_primitives=[],
                      trans_primitives=[Prim])
features
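For reference, the values such a feature would hold can be worked out by hand. The sketch below uses plain Python structures in place of the entityset. This is an assumption made purely for illustration: `direct_categorical` and the table layouts are hypothetical names, not featuretools API. It looks up each row's parent `categorical` through `transaction_id`, which is what the direct feature `transactions.categorical` represents, and then applies `cat_primitive` to the result.

```python
# Child rows ("values" entity) and parent rows ("transactions" entity),
# written as plain Python structures instead of a featuretools EntitySet.
values = [
    {"values_row_1": 1, "transaction_id": 1},
    {"values_row_1": 1, "transaction_id": 2},
    {"values_row_1": 2, "transaction_id": 3},
]
transactions = {1: "cat", 2: "cat", 3: "lion"}  # transaction_id -> categorical


def cat_primitive(value, categorical):
    # Same body as the primitive in the example above.
    return [x for x in categorical]


# Direct feature: follow the foreign key from each child row to its parent.
direct_categorical = [transactions[row["transaction_id"]] for row in values]
numeric = [row["values_row_1"] for row in values]

# What CAT_PRIMITIVE(values_row_1, transactions.categorical) would contain.
expected = cat_primitive(numeric, direct_categorical)
print(expected)  # ['cat', 'cat', 'lion']
```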
This is connected to the way that DFS builds features. Direct features are created after the transform features, so they're not available for stacking. In particular, if we look at the _run_dfs function in deep feature synthesis, step 3 is to build transform features while step 5 builds direct features from other entities.
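The effect of that ordering can be seen in a toy model. The following is a simplified sketch, not the featuretools implementation: `build_features` and its `direct_first` flag are hypothetical, and features are represented as plain strings. Transform features are built from whatever features exist at that moment, so a direct feature added afterwards never becomes an input.

```python
def build_features(own_variables, parent_variables, direct_first=False):
    """Toy stand-in for a feature-building pass (hypothetical, not featuretools)."""
    features = list(own_variables)  # identity features on the target entity
    direct = ["transactions." + v for v in parent_variables]

    if direct_first:
        features += direct  # direct features exist before the transform step

    # Transform step: pair the numeric feature with every categorical
    # feature that exists right now.
    numeric = [f for f in features if f == "values_row_1"]
    categorical = [f for f in features if f.endswith("categorical")]
    transforms = ["CAT_PRIMITIVE(%s, %s)" % (n, c)
                  for n in numeric for c in categorical]

    if not direct_first:
        features += direct  # added too late to be stacked on
    return features + transforms


own = ["values_row_1", "transaction_id"]
wanted = "CAT_PRIMITIVE(values_row_1, transactions.categorical)"

late = build_features(own, ["categorical"])                      # transform, then direct
early = build_features(own, ["categorical"], direct_first=True)  # direct, then transform

print(wanted in late)   # False
print(wanted in early)  # True
```

In both orderings the direct feature `transactions.categorical` itself is present in the output; only the stacked feature depends on the order.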