Primitive stacking on direct features #81

Closed
Seth-Rothschild opened this issue Jan 31, 2018 · 1 comment · Fixed by #623

Seth-Rothschild commented Jan 31, 2018

Suppose we have an entityset with a parent entity E1 and a child entity E2, and we're building features on E2 with Deep Feature Synthesis. If E1 has a categorical variable, the direct feature of that categorical seems to be generated automatically and included in the resulting feature matrix. However, that direct feature is never used as an input to stacked features.

This becomes a problem when we want a primitive that takes variables from different tables as inputs. In the following example, a user might expect the feature CAT_PRIMITIVE(values_row_1, transactions.categorical) to be generated, but it is not.

import pandas as pd
import featuretools as ft
import featuretools.variable_types as vtypes
datadict = {'values_row_1': [1, 1, 2],
            'transaction_id': [1, 2, 3],
            'categorical': ['cat', 'cat', 'lion']}

data = pd.DataFrame(datadict)

variable_types = {'values_row_1': vtypes.Numeric,
                  'transaction_id': vtypes.Categorical,
                  'categorical': vtypes.Categorical
                  }

es = ft.EntitySet('mock_entityset')

# 'my_index' is not a column of `data`, so ask featuretools to create it.
es.entity_from_dataframe(entity_id='values',
                         dataframe=data,
                         index='my_index',
                         make_index=True,
                         variable_types=variable_types,
                         )

es.normalize_entity(base_entity_id='values',
                    new_entity_id='transactions',
                    index='transaction_id',
                    additional_variables=['categorical']
                    )

from featuretools.primitives import make_trans_primitive

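# Mock primitive: ignores `value` and returns the categorical column unchanged.
# Only the input types matter for this reproduction.
def cat_primitive(value, categorical):
    return [x for x in categorical]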

Prim = make_trans_primitive(cat_primitive,
                            input_types=[vtypes.Numeric, vtypes.Categorical],
                            return_type=vtypes.Numeric)

fm, features = ft.dfs(entityset=es, 
                      target_entity='values',
                      agg_primitives=[],
                      trans_primitives=[Prim])

features
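
For reference, the values the expected feature would hold can be worked out by hand. The following is a plain-Python sketch that does not use featuretools: the `values` rows and the `transactions` lookup are hand-written stand-ins for the two entities after `normalize_entity`, and the join is an assumption about what the direct feature `transactions.categorical` looks like on the child entity.

```python
# Hand-written stand-ins for the two entities after normalization
# (assumption: `categorical` has moved to the parent `transactions`).
values = [
    {"values_row_1": 1, "transaction_id": 1},
    {"values_row_1": 1, "transaction_id": 2},
    {"values_row_1": 2, "transaction_id": 3},
]
transactions = {1: "cat", 2: "cat", 3: "lion"}  # transaction_id -> categorical


def cat_primitive(value, categorical):
    # Same mock primitive as in the issue: returns the categorical input.
    return [x for x in categorical]


# Direct feature: look up the parent's categorical for each child row.
direct_categorical = [transactions[row["transaction_id"]] for row in values]
numeric = [row["values_row_1"] for row in values]

# The stacked feature a user might expect:
# CAT_PRIMITIVE(values_row_1, transactions.categorical)
expected = cat_primitive(numeric, direct_categorical)
print(expected)  # ['cat', 'cat', 'lion']
```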

Seth-Rothschild (author) commented

This is connected to the order in which DFS builds features. Direct features are created after the transform features, so they are not yet available as inputs when transform primitives are applied. In particular, in the _run_dfs function in deep feature synthesis, step 3 builds transform features while step 5 builds direct features from other entities.
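
The effect of that ordering can be illustrated with a toy model. This is only a sketch under stated assumptions, not the featuretools implementation: `build_transform_features`, `build_direct_features`, and `run_toy_dfs` are hypothetical names, features are represented as plain strings, and the `direct_first` flag exists purely to compare the two orderings.

```python
# Toy model of the ordering problem; NOT the featuretools implementation.
# All function names and the `direct_first` flag are hypothetical.

def build_transform_features(available):
    """Apply a (numeric, categorical) primitive to every matching pair."""
    numerics = [n for n, t in available.items() if t == "numeric"]
    categoricals = [n for n, t in available.items() if t == "categorical"]
    return [f"CAT_PRIMITIVE({n}, {c})" for n in numerics for c in categoricals]


def build_direct_features(parent_name, parent_variables):
    """Expose each parent variable on the child as `parent.variable`."""
    return {f"{parent_name}.{name}": vtype
            for name, vtype in parent_variables.items()}


def run_toy_dfs(child_variables, parent_name, parent_variables, direct_first):
    available = dict(child_variables)
    if direct_first:
        # Direct features exist before transforms, so they can be stacked on.
        available.update(build_direct_features(parent_name, parent_variables))
        transforms = build_transform_features(available)
    else:
        # Ordering described above: transforms first, direct features later.
        transforms = build_transform_features(available)
        available.update(build_direct_features(parent_name, parent_variables))
    return sorted(available), transforms


child = {"values_row_1": "numeric"}
parent = {"categorical": "categorical"}

_, current = run_toy_dfs(child, "transactions", parent, direct_first=False)
_, reordered = run_toy_dfs(child, "transactions", parent, direct_first=True)
print(current)    # []
print(reordered)  # ['CAT_PRIMITIVE(values_row_1, transactions.categorical)']
```

In both orderings the direct feature `transactions.categorical` ends up in the set of available features, which matches the observation that it appears in the feature matrix; only the second ordering lets the transform primitive see it.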
