Fix multi-output features not created when there is no child data #834

Merged: rwedge merged 6 commits into alteryx:master from jeffzi:fix-empty-multi-output-primitives on Dec 18, 2019

@jeffzi jeffzi commented Dec 5, 2019

Fix multi-output features not created when there is no child data

When there is no child data, calculate_feature_matrix raises a KeyError because multi-output features are not created. The expected behaviour is for those features to be represented by columns filled with numpy.nan, as is the case with regular features.

Here is a minimal reproducible example:

import numpy as np
import pandas as pd

import featuretools as ft
from featuretools.primitives import NMostCommon

parent_df = pd.DataFrame({"id": [1]})
child_df = pd.DataFrame({"id": [1, 2, 3],
                         "parent_id": [1, 1, 1],
                         "time_index": pd.date_range(start='1/1/2018', periods=3),
                         "cat": ['a', 'a', 'b']})

es = ft.EntitySet(id="blah")
es.entity_from_dataframe(entity_id="parent", dataframe=parent_df, index="id")
es.entity_from_dataframe(entity_id="child", dataframe=child_df, index="id", time_index="time_index")
es.add_relationship(ft.Relationship(es["parent"]["id"], es["child"]["parent_id"]))

n_most_common = ft.Feature(es["child"]['cat'], parent_entity=es["parent"], primitive=NMostCommon)

# cutoff time before all rows
# We expect N_MOST_COMMON features to be np.nan
ft.calculate_feature_matrix(entityset=es, 
                            features=[n_most_common],
                            cutoff_time=pd.Timestamp("12/31/2017"))
#> [ ... ]
#> KeyError: "None of [Index(['N_MOST_COMMON(child.cat)[0]', 'N_MOST_COMMON(child.cat)[1]',\n       'N_MOST_COMMON(child.cat)[2]'],\n      dtype='object')] are in the [columns]"

Created on 2019-12-05 by the reprexpy package
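
The KeyError in the output above has the form pandas uses when a list of column labels is selected and none of them exist. The following is a minimal sketch of that mechanism using pandas only. It does not reproduce featuretools internals, and the `computed` frame is a hypothetical stand-in for an intermediate result that has the parent index but no multi-output columns.

```python
import pandas as pd

# Column names taken from the KeyError message above.
expected = [f"N_MOST_COMMON(child.cat)[{i}]" for i in range(3)]

# Hypothetical stand-in: a frame indexed by parent id with no feature columns.
computed = pd.DataFrame(index=pd.Index([1], name="id"))

# Selecting a list of labels that are all missing raises KeyError.
try:
    computed[expected]
    raised = False
except KeyError:
    raised = True

# reindex, by contrast, creates the missing columns and fills them with NaN.
filled = computed.reindex(columns=expected)
```

With `reindex`, the result has one row and three all-NaN columns, which matches the behaviour the description asks for.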

This PR fixes the bug and adds a test for multi-output features in test_empty_child_dataframe.
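
As a rough illustration of the intended result (not the actual diff in this PR), the sketch below adds one NaN-filled column per output of a multi-output feature when it is missing. The helpers `output_names` and `add_missing_as_nan` are hypothetical names introduced here; only the `[i]` column-naming pattern comes from the error message in the description.

```python
import numpy as np
import pandas as pd


def output_names(base_name, n_outputs):
    """Hypothetical helper: column names for a feature with n_outputs outputs."""
    if n_outputs == 1:
        return [base_name]
    return [f"{base_name}[{i}]" for i in range(n_outputs)]


def add_missing_as_nan(frame, base_name, n_outputs):
    """Hypothetical helper: return a copy with absent output columns set to NaN."""
    result = frame.copy()
    for name in output_names(base_name, n_outputs):
        if name not in result.columns:
            result[name] = np.nan
    return result


# One parent row and no child data before the cutoff time.
parent = pd.DataFrame(index=pd.Index([1], name="id"))
fm = add_missing_as_nan(parent, "N_MOST_COMMON(child.cat)", 3)
```

A test along the lines of test_empty_child_dataframe could then check that every expected column exists and contains only NaN.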

Review comment on docs/source/changelog.rst (outdated):
* Enhancements
* Fixes
* Raise error when given wrong input for ignore_variables (:pr:`826`)
* Fix multi-ouput features not created when there is no child data (:pr:`#834`)

the '#' character should be removed

My bad, I fixed it.
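
One plausible reason the '#' matters, assuming the changelog's `:pr:` role substitutes its argument into a URL template (for example through `sphinx.ext.extlinks`, which this thread does not confirm): a leading '#' would end up in the link as a URL fragment rather than as part of the path. The template below is a hypothetical placeholder, not the project's actual configuration.

```python
from urllib.parse import urlsplit

# Hypothetical template; the role's argument replaces %s.
PR_URL_TEMPLATE = "https://github.com/alteryx/featuretools/pull/%s"

good = urlsplit(PR_URL_TEMPLATE % "834")
bad = urlsplit(PR_URL_TEMPLATE % "#834")

# In the second URL the number is parsed as a fragment, so the path
# no longer identifies a specific pull request.
good_path = good.path
bad_path = bad.path
bad_fragment = bad.fragment
```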

jeffzi changed the title from "Fix multi-ouput features not created when there is no child data" to "Fix multi-output features not created when there is no child data" on Dec 6, 2019

rwedge commented Dec 9, 2019

I think the PR looks good. Once the PR fixing the issue with sklearn and the tests goes through, this should be good to go.

@codecov-io

Codecov Report

Merging #834 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   98.15%   98.16%   +<.01%     
==========================================
  Files         117      117              
  Lines       10848    10851       +3     
==========================================
+ Hits        10648    10652       +4     
+ Misses        200      199       -1

| Impacted Files | Coverage Δ |
| --- | --- |
| ...mputational_backend/test_feature_set_calculator.py | 100% <100%> (ø) ⬆️ |
| ...s/computational_backends/feature_set_calculator.py | 98.55% <100%> (ø) ⬆️ |
| ...computational_backends/calculate_feature_matrix.py | 98.58% <0%> (+0.35%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cb82feb...849778a.

rwedge merged commit a48dbae into alteryx:master on Dec 18, 2019
rwedge mentioned this pull request on Dec 28, 2019