Fix multi-output features not created when there is no child data #834

Merged
6 commits merged on Dec 18, 2019

Contributor

@jeffzi jeffzi commented Dec 5, 2019

Fix multi-output features not created when there is no child data

When there is no child data, calculate_feature_matrix raises a KeyError because multi-output features are not created. The expected behaviour is for those features to be represented by columns filled with numpy.nan, as is the case with regular features.

Here is a minimal reproducible example:

import numpy as np
import pandas as pd

import featuretools as ft
from featuretools.primitives import NMostCommon

parent_df = pd.DataFrame({"id": [1]})
child_df = pd.DataFrame({"id": [1, 2, 3],
                         "parent_id": [1, 1, 1],
                         "time_index": pd.date_range(start='1/1/2018', periods=3),
                         "cat": ['a', 'a', 'b']})

es = ft.EntitySet(id="blah")
es.entity_from_dataframe(entity_id="parent", dataframe=parent_df, index="id")
es.entity_from_dataframe(entity_id="child", dataframe=child_df, index="id", time_index="time_index")
es.add_relationship(ft.Relationship(es["parent"]["id"], es["child"]["parent_id"]))

n_most_common = ft.Feature(es["child"]['cat'], parent_entity=es["parent"], primitive=NMostCommon)

# cutoff time before all rows
# We expect N_MOST_COMMON features to be np.nan
ft.calculate_feature_matrix(entityset=es, 
                            features=[n_most_common],
                            cutoff_time=pd.Timestamp("12/31/2017"))
#> [ ... ]
#> KeyError: "None of [Index(['N_MOST_COMMON(child.cat)[0]', 'N_MOST_COMMON(child.cat)[1]',\n       'N_MOST_COMMON(child.cat)[2]'],\n      dtype='object')] are in the [columns]"

Created on 2019-12-05 by the reprexpy package

This PR fixes the bug and adds a test for multi-output features in test_empty_child_dataframe.
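For readers who want to see the failure mode in isolation, here is a minimal pandas-only sketch. It is not the featuretools patch: the column names are copied from the traceback above, the empty `result` frame is a made-up stand-in for an intermediate result with no child rows, and `reindex` is just one way to get NaN-filled columns, assumed here for illustration.

```python
import pandas as pd

# Column names a 3-output feature would produce (taken from the traceback above).
feature_names = [f"N_MOST_COMMON(child.cat)[{i}]" for i in range(3)]

# Hypothetical intermediate result: one parent row, but no feature columns,
# as when no child rows fall before the cutoff time.
result = pd.DataFrame(index=pd.Index([1], name="id"))

# Selecting columns that were never created raises KeyError.
try:
    result[feature_names]
    raised = False
except KeyError:
    raised = True

# reindex adds the missing columns and fills them with NaN instead of raising.
filled = result.reindex(columns=feature_names)

print(raised)
print(filled.isna().all().all())
```

The second half mirrors the behaviour the description asks for: every output of the multi-output feature gets its own column, and each is NaN for the parent row.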

@@ -6,14 +6,15 @@ Changelog
* Enhancements
* Fixes
* Raise error when given wrong input for ignore_variables (:pr:`826`)
* Fix multi-ouput features not created when there is no child data (:pr:`#834`)
Collaborator

the '#' character should be removed

Contributor Author

My bad, I fixed it.

@jeffzi jeffzi changed the title from "Fix multi-ouput features not created when there is no child data" to "Fix multi-output features not created when there is no child data" on Dec 6, 2019
Collaborator

rwedge commented Dec 9, 2019

I think the PR looks good. Once the PR fixing the issue with sklearn and the tests goes through, I think this will be good to go.

Codecov Report

Merging #834 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   98.15%   98.16%   +<.01%     
==========================================
  Files         117      117              
  Lines       10848    10851       +3     
==========================================
+ Hits        10648    10652       +4     
+ Misses        200      199       -1

| Impacted Files | Coverage Δ |
| --- | --- |
| ...mputational_backend/test_feature_set_calculator.py | 100% <100%> (ø) ⬆️ |
| ...s/computational_backends/feature_set_calculator.py | 98.55% <100%> (ø) ⬆️ |
| ...computational_backends/calculate_feature_matrix.py | 98.58% <0%> (+0.35%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cb82feb...849778a.
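As a sanity check on the figures in the report, the percentages can be recomputed from the hit and line counts. This is only a sketch: the helper `coverage_pct` is hypothetical, and truncating to two decimals (rather than rounding) is an assumption that happens to reproduce the numbers shown, not a documented Codecov rule.

```python
import math

def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100

# Hits and lines for master and for #834, copied from the coverage diff.
before = coverage_pct(10648, 10848)
after = coverage_pct(10652, 10851)

# Untruncated change in percentage points.
exact_delta = (10652 / 10851 - 10648 / 10848) * 100

print(before, after, exact_delta < 0.01)
```

Under that assumption the values come out as 98.15 and 98.16, and the untruncated change is below 0.01 percentage points, which is consistent with the "+<.01%" shown in the report.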

rwedge approved these changes Dec 18, 2019
@rwedge rwedge merged commit a48dbae into alteryx:master Dec 18, 2019
@rwedge rwedge mentioned this pull request Dec 28, 2019