Fix multi-output features not created when there is no child data #834

Merged
merged 6 commits into from Dec 18, 2019

Contributor

jeffzi commented Dec 5, 2019

Fix multi-output features not created when there is no child data

When there is no child data, calculate_feature_matrix raises a KeyError because multi-output features are not created. The expected behaviour is for those features to be represented by columns filled with numpy.nan, as is the case with regular features.

Here is a minimal reproducible example:

import numpy as np
import pandas as pd

import featuretools as ft
from featuretools.primitives import NMostCommon

parent_df = pd.DataFrame({"id": [1]})
child_df = pd.DataFrame({"id": [1, 2, 3],
                         "parent_id": [1, 1, 1],
                         "time_index": pd.date_range(start='1/1/2018', periods=3),
                         "cat": ['a', 'a', 'b']})

es = ft.EntitySet(id="blah")
es.entity_from_dataframe(entity_id="parent", dataframe=parent_df, index="id")
es.entity_from_dataframe(entity_id="child", dataframe=child_df, index="id", time_index="time_index")
es.add_relationship(ft.Relationship(es["parent"]["id"], es["child"]["parent_id"]))

n_most_common = ft.Feature(es["child"]['cat'], parent_entity=es["parent"], primitive=NMostCommon)

# cutoff time before all rows
# We expect N_MOST_COMMON features to be np.nan
ft.calculate_feature_matrix(entityset=es, 
                            features=[n_most_common],
                            cutoff_time=pd.Timestamp("12/31/2017"))
#> [ ... ]
#> KeyError: "None of [Index(['N_MOST_COMMON(child.cat)[0]', 'N_MOST_COMMON(child.cat)[1]',\n       'N_MOST_COMMON(child.cat)[2]'],\n      dtype='object')] are in the [columns]"

Created on 2019-12-05 by the reprexpy package

This PR fixes the bug and adds a test for multi-output features in test_empty_child_dataframe.
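The expected behaviour described above (one NaN-filled column per output of a multi-output feature) can be sketched with plain pandas. This is only an illustration of the idea, not the actual change made to featuretools; the helper name `add_missing_feature_columns` is hypothetical, and the column names are copied from the traceback in the example.

```python
import numpy as np
import pandas as pd


def add_missing_feature_columns(frame, feature_names):
    """Return a copy of `frame` with a NaN-filled column for each missing name.

    Hypothetical helper for illustration only; not part of featuretools.
    """
    result = frame.copy()
    for name in feature_names:
        if name not in result.columns:
            result[name] = np.nan
    return result


# A parent frame with no computed feature columns, standing in for the case
# where the child entity has no rows before the cutoff time.
parent = pd.DataFrame({"id": [1]}).set_index("id")
names = ["N_MOST_COMMON(child.cat)[%d]" % i for i in range(3)]

# Selecting columns that do not exist raises KeyError, as in the report.
try:
    parent[names]
    raised = False
except KeyError:
    raised = True

# After filling, the same selection succeeds and every value is NaN.
filled = add_missing_feature_columns(parent, names)
feature_matrix = filled[names]
```

With the columns created up front, selecting the three `N_MOST_COMMON` outputs no longer fails, and the parent row simply carries missing values for them.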

@@ -6,14 +6,15 @@ Changelog
* Enhancements
* Fixes
* Raise error when given wrong input for ignore_variables (:pr:`826`)
* Fix multi-ouput features not created when there is no child data (:pr:`#834`)

@rwedge

rwedge Dec 5, 2019

Collaborator

the '#' character should be removed

@jeffzi

jeffzi Dec 6, 2019

Author Contributor

My bad, I fixed it.

@jeffzi jeffzi changed the title from "Fix multi-ouput features not created when there is no child data" to "Fix multi-output features not created when there is no child data" Dec 6, 2019
@rwedge

Collaborator

rwedge commented Dec 9, 2019

I think the PR looks good. Once the PR fixing the issue with sklearn and the tests goes through, this should be good to go.

@codecov-io


codecov-io commented Dec 18, 2019

Codecov Report

Merging #834 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   98.15%   98.16%   +<.01%     
==========================================
  Files         117      117              
  Lines       10848    10851       +3     
==========================================
+ Hits        10648    10652       +4     
+ Misses        200      199       -1
Impacted Files Coverage Δ
...mputational_backend/test_feature_set_calculator.py 100% <100%> (ø) ⬆️
...s/computational_backends/feature_set_calculator.py 98.55% <100%> (ø) ⬆️
...computational_backends/calculate_feature_matrix.py 98.58% <0%> (+0.35%) ⬆️
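
The percentages in the report can be checked from the hit and line counts in the coverage diff. The sketch below assumes coverage is simply hits divided by lines, and that the displayed figure is truncated rather than rounded to two decimals; both are assumptions made here, not something the report states.

```python
import math

# Hit and line counts taken from the coverage diff in the report.
master_hits, master_lines = 10648, 10848
pr_hits, pr_lines = 10652, 10851


def coverage_percent(hits, lines):
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


def truncate_2dp(value):
    """Truncate (not round) to two decimals; an assumption about the display."""
    return math.floor(value * 100) / 100


master_cov = coverage_percent(master_hits, master_lines)
pr_cov = coverage_percent(pr_hits, pr_lines)
delta = pr_cov - master_cov

print(truncate_2dp(master_cov))  # 98.15
print(truncate_2dp(pr_cov))      # 98.16
print(0 < delta < 0.01)          # True
```

Under those assumptions the numbers agree with the report: 3 added lines, 4 added hits, and one fewer miss move coverage up by a little under 0.01 percentage points.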

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cb82feb...849778a.

@rwedge
rwedge approved these changes Dec 18, 2019
@rwedge rwedge merged commit a48dbae into FeatureLabs:master Dec 18, 2019
4 checks passed
codecov/patch 100% of diff hit (target 98.15%)
codecov/project 98.16% (+<.01%) compared to cb82feb
license/cla Contributor License Agreement is signed.
test_all_python_versions Workflow: test_all_python_versions
@rwedge rwedge mentioned this pull request Dec 28, 2019
3 participants