Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure relationship variables are the same underlying dtype #155

Merged
merged 5 commits into from
May 24, 2018

Conversation

bschreck
Copy link
Contributor

If a parent entity has an index with integer data, and the linking variable in the child has object/string data, this will convert the child variable to integers. This is necessary for the new version of pandas (0.23), because merges on mixed dtypes are not allowed.

@bschreck bschreck requested a review from rwedge May 24, 2018 20:03
@codecov-io
Copy link

codecov-io commented May 24, 2018

Codecov Report

Merging #155 into master will increase coverage by 0.03%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
+ Coverage   92.59%   92.62%   +0.03%     
==========================================
  Files          72       72              
  Lines        7722     7729       +7     
==========================================
+ Hits         7150     7159       +9     
+ Misses        572      570       -2
Impacted Files Coverage Δ
featuretools/tests/testing_utils/mock_ds.py 87.21% <100%> (+0.19%) ⬆️
featuretools/entityset/entity.py 90.8% <100%> (ø) ⬆️
featuretools/tests/entityset_tests/test_es.py 98.9% <100%> (+3.24%) ⬆️
featuretools/entityset/base_entityset.py 90.76% <87.5%> (-0.26%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee28119...2ab6766. Read the comment docs.

if ((is_object_dtype(parent_e.df[parent_v]) or
is_string_dtype(parent_e.df[parent_v])) and
is_numeric_dtype(child_e.df[child_v])):
parent_e.df[parent_v] = pd.to_numeric(self[parent_e.id].df[parent_v])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self[parent_e.id] here can be changed to parent_e.df

target_entity="sessions",
agg_primitives=["mean", "sum", "mode"],
trans_primitives=["month", "hour"],
max_depth=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little weird that this test has no assertions. Maybe assert that the normalize successfully converted both parent and child values to matching types? Or maybe the test should go in test_es:test_add_relationship_converts_types

@bschreck bschreck merged commit b75bc92 into master May 24, 2018
@rwedge rwedge mentioned this pull request May 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants