Fix issue with missing instances and categorical entity index #1050

Merged: 11 commits, Jul 15, 2020

thehomebrewnerd
Contributor

Fixes #1046

Changed order of operations in merging default dataframe into feature matrix to allow combination of categorical and integer indexes. Added test case to cover this situation.

thehomebrewnerd requested review from rwedge and gsheni July 6, 2020 13:30
codecov bot commented Jul 6, 2020

Codecov Report

Merging #1050 into main will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1050   +/-   ##
=======================================
  Coverage   98.35%   98.35%           
=======================================
  Files         126      126           
  Lines       13082    13082           
=======================================
  Hits        12867    12867           
  Misses        215      215           

Impacted Files                                          Coverage Δ
...computational_backends/calculate_feature_matrix.py   99.07% <100.00%> (+<0.01%) ⬆️
...s/computational_backends/feature_set_calculator.py   98.68% <100.00%> (+<0.01%) ⬆️
...utational_backend/test_calculate_feature_matrix.py   98.20% <100.00%> (+0.01%) ⬆️
...mputational_backend/test_feature_set_calculator.py   97.93% <100.00%> (-0.03%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6c00c83...0479b93.

@thehomebrewnerd
Contributor Author

@rwedge Added checks to make sure the feature matrix has a categorical index whenever the target entity has one. If the user supplies an instance id that is not in the target entity, the index categories in the feature matrix will differ from those of the entity (to account for the added missing ids), but the categorical index dtype is preserved.
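
For readers following along, here is a minimal pandas-only sketch of the behavior described above. It is not the featuretools implementation: the feature name `COUNT(logs)`, the ids, and the variable names are made up for illustration, and the approach shown (combine on a plain index, then convert back to categorical) is just one way to get this result.

```python
import pandas as pd

# Hypothetical target entity whose index is categorical.
entity_index = pd.CategoricalIndex(["a", "b", "c"], name="id")

# Feature values computed for ids that exist in the entity.
# "COUNT(logs)" is an illustrative feature name, not taken from the PR.
computed = pd.DataFrame({"COUNT(logs)": [3, 1, 2]}, index=entity_index)

# The caller asks for "z", which is not in the target entity.
requested_ids = ["a", "z", "c"]
missing_ids = [i for i in requested_ids if i not in entity_index]

# Default rows for the missing ids, built on a plain (object) index.
defaults = pd.DataFrame(
    {"COUNT(logs)": [0] * len(missing_ids)},
    index=pd.Index(missing_ids, name="id"),
)

# Combine on a plain index first, so the unknown id cannot clash with
# the existing categories, then restore the categorical dtype.
computed_plain = computed.copy()
computed_plain.index = pd.Index(list(computed.index), name="id")
combined = pd.concat([computed_plain, defaults]).loc[requested_ids]
combined.index = pd.CategoricalIndex(combined.index, name="id")

print(combined.index.dtype)             # categorical dtype is kept
print(list(combined.index.categories))  # categories now include "z"
```

The resulting index is still categorical, but its categories are no longer the same as the entity's, which matches the behavior the new checks are meant to allow.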

This change required updating a few tests that failed because they used an indexing approach that is invalid for a series with a categorical index. For example, df[col][0] returns the first value in the series if the index is not categorical, but results in an error if the index is categorical. Changed these to df[col].values[0], which works for both.
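
A small sketch of the positional access pattern, using made-up series rather than the actual test data. What `series[0]` does on a non-integer index varies across pandas versions, so the sketch deliberately avoids calling it and shows only the forms that do not depend on the index type.

```python
import pandas as pd

# Same values, one with the default integer index and one with a
# categorical index (both series are illustrative only).
plain = pd.Series([10, 20, 30])
categorical = pd.Series(
    [10, 20, 30], index=pd.CategoricalIndex(["a", "b", "c"])
)

# .values[0] reads the first element of the underlying array, so it
# ignores the index entirely and works for both series.
first_plain = plain.values[0]
first_categorical = categorical.values[0]

# .iloc[0] is the explicit positional accessor and is also
# independent of the index type.
first_by_iloc = categorical.iloc[0]

print(first_plain, first_categorical, first_by_iloc)
```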

Contributor

rwedge left a comment

Looks good

thehomebrewnerd merged commit 8c1a9ab into main Jul 15, 2020
thehomebrewnerd deleted the issue1046 branch July 15, 2020 21:32
rwedge mentioned this pull request Jul 31, 2020
Successfully merging this pull request may close these issues.

Invalid instance ID causes TypeError (non-category item)