Improve performance of all feature calculations #224

Merged 36 commits on Aug 24, 2018

@kmax12 kmax12 commented Aug 21, 2018

Feature calculation makes several calls to Entity.query_by_values. This pull request optimizes these calls using a pandas merge instead of isin.

This is still a work in progress as we benchmark it across different datasets.
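
To illustrate the kind of change described above, here is a minimal sketch of the same row filter written two ways: once with `Series.isin` and once as an inner `merge` against the lookup values. This is not the featuretools implementation of `Entity.query_by_values`; the function names, column names, and data are hypothetical and chosen only for illustration.

```python
import pandas as pd


def filter_with_isin(df, column, values):
    """Baseline: keep rows whose `column` value appears in `values`."""
    return df[df[column].isin(values)]


def filter_with_merge(df, column, values):
    """Same filter expressed as an inner join (hypothetical helper)."""
    # De-duplicate the lookup values so the join cannot multiply rows.
    lookup = pd.Series(values, name=column).drop_duplicates().to_frame()
    # An inner merge keeps only rows of `df` whose key is in `lookup`.
    # Note: merge returns a fresh 0..n-1 index, unlike the isin version.
    return df.merge(lookup, on=column, how="inner")


# Made-up data for illustration only.
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 4],
    "amount": [10.0, 20.0, 5.0, 7.5, 12.0, 3.0],
})
wanted = [1, 3, 3]  # duplicates on purpose

by_isin = filter_with_isin(df, "customer_id", wanted)
by_merge = filter_with_merge(df, "customer_id", wanted)

# Both approaches select the same rows once the index is normalized.
print(by_isin.reset_index(drop=True).equals(by_merge))
```

The two versions differ in the index they return, so any caller that relies on the original index would need to carry it through the merge. Which version is faster depends on the data, so it is worth benchmarking on representative datasets before switching.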

@kmax12 kmax12 changed the base branch from master to agg-functions August 21, 2018 19:03
@kmax12 kmax12 changed the title from "WIP Improve performance of all feature calculations" to "(WIP) Improve performance of all feature calculations" Aug 22, 2018
codecov-io commented Aug 22, 2018

Codecov Report

Merging #224 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #224      +/-   ##
==========================================
- Coverage   93.64%   93.62%   -0.03%     
==========================================
  Files          71       71              
  Lines        7679     7654      -25     
==========================================
- Hits         7191     7166      -25     
  Misses        488      488
Impacted Files                                           Coverage Δ
featuretools/tests/entityset_tests/test_es.py            99.34% <ø> (-0.01%) ⬇️
...utational_backend/test_calculate_feature_matrix.py    99.27% <ø> (-0.02%) ⬇️
featuretools/entityset/entityset.py                      93.59% <ø> (-0.16%) ⬇️
featuretools/entityset/entity.py                         89.71% <100%> (+0.17%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8462046...383ef17.

@kmax12 kmax12 changed the title from "(WIP) Improve performance of all feature calculations" to "Improve performance of all feature calculations" Aug 23, 2018
@kmax12 kmax12 changed the base branch from agg-functions to master August 24, 2018 17:08
@kmax12 kmax12 merged commit 676b7cc into master Aug 24, 2018
@rwedge rwedge mentioned this pull request Aug 28, 2018
@kmax12 kmax12 deleted the isin-to-merge branch October 2, 2018 21:41