Improve duplicate features check in DFS #538

Merged
merged 6 commits into from May 10, 2019

CJStadler
Contributor

The previous version becomes very slow as the number of features
increases. Because it looked up the count for every element, its run time
was quadratic. The implementation in this commit should be linear.
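
To make the complexity difference concrete, here is a minimal sketch of the two approaches. It is not the code from this PR: the function names and the sample feature names are hypothetical, and it only assumes that the check works on a list of feature name strings.

```python
from collections import Counter


def find_duplicates_quadratic(names):
    """Hypothetical 'before' version: list.count rescans the whole list
    for every element, so the total work grows as O(n^2)."""
    return sorted({name for name in names if names.count(name) > 1})


def find_duplicates_linear(names):
    """Hypothetical 'after' version: one pass builds the counts, a second
    pass over the distinct names picks out the duplicates, so O(n)."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up feature names, for illustration only.
names = ["MEAN(a.x)", "SUM(a.x)", "MEAN(a.x)", "COUNT(a)"]
print(find_duplicates_quadratic(names))
print(find_duplicates_linear(names))
```

Both functions return the same result; only the amount of work differs as the list grows.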

Is this check even necessary though?

@codecov

codecov bot commented May 9, 2019

Codecov Report

Merging #538 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
- Coverage   96.26%   96.26%   -0.01%     
==========================================
  Files         114      114              
  Lines        9258     9256       -2     
==========================================
- Hits         8912     8910       -2     
  Misses        346      346

Impacted Files | Coverage Δ
featuretools/synthesis/deep_feature_synthesis.py | 96.75% <100%> (-0.02%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8f8a94f...2cfe24e.
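
As a quick check on why coverage dips even though the diff coverage is 100%, the arithmetic can be reproduced from the hits and lines figures in the report. This is only a sketch of the ratio calculation, not Codecov's own implementation, and the variable names are made up.

```python
# Figures copied from the Codecov coverage diff in this report.
hits_before, lines_before = 8912, 9258
hits_after, lines_after = 8910, 9256

# Coverage is hits divided by total lines, expressed as a percentage.
coverage_before = 100 * hits_before / lines_before
coverage_after = 100 * hits_after / lines_after
delta = coverage_after - coverage_before

print(f"before: {coverage_before:.4f}%")
print(f"after:  {coverage_after:.4f}%")
print(f"delta:  {delta:+.4f} percentage points")
```

Removing two covered lines leaves the same 346 misses spread over slightly fewer lines, so the ratio drops by a tiny amount while both values still round to 96.26%.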

@rwedge
Contributor

rwedge commented May 9, 2019

Logic seems good.

> Is this check even necessary though?

@kmax12 thoughts? Does the hash check in DeepFeatureSynthesis._handle_new_feature make this check redundant?

@kmax12
Contributor

kmax12 commented May 10, 2019

@rwedge I believe you're right. It looks like _handle_new_feature won't add a duplicate feature, and all features get added by calling it, so this check is unnecessary.
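
The reasoning here can be sketched as follows. This is a simplified illustration, not the actual featuretools code: `handle_new_feature`, the dict layout, and the sample names are hypothetical stand-ins, and it assumes only that every feature is added through a single function that skips keys it has already seen.

```python
def handle_new_feature(all_features, name):
    """Hypothetical stand-in for a single insertion point.

    Returns True if the feature was added, False if it was already present.
    """
    if name in all_features:  # dict membership is a hash lookup
        return False
    all_features[name] = name  # a real implementation would store the feature object
    return True


all_features = {}
# Made-up feature names, with one deliberate repeat.
for name in ["MEAN(a.x)", "SUM(a.x)", "MEAN(a.x)"]:
    handle_new_feature(all_features, name)

# Because insertion already rejects repeats, the collection cannot hold
# duplicates, so a separate duplicate check afterwards has nothing to find.
collected = list(all_features)
print(collected)
```

If every feature really does go through one such function, uniqueness is enforced at insertion time and a later scan for duplicates is redundant.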

rwedge previously approved these changes May 10, 2019
rwedge self-requested a review May 10, 2019 15:11
rwedge dismissed their stale review May 10, 2019 15:13

Jumped the gun

rwedge previously approved these changes May 10, 2019
Contributor

rwedge left a comment

Looks good

For possible compatibility issues
CJStadler merged commit 429adb0 into master May 10, 2019
CJStadler deleted the duplicate-features-check branch May 10, 2019 16:45
rwedge mentioned this pull request May 17, 2019