Improve duplicate features check in DFS #538

Merged
CJStadler merged 6 commits into master from duplicate-features-check on May 10, 2019

@CJStadler
Contributor

The previous version becomes very slow as the number of features
increases. Because it looked up the count for every element, its run time
was quadratic. The implementation in this commit should be linear.
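
The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names and the sample feature names are hypothetical, and it assumes the features are compared by name.

```python
from collections import Counter


def duplicates_quadratic(names):
    # list.count scans the whole list, so calling it once per element
    # costs O(n^2) overall.
    return sorted({name for name in names if names.count(name) > 1})


def duplicates_linear(names):
    # Counter tallies every name in a single pass over the list: O(n).
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical feature names, for illustration only.
feature_names = ["MEAN(log.value)", "SUM(log.value)", "MEAN(log.value)"]
assert duplicates_quadratic(feature_names) == duplicates_linear(feature_names)
```

Both functions return the same duplicates; only the number of passes over the list differs.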

Is this check even necessary though?

CJStadler added 2 commits May 9, 2019 15:59
@codecov

codecov bot commented May 9, 2019

Codecov Report

Merging #538 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
- Coverage   96.26%   96.26%   -0.01%     
==========================================
  Files         114      114              
  Lines        9258     9256       -2     
==========================================
- Hits         8912     8910       -2     
  Misses        346      346

| Impacted Files | Coverage Δ |
|---|---|
| featuretools/synthesis/deep_feature_synthesis.py | 96.75% <100%> (-0.02%) ⬇️ |
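
The report shows 96.26% on both sides of the diff yet lists a decrease. A quick sketch recomputing coverage from the Hits and Lines rows above suggests the drop is simply smaller than the two displayed decimals. The helper name is hypothetical, and this assumes coverage is hits divided by lines.

```python
def coverage_percent(hits, lines):
    # Coverage as a percentage of covered lines (assumed definition).
    return 100.0 * hits / lines


master = coverage_percent(8912, 9258)  # figures from the "master" column
pr = coverage_percent(8910, 9256)      # figures from the "#538" column
delta = pr - master

print(f"master={master:.4f}% pr={pr:.4f}% delta={delta:.4f}")
```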

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8f8a94f...2cfe24e.

@rwedge
Contributor

rwedge commented May 9, 2019

Logic seems good.

> Is this check even necessary though?

@kmax12 thoughts? Does the hash check in DeepFeatureSynthesis._handle_new_feature make this check redundant?

@kmax12
Contributor

kmax12 commented May 10, 2019

@rwedge I believe you're right. It looks like _handle_new_feature won't add a duplicate feature, and all features get added by calling it, so this check is unnecessary.
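
The reasoning here can be sketched as follows: if every feature passes through one function that refuses to register a name twice, no duplicates can exist afterwards, so a later scan finds nothing. This is a hypothetical stand-in, not the actual featuretools code; the function signature and the dict-keyed-by-name layout are assumptions for illustration.

```python
def handle_new_feature(name, feature, all_features):
    """Hypothetical stand-in: register `feature` unless `name` is taken."""
    if name in all_features:  # dict membership is a hash lookup
        return False
    all_features[name] = feature
    return True


all_features = {}
# The second "a" is rejected at insertion time.
added = [handle_new_feature(n, object(), all_features) for n in ["a", "b", "a"]]

# Keys of a dict are unique by construction, so a duplicate check
# over them afterwards can never fire.
assert len(all_features) == len(set(all_features))
```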

rwedge previously approved these changes May 10, 2019
rwedge self-requested a review May 10, 2019 15:11
rwedge dismissed their stale review May 10, 2019 15:13

Jumped the gun

rwedge previously approved these changes May 10, 2019
Contributor

rwedge left a comment

Looks good

CJStadler added 2 commits May 10, 2019 12:25
For possible compatibility issues
CJStadler merged commit 429adb0 into master May 10, 2019
CJStadler deleted the duplicate-features-check branch May 10, 2019 16:45
rwedge mentioned this pull request May 17, 2019

3 participants