Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DeepFeatureSynthesis to support multiple paths #600

Merged
merged 18 commits into from Jun 21, 2019

Conversation

CJStadler
Copy link
Contributor

For entitysets with multiple paths between two entities DFS will now
create features through all paths. Entitysets with cycles in the
directed graph are not supported.

  • Use feature.unique_name instead of hash in dicts
  • Change EntitySet.get_backward_entities to yield paths.

Once #572 is merged the base branch should be changed to master.

@CJStadler CJStadler changed the base branch from feature-trie to master June 18, 2019 12:52
For entitysets with multiple paths between two entities DFS will now
create features through all paths. Entitysets with cycles in the
directed graph are not supported.

- Use feature.unique_name instead of hash in dicts
- Change EntitySet.get_backward_entities to yield paths.
@codecov

This comment has been minimized.

@codecov
Copy link

codecov bot commented Jun 18, 2019

Codecov Report

Merging #600 into master will decrease coverage by 0.15%.
The diff coverage is 98.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #600      +/-   ##
==========================================
- Coverage   97.27%   97.11%   -0.16%     
==========================================
  Files         118      118              
  Lines        9589     9545      -44     
==========================================
- Hits         9328     9270      -58     
- Misses        261      275      +14
Impacted Files Coverage Δ
featuretools/tests/entityset_tests/test_es.py 100% <ø> (ø) ⬆️
...uretools/tests/entityset_tests/test_es_metadata.py 98.13% <100%> (-0.11%) ⬇️
featuretools/entityset/entityset.py 95.5% <100%> (+0.35%) ⬆️
featuretools/tests/testing_utils/__init__.py 100% <100%> (ø) ⬆️
featuretools/tests/testing_utils/features.py 100% <100%> (ø) ⬆️
...retools/tests/entityset_tests/test_relationship.py 100% <100%> (ø) ⬆️
...ols/tests/synthesis/test_deep_feature_synthesis.py 100% <100%> (ø) ⬆️
featuretools/entityset/relationship.py 98.68% <100%> (+0.19%) ⬆️
featuretools/synthesis/deep_feature_synthesis.py 96.06% <95.55%> (-0.69%) ⬇️
featuretools/__main__.py 0% <0%> (-50%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 177089f...4b49d5e. Read the comment docs.

featuretools/entityset/entityset.py Outdated Show resolved Hide resolved
featuretools/synthesis/deep_feature_synthesis.py Outdated Show resolved Hide resolved
featuretools/synthesis/deep_feature_synthesis.py Outdated Show resolved Hide resolved
Previously was always doing a deep search.
- Add RelationshipPath.entities
- Change EntitySet.get_forward_entities to yield entity_ids and paths.
@CJStadler CJStadler requested a review from kmax12 June 18, 2019 20:11
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Just one comment for now.

featuretools/entityset/entityset.py Outdated Show resolved Hide resolved
This was not exposed in the dfs method so it always used the default
value of 2, it was undocumented, and it was unclear if it provided any
value.

Also remove EntitySet.find_path as it is no longer used.
This condition can never be True because it were True then the
condition in _run_dfs would have been True and would have returned.
@CJStadler
Copy link
Contributor Author

Back to you @kmax12. I made once change in 3634849 that you might want to look at.

Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes, otherwise I think this is good to go

docs/source/changelog.rst Outdated Show resolved Hide resolved
@CJStadler CJStadler requested a review from kmax12 June 20, 2019 20:17
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CJStadler CJStadler merged commit 321a0f4 into master Jun 21, 2019
@CJStadler CJStadler deleted the multipath-dfs-no-cycles branch June 21, 2019 16:42
@rwedge rwedge mentioned this pull request Jul 3, 2019
johnnyheineken pushed a commit to johnnyheineken/featuretools that referenced this pull request Jul 7, 2019
For entitysets with multiple paths between two entities DFS will now
create features through all paths. Entitysets with cycles in the
directed graph are not supported.

- Use feature.unique_name instead of hash in dicts
- Change EntitySet.get_backward_entities  and get_forward_entities
  to yield paths.
- Add RelationshipPath which represents a directed path through
  relationships.
- Add RelationshipPath.entities
- Warn if dfs produces a feature multiple times
- Remove max_hlevel from DFS (alteryx#608).
- Remove EntitySet.find_path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants