Allow user to pass in desired feature return types #372
kmax12 merged 19 commits into alteryx:master from
…_synthesis.build_features
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   95.99%   96.09%   +0.09%
==========================================
  Files          92       92
  Lines        7996     8021      +25
==========================================
+ Hits         7676     7708      +32
+ Misses        320      313       -7
```
Continue to review full report at Codecov.
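The report's figures are internally consistent, which can be checked by hand. This assumes coverage is hits divided by tracked lines. Because 95.998% is shown as 95.99% and 96.098% as 96.09%, the report appears to truncate to two decimals rather than round, which would also explain +0.09% instead of +0.10%. That is an inference from the numbers, not something the report states.

```latex
\begin{aligned}
\text{lines} &= \text{hits} + \text{misses}: \quad 7676 + 320 = 7996, \quad 7708 + 313 = 8021 \\
\text{coverage} &= \frac{\text{hits}}{\text{lines}}: \quad \frac{7676}{7996} \approx 95.9980\%, \quad \frac{7708}{8021} \approx 96.0977\% \\
\Delta &\approx 96.0977\% - 95.9980\% = 0.0997 \text{ percentage points}
\end{aligned}
```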
There were some strange linting errors here initially, which are now fixed (strange because the lines in question came from the master branch). Now this seems to be failing on pandas 0.24.0. Once #383 fixes that, I'll merge master back in.
I fixed a bunch of indentation errors that the updated linter was flagging, and now codecov/patch is lowering the coverage score because of them. Many of the fixes it's penalizing are on commented lines. Would you like me to fix these, or can we go ahead and merge? On the original PR, the coverage was unchanged.
@RogerTangos Thanks for investigating the linting issues and resolving them. We still need to review this before it's ready to merge. We're currently preparing the next release of featuretools (likely to go out this week), but after that we will return to review this.
kmax12 left a comment
Thanks for working on this. It is looking good overall. The main thing missing is a test case that verifies the behavior works as intended.
```diff
-    variable_type=[Numeric,
-                   Categorical,
-                   Ordinal],
+    variable_type=allowed_variable_types,
```
I think we actually shouldn't be filtering any features based on type here, since that happens on line 216.
That means the correct change here would actually be to update `_features_by_type` to not filter on `variable_type` when `variable_type` is `None`.
Good point. I'll update this.
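A minimal sketch of the behavior suggested above: when no type is requested, the helper returns every feature unfiltered, leaving return-type filtering to a single place elsewhere. The classes are simplified stand-ins rather than featuretools' real variable types, and `features_by_type` is a hypothetical helper. It omits the `entity` and `max_depth` arguments of the real `_features_by_type` and is not its actual implementation.

```python
# Hypothetical stand-ins for featuretools variable types; not the real classes.
class Variable: ...
class Numeric(Variable): ...
class Discrete(Variable): ...
class Categorical(Discrete): ...
class Boolean(Variable): ...


class Feature:
    """Stand-in feature: a name plus the variable type it returns."""

    def __init__(self, name, variable_type):
        self.name = name
        self.variable_type = variable_type


def features_by_type(all_features, variable_type=None):
    """Select features whose output type matches ``variable_type``.

    If ``variable_type`` is None, no type filtering happens here; return
    types are assumed to be filtered in one place elsewhere.
    """
    if variable_type is None:
        return list(all_features)
    # Accept either a single type or a list/tuple of types.
    if not isinstance(variable_type, (list, tuple)):
        variable_type = [variable_type]
    allowed = tuple(variable_type)
    # issubclass also matches subtypes, e.g. Categorical for Discrete.
    return [f for f in all_features if issubclass(f.variable_type, allowed)]


feats = [Feature("a", Numeric), Feature("b", Categorical), Feature("c", Boolean)]
print([f.name for f in features_by_type(feats)])            # ['a', 'b', 'c']
print([f.name for f in features_by_type(feats, Discrete)])  # ['b']
```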
```diff
-    def _run_dfs(self, entity, entity_path, all_features, max_depth):
+    def _run_dfs(self, entity, entity_path, all_features, max_depth,
+                 allowed_variable_types=None):
```
Once we remove `allowed_variable_types` from `_build_forward_features`, we no longer need this parameter in `_run_dfs`.
```diff
         dask_kwargs=None,
-        verbose=False):
+        verbose=False,
+        allowed_variable_types=None):
```
Rename to `return_variable_types` per the comment in `deep_feature_synthesis.py`.
```diff
         dask_kwargs=None,
-        verbose=False):
+        verbose=False,
+        allowed_variable_types=None):
```
Please add test cases for this change before we can merge.
Thanks @kmax12. It's the end of the day here, so I'll get to these updates and the tests tomorrow. Thank you for the review.
@kmax12 Requested changes made.
```diff
             self._handle_new_feature(new_f, all_features)
 
-    def _features_by_type(self, all_features, entity, variable_type, max_depth):
+    def _features_by_type(
```
To stay consistent with the style of our other code, let's keep this on one line or break the arguments up over two lines.
I've made updates for everything except line 212. I did write some code for that one, but IMO it made the file harder to understand, not easier. If you have an opinion on how it should be done, feel free to paste a suggestion here and I'll add it.
Looks good. Merging!
* Make S3 dependencies optional
* Allow user to pass in desired feature return types (#372)
* change method var variable_types to prevent overloading
* add and pass allowed_variable_types var to _run_dfs and _build_forward_features
* allowed_variable_types can be passed to dfs
* correct docstrings for allowed_variable_types in dfs and deep_feature_synthesis.build_features
* fix linting issues.
* fix linting errors
* fix another linting error
* fix F632 linter error
* add test. rename build_features return_variable_types arg.
* remove redundant return_variable_types attr from deep_feature_synthesis._run_dfs()
* add datetime tests. minor formatting change.
* remove bad line break
* Update deep_feature_synthesis.py
* Make S3 dependencies optional
* Move boto3 back to regular dependencies
* Move s3fs usage into NoCredentialsError
* Make S3 dependencies optional
* Move boto3 back to regular dependencies
* Move s3fs usage into NoCredentialsError
`dfs()` can be passed a list of `allowed_variable_types`, `None`, or `'all'`. This filters the returned features based on their output types. `None` defaults to `[Numeric, Discrete, Boolean]`.
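A minimal sketch of the argument handling this description implies. The review above asked for the argument to be renamed `return_variable_types`; the sketch keeps the name used in this description. The classes are simplified stand-ins, not featuretools' `variable_types` module, and `resolve_return_types` and `filter_by_return_type` are hypothetical helpers rather than featuretools functions. Matching subtypes via `issubclass` is an assumption about how "output type" is compared.

```python
# Hypothetical stand-ins for featuretools variable types; not the real classes.
class Variable: ...
class Numeric(Variable): ...
class Discrete(Variable): ...
class Categorical(Discrete): ...
class Ordinal(Discrete): ...
class Boolean(Variable): ...
class Datetime(Variable): ...

# Default applied when the caller passes None, per the description above.
DEFAULT_RETURN_TYPES = (Numeric, Discrete, Boolean)


def resolve_return_types(allowed_variable_types=None):
    """Normalize the argument to a tuple of types, or None meaning 'no filter'."""
    if allowed_variable_types is None:
        return DEFAULT_RETURN_TYPES
    if allowed_variable_types == "all":
        return None
    return tuple(allowed_variable_types)


def filter_by_return_type(output_types, allowed_variable_types=None):
    """Keep output types that are (subclasses of) an allowed type."""
    allowed = resolve_return_types(allowed_variable_types)
    if allowed is None:
        return list(output_types)
    return [t for t in output_types if issubclass(t, allowed)]


types = [Numeric, Categorical, Ordinal, Boolean, Datetime]
print([t.__name__ for t in filter_by_return_type(types)])
# ['Numeric', 'Categorical', 'Ordinal', 'Boolean']
print([t.__name__ for t in filter_by_return_type(types, [Datetime])])
# ['Datetime']
```

With the default, a `Datetime` output is dropped, while `Categorical` and `Ordinal` survive because they subclass `Discrete` in this sketch. Passing `'all'` returns every output type unchanged.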