Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user to pass in desired feature return types #372

Merged
merged 19 commits into from
Feb 6, 2019

Conversation

RogerTangos
Copy link
Contributor

@RogerTangos RogerTangos commented Jan 15, 2019

  • dfs() can be passed a list of allowed_variable_types, None or 'all'. This filters returned features based on their output types.
  • None defaults to [Numeric, Discrete, Boolean]
  • Docstrings updated accordingly
  • Completes work begun (but unexposed in the api) in ff70cbe, but
  • Tests changes

@codecov
Copy link

codecov bot commented Jan 15, 2019

Codecov Report

Merging #372 into master will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   95.99%   96.09%   +0.09%     
==========================================
  Files          92       92              
  Lines        7996     8021      +25     
==========================================
+ Hits         7676     7708      +32     
+ Misses        320      313       -7
Impacted Files Coverage Δ
featuretools/synthesis/deep_feature_synthesis.py 96.57% <100%> (+2%) ⬆️
...ols/tests/dfs_tests/test_deep_feature_synthesis.py 98.49% <100%> (+0.12%) ⬆️
featuretools/synthesis/dfs.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de2ff0e...c10c4b3. Read the comment docs.

@RogerTangos
Copy link
Contributor Author

There were some initial strange linting errors here, which are now fixed. (strange, because the lines in question were from the master branch.)

Now this seems to be failing on pandas 0.24.0. Once #383 fixes that, I'll merge master back in.

@RogerTangos
Copy link
Contributor Author

I fixed a bunch of indentation errors that were being picked up by the linter update, and now codecov/patch is knocking the coverage score for it. Many of the fixes it's penalizing were on commented lines.

Would you like me to fix these, or can we go ahead and merge? On the original PR, the coverage was unchanged.

@kmax12
Copy link
Contributor

kmax12 commented Jan 30, 2019

@RogerTangos thanks for investigating the linting issues and resolving. we still need to review this before it's ready to merge. we're currently preparing the next release of featuretools (likely to go out this week), but after that we will return to review this.

@RogerTangos RogerTangos changed the title Feature return types Allow user to pass in desired feature return types Feb 4, 2019
Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for working on this. It is looking good overall. The main thing missing is a test case so we can verify behavior works as intended.

featuretools/synthesis/deep_feature_synthesis.py Outdated Show resolved Hide resolved
variable_type=[Numeric,
Categorical,
Ordinal],
variable_type=allowed_variable_types,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually shouldn't be filtering any features based on type here since happens on line 216.

That means the correct change here would be to actually to update _features_by_type to not filter based on variable_type if variable_type == None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll update this.

@@ -266,7 +268,8 @@ def _filter_features(self, features):

return f_keep

def _run_dfs(self, entity, entity_path, all_features, max_depth):
def _run_dfs(self, entity, entity_path, all_features, max_depth,
allowed_variable_types=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we remove allowed_variable_types from _build_forward_features, we no longer need this parameter to _run_dfs

@@ -31,7 +31,8 @@ def dfs(entities=None,
chunk_size=None,
n_jobs=1,
dask_kwargs=None,
verbose=False):
verbose=False,
allowed_variable_types=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to return_variable_types per comment in deep_feature_synthesis.py

@@ -31,7 +31,8 @@ def dfs(entities=None,
chunk_size=None,
n_jobs=1,
dask_kwargs=None,
verbose=False):
verbose=False,
allowed_variable_types=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test cases for this change before we can merge

@RogerTangos
Copy link
Contributor Author

Thanks @kmax12. It's the end of the day here, so I'll get to these updates and the tests tomorrow. Thank you for the review.

@RogerTangos
Copy link
Contributor Author

@kmax12 - requested changes made.

@@ -606,7 +608,9 @@ def _build_agg_features(self, all_features,

self._handle_new_feature(new_f, all_features)

def _features_by_type(self, all_features, entity, variable_type, max_depth):
def _features_by_type(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to keep the style of our other code - let's keep this as one line or break up the arguments over two lines

@RogerTangos
Copy link
Contributor Author

Updates to everything but line 212. I did actually write some code for this, but IMO, it made the file harder to understand, not easier. If you have an opinion on how that should be done, feel free to copy past and I'll add it.

@CLAassistant
Copy link

CLAassistant commented Feb 5, 2019

CLA assistant check
All committers have signed the CLA.

@kmax12
Copy link
Contributor

kmax12 commented Feb 6, 2019

Looks good. Merging!

@kmax12 kmax12 merged commit d0bad62 into alteryx:master Feb 6, 2019
@RogerTangos RogerTangos deleted the feature-return-types branch February 6, 2019 08:43
kmax12 pushed a commit that referenced this pull request Feb 6, 2019
* Make S3 dependencies optional

* Allow user to pass in desired feature return types (#372)

* change method var variable_types to prevent overloading

* add and pass allowed_variable_types var to _run_dfs and _build_forward_features

* allowed_variable_types can be passed to dfs

* correct docstrings for allowed_variable_types in dfs and deep_feature_synthesis.build_features

* fix linting issues.

* fix linting errors

* fix another linting error

* fix F632 linter error

* add test. rename build_features return_variable_types arg.

* remove redundant return_variable_types attr from deep_feature_synthesis._run_dfs()

* add datetime tests. minor formatting change.

* remove bad line break

* Update deep_feature_synthesis.py

* Make S3 dependencies optional

* Move boto3 back to regular dependencies

* Move s3fs usage into NoCredentialsError

* Make S3 dependencies optional

* Move boto3 back to regular dependencies

* Move s3fs usage into NoCredentialsError
@rwedge rwedge mentioned this pull request Feb 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants