
Entropy aggregation primitive #779

Merged
merged 13 commits into alteryx:master on Oct 28, 2019

Conversation

twdobson
Contributor

Entropy aggregation primitive

For details on entropy please see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html
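
For reference, here is a minimal sketch of what an aggregation primitive wrapping scipy.stats.entropy could look like, assuming the featuretools primitive API of the time (AggregationPrimitive with input_types, return_type, and get_function); names and details are illustrative rather than the exact merged implementation:

from featuretools.primitives import AggregationPrimitive
from featuretools.variable_types import Categorical, Numeric
from scipy.stats import entropy

class Entropy(AggregationPrimitive):
    """Shannon entropy of a categorical column's value distribution."""
    name = "entropy"
    input_types = [Categorical]
    return_type = Numeric

    def __init__(self, base=None):
        # optional log base, passed through to scipy.stats.entropy
        self.base = base

    def get_function(self):
        def pd_entropy(column):
            # value_counts(normalize=True) gives the probability of each category
            distribution = column.value_counts(normalize=True)
            return entropy(distribution, base=self.base)
        return pd_entropy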

@twdobson
Contributor Author

twdobson commented Oct 18, 2019

Hi @rwedge, I had issues associating my GitHub account with the commits in my previous PR, so I redid the PR.

Previously, running 'make html' did not automatically generate documentation for the Entropy aggregation primitive. Is there an extra step I am missing? In addition, I am now getting the following error when I run 'make html':
"Exception occurred:
File "/usr/local/Caskroom/miniconda/base/envs/dev_featuretools/lib/python3.6/site-packages/IPython/sphinxext/ipython_directive.py", line 586, in process_input
raise RuntimeError('Non Expected warning in {} line {}'.format(filename, lineno))
RuntimeError: Non Expected warning in /usr/local/Caskroom/miniconda/base/envs/dev_featuretools/lib/python3.6/site-packages/featuretools/wrappers/sklearn.py:docstring of featuretools.wrappers.DFSTransformer.__init__ line 101"

@rwedge
Contributor

rwedge commented Oct 18, 2019

My mistake. You should add entropy to the Aggregation Primitives section of docs/source/api_reference.rst so that it gets included in the docs.

That docs error is strange. Does your environment warn you when running this code:

import featuretools as ft
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.ensemble import ExtraTreesClassifier

@twdobson
Contributor Author

There were some issues with the packages in my environment. I have a dedicated featuretools env now, so that should be sorted.

Apologies for the silly questions; I am still getting the hang of contributing and getting everything set up. I am getting the following error when trying to build the docs:
"
Configuration error:
There is a programmable error in your configuration file:

Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/featuretools_development/lib/python3.7/site-packages/sphinx/config.py", line 361, in eval_config_file
execfile_(filename, namespace)
File "/usr/local/Caskroom/miniconda/base/envs/featuretools_development/lib/python3.7/site-packages/sphinx/util/pycompat.py", line 86, in execfile_
exec(code, _globals)
File "/Users/timothydobson/work/repos/featuretools_dev/docs/source/conf.py", line 87, in
version = featuretools.version
AttributeError: module 'featuretools' has no attribute 'version'
"

@twdobson
Contributor Author

I also fail the "featuretools/tests/cli_tests/test_cli.py" test locally. However, when I install featuretools into my env, this issue (and the configuration error) disappears, but then the documentation does not build properly. I think it's because everything is imported from the version installed in the env and not from the source code I'm working from.

@rwedge
Contributor

rwedge commented Oct 21, 2019

If you run

pip install -e /path/to/featuretools/repo

featuretools will be installed in "editable" mode, and changes you make to the source code will be reflected the next time you import featuretools.

I would try re-installing featuretools in the env in editable mode
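
A quick way to confirm which copy of featuretools is being imported (a minimal check):

import featuretools
# With an editable install this should point into the cloned repo,
# not into the env's site-packages directory.
print(featuretools.__file__)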

@twdobson
Contributor Author

@rwedge, thanks for the help - editable mode did the trick.

@twdobson
Contributor Author

Hey @kmax12, @rwedge, I think the PR is ready for review! Thanks for all the help; let me know if any other changes are needed.

@rwedge
Contributor

rwedge commented Oct 24, 2019

@twdobson the code looks good; I'm just thinking about what version range we would want to use for the scipy dependency.

@twdobson
Contributor Author

twdobson commented Oct 24, 2019

@rwedge, thanks.

Would the oldest version of scipy whose entropy function has the required parameters work?

Alternatively, we could implement the function via numpy; the formula is simple: Entropy = -sum(pk * log(pk), axis=0), where pk is the probability of each category, which is the output of value_counts(normalize=True).
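
A minimal sketch of that numpy-only alternative (entropy_from_counts is a hypothetical helper name, not something added in this PR):

import numpy as np
import pandas as pd

def entropy_from_counts(column, base=None):
    # pk: probability of each observed category
    pk = column.value_counts(normalize=True).values
    h = -np.sum(pk * np.log(pk))
    if base is not None:
        # change of base, mirroring scipy.stats.entropy's base argument
        h /= np.log(base)
    return h

# A fair two-value column has exactly 1 bit of entropy
print(entropy_from_counts(pd.Series(["a", "a", "b", "b"]), base=2))  # 1.0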

@rwedge
Contributor

rwedge commented Oct 25, 2019

@twdobson it looks like scipy 0.11.0 is when the base parameter was added to the entropy function, but our minimum scikit-learn version requires scipy 0.13.3 or higher, so that may be a more practical requirement.
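
For context, the base argument only changes which logarithm is used, for example:

from scipy.stats import entropy

# The default uses the natural log (nats); base=2 reports bits
print(entropy([0.5, 0.5]))          # ~0.6931
print(entropy([0.5, 0.5], base=2))  # 1.0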

@twdobson
Contributor Author

@rwedge, sounds good. I have updated the PR to reflect this.

Two quick questions:

  1. How did you go about figuring out which version to use? Did you use pipdeptree?
  2. Once new feature-generation enhancements are made, is there a process for testing uplift or feature importance across different response sets?

@twdobson
Contributor Author

Let me know if there is anything else that needs to be updated.

@rwedge
Contributor

rwedge left a comment

PR looks good

As to your questions:

  1. I installed featuretools into a clean environment and checked the pip output to see which package required scipy. I did try using pipdeptree after you mentioned it; it also came up with scipy 0.13.3 being necessary for sklearn 0.20.4.
  2. We don't test how new primitives impact model performance or feature importance.

@rwedge merged commit 35319fb into alteryx:master on Oct 28, 2019
@rwedge mentioned this pull request on Oct 31, 2019