Skip to content

Entropy aggregation primitive#779

Merged
rwedge merged 13 commits into
alteryx:masterfrom
twdobson:master
Oct 28, 2019
Merged

Entropy aggregation primitive#779
rwedge merged 13 commits into
alteryx:masterfrom
twdobson:master

Conversation

@twdobson

Copy link
Copy Markdown
Contributor

Entropy aggregation primitive

For details on entropy please see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html

@twdobson

twdobson commented Oct 18, 2019

Copy link
Copy Markdown
Contributor Author

Hi @rwedge, had issues with assigning github account to my commits with my previous PR, so I redid the PR.

Previously when running 'make html' the Entropy aggregation primitive was not automatically generating documentation. Is there any extra step that I am missing? In addition, currently I am getting the following error when I run 'make html' now:
"Exception occurred:
File "/usr/local/Caskroom/miniconda/base/envs/dev_featuretools/lib/python3.6/site-packages/IPython/sphinxext/ipython_directive.py", line 586, in process_input
raise RuntimeError('Non Expected warning in {} line {}'.format(filename, lineno))
RuntimeError: Non Expected warning in /usr/local/Caskroom/miniconda/base/envs/dev_featuretools/lib/python3.6/site-packages/featuretools/wrappers/sklearn.py:docstring of featuretools.wrappers.DFSTransformer.__init__ line 101"

@rwedge

rwedge commented Oct 18, 2019

Copy link
Copy Markdown
Contributor

My mistake, you should add entropy to the Aggregation Primitives section of docs/source/api_reference.rst so it gets included in the docs

That docs error is strange, does you environment warn you when running this code:

import featuretools as ft
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.ensemble import ExtraTreesClassifier

@twdobson

Copy link
Copy Markdown
Contributor Author

There were some issues with the packages in the environment. Have a dedicated featuretools env now, so that should be sorted.

Apologies for the silly questions, still getting the hang of the contributing and getting everything setup. I am getting the following error when trying to make the docs:
"
Configuration error:
There is a programmable error in your configuration file:

Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/featuretools_development/lib/python3.7/site-packages/sphinx/config.py", line 361, in eval_config_file
execfile_(filename, namespace)
File "/usr/local/Caskroom/miniconda/base/envs/featuretools_development/lib/python3.7/site-packages/sphinx/util/pycompat.py", line 86, in execfile_
exec(code, _globals)
File "/Users/timothydobson/work/repos/featuretools_dev/docs/source/conf.py", line 87, in
version = featuretools.version
AttributeError: module 'featuretools' has no attribute 'version'
"

@twdobson

Copy link
Copy Markdown
Contributor Author

I also fail the "featuretools/tests/cli_tests/test_cli.py " test locally - however, when I install featuretools into my env this issue (and configuration error) disappear but then the documentation does not build properly. I think it's because everything is imported from the version installed in the env and not from the source code i'm working off.

@rwedge

rwedge commented Oct 21, 2019

Copy link
Copy Markdown
Contributor

If you run

pip install -e /path/to/featuretools/repo

featuretools will get installed in "editable" mode and changes you make to the source code will be reflected when you import featuretools again

I would try re-installing featuretools in the env in editable mode

@twdobson

Copy link
Copy Markdown
Contributor Author

@rwedge, thanks for the help - editable mode did the trick.

@twdobson

Copy link
Copy Markdown
Contributor Author

Hey @kmax12, @rwedge, I think the PR is ready for review! Thanks for all the help, let me know of any other changes.

@rwedge

rwedge commented Oct 24, 2019

Copy link
Copy Markdown
Contributor

@twdobson code looks good, just thinking about what range we would want to use for the scipy dependency

@twdobson

twdobson commented Oct 24, 2019

Copy link
Copy Markdown
Contributor Author

@rwedge, thanks.

Would the oldest version of scipy that has entropy, with the required parameters, work?

Alternatively, we could implement the function via numpy, the formula is simple: Entropy = -sum(pk * log(pk), axis=0), where pk = probability of categorical, which is the output from value_counts(normalize =True)

@rwedge

rwedge commented Oct 25, 2019

Copy link
Copy Markdown
Contributor

@twdobson it looks like scipy 0.11.0 is when the base parameter was added to the entropy function but our minimum scikit-learn version requires scipy 0.13.3 or higher so that may be a more practical requirement

@twdobson

Copy link
Copy Markdown
Contributor Author

@rwedge, sounds good. I have updated the PR to reflect this.

Quick two questions:

  1. how did you go about figuring the which version to use? Did you use pipdeptree?
  2. Once new enhancements are made for feature generation, is there a process for testing uplift or importance across different response sets?

@twdobson

Copy link
Copy Markdown
Contributor Author

Let me know if there is anything else that needs to be updated.

@rwedge rwedge left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good

As to your questions:

  1. I installed featuretools into a clean environment and checked the pip output to see which package required scipy. I did try using pipdeptree after you mentioned it, it also came up with scipy 0.13.3 being necessary for sklearn 0.20.4
  2. We don't test how new primitives impact model performance or feature importance

@rwedge rwedge merged commit 35319fb into alteryx:master Oct 28, 2019
@rwedge rwedge mentioned this pull request Oct 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants