Entropy aggregation primitive #779
|
Hi @rwedge, I had issues associating my GitHub account with the commits in my previous PR, so I redid the PR. Previously, when running 'make html', the Entropy aggregation primitive did not automatically generate documentation. Is there an extra step that I am missing? In addition, I am currently getting the following error when I run 'make html': |
|
My mistake, you should add entropy to the Aggregation Primitives section of That docs error is strange. Does your environment warn you when running this code: |
|
There were some issues with the packages in the environment. I have a dedicated featuretools env now, so that should be sorted. Apologies for the silly questions, I am still getting the hang of contributing and getting everything set up. I am getting the following error when trying to make the docs: Traceback (most recent call last): |
|
I also fail the "featuretools/tests/cli_tests/test_cli.py" test locally. However, when I install featuretools into my env, this issue (and the configuration error) disappears, but then the documentation does not build properly. I think it's because everything is imported from the version installed in the env and not from the source code I'm working off. |
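
One way to check the suspicion above (that the installed copy, rather than the working tree, is being imported) is to ask Python where it would load the package from. This is only a minimal sketch using the standard library; `module_origin` is a hypothetical helper name, not part of featuretools.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this path points into site-packages rather than the cloned repository,
# the environment is using the installed copy and not the source being edited.
print(module_origin("featuretools"))
```
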
|
If you run featuretools will get installed in "editable" mode, and changes you make to the source code will be reflected when you import featuretools again. I would try re-installing featuretools in the env in editable mode.
Then re-add twdobson to changelog
|
@rwedge, thanks for the help - editable mode did the trick. |
|
@twdobson code looks good, just thinking about what range we would want to use for the scipy dependency |
|
@rwedge, thanks. Would the oldest version of scipy that has entropy, with the required parameters, work? Alternatively, we could implement the function via numpy, since the formula is simple: Entropy = -sum(pk * log(pk), axis=0), where pk is the probability of each category, i.e. the output of value_counts(normalize=True) |
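
For illustration, the numpy alternative described above could look roughly like the sketch below. It is not the code in this PR: the function name `entropy` and the `base` parameter are assumptions, and `np.unique(..., return_counts=True)` stands in for `value_counts(normalize=True)` so the example needs only numpy.

```python
import numpy as np


def entropy(values, base=None):
    """Shannon entropy of the categories in `values` (natural log unless `base` is given)."""
    values = np.asarray(values)
    # Relative frequency of each distinct category; this plays the role of
    # value_counts(normalize=True) in the formula from the comment.
    _, counts = np.unique(values, return_counts=True)
    pk = counts / counts.sum()
    # Only observed categories are counted, so every pk > 0 and log(pk) is finite.
    h = -np.sum(pk * np.log(pk))
    if base is not None:
        # Change of base: log_b(x) = ln(x) / ln(b).
        h /= np.log(base)
    return float(h)


# Two equally likely categories give ln(2) nats.
print(entropy(["a", "a", "b", "b"]))
```

Missing values are not handled here; a real primitive would need to decide whether to drop them before counting.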
|
@twdobson it looks like scipy 0.11.0 is when the |
|
@rwedge, sounds good. I have updated the PR to reflect this. Two quick questions:
|
|
Let me know if there is anything else that needs to be updated. |
rwedge left a comment
PR looks good
As to your questions:
- I installed featuretools into a clean environment and checked the pip output to see which package required scipy. I did try pipdeptree after you mentioned it; it also showed scipy 0.13.3 as necessary for sklearn 0.20.4
- We don't test how new primitives impact model performance or feature importance
Entropy aggregation primitive
For details on entropy please see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html