Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primitives as strings in DFS parameters #129

Merged
merged 15 commits into from Apr 11, 2018
Merged

Primitives as strings in DFS parameters #129

merged 15 commits into from Apr 11, 2018

Conversation

bschreck
Copy link
Contributor

@bschreck bschreck commented Apr 4, 2018

dfs() and DeepFeatureSynthesis() can now accept strings (instead of Python classes) for all primitives built into Featuretools.

API looks like this:

ft.dfs(entityset=es, target_entity='test',
          agg_primitives=['sum'],
          trans_primitives=['percentile'],
          where_primitives=['sum', 'percentile'])

@bschreck bschreck requested review from Seth-Rothschild and kmax12 and removed request for Seth-Rothschild April 4, 2018 03:35
@Seth-Rothschild
Copy link
Contributor

Very excited for this. I can take a last pass through the docs once you're all set.

@bschreck
Copy link
Contributor Author

bschreck commented Apr 5, 2018

@Seth-Rothschild ready for you to do that

transform_primitives = get_transform_primitives()
agg_primitives = get_aggregation_primitives()
transform_df = pd.DataFrame({'name': list(transform_primitives.keys()),
'description': [prim.__doc__.split("\n")[0] for prim in transform_primitives.values()]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives different results between docstrings that are

"""
Words go here
"""
"""Words go here
"""

and

""" Words go here
"""

We have all three but should probably settle on one.

Copy link
Contributor

@kmax12 kmax12 Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Seth-Rothschild we use the Google style docstrings for all public functions. for some private functions, we may be inconsistent and should fix those as we see it.

here's the Google style docstrings guide: http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a version which has nonempty first line for all of the primitives. It looks like there's some extra space being added by something in the description (as shown in this image).

EDIT: It's actually right aligned. Pandas style doesn't work with a non-unique index, but if we give it a unique index we can use .style.set_properties(**{'text-align': 'left'})

image

@codecov-io
Copy link

codecov-io commented Apr 5, 2018

Codecov Report

Merging #129 into master will decrease coverage by 0.11%.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #129      +/-   ##
==========================================
- Coverage   88.47%   88.36%   -0.12%     
==========================================
  Files          73       73              
  Lines        7558     7598      +40     
==========================================
+ Hits         6687     6714      +27     
- Misses        871      884      +13
Impacted Files Coverage Δ
featuretools/synthesis/dfs.py 100% <ø> (ø) ⬆️
featuretools/primitives/aggregation_primitives.py 92.57% <ø> (ø) ⬆️
featuretools/primitives/transform_primitive.py 97.77% <ø> (ø) ⬆️
featuretools/primitives/binary_transform.py 85.71% <ø> (ø) ⬆️
featuretools/primitives/api.py 100% <ø> (ø) ⬆️
.../feature_function_tests/test_transform_features.py 87.84% <100%> (ø) ⬆️
...ols/tests/dfs_tests/test_deep_feature_synthesis.py 98.32% <100%> (+0.05%) ⬆️
featuretools/primitives/cum_transform_feature.py 97.63% <100%> (ø) ⬆️
...ols/tests/feature_function_tests/test_agg_feats.py 98.51% <100%> (ø) ⬆️
featuretools/primitives/utils.py 85.36% <30%> (-7.88%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59fc551...5528b45. Read the comment docs.

where_primitives (list[:class:`.primitives.AggregationPrimitive`], optional):
only add where clauses to these types of Aggregation
Features.
Default: ["sum", \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick - I dont think these all need their own line. can we reform to be 80 chars?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 114 chars (agg defaults) and 104 chars (trans defaults). Keep it on one line anyways? Even removing " Default:" isn't enough to get it under 80.

if isinstance(p, basestring):
prim_obj = agg_prim_dict.get(p.lower(), None)
if prim_obj is None:
prim_obj = trans_prim_dict.get(p.lower(), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where's only apply to aggregations, no need to check transform too

:class:`Mode <.primitives.Mode>`]

trans_primitives (list[TransformPrimitive], optional):
Default: ["sum", \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above about single lines

@kmax12
Copy link
Contributor

kmax12 commented Apr 11, 2018

Great work. Merging in

@kmax12 kmax12 merged commit ff70cbe into master Apr 11, 2018
@rwedge rwedge mentioned this pull request Apr 13, 2018
rwedge added a commit that referenced this pull request Apr 13, 2018
**v0.1.20** Apr 13, 2018
* Improved chunking when calculating feature matrices  (#121)
* Primitives as strings in DFS parameters (#129)
* Integer time index bugfixes (#128)
* Add make_temporal_cutoffs utility function (#126)
* Show all entities, switch shape display to row/col (#124)
* fixed num characters nan fix (#118)
* modify ignore_variables docstring (#117)
@kmax12 kmax12 deleted the dfs-prim-strings branch June 11, 2018 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants