English sentence feature descriptions #1201

Merged
merged 21 commits into from Oct 30, 2020
Commits
0a1bfda
initial example description generators
frances-h Oct 7, 2020
f237cc6
add version that stores primitive templates on primitives
frances-h Oct 13, 2020
bf33f5a
update runon and combined methods and add tests
frances-h Oct 22, 2020
65acc99
update tests and switch to runon
frances-h Oct 26, 2020
e75ed4d
Merge branch 'main' into feature-descriptions
frances-h Oct 26, 2020
5da01a3
update tests
frances-h Oct 26, 2020
f0797b0
lint and coverage updates
frances-h Oct 26, 2020
6d75d64
release notes and update primitive templates
frances-h Oct 27, 2020
350a7e7
Merge branch 'main' into feature-descriptions
frances-h Oct 27, 2020
17251ef
template updates and description utils tests
frances-h Oct 28, 2020
9824c34
Merge branch 'main' into feature-descriptions
frances-h Oct 28, 2020
2a75760
Merge branch 'main' into feature-descriptions
frances-h Oct 28, 2020
23a0391
add feature_descriptions to docs
frances-h Oct 29, 2020
fb5443e
Merge branch 'main' into feature-descriptions
frances-h Oct 29, 2020
d1176f6
update guide and add class name as primitive description fallback
frances-h Oct 29, 2020
69150d3
use nth_slice instead of slice_num in primitive template
frances-h Oct 29, 2020
1b93f65
add feature lineage graphs and feature descriptions to docs index
frances-h Oct 29, 2020
def6a37
add no primitive name generic test
frances-h Oct 30, 2020
4134a80
update docs and test
frances-h Oct 30, 2020
2f04cdd
update docs feature reference
frances-h Oct 30, 2020
d364783
doc updates
frances-h Oct 30, 2020
8 changes: 8 additions & 0 deletions docs/source/api_reference.rst
@@ -226,6 +226,14 @@ Feature calculation
calculate_feature_matrix
.. approximate_features

Feature descriptions
~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: featuretools
.. autosummary::
:toctree: generated/

describe_feature

Feature visualization
~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: featuretools
3 changes: 1 addition & 2 deletions docs/source/getting_started/afe.rst
@@ -92,8 +92,7 @@ For each customer this feature calculates
Stacking results in features that are more expressive than individual primitives themselves. This enables the automatic creation of complex patterns for machine learning.

.. note ::

You can graphically visualize the lineage of a feature by calling :func:`featuretools.graph_feature` on it.
You can graphically visualize the lineage of a feature by calling :func:`featuretools.graph_feature` on it. You can also generate an English description of the feature with :func:`featuretools.describe_feature`. See :doc:`/guides/feature_descriptions` for more details.


Changing Target Entity
71 changes: 71 additions & 0 deletions docs/source/getting_started/graphs/demo_feat.dot
@@ -0,0 +1,71 @@
digraph "MODE(transactions.WEEKDAY(transaction_time))" {
graph [bb="0,0,1212.7,152",
rankdir=LR
];
node [label="\N",
shape=box
];
edge [arrowhead=none,
dir=forward,
style=dotted
];
{
graph [rank=min];
"1_WEEKDAY(transaction_time)_weekday" [height=0.94444,
label=<<FONT POINT-SIZE="12"><B>Step 1:</B> Transform<BR></BR></FONT>WEEKDAY>,
pos="108.69,58",
shape=diamond,
width=3.0192];
}
sessions [height=1.1806,
label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
<TR>
<TD colspan="1" bgcolor="#A9A9A9"><B>★ sessions (target)</B></TD>
</TR>
<TR>
<TD ALIGN="LEFT" port="MODE(transactions.WEEKDAY(transaction_time))" BGCOLOR="#D9EAD3">MODE(transactions.WEEKDAY(transaction_time))</TD>
</TR>
</TABLE>>,
pos="1047.2,61",
shape=plaintext,
width=4.5972];
transactions [height=2.1111,
label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="10">
<TR>
<TD colspan="1" bgcolor="#A9A9A9"><B>transactions</B></TD>
</TR><TR><TD ALIGN="LEFT" port="transaction_time">transaction_time</TD></TR>
<TR><TD ALIGN="LEFT" port="session_id">session_id</TD></TR>
<TR><TD ALIGN="LEFT" port="WEEKDAY(transaction_time)">WEEKDAY(transaction_time)</TD></TR>
</TABLE>>,
pos="358.38,76",
shape=plaintext,
width=2.9167];
transactions:transaction_time -> "1_WEEKDAY(transaction_time)_weekday" [arrowhead="",
pos="e,159.9,76.211 260.38,94 229.96,94 197,86.972 169.69,79.127",
style=solid];
"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" [height=0.5,
label="group by
session_id",
pos="536.55,40",
width=1.0325];
transactions:"WEEKDAY(transaction_time)" -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" [arrowhead="",
pos="e,499.28,28.195 456.38,22 467.17,22 478.58,23.609 489.28,25.865",
style=solid];
transactions:session_id -> "MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" [pos="456.38,58 470.62,58 485.94,55.198 499.28,51.805"];
"0_MODE(transactions.WEEKDAY(transaction_time))_mode" [height=0.94444,
label=<<FONT POINT-SIZE="12"><B>Step 2:</B> Aggregation<BR></BR></FONT>MODE>,
pos="727.71,40",
shape=diamond,
width=3.2776];
"0_MODE(transactions.WEEKDAY(transaction_time))_mode" -> sessions:"MODE(transactions.WEEKDAY(transaction_time))" [arrowhead="",
pos="e,889.21,40 845.73,40 856.82,40 867.99,40 878.87,40",
style=solid];
"1_WEEKDAY(transaction_time)_weekday" -> transactions:"WEEKDAY(transaction_time)" [arrowhead="",
pos="e,260.38,22 159.9,39.789 186.04,31.732 219,23.731 250.29,22.245",
style=solid];
"MODE(transactions.WEEKDAY(transaction_time))_groupby_transactions--session_id" -> "0_MODE(transactions.WEEKDAY(transaction_time))_mode" [arrowhead="",
pos="e,609.69,40 574.02,40 581.83,40 590.49,40 599.62,40",
style=solid];
}
118 changes: 118 additions & 0 deletions docs/source/guides/feature_descriptions.rst
@@ -0,0 +1,118 @@
Generating Feature Descriptions
================================
As features become more complicated, their names can become harder to understand. Both the :func:`featuretools.describe_feature` function and the :func:`featuretools.graph_feature` function can help explain what a feature is and the steps Featuretools took to generate it. Additionally, the ``describe_feature`` function can be augmented by providing custom definitions and templates to improve the resulting descriptions.

.. ipython:: python
:suppress:

import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)

feature_defs = ft.dfs(entityset=es,
target_entity="customers",
agg_primitives=["mean", "sum", "mode", "n_most_common"],
trans_primitives=["month", "hour"],
max_depth=2,
features_only=True)
features = {feature.get_name(): feature for feature in feature_defs}

By default, ``describe_feature`` uses the existing variable and entity names and the default primitive description templates to generate feature descriptions.

.. ipython:: python

ft.describe_feature(features['HOUR(date_of_birth)'])
ft.describe_feature(features['MEAN(sessions.SUM(transactions.amount))'])

Improving Descriptions
~~~~~~~~~~~~~~~~~~~~~~~
While the default descriptions can be helpful, they can also be further improved by providing custom definitions of variables and features, and by providing alternative templates for primitive descriptions.

Feature Descriptions
---------------------
Custom feature definitions will get used in the description in place of the automatically generated description. This can be used to better explain what a variable or feature is, or to provide descriptions that take advantage of a user's existing knowledge about the data or domain.

.. ipython:: python

feature_descriptions = {
'customers: join_date': 'the date the customer joined'}

ft.describe_feature(features['HOUR(join_date)'],
feature_descriptions=feature_descriptions)

For example, the above replaces the variable name ``"join_date"`` with a more descriptive definition of what that variable represents in the dataset. Feature descriptions can also be provided for generated features.

.. ipython:: python

feature_descriptions = {
'sessions: SUM(transactions.amount)': 'the total transaction amount for a session'}

ft.describe_feature(features['MEAN(sessions.SUM(transactions.amount))'],
feature_descriptions=feature_descriptions)


Here, we create and pass in a custom description of the intermediate feature ``SUM(transactions.amount)``. The description for ``MEAN(sessions.SUM(transactions.amount))``, which is built on top of ``SUM(transactions.amount)``, uses the custom description in place of the automatically generated one. Feature descriptions can be passed in as a dictionary that maps either the feature object itself or the unique feature name, in the form ``"[entity_name]: [feature_name]"``, to the custom description, as shown above.
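
The name-based lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions: ``unique_name`` and ``lookup_description`` are hypothetical helpers, not Featuretools functions, and the placeholder default stands in for whatever description would otherwise be generated.

```python
def unique_name(entity_name, feature_name):
    """Build a key in the "[entity_name]: [feature_name]" form (hypothetical helper)."""
    return "{}: {}".format(entity_name, feature_name)


def lookup_description(entity_name, feature_name, feature_descriptions, default):
    """Return the custom description if one was supplied, otherwise the default."""
    key = unique_name(entity_name, feature_name)
    return feature_descriptions.get(key, default)


feature_descriptions = {
    "sessions: SUM(transactions.amount)": "the total transaction amount for a session",
}

# A feature with a custom description uses it.
custom = lookup_description(
    "sessions", "SUM(transactions.amount)", feature_descriptions,
    default="<auto-generated description>")

# A feature without one falls back to the default.
fallback = lookup_description(
    "customers", "join_date", feature_descriptions,
    default="<auto-generated description>")

print(custom)
print(fallback)
```

Including the entity name in the key keeps features with the same name on different entities from colliding.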

Primitive Templates
--------------------
Primitive descriptions are generated using primitive templates. By default, these are defined using the ``description_template`` attribute on the primitive. Primitives without a template default to using the ``name`` attribute of the primitive if it is defined, or the class name if it is not. Primitive description templates are string templates that take input feature descriptions as positional arguments. These can be overridden by mapping primitive instances or primitive names to custom templates and passing them into ``describe_feature`` through the ``primitive_templates`` argument.
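
As a rough sketch of how a string template with positional arguments behaves, the snippet below uses plain ``str.format``. The ``DEFAULT_TEMPLATES`` dictionary, the ``render`` helper, and the input description text are all hypothetical and only illustrate the override rule; they are not the library's internals.

```python
# Hypothetical default templates keyed by primitive name (illustrative only).
DEFAULT_TEMPLATES = {
    "sum": "the sum of {}",
    "mean": "the average of {}",
}


def render(primitive_name, input_descriptions, primitive_templates=None):
    """Fill a primitive's template, preferring a user-supplied override."""
    overrides = primitive_templates or {}
    template = overrides.get(primitive_name, DEFAULT_TEMPLATES[primitive_name])
    # Input feature descriptions are passed as positional arguments.
    return template.format(*input_descriptions)


inputs = ["the transaction amount"]  # made-up input description

default_text = render("sum", inputs)
custom_text = render("sum", inputs, {"sum": "the total of {}"})

print(default_text)
print(custom_text)
```

Because the inputs are positional, a template for a primitive with several inputs would contain one ``{}`` placeholder per input.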

.. ipython:: python

primitive_templates = {
'sum': 'the total of {}'}

ft.describe_feature(features['SUM(transactions.amount)'],
primitive_templates=primitive_templates)


Multi-output primitives can use a list of primitive description templates to differentiate between the generic multi-output feature description and the feature slice descriptions. The first primitive template is always the generic overall feature. If only one other template is provided, it is used as the template for all slices. The slice number converted to the "nth" form is available through the ``nth_slice`` keyword.

.. ipython:: python

primitive_templates = {
'n_most_common': [
'the 3 most common elements of {}', # generic multi-output feature
'the {nth_slice} most common element of {}']} # template for each slice

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'],
primitive_templates=primitive_templates)

Notice how the multi-output feature uses the first template for its description. Each slice of this feature uses the second template:

.. ipython:: python

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][0],
primitive_templates=primitive_templates)

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][1],
primitive_templates=primitive_templates)

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][2],
primitive_templates=primitive_templates)


Alternatively, instead of supplying a single template for all slices, templates can be provided for each slice to further customize the output. Note that in this case, each slice must get its own template.

.. ipython:: python

primitive_templates = {
'n_most_common': [
'the 3 most common elements of {}',
'the most common element of {}',
'the second most common element of {}',
'the third most common element of {}']}

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'],
primitive_templates=primitive_templates)

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][0],
primitive_templates=primitive_templates)

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][1],
primitive_templates=primitive_templates)

ft.describe_feature(features['N_MOST_COMMON(sessions.device)'][2],
primitive_templates=primitive_templates)
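
The template selection rules for multi-output primitives can be sketched in plain Python. The helpers ``to_nth``, ``select_template``, and ``render`` below are hypothetical, and the "1st"/"2nd"/"3rd" style for the "nth" form is an assumption made for illustration.

```python
def to_nth(number):
    """Convert 1 -> "1st", 2 -> "2nd", 3 -> "3rd", 11 -> "11th", and so on."""
    if 10 <= number % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
    return "{}{}".format(number, suffix)


def select_template(templates, slice_num=None):
    """Pick a template from a multi-output template list.

    templates[0] describes the overall feature. With exactly one other
    template, every slice shares it; otherwise slice i uses templates[i + 1].
    """
    if slice_num is None:
        return templates[0]
    if len(templates) == 2:
        return templates[1]
    return templates[slice_num + 1]


def render(templates, input_description, slice_num=None):
    template = select_template(templates, slice_num)
    if slice_num is None:
        return template.format(input_description)
    # Slices are zero-indexed, so slice 0 is the "1st" output.
    return template.format(input_description, nth_slice=to_nth(slice_num + 1))


shared = [
    "the 3 most common elements of {}",
    "the {nth_slice} most common element of {}",
]
per_slice = [
    "the 3 most common elements of {}",
    "the most common element of {}",
    "the second most common element of {}",
    "the third most common element of {}",
]

print(render(shared, "the device"))
print(render(shared, "the device", slice_num=1))
print(render(per_slice, "the device", slice_num=2))
```

In this sketch, a per-slice list that is too short raises an ``IndexError``, which mirrors the requirement that each slice gets its own template.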


Custom feature descriptions and primitive templates can also be separately defined in a JSON file and passed to the ``describe_feature`` function using the ``metadata_file`` keyword argument.
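
A minimal sketch of preparing such a file with the standard library is shown below. The layout is an assumption: the guide text does not spell out the JSON schema, so the top-level keys ``feature_descriptions`` and ``primitive_templates`` simply mirror the keyword argument names, and the file name is arbitrary.

```python
import json
import os
import tempfile

# Assumed layout: one top-level key per describe_feature keyword argument.
metadata = {
    "feature_descriptions": {
        "customers: join_date": "the date the customer joined",
        "sessions: SUM(transactions.amount)": "the total transaction amount for a session",
    },
    "primitive_templates": {
        "sum": "the total of {}",
        "n_most_common": [
            "the 3 most common elements of {}",
            "the {nth_slice} most common element of {}",
        ],
    },
}

# Write the metadata to a temporary location (the file name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "description_metadata.json")
with open(path, "w") as f:
    json.dump(metadata, f, indent=2)

# Read it back to confirm that the file round-trips cleanly.
with open(path) as f:
    loaded = json.load(f)

print(sorted(loaded))
```

The resulting path would then be supplied as ``metadata_file=path`` when calling ``describe_feature``.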
1 change: 1 addition & 0 deletions docs/source/guides/guides_index.rst
@@ -13,4 +13,5 @@ Guides on more advanced Featuretools functionality
using_koalas_entitysets
deployment
advanced_custom_primitives
feature_descriptions
feature_selection
29 changes: 29 additions & 0 deletions docs/source/index.rst
@@ -117,6 +117,35 @@ One of the reasons DFS is so powerful is that it can create a feature matrix for
feature_matrix_sessions.head(5)


Understanding Feature Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. ipython:: python
:suppress:

features = {f.get_name(): f for f in features_defs}

In general, Featuretools references generated features through the feature name. In order to make features easier to understand, Featuretools offers two additional tools, :func:`featuretools.graph_feature` and :func:`featuretools.describe_feature`, to help explain what a feature is and the steps Featuretools took to generate it.

Feature lineage graphs
""""""""""""""""""""""
Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and intermediate features generated to create the final feature.

.. ipython:: python

ft.graph_feature(features['MODE(transactions.WEEKDAY(transaction_time))'])


.. graphviz:: getting_started/graphs/demo_feat.dot

Feature descriptions
""""""""""""""""""""
Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help to explain what a feature is, and can be further improved by including manually defined custom definitions. See :doc:`/guides/feature_descriptions` for more details on how to customize automatically generated feature descriptions.

.. ipython:: python

ft.describe_feature(features['MODE(transactions.WEEKDAY(transaction_time))'])


.. Technical problems it solves
.. ----------------------------

3 changes: 2 additions & 1 deletion docs/source/release_notes.rst
@@ -4,6 +4,7 @@ Release Notes
-------------
**Future Release**
* Enhancements
* Add ``describe_feature`` to generate an English language feature description for a given feature (:pr:`1201`)
* Fixes
* Update ``EntitySet.add_last_time_indexes`` to work with Koalas 1.3.0 (:pr:`1192`, :pr:`1202`)
* Changes
@@ -18,7 +19,7 @@ Release Notes
* Update premium primitives job name on CI (:pr:`1205`)

Thanks to the following people for contributing to this release:
:user:`gsheni`, :user:`rwedge`, :user:`tamargrey`, :user:`thehomebrewnerd`, :user:`jeff-hernandez`
:user:`gsheni`, :user:`rwedge`, :user:`tamargrey`, :user:`thehomebrewnerd`, :user:`jeff-hernandez`, :user:`frances-h`

**v0.20.0 Sep 30, 2020**
.. warning::
5 changes: 4 additions & 1 deletion docs/source/setup.py
@@ -4,11 +4,14 @@

def load_feature_plots():
es = ft.demo.load_mock_customer(return_entityset=True)
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'automated_feature_engineering/graphs/')
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'getting_started/graphs/')
agg_feat = ft.AggregationFeature(es['sessions']['session_id'], es['customers'], ft.primitives.Count)
trans_feat = ft.TransformFeature(es['customers']['join_date'], ft.primitives.TimeSincePrevious)
demo_feat = ft.AggregationFeature(ft.TransformFeature(es['transactions']['transaction_time'], ft.primitives.Weekday),
es['sessions'], ft.primitives.Mode)
ft.graph_feature(agg_feat, to_file=os.path.join(path, 'agg_feat.dot'))
ft.graph_feature(trans_feat, to_file=os.path.join(path, 'trans_feat.dot'))
ft.graph_feature(demo_feat, to_file=os.path.join(path, 'demo_feat.dot'))


if __name__ == "__main__":
1 change: 1 addition & 0 deletions featuretools/__init__.py
@@ -22,6 +22,7 @@
IdentityFeature,
TransformFeature,
graph_feature,
describe_feature,
save_features,
load_features
)
1 change: 1 addition & 0 deletions featuretools/feature_base/api.py
@@ -9,6 +9,7 @@
IdentityFeature,
TransformFeature
)
from .feature_descriptions import describe_feature
from .feature_visualizer import graph_feature
from .features_deserializer import load_features
from .features_serializer import save_features