Fix bug with encode_features and features that create multiple columns (#622)

* fix indent

* tests for duplicate columns

* update changelog

* Update changelog.rst
rwedge committed Jul 2, 2019
1 parent 0257f52 commit 7203f2d
Showing 3 changed files with 5 additions and 2 deletions.
2 changes: 2 additions & 0 deletions docs/source/changelog.rst
@@ -12,6 +12,8 @@ Changelog
 * Filter dataframes through forward relationships (:pr:`625`)
 * Specify Dask version in requirements for python 2 (:pr:`627`)
 * Keep dataframe sorted by time during feature calculation (:pr:`626`)
+* Fix bug in encode_features that created duplicate columns of
+  features with multiple outputs (:pr:`622`)
 * Changes
 * Remove unused variance_selection.py file (:pr:`613`)
 * Remove Timedelta data param (:pr:`619`)
2 changes: 1 addition & 1 deletion featuretools/synthesis/encode_features.py
@@ -74,7 +74,7 @@ def encode_features(feature_matrix, features, top_n=10, include_unknown=True,
             assert fname in X.columns, (
                 "Feature %s not found in feature matrix" % (fname)
             )
-        feature_names.append(fname)
+            feature_names.append(fname)
 
     extra_columns = [col for col in X.columns if col not in feature_names]

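The one-level indentation change is easier to see in isolation. Below is a minimal sketch, not featuretools code: `FakeFeature`, `collect_feature_names`, and `output_columns` are hypothetical names, and `output_columns` assumes a simplified model in which the encoded output holds each feature's own columns followed by any leftover "extra" columns of the feature matrix. With the append one level too far out, a multi-output feature records only its last output name, so its other columns land in `extra_columns` and, under this model, appear twice.

```python
from collections import Counter


class FakeFeature:
    """Hypothetical stand-in for a featuretools feature.

    Only get_feature_names() is modeled; a multi-output feature
    simply returns more than one name.
    """

    def __init__(self, names):
        self._names = list(names)

    def get_feature_names(self):
        return list(self._names)


def collect_feature_names(features, columns, fixed=True):
    """Collect the matrix columns claimed by `features`.

    fixed=False mimics the pre-#622 indentation: the append sits after the
    inner loop, so each feature contributes only its last output name.
    """
    feature_names = []
    for feature in features:
        for fname in feature.get_feature_names():
            assert fname in columns, "Feature %s not found in feature matrix" % (fname)
            if fixed:
                feature_names.append(fname)  # one entry per output column
        if not fixed:
            feature_names.append(fname)  # one entry per feature (the bug)
    return feature_names


def output_columns(features, columns, fixed=True):
    """Hypothetical, simplified assembly of the encoded output's columns:
    every feature's own columns, followed by the leftover 'extra' columns."""
    feature_names = collect_feature_names(features, columns, fixed)
    extra_columns = [col for col in columns if col not in feature_names]
    own = [name for feature in features for name in feature.get_feature_names()]
    return own + extra_columns


columns = ["age", "top_2[0]", "top_2[1]"]
features = [FakeFeature(["age"]), FakeFeature(["top_2[0]", "top_2[1]"])]

for fixed in (False, True):
    out = output_columns(features, columns, fixed)
    duplicates = [name for name, n in Counter(out).items() if n > 1]
    print("fixed" if fixed else "buggy", out, "duplicates:", duplicates)
```

Running it prints `buggy ['age', 'top_2[0]', 'top_2[1]', 'top_2[0]'] duplicates: ['top_2[0]']` followed by `fixed ['age', 'top_2[0]', 'top_2[1]'] duplicates: []`. Single-output features such as `age` behave the same either way, which is why the bug only surfaced for features with multiple outputs.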
3 changes: 2 additions & 1 deletion featuretools/tests/synthesis/test_encode_features.py
@@ -135,6 +135,7 @@ def test_encode_features_topn(es):
     features_enc, feature_defs_enc = encode_features(features,
                                                      feature_defs,
                                                      include_unknown=True)
-    assert topn.hash() in [feat.hash() for feat in feature_defs_enc]
+    assert topn.unique_name() in [feat.unique_name() for feat in feature_defs_enc]
     for name in topn.get_feature_names():
         assert name in features_enc.columns
+        assert features_enc.columns.tolist().count(name) == 1
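The new test line checks each output name with `list.count`, rescanning the column list once per name. A hypothetical helper (not part of featuretools or its test suite) that performs the same "present exactly once" check in a single pass with `collections.Counter`, and reports missing and repeated names separately, might look like this:

```python
from collections import Counter


def assert_each_column_once(columns, expected_names):
    """Hypothetical test helper: every expected name must appear exactly once.

    `columns` is any list of column labels, for example
    features_enc.columns.tolist() in test_encode_features_topn.
    """
    counts = Counter(columns)  # count every label in a single pass
    missing = [name for name in expected_names if counts[name] == 0]
    repeated = {name: counts[name] for name in expected_names if counts[name] > 1}
    assert not missing, "missing columns: {}".format(missing)
    assert not repeated, "duplicated columns: {}".format(repeated)


# Passes: each expected name appears once.
assert_each_column_once(["age", "top_2[0]", "top_2[1]"], ["top_2[0]", "top_2[1]"])

# Fails: "top_2[0]" appears twice, like a pre-fix output with duplicate columns.
try:
    assert_each_column_once(["age", "top_2[0]", "top_2[1]", "top_2[0]"],
                            ["top_2[0]", "top_2[1]"])
except AssertionError as err:
    print(err)
```

The failure message names the offending column and its count, which is more informative than a bare `count(name) == 1` failure when several outputs are duplicated at once.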
