
Fix approximate compose feature matrix type closes #1165 #1166

Merged
tuethan1999 merged 11 commits into main from iss1165
Sep 29, 2020

Conversation

@tuethan1999
Contributor

Fix approximate compose feature matrix type

Using the feature matrix, rather than the cutoff times, as the base frame preserves the pandas DataFrame type, so an approximate calculation no longer returns a compose LabelTimes object.
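A minimal sketch of why the choice of base frame matters, assuming the approximate path combines cutoff times and computed features with a pandas merge (the actual featuretools code is not shown on this page). `LabelTimesStandIn` is a hypothetical stand-in for compose's LabelTimes, which subclasses pd.DataFrame; the column names and values are made up for illustration. pandas builds a merge result from the left frame's `_constructor`, so whichever frame sits on the left decides the result type.

```python
import pandas as pd


class LabelTimesStandIn(pd.DataFrame):
    """Hypothetical stand-in for compose's LabelTimes (a pd.DataFrame subclass)."""

    @property
    def _constructor(self):
        # pandas calls _constructor to build the results of operations such as
        # merge, so frames derived from this one keep the subclass type.
        return LabelTimesStandIn


# Cutoff times carrying labels, as a label search might return them (made-up data).
labels = LabelTimesStandIn({
    "id": [0, 1],
    "time": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "label": [True, False],
})

# A plain feature matrix for the same instances (made-up feature and values).
feature_matrix = pd.DataFrame({"id": [0, 1], "SUM(log.value)": [12.0, 3.0]})

# Cutoff times as the base (left) frame: the result inherits the subclass.
from_cutoffs = labels.merge(feature_matrix, on="id")

# Feature matrix as the base (left) frame: the result stays a plain DataFrame.
from_features = feature_matrix.merge(labels, on="id")

print(type(from_cutoffs).__name__)   # LabelTimesStandIn
print(type(from_features).__name__)  # DataFrame
```

Swapping the operands also changes the column order (feature columns first), which is usually harmless for a feature matrix but worth keeping in mind when comparing outputs in tests.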


After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request.

@codecov

codecov Bot commented Sep 23, 2020

Codecov Report

Merging #1166 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1166   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files         130      130           
  Lines       13927    13932    +5     
=======================================
+ Hits        13733    13738    +5     
  Misses        194      194           
Impacted Files Coverage Δ
...computational_backends/calculate_feature_matrix.py 99.10% <100.00%> (ø)
...utational_backend/test_calculate_feature_matrix.py 99.43% <100.00%> (-0.01%) ⬇️
featuretools/tests/conftest.py 100.00% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2294837...1502522.

Contributor

@rwedge rwedge left a comment

Can we add a check to the regular compose label times test confirming that the feature matrix is not a compose LabelTimes object?

Comment thread: docs/source/changelog.rst (outdated)
* Fixes
    * Allow FeatureOutputSlice features to be serialized (:pr:`1150`)
    * Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (:pr:`1160`)
    * Fix approximate compose feature matrix type (:pr:`1166`)
Contributor

Let's make this changelog entry longer and more detailed

Comment on lines +191 to +207
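```python
def label_func(df):
    # Label a window True when the total of its 'value' column exceeds 10
    return df['value'].sum() > 10

lm = cp.LabelMaker(
    target_entity='id',
    time_index='datetime',
    labeling_function=label_func,
    window_size='1m'
)

# Take the 'log' entity's data and convert it to a pandas DataFrame
df = es['log'].df
df = to_pandas(df)
labels = lm.search(
    df,
    num_examples_per_instance=-1
)
# Match the cutoff time column name expected by the test
labels = labels.rename(columns={'cutoff_time': 'time'})
```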
Contributor

These exact label times are used in a separate test as well. Perhaps we should convert them to a fixture?

Contributor

@tamargrey tamargrey left a comment

This might be outside the scope of this PR, but the docstring for calculate_feature_matrix gives the return type as "pd.DataFrame: The feature matrix.", which won't be true for Dask or Koalas dataframes. It might be worth adding checks like the one in this PR, assert(type(feature_matrix) == pd.core.frame.DataFrame), for the Dask and Koalas feature matrices calculated from cfm in test_cfm_compose and test_cfm_dask_compose.

```python
                                      cutoff_time=labels,
                                      approximate='1s',
                                      verbose=True)
assert(type(feature_matrix) == pd.core.frame.DataFrame)
```
Contributor

Could do isinstance(feature_matrix, pd.core.frame.DataFrame) if you'd like.

Contributor

The compose LabelTimes object is a subclass of pd.DataFrame, so an isinstance check would still return True if the feature matrix were a LabelTimes object.
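The difference between the two checks can be seen in a small sketch. `LabelTimesStandIn` is a hypothetical stand-in for compose's LabelTimes (any pd.DataFrame subclass behaves the same way here), and `is_plain_dataframe` is an illustrative helper, not part of featuretools.

```python
import pandas as pd


class LabelTimesStandIn(pd.DataFrame):
    """Hypothetical stand-in for compose's LabelTimes, a pd.DataFrame subclass."""


def is_plain_dataframe(obj):
    # Exact type comparison: rejects subclasses such as LabelTimes.
    return type(obj) is pd.DataFrame


plain = pd.DataFrame({"id": [0, 1]})
label_times = LabelTimesStandIn({"id": [0, 1]})

# isinstance accepts subclasses, so it cannot tell the two frames apart.
print(isinstance(plain, pd.DataFrame))        # True
print(isinstance(label_times, pd.DataFrame))  # True

# The exact type check distinguishes them, which is what the test needs.
print(is_plain_dataframe(plain))        # True
print(is_plain_dataframe(label_times))  # False
```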

Contributor

@rwedge rwedge left a comment

In the contributors section of the changelog, can you remove the space between your name and the :user: tag?

Contributor

@rwedge rwedge left a comment

Looks good!

@tuethan1999 tuethan1999 merged commit 941c61b into main Sep 29, 2020
@tuethan1999 tuethan1999 deleted the iss1165 branch September 29, 2020 15:44
@tuethan1999 tuethan1999 mentioned this pull request Sep 30, 2020

Development

Successfully merging this pull request may close these issues.

Approximate Compose calculate_feature_matrix produces LabelTime object instead of Dataframe

3 participants