Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix approximate compose feature matrix type closes #1165 #1166

Merged
merged 11 commits into from
Sep 29, 2020
Merged

Conversation

tuethan1999
Copy link
Contributor

Fix approximate compose feature matrix type

Using a the feature matrix as the base frame instead of cutoff times preserves DataFrame type


After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request.

@codecov
Copy link

codecov bot commented Sep 23, 2020

Codecov Report

Merging #1166 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1166   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files         130      130           
  Lines       13927    13932    +5     
=======================================
+ Hits        13733    13738    +5     
  Misses        194      194           
Impacted Files Coverage Δ
...computational_backends/calculate_feature_matrix.py 99.10% <100.00%> (ø)
...utational_backend/test_calculate_feature_matrix.py 99.43% <100.00%> (-0.01%) ⬇️
featuretools/tests/conftest.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2294837...1502522. Read the comment docs.

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check to the regular compose label times test confirming the feature matrix is not a compose labeltimes object?

@@ -7,6 +7,7 @@ Changelog
* Fixes
* Allow FeatureOutputSlice features to be serialized (:pr:`1150`)
* Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (:pr:`1160`)
* Fix approximate compose feature matrix type (:pr:`1166`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this changelog entry longer and more detailed

Comment on lines 191 to 207
def label_func(df):
return df['value'].sum() > 10

lm = cp.LabelMaker(
target_entity='id',
time_index='datetime',
labeling_function=label_func,
window_size='1m'
)

df = es['log'].df
df = to_pandas(df)
labels = lm.search(
df,
num_examples_per_instance=-1
)
labels = labels.rename(columns={'cutoff_time': 'time'})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These exact label times are used in separate test as well. Perhaps we should convert them to a fixture?

Copy link
Contributor

@tamargrey tamargrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be outside of the scope of this PR, but the docstring for create_feature_matrix has the return type as pd.DataFrame: The feature matrix. which won't be true for dask or koalas dataframes. It might be worth adding checks like the one in this PR assert(type(feature_matrix) == pd.core.frame.DataFrame) for the dask and koalas feature matrices calculated from cfm in test_cfm_compose and test_cfm_dask_compose

cutoff_time=labels,
approximate='1s',
verbose=True)
assert(type(feature_matrix) == pd.core.frame.DataFrame)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do isinstance(feature_matrix, pd.core.frame.DataFrame) if you'd like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compose LabelTimes object is a subclass of pd.DataFrame, so an isinstance check would still return True if the feature matrix was a LabelTimes object

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the contributors section of the changelog, can you remove the space between your name and the :user: tag

Copy link
Contributor

@rwedge rwedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@tuethan1999 tuethan1999 merged commit 941c61b into main Sep 29, 2020
@tuethan1999 tuethan1999 deleted the iss1165 branch September 29, 2020 15:44
@tuethan1999 tuethan1999 mentioned this pull request Sep 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Approximate Compose calculate_feature_matrix produces LabelTime object instead of Dataframe
3 participants