-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix approximate compose feature matrix type closes #1165 #1166
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1166 +/- ##
=======================================
Coverage 98.60% 98.60%
=======================================
Files 130 130
Lines 13927 13932 +5
=======================================
+ Hits 13733 13738 +5
Misses 194 194
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a check to the regular compose label times test confirming the feature matrix is not a compose labeltimes object?
docs/source/changelog.rst
Outdated
@@ -7,6 +7,7 @@ Changelog | |||
* Fixes | |||
* Allow FeatureOutputSlice features to be serialized (:pr:`1150`) | |||
* Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (:pr:`1160`) | |||
* Fix approximate compose feature matrix type (:pr:`1166`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this changelog entry longer and more detailed
def label_func(df): | ||
return df['value'].sum() > 10 | ||
|
||
lm = cp.LabelMaker( | ||
target_entity='id', | ||
time_index='datetime', | ||
labeling_function=label_func, | ||
window_size='1m' | ||
) | ||
|
||
df = es['log'].df | ||
df = to_pandas(df) | ||
labels = lm.search( | ||
df, | ||
num_examples_per_instance=-1 | ||
) | ||
labels = labels.rename(columns={'cutoff_time': 'time'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These exact label times are used in separate test as well. Perhaps we should convert them to a fixture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be outside of the scope of this PR, but the docstring for create_feature_matrix has the return type as pd.DataFrame: The feature matrix.
which won't be true for dask or koalas dataframes. It might be worth adding checks like the one in this PR assert(type(feature_matrix) == pd.core.frame.DataFrame)
for the dask and koalas feature matrices calculated from cfm in test_cfm_compose
and test_cfm_dask_compose
cutoff_time=labels, | ||
approximate='1s', | ||
verbose=True) | ||
assert(type(feature_matrix) == pd.core.frame.DataFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do isinstance(feature_matrix, pd.core.frame.DataFrame)
if you'd like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the compose LabelTimes object is a subclass of pd.DataFrame, so an isinstance check would still return True if the feature matrix was a LabelTimes object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the contributors section of the changelog, can you remove the space between your name and the :user:
tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Fix approximate compose feature matrix type
Using a the feature matrix as the base frame instead of cutoff times preserves DataFrame type
After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of
docs/source/changelog.rst
to include this pull request.