Fix approximate compose feature matrix type closes #1165#1166
Fix approximate compose feature matrix type closes #1165#1166tuethan1999 merged 11 commits intomainfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1166 +/- ##
=======================================
Coverage 98.60% 98.60%
=======================================
Files 130 130
Lines 13927 13932 +5
=======================================
+ Hits 13733 13738 +5
Misses 194 194
Continue to review full report at Codecov.
|
rwedge
left a comment
There was a problem hiding this comment.
Can we add a check to the regular compose label times test confirming the feature matrix is not a compose labeltimes object?
| * Fixes | ||
| * Allow FeatureOutputSlice features to be serialized (:pr:`1150`) | ||
| * Fix duplicate label column generation when labels are passed in cutoff times and approximate is being used (:pr:`1160`) | ||
| * Fix approximate compose feature matrix type (:pr:`1166`) |
There was a problem hiding this comment.
Let's make this changelog entry longer and more detailed
| def label_func(df): | ||
| return df['value'].sum() > 10 | ||
|
|
||
| lm = cp.LabelMaker( | ||
| target_entity='id', | ||
| time_index='datetime', | ||
| labeling_function=label_func, | ||
| window_size='1m' | ||
| ) | ||
|
|
||
| df = es['log'].df | ||
| df = to_pandas(df) | ||
| labels = lm.search( | ||
| df, | ||
| num_examples_per_instance=-1 | ||
| ) | ||
| labels = labels.rename(columns={'cutoff_time': 'time'}) |
There was a problem hiding this comment.
These exact label times are used in separate test as well. Perhaps we should convert them to a fixture?
tamargrey
left a comment
There was a problem hiding this comment.
This might be outside of the scope of this PR, but the docstring for create_feature_matrix has the return type as pd.DataFrame: The feature matrix. which won't be true for dask or koalas dataframes. It might be worth adding checks like the one in this PR assert(type(feature_matrix) == pd.core.frame.DataFrame) for the dask and koalas feature matrices calculated from cfm in test_cfm_compose and test_cfm_dask_compose
| cutoff_time=labels, | ||
| approximate='1s', | ||
| verbose=True) | ||
| assert(type(feature_matrix) == pd.core.frame.DataFrame) |
There was a problem hiding this comment.
could do isinstance(feature_matrix, pd.core.frame.DataFrame) if you'd like
There was a problem hiding this comment.
the compose LabelTimes object is a subclass of pd.DataFrame, so an isinstance check would still return True if the feature matrix was a LabelTimes object
rwedge
left a comment
There was a problem hiding this comment.
In the contributors section of the changelog, can you remove the space between your name and the :user: tag
Fix approximate compose feature matrix type
Using a the feature matrix as the base frame instead of cutoff times preserves DataFrame type
After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of
docs/source/changelog.rstto include this pull request.