Let tsfresh choose the value column if possible and increase test coverage #722
You have style errors. See them below. ./tsfresh/feature_extraction/data.py:149:1: E302 expected 2 blank lines, found 1
Hi @nils-braun. Apparently I missed that. Thanks! But why did you change the
I thought that it makes the logic of the None-column_value easier, but you are right, in the end both are fine.
@hoesler Are you fine with merging this?
I assume you are :-) If not, just comment and we can change it back!
@@ -164,17 +172,16 @@ def __init__(self, df, column_id, column_sort=None, value_columns=None):
         :type column_sort: str|None

         :param value_columns: list of column names to treat as time series values.
-            If `None`, all columns except `column_id` and `column_sort` will be used.
+            If `None` or empty, all columns except `column_id` and `column_sort` will be used.
I think, you don't actually handle the empty case, do you?
Nice catch! Thanks
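For illustration, here is a minimal sketch of what handling both the `None` and the empty case could look like. It works on a plain list of column names rather than a DataFrame, and the helper name `resolve_value_columns` is hypothetical; it is not taken from the tsfresh code base.

```python
def resolve_value_columns(columns, column_id, column_sort=None, value_columns=None):
    """Hypothetical helper: pick the columns to treat as time series values.

    Assumes `value_columns` is None or a plain list of names.
    """
    # `not value_columns` is True for both None and an empty list,
    # so the two cases documented in the docstring share one branch.
    if not value_columns:
        reserved = {column_id, column_sort} - {None}
        return [c for c in columns if c not in reserved]
    return list(value_columns)
```

With this shape, `value_columns=[]` falls back to "all remaining columns" exactly like `value_columns=None`, which is what the updated docstring promises.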
if column_kind is None:
    raise ValueError("A value for column_kind needs to be supplied")

if column_value is None:
I think this should be added to the docs and the column_value arg should be optional.
if column_value is None:
    possible_value_columns = _get_value_columns(df, column_id, column_sort, column_kind)
    if len(possible_value_columns) != 1:
        raise ValueError("Could not guess the value column! Please hand it to the function as an argument.")
Maybe the message could be more specific and also include possible_value_columns.
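A rough sketch of what such a more specific message might look like is below. It operates on a plain list of column names instead of a DataFrame, and `guess_value_column` is a hypothetical stand-in for the logic around `_get_value_columns`, not the function used in the PR.

```python
def guess_value_column(columns, column_id, column_sort=None, column_kind=None):
    """Hypothetical helper: return the single remaining column name.

    Raises a ValueError that lists the candidates when the choice is ambiguous
    or when no candidate is left.
    """
    reserved = {column_id, column_sort, column_kind} - {None}
    candidates = [c for c in columns if c not in reserved]
    if len(candidates) != 1:
        raise ValueError(
            "Could not guess the value column: expected exactly one candidate, "
            "found {}: {}. Please pass column_value explicitly.".format(
                len(candidates), candidates
            )
        )
    return candidates[0]
```

Listing the candidates in the message lets the caller see right away whether a column was left over by accident or whether none remained at all.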
@@ -291,7 +307,7 @@ def __len__(self):
        return sum(grouped_df.ngroups for grouped_df in self.grouped_dict.values())


def to_tsdata(df, column_id=None, column_kind=None, column_value=None, column_sort=None):
I made column_id optional, because you can pass a TsData object
Damn. I started a review a long while ago, but forgot to actually submit it. Sorry! Here are some small comments I had.
Ok, thanks! I will work on the issues and open a new PR!
Hi @hoesler!
Sorry for being late (and even more sorry that it took us so long to review your PR). I realized that there was a small change concerning the API: before your very nice changes, it was possible to let tsfresh find out the column_value if it was the only remaining column. I added this again.
While doing this, I tried to increase the test coverage a bit.
Feel free to drop a comment if you want!