Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let tsfresh choose the value column if possible and increase test coverage #722

Merged
merged 7 commits into from
Jul 15, 2020

Conversation

nils-braun
Copy link
Collaborator

Hi @hoesler!

Sorry for coming late (and even more sorry, because we needed long to review your PR). I realized that there was a small change concerning the API: before your very nice changes, it was possible to let tsfresh find out the column_value, if it was the only remaining column. I added this again.
Doing this, I tried to increase the test coverage a bit.

Feel free to drop a comment, if you want!

@github-actions
Copy link

You have style errors. See them below.

./tsfresh/feature_extraction/data.py:149:1: E302 expected 2 blank lines, found 1

@coveralls
Copy link

coveralls commented Jun 24, 2020

Coverage Status

Coverage increased (+0.1%) to 98.093% when pulling 122adaa on feature/some-small-refactoring into e867d74 on master.

@hoesler
Copy link
Contributor

hoesler commented Jun 24, 2020

Ni @nils-braun.

Apparently I missed that. Thanks!

But why did you change the value_columns arg of WideTsData from list[str] to str? If I just look at the class in isolation, it doesn't feel right to force users to select all or only one column. The current API is reflected in to_tsdata, so I would move more restricted selection logic to that function. Users can then circumvent that logic, by passing WideTsData directly instead of data frames.

@nils-braun
Copy link
Collaborator Author

I though that it makes the logic of the None-column_value easier, but you are right, in the end both is fine.
I will change it back to your original propsal.

@nils-braun
Copy link
Collaborator Author

@hoesler Are you fine with merging this?

@nils-braun nils-braun changed the base branch from master to main July 4, 2020 16:14
@nils-braun
Copy link
Collaborator Author

I assume you are :-) If not, just comment and we can change it back!

@github-actions
Copy link

Result of Benchmark Tests

Benchmark Min Max Mean Mean on Repo HEAD
tests/benchmark.py::test_benchmark_small_data 15.65 15.81 15.73 +- 0.07 12.46 +- 0.05
tests/benchmark.py::test_benchmark_large_data 6.27 6.50 6.40 +- 0.10 6.00 +- 0.02
tests/benchmark.py::test_benchmark_with_selection 8.69 8.87 8.75 +- 0.08 8.13 +- 0.05

@@ -164,17 +172,16 @@ def __init__(self, df, column_id, column_sort=None, value_columns=None):
:type column_sort: str|None

:param value_columns: list of column names to treat as time series values.
If `None`, all columns except `column_id` and `column_sort` will be used.
If `None` or empty, all columns except `column_id` and `column_sort` will be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, you don't actually handle the empty case, do you?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Thanks

if column_kind is None:
raise ValueError("A value for column_kind needs to be supplied")

if column_value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be added to the docs and the column_value arg should be optional.

if column_value is None:
possible_value_columns = _get_value_columns(df, column_id, column_sort, column_kind)
if len(possible_value_columns) != 1:
raise ValueError("Could not guess the value column! Please hand it to the function as an argument.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the message could be more specific and also include possible_value_columns.

@@ -291,7 +307,7 @@ def __len__(self):
return sum(grouped_df.ngroups for grouped_df in self.grouped_dict.values())


def to_tsdata(df, column_id=None, column_kind=None, column_value=None, column_sort=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made column_id optional, because you can pass a TsData object

@hoesler
Copy link
Contributor

hoesler commented Jul 16, 2020

Damn. I started a review a long while ago, but forgot to actually submit it. Sorry! Here are some small comments I had.

@nils-braun
Copy link
Collaborator Author

Ok, thanks! I will work on the issues and open a new PR!

@nils-braun nils-braun deleted the feature/some-small-refactoring branch October 17, 2020 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants