Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Adding min_periods in WindowSummarizer to avoid different behaviour in the future #4052

Merged
merged 7 commits into from Jan 11, 2023

Conversation

arnavrneo
Copy link
Contributor

Reference Issues/PRs

Fixes #4050

What does this implement/fix? Explain your changes.

In the WindowSummarizer class, min_periods is explicitly defined in the pandas rolling function call with the value equal to the window_length.

Does your contribution introduce a new dependency? If yes, which one?

No.

@arnavrneo
Copy link
Contributor Author

@KishManani @miraep8 Kindly look into and review this PR.

@aiwalter
Copy link
Contributor

aiwalter commented Jan 3, 2023

@arnavrneo do you mind to add a test for this?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly fkiraly added maintenance Continuous integration, unit testing & package distribution module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing labels Jan 3, 2023
@KishManani
Copy link
Contributor

KishManani commented Jan 3, 2023

not sure whether I understand the deprec message correctly, maybe I am confused - should the min_periods not be set to 0 rathre than window_length?

On pandas v1.5.2 the default behaviour for something like df.rolling(window=3).mean() is to do df.rolling(window=3, min_periods=3).mean(). Changing to df.rolling(window=3, min_periods=0).mean() would give you a different behaviour.

import pandas as pd
import numpy as np

y = pd.DataFrame(data=np.random.rand(10))
y.rolling(window=3).mean()
	0
0	
1	
2	0.7207749584147084
3	0.5946349404144744
4	0.34069940037566454
5	0.30547069208136307
6	0.5604781957810597
7	0.7837284354272988
8	0.6594005203859353
9	0.44447740298324145

y.rolling(window=3, min_periods=3).mean()
	0
0	
1	
2	0.7207749584147084
3	0.5946349404144744
4	0.34069940037566454
5	0.30547069208136307
6	0.5604781957810597
7	0.7837284354272988
8	0.6594005203859353
9	0.44447740298324145

y.rolling(window=3, min_periods=0).mean()
	0
0	0.46538184566621255
1	0.7205428982372711
2	0.7207749584147084
3	0.5946349404144744
4	0.34069940037566454
5	0.30547069208136307
6	0.5604781957810597
7	0.7837284354272988
8	0.6594005203859353
9	0.44447740298324145

@arnavrneo
Copy link
Contributor Author

@aiwalter @fkiraly Okay, I'll again run the tests and will update the PR asap.

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 4, 2023

Thanks, @arnavrneo!

Kindly note that linting is failing, see the guide here on how to format your code: https://www.sktime.org/en/stable/developer_guide/coding_standards.html

@arnavrneo
Copy link
Contributor Author

@fkiraly On my end, tests have formatted the code. Can you please review it again?

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 10, 2023

still not, your setup might be incomplete?
You can also see it on the CI, if you click here:
image

Due to the release being soon, I will just go through it myself - might be worth if you check what went wrong with your linting setup though.

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 10, 2023

(I think I fixed the linting issues)

@arnavrneo
Copy link
Contributor Author

@fkiraly Okay sir, I will reset the linting check setup. Thank you for going through the linting yourself.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linting issues are fixed now, thanks for your contribution!

@fkiraly fkiraly merged commit 6ac90c2 into sktime:main Jan 11, 2023
@arnavrneo arnavrneo deleted the window-summarizer-min-periods branch January 11, 2023 14:15
klam-data pushed a commit to CodeSmithDSMLProjects/sktime that referenced this pull request Jan 18, 2023
…ehaviour in the future (sktime#4052)

Fixes sktime#4050 as follows:

In the `WindowSummarizer` class, `min_periods` in the `pandas` `rolling` function call is set explicitly, to a value equal to `window_length`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Continuous integration, unit testing & package distribution module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Explicitly specify min_periods in WindowSummarizer to avoid different behaviour in the future
4 participants