Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] New transformation based pipeline classifiers #1721

Merged
merged 299 commits into from Dec 19, 2021
Merged

Conversation

MatthewMiddlehurst
Copy link
Contributor

Implements new feature based pipelines for summary statistics, random intervals (with a new transformer to generate them) and a TSFresh based pipeline from an upcoming workshop paper by me and @TonyBagnall. We can probably close #1063 after this unless there are any objections.

There is quite a lot going on here, but the majority of new lines and file changes are just tests.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

The new estimators, they are relatively simple and most are basic transformer > classifier pipelines.
Files of interest are:

sktime/classification/feature_based/_fresh_prince.py
sktime/classification/feature_based/_random_interval_classifier.py
sktime/classification/feature_based/_summary_classifier.py
sktime/transformations/panel/random_intervals.py

@TonyBagnall
Copy link
Contributor

Question: why are we introducing n classifiers following the pattern [transformer][classifier], but not a generic pipeline object that implements this pattern?

Makes sense to have shorthands, but it would be more robust if the shorthand is a shorthand for the composable pipeline?

we already have a pipeline, or do you mean a wrapper for a pipeline where transformer and classifier are specified in constructor? Something like a PipelineClassifier? Seems a bit superfluous and I thought you liked using pipelines directly, but I see no reason why not have one.

I agree its best to not have commented out code. Maybe put it all in a directory in contrib called regenerate tests or somesuch?

@MatthewMiddlehurst
Copy link
Contributor Author

RE: Commented out sections for tests, that is a bigger issue than this PR (same pattern follows for most tests), but I don't mind doing it in a separate PR with all the others.

TonyBagnall
TonyBagnall previously approved these changes Dec 8, 2021
@TonyBagnall
Copy link
Contributor

RE: Commented out sections for tests, that is a bigger issue than this PR (same pattern follows for most tests), but I don't mind doing it in a separate PR with all the others.

lets discuss at the dev meeting, I'll put together an agenda

TonyBagnall
TonyBagnall previously approved these changes Dec 10, 2021
@TonyBagnall
Copy link
Contributor

@fkiraly could you rereview in the light of comments above please? ta

@TonyBagnall
Copy link
Contributor

we agreed at the dev meeting to centralise the commented out code in a separate PR. @fkiraly does this answer your issues?

@fkiraly
Copy link
Collaborator

fkiraly commented Dec 17, 2021

@fkiraly does this answer your issues?

Yes, happy if there's a separate issue that says what needs to be done.

What did you decide on the boilerplate/DRY issue?

@MatthewMiddlehurst
Copy link
Contributor Author

I still don't really see an issue with the current set-up, issues such as SummaryClassifier multivariate output and extra functionality like train estimates in FreshPrince would not lend well to a generic pipeline.

Im not against having a generic pipeline wrapper for users. Using such a wrapper in the current pipelines would require a bit of unwrapping for the above cases, however.

chrisholder
chrisholder previously approved these changes Dec 18, 2021
sktime/classification/feature_based/_fresh_prince.py Outdated Show resolved Hide resolved
@TonyBagnall
Copy link
Contributor

so to summarise, we thought a wrapper containing a configurable pipeline was a layer of wrapping too far, and it is in fact preferable to encourage users to use a Pipeline, since sklearn users are familiar with them. There is not that much code repetition and not all classifiers could use the base class.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questions answered, looks fine to me now.

Can you please remove commented out code before you merge?

@TonyBagnall
Copy link
Contributor

questions answered, looks fine to me now.

Can you please remove commented out code before you merge?

this is deferred to here #1765

@TonyBagnall TonyBagnall merged commit 8fee732 into main Dec 19, 2021
@TonyBagnall TonyBagnall deleted the summary-classifier branch December 19, 2021 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:classification classification module: time series classification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] time series classification: shorthands for common sktime pipelines including TsfreshClassifier
5 participants