Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classifier base class #1517

Merged
merged 37 commits into from Oct 19, 2021
Merged

Classifier base class #1517

merged 37 commits into from Oct 19, 2021

Conversation

TonyBagnall
Copy link
Contributor

@TonyBagnall TonyBagnall commented Oct 13, 2021

What does this implement/fix? Explain your changes.

Some changes to the base class for classifiers and the tags

  1. Remove classification from the tags "X-y-must-have-same-index" and "enforce_index_type". They are not used by any classifiers
  2. Add the capability tags to the base class. think capabilities should be checked against data in the check_X functionality.
    some not dependent on the data could be move down, but will be defined so may as well have them in the base imo
  3. Changes in comments. Note it claims to allow 2D arrays, which is not true, but I left them because it will soon be true.
  4. remove unnecessary argument to get_tag
    coerce_to_numpy = self.get_tag("coerce-X-to-numpy")
  5. removed the classifier_list
    its not complete and probably should not be here, I imagine I put it there.
  6. Remove default _predict behaviour, add default _predict_proba behaviour. The enhancement template states predict must be implemented. I think that is correct

note there will be a few more refinements, but this will allow us to blast through the fit/_fit refactor

@TonyBagnall TonyBagnall added the module:classification classification module: time series classification label Oct 13, 2021
@TonyBagnall TonyBagnall marked this pull request as ready for review October 16, 2021 22:17
@fkiraly
Copy link
Collaborator

fkiraly commented Oct 17, 2021

quick question, while you're at it:

do you want to add the planned support for the various Panel formats?

All it would take is adding one or two convert from datatypes to the functions.

@TonyBagnall
Copy link
Contributor Author

quick question, while you're at it:

do you want to add the planned support for the various Panel formats?

All it would take is adding one or two convert from datatypes to the functions.

Not on this PR, I want to finish refactor to fit first then look at checking and transforming data, one thing at a time

Copy link
Contributor

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fix to the default predict_proba and introduced a threading tag.

Happy with this, next step is to finish the refactor.

sktime/base/_base.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • docstrings of BaseObject.get_tag are changed from sth that's correct imo to sth that's incorrect? This is a very small change, but a blocker.
  • I think the default behaviour in _predict should remain? Perhaps with a check whether an infinite look occurs (in the public part of the function) that raises an error.
  • Happy with the changes to the BaseClassifier, but I think we need to add a section to the docstring that explains the state change caused by fit etc. See BaseForecaster. Good to have but not a blocker.

@TonyBagnall
Copy link
Contributor Author

  • docstrings of BaseObject.get_tag are changed from sth that's correct imo to sth that's incorrect? This is a very small change, but a blocker.

yes, fixed, my misunderstanding (see above)

  • I think the default behaviour in _predict should remain? Perhaps with a check whether an infinite look occurs (in the public part of the function) that raises an error.

I dont. Your extension guidelines say that _predict must be implemented. I see no value in this default behaviour, it will allow the user to run code which should raise an error, if, say, they implement _predicted instead of _predict

  • Happy with the changes to the BaseClassifier, but I think we need to add a section to the docstring that explains the state change caused by fit etc. See BaseForecaster. Good to have but not a blocker.

yes I agree, on the next PR for base when I change the checking/recasting/capabilities, I want to push through the fit refactor first.

Use markus's github instead of full name, added matthew.
@TonyBagnall TonyBagnall merged commit 005152a into main Oct 19, 2021
@TonyBagnall TonyBagnall deleted the classifier_base_class branch October 19, 2021 16:18
@fkiraly
Copy link
Collaborator

fkiraly commented Oct 21, 2021

I think the default behaviour in _predict should remain? Perhaps with a check whether an infinite look occurs (in the public part of the function) that raises an error.

@TonyBagnall, generally please ask for re-review if changes have been requested and don't merge.

The above has not been addressed, it's an interface breaking change, so it should not have been merged.

It breaks the interface for anyone who has locally implemented a classifier and has only implemented _predict_proba, you are breaking the extension contract.

I don't think we should remove the "create _predict from _predict_proba" logic since this is standard in scikit-learn.

But in-principle I'm fine with this as long as we deprecate properly.

@TonyBagnall
Copy link
Contributor Author

your own extension template stated that _predict must be implemented, and did not even mention _predict_proba. I recently had a student who implemented _predict as instructed on the template, but it failed when calling _predict_proba. Feel free to revise, I will review, but please leave the default predic_proba in there. This is how I think it should be

@TonyBagnall
Copy link
Contributor Author

I'm ok with a default predict, but you then need to change the extension template to make it clear you need one of them. I'm not that concerned at the moment about this, so will leave it to you.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 21, 2021

I recently had a student who implemented _predict as instructed on the template, but it failed when calling _predict_proba. Feel free to revise, I will review, but please leave the default predict_proba in there. This is how I think it should be

Three things here:

  • the extension template states the "implementer"/"extender" contract, the base class should state the "user" contract (cf BaseForecaster).
  • I added the "implementer" contract in the extension template because there wasn't one and people were extending. That is, because there wasn't one, not because I want it to be a certain way.
  • I didn't consider predict_proba to be a mandatory part of the "user" contract because not all classifiers can predict probabilities. But there is also no "user" contract written down.

I'd be fine if the "user" contract guarantees predict_proba, but in that case I'd strongly recommend you write down the user contract as a module docstring for BaseClassifier, just like for BaseForecaster.

I'm in-principle fine with any "should" state that you put out there, as long as the contracts are clearly stated and consistent; and as long as interface changes are properly deprecated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:classification classification module: time series classification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants