Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] Hotfix tag scitype:y typo #1449

Merged
merged 11 commits into from Sep 27, 2021
Merged

[MNT] Hotfix tag scitype:y typo #1449

merged 11 commits into from Sep 27, 2021

Conversation

aiwalter
Copy link
Contributor

@aiwalter aiwalter commented Sep 24, 2021

Fixed typo y:scitype, renamed to scitype:y

Reference Issues/PRs

related #1141 #1083 #1376

@aiwalter aiwalter added the maintenance Continuous integration, unit testing & package distribution label Sep 24, 2021
@aiwalter aiwalter changed the title Hotfix tag scitype:y typo [MNT] Hotfix tag scitype:y typo Sep 24, 2021
@fkiraly
Copy link
Collaborator

fkiraly commented Sep 24, 2021

@aiwalter, ouch, this is very painful - no one noticed this?

Why were the tests not failing with the wrong tag name?

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 24, 2021

This makes me think: should get_tag throw an error if it tries to retrieve a non-existing tag, instead of None?

@aiwalter
Copy link
Contributor Author

aiwalter commented Sep 24, 2021

I think it can still return None, but only if the tag exists at all. If I give myWrongTag as a tag, it should throw an error in my opinion

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 24, 2021

Yes, agree with the user expectation and default behaviour, @aiwalter.

When I implemented this, I was mimicking the previous default behaviour which was using dict.get, which leads to unexpected results if the tag does not exist (e.g., due to typo), as you say.

Here's my alternative suggestion:
#1450

@@ -29,6 +29,8 @@ def _get_n_columns(tag):
n_columns_list = [2]
elif tag == "both":
n_columns_list = [1, 2]
else:
n_columns_list = [1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think else it should raise an error, since this is an unexpected input!

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 24, 2021

@aiwalter, @mloning, I have identified the (or a) source of the failing tests.

The reason is that clone does not clone dynamic tags properly.
This results in some copies having the "univariate" tag value set that the estimator inherits by default, which makes the grid search fail as soon as it encounters a component that is set to "univariate" and not the expected "both" tag value.

There are multiple potential solutions for this:

  • ensure that dynamic tags are set in _fit and not in __init__ - this may be confusing to extenders. In this particular case, you'd have to move lines 456-469 in _tune from __init__ to the start of BaseGridSearch._fit.
  • write our own, sktime version of clone which adds the line of copying over the _tags_dynamic dict. This is prototyped in PR sktime clean clone method #1454
  • introduce a new interface point for implementers, a init_dynamic_tags or similar, which is called at the start of fit.
  • introduce a new interface point for intialization of tags and components, init_tags_components, which initializes dynamic tags and component estimators, and is called at the end of the base __init__.

I would be in favour of no.2 and would be against no.1. I would prefer no.4 above no.3.

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 25, 2021

@aiwalter, I found another issue after fixing the above one.

Namely, it comes from evaluate, which is called in grid search but only lets pd.Series through.
However, in the multivariate case it may be called with a pd.DataFrame.

So, it means, currently the grid search algorithms are not capable of dealing with multivariate data, and the tag should always say "univariate".

I've created a PR containing this temporary fix, #1455.

@aiwalter
Copy link
Contributor Author

I think its contained also in my PR #1376

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 25, 2021

Yes, @aiwalter, but I'd say, let's fix the issues first before we add more stuff.
Proposed merge sequence is #1436, #1454, #1455, #1456, #1449, #1450, also see discussion on slack.
There are three bugs here that we didn't detect since the tests broke.

@aiwalter
Copy link
Contributor Author

Thx @fkiraly , will look into it tomorrow

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 25, 2021

Hm, looking into #1376, it appears you already discovered two, the issue with the cv splitter and the univariate tag.

Yes, I'd say, let's fix these separately and ensure we have a "clean slate" with tests fixed before we add more content - just to be sure we don't introduce any follow-on bugs.

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 25, 2021

For instance, I'm worried that your #1376 doesn't fix the NaiveForecaster issue of the missing update, but the tests still seem to pass?

@aiwalter
Copy link
Contributor Author

lets get this merged?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should fix all the test issues now.

Looks like we got it fixed!
Great, thanks!

@fkiraly fkiraly merged commit 1f7a7b0 into sktime:main Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Continuous integration, unit testing & package distribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants