
[ENH] Benchmarking interface v2 based on kotsu package #2977

Merged: 15 commits merged into sktime:main on Feb 5, 2023

Conversation

@alex-hh (Contributor) commented on Jul 11, 2022:

Reference Issues/PRs

See previous discussions #2884 and #2805, which include earlier examples of integrating kotsu into sktime by @TNTran92 and @DBCerigo.
The current PR is intended to provide an example of how benchmarking could be implemented by wrapping kotsu in such a way that users only need to deal with sktime objects.

What does this implement/fix? Explain your changes.

This PR provides a candidate benchmarking interface for sktime estimators/datasets by wrapping kotsu.
It currently supports only forecasters, but is designed to be flexible.
It allows users to create their own benchmarks by adding sets of estimators and tasks.

Does your contribution introduce a new dependency? If yes, which one?

kotsu

What should a reviewer concentrate their feedback on?

  • To avoid sktime users interacting directly with kotsu, we interface kotsu via Benchmark objects designed to handle sktime objects (see the usage sketch after this list)
  • Currently the only supported output is a DataFrame / CSV file; however, it might be desirable to provide simple ways to save models, predictions, etc., by passing appropriate arguments to kotsu's benchmarking
  • Only a forecasting benchmark and a base benchmark are currently implemented. The idea is that other types of modelling problem could extend the base benchmark in similar ways, but it wasn't clear to me how best to do this and there may be other suggestions!
  • No tests are implemented currently
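
A rough usage sketch of the intended interface, pieced together from the description above and the example notebook. Only benchmark.run("./forecasting_results.csv") is taken from the PR; the module path, class, method, and argument names below are assumptions and may differ from the merged code:

```python
from sktime.benchmarking.forecasting import ForecastingBenchmark  # assumed path
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanSquaredPercentageError

benchmark = ForecastingBenchmark()

# register an sktime estimator to benchmark
benchmark.add_estimator(NaiveForecaster(strategy="mean", sp=12))

# register a task: dataset loader + resampling scheme + scorers
benchmark.add_task(
    dataset_loader=load_airline,
    cv_splitter=ExpandingWindowSplitter(initial_window=24, step_length=12, fh=12),
    scorers=[MeanSquaredPercentageError()],
)

# run all estimator/task combinations and persist results to a CSV file
results_df = benchmark.run("./forecasting_results.csv")
```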

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.


@alex-hh changed the title Reimplement benchmarking with kotsu v2 → [ENH] Reimplement benchmarking with kotsu v2 on Jul 11, 2022
(Review comment on the example notebook, at the line: results_df = benchmark.run("./forecasting_results.csv"))
@TNTran92 (Contributor) commented on Jul 12, 2022:
In my run, forecasting_results.csv shows up in the root directory of the repo. Is this by design? If yes, I think it's better in the working directory.

Though I can easily pd.read_csv this file, is there a reason to write to an output file?

Contributor reply:

> In my run, forecasting_results.csv shows up in the root directory of the repo. Is this by design? If yes, I think it's better in the working directory.

I don't think much thought has gone into where it should be saved yet; the example notebook is intended as an example, and the hope is that users would configure where they want to save their results.

> Though I can easily pd.read_csv this file, is there a reason to write to an output file?

When writing kotsu we generally wanted results to be persisted: you run some experiments one day, and you want those results to still be there when you come back to the project a week later. Leaving results in memory isn't a reliable way of achieving that.

@DBCerigo (Contributor) commented:

(Slides from dev-days-2022 talk on 12-7-22 on "sktime - easy benchmarking", which is relevant to this PR)

@fkiraly self-assigned this on Jul 12, 2022
@ltsaprounis (Contributor)

"""
from typing import Callable, Optional, Type, Union

import kotsu
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving the import inside the constructor and inside run.

To complete soft dependency encapsulation, kindly add a _check_soft_dependencies call at the start of __init__.
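
A minimal sketch of the suggested isolation pattern, assuming sktime's _check_soft_dependencies helper (import path and exact signature may differ by sktime version); the class body is illustrative only:

```python
from sktime.utils.validation._dependencies import _check_soft_dependencies


class BaseBenchmark:
    """Base class wrapping kotsu benchmarking (illustrative sketch)."""

    def __init__(self):
        # fail early with an informative error if the soft dependency is missing
        _check_soft_dependencies("kotsu")
        # import kotsu lazily, inside the constructor, so importing this module
        # does not require kotsu to be installed
        import kotsu

        self._kotsu = kotsu
```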

Contributor reply:

Done in 4e18ac4

@fkiraly (Collaborator) left a review:

Looks great as a minimal example!

I have suggestions above; the blocking one is about isolating the soft dependency.
The other would be nicer from a pattern/design perspective, imo.

@fkiraly (Collaborator) left a review:

I also think it is a problem that estimators and their parameters are separated.

I'd rather have objects passed than classes and their parameters separately.
Unfitted objects carry the same information - they are the class plus their kwargs (can be retrieved by get_params)

See discussion in #2804
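
To illustrate the point with a standard sktime forecaster (example chosen for illustration, not taken from the PR):

```python
from sktime.forecasting.naive import NaiveForecaster

# an unfitted object already bundles the class and its constructor kwargs
forecaster = NaiveForecaster(strategy="mean", sp=12)

print(type(forecaster).__name__)  # NaiveForecaster
print(forecaster.get_params())    # {'sp': 12, 'strategy': 'mean', 'window_length': None}

# so passing `forecaster` carries the same information as passing the pair
# (NaiveForecaster, {"strategy": "mean", "sp": 12}) separately
```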

@TNTran92 (Contributor) commented on Jul 14, 2022:

Looks like at runtime, the run method turns model specs into objects by calling model = model_spec.make().
This method returns an entity of the form factory(**_kwargs).
Do you think we can pass objects directly to run instead of having to go through a spec?
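
A simplified sketch of the spec pattern described above; this is only the shape of it, not kotsu's actual implementation:

```python
class ModelSpec:
    """Stores a factory plus kwargs; make() instantiates the model lazily."""

    def __init__(self, factory, kwargs=None):
        self.factory = factory
        self._kwargs = kwargs or {}

    def make(self):
        # equivalent to factory(**_kwargs), as noted above
        return self.factory(**self._kwargs)
```

Passing an already-constructed object would skip this factory/kwargs indirection, which is essentially what is being suggested.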

@DBCerigo (Contributor) commented on Jul 24, 2022:

Now ready for re-review @fkiraly.

Some replies were made directly on the review comments above, and another in #2804 (comment).

PR checklist

  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@alex-hh marked this pull request as ready for review on July 24, 2022 13:04
@alex-hh requested a review from aiwalter as a code owner on July 24, 2022 13:04
@DBCerigo force-pushed the kotsu-benchmarking-v2 branch 3 times, most recently from b09507d to e094aec, on July 27, 2022 17:49
@fkiraly (Collaborator) left a review:

I suppose this is good, as a design study!

Blocking change request:

  • please add docstrings to public methods and user facing classes

Questions:

  • how would carrying out tests integrate with this? You need access to individual predictions for that.
  • how would storing partial results on the HD or via a data interface class work?
  • I do not fully understand what type or signature you require for dataset_loader; could you elaborate?

@DBCerigo (Contributor) commented on Aug 3, 2022:

> Blocking change request:
> • please add docstrings to public methods and user facing classes

Just to confirm, you'd like the docstrings extended to contain Parameters sections etc., right?

> Questions:
> • how would carrying out tests integrate with this? You need access to individual predictions for that.

What do you mean by "tests"?
If you mean statistical tests and the like: anything that is for a single estimator can be achieved using the system in this PR. Anything that is cross/multi-estimator will require storing the predictions, which we'll do in a subsequent PR.
If you mean unit tests: such tests are present in this PR.

> • how would storing partial results on the HD or via a data interface class work?

Can you clarify what you mean by "partial results"?

> • I do not fully understand what type or signature you require for `dataset_loader`, could you elaborate?

This implementation isn't intending to add a new object or interface for a dataset_loader; it follows what is already defined in sktime.datasets.
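
For concreteness, the loaders in sktime.datasets are plain callables that return the data when called with no arguments, e.g.:

```python
from sktime.datasets import load_airline

dataset_loader = load_airline  # a zero-argument callable returning the series
y = dataset_loader()
print(y.shape)  # (144,)
```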

@TNTran92 mentioned this pull request on Aug 3, 2022
@fkiraly (Collaborator) commented on Aug 7, 2022:

> Just to confirm, you'd like the docstrings extended to contain Parameters sections etc., right?

Yes, Parameters and, if appropriate, Returns. These end up in the public documentation.
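
A short numpydoc-style sketch of what such a docstring could look like (the method name, parameter names, and descriptions are illustrative, not taken from the PR):

```python
def add_task(self, dataset_loader, cv_splitter, scorers):
    """Register a benchmark task.

    Parameters
    ----------
    dataset_loader : Callable
        A loader from sktime.datasets, called with no arguments to obtain the data.
    cv_splitter : BaseSplitter
        Re-sampling scheme defining the train/test splits to evaluate on.
    scorers : list of metric objects
        Performance metrics computed for each estimator/task combination.

    Returns
    -------
    None
    """
```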

> how would carrying out tests integrate with this? You need access to individual predictions for that.

> What do you mean by "tests"?

Apologies, I meant statistical hypothesis tests comparing the performance of estimators. Not a blocking comment, anyway.

> Can you clarify what you mean by "partial results"?

For instance, predictions of m out of n estimators involved in the experiment - similar to the old framework.

> This implementation isn't intending to add a new object or interface for a dataset_loader, it was intending to follow what was defined in sktime.datasets.

Ah, I see - docstrings that detail the assumptions would be helpful, then.

@DBCerigo (Contributor) commented:

@fkiraly this is ready for (possibly final) re-review now. Docstrings all updated in 3e28d29. Nice one.

@TNTran92 (Contributor) commented:

Great work. Thanks a lot @DBCerigo

The classification factory is also up (#3278); I'm still working on its unit tests.

@fkiraly (Collaborator) commented on Jan 12, 2023:

🎉 welcome back, @DBCerigo 🎉

@DBCerigo (Contributor) commented:

@fkiraly good to be back 😸 This is now ready for re-review. The change since the last review is c25de09, as discussed on Slack.

@fkiraly (Collaborator) commented on Jan 30, 2023:

Great! Good to have you back.

There's one point to address: dependency isolation.

If kotsu becomes a soft dependency, dependent functionality should be isolated. To see examples of how this is done, search for:

  • pytest.mark.skipif
  • _check_soft_dependencies

in the code base. Also see
https://www.sktime.org/en/stable/developer_guide/dependencies.html
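
A sketch of the test-skip pattern referred to above, assuming the severity="none" mode of _check_soft_dependencies, which returns a bool instead of raising (the exact helper signature may vary by sktime version):

```python
import pytest

from sktime.utils.validation._dependencies import _check_soft_dependencies


@pytest.mark.skipif(
    not _check_soft_dependencies("kotsu", severity="none"),
    reason="skip test if required soft dependency kotsu is not available",
)
def test_forecasting_benchmark_runs():
    # test body that depends on kotsu goes here
    ...
```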

@DBCerigo (Contributor) commented on Feb 2, 2023:

@fkiraly thanks for the pointers for the test skips - all implemented now.

Let's wait until we have 2+ subclasses of the base class before doing any more design changes, as having multiple examples will make it plainer to see and verify the usefulness of any further changes. So I would suggest merging this PR as is, then implementing and merging 1+ more benchmarks (I think @TNTran92 has a PR for classifiers already, though it likely needs some re-tweaking now), and then reviewing the implementation as a whole.

@fkiraly (Collaborator) left a review:

Massive PR, extremely useful functionality!

Comments are either addressed or make sense to address separately, e.g., minor reworks to the interface.

fkiraly added a commit that referenced this pull request on Feb 5, 2023:

This fixes some merge conflicts and formatting in `.all-contributorsrc`.

Includes contributors from #2977 to remove conflicts with main.
fkiraly added a commit that referenced this pull request on Feb 5, 2023:

#2977 cannot be pushed to due to branch lock, but has conflicts on `.all-contributorsrc`. Therefore we will carry out the merge as follows:

1. overwrite main with `.all-contributorsrc` from #2977
2. merge #2977
3. revert 1
@fkiraly merged commit fcf8a4e into sktime:main on Feb 5, 2023
@fkiraly changed the title [ENH] Reimplement benchmarking with kotsu v2 → [ENH] Benchmarking interface v2 based on kotsu package on Feb 5, 2023
fkiraly added a commit that referenced this pull request on Feb 5, 2023:

Reverts #4206; see there for an explanation of the strategy to resolve the merge conflict with #2977.
Labels: module:metrics&benchmarking (metrics and benchmarking modules)
6 participants