Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] load_UCR_UEA_dataset checks for existence of files rather than just directories #2899

Merged
merged 5 commits into from Jun 30, 2022

Conversation

TonyBagnall
Copy link
Contributor

What does this implement/fix? Explain your changes.

This PR does three things to code that is designed to work with the specific data format and file name structure used in time series classification. This will not impact generic data loading, it is for the specific use case of using load_UCR_UEA_dataset to load .ts files.

  1. Changes the example usage for load_UCR_UEA_dataset so that it loads a baked in dataset (ArrowHead) rather than download one from timeseriesclassification.com. This is to remove the danger of problems with timeseriesclassification impacting the testing of examples
  2. Private function _list_available_datasets now checks that the correct <problem_name>.ts files exist, rather than just check that the <problem_name> directory exists. This fixes a bug where an empty directory <problem_name> would cause load_UCR_UEA_dataset to think the data was stored locally, and thus crash.
  3. Adds a unit test for load_UCR_UEA_dataset (although only tests on baked in data sets)

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

load_UCR_UEA_dataset

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • [ x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

and check for file existence
@TonyBagnall TonyBagnall added module:classification classification module: time series classification module:datasets&loaders data sets and data loaders labels Jun 29, 2022
@lmmentel lmmentel added the bugfix Fixes a known bug or removes unintended behavior label Jun 29, 2022
@lmmentel
Copy link
Contributor

Thanks, I started seeing the related error also in other PRs.

@lmmentel lmmentel self-requested a review June 29, 2022 13:55
@TonyBagnall TonyBagnall merged commit 9363cb8 into main Jun 30, 2022
@TonyBagnall TonyBagnall deleted the uea-ucr-download branch June 30, 2022 17:14
tobiasweede pushed a commit to tobiasweede/sktime that referenced this pull request Jul 5, 2022
…just directories (sktime#2899)

* unit test for load_UCR_UEA_dataset
and check for file existence

* make FreshPRINCE default classifier in classification_experiments.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes a known bug or removes unintended behavior module:classification classification module: time series classification module:datasets&loaders data sets and data loaders
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants