
ClearML-Data: Could not load dataset state #1123

Open
alex-sage opened this issue Oct 2, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@alex-sage

Describe the bug

I keep running into an issue where I set up a dataset and ClearML ends up no longer being able to read the dataset state.
I created a new dataset using the basic CLI command and started adding data. Suddenly the CLI would only give me the following error:

Error: Could not load Dataset id=2d38ac74b2ad4a4495207e01b9dc277a state

This has now happened 3 times already.
I can no longer delete the datasets and start over, as the delete command gives me the same error. When I try to delete the dataset through the web UI, I get this error:

Error 406 : Project has associated non-empty datasets (please delete all the dataset versions or use force=true): id=cdfa028c319a4dbcb593382c4e1de335

When trying to remove the dataset with the Python API using force=True as suggested, I get this:

2023-10-02 17:16:03,983 - clearml.Task - ERROR - Action failed <400/101: tasks.get_by_id/v1.0 (Invalid task id: id=cdfa028c319a4dbcb593382c4e1de335, company=9be5804ead8d45beac4ba3b9a3936117)> (task=cdfa028c319a4dbcb593382c4e1de335)
2023-10-02 17:16:03,983 - clearml.Task - ERROR - Failed reloading task cdfa028c319a4dbcb593382c4e1de335
2023-10-02 17:16:04,362 - clearml.Task - ERROR - Action failed <400/101: tasks.get_by_id/v1.0 (Invalid task id: id=cdfa028c319a4dbcb593382c4e1de335, company=9be5804ead8d45beac4ba3b9a3936117)> (task=cdfa028c319a4dbcb593382c4e1de335)
2023-10-02 17:16:04,362 - clearml.Task - ERROR - Failed reloading task cdfa028c319a4dbcb593382c4e1de335
2023-10-02 17:16:04,362 - clearml - WARNING - Could not get dataset with ID cdfa028c319a4dbcb593382c4e1de335: Task ID "cdfa028c319a4dbcb593382c4e1de335" could not be found

Can anyone tell me how to get out of this state?

To reproduce

I could try again and record the exact commands I used, but since this has now happened to me 3 times, I'm not sure they matter all that much...

Perhaps this helps: this is the stack trace I get when calling Dataset.get() with the ID of one of the affected datasets:

In [3]: dataset = Dataset.get(dataset_id='2d38ac74b2ad4a4495207e01b9dc277a')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 dataset = Dataset.get(dataset_id='2d38ac74b2ad4a4495207e01b9dc277a')

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1778, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, include_archived, auto_create, writable_copy, dataset_version, alias, overridable, shallow_search, **kwargs)
   1774     instance = Dataset.create(
   1775         dataset_name=dataset_name, dataset_project=dataset_project, dataset_tags=dataset_tags
   1776     )
   1777     return finish_dataset_get(instance, instance._id)
-> 1778 instance = get_instance(dataset_id)
   1779 # Now we have the requested dataset, but if we want a mutable copy instead, we create a new dataset with the
   1780 # current one as its parent. So one can add files to it and finalize as a new version.
   1781 if writable_copy:

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1690, in Dataset.get.<locals>.get_instance(dataset_id_)
   1682     local_state_file = StorageManager.get_local_copy(
   1683         remote_url=task.artifacts[cls.__state_entry_name].url,
   1684         cache_context=cls.__cache_context,
   (...)
   1687         force_download=force_download,
   1688     )
   1689     if not local_state_file:
-> 1690         raise ValueError("Could not load Dataset id={} state".format(task.id))
   1691 else:
   1692     # we could not find the serialized state, start empty
   1693     local_state_file = {}

ValueError: Could not load Dataset id=2d38ac74b2ad4a4495207e01b9dc277a state

In [4]: dataset = Dataset.get(dataset_id='1cdc8407d0494adf822d282f7ad45739')
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 dataset = Dataset.get(dataset_id='1cdc8407d0494adf822d282f7ad45739')

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1778, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, include_archived, auto_create, writable_copy, dataset_version, alias, overridable, shallow_search, **kwargs)
   1774     instance = Dataset.create(
   1775         dataset_name=dataset_name, dataset_project=dataset_project, dataset_tags=dataset_tags
   1776     )
   1777     return finish_dataset_get(instance, instance._id)
-> 1778 instance = get_instance(dataset_id)
   1779 # Now we have the requested dataset, but if we want a mutable copy instead, we create a new dataset with the
   1780 # current one as its parent. So one can add files to it and finalize as a new version.
   1781 if writable_copy:

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1694, in Dataset.get.<locals>.get_instance(dataset_id_)
   1691 else:
   1692     # we could not find the serialized state, start empty
   1693     local_state_file = {}
-> 1694 instance_ = cls._deserialize(local_state_file, task)
   1695 # remove the artifact, just in case
   1696 if force_download and local_state_file:

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:2619, in Dataset._deserialize(cls, stored_state, task)
   2617     stored_state_file = Path(stored_state).as_posix()
   2618     with open(stored_state_file, 'rt') as f:
-> 2619         stored_state = json.load(f)
   2621 instance = cls(_private=cls.__private_magic, task=task)
   2622 # assert instance._id == stored_state['id']  # They should match

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
    275         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    276     """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277     a JSON document) to a Python object.
    278 
   (...)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
    296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/__init__.py:357, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352     del kw['encoding']
    354 if (cls is None and object_hook is None and
    355         parse_int is None and parse_float is None and
    356         parse_constant is None and object_pairs_hook is None and not kw):
--> 357     return _default_decoder.decode(s)
    358 if cls is None:
    359     cls = JSONDecoder

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It really looks to me like the JSON state file was not written correctly (it seems to be empty).
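That hypothesis fits the traceback: json.load() raises exactly this error when the file it reads is empty, because the decoder expects a JSON value at character 0 and finds end-of-input instead. A minimal standalone demonstration (no ClearML involved):

```python
import io
import json

# json.load() on an empty file-like object reproduces the exact error
# from the traceback above: an empty state file contains no JSON value.
try:
    json.load(io.StringIO(""))
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```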

Expected behaviour

I'd expect the state to load and the commands not to give me this error. At the very least, it would be nice to be able to remove these datasets and start over.

Environment

  • app.clear.ml
  • 1.13.1
  • Python Version 3.8.13
  • Debian GNU/Linux 11 (bullseye)
@alex-sage alex-sage added the bug Something isn't working label Oct 2, 2023
@eugen-ajechiloae-clearml
Collaborator

Hi @alex-sage! Before deleting a dataset, you need to delete/archive all dataset versions under it.
Note that in the internal implementation, datasets are projects and the versions are tasks. So when force-deleting a dataset through the API, you should delete the project:

from clearml.backend_api.session.client import APIClient

client = APIClient()
client.projects.delete(project="1cdc8407d0494adf822d282f7ad45739", force=True)

I am not sure why the dataset you created didn't upload/write the state file properly. Could be a network/server error. If you have a consistent way to reproduce the issue, please let us know!

@alex-sage
Author

alex-sage commented Oct 3, 2023

Hi @eugen-ajechiloae-clearml!

Thank you for your help, now I was finally able to delete the invalid datasets.

I found one way to reproduce the problem, but it seems to happen only when using our network storage as the target.

$ clearml-data create --project example-project --name example-dataset2 --storage /home/data/datasets/clearml/
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
ClearML results page: https://app.clear.ml/projects/e25905f30e964e53aefb5e2da15bcf8d/experiments/ba7ce395a6714802b52ca9ba2cd36e0a/output/log
ClearML dataset page: https://app.clear.ml/datasets/simple/e25905f30e964e53aefb5e2da15bcf8d/experiments/ba7ce395a6714802b52ca9ba2cd36e0a
New dataset created id=ba7ce395a6714802b52ca9ba2cd36e0a

$ clearml-data add --wildcard */*
clearml-data - Dataset Management & Versioning CLI
Adding files/folder/links to dataset id ba7ce395a6714802b52ca9ba2cd36e0a
0 files added
(yolov5-newest) 
sage@w15:/home/data/datasets/ball_tracking/baseball/evaluation 

$ clearml-data add --files .
clearml-data - Dataset Management & Versioning CLI
Adding files/folder/links to dataset id ba7ce395a6714802b52ca9ba2cd36e0a

Error: Could not load Dataset id=ba7ce395a6714802b52ca9ba2cd36e0a state

The second command uses an invalid wildcard on purpose, so that 0 files are added. This seems to cause the state JSON file not to be written. It looks like a bug, since I doubt our network keeps failing at exactly this moment 3 times in a row 😉
I checked our local storage location (which is perfectly accessible) and the state file for that dataset is indeed not there.

Edit: I just noticed that the same thing also happens if I add files and abort the hash calculation halfway through (by pressing CTRL-C once). It prints the message "User aborted" but does not seem to write back the state file.

@alex-sage
Author

This is still a bug affecting datasets stored locally.
If anything goes wrong during the dataset creation process, or it is interrupted manually (perhaps the user typed a wrong command and pressed CTRL-C), the state.json file is missing afterwards, and the dataset can no longer be opened or even deleted via the clearml-data API.

@eugen-ajechiloae-clearml
Collaborator

Hi @alex-sage! We have acknowledged the issue.
In the meantime, you should be able to use Task.delete (https://clear.ml/docs/latest/docs/references/sdk/task#delete) with the dataset's ID to delete such datasets, since datasets are tasks themselves.
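A minimal sketch of that suggestion, assuming clearml is installed and configured against your server; the dataset ID is the one from the report above, so substitute your own (this needs a live ClearML server and cannot run offline):

```python
from clearml import Task

# Each dataset version is backed by a Task, so fetching the task by the
# dataset's ID and deleting it removes the broken version, including its
# (possibly corrupt or missing) state artifact.
task = Task.get_task(task_id="ba7ce395a6714802b52ca9ba2cd36e0a")
task.delete(delete_artifacts_and_models=True, raise_on_error=False)
```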
