load_dataset fails to run due to nanosecond timestamps in API response #140

Open
nachomaiz opened this issue Apr 5, 2023 · 2 comments

Comments

@nachomaiz

Hi,

I'm getting an error when using the load_dataset function. It seems that the API is providing datetime information with nanosecond resolution, while datetime only supports up to microsecond resolution:

Traceback (most recent call last):
  File "~\t2.py", line 3, in <module>
    dw_ds = dw.load_dataset("{owner}/{id}")  # modified for privacy
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\venv\Lib\site-packages\datadotworld\__init__.py", line 99, in load_dataset
    load_dataset(dataset_key,
  File "~\venv\Lib\site-packages\datadotworld\datadotworld.py", line 164, in load_dataset
    last_modified = datetime.strptime(dataset_info['updated'],
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\Miniconda3\envs\nox\Lib\_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\Miniconda3\envs\nox\Lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023-03-22T17:14:38.878483744Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'

This should work if nanoseconds are stripped from the string before parsing with datetime.

datetime.datetime.strptime("2023-03-22T17:14:38.878483Z", "%Y-%m-%dT%H:%M:%S.%fZ")  # works
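
For illustration, here is a minimal sketch of the stripping step described above. It assumes the timestamp always ends in "Z" and has a fractional part, as in the traceback. `parse_timestamp` is a hypothetical helper name, not something that exists in datadotworld, and this is only one possible approach rather than a proposed patch.

```python
import re
from datetime import datetime

# Matches a fractional-seconds part longer than six digits and captures
# the first six digits (the microsecond part that %f can parse).
_EXTRA_FRACTION = re.compile(r"\.(\d{6})\d+")


def parse_timestamp(value: str) -> datetime:
    """Parse a '...T...Z' timestamp, truncating digits beyond microseconds.

    Hypothetical helper: truncates (does not round) any sub-microsecond
    digits, then parses with the same format string shown in the traceback.
    """
    trimmed = _EXTRA_FRACTION.sub(r".\1", value)
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%fZ")


# Nanosecond input from the error message, and a microsecond input
# that is passed through unchanged.
print(parse_timestamp("2023-03-22T17:14:38.878483744Z"))
print(parse_timestamp("2023-03-22T17:14:38.878483Z"))
```

Timestamps with six or fewer fractional digits are left untouched by the regular expression, so existing microsecond-resolution responses would parse as before. A value with no fractional part at all would still fail against this format string, just as it does today.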

Let me know if I can provide any more info.

Happy to contribute a PR if you would like.

Thanks!

@alexcrawley

I've also just hit this. As a workaround, you can pass force_update=True to bypass the last_modified check.

@nachomaiz
Author

nachomaiz commented Oct 27, 2023

> I've also just hit this, as a workaround you can pass force_update=True to bypass the last_modified check.

Oh, good tip! Will try that instead. Thanks @alexcrawley!

Still, I think this should be addressed within the package if the purpose is to store cached requests.
