Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Digital Corpora is using timezoneless time stamps in its API #64

Open
ajnelson-nist opened this issue Mar 20, 2023 · 6 comments
Open

Digital Corpora is using timezoneless time stamps in its API #64

ajnelson-nist opened this issue Mar 20, 2023 · 6 comments
Assignees

Comments

@ajnelson-nist
Copy link
Contributor

This is following up on #59 and #63.

The mtime and modified fields emitted in /index.tsv use the form of ISO-8601 that has a space instead of T and no time zone designation.

I tried feeding this TSV into a library that uses the more string XML Schema Datatypes dateTime format, which requires the letter T. I'd also like to use the dateTimeStamp format, which further requires time zone.

In ctools, I see the MySQL database call bottle_app.py uses, and it looks like there's no timestamp manipulation between the database call and returning the result.

So, it looks like Digital Corpora is delegating timestamp format to MySQL. @simsong , would you be OK with upgrading this to be a bit more strict - using T, and adding Z if that is the known stored timestamp time zone? This could undo the "Z" addition I'd made in #61.

@ajnelson-nist
Copy link
Contributor Author

There is some promising-looking API documentation for UTC clarity:

The session time zone setting affects display and storage of time values that are zone-sensitive. This includes the values displayed by functions such as NOW() or CURTIME(), and values stored in and retrieved from TIMESTAMP columns. Values for TIMESTAMP columns are converted from the session time zone to UTC for storage, and from UTC to the session time zone for retrieval.

Source: https://dev.mysql.com/doc/refman/8.0/en/time-zone-support.html

@simsong
Copy link
Contributor

simsong commented Mar 20, 2023

Hi. We are using TIMESTAMP and it stores in GMT. When we write data with Timezone, it converts and stores it in GMT. We do not use the session time zone. We always pull out in GMT. So we can just slap a Z on it. It comes out +00:00 which I find annoying from a usability point of view, so we should just make it Z.

@simsong
Copy link
Contributor

simsong commented Mar 20, 2023

So if you want, we can just append a Z or a +00:00. Would that make things okay? Or would you like things in time_t format?

@ajnelson-nist
Copy link
Contributor Author

Just appending a Z is great. But, it's not clear to me how that would be handled within ctools. You'd need to know which fields in a query are timestamp fields. Or is there a MySQL database function that returns things in ISO-8601 format with timezone? I've been in other database engines recently, and am not familiar with MySQL's nuances anymore.

@simsong
Copy link
Contributor

simsong commented Mar 20, 2023

Look, we are just doing a select *:

"""SELECT * from downloadable WHERE present=1 ORDER BY s3key""",

The real issue is that you are using the DictWriter's writerow:

https://docs.python.org/3/library/csv.html#csv.csvwriter.writerow

So you are getting the way the datetime object gets converted to a Dialect:

https://docs.python.org/3/library/csv.html#csv.Dialect

And what's happening is that datetime is coming over as a UTC time without a timezone, and it's being converted to a string.

So if you want this to work with DictWriter, you're going to need to go into the dict and manually convert the datetime to a string and append a Z. This is not a ctools issue, this is a DictWriter issue.

@simsong
Copy link
Contributor

simsong commented Mar 25, 2023

We have Z in listing, but not in search results.
https://downloads.digitalcorpora.org/search?q=dmg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants