how are timezones handled when available? #32

Closed
rahulbot opened this issue Jul 16, 2021 · 3 comments
Labels
question Further information is requested

Comments

@rahulbot
Contributor

Some articles include the full publication time, with timezone, in HTML meta tags or JavaScript config. Does this library parse and handle those timezones? Relatedly, how does it store dates internally with regard to timezone: are they all returned in machine-local time, held in GMT, or something else?

For instance, this Guardian article has an article:published_time meta tag that includes a timezone. Does this library recognize that timezone and return the date as it would be in GMT? Same for this article on CNN, which includes the datePublished meta tag.

@adbar
Owner

adbar commented Jul 16, 2021

Hi @rahulbot, since I was mostly interested in day-level granularity, I haven't implemented time zone identification so far. However, the underlying libraries python-dateutil and dateparser, as well as the optional ciso8601, all deal with it IMHO.
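For readers wondering what "dealing with it" could look like once a timestamp has been pulled out of the metadata, here is a minimal sketch. It is not htmldate's behavior: it uses only the Python standard library instead of the libraries named above so that it runs without extra dependencies, the helper name `to_utc` is hypothetical, and the timestamp is a made-up value in the style of an article:published_time tag.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC.

    Hypothetical helper for illustration; not part of htmldate.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("timestamp has no timezone information")
    return parsed.astimezone(timezone.utc)


# Made-up value, not taken from the articles mentioned in the question.
published = to_utc("2021-07-16T10:30:00+01:00")
print(published.isoformat())  # 2021-07-16T09:30:00+00:00
```

Normalizing to UTC keeps the original instant while making the result independent of the machine's local timezone, which is the distinction the question is asking about.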

@adbar adbar added the question Further information is requested label Jul 16, 2021
@rahulbot
Contributor Author

Good to know, thanks. In the longer term, if we do switch to htmldate for use in Media Cloud, we might explore integrating time parsing (at least for the machine-readable timestamps in metadata). In that case we'd probably add timezone parsing as well.

@moehmeni

You can pass %Y-%m-%dT%H:%M:%S%z as the outputformat argument.
Output:
2021-10-18T15:30:00+0330

With that output and a package like python-dateutil (its parse method), you can get this form:
2021-10-18 15:30:00+03:30
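As a rough illustration of that second step, the string shown above can be turned back into a timezone-aware datetime. The comment suggests python-dateutil; this sketch swaps in the standard library's `datetime.strptime` with the same format string so it has no dependencies. The input is the output string quoted in the comment, not something fetched from a live page.

```python
from datetime import datetime, timezone

# Same format string as the one passed as outputformat above.
OUTPUT_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# Output string quoted in the comment above.
raw = "2021-10-18T15:30:00+0330"

# %z reads the +0330 offset, so the result is timezone-aware.
aware = datetime.strptime(raw, OUTPUT_FORMAT)
print(aware)  # 2021-10-18 15:30:00+03:30

# The same instant expressed in UTC.
print(aware.astimezone(timezone.utc))  # 2021-10-18 12:00:00+00:00
```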
