New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change timezone serialization logic #16631
Conversation
Nice! LGTM, I'd love others take a look as well though. |
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
except InvalidTimezone: | ||
pass | ||
try: | ||
return pendulum.parse("2021-01-01T12:00:00" + tz_name).tzinfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more than slightly odd -- when is this case hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From any of the new cases provided in the tests_dag_serialization.py:test_deserialization_start_date()
But as an example if the DAG start_date is pendulum.parse("2019-08-01T00:00:00.000+01:00"
then DAG timezone will be a pendulum.tz.timezone.FixedOffset
with name +01:00
so that it's what it will be serialized to.
Now in the deserialization step pendulum.timezone('+01:00')
will raise a InvalidTimezone
because that only work for "named" timezones (and not all I must add). See sdispater/pendulum#554
So if we want to parse +01:00
back to a timezone object, pendulum.timezone()
won't do it, we need to "hack it" by creating a "fake" iso 8601 string with the +01:00
and using pendulum.parse()
to construct a FixedTimezone()
object for us.
I couldn't find any other method in pendulum's to convert the string '+01:00
to a FixedTimezone/Timezone/datetime.tzinfo
except InvalidTimezone: | ||
pass | ||
try: | ||
return pendulum.parse("2021-01-01T12:00:00" + tz_name).tzinfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From any of the new cases provided in the tests_dag_serialization.py:test_deserialization_start_date()
But as an example if the DAG start_date is pendulum.parse("2019-08-01T00:00:00.000+01:00"
then DAG timezone will be a pendulum.tz.timezone.FixedOffset
with name +01:00
so that it's what it will be serialized to.
Now in the deserialization step pendulum.timezone('+01:00')
will raise a InvalidTimezone
because that only work for "named" timezones (and not all I must add). See sdispater/pendulum#554
So if we want to parse +01:00
back to a timezone object, pendulum.timezone()
won't do it, we need to "hack it" by creating a "fake" iso 8601 string with the +01:00
and using pendulum.parse()
to construct a FixedTimezone()
object for us.
I couldn't find any other method in pendulum's to convert the string '+01:00
to a FixedTimezone/Timezone/datetime.tzinfo
pendulum.datetime(2019, 8, 1, tz=FixedTimezone(3600)), | ||
None, | ||
pendulum.datetime(2019, 8, 1, tz=FixedTimezone(3600)), | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashb , any of this added cases with offset timezones "+01:00", FixedTimezone(3600)
, etc will raise an InvalidTimezone
exception in the SerializedDAG._deserialize_timezone()
at the pendulum.timezone(xxx)
.
Those timezones serialize to the timezone name which is +01:00
,etc and pendulum.timezone('+01:00')
raises InvalidTimezone
because that method is meant to resolve "named" timezones only (so only works with UTC
, Europe/Stockholm
, EST
, etc )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this works:
In [13]: pendulum.timezone(3600)
Out[13]: Timezone('+01:00')
Maybe we need to change the serialization format to accept string or int -- I think that might be less hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe coupled with tz_name in pendulum.tz.timezones
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try but I really that will add some very pendulum-specific logic.
I mean, it will involve checking if this is a pendulum.tz.timezone.FixedTimezone
and serializing to int if so, there is no "standard" good way of checking if a given datetime.tzinfo
is a simple timezone (fixed offset only) or a full fledged timezone (with arbitrarily complex dst rules, etc).
In principle the user can provide a datetime.tzinfo
of it's own with name RubenTZ
and that will fail the pendulum.timezone('RubenTZ')
but doesn't mean that we can just serialize to a fixedoffset (RubenTZ is much more complex)
What I'm trying to say is that is going to endup being very hackish anyway, but in the serializing part instead of the deserializing part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe coupled with tz_name in pendulum.tz.timezones?
I'm not sure what you mean , but if you mean to use .tzname()
instead of .name
I can already tell you that is even more brittle.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Just for clarification, The whole
Besides that, there are some other pendulum/
|
Man I hate timezones. |
Quite agree :) |
Random timezone factoid: Ireland and the UK aren't in the same timezone Ireland is technically in Central European Time, but instead of Summer time, they Have Winter time. https://danq.me/2021/05/11/ireland-timezone/ 🤯 |
Who doesn't ? I was just focused on getting rid of the exceptions when you use
But in general get rid of "remembering" what was the original datetime's timezone. |
Eucla: UTC +8:45 😱 |
This is what we now do (as of 1.10.....10?) |
DAGs do need a timezone, as sometimes often you want a DAG to run at midnight "local" time, not midnight UTC. |
If that's the only purpose let me explore the possibility of serializing the timezone as a numeric offset (seconds from utc). Unless there is some interaction with the DST changes. I'll do some tests EDIT: forget I said that, obviously most timezone are more that just an offset so that's not going to work. I guess if it has a name we need to look it up. And we should check at DAG.init that the timezone is indeed serializable/deserializable to a pendulum.timezone |
There is yes -- if I make the dag run at midnight "Europe/London" it should follow DST. |
tz_name = getattr(var, 'name', None) | ||
if not tz_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also use tzname
for pendulum Timezone as well? It’s a standard method that should always yield something useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a standard method that should always yield something useful
Well, not always with pendulum as explained in #16631 (comment) , pendulum.timezone('Europe/Stockholm').tzname(...)
can return CEST
which can't be converted back with pendulum.timezone('CEST')
as it raises InvalidTimezone
I did try that at some point. The whole concept of datetime.tzinfo
-> str
-> datetime.tzinfo
works sometimes but there is no really hard guarantee. So that's why I use the .name
if present (which marginally increases the chances of success serialization/deserialization)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gosh we really should drop Pendulum at some point I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell there is no good timezone library in python really :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no good timezone libraries, in general 🙂 Personally I think the new zoneinfo
is as good as it can be.
Closes #16613
Current dag serialization logic assumes the the
dag.timezone
is apendulum.Timezone
and more specifically a "named" timezone (FixedTimezone
will fail to serialize for example) and not just adatetime.tzinfo
.The proposed change will work with named timezones
pendulum.tz.timezone.Timezone
, fixed offset timezonespendulum.tz.timezone.FixedTimezone
and "regular"datetime.tzinfo
implementations.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.