Fix incorrect serialisation of the FixedTimezone #38139

Merged on Mar 16, 2024 (1 commit)

Contributor

@millin millin commented Mar 14, 2024

When the `start_date` of a DAG is specified with a fixed timezone, the timezone is serialised incorrectly, which causes the scheduler to crash when it loads the DAG.

Example of an entry in the `data` column of the `serialized_dag` table
before fix:

{
...
  "start_date": 1684684800.0,
  "timezone": "FixedTimezone(28800, name=\"+08:00\")",
...
}

after fix:

{
...
  "start_date": 1684684800.0,
  "timezone": 28800,
...
}
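
The before/after entries above differ only in how the fixed offset is stored: as a plain number of seconds instead of the object's string representation. A minimal sketch of that idea follows, using only the standard library (`datetime.timezone` and `zoneinfo.ZoneInfo`) as stand-ins for Pendulum's classes. The helper names `encode_timezone` and `decode_timezone` are illustrative assumptions, not a copy of the code changed in this PR.

```python
from __future__ import annotations

import json
from datetime import timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def encode_timezone(tz: tzinfo) -> str | int:
    """Hypothetical encoder: offset in seconds for fixed offsets, zone key otherwise."""
    if isinstance(tz, timezone):
        # Fixed offset: store the offset itself, never repr(tz) or str(tz).
        return int(tz.utcoffset(None).total_seconds())
    if isinstance(tz, ZoneInfo):
        # Named zone: store the IANA key, e.g. "Europe/Amsterdam".
        return tz.key
    raise ValueError(f"unsupported timezone object: {tz!r}")


def decode_timezone(value: str | int) -> tzinfo:
    """Hypothetical decoder: the inverse of encode_timezone."""
    if isinstance(value, int):
        return timezone(timedelta(seconds=value))
    return ZoneInfo(value)


# Round trip through JSON, mirroring the shape of the serialized entry above.
plus_eight = timezone(timedelta(hours=8))
payload = json.dumps(
    {"start_date": 1684684800.0, "timezone": encode_timezone(plus_eight)}
)
restored = decode_timezone(json.loads(payload)["timezone"])
```

Because the stored value is an integer, the decoder can tell a fixed offset apart from a zone name by type alone and never has to parse a representation string.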
Crash log
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/scheduler_command.py", line 52, in _run_scheduler_job
    run_job(job=job_runner.job, execute_callable=job_runner._execute)
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/job.py", line 393, in run_job
    return execute_job(job, execute_callable=execute_callable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/job.py", line 422, in execute_job
    ret = execute_callable()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 855, in _execute
    self._run_scheduler_loop()
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 987, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 1061, in _do_scheduling
    self._create_dagruns_for_dags(guard, session)
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/retries.py", line 91, in wrapped_function
    for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 347, in __iter__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/retries.py", line 100, in wrapped_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 1133, in _create_dagruns_for_dags
    self._create_dag_runs(non_dataset_dags, session)
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 1167, in _create_dag_runs
    dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/session.py", line 76, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/dagbag.py", line 191, in get_dag
    self._add_dag_from_db(dag_id=dag_id, session=session)
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/dagbag.py", line 273, in _add_dag_from_db
    dag = row.dag
          ^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/serialized_dag.py", line 231, in dag
    return SerializedDAG.from_dict(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/serialization/serialized_objects.py", line 1443, in from_dict
    return cls.deserialize_dag(serialized_obj["dag"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/serialization/serialized_objects.py", line 1362, in deserialize_dag
    v = cls._deserialize_timezone(v)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/timezone.py", line 294, in parse_timezone
    return pendulum.timezone(name)  # type: ignore[operator]
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pendulum/__init__.py", line 86, in timezone
    return Timezone(name)
           ^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pendulum/tz/timezone.py", line 67, in __new__
    raise InvalidTimezone(key)
pendulum.tz.exceptions.InvalidTimezone: FixedTimezone(28800, name="+08:00")
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/zoneinfo/_common.py", line 12, in load_tzdata
    return resources.files(package_name).joinpath(resource_name).open("rb")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/airflow/.local/lib/python3.11/site-packages/tzdata/zoneinfo/FixedTimezone(28800, name="+08:00")'

This issue was introduced in Airflow 2.8.
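
The crash log shows the stored string being passed to a timezone lookup by name, which ends in a failed attempt to open a tzdata file with that name. The sketch below reproduces that failure mode with the standard library's `zoneinfo` rather than Pendulum, so the exception types differ from the log. The helper `can_load_zone` is a hypothetical name used only for illustration.

```python
from zoneinfo import ZoneInfo


def can_load_zone(name: str) -> bool:
    """Return True if `name` resolves to a time zone, False otherwise."""
    try:
        ZoneInfo(name)
    except (KeyError, ValueError, OSError):
        # ZoneInfoNotFoundError is a KeyError subclass; malformed keys may
        # raise ValueError, and some platforms raise OSError for odd names.
        return False
    return True


# The value written to the `timezone` field before the fix.
bad_value = 'FixedTimezone(28800, name="+08:00")'
repr_is_loadable = can_load_zone(bad_value)
```

Here `repr_is_loadable` ends up `False`: the representation string is not a zone key, so any loader that treats it as one has nothing to resolve.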



@uranusjr
Member

I hate Pendulum’s timezone classes

@potiuk
Member

potiuk commented Mar 14, 2024

> I hate Pendulum’s timezone classes

You are not alone

@Taragolis
Contributor

I expect that this happens only with Pendulum 3, because the inheritance of Pendulum's timezone classes changed in that version.

Let me join the club of people who hate Pendulum timezones 😅

@eladkal added this to the Airflow 2.9.0 milestone Mar 16, 2024
@eladkal added the type:bug-fix Changelog: Bug Fixes label Mar 16, 2024
@Taragolis merged commit 0720ea0 into apache:main Mar 16, 2024
51 checks passed
@millin deleted the patch-1 branch March 16, 2024 20:31
@millin
Contributor Author

millin commented Mar 19, 2024

Maybe this fix should be cherry-picked to the 2.8.4 version?

@potiuk modified the milestones: Airflow 2.9.0, Airflow 2.8.4 Mar 19, 2024
potiuk pushed a commit that referenced this pull request Mar 19, 2024

(cherry picked from commit 0720ea0)
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024