Skip to content

Don't use time.time() or timezone.utcnow() for duration calculations#12353

Merged
ashb merged 1 commit intoapache:masterfrom
astronomer:dont-diff-time-or-utcnow-for-durations
Nov 29, 2020
Merged

Don't use time.time() or timezone.utcnow() for duration calculations#12353
ashb merged 1 commit intoapache:masterfrom
astronomer:dont-diff-time-or-utcnow-for-durations

Conversation

@ashb
Copy link
Member

@ashb ashb commented Nov 13, 2020

time.time() - start, or timezone.utcnow() - start_dttm will work
fine in 99% of cases, but it has one fatal flaw:

They both operate on system time, and that can go backwards.

While this might be surprising, it can happen -- usually due to clocks
being adjusted.

And while it is might seem rare, for long running processes it is more
common than we might expect. Most of these durations are harmless to get
wrong (just being logs) it is better to be safe than sorry.

Also the utcnow() style I have replaced will be much lighter weight -
creating a date time object is a comparatively expensive operation, and
computing a diff between two even more so, especially when compared to
just subtracting two floats.

To make the "common" case easier of wanting to compute a duration for a
block, I have made Stats.timer() return an object that has a
duration field.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues provider:Apache provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues labels Nov 13, 2020
@ashb ashb requested review from XD-DENG and turbaszek November 13, 2020 23:05
@ashb
Copy link
Member Author

ashb commented Nov 13, 2020

(This will likely fail tests at first. I likely messed up some calculations somewhere.)

@turbaszek
Copy link
Member

Should we add note about this approach in https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices ?

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@kaxil kaxil force-pushed the dont-diff-time-or-utcnow-for-durations branch from 77f5299 to 0f5a3a7 Compare November 14, 2020 03:44
@kaxil
Copy link
Member

kaxil commented Nov 14, 2020

Master was failing, I have fixed it and rebased this PR

@ashb
Copy link
Member Author

ashb commented Nov 14, 2020

Should we add note about this approach in https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices ?

Done in a7642459d

@kaxil
Copy link
Member

kaxil commented Nov 14, 2020

One failure

self = <Task(PythonSensor): test_reschedule_handling_sensor>
context = {'conf': <airflow.configuration.AirflowConfigParser object at 0x7f383eda7860>, 'dag': <DAG: test_reschedule_handling>,...36:50.699432+00:00: scheduled__2020-11-14T12:36:50.699432+00:00, externally triggered: False>, 'ds': '2020-11-14', ...}

    def execute(self, context: Dict) -> Any:
        started_at = time.monotonic()
        try_number = 1
        log_dag_id = self.dag.dag_id if self.has_dag() else ""
        if self.reschedule:
            # If reschedule, use first start date of current try
            task_reschedules = TaskReschedule.find_for_task_instance(context['ti'])
            if task_reschedules:
                started_at = task_reschedules[0].start_date
                try_number = len(task_reschedules) + 1
        while not self.poke(context):
>           if time.monotonic() - started_at > self.timeout:
E           TypeError: unsupported operand type(s) for -: 'float' and 'FakeDatetime'

airflow/sensors/base_sensor_operator.py:210: TypeError

@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from a764245 to e03c1c3 Compare November 16, 2020 10:22
@ashb
Copy link
Member Author

ashb commented Nov 16, 2020

That failure was an actual bug -- I had broken rescheduled sensors. Yay for tests!

@github-actions
Copy link

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 19, 2020
@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from b9c63a1 to 91d9d67 Compare November 19, 2020 11:59
@ashb
Copy link
Member Author

ashb commented Nov 19, 2020

Hopefully got the tests sorted now (just one last instance of mocking time.time when it now needs to mock time.monotonic)

@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from 91d9d67 to 6eeac20 Compare November 19, 2020 22:03
@ashb
Copy link
Member Author

ashb commented Nov 20, 2020

Hmmmm, even with rebasing Kube tests are failing. I wonder what I broke.

@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from 6eeac20 to f960ac7 Compare November 20, 2020 23:05
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from f960ac7 to f05dd39 Compare November 25, 2020 12:34
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch 2 times, most recently from b445dee to a650be6 Compare November 27, 2020 22:57
`time.time() - start`, or `timezone.utcnow() - start_dttm` will work
fine in 99% of cases, but it has one fatal flaw:

They both operate on system time, and that can go backwards.

While this might be surprising, it can happen -- usually due to clocks
being adjusted.

And while it is might seem rare, for long running processes it is more
common than we might expect. Most of these durations are harmless to get
wrong (just being logs) it is better to be safe than sorry.

Also the `utcnow()` style I have replaced will be much lighter weight -
creating a date time object is a comparatively expensive operation, and
computing a diff between two even more so, _especially_ when compared to
just subtracting two floats.

To make the "common" case easier of wanting to compute a duration for a
block, I have made `Stats.timer()` return an object that has a
`duration` field.
@ashb ashb force-pushed the dont-diff-time-or-utcnow-for-durations branch from a650be6 to 5225354 Compare November 28, 2020 23:45
@ashb
Copy link
Member Author

ashb commented Nov 29, 2020

YAY GREEN FINALLY!

@ashb ashb merged commit 02d9434 into apache:master Nov 29, 2020
@ashb ashb deleted the dont-diff-time-or-utcnow-for-durations branch November 29, 2020 10:12
@turbaszek
Copy link
Member

@ashb what exactly was the problem with the tests?

@ashb
Copy link
Member Author

ashb commented Nov 29, 2020

https://github.com/apache/airflow/pull/12353/files#diff-f0e400bd7a526279c114ce0bd87e80f1f9344885b749f2da495d4f7bb670432eR228

When statsd was actually enabled that line was failing, but we don't test with statsd on anywhere outside of the kube tests (thankfully?) -- but to get the error we had to download the kube logs artifact and trawl through the logs until I found the actual error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues full tests needed We need to run full set of tests for this PR to merge provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments