LongTimerTask isn't cancelling properly #573

@shaneseaton

🐛 Describe the bug
I am trying to build an approval workflow, and the deadline for approvals is two weeks.

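from datetime import timedelta

import azure.durable_functions as df

bp = df.Blueprint()  # assumed: the blueprint this orchestrator is registered on

@bp.orchestration_trigger(context_name="context")
def test_long_timer(context: df.DurableOrchestrationContext):
    # Race an external approval event against a 14-day deadline.
    deadline = context.current_utc_datetime + timedelta(days=14)
    approval_event_task = context.wait_for_external_event("testapproval")
    timeout_task = context.create_timer(deadline)

    winner = yield context.task_any([approval_event_task, timeout_task])
    if winner == approval_event_task:
        # Approval arrived first: cancel the timer so the orchestration can finish.
        timeout_task.cancel()
        return True
    elif winner == timeout_task:
        return False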

If I get an approval event, the orchestrator calls timeout_task.cancel() and the function returns. When the deadline is only 1 or 2 days away, create_timer returns a TimerTask and everything works as expected. With a 14-day deadline, however, it returns a LongTimerTask, and cancellation doesn't work on it. The result is that the function exits, but the framework won't mark the orchestration as completed until the LongTimerTask completes.

🤔 Expected behavior
After cancelling the LongTimerTask, the orchestration should complete immediately instead of waiting for the LongTimerTask to finish first.

Steps to reproduce
Run the code above and send a 'testapproval' event; the orchestration won't complete.

If deployed to Azure
This fails both locally while developing against Azurite and after deploying to Azure using default Storage.

Hack, possible clues
So I figured the issue here could be:

  • the timeout task is stuck in the 'running' state
  • since LongTimerTask is a CompositeTask, the cancellation was not being applied to all of its children.
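To illustrate the second guess, here is a minimal toy model of a composite task whose cancel() either stops at the parent or is forwarded to its children. ToyTask, ToyCompositeTask, and the propagate flag are hypothetical names made up for this sketch; they are not the library's classes and do not claim to mirror its actual implementation.

```python
class ToyTask:
    """Hypothetical stand-in for a single task (not the library's class)."""

    def __init__(self, name):
        self.name = name
        self.state = "running"

    def cancel(self):
        self.state = "cancelled"


class ToyCompositeTask(ToyTask):
    """Hypothetical composite task that owns several child tasks."""

    def __init__(self, name, children, propagate):
        super().__init__(name)
        self.children = list(children)
        self.propagate = propagate  # assumption: toggles forwarding to children

    def cancel(self):
        super().cancel()
        if self.propagate:
            for child in self.children:
                child.cancel()


def still_running(task):
    """Return the names of the children that were never cancelled."""
    return [c.name for c in task.children if c.state == "running"]


def make_children():
    return [ToyTask(f"chunk-{i}") for i in range(3)]


# Cancelling only the parent leaves every child running.
shallow = ToyCompositeTask("long-timer", make_children(), propagate=False)
shallow.cancel()

# Forwarding the cancellation leaves nothing running.
deep = ToyCompositeTask("long-timer", make_children(), propagate=True)
deep.cancel()
```

In this toy model, still_running(shallow) lists all three children while still_running(deep) is empty, which matches the symptom described above: the parent looks cancelled, but something underneath it is still pending.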

I am really not sure why, but the following solved the problem. I don't know whether it creates other problems down the line.

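    if winner == approval_event_task:
        timeout_task.cancel()

        # Cancel Timer Hack
        # Relies on internals (pending_tasks, set_value) that may change between releases.
        if isinstance(timeout_task, azure.durable_functions.models.Task.LongTimerTask):
            # Cancel every child timer the composite is still waiting on.
            for child in timeout_task.pending_tasks:
                child.cancel()
            # Force the composite out of the 'running' state.
            timeout_task.set_value(False, None)
        # End Hack

        return True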

After putting in this code, it works as expected, but I hate the hack.
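If the hack has to stay for a while, it could at least be isolated in one place. Below is a minimal sketch of such a helper. The name cancel_timer_fully is hypothetical, and it only uses the attributes that already appear in the hack above (cancel, pending_tasks, set_value), detected by duck typing instead of an isinstance check. The arguments to set_value are copied unchanged from the hack. This is still a workaround built on internals, not a supported API.

```python
def cancel_timer_fully(timer_task):
    """Cancel a timer task and, if it is a composite, its pending children.

    Hypothetical helper: it assumes the task exposes the same attributes
    the hack above relies on (cancel, pending_tasks, set_value).
    """
    timer_task.cancel()

    # A plain timer has no pending children in this sketch, so we are done.
    children = getattr(timer_task, "pending_tasks", None)
    if children is None:
        return

    # Copy first so the loop is safe even if cancelling mutates the collection.
    for child in list(children):
        child.cancel()

    # Same call as in the hack: mark the composite as finished without an error.
    timer_task.set_value(False, None)
```

In the orchestrator, the branch for the approval event would then shrink to cancel_timer_fully(timeout_task) followed by return True.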

Aside note
I find this framework incredibly useful, but it is SO HARD to understand what is really happening. Are there any good articles on how this all hangs together? For example: what tasks are, how they relate to actions, how tasks and actions get scheduled, and where the magic runtime is that coordinates all this logic. I can't for the life of me work that out. Any leads would be great (don't say the normal documentation, I want the nitty-gritty details). I am not expecting anyone to write that up, just hoping it exists somewhere in a niche corner of the web.
