Introducing a subscription API for autonomous task scheduling #11779
Conversation
Co-authored-by: Nathan Nowack <thrast36@gmail.com>
…11737) Co-authored-by: Chris Guidry <chris.g@prefect.io>
In earlier work, we've introduced autonomous task scheduling, where tasks outside a flow run are created as `Scheduled` and picked up by one or more processes running `Task.serve`. In our initial implementation, we used a polling approach where each `TaskServer` would make requests to the API looking for any tasks that were currently `Scheduled`, then move them to `Running` as they entered the task engine.

This work introduces a new mechanism for `TaskServer`s to get work from their Prefect Server: a long-lived websocket connection subscribed to a queue of `TaskRun`s to be worked. Because the Prefect Server is a singleton, it can govern an in-memory queue whose contents are distributed among the `TaskServer`s, making a simple task-brokering system.

The websocket implementation is modeled on the `events/in` and `events/out` websockets in Prefect Cloud, and it's expected that we'd negotiate authentication in a common way across all websockets.

Note: this does not address issues of resiliency, like what happens if the Prefect Server is restarted (in-flight tasks would be lost), if there are no `TaskServer`s draining the queue (the Prefect Server would eventually run out of memory), or if a `TaskServer` died before transitioning a task to `Running` (the task would remain `Scheduled` and never get picked up). These are some of the items I'd like to address in future work if we like this direction.

Note: there are no tests for this new subsystem yet.
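The brokering idea described above can be illustrated with a minimal sketch. This is not Prefect's actual implementation (no websockets, no real server); it just shows how a single in-memory queue naturally distributes scheduled task runs among however many subscribed consumers drain it:

```python
import asyncio


async def main():
    # The singleton server side: one in-memory queue of scheduled task runs.
    queue: asyncio.Queue = asyncio.Queue()
    for task_run_id in ("run-1", "run-2", "run-3", "run-4"):
        queue.put_nowait(task_run_id)

    received = {"server-a": [], "server-b": []}

    async def subscriber(name: str) -> None:
        # Stand-in for a long-lived websocket subscription: each consumer
        # pulls the next available task run off the shared queue.
        while True:
            try:
                task_run_id = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            received[name].append(task_run_id)
            await asyncio.sleep(0)  # yield so consumers interleave

    # Two consumers subscribed at once; each run is delivered exactly once.
    await asyncio.gather(subscriber("server-a"), subscriber("server-b"))
    return received


results = asyncio.run(main())
print(results)
```

Each task run is handed to exactly one consumer, which is the property a polling approach has to work harder to guarantee (each poller must atomically claim runs to avoid duplicates).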
src/prefect/task_server.py
Outdated
@@ -77,14 +77,7 @@ async def start(self) -> None:

        async with self as task_server:
            async with self._loops_task_group as tg:
                tg.start_soon(
Here I'm removing the old polling loop...
src/prefect/task_server.py
Outdated
                        jitter_range=0.3,
                    )
                )
                tg.start_soon(task_server._subscribe_to_task_scheduling)
...and replacing it with the subscription
        logger.info(f"Received task run: {task_run.id} - {task_run.name}")
        await self._submit_pending_task_run(task_run)

    async def _submit_pending_task_run(self, task_run: TaskRun):
This function is just the inner guts of the `_submit_pending_task_runs` loop, intended to operate on one task run at a time; no other changes were made.
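The refactor described here is a common pattern: extract the body of a batch loop into a per-item function so a push-based subscription can call it for one item at a time. A hypothetical sketch (names mirror the PR's functions, but the bodies are illustrative, not Prefect's actual code):

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    id: str
    name: str


submitted = []


def _submit_pending_task_run(task_run: TaskRun) -> None:
    # The inner guts of the old batch loop, now operating on a single run.
    submitted.append(task_run.id)


def _submit_pending_task_runs(task_runs: list) -> None:
    # The old polling path can delegate to the single-run function,
    # so both code paths share one implementation.
    for task_run in task_runs:
        _submit_pending_task_run(task_run)


_submit_pending_task_runs([TaskRun("a", "first"), TaskRun("b", "second")])
print(submitted)
```

This keeps the polling and subscription paths behaviorally identical, since both funnel through the same per-run function.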
Looking awesome. I'm inspired to go and switch out all the `RunInput` and pause/resume polling, as soon as I understand this pattern better.
        new_task_run: schemas.core.TaskRun = schemas.core.TaskRun.from_orm(model)

        # Place autonomously scheduled task runs onto a notification queue for the websocket
This feels like the heart of the matter, in some ways, and whatever we discover while exploring these waters should be useful for e.g. replacing workers polling work queues with websocket subscriptions. But other than saying I'm intrigued I should probably pass over this in silence for the moment.
Yes very much, and I think we can grow this functionality out to support objects in other states for other subscriptions. In Prefect Server, we should be able to handle a fairly significant number of in-flight objects in memory. For Prefect Cloud we'll need more of an external message broker, but the same concept applies.
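Generalizing the queue to subscriptions keyed by object state might look roughly like this. Everything here is a hedged sketch under the assumption of one in-memory queue per state (the `publish`/`subscribe_one` names are hypothetical, not Prefect APIs):

```python
import asyncio
from collections import defaultdict

# One in-memory queue per object state; a state transition publishes the
# object onto the matching queue, and subscribers drain the state they care about.
queues = defaultdict(asyncio.Queue)


def publish(state: str, obj: str) -> None:
    queues[state].put_nowait(obj)


async def subscribe_one(state: str) -> str:
    return await queues[state].get()


async def demo():
    publish("SCHEDULED", "task-run-1")
    publish("PAUSED", "flow-run-7")
    # Each subscription only sees objects in its own state.
    return await subscribe_one("SCHEDULED"), await subscribe_one("PAUSED")


got = asyncio.run(demo())
print(got)
```

For Prefect Cloud, as noted above, the per-state queues would live in an external message broker rather than process memory, but the subscription surface could stay the same.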