Runtime Manager Server #193

josephjclark · 2023-03-17T17:00:47Z

This PR contains:

A new Runtime Manager Server (an HTTP interface between Lightning and the Runtime Manager)
An updated Runtime Manager with Workflow support, using workerpool
A mock Lightning Server with an attempts REST API.

Some Terminology

The various components here all have different ideas of what a "workflow" actually means, so unfortunately we need to be a bit careful with language.

A Workflow is an abstract definition of a series of jobs, stored in the Lightning database. It has triggers, jobs and edges.
An Attempt is a JSON serialisation of a Lightning Workflow, representing a single run of that Workflow. It has triggers, jobs and edges, as well as a result (which may well include stats like start time, duration and so on).
An Execution Plan is a JSON representation of a "workflow", represented as a graph of jobs with edges.

The terms Workflow and Execution Plan are often used interchangeably. The CLI and runtime technically support Execution Plans, although the word "workflow" is often used informally because it makes sense to users.
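To make the distinction concrete, here is a rough sketch of how an Attempt might map onto an Execution Plan. Every type and field name below (Trigger, PlanJob, next, start, toExecutionPlan) is an assumption for illustration only; the real structures in Lightning and the runtime may well differ.

```typescript
// Hypothetical shapes: every field name here is an assumption for illustration.
type Trigger = { id: string; type: string };
type Job = { id: string; expression: string };
type Edge = { from: string; to: string };

// An Attempt: a JSON serialisation of one run of a Workflow.
interface Attempt {
  id: string;
  triggers: Trigger[];
  jobs: Job[];
  edges: Edge[];
}

// An Execution Plan: a graph of jobs, each listing the jobs that follow it.
interface PlanJob extends Job {
  next: string[];
}
interface ExecutionPlan {
  id: string;
  start: string | undefined;
  jobs: PlanJob[];
}

function toExecutionPlan(attempt: Attempt): ExecutionPlan {
  const jobIds = new Set(attempt.jobs.map((job) => job.id));
  const triggerIds = new Set(attempt.triggers.map((trigger) => trigger.id));

  // Keep only job-to-job edges as "next" links on each job.
  const jobs: PlanJob[] = attempt.jobs.map((job) => ({
    ...job,
    next: attempt.edges
      .filter((edge) => edge.from === job.id && jobIds.has(edge.to))
      .map((edge) => edge.to),
  }));

  // The first edge leaving a trigger decides where the plan starts.
  const startEdge = attempt.edges.find((edge) => triggerIds.has(edge.from));
  return { id: attempt.id, start: startEdge?.to, jobs };
}

const exampleAttempt: Attempt = {
  id: "attempt-1",
  triggers: [{ id: "t1", type: "webhook" }],
  jobs: [
    { id: "job-a", expression: "fn(s => s)" },
    { id: "job-b", expression: "fn(s => s)" },
  ],
  edges: [
    { from: "t1", to: "job-a" },
    { from: "job-a", to: "job-b" },
  ],
};

const examplePlan = toExecutionPlan(exampleAttempt);
```

In this sketch the trigger disappears from the plan entirely and only contributes the starting job, which is one plausible reading of "a graph of jobs with edges".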

General Architecture

This PR assumes (and starts building) a particular architecture:

Lightning users create Workflows by linking jobs and triggers together with edges.
When a workflow is triggered, Lightning will generate an Attempt (with a unique id) and add it to a queue
Lightning exposes an attempts/next API, which on POST will remove the top n attempts from the queue and send them to the caller.
The Runtime Manager Server will poll the attempts/next endpoint, greedily fetching as many Attempts as its configuration allows.
The Runtime Manager Server will convert an Attempt into an Execution Plan, and call execute on the Runtime Manager instance available to it.
The Runtime Manager will execute a plan in a worker thread, broadcasting events (start, complete, log)
The Runtime Manager Server will post results and log information back to Lightning as appropriate via HTTP.
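The steps above can be sketched as a single polling pass. This is only an in-memory illustration: the LightningClient and RuntimeManager interfaces, the pollOnce function and the event shape are all hypothetical names, the real server talks HTTP, and the real Runtime Manager runs plans in a worker thread rather than synchronously.

```typescript
// Hypothetical, simplified shapes (assumptions, not the real API).
type PlanJobSpec = { id: string; expression: string };
type Attempt = { id: string; jobs: PlanJobSpec[] };
type ExecutionPlan = { id: string; jobs: PlanJobSpec[] };
type RtmEvent = {
  type: "start" | "complete" | "log";
  planId: string;
  payload?: unknown;
};

interface LightningClient {
  claimAttempts(count: number): Attempt[]; // stands in for POST attempts/next
  postLog(attemptId: string, message: string): void;
  postResult(attemptId: string, result: unknown): void;
}

interface RuntimeManager {
  execute(plan: ExecutionPlan, emit: (event: RtmEvent) => void): void;
}

// One pass of the loop: claim work, convert it, execute it, report back.
function pollOnce(
  lightning: LightningClient,
  rtm: RuntimeManager,
  capacity: number
): number {
  const attempts = lightning.claimAttempts(capacity);
  for (const attempt of attempts) {
    const plan: ExecutionPlan = { id: attempt.id, jobs: attempt.jobs };
    rtm.execute(plan, (event) => {
      if (event.type === "log") {
        lightning.postLog(event.planId, String(event.payload));
      }
      if (event.type === "complete") {
        lightning.postResult(event.planId, event.payload);
      }
    });
  }
  return attempts.length;
}

// In-memory stand-ins so the sketch runs without a network.
const queue: Attempt[] = ["a1", "a2", "a3"].map((id) => ({
  id,
  jobs: [{ id: `${id}-job`, expression: "1 + 1" }],
}));
const postedLogs: string[] = [];
const postedResults = new Map<string, unknown>();

const fakeLightning: LightningClient = {
  claimAttempts: (count) => queue.splice(0, count),
  postLog: (id, message) => {
    postedLogs.push(`${id}: ${message}`);
  },
  postResult: (id, result) => {
    postedResults.set(id, result);
  },
};

const fakeRtm: RuntimeManager = {
  execute(plan, emit) {
    emit({ type: "start", planId: plan.id });
    emit({ type: "log", planId: plan.id, payload: `running ${plan.jobs.length} job(s)` });
    emit({ type: "complete", planId: plan.id, payload: { done: true } });
  },
};

const claimedCount = pollOnce(fakeLightning, fakeRtm, 2);
```

With a capacity of 2, the pass claims two of the three queued Attempts and leaves the third for the next poll.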

Runtime Manager Changes

The previous iteration of the Runtime Manager was just a proof of concept for using threads.

Frankly there's still a lot of stuff we need to prove in terms of using threads, and we may well want to set up a different engine at a later date. That's a job for later.

For now, the Runtime Manager has been updated to accept an execution plan, which it will duly run in a worker thread, publishing events as it goes.

To facilitate development, the runtime manager can accept a mock runtime worker function. This does not use @openfn/runtime, and instead will eval the first expression in the plan and return the result. This mock worker is only intended to be used at dev time, and will not execute real jobs properly. I consider this safe because the mock worker will never be used in production. Meanwhile, the ability to run real code allows us to unit test behaviours like logging, basic state management, and error handling. It allows us to "black box" runtime execution and test the behaviours around it.
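For a feel of what such a mock worker could look like, here is a minimal sketch. The names mockWorker and runMock, the plan shape and the event shape are assumptions, and the Function constructor is used here as a stand-in for eval. Like the real mock, this is strictly dev-time code and must never see untrusted input.

```typescript
// Hypothetical plan shape (an assumption for this sketch).
type ExecutionPlan = { id: string; jobs: { expression: string }[] };

type PlanEvent = {
  type: "start" | "log" | "complete" | "error";
  message?: string;
  result?: unknown;
};

// Evaluates only the first expression in the plan and returns its value.
// Dev-time only: this executes arbitrary code from the plan.
function mockWorker(plan: ExecutionPlan, log: (message: string) => void): unknown {
  const first = plan.jobs[0];
  if (!first) {
    throw new Error(`Plan ${plan.id} has no jobs`);
  }
  log(`Evaluating first expression of ${plan.id}`);
  const evaluate = new Function(`return (${first.expression});`);
  return evaluate();
}

// Wraps the worker so the behaviours around execution can be tested as a black box.
function runMock(plan: ExecutionPlan): PlanEvent[] {
  const events: PlanEvent[] = [{ type: "start" }];
  try {
    const result = mockWorker(plan, (message) => {
      events.push({ type: "log", message });
    });
    events.push({ type: "complete", result });
  } catch (e) {
    events.push({
      type: "error",
      message: e instanceof Error ? e.message : String(e),
    });
  }
  return events;
}

const okEvents = runMock({ id: "p1", jobs: [{ expression: "1 + 2" }] });
const badEvents = runMock({
  id: "p2",
  jobs: [{ expression: "(() => { throw new Error('boom'); })()" }],
});
```

The point is that logging and error handling can be asserted against the event list without involving @openfn/runtime at all.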

I've not really implemented this yet, but the RTM will accept hook functions which allow state and config to be lazily resolved from a string id. The RTM server will provide those hooks and call out to Lightning.
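As a sketch of the idea only (none of this is implemented, and the names Resolver, resolveLazily and fakeResolver are hypothetical), a hook might be a function from a string id to a promise of data, with inline values passed through untouched:

```typescript
type Data = Record<string, unknown>;

// A hook that lazily resolves a string id into data.
// In the RTM server this would presumably call out to Lightning.
type Resolver = (id: string) => Promise<Data>;

// If the value is a string, treat it as an id and resolve it via the hook.
// Otherwise assume the data was provided inline and use it directly.
async function resolveLazily(value: string | Data, resolver: Resolver): Promise<Data> {
  return typeof value === "string" ? resolver(value) : value;
}

// In-memory stand-in for Lightning so the sketch runs without a network.
const lookups: string[] = [];
const fakeStore: Record<string, Data> = {
  "state-1": { data: { count: 1 } },
};
const fakeResolver: Resolver = async (id) => {
  lookups.push(id);
  const found = fakeStore[id];
  if (!found) {
    throw new Error(`Unknown id: ${id}`);
  }
  return found;
};

const resolvedInputs = Promise.all([
  resolveLazily("state-1", fakeResolver), // looked up through the hook
  resolveLazily({ inline: true }, fakeResolver), // used as-is, no lookup
]);
```

Keeping the hook as a plain function means the RTM itself never needs to know that Lightning exists.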

RTM Server

The RTM Server's job is to create an HTTP wrapper around the runtime manager.

The RTM server is the communication bridge between Lightning and an RTM. It's fairly thin, and is responsible for:

Calling out to Lightning to ask for work
Executing Attempts
Pushing events and logs to Lightning
Posting the final result to Lightning

The RTM server, for now, exposes an HTTP endpoint that allows a workflow to be posted directly. This is essential for local and unit testing: it means we can test its behaviour standalone without needing a mock or real Lightning instance.

For now, the RTM server will post all logs directly to Lightning. Later, we'll add some kind of batch/debounce behaviour (maybe even a web socket).
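One possible shape for the later batching behaviour is sketched below. Nothing like this exists in the PR; LogBatcher, maxBatchSize and the LogItem fields are all hypothetical. It batches by size only, and a real version would presumably also call flush from a timer.

```typescript
// Hypothetical log item shape (an assumption for this sketch).
type LogItem = { attemptId: string; message: string };

// Buffers log items and sends them in arrays instead of one request per line.
class LogBatcher {
  private buffer: LogItem[] = [];
  private readonly maxBatchSize: number;
  private readonly send: (items: LogItem[]) => void;

  constructor(send: (items: LogItem[]) => void, maxBatchSize: number) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
  }

  add(item: LogItem): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Sends whatever is buffered. Call this when an attempt completes
  // (or from a timer) so trailing lines are not lost.
  flush(): void {
    if (this.buffer.length === 0) {
      return;
    }
    const items = this.buffer;
    this.buffer = [];
    this.send(items);
  }
}

const sentBatches: LogItem[][] = [];
const batcher = new LogBatcher((items) => {
  sentBatches.push(items);
}, 2);

for (let i = 1; i <= 5; i++) {
  batcher.add({ attemptId: "a1", message: `line ${i}` });
}
batcher.flush();
```

Five lines with a batch size of 2 go out as three requests rather than five.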

No auth strategy has been implemented at the moment.

The RTM and RTM server are designed so far to have zero persistence.

You can start a dev server by running pnpm start:watch from packages/rtm-server. This will recompile the server when the server OR RTM are changed. You can post an attempt directly to this server via curl:

curl -X POST http://localhost:2222/workflow -d @tmp/attempt.json -H "Content-Type: application/json"

If you've also got the Lightning server running, you can interface to it with:

pnpm start:watch -l mock

This will call out to the default mock server. Pass -l http://localhost:1234 (or whatever) if the mock is on a different port, or if you want to redirect to a different server.

Lightning API

There's a first pass of a Lightning API in packages/rtm-server/src/mock/lightning.ts. It defines a bunch of routes, and implements mock functionality for some of them.

Start up the lightning server from rtm-server with:

pnpm start:lightning

The key API endpoints are:

POST api/1/attempts/next
GET api/1/credential/:id
POST api/1/attempts/log/:id
POST api/1/attempt/complete/:id

Take a look at src/mock/lighting/api for a nice declaration of the API. Implementations can be found in ./middleware.

Note that attempts/next is a POST because it changes the server's state.

attempts/next should return one or more Attempt objects, each with a complete execution plan inside (including state and jobs at the moment). Later we might break this up, but it's quite a complicated problem on the Lightning side. It's easier just to dump all the information we need into the queue table.

Credentials should never be stored or logged in the RTM server.

The log endpoint accepts an array of log items.

Note that because the RTM and RTM server have no persistence, Lightning should maintain its own timeout tracking on all Attempts. Once an Attempt has been taken from the queue and a timeout has elapsed, Lightning should restore the Attempt to the queue, under the assumption that the RTM which claimed it is unavailable.

Results from expired/timed-out/completed events should be rejected by the Lightning server.
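The timeout and rejection rules could be modelled roughly like this. It is an in-memory sketch with hypothetical names (AttemptQueue, claimNext, requeueExpired, complete); the real bookkeeping would live on the Lightning side, most likely in the queue table. Time is passed in explicitly to keep the example deterministic.

```typescript
type Attempt = { id: string };
type Claim = { attempt: Attempt; claimedAt: number };

class AttemptQueue {
  private pending: Attempt[] = [];
  private claimed = new Map<string, Claim>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  enqueue(attempt: Attempt): void {
    this.pending.push(attempt);
  }

  // Equivalent of POST attempts/next: hands out up to `count` attempts
  // and records when each one was claimed.
  claimNext(count: number, now: number): Attempt[] {
    const batch = this.pending.splice(0, count);
    batch.forEach((attempt) => {
      this.claimed.set(attempt.id, { attempt, claimedAt: now });
    });
    return batch;
  }

  // Puts timed-out claims back on the queue, assuming the RTM that
  // claimed them is unavailable. Returns how many were restored.
  requeueExpired(now: number): number {
    let restored = 0;
    this.claimed.forEach((claim, id) => {
      if (now - claim.claimedAt >= this.timeoutMs) {
        this.claimed.delete(id);
        this.pending.push(claim.attempt);
        restored += 1;
      }
    });
    return restored;
  }

  // Accepts a result only for an attempt that is currently claimed and
  // still within its timeout; everything else is rejected.
  complete(id: string, now: number): boolean {
    const claim = this.claimed.get(id);
    if (!claim || now - claim.claimedAt >= this.timeoutMs) {
      return false;
    }
    this.claimed.delete(id);
    return true;
  }
}

const attemptQueue = new AttemptQueue(1000);
attemptQueue.enqueue({ id: "a1" });
attemptQueue.enqueue({ id: "a2" });

const firstClaim = attemptQueue.claimNext(2, 0);
const acceptedInTime = attemptQueue.complete("a1", 500); // within the timeout
const acceptedTwice = attemptQueue.complete("a1", 600); // already completed
const restoredCount = attemptQueue.requeueExpired(1500); // a2 has timed out
const acceptedLate = attemptQueue.complete("a2", 1600); // claim no longer valid
const secondClaim = attemptQueue.claimNext(1, 2000); // a2 is handed out again
```

Here the late result for a2 is rejected because the claim it belonged to has already been returned to the queue.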

Still to do

Well, everything really, but here are some major work items to give a sense of the shape of it:

Finish up the mock lightning server (we're at a good starting point now)
Finish up the mock RTM Server (also a good starting point now)
Add integration tests to show the two APIs talking to one another (and doing nothing)
Work out how to handle runtime logs
Make the existing runtime manager support Workflows (not just jobs)
Plug the existing RTM into the RTM server to actually process jobs
Write proper integration tests for the RTM server and a live RTM (but still using mock lightning)
Adaptor versions aren't properly fed through to the runtime RTM: linker is not be loading the correct adaptor version #266
No way to pass initial state through RTM Server: No way to pass initial state yet #267

josephjclark · 2023-06-09T15:31:25Z

Just to prove things sort of work at the moment:

Here is me posting a very simple attempt to a temporary lightning dev API:

Here's the log from the Lightning POV: you can see the dev request being accepted, the polling from the RTM server, and if you look carefully you can see the complete callback when the workflow is done:

And here is the log from the runtime manager server. You can see it pull the work from Lightning, process it (through a worker thread), and return the result. This will work with autoinstall btw.

josephjclark · 2023-06-09T15:52:04Z

This PR includes #262 (which may explain a couple of things in the diff)

I think this is better?

- use Lightning view of Attempt and convert it to an ExecutionPlan
- start restructuring tests to be more consistent and readable
- remove some unused stuff
- update notes with better architectural docs
- RTM has to be passed into the server now, it no longer creates its own mock

Apart from integration, which is gona have a rethink

Currently broken when executing

josephjclark · 2023-07-27T08:31:22Z

I've just rebased and pushed this. And the diff is legit!

josephjclark marked this pull request as draft March 17, 2023 17:19

taylordowns2000 mentioned this pull request Mar 24, 2023

New runtime manager service #52

Closed

josephjclark changed the title ~~Runtime Manager~~ Runtime Manager & Workflow Support Apr 5, 2023

josephjclark changed the title ~~Runtime Manager & Workflow Support~~ Runtime Manager Server & Workflow Support Apr 5, 2023

josephjclark self-assigned this Apr 5, 2023

josephjclark changed the title ~~Runtime Manager Server & Workflow Support~~ Runtime Manager Server Apr 5, 2023

josephjclark force-pushed the rtm branch from 2cb3f2d to 6e80d23 Compare May 25, 2023 17:24

josephjclark marked this pull request as ready for review June 9, 2023 15:49

josephjclark removed their assignment Jun 9, 2023

taylordowns2000 assigned stuartc Jun 13, 2023

josephjclark added 18 commits July 27, 2023 09:18

rtm-server: start a mock service

d1f52a4

rtm-server: sort of add integration tests

bc66c24

package lock

950c33a

rtm-server: String togeether a mock lightning server

427e972

rtm: big refactor of lightning api

0386110

rtm-server: Update mock implementation

44cf555

rtm: refactor lightning mock for a nicer seperation of api and logic

1cd3c6c

rtm: tweak api layout

68e47cc

rtm: hook up integration tests for glorious victory

d643fba

rtm-server: refactoring out the core worker loop

9806f57

I think this is better?

rtm-server: fix all tests

a5943af

Apart from integration, which is gona have a rethink

rtm-server: remove axios

3c2349f

rtm-server: fix integration

bcec14e

rtm-server: lightning mock must receive an rtm id

0662eb5

rtm-server: add logging support

f37959f

rtm-server: allow server to start from dev console with post api

19b23c0

runtime-manager: start moving API over to new style

b4e918a

Currently broken when executing

josephjclark added 24 commits July 27, 2023 09:26

rtm-server: flesh out lightning mock a bit

9cf78f6

rtm-server: refactor dev apis for lightning, add some docs

c036e7d

rtm: update tests

9ee910a

rtm-server: udpate readme

d534708

rtm-server: lightning api restructure

5ad2233

rtm-server: log http stuff at debug

02afdd0

rtm-server: sundry improvements, fix backoff

22152e8

rtm: load repo from env var

f5e4400

runtime-manager: get autoinstall working

d04c1e6

rtm: fix autoinstall, prefix local logs with workflowid

bb2bc1f

rtm: logging fixes

e9819dc

rtm: remoe debug log

f759648

rtm-server: convert initial state on attempt to data

2943ca8

rtm: properly map adaptor versions for the linker

706ed05

rtm: fix complete event

d3ca452

rtm: skip repo validation in unit tests

c76a579

rtm-server: update attempt data structure

0441027

rtm: another complete event fix

462e267

tweak dependencies

3372e19

rtm: update test

5906a86

rtm-server: update test

dda62a6

changesets

485905b

rtm-server: update nodemon

987596e

rtm: update readme

13d7699

josephjclark force-pushed the rtm branch from 51ff284 to 13d7699 Compare July 27, 2023 08:30

josephjclark added 2 commits August 31, 2023 14:23

Merge branch 'main' into rtm

f4e8be7

rtm: make private

588cd0d

josephjclark closed this Sep 28, 2023

josephjclark deleted the rtm branch October 18, 2024 08:59
