Runtime Manager Server #193

Closed
wants to merge 59 commits into from

Conversation

josephjclark
Collaborator

@josephjclark josephjclark commented Mar 17, 2023

This PR contains:

  • A new Runtime Manager Server (an HTTP interface between Lightning and the Runtime Manager)
  • An updated Runtime Manager with Workflow support, using workerpool
  • A mock Lightning Server with an attempts REST API.

Some Terminology

The various components here all have different ideas of what a "workflow" actually means, so unfortunately we need to be a bit careful with language:

  • A Workflow is an abstract definition of a series of jobs, stored in the Lightning database. It has triggers, jobs and edges.
  • An Attempt is a JSON serialisation of a Lightning Workflow, representing a single run of that Workflow. It has triggers, jobs and edges, as well as a result (which may well include stats like start time, duration and so on).
  • An Execution Plan is a JSON representation of a "workflow", represented as a graph of jobs with edges.

The terms Workflow and Execution Plan are often used interchangeably. The CLI and runtime technically support Execution Plans, although the word "workflow" is often used informally because it makes sense to users.
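To make the distinction concrete, here is a rough sketch of how an Attempt and an Execution Plan might be typed, and how one could be turned into the other. Every field name and the `attemptToPlan` helper are assumptions for illustration; the real shapes in this PR may well differ.

```typescript
// Hypothetical shapes: field names are illustrative, not taken from the PR's source.
interface Job {
  id: string;
  adaptor?: string;
  expression: string; // the job code, as a string
}

interface Edge {
  from: string; // id of the source trigger or job
  to: string; // id of the target job
}

interface Trigger {
  id: string;
  type: string; // e.g. "webhook" or "cron" (assumed values)
}

// An Attempt: one run of a Workflow, serialised to JSON by Lightning.
interface Attempt {
  id: string;
  triggers: Trigger[];
  jobs: Job[];
  edges: Edge[];
  result?: { startTime?: number; duration?: number; state?: unknown };
}

// An Execution Plan: a graph of jobs, where each job lists the jobs that follow it.
interface ExecutionPlan {
  id: string;
  start?: string; // id of the first job to run
  jobs: Array<Job & { next: string[] }>;
}

function attemptToPlan(attempt: Attempt): ExecutionPlan {
  const triggerIds = new Set(attempt.triggers.map((t) => t.id));
  // The first job reached from a trigger is treated as the entry point.
  const entry = attempt.edges.find((e) => triggerIds.has(e.from));
  return {
    id: attempt.id,
    start: entry?.to,
    jobs: attempt.jobs.map((job) => ({
      ...job,
      // Fold job-to-job edges into each job's `next` list.
      next: attempt.edges.filter((e) => e.from === job.id).map((e) => e.to),
    })),
  };
}
```

The point of the sketch is only that triggers disappear in the plan: the runtime sees a graph of jobs and edges, nothing more.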

General Architecture

This PR assumes (and starts building) a particular architecture:

  • Lightning users create Workflows by linking jobs and triggers together with edges.
  • When a workflow is triggered, Lightning will generate an Attempt (with a unique id) and add it to a queue
  • Lightning exposes an attempts/next API, which on POST will remove the top n attempts from the queue and send them to the caller.
  • The Runtime Manager Server will poll the attempts/next endpoint, greedily fetching as many Attempts as its configuration allows.
  • The Runtime Manager Server will convert an Attempt into an Execution Plan, and call execute on the Runtime Manager instance available to it.
  • The Runtime Manager will execute a plan in a worker thread, broadcasting events (start, complete, log)
  • The Runtime Manager Server will post results and log information back to Lightning as appropriate via HTTP.
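The claim-and-execute part of that flow can be sketched roughly as below. This is not the code in the PR: `MockAttemptQueue` stands in for Lightning's attempts/next endpoint, `RtmServerSketch` and its `capacity` setting are made-up names, and the real server talks HTTP and runs plans in worker threads rather than calling a function synchronously.

```typescript
// All names here are hypothetical; the queue stands in for Lightning's attempts/next endpoint.
type Attempt = { id: string; jobs: { id: string; expression: string }[] };
type Plan = { id: string; jobs: { id: string; expression: string }[] };

class MockAttemptQueue {
  private queue: Attempt[] = [];

  enqueue(attempt: Attempt): void {
    this.queue.push(attempt);
  }

  // Mirrors POST attempts/next: removes and returns up to `count` attempts.
  next(count: number): Attempt[] {
    return this.queue.splice(0, count);
  }
}

class RtmServerSketch {
  private active = new Set<string>();
  private lightning: MockAttemptQueue;
  private capacity: number;
  private execute: (plan: Plan) => void;

  constructor(lightning: MockAttemptQueue, capacity: number, execute: (plan: Plan) => void) {
    this.lightning = lightning;
    this.capacity = capacity;
    this.execute = execute;
  }

  // One polling tick: greedily claim as many attempts as spare capacity allows.
  poll(): number {
    const spare = this.capacity - this.active.size;
    if (spare <= 0) return 0;
    const attempts = this.lightning.next(spare);
    for (const attempt of attempts) {
      this.active.add(attempt.id);
      // Convert the Attempt into a plan and hand it to the runtime manager.
      this.execute({ id: attempt.id, jobs: attempt.jobs });
    }
    return attempts.length;
  }

  // Called when the runtime manager reports that a plan has finished.
  complete(id: string): void {
    this.active.delete(id);
  }
}
```

In a real server `poll` would presumably run on a timer, and `complete` would be wired to the Runtime Manager's complete event before the result is posted back to Lightning.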

Runtime Manager Changes

The previous iteration of the Runtime Manager was just a proof of concept for using threads.

Frankly there's still a lot of stuff we need to prove in terms of using threads, and we may well want to set up a different engine at a later date. That's a job for later.

For now, the Runtime Manager has been updated to accept an execution plan, which it will duly run in a worker thread, publishing events as it goes.

To facilitate development, the runtime manager can accept a mock runtime worker function. This does not use @openfn/runtime, and instead will eval the first expression in the plan and return the result. This mock worker is only intended to be used at dev time, and will not execute real jobs properly. I consider this safe because the mock worker will never be used in production. Meanwhile, the ability to run real code allows us to unit test behaviours like logging, basic state management, and error handling. It allows us to "black box" runtime execution and test the behaviours around it.
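As a rough illustration of the mock worker idea, the sketch below evaluates the first expression in a plan and reports what happened through an `emit` callback. The function name, the event shapes, the `error` event and the `log` helper passed to the expression are all assumptions made for this example; it also runs inline rather than in a worker thread.

```typescript
// Hypothetical sketch of a mock worker; not the PR's implementation.
type Plan = { id: string; jobs: { expression: string }[] };

type RtmEvent =
  | { type: 'start'; id: string }
  | { type: 'log'; id: string; message: string }
  | { type: 'complete'; id: string; result: unknown }
  | { type: 'error'; id: string; message: string }; // assumed: the PR only names start, complete and log

function mockExecute(plan: Plan, emit: (event: RtmEvent) => void): void {
  emit({ type: 'start', id: plan.id });
  try {
    const [first] = plan.jobs;
    if (!first) throw new Error('plan has no jobs');
    // The expression gets a `log` function so tests can exercise logging.
    const log = (message: string): void => emit({ type: 'log', id: plan.id, message });
    // Evaluate only the first expression as plain JavaScript. Dev-time only: never run untrusted code this way.
    const result: unknown = new Function('log', `return (${first.expression});`)(log);
    emit({ type: 'complete', id: plan.id, result });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    emit({ type: 'error', id: plan.id, message });
  }
}
```

A unit test can then collect the emitted events into an array and assert on their order and contents without touching @openfn/runtime.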

I've not really implemented this yet, but the RTM will accept hook functions which allow state and config to be lazily resolved from a string id. The RTM server will provide those hooks and call out to Lightning.

RTM Server

The RTM Server's job is to create an HTTP wrapper around the runtime manager.

The RTM server is the communication bridge between Lightning and an RTM. It's fairly thin, and is responsible for:

  • Calling out to Lightning to ask for work
  • Executing Attempts
  • Pushing events and logs to Lightning
  • Posting the final result to Lightning

The RTM server, for now, exposes an HTTP endpoint that allows a workflow to be posted directly. This is essential for local and unit testing: it means we can test its behaviour standalone, without needing a mock or real Lightning instance.

For now, the RTM server will post all logs directly to Lightning. Later, we'll add some kind of batch/debounce behaviour (maybe even a web socket).

No auth strategy has been implemented at the moment.

The RTM and RTM server are designed so far to have zero persistence.

You can start a dev server by running pnpm start:watch from packages/rtm-server. This will recompile the server when the server OR RTM are changed. You can post an attempt directly to this server via curl:

curl -X POST http://localhost:2222/workflow -d @tmp/attempt.json -H "Content-Type: application/json"

If you've also got the Lightning server running, you can interface to it with:

pnpm start:watch -l mock

This will call out to the default mock server. Pass -l http://localhost:1234 or whatever if the mock is on a different port (or you want to redirect to a different server).

Lightning API

There's a first pass of a Lightning API in packages/rtm-server/src/mock/lightning.ts. It defines a bunch of routes, and implements mock functionality for some of them.

Start up the lightning server from rtm-server with:

pnpm start:lightning

The key API endpoints are:

POST api/1/attempts/next
GET api/1/credential/:id
POST api/1/attempts/log/:id
POST api/1/attempt/complete/:id

Take a look at src/mock/lighting/api for a nice declaration of the API. Implementations can be found in ./middleware.

Note that attempts/next is a POST because it changes the server's state.

Attempts should return one or more Attempt objects, with a complete execution plan inside (including state and jobs at the moment). Later we might break this up - but it's quite a complicated problem on the Lightning side. Easier just to dump all the information we need into the queue table.

Credentials should never be stored or logged in the RTM server.

The log endpoint accepts an array of log items.

Note that because the RTM and RTM server have no persistence, Lightning should maintain its own timeout tracking on all Attempts. Once an Attempt has been taken from the queue and a timeout has elapsed without a result, Lightning should restore the Attempt to the queue, under the assumption that the RTM which claimed it is unavailable.

Results from expired/timed-out/completed events should be rejected by the Lightning server.
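A minimal in-memory sketch of that bookkeeping on the Lightning side might look like this. The `AttemptTracker` class, its method names and the millisecond timeout are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to test.

```typescript
// Hypothetical in-memory model of Lightning's claim tracking; names are illustrative.
type Attempt = { id: string };

class AttemptTracker {
  private queue: Attempt[] = [];
  private claimed = new Map<string, { attempt: Attempt; claimedAt: number }>();
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  enqueue(attempt: Attempt): void {
    this.queue.push(attempt);
  }

  // attempts/next: remove up to `count` attempts and record when they were claimed.
  claim(count: number, now: number): Attempt[] {
    const taken = this.queue.splice(0, count);
    for (const attempt of taken) {
      this.claimed.set(attempt.id, { attempt, claimedAt: now });
    }
    return taken;
  }

  // Restore attempts whose claim has timed out, assuming the claiming RTM is gone.
  requeueExpired(now: number): number {
    let restored = 0;
    this.claimed.forEach((claim, id) => {
      if (now - claim.claimedAt >= this.timeoutMs) {
        this.claimed.delete(id);
        this.queue.push(claim.attempt);
        restored += 1;
      }
    });
    return restored;
  }

  // Accept a result only while the attempt is still claimed; otherwise reject it.
  complete(id: string): boolean {
    return this.claimed.delete(id);
  }
}
```

Note that this sketch keys claims by attempt id only, so a late result from the original RTM would be accepted if another RTM has since re-claimed the same attempt. Telling the two apart would need something like a per-claim token, which is not covered here.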

Still to do

Well everything really, but here are some major work items to give a sense of the shape of it

  • Finish up the mock lightning server (we're at a good starting point now)
  • Finish up the mock RTM Server (also a good starting point now)
  • Add integration tests to show the two APIs talking to one another (and doing nothing)
  • Work out how to handle runtime logs
  • Make the existing runtime manager support Workflows (not just jobs)
  • Plug the existing RTM into the RTM server to actually process jobs
  • Write proper integration tests for the RTM server and a live RTM (but still using mock lightning)
  • Adaptor versions aren't properly fed through to the runtime RTM: linker is not be loading the correct adaptor version #266
  • No way to pass initial state through RTM Server: No way to pass initial state yet #267

@josephjclark josephjclark marked this pull request as draft March 17, 2023 17:19
@josephjclark josephjclark changed the title from "Runtime Manager" to "Runtime Manager & Workflow Support" Apr 5, 2023
@josephjclark josephjclark changed the title from "Runtime Manager & Workflow Support" to "Runtime Manager Server & Workflow Support" Apr 5, 2023
@josephjclark josephjclark self-assigned this Apr 5, 2023
@josephjclark josephjclark changed the title from "Runtime Manager Server & Workflow Support" to "Runtime Manager Server" Apr 5, 2023
@josephjclark
Collaborator Author

josephjclark commented Jun 9, 2023

Just to prove things sort of work at the moment:

Here is me posting a very simple attempt to a temporary lightning dev API:

image

Here's the log from the Lightning POV: you can see the dev request being accepted, the polling from the RTM server, and if you look carefully you can see the complete callback when the workflow is done:

image

And here is the log from the runtime manager server. You can see it pull the work from Lightning, process it (through a worker thread), and return the result. This will work with autoinstall btw.

image

@josephjclark josephjclark marked this pull request as ready for review June 9, 2023 15:49
@josephjclark
Collaborator Author

This PR includes #262 (which may explain a couple of things in the diff)

@josephjclark josephjclark removed their assignment Jun 9, 2023
@josephjclark
Collaborator Author

josephjclark commented Jul 27, 2023

I've just rebased and pushed this. And the diff is legit!
