Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime and Compiler v2 #11

Merged
merged 263 commits into from
Nov 9, 2022
Merged

Runtime and Compiler v2 #11

merged 263 commits into from
Nov 9, 2022

Conversation

josephjclark
Copy link
Collaborator

@josephjclark josephjclark commented Aug 24, 2022

The next generation runtime engine.

To make it so:

npm install -g @openfn/cli
openfn --help
openfn --test

Merge Checklist

What do we need to do before merging this to main?

  • Update Lightning to use the new describe-package (instead of compiler)
  • Check the workflow-diagram tests and un-skip (I need help with this!)
  • Port describe-package tests over to ava ? Nah
  • Tests passing
  • Update the top level readme
  • Work out what to do with the top level package.json and examples folder
  • Spin out unfinished items below into new issues (anything spun out has been struckthrough here)

Likely blockers with real-world jobs

  • Immutable state (we should make state mutable for now, or at least hide immutability behind a flag)

Runtime

Create a new runtime which accepts a job as a pipeline of functions (as a string representation of an esm module which exports an array of functions), and runs them through the execute reducer.

import { readFile } from 'node:fs/promises';
import run from '@openfn/runtime';

const job = await readFile('expression.js', 'utf8'); // compiled expression .ejs
const initialState = {};
const { data } = await run(source, initialState);

The job is loaded into a sandbox environment using experimental node:vm module loaders. This is more to load the source properly and control the execution environment, it's not a security thing. Seems to work great.

The runtime manager may want to run the whole runtime in a vm2 context for added security.

  • Load the job ESM code as an actual executable module (and run it)
  • Allow environment globals like console to be overridden inside the job
  • Don't import execute, use a native implementation in this package (even if it's identical)
  • Automated Logging and log level controls
  • run should return a promise which is also an event emitter
  • Emit events as operations start and return (current status: running operation 4: fetch from salesforce)
  • State is mutable by default, but can be made immutable if a flag is passed

Compiler

The new compiler will be quite different from the old because it won't need to do stuff like inject the execute reducer or ensure there's an environment for language adaptors. The runtime handles all that stuff now.

I am still using recast, acorn and ast-types to do the heavy lifting (although tbh I'm not sure why we're favouring acorn over esprima!)

  • Set up some patterns for visitors and unit tests
  • Create a transformer to ensure a valid export default []
  • Create a transformer to move top-level call expressions into the default exports
  • Get the top-level compile function working and tested (with examples)
  • Create a validator to catch naughties in code (like errant import/export statements)
  • Provide a validation API with error reporting (closely related to the validator I hope!)
  • Compile from typescript (!)
  • Add wrappers for global state - State globals in the new runtime #17
  • Add imports for a given adaptor

Devtools / CLI

Probably installed as a global to work on adaptors, or run straight out of kit when working on the runtime and compiler

  • Write a new devtools CLI which loads a jobfile and executes it with the runtime
  • Write output to disk
  • Use strong conventions to run from a single folder, but provide overrides
  • Ensure the runtime module loader knows where to find modules (OPENFN_MODULE_HOME)
  • Allow individual module (adaptor) paths to be passed in
  • Run in a nested process so that consumers don't have to pass in experimental flags
  • create a bin stub
  • publish to npm as a pre-built package (Stu did this)
  • Add compile only task (not really happy about how this is set up tbh)
  • Use peer dependencies rather than building stuff in

Runtime Manager

The runtime manager /service provides an API to run jobs in worker threads.

There's an API to support and report on thread management, and there's a little web server which stays alive and receives post requests to run jobs. The server#'s just a demo client of the manager API really (but it'll be a fun toy to build out!).

  • Use a worker pool to execute jobs in thread
  • Accept jobs as strings and compile them (with a cache)
  • Track which threads are running which jobs, plus some usage stats
  • Work out a way to unit test the thread pool
    (remaining steps moved to New runtime manager service #52)

Ava Notes

I've disabled type checking in ava's unit tests because:

  1. It slows them down
  2. There's absolutely no reason to type check here
  3. If ava does encounter a type error anywhere in its test or source compilation, it'll hang and timeout without reporting anything (I have lost hours to this).
  4. I for one am happy to relax typing rules while developing against tests (it's often useful to have unreferenced variables and imports, for example!)

Type errors will still get caught by the build process in CI. We could consider adding a test:tsc script as well if we want seperate CLI/local test runner for typings. At the moment I have no appetite for this.

Also worth nothing that ava works nicely in monorepos - just drop a config in the top level and ava will find it from down in the packages. Now it's easy for all packages to share a standard test definition. Neat!

Naming Jobs

When running a job, we ideally want to give it a useful name. Probably found in the preceeding comments. But how will the runtime know the name?

I think we need like an Operation function, which explicitly declares an operation with a name and a handler.

export default [
  Operation("do the thing", fn(() => { ... })
]

Alternatively we could ad arguments to every adaptor function to do this stuff.

Breaking changes

the state global

I want to remove the state global. The currnet core.execute function loads state into the 'global' context. This doesn't work very well with my new vision of "immutable" state, where we clone the state object and pass it into each operation.

I would like to:

  • Remove global state from the runtime environment
  • In the compiler, wrap every operation in a (state) => x wrapper. Maybe only if a flag is provided, maybe only if we detect a global state reference inside that operation.

This means old code can be made to work if it's compiled properly, while new code can be encouraged to adopt better practice.

General TODOs

  • Think a bit more about source maps. Runtime needs to be able report errors against source map positions. Source maps could be inline, disk space isn't a priority.
  • Create a logger service (TODO create an issue to track this)
  • Module resolution #19

@josephjclark josephjclark marked this pull request as draft August 24, 2022 15:41
Not really happy about this but at the moment it's needed for unit tests. vm.SourceTextModule doesn't seem to be available from inside the ava worker
We can use this in unit tests. Instead of calling out to the actual runtime (which throws errors reading vm.SourceTextModule, something  complicated with the --experimental-vm-modules flag not getting passed to the ava thread), we create a worker which calls our simple mock function. All the worker lifecycle stuff is abstracted into a helper function which is used equally by the actual and mock workers - which gives us a really realistic mokc simulation.
@josephjclark

This comment was marked as resolved.

@stuartc
Copy link
Member

stuartc commented Sep 9, 2022

@josephjclark I just went through this PR, this is really great - it's feeling like a very solid foundation.

The CLI appears to expect a folder? Or a file called job.js, for a first pass would it not be easier/clearer to expect the exact file instead of a path that contains files with a naming pattern?

TODO what if stdout and output path are set?

I think in this case, then console.log and other things are redirected to STDERR. And the resulting state is sent to STDOUT and a file. It's a good question because what I just proposed is a pretty niche interface.

Perhaps we make the option mutually exclusive for now, you pick just one.
But this does lead me to think we might be better off logging to STDERR if you have --stdout enabled.

Security thoughts: the process inherits the node command arguments
(it has to for experimental modules to work)
Is this a concern? If secrets are passed in they could be visible
The sandbox should help

I'm wasn't immediately concerned about this, like I couldn't think of what might be passed in. I think what is super important is code accessing process, including importing it from node:process, we're far more concerned with someone accessing process.env or adding a callback to process.on.

We will need to test how the sandboxing can deal with this. I think that's the only handle we really have, I don't know how we could 'knee-cap' the worker without making life very hard for everyone - we should focus on the sandboxing and test that.

As an aside, it wouldn't be unreasonable to pass the process a 'cookie' (Erlang terminology) or key/secret used to trust other nodes in the cluster. Basically I do think there could/would be a requirement to have workers carry some kind of temporary and constant trust mechanism for communicating with the backend cluster.

You'll have to walk me through that module resolution issue you outlined above, I'd like to understand it better.

@josephjclark
Copy link
Collaborator Author

So I've been through and updated everything that I think needs doing. The examples now work, the readme is updated.

We do need to update Lightning to use the new describe-package, as after merging we can't update the old package (at least not easily). But it should be OK to merge before we do this. Perhaps it needs an issue over on the Lightning repo.

@stuartc I think you need to give this a quick once-over - check the top level readme and package json, plus check my previous comment re describe-package tests. It's probably fine and we could merge and fix later.

Subject to that, I think we can merge...

@josephjclark josephjclark marked this pull request as ready for review November 8, 2022 16:54
@stuartc stuartc merged commit 6e52d10 into main Nov 9, 2022
@stuartc stuartc deleted the v2 branch November 9, 2022 10:02
@stuartc stuartc removed their request for review November 9, 2022 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants