
Runtime: support private state keys #1253

@josephjclark

Description

Users often have difficulty working with large state objects returned by their jobs. It's fairly common to write large objects to state that are used throughout the workflow but aren't really relevant as the output of any given step.

For example: a workflow downloads 10 KB of mapping data from collections in step 1, and re-uses those mappings in steps 3, 5 and 9. State is the only way to share that information between steps. But when viewing the output dataclips from different steps, the user doesn't want to see the mappings on state.

One solution is to have private state keys. A private state key is preserved by the runtime but not returned as output from steps, so it's essentially invisible to Lightning, the worker and the CLI.

In the runtime, we need a way to strip the private keys when we emit and return step state, while still sending the unmodified state object into the next step. In other words, we need to differentiate internal state (passed between steps) from external state (emitted and returned).

We could try something fancy like proxy properties, or just iterate over the top-level keys while cloning and remove the private ones. I'd prefer something simple.
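A minimal sketch of the "simple" option, assuming the underscore convention from the list below and a linear list of steps (the real runtime executes a graph of steps, so this is only illustrative). `stripPrivateKeys`, `runSteps` and `emit` are hypothetical names, not existing runtime APIs. The point is that the next step always receives the full internal state, while anything emitted or returned goes through the redacted copy.

```typescript
type State = Record<string, unknown>;
type Step = (state: State) => State;

// Assumed convention for this sketch: top-level keys starting with "_" are private.
const isPrivateKey = (key: string): boolean => key.startsWith("_");

// Shallow clone with private top-level keys removed. Nested values are shared
// by reference, which is fine if the result is only serialized or logged.
function stripPrivateKeys(state: State): State {
  const external: State = {};
  for (const [key, value] of Object.entries(state)) {
    if (!isPrivateKey(key)) external[key] = value;
  }
  return external;
}

// Hypothetical runtime loop: each step receives the full internal state from
// the previous step, while `emit` (and the final return value) only ever see
// the redacted external copy.
function runSteps(
  steps: Step[],
  initial: State,
  emit: (stepIndex: number, output: State) => void,
): State {
  let internal = initial;
  steps.forEach((step, index) => {
    internal = step(internal);
    emit(index, stripPrivateKeys(internal));
  });
  return stripPrivateKeys(internal);
}
```

With a first step that sets `_mappings` and a later step that reads it, the later step still sees the mappings, but neither emitted dataclip contains them. Only top-level keys are checked, which keeps the cost to one pass over the keys per step.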

There are a couple of ways we could handle private state keys:

  • Any key starting with an underscore would be treated as private. So if you do state._mappings = {}, that will be redacted from the dataclip but sent downstream internally.
  • Have a special state key called private or something. So state.private.mappings = {}. Tidy, but a little lumpy at the same time.
  • Use special compiler syntax, like state.#mappings = {}, which sort of reflects private class properties. Would rather keep the compiler out of it, tbh.
  • Use an adaptor function like markPrivate(key) to flag a key as private. This might do something complicated with proxies.
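To compare the second and fourth options, here is a rough sketch of each. Every name is hypothetical, and the `markPrivate` variant deliberately avoids proxies: it records private key names under a symbol-keyed property. It assumes state is passed between steps in memory, because object spread (`{ ...state }`) copies symbol-keyed properties while `JSON.stringify` ignores them. If the runtime serializes state between steps, that registry would be lost.

```typescript
type State = Record<PropertyKey, unknown>;

// Option 2 (reserved namespace): drop the single top-level `private` key.
function redactPrivateNamespace(state: State): State {
  const { private: _hidden, ...external } = state;
  return external;
}

// Option 4 (markPrivate, no proxies): keep a registry of private key names
// under a symbol, so it follows state through spreads but never shows in JSON.
const PRIVATE_KEYS = Symbol("privateKeys");

function markPrivate(state: State, key: string): State {
  const existing = (state[PRIVATE_KEYS] as string[] | undefined) ?? [];
  return { ...state, [PRIVATE_KEYS]: [...existing, key] };
}

// Object.entries skips symbol keys, so the registry itself is dropped too.
function redactMarked(state: State): State {
  const hidden = new Set((state[PRIVATE_KEYS] as string[] | undefined) ?? []);
  return Object.fromEntries(
    Object.entries(state).filter(([key]) => !hidden.has(key)),
  );
}
```

The namespace version is a single destructure to redact, at the cost of an extra level of nesting in job code. The `markPrivate` version keeps keys where they are but depends on how state moves between steps.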

We should inform the user somehow that private keys are being hidden. We could log a debug message at the end of each step, or maybe just return { _mappings: "[private]" }. Not sure yet, but we should make it clear to users that a key is on state but being hidden from them.
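Both ideas can be sketched in a few lines, again assuming the underscore convention. `redactWithPlaceholder` and `describeHidden` are hypothetical helpers: the first replaces each private value with a "[private]" marker so the key stays visible, and the second builds the kind of one-line notice a debug log could print at the end of a step.

```typescript
type State = Record<string, unknown>;

interface Redacted {
  output: State; // what would be emitted/returned as the dataclip
  hiddenKeys: string[]; // which keys were redacted
}

// Keep private keys on the output, but swap their values for a marker.
function redactWithPlaceholder(state: State): Redacted {
  const output: State = {};
  const hiddenKeys: string[] = [];
  for (const [key, value] of Object.entries(state)) {
    if (key.startsWith("_")) {
      output[key] = "[private]";
      hiddenKeys.push(key);
    } else {
      output[key] = value;
    }
  }
  return { output, hiddenKeys };
}

// Debug-log alternative: a message listing hidden keys, or nothing if none.
function describeHidden(hiddenKeys: string[]): string | undefined {
  return hiddenKeys.length > 0
    ? `Private state keys hidden from output: ${hiddenKeys.join(", ")}`
    : undefined;
}
```

For `{ data: [1, 2], _mappings: { a: "b" } }` the output would be `{ data: [1, 2], _mappings: "[private]" }`, so the user can see that the key exists without seeing its contents.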

You could also use these internal/private state keys for sensitive data. And adaptors could use private keys to track state but hide it from the user (we've largely killed off that pattern but this would give us the opportunity to restore it, should the need arise).

This probably isn't suitable for configuration, as that should never be sent to the next step. References would be a good candidate, though.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Project status: DevX Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions