## Small self-contained ideas
- Implement the remainder of the choice-rule predicates. Currently only string predicates are handled, because we access the `s` attribute of the second argument.
- Implement the remaining predicate combinator.
- Allow list indexing in the main function. Currently only dictionary lookup works, but it should be easy enough to also allow things like indexing into a list.
- Could handle a bare `return` by converting it to `return None` and handling it via the existing `return` machinery.
- More-helpful exceptions if `RetrySpecIR` is given a node of the wrong form; test for these situations.
- More-helpful exceptions if `Catcher` is given a node of the wrong form; test for these situations.
- For `TryIR`, check that the body is a single assignment. If not, could maybe convert to a `Parallel` with just one strand, then extract the single result?
- Notice and collapse `else` chains into one `Choice` state.
- Check that local definitions used for `Parallel` states have no args.
- Check for unused or undefined branches of a `Choice`. Allow branches of a `Parallel` state to have `Catch` clauses; in Python, the latter would mean allowing `try`/`except` around the parallel block.
- Proper nested scopes for local variables of the functions used for `Parallel` states.
- Provide defaults for retry-spec tuples, in line with the JSON-level defaults.
- Allow keyword arguments in a function call.
- Handle an `if` with no `else`. Will be mildly fiddly because our concept of connecting up the 'next state' can't currently reach inside the `Choice` state to set its `Default` field. Could possibly replace `exit_states` with a collection of closures which know how to set the correct field of the correct object? Alternatively, always create an `else` branch at the State Machine level, consisting of a single no-op `Pass` state.
- Validate the final state machine; e.g., there should be no unexpected states with unfilled 'next state' slots.
- Better and more thorough error-handling throughout, including more-helpful error messages when requirements are not met.
- Tools to automatically deploy the Step Function and Lambda.
- Detection of tests like `if x == 'JPEG'`, and conversion into equivalent use of the `PSF` string predicates.
- Avoid having to ferry the entire state back and forth to the Lambda machinery when only the function args and its return value actually need to be communicated.
- Support the `Wait` state. Could be as simple as noticing a magic function `PSF.Wait(...)`. Or could translate Python `time.sleep(...)` calls.
- Special entry state to extract fields from the input corresponding to function parameter names, and create the initial local-variable bindings.
- Allow use of functions called only for side-effect. (I.e., a bare call rather than just `x = bar(y)`.)
- Provide `requirements.txt` etc. for these tools.
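One bullet above calls for validating the final state machine. A minimal sketch of such a check, assuming the machine is held as an Amazon-States-Language-style dict; the helper name `find_dangling_states` is invented for illustration:

```python
# Sketch: find states whose 'next state' slot was never filled in.
# Assumes an Amazon-States-Language-style dict; not part of the existing tool.
TERMINAL_TYPES = {"Succeed", "Fail"}

def find_dangling_states(machine):
    """Return names of states with neither a Next, an End, nor a terminal type."""
    dangling = []
    for name, state in machine["States"].items():
        if state["Type"] in TERMINAL_TYPES:
            continue
        if state["Type"] == "Choice":
            # A Choice needs every rule's Next filled in (and ideally a Default).
            targets = [rule.get("Next") for rule in state.get("Choices", [])]
            if None in targets or not targets:
                dangling.append(name)
            continue
        if "Next" not in state and not state.get("End"):
            dangling.append(name)
    return dangling

machine = {
    "StartAt": "GetFormat",
    "States": {
        "GetFormat": {"Type": "Task", "Next": "Done"},
        "Oops": {"Type": "Task"},   # forgot to fill in Next/End
        "Done": {"Type": "Succeed"},
    },
}
```

Running `find_dangling_states(machine)` on the example flags only `"Oops"`.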
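Another bullet suggests providing defaults for retry-spec tuples in line with the JSON level. The States Language specifies defaults of 1 for `IntervalSeconds`, 3 for `MaxAttempts`, and 2.0 for `BackoffRate`; a sketch, with the function name and calling convention assumed for illustration:

```python
# Sketch: fill in States-Language defaults for a partially-specified retry
# spec.  The function name and keyword layout are assumptions, not the
# tool's actual interface.
RETRY_DEFAULTS = {"IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2.0}

def retry_spec_with_defaults(error_equals, **overrides):
    """Build a full Retry entry, filling unspecified fields with ASL defaults."""
    unknown = set(overrides) - set(RETRY_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown retry fields: {unknown}")
    return {"ErrorEquals": list(error_equals), **RETRY_DEFAULTS, **overrides}
```

For example, `retry_spec_with_defaults(["States.Timeout"], MaxAttempts=5)` keeps the default interval and backoff while overriding the attempt count.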
## Higher-level research avenues
### Automatic parallelisation

Automatic deduction, based on data-flow, of which operations are independent and could be gathered into a `Parallel` state. Some care is needed because there might be hidden dependencies: one function invocation might have some side-effect that the next computation relies on, without this being explicit in the data-flow through variables. E.g., in the snippet

```python
c = foo(a)
b = bar(a)
```

it seems that `bar(a)` can proceed independently, in parallel, but perhaps `bar(a)` relies on some global state which `foo(a)` establishes, like a change to a shared database.
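A first cut at the data-flow side could lean on Python's `ast` module: two assignments are candidates for one `Parallel` state when neither writes a name the other reads or writes. This sketch deliberately ignores the hidden-dependency problem just described, and all names are illustrative:

```python
import ast

def reads_and_writes(stmt):
    """Names read and written by a single `ast.Assign` statement.

    Function names count as reads, which is conservative but harmless here.
    """
    writes = {n.id for t in stmt.targets
              for n in ast.walk(t) if isinstance(n, ast.Name)}
    reads = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
    return reads, writes

def independent(stmt_a, stmt_b):
    """True if neither statement touches a name the other writes."""
    reads_a, writes_a = reads_and_writes(stmt_a)
    reads_b, writes_b = reads_and_writes(stmt_b)
    return not (writes_a & (reads_b | writes_b) or writes_b & reads_a)

snippet = ast.parse("c = foo(a)\nb = bar(a)\nd = baz(c)").body
```

On the snippet above, the first two assignments come out independent, while the third depends on the first through `c`.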
### Directly interpret Python
The state-machine runtime could effectively perform the compilation work itself, directly understanding Python.
### Richer data types

Currently everything has to be a JSON-friendly object (number, string, list/array, dictionary). Passing objects more complex than this around between invocations of Lambda functions would need some thought. For some situations it might be enough to add type annotations to the various functions and insert serialisation and deserialisation code into the 'wrapper'. For big objects, some level of indirection to/from an external object store (e.g., S3 buckets) might be required.
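As a sketch of the 'insert serialisation code into the wrapper' idea: the inner function works with a rich type, while the wrapper speaks JSON at the boundary. The decorator, class, and function names here are all invented for illustration:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Image:
    format: str
    width: int
    height: int

def json_boundary(fn):
    """Wrap fn so it accepts and returns JSON strings (sketch only).

    In the real tool this conversion could be driven by type annotations.
    """
    def wrapper(payload):
        result = fn(Image(**json.loads(payload)))
        return json.dumps(asdict(result))
    return wrapper

@json_boundary
def shrink(img):
    # The 'business logic' sees a rich Image object, not a dict.
    return Image(img.format, img.width // 2, img.height // 2)
```

The Lambda machinery would only ever see the JSON-friendly `wrapper`.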
### Extract in-line computation from top-level function
If some part of the top-level computation is a short sequence of statements, it would be convenient for the translation tool to do the equivalent of an 'extract method' refactoring, make the extracted code accessible via the Lambda, and replace it with a call.
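A rough, source-level illustration of what such an 'extract method' step might produce. The helper name is hypothetical, and it assumes the inputs and output of the snippet are already known from data-flow analysis rather than computed here:

```python
import textwrap

def extract_method(snippet, name, inputs, output):
    """Turn a statement sequence into a function definition plus a call site.

    Sketch only: `inputs` and `output` are assumed to come from a separate
    data-flow analysis.
    """
    params = ", ".join(inputs)
    body = textwrap.indent(textwrap.dedent(snippet).strip(), "    ")
    defn = f"def {name}({params}):\n{body}\n    return {output}\n"
    call = f"{output} = {name}({params})"
    return defn, call

defn, call = extract_method("t = x + y\nz = t * 2", "helper_0", ["x", "y"], "z")
```

Here `defn` would become a Lambda-accessible function and `call` would replace the original statements in the top-level code.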
### Loops

It is entirely possible to have loops in a Step Function state machine. On the Python level, it would be convenient to be able to use the familiar

```python
for x in xs:
    do_stuff(x)
```

syntax. Creation of a state-machine graph containing a cycle would be fairly straightforward, but we would also require a JSON-friendly iterator. Initially, loops could be restricted to iterating over lists or dicts.
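The 'JSON-friendly iterator' could be as simple as a dict holding the sequence and a cursor; one pass of the compiled loop is then a Choice-style test plus the loop body as a Task. A sketch with invented names, where an ordinary `while` loop stands in for the state-machine cycle:

```python
def loop_enter(xs):
    """Initial loop state: JSON-friendly, so it can ride along in the SF state."""
    return {"items": xs, "index": 0}

def loop_step(state, body):
    """One trip round the cycle: the Choice test plus the body as a Task."""
    if state["index"] < len(state["items"]):        # Choice: any items left?
        body(state["items"][state["index"]])        # Task: do_stuff(x)
        return {"items": state["items"], "index": state["index"] + 1}
    return None                                     # fall through to loop exit

seen = []
state = loop_enter(["a", "b"])
while state is not None:                            # stands in for the cycle
    state = loop_step(state, seen.append)
```

Because the whole iterator is a plain dict, it survives the round-trip through a Lambda invocation unchanged.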
### Automatic dependency bundling

It is likely that any non-trivial application of these ideas would involve the use of external libraries. It would be convenient if the tool could automatically detect dependencies, and bundle them up into the zip-file used to create the Lambda. A solution requiring something along the lines of a `requirements.txt` might be a reasonable halfway house. Could be combined with the 'deployment tools' thought, looking at things like Zappa, and most likely GitHub in particular.
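A minimal sketch of the final bundling step, which just zips up a source tree; real dependency detection (for instance `pip download` driven by a `requirements.txt`) would sit in front of this. The function name is illustrative:

```python
import pathlib
import zipfile

def bundle_for_lambda(src_dir, zip_path):
    """Zip every .py file under src_dir, preserving relative paths.

    Sketch only: a real version would also pull in third-party packages.
    """
    src = pathlib.Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*.py")):
            zf.write(path, path.relative_to(src))
    return zip_path
```

The resulting zip is in the layout expected for a Lambda deployment package (handler at the archive root).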
### Conversion of unit tests
The original Python code presumably has unit tests. It would be convenient if these could be converted into tests which performed invocations of the Step Function.
### Optimise resulting state machine
AWS bills Step Function invocations according to how many transitions the state machine makes. Could the structure be optimised so as to minimise the expected number of transitions, under appropriate assumptions for what the input is likely to look like?
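To make 'expected number of transitions' concrete, assuming per-branch probabilities are available for each `Choice`: the representation below is a simplified stand-in for the States Language, not the tool's actual IR, and it assumes an acyclic machine.

```python
def expected_transitions(machine, name, branch_probs):
    """Expected number of transitions from state `name` to a terminal state.

    `machine` maps state-name -> simplified state dict; `branch_probs` maps a
    Choice state's name to {next-state-name: probability}.  Assumes no cycles.
    """
    state = machine[name]
    if state["Type"] in ("Succeed", "Fail") or state.get("End"):
        return 0
    if state["Type"] == "Choice":
        return 1 + sum(p * expected_transitions(machine, nxt, branch_probs)
                       for nxt, p in branch_probs[name].items())
    return 1 + expected_transitions(machine, state["Next"], branch_probs)

machine = {
    "CheckFormat": {"Type": "Choice"},
    "HandleJpeg": {"Type": "Task", "Next": "Done"},
    "HandleOther": {"Type": "Task", "Next": "PostProcess"},
    "PostProcess": {"Type": "Task", "Next": "Done"},
    "Done": {"Type": "Succeed"},
}
probs = {"CheckFormat": {"HandleJpeg": 0.9, "HandleOther": 0.1}}
```

An optimiser could then compare candidate structures by this measure, under assumed input distributions.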
### Points arising

The following questions are not strictly within the scope of a tool which translates Python code to a Step Function, but arose while doing the work:
Reconsider the current design whereby the branches of a `Parallel` state are self-contained state machines. It seems like it would be possible to have each branch consist just of an entry-point state-name, within the same top-level collection of states. This could simplify the task of translating programming languages to state machines, and (as a small side-benefit) allow re-use of states between top-level execution and `Parallel` branches.
### Automatically learn appropriate retry specifications
Could the programmer be freed from having to specify appropriate retry specifications? If all Lambda invocations were pure (or, more weakly, 'idempotent' might be enough), then the runtime could gather statistics on each Lambda's reliability, how failures tend to be clustered temporally, etc., and deduce suitable retry specs.
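Under a (too-)simple assumption that failures are independent, the deduction could fit a per-Lambda failure rate and solve for the retry count; the function name and defaults here are illustrative:

```python
import math

def suggest_retry_spec(observed_failure_rate, target_success=0.999,
                       interval_seconds=1, backoff_rate=2.0):
    """Pick MaxAttempts so overall success probability reaches target_success.

    Sketch only: assumes independent failures, whereas real failures tend to
    cluster temporally (as noted above).
    """
    # Smallest number of tries n with failure_rate**n <= 1 - target_success.
    tries = math.ceil(math.log(1 - target_success)
                      / math.log(observed_failure_rate))
    return {
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max(tries - 1, 0),   # ASL counts retries, not tries
        "BackoffRate": backoff_rate,
    }
```

For instance, a Lambda observed to fail half the time needs six retries to reach 99% overall success under this model.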
### Similar tooling for Apache Airflow
Airflow allows the construction of workflow DAGs in code. There might be scope to write a compiler, similar to the one in this repo, to extract the data-flow from Python code and convert it into an Airflow DAG.
This document: Copyright 2018 Ben North; licensed under CC BY-SA 4.0. See the file `COPYING` for full licensing details.