This is related to #2943. Not a fix, but a better and cheaper intermediate solution.
Sometimes when something goes wrong in a run, key events like step:start and step:complete (I think literally just those two? Or anything but a log?) will fail.
Right now, as soon as a step:complete fails, the run will become Lost, because all subsequent events will be rejected by Lightning.
If we set aside the complexity of idempotency, of retries, of strictness, and of why a particular event was rejected, the wider story is that as soon as a step:complete is rejected, Lightning must mark the run crashed (or killed?) rather than lost.
Because Lost means "we don't know what happened to this run". But that's inaccurate - we know exactly what happened to the Run - Lightning rejected its events.
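
A rough sketch of the rule being proposed, just to make it concrete. All names here (`RunState`, `RunEvent`, `stateAfterRejection`) are hypothetical and do not reflect Lightning's actual code or schema, and it picks crashed only for illustration, since crashed vs killed is still an open question:

```typescript
// Hypothetical types for illustration only; not Lightning's real schema.
type RunState = "started" | "success" | "crashed" | "lost";

interface RunEvent {
  name: string; // e.g. "step:start", "step:complete", "run:log"
}

// Proposed rule: a rejected step:complete is a known failure, so the run
// is marked crashed right away instead of being left to end up lost.
function stateAfterRejection(current: RunState, rejected: RunEvent): RunState {
  // A run that already has a final state is left alone.
  if (current !== "started") return current;

  // We know what happened: Lightning rejected the event.
  if (rejected.name === "step:complete") return "crashed";

  // Other rejected events are out of scope for this sketch.
  return current;
}
```

Whether step:start (or any non-log event) should get the same treatment is left open here, as above.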