Fix error handling in pure runtimes #130

kokobd · 2023-11-13T02:21:20Z

There were two problems with pure runtimes:

When returning Left from fallibleRuntime, Left appears literally in the response, and the lambda function exit successfully
If the input can't be parsed from json, the error won't be caught properly. The error message will only appear in CloudWatch logs, but not reported back to AWS Lambda. Thus, it's hard for caller to determine what error happened. Fixes withInfalllibleParse crashes runtime on parse failure #108

Screenshot for the two problems, respectively:

Reasons for these problems:

We didn't handle Either in falliableRuntimeWithContext

fallibleRuntimeWithContext :: ToJSON result =>
  (LambdaContext -> Value -> Either String result) -> IO ()
fallibleRuntimeWithContext = mRuntimeWithContext . fmap (fmap pure)

pureRuntimeWithContext :: ToJSON result =>
  (LambdaContext -> Value -> result) -> IO ()
pureRuntimeWithContext = mRuntimeWithContext . fmap (fmap pure)

Either String result wasn't handled, and it was treated as a ToJSON value as a whole.

In withInfallibleParse fn = either error fn . parseEither parseJSON, the Left side of Either was thrown by error in a pure context. When we later call try, the value is still lazy and possibly not evaluated.

This PR addresses both problems.

JackKelly-Bellroy

Looks promising. I don't quite see how this PR addresses the laziness fix you described in point №2, can you please spell that out for me? I would've expected to see some kind of force-ish function call in runtimeLoop.

Can you please also add a changelog entry?

src/AWS/Lambda/Context.hs

src/AWS/Lambda/Runtime/Value.hs

src/AWS/Lambda/Combinators.hs

kokobd · 2023-11-13T14:13:36Z

I don't quite see how this PR addresses the laziness fix you described in point №2. can you please spell that out for me? I would've expected to see some kind of force-ish function call in runtimeLoop.

There was no NFData constraint, so adding that would break backward-compatibility. The key is this line fallibleRuntimeWithContext fn = mRuntimeWithContext (\lc -> either error pure . fn lc). Here the type of error is String -> IO result. So the error is thrown whenever the IO action runs. I really want to create a custom exception type UserError (given that we are reporing the type of error as User to AWS), and use throwIO to replace error everywhere.

Can you please also add a changelog entry?

I'll.

IamfromSpace · 2023-11-22T19:38:51Z

Sorry for a slow response to this one, I was traveling and then came back to quite a bit.

This does look pretty appealing! You're right that it seems quite odd to include Left in the logs literally. And it is better to crash the runtime so that AWS recycles the container, since an error indicates that we may now be in an undefined state (though, this is dramatically less likely for pure runtimes, since IO likely isn't happening, maybe even not one?). I was just espousing the benefits of this approach, so I feel a bit silly for having missed it here.

Before the final go-ahead, I'm curious to discuss if this behavioral change would be noticeable to any possible consumers. If anyone was attempting to parse the error from their logs, their parsers would break. If there was anyone that consistently saw errors in a pure runtime, they might now see a performance hit as their containers get trashed and restarted vs reused. And I suppose, it is maybe worth considering that for pure runtimes, re-use may be preferable, just by virtue of being pure, so the container should still be in a valid state to keep processing new requests. Handling invalid JSON is definitely seems like a net improvement.

If any of these did seem like it could affect consumers, we could consider a breaking change, which could released right away for a case like this.

Curious to hear everyone's perspectives here.

JackKelly-Bellroy · 2023-11-23T05:29:05Z

We've only ever used mRuntimeWithContext in our stuff, and this PR came about because we tried to use pureRuntime and it behaved in surprising ways. Do you have a way to survey other hal-users?

kokobd · 2023-11-24T00:44:16Z

if this behavioral change would be noticeable to any possible consumers

This is a breaking change. The output of the lambda function was wrapped in {"Right": xxx} or {"Left": xxx} before for falliableRuntime, but the extra layer of object is removed in this PR.

they might now see a performance hit as their containers get trashed and restarted vs reused.

If they prefer not reporting errors to AWS, they can use pureRuntime and return Either e a.

JackKelly-Bellroy · 2023-12-15T04:07:20Z

Hi @IamfromSpace, is there anything we can do to unstick this?

IamfromSpace · 2023-12-15T05:30:37Z

Hey, absolutely, sorry I’ve been swamped as of late. But I can dedicate tomorrow to getting this out, and it’s well over due!

My plan is to flip the Aeson flag to false, and release another 1.0 in a way that’s seamless for the current stack LTS, but lets cabal and other users opt in easily. Then we’ll put this on top as a 1.1, with that flag set to true, so we’re in a good spot for the next LTS release.

Let me know if there’s any concerns with this plan, otherwise I’ll stick to it!

JackKelly-Bellroy · 2023-12-15T08:02:53Z

You might not need the flag any more. There's now an empty release of the compat package https://hackage.haskell.org/package/attoparsec-aeson-2.1.0.0 which depends on aeson >=1.4.1.- && <2.2. If that's in the stackage snapshots you might be able to get away with an unconditional dependency.

JackKelly-Bellroy · 2023-12-15T08:03:21Z

And thank you for coming back to look at this, especially so close to the end of the year.

IamfromSpace · 2023-12-15T21:12:23Z

You might not need the flag any more. There's now an empty release of the compat package https://hackage.haskell.org/package/attoparsec-aeson-2.1.0.0 which depends on aeson >=1.4.1.- && <2.2. If that's in the stackage snapshots you might be able to get away with an unconditional dependency.

Yeah, this absolutely works as an alternate approach, and it's what was implemented first. Ultimately, in trying the flag based approach, I just liked the result better than using the compatibility package. It over all seemed like more things "just worked" via this route. Long term, I also think flags may be a good way to handle both back porting changes while allowing trunk-based development. At some point I'll experiment more with this, where the current code base actually serves all current major releases, but flags set the correct interfaces or behaviors across them.

Ultimately, I think flags just seemed like the more flexible and sustainable pattern. Next time something like this happens, there's no need to wait on compatibility package release!

IamfromSpace · 2023-12-16T02:14:51Z

Okay, hey all, I've got the main blocker out of the way, and I've spent a good while making sure I really understand what's going with this change. I still want to tinker a bit more, and I'm going to prioritize getting this closed out over the next couple days.

So far my thinking on this has two angles:

I think Runtime.Value.fallibleRuntime* is just totally broken. It type checks because ToJson a => Either String a is also a valid ToJson instance, where instead of just a being passed into Runitime.Value.mRuntime* it's the whole Either. So not only would { "Left": ... } be appearing in the error, but { "Right": ... } would appear in the success case. It type checked, but just ultimately doesn't make sense. There may be some users out there who went, "hey, that's odd, it's all wrapped in a Right" and then just unwrapped it. So we would want to release a 1.1 for this change. Overall, this fix is a definite improvement. I'm a bit embarrassed to have let this out!
I'm less convinced on how to handle bad parsing. It is something that I've always felt is a bit awkward, but that there's no great way to do it regardless. The Value runtime was introduced to help here, where if parsing could go wrong, you would use that, and the other runtime was meant for cases where it just wasn't possible. I do think it's a bit silly to crash the container over a parser error, and it may make the most sense to introduce a Parse and Execution data type, as mentioned. I'm hesitant to have the different runtimes (within the same module) have different meta-behaviors. If mRuntime crashes on a bad parse, it seems like pureRuntime should too. But likely, neither of them should.

My feeling is that these two things should be separated. Since they both introduce a break, it might be worthwhile to try to release them at the same time though.

tl;dr: I think fallible runtime shouldn't be wrapped in an Either, but that it should crash--unless none of the runtimes crash.

Any additional thoughts or consideration folks want to throw out here?

kokobd · 2023-12-16T13:18:57Z

Re 2:

mRuntime doesn't crash the runtime on parsing in most cases.

I think it's probably because the input value is consumed in an effectful way, thus, try (fn (either error id eCtx) event) is able to catch that error even though it was thrown in a pure function.

tl;dr: I think fallible runtime shouldn't be wrapped in an Either, but that it should crash--unless none of the runtimes crash.

Having the behavior (crashing vs reporting user error) depends on lazy evaluation is a bug IMO. We should probably use withFallibleParse :: FromJSON a => (a -> b) -> Value -> Either String b everywhere we used withInfallibleParse :: FromJSON a => (a -> b) -> Value -> b.

Even if we do want the runtime to crash, we should explicitly do that (by throwing an exception that we don't catch intentially) - instead of relying on lazy evaluation.

kokobd · 2023-12-16T13:22:19Z

My feeling is that these two things should be separated. Since they both introduce a break, it might be worthwhile to try to release them at the same time though.

I'll create another PR.

IamfromSpace · 2023-12-17T01:37:45Z

Okay awesome. I think we can get the fallible changes through right away if you just want to update this one to just do that.

IamfromSpace · 2023-12-17T19:58:47Z

mRuntime doesn't crash the runtime on parsing in most cases.

Okay, I think I finally get it; sorry, I feel like I've been a bit dense on this one! I was mixing up errors, crashing the runtime, and container discarding. If an error is reported without exit, the process is reused. If the process exits, the process is restarted. In both cases the container is reused, and there's simply no way to force it to be discarded (to prevent use of an undefined or corrupted environment).

Personally, I think safer options (discarding the container and/or restarting the process) are better, but that's simply not how lambda containers work. They prioritize speed, and know that most applications simply won't corrupt the environment--and if you make your lambdas stateful, then you just have to be careful.

So, I totally agree then, fallible and pure runtimes should not crash the runtime environment--which is consistent with the other runtimes that parse. They don't crash, they simply report the error and keep on running. Which, for parsing, is definitely preferred anyway, we know the environment is fine to re-use.

Also agreed that we want to eliminate the possibility of laziness here for the parsing error all together. Good catch on how that interplay works.

I'm going to play with this PR as is, but I'm leaning towards now it just being the right approach. Sorry for the back and forth, I just really want to make sure I've understood the complete picture!

src/AWS/Lambda/Runtime.hs

CHANGELOG.md

IamfromSpace · 2023-12-18T17:41:41Z

Appreciate the patience on this one. At this point, once the merge conflict is resolved, this one is good to go for me.

kokobd · 2023-12-18T23:44:52Z

At this point, once the merge conflict is resolved, this one is good to go for me.

Thanks, if there is no need to split this into two PRs.

~~I'll test it in our private repo and if it works, we can merge this.~~ The latest commit still works.

IamfromSpace · 2023-12-19T00:48:56Z

Thanks, if there is no need to split this into two PRs.

Yeah, no need. I originally just wanted to make sure that we were getting resolved stuff through while giving more time for discussion to open topics. But since these are ready and reasonably related, I think we're better off just going for it.

IamfromSpace · 2023-12-20T03:51:19Z

This went out yesterday as 1.1 🎉 Thanks much for this, this is a definite improvement.

I’d also like to explore longer term approaches for wringing out laziness so that these kinds of things can’t manifest in user code either. We don’t want things to be lazy across executions. I’ll open an issue on the topic.

kokobd requested a review from IamfromSpace as a code owner November 13, 2023 02:21

fix error handling in pure runtimes

f13ea2a

kokobd force-pushed the fix-pure-error branch from 5ab8844 to f13ea2a Compare November 13, 2023 02:22

JackKelly-Bellroy reviewed Nov 13, 2023

View reviewed changes

rename withInfallibleParseEither

15b1e70

kokobd added 2 commits November 23, 2023 20:21

revert unnecessary changes

8ac1e24

add changelog entry

2d57afa

kokobd force-pushed the fix-pure-error branch from 8ed6a17 to 2d57afa Compare November 24, 2023 01:26

IamfromSpace reviewed Dec 18, 2023

View reviewed changes

src/AWS/Lambda/Runtime.hs Outdated Show resolved Hide resolved

IamfromSpace reviewed Dec 18, 2023

View reviewed changes

CHANGELOG.md Show resolved Hide resolved

kokobd added 4 commits December 18, 2023 18:35

simplify pureRuntimeWithContext impl

4a03ab1

update changelog

7ec0fa8

Merge remote-tracking branch 'origin/master' into fix-pure-error

9aad4cb

update changelog

8029c0e

kokobd force-pushed the fix-pure-error branch from 3555877 to 8029c0e Compare December 18, 2023 23:38

IamfromSpace merged commit 8ec1a2b into Nike-Inc:master Dec 19, 2023
16 checks passed

IamfromSpace mentioned this pull request Dec 20, 2023

Eliminate Possible Laziness Between Executions #136

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix error handling in pure runtimes #130

Fix error handling in pure runtimes #130

kokobd commented Nov 13, 2023 •

edited

JackKelly-Bellroy left a comment

kokobd commented Nov 13, 2023

IamfromSpace commented Nov 22, 2023

JackKelly-Bellroy commented Nov 23, 2023

kokobd commented Nov 24, 2023

JackKelly-Bellroy commented Dec 15, 2023

IamfromSpace commented Dec 15, 2023

JackKelly-Bellroy commented Dec 15, 2023

JackKelly-Bellroy commented Dec 15, 2023

IamfromSpace commented Dec 15, 2023

IamfromSpace commented Dec 16, 2023

kokobd commented Dec 16, 2023 •

edited

kokobd commented Dec 16, 2023

IamfromSpace commented Dec 17, 2023 •

edited

IamfromSpace commented Dec 17, 2023

IamfromSpace commented Dec 18, 2023

kokobd commented Dec 18, 2023 •

edited

IamfromSpace commented Dec 19, 2023

IamfromSpace commented Dec 20, 2023

Fix error handling in pure runtimes #130

Fix error handling in pure runtimes #130

Conversation

kokobd commented Nov 13, 2023 • edited

JackKelly-Bellroy left a comment

Choose a reason for hiding this comment

kokobd commented Nov 13, 2023

IamfromSpace commented Nov 22, 2023

JackKelly-Bellroy commented Nov 23, 2023

kokobd commented Nov 24, 2023

JackKelly-Bellroy commented Dec 15, 2023

IamfromSpace commented Dec 15, 2023

JackKelly-Bellroy commented Dec 15, 2023

JackKelly-Bellroy commented Dec 15, 2023

IamfromSpace commented Dec 15, 2023

IamfromSpace commented Dec 16, 2023

kokobd commented Dec 16, 2023 • edited

kokobd commented Dec 16, 2023

IamfromSpace commented Dec 17, 2023 • edited

IamfromSpace commented Dec 17, 2023

IamfromSpace commented Dec 18, 2023

kokobd commented Dec 18, 2023 • edited

IamfromSpace commented Dec 19, 2023

IamfromSpace commented Dec 20, 2023

kokobd commented Nov 13, 2023 •

edited

kokobd commented Dec 16, 2023 •

edited

IamfromSpace commented Dec 17, 2023 •

edited

kokobd commented Dec 18, 2023 •

edited