Blazor worker "memory access out of bounds" bug #4059
I've now gotten the analysis request payload that causes the issue. FWIW this request seems to include a lot of stuff that it doesn't need to perform the analysis (though I haven't verified this). I'm seeing three paths forward:
Currently experimenting to see what the char threshold of failure is.
One more path forward: if the request is huge, ask the backend instead of wasm.
Note: I can replicate this in Release AOT builds, but not Debug builds.
Seems I can't - i.e. the bug seems AOT-specific.
Do you know what function this is happening in? Perhaps we're able to separate reading the JSON from doing the analysis in some way. Maybe we parse the JSON in JS code and then call wasm using
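A minimal sketch of the "parse in JS first" idea, assuming a hypothetical `analyzeParsed` entry point into wasm and made-up request field names (`handler`, `traceID`) - none of these are real APIs in this repo:

```javascript
// Sketch: the worker parses the incoming JSON itself, so the wasm side
// never has to deserialize the raw string. `analyzeParsed` is an assumed
// exported entry point, not a real API in this codebase.
function handleMessage(rawJson, analyzeParsed) {
  let request;
  try {
    // Parsing in JS keeps the huge string out of the wasm heap entirely.
    request = JSON.parse(rawJson);
  } catch (e) {
    return { error: `invalid JSON: ${e.message}` };
  }
  // Hand only the fields the analysis actually needs across the boundary.
  return analyzeParsed({ handler: request.handler, traceID: request.traceID });
}
```

Whether this helps depends on where the failure actually occurs, which is exactly what's being investigated below.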
Maybe I'm answering the wrong question, but: once we've defined calling
I don't know what you mean by this. Could you please spell it out?
Spelt out below, but I think the fact that it happens before OnMessage might scupper what I was suggesting. Currently we do:
Could we instead do:
That's a good idea.
TIL "scupper." Yeah, I don't think that'll work (because we don't get to OnMessage). So, seems we have two options:
I'm currently trying to see if anything in https://github.com/Tewr/BlazorFileReader/blob/main/src/Blazor.FileReader/Tewr.Blazor.FileReader.md could help us out. Or maybe there's something official. It's hard not to be distracted by all of the view/Razor-specific stuff in this Blazor world.
Yeah, either of those sounds good. I like the ArrayBuffer/file approach, but does that also have size limits? Are we sure this is the problem, though? 76k isn't small, but it's also not huge. The Stack Overflow thread says the limit is over 50MB now.
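For reference, a minimal sketch of the ArrayBuffer approach: posting a large payload to a worker as a transferable buffer instead of a string. Transferring moves the buffer rather than structured-cloning it. All names here are illustrative, not this repo's actual code:

```javascript
// Sketch: send a large JSON payload to a worker as a transferable
// ArrayBuffer. Transferring moves the buffer instead of copying it,
// avoiding the structured-clone cost for big messages.
function encodeForTransfer(json) {
  // TextEncoder produces a Uint8Array; its .buffer can be transferred.
  return new TextEncoder().encode(json).buffer;
}

function decodeTransferred(buffer) {
  return new TextDecoder().decode(new Uint8Array(buffer));
}

// Posting side (worker API assumed available):
//   const buf = encodeForTransfer(bigJsonString);
//   worker.postMessage(buf, [buf]); // second arg marks it as transferred
// Receiving side:
//   self.onmessage = (e) => { const json = decodeTransferred(e.data); /* ... */ };
```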
I'm now a bit less confident. I added this to Wasm.fs:
and posted the giant json to that handler rather than the OnMessage one. It works? Digging in.. |
I was wrong before! We are getting to within OnMessage. We're even deserializing OK. The failure seems to be happening within the actual analysis. Added some logs; waiting for another AOT build - seems we're closer. |
Even more interesting: with my code like this:
I got "performed analysis" and it dies right after, without ever emitting |
Well that's optimistic at least! |
So, I got it to sort-of work here's some code (with minor simplifications):
Right now, it successfully runs the full deserialization, analysis, serialization of result, and post-back, and then fails with the memory exception (after js gets the response) I think the key thing here is that the |
Not sure the exact reason this code gets further, but definitely feels relates to either GC and/or tasks |
Curiously, if I
Seems to indicate that this is a matter of GC-ing. Some notes about this test:
May be totally irrelevant to the serialization, fwiw. Maybe we're not GCing something correctly in our code?
I'd like that solution, but some bad news: if I update
So, it seems that the issue is GC-ing (or something) the un-serialized result, and not the later stuff. For full clarity, this fails:
Edit: let me make sure these results are right. Don't believe this quite yet.
Given ^, thinking how to proceed... we can get the analysis results, but upon exiting the task (due to GC, or something else), we get the error. And this is unrelated to serializing those results or posting them back to JS - we get the error when those actions aren't present. So, the strategy from #4008 seems irrelevant here, for now.
Note: if I...
We don't reach "manually gc'd". I'm going to look at various things like (this isn't something I've played with before - just an idea; I'm in foreign .NET territory)
I queried various methods on
This is comparing "cold start" (first analysis request after page load) to "cold start". Edit: added some commas to make it easier to read/compare. I also queried a few other things on GC, but found them to be useless. I haven't actually studied these numbers yet - just collected. Not sure if they're even useful.
Quick thought: maybe we're GCing the analysis engine or something, while we could keep it running eternally instead? Not sure if that makes any sense :)
I wonder if I can replicate this in Debug mode, with a manual
Edit: nope, it didn't fail. Still not sure if GC is relevant.
I'm going to try experimenting with "how much of the analysis engine is being thrown away each time, and can we reduce that?" For example,
is currently run/loaded (and then thrown away) every request, but doesn't have to be. Edit: hits a dead end fast.
A fix has been found (will link PR shortly). |
Unfortunately, while #4070 got past that error, another related one occurs shortly after:
I believe STJ has a default recursion depth, maybe 250. You can change it.
We already have
In case it doesn't respect MaxValue, I'll set it to some reasonably large value smaller than that.
I set the STJ MaxDepth to 1024, and it didn't change anything. Now failing with only a 6,740-character payload - much less than our original payload. Almost there... I've just now made my feedback loop much tighter, so the rest of the progress should be faster. (edit: maybe. Now I'm manually manipulating ASTs, so it's tricky :) )
Here's the current state of a payload that deserializes OK in Debug or non-AOT Release builds, but not AOT builds (in which it fails with
This started off as user data 😅 Really seems related to depth, but I haven't quite figured out how.
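Not from the thread, but a small helper along these lines can measure how deeply a payload actually nests, to compare against a serializer depth limit like System.Text.Json's `MaxDepth`:

```javascript
// Sketch: measure the maximum nesting depth of a parsed JSON value,
// to compare against a serializer depth limit (e.g. STJ's MaxDepth).
function jsonDepth(value) {
  // Primitives contribute no nesting.
  if (value === null || typeof value !== "object") return 0;
  const children = Array.isArray(value) ? value : Object.values(value);
  let max = 0;
  for (const child of children) {
    max = Math.max(max, jsonDepth(child));
  }
  return max + 1; // count this object/array level itself
}
```

Usage would be something like `jsonDepth(JSON.parse(payload))` against the failing 6,740-character payload.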
When I see this AST, I interpret it as:
and then I'm not sure how to read it. Certainly there's at least one pipe involved, but I'm not sure how to grok the fact that there are... 3 pipes but only one target? I realize that the AST looks a bit absurd as-is, but: Debug mode is OK deserializing it, and the AOT build isn't. I'm trying to figure out why that is. @pbiggar any ideas?
To be clear, if I remove any
I'm not sure if this is purely a matter of depth (I'll check by replacing a few layers with more
Edit: replacing them with many lets serializes OK
I'm thinking of what could cause memory errors in this code. Is there an assertion hit by weird pipes, perhaps? Or an array index?
What about the pipes? Does removing them work?
Aha! I can remove one of them and still get the error, but if I remove 2 (leaving only 1), no error! (verifying just to be sure I didn't mess up the AST) Edit: yes, I think that's it.
Still baffling, but I think we can create a reproducible build pretty easily now, anyway.
...why do we need 16
putting a
Wait, but the problem is just in deserialization - we're not running the interpreter at all. Unless the interpreter is somehow called during deserialization. The OnMessage at this point is just:
Ah. JsonFSharpConverter?
Good hunch - no error if the serializer doesn't register that converter. Something to follow up on tomorrow.
I vaguely recall some flag existing where certain packages could be excluded from AOT compilation. Something to look into.
Also, a rebuild of FSharp.SystemTextJson on a newer .NET version may help: https://github.com/Tarmil/FSharp.SystemTextJson/blob/4e4560619b1902eda3749432e8f8447c0c8ed9a1/global.json#L3
Maybe report the issue to FSTJ to see if they've any idea. They might be motivated to help.
Will do, once I have a minimal repro.
Maybe the easiest option: ship a Debug compilation or a non-AOT Release compilation of FSharp.SystemTextJson rather than the AOT Release one, if that truly is the problem. (if I can't find the flag I recall around excluding some packages from AOT)
Seems to happen only with very large requests.
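Not from the thread, but one way to automate shrinking a failing payload toward a minimal repro is a crude delta-debugging pass. This sketch works over a flat array for simplicity; `fails` is an assumed predicate wrapping the real repro (e.g. "post this payload to the AOT build and see if it crashes"):

```javascript
// Sketch: greedily shrink an array of payload pieces while a caller-supplied
// predicate still reports failure. `fails(candidate)` is assumed to run the
// real repro and return true if the bug still occurs with that candidate.
function shrinkArray(arr, fails) {
  let current = arr;
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < current.length; i++) {
      // Try dropping element i; keep the drop if the bug still reproduces.
      const candidate = current.slice(0, i).concat(current.slice(i + 1));
      if (fails(candidate)) {
        current = candidate;
        changed = true;
        break; // restart scanning from the smaller input
      }
    }
  }
  return current;
}
```

The same idea extends to object keys and nested subtrees; for an AST like the one above, dropping child nodes one at a time would have found the "needs at least 2 pipes" condition mechanically.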