feat(*) Lua bridge refactor and dispatch fixes #546
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #546 | +/- |
|---|---|---|---|
| Coverage | 90.05848% | 90.36951% | +0.31103% |
| Files | 47 | 47 | |
| Lines | 10089 | 10311 | +222 |
| Hits | 9086 | 9318 | +232 |
| Misses | 1003 | 993 | -10 |

Flags with carried forward coverage won't be shown.
@thibaultcha I've just tested it and it's still crashing. Running it enough times I do get segfaults, but every run is reporting memory access errors consistently. These are the logs from a Gateway run with Valgrind. This particular run didn't crash, but Valgrind reported reads on data that was freed by This was running a datakit configuration with 2 parallel calls that fail (I didn't spin up the other localhost upstream).
@hishamhm What is the specific reproducible example for this? It does not trigger in the test given with this PR, and seems to trigger when
But it does not trigger any memory issues.
@thibaultcha Sure, let me try to get the same filter crash without the gateway.
Took me a while but I got a consistent repro running without the gateway: This branch/commit was intended only for sharing the test case: I think we can eventually isolate the offending proxy-wasm calls and produce a hostcalls-based testcase, but to go one step at a time I wanted to get the error reproduced without the Gateway on Datakit first. Looking at the logs, I think the triggering condition is to dispatch a call, then in the same handler (in this case, on_request_headers), trigger a local response, before the dispatch callback gets a chance to run.
@hishamhm Thanks for the test! This turned out to be yet another thing!
@hishamhm Would you give it another try with the current state of this branch? Except for a small failure in dynamic builds that I'm still investigating, it all runs green, so I hope it fixes everything now.
Ok, I fixed the last problem, but @flrgh found another problem in the Kong PR, which I am also looking at now.
@flrgh Ok, the latest state of this branch should also take care of the Gateway issue. I have the
I'll give this branch a spin later today!
I have found one more bug that I am trying to get rid of before merging this.
Major refactor of the Lua bridge to support multiple concurrent yielding Lua threads. The old implementation would break down when scheduling more than one yielding Lua thread at a time.

The new implementation "tricks" OpenResty by scheduling uthreads via C and passing these threads to the OpenResty runloop as if they were created from Lua (via `ngx.thread`). Because all uthreads must resume their "parent thread" when finished (as per OpenResty's implementation), we schedule a stub "entry thread" whenever we are trying to use the Lua bridge. This entry thread itself does nothing and is collected at request pool cleanup.

List of significant changes for this refactor:

- **Breaking:** the `proxy_wasm.start()` FFI function is **removed**. Only `proxy_wasm.attach()` is now necessary, and the filter chain is only resumed once the ngx_http_wasm_module `rewrite` or `access` phases are entered. Prior, `proxy_wasm.start()` would resume the filter chain during the ngx_http_lua_module phase handlers, which was incompatible with Lua threads yielding.
- The `ngx.semaphore` API can be used in the Lua bridge. The default Lua resolver now has synchronization enabled.
- In ngx_wasm_socket_tcp, the `sock->env` member is now a pointer to the request's `env` instead of a copy so as to manipulate the `env->state` control variable.
- The `wasm_call` directive can now yield, which allows for sanity testing of the Lua bridge yielding functionality.
- A new `rctx->resume_handler` pointer holds the resume entry point back from yielding facilities into `ngx_http_core_run_phases`. For now, only the Lua bridge uses it, but other yielding facilities should be refactored to use it so as to factorize our resuming code.

Fix #524
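The `sock->env` change in the list above can be illustrated with a reduced sketch. All names below (`demo_env_t`, `demo_sock_copy_t`, `demo_sock_ptr_t` and the helper functions) are hypothetical stand-ins and are not taken from ngx_wasm_module. The sketch only shows why a socket holding a copy of `env` cannot update a control variable that the request also reads, while a socket holding a pointer can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real ngx_wasm_module types differ. */
typedef enum {
    DEMO_STATE_CONTINUE = 0,
    DEMO_STATE_YIELD
} demo_state_e;

typedef struct {
    demo_state_e  state;    /* control variable owned by the request */
} demo_env_t;

/* Copy style: the socket owns a private copy of the env. */
typedef struct {
    demo_env_t    env;
} demo_sock_copy_t;

/* Pointer style: the socket refers to the request's env. */
typedef struct {
    demo_env_t   *env;
} demo_sock_ptr_t;

static void
demo_sock_copy_init(demo_sock_copy_t *sock, const demo_env_t *req_env)
{
    assert(sock != NULL && req_env != NULL);
    sock->env = *req_env;           /* snapshot, detached from the request */
}

static void
demo_sock_ptr_init(demo_sock_ptr_t *sock, demo_env_t *req_env)
{
    assert(sock != NULL && req_env != NULL);
    sock->env = req_env;            /* shared with the request */
}

/* Writes only to the socket's private copy. */
static void
demo_sock_copy_yield(demo_sock_copy_t *sock)
{
    assert(sock != NULL);
    sock->env.state = DEMO_STATE_YIELD;
}

/* Writes through to the request's env. */
static void
demo_sock_ptr_yield(demo_sock_ptr_t *sock)
{
    assert(sock != NULL && sock->env != NULL);
    sock->env->state = DEMO_STATE_YIELD;
}
```

With the copy, a state change made by the socket never reaches the request's `env`; with the pointer, it does, which is the property the commit message relies on for `env->state`.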
Originally this fixes an issue described in #545, in which a failing dispatch call may cause a pending call to segfault when it is resumed. To fix this issue, this commit updates how dispatch calls are treated and makes the execution continue when they fail, including the Lua bridge resolver (very tricky).
Delete the posted event for dispatches that were sent in the latest Proxy-Wasm step.
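As a rough illustration of what deleting posted events could look like, here is a minimal sketch. The `demo_call_t` type and both functions are hypothetical and do not reflect the actual ngx_wasm_module data structures; the condition under which the real code performs the deletion is not stated in the commit message and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a dispatch call; not the ngx_wasm_module type. */
typedef struct {
    unsigned  step;     /* Proxy-Wasm step in which the call was sent */
    int       posted;   /* non-zero while an event is posted for it */
    int       fired;    /* non-zero once its event handler has run */
} demo_call_t;

/* Clear the posted event of every call sent in `latest_step`.
 * Returns the number of events deleted. */
static size_t
demo_delete_posted(demo_call_t *calls, size_t n, unsigned latest_step)
{
    size_t  i, deleted = 0;

    assert(calls != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (calls[i].posted && calls[i].step == latest_step) {
            calls[i].posted = 0;
            deleted++;
        }
    }

    return deleted;
}

/* Run the handlers of all events that are still posted.
 * Returns the number of handlers that ran. */
static size_t
demo_process_posted(demo_call_t *calls, size_t n)
{
    size_t  i, ran = 0;

    assert(calls != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (calls[i].posted) {
            calls[i].posted = 0;
            calls[i].fired = 1;
            ran++;
        }
    }

    return ran;
}
```

In this model, a call whose event was deleted never has its handler invoked later, so the handler cannot touch state that was released in the meantime.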
The invocation of `coctx->cleanup` previously introduced allows removing this custom handling of pending sleep timers added to the Lua bridge. Kept as a commit for future reference.
Major refactor of the Lua bridge to support multiple concurrent yielding Lua threads, and refactor of dispatch call failures to continue executing the request. Replaces #539, #545, #523.

The new implementation "tricks" OpenResty by scheduling uthreads via C and passing these threads to the OpenResty runloop as if they were created from Lua (via `ngx.thread`). Because all uthreads must resume their "parent thread" when finished (as per OpenResty's implementation), we schedule a stub "entry thread" whenever we are trying to use the Lua bridge. This entry thread itself does nothing and is collected at request pool cleanup.

List of significant changes for this refactor:

- **Breaking:** the `proxy_wasm.start()` FFI function is removed. Only `proxy_wasm.attach()` is now necessary, and the filter chain is only resumed once the ngx_http_wasm_module `rewrite` or `access` phases are entered. Prior, `proxy_wasm.start()` would resume the filter chain during the ngx_http_lua_module phase handlers, which was incompatible with Lua threads yielding.
- The `ngx.semaphore` API can be used in the Lua bridge. The default Lua resolver now has synchronization enabled.
- In ngx_wasm_socket_tcp, the `sock->env` member is now a pointer to the request's `env` instead of a copy so as to manipulate the `env->state` control variable.
- The `wasm_call` directive can now yield, which allows for sanity testing of the Lua bridge yielding functionality.
- A new `rctx->resume_handler` pointer holds the resume entry point back from yielding facilities into `ngx_http_core_run_phases`. For now, only the Lua bridge uses it, but other yielding facilities should be refactored to use it so as to factorize our resuming code.

Fix #524
Fix #528
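
The role of a resume entry point such as `rctx->resume_handler` can be sketched with a small self-contained model. Everything here is an assumption made for illustration: `demo_rctx_t`, `demo_run_phases`, `demo_facility_done`, the phase count, and the rule that phase 1 yields while operations are pending are all hypothetical and are not the ngx_wasm_module or Nginx implementation. The sketch only shows a yielding facility re-entering the phase loop through a single function pointer stored on the request context.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OK       0
#define DEMO_AGAIN    1     /* a phase yielded; resume later */
#define DEMO_NPHASES  3

typedef struct demo_rctx_s  demo_rctx_t;
typedef int (*demo_resume_handler_pt)(demo_rctx_t *rctx);

/* Hypothetical request context; field names are illustrative only. */
struct demo_rctx_s {
    int                     phase;           /* next phase to run */
    int                     pending;         /* outstanding yielding operations */
    demo_resume_handler_pt  resume_handler;  /* entry point used to resume */
};

/* Run phases in order. In this model, phase 1 yields for as long as
 * at least one operation is pending. */
static int
demo_run_phases(demo_rctx_t *rctx)
{
    assert(rctx != NULL);

    while (rctx->phase < DEMO_NPHASES) {
        if (rctx->phase == 1 && rctx->pending > 0) {
            return DEMO_AGAIN;
        }

        rctx->phase++;
    }

    return DEMO_OK;
}

/* Called by a yielding facility when one of its operations completes.
 * It does not know about phases; it only calls the resume handler. */
static int
demo_facility_done(demo_rctx_t *rctx)
{
    assert(rctx != NULL && rctx->resume_handler != NULL);
    assert(rctx->pending > 0);

    rctx->pending--;

    return rctx->resume_handler(rctx);
}
```

Because the facility only goes through the stored pointer, several yielding facilities could share the same resuming path, which is the factorization the description suggests for future refactors.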