scheduler: avoid redundant scheduler wake on process handoff#2329
Draft
pguyot wants to merge 8 commits into
Draft
scheduler: avoid redundant scheduler wake on process handoff#2329pguyot wants to merge 8 commits into
pguyot wants to merge 8 commits into
Conversation
This comment has been minimized.
This comment has been minimized.
ced6470 to
f2718d3
Compare
This comment has been minimized.
This comment has been minimized.
1d6b7b2 to
89566e2
Compare
This is especially useful with valgrind and on CI Signed-off-by: Paul Guyot <pguyot@kallisys.net>
https://ampcode.com/threads/T-019cb8b8-9e4c-7316-9566-c7e3f5f2b6db Fix a use-after-free race condition in the generic_unix socket driver's close handler, detected by Valgrind during CI gen_tcp tests. The close handler in socket_consume_mailbox used a two-phase locking pattern: it acquired the glb->listeners lock to NULL-out the socket_data listener pointers, released it, then called sys_unregister_listener (which re-acquires the lock) to remove the listener from the linked list. Between the unlock and re-lock, the event loop thread could also unlink the same listener node via process_listener_handler after the callback returned NULL. The subsequent list_remove in sys_unregister_listener then operated on stale prev/next pointers, corrupting the list or writing to freed memory. The fix makes the pointer detach and list unlink atomic under a single lock hold by introducing sys_unregister_listener_nolock — a variant that assumes the caller already holds the glb->listeners write lock. The close handler now NULLs the pointers, unlinks the listeners, and releases the lock before freeing the memory. This pattern is specific to generic_unix; ESP32 and RP2 use a single global listener for the socket driver subsystem and are not affected. Signed-off-by: Peter M <petermm@gmail.com>
That commit fixed the close-time double-unlink/use-after-free race in
generic_unix listener teardown. This change addresses a separate race in
listener registration, where a listener could become visible to the
event loop before socket_data published the corresponding pointer. Both
fixes are needed; this patch complements the earlier teardown fix rather
than replacing it.
Fix a race in the generic_unix socket driver where newly created
listeners were registered in the global listener list before the
corresponding socket_data->{active,passive}_listener pointer was
published.
If the event loop processed the listener in that window, the callback
could consume, free, or replace the listener before the socket driver
stored the pointer. The later assignment then left socket_data
pointing at stale listener memory, which could surface as random hangs
or corruption in gen_tcp tests, including timeouts waiting for the
server helper process to start.
Publish the listener pointer before calling sys_register_listener
in all affected paths:
active UDP receive listener setup
active TCP receive listener setup
passive recv/recvfrom listener setup
accept listener setup
This complements the earlier close-path fix by removing another
generic_unix listener lifecycle race.
Signed-off-by: Peter M <petermm@gmail.com>
Configure newly accepted generic_unix TCP sockets as nonblocking before publishing them to the socket driver machinery. If fcntl fails, close the accepted fd and reply with an error so callers never observe a partially initialized connection. Signed-off-by: Peter M <petermm@gmail.com>
If the owning process terminates between the accept() call and globalcontext_get_process_lock(), ctx will be NULL. The code immediately dereferences ctx->platform_data without checking, causing a segfault. Add a NULL check consistent with other callbacks (e.g. recv_callback), closing the accepted fd if needed and freeing the listener before returning. Signed-off-by: Peter M <petermm@gmail.com>
Serialize listener pointer publication and global listener registration under the listeners lock, so callbacks cannot observe a partially published listener or erase a replacement listener. Callbacks always free the listener being dispatched, while clearing socket_data only when it still points to that listener. Keep socket_data accesses under the process-table lock and clean up accepted socket contexts if the listening process terminates during callback processing. Signed-off-by: Peter M <petermm@gmail.com>
89566e2 to
f7bb25d
Compare
Signed-off-by: Paul Guyot <pguyot@kallisys.net>
If SMP is enabled, instead of waking the polling scheduler, grab the first ready process immediately if any. Also simplify scheduler_make_ready a little bit to reduce the number of locks. Also use semaphores on esp32 and rp2 to avoid saturating with signals. Also add a missing signal on macOS to avoid resources cleanup waiting for an unrelated wake. Since schedulers no longer poll events at every iteration, resources selected with enif_select and stopped with ERL_NIF_SELECT_STOP_SCHEDULED are now released asynchronously by the scheduler polling events, staying within the boundaries of the BEAM enif_select specification. Adjust test_file's test_gc accordingly. Signed-off-by: Paul Guyot <pguyot@kallisys.net>
f7bb25d to
994b49f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Continuation of:
If SMP is enabled, instead of waking the polling scheduler, grab the first ready process immediately if any.
Also simplify scheduler_make_ready a little bit to reduce the number of locks.
Since schedulers no longer poll events at every iteration, resources selected with enif_select and stopped with ERL_NIF_SELECT_STOP_SCHEDULED are now released asynchronously by the scheduler polling events, staying within the boundaries of the BEAM enif_select specification. Adjust test_file's test_gc accordingly.
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later