Merged
Conversation
Initial additions of the needed types and functions. (cherry picked from commit 95fc9cc)
(cherry picked from commit 37ea653)
(cherry picked from commit 5f0f0a9)
(cherry picked from commit c66873c)
(cherry picked from commit da3eeff)
…in steps (cherry picked from commit b252924)
… under internal commands (cherry picked from commit 597d0ac)
…s still have bugs (cherry picked from commit 58a15ab)
(cherry picked from commit f635c48)
(cherry picked from commit 7c5792a)
(cherry picked from commit d2cd1df)
(cherry picked from commit e64de66)
(cherry picked from commit f2e283d)
Avoid newer setuptools behavior in Linux and macOS workflows.
…o no need for the MR_ExecutionCtxSetDone() anymore
This reverts commit 5bbbef5.
This prevents leaking TLS config buffers and failed async contexts in error flows reported by Valgrind.
redisAsyncContext is not thread-safe. Calling redisAsyncFree from the main thread while the event loop thread processes callbacks can race and leak a parsed redisReply object. Dispatch the free to the event loop via MR_EventLoopAddTask, consistent with the existing SSL error disconnect paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
0fc565c to
3822764
Compare
When MR_ClusterFree tears down nodes during a topology change, in-flight executions with pending messages were left waiting for responses that would never arrive. Under Valgrind the 5s idle timeout often could not fire before process exit, leaking the execution and its parsed results (SeriesListReplyParser allocations). - MR_NodeFree: drain pendingMessages and notify each execution via MR_SetInternalCommandResults(node, NULL, execution) - MR_SetInternalCommandResults: handle NULL reply by marking all steps done and reporting a disconnect error - MR_FreeAsyncContext: free partially parsed reply tree from hiredis reader stack before calling redisAsyncFree (works around hiredis not cleaning up mid-parse state in redisReaderFree) All three changes verified clean by Valgrind on the test_asm_with_data_and_queries_during_migrations test. Co-authored-by: Cursor <cursoragent@cursor.com>
73138bf to
b3cbd9e
Compare
src/cluster.c
Outdated
| if(n->c){ | ||
| redisAsyncFree(n->c); | ||
| /* Defer redisAsyncFree via the event loop to avoid leaking | ||
| * reply objects during context teardown (verified by Valgrind). */ |
Collaborator
There was a problem hiding this comment.
I still don't understand the reasoning here. This code is cleaning up the leftovers upon freeing a cluster topology. There is no need to defer anything at this stage.
I think a more probable scenario for such a leak is that for some reason the MR_ClusterFree() was not called upon teardown (but this is just a guess. we need to add logs to actually understand exactly what is going on)
Contributor
Author
There was a problem hiding this comment.
Reverted. Agreed — we should add logs to verify whether MR_ClusterFree() is being called during teardown rather than adding custom cleanup.
Draining all pending messages and calling MR_SetInternalCommandResults for each caused crashes: regular (status) messages have executions with Reader/Mapper steps, and MR_PerformStepDoneOp asserts on non-InternalCommand step types. Only drain messages with FUNCTION_ID_INTERNAL set. Co-authored-by: Cursor <cursoragent@cursor.com>
- Remove MR_FreeAsyncContext and deferral: redisReaderFree is called during teardown; no need to defer redisAsyncFree at this stage - Remove MR_NodeFree pending-message drain: find root cause (e.g. MR_ClusterFree not called) rather than custom cleanup - Remove NULL reply handling in MR_SetInternalCommandResults: redundant Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
MR_ClusterFree was never called at process exit — only during cluster topology changes — causing Valgrind-reported leaks of nodes, connections, and related allocations. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose MR_Fini (calls MR_ClusterFini) as the public teardown API, add mr_fini to the Rust bindings, and register a deinit callback in the test module so cluster state is freed on shutdown. Co-authored-by: Cursor <cursoragent@cursor.com>
The mapper sleeps 30s on non-initiator shards but the execution times out after 2s. The sleeping thread pool thread prevents a clean shutdown within Valgrind's timeout. Co-authored-by: Cursor <cursoragent@cursor.com>
MR_Fini now calls mr_thpool_destroy before cluster cleanup so threads are joined on shutdown. The 30s sleep in UnevenWorkMapper was far longer than needed (max_idle is 2s) and prevented clean exit under Valgrind; 5s is sufficient to test the uneven-work path. Co-authored-by: Cursor <cursoragent@cursor.com>
…grind Remove mr_thpool_destroy from MR_Fini — it blocks shutdown waiting for the 30s sleeping thread. Restore original 30s sleep. Skip the test under Valgrind since the sleeping thread prevents clean exit (pre-existing issue, unrelated to cluster leak fix). Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix ca_cert double-free: was freeing *client_cert instead of *ca_cert - Fix async context leak: add redisAsyncFree(c) on connect error path Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
Low Risk
Small, localized changes to cleanup/error paths; main risk is unintended free behavior if assumptions about allocation ownership are wrong.
Overview
Fixes two memory-management issues in
src/cluster.c: corrects the TLS config cleanup path to freeca_cert(instead of mistakenly freeingclient_certtwice), and frees theredisAsyncContextwhenredisAsyncConnectreturns an error to avoid leaking the async connection object.Written by Cursor Bugbot for commit b3a30c5. This will update automatically on new commits. Configure here.