Skip to content

Tom/fix memory leak#84

Merged
gabsow merged 38 commits intomasterfrom
tom/fixMemoryLeak
Feb 26, 2026
Merged

Tom/fix memory leak#84
gabsow merged 38 commits intomasterfrom
tom/fixMemoryLeak

Conversation

@gabsow
Copy link
Contributor

@gabsow gabsow commented Feb 18, 2026

Note

Low Risk
Small, localized changes to cleanup/error paths; main risk is unintended free behavior if assumptions about allocation ownership are wrong.

Overview
Fixes two memory-management issues in src/cluster.c: corrects the TLS config cleanup path to free ca_cert (instead of mistakenly freeing client_cert twice), and frees the redisAsyncContext when redisAsyncConnect returns an error to avoid leaking the async connection object.

Written by Cursor Bugbot for commit b3a30c5. This will update automatically on new commits. Configure here.

galcohen-redislabs and others added 27 commits February 3, 2026 20:07
Initial additions of the needed types and functions.

(cherry picked from commit 95fc9cc)
(cherry picked from commit c66873c)
… under internal commands

(cherry picked from commit 597d0ac)
(cherry picked from commit 7c5792a)
(cherry picked from commit d2cd1df)
(cherry picked from commit e64de66)
(cherry picked from commit f2e283d)
Avoid newer setuptools behavior in Linux and macOS workflows.
…o no need for the MR_ExecutionCtxSetDone() anymore
This reverts commit 5bbbef5.
This prevents leaking TLS config buffers and failed async contexts in error flows reported by Valgrind.
redisAsyncContext is not thread-safe. Calling redisAsyncFree from the
main thread while the event loop thread processes callbacks can race
and leak a parsed redisReply object. Dispatch the free to the event
loop via MR_EventLoopAddTask, consistent with the existing SSL error
disconnect paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
When MR_ClusterFree tears down nodes during a topology change,
in-flight executions with pending messages were left waiting for
responses that would never arrive.  Under Valgrind the 5s idle
timeout often could not fire before process exit, leaking the
execution and its parsed results (SeriesListReplyParser allocations).

- MR_NodeFree: drain pendingMessages and notify each execution via
  MR_SetInternalCommandResults(node, NULL, execution)
- MR_SetInternalCommandResults: handle NULL reply by marking all
  steps done and reporting a disconnect error
- MR_FreeAsyncContext: free partially parsed reply tree from
  hiredis reader stack before calling redisAsyncFree (works around
  hiredis not cleaning up mid-parse state in redisReaderFree)

All three changes verified clean by Valgrind on the
test_asm_with_data_and_queries_during_migrations test.

Co-authored-by: Cursor <cursoragent@cursor.com>
src/cluster.c Outdated
if(n->c){
redisAsyncFree(n->c);
/* Defer redisAsyncFree via the event loop to avoid leaking
* reply objects during context teardown (verified by Valgrind). */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the reasoning here. This code is cleaning up the leftovers upon freeing a cluster topology. There is no need to defer anything at this stage.
I think a more probable scenario for such a leak is that for some reason the MR_ClusterFree() was not called upon teardown (but this is just a guess. we need to add logs to actually understand exactly what is going on)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. Agreed — we should add logs to verify whether MR_ClusterFree() is being called during teardown rather than adding custom cleanup.

gabsow and others added 5 commits February 24, 2026 14:26
Draining all pending messages and calling MR_SetInternalCommandResults
for each caused crashes: regular (status) messages have executions with
Reader/Mapper steps, and MR_PerformStepDoneOp asserts on non-InternalCommand
step types. Only drain messages with FUNCTION_ID_INTERNAL set.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Remove MR_FreeAsyncContext and deferral: redisReaderFree is called
  during teardown; no need to defer redisAsyncFree at this stage
- Remove MR_NodeFree pending-message drain: find root cause (e.g.
  MR_ClusterFree not called) rather than custom cleanup
- Remove NULL reply handling in MR_SetInternalCommandResults: redundant

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
MR_ClusterFree was never called at process exit — only during
cluster topology changes — causing Valgrind-reported leaks of
nodes, connections, and related allocations.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose MR_Fini (calls MR_ClusterFini) as the public teardown API,
add mr_fini to the Rust bindings, and register a deinit callback
in the test module so cluster state is freed on shutdown.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

gabsow and others added 4 commits February 24, 2026 20:58
The mapper sleeps 30s on non-initiator shards but the execution
times out after 2s.  The sleeping thread pool thread prevents a
clean shutdown within Valgrind's timeout.

Co-authored-by: Cursor <cursoragent@cursor.com>
MR_Fini now calls mr_thpool_destroy before cluster cleanup so
threads are joined on shutdown.  The 30s sleep in UnevenWorkMapper
was far longer than needed (max_idle is 2s) and prevented clean
exit under Valgrind; 5s is sufficient to test the uneven-work path.

Co-authored-by: Cursor <cursoragent@cursor.com>
…grind

Remove mr_thpool_destroy from MR_Fini — it blocks shutdown waiting
for the 30s sleeping thread.  Restore original 30s sleep.  Skip
the test under Valgrind since the sleeping thread prevents clean
exit (pre-existing issue, unrelated to cluster leak fix).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix ca_cert double-free: was freeing *client_cert instead of *ca_cert
- Fix async context leak: add redisAsyncFree(c) on connect error path

Co-authored-by: Cursor <cursoragent@cursor.com>
@gabsow gabsow merged commit c21b080 into master Feb 26, 2026
4 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants