Tom/fix memory leak by gabsow · Pull Request #84 · RedisGears/LibMR

gabsow · 2026-02-18T15:07:18Z

Note

Low Risk
Small, localized changes to cleanup/error paths; main risk is unintended free behavior if assumptions about allocation ownership are wrong.

Overview
Fixes two memory-management issues in src/cluster.c: corrects the TLS config cleanup path to free ca_cert (instead of mistakenly freeing client_cert twice), and frees the redisAsyncContext when redisAsyncConnect returns an error to avoid leaking the async connection object.

Written by Cursor Bugbot for commit b3a30c5. This will update automatically on new commits.
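A minimal sketch of the two fixed patterns, under a simplified model. TlsConfig, with three owned string fields, is a hypothetical stand-in for the TLS settings in src/cluster.c; client_key is assumed, since the PR only names client_cert and ca_cert. fake_async_connect and fake_async_free are hypothetical stand-ins for hiredis's redisAsyncConnect and redisAsyncFree, which lets the sketch build without hiredis. The sketch relies on one real hiredis behavior: redisAsyncConnect normally returns an allocated context even when the connection fails and reports the failure through c->err, so the error path still owns that allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the TLS settings: every field is owned by the config. */
typedef struct {
    char *client_cert;
    char *client_key;   /* assumed field */
    char *ca_cert;
} TlsConfig;

/* Free one owned field and clear it, so a repeated cleanup is harmless. */
static void tls_free_field(char **field) {
    free(*field);
    *field = NULL;
}

/* Release each field exactly once. The bug freed client_cert in the slot
 * meant for ca_cert, so client_cert was freed twice and ca_cert leaked. */
static void tls_config_free(TlsConfig *cfg) {
    tls_free_field(&cfg->client_cert);
    tls_free_field(&cfg->client_key);
    tls_free_field(&cfg->ca_cert);
}

/* Hypothetical stand-in for a hiredis async context. */
typedef struct {
    int err;            /* nonzero when the connect attempt failed */
    char errstr[64];
} FakeAsyncCtx;

static int live_contexts = 0;   /* allocation counter that makes leaks visible */

/* Like redisAsyncConnect, returns an allocated context even on failure. */
static FakeAsyncCtx *fake_async_connect(int should_fail) {
    FakeAsyncCtx *c = calloc(1, sizeof(*c));
    if (c == NULL) return NULL;          /* out of memory */
    live_contexts++;
    if (should_fail) {
        c->err = 1;
        strcpy(c->errstr, "connection refused");
    }
    return c;
}

static void fake_async_free(FakeAsyncCtx *c) {
    if (c == NULL) return;
    live_contexts--;
    free(c);
}

/* Returns a connected context, or NULL after releasing a failed one. */
static FakeAsyncCtx *connect_node(int should_fail) {
    FakeAsyncCtx *c = fake_async_connect(should_fail);
    if (c == NULL) return NULL;
    if (c->err) {
        fake_async_free(c);   /* the fix: without this, the context leaks */
        return NULL;
    }
    return c;
}
```

After a failed connect_node call, live_contexts returns to zero; dropping the fake_async_free call on the error path leaves it at one, which is the kind of leak Valgrind reported.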


redisAsyncContext is not thread-safe. Calling redisAsyncFree from the main thread while the event loop thread processes callbacks can race and leak a parsed redisReply object. Dispatch the free to the event loop via MR_EventLoopAddTask, consistent with the existing SSL error disconnect paths. Co-authored-by: Cursor <cursoragent@cursor.com>
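A minimal sketch of deferring the free to the loop thread, assuming a simple mutex-protected task list. queue_add and queue_run are hypothetical stand-ins for MR_EventLoopAddTask and the loop's own task processing, whose actual signatures are not shown in the PR, and FakeConn stands in for a redisAsyncContext. Per the review thread and the later cleanup commit, this deferral was reverted, so the sketch only illustrates the idea this commit message describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void (*TaskFn)(void *arg);

typedef struct Task {
    TaskFn fn;
    void *arg;
    struct Task *next;
} Task;

/* Hypothetical task list owned by the event-loop thread. */
typedef struct {
    pthread_mutex_t lock;
    Task *head;
    Task *tail;
} TaskQueue;

static void queue_init(TaskQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Safe to call from any thread: only appends, never runs the task. */
static int queue_add(TaskQueue *q, TaskFn fn, void *arg) {
    Task *t = malloc(sizeof(*t));
    if (t == NULL) return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called only on the loop thread: detach pending tasks under the lock,
 * then run them without holding it. Returns how many tasks ran. */
static int queue_run(TaskQueue *q) {
    pthread_mutex_lock(&q->lock);
    Task *t = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    int ran = 0;
    while (t != NULL) {
        Task *next = t->next;
        t->fn(t->arg);
        free(t);
        t = next;
        ran++;
    }
    return ran;
}

/* Stand-in for a connection whose callbacks run on the loop thread. */
typedef struct { int freed; } FakeConn;

static void free_conn_task(void *arg) {
    FakeConn *c = arg;
    c->freed = 1;   /* a real task would call redisAsyncFree here */
}
```

Any thread may call queue_add, but only the loop thread calls queue_run, so the context is released on the same thread that runs its callbacks and cannot race with them.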


Co-authored-by: Cursor <cursoragent@cursor.com>


When MR_ClusterFree tears down nodes during a topology change, in-flight executions with pending messages were left waiting for responses that would never arrive. Under Valgrind the 5s idle timeout often could not fire before process exit, leaking the execution and its parsed results (SeriesListReplyParser allocations).

- MR_NodeFree: drain pendingMessages and notify each execution via MR_SetInternalCommandResults(node, NULL, execution)
- MR_SetInternalCommandResults: handle NULL reply by marking all steps done and reporting a disconnect error
- MR_FreeAsyncContext: free partially parsed reply tree from hiredis reader stack before calling redisAsyncFree (works around hiredis not cleaning up mid-parse state in redisReaderFree)

All three changes verified clean by Valgrind on the test_asm_with_data_and_queries_during_migrations test.

Co-authored-by: Cursor <cursoragent@cursor.com>
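A sketch of the drain-and-notify step described above, assuming simplified hypothetical types (Node, PendingMsg, Execution) and a set_internal_command_results helper that mirrors the described NULL-reply handling. The real MR_NodeFree and MR_SetInternalCommandResults signatures are not reproduced here. This drain was later narrowed and then removed in favor of calling MR_ClusterFree at teardown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified execution: done once steps_done == steps_total. */
typedef struct {
    int steps_total;
    int steps_done;
    const char *error;   /* non-NULL once the execution has failed */
} Execution;

typedef struct PendingMsg {
    Execution *execution;      /* execution waiting on this message's reply */
    struct PendingMsg *next;
} PendingMsg;

typedef struct {
    PendingMsg *pending;       /* messages sent but not yet answered */
} Node;

/* A NULL reply means the node is gone: finish all remaining steps and record
 * a disconnect error so the execution completes instead of waiting. */
static void set_internal_command_results(Node *node, const void *reply,
                                         Execution *e) {
    (void)node;
    if (reply == NULL) {
        e->steps_done = e->steps_total;
        e->error = "node disconnected";
        return;
    }
    e->steps_done++;           /* hypothetical: record one step's result */
}

/* Detach the node's pending list, notify each waiting execution with a NULL
 * reply, and free the messages. Returns how many were drained. */
static int node_drain_pending(Node *node) {
    PendingMsg *m = node->pending;
    node->pending = NULL;
    int drained = 0;
    while (m != NULL) {
        PendingMsg *next = m->next;
        set_internal_command_results(node, NULL, m->execution);
        free(m);
        m = next;
        drained++;
    }
    return drained;
}
```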


galcohen-redislabs · 2026-02-24T09:39:49Z

src/cluster.c

    if(n->c){
-        redisAsyncFree(n->c);
+        /* Defer redisAsyncFree via the event loop to avoid leaking
+         * reply objects during context teardown (verified by Valgrind). */


I still don't understand the reasoning here. This code is cleaning up the leftovers upon freeing a cluster topology. There is no need to defer anything at this stage.
I think a more probable scenario for such a leak is that, for some reason, MR_ClusterFree() was not called at teardown (but this is just a guess; we need to add logs to understand exactly what is going on).

Reverted. Agreed: we should add logs to verify whether MR_ClusterFree() is called during teardown rather than adding custom cleanup.


Draining all pending messages and calling MR_SetInternalCommandResults for each caused crashes: regular (status) messages have executions with Reader/Mapper steps, and MR_PerformStepDoneOp asserts on non-InternalCommand step types. Only drain messages with FUNCTION_ID_INTERNAL set. Co-authored-by: Cursor <cursoragent@cursor.com>
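A sketch of the narrowed drain, assuming a singly linked message list and a hypothetical MSG_FLAG_INTERNAL bit standing in for FUNCTION_ID_INTERNAL; its real value and the message struct layout are not shown in the PR. The pointer-to-pointer walk unlinks only flagged messages and leaves all other messages in place, so executions with Reader/Mapper steps are never handed to the internal-command path. Like the full drain, this was removed in the following cleanup commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for FUNCTION_ID_INTERNAL. */
enum { MSG_FLAG_INTERNAL = 1 << 0 };

typedef struct Msg {
    unsigned flags;
    int id;
    struct Msg *next;
} Msg;

/* Unlink every message carrying MSG_FLAG_INTERNAL and pass it to notify;
 * other messages stay in the list in their original order. */
static int drain_internal(Msg **head, void (*notify)(Msg *)) {
    int drained = 0;
    Msg **link = head;        /* address of the field pointing at the current message */
    while (*link != NULL) {
        Msg *m = *link;
        if (m->flags & MSG_FLAG_INTERNAL) {
            *link = m->next;  /* unlink before notifying */
            notify(m);
            drained++;
        } else {
            link = &m->next;  /* keep m and advance past it */
        }
    }
    return drained;
}

/* Example notifier for the usage below: sums the ids it sees. */
static int notified_sum = 0;
static void record_notify(Msg *m) { notified_sum += m->id; }
```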

- Remove MR_FreeAsyncContext and deferral: redisReaderFree is called during teardown; no need to defer redisAsyncFree at this stage
- Remove MR_NodeFree pending-message drain: find root cause (e.g. MR_ClusterFree not called) rather than custom cleanup
- Remove NULL reply handling in MR_SetInternalCommandResults: redundant

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>


Expose MR_Fini (calls MR_ClusterFini) as the public teardown API, add mr_fini to the Rust bindings, and register a deinit callback in the test module so cluster state is freed on shutdown. Co-authored-by: Cursor <cursoragent@cursor.com>
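A sketch of the teardown shape this commit describes, with hypothetical names: cluster_fini plays the role of MR_ClusterFini, and lib_fini plays the role of MR_Fini, the entry point a host would call from its deinit callback. The Rust mr_fini binding and the test module's deinit registration are not shown. The NULL check that makes repeated calls harmless is an assumption; the PR does not say whether MR_Fini may be called twice.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *next;
} Node;

/* Hypothetical process-wide cluster state. */
typedef struct {
    Node *nodes;
    int num_nodes;
} ClusterState;

static ClusterState *cluster = NULL;

static void cluster_fini(void);

/* Build a cluster with num_nodes nodes; returns 0 on success. */
static int cluster_init(int num_nodes) {
    cluster = calloc(1, sizeof(*cluster));
    if (cluster == NULL) return -1;
    for (int i = 0; i < num_nodes; i++) {
        Node *n = malloc(sizeof(*n));
        if (n == NULL) {
            cluster_fini();   /* undo the partial build */
            return -1;
        }
        n->next = cluster->nodes;
        cluster->nodes = n;
        cluster->num_nodes++;
    }
    return 0;
}

/* Role of MR_ClusterFini: free every node and the state itself.
 * The NULL check makes repeated calls harmless (an assumption). */
static void cluster_fini(void) {
    if (cluster == NULL) return;
    Node *n = cluster->nodes;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    free(cluster);
    cluster = NULL;
}

/* Role of MR_Fini: the single public teardown entry point a host calls
 * from its shutdown or deinit hook. */
static void lib_fini(void) {
    cluster_fini();
}
```

Routing shutdown through one public entry point means cluster state is freed at process exit, not only when the topology changes.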

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


MR_Fini now calls mr_thpool_destroy before cluster cleanup so threads are joined on shutdown. The 30s sleep in UnevenWorkMapper was far longer than needed (max_idle is 2s) and prevented clean exit under Valgrind; 5s is sufficient to test the uneven-work path. Co-authored-by: Cursor <cursoragent@cursor.com>

…grind Remove mr_thpool_destroy from MR_Fini — it blocks shutdown waiting for the 30s sleeping thread. Restore original 30s sleep. Skip the test under Valgrind since the sleeping thread prevents clean exit (pre-existing issue, unrelated to cluster leak fix). Co-authored-by: Cursor <cursoragent@cursor.com>


galcohen-redislabs and others added 27 commits February 3, 2026 20:07

MOD-13438 Add internal commands to the inner communication protocol

ffbdfd2

Initial additions of the needed types and functions. (cherry picked from commit 95fc9cc)

Adjust the includes, typedefs and function declarations

ca0af6b

(cherry picked from commit 37ea653)

Make MR_ClusterRegisterMsgReceiver() idempotent

7c45b96

(cherry picked from commit 5f0f0a9)

Spelling and other cleanups

b5154e0

(cherry picked from commit c66873c)

Send internal-command messages to current node as well

beaa220

(cherry picked from commit da3eeff)

Added support for parsing responses of internal commands and storing …

425c47c

…in steps (cherry picked from commit b252924)

WIP (doesn't compile now) - handling step-done and done notifications…

a702c0d

… under internal commands (cherry picked from commit 597d0ac)

Some cleanups. Now compiles but the done-step and done-execution part…

aa0980c

…s still have bugs (cherry picked from commit 58a15ab)

WTF? How this trivial bug survived for so many years??

b2f31de

(cherry picked from commit f635c48)

Now we can use the flag properly

33dd57b

(cherry picked from commit 7c5792a)

Mostly cleanups of funcs and structs

467712e

(cherry picked from commit d2cd1df)

minor cleanups

82b1a5e

(cherry picked from commit e64de66)

Added MR_ExecutionCtxSetDone()

2e70921

(cherry picked from commit f2e283d)

typo in the rust apis

4d8d66d

Minor log cleanup

ce40d25

Pin setuptools below 81 in CI

e4248ac

Avoid newer setuptools behavior in Linux and macOS workflows.

Added a comment for internal command callbacks registrations

8bca138

Retry AUTH in cases of a race that cause previous AUTH to fail

bdcb9ef

Refined the condition for when to retry AUTH

2683edb

Ride on the done.callback to avoid race between timeout and done; Als…

886d9ab

…o no need for the MR_ExecutionCtxSetDone() anymore

Merge branch 'master' into gal-13438-internal-commands-protocol

4ca8175

Don't bail out before sending NOTIFY_DONE when I'm not the initiator

ae91b57

Testing something

5bbbef5

Revert "Testing something"

e8e1bf2

This reverts commit 5bbbef5.

Fix cluster cleanup leaks in TLS and async connect paths.

c01e0fe

This prevents leaking TLS config buffers and failed async contexts in error flows reported by Valgrind.

Merge remote-tracking branch 'origin/master' into tom/fixMemoryLeak

6f12012

gabsow requested a review from galcohen-redislabs February 18, 2026 15:07

cursor bot reviewed Feb 18, 2026

src/cluster.c (outdated, resolved)

galcohen-redislabs reviewed Feb 18, 2026

src/cluster.c (resolved)

galcohen-redislabs reviewed Feb 18, 2026

src/cluster.c (resolved)

galcohen-redislabs reviewed Feb 18, 2026

src/cluster.c (outdated, resolved)

gabsow requested a review from galcohen-redislabs February 22, 2026 14:36

galcohen-redislabs reviewed Feb 22, 2026

src/cluster.c (outdated, resolved)

Fix comment: free is deferred within the event loop, not across threads.

3822764

Co-authored-by: Cursor <cursoragent@cursor.com>

gabsow force-pushed the tom/fixMemoryLeak branch from 0fc565c to 3822764 February 23, 2026 09:53

cursor bot reviewed Feb 23, 2026

src/cluster.c (outdated, resolved)

src/cluster.c (outdated, resolved)

gabsow requested a review from galcohen-redislabs February 23, 2026 11:27

gabsow force-pushed the tom/fixMemoryLeak branch from 73138bf to b3cbd9e February 23, 2026 19:58

cursor bot reviewed Feb 23, 2026

src/cluster.c (outdated, resolved)

galcohen-redislabs reviewed Feb 24, 2026


gabsow and others added 5 commits February 24, 2026 14:26

Remove extra blank line in MR_SetInternalCommandResults

c34df39

Co-authored-by: Cursor <cursoragent@cursor.com>

Add MR_ClusterFini to free cluster state on module teardown

344ad11

MR_ClusterFree was never called at process exit — only during cluster topology changes — causing Valgrind-reported leaks of nodes, connections, and related allocations. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor bot reviewed Feb 24, 2026

src/mr.c (outdated, resolved)

src/mr.c (outdated, resolved)

gabsow and others added 4 commits February 24, 2026 20:58

Skip testUnevenWork under Valgrind

c0888d2

The mapper sleeps 30s on non-initiator shards but the execution times out after 2s. The sleeping thread pool thread prevents a clean shutdown within Valgrind's timeout. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix cluster cleanup leaks in TLS and async connect paths.

b3a30c5

- Fix ca_cert double-free: was freeing *client_cert instead of *ca_cert
- Fix async context leak: add redisAsyncFree(c) on connect error path

Co-authored-by: Cursor <cursoragent@cursor.com>

gabsow merged commit c21b080 into master Feb 26, 2026
4 of 8 checks passed


gabsow merged 38 commits into master from tom/fixMemoryLeak


2 participants
