Fix concurrent client connections watch mode #1651

lucifercr07 · 2025-03-25T11:08:44Z

Issues:

watch_manager map is not thread safe thus, when multiple go-routines for watch mode are triggered it causes race condition and errors out.
cmdResNil was a shared object used across cmd package. At time multiple go-routines were updating the same object as part of r.R.Attrs.Fields["fingerprint"] = structpb.NewStringValue(strconv.FormatUint(uint64(c.Fingerprint()), 10)) or proto.Marshal which was causing concurrency error.

Testing done:

Validated with 900 concurrent watch connections with 10 updates for key/sec.
Observed no issues with watch response across clients.

Summary by CodeRabbit

New Features
- Improved command processing now provides a standardized response when no valid data is available.
Refactor
- Enhanced concurrent operations with robust synchronization in watch management, ensuring greater stability and data consistency during multi-threaded interactions.

…l obj

coderabbitai · 2025-03-25T11:08:52Z

Walkthrough

The changes update how nil objects are handled in the command module by replacing a static nil response with a dynamically generated one via a new function, GetNilRes(). Additionally, the code within the Execute method now uses this function instead of an unused variable. Separate modifications in the WatchManager add a sync.RWMutex and apply locking mechanisms across its methods to ensure thread safety during concurrent operations.

Changes

File(s)	Change Summary
`internal/cmd/cmd_get.go`, `internal/cmd/cmds.go`	Refactored nil response handling by introducing `GetNilRes()`, which encapsulates the creation of a nil `CmdRes`. The `Execute` method now calls this function instead of using a static variable.
`internal/server/…/watch_manager.go`	Added a `sync.RWMutex` field (`mu`) to `WatchManager` and incorporated locking (via `defer`) in methods like `RegisterThread`, `HandleWatch`, `HandleUnwatch`, `CleanupThreadWatchSubscriptions`, and `NotifyWatchers` for thread safety.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant CmdGet as cmdResFromObject
    participant NilRes as GetNilRes

    Caller->>CmdGet: Call cmdResFromObject(obj)
    alt obj is nil
         CmdGet->>NilRes: Request nil response
         NilRes-->>CmdGet: Return CmdRes with Response_VNil
    else obj is valid
         CmdGet-->>Caller: Processed non-nil response
    end
    CmdGet-->>Caller: Return final response

sequenceDiagram
    participant Thread as Thread
    participant WM as WatchManager

    Thread->>WM: Call method (e.g., RegisterThread)
    WM->>WM: Acquire mu lock (write)
    WM-->>WM: Update maps (clientWatchThreadMap, keyFPMap, fpClientMap)
    WM->>WM: Defer release of mu lock
    WM-->>Thread: Return result

Poem

I hop through lines of clever code,
With changes neat, a brand-new mode.
Nil responses now spring to life,
And mutex locks save threads from strife.
A bunny's cheer for every tweak,
CodeRabbit smiles—our changes speak!
🐇🌟

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai plan to trigger planning for file edits and PR creation.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

internal/cmd/cmd_get.go (1)
59-62: Improves thread safety by using per-request nil response

The change from cmdResNil to GetNilRes() addresses potential race conditions when multiple goroutines access the same shared object. This ensures that each request gets its own instance of the nil response object, preventing concurrent modifications.

Consider adding a small comment explaining why GetNilRes() is used instead of the static cmdResNil:
  if obj == nil {
+    // Use GetNilRes() instead of cmdResNil to avoid race conditions in concurrent environments
    return GetNilRes(), nil
  }
internal/cmd/cmds.go (1)
155-159: Good encapsulation of nil response creation

This new function properly encapsulates the creation of a nil response object, which helps maintain code consistency and addresses the concurrency issues mentioned in the PR.

Consider adding documentation to explain the purpose of this function:
+// GetNilRes returns a new CmdRes with a nil value response.
+// This function is used to ensure thread safety by creating a new response object
+// for each request instead of using a shared static object.
func GetNilRes() *CmdRes {
	return &CmdRes{R: &wire.Response{
		Value: &wire.Response_VNil{VNil: true},
	}}
}
internal/server/ironhawk/watch_manager.go (2)
36-42: Proper locking for RegisterThread

Adding mutex locking for the RegisterThread method ensures thread-safe modifications to the clientWatchThreadMap. The use of defer to unlock is a good practice to ensure the mutex is always released.

Since this method only writes to the map when t.Mode == "watch", consider using a read lock first and upgrading to a write lock only when necessary:
func (w *WatchManager) RegisterThread(t *IOThread) {
-	w.mu.Lock()
-	defer w.mu.Unlock()
+	if t.Mode == "watch" {
+		w.mu.Lock()
+		defer w.mu.Unlock()
+		w.clientWatchThreadMap[t.ClientID] = t
+	}
-	if t.Mode == "watch" {
-		w.clientWatchThreadMap[t.ClientID] = t
-	}
}
132-135: Thread safety added to NotifyWatchers

Adding mutex locking to the NotifyWatchers method ensures thread-safe access to the various maps when notifying watchers. This is a critical method for the watch functionality, and the locking improves reliability.

For optimal performance, you might consider using read locks for parts of this method that only read from the maps (before executing commands), as this would allow multiple concurrent reads:
func (w *WatchManager) NotifyWatchers(c *cmd.Cmd, shardManager *shardmanager.ShardManager, t *IOThread) {
-	w.mu.Lock()
-	defer w.mu.Unlock()
+	w.mu.RLock()
+	defer w.mu.RUnlock()

	key := c.Key()
	// Rest of the method...
}
However, since this method does modify maps in certain conditions (line 156), a more complex locking strategy with upgrading from read to write locks would be needed, which might add unnecessary complexity. The current approach with a single write lock is safer and easier to maintain.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64e24d4 and 16fe270.

📒 Files selected for processing (3)

internal/cmd/cmd_get.go (1 hunks)
internal/cmd/cmds.go (2 hunks)
internal/server/ironhawk/watch_manager.go (5 hunks)

🧰 Additional context used

🧬 Code Definitions (2)

internal/server/ironhawk/watch_manager.go (2)

internal/server/ironhawk/iothread.go (3)

t (36-84)

t (86-89)

IOThread (17-22)

internal/cmd/cmds.go (6)

c (31-33)

c (35-37)

c (39-44)

c (46-65)

c (148-148)

Cmd (23-29)

internal/cmd/cmd_get.go (1)

internal/cmd/cmds.go (1)

GetNilRes (155-159)

🔇 Additional comments (5)

internal/cmd/cmds.go (1)

46-48: Thread safety improvement for command execution

Changing from the static cmdResNil to a dynamically generated response through GetNilRes() is an excellent improvement for thread safety. This prevents race conditions in concurrent environments by ensuring each command execution gets its own fresh response object.

internal/server/ironhawk/watch_manager.go (4)

17-19: Thread safety improvement for WatchManager

Adding a mutex to the WatchManager struct is an excellent change that addresses race conditions when multiple goroutines interact with the watch manager's maps. This is a key fix for the concurrency issues described in the PR.

44-47: Thread safety added to HandleWatch

Adding mutex locking for the HandleWatch method ensures thread-safe access to the various maps modified in this method. Good use of defer to ensure the mutex is always unlocked.

77-80: Thread safety added to HandleUnwatch

The addition of mutex locking here addresses potential race conditions when multiple goroutines attempt to unsubscribe from watches concurrently. The use of defer ensures the mutex is properly released.

114-117: Thread safety added to CleanupThreadWatchSubscriptions

Adding mutex locking to the cleanup method ensures that the removal of watch subscriptions happens safely in concurrent environments. The consistent pattern of using defer for unlocking is good practice.

arpitbbhayani · 2025-03-25T11:41:46Z

internal/cmd/cmds.go

+func GetNilRes() *CmdRes {
+	return &CmdRes{R: &wire.Response{
+		Value: &wire.Response_VNil{VNil: true},
+	}}
+}
+


Why this function? why can't we reuse cmdResNil?

cmdResNil was a shared object used across cmd package. At time multiple go-routines were updating the same object as part of r.R.Attrs.Fields["fingerprint"] = structpb.NewStringValue(strconv.FormatUint(uint64(c.Fingerprint()), 10)) or proto.Marshal which was causing concurrency error.

Example it was returned in cmdResFromObject for different go-routines when there's no value for object

func cmdResFromObject(obj *object.Obj) (*CmdRes, error) { if obj == nil { return cmdResNil, nil }

Can be validated from below logs, they've same object address for different go-routines

2025-03-25T12:38:46+05:30 DBG Response object address address=0x104c9cea0 2025-03-25T12:38:46+05:30 DBG notifying watchers for key key=k1 watchers=141 2025-03-25T12:38:46+05:30 DBG command executed client_id=f94ec20d-28d1-454e-b505-b0e7e0f0ab0d cmd="HANDSHAKE f94ec20d-28d1-454e-b505-b0e7e0f0ab0d command" mode=command took_ns=334 2025-03-25T12:38:46+05:30 DBG command executed client_id=28cd16a8-8f5a-4924-b16f-437a067b9185 cmd="GET.WATCH k1" mode=command took_ns=417 2025-03-25T12:38:46+05:30 DBG Response object address address=0x104c9cea0 fatal error: concurrent map iteration and map write

arpitbbhayani · 2025-03-26T18:50:34Z

Thanks for the patch merged!

lucifercr07 added 2 commits March 24, 2025 17:03

Fix concurrent client connection issue watch mode

62ee1e3

Fix concurrent client connection issue watch mode for shared cmdResNi…

16fe270

…l obj

lucifercr07 mentioned this pull request Mar 25, 2025

[Bug] dice db crashing with a concurreny error #1637

Open

coderabbitai bot reviewed Mar 25, 2025

View reviewed changes

arpitbbhayani reviewed Mar 25, 2025

View reviewed changes

arpitbbhayani merged commit cf52728 into dicedb:master Mar 26, 2025
3 of 4 checks passed

arpitbbhayani mentioned this pull request Mar 27, 2025

Migrate ZADD #1613

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix concurrent client connections watch mode #1651

Fix concurrent client connections watch mode #1651

Uh oh!

lucifercr07 commented Mar 25, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 25, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

arpitbbhayani Mar 25, 2025

Uh oh!

lucifercr07 Mar 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

arpitbbhayani commented Mar 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Fix concurrent client connections watch mode #1651

Fix concurrent client connections watch mode #1651

Uh oh!

Conversation

lucifercr07 commented Mar 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issues:

Testing done:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

arpitbbhayani Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

lucifercr07 Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arpitbbhayani commented Mar 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lucifercr07 commented Mar 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 25, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

lucifercr07 Mar 25, 2025 •

edited

Loading