-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: rework message processing in CNode to prioritize quorum messages #6952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: rework message processing in CNode to prioritize quorum messages #6952
Conversation
This update introduces a new queue for quorum-priority messages, allowing for better handling of critical network messages. The `MarkReceivedMsgsForProcessing` function now classifies incoming messages into quorum and normal queues, while the `PollMessage` function prioritizes messages from the quorum queue when available. This change enhances the efficiency of message processing and ensures that important messages are handled promptly.
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
WalkthroughThe patch separates incoming message handling into two internal queues per CNode: a quorum-priority queue and the existing normal processing queue. It adds CNode members Sequence DiagramsequenceDiagram
participant Net as Network Layer
participant Node as CNode
participant Quorum as m_msg_quorum_queue
participant Normal as m_msg_process_queue
participant Proc as Message Processor
Net->>Node: Completed message arrives
Node->>Node: IsQuorumPriorityMessage(msg_type)?
alt Quorum-priority
Node->>Quorum: Splice into quorum queue
Node->>Node: m_msg_quorum_queue_size++
else Normal
Node->>Normal: Splice into normal queue
Node->>Node: m_msg_process_queue_size++
end
Note right of Node: Backpressure = quorum_size + normal_size
opt PollMessage invocation
alt Quorum non-empty and ratio allows
Quorum-->>Node: Dequeue quorum message
Node->>Proc: Process quorum message
Node->>Node: quorum_size--, m_quorum_msg_count_since_normal++
else
Normal-->>Node: Dequeue normal message
Node->>Proc: Process normal message
Node->>Node: normal_size--, m_quorum_msg_count_since_normal = 0
end
Note right of Node: "more" = (quorum_size > 0) or (normal_size > 0)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–60 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Guix Automation has began to build this PR tagged as v23.0.0-devpr6952.6c955b7e. A new comment will be made when the image is pushed. |
|
Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6952.6c955b7e. The image should be on dockerhub soon. |
|
@coderabbitai; is this PR a good approach? The primary issue / goal is ensuring that islocks process and are generated as fast as possible, and are not blocked by processing transactions / misc inv messages. Are there other changes needed to ensure that is locks are generated, processed, and propagated as fast as possible? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 1292 🏁 Script executed: Length of output: 2649 🏁 Script executed: Length of output: 3557 🏁 Script executed: Length of output: 567 🏁 Script executed: Length of output: 2701 🏁 Script executed: Length of output: 12011 🏁 Script executed: Length of output: 831 🏁 Script executed: Length of output: 6915 @PastaPastaPasta, the dual-queue approach in this PR is a solid foundation for prioritizing ISLOCK processing, but there are several gaps that could limit its effectiveness. Let me break this down: ✅ What This PR Does WellISLOCKs are propagated using the INV/GETDATA subsystem, and your PR correctly prioritizes the actual
|
|
@coderabbitai; that's an interesting insight, but I believe the lack of |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 9465 🏁 Script executed: Length of output: 8292 🏁 Script executed: Length of output: 444 🏁 Script executed: Length of output: 15350 🏁 Script executed: Length of output: 6015 @PastaPastaPasta, you're absolutely right! Thank you for that clarification. I was overemphasizing the ISDLOCK message path when the primary propagation mechanism is indeed through QSIGREC (recovered signature) messages. Let me revise my assessment based on this correct understanding. Revised Analysis: The PR is Well-Targeted ✅Looking at the code in
Your PR already prioritizes the critical path:
Remaining ConsiderationsWhile the core quorum messaging is well-prioritized, there are a few areas to consider: 1. Transaction PropagationThe underlying transaction still needs to reach nodes before they can meaningfully process the ISLOCK. Currently,
2. Quorum Message VolumeMonitor whether the priority queue could get congested during high quorum activity (e.g., many simultaneous signing sessions). Consider:
3. Outbound Message Prioritization (Future Enhancement)This PR only prioritizes incoming messages. For network-wide propagation speed, consider (in a future PR):
VerdictYour approach is sound! By prioritizing QSIGREC and the supporting quorum signing messages, you're targeting the actual critical path for ISLOCK generation and propagation. The ISDLOCK prioritization is a nice-to-have for the fallback path, but the real value is in the quorum message prioritization. The main enhancement I'd suggest is adding observability - metrics or logging to verify that:
Would instrumenting the queue depths and processing times help validate the improvement? 🧠 Learnings used |
|
Performed testing on testnet under high load; results as follows: Improved Session Progression Speed: What it measures: Time from first sigShare received (first ProcessMessageSigShare call) to either:
Duration:
What it DOES measure:
Interpretation: Group 1's 21% faster successful session completion (127s vs 154s) means:
These results do appear to be statistically significant. However, it doesn't actually improve recovery rates, but the time it takes to get to recovery does appear to be significantly improved. |
|
|
||
| size_t nSizeAdded = 0; | ||
| for (const auto& msg : vRecvMsg) { | ||
| size_t nQuorumSizeAdded = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/nQuorumSizeAdded/quorum_size_added/`
Code-style recommends to use snake_case for local variables
…eue of processing
f97c762 to
1d16451
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/net.cpp (1)
4655-4657: Nit: local var naming already flaggednQuorumSizeAdded/nNormalSizeAdded don’t follow snake_case per prior note. If you want to address it here, rename to quorum_size_added/normal_size_added.
- size_t nQuorumSizeAdded = 0; - size_t nNormalSizeAdded = 0; + size_t quorum_size_added = 0; + size_t normal_size_added = 0; ... - nQuorumSizeAdded += msg.m_raw_message_size; + quorum_size_added += msg.m_raw_message_size; ... - nNormalSizeAdded += msg.m_raw_message_size; + normal_size_added += msg.m_raw_message_size; ... - m_msg_quorum_queue_size += nQuorumSizeAdded; + m_msg_quorum_queue_size += quorum_size_added; - m_msg_process_queue_size += nNormalSizeAdded; + m_msg_process_queue_size += normal_size_added;
🧹 Nitpick comments (2)
src/net.h (1)
2008-2026: Tighten helper signature and reduce header surface (optional)
- Consider taking std::string_view to avoid potential string constructions at call sites.
- If only used in net.cpp, moving this helper there (static) would trim header weight.
Example minimal change:
-// Helper function to determine if a message type should be prioritized in the quorum queue -inline bool IsQuorumPriorityMessage(const std::string& msg_type) +// Helper function to determine if a message type should be prioritized in the quorum queue +#include <string_view> +inline bool IsQuorumPriorityMessage(std::string_view msg_type) {No behavior change implied; purely an interface/compile-time optimization.
src/net.cpp (1)
4688-4728: Make prioritization adaptive/tunable and add light metrics (optional)
- Replace fixed QUORUM_TO_NORMAL_RATIO with size-aware balancing to prevent long bursts when normal backlog grows, e.g.:
- constexpr size_t QUORUM_TO_NORMAL_RATIO = 100; + constexpr size_t QUORUM_TO_NORMAL_RATIO = 100; // default + // Optional: also balance by relative queue sizes (favor normal if quorum backlog isn't dominant) ... - bool skip_quorum_processing = !m_msg_process_queue.empty() && - m_quorum_msg_count_since_normal >= QUORUM_TO_NORMAL_RATIO; + const bool ratio_limit = m_quorum_msg_count_since_normal >= QUORUM_TO_NORMAL_RATIO; + const bool size_bias = (m_msg_quorum_queue_size * 3 <= m_msg_process_queue_size); + bool skip_quorum_processing = !m_msg_process_queue.empty() && (ratio_limit || size_bias);
- Consider emitting gauges to validate impact in prod:
- bandwidth.queue.quorum.bytes = m_msg_quorum_queue_size
- bandwidth.queue.normal.bytes = m_msg_process_queue_size
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/net.cpp(1 hunks)src/net.h(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/net.hsrc/net.cpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6718
File: test/functional/test_framework/test_framework.py:2102-2102
Timestamp: 2025-06-09T16:43:20.996Z
Learning: In the test framework consolidation PR (#6718), user kwvg prefers to limit functional changes to those directly related to MasternodeInfo, avoiding scope creep even for minor improvements like error handling consistency.
📚 Learning: 2025-10-02T18:29:54.756Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6840
File: src/net_processing.cpp:2882-2886
Timestamp: 2025-10-02T18:29:54.756Z
Learning: Across net_processing.cpp, once LLMQContext (m_llmq_ctx) is asserted non-null, its subcomponents (e.g., isman, qdkgsman, quorum_block_processor) are treated as initialized and used without extra null checks.
Applied to files:
src/net.hsrc/net.cpp
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.
Applied to files:
src/net.h
📚 Learning: 2025-10-05T20:38:28.457Z
Learnt from: knst
Repo: dashpay/dash PR: 6871
File: contrib/guix/libexec/build.sh:358-360
Timestamp: 2025-10-05T20:38:28.457Z
Learning: In the Dash repository, when backporting code from Bitcoin Core, typos and minor issues in comments should be kept as-is to reduce merge conflicts in future backports, even if they remain unfixed in Bitcoin Core's master branch.
Applied to files:
src/net.cpp
📚 Learning: 2025-02-06T14:34:30.466Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Applied to files:
src/net.cpp
📚 Learning: 2025-11-04T18:24:27.241Z
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Applied to files:
src/net.cpp
📚 Learning: 2025-07-23T09:30:34.631Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.h:5-6
Timestamp: 2025-07-23T09:30:34.631Z
Learning: Dash Core uses BITCOIN_ prefix for header guards as the standard convention, inherited from Bitcoin Core. Only a few BLS-specific files in src/bls/ use DASH_ prefix. The vast majority of files (385+) use BITCOIN_ prefix.
Applied to files:
src/net.cpp
🧬 Code graph analysis (1)
src/net.cpp (1)
src/net.h (4)
IsQuorumPriorityMessage(2009-2026)m_msg_process_queue_mutex(1096-1096)m_msg_process_queue_mutex(1098-1410)m_msg_process_queue_mutex(1099-1126)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux64-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: x86_64-pc-linux-gnu_nowallet / Build depends
- GitHub Check: Lint / Run linters
- GitHub Check: x86_64-w64-mingw32 / Build depends
🔇 Additional comments (3)
src/net.h (1)
1097-1100: Quorum queue state in CNode: locking/memory accounting LGTMMembers are correctly guarded by m_msg_process_queue_mutex, and size counters align with m_recv_flood_size usage. No issues.
src/net.cpp (2)
4655-4682: Classification + backpressure logic looks solid
- Safe splice-from vRecvMsg, then single lock to splice into destination queues.
- Size accounting uses m_raw_message_size and fPauseRecv computed on combined sizes. Good.
4688-4728: Priority polling: correctness/fairness LGTM
- No starvation: ratio gate only applies when normal has items; counter reset on normal.
- ‘more’ correctly reflects either queue having backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/net.cpp (1)
4655-4671: Safer splice iteration: avoid post-increment in call expressionPre-increment the iterator into a temp before splice to sidestep sequencing pitfalls and improve readability.
- for (auto it = vRecvMsg.begin(); it != vRecvMsg.end();) { - auto& msg = *it; + for (auto it = vRecvMsg.begin(); it != vRecvMsg.end();) { + auto curr = it++; + auto& msg = *curr; // vRecvMsg contains only completed CNetMessage // the single possible partially deserialized message are held by TransportDeserializer if (IsQuorumPriorityMessage(msg.m_type)) { - quorumMsgs.splice(quorumMsgs.end(), vRecvMsg, it++); + quorumMsgs.splice(quorumMsgs.end(), vRecvMsg, curr); nQuorumSizeAdded += msg.m_raw_message_size; } else { - normalMsgs.splice(normalMsgs.end(), vRecvMsg, it++); + normalMsgs.splice(normalMsgs.end(), vRecvMsg, curr); nNormalSizeAdded += msg.m_raw_message_size; } }Note: knst already suggested renaming nQuorumSizeAdded → quorum_size_added for style. This change only addresses iteration safety/readability.
🧹 Nitpick comments (2)
src/net.h (1)
2008-2026: Minor: prefer string_view and noexcept; avoid repeated == chainsSaves tiny allocations and clarifies intent; no behavior change.
-inline bool IsQuorumPriorityMessage(const std::string& msg_type) +inline bool IsQuorumPriorityMessage(std::string_view msg_type) noexcept { - // LLMQ signing/data messages + // LLMQ signing/data messages if (msg_type == NetMsgType::QSIGSHARE || msg_type == NetMsgType::QBSIGSHARES || msg_type == NetMsgType::QSIGSHARESINV || msg_type == NetMsgType::QGETSIGSHARES || msg_type == NetMsgType::QSIGSESANN || msg_type == NetMsgType::QSIGREC) { return true; } // High-level lock messages (ChainLocks, InstantSend locks) if (msg_type == NetMsgType::CLSIG || msg_type == NetMsgType::ISDLOCK) { return true; } return false; }src/net.cpp (1)
4688-4728: Fairness heuristic: make ratio adaptive to queue sizesHard-coding QUORUM_TO_NORMAL_RATIO=100 can starve normal messages under sustained quorum load. Consider an adaptive gate, e.g., prioritize quorum while its depth dominates by a factor (like 3x), otherwise process normal sooner. Keeps bursts when normal is empty.
- constexpr size_t QUORUM_TO_NORMAL_RATIO = 100; - bool skip_quorum_processing = !m_msg_process_queue.empty() && - m_quorum_msg_count_since_normal >= QUORUM_TO_NORMAL_RATIO; + // Prefer quorum when it meaningfully dominates; still allow bursts when normal is empty. + const bool both_have_msgs = !m_msg_quorum_queue.empty() && !m_msg_process_queue.empty(); + const bool quorum_dominates = m_msg_quorum_queue.size() * 3 > m_msg_process_queue.size(); + bool skip_quorum_processing = both_have_msgs && !quorum_dominates;Optionally add gauges for queue depths to confirm behavior under load.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/net.cpp(1 hunks)src/net.h(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/net.hsrc/net.cpp
🧠 Learnings (10)
📓 Common learnings
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Learnt from: kwvg
Repo: dashpay/dash PR: 6718
File: test/functional/test_framework/test_framework.py:2102-2102
Timestamp: 2025-06-09T16:43:20.996Z
Learning: In the test framework consolidation PR (#6718), user kwvg prefers to limit functional changes to those directly related to MasternodeInfo, avoiding scope creep even for minor improvements like error handling consistency.
📚 Learning: 2025-10-02T18:29:54.756Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6840
File: src/net_processing.cpp:2882-2886
Timestamp: 2025-10-02T18:29:54.756Z
Learning: Across net_processing.cpp, once LLMQContext (m_llmq_ctx) is asserted non-null, its subcomponents (e.g., isman, qdkgsman, quorum_block_processor) are treated as initialized and used without extra null checks.
Applied to files:
src/net.hsrc/net.cpp
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.
Applied to files:
src/net.h
📚 Learning: 2025-10-05T20:38:28.457Z
Learnt from: knst
Repo: dashpay/dash PR: 6871
File: contrib/guix/libexec/build.sh:358-360
Timestamp: 2025-10-05T20:38:28.457Z
Learning: In the Dash repository, when backporting code from Bitcoin Core, typos and minor issues in comments should be kept as-is to reduce merge conflicts in future backports, even if they remain unfixed in Bitcoin Core's master branch.
Applied to files:
src/net.cpp
📚 Learning: 2025-02-06T14:34:30.466Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Applied to files:
src/net.cpp
📚 Learning: 2025-11-04T18:24:27.241Z
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Applied to files:
src/net.cpp
🧬 Code graph analysis (2)
src/net.h (1)
src/protocol.h (1)
NetMsgType(60-301)
src/net.cpp (1)
src/net.h (4)
IsQuorumPriorityMessage(2009-2026)m_msg_process_queue_mutex(1096-1096)m_msg_process_queue_mutex(1098-1410)m_msg_process_queue_mutex(1099-1126)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: linux64_nowallet-build / Build source
- GitHub Check: mac-build / Build source
- GitHub Check: win64-build / Build source
- GitHub Check: linux64_tsan-build / Build source
- GitHub Check: arm-linux-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
🔇 Additional comments (2)
src/net.h (1)
1097-1100: Quorum queue fields: locking and accounting look correctGUARDED_BY annotations match m_msg_process_queue_mutex; sizes initialized; aligns with usage in net.cpp. LGTM.
src/net.cpp (1)
4727-4728: Return flag ‘more’ accounts for both queuesCorrectly reports more work if either queue is non-empty. LGTM.
Co-authored-by: UdjinM6 <UdjinM6@users.noreply.github.com>
|
Guix Automation has failed due to the HEAD commit not being signed by an authorized core-team member. Please rebase and sign or push a new empty signed commit to allow Guix build to happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/net.cpp (1)
4655-4656: Apply snake_case naming convention for local variables.As previously noted by knst, these local variables should follow the snake_case convention rather than camelCase with an 'n' prefix.
Apply this diff:
- size_t nQuorumSizeAdded = 0; - size_t nNormalSizeAdded = 0; + size_t quorum_size_added = 0; + size_t normal_size_added = 0;And update their usage accordingly at lines 4667, 4670, 4677, and 4679.
🧹 Nitpick comments (1)
src/net.cpp (1)
4691-4691: Consider documenting the ratio constant selection.The 100:1 ratio strongly prioritizes quorum messages. While this appears to have been tested on testnet (per PR objectives), a comment documenting the rationale and tradeoffs would help future maintainers understand this design choice.
Consider adding:
// Process 100 quorum messages for every 1 normal message when both queues have data. // This strongly prioritizes quorum-critical messages (QSIGREC, CLSIG, ISDLOCK, etc.) // while still ensuring forward progress on normal messages. When the normal queue is // empty, quorum messages are processed in bursts without ratio limits. constexpr size_t QUORUM_TO_NORMAL_RATIO = 100;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/net.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/net.cpp
🧠 Learnings (6)
📓 Common learnings
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6718
File: test/functional/test_framework/test_framework.py:2102-2102
Timestamp: 2025-06-09T16:43:20.996Z
Learning: In the test framework consolidation PR (#6718), user kwvg prefers to limit functional changes to those directly related to MasternodeInfo, avoiding scope creep even for minor improvements like error handling consistency.
📚 Learning: 2025-10-02T18:29:54.756Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6840
File: src/net_processing.cpp:2882-2886
Timestamp: 2025-10-02T18:29:54.756Z
Learning: Across net_processing.cpp, once LLMQContext (m_llmq_ctx) is asserted non-null, its subcomponents (e.g., isman, qdkgsman, quorum_block_processor) are treated as initialized and used without extra null checks.
Applied to files:
src/net.cpp
📚 Learning: 2025-10-05T20:38:28.457Z
Learnt from: knst
Repo: dashpay/dash PR: 6871
File: contrib/guix/libexec/build.sh:358-360
Timestamp: 2025-10-05T20:38:28.457Z
Learning: In the Dash repository, when backporting code from Bitcoin Core, typos and minor issues in comments should be kept as-is to reduce merge conflicts in future backports, even if they remain unfixed in Bitcoin Core's master branch.
Applied to files:
src/net.cpp
📚 Learning: 2025-02-06T14:34:30.466Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Applied to files:
src/net.cpp
📚 Learning: 2025-11-04T18:24:27.241Z
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Applied to files:
src/net.cpp
📚 Learning: 2025-07-23T09:30:34.631Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.h:5-6
Timestamp: 2025-07-23T09:30:34.631Z
Learning: Dash Core uses BITCOIN_ prefix for header guards as the standard convention, inherited from Bitcoin Core. Only a few BLS-specific files in src/bls/ use DASH_ prefix. The vast majority of files (385+) use BITCOIN_ prefix.
Applied to files:
src/net.cpp
🧬 Code graph analysis (1)
src/net.cpp (1)
src/net.h (4)
IsQuorumPriorityMessage(2009-2026)m_msg_process_queue_mutex(1096-1096)m_msg_process_queue_mutex(1098-1410)m_msg_process_queue_mutex(1099-1126)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: mac-build / Build source
- GitHub Check: win64-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: linux64_tsan-build / Build source
- GitHub Check: linux64_nowallet-build / Build source
- GitHub Check: arm-linux-build / Build source
- GitHub Check: Lint / Run linters
🔇 Additional comments (2)
src/net.cpp (2)
4660-4682: Well-structured message classification and queuing.The dual-queue classification approach is clean and efficient:
- Single-pass classification using
IsQuorumPriorityMessage()- Efficient
splice()operations avoid copying- Proper lock ordering: classify outside lock, then splice under lock
- Backpressure correctly computed over combined queue sizes
This implementation properly prioritizes quorum-critical messages (QSIGREC, QSIGSHARE, QBSIGSHARES, QSIGSESANN, QGETSIGSHARES, QSIGSHARESINV, CLSIG, ISDLOCK) while maintaining backpressure semantics.
4688-4728: Ratio-based polling logic correctly implements prioritization with fairness.The implementation effectively balances quorum message priority with normal message forward progress:
Burst mode optimization (lines 4695-4696): When the normal queue is empty, process quorum messages without ratio limits, maximizing throughput for high-priority messages.
Fairness guarantee (lines 4695-4696, 4707): Ratio enforcement only applies when both queues have messages, ensuring normal messages are eventually processed (maximum delay: 100 quorum message processing times).
Correct counter management:
- Only incremented when both queues non-empty (line 4707)
- Reset after processing normal message (line 4723)
Accurate "more" flag (lines 4713, 4727): Correctly checks both queues because the quorum queue may be skipped due to ratio enforcement, not emptiness.
The approach provides strong prioritization for quorum-critical messages while preventing starvation of normal traffic.
UdjinM6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 627a8a5
|
Guix Automation has began to build this PR tagged as v23.0.0-devpr6952.627a8a59. A new comment will be made when the image is pushed. |
|
Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6952.627a8a59. The image should be on dockerhub soon. |
kwvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 627a8a5
knst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
post-utACK
nit is still relevant.
Issue being fixed or feature implemented
This update introduces a new queue for quorum-priority messages, allowing for better handling of critical network messages. The
MarkReceivedMsgsForProcessingfunction now classifies incoming messages into quorum and normal queues, while thePollMessagefunction prioritizes messages from the quorum queue when available. This change enhances the efficiency of message processing and ensures that important messages are handled promptly.What was done?
Describe your changes in detail
How Has This Been Tested?
Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran
to see how your change affects other areas of the code, etc.
Breaking Changes
Please describe any breaking changes your code introduces
Checklist:
Go over all the following points, and put an
xin all the boxes that apply.