Skip to content

refactor: break circular dependency over net_processing and dkgsessionhandler#7314

Open
knst wants to merge 12 commits into
dashpay:developfrom
knst:refactor-peermanager-handlers-dkg
Open

refactor: break circular dependency over net_processing and dkgsessionhandler#7314
knst wants to merge 12 commits into
dashpay:developfrom
knst:refactor-peermanager-handlers-dkg

Conversation

@knst
Copy link
Copy Markdown
Collaborator

@knst knst commented May 8, 2026

Issue being fixed or feature implemented

This PR is continuous of #7247
This PR is not direct dependency of kernel project.

This PR aim to resolve next issues:

  1. constructor of PeerManager uses references to unique_ptr to multiple objects that will be initialized later, such as:

    const std::unique_ptr<ActiveContext>& active_ctx,
    const std::unique_ptr<CDeterministicMNManager>& dmnman,
    const std::unique_ptr<CJWalletManager>& cj_walletman,
    const std::unique_ptr<LLMQContext>& llmq_ctx,
    const std::unique_ptr<llmq::ObserverContext>& observer_ctx,
    

That's a fragile design that has multiple assumptions about already initialized members and their life term

  1. Implementation of state machine for DKG mechanism and p2p implementation is tightly connected.

What was done?

  • CDKGSessionManager is reduced to a pure state class, it owns DB and provides 2 new helper: ForEachHandler / DoForHandler
  • CDKGSessionHandler and ActiveDKGSessionHandler loses its threading and ProcessMessage members
  • MessageProcessingResult usages are dropped from llmq/ consensus code
  • PeerManager forgot about Observer/ActiveContext and lost 2 unique_ptr& from constructor
  • new NetHandler NetDKG is introduced which takes responsibilities for p2p communications for DKG works and for running threads

How Has This Been Tested?

  • Run unit, functional tests
  • Run test/lint/lint-circular-dependencies.py linter

Removed circular dependency over dkgsessionhandler <-> net_processing

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@knst knst added this to the 24 milestone May 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

⚠️ Potential Merge Conflicts Detected

This PR has potential conflicts with the following open PRs:

Please coordinate with the authors of these PRs to avoid merge conflicts.

@thepastaclaw
Copy link
Copy Markdown

thepastaclaw commented May 8, 2026

Review Gate

Commit: f4b6aae8

  • Debounce: 2626m ago (need 30m)

  • CI checks: build failure: linux64_multiprocess-build / Build source, mac-build / Build source, linux64_fuzz-build / Build source

  • CodeRabbit review: comment found

  • Off-peak hours: peak window (5am-11am PT) — currently 06:46 AM PT Tuesday

  • Run review now (check to override)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

Walkthrough

This PR separates LLMQ DKG networking and thread management into a new llmq::NetDKG handler (and NetDKGStub), changes CDKGSession/ActiveDKGSession phase hooks to return std::optional payloads instead of enqueueing, refactors pending-message APIs to accept pre-serialized payloads, updates CDKGSessionManager/ActiveDKGSessionHandler interfaces, simplifies Active/Observer context constructors and init wiring, and decouples PeerManager from ActiveContext by passing a CActiveMasternodeManager* nodeman. It also adds debug helpers and a spork check for enabling DKG.

Sequence Diagram(s)

sequenceDiagram
  participant Node as DKG Node
  participant NetDKG as NetDKG Handler
  participant SessionMgr as CDKGSessionManager
  participant SessionHdlr as CDKGSessionHandler
  participant ActiveDKG as ActiveDKGSession

  Node->>NetDKG: ProcessMessage(MSG_QUORUM_CONTRIB)
  NetDKG->>SessionMgr: route message (llmqType, quorumIndex)
  SessionMgr->>SessionHdlr: enqueue(serialized, hash)
  Note over SessionHdlr: Batches message in pending queues

  NetDKG->>NetDKG: HandleDKGRound()
  loop per_phase
    NetDKG->>SessionHdlr: ProcessPendingMessageBatch()
    SessionHdlr->>ActiveDKG: Contribute()/VerifyAndCommit()
    ActiveDKG-->>SessionHdlr: std::optional<Message>
    SessionHdlr->>NetDKG: RelayInvToParticipants()
  end
Loading
sequenceDiagram
  participant Init as Initialization
  participant ActiveCtx as ActiveContext
  participant PeerMgr as PeerManager
  participant NetDKG as NetDKG Handler
  participant Spork as CSporkManager

  Init->>ActiveCtx: construct(dmnman, qman, qsnapman, sigman)
  Init->>PeerMgr: make(nodeman=active_ctx ? active_ctx->nodeman.get() : nullptr)
  Note over PeerMgr: m_nodeman set
  Init->>NetDKG: construct(sporkman, dkgsman, qman)
  Init->>Spork: IsQuorumDKGEnabled()
  Note over NetDKG: Check spork for DKG enabled
  Init->>ActiveCtx: Start()
  Note over ActiveCtx: Start() takes no connman/peerman args
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • dashpay/dash#7056: Related changes to CDKGSessionManager constructor/lifecycle and quorums_watch wiring.
  • dashpay/dash#7065: Related ActiveContext API and lifecycle refactors affecting DKG handler setup.

Suggested reviewers

  • UdjinM6
  • PastaPastaPasta
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor: break circular dependency over net_processing and dkgsessionhandler' directly and clearly describes the main objective of the changeset.
Description check ✅ Passed The description comprehensively explains the issue being fixed, what was done, testing performed, and references related PRs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/active/dkgsession.cpp (1)

106-111: ⚡ Quick win

Move the sent* debug updates to the actual enqueue/broadcast path.

These methods now only build and return a message. Setting sentContributions, sentComplaint, sentJustification, and sentPrematureCommitment here records a successful send before NetDKG has actually serialized and queued/broadcast the payload.

Also applies to: 292-297, 382-387, 539-544

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/active/dkgsession.cpp` around lines 106 - 111, The
dkgDebugManager.UpdateLocalSessionStatus calls inside the message-builder
functions (e.g., setting CDKGDebugSessionStatus::statusBits.sentContributions,
sentComplaint, sentJustification, sentPrematureCommitment) must be removed from
those builders (the functions that build and return qc/messages) and moved into
the actual send path inside NetDKG — i.e., the code that serializes and
enqueues/broadcasts the payload. Locate the UpdateLocalSessionStatus calls in
the builders and delete them there, then add equivalent UpdateLocalSessionStatus
updates immediately after NetDKG performs the serialization/queuing/broadcast so
the debug flags reflect a real successful send.
src/llmq/debug.cpp (1)

213-228: 💤 Low value

Optional: make MarkAborted idempotent w.r.t. nTime.

MarkAborted's lambda always returns true, so each call bumps localStatus.nTime even when the session was already marked aborted. MarkPhaseAdvanced already does the right thing (returns changed). For consistency and to avoid spurious timestamp updates if the helper is invoked more than once on the same aborted session, consider returning a real changed flag.

♻️ Proposed change
 void CDKGDebugManager::MarkAborted(Consensus::LLMQType llmqType, int quorumIndex)
 {
     UpdateLocalSessionStatus(llmqType, quorumIndex, [&](CDKGDebugSessionStatus& status) {
+        if (status.statusBits.aborted) return false;
         status.statusBits.aborted = true;
         return true;
     });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/debug.cpp` around lines 213 - 228, MarkAborted currently always
returns true from its UpdateLocalSessionStatus lambda which forces
localStatus.nTime to update every call; change the lambda in
CDKGDebugManager::MarkAborted to compute a changed flag by comparing
status.statusBits.aborted with the new value, set status.statusBits.aborted =
true, and return that changed flag (i.e., return status.statusBits.aborted was
previously false). This makes MarkAborted idempotent like MarkPhaseAdvanced and
avoids spurious nTime updates.
src/llmq/net_dkg.cpp (1)

449-482: 💤 Low value

Inconsistent dynamic_cast usage between Start() and Interrupt(); consider tightening shutdown.

Start() uses the throwing reference form (dynamic_cast<ActiveDKGSessionHandler&>) while Interrupt() uses the safe pointer form. Both iterate the same handler set and both early-return on m_active == nullptr, so the invariant is identical and the two should agree.

The reference form also has a small resilience gap: if the cast were ever to throw mid-iteration, the threads already pushed into m_phase_threads would never be joined, because ~NetDKG() only calls DisconnectManagers() (line 254), not Stop(). Either use the pointer form here as well, or have the destructor call Stop() defensively so a partially-initialized state still cleans up.

♻️ Proposed alignment with `Interrupt()`
     m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-        auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-        m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-            util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-        });
+        auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+        if (!Assume(handler != nullptr)) return;
+        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type), handler->QuorumIndex());
+        m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+            util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+        });
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 449 - 482, Start() uses
dynamic_cast<ActiveDKGSessionHandler&> which can throw partway through filling
m_phase_threads and leave threads unjoined; make Start() mirror Interrupt() by
using the non-throwing pointer form (dynamic_cast<ActiveDKGSessionHandler*>)
when iterating m_qdkgsman.ForEachHandler so you only create threads for valid
handlers and avoid exceptions during the loop, ensuring m_phase_threads remains
consistent for later Stop() join; update the lambda in NetDKG::Start to check
the pointer, capture it safely, and call PhaseHandlerThread(handler) with the
pointer/ref as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/active/dkgsession.cpp`:
- Around line 106-111: The dkgDebugManager.UpdateLocalSessionStatus calls inside
the message-builder functions (e.g., setting
CDKGDebugSessionStatus::statusBits.sentContributions, sentComplaint,
sentJustification, sentPrematureCommitment) must be removed from those builders
(the functions that build and return qc/messages) and moved into the actual send
path inside NetDKG — i.e., the code that serializes and enqueues/broadcasts the
payload. Locate the UpdateLocalSessionStatus calls in the builders and delete
them there, then add equivalent UpdateLocalSessionStatus updates immediately
after NetDKG performs the serialization/queuing/broadcast so the debug flags
reflect a real successful send.

In `@src/llmq/debug.cpp`:
- Around line 213-228: MarkAborted currently always returns true from its
UpdateLocalSessionStatus lambda which forces localStatus.nTime to update every
call; change the lambda in CDKGDebugManager::MarkAborted to compute a changed
flag by comparing status.statusBits.aborted with the new value, set
status.statusBits.aborted = true, and return that changed flag (i.e., return
status.statusBits.aborted was previously false). This makes MarkAborted
idempotent like MarkPhaseAdvanced and avoids spurious nTime updates.

In `@src/llmq/net_dkg.cpp`:
- Around line 449-482: Start() uses dynamic_cast<ActiveDKGSessionHandler&> which
can throw partway through filling m_phase_threads and leave threads unjoined;
make Start() mirror Interrupt() by using the non-throwing pointer form
(dynamic_cast<ActiveDKGSessionHandler*>) when iterating
m_qdkgsman.ForEachHandler so you only create threads for valid handlers and
avoid exceptions during the loop, ensuring m_phase_threads remains consistent
for later Stop() join; update the lambda in NetDKG::Start to check the pointer,
capture it safely, and call PhaseHandlerThread(handler) with the pointer/ref as
appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 757eb414-ab77-46e9-b643-a3f32d98e788

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd84aa and 53be42b.

📒 Files selected for processing (25)
  • src/Makefile.am
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/active/dkgsession.h
  • src/active/dkgsessionhandler.cpp
  • src/active/dkgsessionhandler.h
  • src/init.cpp
  • src/llmq/debug.cpp
  • src/llmq/debug.h
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
  • src/net_processing.cpp
  • src/net_processing.h
  • src/test/util/setup_common.cpp
  • test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
  • test/lint/lint-circular-dependencies.py

@knst knst force-pushed the refactor-peermanager-handlers-dkg branch from 53be42b to f4b6aae Compare May 10, 2026 18:07
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
src/llmq/net_dkg.cpp (6)

100-110: 💤 Low value

Defensive null-check in fallback verification loop.

In the per-message fallback verification loop (lines 100-110), member is dereferenced at line 106 (member->dmn->pdmnState->pubKeyOperator.Get()) without a null check. The reasoning is that any nodeId whose GetMember() returned nullptr in the first loop was already inserted into ret (line 54) and is filtered by ret.count(nodeId) (line 101). That holds today, but an in-flight membership change between the two GetMember calls (or future refactors) would silently NPE here.

A cheap guard would future-proof this:

🛡️ Proposed defensive check
         auto member = session.GetMember(msg->proTxHash);
+        if (!member) {
+            ret.emplace(nodeId);
+            continue;
+        }
         bool valid = msg->sig.VerifyInsecure(member->dmn->pdmnState->pubKeyOperator.Get(), msg->GetSignHash());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 100 - 110, The fallback verification loop
can dereference a null member returned by session.GetMember; update the loop
that iterates over messages to defensively check the result of
session.GetMember(msg->proTxHash) before using
member->dmn->pdmnState->pubKeyOperator.Get(): if member is null, insert nodeId
into ret and continue, otherwise perform msg->sig.VerifyInsecure(...,
msg->GetSignHash()) as before; this prevents a potential NPE if membership
changes between the two GetMember calls.

466-473: 💤 Low value

Stop() is not idempotent w.r.t. unjoinable threads after early failure.

Stop() calls Interrupt() then joins all threads. If any thread in m_phase_threads is in a non-joinable state (e.g., already joined or std::thread default-constructed because creation failed), t.join() is skipped via the joinable() guard — good. But if a thread is running and throws an unhandled exception before stop is requested, PhaseHandlerThread only catches AbortPhaseException (line 492); any other exception will terminate the thread per std::terminate. The catch-all loop in PhaseHandlerThread does not include a generic catch (...).

Consider adding a final catch (const std::exception&) to log and break the loop cleanly, so Stop() always sees joinable but exited threads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 466 - 473, Stop() can hang on threads that
terminated due to unexpected exceptions because PhaseHandlerThread only catches
AbortPhaseException; add a broad exception handler inside PhaseHandlerThread's
main loop (after the AbortPhaseException catch) that catches const
std::exception& (and optionally catch(...) as a fallback), logs the error with
context, and breaks/returns cleanly so the std::thread object exits normally and
becomes joinable for Stop() to join; reference PhaseHandlerThread,
AbortPhaseException, m_phase_threads and Interrupt() when locating where to add
the catch and logging.

244-244: 💤 Low value

Redundant temporary in m_active construction.

std::make_unique<ActiveDKG>(ActiveDKG{...}) constructs a temporary ActiveDKG and then move/copy-constructs another inside make_unique. You can forward the args directly:

♻️ Proposed simplification
-    m_active{std::make_unique<ActiveDKG>(ActiveDKG{dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman})}
+    m_active{std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman)}

This requires ActiveDKG to have an aggregate-equivalent constructor or a matching one; if it currently relies on aggregate brace-init, add an explicit constructor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` at line 244, The m_active member is being initialized
with a redundant temporary via std::make_unique<ActiveDKG>(ActiveDKG{...});
change the call to forward the constructor args directly (e.g.
std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman,
connman)) so no extra temporary is created, and if ActiveDKG does not currently
have a matching constructor add an explicit constructor on ActiveDKG that
accepts (dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman) to allow
direct in-place construction.

216-255: ⚖️ Poor tradeoff

Two near-identical constructors invite drift.

The observer (lines 216-231) and active (lines 233-253) constructors share initialization of m_qdkgsman, m_qman, m_sporkman, m_chainman, m_quorums_watch, the m_qdkgsman.InitializeHandlers(...) call, and the closing m_qman.ConnectManagers(...) call. The only differences are:

  • m_active (nullptr vs constructed)
  • the lambda type returned by InitializeHandlers (CDKGSessionHandler vs ActiveDKGSessionHandler)

Consider delegating the active constructor to the observer one, or factor common init into a private helper, to keep the two paths (and the ConnectManagers/DisconnectManagers lifecycle) in sync going forward.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 216 - 255, The two NetDKG constructors
duplicate common initialization (m_qdkgsman, m_qman, m_sporkman, m_chainman,
m_quorums_watch, the InitializeHandlers/ConnectManagers lifecycle) and should be
unified: extract the shared setup into a private helper (e.g., InitCommon or
InitializeManagers) that takes a
std::function<std::unique_ptr<CDKGSessionHandler>(const Consensus::LLMQParams&,
int)> or a template/lambda wrapper, or else implement constructor delegation
from the active constructor to the observer constructor and only set m_active
afterward; update the observer constructor to call the helper (or be the
delegated target) and change the active constructor to only construct m_active
and supply the ActiveDKGSessionHandler-producing lambda to the shared
InitializeHandlers call, ensuring m_qman.ConnectManagers and
m_qdkgsman.InitializeHandlers remain in one place.

451-464: ⚡ Quick win

dynamic_cast<ActiveDKGSessionHandler&> at line 458 will throw on type mismatch.

Start() is guarded by m_active == nullptr (line 452), and the active-mode constructor (lines 246-251) installs ActiveDKGSessionHandler instances, so in practice the cast succeeds. However, m_qdkgsman.InitializeHandlers(...) is the only invariant that ties m_active != nullptr to "all handlers are ActiveDKGSessionHandler", and it lives outside this class. If that invariant is ever violated (a future code path that mixes handler types, or a partially initialized state), dynamic_cast on a reference will throw std::bad_cast from inside this lambda chain and may abort the node.

Interrupt() (line 479) already uses the safer pointer form. Consider mirroring that here:

🛡️ Safer cast in `Start()`
     m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-        auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-        m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-            util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-        });
+        auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+        if (!handler) return;
+        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type),
+                                            handler->QuorumIndex());
+        m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+            util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+        });
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 451 - 464, The dynamic_cast in Start()
currently uses dynamic_cast<ActiveDKGSessionHandler&> inside the
m_qdkgsman.ForEachHandler lambda which will throw std::bad_cast on mismatch;
change it to use the pointer form dynamic_cast<ActiveDKGSessionHandler*>
(matching Interrupt()) and check for nullptr before using handler: if the cast
returns nullptr, log or skip creating the phase thread for that handler instead
of dereferencing/throwing, otherwise capture the pointer and pass it to
PhaseHandlerThread; update references in the lambda to use the pointer (e.g.,
handler->QuorumIndex(), PhaseHandlerThread(*handler)) so Start() no longer can
throw from a bad_cast.

257-294: 💤 Low value

vRecv is moved into pm later — make the rewind/peek pattern explicit.

The flow at lines 291-294 reads llmqType (1 byte) and quorumHash (32 bytes) and then rewinds both, with the goal of leaving vRecv untouched so the entire payload (including header bytes) can be hashed and stored at lines 349-352. This works, but:

  • The rewind order (uint256 then uint8_t) is the reverse of read order; that's intentional for Rewind but easy to break in the future.
  • A short comment would make the intent (peek-only) obvious to maintainers.
♻️ Suggested clarifying comment
+    // Peek llmqType + quorumHash without consuming so the full payload can be re-serialized
+    // and hashed below at line 349.
     Consensus::LLMQType llmqType;
     uint256 quorumHash;
     vRecv >> llmqType;
     vRecv >> quorumHash;
     vRecv.Rewind(sizeof(uint256));
     vRecv.Rewind(sizeof(uint8_t));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 257 - 294, The code in
NetDKG::ProcessMessage reads llmqType and quorumHash from vRecv and then calls
vRecv.Rewind in the reverse order to restore the stream for later hashing; make
this intent explicit and less fragile by adding a short comment above the
reads/Rewind calls describing that we are only peeking at the header bytes
(uint8_t then uint256) and that the subsequent Rewind calls intentionally undo
the reads so the full payload remains unchanged for hashing (used later when
moving vRecv into pm). Optionally, reorder the Rewind calls to mirror the read
order (Rewind(sizeof(uint8_t)) then Rewind(sizeof(uint256))) to reduce future
mistakes, while keeping the explanatory comment.
src/llmq/dkgsessionhandler.h (1)

112-121: 💤 Low value

Duplicate public: access specifiers.

Lines 112 and 121 both declare public: with no intervening private:/protected: block, making the second specifier redundant. Consider consolidating into a single section.

♻️ Proposed cleanup
 public:
     const Consensus::LLMQParams& params;

     // Do not guard these, they protect their internals themselves
     CDKGPendingMessages pendingContributions;
     CDKGPendingMessages pendingComplaints;
     CDKGPendingMessages pendingJustifications;
     CDKGPendingMessages pendingPrematureCommitments;

-public:
     explicit CDKGSessionHandler(const Consensus::LLMQParams& _params);
     virtual ~CDKGSessionHandler();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/dkgsessionhandler.h` around lines 112 - 121, The class header
contains two consecutive "public:" access specifiers (surrounding members like
params and CDKGPendingMessages
pendingContributions/pendingComplaints/pendingJustifications/pendingPrematureCommitments),
making the second one redundant; remove the duplicate "public:" and consolidate
these members under a single public section so there is only one "public:"
before params and the pending* members.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmq/dkgsession.cpp`:
- Around line 627-629: The empty stubs for CDKGSession::FinalizeCommitments()
and CDKGSession::FinalizeSingleCommitment() silently disable producing mineable
commitments in HandleDKGRound; either restore the original finalize logic into
these two methods (port the pre-refactor bodies into
CDKGSession::FinalizeCommitments and CDKGSession::FinalizeSingleCommitment so
they return the actual CFinalCommitment(s) used by
qblockman.AddMineableCommitment(...)), or if this work is intentionally
deferred, replace the stubs with an explicit TODO and a runtime-safe guard: log
an error (process/ulogs) and assert or throw so the failure is visible (do not
silently return {}), referencing the methods by name (FinalizeCommitments and
FinalizeSingleCommitment) and ensuring HandleDKGRound’s expectations are met.

In `@src/llmq/net_dkg.cpp`:
- Around line 384-411: AlreadyHave reports DKG invs seen by observer nodes
(m_quorums_watch) but ProcessGetData currently returns early when m_active ==
nullptr causing watchers to claim possession but refuse getdata; remove the
early "if (m_active == nullptr) return false;" check from NetDKG::ProcessGetData
so that m_qdkgsman.GetContribution/GetComplaint/etc. are dynamically dispatched
(base virtuals will return false in non-serving handlers), and keep or update
the explanatory comment to note that observer handlers will be consulted via
m_qdkgsman even when m_active is null.
- Around line 375-379: The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.

---

Nitpick comments:
In `@src/llmq/dkgsessionhandler.h`:
- Around line 112-121: The class header contains two consecutive "public:"
access specifiers (surrounding members like params and CDKGPendingMessages
pendingContributions/pendingComplaints/pendingJustifications/pendingPrematureCommitments),
making the second one redundant; remove the duplicate "public:" and consolidate
these members under a single public section so there is only one "public:"
before params and the pending* members.

In `@src/llmq/net_dkg.cpp`:
- Around line 100-110: The fallback verification loop can dereference a null
member returned by session.GetMember; update the loop that iterates over
messages to defensively check the result of session.GetMember(msg->proTxHash)
before using member->dmn->pdmnState->pubKeyOperator.Get(): if member is null,
insert nodeId into ret and continue, otherwise perform
msg->sig.VerifyInsecure(..., msg->GetSignHash()) as before; this prevents a
potential NPE if membership changes between the two GetMember calls.
- Around line 466-473: Stop() can hang on threads that terminated due to
unexpected exceptions because PhaseHandlerThread only catches
AbortPhaseException; add a broad exception handler inside PhaseHandlerThread's
main loop (after the AbortPhaseException catch) that catches const
std::exception& (and optionally catch(...) as a fallback), logs the error with
context, and breaks/returns cleanly so the std::thread object exits normally and
becomes joinable for Stop() to join; reference PhaseHandlerThread,
AbortPhaseException, m_phase_threads and Interrupt() when locating where to add
the catch and logging.
- Line 244: The m_active member is being initialized with a redundant temporary
via std::make_unique<ActiveDKG>(ActiveDKG{...}); change the call to forward the
constructor args directly (e.g. std::make_unique<ActiveDKG>(dmnman, mn_metaman,
dkgdbgman, qblockman, qsnapman, connman)) so no extra temporary is created, and
if ActiveDKG does not currently have a matching constructor add an explicit
constructor on ActiveDKG that accepts (dmnman, mn_metaman, dkgdbgman, qblockman,
qsnapman, connman) to allow direct in-place construction.
- Around line 216-255: The two NetDKG constructors duplicate common
initialization (m_qdkgsman, m_qman, m_sporkman, m_chainman, m_quorums_watch, the
InitializeHandlers/ConnectManagers lifecycle) and should be unified: extract the
shared setup into a private helper (e.g., InitCommon or InitializeManagers) that
takes a std::function<std::unique_ptr<CDKGSessionHandler>(const
Consensus::LLMQParams&, int)> or a template/lambda wrapper, or else implement
constructor delegation from the active constructor to the observer constructor
and only set m_active afterward; update the observer constructor to call the
helper (or be the delegated target) and change the active constructor to only
construct m_active and supply the ActiveDKGSessionHandler-producing lambda to
the shared InitializeHandlers call, ensuring m_qman.ConnectManagers and
m_qdkgsman.InitializeHandlers remain in one place.
- Around line 451-464: The dynamic_cast in Start() currently uses
dynamic_cast<ActiveDKGSessionHandler&> inside the m_qdkgsman.ForEachHandler
lambda which will throw std::bad_cast on mismatch; change it to use the pointer
form dynamic_cast<ActiveDKGSessionHandler*> (matching Interrupt()) and check for
nullptr before using handler: if the cast returns nullptr, log or skip creating
the phase thread for that handler instead of dereferencing/throwing, otherwise
capture the pointer and pass it to PhaseHandlerThread; update references in the
lambda to use the pointer (e.g., handler->QuorumIndex(),
PhaseHandlerThread(*handler)) so Start() no longer can throw from a bad_cast.
- Around line 257-294: The code in NetDKG::ProcessMessage reads llmqType and
quorumHash from vRecv and then calls vRecv.Rewind in the reverse order to
restore the stream for later hashing; make this intent explicit and less fragile
by adding a short comment above the reads/Rewind calls describing that we are
only peeking at the header bytes (uint8_t then uint256) and that the subsequent
Rewind calls intentionally undo the reads so the full payload remains unchanged
for hashing (used later when moving vRecv into pm). Optionally, reorder the
Rewind calls to mirror the read order (Rewind(sizeof(uint8_t)) then
Rewind(sizeof(uint256))) to reduce future mistakes, while keeping the
explanatory comment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d8950d10-63a8-42c9-9f3f-6963acd754d2

📥 Commits

Reviewing files that changed from the base of the PR and between 53be42b and f4b6aae.

📒 Files selected for processing (16)
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/init.cpp
  • src/llmq/dkgsession.cpp
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
🚧 Files skipped from review as they are similar to previous changes (12)
  • src/llmq/options.cpp
  • src/llmq/observer.cpp
  • src/llmq/options.h
  • src/llmq/net_dkg.h
  • src/llmq/dkgsessionmgr.cpp
  • src/active/context.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/observer.h
  • src/llmq/dkgsessionmgr.h
  • src/llmq/dkgsession.h
  • src/active/dkgsession.cpp
  • src/init.cpp

Comment thread src/llmq/dkgsession.cpp
Comment on lines +627 to +629
std::vector<CFinalCommitment> CDKGSession::FinalizeCommitments() { return {}; }

CFinalCommitment CDKGSession::FinalizeSingleCommitment() { return {}; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Empty FinalizeCommitments() and FinalizeSingleCommitment() will silently break DKG output.

Both implementations return an empty vector and a default-constructed (null) CFinalCommitment respectively. These are the only paths in HandleDKGRound (see src/llmq/net_dkg.cpp lines 522-527 and 594-599) that produce mineable commitments via qblockman.AddMineableCommitment(...). With these stubs:

  • FinalizeSingleCommitment() returns {} which IsNull() reports true → branch is skipped, no single-member commitment is ever produced.
  • FinalizeCommitments() returns an empty vector → the for loop never executes, no multi-member commitment is ever relayed/mined.

If this is intentional (work-in-progress to be filled in by a follow-up commit/PR), please add a TODO/FIXME and a clear log/assert so the regression is not invisible at runtime; otherwise DKG silently never completes a quorum on this branch.

Could you confirm whether the original (pre-refactor) finalize logic is supposed to land in this PR or in a follow-up? I can help port the previous body if you'd like.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/dkgsession.cpp` around lines 627 - 629, The empty stubs for
CDKGSession::FinalizeCommitments() and CDKGSession::FinalizeSingleCommitment()
silently disable producing mineable commitments in HandleDKGRound; either
restore the original finalize logic into these two methods (port the
pre-refactor bodies into CDKGSession::FinalizeCommitments and
CDKGSession::FinalizeSingleCommitment so they return the actual
CFinalCommitment(s) used by qblockman.AddMineableCommitment(...)), or if this
work is intentionally deferred, replace the stubs with an explicit TODO and a
runtime-safe guard: log an error (process/ulogs) and assert or throw so the
failure is visible (do not silently return {}), referencing the methods by name
(FinalizeCommitments and FinalizeSingleCommitment) and ensuring HandleDKGRound’s
expectations are met.

Comment thread src/llmq/net_dkg.cpp
Comment on lines +375 to +379
if (!dispatched) {
LogPrintf("NetDKG -- no session handlers for quorumIndex [%d]\n", quorumIndex);
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100);
return;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

PeerMisbehaving(100) for missing handler may be too punitive.

If DoForHandler returns false because no session handler exists for {llmqType, quorumIndex} (e.g., handlers haven't been initialized yet for this combination, or this node hasn't discovered that quorum is mineable), we instantly ban the peer with score 100. Earlier paths use 10 for "we might be lagging behind" cases (e.g., line 319). If a remote peer is slightly ahead, we'd ban legitimate participants.

Consider lowering to 10, or returning silently with a LogPrint, since the peer's message itself was structurally valid up to this point.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 375 - 379, The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.

Comment thread src/llmq/net_dkg.cpp
Comment on lines +384 to +411
bool NetDKG::AlreadyHave(const CInv& inv)
{
switch (inv.type) {
case MSG_QUORUM_CONTRIB:
case MSG_QUORUM_COMPLAINT:
case MSG_QUORUM_JUSTIFICATION:
case MSG_QUORUM_PREMATURE_COMMITMENT: {
if (!IsQuorumDKGEnabled(m_sporkman)) return false;
bool seen = false;
m_qdkgsman.ForEachHandler([&](CDKGSessionHandler& h) {
if (seen) return;
if (h.pendingContributions.HasSeen(inv.hash) || h.pendingComplaints.HasSeen(inv.hash) ||
h.pendingJustifications.HasSeen(inv.hash) || h.pendingPrematureCommitments.HasSeen(inv.hash)) {
seen = true;
}
});
return seen;
}
}
return false;
}

bool NetDKG::ProcessGetData(CNode& pfrom, const CInv& inv, CConnman& connman, const CNetMsgMaker& msgMaker)
{
// Default implementations of GetContribution and the other virtual methods
// return false in observer mode; m_active is only an early exit and does
// not affect logic.
if (m_active == nullptr) return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Asymmetry between AlreadyHave and ProcessGetData for observer-mode (qwatch) nodes.

AlreadyHave (line 391) only gates on IsQuorumDKGEnabled(m_sporkman) and walks every handler's pending sets — observer/m_quorums_watch nodes will report true for DKG inv hashes they've seen.

ProcessGetData (line 411), however, short-circuits to false whenever m_active == nullptr, even though m_quorums_watch == true nodes do receive and forward DKG messages. The result: a watching node will tell peers "I already have this" via AlreadyHave, but then refuse to serve it via getdata. Peers can be left waiting/timing out instead of fetching from another source.

The inline comment ("default implementations of GetContribution... return false in observer mode") is true for the base CDKGSessionHandler, but the dispatch goes through m_qdkgsman.GetContribution(...) which dynamic-dispatches to the active handler in active mode. For pure m_quorums_watch=true (no m_active) the base virtuals do return false anyway — so the early return is redundant for that case and harmful for the truly-observer case.

Consider either:

  1. Removing the m_active == nullptr early return and relying on the virtual dispatch to return false naturally, or
  2. Symmetrically gating AlreadyHave so it returns false when m_active == nullptr.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 384 - 411, AlreadyHave reports DKG invs
seen by observer nodes (m_quorums_watch) but ProcessGetData currently returns
early when m_active == nullptr causing watchers to claim possession but refuse
getdata; remove the early "if (m_active == nullptr) return false;" check from
NetDKG::ProcessGetData so that m_qdkgsman.GetContribution/GetComplaint/etc. are
dynamically dispatched (base virtuals will return false in non-serving
handlers), and keep or update the explanatory comment to note that observer
handlers will be consulted via m_qdkgsman even when m_active is null.

@PastaPastaPasta PastaPastaPasta marked this pull request as draft May 12, 2026 13:53
@github-actions
Copy link
Copy Markdown

This pull request has conflicts, please rebase.

@knst knst force-pushed the refactor-peermanager-handlers-dkg branch 2 times, most recently from 248ccaf to c1c4e2a Compare May 12, 2026 14:22
@github-actions
Copy link
Copy Markdown

This pull request has conflicts, please rebase.

knst added 12 commits May 19, 2026 21:07
It shows the hidden circular dependency and tidy up list of includes
- removed method CDKGPendingMessages::Misbehaving(NodeId, int, PeerManager&), ProcessPendingMessageBatch calls peerman.Misbehaving(...) directly
- renamed PushPendingMessage<Message>(NodeId, Message&, PeerManager&) to PushOwnPendingMessage for clear distinction of path with node=-1 (self made)
…from PeerManager

Re-ordered initialization of PeerManager and ActiveContext / ObserverContext, PeerManager::make now takes nodeman raw ptr (or nullptr).

It resolves several circular dependencies over net_processing and removes several unique_ptr<T&> work-arounds from PeerManager
It helps to drop dependency of llmq/dkgsessionhandler on network code
 - moved implementation of ProcessMessage and AlreadyHave to NetDKG
 - drop usages of MessageProcessingResult in CDKGSessionManager
 - introduced a new helper DoForHandler
@knst knst force-pushed the refactor-peermanager-handlers-dkg branch from c1c4e2a to 22d5cec Compare May 19, 2026 14:09
@knst knst marked this pull request as ready for review May 19, 2026 14:09
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/llmq/net_dkg.cpp (2)

375-379: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Penalty of 100 for missing handler may be too punitive.

At line 319, a penalty of 10 is applied for unknown quorumHash with the rationale "we might be lagging behind." The same rationale applies here: if DoForHandler returns false because no session handler exists for {llmqType, quorumIndex}, this node may simply not have initialized handlers for that quorum yet.

Using 100 here risks instantly banning legitimate participants who are slightly ahead.

Proposed fix
     if (!dispatched) {
         LogPrintf("NetDKG -- no session handlers for quorumIndex [%d]\n", quorumIndex);
-        m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100);
+        // NOTE: do not insta-ban, we might be lagging behind
+        m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10);
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 375 - 379, The current NetDKG branch
penalizes peers with m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when
dispatched is false; lower the penalty to match the earlier unknown quorumHash
case (use 10) and keep the log message, because absence of a handler likely
means we're lagging rather than malicious behavior—modify the call in the NetDKG
handler where dispatched is false to use 10 instead of 100 (the symbols to
change: variable quorumIndex, boolean dispatched, and method
m_peer_manager->PeerMisbehaving).

384-411: ⚠️ Potential issue | 🟡 Minor | ⚖️ Poor tradeoff

Asymmetry between AlreadyHave and ProcessGetData for observer-mode nodes.

AlreadyHave (line 391) only gates on IsQuorumDKGEnabled(m_sporkman) and walks every handler's pending sets, so observer nodes (m_quorums_watch=true, m_active=nullptr) will report true for DKG inv hashes they've seen.

ProcessGetData (line 411) short-circuits to false when m_active == nullptr, meaning observer nodes will tell peers "I already have this" but then refuse to serve it via getdata.

Consider either:

  1. Removing the m_active == nullptr early return and relying on virtual dispatch to return false naturally, or
  2. Symmetrically gating AlreadyHave to return false when m_active == nullptr.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 384 - 411, AlreadyHave reports DKG invs as
present for observer nodes while ProcessGetData refuses to serve them when
m_active == nullptr, causing asymmetry; to fix, add a symmetric guard in
NetDKG::AlreadyHave that returns false when m_active == nullptr (or when
m_quorums_watch indicates observer mode) before calling
IsQuorumDKGEnabled/ForEachHandler so observers won't claim they have items they
won't serve; alternatively remove the early return in NetDKG::ProcessGetData so
virtual handlers decide, but prefer adding the m_active/nullptr check in
AlreadyHave to preserve current ProcessGetData behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/llmq/net_dkg.cpp`:
- Around line 375-379: The current NetDKG branch penalizes peers with
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when dispatched is false;
lower the penalty to match the earlier unknown quorumHash case (use 10) and keep
the log message, because absence of a handler likely means we're lagging rather
than malicious behavior—modify the call in the NetDKG handler where dispatched
is false to use 10 instead of 100 (the symbols to change: variable quorumIndex,
boolean dispatched, and method m_peer_manager->PeerMisbehaving).
- Around line 384-411: AlreadyHave reports DKG invs as present for observer
nodes while ProcessGetData refuses to serve them when m_active == nullptr,
causing asymmetry; to fix, add a symmetric guard in NetDKG::AlreadyHave that
returns false when m_active == nullptr (or when m_quorums_watch indicates
observer mode) before calling IsQuorumDKGEnabled/ForEachHandler so observers
won't claim they have items they won't serve; alternatively remove the early
return in NetDKG::ProcessGetData so virtual handlers decide, but prefer adding
the m_active/nullptr check in AlreadyHave to preserve current ProcessGetData
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8ef86bf0-4e53-424c-a970-64325e97fd7d

📥 Commits

Reviewing files that changed from the base of the PR and between f4b6aae and 22d5cec.

📒 Files selected for processing (26)
  • src/Makefile.am
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/active/dkgsession.h
  • src/active/dkgsessionhandler.cpp
  • src/active/dkgsessionhandler.h
  • src/init.cpp
  • src/llmq/debug.cpp
  • src/llmq/debug.h
  • src/llmq/dkgsession.cpp
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
  • src/net_processing.cpp
  • src/net_processing.h
  • src/test/util/setup_common.cpp
  • test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
  • test/lint/lint-circular-dependencies.py
🚧 Files skipped from review as they are similar to previous changes (20)
  • src/llmq/options.h
  • src/llmq/observer.cpp
  • src/Makefile.am
  • src/llmq/debug.h
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/active/context.h
  • src/llmq/dkgsessionmgr.h
  • src/active/dkgsession.h
  • src/active/dkgsessionhandler.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsession.cpp
  • src/llmq/dkgsession.h
  • src/net_processing.h
  • src/llmq/net_dkg.h
  • src/init.cpp
  • src/active/dkgsession.cpp
  • src/llmq/dkgsessionhandler.h
  • src/net_processing.cpp

Copy link
Copy Markdown

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Refactor cleanly splits DKG network handling into NetDKG/NetDKGStub and reduces CDKGSessionManager to a state holder; no blocking issues found. The notable regression is that on the active-masternode path the startup tip is delivered to CDSNotificationInterface twice. Remaining items are minor cleanups and an architectural suggestion.

Reviewed commit: 22d5cec

🟡 3 suggestion(s) | 💬 3 nitpick(s)

6 finding(s) posted in the review body

suggestion: Active-node startup runs CDSNotificationInterface::InitializeCurrentBlockTip twice

src/init.cpp (lines 2451-2540)

On the active-masternode path the import thread first calls g_ds_notification_interface->InitializeCurrentBlockTip(tip, ibd) directly at line 2457, then later calls GetMainSignals().InitializeCurrentBlockTip(tip, ibd) at line 2540 after nodeman->Init(). Because g_ds_notification_interface is registered as a CValidationInterface at init.cpp:2213, the broadcast at line 2540 dispatches back through CDSNotificationInterface::InitializeCurrentBlockTip, which in turn invokes SynchronousUpdatedBlockTip and UpdatedBlockTip a second time for the same tip — re-running CDeterministicMNManager::UpdatedBlockTip, CMasternodeSync::UpdatedBlockTip, CDSTXManager::UpdatedBlockTip, and CGovernanceManager::UpdatedBlockTip. Either temporarily unregister g_ds_notification_interface around the second broadcast, or have the second call notify only the subscribers that depend on a valid proTxHash.

nitpick: Duplicate forward declarations in dkgsessionmgr.h

src/llmq/dkgsessionmgr.h (lines 35-45)

Five classes are forward-declared twice in this header block (CDKGSessionHandler, CDKGContribution, CDKGComplaint, CDKGJustification, CDKGPrematureCommitment). The second copies appear after class CQuorumSnapshotManager; and look like an accidental copy/paste from the include-cleanup commit. Legal but noisy.

💡 Suggested change
class CDKGComplaint;
class CDKGContribution;
class CDKGJustification;
class CDKGPrematureCommitment;
class CDKGSessionHandler;
class CQuorumSnapshotManager;
suggestion: DKG inv types now go through MSG_DSQ's cj_walletman lookup before reaching handlers

src/net_processing.cpp (lines 2378-2387)

The combined switch arm groups MSG_QUORUM_CONTRIB/COMPLAINT/JUSTIFICATION/PREMATURE_COMMITMENT with MSG_DSQ, so every DKG inv first calls m_cj_walletman->hasQueue(inv.hash) before iterating m_handlers. The hash spaces are independent in practice so a false positive is astronomically unlikely, but the structural coupling is misleading and the lookup is wasted work on every DKG inv (hot during DKG phases). Either split MSG_DSQ into its own case, or iterate m_handlers first and let the CoinJoin pre-check live behind MSG_DSQ.

suggestion: NetDKG has no destructor that joins m_phase_threads

src/llmq/net_dkg.cpp (lines 466-473)

NetDKG::Stop() joins m_phase_threads and is currently invoked through PeerManagerImpl::RemoveHandlers() before ~NetDKG runs. If a NetDKG is ever destroyed outside that PeerManager-owned lifecycle (tests, future refactors), the base ~NetHandler runs only the static Interrupt()/Stop() (derived vtable is gone), and the phase threads — which capture references to ActiveDKG-owned objects (dmnman, mn_metaman, qsnapman, connman) — outlive their captured state. A ~NetDKG() that calls Stop() explicitly would make this robust to lifecycle changes.

nitpick: Unreachable guard: NetDKG is never installed with !is_masternode && !m_quorums_watch

src/llmq/net_dkg.cpp (lines 278-282)

At init.cpp:2232-2245, NetDKG is only registered when active_ctx is set (active masternode, m_active != nullptr) or observer_ctx is set (observer mode, constructed with quorums_watch=true); everything else gets NetDKGStub. The !is_masternode && !m_quorums_watch branch therefore cannot be reached on a live NetDKG. Either drop it and rely on NetDKGStub, or add a short comment explaining it as defense-in-depth.

nitpick: No targeted tests for NetDKG/NetDKGStub misbehavior contract

src/llmq/net_dkg.h (lines 52-123)

Existing functional DKG tests cover the happy paths, but the new NetDKG/NetDKGStub split introduces distinct misbehavior scores (10 vs 100) depending on mode (stub vs observer-without-watch vs QWATCH-from-non-masternode). A small unit test exercising each PeerMisbehaving branch would prevent future regressions in the ban-score contract.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

- [SUGGESTION] In `src/init.cpp`:2451-2540: Active-node startup runs CDSNotificationInterface::InitializeCurrentBlockTip twice
  On the active-masternode path the import thread first calls `g_ds_notification_interface->InitializeCurrentBlockTip(tip, ibd)` directly at line 2457, then later calls `GetMainSignals().InitializeCurrentBlockTip(tip, ibd)` at line 2540 after `nodeman->Init()`. Because `g_ds_notification_interface` is registered as a `CValidationInterface` at init.cpp:2213, the broadcast at line 2540 dispatches back through `CDSNotificationInterface::InitializeCurrentBlockTip`, which in turn invokes `SynchronousUpdatedBlockTip` and `UpdatedBlockTip` a second time for the same tip — re-running `CDeterministicMNManager::UpdatedBlockTip`, `CMasternodeSync::UpdatedBlockTip`, `CDSTXManager::UpdatedBlockTip`, and `CGovernanceManager::UpdatedBlockTip`. Either temporarily unregister `g_ds_notification_interface` around the second broadcast, or have the second call notify only the subscribers that depend on a valid proTxHash.
- [NITPICK] In `src/llmq/dkgsessionmgr.h`:35-45: Duplicate forward declarations in dkgsessionmgr.h
  Five classes are forward-declared twice in this header block (CDKGSessionHandler, CDKGContribution, CDKGComplaint, CDKGJustification, CDKGPrematureCommitment). The second copies appear after `class CQuorumSnapshotManager;` and look like an accidental copy/paste from the include-cleanup commit. Legal but noisy.
- [SUGGESTION] In `src/net_processing.cpp`:2378-2387: DKG inv types now go through MSG_DSQ's cj_walletman lookup before reaching handlers
  The combined switch arm groups MSG_QUORUM_CONTRIB/COMPLAINT/JUSTIFICATION/PREMATURE_COMMITMENT with MSG_DSQ, so every DKG inv first calls `m_cj_walletman->hasQueue(inv.hash)` before iterating `m_handlers`. The hash spaces are independent in practice so a false positive is astronomically unlikely, but the structural coupling is misleading and the lookup is wasted work on every DKG inv (hot during DKG phases). Either split MSG_DSQ into its own case, or iterate `m_handlers` first and let the CoinJoin pre-check live behind MSG_DSQ.
- [SUGGESTION] In `src/llmq/net_dkg.cpp`:466-473: NetDKG has no destructor that joins m_phase_threads
  `NetDKG::Stop()` joins `m_phase_threads` and is currently invoked through `PeerManagerImpl::RemoveHandlers()` before `~NetDKG` runs. If a `NetDKG` is ever destroyed outside that PeerManager-owned lifecycle (tests, future refactors), the base `~NetHandler` runs only the static `Interrupt()/Stop()` (derived vtable is gone), and the phase threads — which capture references to `ActiveDKG`-owned objects (dmnman, mn_metaman, qsnapman, connman) — outlive their captured state. A `~NetDKG()` that calls `Stop()` explicitly would make this robust to lifecycle changes.
- [NITPICK] In `src/llmq/net_dkg.cpp`:278-282: Unreachable guard: NetDKG is never installed with !is_masternode && !m_quorums_watch
  At init.cpp:2232-2245, NetDKG is only registered when `active_ctx` is set (active masternode, `m_active != nullptr`) or `observer_ctx` is set (observer mode, constructed with `quorums_watch=true`); everything else gets NetDKGStub. The `!is_masternode && !m_quorums_watch` branch therefore cannot be reached on a live NetDKG. Either drop it and rely on NetDKGStub, or add a short comment explaining it as defense-in-depth.
- [NITPICK] In `src/llmq/net_dkg.h`:52-123: No targeted tests for NetDKG/NetDKGStub misbehavior contract
  Existing functional DKG tests cover the happy paths, but the new NetDKG/NetDKGStub split introduces distinct misbehavior scores (10 vs 100) depending on mode (stub vs observer-without-watch vs QWATCH-from-non-masternode). A small unit test exercising each PeerMisbehaving branch would prevent future regressions in the ban-score contract.

Inline posting failed; posted verified findings in the top-level review body.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants