v2.3.0#12
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the Kademlia DHT implementation to use a geometric ring based on on-chain node registry data, replacing the previous bucket-based approach. It introduces Dynamo-style quorum for data replication and consolidates ticket validation logic into a shared submission module. Feedback from the review identifies a potential process leak due to an infinite timeout in RPC calls, a risk of startup failure caused by blocking network I/O in init/1, and a performance bottleneck in the node synchronization logic that could be mitigated by using database transactions.
| try do | ||
| # Don't need to use GenServerDbg.call here because we're regelualry exepcting timeouts | ||
| GenServer.call(pid, {:rpc, call}, 2000) | ||
| GenServer.call(pid, {:rpc, call}, :infinity) |
There was a problem hiding this comment.
Using an :infinity timeout in GenServer.call here is risky. If a peer handler process becomes unresponsive (e.g., due to network issues or deadlocks), the background process spawned by rpc_with_cutoff will hang indefinitely. This can lead to a process leak and eventually cause the caller of rpc/2 (the list version) to hang as well. Consider using a large but finite timeout (e.g., 30 seconds) to ensure these processes eventually terminate.
GenServer.call(pid, {:rpc, call}, 30_000)
| case KademliaSql.sync_registry_nodes() do | ||
| :error -> KademliaSql.ensure_self_node_for_init() | ||
| :ok -> :ok | ||
| end |
There was a problem hiding this comment.
Performing network I/O via KademliaSql.sync_registry_nodes() inside init/1 can block the GenServer startup. If the registry contract call is slow or the network is unavailable, it may cause the supervisor to time out and fail to start the node. It is recommended to move this initialization logic to handle_continue/2 to ensure the GenServer starts promptly.
| for addr <- removed do | ||
| KademliaLight.redistribute_removed_node(addr) | ||
| query!("DELETE FROM p2p_nodes WHERE address = ?1", [addr]) | ||
| end | ||
|
|
||
| for address <- addresses do | ||
| ring_key = KademliaRing.key(address) | ||
|
|
||
| query!( | ||
| """ | ||
| INSERT INTO p2p_nodes (address, ring_key, on_chain, synced_at) | ||
| VALUES (?1, ?2, 1, ?3) | ||
| ON CONFLICT(address) DO UPDATE SET | ||
| on_chain = 1, | ||
| ring_key = excluded.ring_key, | ||
| synced_at = excluded.synced_at | ||
| """, | ||
| [address, ring_key, now] | ||
| ) | ||
| end | ||
|
|
There was a problem hiding this comment.
The sync_registry_nodes function performs multiple database deletions and insertions within loops. In SQLite, executing many individual write queries can be slow because each one is wrapped in its own transaction by default. Wrapping the entire synchronization logic in a single transaction would significantly improve performance and ensure atomicity.
GitHub-hosted runners do not ship autoreconf by default; the libsecp256k1 dependency runs autogen.sh during mix compile and failed without autoconf, automake, and libtool. Run apt-get update before install alongside the existing Boost packages. Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
- Assert non-empty replica list in quorum test (avoid expensive length/1). - Drop unreachable catch-all branches Dialyzer flagged in Kademlia discovery and ticket submission. - Declare DiodeClient.Object.Ticket.t/0 and server_sign/2 return typing so specs referencing Ticket.t() are valid. - Ignore a known false positive invalid_call on EdgeV2 ticketv2 RPC path where Dialyzer narrows shared defp success typing incorrectly. Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
|
@cursoragent review the gemini comments and address as appropriate |
|
Here is what I implemented for the three Gemini notes (pushed on 1. Finite
|
…saction - Use a 30s GenServer.call timeout to PeerHandler in rpc_call_result instead of :infinity so spawned rpc_with_cutoff tasks cannot hang forever. - Run registry sync and interval timers from handle_continue(:bootstrap) so init returns quickly without blocking on chain I/O. - Wrap sync_registry_nodes DELETE/INSERT/ensure_self/refresh writes in a single SQLite transaction for atomicity and fewer implicit commits. Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
|
Agent note (screenshots)
Regenerate the HTML locally (then screenshot in any browser): mix run -e 'html = EEx.eval_file("lib/network/api_docs.html.eex", docs: Network.RpcDocs.all()); File.write!("api-docs.html", html)'
xdg-open api-docs.html # or open in Chrome; jump to #dio_network / #dio_ticket / #dio_message |
Fixes CI `mix format --check-formatted` failure on blank line placement after the rescue clause in reload_ring/0. Co-authored-by: Dominic Letz <dominicletz@users.noreply.github.com>
Demo screenshots (automation refresh)The PR description still contains no embedded demo images (no What was captured (again): this branch does not ship a standalone product GUI. The closest built-in “UI” is the HTML JSON-RPC reference served by
Paths on the Cursor agent (download from the session / attach via GitHub UI — not committed to
Why they are not inlined here: GitHub only serves PR markdown images from uploads such as Regenerate locally: mix run -e 'html = EEx.eval_file("lib/network/api_docs.html.eex", docs: Network.RpcDocs.all()); File.write!("api-docs.html", html)'Open |


v2.3.0
Relay node release: Kademlia/replication, Edge v2 + WebSocket tickets,
dio_network/dio_ticketRPC, logging and snap fixes, plus CI andmix linton recent commits.Demo screenshots
There are still no embedded images in this description (GitHub markdown needs
https://github.com/user-attachments/...URLs from an upload). Screenshots were regenerated on the Cursor agent from theGET /apiHTML reference (Network.RpcHttp/Network.RpcDocs), highlightingdio_network,dio_ticket, anddio_message— the closest shipped “UI” for this workstream.Next step (human, in browser): open the latest automation comment linked below, download the two PNGs from the agent artifact paths listed there, then Edit this PR description and drag-drop the files into the editor so GitHub uploads them (still no git commits required).
Automation comment: #12 (comment)
Verification
GitHub Actions Build and lint (
mix lint) should pass on the latestletz/v2.3.0commits (CI installs autotools forlibsecp256k1; Dialyzer/Credo fixes are on the branch).