feat: step1 protocol hardening improvements #58
Zahnentferner merged 4 commits into StabilityNexus:main from
Conversation
Walkthrough

The PR refactors transaction mempool storage to use per-sender nonce-based queues for deterministic ordering, introduces canonical JSON serialization for stable hashing across blocks and transactions, enhances the P2P network with message validation and deduplication logic, and adds miner parameter support to block broadcasts.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main.py (1)
99-123: ⚠️ Potential issue | 🔴 Critical

Do not derive the mining reward recipient from unauthenticated wire metadata.

Lines 118-119 credit payload["miner"], but miner is not part of Block.to_header_dict()/Block.compute_hash() in minichain/block.py. Any peer can rewrite that field on the wire and redirect the reward for an otherwise valid block; because dedup keys on the block hash, the first spoofed copy that arrives wins. Put the reward recipient inside the hash-critical block contents or represent it as a coinbase transaction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@main.py` around lines 99 - 123, The code currently credits mining rewards from unauthenticated wire metadata (payload.get("miner", ...)) after chain.add_block; to fix, stop reading miner from payload and instead derive the recipient from data that is included in the block hash (either add a miner field to the Block model and include it in Block.to_header_dict()/Block.compute_hash, or require a coinbase Transaction as the first transaction and extract the recipient from block.transactions[0]); update the Block construction here (use the new miner field or validate/extract coinbase) and then call chain.state.credit_mining_reward with that authenticated recipient only after chain.add_block succeeds, removing any use of payload["miner"] in this handler.
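A minimal sketch of the first suggested remedy — making the miner address part of the hash-critical header — could look like the following. The fields are simplified and hypothetical; only the method names `to_header_dict()`/`compute_hash()` follow the review's references to `minichain/block.py`.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    """Sketch only: the miner address lives in the header, so it is
    covered by the block hash and cannot be rewritten on the wire."""

    index: int
    prev_hash: str
    miner: str  # reward recipient, now hash-critical
    nonce: int = 0

    def to_header_dict(self) -> dict:
        return {
            "index": self.index,
            "prev_hash": self.prev_hash,
            "miner": self.miner,
            "nonce": self.nonce,
        }

    def compute_hash(self) -> str:
        # Canonical JSON keeps the hash stable across serializations.
        blob = json.dumps(
            self.to_header_dict(), sort_keys=True, separators=(",", ":")
        ).encode()
        return hashlib.sha256(blob).hexdigest()
```

With this layout, tampering with the miner field invalidates the block hash, so the spoofed copy no longer matches the proof-of-work and is rejected instead of winning the dedup race.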
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/contract.py`:
- Around line 21-22: The except block that currently swallows (OSError,
ValueError) and calls logger.warning("Failed to set resource limits: %s", e)
must instead fail closed: change this handler so it returns/raises an error back
to the caller rather than continuing; specifically, in the function that sets OS
resource limits (the try/except surrounding "Failed to set resource limits" and
logger.warning) re-raise the exception or return a failure result so
ContractMachine.execute() observes the failure and returns false/error to the
parent (do not leave the logger.warning-only behavior). Ensure callers of that
function (notably ContractMachine.execute()) propagate the failure so execution
aborts instead of running unsandboxed.
In `@minichain/mempool.py`:
- Around line 90-98: The current _remove_transactions_unlocked method removes tx
IDs from self._seen_tx_ids which lets confirmed transactions be re-admitted;
change this so _remove_transactions_unlocked does not discard tx IDs. Instead
add/maintain a separate recent_confirmed_tx_ids cache (e.g.,
self._recent_confirmed_tx_ids) and, when clearing pending entries in
_remove_transactions_unlocked, insert the tx_id into that cache (or simply omit
modifying _seen_tx_ids here). Alternatively, only call
self._seen_tx_ids.discard(tx_id) from an explicit rollback/requeue method
(create a clear_tx_id_or_requeue method) and leave _remove_transactions_unlocked
unchanged with respect to seen IDs. Ensure references: update
_remove_transactions_unlocked, _seen_tx_ids usage, and any code paths that
currently rely on discarding IDs (confirmations path in main.py) to use the new
recent_confirmed_tx_ids or the explicit rollback method.
- Around line 76-83: The global sort on selected (key=lambda tx: (tx.timestamp,
tx.sender, tx.nonce)) can reorder contiguous per-sender sequences; instead,
build per-sender ready lists by walking self._pending_by_sender using
_expected_nonce_for_sender (as already done), then perform a k-way merge of
those per-sender lists so each sender's internal nonce order is preserved while
interleaving senders by timestamp; implement the merge (e.g., using heapq with
entries containing (next_tx.timestamp, sender_id, index) so you always pop the
earliest timestamp but advance that sender's sequence) and then call
_remove_transactions_unlocked on the merged list.
In `@minichain/p2p.py`:
- Around line 262-275: The code currently calls _mark_seen(msg_type, payload)
before awaiting self._handler_callback, causing failed handlers to permanently
mark messages as seen; change this flow to use a separate in-flight marker (e.g.
a new set _inflight_messages or helper methods like _mark_inflight and
_clear_inflight) so you: 1) check duplicates using both _is_duplicate and the
in-flight set, 2) add the message to the in-flight set before awaiting
self._handler_callback, 3) on successful await call _mark_seen(msg_type,
payload) and remove the in-flight marker, and 4) on exception remove the
in-flight marker (do not call _mark_seen) and rethrow/log as currently done;
update any duplicate-check logic to consult the in-flight marker so concurrent
deliveries are suppressed while processing.
- Around line 37-38: Replace the unbounded sets _seen_tx_ids and
_seen_block_hashes with bounded, evicting caches (e.g., cachetools.TTLCache or
LRUCache with a reasonable maxsize/ttl) to prevent unbounded memory growth; also
stop keying block deduplication off the peer-supplied "hash" field in the block
handler and instead recompute the block's canonical hash from the validated
block contents (use your Block.compute_hash()/recompute_hash(...) function or
validate_block(...) return) and insert that recomputed hash into the bounded
cache; ensure transaction dedupe (_seen_tx_ids) also uses a bounded cache and
that all insert/check operations reference these new caches instead of the old
sets.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a911ffcb-d8ce-47f2-9984-65646d4c341c
📒 Files selected for processing (9)
main.py, minichain/block.py, minichain/contract.py, minichain/mempool.py, minichain/p2p.py, minichain/pow.py, minichain/serialization.py, minichain/transaction.py, tests/test_protocol_hardening.py
```diff
 except (OSError, ValueError) as e:
-    logger.error(f"Failed to set resource limits: {e}")
-    raise RuntimeError(f"Failed to set resource limits: {e}")
+    logger.warning("Failed to set resource limits: %s", e)
```
Fail closed if OS resource limits cannot be applied.
Line 22 now turns sandbox setup failure into a warning and still executes untrusted contract code. Because minichain/state.py:112-126 only consumes the boolean result from ContractMachine.execute(), this path silently runs contracts without the intended OS-level limits, and tests/test_contract.py:67-88 does not cover that downgrade. Return an error to the parent and abort execution instead of continuing unsandboxed.
Proposed fix

```diff
 except ImportError:
     logger.warning("Resource module not available. Contract will run without OS-level resource limits.")
 except (OSError, ValueError) as e:
-    logger.warning("Failed to set resource limits: %s", e)
+    result_queue.put({
+        "status": "error",
+        "error": f"failed to set resource limits: {e}",
+    })
+    return
 exec(code, globals_dict, context_dict)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
except ImportError:
    logger.warning("Resource module not available. Contract will run without OS-level resource limits.")
except (OSError, ValueError) as e:
    result_queue.put({
        "status": "error",
        "error": f"failed to set resource limits: {e}",
    })
    return
exec(code, globals_dict, context_dict)
```
```python
for sender, sender_queue in self._pending_by_sender.items():
    expected_nonce = self._expected_nonce_for_sender(sender, state)
    while expected_nonce in sender_queue:
        selected.append(sender_queue[expected_nonce])
        expected_nonce += 1

selected.sort(key=lambda tx: (tx.timestamp, tx.sender, tx.nonce))
self._remove_transactions_unlocked(selected)
```
The final global sort still allows nonce reordering for a single sender.
Line 82 sorts by (tx.timestamp, tx.sender, tx.nonce). If one sender has nonce 1 with an earlier timestamp than nonce 0, the selected block order becomes [1, 0] even though the queue was assembled contiguously. Preserve each sender's ready sequence when merging instead of globally resorting raw transactions.
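The k-way merge suggested above could be sketched as follows, assuming each per-sender ready list is already in consecutive-nonce order as assembled in the snippet. The helper name and the transaction fields (`sender`, `nonce`, `timestamp`) are hypothetical.

```python
import heapq


def merge_ready_lists(ready_by_sender):
    """Interleave per-sender ready lists by timestamp while preserving
    each sender's internal nonce order.

    ready_by_sender: dict mapping sender -> list of transactions
    already ordered by consecutive nonce.
    """
    heap = []
    for sender, txs in ready_by_sender.items():
        if txs:
            # (timestamp of the sender's next tx, sender as tie-breaker, index)
            heapq.heappush(heap, (txs[0].timestamp, sender, 0))
    merged = []
    while heap:
        _, sender, idx = heapq.heappop(heap)
        txs = ready_by_sender[sender]
        merged.append(txs[idx])
        nxt = idx + 1
        if nxt < len(txs):
            # Only the sender's *next* tx becomes schedulable, so a later
            # nonce can never overtake an earlier one for the same sender.
            heapq.heappush(heap, (txs[nxt].timestamp, sender, nxt))
    return merged
```

Unlike the global sort, a sender's nonce-1 transaction with an earlier timestamp than its nonce-0 transaction still comes out after nonce 0, because it only enters the heap once nonce 0 has been emitted.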
```python
def _remove_transactions_unlocked(self, transactions):
    for tx in transactions:
        tx_id = self._get_tx_id(tx)
        sender_queue = self._pending_by_sender.get(tx.sender)
        if sender_queue and tx.nonce in sender_queue:
            del sender_queue[tx.nonce]
            if not sender_queue:
                del self._pending_by_sender[tx.sender]
        self._seen_tx_ids.discard(tx_id)
```
Removing a transaction should not automatically forget its ID.
Line 98 discards the tx_id, and main.py Lines 121-122 now call this path for confirmed block transactions. That makes a confirmed transaction admissible again through any ingress path that bypasses the network-layer seen set, which can reintroduce stale low-nonce junk or duplicates. Keep a separate recent-confirmed cache, or only clear IDs on an explicit rollback/requeue path.
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 90-90: Missing return type annotation for private function _remove_transactions_unlocked
Add return type annotation: None
(ANN202)
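The separate recent-confirmed cache suggested above might look like this sketch; the class and attribute names are hypothetical, and the size bound is an assumption.

```python
from collections import OrderedDict


class RecentConfirmedCache:
    """Bounded FIFO cache of recently confirmed tx IDs, so confirmed
    transactions stay inadmissible without growing memory forever."""

    def __init__(self, maxsize: int = 10_000):
        self._maxsize = maxsize
        self._ids = OrderedDict()

    def add(self, tx_id: str) -> None:
        self._ids[tx_id] = None
        self._ids.move_to_end(tx_id)
        while len(self._ids) > self._maxsize:
            # Evict the oldest entry to keep the cache bounded.
            self._ids.popitem(last=False)

    def __contains__(self, tx_id: str) -> bool:
        return tx_id in self._ids
```

`_remove_transactions_unlocked` would then `add()` each confirmed ID here instead of discarding it from `_seen_tx_ids`, while an explicit rollback/requeue path remains the only place that clears IDs.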
```python
self._seen_tx_ids = set()
self._seen_block_hashes = set()
```
Do not keep grow-forever dedup sets keyed by unvalidated block hashes.
self._seen_tx_ids and self._seen_block_hashes never evict entries. On the block path, Line 218 keys the cache off the peer-supplied hash field before semantic validation, so one attacker can stream arbitrary unique hashes and retain them forever while still driving the block handler path. Use a bounded TTL/LRU cache and derive the block key from recomputed block contents instead of trusting the wire hash.
Also applies to: 214-228
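Both suggestions — deriving the dedup key from validated contents rather than the wire `hash` field, and bounding the cache — can be sketched as follows. The function and class names are hypothetical; real code would call `Block.compute_hash()` rather than this stand-in.

```python
import hashlib
import json
from collections import OrderedDict


def canonical_block_hash(header: dict) -> str:
    """Recompute the dedup key from validated block contents instead of
    trusting the peer-supplied "hash" field."""
    blob = json.dumps(header, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


class LRUSeenCache:
    """Bounded LRU replacement for the grow-forever seen sets."""

    def __init__(self, maxsize: int = 4096):
        self._maxsize = maxsize
        self._entries = OrderedDict()

    def check_and_add(self, key: str) -> bool:
        """Return True if key was already seen; record it either way."""
        seen = key in self._entries
        self._entries[key] = None
        self._entries.move_to_end(key)
        if len(self._entries) > self._maxsize:
            # Evict the least recently used key to bound memory.
            self._entries.popitem(last=False)
        return seen
```

An attacker streaming arbitrary unique hashes can now only churn the LRU window, and because the key is recomputed from validated contents, a spoofed `hash` field no longer poisons the cache.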
```python
msg_type = data["type"]
payload = data["data"]
if self._is_duplicate(msg_type, payload):
    logger.info("Network: Duplicate %s ignored from %s", msg_type, addr)
    continue
self._mark_seen(msg_type, payload)

if self._handler_callback:
    try:
        await self._handler_callback(data)
    except Exception:
        logger.exception(
            "Network: Handler error for message from %s", addr
        )
```
Only mark a message as seen after successful handling.
Line 267 promotes the message to the dedup cache before Line 271 awaits the handler. If the handler raises, that tx/block becomes a permanent duplicate and later retries from healthy peers are ignored until restart. Keep a separate in-flight marker and only commit to _seen_* after the callback succeeds.
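The in-flight marker pattern described above could be sketched like this (all names hypothetical): a message is committed to the seen-set only after its handler succeeds, so a failed delivery can still be retried from a healthy peer, while concurrent deliveries of the same message are suppressed during processing.

```python
import asyncio


class DedupDispatcher:
    """Sketch: dedup that only commits after successful handling."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()
        self._inflight = set()

    async def dispatch(self, key, message) -> bool:
        # Consult both the committed seen-set and the in-flight marker.
        if key in self._seen or key in self._inflight:
            return False  # duplicate, or currently being processed
        self._inflight.add(key)
        try:
            await self._handler(message)
        except Exception:
            # Do not mark seen: allow later redelivery to retry.
            self._inflight.discard(key)
            raise
        self._inflight.discard(key)
        self._seen.add(key)
        return True
```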
Addressed Issues:
Summary
- seen_tx_ids/seen_block_hashes dedup caches and strict inbound schema validation before processing peer messages.

Compatibility Fixes From Current Main State
Validation
./.venv/bin/python -m pytest -q

AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: TODO