
feat: step1 protocol hardening improvements #58

Merged
Zahnentferner merged 4 commits into StabilityNexus:main from arunabha003:step1-protocol-hardening
Mar 13, 2026

Conversation


@arunabha003 commented Mar 11, 2026

Addressed Issues:

Summary

  • Network reliability bundle: add seen_tx_ids / seen_block_hashes dedup caches and strict inbound schema validation before processing peer messages.
  • Mempool correctness bundle: implement sender-aware nonce queues so block selection never violates per-sender nonce ordering.
  • Deterministic consensus bundle: move signing and hash-critical payloads to one canonical serializer so all nodes compute identical bytes for tx signatures, tx ids, and block/header hashes.
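The serializer itself is not shown in this description; as a rough sketch of what canonical_json_dumps / canonical_json_bytes / canonical_json_hash utilities of this kind usually look like (the function bodies here are assumptions, not the actual minichain code):

```python
import hashlib
import json


def canonical_json_dumps(obj) -> str:
    # Sorted keys + compact separators + ASCII escaping give every node
    # byte-identical output for the same logical value.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def canonical_json_bytes(obj) -> bytes:
    return canonical_json_dumps(obj).encode("utf-8")


def canonical_json_hash(obj) -> str:
    # Hex digest of the canonical byte encoding.
    return hashlib.sha256(canonical_json_bytes(obj)).hexdigest()
```

The key point is that dict insertion order and whitespace no longer influence the bytes being signed or hashed.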

Compatibility Fixes From Current Main State

  • Align block validation with the current wire format
  • Sync state on outbound peer connect as well
  • Include miner identity in block broadcast

Validation

  • ./.venv/bin/python -m pytest -q

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • [x] This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • [x] My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • [x] My code follows the project's code style and conventions
  • [x] If applicable, I have made corresponding changes or additions to the documentation
  • [x] If applicable, I have made corresponding changes or additions to tests
  • [x] My changes generate no new warnings or errors
  • [x] I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • [x] I have read the Contribution Guidelines
  • [x] Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • [x] I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced transaction ordering in mempool with improved nonce-based queue management for more reliable transaction processing.
    • Improved P2P network message validation and duplicate detection for enhanced protocol robustness.
  • Refactor

    • Optimized deterministic JSON serialization for consistent and reliable hashing across the network.
    • Improved resource limit handling with non-fatal logging.
  • Tests

    • Added comprehensive protocol hardening and consensus validation tests.


coderabbitai bot commented Mar 11, 2026

Walkthrough

The PR refactors transaction mempool storage to use per-sender nonce-based queues for deterministic ordering, introduces canonical JSON serialization for stable hashing across blocks and transactions, enhances P2P network with message validation and deduplication logic, and adds miner parameter support to block broadcasts.

Changes

Cohort / File(s) / Summary

  • Canonical JSON Serialization (minichain/serialization.py, minichain/transaction.py, minichain/block.py, minichain/pow.py):
    Introduces canonical_json_dumps/bytes/hash utilities for deterministic JSON serialization; replaces ad-hoc JSON hashing with the canonical approach across transaction tx_id computation, block header hashing, and PoW mining iterations.
  • Mempool Refactoring (minichain/mempool.py, main.py):
    Restructures internal storage from a flat list to a per-sender dictionary; implements nonce-based deduplication and contiguous nonce selection; adds get_transactions_for_block(state), remove_transactions(transactions), and __len__ public methods; changes transaction removal from drain-all to selective removal.
  • P2P Network Enhancements (minichain/p2p.py, main.py):
    Adds message validation helpers, duplicate tracking via _seen_tx_ids/_seen_block_hashes, inbound message deduplication, and a SUPPORTED_MESSAGE_TYPES constant; extends the broadcast_block signature to accept an optional miner parameter; implements an outbound peer synchronization callback.
  • Contract Error Handling (minichain/contract.py):
    Converts the RuntimeError raised on resource-limit failure to a warning log, allowing non-exceptional completion of the setup flow.
  • Protocol Hardening Tests (tests/test_protocol_hardening.py):
    Adds a comprehensive test suite covering deterministic consensus (canonical JSON and block hashing), mempool nonce-queue ordering with gap handling, and P2P message validation including schema enforcement and duplicate detection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Python Lang

Suggested reviewers

  • Zahnentferner

Poem

🐰 Hoppy hashing now flows through the chain,
Canonical JSON keeps the ledger sane,
Per-sender nonces queue up just right,
P2P guards messages day and night—
Test suite blooms with hardened might! ✨

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: step1 protocol hardening improvements' is specific and accurately summarizes the main focus of the PR—protocol hardening enhancements including dedup caches, schema validation, nonce queues, and deterministic serialization.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main.py (1)

99-123: ⚠️ Potential issue | 🔴 Critical

Do not derive the mining reward recipient from unauthenticated wire metadata.

Lines 118-119 credit payload["miner"], but miner is not part of Block.to_header_dict() / Block.compute_hash() in minichain/block.py. Any peer can rewrite that field on the wire and redirect the reward for an otherwise valid block; because dedup keys on block hash, the first spoofed copy that arrives wins. Put the reward recipient inside the hash-critical block contents or represent it as a coinbase transaction.
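One way to act on this finding is to move the miner field into the hash-critical header, sketched below with a minimal hypothetical Block (field names are assumptions; the real minichain/block.py may differ):

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical, minimal header fields for illustration only.
    index: int
    previous_hash: str
    timestamp: float
    miner: str  # inside the header, so compute_hash() covers it
    nonce: int = 0

    def to_header_dict(self) -> dict:
        return {
            "index": self.index,
            "previous_hash": self.previous_hash,
            "timestamp": self.timestamp,
            "miner": self.miner,
            "nonce": self.nonce,
        }

    def compute_hash(self) -> str:
        # Canonical encoding so every node hashes identical bytes.
        payload = json.dumps(self.to_header_dict(), sort_keys=True,
                             separators=(",", ":"))
        return hashlib.sha256(payload.encode()).hexdigest()
```

With the field hashed, a peer that rewrites miner on the wire changes the block hash, so the spoofed copy fails validation instead of winning the dedup race.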

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 99 - 123, The code currently credits mining rewards
from unauthenticated wire metadata (payload.get("miner", ...)) after
chain.add_block; to fix, stop reading miner from payload and instead derive the
recipient from data that is included in the block hash (either add a miner field
to the Block model and include it in Block.to_header_dict()/Block.compute_hash,
or require a coinbase Transaction as the first transaction and extract the
recipient from block.transactions[0]); update the Block construction here (use
the new miner field or validate/extract coinbase) and then call
chain.state.credit_mining_reward with that authenticated recipient only after
chain.add_block succeeds, removing any use of payload["miner"] in this handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@minichain/contract.py`:
- Around line 21-22: The except block that currently swallows (OSError,
ValueError) and calls logger.warning("Failed to set resource limits: %s", e)
must instead fail closed: change this handler so it returns/raises an error back
to the caller rather than continuing; specifically, in the function that sets OS
resource limits (the try/except surrounding "Failed to set resource limits" and
logger.warning) re-raise the exception or return a failure result so
ContractMachine.execute() observes the failure and returns false/error to the
parent (do not leave the logger.warning-only behavior). Ensure callers of that
function (notably ContractMachine.execute()) propagate the failure so execution
aborts instead of running unsandboxed.

In `@minichain/mempool.py`:
- Around line 90-98: The current _remove_transactions_unlocked method removes tx
IDs from self._seen_tx_ids which lets confirmed transactions be re-admitted;
change this so _remove_transactions_unlocked does not discard tx IDs. Instead
add/maintain a separate recent_confirmed_tx_ids cache (e.g.,
self._recent_confirmed_tx_ids) and, when clearing pending entries in
_remove_transactions_unlocked, insert the tx_id into that cache (or simply omit
modifying _seen_tx_ids here). Alternatively, only call
self._seen_tx_ids.discard(tx_id) from an explicit rollback/requeue method
(create a clear_tx_id_or_requeue method) and leave _remove_transactions_unlocked
unchanged with respect to seen IDs. Ensure references: update
_remove_transactions_unlocked, _seen_tx_ids usage, and any code paths that
currently rely on discarding IDs (confirmations path in main.py) to use the new
recent_confirmed_tx_ids or the explicit rollback method.
- Around line 76-83: The global sort on selected (key=lambda tx: (tx.timestamp,
tx.sender, tx.nonce)) can reorder contiguous per-sender sequences; instead,
build per-sender ready lists by walking self._pending_by_sender using
_expected_nonce_for_sender (as already done), then perform a k-way merge of
those per-sender lists so each sender's internal nonce order is preserved while
interleaving senders by timestamp; implement the merge (e.g., using heapq with
entries containing (next_tx.timestamp, sender_id, index) so you always pop the
earliest timestamp but advance that sender's sequence) and then call
_remove_transactions_unlocked on the merged list.

In `@minichain/p2p.py`:
- Around line 262-275: The code currently calls _mark_seen(msg_type, payload)
before awaiting self._handler_callback, causing failed handlers to permanently
mark messages as seen; change this flow to use a separate in-flight marker (e.g.
a new set _inflight_messages or helper methods like _mark_inflight and
_clear_inflight) so you: 1) check duplicates using both _is_duplicate and the
in-flight set, 2) add the message to the in-flight set before awaiting
self._handler_callback, 3) on successful await call _mark_seen(msg_type,
payload) and remove the in-flight marker, and 4) on exception remove the
in-flight marker (do not call _mark_seen) and rethrow/log as currently done;
update any duplicate-check logic to consult the in-flight marker so concurrent
deliveries are suppressed while processing.
- Around line 37-38: Replace the unbounded sets _seen_tx_ids and
_seen_block_hashes with bounded, evicting caches (e.g., cachetools.TTLCache or
LRUCache with a reasonable maxsize/ttl) to prevent unbounded memory growth; also
stop keying block deduplication off the peer-supplied "hash" field in the block
handler and instead recompute the block's canonical hash from the validated
block contents (use your Block.compute_hash()/recompute_hash(...) function or
validate_block(...) return) and insert that recomputed hash into the bounded
cache; ensure transaction dedupe (_seen_tx_ids) also uses a bounded cache and
that all insert/check operations reference these new caches instead of the old
sets.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a911ffcb-d8ce-47f2-9984-65646d4c341c

📥 Commits

Reviewing files that changed from the base of the PR and between ff5e83a and 7c72503.

📒 Files selected for processing (9)
  • main.py
  • minichain/block.py
  • minichain/contract.py
  • minichain/mempool.py
  • minichain/p2p.py
  • minichain/pow.py
  • minichain/serialization.py
  • minichain/transaction.py
  • tests/test_protocol_hardening.py

Comment on lines 21 to +22

         except (OSError, ValueError) as e:
-            logger.error(f"Failed to set resource limits: {e}")
-            raise RuntimeError(f"Failed to set resource limits: {e}")
+            logger.warning("Failed to set resource limits: %s", e)

⚠️ Potential issue | 🟠 Major

Fail closed if OS resource limits cannot be applied.

Line 22 now turns sandbox setup failure into a warning and still executes untrusted contract code. Because minichain/state.py:112-126 only consumes the boolean result from ContractMachine.execute(), this path silently runs contracts without the intended OS-level limits, and tests/test_contract.py:67-88 does not cover that downgrade. Return an error to the parent and abort execution instead of continuing unsandboxed.

Proposed fix
         except ImportError:
             logger.warning("Resource module not available. Contract will run without OS-level resource limits.")
         except (OSError, ValueError) as e:
-            logger.warning("Failed to set resource limits: %s", e)
+            result_queue.put({
+                "status": "error",
+                "error": f"failed to set resource limits: {e}",
+            })
+            return
 
         exec(code, globals_dict, context_dict)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

         except ImportError:
             logger.warning("Resource module not available. Contract will run without OS-level resource limits.")
         except (OSError, ValueError) as e:
-            logger.warning("Failed to set resource limits: %s", e)
+            result_queue.put({
+                "status": "error",
+                "error": f"failed to set resource limits: {e}",
+            })
+            return

         exec(code, globals_dict, context_dict)

Comment on lines +76 to +83
        for sender, sender_queue in self._pending_by_sender.items():
            expected_nonce = self._expected_nonce_for_sender(sender, state)
            while expected_nonce in sender_queue:
                selected.append(sender_queue[expected_nonce])
                expected_nonce += 1

        selected.sort(key=lambda tx: (tx.timestamp, tx.sender, tx.nonce))
        self._remove_transactions_unlocked(selected)

⚠️ Potential issue | 🟠 Major

The final global sort still allows nonce reordering for a single sender.

Line 82 sorts by (tx.timestamp, tx.sender, tx.nonce). If one sender has nonce 1 with an earlier timestamp than nonce 0, the selected block order becomes [1, 0] even though the queue was assembled contiguously. Preserve each sender's ready sequence when merging instead of globally resorting raw transactions.


Comment on lines +90 to +98
    def _remove_transactions_unlocked(self, transactions):
        for tx in transactions:
            tx_id = self._get_tx_id(tx)
            sender_queue = self._pending_by_sender.get(tx.sender)
            if sender_queue and tx.nonce in sender_queue:
                del sender_queue[tx.nonce]
                if not sender_queue:
                    del self._pending_by_sender[tx.sender]
            self._seen_tx_ids.discard(tx_id)

⚠️ Potential issue | 🟠 Major

Removing a transaction should not automatically forget its ID.

Line 98 discards the tx_id, and main.py Lines 121-122 now call this path for confirmed block transactions. That makes a confirmed transaction admissible again through any ingress path that bypasses the network-layer seen set, which can reintroduce stale low-nonce junk or duplicates. Keep a separate recent-confirmed cache, or only clear IDs on an explicit rollback/requeue path.

🧰 Tools
🪛 Ruff (0.15.5)

[warning] 90-90: Missing return type annotation for private function _remove_transactions_unlocked

Add return type annotation: None

(ANN202)


Comment on lines +37 to +38
        self._seen_tx_ids = set()
        self._seen_block_hashes = set()

⚠️ Potential issue | 🟠 Major

Do not keep grow-forever dedup sets keyed by unvalidated block hashes.

self._seen_tx_ids and self._seen_block_hashes never evict entries. On the block path, Line 218 keys the cache off the peer-supplied hash field before semantic validation, so one attacker can stream arbitrary unique hashes and retain them forever while still driving the block handler path. Use a bounded TTL/LRU cache and derive the block key from recomputed block contents instead of trusting the wire hash.

Also applies to: 214-228
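A bounded LRU dedup set along the suggested lines can be sketched with the standard library alone (cachetools.TTLCache would also work; this OrderedDict version is an illustrative assumption, not the project's code):

```python
from collections import OrderedDict


class SeenCache:
    """Bounded LRU set for dedup keys; least-recently-used entries evict first."""

    def __init__(self, maxsize: int = 10_000):
        self._maxsize = maxsize
        self._items = OrderedDict()

    def add(self, key) -> None:
        self._items[key] = None
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict the oldest entry

    def __contains__(self, key) -> bool:
        if key in self._items:
            self._items.move_to_end(key)  # refresh recency on a hit
            return True
        return False
```

With a fixed maxsize, an attacker streaming unique hashes can only churn the cache, not grow node memory without bound.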


Comment on lines +262 to +275
            msg_type = data["type"]
            payload = data["data"]
            if self._is_duplicate(msg_type, payload):
                logger.info("Network: Duplicate %s ignored from %s", msg_type, addr)
                continue
            self._mark_seen(msg_type, payload)

            if self._handler_callback:
                try:
                    await self._handler_callback(data)
                except Exception:
                    logger.exception(
                        "Network: Handler error for message from %s", addr
                    )

⚠️ Potential issue | 🟠 Major

Only mark a message as seen after successful handling.

Line 267 promotes the message to the dedup cache before Line 271 awaits the handler. If the handler raises, that tx/block becomes a permanent duplicate and later retries from healthy peers are ignored until restart. Keep a separate in-flight marker and only commit to _seen_* after the callback succeeds.
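The in-flight pattern described above can be sketched as follows (class and method names are hypothetical, not the actual minichain/p2p.py API):

```python
import asyncio
import logging

logger = logging.getLogger("p2p")


class InboundDedup:
    """Sketch: commit to the seen-set only after the handler succeeds."""

    def __init__(self):
        self._seen = set()
        self._inflight = set()

    async def dispatch(self, key, handler, payload):
        if key in self._seen or key in self._inflight:
            return False  # duplicate, or already being processed concurrently
        self._inflight.add(key)
        try:
            await handler(payload)
        except Exception:
            # Failure leaves the message retryable by a later healthy peer.
            logger.exception("handler failed; message not marked seen")
            return False
        else:
            self._seen.add(key)  # only now does it become a permanent duplicate
            return True
        finally:
            self._inflight.discard(key)
```

The in-flight set suppresses concurrent deliveries of the same message without poisoning the permanent dedup cache on handler errors.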


@Zahnentferner merged commit e935c94 into StabilityNexus:main Mar 13, 2026
1 check passed
