Skip to content

Conversation

@ordishs
Copy link
Collaborator

@ordishs ordishs commented Nov 14, 2025

Summary

Optimizes multi-record transaction handling in the UTXO store with expression-based filtering, children-before-master ordering, and automatic recovery for incomplete transactions.

Changes

1. Retry Phase 2 Cleanup on Recovery Attempts

  • Modified clearCreatingFlag() to attempt cleanup even when all records already exist
  • Enables recovery from partial Phase 2 (flag clearing) failures
  • Prevents creating flags from remaining set indefinitely

2. Expression-Based Filtering with Children-Before-Master Ordering

  • Performance: Eliminated read phase - single Aerospike roundtrip instead of two (50% reduction)
  • Server-side filtering: Uses Aerospike expression ExpBinExists to only update records with the creating bin
  • Atomic completion indicator: Children records cleared before master record
    • Master's creating flag absence = all children complete
    • Enables O(1) existence checks for multi-record transactions

3. Auto-Recovery for Incomplete Transactions

  • Added Creating field to meta.Data struct
  • Subtree validation now checks master record's creating flag
  • Transactions with creating=true treated as "missing" → triggers automatic re-processing
  • Self-healing: partial failures automatically recovered on next transaction arrival

Technical Details

Two-Phase Commit Protocol:

  1. Phase 1: Create all records with creating=true → notify block assembly once
  2. Phase 2: Clear creating flags (children first, then master)

Recovery Scenarios:

  • First attempt succeeds completely → Master cleared last → Complete
  • First attempt fails partway → Partial records remain → Next attempt completes Phase 1 → Clears all flags
  • All records exist with creating flags → Auto-recovery triggers re-processing → Cleanup succeeds

Files Changed:

  • stores/utxo/aerospike/create.go - Expression filtering, children-before-master ordering
  • stores/utxo/aerospike/get.go - BatchDecorate handler for Creating field
  • stores/utxo/meta/data.go - Added Creating bool field
  • stores/utxo/Interface.go - Added fields.Creating to MetaFields
  • services/subtreevalidation/processTxMetaUsingStore.go - Auto-recovery logic

Test Plan

  • ✅ All existing tests passing (meta, aerospike, subtreevalidation)
  • ✅ Linting clean
  • ✅ Multi-record transaction creation and recovery tested

@github-actions
Copy link
Contributor

github-actions bot commented Nov 14, 2025

🤖 Claude Code Review

Status: Complete

This PR implements a robust two-phase commit protocol for multi-record UTXO transactions with auto-recovery. The implementation is well-designed with comprehensive error handling and documentation.

Key Design Strengths:

  • Expression-based filtering eliminates unnecessary read operations (50% performance improvement)
  • Children-before-master clearing makes the master record an atomic completion indicator
  • Multiple self-healing paths ensure eventual consistency without manual intervention
  • Lua script integration properly clears creating flags during mining

No critical issues found. The code follows project conventions, includes extensive inline documentation explaining the recovery philosophy, and handles edge cases appropriately (e.g., FILTERED_OUT as success, partial failures, retry scenarios).

The auto-recovery mechanism is sound: incomplete transactions are detected via the Creating flag and automatically re-processed through subtree validation, ensuring system reliability.

@ordishs
Copy link
Collaborator Author

ordishs commented Nov 14, 2025

Documentation Added to Address Review Feedback

Thank you for the thorough review! I've added comprehensive documentation to explain why returning success despite incomplete Phase 2 is the correct design choice.

Key Documentation Additions

1. Function-Level Documentation (storeExternallyWithLock)

Added detailed explanation of the two-phase commit protocol and error handling philosophy (lines 793-824):

  • Why Phase 1 success = overall success
  • How the system self-heals through multiple recovery paths
  • Why returning error would break atomicity and create worse problems

2. Partial Success Case Documentation (lines 900-934)

Comprehensive explanation covering:

  • Transaction IS persisted: All records exist with complete data
  • Spending is safely protected: Per-record Lua checks prevent premature spending
  • Auto-recovery is self-healing: System recovers via subtreevalidation, setMined, and retry paths
  • Returning error creates worse problems: Would break atomicity and confuse callers
  • Block assembly notification is correct: About existence, not spendability

3. Recovery Path Documentation (lines 888-905)

Explains the "All exist" recovery path that enables self-healing:

  • How retry attempts complete Phase 2 cleanup
  • Why this is critical for eventual consistency
  • How it prevents requiring manual intervention

Why This Design Is Correct

The implementation is architecturally sound because:

  1. Transaction Existence ≠ UTXO Spendability: These are orthogonal concerns

    • Block assembly needs to know about transaction existence (for fees, templates)
    • Spend operations enforce spendability via per-UTXO creating flag checks
  2. Defense in Depth: Multiple layers of protection

    • Creating flags prevent premature spending (Lua script checks)
    • Auto-recovery triggers on re-encounter (subtreevalidation)
    • Retry paths complete cleanup ("All exist" path)
    • Mining clears flags (setMined)
  3. Atomicity Preservation: Phase 1 success means the transaction is persisted

    • Returning error would falsely indicate failure
    • Would cause retry loops and KEY_EXISTS_ERROR confusion
    • Would prevent block assembly notification
  4. Self-Healing Architecture: No manual intervention required

    • System automatically recovers through normal operation
    • Eventual consistency guaranteed
    • Graceful degradation under failure

Alternative Approaches Considered

Retry with exponential backoff: Would block goroutines, potentially for long periods, and could still fail. Current approach is non-blocking and self-healing.

Return error on Phase 2 failure: Would break atomicity (Phase 1 complete but system reports failure) and require callers to implement complex retry logic.

The current design provides the best balance of correctness, performance, and operational simplicity.

@ordishs ordishs requested a review from icellan November 14, 2025 14:12
@ordishs ordishs merged commit 8388637 into bsv-blockchain:main Nov 16, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants