The Migration Engine performs one-way migrations from a source node tree to a destination node tree. It is not a two-way sync engine; implementing bidirectional sync would require a completely different algorithm.
Traversal is based on the Breadth-First Search (BFS) algorithm, chosen for its predictability and ability to coordinate two trees in lockstep. Each traversal operation is isolated into discrete, non-recursive tasks so that workloads can be parallelized and queued efficiently.
-
List Children From the current node, list immediate children only (no recursion). Identify each child's type—whether it can contain further nodes (recursive nodes) or represents a terminal node.
-
Apply Filters Run each child through the configured filter rules using its ID, path, and context. Nodes that fail a rule are recorded along with the failure reason but are not scheduled for further traversal. This keeps each task stateless and lightweight.
-
Record Results Children that pass filtering are batched through per-queue buffers and flushed to BoltDB before the next leasing pass. This guarantees deterministic visibility without keeping additional in-memory round queues.
Depth-First Search (DFS) is memory-efficient, but it's less suited to managing two trees in parallel. BFS, while it requires storing all nodes at the current level, provides better control, checkpointing, and fault recovery. The Migration Engine (ME) serializes traversal data to BoltDB after each round, keeping memory use bounded while preserving full traversal context.
- The source and destination trees are traversed simultaneously.
- Each source node's children are compared directly with the destination's corresponding node.
- Throughput is limited by the slower of the two systems (e.g., 1000 nodes/s source vs 10 nodes/s destination).
- Harder to resume after interruptions because state exists only in memory.
- Example of this pattern: Rclone.
- Efficient but fragile for long migrations.
- Source and destination are traversed in rounds.
- Round 0: traverse the source root and list its children.
- Round 1: traverse those children; destination traversal remains coordinated behind the source.
- The destination queue is coordinated to stay at least 3 rounds behind the source via the
QueueCoordinator. - When the destination processes its corresponding level, it compares existing nodes against the expected list from the source.
- Extra items in the destination are logged but not traversed further.
- The destination can run as fast as possible while staying coordinated with the source.
- Because each round is batched and stored in BoltDB, the system can resume exactly where it left off after a crash.
- This maximizes both safety and throughput.
The Node Migration Engine operates in two passes:
- Discovery Phase – Traverses both node trees to identify what exists, what's missing, and what conflicts may occur.
- Execution Phase – After user review and approval, performs the actual creation or transfer of missing nodes.
This design gives users complete visibility and control before any data movement occurs.
The Migration Engine includes a comprehensive YAML-based configuration system that automatically saves migration state at critical milestones, enabling pause and resume functionality.
Migration state is automatically persisted to a YAML config file (default: {database_path}.yaml) at key points:
- Root Selection - When source and destination roots are set
- Roots Seeded - After root tasks are seeded into BoltDB
- Traversal Started - When queues are initialized and ready
- Round Advancement - When source or destination rounds advance during traversal
- Traversal Complete - When migration finishes
You can save and load migration sessions from YAML files:
Save (Serialization):
yamlCfg, err := migration.NewMigrationConfigYAML(cfg, status)
err = migration.SaveMigrationConfig("migration.yaml", yamlCfg)Load (Deserialization):
// Load YAML config for inspection
yamlCfg, err := migration.LoadMigrationConfig("migration.yaml")
// Or reconstruct full config for resuming
cfg, err := migration.LoadMigrationConfigFromYAML("migration.yaml", adapterFactory)
result, err := migration.LetsMigrate(cfg) // Resume migrationThe YAML config stores:
- Migration metadata (ID, timestamps)
- Current state (status, last rounds/levels)
- Service configurations (source, destination, embedded service configs)
- Migration options (workers, retries, etc.)
- Logging and database settings
This enables:
- Pause and Resume - Stop a migration and resume later from the exact checkpoint
- State Inspection - Review migration progress without running
- Configuration Portability - Move migration configs between environments
- Audit Trail - Track when migrations were created, modified, and completed
See pkg/migration/README.md for detailed documentation on the configuration system.
The Migration Engine uses BoltDB as its operational database for storing traversal state, node metadata, and logs.
BoltDB was chosen for its simplicity, reliability, and perfect fit with the BFS traversal model:
- Single-Writer Model - Guarantees consistent reads immediately after writes; eliminates race conditions
- Bucket Hierarchies - Natural support for organizing data by structure (SRC/DST, levels, status)
- Atomic Consistency - Single B-tree ensures all-or-nothing transaction semantics
- Simple Transactions - Clean, predictable transaction model
- Embedded - No external dependencies or services required
- Crash Resilience - ACID transactions with automatic recovery
/SRC
/nodes → pathHash: NodeState JSON (canonical data)
/children → parentHash: []childHash JSON (tree relationships)
/levels
/00000000
/pending → pathHash: empty (membership set)
/successful → pathHash: empty
/failed → pathHash: empty
/00000001/...
/DST
/nodes
/children
/levels
/00000000
/pending
/successful
/failed
/not_on_src → pathHash: empty (DST-specific status)
/00000001/...
/LOGS
/trace → uuid: LogEntry JSON
/debug → uuid: LogEntry JSON
/info → uuid: LogEntry JSON
/warning → uuid: LogEntry JSON
/error → uuid: LogEntry JSON
/critical → uuid: LogEntry JSON
-
Separation of Concerns
- Node data lives in
/nodes(single source of truth) - Status membership tracked in
/levels/{level}/{status}buckets - Tree relationships in
/childrenbuckets
- Node data lives in
-
Status Transitions are Simple
bucket.Delete(pathHash) // Remove from /pending bucket.Put(pathHash, []) // Add to /successful
-
No Key Encoding
- Structure is explicit in buckets, not encoded in key names
- Path hashes are simple lookup keys
- Hierarchy is natural and navigable
See pkg/db/README.md for detailed documentation on the database layer.