
Moving Data Between Backends

Petrus Pradella edited this page Jun 18, 2026 · 2 revisions


What this page covers: copying entities from one Storage to another with StorageTransfer — the full builder surface, the three ErrorPolicy modes, what the TransferReport tells you (and why its future never fails for expected errors), the two-argument descriptor(...) for renaming a collection or changing codec mid-flight, and a maintenance-window cutover playbook.

📌 Note — the transfer only reads the source. It copies, never moves; deleting the source is a separate, explicit action you take afterwards. Re-running is safe (every backend upserts).


The 30-second version

import br.com.finalcraft.everydatabase.transfer.*;

TransferReport report = StorageTransfer.builder()
        .from(oldLocalFileStorage)               // read-only source
        .to(newSqlStorage)                       // write target
        .descriptor(PLAYERS)                     // one or more collections to copy
        .descriptor(ACCOUNTS)
        .build()
        .execute()                               // CompletableFuture<TransferReport>
        .join();

if (report.success()) {
    System.out.printf("Done: %d entities in %dms%n", report.totalEntities(), report.durationMs());
} else {
    report.errors().forEach(e -> System.err.printf("[%s] %s%n", e.collection(), e.cause().getMessage()));
}

That's it: source, target, the descriptors you want copied, execute(). Everything else has a sane default. execute() returns a CompletableFuture<TransferReport> like every I/O call in the library — see The Async API; we .join() here for brevity.

💡 Tip — the same EntityDescriptor you already use for CRUD is what you register here. The source repository and target repository are derived from it, so the entity type, key extractor and codec are guaranteed to match on both sides.


Install

StorageTransfer lives in everydatabase-core, so there is no extra dependency. Coordinates are on the Installation page.


The builder, field by field

StorageTransfer.builder() returns a fluent Builder. Mandatory: from, to, and at least one descriptor. Everything else has a default.

Method | Default | What it does
from(Storage source) | (required) | The storage to read from. Never modified.
to(Storage target) | (required) | The storage to write into.
descriptor(EntityDescriptor<K,V>) | (≥1 required) | A collection to copy (same descriptor on both sides).
descriptor(EntityDescriptor<K,V> src, EntityDescriptor<K,V> dst) | (alternative to the one-argument form) | Copy with a rename or codec change (see below).
batchSize(int) | 500 | Entities per saveAll on the target. Higher = fewer round-trips, more memory. Must be >= 1.
errorPolicy(ErrorPolicy) | FAIL_FAST | How write failures are handled (see ErrorPolicy).
applyTargetMigrations(boolean) | true | If the target is a SchemaAwareStorage, run migrate() during pre-flight.
failIfTargetCollectionNotEmpty(boolean) | true | Abort a collection if the target already has data (count() > 0).
verifyCounts(boolean) | true | After each collection, assert entitiesWritten == sourceCount; fail the report if they diverge.
progressListener(Consumer<TransferProgress>) | null | Called after every batch with a progress snapshot.
build() | n/a | Validates and returns the StorageTransfer. A missing source, target or descriptor throws IllegalStateException.
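The validation rules in the table can be pictured with a small stand-in. The sketch below is hypothetical: TransferBuilderSketch is not the library's Builder, descriptors are reduced to collection-name strings, and where the real builder rejects a batchSize below 1 is an assumption (this sketch checks it in build()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StorageTransfer.Builder. It models only the required fields,
// the batchSize default and the build()-time validation described in the table.
final class TransferBuilderSketch {
    private Object source;                                  // set by from(...)
    private Object target;                                  // set by to(...)
    private final List<String> collections = new ArrayList<>();
    private int batchSize = 500;                            // documented default

    TransferBuilderSketch from(Object source) { this.source = source; return this; }
    TransferBuilderSketch to(Object target) { this.target = target; return this; }
    TransferBuilderSketch descriptor(String collection) { collections.add(collection); return this; }
    TransferBuilderSketch batchSize(int batchSize) { this.batchSize = batchSize; return this; }

    // Refuses to build an incomplete transfer, mirroring the documented IllegalStateException.
    String build() {
        if (source == null) throw new IllegalStateException("missing source: call from(...)");
        if (target == null) throw new IllegalStateException("missing target: call to(...)");
        if (collections.isEmpty()) throw new IllegalStateException("register at least one descriptor");
        // Assumption: this sketch checks batchSize here; the real builder may check it elsewhere.
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        return "copy " + collections + " in batches of " + batchSize;
    }

    public static void main(String[] args) {
        System.out.println(new TransferBuilderSketch().from("old").to("new").descriptor("players").build());
        try {
            new TransferBuilderSketch().from("old").descriptor("players").build();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```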

A fully-specified transfer:

StorageTransfer transfer = StorageTransfer.builder()
        .from(oldLocalFileStorage)
        .to(newSqlStorage)
        .descriptor(PLAYERS)
        .descriptor(ACCOUNTS)
        .batchSize(1000)
        .applyTargetMigrations(true)             // run the target's migrations first
        .failIfTargetCollectionNotEmpty(true)    // refuse to overwrite existing target data
        .verifyCounts(true)                      // assert written == source count
        .errorPolicy(ErrorPolicy.FAIL_FAST)
        .progressListener(p ->
            System.out.printf("%s: %d/%d (%dms)%n", p.collection(), p.done(), p.total(), p.elapsedMs()))
        .build();

TransferReport report = transfer.execute().join();

⚠️ Gotcha — applyTargetMigrations, failIfTargetCollectionNotEmpty and verifyCounts all default to true (the safe choices). You opt out of the safety rails; you don't opt in. Set failIfTargetCollectionNotEmpty(false) only when you deliberately want to merge into a populated target, and pair it with ErrorPolicy.SKIP_EXISTING so you don't clobber what's there.

progressListener receives a TransferProgress after each batch: collection(), done() (entities written so far in this collection), total() (the source count() snapshot taken at start), and elapsedMs(). Within a collection, done only grows and total is fixed.
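To make the batchSize / progressListener interplay concrete, here is a minimal sketch of a batched copy loop, assuming a plain List as the source and a callback in place of the target's saveAll. Progress and copyInBatches are hypothetical names, not library API; the loop only reproduces the documented guarantees: one snapshot per batch, done only grows, and total is fixed at the start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the batch loop: not the library's code, only the documented behaviour
// (batches of batchSize, a progress snapshot after every batch, total fixed up front).
final class BatchProgressSketch {
    // Mirrors TransferProgress's accessors: collection(), done(), total(), elapsedMs().
    record Progress(String collection, int done, int total, long elapsedMs) {}

    static <V> void copyInBatches(String collection, List<V> source, int batchSize,
                                  Consumer<List<V>> writeBatch, Consumer<Progress> listener) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        long start = System.nanoTime();
        int total = source.size();                         // snapshot taken before copying
        int done = 0;
        for (int from = 0; from < total; from += batchSize) {
            List<V> batch = source.subList(from, Math.min(from + batchSize, total));
            writeBatch.accept(batch);                      // stands in for saveAll(batch) on the target
            done += batch.size();                          // only ever grows
            if (listener != null) {                        // progressListener defaults to null
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                listener.accept(new Progress(collection, done, total, elapsedMs));
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7);
        List<Integer> target = new ArrayList<>();
        copyInBatches("players", source, 3, target::addAll,
                p -> System.out.printf("%s: %d/%d (%dms)%n", p.collection(), p.done(), p.total(), p.elapsedMs()));
    }
}
```

With seven entities and a batch size of 3, the listener fires three times, with done at 3, 6 and 7 and total at 7 each time.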


ErrorPolicy — three failure modes

import br.com.finalcraft.everydatabase.transfer.ErrorPolicy;

Policy | On a write failure | Target safety | Speed
FAIL_FAST (default) | The first exception aborts the whole transfer. | Unstarted collections are untouched. | Batched
CONTINUE | Records the error and keeps going with the remaining batches and collections. | Best effort; partial writes are possible. | Batched
SKIP_EXISTING | Writes entity by entity, checks exists() for each key, and skips keys already present. | Never overwrites existing target data. | Slower (2 round-trips per entity)
  • FAIL_FAST is the safest default: it stops the moment anything goes wrong, so a partial, inconsistent target can't silently form. The report carries exactly one TransferError.
  • CONTINUE is "best effort, inspect afterwards." All failures land in report.errors(); report.success() is false if any error was recorded.
  • SKIP_EXISTING is the non-destructive merge mode. It abandons the batch path and writes one entity at a time, calling Repository.exists(key) first and skipping keys already on the target. Use it with failIfTargetCollectionNotEmpty(false) to fold new data into a populated collection without touching what's there.

// Merge new players into an already-populated target, preserving existing rows:
StorageTransfer.builder()
        .from(src).to(dst)
        .descriptor(PLAYERS)
        .failIfTargetCollectionNotEmpty(false)   // the target already has data — allow it
        .errorPolicy(ErrorPolicy.SKIP_EXISTING)  // ...but only add keys that aren't there yet
        .build()
        .execute()
        .join();
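The three policies differ only in what happens around each write. The sketch below simulates them with in-memory maps, assuming a target that upserts (as every backend does) and a predicate that makes chosen keys fail. It is hypothetical in two ways: it writes entity by entity for all three policies, ignoring the batching that FAIL_FAST and CONTINUE actually use, and this page does not say how SKIP_EXISTING reacts to a failing write, so the sketch assumes it records the error and continues.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical simulation of the three ErrorPolicy modes over in-memory maps.
final class ErrorPolicySketch {
    enum Policy { FAIL_FAST, CONTINUE, SKIP_EXISTING }

    // Copies source into target (a map standing in for the target repository) and returns
    // the keys whose write failed. failsOnWrite simulates a backend rejecting a key.
    static List<String> copy(Map<String, String> source, Map<String, String> target,
                             Policy policy, Predicate<String> failsOnWrite) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            String key = e.getKey();
            if (policy == Policy.SKIP_EXISTING && target.containsKey(key)) {
                continue;                                  // exists(key) check: never overwrite
            }
            if (failsOnWrite.test(key)) {
                errors.add(key);
                if (policy == Policy.FAIL_FAST) break;     // first failure aborts everything
                continue;                                  // CONTINUE (and, by assumption, SKIP_EXISTING)
            }
            target.put(key, e.getValue());                 // upsert
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("alice", "v1");
        source.put("bob", "v1");
        source.put("carol", "v1");
        for (Policy policy : Policy.values()) {
            Map<String, String> target = new LinkedHashMap<>();
            target.put("alice", "already-there");          // pre-existing row on the target
            List<String> errors = copy(source, target, policy, key -> key.equals("bob"));
            System.out.println(policy + " -> target=" + target + ", errors=" + errors);
        }
    }
}
```

With bob failing and alice pre-seeded on the target, FAIL_FAST ends with only alice (overwritten), CONTINUE writes alice and carol, and SKIP_EXISTING keeps the original alice and adds carol.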

📌 Note — under SKIP_EXISTING, verifyCounts relaxes its check from entitiesWritten == sourceCount to entitiesWritten <= sourceCount, because skipped entities are expected and should not flag the report as failed.
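The relaxed check fits in one expression. A minimal sketch, with a hypothetical countsOk helper and its own Policy enum standing in for ErrorPolicy:

```java
// Hypothetical helper restating the verifyCounts rule from the note above.
final class VerifyCountsSketch {
    enum Policy { FAIL_FAST, CONTINUE, SKIP_EXISTING }

    // Exact match normally; "no more than the source" under SKIP_EXISTING,
    // because keys already present on the target are skipped on purpose.
    static boolean countsOk(Policy policy, long entitiesWritten, long sourceCount) {
        return policy == Policy.SKIP_EXISTING
                ? entitiesWritten <= sourceCount
                : entitiesWritten == sourceCount;
    }

    public static void main(String[] args) {
        System.out.println(countsOk(Policy.FAIL_FAST, 120, 120));      // true
        System.out.println(countsOk(Policy.CONTINUE, 118, 120));       // false: two entities missing
        System.out.println(countsOk(Policy.SKIP_EXISTING, 40, 120));   // true: 80 keys were already there
    }
}
```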


TransferReport — the future does not fail for expected errors

This is the contract to internalize: execute()'s future completes normally even when the transfer itself failed. Expected failures (a write blew up, a count mismatched, the target collection wasn't empty) are data collected inside the report, not exceptions thrown out of the future. An exception escaping the future signals something unexpected (a bug or a JVM-level error), not a transfer error.

So you always check report.success():

TransferReport report = transfer.execute().join();   // does NOT throw for a failed transfer

if (report.success()) {
    System.out.printf("Transferred %d entities across %d collections in %dms%n",
            report.totalEntities(), report.collections().size(), report.durationMs());

    for (CollectionStats s : report.collections().values()) {
        System.out.printf("  %s -> %s: %d/%d written, target %d -> %d, %dms%n",
                s.sourceCollection(), s.targetCollection(),
                s.entitiesWritten(), s.sourceCount(),
                s.targetCountBefore(), s.targetCountAfter(), s.durationMs());
    }
} else {
    for (TransferError e : report.errors()) {
        System.err.printf("[%s] key=%s: %s%n", e.collection(), e.key(), e.cause().getMessage());
    }
}

What the report exposes:

Member | Meaning
success() | true only if no error was recorded and all count verifications passed.
totalEntities() | Sum of entitiesWritten() across every collection.
durationMs() | Wall-clock time from the first pre-flight check to the last verification.
collections() | Map<String, CollectionStats> keyed by source collection name, in registration order.
errors() | List<TransferError>; empty on success. With FAIL_FAST, at most one entry.
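How success() and totalEntities() follow from the per-collection data can be shown with trimmed-down records. Stats, TError and Report are hypothetical stand-ins for CollectionStats, TransferError and TransferReport (the countsVerified flag in particular is an assumption about how a count check's outcome is held); the logic just restates the table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down shapes of CollectionStats, TransferError and TransferReport.
final class ReportSketch {
    record Stats(String sourceCollection, long sourceCount, long entitiesWritten, boolean countsVerified) {}
    record TError(String collection, Object key, Throwable cause) {}     // key == null: collection-level

    record Report(Map<String, Stats> collections, List<TError> errors) {
        // true only if no error was recorded and every count verification passed
        boolean success() {
            return errors.isEmpty() && collections.values().stream().allMatch(Stats::countsVerified);
        }

        // sum of entitiesWritten() across every collection
        long totalEntities() {
            return collections.values().stream().mapToLong(Stats::entitiesWritten).sum();
        }
    }

    public static void main(String[] args) {
        Map<String, Stats> stats = new LinkedHashMap<>();                 // registration order
        stats.put("players", new Stats("players", 1200, 1200, true));
        stats.put("accounts", new Stats("accounts", 300, 300, true));
        Report ok = new Report(stats, List.of());
        System.out.println(ok.success() + " " + ok.totalEntities());        // true 1500

        Report failed = new Report(stats, List.of(
                new TError("accounts", null, new IllegalStateException("count mismatch"))));
        System.out.println(failed.success());                               // false
    }
}
```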

Each CollectionStats carries sourceCollection(), targetCollection(), sourceCount(), targetCountBefore(), targetCountAfter(), entitiesWritten(), and durationMs() — enough to verify a clean copy: targetCountAfter() - targetCountBefore() should equal sourceCount() (or be smaller under SKIP_EXISTING).

A TransferError carries collection(), key() (the entity key, or null for a global/collection-level failure like a pre-flight abort or count mismatch), and cause() (the Throwable).

⚠️ Gotcha — don't write try { transfer.execute().join(); } catch (...) { /* handle failure */ } and expect to catch a failed transfer there. The future succeeds; the failure is in the report. A thrown exception is a bug-level surprise, not a transfer error.
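Because expected and unexpected failures travel on different channels, one way to route both is the JDK's CompletableFuture.handle: the report branch covers transfer errors, the Throwable branch covers bugs. A minimal sketch, assuming a hypothetical Report record in place of TransferReport:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of handling a transfer future's two channels separately.
final class TwoChannelsSketch {
    // Stand-in for TransferReport: expected failures live in errors().
    record Report(List<String> errors) {
        boolean success() { return errors.isEmpty(); }
    }

    // A failed transfer arrives as a *successful* future carrying an unsuccessful report;
    // only something unexpected completes the future exceptionally.
    static String outcome(CompletableFuture<Report> future) {
        return future.handle((report, thrown) -> {
            if (thrown != null) {
                return "unexpected failure: " + thrown;   // bug-level surprise, not a transfer error
            }
            return report.success() ? "ok" : "transfer failed: " + report.errors();
        }).join();
    }

    public static void main(String[] args) {
        System.out.println(outcome(CompletableFuture.completedFuture(new Report(List.of()))));
        System.out.println(outcome(CompletableFuture.completedFuture(new Report(List.of("players: count mismatch")))));
        System.out.println(outcome(CompletableFuture.failedFuture(new IllegalStateException("boom"))));
    }
}
```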


Renaming a collection or changing codec mid-transfer

The two-argument descriptor(sourceDescriptor, targetDescriptor) decouples how the entity is read from how it's written. Both descriptors must share the same <K, V> types so a decoded source entity can be written to the target without conversion — but their collection name and codec may differ.

Two real uses:

// 1. Rename the collection during the move ("legacy_players" on disk -> "players" in SQL).
EntityDescriptor<UUID, PlayerData> SRC = EntityDescriptor.builder(UUID.class, PlayerData.class)
        .collection("legacy_players")
        .keyExtractor(PlayerData::getUuid)
        .codec(JacksonJsonCodec.pretty(PlayerData.class))   // human-readable (indented) JSON on the old file store
        .build();

EntityDescriptor<UUID, PlayerData> DST = EntityDescriptor.builder(UUID.class, PlayerData.class)
        .collection("players")
        .keyExtractor(PlayerData::getUuid)
        .codec(new JacksonJsonCodec<>(PlayerData.class))    // compact JSON for the SQL column
        .build();

StorageTransfer.builder()
        .from(oldLocalFileStorage)
        .to(newSqlStorage)
        .descriptor(SRC, DST)                               // read SRC, write DST
        .build()
        .execute()
        .join();
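
// 2. Change codec only (YAML files -> JSON in SQL), same collection name on both sides.
//    yamlPlayersDesc uses JacksonYamlCodec (LocalFile-only); jsonPlayersDesc uses a JSON codec.
StorageTransfer.builder()
        .from(oldLocalFileStorage).to(newSqlStorage)
        .descriptor(yamlPlayersDesc, jsonPlayersDesc)        // read YAML, write JSON
        .build().execute().join();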

📌 Note — a codec change like this is exactly why the per-side descriptor exists: SQL/Mongo/InMemory require a JSON codec, while only LocalFile accepts YAML. Read with the source-appropriate codec, write with the target-appropriate one. See Codecs and Choosing a Backend.
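The read-with-one-codec, write-with-another idea, and why the shared <K, V> matters, can be sketched without any storage at all. Desc, yamlLike, jsonLike and copy are hypothetical; the two formats are toy strings, not real YAML or JSON (no Jackson involved), and maps stand in for the source files and the SQL table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical miniature of a descriptor pair: Desc stands in for EntityDescriptor.
final class DescriptorPairSketch {
    record Player(String name) {}

    record Desc<K, V>(String collection, Function<V, K> keyExtractor,
                      Function<V, String> encode, Function<String, V> decode) {}

    private static final String JSON_PREFIX = "{\"name\":\"";

    // Source side: a YAML-looking line such as  name: Alice
    static Desc<String, Player> yamlLike(String collection) {
        return new Desc<>(collection, Player::name,
                p -> "name: " + p.name(),
                raw -> new Player(raw.substring("name: ".length())));
    }

    // Target side: a JSON-looking object such as  {"name":"Alice"}
    static Desc<String, Player> jsonLike(String collection) {
        return new Desc<>(collection, Player::name,
                p -> JSON_PREFIX + p.name() + "\"}",
                raw -> new Player(raw.substring(JSON_PREFIX.length(), raw.length() - 2)));
    }

    // Both sides share <K, V>: the entity decoded by src is handed to dst as-is, so only the
    // collection name and the encoding change between read and write.
    static <K, V> void copy(Map<K, String> sourceRaw, Desc<K, V> src, Desc<K, V> dst,
                            Map<String, Map<K, String>> targetStore) {
        Map<K, String> out = targetStore.computeIfAbsent(dst.collection(), c -> new LinkedHashMap<>());
        for (String raw : sourceRaw.values()) {
            V entity = src.decode().apply(raw);                                     // read with the source codec
            out.put(dst.keyExtractor().apply(entity), dst.encode().apply(entity));  // write with the target codec
        }
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new LinkedHashMap<>();
        legacy.put("Alice", "name: Alice");
        legacy.put("Bob", "name: Bob");
        Map<String, Map<String, String>> sql = new LinkedHashMap<>();
        copy(legacy, yamlLike("legacy_players"), jsonLike("players"), sql);
        System.out.println(sql);   // {players={Alice={"name":"Alice"}, Bob={"name":"Bob"}}}
    }
}
```

Because copy is generic in one pair of <K, V>, passing descriptors for two different entity types fails to compile, which is the property the two-argument descriptor(...) relies on.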


Maintenance-window cutover playbook

StorageTransfer is built for a maintenance-mode migration — the kind where you freeze writes, copy the data, then point the application at the new backend on restart. A typical run:

  1. Freeze writes. Put the application in maintenance mode (whitelist / read-only) and flush any in-memory caches so the source on disk is the source of truth. (If you use the manager module, this is where you stop write-through traffic — see Caching & References.)

  2. Open both storages and init() them, using the same EntityDescriptors you use in production.

  3. Run the transfer with the safety rails on (the defaults):

    TransferReport report = StorageTransfer.builder()
            .from(currentStorage)
            .to(newStorage)
            .descriptor(PLAYERS)
            .descriptor(ACCOUNTS)
            .applyTargetMigrations(true)             // bring the target schema up first
            .failIfTargetCollectionNotEmpty(true)    // a fresh target must be empty
            .verifyCounts(true)                      // counts must match exactly
            .errorPolicy(ErrorPolicy.FAIL_FAST)      // stop on the first problem
            .progressListener(p -> System.out.printf("%s: %d/%d%n", p.collection(), p.done(), p.total()))
            .build()
            .execute()
            .join();
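
  4. Gate the cutover on report.success(). If it is false, inspect report.errors() and fix the cause before re-running. FAIL_FAST leaves unstarted collections untouched, but collections that already completed, and any batches of the failing collection written before the error, may remain on the target; with the default failIfTargetCollectionNotEmpty(true), a re-run aborts on those collections. Either wipe the target and retry, or resume as described in the tip below. Do not flip the application over on a failed report.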

  5. Cut over. On a successful report, change the application's storage config to the new backend and restart. The source remains intact as a rollback safety net until you're confident.
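Steps 4 and 5 boil down to a small decision. A sketch under stated assumptions: decide is a hypothetical helper that takes report.success() and each collection's targetCountAfter(), and it encodes the rule from the builder table that, with failIfTargetCollectionNotEmpty(true), a re-run aborts on any collection that already holds data.

```java
import java.util.Map;

// Hypothetical helper for steps 4 and 5: decide what to do with a finished transfer.
final class CutoverGateSketch {
    enum Next { CUT_OVER, RERUN_AS_IS, WIPE_OR_RESUME }

    // success: report.success(); targetCountsAfter: each collection's targetCountAfter().
    static Next decide(boolean success, Map<String, Long> targetCountsAfter) {
        if (success) {
            return Next.CUT_OVER;                     // safe to switch config and restart
        }
        boolean anyPopulated = targetCountsAfter.values().stream().anyMatch(n -> n > 0);
        // With failIfTargetCollectionNotEmpty(true), a re-run aborts on any collection that already
        // holds data, so a partially written target must be wiped or resumed with SKIP_EXISTING.
        return anyPopulated ? Next.WIPE_OR_RESUME : Next.RERUN_AS_IS;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, Map.of("players", 1200L)));                   // CUT_OVER
        System.out.println(decide(false, Map.of("players", 500L, "accounts", 0L)));   // WIPE_OR_RESUME
        System.out.println(decide(false, Map.of("players", 0L)));                     // RERUN_AS_IS
    }
}
```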

💡 Tip — to resume into a partially-populated target instead of starting fresh (e.g. a previous run died midway), flip to failIfTargetCollectionNotEmpty(false) + ErrorPolicy.SKIP_EXISTING: it only adds the keys that aren't there yet, leaving already-copied entities untouched.

📌 Note — transfer activity is observable. The TRANSFER log topic emits begin / per-collection progress / completion events, and the transfer mirrors those onto the target storage's log config. Turn them up with StorageLogConfig.defaults().level(StorageLogTopic.TRANSFER, StorageLogLevel.INFO) — see Logging & Diagnostics.


See also

  • The Async API — execute() returns a CompletableFuture; composition and .join() semantics.
  • Choosing a Backend — pick the source and target; data-at-rest formats and capability matrix.
  • Codecs — JSON vs YAML, why the two-arg descriptor lets you change codec mid-transfer.
  • Schema Migrations — what applyTargetMigrations(true) runs before the copy.
  • Logging & Diagnostics — the TRANSFER topic and how to watch a transfer's progress.
  • CRUD Operations — the saveAll / exists semantics the transfer is built on.
  • Gotchas & Pitfalls — the "future never fails" contract and the default-on safety rails.
