Use structured Automerge maps instead of single 'value' string key #7

@kitplummer

Summary

All documents are stored in Automerge with a single "value" key containing a serialized JSON string. This collapses Automerge's rich CRDT merge semantics into effectively last-write-wins per entire document, losing the ability to merge concurrent field-level changes.

Current State

src/node.rs:275-307, put_document():

let mut tx = doc.transaction();
tx.put(automerge::ROOT, "value", json_data)?;  // Entire JSON as one scalar string
tx.commit();

src/node.rs:357-365, extract_json_from_automerge():

fn extract_json_from_automerge(doc: &automerge::Automerge) -> Option<String> {
    match doc.get(automerge::ROOT, "value") {
        Ok(Some((automerge::Value::Scalar(s), _))) => match s.as_ref() {
            automerge::ScalarValue::Str(s) => Some(s.to_string()),
            _ => None,
        },
        _ => None,
    }
}

The problem

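When Node A writes {"status": "ready", "version": "1.0"} and Node B concurrently writes {"status": "ready", "version": "2.0"}, Automerge sees two competing writes to the same scalar "value" key and picks one winner. The choice is deterministic across nodes but arbitrary from the application's point of view. Because the read path only calls get(), the losing write is never surfaced, so one node's entire document is effectively discarded.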

With structured Automerge maps, Node A writing status: "ready" and Node B writing version: "2.0" would merge cleanly because they're independent keys.
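
To make the difference concrete, here is a small std-only model of the two behaviours. It does not use Automerge and is not how Automerge works internally: the (counter, actor) stamp, the Register and Fields types, and every function name are hypothetical stand-ins chosen only to show why one register per document loses an edit that one register per field keeps.

```rust
use std::collections::BTreeMap;

/// Write stamp: (logical counter, actor id). The higher stamp wins; the actor
/// id only breaks ties. This is a toy stand-in, not Automerge's internals.
type Stamp = (u64, &'static str);

/// One register: the stamp of the winning write plus the value it wrote.
type Register = (Stamp, &'static str);

/// "value" mode: the whole document is one register, so one blob wins outright.
fn merge_whole(a: Register, b: Register) -> Register {
    if a.0 >= b.0 { a } else { b }
}

/// Structured mode: one register per field, merged key by key.
type Fields = BTreeMap<&'static str, Register>;

fn merge_fields(a: &Fields, b: &Fields) -> Fields {
    let mut out = a.clone();
    for (key, theirs) in b {
        let ours_is_newer = out.get(key).map_or(false, |ours| ours.0 >= theirs.0);
        if !ours_is_newer {
            out.insert(*key, *theirs);
        }
    }
    out
}

/// Node A sets status while Node B concurrently sets version.
/// Returns (merged status, merged version, winning whole-document blob).
fn concurrent_edit_demo() -> (&'static str, &'static str, &'static str) {
    let base: Fields = BTreeMap::from([
        ("status", ((1, "a"), "pending")),
        ("version", ((1, "a"), "1.0")),
    ]);
    let mut node_a = base.clone();
    node_a.insert("status", ((2, "a"), "ready"));
    let mut node_b = base.clone();
    node_b.insert("version", ((2, "b"), "2.0"));
    let merged = merge_fields(&node_a, &node_b);

    // The same two edits expressed as whole-document JSON strings.
    let blob_a: Register = ((2, "a"), r#"{"status":"ready","version":"1.0"}"#);
    let blob_b: Register = ((2, "b"), r#"{"status":"pending","version":"2.0"}"#);
    let winner = merge_whole(blob_a, blob_b);

    (merged["status"].1, merged["version"].1, winner.1)
}
```

In this model the per-field merge ends with status "ready" and version "2.0", while the whole-document merge keeps only Node B's blob and silently reverts Node A's status change to "pending".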

Proposed Approach

  1. Structured write: Instead of tx.put(ROOT, "value", json_string), walk the JSON object and create Automerge map entries:

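    use automerge::{transaction::Transactable, AutomergeError, ObjId, ObjType};
    use serde_json::{Map, Value};

    // Writes each field of a JSON object into the Automerge map `obj`.
    fn put_json_to_automerge<T: Transactable>(
        tx: &mut T,
        obj: &ObjId,
        fields: &Map<String, Value>,
    ) -> Result<(), AutomergeError> {
        for (k, v) in fields {
            match v {
                Value::Object(child) => {
                    let id = tx.put_object(obj, k.as_str(), ObjType::Map)?;
                    put_json_to_automerge(tx, &id, child)?;
                }
                Value::Array(_arr) => { /* tx.put_object(.., ObjType::List), then tx.insert() per element */ }
                Value::String(s) => tx.put(obj, k.as_str(), s.as_str())?,
                Value::Bool(b) => tx.put(obj, k.as_str(), *b)?,
                _ => { /* ... Null and Number */ }
            }
        }
        Ok(())
    }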
  2. Structured read: Reconstruct JSON from the Automerge document tree instead of reading a single string key

  3. Migration: Support reading both formats during transition:

    • If doc has "value" string key → old format, read as-is
    • If doc has structured keys → new format, reconstruct JSON
    • On next write, always use new format
  4. Collection opt-in (alternative): Allow collections to declare their merge strategy:

    • "value" mode (current) for opaque blobs where LWW is fine
    • "structured" mode for documents where field-level merge matters
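
Steps 2 and 3 could be sketched roughly as follows. To stay self-contained this uses a toy Node enum as a stand-in for the Automerge document tree rather than the real automerge API, and Node, to_json, escape, and read_document are all hypothetical names. The "exactly one key" check in read_document is an added assumption, not part of the proposal above: it keeps a structured document that has its own "value" field alongside other fields from being mistaken for the old format.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an Automerge document tree (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    List(Vec<Node>),
    Map(BTreeMap<String, Node>),
}

/// Minimal escaping for the sketch: backslash and double quote only.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Structured read: rebuild JSON text by walking the tree.
fn to_json(node: &Node) -> String {
    match node {
        Node::Null => "null".to_string(),
        Node::Bool(b) => b.to_string(),
        Node::Num(n) => n.to_string(), // non-finite numbers are not handled here
        Node::Str(s) => format!("\"{}\"", escape(s)),
        Node::List(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        Node::Map(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("\"{}\":{}", escape(k), to_json(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Migration read path: old format if the root holds exactly one key, "value",
/// containing a string; otherwise treat the root as a structured document.
fn read_document(root: &BTreeMap<String, Node>) -> String {
    match (root.len(), root.get("value")) {
        (1, Some(Node::Str(json))) => json.clone(),
        _ => to_json(&Node::Map(root.clone())),
    }
}
```

A structured document whose only field is a string named "value" is still ambiguous under this check, so an explicit format marker on the document or the collection may be the safer way to tell the two formats apart.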

Trade-offs

Benefits:

  • Concurrent writes to different fields merge correctly
  • Better CRDT utilization — this is what Automerge is designed for
  • Reduced data loss during network partitions

Costs:

  • More complex serialization/deserialization
  • Automerge history grows faster (per-field operations vs per-document)
  • Need migration path from existing stored documents
  • Nested JSON (arrays, deeply nested objects) adds complexity

Impact

Most valuable for collections with frequent concurrent writes:

  • platforms/* — multiple fields updated by watcher vs. manual status changes
  • commands/* — status transitions from different sources
  • Less critical for deployments/* where a single agent owns each document

Files

  • src/node.rs:275-307, put_document() (rewrite serialization)
  • src/node.rs:357-365, extract_json_from_automerge() (rewrite deserialization)
  • src/service.rs:150-310 — typed collection helpers (may simplify with structured maps)
  • proto/sidecar.proto — no changes needed (JSON string interface stays the same for gRPC clients)
