Will Riley edited this page Aug 21, 2016 · 28 revisions

Dump of ideas for the project:

  • use cases
    • large scale, single-sharded applications
    • data structures
      • chat logs
      • document store (json w/ embedded types)
      • structured text
      • game states?
      • some of these (chat logs, game states) will need random access to parts of a document
  • high performance features
    • op composing combined with snapshot framing -- useful for syncing game state between game clients that need 'frames' to determine source of truth, or hot documents that process a lot of changes
  • tools for data management
    • schema versioning -- old clients still work with new data
      • in migration framework, define mapping from old version -> new version
      • calculate rollback automatically/have user specify rollback plan
      • roll back schema when sending old clients data
      • migrate schema forward when writing ops from old clients
      • could either upgrade on read or migrate everything at once (probably go with the former, since we don't need to implement queries yet)
      • should work in theory?
    • computed properties / tools for managing denormalized data
    • should be able to roll out new version of app without disconnecting clients
      • application logic will be decoupled from the database, so this should follow naturally from the architecture
    • should also handle a draft flow: copy the document's current state -> make changes -> write the full document back
    • handle state bundling problem for isomorphic js rendering -- be able to request documents at specific versions?
  • some way to use directly from the outside via websockets
    • alternatively, make it easy to integrate into a server stack
      • Browsers limit the number of HTTP connections per server, so this might be better
    • need to think through security implications
    • middleware for access control
      • query validation
      • read/write permissions
  • API for tagging ops related to user action so you can implement undo functionality (transactions)
  • Should it be postgres style (single primary node) or riak style (distributed, masterless ring of vnodes)?
    • could use ops as a mechanism for efficient sync if distributed between nodes
    • if riak style:
      • how to query data globally? Maybe central elasticsearch index? Is that a robust solution?
        • How would subscriptions work? Would it be too chatty if a node had to publish things to other subscribed vnodes that are handling clients?
        • having aggregates done on an external service could induce query lag -- can do distributed queries on each node or use direct pubsub updates to mitigate
          • on the flip side, being able to asynchronously update an index would be awesome since operations can be fast-inserted, then we can update the index after a delay (similar to a debounce)
          • going down the debounce route would mean we need to make sure the index is up to date on startup though
      • what to do if ops get lost during a vnode failure / netsplit?
  • Handle object replacement merge issue in a more systematic way
    • CRDTs / OT and data sync
      • could have users provide a schema which dictates the underlying CRDT behavior (riak allows this -- see how they did it)
        • this way apps can declaratively describe how conflicts should be resolved
        • should probably force users to do this anyway so that we can support offline sync easily
      • what can objects represent at a high level? Can we differentiate between these representations in the data type?
  • serialization into LevelDB
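Rough sketch of the schema versioning idea from the list (upgrade on read, roll back when sending to old clients). Everything here is made up for illustration: the `MIGRATIONS` registry, the `rename_field` / `add_field` / `drop_field` helpers, and the example field names are all hypothetical. Each step is stored with its inverse, which is one way the rollback could be "calculated automatically" for simple changes like renames.

```python
from typing import Any, Callable, Dict, Tuple

Doc = Dict[str, Any]
Transform = Callable[[Doc], Doc]


def rename_field(old: str, new: str) -> Transform:
    """Return a transform that moves the value stored under `old` to `new`."""
    def apply(doc: Doc) -> Doc:
        out = {k: v for k, v in doc.items() if k != old}
        out[new] = doc[old]
        return out
    return apply


def add_field(name: str, default: Any) -> Transform:
    """Return a transform that adds `name` with a default if it is missing."""
    return lambda doc: {name: default, **doc}


def drop_field(name: str) -> Transform:
    """Return a transform that removes `name` (the inverse of add_field)."""
    return lambda doc: {k: v for k, v in doc.items() if k != name}


# Hypothetical registry: key v holds (forward, rollback) for the step v -> v + 1.
MIGRATIONS: Dict[int, Tuple[Transform, Transform]] = {
    1: (rename_field("name", "display_name"), rename_field("display_name", "name")),
    2: (add_field("muted", False), drop_field("muted")),
}


def convert(doc: Doc, from_version: int, to_version: int) -> Doc:
    """Walk the migration chain forward or backward, one version at a time."""
    version = from_version
    while version < to_version:
        doc = MIGRATIONS[version][0](doc)      # migrate forward
        version += 1
    while version > to_version:
        doc = MIGRATIONS[version - 1][1](doc)  # roll back
        version -= 1
    return doc


stored_v1 = {"name": "ann"}
latest = convert(stored_v1, 1, 3)      # upgrade on read
for_old_client = convert(latest, 3, 1)  # roll back before sending to a v1 client
```

Writes from old clients would go the other way: run the incoming data through `convert(data, client_version, latest_version)` before storing it. Lossy steps (dropping a field that held real data) would still need a user-specified rollback plan.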
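Rough sketch of "op composing combined with snapshot framing" from the high performance bullet. This assumes a deliberately simple op shape (a flat dict of key -> new value, last write wins), which is much weaker than real OT or CRDT ops; `FramedLog`, `compose`, and `apply_op` are hypothetical names. The point is only that ops since the last frame can be collapsed into one op, and that a snapshot is cut every `frame_size` ops so clients can agree on a frame as the source of truth.

```python
from typing import Any, Dict, List

Op = Dict[str, Any]     # assumed op shape: partial update, {key: new_value}
State = Dict[str, Any]


def compose(first: Op, second: Op) -> Op:
    """Combine two ops into one op equivalent to applying them in order."""
    return {**first, **second}


def apply_op(state: State, op: Op) -> State:
    """Apply a partial update to a state without mutating the input."""
    return {**state, **op}


class FramedLog:
    """Op log that cuts a snapshot ("frame") every `frame_size` ops."""

    def __init__(self, frame_size: int) -> None:
        self.frame_size = frame_size
        self.snapshot: State = {}    # state at the last frame boundary
        self.snapshot_version = 0    # number of ops folded into the snapshot
        self.pending: List[Op] = []  # ops received since the last frame

    def catch_up(self) -> Op:
        """Compose all pending ops into a single op for syncing a client."""
        composed: Op = {}
        for op in self.pending:
            composed = compose(composed, op)
        return composed

    def append(self, op: Op) -> None:
        """Record an op and cut a new frame once enough ops have piled up."""
        self.pending.append(op)
        if len(self.pending) == self.frame_size:
            self.snapshot = apply_op(self.snapshot, self.catch_up())
            self.snapshot_version += self.frame_size
            self.pending = []

    def state(self) -> State:
        """Current state: last snapshot plus everything since."""
        return apply_op(self.snapshot, self.catch_up())


log = FramedLog(frame_size=3)
for op in [{"x": 1}, {"y": 2}, {"x": 5}]:
    log.append(op)   # third append cuts a frame
log.append({"y": 9})  # lands after the frame boundary
```

A client that holds the snapshot at `snapshot_version` only needs the single composed op from `catch_up()` to get current, which is the win for hot documents that take a lot of changes.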

Things I still need to read about:

Potentially useful reading:

Reference implementations:
