Skip to content
Ashley Davis edited this page Jun 14, 2026 · 2 revisions

Database Code

This page describes the architecture and implementation of the packages/bdb database library used by Photosphere.

Overview

packages/bdb is the core database library for Photosphere. It provides a BSON-based document store with sharding, B-tree sort indexes, and a three-level merkle tree hierarchy for efficient sync and integrity verification.

The library is used by:

  • packages/api and packages/node-api, read and write operations on photo/video metadata (queries, import, sync, replicate, repair, verify)
  • apps/bdb-cli, inspection and repair of database files on disk

Package Structure

Key source files and their roles:

File Purpose
src/lib/database.ts BsonDatabase, IBsonDatabase, top-level entry point; owns the database merkle tree cache
src/lib/collection.ts BsonCollection, IBsonCollection, a named collection of records; owns shard caches and collection merkle tree
src/lib/shard.ts BsonShard, IShard, a single shard bucket; owns record map, dirty flag, and shard-level merkle ref
src/lib/sort-index.ts SortIndex, ISortIndex, B-tree sort index for a single field; owns leaf page cache
src/lib/merkle-tree.ts Helper functions for building, loading, and saving merkle trees at each level
src/lib/merkle-tree-ref.ts MerkleRef, IMerkleRef, lazily-loaded, committable handle for a merkle tree; used at shard, collection, and database level
src/lib/merge-records.ts Timestamp-based record merging used during sync
src/lib/update-metadata.ts Updates per-field timestamp metadata during writes
src/lib/update-fields.ts Applies partial field updates to records

Data Model

Records are stored as IInternalRecord objects on disk and exposed as IRecord to callers. The internal format carries extra metadata:

interface IInternalRecord {
    _id: string;               // UUID
    fields: Record<string, any>; // the record's data fields
    metadata: IMetadata;       // timestamp metadata for sync
}

Each field can carry an independent timestamp so that sync can merge changes at field granularity (see Record Merging).

Sharding

Records are distributed across shards based on the MD5 hash of the record's _id modulo 100. Each shard is a single binary file containing all records in that shard, serialized in BSON format.

Sharding exists because storing one file per record would create excessive filesystem overhead for large collections (100,000+ records). With 100 shards, each shard file holds ~1,000 records on average.

Shard files are stored at:

<bsonDbPath>/collections/<collectionName>/shards/<shardId>

See Database-Format.md for the on-disk file format.

Sort Indexes

Sort indexes enable efficient sorted pagination without scanning all records. Each index covers one field in one direction (asc or desc) and supports the data types date, string, and number.

Internally each index is a B-tree whose leaf pages are linked in sorted order so sequential scans don't require traversing the tree. On disk each index lives in its own directory:

  • The tree structure file: indexes/<collectionName>/<fieldName>_<direction>/tree.dat
  • One leaf page file per page, named by its page ID with no extension: indexes/<collectionName>/<fieldName>_<direction>/<pageId>

A sort index is accessed via collection.sortIndex(fieldName, direction), which returns an ISortIndex with these operations:

  • ensure(collection, type, progressCallback?), builds the index from all existing records; a no-op if already built
  • exists(), reports whether the index has been built
  • getPage(pageId?), returns one page of sorted records (the first page when pageId is omitted)
  • findByValue(value), returns all records matching an exact value
  • findByRange(options), returns records within a range
  • drop(), deletes the index from disk

Merkle Trees

Three levels of merkle trees track the state of all data for efficient sync and integrity verification:

Database merkle tree
  └─ Collection merkle tree (one per collection)
       └─ Shard merkle tree (one per shard)
            └─ Record hashes (one per record)

When any record changes, only the affected shard, collection, and database merkle nodes need to be updated, O(log n) rather than a full rebuild.

Sync works by comparing database merkle hashes first. If they match, nothing has changed. If they differ, the sync descends the tree to find exactly which shards differ, then transfers only those records.

The Commit/Flush Pattern

All data loaded from disk is kept in an in-memory cache. All write operations update only the cache and set dirty flags, nothing touches disk until commit() is called.

Dirty flag propagation

Dirty flags propagate upward automatically:

  1. SortIndex becomes dirty → calls onDirty() callback → BsonCollection.markDirty()
  2. BsonCollection becomes dirty (shard, merkle, or sort index) → calls onDirty() callback → BsonDatabase.dirty = true
  3. Each markDirty() is idempotent, fires the upward callback only on the first transition per commit cycle

commit()

Flushes all dirty data to disk in this order:

  1. Write dirty shard files
  2. Write dirty shard merkle trees
  3. Write collection merkle tree (if dirty)
  4. Commit all sort indexes (write dirty leaf pages and tree structure)
  5. Update and write database merkle tree

After commit(), dirty flags are cleared but the in-memory cache remains populated. Subsequent reads are served from cache without disk I/O.

flush()

Ejects all cached data from memory. Throws if there are uncommitted changes, the caller must commit() first.

Use flush() before acquiring a write lock to ensure no stale data from a previous session is in memory.

The write-lock pattern

await database.flush();           // eject stale cache before acquiring lock
await acquireWriteLock(...);
try {
    await collection.insertOne(...);
    await collection.updateOne(...);
    await database.commit();      // write everything to disk before releasing
} finally {
    await releaseWriteLock(...);
}

This pattern is used in apply-database-ops.ts, import.ts, and sync.ts (all under packages/node-api/src/lib/).

Write Locking

Write locks prevent concurrent writes from multiple processes (e.g. two sync operations running simultaneously). Locks are file-based and scoped to a session ID. The lock implementation lives in packages/api/src/lib/write-lock.ts.

The pattern is always: flush() → acquire lock → write operations → commit() → release lock.

Files that acquire write locks (all under packages/node-api/src/lib/):

  • apply-database-ops.ts, applying metadata ops from a client
  • import.ts, importing newly uploaded assets
  • sync.ts, syncing between source and target databases

Record Merging

During sync, records from two databases may both have modifications. mergeRecords() in merge-records.ts performs a field-level merge based on timestamps: for each field, the version with the newer timestamp wins.

This is why each IInternalRecord carries metadata with per-field timestamps, it enables correct three-way merges without a conflict resolution UI.

Clone this wiki locally