-
Notifications
You must be signed in to change notification settings - Fork 0
Database Code
This page describes the architecture and implementation of the packages/bdb database library used by Photosphere.
packages/bdb is the core database library for Photosphere. It provides a BSON-based document store with sharding, B-tree sort indexes, and a three-level merkle tree hierarchy for efficient sync and integrity verification.
The library is used by:
-
packages/apiandpackages/node-api, read and write operations on photo/video metadata (queries, import, sync, replicate, repair, verify) -
apps/bdb-cli, inspection and repair of database files on disk
Key source files and their roles:
| File | Purpose |
|---|---|
src/lib/database.ts |
BsonDatabase, IBsonDatabase, top-level entry point; owns the database merkle tree cache |
src/lib/collection.ts |
BsonCollection, IBsonCollection, a named collection of records; owns shard caches and collection merkle tree |
src/lib/shard.ts |
BsonShard, IShard, a single shard bucket; owns record map, dirty flag, and shard-level merkle ref |
src/lib/sort-index.ts |
SortIndex, ISortIndex, B-tree sort index for a single field; owns leaf page cache |
src/lib/merkle-tree.ts |
Helper functions for building, loading, and saving merkle trees at each level |
src/lib/merkle-tree-ref.ts |
MerkleRef, IMerkleRef, lazily-loaded, committable handle for a merkle tree; used at shard, collection, and database level |
src/lib/merge-records.ts |
Timestamp-based record merging used during sync |
src/lib/update-metadata.ts |
Updates per-field timestamp metadata during writes |
src/lib/update-fields.ts |
Applies partial field updates to records |
Records are stored as IInternalRecord objects on disk and exposed as IRecord to callers. The internal format carries extra metadata:
interface IInternalRecord {
_id: string; // UUID
fields: Record<string, any>; // the record's data fields
metadata: IMetadata; // timestamp metadata for sync
}Each field can carry an independent timestamp so that sync can merge changes at field granularity (see Record Merging).
Records are distributed across shards based on the MD5 hash of the record's _id modulo 100. Each shard is a single binary file containing all records in that shard, serialized in BSON format.
Sharding exists because storing one file per record would create excessive filesystem overhead for large collections (100,000+ records). With 100 shards, each shard file holds ~1,000 records on average.
Shard files are stored at:
<bsonDbPath>/collections/<collectionName>/shards/<shardId>
See Database-Format.md for the on-disk file format.
Sort indexes enable efficient sorted pagination without scanning all records. Each index covers one field in one direction (asc or desc) and supports the data types date, string, and number.
Internally each index is a B-tree whose leaf pages are linked in sorted order so sequential scans don't require traversing the tree. On disk each index lives in its own directory:
- The tree structure file:
indexes/<collectionName>/<fieldName>_<direction>/tree.dat - One leaf page file per page, named by its page ID with no extension:
indexes/<collectionName>/<fieldName>_<direction>/<pageId>
A sort index is accessed via collection.sortIndex(fieldName, direction), which returns an ISortIndex with these operations:
-
ensure(collection, type, progressCallback?), builds the index from all existing records; a no-op if already built -
exists(), reports whether the index has been built -
getPage(pageId?), returns one page of sorted records (the first page whenpageIdis omitted) -
findByValue(value), returns all records matching an exact value -
findByRange(options), returns records within a range -
drop(), deletes the index from disk
Three levels of merkle trees track the state of all data for efficient sync and integrity verification:
Database merkle tree
└─ Collection merkle tree (one per collection)
└─ Shard merkle tree (one per shard)
└─ Record hashes (one per record)
When any record changes, only the affected shard, collection, and database merkle nodes need to be updated, O(log n) rather than a full rebuild.
Sync works by comparing database merkle hashes first. If they match, nothing has changed. If they differ, the sync descends the tree to find exactly which shards differ, then transfers only those records.
All data loaded from disk is kept in an in-memory cache. All write operations update only the cache and set dirty flags, nothing touches disk until commit() is called.
Dirty flags propagate upward automatically:
-
SortIndexbecomes dirty → callsonDirty()callback →BsonCollection.markDirty() -
BsonCollectionbecomes dirty (shard, merkle, or sort index) → callsonDirty()callback →BsonDatabase.dirty = true - Each
markDirty()is idempotent, fires the upward callback only on the first transition per commit cycle
Flushes all dirty data to disk in this order:
- Write dirty shard files
- Write dirty shard merkle trees
- Write collection merkle tree (if dirty)
- Commit all sort indexes (write dirty leaf pages and tree structure)
- Update and write database merkle tree
After commit(), dirty flags are cleared but the in-memory cache remains populated. Subsequent reads are served from cache without disk I/O.
Ejects all cached data from memory. Throws if there are uncommitted changes, the caller must commit() first.
Use flush() before acquiring a write lock to ensure no stale data from a previous session is in memory.
await database.flush(); // eject stale cache before acquiring lock
await acquireWriteLock(...);
try {
await collection.insertOne(...);
await collection.updateOne(...);
await database.commit(); // write everything to disk before releasing
} finally {
await releaseWriteLock(...);
}This pattern is used in apply-database-ops.ts, import.ts, and sync.ts (all under packages/node-api/src/lib/).
Write locks prevent concurrent writes from multiple processes (e.g. two sync operations running simultaneously). Locks are file-based and scoped to a session ID. The lock implementation lives in packages/api/src/lib/write-lock.ts.
The pattern is always: flush() → acquire lock → write operations → commit() → release lock.
Files that acquire write locks (all under packages/node-api/src/lib/):
-
apply-database-ops.ts, applying metadata ops from a client -
import.ts, importing newly uploaded assets -
sync.ts, syncing between source and target databases
During sync, records from two databases may both have modifications. mergeRecords() in merge-records.ts performs a field-level merge based on timestamps: for each field, the version with the newer timestamp wins.
This is why each IInternalRecord carries metadata with per-field timestamps, it enables correct three-way merges without a conflict resolution UI.