Database Code
This page describes the architecture and implementation of the packages/bdb database library used by Photosphere.
packages/bdb is the core database library for Photosphere. It provides a BSON-based document store with sharding, B-tree sort indexes, and a three-level merkle tree hierarchy for efficient sync and integrity verification.
The library is used by:
- `packages/api`: all read and write operations on photo/video metadata
- `apps/bdb-cli`: inspection and repair of database files on disk
Key source files and their roles:
| File | Purpose |
|---|---|
| `src/lib/database.ts` | `BsonDatabase`, `IBsonDatabase`: top-level entry point; owns the database merkle tree cache |
| `src/lib/collection.ts` | `BsonCollection`, `IBsonCollection`: a named collection of records; owns shard caches and collection merkle tree |
| `src/lib/shard.ts` | `BsonShard`, `IShard`: a single shard bucket; owns record map, dirty flag, and shard-level merkle ref |
| `src/lib/sort-index.ts` | `SortIndex`, `ISortIndex`: B-tree sort index for a single field; owns leaf page cache |
| `src/lib/merkle-tree.ts` | Helper functions for building, loading, and saving merkle trees at each level |
| `src/lib/merkle-tree-ref.ts` | `MerkleRef`, `IMerkleRef`: lazily-loaded, committable handle for a merkle tree; used at shard, collection, and database level |
| `src/lib/merge-records.ts` | Timestamp-based record merging used during sync |
| `src/lib/update-metadata.ts` | Updates per-field timestamp metadata during writes |
| `src/lib/update-fields.ts` | Applies partial field updates to records |
Records are stored as IInternalRecord objects on disk and exposed as IRecord to callers. The internal format carries extra metadata:
```typescript
interface IInternalRecord {
    _id: string;                  // UUID
    fields: Record<string, any>;  // the record's data fields
    metadata: IMetadata;          // timestamp metadata for sync
}
```

Each field can carry an independent timestamp so that sync can merge changes at field granularity (see Record Merging).
Records are distributed across shards based on the MD5 hash of the record's _id modulo 100. Each shard is a single binary file containing all records in that shard, serialized in BSON format.
Sharding exists because storing one file per record would create excessive filesystem overhead for large collections (100,000+ records). With 100 shards, each shard file holds ~1,000 records on average.
Shard files are stored at:
<bsonDbPath>/collections/<collectionName>/shards/<shardId>
See Database-Format.md for the on-disk file format.
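The mapping from record ID to shard file described above can be sketched as follows. This is a minimal illustration, not the library's code: `shardIdFor` and `shardPathFor` are hypothetical names, and reading the first four bytes of the MD5 digest as an unsigned integer before taking the modulo is an assumption, as is using the bare number as the `<shardId>` path segment.

```typescript
import { createHash } from "crypto";

const NUM_SHARDS = 100;

// Maps a record ID to a shard number in the range [0, NUM_SHARDS).
// Assumption: the first 4 bytes of the MD5 digest are read as an unsigned
// big-endian integer; the real library may reduce the digest differently.
function shardIdFor(recordId: string): number {
    const digest = createHash("md5").update(recordId).digest();
    return digest.readUInt32BE(0) % NUM_SHARDS;
}

// Builds a shard file path following the layout shown above.
// Assumption: the shard number is used directly as the final path segment.
function shardPathFor(bsonDbPath: string, collectionName: string, recordId: string): string {
    return `${bsonDbPath}/collections/${collectionName}/shards/${shardIdFor(recordId)}`;
}
```

Because the shard is derived only from the `_id`, a lookup by ID needs to open a single shard file rather than scan the collection.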
Sort indexes enable efficient sorted pagination without scanning all records. Each index covers one field in one direction (asc or desc) and supports the data types date, string, and number.
Internally each index is a B-tree whose leaf pages are linked in sorted order so sequential scans don't require traversing the tree. The tree is stored as:
- One tree structure file (`<collectionName>/<fieldName>-<direction>.dat`)
- One leaf page file per page (`<collectionName>/<fieldName>-<direction>/<pageId>.dat`)
Key operations:
- `ensureSortIndex(field, direction, type)`: creates the index and builds it from all existing records; a no-op if the index already exists
- `loadSortIndex(field, direction)`: loads an existing index from disk into the cache
- `getSorted(field, direction, pageId?)`: returns one page of sorted records
- `findByValue(value)`: returns all records matching an exact value
- `findByRange(min, max, options)`: returns records in a range
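To illustrate why linked leaf pages make sorted pagination cheap, here is a small in-memory sketch. The `LeafPage` shape, the `nextPageId` link, and the function names are hypothetical and chosen for illustration; the real page format and the `getSorted` return type are not shown on this page.

```typescript
// Hypothetical in-memory shape of a leaf page.
interface LeafPage {
    pageId: string;
    recordIds: string[];   // record IDs in sorted order within this page
    nextPageId?: string;   // link to the next leaf in sort order
}

interface SortedPage {
    recordIds: string[];
    nextPageId?: string;
}

// Returns one page of results. With no pageId, starts at the first leaf.
function getSortedPage(pages: Record<string, LeafPage>, firstPageId: string, pageId?: string): SortedPage {
    const id = pageId ?? firstPageId;
    const page = pages[id];
    if (!page) {
        throw new Error(`Unknown page: ${id}`);
    }
    return { recordIds: page.recordIds, nextPageId: page.nextPageId };
}

// Walks every page by following leaf links, without visiting interior tree nodes.
function scanAll(pages: Record<string, LeafPage>, firstPageId: string): string[] {
    const result: string[] = [];
    let cursor: string | undefined = firstPageId;
    while (cursor !== undefined) {
        const page: SortedPage = getSortedPage(pages, firstPageId, cursor);
        result.push(...page.recordIds);
        cursor = page.nextPageId;
    }
    return result;
}
```

A caller paginates by passing the `nextPageId` from one result as the `pageId` of the next request, so each request reads exactly one leaf page.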
Three levels of merkle trees track the state of all data for efficient sync and integrity verification:
Database merkle tree
└─ Collection merkle tree (one per collection)
└─ Shard merkle tree (one per shard)
└─ Record hashes (one per record)
When any record changes, only the affected shard, collection, and database merkle nodes need to be updated — O(log n) rather than a full rebuild.
Sync works by comparing database merkle hashes first. If they match, nothing has changed. If they differ, the sync descends the tree to find exactly which shards differ, then transfers only those records.
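The top-down comparison can be sketched as below. This is a simplified model under stated assumptions: each level is hashed as a flat, sorted list of its children rather than a full binary merkle tree, SHA-256 is an arbitrary choice of hash, and all type and function names are hypothetical.

```typescript
import { createHash } from "crypto";

type RecordHashes = Record<string, string>;   // record ID -> record hash
type Shards = Record<string, RecordHashes>;   // shard ID -> record hashes
type Collections = Record<string, Shards>;    // collection name -> shards

function sha256(text: string): string {
    return createHash("sha256").update(text).digest("hex");
}

// Hashes child hashes in key order so the result does not depend on insertion order.
function combine(children: Record<string, string>): string {
    return sha256(Object.keys(children).sort().map(key => `${key}=${children[key]}`).join(";"));
}

function shardHash(shard: RecordHashes): string {
    return combine(shard);
}

function collectionHash(shards: Shards): string {
    const children: Record<string, string> = {};
    for (const shardId of Object.keys(shards)) children[shardId] = shardHash(shards[shardId]);
    return combine(children);
}

function databaseHash(db: Collections): string {
    const children: Record<string, string> = {};
    for (const name of Object.keys(db)) children[name] = collectionHash(db[name]);
    return combine(children);
}

function unionKeys(a: object, b: object): string[] {
    const keys = Object.keys(a);
    for (const key of Object.keys(b)) if (keys.indexOf(key) < 0) keys.push(key);
    return keys.sort();
}

// Returns "<collection>/<shardId>" for every shard whose hash differs.
function findDifferingShards(a: Collections, b: Collections): string[] {
    if (databaseHash(a) === databaseHash(b)) return [];   // nothing changed
    const differing: string[] = [];
    for (const name of unionKeys(a, b)) {
        const ca: Shards = a[name] || {};
        const cb: Shards = b[name] || {};
        if (collectionHash(ca) === collectionHash(cb)) continue;   // skip unchanged collection
        for (const shardId of unionKeys(ca, cb)) {
            const ha = shardHash(ca[shardId] || {});
            const hb = shardHash(cb[shardId] || {});
            if (ha !== hb) differing.push(`${name}/${shardId}`);
        }
    }
    return differing;
}
```

Only the shards returned by `findDifferingShards` need their records compared and transferred; matching collections and shards are skipped after a single hash comparison.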
All data loaded from disk is kept in an in-memory cache. All write operations update only the cache and set dirty flags — nothing touches disk until commit() is called.
Dirty flags propagate upward automatically:
- `SortIndex` becomes dirty → calls `onDirty()` callback → `BsonCollection.markDirty()`
- `BsonCollection` becomes dirty (shard, merkle, or sort index) → calls `onDirty()` callback → `BsonDatabase.dirty = true`
- Each `markDirty()` is idempotent: it fires the upward callback only on the first transition per commit cycle
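The upward propagation above can be modelled with one small class. `DirtyNode` is a hypothetical stand-in for the dirty-tracking part of `SortIndex`, `BsonCollection`, and `BsonDatabase`; the real classes are assumed to carry much more state.

```typescript
// Notifies its parent only on the clean -> dirty transition.
class DirtyNode {
    private dirty = false;

    constructor(private readonly onDirty?: () => void) {}

    markDirty(): void {
        if (this.dirty) {
            return;   // idempotent: already dirty in this commit cycle
        }
        this.dirty = true;
        this.onDirty?.();
    }

    isDirty(): boolean {
        return this.dirty;
    }

    // Called after a commit to start a new cycle.
    clearDirty(): void {
        this.dirty = false;
    }
}

// Wire three levels together: sort index -> collection -> database.
let upwardCalls = 0;
const databaseNode = new DirtyNode();
const collectionNode = new DirtyNode(() => {
    upwardCalls++;
    databaseNode.markDirty();
});
const sortIndexNode = new DirtyNode(() => collectionNode.markDirty());

sortIndexNode.markDirty();
sortIndexNode.markDirty();   // second call does not notify the parent again
```

After the two calls, the collection callback has run once and the database node is dirty, which is the behaviour the idempotency rule is meant to guarantee.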
commit() flushes all dirty data to disk in this order:
1. Write dirty shard files
2. Write dirty shard merkle trees
3. Write collection merkle tree (if dirty)
4. Commit all sort indexes (write dirty leaf pages and tree structure)
5. Update and write database merkle tree
After commit(), dirty flags are cleared but the in-memory cache remains populated. Subsequent reads are served from cache without disk I/O.
flush() ejects all cached data from memory. It throws if there are uncommitted changes, so the caller must call commit() first.
Use flush() before acquiring a write lock to ensure no stale data from a previous session is in memory.
```typescript
await database.flush();           // eject stale cache before acquiring lock
await acquireWriteLock(...);
try {
    await collection.insertOne(...);
    await collection.updateOne(...);
    await database.commit();      // write everything to disk before releasing
} finally {
    await releaseWriteLock(...);
}
```

This pattern is used in apply-database-ops.ts, import.ts, and sync.ts.
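The commit() and flush() semantics described above can be modelled with a toy write-back cache. `CacheSketch` is hypothetical and heavily simplified: disk writes are recorded in an array, and the database merkle tree is treated like any other dirty item, whereas the real commit() updates and writes it as its final step.

```typescript
// Write order taken from the commit() steps listed above.
const COMMIT_ORDER = ["shard", "shardMerkleTree", "collectionMerkleTree", "sortIndex", "databaseMerkleTree"] as const;
type Kind = typeof COMMIT_ORDER[number];

class CacheSketch {
    private cache: Record<string, string> = {};
    private dirty: { kind: Kind; key: string }[] = [];
    readonly diskWrites: string[] = [];   // stands in for files written to disk

    // Writes touch only the cache and record a dirty entry.
    write(kind: Kind, key: string, value: string): void {
        this.cache[`${kind}:${key}`] = value;
        if (!this.dirty.some(entry => entry.kind === kind && entry.key === key)) {
            this.dirty.push({ kind, key });
        }
    }

    read(kind: Kind, key: string): string | undefined {
        return this.cache[`${kind}:${key}`];
    }

    // Flushes dirty entries in the fixed order, then clears dirty flags.
    // The cache itself stays populated.
    commit(): void {
        for (const kind of COMMIT_ORDER) {
            for (const entry of this.dirty) {
                if (entry.kind === kind) this.diskWrites.push(`${kind}:${entry.key}`);
            }
        }
        this.dirty = [];
    }

    // Ejects the cache; refuses to discard uncommitted changes.
    flush(): void {
        if (this.dirty.length > 0) {
            throw new Error("Uncommitted changes; call commit() first");
        }
        this.cache = {};
    }
}
```

In this model a read after commit() is still served from memory, and only flush() forces the next read to go back to disk.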
Write locks prevent concurrent writes from multiple processes (e.g. two sync operations running simultaneously). Locks are file-based and scoped to a session ID.
The pattern is always: flush() → acquire lock → write operations → commit() → release lock.
Files that acquire write locks:
- `packages/api/src/lib/apply-database-ops.ts`: applying metadata ops from a client
- `packages/api/src/lib/import.ts`: importing newly uploaded assets
- `packages/api/src/lib/sync.ts`: syncing between source and target databases
packages/api/src/lib/photosphere.ts provides IPsi and its concrete implementation Psi, which wrap all the lower-level database primitives into a single object.
Before IPsi, every CLI command received an IInitResult bag of loose fields (assetStorage, bsonDatabase, metadataCollection, sessionId, …) and then called free functions from media-file-database.ts and write-lock.ts by threading those fields through every call. IPsi consolidates this into one object with a clean interface.
```typescript
interface IPsi {
    database(): IBsonDatabase;               // BSON DB + collections
    files(): IMerkleRef<IDatabaseMetadata>;  // lazy handle for the files merkle tree
    metadata(): IBsonCollection<IAsset>;     // metadata collection
    acquireWriteLock(): Promise<void>;
    refreshWriteLock(): Promise<void>;
    releaseWriteLock(): Promise<void>;
    commit(): Promise<void>;
    flush(): void;
    summary(): Promise<IDatabaseSummary>;
    stream(assetId, assetType): Promise<NodeJS.ReadableStream>;
    write(assetId, assetType, contentType, buffer): Promise<void>;
    writeStream(assetId, assetType, contentType, stream, length): Promise<void>;
    remove(assetId, recordDeleted): Promise<void>;
}
```

Psi is instantiated directly and takes five parameters:

```typescript
const psi = new Psi(assetStorage, rawStorage, sessionId, uuidGenerator, timestampProvider);
```

Internally it calls createMediaFileDatabase() to obtain the BsonDatabase and metadataCollection. All callers see only the IBsonDatabase / IBsonCollection interfaces.
IInitResult in apps/cli/src/lib/init-cmd.ts now includes a psi field populated by both loadDatabase() and createDatabase(). The existing bsonDatabase and metadataCollection fields are derived from psi.database() and psi.metadata() for backward compatibility while CLI commands migrate to using psi directly.
IMerkleRef<T> and MerkleRef<T> in packages/bdb/src/lib/merkle-tree-ref.ts are generic (defaulting to T = undefined). This allows Psi.files() to return a typed IMerkleRef<IDatabaseMetadata> while existing BsonDatabase usages continue to use the default IMerkleRef<undefined> without changes.
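The effect of the default type parameter can be shown with a stripped-down stand-in. `IMerkleRefSketch`, `MerkleRefSketch`, the `metadata()` accessor, and the `filesImported` field are all hypothetical; only the `T = undefined` default comes from the description above.

```typescript
// Simplified stand-in for a generic merkle handle with a defaulted type parameter.
interface IMerkleRefSketch<T = undefined> {
    metadata(): T;
}

class MerkleRefSketch<T = undefined> implements IMerkleRefSketch<T> {
    constructor(private readonly value: T) {}

    metadata(): T {
        return this.value;
    }
}

// Hypothetical metadata shape, for illustration only.
interface IDatabaseMetadataSketch {
    filesImported: number;
}

// Existing call sites omit the type argument and get T = undefined.
const plainRef: IMerkleRefSketch = new MerkleRefSketch(undefined);

// New call sites opt in to typed metadata.
const typedRef: IMerkleRefSketch<IDatabaseMetadataSketch> = new MerkleRefSketch({ filesImported: 3 });
```

Because the default applies wherever the type argument is omitted, declarations written before the type became generic keep compiling unchanged.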
During sync, records from two databases may both have modifications. mergeRecords() in merge-records.ts performs a field-level merge based on timestamps: for each field, the version with the newer timestamp wins.
This is why each IInternalRecord carries metadata with per-field timestamps: it lets sync merge concurrent changes to different fields automatically, without a conflict resolution UI.
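A field-level, newest-timestamp-wins merge can be sketched as follows. This is not the code in merge-records.ts: `RecordSketch` flattens the metadata into a plain `timestamps` map because the real `IMetadata` shape is not shown here, ties keep the first record's value by assumption, and deletions are not handled.

```typescript
// Hypothetical simplified record with per-field last-modified times.
interface RecordSketch {
    _id: string;
    fields: Record<string, unknown>;
    timestamps: Record<string, number>;
}

// For each field, keeps the version with the newer timestamp.
function mergeRecordsSketch(a: RecordSketch, b: RecordSketch): RecordSketch {
    const merged: RecordSketch = { _id: a._id, fields: {}, timestamps: {} };

    // Union of field names from both records.
    const names = Object.keys(a.fields);
    for (const name of Object.keys(b.fields)) {
        if (names.indexOf(name) < 0) names.push(name);
    }

    for (const name of names) {
        const ta: number | undefined = a.timestamps[name];
        const tb: number | undefined = b.timestamps[name];
        // Take b when a lacks the field, or when b's timestamp is strictly newer.
        const takeB = !(name in a.fields) || (tb !== undefined && (ta === undefined || tb > ta));
        const source = takeB ? b : a;
        merged.fields[name] = source.fields[name];
        const timestamp: number | undefined = source.timestamps[name];
        if (timestamp !== undefined) merged.timestamps[name] = timestamp;
    }
    return merged;
}
```

If one database changed a record's description and the other changed its rating, the merged record keeps both edits, since each field is decided independently.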