Database Format Legacy
This document describes the legacy on-disk layout and binary formats used by the Photosphere media database. This is database version 5. The database version is stored in the version field of .db/tree.dat. For the current format (version 6), see Database-Format.md. To migrate a version 5 database to version 6, use psi upgrade.
Important: Do not modify database files manually. Use the Photosphere CLI (psi) for all operations.
A database is a single root directory. All paths below are relative to that root.
| Path | Description |
|---|---|
| `README.md` | Auto-generated warning and usage instructions. |
| `.db/` | Database-level control and integrity data. |
| `.db/tree.dat` | Files Merkle tree: hashes of asset/display/thumb files and metadata; used for sync and verify. |
| `asset/` | Original imported media files (one file per asset, keyed by asset UUID, no extension). |
| `display/` | Display-sized derivatives (e.g. max 1000px, JPEG). One file per asset, keyed by UUID. |
| `thumb/` | Thumbnail derivatives (e.g. max 300px, JPEG). One file per asset, keyed by UUID. |
| `metadata/` | BSON database root: all structured metadata and indexes live under this prefix. |
Structured metadata is stored in a BSON-based layout with sharded collections and sort indexes; its root is metadata/.
| Path | Description |
|---|---|
| `metadata/db.dat` | Database Merkle tree: one root hash per collection; used for replication and integrity. Format: versioned Merkle tree serialization (current version 5), stored without a trailing checksum. |
Each collection is a directory under metadata/ (e.g. metadata/metadata for the asset metadata collection). The Photosphere app uses a single collection named metadata whose records are asset documents.
- Shard files: one file per shard, named by shard ID (e.g. `0`, `1`, …, `96`). No extension. Shard ID is `md5(recordId)[0:8] % numShards` (default 100 shards).
- Shard Merkle trees: next to each shard file: `<shardId>.dat` (e.g. `96.dat`). Used to build the collection Merkle tree.
- Collection Merkle tree: `metadata/<collectionName>/collection.dat` (e.g. `metadata/metadata/collection.dat`). Aggregates shard root hashes.
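The shard ID formula above can be sketched as follows. This is an illustration, not the Photosphere implementation: it assumes that the record ID is hashed as a UTF-8 string and that `[0:8]` means the first 8 hex characters of the MD5 digest (the first 8 bytes is another possible reading). The function names `shard_id` and `shard_paths` are hypothetical.

```python
import hashlib


def shard_id(record_id: str, num_shards: int = 100) -> int:
    """Map a record ID to a shard number in [0, num_shards).

    ASSUMPTION: the ID is hashed as UTF-8 text and "[0:8]" selects the
    first 8 hex characters of the MD5 digest, read as an unsigned integer.
    """
    digest_hex = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    return int(digest_hex[:8], 16) % num_shards


def shard_paths(collection: str, record_id: str, num_shards: int = 100):
    """Return (shard file path, shard Merkle tree path) relative to the database root."""
    sid = shard_id(record_id, num_shards)
    base = f"metadata/{collection}"
    return f"{base}/{sid}", f"{base}/{sid}.dat"
```

Under these assumptions, every record of the `metadata` collection lands in one of the numbered files under `metadata/metadata/`, and the same ID always maps to the same shard.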
Shard files are versioned binary blobs with an optional SHA-256 checksum.
Generic serialized file layout (when checksum is enabled):
- `[4 bytes]`: Version (uint32 LE).
- `[payload]`: Version-specific payload.
- `[32 bytes]`: SHA-256 checksum of `version + payload`.
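A minimal sketch of this wrapper, using only the Python standard library. It assumes the checksum is computed over the raw 4 version bytes immediately followed by the payload bytes; the helper names `wrap` and `unwrap` are hypothetical and not part of Photosphere.

```python
import hashlib
import struct

CHECKSUM_SIZE = 32  # SHA-256 digest length in bytes


def wrap(version: int, payload: bytes) -> bytes:
    """Serialize as [4-byte version, uint32 LE][payload][SHA-256(version + payload)]."""
    body = struct.pack("<I", version) + payload
    return body + hashlib.sha256(body).digest()


def unwrap(blob: bytes):
    """Verify the trailing checksum and return (version, payload)."""
    if len(blob) < 4 + CHECKSUM_SIZE:
        raise ValueError("file too short for version header and checksum")
    body, checksum = blob[:-CHECKSUM_SIZE], blob[-CHECKSUM_SIZE:]
    if hashlib.sha256(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    (version,) = struct.unpack_from("<I", body, 0)
    return version, body[4:]
```

A reader following this sketch rejects a file as soon as the checksum fails, before looking at the version-specific payload.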
Shard payload (version 2; version 1 is legacy, fields-only):
- `[4 bytes]`: Record count (uint32 LE).
- For each record (sorted by `_id`):
  - `[16 bytes]`: Record ID as raw UUID bytes (no dashes, hex-decoded to 16 bytes).
  - `[BSON]`: Record fields (BSON document; `_id` is stored separately).
  - `[BSON]`: Metadata (version 2 only): `{ timestamp?, fields? }` for field-level timestamps.
Record IDs are normalized to 16 raw bytes (the UUID's 32 hex digits with dashes removed, then hex-decoded) for shard keying; on read they are formatted back to the standard UUID string.
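The record loop can be sketched as below. Two assumptions are not confirmed by this document: that each BSON document is written back to back with no extra length prefix (a BSON document is self-delimiting, since it starts with its own int32 LE total length), and that the payload has already been stripped of the version header and checksum. The documents are returned as raw bytes rather than decoded, so no BSON library is needed. `parse_shard_payload` and `read_bson_raw` are hypothetical names.

```python
import struct
import uuid


def read_bson_raw(buf: bytes, offset: int):
    """Slice one BSON document out of buf without decoding it.

    Per the BSON spec, a document begins with an int32 LE total length
    that includes the 4 length bytes and the trailing NUL terminator.
    """
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 5 or offset + length > len(buf):
        raise ValueError("invalid BSON document length")
    return buf[offset:offset + length], offset + length


def parse_shard_payload(payload: bytes, version: int = 2):
    """Return a list of (uuid_string, fields_bson, metadata_bson_or_None)."""
    (count,) = struct.unpack_from("<I", payload, 0)
    offset = 4
    records = []
    for _ in range(count):
        # 16 raw bytes on disk, formatted back to the dashed UUID string.
        record_id = str(uuid.UUID(bytes=payload[offset:offset + 16]))
        offset += 16
        fields, offset = read_bson_raw(payload, offset)
        metadata = None
        if version >= 2:  # version 1 is fields-only
            metadata, offset = read_bson_raw(payload, offset)
        records.append((record_id, fields, metadata))
    return records
```

The reverse direction of the ID normalization is `uuid.UUID(record_id).bytes`, which drops the dashes and hex-decodes the remaining 32 digits.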
Sort indexes live under metadata/sort_indexes/<collectionName>/<fieldName>_<direction>/ (e.g. metadata/sort_indexes/metadata/hash_asc/, metadata/sort_indexes/metadata/photoDate_desc/).
Each index directory contains:
- `tree.dat`: B-tree metadata and node descriptors (version 2). Same versioned + checksummed wrapper as above.
- `<pageId>`: Leaf page files; page IDs are UUIDs. Each file is a serialized page of index entries (version 1, with checksum).
- `build.checkpoint`: Optional JSON checkpoint for incremental index builds.
tree.dat payload (version 2):
- `totalEntries` (uint32), `totalPages` (uint32).
- `rootPageId` (buffer/length-prefixed string).
- `fieldName`, `direction` (buffer/length-prefixed strings).
- `type` (uint8): 0 = none, 1 = date, 2 = string, 3 = number.
- Reserved 8 bytes (uint64).
- `nodeCount` (uint32).
- For each node (by sorted pageId):
  - `pageId` (length-prefixed buffer).
  - Node: legacy 4-byte skip, BSON `{ keys }`, `children.length` (uint32), child IDs (length-prefixed strings), `nextLeaf`, `previousLeaf` (length-prefixed strings).
Leaf page file payload (version 1):
- Record count (uint32 LE).
- For each entry:
  - Record ID (length-prefixed buffer, UTF-8).
  - Value (BSON `{ value }`).
  - Record fields (BSON document).
A separate Merkle tree over asset/display/thumb paths and related metadata is stored at .db/tree.dat (relative to the database root).
- Path: `.db/tree.dat`.
- Content: Sort tree + Merkle tree + optional database metadata (e.g. `filesImported`, `deletedAssetIds`, `isPartial`).
- Serialization: Same as other Merkle tree files (version 5); stored without a trailing checksum.
- Leaf names: Paths like `asset/<uuid>`, `display/<uuid>`, `thumb/<uuid>`; leaves store content hash and metadata for verify/sync.
A database can be full or partial. The layout and file formats are the same; the difference is which files are present on disk.
Full database: All asset files are stored: asset/, display/, and thumb/ each have one file per asset. The BSON metadata collection and .db/ are complete. This is the normal case after import or after a full replicate.
Partial database: Only thumb files and root-level files (e.g. README.md) are stored. The asset/ and display/ directories are missing or sparse, so original and display-sized media are not on disk. The BSON metadata under metadata/ is still complete (all asset records and indexes are present), so the catalog is intact; only the full-size and display-size binaries are omitted. Partial databases are created by replicating with the partial option (e.g. “only copy thumb directory assets”).
The partial flag is stored in the files Merkle tree: in .db/tree.dat, the database metadata has isPartial: true when the database is partial. Tools use this to:
- Verify: Treat missing `asset/` and `display/` files as expected, not as removed or corrupt.
- Sync: When syncing to a partial database, only copy thumb and root-level files; do not copy asset or display files into the partial target.
So a partial database has the same directory structure and metadata as a full one, but only thumbnails (and optionally README.md) on disk. Missing files can be filled in lazily as required: for example, when a user browses the photo gallery, missing asset or display files can be downloaded from a remote database as they are viewed. They can also be fetched in bulk via a full replicate.
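The verify and sync rules for partial databases can be sketched as two small predicates over files-tree leaf paths (`asset/<uuid>`, `display/<uuid>`, `thumb/<uuid>`, and root-level files). This is an illustration of the rules as stated here, not the actual psi logic; the function and enum names are hypothetical.

```python
from enum import Enum


class MissingStatus(Enum):
    EXPECTED = "expected"  # absence is normal for this database
    PROBLEM = "problem"    # absence means the file was removed or is corrupt


# Directories that a partial database is allowed to omit.
OMITTED_IN_PARTIAL = ("asset/", "display/")


def classify_missing(path: str, is_partial: bool) -> MissingStatus:
    """Verify rule: how to treat a file listed in the tree but absent on disk."""
    if is_partial and path.startswith(OMITTED_IN_PARTIAL):
        return MissingStatus.EXPECTED
    return MissingStatus.PROBLEM


def should_copy_to_target(path: str, target_is_partial: bool) -> bool:
    """Sync rule: asset and display files are never copied into a partial target."""
    return not (target_is_partial and path.startswith(OMITTED_IN_PARTIAL))
```

In this sketch the `is_partial` argument stands for the `isPartial` flag read from the database metadata in `.db/tree.dat`.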
Versioned binary files use one of two layouts:
- With checksum: `[4 bytes version][payload][32 bytes SHA-256(version+payload)]`.
- Without checksum: `[4 bytes version][payload]` (used for Merkle tree files).
Primitives are little-endian (uint32, int32, uint64, int64); strings and buffers are length-prefixed; documents use BSON (lengths 32-bit where length-prefixed).
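As an illustration of the primitives, the sketch below reads and writes a length-prefixed string. It assumes the prefix is an unsigned 32-bit little-endian byte count and that strings are UTF-8 encoded; the signedness of the prefix is not stated here, and the helper names are hypothetical.

```python
import struct


def write_lp_string(value: str) -> bytes:
    """Encode a string as [uint32 LE byte length][UTF-8 bytes]."""
    data = value.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def read_lp_buffer(buf: bytes, offset: int):
    """Read one length-prefixed buffer; return (bytes, next offset)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("length prefix runs past end of buffer")
    return buf[start:end], end


def read_lp_string(buf: bytes, offset: int):
    """Read one length-prefixed UTF-8 string; return (str, next offset)."""
    raw, offset = read_lp_buffer(buf, offset)
    return raw.decode("utf-8"), offset
```

Returning the next offset from each reader lets consecutive fields, such as `fieldName` followed by `direction`, be read in sequence from one buffer.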
The database can use an encrypted storage backend for media and BSON data. When encryption is enabled, only certain paths are encrypted; the rest stay plain.
Encrypted (when a key is provided): everything under asset/, display/, thumb/, and metadata/, plus README.md. These are read and written through a storage backend that applies encryption.
Unencrypted: the .db/ directory (e.g. tree.dat, write.lock, encryption.pub). This is always stored in the clear so the application can detect that the database is encrypted and prompt for a key without needing the key to read the directory.
Each encrypted file is stored as:
- [512 bytes]: RSA-encrypted AES-256 key (decrypt with the private key to obtain the per-file symmetric key).
- [16 bytes]: AES initialization vector (IV).
- [remaining bytes]: Payload encrypted with AES-256-CBC using the decrypted key and IV.
The decrypted payload is the same as the unencrypted file (e.g. a versioned serialized blob or raw media).
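The three regions of an encrypted file can be separated with plain slicing, as sketched below. Decryption itself is left out on purpose: this document does not state the RSA padding scheme or the AES padding mode, so any choice would be a guess. The 512-byte key block is consistent with a 4096-bit RSA key. `split_encrypted_file` and `EncryptedFileParts` are hypothetical names.

```python
from typing import NamedTuple

RSA_KEY_BLOCK_SIZE = 512  # RSA-encrypted AES-256 key
IV_SIZE = 16              # AES initialization vector
AES_BLOCK_SIZE = 16       # CBC ciphertext is a whole number of blocks


class EncryptedFileParts(NamedTuple):
    encrypted_key: bytes
    iv: bytes
    ciphertext: bytes


def split_encrypted_file(blob: bytes) -> EncryptedFileParts:
    """Split an encrypted file into [512-byte key][16-byte IV][ciphertext]."""
    header_size = RSA_KEY_BLOCK_SIZE + IV_SIZE
    if len(blob) < header_size:
        raise ValueError("file too short to contain key block and IV")
    ciphertext = blob[header_size:]
    if len(ciphertext) % AES_BLOCK_SIZE != 0:
        raise ValueError("ciphertext length is not a multiple of the AES block size")
    return EncryptedFileParts(
        encrypted_key=blob[:RSA_KEY_BLOCK_SIZE],
        iv=blob[RSA_KEY_BLOCK_SIZE:header_size],
        ciphertext=ciphertext,
    )
```

After the per-file key is recovered with the private key and the ciphertext is decrypted, the result is the ordinary unencrypted file content described in the rest of this document.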
The metadata collection stores asset records. Main fields:
- `_id`: UUID string.
- `origFileName`, `origPath?`, `contentType`, `width`, `height`, `hash`.
- `coordinates?`, `location?`, `duration?`, `fileDate`, `photoDate?`, `uploadDate`.
- `properties?`, `labels?`, `description?`, `deleted?`.
- `micro`: base64 micro thumbnail.
- `color`: `[number, number, number]` (e.g. dominant color).
Sort indexes used in practice: hash (asc), photoDate (desc).
    <database root>/
      README.md
      .db/
        tree.dat                 # Files Merkle tree
      asset/
        <uuid>                   # Original media (no extension)
      display/
        <uuid>
      thumb/
        <uuid>
      metadata/                  # BDB root
        db.dat                   # Database Merkle tree
        metadata/                # "metadata" collection
          <shardId>              # Shard data (e.g. 96)
          <shardId>.dat          # Shard Merkle tree (e.g. 96.dat)
          collection.dat         # Collection Merkle tree
        sort_indexes/
          metadata/
            hash_asc/
              tree.dat
              <pageId>           # UUID-named leaf pages
            photoDate_desc/
              tree.dat
              <pageId>