Database Format Legacy

Ashley Davis edited this page Jun 14, 2026 · 3 revisions

Photosphere Database Format (Legacy)

This document describes the legacy on-disk layout and binary formats used by the Photosphere media database. This is database version 5. The database version is stored in the version field of .db/tree.dat. For the current format (version 6), see Database-Format.md. To migrate a version 5 database to version 6, use psi upgrade.

Important: Do not modify database files manually. Use the Photosphere CLI (psi) for all operations.


1. Top-level directory layout

A database is a single root directory. All paths below are relative to that root.

| Path | Description |
| --- | --- |
| README.md | Auto-generated warning and usage instructions. |
| .db/ | Database-level control and integrity data. |
| .db/tree.dat | Files Merkle tree: hashes of asset/display/thumb files and metadata; used for sync and verify. |
| asset/ | Original imported media files (one file per asset, keyed by asset UUID, no extension). |
| display/ | Display-sized derivatives (e.g. max 1000px, JPEG). One file per asset, keyed by UUID. |
| thumb/ | Thumbnail derivatives (e.g. max 300px, JPEG). One file per asset, keyed by UUID. |
| metadata/ | BSON database root: all structured metadata and indexes live under this prefix. |

2. BSON database under metadata/

Structured metadata is stored in a BSON-based layout with sharded collections and sort indexes; its root is metadata/.

2.1 Database Merkle tree

| Path | Description |
| --- | --- |
| metadata/db.dat | Database Merkle tree: one root hash per collection; used for replication and integrity. |

Format: versioned Merkle tree serialization (current version 5), stored without a trailing checksum.

2.2 Collections

Each collection is a directory under metadata/ (e.g. metadata/metadata for the asset metadata collection). The Photosphere app uses a single collection named metadata whose records are asset documents.

Collection directory contents

  • Shard files: one file per shard, named by shard ID (e.g. 0, 1, …, 96). No extension. Shard ID is md5(recordId)[0:8] % numShards (default 100 shards).
  • Shard Merkle trees: next to each shard file: <shardId>.dat (e.g. 96.dat). Used to build the collection Merkle tree.
  • Collection Merkle tree: metadata/<collectionName>/collection.dat (e.g. metadata/metadata/collection.dat). Aggregates shard root hashes.
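The shard ID formula above can be sketched in a few lines. This is a minimal illustration, not the Photosphere implementation: it assumes that `[0:8]` means the first 8 hex characters of the MD5 digest read as an unsigned integer, and that the record ID is hashed as a UTF-8 string exactly as passed in. The document does not state whether the dashed or the dash-free form of the UUID is hashed, so the function name `shard_id` and both assumptions are hypothetical.

```python
import hashlib


def shard_id(record_id: str, num_shards: int = 100) -> int:
    """Map a record ID to a shard number in the range [0, num_shards)."""
    # Assumption: the ID is hashed as its UTF-8 string form, as passed in.
    digest_hex = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    # Assumption: [0:8] is the first 8 hex characters, read as an integer.
    return int(digest_hex[:8], 16) % num_shards
```

Under these assumptions, the shard file for a record would be the file named after the returned number inside the collection directory, with the shard Merkle tree next to it as `<shardId>.dat`.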

2.3 Shard file format (collection shards)

Shard files are versioned binary blobs with an optional SHA-256 checksum.

Generic serialized file layout (when checksum is enabled):

  • [4 bytes]: Version (uint32 LE).
  • [payload]: Version-specific payload.
  • [32 bytes]: SHA-256 checksum of version + payload.

Shard payload (version 2; version 1 is legacy, fields-only):

  • [4 bytes]: Record count (uint32 LE).
  • For each record (sorted by _id):
    • [16 bytes]: Record ID as raw UUID bytes (the 32 hex digits of the UUID, without dashes, decoded to 16 bytes).
    • [BSON]: Record fields (BSON document; _id is stored separately).
    • [BSON]: Metadata (version 2 only): { timestamp?, fields? } for field-level timestamps.

Record IDs are normalized to the UUID's 32 hex digits without dashes (16 bytes once decoded) for shard keying; on read they are formatted back to the standard dashed UUID string.
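As a rough sketch of how the version 2 shard payload could be walked, the following Python splits a payload into records without decoding the BSON. It relies on the fact that every BSON document begins with its own total length as a little-endian int32. The function names are hypothetical, and the input is assumed to be the payload only, with the 4-byte version and the trailing checksum already removed.

```python
import struct
import uuid


def read_bson_bytes(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Slice one raw BSON document out of buf without decoding it."""
    # A BSON document starts with its own total length (int32 LE).
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 5 or offset + length > len(buf):
        raise ValueError("truncated or malformed BSON document")
    return buf[offset:offset + length], offset + length


def parse_shard_payload_v2(payload: bytes) -> list[tuple[str, bytes, bytes]]:
    """Return (UUID string, fields BSON, metadata BSON) for each record."""
    (count,) = struct.unpack_from("<I", payload, 0)
    offset = 4
    records = []
    for _ in range(count):
        # 16 raw bytes, formatted back to the dashed UUID string.
        record_id = str(uuid.UUID(bytes=payload[offset:offset + 16]))
        offset += 16
        fields, offset = read_bson_bytes(payload, offset)
        metadata, offset = read_bson_bytes(payload, offset)
        records.append((record_id, fields, metadata))
    return records
```

The raw BSON slices can then be handed to any BSON decoder. A version 1 payload would need a separate reader, since it has no per-record metadata document.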

2.4 Sort indexes

Sort indexes live under metadata/sort_indexes/<collectionName>/<fieldName>_<direction>/ (e.g. metadata/sort_indexes/metadata/hash_asc/, metadata/sort_indexes/metadata/photoDate_desc/).

Each index directory contains:

  • tree.dat: B-tree metadata and node descriptors (version 2). Same versioned + checksummed wrapper as above.
  • <pageId>: Leaf page files; page IDs are UUIDs. Each file is a serialized page of index entries (version 1, with checksum).
  • build.checkpoint: Optional JSON checkpoint for incremental index builds.

tree.dat payload (version 2):

  • totalEntries (uint32), totalPages (uint32).
  • rootPageId (buffer/length-prefixed string).
  • fieldName, direction (buffer/length-prefixed strings).
  • type (uint8): 0 = none, 1 = date, 2 = string, 3 = number.
  • Reserved 8 bytes (uint64).
  • nodeCount (uint32).
  • For each node (by sorted pageId):
    • pageId (length-prefixed buffer).
    • Node: legacy 4-byte skip, BSON { keys }, children.length (uint32), child IDs (length-prefixed strings), nextLeaf, previousLeaf (length-prefixed strings).

Leaf page file payload (version 1):

  • Record count (uint32 LE).
  • For each entry:
    • Record ID (length-prefixed buffer, UTF-8).
    • Value (BSON { value }).
    • Record fields (BSON document).
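The leaf page layout above can be read in the same spirit. This is a sketch under stated assumptions: the length prefix of the record ID is taken to be a uint32 LE (the document says length prefixes are 32-bit but does not show the exact encoding), the BSON documents are sliced by their own leading int32 length rather than decoded, and the input is the payload with the version and checksum already stripped. All function names are hypothetical.

```python
import struct


def read_prefixed(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Read a buffer preceded by an assumed uint32 LE byte length."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    if start + length > len(buf):
        raise ValueError("truncated length-prefixed buffer")
    return buf[start:start + length], start + length


def read_bson_bytes(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Slice one raw BSON document using its leading int32 LE length."""
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 5 or offset + length > len(buf):
        raise ValueError("truncated or malformed BSON document")
    return buf[offset:offset + length], offset + length


def parse_leaf_page_v1(payload: bytes) -> list[tuple[str, bytes, bytes]]:
    """Return (record ID, value BSON, fields BSON) for each index entry."""
    (count,) = struct.unpack_from("<I", payload, 0)
    offset = 4
    entries = []
    for _ in range(count):
        raw_id, offset = read_prefixed(payload, offset)
        value_doc, offset = read_bson_bytes(payload, offset)
        fields_doc, offset = read_bson_bytes(payload, offset)
        entries.append((raw_id.decode("utf-8"), value_doc, fields_doc))
    return entries
```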

3. Files Merkle tree (.db/tree.dat)

A separate Merkle tree over asset/display/thumb paths and related metadata is stored at .db/tree.dat (relative to the database root).

  • Path: .db/tree.dat.
  • Content: Sort tree + Merkle tree + optional database metadata (e.g. filesImported, deletedAssetIds, isPartial).
  • Serialization: Same as other Merkle tree files (version 5); stored without a trailing checksum.
  • Leaf names: Paths like asset/<uuid>, display/<uuid>, thumb/<uuid>; leaves store content hash and metadata for verify/sync.
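Because .db/tree.dat carries the database version, a read-only tool can check the format before touching anything else. The sketch below assumes that the "version field" is the leading 4-byte uint32 LE of the versioned layout used by Merkle tree files; the document implies this but does not say so outright. The function name `read_database_version` is hypothetical.

```python
import struct
from pathlib import Path


def read_database_version(db_root: str) -> int:
    """Read the leading uint32 LE version from <db_root>/.db/tree.dat."""
    tree_path = Path(db_root) / ".db" / "tree.dat"
    with tree_path.open("rb") as f:
        header = f.read(4)
    if len(header) != 4:
        raise ValueError("tree.dat is too short to contain a version")
    (version,) = struct.unpack("<I", header)
    return version
```

A result of 5 would indicate the legacy format described on this page. The function only reads the file, in keeping with the warning not to modify database files by hand.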

4. Partial vs full databases

A database can be full or partial. The layout and file formats are the same; the difference is which files are present on disk.

Full database: All asset files are stored: asset/, display/, and thumb/ each have one file per asset. The BSON metadata collection and .db/ are complete. This is the normal case after import or after a full replicate.

Partial database: Only thumb files and root-level files (e.g. README.md) are stored. The asset/ and display/ directories are missing or sparse, so original and display-sized media are not on disk. The BSON metadata under metadata/ is still complete (all asset records and indexes are present), so the catalog is intact; only the full-size and display-size binaries are omitted. Partial databases are created by replicating with the partial option (e.g. “only copy thumb directory assets”).

The partial flag is stored in the files Merkle tree: in .db/tree.dat, the database metadata has isPartial: true when the database is partial. Tools use this to:

  • Verify: Treat missing asset/ and display/ files as expected, not as removed or corrupt.
  • Sync: When syncing to a partial database, only copy thumb and root-level files; do not copy asset or display files into the partial target.

So a partial database has the same directory structure and metadata as a full one, but only thumbnails (and optionally README) on disk. Missing files can be filled in lazily as required. For example, when a user browses the photo gallery, missing asset or display files can be downloaded from a remote database as they are viewed. They can also be fetched in bulk via a full replicate.
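The verify and sync rules for partial databases amount to two small predicates over paths in the files Merkle tree. The sketch below is only an illustration of those rules as described above; the function names are hypothetical, and how the real tools read isPartial from .db/tree.dat is not shown.

```python
def is_missing_file_expected(path: str, is_partial: bool) -> bool:
    """True if a missing file is normal rather than a sign of damage."""
    top_level = path.split("/", 1)[0]
    # In a partial database, asset/ and display/ files may be absent.
    return is_partial and top_level in ("asset", "display")


def should_copy_to_target(path: str, target_is_partial: bool) -> bool:
    """Decide whether sync copies this file into the target database."""
    if not target_is_partial:
        return True
    # A partial target receives only thumb/ files and root-level files.
    is_root_level = "/" not in path
    return is_root_level or path.startswith("thumb/")
```

For instance, `is_missing_file_expected("asset/<uuid>", True)` is true, while the same missing path in a full database would be reported by verify.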


5. Versioned file layout

Versioned binary files use one of two layouts:

  • With checksum: [4 bytes version][payload][32 bytes SHA-256(version+payload)].
  • Without checksum: [4 bytes version][payload] (used for Merkle tree files).

Primitives are little-endian (uint32, int32, uint64, int64). Strings and buffers are prefixed with a 32-bit length, and documents use BSON.
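The two layouts can be unwrapped with one small helper. This is a minimal sketch, assuming the checksum is the SHA-256 of the raw version bytes followed by the payload bytes, exactly as they appear in the file. The name `unwrap_versioned` is hypothetical.

```python
import hashlib
import struct


def unwrap_versioned(blob: bytes, has_checksum: bool) -> tuple[int, bytes]:
    """Split a versioned file into (version, payload), verifying if asked."""
    minimum = 4 + (32 if has_checksum else 0)
    if len(blob) < minimum:
        raise ValueError("file too short for the versioned layout")
    (version,) = struct.unpack_from("<I", blob, 0)
    if not has_checksum:
        # Merkle tree files: [4 bytes version][payload]
        return version, blob[4:]
    # Checksummed files: [4 bytes version][payload][32 bytes SHA-256]
    body, stored = blob[:-32], blob[-32:]
    if hashlib.sha256(body).digest() != stored:
        raise ValueError("checksum mismatch")
    return version, body[4:]
```

Shard files and sort index files would be read with `has_checksum=True`, and Merkle tree files such as .db/tree.dat with `has_checksum=False`.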


6. Optional encryption

The database can use an encrypted storage backend for media and BSON data. When encryption is enabled, only certain paths are encrypted; the rest stay plain.

Encrypted (when a key is provided): everything under asset/, display/, thumb/, and metadata/, plus README.md. These are read and written through a storage backend that applies encryption.

Unencrypted: the .db/ directory (e.g. tree.dat, write.lock, encryption.pub). This is always stored in the clear so the application can detect that the database is encrypted and prompt for a key without needing the key to read the directory.

Each encrypted file is stored as:

  • [512 bytes]: RSA-encrypted AES-256 key (decrypt with the private key to obtain the per-file symmetric key).
  • [16 bytes]: AES initialization vector (IV).
  • [remaining bytes]: Payload encrypted with AES-256-CBC using the decrypted key and IV.

The decrypted payload is the same as the unencrypted file (e.g. a versioned serialized blob or raw media).
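To make the byte offsets concrete, the following sketch splits an encrypted file into its three parts. It deliberately stops short of decryption: the document does not state the RSA padding scheme or the AES padding mode, so those steps are left out rather than guessed. The function name `split_encrypted_file` is hypothetical.

```python
RSA_KEY_BLOCK_SIZE = 512  # RSA-encrypted AES-256 key
IV_SIZE = 16              # AES initialization vector


def split_encrypted_file(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (encrypted AES key, IV, AES-256-CBC ciphertext)."""
    header_size = RSA_KEY_BLOCK_SIZE + IV_SIZE
    if len(blob) < header_size:
        raise ValueError("file too short to hold the encryption header")
    encrypted_key = blob[:RSA_KEY_BLOCK_SIZE]
    iv = blob[RSA_KEY_BLOCK_SIZE:header_size]
    ciphertext = blob[header_size:]
    return encrypted_key, iv, ciphertext
```

After the key block is decrypted with the private key, the resulting AES key and the IV would be used to decrypt the ciphertext, which yields the same bytes as the unencrypted file.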


7. Asset record shape

The metadata collection stores asset records. Main fields:

  • _id: UUID string.
  • origFileName, origPath?, contentType, width, height, hash.
  • coordinates?, location?, duration?, fileDate, photoDate?, uploadDate.
  • properties?, labels?, description?, deleted?.
  • micro: base64 micro thumbnail.
  • color: [number, number, number] (e.g. dominant color).

Sort indexes used in practice: hash (asc), photoDate (desc).


8. Summary diagram

<database root>/
  README.md
  .db/
    tree.dat              # Files Merkle tree
  asset/
    <uuid>                # Original media (no extension)
  display/
    <uuid>
  thumb/
    <uuid>
  metadata/               # BSON database root
    db.dat                # Database Merkle tree
    metadata/             # "metadata" collection
      <shardId>           # Shard data (e.g. 96)
      <shardId>.dat       # Shard Merkle tree (e.g. 96.dat)
      collection.dat      # Collection Merkle tree
    sort_indexes/
      metadata/
        hash_asc/
          tree.dat
          <pageId>        # UUID-named leaf pages
        photoDate_desc/
          tree.dat
          <pageId>
