Database Format Legacy

Photosphere Database Format (Legacy)

This document describes the legacy on-disk layout and binary formats used by the Photosphere media database. This is database version 5. The database version is stored in the version field of .db/tree.dat. For the current format (version 6), see Database-Format.md. To migrate a version 5 database to version 6, use psi upgrade.

Important: Do not modify database files manually. Use the Photosphere CLI (psi) for all operations.

1. Top-level directory layout

A database is a single root directory. All paths below are relative to that root.

| Path | Description |
| --- | --- |
| `README.md` | Auto-generated warning and usage instructions. |
| `.db/` | Database-level control and integrity data. |
| `.db/tree.dat` | Files Merkle tree: hashes of asset/display/thumb files and metadata; used for sync and verify. |
| `asset/` | Original imported media files (one file per asset, keyed by asset UUID, no extension). |
| `display/` | Display-sized derivatives (e.g. max 1000px, JPEG). One file per asset, keyed by UUID. |
| `thumb/` | Thumbnail derivatives (e.g. max 300px, JPEG). One file per asset, keyed by UUID. |
| `metadata/` | BSON database root: all structured metadata and indexes live under this prefix. |

2. BSON database under `metadata/`

Structured metadata is stored in a BSON-based layout with sharded collections and sort indexes; its root is metadata/.

2.1 Database Merkle tree

| Path | Description |
| --- | --- |
| `metadata/db.dat` | Database Merkle tree: one root hash per collection; used for replication and integrity. |

Format: versioned Merkle tree serialization (current version 5), stored without a trailing checksum.

2.2 Collections

Each collection is a directory under metadata/ (e.g. metadata/metadata for the asset metadata collection). The Photosphere app uses a single collection named metadata whose records are asset documents.

Collection directory contents

Shard files: one file per shard, named by shard ID (e.g. 0, 1, …, 96). No extension. Shard ID is md5(recordId)[0:8] % numShards (default 100 shards).
Shard Merkle trees: next to each shard file: <shardId>.dat (e.g. 96.dat). Used to build the collection Merkle tree.
Collection Merkle tree: metadata/<collectionName>/collection.dat (e.g. metadata/metadata/collection.dat). Aggregates shard root hashes.
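
As an illustration of the shard assignment rule above, here is a minimal Python sketch. The format description does not say whether `[0:8]` selects hex characters or raw bytes, nor how the record ID is encoded before hashing. This sketch assumes the first 8 hex characters of the MD5 digest of the dash-less, lowercase UUID string, parsed as an unsigned integer. `shard_id_for` is a hypothetical helper, not the Photosphere implementation.

```python
import hashlib

DEFAULT_NUM_SHARDS = 100  # default shard count stated in the text above


def shard_id_for(record_id: str, num_shards: int = DEFAULT_NUM_SHARDS) -> int:
    """Hypothetical helper: map a record ID to a shard number.

    Assumptions (not confirmed by the format description):
    - the ID is hashed as its dash-less, lowercase hex string (UTF-8), and
    - "[0:8]" means the first 8 hex characters of the MD5 hex digest.
    """
    normalized = record_id.replace("-", "").lower()
    digest_hex = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return int(digest_hex[:8], 16) % num_shards
```

Under these assumptions the result is always in the range `0` to `num_shards - 1`, which matches the shard file names described above.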

2.3 Shard file format (collection shards)

Shard files are versioned binary blobs with an optional SHA-256 checksum.

Generic serialized file layout (when checksum is enabled):

[4 bytes]: Version (uint32 LE).
[payload]: Version-specific payload.
[32 bytes]: SHA-256 checksum of version + payload.
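
The checksummed wrapper can be sketched in Python as follows. This is an illustrative reading of the layout above, not the project's code; the function names `wrap_with_checksum` and `unwrap_with_checksum` are hypothetical.

```python
import hashlib
import struct
from typing import Tuple

CHECKSUM_SIZE = 32  # SHA-256 digest length in bytes


def wrap_with_checksum(version: int, payload: bytes) -> bytes:
    """Build [4-byte version LE][payload][SHA-256 of version + payload]."""
    body = struct.pack("<I", version) + payload
    return body + hashlib.sha256(body).digest()


def unwrap_with_checksum(blob: bytes) -> Tuple[int, bytes]:
    """Verify the trailing checksum and return (version, payload)."""
    if len(blob) < 4 + CHECKSUM_SIZE:
        raise ValueError("file too short to hold version and checksum")
    body, stored = blob[:-CHECKSUM_SIZE], blob[-CHECKSUM_SIZE:]
    if hashlib.sha256(body).digest() != stored:
        raise ValueError("checksum mismatch")
    (version,) = struct.unpack_from("<I", body, 0)
    return version, body[4:]
```

A reader would typically call `unwrap_with_checksum` first and then dispatch on the returned version to pick the payload parser.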

Shard payload (version 2; version 1 is legacy, fields-only):

[4 bytes]: Record count (uint32 LE).
For each record (sorted by _id):
- [16 bytes]: Record ID as raw UUID bytes (the dash-less hex form decoded to 16 bytes).
- [BSON]: Record fields (BSON document; _id is stored separately).
- [BSON]: Metadata (version 2 only): { timestamp?, fields? } for field-level timestamps.

Record IDs are normalized to a dash-less hex UUID (32 hex characters, 16 bytes when decoded) for shard keying; on read they are formatted back to the standard UUID string.
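
A version 2 shard payload can be split into records without a BSON library, because every BSON document begins with its own total length as a little-endian int32. The sketch below assumes the version header and checksum have already been stripped, and that every record carries a metadata document (possibly empty). The names `read_bson_bytes` and `split_shard_payload` are hypothetical.

```python
import struct
import uuid
from typing import List, Tuple


def read_bson_bytes(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Slice one raw BSON document; its first int32 LE is its total length."""
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 5 or offset + length > len(buf):
        raise ValueError("invalid BSON document length")
    return buf[offset:offset + length], offset + length


def split_shard_payload(payload: bytes) -> List[Tuple[str, bytes, bytes]]:
    """Return (UUID string, raw fields BSON, raw metadata BSON) per record.

    Assumes a version 2 payload in which each record has both documents.
    """
    (count,) = struct.unpack_from("<I", payload, 0)
    offset = 4
    records = []
    for _ in range(count):
        # 16 raw bytes are formatted back to the standard dashed UUID string.
        record_id = str(uuid.UUID(bytes=payload[offset:offset + 16]))
        offset += 16
        fields, offset = read_bson_bytes(payload, offset)
        metadata, offset = read_bson_bytes(payload, offset)
        records.append((record_id, fields, metadata))
    return records
```

The raw document bytes can then be handed to any BSON decoder to obtain the record fields and timestamps.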

2.4 Sort indexes

Sort indexes live under metadata/sort_indexes/<collectionName>/<fieldName>_<direction>/ (e.g. metadata/sort_indexes/metadata/hash_asc/, metadata/sort_indexes/metadata/photoDate_desc/).

Each index directory contains:

tree.dat: B-tree metadata and node descriptors (version 2). Same versioned + checksummed wrapper as above.
<pageId>: Leaf page files; page IDs are UUIDs. Each file is a serialized page of index entries (version 1, with checksum).
build.checkpoint: Optional JSON checkpoint for incremental index builds.

tree.dat payload (version 2):

totalEntries (uint32), totalPages (uint32).
rootPageId (buffer/length-prefixed string).
fieldName, direction (buffer/length-prefixed strings).
type (uint8): 0 = none, 1 = date, 2 = string, 3 = number.
Reserved 8 bytes (uint64).
nodeCount (uint32).
For each node (by sorted pageId):
- pageId (length-prefixed buffer).
- Node: legacy 4-byte skip, BSON { keys }, children.length (uint32), child IDs (length-prefixed strings), nextLeaf, previousLeaf (length-prefixed strings).

Leaf page file payload (version 1):

Record count (uint32 LE).
For each entry:
- Record ID (length-prefixed buffer, UTF-8).
- Value (BSON { value }).
- Record fields (BSON document).

3. Files Merkle tree (`.db/tree.dat`)

A separate Merkle tree over asset/display/thumb paths and related metadata is stored at .db/tree.dat (relative to the database root).

Path: .db/tree.dat.
Content: Sort tree + Merkle tree + optional database metadata (e.g. filesImported, deletedAssetIds, isPartial).
Serialization: Same as other Merkle tree files (version 5); stored without a trailing checksum.
Leaf names: Paths like asset/<uuid>, display/<uuid>, thumb/<uuid>; leaves store content hash and metadata for verify/sync.

4. Partial vs full databases

A database can be full or partial. The layout and file formats are the same; the difference is which files are present on disk.

Full database: All asset files are stored: asset/, display/, and thumb/ each have one file per asset. The BSON metadata collection and .db/ are complete. This is the normal case after import or after a full replicate.

Partial database: Only thumb files and root-level files (e.g. README.md) are stored. The asset/ and display/ directories are missing or sparse, so original and display-sized media are not on disk. The BSON metadata under metadata/ is still complete (all asset records and indexes are present), so the catalog is intact; only the full-size and display-size binaries are omitted. Partial databases are created by replicating with the partial option (e.g. “only copy thumb directory assets”).

The partial flag is stored in the files Merkle tree: in .db/tree.dat, the database metadata has isPartial: true when the database is partial. Tools use this to:

Verify: Treat missing asset/ and display/ files as expected, not as removed or corrupt.
Sync: When syncing to a partial database, only copy thumb and root-level files; do not copy asset or display files into the partial target.
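
The verify behaviour described above can be illustrated with a small Python sketch. It is only a model of the rule, under the assumption that the caller already has the list of paths recorded in the files Merkle tree and the set of paths found on disk; `unexpected_missing` is a hypothetical name.

```python
from typing import Iterable, List, Set


def unexpected_missing(expected_paths: Iterable[str],
                       present_paths: Set[str],
                       is_partial: bool) -> List[str]:
    """Return expected paths that are absent and should be reported."""
    missing = []
    for path in expected_paths:
        if path in present_paths:
            continue
        if is_partial and path.startswith(("asset/", "display/")):
            continue  # expected to be absent in a partial database
        missing.append(path)
    return missing
```

For a partial database holding only `thumb/a`, a missing `asset/a` is ignored, while a missing `README.md` is still reported.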

So a partial database has the same directory structure and metadata as a full one, but only thumbnails (and optionally README) on disk. Missing files can be filled in lazily as required: for example, when a user browses the photo gallery, missing asset or display files can be downloaded from a remote database as they are viewed. They can also be filled in bulk via a full replicate.

5. Versioned file layout

Versioned binary files use one of two layouts:

With checksum: [4 bytes version][payload][32 bytes SHA-256(version+payload)].
Without checksum: [4 bytes version][payload] (used for Merkle tree files).

Primitives are little-endian (uint32, int32, uint64, int64); strings and buffers are length-prefixed; documents use BSON (lengths 32-bit where length-prefixed).
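
As a sketch of the length-prefixed encoding, the Python helpers below read and write a buffer preceded by a 32-bit little-endian length. The text states that lengths are 32-bit but not whether they are signed; an unsigned length counting bytes is assumed here, and both function names are hypothetical.

```python
import struct
from typing import Tuple


def read_length_prefixed(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read [uint32 LE length][data] at offset; return (data, next offset)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("length prefix runs past end of buffer")
    return buf[start:end], end


def write_length_prefixed(data: bytes) -> bytes:
    """Encode data as [uint32 LE length][data]."""
    return struct.pack("<I", len(data)) + data
```

Strings such as `fieldName` and `direction` would be decoded from the returned bytes as UTF-8.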

6. Optional encryption

The database can use an encrypted storage backend for media and BSON data. When encryption is enabled, only certain paths are encrypted; the rest stay plain.

Encrypted (when a key is provided): everything under asset/, display/, thumb/, and metadata/, plus README.md. These are read and written through a storage backend that applies encryption.

Unencrypted: the .db/ directory (e.g. tree.dat, write.lock, encryption.pub). This is always stored in the clear so the application can detect that the database is encrypted and prompt for a key without needing the key to read the directory.

Each encrypted file is stored as:

[512 bytes]: RSA-encrypted AES-256 key (decrypt with the private key to obtain the per-file symmetric key).
[16 bytes]: AES initialization vector (IV).
[remaining bytes]: Payload encrypted with AES-256-CBC using the decrypted key and IV.

The decrypted payload is the same as the unencrypted file (e.g. a versioned serialized blob or raw media).
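
The three regions of an encrypted file can be separated with plain slicing, as in the Python sketch below. It only splits the file; decryption is left out because the RSA padding scheme is not stated here. `EncryptedFileParts` and `split_encrypted_file` are hypothetical names.

```python
from typing import NamedTuple

ENCRYPTED_KEY_SIZE = 512  # RSA-encrypted AES-256 key
IV_SIZE = 16              # AES initialization vector


class EncryptedFileParts(NamedTuple):
    encrypted_key: bytes
    iv: bytes
    ciphertext: bytes


def split_encrypted_file(blob: bytes) -> EncryptedFileParts:
    """Split an encrypted file into its key, IV, and ciphertext regions."""
    header = ENCRYPTED_KEY_SIZE + IV_SIZE
    if len(blob) < header:
        raise ValueError("file too short for key and IV header")
    return EncryptedFileParts(
        encrypted_key=blob[:ENCRYPTED_KEY_SIZE],
        iv=blob[ENCRYPTED_KEY_SIZE:header],
        ciphertext=blob[header:],
    )
```

Since an RSA ciphertext is as long as the key modulus, the 512-byte key region implies a 4096-bit RSA key.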

7. Asset record shape

The metadata collection stores asset records. Main fields:

_id: UUID string.
origFileName, origPath?, contentType, width, height, hash.
coordinates?, location?, duration?, fileDate, photoDate?, uploadDate.
properties?, labels?, description?, deleted?.
micro: base64 micro thumbnail.
color: [number, number, number] (e.g. dominant color).

Sort indexes used in practice: hash (asc), photoDate (desc).

8. Summary diagram

<database root>/
  README.md
  .db/
    tree.dat              # Files Merkle tree
  asset/
    <uuid>                # Original media (no extension)
  display/
    <uuid>
  thumb/
    <uuid>
  metadata/               # BDB root
    db.dat                # Database Merkle tree
    metadata/             # "metadata" collection
      <shardId>           # Shard data (e.g. 96)
      <shardId>.dat       # Shard Merkle tree (e.g. 96.dat)
      collection.dat      # Collection Merkle tree
    sort_indexes/
      metadata/
        hash_asc/
          tree.dat
          <pageId>        # UUID-named leaf pages
        photoDate_desc/
          tree.dat
          <pageId>
