Database Format

Photosphere Database Format

This document describes the on-disk layout and binary formats used by the Photosphere media database (current format).

Database version: The database version is determined by the version field in .db/files.dat. The current format is version 6. The legacy format is version 5 (see Database-Format-Legacy.md). Most psi commands only work with version 6. The command psi upgrade migrates a database from older versions (including legacy version 5) to version 6.

Important: Do not modify database files manually. Use the Photosphere CLI (psi) for all operations.

1. Top-level directory layout

A database is a single root directory. All paths below are relative to that root.

| Path | Description |
| --- | --- |
| `README.md` | Auto-generated warning and usage instructions. |
| `.db/` | Database-level control and integrity data; see below. |
| `asset/` | Original imported media files (one file per asset, keyed by asset UUID, no extension). |
| `display/` | Display-sized derivatives (e.g. max 1000px, JPEG). One file per asset, keyed by UUID. |
| `thumb/` | Thumbnail derivatives (e.g. max 300px, JPEG). One file per asset, keyed by UUID. |

The .db/ directory contains all control and structured data: the BSON database, files Merkle tree, config (including origin), and lock/marker files. Media directories (asset/, display/, thumb/) contain only binary blobs (one per asset).

Example database structure

<database root>/
  README.md
  .db/
    files.dat             # Files Merkle tree (versioned + type + checksum, then encrypted)
    config.json           # Config (origin, etc.) (optional)
    write.lock
    encryption.pub        # Optional encryption marker
    bson/                 # BSON database root
      db.dat              # Database Merkle tree
      collections/
        metadata/         # "metadata" collection
          shards/
            <shardId>     # Shard data
            <shardId>.dat # Shard Merkle tree
          collection.dat  # Collection Merkle tree
      indexes/
        metadata/
          hash_asc/
            tree.dat
            <pageId>      # UUID-named leaf pages
          photoDate_desc/
            tree.dat
            <pageId>
  asset/
    <uuid>                # Original media (encrypted)
  display/
    <uuid>                # Display media.
  thumb/
    <uuid>                # Thumbnail media.

2. The `.db/` directory

| Path | Description |
| --- | --- |
| `.db/bson/` | BSON database root: all structured metadata and indexes (see §3). |
| `.db/files.dat` | Files Merkle tree: hashes of asset/display/thumb files and database metadata; used for sync and verify. |
| `.db/config.json` | Configuration file: JSON object with an `origin` field (path to the database this copy was replicated from) and room for other settings (see §6). Used for sync, repair, and fulfilling missing files. |
| `.db/write.lock` | Write lock (when held). |
| `.db/encryption.pub` | Optional marker: copy of the public key used for encryption (enables “this DB is encrypted” detection). |

All files under .db/ (and everywhere else) are stored in the encrypted file format when the database is encrypted (see §5). Serialized files under .db/ use the versioned serialized format (version, type, payload, checksum) before encryption (see §4).

3. BSON database under `.db/bson/`

Structured metadata is stored in a BSON-based layout with sharded collections and sort indexes; its root is .db/bson/.

Summary of contents under .db/bson/:

| Path | Description |
| --- | --- |
| `db.dat` | Database Merkle tree: one root hash per collection (see §3.1). |
| `collections/` | One subdirectory per collection; each contains a `shards/` subdirectory (shard files and shard Merkle trees `<shardId>.dat`) and `collection.dat` at the collection root (see §3.2, §3.3). |
| `indexes/` | One subdirectory per index, named `<collectionName>/<fieldName>_<direction>/`; each contains B-tree metadata and leaf page files (see §3.4). |

3.1 Database Merkle tree

| Path | Description |
| --- | --- |
| `.db/bson/db.dat` | Database Merkle tree: one root hash per collection; used for replication and integrity. |

Format: versioned serialized file (version, type, payload, checksum). Payload is the Merkle tree serialization (e.g. current tree version 5).

3.2 Collections

Each collection is a directory under .db/bson/collections/ (e.g. .db/bson/collections/metadata for the asset metadata collection). The Photosphere app uses a single collection named metadata whose records are asset documents.

Collection directory contents

shards/ — Subdirectory containing all shard data for the collection:
- Shard files — one file per shard, named by shard ID (e.g. 0, 1, …, 96). No extension. Shard ID is md5(recordId)[0:8] % numShards (default 100 shards). Path: .db/bson/collections/<collectionName>/shards/<shardId>.
- Shard Merkle trees — next to each shard file: <shardId>.dat (e.g. 96.dat). Used to build the collection Merkle tree. Path: .db/bson/collections/<collectionName>/shards/<shardId>.dat.
Collection Merkle tree — .db/bson/collections/<collectionName>/collection.dat (e.g. .db/bson/collections/metadata/collection.dat), at the collection root. Aggregates shard root hashes.
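The shard ID rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the record ID is hashed in its UTF-8 string form and that `[0:8]` means the first 8 hex characters of the MD5 digest. The function names `shard_id` and `shard_paths` are hypothetical.

```python
import hashlib

DEFAULT_NUM_SHARDS = 100  # default shard count stated in this document


def shard_id(record_id: str, num_shards: int = DEFAULT_NUM_SHARDS) -> int:
    """Map a record ID to a shard number in [0, num_shards)."""
    # Assumption: the ID is hashed as its UTF-8 string form, and "[0:8]"
    # means the first 8 hex characters (32 bits) of the MD5 digest.
    digest_hex = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    return int(digest_hex[:8], 16) % num_shards


def shard_paths(collection: str, record_id: str) -> tuple[str, str]:
    """Return (shard file path, shard Merkle tree path) relative to the root."""
    base = f".db/bson/collections/{collection}/shards/{shard_id(record_id)}"
    return base, base + ".dat"
```

Because the mapping depends only on the record ID and the shard count, every reader and writer would locate the same shard file as long as both use the same `numShards`.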

3.3 Shard file format (collection shards)

A shard is a file that holds many collection records; records are distributed across shards by shard ID (see §3.2). Shard files use the versioned serialized format (see §4): version, type, payload, then SHA-256 checksum. The payload (e.g. version 2) is:

[4 bytes] — Record count (uint32 LE).
For each record (sorted by _id):
- [16 bytes] — Record ID as raw UUID bytes (no dashes, 16 bytes hex decoded).
- [BSON] — Record fields (BSON document; _id is stored separately).
- [BSON] — Metadata: { timestamp?, fields? } for field-level timestamps.

Record IDs are normalized to 32 hex characters (the UUID without dashes), which decode to the 16 raw bytes stored in the shard; on read they are formatted back to the standard dashed UUID string.
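To make the payload layout concrete, here is a hedged sketch of a reader for the shard payload (the bytes between the type code and the checksum). It does not decode BSON; it relies only on the fact that every BSON document begins with its own total length as a little-endian int32. The names `read_bson_bytes` and `parse_shard_payload` are hypothetical.

```python
import struct
import uuid


def read_bson_bytes(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Slice one raw BSON document out of buf; return (document, next offset)."""
    # A BSON document starts with its own total length as an int32 LE.
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 5 or offset + length > len(buf):
        raise ValueError("truncated or invalid BSON document")
    return buf[offset:offset + length], offset + length


def parse_shard_payload(payload: bytes) -> list[tuple[str, bytes, bytes]]:
    """Return (record ID, raw fields BSON, raw metadata BSON) per record."""
    (count,) = struct.unpack_from("<I", payload, 0)
    offset = 4
    records = []
    for _ in range(count):
        # 16 raw bytes, formatted back to the dashed UUID string.
        record_id = str(uuid.UUID(bytes=payload[offset:offset + 16]))
        offset += 16
        fields, offset = read_bson_bytes(payload, offset)
        metadata, offset = read_bson_bytes(payload, offset)
        records.append((record_id, fields, metadata))
    return records
```

The raw BSON slices could then be handed to any BSON decoder; the sketch leaves them undecoded so that it stays dependency-free.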

3.4 Sort indexes

Sort indexes live under .db/bson/indexes/<collectionName>/<fieldName>_<direction>/ (e.g. .db/bson/indexes/metadata/hash_asc/, .db/bson/indexes/metadata/photoDate_desc/), separate from the collection data under .db/bson/collections/<collectionName>/. Each index orders collection records by the indexed field’s value in the given direction (asc or desc). The index type (date, string, number) determines how values are compared for ordering: dates as timestamps, strings lexicographically, numbers numerically. The B-tree’s keys are these values; leaf pages hold index entries (record ID, value, and a copy of the record’s fields) in sorted order, so pagination and range queries can be served without scanning the whole collection.
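The ordering rules can be illustrated with a small sketch. It only shows how values of each type might be compared and how the index directory name is formed; it is not a B-tree, and the helper names (`index_dir`, `sort_key`, `order_entries`) are hypothetical.

```python
from datetime import datetime, timezone


def index_dir(collection: str, field: str, direction: str) -> str:
    """Directory of a sort index, relative to the database root."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return f".db/bson/indexes/{collection}/{field}_{direction}/"


def sort_key(value, value_type: str):
    """Convert an indexed value to something directly comparable."""
    if value_type == "date":
        return value.timestamp()  # dates compare as timestamps
    if value_type == "string":
        return str(value)  # lexicographic (here: Python code point order)
    if value_type == "number":
        return float(value)  # numeric comparison
    raise ValueError(f"unsupported index type {value_type!r}")


def order_entries(entries, value_type: str, direction: str):
    """Sort (record_id, value) pairs the way an index of this type would."""
    return sorted(entries, key=lambda e: sort_key(e[1], value_type),
                  reverse=(direction == "desc"))


# Usage: order two records by photoDate, newest first.
entries = [
    ("a", datetime(2020, 1, 1, tzinfo=timezone.utc)),
    ("b", datetime(2023, 6, 1, tzinfo=timezone.utc)),
]
newest_first = order_entries(entries, "date", "desc")
```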

Each index directory contains:

tree.dat — B-tree metadata and node descriptors. Versioned serialized format with type and checksum.
<pageId> — Leaf page files; page IDs are UUIDs. Each file is a versioned serialized page of index entries.
build.checkpoint — Optional JSON checkpoint for incremental index builds (also stored in encrypted form if the DB is encrypted).

tree.dat payload (version 2):

totalEntries (uint32), totalPages (uint32).
rootPageId (buffer/length-prefixed string).
fieldName, direction (buffer/length-prefixed strings).
type (uint8): 0 = none, 1 = date, 2 = string, 3 = number.
Reserved 8 bytes (uint64).
nodeCount (uint32).
For each node (by sorted pageId): pageId, node (keys, children, nextLeaf, previousLeaf), etc.

Leaf page file payload (version 1):

Record count (uint32 LE).
For each entry: record ID (length-prefixed buffer, UTF-8), value (BSON { value }), record fields (BSON document).

4. Versioned serialized file layout

Every serialized file (Merkle trees, BSON shards, sort index trees and pages, etc.) uses a single layout so that readers can verify and dispatch by type.

Layout (always used):

[4 bytes] — Version (uint32 LE).
[4 bytes] — Type code (4-character ASCII, 32 bits). Identifies the kind of file. Each file kind has a distinct 4-byte ASCII code (e.g. FTRE = files Merkle tree, BDBT = BSON database tree, SHAR = collection shard, COLT = collection Merkle tree, IDXT = index B-tree metadata, IDXP = index leaf page). Stored in the same byte order as the rest of the file (e.g. little-endian as a uint32). Writers and readers use the same code for each kind. Readers use the type code to route to the correct deserializer or reject unknown types.
[payload] — Version- and type-specific payload.
[32 bytes] — SHA-256 checksum of the concatenation: version + type + payload.

Primitives are little-endian (uint32, int32, uint64, int64); strings and buffers are length-prefixed; documents use BSON (lengths 32-bit where length-prefixed).
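A minimal sketch of writing and reading this layout follows. It assumes the four ASCII characters of the type code are written in reading order (the byte-order note above may imply otherwise for a particular implementation), and the function names `serialize` and `deserialize` are hypothetical.

```python
import hashlib
import struct

CHECKSUM_SIZE = 32  # SHA-256 digest length


def serialize(version: int, type_code: str, payload: bytes) -> bytes:
    """Wrap a payload as version + type + payload + SHA-256 checksum."""
    code = type_code.encode("ascii")
    if len(code) != 4:
        raise ValueError("type code must be exactly 4 ASCII characters")
    # Assumption: the four ASCII bytes are written in reading order.
    body = struct.pack("<I", version) + code + payload
    return body + hashlib.sha256(body).digest()


def deserialize(blob: bytes, expected_type: str) -> tuple[int, bytes]:
    """Verify the checksum and type code; return (version, payload)."""
    if len(blob) < 8 + CHECKSUM_SIZE:
        raise ValueError("file too short")
    body, checksum = blob[:-CHECKSUM_SIZE], blob[-CHECKSUM_SIZE:]
    if hashlib.sha256(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    (version,) = struct.unpack_from("<I", body, 0)
    type_code = body[4:8].decode("ascii")
    if type_code != expected_type:
        raise ValueError(f"unexpected type code {type_code!r}")
    return version, body[8:]
```

Verifying the checksum before looking at the type code means a corrupted file is rejected before any type-specific deserializer sees it.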

5. Encryption

All files under the database root are encrypted when the database is encrypted. There is no mixed encrypted/unencrypted layout: asset, display, thumb, and the entire .db/ tree (including .db/files.dat, .db/bson/*, .db/config.json, .db/write.lock, .db/encryption.pub) are stored in the encrypted file format.

Encrypted file format:

Each encrypted file consists of a clear header (so the app can identify encryption and key without decrypting), followed by the encrypted payload.

Clear header:

[4 bytes] — Encrypted file format version (uint32 LE).
[1 byte] — Encryption type code (e.g. 0 = none, 1 = RSA + AES-256-CBC per file).
[32 bytes] — SHA-256 hash of the public key used to encrypt this file. Used to match the file to a key and to detect key mismatch.

Encrypted payload (after the header):

Payload is encrypted with the same scheme as the legacy format: per-file RSA-encrypted AES-256 key, IV, then AES-256-CBC ciphertext. The plaintext that is encrypted is the full serialized content (e.g. the versioned serialized blob with version, type, payload, and checksum, or a raw media blob).

So a reader can: (1) read the clear header to see format version, encryption type, and public-key hash; (2) select the correct private key (if any); (3) decrypt the payload; (4) if the payload is a serialized file, verify checksum and dispatch on type.
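Steps (1) and (2) can be sketched as below. Decryption itself is out of scope here. The sketch assumes the key hash is SHA-256 over the public key bytes exactly as the caller holds them (the document does not say which encoding is hashed), and all names are hypothetical.

```python
import hashlib
import struct
from typing import NamedTuple, Optional

HEADER_SIZE = 4 + 1 + 32  # format version + encryption type + key hash


class EncryptedHeader(NamedTuple):
    format_version: int
    encryption_type: int
    key_hash: bytes


def build_header(format_version: int, encryption_type: int,
                 public_key: bytes) -> bytes:
    """Build the 37-byte clear header for a file encrypted with public_key."""
    # Assumption: the hash covers the key bytes exactly as supplied.
    key_hash = hashlib.sha256(public_key).digest()
    return struct.pack("<IB", format_version, encryption_type) + key_hash


def split_encrypted_file(blob: bytes) -> tuple[EncryptedHeader, bytes]:
    """Split a file into its clear header and the still-encrypted payload."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("file too short to contain the clear header")
    format_version, encryption_type = struct.unpack_from("<IB", blob, 0)
    header = EncryptedHeader(format_version, encryption_type,
                             blob[5:HEADER_SIZE])
    return header, blob[HEADER_SIZE:]


def select_key(header: EncryptedHeader,
               public_keys: list[bytes]) -> Optional[bytes]:
    """Return the public key whose hash matches the header, if any."""
    for key in public_keys:
        if hashlib.sha256(key).digest() == header.key_hash:
            return key
    return None  # key mismatch: none of the known keys encrypted this file
```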

6. Origin

A database can record its origin: the database it was replicated from (e.g. a path or URI). The origin is stored in .db/config.json, a JSON configuration file. The file contains an object with at least an origin field whose value is the path (or URI) to the origin database. Other fields may be added to this file for future configuration (e.g. sync options, repair preferences).

Example:

{
  "origin": "/path/to/source/database"
}

The origin is used to:

Sync — Know which remote database to sync with.
Repair — Know which source to use when repairing or validating files.
Fulfil missing files — For partial databases, know where to fetch missing asset/display files when the user browses the gallery or when filling lazily.

If the database was not created by replication, .db/config.json may be absent, or present without an origin field.
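Reading the origin might look like the following sketch. It assumes the file is readable as plain JSON at this point; in an encrypted database it would have to be decrypted first. The function name `read_origin` is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def read_origin(db_root: str) -> Optional[str]:
    """Return the recorded origin, or None if the database has none."""
    config_path = Path(db_root) / ".db" / "config.json"
    if not config_path.is_file():
        return None  # config omitted, e.g. database not created by replication
    # Assumption: plain JSON on disk (an encrypted database needs decrypting).
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        return None
    origin = config.get("origin")
    # The file may exist without an origin field.
    return origin if isinstance(origin, str) and origin else None
```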

7. Files Merkle tree (`.db/files.dat`)

A Merkle tree over asset/display/thumb paths and related metadata is stored at .db/files.dat.

Path: .db/files.dat.
Content: Sort tree + Merkle tree + optional database metadata (e.g. filesImported, deletedAssetIds, isPartial).
Serialization: Versioned serialized format (version, type, payload, checksum). Payload is the same logical content as in the legacy format (e.g. tree version 5). The version field in this file is the database version; version 6 denotes the current format described in this document.
Leaf names: Paths like asset/<uuid>, display/<uuid>, thumb/<uuid>; leaves store content hash and metadata for verify/sync.

8. Partial vs full databases

A database can be full or partial. The layout and file formats are the same; the difference is which files are present on disk.

Full database: All media files are stored: asset/, display/, and thumb/ each have one file per asset. The BSON database under .db/bson/ and the rest of .db/ are complete. This is the normal case after import or after a full replicate.

Partial database: Only thumb files and root-level files (e.g. README.md) are stored. The asset/ and display/ directories are missing or sparse. The BSON database under .db/bson/ is still complete (all asset records and indexes). Partial databases are created by replicating with the partial option.

The partial flag is stored in the files Merkle tree (.db/files.dat) in the database metadata: isPartial: true. Tools use this to treat missing asset/display files as expected (verify) and to only copy thumb and root-level files when syncing to a partial target. Missing files can be filled in lazily (e.g. download from the origin database as the user views photos in the gallery) or in bulk via a full replicate.
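How tools might act on the flag is sketched below. The two predicates are hypothetical helpers, and they assume leaf names of the form described in §7; handling of the .db/ tree during sync is not covered.

```python
def is_missing_file_expected(leaf_name: str, is_partial: bool) -> bool:
    """True if verify should not report this missing file as an error."""
    if not is_partial:
        return False  # a full database must have every file
    # In a partial database only asset/ and display/ files may be absent.
    return leaf_name.startswith(("asset/", "display/"))


def should_copy_to_partial(leaf_name: str) -> bool:
    """True if sync should copy this file to a partial target."""
    # Thumb files and root-level files (no directory component) are copied.
    return leaf_name.startswith("thumb/") or "/" not in leaf_name
```

For example, a missing `display/<uuid>` file would be tolerated when `isPartial` is true, while a missing `thumb/<uuid>` file would still be reported.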

9. Asset record shape

The metadata collection stores asset records. Main fields:

_id — UUID string.
origFileName, origPath?, contentType, width, height, hash.
coordinates?, location?, duration?, fileDate, photoDate?, uploadDate.
properties?, labels?, description?, deleted?.
micro — base64 micro thumbnail.
color — [number, number, number] (e.g. dominant color).

Sort indexes used in practice: hash (asc), photoDate (desc).

10. Migration from legacy format to current format

Migration from the legacy format (version 5; see Database-Format-Legacy.md) to the current format (version 6) is performed by the existing command psi upgrade. The database version is stored in the files Merkle tree: in version 5 that file is .db/tree.dat; in version 6 it is .db/files.dat. After a successful upgrade, the file is .db/files.dat and the version is 6.

The following changes are applied when converting a database from version 5 to version 6.

10.1 Layout and path changes

BSON database location: Move the entire BSON tree from metadata/ at the database root to .db/bson/. That is:
- Move metadata/db.dat → .db/bson/db.dat
- Create .db/bson/collections/ and move collection data from metadata/metadata/: create .db/bson/collections/metadata/shards/, move shard files and <shardId>.dat into shards/, move collection.dat to .db/bson/collections/metadata/collection.dat.
- Create .db/bson/indexes/ and rebuild the sort index from the collection data (do not move metadata/sort_indexes/; delete and rebuild so no sort-index format conversion is needed).
- Remove the now-empty metadata/ directory.
Files Merkle tree: Rename .db/tree.dat (v5) to .db/files.dat (v6), re-serialize it using the new versioned format (version, type, payload, checksum) and, if the DB is encrypted, wrap it in the new encrypted file format (header + encrypted payload).
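The BSON path changes listed above can be expressed as a pure mapping from a v5 path to its v6 location. This is an illustrative sketch only: `map_legacy_path` is a hypothetical helper, and generalizing from the single `metadata` collection to any collection name is an assumption.

```python
from typing import Optional

BSON_ROOT = ".db/bson"


def map_legacy_path(legacy_path: str) -> Optional[str]:
    """Map a v5 path under metadata/ to its v6 path, or None if not moved."""
    parts = legacy_path.split("/")
    if parts[0] != "metadata":
        return None  # not part of the legacy BSON tree
    if parts[1:] == ["db.dat"]:
        return f"{BSON_ROOT}/db.dat"
    if len(parts) >= 2 and parts[1] == "sort_indexes":
        return None  # sort indexes are discarded and rebuilt, never moved
    if len(parts) == 3:
        collection, name = parts[1], parts[2]
        if name == "collection.dat":
            return f"{BSON_ROOT}/collections/{collection}/collection.dat"
        # Shard files (<shardId>) and shard Merkle trees (<shardId>.dat).
        return f"{BSON_ROOT}/collections/{collection}/shards/{name}"
    return None
```

Keeping the mapping separate from the file operations makes it easy to check the target layout before anything is moved on disk.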

10.2 Serialized file format changes

Versioned serialized files: For every serialized file (Merkle trees, BSON shards, index trees and pages under .db/bson/indexes/):
- Add a type field (4 bytes: 4-character ASCII type code, 32 bits) after the version field. Use a distinct 4-character ASCII code per kind of file (e.g. FTRE, BDBT, SHAR, COLT, IDXT, IDXP).
- Ensure a checksum is always present: 32 bytes SHA-256(version + type + payload) after the payload. Legacy files that were stored without checksum must be read with the legacy deserializer, then re-serialized with the new layout (version, type, payload, checksum).
Legacy “no checksum” option: No longer used. All new serialized files have a checksum.

10.3 Encryption changes

Encrypt everything: In the legacy format, only certain paths (asset, display, thumb, metadata, README) were encrypted; .db/ was not. In the current format, all files are encrypted when the database is encrypted, including .db/files.dat, .db/bson/**, .db/config.json, .db/write.lock, .db/encryption.pub.
Encrypted file header: For each encrypted file, prepend the clear header before the existing encrypted payload:
- 4 bytes: encrypted file format version
- 1 byte: encryption type code
- 32 bytes: SHA-256 hash of the public key used for encryption
Then store the existing per-file encrypted payload (RSA-wrapped key + IV + ciphertext) as-is, so the payload continues to decrypt to the same plaintext (e.g. the versioned serialized blob or raw media).

10.4 New and optional data

Origin / config: If the database was replicated from another, create or update .db/config.json with an origin field set to the path (or URI) of the source database. If not replicated, .db/config.json may be omitted or may exist without an origin field. Other configuration can be added to this JSON file as needed.
Type codes: Use a 4-character ASCII code (4 bytes, fits in 32 bits) per serialized file kind: e.g. FTRE (files Merkle tree), BDBT (BSON database tree), SHAR (collection shard), COLT (collection Merkle tree), IDXT (index B-tree metadata), IDXP (index leaf page). Use the same codes in writers and readers. Note: .db/config.json is plain JSON, not versioned serialized, so it does not use type codes.

10.5 Order of operations (high level)

Create .db/bson/, .db/bson/collections/; move/copy db.dat and collection data from metadata/ to .db/bson/ (db.dat at bson root; collection dirs under collections/, each with shards under shards/ and collection.dat at collection root). Do not move metadata/sort_indexes/; create .db/bson/indexes/ and rebuild the sort index from the collection data. Re-serialize each moved file to the new versioned format (version, type, payload, checksum); index files are built in v6 and need no conversion.
Re-serialize .db/files.dat to the new versioned format (version, type, payload, checksum), and set the database version in that file to 6.
If encryption is enabled, re-wrap every file (including under .db/) in the new encrypted format: clear header (version, encryption type, public key hash) + existing encrypted payload. Ensure .db/ is no longer written in the clear.
Optionally write or update .db/config.json with an origin field if the DB has a known replication source.
Remove the legacy metadata/ directory and any legacy unencrypted .db/ files that have been replaced.

10.6 What `psi upgrade` must do (v5 → v6)

The psi upgrade command is the only psi command that runs against databases older than version 6. It reads the database version from the version field in the files Merkle tree (in v5 that file is .db/tree.dat; in v6 it is .db/files.dat). When the version is 5 (or older), the command must perform the following to convert the database to version 6.

Version check (already present):

Load the files Merkle tree from the path that exists (.db/tree.dat for v5, .db/files.dat for v6) and determine the current version (e.g. via loadTreeVersion or by loading the tree and reading merkleTree.version).
If version is already 6, exit successfully without changes.
If version is greater than 6, exit with an error asking the user to update the CLI.
If version is 5 or less, proceed with upgrade.
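The version check can be sketched as follows. Reading the version out of the tree file is left to the existing loader; the sketch only covers locating the file and deciding what to do. The names `find_files_tree` and `upgrade_action` are hypothetical, and preferring .db/files.dat when both files exist is an assumption.

```python
from pathlib import Path
from typing import Optional

CURRENT_VERSION = 6


def find_files_tree(db_root: str) -> Optional[Path]:
    """Return the files Merkle tree path that exists, or None."""
    # Assumption: if both files exist, the v6 name takes precedence.
    for name in ("files.dat", "tree.dat"):
        candidate = Path(db_root) / ".db" / name
        if candidate.is_file():
            return candidate
    return None


def upgrade_action(version: int) -> str:
    """Decide what the upgrade command should do for a database version."""
    if version == CURRENT_VERSION:
        return "nothing-to-do"  # exit successfully without changes
    if version > CURRENT_VERSION:
        raise RuntimeError(
            f"database version {version} is newer than this CLI supports; "
            "please update the CLI")
    return "upgrade"  # version 5 or less
```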

Steps the upgrade command must perform for v5 → v6:

Acquire write lock on the database (e.g. .db/write.lock) so no other process modifies it during upgrade.
Legacy v5 cleanup (already implemented for older upgrades):
- Fill in missing lastModified on tree leaves from file metadata where possible.
- If an assets/ directory exists, move its contents to asset/ and update the files Merkle tree accordingly.
- Create README.md at the database root if it does not exist.
- If the database is encrypted, ensure .db/encryption.pub exists (e.g. copy the public key into .db/ as a marker).
- Rebuild the files Merkle tree in sorted order, excluding legacy paths such as metadata/ and assets/ from the tree (so they are no longer referenced as file leaves).
Move BSON database from metadata/ to .db/bson/:
- Create the .db/bson/ directory and the .db/bson/collections/ subdirectory.
- Move (or copy then delete) metadata/db.dat → .db/bson/db.dat.
- For the collection: create .db/bson/collections/metadata/shards/; move shard files and <shardId>.dat from metadata/metadata/ into shards/; move metadata/metadata/collection.dat → .db/bson/collections/metadata/collection.dat.
- Do not move metadata/sort_indexes/; the sort index will be deleted and rebuilt in the next step so nothing relating to the sort index format needs to be converted.
- Remove the now-empty metadata/ directory.
Delete and rebuild BSON sort index: Create .db/bson/indexes/ and rebuild the sort index(es) from the collection data (e.g. open the metadata collection at .db/bson/collections/metadata/, build the index into .db/bson/indexes/<collectionName>/<fieldName>_<direction>/ using the v6 index writer). The old sort index is not converted; it is discarded and rebuilt, so no legacy index format handling is required.
Re-serialize all serialized files into the v6 format (version, type, checksum):
- For every file under .db/bson/ that was moved (db.dat at root; under collections/<name>/shards/: shard files and <shardId>.dat; at collections/<name>/: collection.dat): read with the legacy deserializer (no type field, optional checksum), then write using the new layout: 4 bytes version, 4 bytes type code (4-char ASCII), payload, 32 bytes SHA-256(version + type + payload). Use the 4-character ASCII type code for each file kind (see §10.4). Index files under indexes/ were just built in v6 format and need no conversion.
- Re-serialize .db/files.dat in the new format (version, type, payload, checksum). Set the database version in the saved tree to 6 (so the first 4 bytes of the serialized tree file, or the version property when written, are 6).
Encryption (when the database is encrypted):
- Upgrade reads unencrypted data from the .db/ directory during the conversion (in v5, .db/tree.dat, .db/write.lock, etc. are unencrypted). Only after moving and re-serializing does it apply v6 encryption.
- Switch to a single storage backend for the whole database root (no separate unencrypted metadataStorage). All reads and writes for the rest of the upgrade must go through the encrypted backend.
- The metadata moved to .db/bson/ must maintain its encryption (it was encrypted in v5; when written under .db/bson/ for v6, it is written through the single encrypted backend). All other files under .db/ (.db/files.dat, .db/config.json, .db/write.lock, etc.) were unencrypted in v5 and must be encrypted for v6.
- For every file under the database root (asset, display, thumb, README, and the entire .db/ tree including .db/files.dat, .db/bson/**, .db/config.json, .db/write.lock, .db/encryption.pub): ensure it is stored in the v6 encrypted format. That is: clear header (4 bytes format version, 1 byte encryption type code, 32 bytes SHA-256 of public key) followed by the existing encrypted payload (RSA-wrapped key + IV + AES-256-CBC ciphertext). Files that were previously unencrypted (e.g. under .db/ in v5) must be encrypted and given this header; files that were already encrypted need the header prepended and the payload left as-is.
- After this, the database has no mixed encrypted/unencrypted layout: all files are encrypted when a key is in use.
Origin / config (optional): If the database has a known replication source (e.g. passed in or recorded elsewhere), create or update .db/config.json with an origin field set to the path (or URI) of that source. If not replicated, omit the file or leave it without an origin field.
Rebuild BSON database Merkle tree: Rebuild the BSON database Merkle tree (e.g. buildDatabaseMerkleTree) using the new BSON root .db/bson/, with collections under .db/bson/collections/ and indexes under .db/bson/indexes/ (not metadata/ at database root). Save the result to .db/bson/db.dat in the v6 serialized format. If encryption is enabled, this write goes through the single encrypted storage backend.
Update files Merkle tree metadata: Set databaseMetadata.filesImported from the actual count of files under asset/ (or equivalent). Ensure the tree’s version property is 6 before saving.
Save the files Merkle tree: Write .db/files.dat in the v6 serialized format (version, type, payload, checksum), with version 6. If encryption is enabled, write through the encrypted backend so .db/files.dat is stored in the v6 encrypted file format (clear header + encrypted payload).
Release the write lock.

After a successful run, the database version in .db/files.dat is 6, all BSON data lives under .db/bson/ (db.dat, collections/, indexes/), all serialized files use version+type+checksum, and (when encryption is used) all files are encrypted with the v6 encrypted file header. Other psi commands can then operate on the database.

11. Code changes to load the current format

The following code changes allow the application to load (and, as needed, create and update) databases in the current format.

11.1 Storage paths and BSON root

BSON root: Stop using a root-level metadata/ directory for the BSON database. Use .db/bson/ as the BSON storage root. Collection data lives under .db/bson/collections/<collectionName>/ (shard files and shard Merkle trees under <collectionName>/shards/, collection Merkle tree collection.dat at collection root); index data lives under .db/bson/indexes/<collectionName>/<fieldName>_<direction>/. All call sites that open the BSON database, a collection, or an index should use the appropriate prefix (e.g. .db/bson/ for the DB, .db/bson/collections/metadata for the metadata collection with shards at metadata/shards/, .db/bson/indexes/metadata/hash_asc for an index) instead of metadata/ and metadata/sort_indexes/.
Single storage backend: When opening a database, pass a single storage instance that represents the database root. All paths (asset, display, thumb, .db/bson/, .db/files.dat, .db/config.json, etc.) are resolved under that root. Encryption, when enabled, applies to this single backend so that every file read/write goes through the same encrypted layer. The storage that serves .db/ (often exposed as metadataStorage to commands) must have encryption applied for v6 and above when the database is encrypted—there is no separate unencrypted metadata storage for v6.

11.2 Versioned serialized format (version, type, checksum)

Serialization layer: Extend the serialization library (or the layer that writes Merkle trees, BSON shards, index tree and leaf page files under .db/bson/indexes/) so that every serialized file is written as: [version (4)][type (4)][payload][checksum (32)]. The type is a 4-character ASCII code (4 bytes, 32 bits) per file kind (e.g. FTRE, BDBT, SHAR, COLT, IDXT, IDXP). On read, read version and type first; verify checksum after reading the payload; dispatch to the correct deserializer based on type (and version if needed).
Checksum: Always compute and verify the SHA-256 checksum for serialized files. Remove or bypass the “no checksum” code path for the current format.
Type codes: Use the same 4-character ASCII type codes in writers and readers (see §4). Config (.db/config.json) is JSON, not versioned serialized, so it does not use type codes.

11.3 Encryption

Encrypt all files: Remove the two-storage setup (encrypted assetStorage + unencrypted metadataStorage). Use one storage instance for the whole database root. When encryption is enabled, wrap that single backend with the encrypted storage implementation so that all files (including .db/files.dat, .db/bson/**, .db/config.json, .db/write.lock, .db/encryption.pub) are read and written through the encryption layer.
Encrypted file header: When writing an encrypted file, prepend the clear header: format version (4 bytes), encryption type code (1 byte), public key hash (32 bytes). When reading, read this header first to detect encryption and key; then decrypt the remainder of the file (existing RSA + AES-256-CBC per-file scheme). Update key selection logic to use the public key hash from the header (e.g. to choose the right key when multiple keys exist or to prompt for the correct key).

11.4 Config and origin

Read/write config: Add support for reading and writing .db/config.json (JSON). The file has at least an optional origin field (path or URI to the database this copy was replicated from). Expose the origin to sync, repair, and fulfil-missing-file logic. Other fields may be added for future use.
Use origin: In sync, repair, and lazy-fulfil flows, use the origin value from .db/config.json as the default remote or source when the user has not specified another. This allows partial replicas to fetch missing asset/display files from the database they were replicated from.

11.5 Version detection and upgrade

Detect version: On open, read the version field from the files Merkle tree (.db/files.dat; in a legacy database that file is .db/tree.dat) to determine the database version. Version 5 is the legacy format: BSON under metadata/ at the database root, optional checksum, mixed encryption. Version 6 is the current format:
- BSON under .db/bson/, with collections under .db/bson/collections/ (shards under <collectionName>/shards/, collection.dat at collection root) and indexes under .db/bson/indexes/.
- Files Merkle tree at .db/files.dat.
- Version, type, and checksum on all serialized files.
- All files encrypted when encryption is enabled, each with the encrypted file header.
- .db/config.json with an optional origin field.
psi upgrade: Most psi commands only work with version 6. The command psi upgrade is the exception: it runs against databases of older versions and migrates them to version 6 (applying the changes described in §10). After upgrade, the database version in .db/files.dat is set to 6 and other commands can operate on it.
