Storage Engine v4 — Stack 1: protos, codec, engine skeleton #868

Open
c1-squire-dev[bot] wants to merge 1 commit into pquerna/storage-v4-parent from pquerna/storage-v4-stack1-protos-codegen

c1-squire-dev Bot commented May 24, 2026

Summary

Stack 1 of RFC 0004 (storage engine v4). Stacks on PR #867.

New protos under proto/c1/storage/v3/:

  • options.proto — proto2 file with TableOption + IndexOption descriptor extensions (field 90000 / 90001) so record types declare their primary key and index shapes in the schema.
  • refs.proto — identity-only EntitlementRef, PrincipalRef, ResourceRef.
  • records.proto — GrantRecord, EntitlementRecord, ResourceRecord, ResourceTypeRecord, AssetRecord, SyncRunRecord, plus mirror types (GrantExpandableRecord, GrantSourceRecord, SyncType enum). All declare (table) options.

New Go codec layer under pkg/dotc1z/engine/pebble/codec/:

  • tuple.go — FoundationDB-style tuple encoder (AppendTupleString, AppendTupleBytes, AppendTupleInt32/64, AppendTupleUint32/64, AppendTupleBool, AppendTupleSeparator) with the NUL/escape rules from RFC §3.5. Property-tested for the prefix-free property in Stack 5's microtests.
  • syncid.go — EncodeSyncID(string)→[]byte / DecodeSyncID([]byte)→string using KSUID's 20-byte canonical binary form (sorts identically to the 27-character base62 form but is 7 bytes smaller per key, which adds up across hundreds of millions of rows).
  • registry.go — Codec interface + frozen-after-init map for generated codecs + sync.Map reflection cache. Lookup() returns a generated codec when one is registered, or constructs a ReflectCodec lazily.
  • reflect.go — ReflectCodec covers value-side encode/decode (deterministic proto.Marshal); key-side is gated by (table) options arriving with codegen (deferred; see §8).
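
To make the NUL/escape rule concrete, here is a minimal sketch of a FoundationDB-style byte-string component, not the code in tuple.go. The names `appendTupleBytes`, `decodeTupleBytes` and the `typeBytes` code are assumptions for illustration; the RFC's actual type codes and signatures are not shown in this description.

```go
package main

// typeBytes is a hypothetical type code for a byte-string component
// (FoundationDB's tuple layer uses 0x01 for byte strings).
const typeBytes byte = 0x01

// appendTupleBytes appends one byte-string component to dst.
// Every 0x00 in the payload is escaped as 0x00 0xFF, and the component
// is closed by a bare 0x00, so no encoding is a prefix of another.
func appendTupleBytes(dst, v []byte) []byte {
	dst = append(dst, typeBytes)
	for _, b := range v {
		dst = append(dst, b)
		if b == 0x00 {
			dst = append(dst, 0xFF)
		}
	}
	return append(dst, 0x00)
}

// decodeTupleBytes consumes a single component from src and returns the
// unescaped payload plus the remaining bytes. ok is false on malformed input.
func decodeTupleBytes(src []byte) (val, rest []byte, ok bool) {
	if len(src) == 0 || src[0] != typeBytes {
		return nil, nil, false
	}
	val = []byte{}
	for i := 1; i < len(src); i++ {
		if src[i] != 0x00 {
			val = append(val, src[i])
			continue
		}
		// 0x00 0xFF is an escaped NUL; a bare 0x00 terminates the component.
		if i+1 < len(src) && src[i+1] == 0xFF {
			val = append(val, 0x00)
			i++
			continue
		}
		return val, src[i+1:], true
	}
	return nil, nil, false // ran out of input before the terminator
}
```

Because the terminator is the smallest byte and escaped NULs sort just above it, encoded components compare bytewise in the same order as their payloads, which is what lets composite keys be range-scanned by prefix.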

Engine skeleton at pkg/dotc1z/engine/pebble/engine_stub.go — centralized sentinel-error declarations (Appendix E) only. The full engine struct lands in Stack 3.

Build tag //go:build batonsdkv2 throughout — connector binaries don't link Pebble unless explicitly built with the tag.

Test plan

  • make lint clean
  • codec tests pass (TestAppendTupleString, TestEncodeSyncIDRoundtrip, etc.)
  • protos compile, with the generated Go vendored under pb/c1/storage/v3/
  • CI green

Notes on deferred items (RFC §8)

  • protoc-gen-batonstore codegen plugin DEFERRED — the ReflectCodec covers the MVP; codegen is a perf optimization for the hot record types and lands as a follow-up on this branch once the engine is in production use.

🤖 Generated with Claude Code

btipling self-assigned this May 24, 2026

Stack 1 of the storage-engine-v4 PR series (per RFC v4 in the
pebble-baton-sdk squire plan). All new code lives under the
`//go:build batonsdkv2` build tag so default connector binaries are
unaffected.

Protos at `proto/c1/storage/v3/`:

  * `options.proto` — TableOption + IndexOption descriptor extensions
    (proto2 syntax; proto3 forbids extending non-descriptor messages).
  * `refs.proto` — EntitlementRef, PrincipalRef, ResourceRef.
  * `records.proto` — six record types (ResourceTypeRecord,
    ResourceRecord, EntitlementRecord, GrantRecord, AssetRecord,
    SyncRunRecord) plus the v3-owned mirrors of v2 types
    (GrantExpandableRecord, GrantSourceRecord, SyncType).

Generated Go committed under `pb/c1/storage/v3/`.

Codec layer at `pkg/dotc1z/engine/pebble/codec/`:

  * `tuple.go` — FoundationDB-style tuple encoding with NUL escape
    rules that work for raw bytes (not just UTF-8 strings); appenders
    for string, bytes, int32, int64, uint32, uint64, bool, plus a
    decoder that consumes a single component.
  * `syncid.go` — KSUID 20-byte canonical binary encoding for
    sync_id keys (saves ~2 GB at 100M-grant scale vs. storing the
    27-char base62 string redundantly across primary + 4 indexes).
  * `registry.go` — Codec interface with error-returning methods
    (no panics on type mismatch); frozen-after-init map keyed by
    proto FullName; lazy ReflectCodec fallback cached process-wide.
  * `reflect.go` — *ReflectCodec skeleton. Value-side encode/decode
    is wired (deterministic proto.Marshal); key-side requires
    (storage.v3.table) walking and lands in Stack 3.
  * `errors.go` — ErrCodecTypeMismatch, ErrInvalidSyncID,
    ErrInvalidTuple.

Engine stub at `pkg/dotc1z/engine/pebble/`:

  * `engine_stub.go` — empty Engine struct + the full sentinel-error
    set from RFC v4 Appendix E. Stack 3 fills in the engine; Stack 2
    consumes the sentinels for envelope errors.

Tests: tuple-encoding prefix-free invariant ported from the
microtest at `/tmp/baton-rfc-microtests/tuple_test.go` and run in-tree
with `-tags=batonsdkv2`. Sync-ID roundtrip + invalid-input + order-
preserving tests. Registry generated-hit and reflection-fallback tests.

Deferred to follow-up commits on this branch:
  * `cmd/protoc-gen-batonstore/` codegen plugin (Appendix D). The
    typed codecs the plugin emits land alongside the plugin.
  * `buf.gen.yaml` entry once the plugin compiles.
  * Per-record generated codecs under `pkg/dotc1z/engine/pebble/gen/`.

Dependencies: adds `github.com/cockroachdb/pebble` via local
`replace` to `/data/squire/src/pebble` for now (the modern Pebble
APIs — IngestAndExcise, FormatValueSeparation, DBCompressionGood —
aren't in any tagged release yet). The replace will lift to a real
version pin before merge.

Refs: RFC v4 §3.4 (record protos), §3.6 (codec hybrid), Appendix A
(full record proto), Appendix D (codegen toolchain), Appendix E
(sentinels), micro-test results in research/import-11.md.

c1-squire-dev Bot force-pushed the pquerna/storage-v4-parent branch from 4ea29f4 to 464a8f3, May 24, 2026 21:11
c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack1-protos-codegen branch from 5b79a33 to 205e0e4, May 24, 2026 21:11
c1-squire-dev Bot commented May 24, 2026

Rebased onto the updated parent (PR #867) after review fixes from btipling and the pr-review bot. See #867 for the specific fixes, or PR #874 for the combined squash view.
