From 1b683c38bee7ebead6a65b256a6618f046a65f91 Mon Sep 17 00:00:00 2001
From: Jascha
Date: Thu, 14 May 2026 15:36:58 -0700
Subject: [PATCH] Spec: add key revocation, key distribution, size limits, replay enforcement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Defense-in-depth additions (no wire-format change).

- §4.1 model_hash now constrained to "sha256:" + hex of weights, omitted
  otherwise (no more signature-binding theatre on an undefined field).
- §4.3 (new) Size limits: pin JSON ≤ 64 KiB, extra ≤ 32 entries, key ≤
  128 B, value ≤ 1 KiB, vec_dim ≤ 2^20, sig exactly 64 B.
- §5 step 0 makes size-limit enforcement a verifier MUST before parsing;
  step 7 requires verifiers to MUST-check vectorpin.{record,collection,
  tenant}_id against caller-supplied expected values.
- §5 adds KEY_EXPIRED, RECORD_MISMATCH, COLLECTION_MISMATCH,
  TENANT_MISMATCH, PARSE_ERROR to the required failure-mode taxonomy.
- §7 split into rotation and revocation; introduces (valid_from,
  valid_until) registry semantics so a compromised key invalidates
  post-compromise pins without invalidating historical ones.
- §10 (new) Key distribution: fingerprint format, transparency-log
  SHOULD, TOFU NOT RECOMMENDED, per-tenant kid separation SHOULD.
- §9 security considerations updated to reference §4.3 DoS limits, §7
  revocation window, and §5 step 7 replay enforcement.
- §11 versioning adds an explicit downgrade-resistance argument.

Wire format unchanged; existing pins continue to verify.
---
 docs/spec.md | 66 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/docs/spec.md b/docs/spec.md
index 9985416..86c0762 100644
--- a/docs/spec.md
+++ b/docs/spec.md
@@ -61,7 +61,7 @@ A Pin is a JSON object with the following fields:
 |---|---|---|---|
 | `v` | integer | yes | Protocol version. Must equal `1`. |
 | `model` | string | yes | Embedding model identifier. |
-| `model_hash` | string | no | Optional content hash of the model weights. |
+| `model_hash` | string | no | Optional content hash of the model weights. When present, MUST match the format `"sha256:" || hex(SHA-256(weights))` where the input is the concatenation of model weight shards in sorted filename order. Implementations that cannot meet this convention MUST omit the field rather than emit a non-conforming value. |
 | `source_hash` | string | yes | Hash of the source text (§3.1). |
 | `vec_hash` | string | yes | Hash of the embedding (§3.2). |
 | `vec_dtype` | string | yes | One of `"f32"` or `"f64"`. |
@@ -83,7 +83,22 @@ The signature in `sig` is produced over a canonical byte sequence that excludes
 
 This canonical form is fed directly into Ed25519 signing.
 
-### 4.3 Example
+### 4.3 Size limits
+
+To bound parser resource consumption and prevent DoS through hostile pins, conforming v1 implementations MUST enforce:
+
+| Limit | Maximum |
+|---|---|
+| Total pin JSON, UTF-8 byte length | 64 KiB (65,536 bytes) |
+| `extra` entry count | 32 |
+| Any `extra` key, UTF-8 byte length | 128 bytes |
+| Any `extra` value, UTF-8 byte length | 1 KiB (1,024 bytes) |
+| `vec_dim` | 1,048,576 (2^20) |
+| `sig`, decoded byte length | exactly 64 (Ed25519 signature) |
+
+Verifiers MUST reject oversized pins before parsing the signature. These limits are not part of the signed payload — they are parser-side defenses.
+
+### 4.4 Example
 
 ```json
 {
@@ -103,38 +118,51 @@ This canonical form is fed directly into Ed25519 signing.
 
 A verifier MUST:
 
+0. Reject pins whose serialized JSON exceeds the size limits in §4.3 before parsing.
 1. Reject pins whose `v` field is unknown to it.
-2. Reject pins whose `kid` is not in its key registry.
+2. Reject pins whose `kid` is not in its key registry, OR whose registry entry's `valid_from`/`valid_until` window excludes `ts` (see §7).
 3. Reconstruct the canonical byte sequence (§4.2) and verify `sig` against the registered public key for `kid`.
 4. If a ground-truth source string was supplied, recompute `hash_text(source)` and compare to `source_hash`.
 5. If a ground-truth vector was supplied, recompute `hash_vector(vector, vec_dtype)` and compare to `vec_hash`. Also check that the supplied vector's shape matches `vec_dim`.
 6. If an expected model identifier was supplied, compare to `model`.
+7. If the caller supplied an expected `vectorpin.record_id` / `vectorpin.collection_id` / `vectorpin.tenant_id`, the verifier MUST compare against the value in `extra` and reject on mismatch. These reserved keys are the v1 replay-protection mechanism (§8); a verifier that ignores them when the caller has supplied an expected value MUST be considered non-conformant.
 
 Verifiers MUST distinguish at least these failure modes (the reference implementation uses the names below; other implementations MAY use different names but MUST distinguish the cases):
 
 - `UNSUPPORTED_VERSION`
 - `UNKNOWN_KEY`
+- `KEY_EXPIRED` — `kid` is registered but `ts` falls outside the registered validity window.
 - `SIGNATURE_INVALID`
 - `VECTOR_TAMPERED`
 - `SOURCE_MISMATCH`
 - `MODEL_MISMATCH`
 - `SHAPE_MISMATCH`
+- `RECORD_MISMATCH` / `COLLECTION_MISMATCH` / `TENANT_MISMATCH` — caller-supplied expected value for the corresponding reserved `extra` key does not match.
+- `PARSE_ERROR` — pin JSON exceeds size limits, contains unknown top-level fields, or fails type validation.
 
 ## 6. Storage conventions
 
 Adapter implementations SHOULD store pins under the metadata key `vectorpin`. Backends without free-form metadata fields are out of scope for this version of the protocol — provenance must travel with the data.
 
-## 7. Key rotation
+## 7. Key rotation and revocation
 
-Verifiers MUST support multiple `kid` -> public key mappings simultaneously. Issuers rotate by:
+Verifiers MUST support multiple `kid` -> public key mappings simultaneously, each with an optional validity window `(valid_from, valid_until)` of RFC 3339 timestamps. Issuers rotate by:
 
 1. Generating a new keypair with a fresh `kid`.
-2. Adding the new public key to all relevant verifier registries.
+2. Adding the new public key to all relevant verifier registries, with a `valid_from` no earlier than the moment the new private key becomes operational.
 3. Switching production signing to the new private key.
 4. Optionally re-pinning the corpus over time.
-5. Removing the old public key from registries once re-pinning is complete or the rotation policy expires.
+5. Setting `valid_until` on the old key entry to the rotation cutover instant (do not remove the entry — historical pins must continue to verify against it).
+
+Old pins continue to verify against the old public key as long as their `ts` falls within the old key's `(valid_from, valid_until)` window.
 
-Old pins continue to verify against the old public key during this window.
+### Revocation distinct from rotation
+
+If a private key is **compromised** (as opposed to merely rotated for hygiene), the corresponding `kid` entry MUST be marked with `valid_until` set to the latest moment the key is believed to have been uncompromised. Pins with `ts` after that instant return `KEY_EXPIRED`; pins with `ts` before it continue to verify. This preserves the integrity of historical pins while immediately invalidating anything an attacker could produce post-compromise.
+
+Operators SHOULD pair this with a transparency-log entry (e.g., sigstore Rekor or a project-specific append-only log) for the revocation event itself, so that downstream verifiers can detect a malicious registry rollback.
+
+The protocol does not specify a revocation file format in v1; this is intentionally out of band so deployments can integrate with existing PKI / sigstore infrastructure. The minimum requirement on a v1.x verifier is to honor the `(valid_from, valid_until)` window however it is delivered.
 
 ## 8. Reserved `extra` keys
 
@@ -152,12 +180,22 @@ A v1.1 candidate spec promotes `record_id`, `collection_id`, and `tenant_id` to
 
 ## 9. Security considerations
 
-- **Replay**: Pins are not bound to a specific record id at the wire format level. An attacker who copies a pin from one record to another can pass verification only if the vector and source they paste alongside match the pin. Implementations that need stronger replay protection SHOULD use the reserved `vectorpin.collection_id` / `vectorpin.record_id` / `vectorpin.tenant_id` keys defined in §8.
-- **Time**: The `ts` field is informational. Verifiers MAY reject pins outside an acceptable time window but the protocol does not require it.
-- **Key custody**: An attacker with the private signing key can produce arbitrary pins. Treat the signing key as a high-value secret.
-- **Source-time integrity**: VectorPin attests to the relationship between source and vector at pin time. It does not attest that the source itself was authentic at ingestion.
+- **Replay**: Pins are not bound to a specific record id at the wire format level. An attacker who copies a pin from one record to another can pass verification only if the vector and source they paste alongside match the pin. Implementations that need stronger replay protection SHOULD use the reserved `vectorpin.collection_id` / `vectorpin.record_id` / `vectorpin.tenant_id` keys defined in §8, and verifiers MUST enforce them when the caller supplies an expected value (see §5 step 7).
+- **Time**: The `ts` field is informational *for the pin* but load-bearing for revocation: verifiers MUST consult `(valid_from, valid_until)` on the `kid` registration (§7) and reject pins whose `ts` falls outside that window.
+- **Key custody**: An attacker with the private signing key can produce arbitrary pins. Treat the signing key as a high-value secret. Reference implementations write private keys with mode `0600`; production deployments SHOULD use a KMS or hardware-backed signer rather than file-system keys.
+- **Source-time integrity**: VectorPin attests to the relationship between source and vector at pin time. It does not attest that the source itself was authentic at ingestion. Pair VectorPin with source-side controls (signed ingestion logs, document provenance) where this matters.
+- **DoS via malformed pins**: Without the §4.3 size limits, a single hostile pin can exhaust verifier resources. Implementations MUST enforce these limits before reaching the signature path.
+
+## 10. Key distribution
+
+The protocol assumes a verifier has access to a registry mapping `kid` to `(public_key, valid_from, valid_until)`. How that registry is populated is out of scope, but the following SHOULD apply to any production deployment:
+
+- **Fingerprint format**: Operators identifying a key out of band (Slack, email, ticket) SHOULD use `SHA-256(pubkey_bytes)` truncated to the first 16 hex digits, formatted as four colon-separated quads, e.g. `1f3a:7b22:9e0d:c4f1`. Full 32-byte public keys are themselves URL-safe-base64 short enough to share verbatim where possible.
+- **Production registries SHOULD reference a transparency log entry** (e.g., sigstore Rekor) for each `kid` registration and revocation. The log entry binds the key material to a publicly observable, append-only history, allowing downstream verifiers to detect a malicious registry rollback.
+- **Trust-on-first-use (TOFU) is NOT RECOMMENDED for new pins** unless the operator has explicitly opted in. A verifier that auto-registers any `kid` it encounters provides no integrity guarantee — it is a checksum, not a signature.
+- **Per-tenant key separation**: Multi-tenant deployments SHOULD issue separate `kid`s per tenant rather than share a single producer key, so that compromise of one tenant's environment cannot forge pins for another tenant.
 
-## 10. Versioning
+## 11. Versioning
 
 This is protocol version 1. Future versions MAY:
 
@@ -165,4 +203,4 @@ This is protocol version 1. Future versions MAY:
 - Add new dtype identifiers.
 - Add new signature/hash algorithms (with corresponding identifiers).
 
-A change is breaking iff a v1 verifier would silently accept a v2 pin as valid when the v2 pin's additional semantics matter. Such changes MUST bump the major version.
+A change is breaking iff a v1 verifier would silently accept a v2 pin as valid when the v2 pin's additional semantics matter. Such changes MUST bump the major version. Downgrade resistance is provided by including the protocol-version field in the signed canonical bytes (§4.2) and by §5 step 0 / §4.3 size limits, which together prevent an attacker from stripping new fields and presenting the remainder to an older verifier.
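
The checks this patch places ahead of the signature path (§4.3 limits, §5 steps 0 to 2, and §5 step 7) could be sketched roughly as below. This is a minimal illustration, not the reference implementation: the function name `precheck`, the registry shape (`kid` mapped to a dict with optional `valid_from` / `valid_until` strings), and the `expected` mapping are all hypothetical. It also assumes that a missing `extra` is treated as empty, that `extra` values are strings, and that the validity window is inclusive at both ends. The `sig` length check is left out because the patch does not state how `sig` is encoded.

```python
import json
from datetime import datetime
from typing import Optional

# Limits copied from the §4.3 table.
MAX_PIN_BYTES = 64 * 1024
MAX_EXTRA_ENTRIES = 32
MAX_EXTRA_KEY_BYTES = 128
MAX_EXTRA_VALUE_BYTES = 1024
MAX_VEC_DIM = 2 ** 20

# Reserved `extra` keys mapped to the failure mode reported on mismatch.
RESERVED = {
    "vectorpin.record_id": "RECORD_MISMATCH",
    "vectorpin.collection_id": "COLLECTION_MISMATCH",
    "vectorpin.tenant_id": "TENANT_MISMATCH",
}


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing "Z" is rewritten for older Pythons."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("timestamp lacks a UTC offset")
    return parsed


def utf8_len(text: str) -> int:
    # surrogatepass keeps lone surrogates from raising during measurement.
    return len(text.encode("utf-8", "surrogatepass"))


def precheck(raw: bytes, registry: dict, expected: Optional[dict] = None) -> str:
    """Return "OK" or a §5 failure-mode name; meant to run before signature checks."""
    # Step 0: total size is checked on the raw bytes, before any parsing.
    if len(raw) > MAX_PIN_BYTES:
        return "PARSE_ERROR"
    try:
        pin = json.loads(raw)
    except ValueError:  # covers JSON syntax errors and invalid UTF-8
        return "PARSE_ERROR"
    if not isinstance(pin, dict):
        return "PARSE_ERROR"

    # Remaining §4.3 limits need the parsed structure (assumption: extra may be absent).
    extra = pin.get("extra", {})
    if not isinstance(extra, dict) or len(extra) > MAX_EXTRA_ENTRIES:
        return "PARSE_ERROR"
    for key, value in extra.items():
        if not isinstance(value, str):
            return "PARSE_ERROR"
        if utf8_len(key) > MAX_EXTRA_KEY_BYTES or utf8_len(value) > MAX_EXTRA_VALUE_BYTES:
            return "PARSE_ERROR"
    vec_dim = pin.get("vec_dim")
    if not isinstance(vec_dim, int) or isinstance(vec_dim, bool) or vec_dim > MAX_VEC_DIM:
        return "PARSE_ERROR"

    # Step 1: protocol version.
    if pin.get("v") != 1:
        return "UNSUPPORTED_VERSION"

    # Step 2: key lookup, then the (valid_from, valid_until) window against `ts`.
    kid, ts_raw = pin.get("kid"), pin.get("ts")
    if not isinstance(kid, str) or not isinstance(ts_raw, str):
        return "PARSE_ERROR"
    if kid not in registry:
        return "UNKNOWN_KEY"
    try:
        ts = parse_ts(ts_raw)
    except ValueError:
        return "PARSE_ERROR"
    entry = registry[kid]
    valid_from, valid_until = entry.get("valid_from"), entry.get("valid_until")
    if valid_from is not None and ts < parse_ts(valid_from):
        return "KEY_EXPIRED"
    if valid_until is not None and ts > parse_ts(valid_until):
        return "KEY_EXPIRED"

    # Step 7: reserved keys are compared only when the caller supplied a value.
    for key, failure in RESERVED.items():
        if expected and key in expected and extra.get(key) != expected[key]:
            return failure
    return "OK"
```

A caller would pass the raw pin bytes exactly as read from storage, so that the 64 KiB limit applies to the serialized form rather than to a re-encoded copy. Signature verification (§5 step 3) and the hash comparisons (steps 4 to 6) would follow only when this returns `"OK"`.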