feat(schema): provider data-policy metadata fields #5
Merged
OriginalGary merged 2 commits into main on May 5, 2026
Adds trains_on_data, data_residency, retention_days, local, and e2ee to ProviderSchema (src/shared/validation/providerSchema.ts). All five fields are required and Zod-validated at module load time. Three cross-field invariants are enforced, each with an error message that names the offending provider:

1. local=true → trains_on_data must be false
2. e2ee=true → trains_on_data must be false
3. data_residency="multi" → retention_days must not be null

Provider audit: 13 providers confidently classified (anthropic, openai, and all 11 self-hosted local providers including searxng-search). The remaining 139 providers are assigned conservative defaults (trains_on_data=true, data_residency="unknown", retention_days=null) with per-provider TODO(sam) comments pointing to their policy URL. Anthropic ZDR is explicitly marked e2ee=false (contractual ≠ architectural).

22 new tests in provider-metadata-schema.test.ts cover: field acceptance, data_residency format validation, all three invariant violations with named-provider error messages, local provider bulk assertions, the Anthropic ZDR e2ee=false case, full registry load regression guard, and validateProviders throw-on-violation paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
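The three invariants in this commit message can be sketched in plain TypeScript. This is a minimal, dependency-free illustration and not the Zod schema from providerSchema.ts: the field names come from the commit message, while the `DataPolicy` interface, the `checkInvariants` and `validateAll` functions, and the abbreviated message wording are assumptions made for the example.

```typescript
// Hypothetical shape; only the five field names are taken from the commit message.
interface DataPolicy {
  trains_on_data: boolean;
  data_residency: string; // e.g. "US", "SG", "multi", "unknown"
  retention_days: number | null;
  local: boolean;
  e2ee: boolean;
}

// Returns one message per violated invariant, each naming the provider.
function checkInvariants(id: string, p: DataPolicy): string[] {
  const errors: string[] = [];
  if (p.local && p.trains_on_data) {
    errors.push(`Provider "${id}": local=true requires trains_on_data=false`);
  }
  if (p.e2ee && p.trains_on_data) {
    errors.push(`Provider "${id}": e2ee=true requires trains_on_data=false`);
  }
  if (p.data_residency === "multi" && p.retention_days === null) {
    errors.push(`Provider "${id}": data_residency="multi" requires retention_days to be specified`);
  }
  return errors;
}

// Collects violations across a whole registry and throws if there are any,
// which approximates a fail-at-load check.
function validateAll(registry: Record<string, DataPolicy>): void {
  let errors: string[] = [];
  for (const id of Object.keys(registry)) {
    errors = errors.concat(checkInvariants(id, registry[id]));
  }
  if (errors.length > 0) {
    throw new Error(errors.join("\n"));
  }
}
```

Returning a list of messages rather than throwing on the first violation lets a single load report every misconfigured provider at once; whether the real schema does this is not stated in the PR.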
CI Coverage Report
Coverage artifact was not available for this run.
PR Test Policy
This PR changes production code in |
… bedrock, vertex, tier-dependent)
Applies verified data-policy values to 10 providers previously at
conservative TODO defaults:
glm, glmt — Z.ai international, trains_on_data=false, data_residency=SG
glm-cn — Z.ai China endpoint, trains_on_data=false, data_residency=CN
azure-openai — trains_on_data=false, data_residency=multi, retention_days=30
azure-ai — same Azure AOAI data-privacy policy as azure-openai
bedrock — trains_on_data=false, data_residency=multi, retention_days=0
(zero-persistence architecture, not just contractual)
vertex — trains_on_data=false, data_residency=multi, retention_days=30
vertex-partner — same Vertex AI DPA covers partner models in Model Garden
Tier-dependent providers (gemini, codex, github) keep trains_on_data=true
but have their inline comments upgraded from TODO to Verified, with policy
source URLs and an explanation that the conservative default applies because
Graze cannot determine subscription tier from an API key or OAuth token.
github gets retention_days=28 and data_residency=US from its published policy.
Adds §5a Tier-dependent providers to GRAZE.md documenting the pattern,
Graze's stance, and the deferred override mechanism.
10 new test assertions: gemini.trains_on_data=true regression guard,
bedrock.retention_days=0, azure-openai invariant-3 path, glm/glm-cn/glmt
residency, vertex/vertex-partner, github tier-dependent fields, codex.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
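One detail in this commit is worth illustrating: bedrock is recorded with retention_days=0 and data_residency=multi, so invariant 3 has to test for null, not for a falsy value. The sketch below uses the trains_on_data, data_residency, and retention_days values stated in the commit message; the `PolicyEntry` type, the two helper functions, and the local and e2ee values are assumptions added for the example.

```typescript
// Hypothetical entry type for illustration.
interface PolicyEntry {
  trains_on_data: boolean;
  data_residency: string;
  retention_days: number | null;
  local: boolean; // assumed false for these hosted providers
  e2ee: boolean;  // assumed false; not stated in the commit message
}

const verified: Record<string, PolicyEntry> = {
  "azure-openai": { trains_on_data: false, data_residency: "multi", retention_days: 30, local: false, e2ee: false },
  bedrock: { trains_on_data: false, data_residency: "multi", retention_days: 0, local: false, e2ee: false },
  // Tier-dependent: the conservative trains_on_data=true default is kept.
  github: { trains_on_data: true, data_residency: "US", retention_days: 28, local: false, e2ee: false },
};

// Invariant 3 with an explicit null comparison: 0 days is a specified value.
function satisfiesInvariant3(p: PolicyEntry): boolean {
  return p.data_residency !== "multi" || p.retention_days !== null;
}

// A truthiness check treats 0 like null and would wrongly reject bedrock.
function naiveInvariant3(p: PolicyEntry): boolean {
  return p.data_residency !== "multi" || Boolean(p.retention_days);
}
```

With these entries, `satisfiesInvariant3` accepts all three providers, while `naiveInvariant3` rejects bedrock even though its retention period is specified.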
Summary
Five data-policy fields added to ProviderSchema and audited across all 152 providers, in two commits.
Schema touchpoints
Single schema layer extended: src/shared/validation/providerSchema.ts → src/shared/constants/providers.ts. The main RegistryEntry in providerRegistry.ts (operational call config) is unchanged; sensitivity routing will join on provider ID.
Invariants
1. local=true → trains_on_data=false
   Error: Provider "id": invariant violated — local=true requires trains_on_data=false
2. e2ee=true → trains_on_data=false
   Error: Provider "id": invariant violated — e2ee=true requires trains_on_data=false
3. data_residency="multi" → retention_days not null
   Error: Provider "id": invariant violated — data_residency="multi" requires retention_days to be specified
Audit results
Confidently classified: 23 providers (up from 13 in commit 1)
glmt classification:
glmt is NOT a separate aggregator. It is a preset variant (thinking mode + higher token budget + longer timeout) on the same Z.ai API endpoint (https://api.z.ai/api/anthropic/v1/messages) as glm. The registry entry confirms that baseUrl is identical. The same Z.ai Additional Terms §3.b apply, so glmt gets the same SG residency and trains_on_data=false as glm.
Tier-dependent providers (3):
gemini, codex, github: inline comments upgraded from TODO to Verified, but the conservative default (trains_on_data=true) is preserved. Override mechanism deferred to workstream 4.
Anthropic ZDR: confirmed e2ee=false. Contractual ≠ architectural.
gemini.trains_on_data confirmed true (assertion in test suite prevents regression).
TODO(sam): ~129 providers remain.
Sam-verify priority — remaining
(openrouter, laozhang, etc.): need contractual guarantee from aggregator, not upstream
(cursor, gitlab-duo, kimi-coding, claude consumer OAuth, etc.)
(watsonx, oci, sap, databricks, etc.)
Tier eligibility summary
Surprises for sensitivity-routing prompt
AGGREGATOR_PROVIDER_IDS Set (17 entries) and SELF_HOSTED_CHAT_PROVIDER_IDS Set (8 entries) already exist in providers.ts; sensitivity routing can use these for fast-path decisions
sdwebui and comfyui are local=true (tier-3) but are NOT in SELF_HOSTED_CHAT_PROVIDER_IDS (image-only)
Tests
32 assertions in tests/unit/provider-metadata-schema.test.ts:
data_residency format validation
gemini.trains_on_data=true, bedrock.retention_days=0, azure-openai invariant-3 path
🤖 Generated with Claude Code
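The note under "Surprises for sensitivity-routing prompt" implies that a fast path built only on SELF_HOSTED_CHAT_PROVIDER_IDS would miss image-only local providers such as sdwebui and comfyui. A minimal sketch of a classifier that also consults the local flag follows. The two set names come from the PR, but their contents here are small illustrative stand-ins (the "ollama" entry in particular is a placeholder), and the `Route` type, `LocalFlag` interface, and `classify` function are hypothetical.

```typescript
// Illustrative subsets only; the real sets in providers.ts have 17 and 8 entries.
const AGGREGATOR_PROVIDER_IDS: Set<string> = new Set(["openrouter", "laozhang"]);
const SELF_HOSTED_CHAT_PROVIDER_IDS: Set<string> = new Set(["ollama"]);

type Route = "local" | "aggregator" | "direct";

// Only the field needed for this decision; joined to the sets by provider ID.
interface LocalFlag {
  local: boolean;
}

function classify(id: string, metadata: Record<string, LocalFlag>): Route {
  const entry = metadata[id];
  // Check the metadata flag as well as the chat set, so image-only local
  // providers that are absent from the set are still treated as local.
  if (SELF_HOSTED_CHAT_PROVIDER_IDS.has(id) || (entry !== undefined && entry.local)) {
    return "local";
  }
  if (AGGREGATOR_PROVIDER_IDS.has(id)) {
    return "aggregator";
  }
  return "direct";
}
```

For example, `classify("sdwebui", { sdwebui: { local: true } })` returns "local" even though sdwebui is not in the chat set.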