Summary
The generated src/ucp_sdk/models/schemas/capability.py contains 8 nearly-identical Extends* classes (Extends, Extends2, Extends4, Extends6 are structurally identical strings; Extends1, Extends3, Extends5, Extends7 are structurally identical lists; their *Item siblings are also duplicated). This was flagged by the Gemini code review on #28.
The duplication harms readability and maintainability of the generated SDK surface.
Root cause
ucp/source/schemas/capability.json defines extends as an inline oneOf: [string, array<string>] inside $defs.base. preprocess_schemas.py::merge_all_of_to_node resolves the local $ref with copy.deepcopy and inlines the block into each of the 4 consumers (Base, PlatformSchema, BusinessSchema, ResponseSchema). distribute_properties_to_branches then propagates the oneOf into each variant.
By the time datamodel-codegen runs, the extends oneOf appears inline ~4× across the schema. datamodel-codegen mints a numbered class per anonymous inline subschema and never deduplicates structurally identical types unless told to — hence 4 string variants + 4 list variants (with companion *Item classes) = 11 generated symbols where there should be 1–2.
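The inlining effect can be reproduced in isolation. The sketch below is a toy illustration, not the real preprocess_schemas.py: the schema is a trimmed stand-in for capability.json, and inline_all_of is a hypothetical, simplified substitute for merge_all_of_to_node that only handles local $ref entries inside allOf.

```python
import copy

# Trimmed stand-in for capability.json: `extends` is an inline oneOf in $defs.base.
schema = {
    "$defs": {
        "base": {
            "type": "object",
            "properties": {
                "extends": {
                    "oneOf": [
                        {"type": "string"},
                        {"type": "array", "items": {"type": "string"}},
                    ]
                }
            },
        },
        "platform_schema": {"allOf": [{"$ref": "#/$defs/base"}]},
        "business_schema": {"allOf": [{"$ref": "#/$defs/base"}]},
        "response_schema": {"allOf": [{"$ref": "#/$defs/base"}]},
    }
}


def inline_all_of(root):
    """Hypothetical simplified merge: deep-copy each local $ref target into its consumer."""
    for node in root["$defs"].values():
        for part in node.pop("allOf", []):
            target = root["$defs"][part["$ref"].rsplit("/", 1)[-1]]
            merged = copy.deepcopy(target)
            node.setdefault("type", merged.get("type"))
            node.setdefault("properties", {}).update(merged.get("properties", {}))
    return root


def count_inline_one_of(node):
    """Count every inline oneOf block anywhere in the schema tree."""
    if isinstance(node, dict):
        own = 1 if "oneOf" in node else 0
        return own + sum(count_inline_one_of(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_inline_one_of(v) for v in node)
    return 0


before = count_inline_one_of(schema)
after = count_inline_one_of(inline_all_of(schema))
print(before, after)  # 1 4
```

One shared definition becomes four anonymous copies, and each copy is a separate inline subschema as far as the code generator is concerned.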
Proposed fixes
Quick win (recommended first)
Add --reuse-model to the datamodel-codegen invocation in generate_models.sh (~line 73). This is the official flag designed exactly for collapsing structurally identical types and applies repo-wide, so it will also dedup any other accidental duplicates created by the allOf flattening pass.
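As a rough illustration of where the flag goes, here is the invocation expressed as a Python argument list. The actual command in generate_models.sh is not reproduced here, so the function name, the paths, and the other options are placeholders; only --reuse-model is the change being proposed.

```python
def build_codegen_command(input_path, output_path, reuse_model=True):
    """Build a datamodel-codegen argument list (hypothetical helper, placeholder paths)."""
    cmd = [
        "datamodel-codegen",
        "--input", str(input_path),
        "--input-file-type", "jsonschema",
        "--output", str(output_path),
    ]
    if reuse_model:
        # Ask the generator to reuse one model for identical content
        # instead of minting a new numbered class each time.
        cmd.append("--reuse-model")
    return cmd


# To run it: subprocess.run(build_codegen_command(src, dst), check=True)
```

In the shell script itself the equivalent change is a single extra argument on the existing datamodel-codegen line.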
Architectural follow-up
Add a hoist_duplicate_subschemas() pass to preprocess_schemas.py, called after merge_all_of_to_node, that:
- Walks the merged schema and hashes anonymous subschemas.
- Lifts any subschema appearing >=2 times into a top-level $defs entry.
- Replaces consumers with $ref pointers.
This stops emitting redundant inline schemas at the source rather than relying on codegen heuristics.
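A minimal sketch of what such a pass could look like follows. It rests on several assumptions that are not part of the existing code: only subschemas containing oneOf or anyOf are treated as hoisting candidates (so trivial nodes like {"type": "string"} stay inline), structural identity is decided by canonical JSON, and the new $defs name is taken from the property key where the duplicate was first seen.

```python
import copy
import json

COMPOSITION_KEYS = ("oneOf", "anyOf")  # assumption: only hoist composed subschemas


def _fingerprint(node):
    """Canonical JSON string, so structurally identical subschemas compare equal."""
    return json.dumps(node, sort_keys=True)


def _is_candidate(node):
    return isinstance(node, dict) and any(k in node for k in COMPOSITION_KEYS)


def _children(node):
    """Yield (container, key, child) for each direct child of a dict or list."""
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, child in list(items):  # snapshot: callers may mutate the container
        yield node, key, child


def hoist_duplicate_subschemas(schema, min_count=2):
    """Lift repeated anonymous subschemas into $defs and point consumers at them."""
    defs = schema.setdefault("$defs", {})
    named = {id(v) for v in defs.values()}  # already-named entries stay in place

    # Pass 1: count how often each candidate subschema occurs.
    counts = {}
    stack = [schema]
    while stack:
        node = stack.pop()
        if not isinstance(node, (dict, list)):
            continue
        if _is_candidate(node) and id(node) not in named:
            fp = _fingerprint(node)
            counts[fp] = counts.get(fp, 0) + 1
        stack.extend(child for _, _, child in _children(node))

    # Pass 2: replace repeated candidates with a $ref to a single $defs entry.
    names = {}
    stack = [schema]
    while stack:
        node = stack.pop()
        if not isinstance(node, (dict, list)):
            continue
        for container, key, child in _children(node):
            if _is_candidate(child) and id(child) not in named:
                fp = _fingerprint(child)
                if counts.get(fp, 0) >= min_count:
                    if fp not in names:
                        base = key if isinstance(key, str) else "hoisted"
                        name, n = base, 1
                        while name in defs:  # avoid clobbering existing definitions
                            n += 1
                            name = f"{base}_{n}"
                        defs[name] = copy.deepcopy(child)
                        named.add(id(defs[name]))
                        names[fp] = name
                    container[key] = {"$ref": f"#/$defs/{names[fp]}"}
                    continue  # do not descend into a replaced node
            stack.append(child)
    return schema
```

Applied to a schema where four definitions each carry an inline copy of the extends oneOf, this leaves a single $defs entry named extends and four $ref pointers. Nested duplicates inside a hoisted subschema are not handled carefully here, and a real pass would need a naming scheme that matches the class names the SDK wants to expose.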
Verification
./generate_models.sh
grep -c "class Extends" src/ucp_sdk/models/schemas/capability.py
Should drop from 11 to 1-2.
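The grep count shows how many classes exist but not whether the remaining ones are still duplicates of each other. A stricter check, sketched below with the standard ast module, groups classes by the shape of their bases and body. The function name is hypothetical, and the sample source is only an assumed approximation of what the generator emits.

```python
import ast
from collections import defaultdict


def find_duplicate_classes(source, prefix="Extends"):
    """Return groups of class names (starting with `prefix`) that share bases and body."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name.startswith(prefix):
            # ast.dump omits line numbers by default, so equal code gives equal dumps.
            shape = "|".join(ast.dump(part) for part in [*node.bases, *node.body])
            groups[shape].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Assumed approximation of generated output; the real file may differ.
sample = '''
class Extends(RootModel[str]):
    root: str

class Extends1(RootModel[list[str]]):
    root: list[str]

class Extends2(RootModel[str]):
    root: str
'''

print(find_duplicate_classes(sample))  # [['Extends', 'Extends2']]
```

To check the real output, pass the text of src/ucp_sdk/models/schemas/capability.py as source; an empty list means no two Extends* classes share the same structure. Classes that differ only in docstrings would not be grouped by this check.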