
Codegen emits 8 duplicate Extends* classes in capability.py #36

@pjordan


Summary

The generated src/ucp_sdk/models/schemas/capability.py contains 8 nearly identical Extends* classes: Extends, Extends2, Extends4, and Extends6 are structurally identical string types, and Extends1, Extends3, Extends5, and Extends7 are structurally identical list types. Their *Item siblings are duplicated as well. This was flagged by the Gemini code review on #28:

"The duplication harms readability and maintainability of the generated SDK surface."

Root cause

ucp/source/schemas/capability.json defines extends as an inline oneOf: [string, array<string>] inside $defs.base. preprocess_schemas.py::merge_all_of_to_node resolves the local $ref with copy.deepcopy, inlining a separate copy of the block into each of the 4 consumers (Base, PlatformSchema, BusinessSchema, ResponseSchema). distribute_properties_to_branches then propagates the oneOf into each variant.

By the time datamodel-codegen runs, the extends oneOf appears inline ~4× across the schema. datamodel-codegen mints a numbered class for each anonymous inline subschema and, unless told otherwise, does not deduplicate structurally identical types. The result is 4 string variants + 4 list variants, plus their companion *Item classes: 11 generated symbols where there should be 1–2.
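A minimal, self-contained reproduction of the mechanism follows. It is a sketch, not the real preprocess_schemas.py code: the schema is a simplified stand-in for capability.json, and the helpers resolve_local_refs and count_inline_oneofs are hypothetical. It inlines the local $ref by deep copy into the four consumers, then counts structurally identical oneOf bodies by their canonical JSON form.

```python
import copy
import json
from collections import Counter

# Simplified stand-in for capability.json (an assumption, not the real file):
# $defs.base carries the inline `extends` oneOf and four consumers reference it.
EXTENDS_ONEOF = [{"type": "string"}, {"type": "array", "items": {"type": "string"}}]
CONSUMERS = ["Base", "PlatformSchema", "BusinessSchema", "ResponseSchema"]
SCHEMA = {
    "$defs": {
        "base": {"type": "object", "properties": {"extends": {"oneOf": EXTENDS_ONEOF}}},
        **{name: {"allOf": [{"$ref": "#/$defs/base"}]} for name in CONSUMERS},
    }
}


def resolve_local_refs(node, defs):
    """Replace each {"$ref": "#/$defs/<name>"} with a deep copy of its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            target = defs[ref.rsplit("/", 1)[-1]]
            return resolve_local_refs(copy.deepcopy(target), defs)
        return {k: resolve_local_refs(v, defs) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_local_refs(v, defs) for v in node]
    return node


def count_inline_oneofs(node, counter):
    """Tally every inline oneOf body by its canonical (key-sorted) JSON."""
    if isinstance(node, dict):
        if "oneOf" in node:
            counter[json.dumps(node["oneOf"], sort_keys=True)] += 1
        for v in node.values():
            count_inline_oneofs(v, counter)
    elif isinstance(node, list):
        for v in node:
            count_inline_oneofs(v, counter)
    return counter


defs = SCHEMA["$defs"]
counts = Counter()
for name in CONSUMERS:
    count_inline_oneofs(resolve_local_refs(defs[name], defs), counts)

# One distinct oneOf body, inlined once per consumer.
print(len(counts), max(counts.values()))  # → 1 4
```

Because each consumer receives its own deep copy, nothing in the resulting schema records that the four oneOf bodies came from one definition, which is consistent with datamodel-codegen emitting a separate numbered class for each.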

Proposed fixes

Quick win (recommended first)

Add --reuse-model to the datamodel-codegen invocation in generate_models.sh (~line 73). This is the flag datamodel-codegen provides for reusing one model wherever fields share the same content, and it applies repo-wide, so it will also deduplicate any other accidental duplicates created by the allOf flattening pass.

Architectural follow-up

Add a hoist_duplicate_subschemas() pass to preprocess_schemas.py, called after merge_all_of_to_node, that:

  1. Walks the merged schema and hashes anonymous subschemas.
  2. Lifts any subschema appearing >=2 times into a top-level $defs entry.
  3. Replaces consumers with $ref pointers.

This eliminates the redundant inline schemas at the source instead of relying on codegen heuristics to collapse them afterwards.
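A minimal sketch of such a pass follows, assuming the schema is plain dicts and lists as loaded by json. Everything in it is hypothetical: the real preprocess_schemas.py may structure its walk differently, and the candidate filter (only subschemas containing oneOf/anyOf), the naming rule (capitalize the nearest property name, fall back to Shared, add a numeric suffix on collision), and min_count are illustrative choices, not part of the proposal.

```python
import copy
import json
from collections import Counter
from typing import Any, Callable


def _fingerprint(node: Any) -> str:
    # Canonical JSON: key order no longer matters, so structurally
    # identical subschemas serialize to the same string.
    return json.dumps(node, sort_keys=True, separators=(",", ":"))


def _is_composite(node: dict) -> bool:
    # Assumption: only hoist composite subschemas. Hoisting leaves such as
    # {"type": "string"} would mostly mint extra alias models.
    return "oneOf" in node or "anyOf" in node


def hoist_duplicate_subschemas(
    schema: dict,
    is_candidate: Callable[[dict], bool] = _is_composite,
    min_count: int = 2,
) -> dict:
    """Hypothetical pass: lift repeated anonymous subschemas into $defs."""
    schema = copy.deepcopy(schema)
    defs = schema.setdefault("$defs", {})
    named = list(defs.items())  # snapshot of the original named definitions
    root = {k: v for k, v in schema.items() if k != "$defs"}
    counts: Counter = Counter()
    hoisted: dict = {}  # fingerprint -> new $defs name

    def name_for(fp: str, node: dict, hint: str) -> str:
        if fp not in hoisted:
            base = (hint[:1].upper() + hint[1:]) if hint else "Shared"
            name, n = base, 2
            while name in defs:  # never clobber an existing definition
                name, n = f"{base}{n}", n + 1
            defs[name] = copy.deepcopy(node)
            hoisted[fp] = name
        return hoisted[fp]

    def walk(node: Any, hint: str, check: bool, props_map: bool, replace: bool) -> Any:
        # check=False for named schemas and `properties` maps, which are not
        # anonymous subschemas themselves. hint tracks the nearest property name.
        if isinstance(node, list):
            return [walk(v, hint, True, False, replace) for v in node]
        if not isinstance(node, dict):
            return node
        if check and is_candidate(node):
            fp = _fingerprint(node)
            if not replace:
                counts[fp] += 1  # pass 1: count, then keep descending
            elif counts[fp] >= min_count:
                return {"$ref": f"#/$defs/{name_for(fp, node, hint)}"}
        out = {}
        for k, v in node.items():
            if props_map:  # v is the schema of property k
                out[k] = walk(v, k, True, False, replace)
            else:  # a `properties` value is a name -> schema map
                out[k] = walk(v, hint, k != "properties", k == "properties", replace)
        return out

    # Pass 1: count candidate subschemas below every named schema.
    walk(root, "", False, False, replace=False)
    for _, body in named:
        walk(body, "", False, False, replace=False)
    # Pass 2: replace each repeated candidate with a $ref to its hoisted copy.
    new_root = walk(root, "", False, False, replace=True)
    for name, body in named:
        defs[name] = walk(body, "", False, False, replace=True)
    return {**new_root, "$defs": defs}


# Usage: four consumers each holding an inlined copy of the same oneOf,
# roughly the shape merge_all_of_to_node leaves behind (simplified).
EXTENDS = {"oneOf": [{"type": "string"}, {"type": "array", "items": {"type": "string"}}]}
merged = {
    "$defs": {
        name: {"type": "object", "properties": {"extends": copy.deepcopy(EXTENDS)}}
        for name in ["Base", "PlatformSchema", "BusinessSchema", "ResponseSchema"]
    }
}
out = hoist_duplicate_subschemas(merged)
print(sorted(out["$defs"]))  # → ['Base', 'BusinessSchema', 'Extends', 'PlatformSchema', 'ResponseSchema']
print(out["$defs"]["Base"]["properties"]["extends"])  # → {'$ref': '#/$defs/Extends'}
```

Counting runs as a separate pass before any rewriting, because a single pass cannot know at the first occurrence that a later duplicate exists and would leave that first copy inline. Limiting candidates to composite subschemas keeps trivial leaves inline; whether that is the right threshold for this repo is a judgment call.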

Verification

./generate_models.sh
grep -c "class Extends" src/ucp_sdk/models/schemas/capability.py

The count should drop from 11 to 1–2.
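grep -c counts matching lines, which is enough for a before/after number. To also see which variants survive (for example, whether the string and list forms collapsed into one type or two), a small check with Python's standard ast module can list the class names. The function extends_classes and the SAMPLE source are hypothetical; SAMPLE is an illustrative guess at the general shape of datamodel-codegen output, not an excerpt from capability.py.

```python
import ast


def extends_classes(source: str) -> list[str]:
    """Return module-level class names starting with "Extends", in file order."""
    tree = ast.parse(source)  # parses only; the module is never executed
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.ClassDef) and node.name.startswith("Extends")
    ]


# Hypothetical excerpt for illustration; not the real generated file.
SAMPLE = '''
from pydantic import BaseModel, RootModel

class Extends(RootModel[str]):
    root: str

class ExtendsItem(RootModel[str]):
    root: str

class Extends1(RootModel[list[ExtendsItem]]):
    root: list[ExtendsItem]

class Base(BaseModel):
    extends: Extends | Extends1 | None = None
'''

print(extends_classes(SAMPLE))  # → ['Extends', 'ExtendsItem', 'Extends1']
```

Against the real output, something like extends_classes(Path("src/ucp_sdk/models/schemas/capability.py").read_text()) (with from pathlib import Path) should list the surviving names.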

