
fix: port docutray-node PR #20 (unwrap get() + jsonSchema) and add is_public to DocumentTypes create/update #19

@rarce

Description


Summary

Two related gaps in the DocumentTypes resource, surfaced while syncing the Python SDK with the latest changes in docutray/docutray-node:

  1. Port docutray-node#20: document_types.get() does not unwrap the { data } envelope, and the DocumentType model exposes the schema field under the wrong name (schema_ instead of jsonSchema). See Bug 1 and Bug 2.
  2. Close a gap in the docutray-node#18 port: the Node DocumentTypeCreateParams / DocumentTypeUpdateParams interfaces include isPublic, but the Python create() and update() methods do not expose is_public as a kwarg. See Bug 3.

Bug 1 — document_types.get() returns the raw {data} wrapper

Files: src/docutray/resources/document_types.py:127 (sync) and :400 (async)

# Current
return DocumentType.model_validate(response.json())

The backend returns { "data": {...DocumentType...} } from GET /api/document-types/{id} (the same envelope it uses for POST and PUT). Because the model sets model_config = ConfigDict(extra="allow"), validation does not raise: the wrapped object is accepted as an extra data attribute, and every typed field (id, name, codeType, …) silently falls back to its default. Consumers receive a DocumentType whose fields all look empty.

Fix: mirror the unwrap pattern already used by create() and update() in this same file:

payload = response.json()
return DocumentType.model_validate(payload.get("data", payload))

The .get("data", payload) form keeps the call defensive in case the backend ever stops wrapping the response, and decoding the body once avoids calling response.json() twice.
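Since the same unwrap appears in get(), create(), and update() (sync and async), it could be factored into one helper. This is only a suggestion: the name unwrap_data is hypothetical, and the SDK may reasonably keep the inline .get(...) form. The caller passes the already-decoded JSON; for dict payloads the result matches payload.get("data", payload), and a non-dict body (for example a JSON array) is returned unchanged instead of raising AttributeError.

```python
from typing import Any


def unwrap_data(payload: Any) -> Any:
    """Return payload["data"] for {"data": ...} envelopes, else payload unchanged.

    Hypothetical helper. For dict payloads this behaves like
    payload.get("data", payload); the isinstance check also lets
    non-dict bodies pass through instead of raising AttributeError.
    """
    if isinstance(payload, dict) and "data" in payload:
        return payload["data"]
    return payload


wrapped = {"data": {"id": "dt_1", "name": "Invoice"}}
print(unwrap_data(wrapped))         # {'id': 'dt_1', 'name': 'Invoice'}
print(unwrap_data({"id": "dt_1"}))  # {'id': 'dt_1'} (already unwrapped)
```

At a call site this would read: return DocumentType.model_validate(unwrap_data(response.json())).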

Bug 2 — DocumentType.schema_ is misnamed

File: src/docutray/types/document_type.py:46

schema_: dict[str, Any] | None = None
"""JSON schema for the document type (when retrieved by ID)."""

The backend returns this field as jsonSchema (verified in docutray/docutray-node#20 and consistent with the rest of the camelCase fields in the model: codeType, isDraft, createdAt). The current name schema_ does not exist on the wire, so the field is always None — the SDK can never surface the schema returned by GET /api/document-types/{id}.

Fix: rename to jsonSchema:

jsonSchema: dict[str, Any] | None = None
"""JSON Schema for the document type (returned by GET /api/document-types/{id})."""

This is a type-only breaking change, which pre-1.0 patch releases may carry under the existing project versioning policy. In practice the field was always None (no key on the wire is named schema_), so no consumer can have depended on a populated .schema_ value.

The docstring example in resources/document_types.py:124 (doc_type.schema_) must also be updated to doc_type.jsonSchema.
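The always-None field comes down to plain key matching. The stdlib-only sketch below mimics, in a very simplified way, what a model with extra="allow" does with an incoming payload: declared fields are filled by exact key name and everything else lands in the extras. mimic_validate is hypothetical and skips Pydantic's aliasing and type coercion; it only illustrates why a field declared as schema_ can never receive the wire key jsonSchema.

```python
from __future__ import annotations

from typing import Any


def mimic_validate(
    declared: dict[str, Any], payload: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Simplified stand-in for extra="allow" validation (hypothetical).

    Declared fields are matched by exact key name, falling back to their
    defaults; every other key is kept as an extra.
    """
    fields = {name: payload.get(name, default) for name, default in declared.items()}
    extras = {key: value for key, value in payload.items() if key not in declared}
    return fields, extras


# Unwrapped GET payload shaped like the backend response (illustrative values).
wire = {"id": "dt_1", "name": "Invoice", "jsonSchema": {"type": "object"}}

old_fields, old_extras = mimic_validate({"id": None, "name": None, "schema_": None}, wire)
new_fields, new_extras = mimic_validate({"id": None, "name": None, "jsonSchema": None}, wire)

print(old_fields["schema_"])     # None: no wire key is named "schema_"
print(old_extras)                # {'jsonSchema': {'type': 'object'}}
print(new_fields["jsonSchema"])  # {'type': 'object'}
```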

Bug 3 — Missing is_public parameter on create() and update()

Files:

  • src/docutray/resources/document_types.py: DocumentTypes.create (line 164), DocumentTypes.update (line 231), AsyncDocumentTypes.create (line 423), AsyncDocumentTypes.update (line 472)
  • src/docutray/_response.py: DocumentTypesWithRawResponse.create (line 833), .update (line 889), and the two async equivalents

The Node DocumentTypeCreateParams and DocumentTypeUpdateParams interfaces both expose isPublic?: boolean. The Python kwargs do not. The DocumentType response model already exposes isPublic: bool as a read field (types/document_type.py:31), so this is purely a request-side gap.

Fix: add an is_public: bool | None = None kwarg to all eight methods listed above, with the standard guard:

if is_public is not None:
    body["isPublic"] = is_public
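The guard uses is not None rather than truthiness because is_public=False is a meaningful request: a check like if is_public: would silently drop it and leave the backend default in place. A minimal sketch, using a hypothetical build_document_type_body helper that only handles name and is_public (the real create() and update() take more fields):

```python
from __future__ import annotations

from typing import Any


def build_document_type_body(
    *,
    name: str | None = None,
    is_public: bool | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the JSON body for create()/update()."""
    body: dict[str, Any] = {}
    if name is not None:
        body["name"] = name
    # Compare against None, not truthiness, so is_public=False is still sent.
    if is_public is not None:
        body["isPublic"] = is_public
    return body


print(build_document_type_body(is_public=False))  # {'isPublic': False}
print(build_document_type_body(name="Invoice"))   # {'name': 'Invoice'}
```

Omitting the key when is_public is None keeps update() a partial update: callers that do not pass the kwarg leave the stored visibility untouched.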

Acceptance criteria

  • client.document_types.get(id) returns a flat DocumentType with all fields populated from the API (sync + async).
  • DocumentType.jsonSchema replaces DocumentType.schema_. Docstring example updated.
  • client.document_types.create(...) and .update(...) accept is_public: bool | None = None (sync, async, and both raw-response wrappers: eight methods total).
  • Tests cover:
    • get() is mocked with the wire format {"data": {...}}, and the test asserts the unwrapped result, including jsonSchema.
    • create() and update() forwarding is_public=True/False to the request body as isPublic.
  • uv run pytest, uv run mypy src, uv run ruff check src, uv run ruff format src all pass.
  • CHANGELOG.md entry under a new patch version describes the runtime fix (get() unwrap), the type-only breaking change (schema_ → jsonSchema), and the new is_public kwarg.
  • Version bumped in both pyproject.toml and src/docutray/_version.py per project policy.
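The test criteria above can be exercised without a live backend by injecting the transport. Everything in this sketch is hypothetical: DocumentTypesSketch, the send(method, path, body) callable, and the POST path /api/document-types stand in for the SDK's real HTTP plumbing. The SDK's own tests would more likely mock at the HTTP-client level and assert on a DocumentType instance rather than a dict; the shape of the assertions is the part meant to carry over.

```python
from __future__ import annotations

from typing import Any, Callable
from unittest.mock import Mock


class DocumentTypesSketch:
    """Hypothetical stand-in for DocumentTypes with an injected transport."""

    def __init__(self, send: Callable[[str, str, dict[str, Any] | None], dict[str, Any]]) -> None:
        self._send = send

    def get(self, type_id: str) -> dict[str, Any]:
        payload = self._send("GET", f"/api/document-types/{type_id}", None)
        return payload.get("data", payload)  # unwrap the {"data": ...} envelope

    def create(self, *, name: str, is_public: bool | None = None) -> dict[str, Any]:
        body: dict[str, Any] = {"name": name}
        if is_public is not None:
            body["isPublic"] = is_public
        payload = self._send("POST", "/api/document-types", body)  # assumed path
        return payload.get("data", payload)


def test_get_unwraps_data_envelope() -> None:
    wire = {"data": {"id": "dt_1", "name": "Invoice", "jsonSchema": {"type": "object"}}}
    send = Mock(return_value=wire)
    result = DocumentTypesSketch(send).get("dt_1")
    assert result == wire["data"]
    assert result["jsonSchema"] == {"type": "object"}
    send.assert_called_once_with("GET", "/api/document-types/dt_1", None)


def test_create_forwards_is_public() -> None:
    for flag in (True, False):
        send = Mock(return_value={"data": {"id": "dt_2"}})
        DocumentTypesSketch(send).create(name="Invoice", is_public=flag)
        _method, _path, body = send.call_args.args
        assert body["isPublic"] is flag  # False must be forwarded, not dropped


def test_create_omits_is_public_when_none() -> None:
    send = Mock(return_value={"data": {"id": "dt_3"}})
    DocumentTypesSketch(send).create(name="Invoice")
    _method, _path, body = send.call_args.args
    assert "isPublic" not in body
```

pytest collects the test_ functions as usual when the file is named test_*.py.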

Out of scope (follow-up)

The Node PR #20 description flags an audit of other endpoints that may also be missing the {data} unwrap. In the Python SDK, KnowledgeBases.get and KnowledgeBaseDocuments.get already unwrap correctly. The remaining candidates are status-polling endpoints, which most likely are NOT wrapped — but should be confirmed against the backend in a separate issue:

  • src/docutray/resources/steps.py:110, 131, 227, 243 (Steps.get_status)
  • src/docutray/resources/convert.py:105, 172, 194, 284, 340, 356 (Convert result and status)
  • src/docutray/resources/identify.py:101, 159, 180, 261, 315, 331 (Identify result and status)

Metadata

    Labels

    bug (Something isn't working), priority:medium (Medium priority), sdk (SDK features and improvements)
