Summary
Two related gaps in the DocumentTypes resource, surfaced while syncing the Python SDK with the latest changes in docutray/docutray-node:
- Port docutray-node#20: the documentTypes.get() method does not unwrap the { data } envelope, and the DocumentType model exposes the schema field under the wrong name (schema_ instead of jsonSchema).
- Close a gap in the docutray-node#18 port: the Node DocumentTypeCreateParams / DocumentTypeUpdateParams interfaces include isPublic, but the Python create() and update() methods do not expose is_public as a kwarg.
Bug 1 — document_types.get() returns the raw {data} wrapper
Files: src/docutray/resources/document_types.py:127 (sync) and :400 (async)
# Current
return DocumentType.model_validate(response.json())
The backend returns { "data": {...DocumentType...} } from GET /api/document-types/{id} (same envelope as POST and PUT). With model_config = ConfigDict(extra="allow") on the model, validation does not raise — but every typed field (id, name, codeType, …) silently falls back to defaults because the actual fields are nested under data. Consumers receive a DocumentType whose fields all look empty.
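To make the failure mode concrete, here is a minimal stdlib-only sketch. LenientDocumentType is a hypothetical stand-in, not the SDK's Pydantic model: it only imitates the two properties described above (every typed field has a default, and unknown keys are kept rather than rejected). The payload values are made up.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class LenientDocumentType:
    """Hypothetical stand-in: all typed fields default to None, unknown keys are kept."""

    id: Optional[str] = None
    name: Optional[str] = None
    codeType: Optional[str] = None
    # Keys that match no typed field end up here instead of raising an error.
    extras: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "LenientDocumentType":
        known = {f.name for f in fields(cls)} - {"extras"}
        typed = {k: v for k, v in payload.items() if k in known}
        extras = {k: v for k, v in payload.items() if k not in known}
        return cls(**typed, extras=extras)


# Assumed wire shape, following the {"data": {...}} envelope described above.
wire = {"data": {"id": "dt_123", "name": "Invoice", "codeType": "invoice"}}

wrapped = LenientDocumentType.from_payload(wire)            # envelope passed as-is
unwrapped = LenientDocumentType.from_payload(wire["data"])  # envelope removed first

print(wrapped.id, sorted(wrapped.extras))
print(unwrapped.id, unwrapped.name)
```

In the wrapped case nothing fails: the whole envelope is stored as a single unknown key and every typed field keeps its default, which matches the "fields all look empty" symptom.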
Fix: mirror the unwrap pattern already used by create() and update() in this same file:
return DocumentType.model_validate(
    response.json().get("data", response.json())
)
The .get("data", response.json()) form keeps the call defensive in case the backend ever stops wrapping the response.
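The same defensive fallback can be sketched as a small helper that works on an already-parsed body, so the response is only decoded once. unwrap_data is a hypothetical name, not an existing SDK function, and the payloads below are illustrative.

```python
from typing import Any


def unwrap_data(payload: dict[str, Any]) -> Any:
    """Return the inner object when a {"data": ...} envelope is present."""
    # Same fallback as .get("data", response.json()), but the caller parses
    # the body once and passes the resulting dict in.
    return payload.get("data", payload)


# Hypothetical payloads standing in for response.json().
wrapped = {"data": {"id": "dt_123", "name": "Invoice"}}
bare = {"id": "dt_123", "name": "Invoice"}

# Both shapes resolve to the same flat object.
print(unwrap_data(wrapped) == unwrap_data(bare))
```

One caveat of this pattern: a bare payload that itself has a top-level "data" field would be unwrapped by mistake, so the fallback is only safe while DocumentType has no such field.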
Bug 2 — DocumentType.schema_ is misnamed
File: src/docutray/types/document_type.py:46
schema_: dict[str, Any] | None = None
"""JSON schema for the document type (when retrieved by ID)."""
The backend returns this field as jsonSchema (verified in docutray/docutray-node#20 and consistent with the rest of the camelCase fields in the model: codeType, isDraft, createdAt). The current name schema_ does not exist on the wire, so the field is always None — the SDK can never surface the schema returned by GET /api/document-types/{id}.
Fix: rename to jsonSchema:
jsonSchema: dict[str, Any] | None = None
"""JSON Schema for the document type (returned by GET /api/document-types/{id})."""
This is a type-only breaking change. Pre-1.0 patch releases are allowed to carry breaking changes per the existing project versioning policy. In practice the field was never populated: schema_ does not match the wire name and get() did not unwrap the envelope, so no consumer can be relying on .schema_ today.
The docstring example in resources/document_types.py:124 (doc_type.schema_) must also be updated to doc_type.jsonSchema.
Bug 3 — Missing is_public parameter on create() and update()
Files:
- src/docutray/resources/document_types.py: DocumentTypes.create (line 164), DocumentTypes.update (line 231), AsyncDocumentTypes.create (line 423), AsyncDocumentTypes.update (line 472)
- src/docutray/_response.py: DocumentTypesWithRawResponse.create (line 833), .update (line 889), and the two async equivalents
The Node DocumentTypeCreateParams and DocumentTypeUpdateParams interfaces both expose isPublic?: boolean. The Python kwargs do not. The DocumentType response model already exposes isPublic: bool as a read field (types/document_type.py:31), so this is purely a request-side gap.
Fix: add is_public: bool | None = None kwarg to all six methods, with the standard guard:
if is_public is not None:
    body["isPublic"] = is_public
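A minimal sketch of why the guard compares against None rather than testing truthiness: is_public=False must still reach the request body. build_body and its name parameter are hypothetical and only stand in for the body construction inside create() and update().

```python
from typing import Any, Optional


def build_body(name: str, is_public: Optional[bool] = None) -> dict[str, Any]:
    """Hypothetical helper mirroring the guard above; not the SDK's actual code."""
    body: dict[str, Any] = {"name": name}
    # "is not None" keeps an explicit False; a plain "if is_public:" would drop it.
    if is_public is not None:
        body["isPublic"] = is_public
    return body


print(build_body("Invoice"))
print(build_body("Invoice", is_public=False))
```

Omitting the kwarg leaves isPublic out of the body entirely, so the backend default applies, while an explicit False is forwarded as isPublic: false.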
Acceptance criteria
- client.document_types.get(id) returns a flat DocumentType with all fields populated from the API (sync + async).
- DocumentType.jsonSchema replaces DocumentType.schema_. Docstring example updated.
- client.document_types.create(...) and .update(...) accept is_public: bool | None = None (sync + async + raw response wrappers, six methods total).
- Test for get() mocked with the wire format {"data": {...}} that asserts the unwrapped result, including jsonSchema.
- Tests for create() and update() forwarding is_public=True/False to the request body as isPublic.
- uv run pytest, uv run mypy src, uv run ruff check src, uv run ruff format src all pass.
- CHANGELOG.md entry under a new patch version describes the runtime fix (get() unwrap), the type-only breaking change (schema_ → jsonSchema), and the new is_public kwarg.
- Version bumped in pyproject.toml and src/docutray/_version.py per project policy.
Out of scope (follow-up)
The Node PR #20 description flags an audit of other endpoints that may also be missing the {data} unwrap. In the Python SDK, KnowledgeBases.get and KnowledgeBaseDocuments.get already unwrap correctly. The remaining candidates are status-polling endpoints, which most likely are NOT wrapped, but this should be confirmed against the backend in a separate issue:
- src/docutray/resources/steps.py:110, 131, 227, 243: Steps.get_status
- src/docutray/resources/convert.py:105, 172, 194, 284, 340, 356: Convert result and status
- src/docutray/resources/identify.py:101, 159, 180, 261, 315, 331: Identify result and status
References
- … isPublic was added on the Node side)