
fix(describegpt): keep jsonschema examples valid against property type #3885

Merged
jqnatividad merged 1 commit into master from fix/describegpt-jsonschema-invalid-examples
May 22, 2026


@jqnatividad
Collaborator

Problem

describegpt --format jsonschema produced a schema that strict JSON Schema Draft 2020-12 validators rejected. The schema document itself passed the meta-schema, but its examples arrays contained values that did not validate against their own property's type:

```
[X Coordinate (State Plane)] examples value 'Other…'  -> not of type 'integer', 'null'
[Y Coordinate (State Plane)] examples value '(NULL)…' -> not of type 'integer', 'null'
[Latitude]                   examples value 'Other…'  -> not of type 'number', 'null'
[Longitude]                  examples value '(NULL)…' -> not of type 'number', 'null'
... (8 total on a NYC 311 sample)
```

Root cause

For numeric columns, the frequency aggregation-bucket rows — the "Other" long-tail bucket and the "(NULL)" bucket, rendered as Other… / (NULL)… — were landing in examples. coerce_value cannot parse those as a number, so it falls back to a JSON string, and that string leaked into an integer/number-typed property's examples array.
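The fallback described above can be pictured with a minimal Rust sketch. The `JsonValue` enum and this `coerce_value` body are hypothetical stand-ins (the real helper in formatters.rs is not shown in this PR); the sketch only assumes the behavior the PR describes: try numeric and boolean parses, otherwise fall back to a string.

```rust
/// Hypothetical, minimal stand-in for a JSON scalar (not serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum JsonValue {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Hypothetical sketch of a coerce_value-style helper: parse a raw cell as an
/// integer, then as a finite float, then as a boolean; anything else falls
/// back to a JSON string. That fallback is where "Other…" and "(NULL)…" land.
pub fn coerce_value(raw: &str) -> JsonValue {
    if let Ok(i) = raw.parse::<i64>() {
        return JsonValue::Int(i);
    }
    if let Ok(f) = raw.parse::<f64>() {
        // Rust's float parser accepts "NaN"/"inf", which JSON cannot encode.
        if f.is_finite() {
            return JsonValue::Float(f);
        }
    }
    match raw {
        "true" => JsonValue::Bool(true),
        "false" => JsonValue::Bool(false),
        // Fallback: the bucket sentinels become strings here.
        _ => JsonValue::Str(raw.to_string()),
    }
}
```

With this sketch, `coerce_value("42")` yields `JsonValue::Int(42)`, but `coerce_value("Other…")` yields `JsonValue::Str("Other…")`, which is not an instance of `integer` or `number`.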

The Draft 2020-12 validation spec recommends that each examples member be valid against the associated schema, and validators that enforce this recommendation rejected the output.

Fix

In src/cmd/describegpt/formatters.rs:

  • Add value_matches_json_type() — checks a coerced value against the property's declared JSON Schema scalar type.
  • build_property_schema filters examples through it, so a value is emitted only when it matches the property's type. Numeric/boolean properties drop the Other…/(NULL)… sentinels; string properties keep all their (type-valid) examples. The existing "skip examples if empty" guard handles the case where filtering removes everything.

enum/const were already safe — generate_code_based_dictionary only populates enumeration for columns with no bucket rows.
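The filtering step can be sketched in Rust as follows. `value_matches_json_type` is the name used in this PR, but the signature here is assumed; `JsonValue` and `filter_examples` are hypothetical stand-ins for the serde_json values and the inline logic in build_property_schema. Treat it as an illustration of the technique, not the merged code.

```rust
/// Hypothetical, minimal stand-in for a JSON scalar (not serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum JsonValue {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Sketch of a value_matches_json_type-style check against one JSON Schema
/// scalar type name (signature assumed). As in Draft 2020-12, "integer" here
/// also accepts a number with a zero fractional part, e.g. 3.0.
pub fn value_matches_json_type(value: &JsonValue, ty: &str) -> bool {
    match (ty, value) {
        ("null", JsonValue::Null) => true,
        ("boolean", JsonValue::Bool(_)) => true,
        ("string", JsonValue::Str(_)) => true,
        ("number", JsonValue::Int(_) | JsonValue::Float(_)) => true,
        ("integer", JsonValue::Int(_)) => true,
        ("integer", JsonValue::Float(f)) => f.is_finite() && f.fract() == 0.0,
        _ => false,
    }
}

/// Hypothetical helper: keep only candidates that satisfy at least one
/// declared type (e.g. ["integer", "null"]). Returns None when nothing
/// survives, mirroring the "skip examples if empty" guard.
pub fn filter_examples(candidates: Vec<JsonValue>, types: &[&str]) -> Option<Vec<JsonValue>> {
    let kept: Vec<JsonValue> = candidates
        .into_iter()
        .filter(|v| types.iter().any(|t| value_matches_json_type(v, t)))
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```

For an `["integer", "null"]` property with candidates `Int(1001043)`, `Str("Other…")` and `Str("(NULL)…")`, only `Int(1001043)` survives; a `["string", "null"]` property keeps every string candidate, sentinels included.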

Testing

  • New unit tests: jsonschema_drops_non_numeric_examples_from_numeric_property, jsonschema_keeps_examples_for_string_property — pass.
  • All 70 describegpt integration tests pass.
  • cargo clippy clean for the changed file; binary builds clean.

🤖 Generated with Claude Code

`describegpt --format jsonschema` emitted an `examples` array whose
members did not validate against their own property's `type`. For
numeric columns, the `frequency` aggregation-bucket rows (the "Other"
long-tail bucket and the "(NULL)" bucket, rendered as "Other…" /
"(NULL)…") landed in `examples`; `coerce_value` cannot parse those as a
number and falls back to a JSON string, so a string leaked into an
`integer`/`number`-typed property's `examples`.

The schema itself still passed the Draft 2020-12 meta-schema, but
validators that check `examples` against the property subschema (as
recommended by the spec) rejected it.

Add `value_matches_json_type` and filter `examples` through it in
`build_property_schema`, so a value is emitted only when it matches the
property's declared scalar `type`. Numeric/boolean properties drop the
bucket sentinels; string properties keep all their (type-valid)
examples. `enum`/`const` were already safe — `enumeration` is populated
only for columns with no bucket rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 4 complexity

Metric Results
Complexity 4



@jqnatividad jqnatividad merged commit 7cd58e7 into master May 22, 2026
18 checks passed
@jqnatividad jqnatividad deleted the fix/describegpt-jsonschema-invalid-examples branch May 22, 2026 02:11
