fix(describegpt): keep jsonschema examples valid against property type #3885
Merged
`describegpt --format jsonschema` emitted an `examples` array whose members did not validate against their own property's `type`. For numeric columns, the `frequency` aggregation-bucket rows (the "Other" long-tail bucket and the "(NULL)" bucket, rendered as "Other…" / "(NULL)…") landed in `examples`. `coerce_value` cannot parse those as a number and falls back to a JSON string, so a string leaked into an `integer`/`number`-typed property's `examples`. The schema itself still passed the Draft 2020-12 meta-schema, but validators that check `examples` against the property subschema (as the spec recommends) rejected it.

Add `value_matches_json_type` and filter `examples` through it in `build_property_schema`, so a value is emitted only when it matches the property's declared scalar `type`. Numeric/boolean properties drop the bucket sentinels; string properties keep all their (type-valid) examples. `enum`/`const` were already safe: `enumeration` is populated only for columns with no bucket rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
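The coerce-then-filter behaviour described above can be sketched in plain Rust. This is a minimal, std-only illustration under stated assumptions, not the code in `formatters.rs`: the `Value` enum stands in for `serde_json::Value`, the bodies and signatures of `coerce_value` and `value_matches_json_type` are guesses at the described behaviour, `examples_for_property` is a hypothetical helper, and the `Other…`/`(NULL)…` strings are placeholders for the real bucket labels.

```rust
// Std-only stand-in for serde_json::Value (assumption: keeps the sketch
// free of external crates). `Null` is unused below but kept for completeness.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Number(f64),
    String(String),
}

// Assumed coercion: parse the raw cell as the declared type and fall back to
// a JSON string on failure. This fallback is how bucket labels leaked through.
fn coerce_value(raw: &str, json_type: &str) -> Value {
    let fallback = || Value::String(raw.to_string());
    match json_type {
        "integer" => raw.parse::<i64>().map(Value::Integer).unwrap_or_else(|_| fallback()),
        "number" => raw.parse::<f64>().map(Value::Number).unwrap_or_else(|_| fallback()),
        "boolean" => raw.parse::<bool>().map(Value::Bool).unwrap_or_else(|_| fallback()),
        _ => fallback(),
    }
}

// Sketch of the type check: does a coerced value satisfy the property's
// declared JSON Schema scalar `type`? Anything else is rejected.
fn value_matches_json_type(value: &Value, json_type: &str) -> bool {
    match (json_type, value) {
        ("string", Value::String(_)) => true,
        ("boolean", Value::Bool(_)) => true,
        ("null", Value::Null) => true,
        ("integer", Value::Integer(_)) => true,
        // JSON Schema also treats a float with no fractional part as an integer.
        ("integer", Value::Number(n)) => n.fract() == 0.0,
        // Every integer is also a valid "number".
        ("number", Value::Integer(_) | Value::Number(_)) => true,
        _ => false,
    }
}

// Hypothetical helper mirroring the filtering in `build_property_schema`:
// keep only type-valid examples and return None when nothing survives, so the
// caller can skip emitting `examples` entirely.
fn examples_for_property(raw_examples: &[&str], json_type: &str) -> Option<Vec<Value>> {
    let kept: Vec<Value> = raw_examples
        .iter()
        .map(|raw| coerce_value(raw, json_type))
        .filter(|v| value_matches_json_type(v, json_type))
        .collect();
    if kept.is_empty() { None } else { Some(kept) }
}

fn main() {
    // Integer column whose frequency rows include the bucket sentinels.
    let ints = examples_for_property(&["42", "7", "Other…", "(NULL)…"], "integer");
    println!("{:?}", ints); // Some([Integer(42), Integer(7)])

    // String column: every example is already a valid string, so all are kept.
    let strs = examples_for_property(&["alpha", "Other…"], "string");
    println!("{:?}", strs);

    // Only sentinels: filtering empties the list, so `examples` is skipped.
    println!("{:?}", examples_for_property(&["Other…"], "number")); // None
}
```

Returning `None` when filtering removes everything mirrors the existing "skip `examples` if empty" guard: a numeric column whose only frequency rows are buckets gets no `examples` key rather than an empty array.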
Codacy: Up to standards ✅ 🟢 Issues

| Metric | Results |
|---|---|
| Complexity | 4 |
Problem
`describegpt --format jsonschema` produced a schema that failed JSON Schema Draft 2020-12 validation. The schema document itself passed the meta-schema, but its `examples` arrays contained values that did not validate against their own property's `type`.

Root cause
For numeric columns, the `frequency` aggregation-bucket rows (the "Other" long-tail bucket and the "(NULL)" bucket, rendered as `Other…`/`(NULL)…`) were landing in `examples`. `coerce_value` cannot parse those as a number, so it falls back to a JSON string, and that string leaked into an `integer`/`number`-typed property's `examples` array.

The spec recommends that each `examples` member validate against the schema, and validators that enforce this rejected the output.

Fix
In `src/cmd/describegpt/formatters.rs`:

- `value_matches_json_type()`: checks a coerced value against the property's declared JSON Schema scalar type.
- `build_property_schema` filters `examples` through it, so a value is emitted only when it matches the property's `type`. Numeric/boolean properties drop the `Other…`/`(NULL)…` sentinels; string properties keep all their (type-valid) examples. The existing "skip `examples` if empty" guard handles the case where filtering removes everything.

`enum`/`const` were already safe: `generate_code_based_dictionary` only populates `enumeration` for columns with no bucket rows.

Testing
- `jsonschema_drops_non_numeric_examples_from_numeric_property`, `jsonschema_keeps_examples_for_string_property`: pass.
- `describegpt` integration tests pass.
- `cargo clippy` clean for the changed file; binary builds clean.

🤖 Generated with Claude Code