fix(duckdb): use INTERSECT to find data-contract column overlap (#1065) #2

Open
barry0451 wants to merge 4 commits into main from fix/1065-csv-field-check

Conversation

@barry0451
Collaborator

Fixes datacontract#1065. The field_is_present check always passes for CSV/Parquet files because the current implementation compares row counts instead of checking actual column overlap. This PR uses SQL INTERSECT to detect which columns in the data contract are actually present in the source data.
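
For readers following along, the column-overlap idea can be sketched as below. This is an illustration only, not the code in this PR: it uses the standard-library `sqlite3` module as a stand-in for DuckDB so it runs without extra dependencies, and the table names `contract_cols` and `data_cols` as well as both helper functions are hypothetical.

```python
import sqlite3


def overlapping_columns(contract_columns, data_columns):
    """Return the column names present in both lists, using SQL INTERSECT.

    sqlite3 is only a stand-in for DuckDB here; the table names are made up.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contract_cols (name TEXT)")
        conn.execute("CREATE TABLE data_cols (name TEXT)")
        conn.executemany(
            "INSERT INTO contract_cols VALUES (?)", [(c,) for c in contract_columns]
        )
        conn.executemany(
            "INSERT INTO data_cols VALUES (?)", [(c,) for c in data_columns]
        )
        rows = conn.execute(
            "SELECT name FROM contract_cols "
            "INTERSECT SELECT name FROM data_cols ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


def missing_columns(contract_columns, data_columns):
    """Return contract columns that the data file does not provide."""
    present = set(overlapping_columns(contract_columns, data_columns))
    return [c for c in contract_columns if c not in present]
```

Calling `missing_columns(["id", "name", "email"], ["id", "name", "extra"])` reports `email` as the only contract column without a match in the data, which is the signal a row-count comparison cannot give.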

dependabot bot and others added 4 commits April 6, 2026 20:24
Bumps [uvicorn](https://github.com/Kludex/uvicorn) from 0.42.0 to 0.44.0.
- [Release notes](https://github.com/Kludex/uvicorn/releases)
- [Changelog](https://github.com/Kludex/uvicorn/blob/main/docs/release-notes.md)
- [Commits](Kludex/uvicorn@0.42.0...0.44.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-version: 0.44.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Detect and raise error for multi-branch Avro union types

The Avro importer now raises a DataContractException when a union
field contains multiple non-null types (e.g., ["null", "string", "int"]),
since ODCS does not support union types. Only the optional-field idiom
(["null", T]) remains supported.
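
A minimal sketch of the union rule described in this commit message, under stated assumptions: `UnsupportedUnionError` is a hypothetical stand-in for the importer's DataContractException, and `resolve_union` is an illustrative helper, not the importer's actual function.

```python
class UnsupportedUnionError(ValueError):
    """Hypothetical stand-in for the importer's DataContractException."""


def resolve_union(field_name, avro_type):
    """Return (type, is_optional) for an Avro field type.

    Only the optional-field idiom ["null", T] is accepted for unions;
    a union with more than one non-null branch raises an error.
    """
    if not isinstance(avro_type, list):
        # Not a union at all: a plain type is required by default.
        return avro_type, False

    non_null = [t for t in avro_type if t != "null"]
    if len(non_null) > 1:
        raise UnsupportedUnionError(
            f"Field '{field_name}' has a multi-branch union {avro_type}; "
            "only ['null', T] is supported."
        )
    if not non_null:
        # A union consisting solely of "null".
        return "null", True
    return non_null[0], "null" in avro_type
```

With this sketch, `resolve_union("age", ["null", "int"])` yields an optional `int`, while `["null", "string", "int"]` raises.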
…ds (datacontract#1065)

Before: INTERSECT query only selected columns present in BOTH contract and data,
so missing contract columns were silently ignored — field_is_present always passed.

After: SELECT explicitly lists contract schema columns by name. Missing data
columns become NULL, allowing field_is_present to properly detect them.
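
The "After" behavior can be sketched as a small query builder. The commit message does not say how the NULL columns are produced, so the `NULL AS <column>` projection, the `build_select` and `quote_ident` helpers, and the unquoted `source` argument are all assumptions made for illustration.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(contract_columns, data_columns, source):
    """Build a SELECT that lists every contract column by name.

    Columns missing from the data are projected as NULL so that a later
    presence check can see them. `source` is inserted as-is (assumed trusted).
    """
    available = set(data_columns)
    parts = []
    for col in contract_columns:
        quoted = quote_ident(col)
        if col in available:
            parts.append(quoted)
        else:
            # Missing in the data file: keep the column, filled with NULL.
            parts.append(f"NULL AS {quoted}")
    return f"SELECT {', '.join(parts)} FROM {source}"
```

Unlike an INTERSECT-only projection, the resulting table always carries every contract column, so a missing one is no longer silently dropped.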
When creating DuckDB tables from CSV/Parquet, use INTERSECT between
data file columns and contract schema columns to avoid BinderException
when optional columns are missing from the data. Only SELECT the
intersecting columns — missing optional columns become NULL in the
table and the field_is_present check correctly passes for optional
fields while failing for required ones.
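
The required-versus-optional outcome described here can be sketched as follows. This is not the real field_is_present check: `check_fields_present` is a hypothetical helper, and the contract is assumed to be reduced to a mapping from field name to a `required` flag.

```python
def check_fields_present(contract_fields, data_columns):
    """Evaluate a presence check per contract field.

    contract_fields maps field name -> required flag (True/False).
    A field passes if the data has the column, or if the field is optional.
    """
    available = set(data_columns)
    results = {}
    for name, required in contract_fields.items():
        if name in available or not required:
            results[name] = "passed"
        else:
            # Required by the contract but absent from the data file.
            results[name] = "failed"
    return results
```

For a data file that only has an `id` column, a required `email` field fails while an optional `nickname` field passes.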
Development

Successfully merging this pull request may close these issues.

Check field_is_present always passes for CSV/Parquet files

2 participants