
fix(schema): cast to VARCHAR before approx_distinct, add --verbose to… #7

Merged
SpollaL merged 1 commit into master from fix/schema-approx-distinct on May 5, 2026


SpollaL (Owner) commented May 5, 2026

… schema

DataFusion 43 reads Parquet with `schema_force_view_types=true`, which coerces `LargeUtf8` columns to `Utf8View`. The HyperLogLog (HLL) implementation behind `approx_distinct` does not support `Utf8View` or `Date32`: the planner accepts these types, but `.collect()` fails at execution time. Casting each column to `VARCHAR` before the aggregate handles every physical type. For string-like columns the cast is cheap, and dates and timestamps are rendered in their canonical string form; because that rendering is one-to-one, distinct counts remain correct.
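
A minimal sketch of the rewrite, assuming the schema command builds a SQL query over a registered table. `quote_ident`, `approx_distinct_sql`, `date32_to_iso` and `distinct_counts` are hypothetical helpers, not code from this PR. The date helper re-implements in plain Rust the `YYYY-MM-DD` form a `Date32` to `VARCHAR` cast is expected to produce, to illustrate why the cast neither merges nor splits distinct values.

```rust
use std::collections::HashSet;

/// Quote a name as a SQL identifier, doubling any embedded double quotes,
/// so mixed-case or unusual column names survive unchanged.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Hypothetical helper: one query estimating the distinct count of each
/// column. Every column is cast to VARCHAR first, so the HLL accumulator
/// behind approx_distinct only ever receives a string type.
fn approx_distinct_sql(table: &str, columns: &[&str]) -> String {
    let exprs: Vec<String> = columns
        .iter()
        .map(|c| {
            let col = quote_ident(c);
            format!("approx_distinct(CAST({col} AS VARCHAR)) AS {col}")
        })
        .collect();
    format!("SELECT {} FROM {}", exprs.join(", "), quote_ident(table))
}

/// Hypothetical helper: render a Date32 value (days since 1970-01-01) as
/// YYYY-MM-DD using Howard Hinnant's civil-from-days algorithm.
fn date32_to_iso(days: i32) -> String {
    let z = i64::from(days) + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year Gregorian cycles
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let d = doy - (153 * mp + 2) / 5 + 1; // day of month, [1, 31]
    let m = if mp < 10 { mp + 3 } else { mp - 9 }; // calendar month, [1, 12]
    let y = yoe + era * 400 + i64::from(m <= 2); // Jan/Feb belong to the next year
    format!("{y:04}-{m:02}-{d:02}")
}

/// Distinct count of the raw day numbers vs. of their string forms.
/// The two agree because the YYYY-MM-DD rendering is one-to-one.
fn distinct_counts(days: &[i32]) -> (usize, usize) {
    let raw: HashSet<i32> = days.iter().copied().collect();
    let text: HashSet<String> = days.iter().map(|&d| date32_to_iso(d)).collect();
    (raw.len(), text.len())
}
```

`distinct_counts` returns the same number twice for any input, which is the property the PR relies on when it says distinct counts remain correct after the cast.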

Also adds `--verbose` to the `schema` subcommand, consistent with the `validate` subcommand, so that the full anyhow error chain is printed on failure. Previously, the root cause was hidden behind the top-level context string ("Failed to collect results").
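
A std-only sketch of what `--verbose` changes. `Context` and `render_error` are hypothetical stand-ins: the real subcommand uses anyhow, whose `Error::chain()` iterator and alternate `{:#}` formatting walk the same `source()` chain. By default only the top-level context message is shown; with the flag, every underlying cause follows it.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical context layer, standing in for something like
/// anyhow's `.context("Failed to collect results")`.
#[derive(Debug)]
struct Context {
    msg: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display shows only this layer's message, not its causes.
        f.write_str(&self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Render an error for the CLI: the top-level message always, and each
/// cause from the `source()` chain only when `verbose` is set.
fn render_error(err: &(dyn Error + 'static), verbose: bool) -> String {
    let mut out = err.to_string();
    if verbose {
        let mut cur = err.source();
        while let Some(cause) = cur {
            out.push_str("\nCaused by: ");
            out.push_str(&cause.to_string());
            cur = cause.source();
        }
    }
    out
}
```

In the failure this PR addresses, the non-verbose output stops at "Failed to collect results", while the verbose output would also include the underlying DataFusion execution error.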

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SpollaL merged commit 81ef719 into master on May 5, 2026
1 check passed
SpollaL deleted the fix/schema-approx-distinct branch on May 5, 2026 at 12:32
