@aaraujo aaraujo commented Sep 17, 2025

When aggregation operations produce unqualified column names in their output schema, subsequent operations (like binary expressions) may still reference the original qualified names. This fix adds fallback resolution that attempts to match qualified column references to unqualified column names when the exact qualified match is not found.

This resolves errors like:
'No field named table.column. Valid fields are column'

that occur in expressions like avg(table.column) / 1024, where the aggregation produces an unqualified 'column' field but the division still references 'table.column'.

Includes test case demonstrating the issue and fix.

Which issue does this PR close?

This PR addresses a schema resolution issue discovered during integration testing. No existing issue was filed.

Rationale for this change

Currently, when an aggregation function produces an unqualified output schema (e.g., just "value" without a table qualifier), subsequent binary operations that reference the original qualified column name fail with a schema resolution error. This is a common pattern in SQL queries where aggregations are combined with arithmetic operations.

For example:

SELECT avg(metrics.value) / 1024 FROM metrics

The aggregation produces an unqualified "value" field, but the division operation still carries the qualified reference "metrics.value", causing the query to fail.

What changes are included in this PR?

  • Modified Expr::Column case in get_type() and nullable() methods in datafusion/expr/src/expr_schema.rs to add fallback resolution
  • When a qualified column reference is not found, attempts to resolve using just the column name without the qualifier
  • Added comprehensive test case test_qualified_column_after_aggregation that demonstrates the issue and validates the fix
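The fallback described above can be sketched in isolation. This is a minimal, standalone illustration, not the code in datafusion/expr/src/expr_schema.rs: `Schema`, `Field`, `Column`, `field_exact`, and `field_with_fallback` are hypothetical stand-ins, and restricting the fallback to unqualified fields is an assumption based on the description.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Float64,
}

#[derive(Debug, Clone)]
struct Column {
    relation: Option<String>, // table qualifier, e.g. "metrics"
    name: String,
}

#[derive(Debug, Clone)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
    nullable: bool,
}

struct Schema {
    fields: Vec<Field>,
}

/// Render "qualifier.name" or just "name" for error messages.
fn display_name(qualifier: &Option<String>, name: &str) -> String {
    match qualifier {
        Some(q) => format!("{q}.{name}"),
        None => name.to_string(),
    }
}

impl Schema {
    /// Strict lookup: qualifier and name must both match.
    fn field_exact(&self, col: &Column) -> Option<&Field> {
        self.fields
            .iter()
            .find(|f| f.qualifier == col.relation && f.name == col.name)
    }

    /// Strict lookup first; if a *qualified* reference is not found,
    /// retry by name alone against unqualified fields (assumed behavior).
    fn field_with_fallback(&self, col: &Column) -> Result<&Field, String> {
        if let Some(field) = self.field_exact(col) {
            return Ok(field);
        }
        if col.relation.is_some() {
            let unqualified = self
                .fields
                .iter()
                .find(|f| f.qualifier.is_none() && f.name == col.name);
            if let Some(field) = unqualified {
                return Ok(field);
            }
        }
        let valid: Vec<String> = self
            .fields
            .iter()
            .map(|f| display_name(&f.qualifier, &f.name))
            .collect();
        Err(format!(
            "No field named {}. Valid fields are {}",
            display_name(&col.relation, &col.name),
            valid.join(", ")
        ))
    }
}

fn main() {
    // Schema as it might look after aggregation: one unqualified field.
    let schema = Schema {
        fields: vec![Field {
            qualifier: None,
            name: "value".to_string(),
            data_type: DataType::Float64,
            nullable: true,
        }],
    };
    // The downstream expression still carries the qualified reference.
    let col = Column {
        relation: Some("metrics".to_string()),
        name: "value".to_string(),
    };
    assert!(schema.field_exact(&col).is_none());
    let field = schema.field_with_fallback(&col).expect("fallback should resolve");
    println!("{:?} nullable={}", field.data_type, field.nullable);
}
```

Trying the exact match first keeps existing resolution unchanged; the name-only retry runs only in the case that previously returned an error.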

Are these changes tested?

Yes, includes a new test case that:

  • Creates a schema simulating aggregation output (unqualified fields)
  • Tests resolution of qualified column references against this schema
  • Validates both direct column access and binary expressions
  • Verifies both data type and nullability resolution
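The shape of such a test can be approximated with a standalone sketch. It does not reproduce test_qualified_column_after_aggregation: the `Expr`, `Field`, `resolve`, `get_type`, and `nullable` items below are hypothetical, and the type coercion and nullability rules for division are simplifying assumptions.

```rust
// Hypothetical miniature of expression typing; not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

struct Field {
    qualifier: Option<&'static str>,
    name: &'static str,
    data_type: DataType,
    nullable: bool,
}

enum Expr {
    Column { relation: Option<&'static str>, name: &'static str },
    Literal(i64),
    Divide(Box<Expr>, Box<Expr>),
}

/// Exact match first, then a name-only match against unqualified fields
/// when the reference itself was qualified.
fn resolve<'a>(schema: &'a [Field], relation: Option<&str>, name: &str) -> Option<&'a Field> {
    schema
        .iter()
        .find(|f| f.qualifier == relation && f.name == name)
        .or_else(|| {
            relation.and_then(|_| {
                schema.iter().find(|f| f.qualifier.is_none() && f.name == name)
            })
        })
}

fn get_type(expr: &Expr, schema: &[Field]) -> Result<DataType, String> {
    match expr {
        Expr::Column { relation, name } => resolve(schema, *relation, name)
            .map(|f| f.data_type)
            .ok_or_else(|| format!("No field named {name}")),
        Expr::Literal(_) => Ok(DataType::Int64),
        Expr::Divide(left, right) => {
            let (l, r) = (get_type(left, schema)?, get_type(right, schema)?);
            // Assumed coercion: any Float64 operand makes the result Float64.
            if l == DataType::Float64 || r == DataType::Float64 {
                Ok(DataType::Float64)
            } else {
                Ok(DataType::Int64)
            }
        }
    }
}

fn nullable(expr: &Expr, schema: &[Field]) -> Result<bool, String> {
    match expr {
        Expr::Column { relation, name } => resolve(schema, *relation, name)
            .map(|f| f.nullable)
            .ok_or_else(|| format!("No field named {name}")),
        Expr::Literal(_) => Ok(false),
        // Assumed rule: the result is nullable if either operand is.
        Expr::Divide(left, right) => Ok(nullable(left, schema)? || nullable(right, schema)?),
    }
}

fn main() {
    // Schema simulating aggregation output: a single unqualified field.
    let schema = [Field {
        qualifier: None,
        name: "value",
        data_type: DataType::Float64,
        nullable: true,
    }];
    // Binary expression that still references metrics.value.
    let expr = Expr::Divide(
        Box::new(Expr::Column { relation: Some("metrics"), name: "value" }),
        Box::new(Expr::Literal(1024)),
    );
    assert_eq!(get_type(&expr, &schema), Ok(DataType::Float64));
    assert_eq!(nullable(&expr, &schema), Ok(true));
}
```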

Are there any user-facing changes?

This is a bug fix that makes previously failing queries work correctly. No breaking changes to existing functionality.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 17, 2025
@aaraujo aaraujo closed this Sep 17, 2025