Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple instances of pg_typeof inside a GROUP BY clause result in a bogus "column does not exist" error #8302

Closed
antiguru opened this issue Sep 16, 2021 · 6 comments · Fixed by #10202
Assignees
Labels
A-sql Area: SQL planning C-bug Category: something is broken D-good second issue Difficulty: Good for a newcomer who's warmed up

Comments

@antiguru
Copy link
Member

What version of Materialize are you using?

materialized v0.9.5-dev 

How did you install Materialize?

Built from source

What was the issue?

materialize=> SELECT pg_typeof(operator), pg_typeof(worker), pg_typeof(count) FROM mz_arrangement_sharing;
Time: 5.231 ms
materialize=> SELECT pg_typeof(operator), pg_typeof(worker), pg_typeof(count) FROM mz_arrangement_sharing group by 1, 2, 3;
ERROR:  column "worker" does not exist
Time: 1.009 ms
@antiguru antiguru added the C-bug Category: something is broken label Sep 16, 2021
@philip-stoev philip-stoev changed the title Group-by with column numbers fails to locate columns Multiple instances of pg_typeof inside a GROUP BY clause result in a bogus "column does not exist" error Sep 16, 2021
@philip-stoev
Copy link
Contributor

This issue is not specific to using GROUP BY with column numbers, it also happens with actual expressions in the GROUP BY column.

However, it seems to require the following to reproduce:

  • the use of the pg_typeof function
  • the GROUP BY should have two or more instances of that function.
materialize=> create table t1 (f1 integer, f2 integer);
CREATE TABLE

This fails:

materialize=> select pg_typeof(f1), pg_typeof(f2) from t1 group by pg_typeof(f1), pg_typeof(f2);
ERROR:  column "f2" does not exist

But all of those work:

  1. One instance of pg_typeof works:
materialize=> select pg_typeof(f1), abs(f2) from t1 group by pg_typeof(f1), abs(f2);
 pg_typeof | abs 
-----------+-----
(0 rows)

materialize=> select abs(f1), pg_typeof(f2) from t1 group by abs(f1), pg_typeof(f2);
 abs | pg_typeof 
-----+-----------
(0 rows)
  1. Use of other functions apart from pg_typeof also works:
materialize=> select abs(f1), abs(f2) from t1 group by abs(f1), abs(f2);
 abs | abs 
-----+-----
(0 rows)

@philip-stoev philip-stoev added go= A-sql Area: SQL planning D-good second issue Difficulty: Good for a newcomer who's warmed up and removed go= labels Sep 16, 2021
@umanwizard
Copy link
Contributor

@pH14 is taking a look

@pH14
Copy link
Contributor

pH14 commented Jan 21, 2022

What I'm seeing so far --

When we're computing the GROUP BY plan, we intentionally drop repeated expressions:

for group_expr in group_by {
let (group_expr, expr) = plan_group_by_expr(ecx, group_expr, &projection)?;
let new_column = group_key.len();
// Repeated expressions in GROUP BY confuse name resolution later,
// and dropping them doesn't change the result.
if group_exprs
.iter()
.find(|existing_expr| **existing_expr == expr)
.is_none()
{

This causes us to omit generating and storing the ScopeItem for all but the first pg_typeof expression, later causing an error when we try to resolve the expression for each projection. The first lookup succeeds, but any subsequent one will fail and error out, since they have no corresponding expression:

ExpandedSelectItem::Expr(expr) => plan_expr(ecx, &expr)
.map_err(check_ungrouped_col)?
.type_as_any(ecx)?,

The difference in behavior between pg_typeof and other functions like abs comes from the expression types they generate. pg_typeof becomes a Literal: Literal(Row{[String("bigint")]}, ColumnType { scalar_type: String, nullable: false }), whereas an abs becomes a function call CallUnary { func: AbsInt32(AbsInt32), expr: Column(ColumnRef { level: 0, column: 0 }) } that contains a column reference. The column references are different for each column, so they aren't deduped, and therefore the projection <-> expr mapping works as intended.

I tried a few quick naive fixes here, like removing the dedupping entirely, but those ran amuck of several assumptions about the output from the current GROUP BY handling. Will revisit this one next week

@umanwizard
Copy link
Contributor

So, the issue is that pg_typeof(...) gets planned as a constant here , so repeated pg_typeof are considered the same expression, even though they don't have the same name...

Does this happen when grouping by any constant expressions multiple times?

@umanwizard
Copy link
Contributor

(I'm not actually sure how to construct another example of expressions in SQL that are planned as constants and have the same constant value but different names.)

@umanwizard
Copy link
Contributor

umanwizard commented Jan 21, 2022

Okay list_ndims also has this property (requires experimental mode):

materialize=> select list_ndims(column2) from (values (list[1], list[2])) group by list_ndims(column1), list_ndims(column2);
ERROR:  column "column2" must appear in the GROUP BY clause or be used in an aggregate function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql Area: SQL planning C-bug Category: something is broken D-good second issue Difficulty: Good for a newcomer who's warmed up
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants