Skip to content

Conversation

@JasonLi-cn
Copy link
Contributor

Which issue does this PR close?

Closes #12183

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Aug 27, 2024
@jayzhan211
Copy link
Contributor

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

@JasonLi-cn
Copy link
Contributor Author

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

@jayzhan211
Copy link
Contributor

jayzhan211 commented Aug 27, 2024

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

That is why I think the root cause might be the incorrect schema we have.

SELECT "test.a" FROM (SELECT a AS "test.a" FROM test)

The expected schema should be something like.

DFSchema { inner: Schema { fields: [Field { name: "test.a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.

a is aliased to "test.a". The column should be test."test.a"

Upd:

query I
SELECT a FROM (SELECT a FROM test)
----
1

This is the schema for the query without alias

schema: DFSchema { inner: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }
query I
SELECT b FROM (SELECT a as b FROM test)
----
1

This is the schema with alias

schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }

field_qualifiers is empty for alias case. Does that mean we didn't preserve field_qualifiers for alias. 🤔

field_qualifiers: [Some(Bare { table: "test" })]

@JasonLi-cn
Copy link
Contributor Author

The plan looks like aliasing to the same name after optimize_projections

    Projection: test.a AS test.a
      TableScan: test projection=[a]

I wonder could we even remove the alias here, since the name is the same

Removing the Alias will cause a schema change. 🤔

That is why I think the root cause might be the incorrect schema we have.

SELECT "test.a" FROM (SELECT a AS "test.a" FROM test)

The expected schema should be something like.

DFSchema { inner: Schema { fields: [Field { name: "test.a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.

a is aliased to "test.a". The column should be test."test.a"

Upd:

query I
SELECT a FROM (SELECT a FROM test)
----
1

This is the schema for the query without alias

schema: DFSchema { inner: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }
query I
SELECT b FROM (SELECT a as b FROM test)
----
1

This is the schema with alias

schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }

field_qualifiers is empty for alias case. Does that mean we didn't preserve field_qualifiers for alias. 🤔

field_qualifiers: [Some(Bare { table: "test" })]

Yes. Possibly because the b field itself is not part of the test table? 🤔
If we make the following changes to the SQL, the b field has field_qualifiers: [Some(Bare { table: "t0" })]

query I
SELECT b FROM (SELECT a as b FROM test) t0
----
1
schema: DFSchema { inner: Schema { fields: [Field { name: "b", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "t0" })], functional_dependencies: FunctionalDependencies { deps: [] } }

@findepi
Copy link
Member

findepi commented Aug 27, 2024

I wonder could we even remove the alias here, since the name is the same

due to my subjective recency bias, #1468 seems related.
would it be easier if output schema (column names) were explicit?

@jayzhan211
Copy link
Contributor

I wonder could we even remove the alias here, since the name is the same

due to my subjective recency bias, #1468 seems related. would it be easier if output schema (column names) were explicit?

Not pretty sure, I think they are unrelated 🤔

I think the issue here is that we lost the qualifier for alias in subquery case

@jonahgao
Copy link
Member

jonahgao commented Aug 27, 2024

It looks like we need to save the qualifiers in the NamePreserver 🤔, otherwise, we won't be able to distinguish between the unqualified column "test.a" and the qualified column test.a. Their schema_names are the same.

@jayzhan211
Copy link
Contributor

Should we get Column's relation for Alias in to_field?

            Expr::Alias(Alias { expr, name, .. }) => {
                let relation = match expr.as_ref() {
                    Expr::Column(c) => c.relation.to_owned(),
                    _ => None
                };

                let (data_type, nullable) = self.data_type_and_nullable(input_schema)?;
                Ok((
                    relation,
                    Field::new(name, data_type, nullable)
                        .with_metadata(self.metadata(input_schema)?)
                        .into(),
                ))
            }

@jonahgao
Copy link
Member

Should we get Column's relation for Alias in to_field?

I think we should directly use the relation from the Alias itself, as that is the purpose of an Alias expression. After an expression is rewritten, by adding an Alias we can ensure that the result of to_field remains the same, including the qualifier and name, thus maintaining the schema unchanged.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2024

Here is an alternate implementation proposed by @jonahgao : #12341

@JasonLi-cn
Copy link
Contributor Author

Here is an alternate implementation proposed by @jonahgao : #12341

thanks @jonahgao @jayzhan211 @alamb

@JasonLi-cn JasonLi-cn closed this Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug triggered by special aliases in nested queries

5 participants