[CALCITE-3789] Support validation of UNNEST multiple array columns like Presto #1811

Closed
my7ym wants to merge 1 commit into apache:master from my7ym:miao/alias-unnest-multiple-array-columns
@my7ym my7ym commented Feb 16, 2020

No description provided.

}

private boolean allowFlattenStruct(SqlOperatorBinding operatorBinding) {
if (!(operatorBinding instanceof SqlCallBinding)) {
Contributor

Rename allowFlattenStruct -> allowAliasUnnestArrayColumns?

Contributor

Another operator binding class is RexCallBinding; why do we return true for that?

Contributor Author

Great name. Much more concise. Will do.

Contributor Author

> Another operator binding class is RexCallBinding; why do we return true for that?

This preserves the original behavior. There are many SqlOperatorBinding implementations, and originally UNNEST ALWAYS flattens the struct. The logic here is that we disallow flattening ONLY when the binding is a SqlCallBinding && conformance.allowAliasUnnestArrayColumns() returns true.
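
The rule described in this reply can be sketched as follows. This is a minimal illustration, not the code in the PR: `OperatorBindingStub`, `CallBindingStub`, `OtherBindingStub`, and `ConformanceStub` are hypothetical stand-ins for SqlOperatorBinding, SqlCallBinding, any other binding (such as RexCallBinding), and SqlConformance.

```java
/** Hypothetical stand-in for Calcite's SqlOperatorBinding. */
interface OperatorBindingStub {}

/** Hypothetical stand-in for SqlConformance, reduced to the one flag discussed here. */
interface ConformanceStub {
  boolean allowAliasUnnestArrayColumns();
}

/** Hypothetical stand-in for SqlCallBinding: the binding used during validation. */
final class CallBindingStub implements OperatorBindingStub {
  final ConformanceStub conformance;

  CallBindingStub(ConformanceStub conformance) {
    this.conformance = conformance;
  }
}

/** Hypothetical stand-in for any other binding, such as RexCallBinding. */
final class OtherBindingStub implements OperatorBindingStub {}

final class UnnestFlattenRule {
  private UnnestFlattenRule() {}

  /** Returns whether UNNEST should flatten a struct item type into its fields. */
  static boolean allowFlattenStruct(OperatorBindingStub binding) {
    if (!(binding instanceof CallBindingStub)) {
      // Every non-validation binding keeps the original behavior: always flatten.
      return true;
    }
    ConformanceStub conformance = ((CallBindingStub) binding).conformance;
    // During validation, flattening is disabled only when the conformance allows aliases.
    return !conformance.allowAliasUnnestArrayColumns();
  }
}
```

Written this way, the method is simply the negation of the conformance flag for validation bindings, which is what the naming discussion in this thread is about.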

Contributor Author

@danny0405 On second thought about the function name: what the function actually returns is whether to disallow flattening the struct in UNNEST. So maybe disallowFlattenStructInUnnest? Let me know. Thanks.

Contributor

I know you only want to support SqlCallBinding for unnest alias columns, but the logic seems odd. Can you change
allowFlattenStruct => allowAliasUnnestColumns?

Contributor Author

I think it looks even more confusing. If you read the code carefully enough, it is NOT allowAliasUnnestColumns. If you prefer the name change, I think disallowAliasUnnestColumn is more appropriate.

Contributor

Couldn’t you use !allowAliasUnnestColumns ?

Contributor Author

That's a good idea.

Contributor Author

Good idea. Will do

prefix.names,
nameMatcher,
validator.getConformance().allowAliasUnnestArrayColumns(),
resolved);
Contributor

The code may work for your case, but you have changed the common SQL identifier resolution logic based on a special conformance, which does not seem like the right way to go.

The conformance switch logic should always live in SqlValidator; we can pass switch flags into the scope if necessary.

Contributor Author

Sorry, I am a little confused here. I am using the conformance directly as the flag. I feel that is more concise than an explicit flag like

SqlValidatorImpl.java::
boolean allowAliasUnnestArrays = validator.getConformance().allowAliasUnnestArrayColumns();
DelegatingScope.setAllowAliasUnnestArrays(allowAliasUnnestArrays);

DelegatingScope.java::
Just replace validator.getConformance().allowAliasUnnestArrayColumns() with the explicit flag.

Also, a scope consulting the conformance has precedent, e.g. OrderByScope.
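
For comparison, the "explicit flag" alternative under discussion could look roughly like the sketch below. All three types are hypothetical stand-ins (for SqlConformance, a validator scope, and the validator), not Calcite classes; the point is only that the scope receives a plain boolean and the validator is the single place that reads the conformance.

```java
/** Hypothetical stand-in for SqlConformance. */
interface ConformanceFlagSource {
  boolean allowAliasUnnestArrayColumns();
}

/** Hypothetical scope: it sees only a plain boolean, never the conformance. */
final class FlaggedScope {
  private final boolean allowAliasUnnestArrays;

  FlaggedScope(boolean allowAliasUnnestArrays) {
    this.allowAliasUnnestArrays = allowAliasUnnestArrays;
  }

  boolean allowAliasUnnestArrays() {
    return allowAliasUnnestArrays;
  }
}

/** Hypothetical validator: the single place where the conformance is consulted. */
final class FlagPassingValidator {
  private final ConformanceFlagSource conformance;

  FlagPassingValidator(ConformanceFlagSource conformance) {
    this.conformance = conformance;
  }

  /** Creates a scope, translating the conformance setting into a scope-level flag. */
  FlaggedScope newScope() {
    return new FlaggedScope(conformance.allowAliasUnnestArrayColumns());
  }
}
```

The trade-off is one extra field per scope in exchange for keeping dialect decisions out of name resolution.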

Contributor

The current fix does not seem right. The nested field is visible if we set the record type's kind to StructKind.PEEK_FIELDS. For the test case you gave:

sql("select e.ENAME\n"
        + "from dept_nested as d CROSS JOIN\n"
        + " UNNEST(d.employees) as t(e)")
      .withConformance(SqlConformanceEnum.PRESTO).columnType("VARCHAR(10) NOT NULL");

Here the table t(e) has a record type with e as a field, so e is visible, and its nested fields become visible if you change the nested employee type as in this snippet:

final RelDataType empRecordType = typeFactory.builder()
      .add("EMPNO", intType)
      .add("ENAME", varchar10Type)
      .add("DETAIL", typeFactory.builder()
          .add("SKILLS", array(skillRecordType)).build())
      .kind(StructKind.PEEK_FIELDS) // here is the line you can tweak
      .build();

So, overall, this line change is not necessary; field access should not be controlled by the SqlConformance.

Contributor Author

Makes sense. Will do.

SQL_SERVER_2008,

PRESTO;

Contributor

PRESTO is not really a database; it behaves more like a computation engine. I'm not sure we should add a new conformance for it.

Contributor Author

That's right. I am also debating this. I could delete it or rename it. But I think the conformance only matters for SQL string validation, so whether or not it is a database should not matter much. It's your call.

Contributor

Presto has its own SQL parser & validator & type system, so it counts as a 'dialect' (conformance) for these purposes.

@my7ym my7ym left a comment

Thanks @danny0405 for the timely and detailed review!

my7ym commented Feb 19, 2020

@danny0405 Just a kind reminder: could you please take another look when you are available? Thanks a lot!

my7ym commented Feb 25, 2020

@danny0405 Could you help take another look at this PR? Thanks!

danny0405 commented Feb 25, 2020

I'm planning to review again this weekend. I'm very busy with the release work for Calcite 1.22 and with my own company work, sorry about that. As Julian said, adding a new SqlConformance is a big change; by "big" I mean that we should make sure all the changes are correct.

my7ym commented Feb 26, 2020

Makes sense. I just wanted to send a reminder and get a rough timeline. Thanks @danny0405

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch 2 times, most recently from f25f91d to 5aba5c4 on February 26, 2020 05:19

my7ym commented Mar 4, 2020

@danny0405 @julianhyde Just a gentle reminder about this PR. No rush. Thanks!

@danny0405

Reviewing now, may take some time ~

@danny0405

Thanks @my7ym, could you rebase your code to remove the unnecessary changes?

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch 2 times, most recently from 999fe1b to d48b2f1 on March 10, 2020 21:26

@danny0405

We should also add a test case for at least one SqlToRelConverterTest.

my7ym commented Mar 11, 2020

I will do it in the future if needed. I am still unclear on how this should be represented as a RelNode (discussed in https://issues.apache.org/jira/browse/CALCITE-3787).

Is that OK with you?

EDIT: Some other reasons:

  1. We still plan to communicate with Presto using SQL.
  2. RelNode -> SqlNode conversion is not quite mature, I believe, but I may be wrong there.
  3. The optimization enabled by RelNode is helpful, but it is not a high priority since Presto will also optimize its internal execution plan.

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch 2 times, most recently from 9e8d652 to 12c5af4 on March 12, 2020 00:38

@danny0405

Usually we have two kinds of use cases for Calcite:

  • Generate RelNode/RexNode to feed other downstream planners
  • Generate RelNode/RexNode and unparse it into a specific SQL dialect for execution through JDBC.

Either way we should generate a working plan, which is why I suggest adding a SqlToRel test. More generally, we should add an operator test and a SQL ITCase whenever we change an operator's semantics. But for your case, I think a SqlToRelConverter test is enough.

my7ym commented Mar 14, 2020

OK. I could do that for something like unnest(employees) as t(e), but I am not sure what a valid plan would be for unnest(employees, admins) as t(e, a). Can I just add one test case for something like unnest(employees) as t(e)? Thanks.
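
For reference on the row-level semantics in question: Presto documents that UNNEST with multiple arguments yields as many rows as the longest argument and pads the shorter ones with NULL. The sketch below models only that behavior for two arrays. `MultiUnnestSketch` is a hypothetical name, and the sketch says nothing about how a RelNode plan should represent the operation, which is the open question here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultiUnnestSketch {
  private MultiUnnestSketch() {}

  /**
   * Expands two arrays into rows (e, a), one row per index.
   * The shorter array is padded with nulls, as Presto documents for UNNEST.
   */
  static List<List<Object>> unnest(List<?> employees, List<?> admins) {
    int rowCount = Math.max(employees.size(), admins.size());
    List<List<Object>> rows = new ArrayList<>();
    for (int i = 0; i < rowCount; i++) {
      Object e = i < employees.size() ? employees.get(i) : null;
      Object a = i < admins.size() ? admins.get(i) : null;
      // Arrays.asList is used because it permits null elements.
      rows.add(Arrays.asList(e, a));
    }
    return rows;
  }
}
```

With employees = [alice, bob] and admins = [root], this produces the rows (alice, root) and (bob, null).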

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch from 12d30c5 to 667c43d on March 29, 2020 18:48

my7ym commented Mar 29, 2020

I have added tests for it, but this required changes to SqlToRelConverter.

The change is needed because Uncollect usually flattens a STRUCT type, but for this feature it should not. This also matches the type registered during validation.

I am not sure whether this is the best way to implement it. Let me know. Thanks.

EDIT: If you think this change is unnecessarily large (because we don't need SqlToRel conversion right now), I am happy to revert the SqlToRel conversion changes.
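
To make the flatten-versus-alias difference concrete, here is a rough sketch of how the output column names differ. It is an illustration under assumed names (`UncollectRowTypeSketch` and `outputColumns` are hypothetical), not the logic in Uncollect, and it ignores MAP inputs, WITH ORDINALITY, and duplicate field names.

```java
import java.util.ArrayList;
import java.util.List;

final class UncollectRowTypeSketch {
  private UncollectRowTypeSketch() {}

  /**
   * Derives output column names for an UNNEST over arrays of structs.
   *
   * @param itemFields for each unnested array, the field names of its struct item type
   * @param aliases one alias per array, or an empty list when no aliases apply
   */
  static List<String> outputColumns(List<List<String>> itemFields, List<String> aliases) {
    if (aliases.isEmpty()) {
      // Default behavior: each struct is flattened into its own fields.
      List<String> flattened = new ArrayList<>();
      for (List<String> fields : itemFields) {
        flattened.addAll(fields);
      }
      return flattened;
    }
    if (aliases.size() != itemFields.size()) {
      throw new IllegalArgumentException("expected one alias per unnested array");
    }
    // Aliased behavior: one struct-typed column per array, named by its alias.
    return List.copyOf(aliases);
  }
}
```

For an employee struct with fields EMPNO and ENAME, the default path yields columns EMPNO and ENAME, while `UNNEST(d.employees) AS t(e)` under the new conformance yields the single column e.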

my7ym commented Mar 29, 2020

We should also add a test case for at least one SqlToRelConverterTest.

Done

Comment on lines 150 to 170
if (field.getType() instanceof MapSqlType) {
builder.add(SqlUnnestOperator.MAP_KEY_COLUMN_NAME, field.getType().getKeyType());
builder.add(SqlUnnestOperator.MAP_VALUE_COLUMN_NAME, field.getType().getValueType());
} else {
Contributor Author

I am also not sure what will be the best way to handle MAP here.

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch from 667c43d to d1c6964 on March 29, 2020 19:24
boolean withOrdinality, List<String> fieldAliases) {
super(cluster, traitSet, input);
this.withOrdinality = withOrdinality;
this.fieldAliases = fieldAliases;
Contributor

You should copy fieldAliases into an immutable list.

Do you think the logic would be more uniform if fieldAliases were empty, not null, in the usual case? I'm a big fan of empty lists.

Contributor Author

Done. I usually treat null as a separate state that flags something other than empty. But here they mean the same thing, and an empty list has the added bonus of avoiding NPEs. Good call.
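
The pattern agreed on here, a defensive immutable copy with "no aliases" represented by an empty list, can be sketched as below. `AliasHolder` is a hypothetical class, not Uncollect itself, and `List.copyOf` (Java 10+) is swapped in for Guava's ImmutableList to keep the sketch dependency-free.

```java
import java.util.List;
import java.util.Objects;

final class AliasHolder {
  private final List<String> fieldAliases;

  /** @param fieldAliases aliases to apply; pass an empty list, never null, for "none" */
  AliasHolder(List<String> fieldAliases) {
    Objects.requireNonNull(fieldAliases, "fieldAliases");
    // Copy so that later changes to the caller's list cannot affect this object.
    this.fieldAliases = List.copyOf(fieldAliases);
  }

  List<String> fieldAliases() {
    return fieldAliases;
  }

  /** Callers branch on emptiness instead of checking for null. */
  boolean hasAliases() {
    return !fieldAliases.isEmpty();
  }
}
```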

*
* <p>E.g. in UNNEST(a_array, b_array) AS T(a, b),
* a and b will be aliases of a_array and b_array
* respectively
Contributor

What will be the behavior of that expression if allowAliasUnnestColumns is false?

Contributor

What is the most logical place for this function inside SqlConformance? I bet it's not at the very end.

Contributor Author

Done


/** Conformance value that instructs Calcite to use SQL semantics
* consistent with Presto. */
PRESTO;
Contributor

Please keep these in alphabetical order.

Contributor Author

Done

return true;
}
}
public boolean allowAliasUnnestColumns() {
Contributor

need space before this function, no space after.

Contributor Author

Done


private void convertUnnest(Blackboard bb, SqlCall from, List<String> fieldAliases) {
SqlCall call;
call = from;
Contributor

Assign on one line and make it final? Or eliminate the variable?

Contributor Author

This is my bad. Done.

replaceSubQueries(bb, node, RelOptUtil.Logic.TRUE_FALSE_UNKNOWN);
}
final List<RexNode> exprs = new ArrayList<>();
final List<String> fieldNames = new ArrayList<>();
Contributor

You don't need two separate lists. RelBuilder.project will apply aliases if you have used RelBuilder.alias to wrap expressions in 'as'. Use project rather than projectNamed.

Contributor Author

Done. By the way, this logic was already there before I extracted it into a separate method. I usually avoid touching unrelated parts as much as possible.

(null != bb.root) ? bb.root : LogicalValues.createOneRow(cluster);
relBuilder.push(child).projectNamed(exprs, fieldNames, false);

Uncollect uncollect =
Contributor

it's about time we added RelBuilder.uncollect

Contributor Author

Done.

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch from d1c6964 to 1e212e1 on April 9, 2020 05:26

@my7ym my7ym left a comment

@danny0405 @julianhyde Danny's patch seems to resolve all the pending issues in this PR. Let me know if you have any other feedback. If not, please accept it. Thanks!

@danny0405

If you think my fix is okay, can you check it into this patch? Thanks.

my7ym commented Apr 22, 2020

Left another comment. I thought you preferred to keep them separate so I did not merge them. Will do after that comment is resolved! Thanks!

danny0405 commented Apr 26, 2020

I have pushed a new fix at https://github.com/danny0405/calcite/tree/fix-3789, please check it if you have time.

my7ym commented Apr 26, 2020

👍 Will do and let you know.

EDIT: Just two nit comments. Solid fix. Let me know if you want to update it, or I could resolve them directly when I merge your fix. Thanks!

@danny0405

Please resolve them directly when you merge the fix. Thanks for the contribution!

my7ym force-pushed the miao/alias-unnest-multiple-array-columns branch from 1e212e1 to 5ab8dee on April 27, 2020 06:08

my7ym commented Apr 27, 2020

Done. Not sure whether I should make the commit message & JIRA title the same. Feel free to modify either of them. Thanks!

danny0405 added the LGTM-will-merge-soon label (Overall PR looks OK. Only minor things left.) on Apr 27, 2020
danny0405 closed this in e44beba on Apr 27, 2020
jamesstarr pushed a commit to jamesstarr/calcite that referenced this pull request Aug 28, 2025
This patch also adds a new PRESTO conformance.

Fix-up (by Danny):
- Fix SqlTypeUtil#flattenRecordType to not append field index if
  there are no duplicates
- Rename SqlConformance#allowAliasUnnestColumns to
  SqlConformance#allowAliasUnnestItems
- Fix RelStructuredTypeFlattener to not generate flattened
  field based on struct field
- Promote SqlToRelConverter#convertFrom to allow specifying field
  aliases
- Add comment to RelBuilder#uncollect

close apache#1811

Change-Id: Ibeaeda6f0007b5409afcb43e5aab3b878e0de89b
jamesstarr pushed a commit to jamesstarr/calcite that referenced this pull request Mar 16, 2026
apache#343)

This patch also adds a new PRESTO conformance.

Fix-up (by Danny):
- Fix SqlTypeUtil#flattenRecordType to not append field index if there
  are no duplicates
- Rename SqlConformance#allowAliasUnnestColumns to
  SqlConformance#allowAliasUnnestItems
- Fix RelStructuredTypeFlattener to not generate flattened field based
  on struct field
- Promote SqlToRelConverter#convertFrom to allow specifying field aliases
- Add comment to RelBuilder#uncollect

close apache#1811

Co-authored-by: Will Yu <wmy7ymw@gmail.com>