[BEAM-9379] Update calcite to 1.26 #14729

apilloud · 2021-05-05T02:04:31Z

This replaces #12962.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Choose reviewer(s) and mention them in a comment (R: @username).
Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

`ValidatesRunner` compliance status (on master branch)

Lang	ULR	Samza	Twister2
Go	---	---	---
Java
Python	---	---	---
XLang		---	---

Examples testing status on various runners

Lang	ULR	Dataflow	Flink	Samza	Spark	Twister2
Go	---	---	---	---	---	---	---
Java	---		---	---	---	---	---
Python	---	---	---	---	---	---	---
XLang	---	---	---	---	---	---	---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go	Java	Python

Pre-Commit Tests Status (on master branch)

---	Java	Python	Go	Website	Whitespace	Typescript
Non-portable
Portable	---			---	---	---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

codecov · 2021-05-05T02:21:38Z

Codecov Report

Merging #14729 (965154a) into master (3a7b8e7) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #14729   +/-   ##
=======================================
  Coverage   83.74%   83.75%           
=======================================
  Files         442      442           
  Lines       60050    60050           
=======================================
+ Hits        50290    50295    +5     
+ Misses       9760     9755    -5

Impacted Files	Coverage Δ
sdks/python/apache_beam/internal/metrics/metric.py	`90.42% <0.00%> (-1.07%)`	⬇️
...hon/apache_beam/runners/worker/bundle_processor.py	`93.64% <0.00%> (+0.12%)`	⬆️
sdks/python/apache_beam/io/localfilesystem.py	`92.24% <0.00%> (+0.77%)`	⬆️
...che_beam/runners/interactive/interactive_runner.py	`92.52% <0.00%> (+1.86%)`	⬆️
.../python/apache_beam/testing/test_stream_service.py	`93.02% <0.00%> (+4.65%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a7b8e7...965154a. Read the comment docs.

apilloud · 2021-05-05T16:33:51Z

run sql postcommit

ibzib · 2021-06-10T21:33:09Z

Cannot cast "byte[]" to "ByteString"

Seems to be because of this commit: apache/calcite@dae53ef#diff-2095e662d66d6c1b2851915fa0d73934c3a847550545396835e02d88a9f5e2daR2858-R2871

Calcite 1.24+ casts UDF return values to their expected Java types. Calcite's default JavaTypeFactoryImpl maps VARBINARY to ByteString. We can fix it by overriding this mapping to return byte[] instead.

By the way, org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlJavaUdfTest.testNullArgumentIsNotTypeChecked tests for bad behavior, so having it fail is actually a good thing. I'm guessing the casts added in that same commit fixed the bug (late type checking) it was testing for. So we can just update the test to reflect that.

I committed fixes for these issues to my branch: https://github.com/ibzib/beam/commits/updatecalcite-udf-types

apilloud · 2021-08-20T19:35:56Z

run java precommit

apilloud · 2021-08-20T19:37:44Z

This is finally ready for review! @ibzib @nielsbasjes

apilloud · 2021-08-20T19:42:09Z

Sorry for the massive size. You might have better luck reviewing if you drop the first (generated) commit: 303c959...54881ce

You'll still have to comment on the PR directly.

apilloud · 2021-08-20T19:48:42Z

Run Java_Examples_Dataflow PreCommit

apilloud · 2021-08-20T20:37:43Z

Run Java PreCommit

…r typechecking in Calcite 1.24+.

ibzib · 2021-09-02T22:44:50Z

🥳

aromanenko-dev · 2021-09-06T11:40:01Z

Please, don't forget to squash the commits before merge or just use GitHub's "Squash and Merge" button.

apilloud · 2021-09-06T16:45:47Z

These commits include valuable information. I am -1 on squashing commits.

aromanenko-dev · 2021-09-06T18:01:46Z

Do you mean all these 23 commits merged with this PR? Then, every commit requires Jira Id prefix to make it easier to track commits history.

Tbh, I doubt that commits like "Update CHANGES.md" and "Up spotbug stack size" can't be squashed with main PR's commit(s). It should make reading commits history easier.

apilloud · 2021-09-07T01:29:01Z

I'm not sure what tools you are using to view history, but many of them have ways to give your preferred view. For example git log --first-parent, git blame --first-parent etc.

I squashed fixup commits and put effort into keeping the commits clean, if I hadn't it would easily be 100 commits. I could have been more aggressive on squashing in a few cases (I thought about squashing "Make it functional
" into "Update to vendored Calcite to 1.26.0"), but I would also argue that at least "Update to vendored Calcite to 1.26.0" is over squashed here. A single commit would not have been appropriate in this case.

For the two specific examples you've given: The "Update CHANGES.md" doesn't represent any single commit in the PR so I left it separate. The "Up spotbug stack size" is an issue exposed by this change that is mostly unrelated, it could have been a separate PR (like #15362). Perhaps I should have broken more of this out into separate PRs. This PR has been in the works for over a year, and resulted in many split off changes: #13930 #14146 #14518 possibly others I'm not remembering. (Also some of the commits in this change are being reverted in #15457.)

I don't think we are going to agree on the proper curation of git history, I don't like small "fixup" changes but I also think there are many cases where a PR can benefit from more than a single commit. I proposed banning the "Squash and Merge" button in 2018: https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2%40%3Cdev.beam.apache.org%3E

aromanenko-dev · 2021-09-07T15:03:44Z

I see your reasons but I still don't see why it can't be squashed into several atomic and independent commits that reflect every major change of this PR and can be rolled back independently if required? What kind of additional value the tiny commits can bring?

Personally, taking into account that it was a long work on this feature and it required more changes than expected initially, I think it had to be split into several PRs or maybe even Jiras. In ideal world, every feature should have one Jira issue, one PR and one independent commit. If it requires more then it had to be split into more granular parts. Of course, there always can be some exceptions, but the goal is to keep a clear commit history and independent rollback.

In our "Commiter guide" we have the requirements for granularity of changes that we discussed before and we have to follow. So my initial point was mostly about that that should help us to make a git history clear. Now I don't see that it's always a case but we can easily make it better.

PS: Regarding the tools. I usually use tig for CLI and Intellij IDEA internal Git client to view annotated lines (git blame) or file history. Using lines annotations with many tiny commits signed by short commit messages and without Jira prefix makes it quite hard to do.

apilloud force-pushed the updatecalcite branch 2 times, most recently from 522c3f6 to ea51944 Compare May 5, 2021 05:02

apilloud force-pushed the updatecalcite branch from ea51944 to cb120ac Compare May 7, 2021 18:12

apilloud force-pushed the updatecalcite branch 3 times, most recently from 94d3bf0 to ab7c78d Compare May 26, 2021 21:18

apilloud force-pushed the updatecalcite branch 2 times, most recently from 3b5a1f5 to c93f18c Compare July 1, 2021 20:54

apilloud force-pushed the updatecalcite branch 8 times, most recently from f3e85e4 to 41d714c Compare August 13, 2021 01:39

apilloud mentioned this pull request Aug 14, 2021

[BEAM-12680] Calcite SqlTransform no longer experimental #15244

Merged

4 tasks

apilloud force-pushed the updatecalcite branch 3 times, most recently from f29da0a to a42956e Compare August 20, 2021 18:16

apilloud changed the title ~~[WIP][BEAM-9379] Update calcite to 1.26~~ [BEAM-9379] Update calcite to 1.26 Aug 20, 2021

apilloud requested a review from ibzib August 20, 2021 19:37

apilloud and others added 20 commits September 2, 2021 12:37

Fix DDL

b4e163c

Handle BeamRelNode in RelSubset

098218b

Fix BeamIOPushDown

81dd514

[BEAM-9190] Update BeamBigQuerySqlDialect

bea14a6

Remap IN to Search

410b454

[BEAM-9379] Use byte[] instead of ByteString for (VAR)BINARY in UDFs.

b542de1

[BEAM-9379] Update UDF NULL type mismatch test since there is stricte…

a4a6fac

…r typechecking in Calcite 1.24+.

Fix ZetaSQL window function mapping

841ef2c

Fix Bigtable tests that depend on SQL types

cfcde44

Workaround CALCITE-4759 in JoinPushThroughJoinRule

6c5c04b

Disable nested bytes tests, sorry!

bd9fc41

SqlLine is rotting, Just CAST types for now

6411537

Update CHANGES.md

85aae76

Up spotbug stack size

41fa6fd

Fix BeamMatchRel copy

4a8a651

partitionKey everywhere

4c00192

Make it functional

6f4333b

Update CreateFunction

c520eff

No tpcds dependency

2b43c32

Fix default time types

965154a

apilloud force-pushed the updatecalcite branch from 664d0fb to 965154a Compare September 2, 2021 19:38

apilloud merged commit cff331b into apache:master Sep 2, 2021

apilloud deleted the updatecalcite branch September 2, 2021 22:29

nielsbasjes mentioned this pull request Sep 3, 2021

[WIP][BEAM-9379] Update calcite to 1.26 #12962

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BEAM-9379] Update calcite to 1.26 #14729

[BEAM-9379] Update calcite to 1.26 #14729

apilloud commented May 5, 2021 •

edited

Loading

codecov bot commented May 5, 2021 •

edited

Loading

apilloud commented May 5, 2021

ibzib commented Jun 10, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

ibzib commented Sep 2, 2021

aromanenko-dev commented Sep 6, 2021

apilloud commented Sep 6, 2021

aromanenko-dev commented Sep 6, 2021 •

edited

Loading

apilloud commented Sep 7, 2021

aromanenko-dev commented Sep 7, 2021

[BEAM-9379] Update calcite to 1.26 #14729

[BEAM-9379] Update calcite to 1.26 #14729

Conversation

apilloud commented May 5, 2021 • edited Loading

ValidatesRunner compliance status (on master branch)

Examples testing status on various runners

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Pre-Commit Tests Status (on master branch)

GitHub Actions Tests Status (on master branch)

codecov bot commented May 5, 2021 • edited Loading

Codecov Report

apilloud commented May 5, 2021

ibzib commented Jun 10, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

apilloud commented Aug 20, 2021

ibzib commented Sep 2, 2021

aromanenko-dev commented Sep 6, 2021

apilloud commented Sep 6, 2021

aromanenko-dev commented Sep 6, 2021 • edited Loading

apilloud commented Sep 7, 2021

aromanenko-dev commented Sep 7, 2021

apilloud commented May 5, 2021 •

edited

Loading

`ValidatesRunner` compliance status (on master branch)

codecov bot commented May 5, 2021 •

edited

Loading

aromanenko-dev commented Sep 6, 2021 •

edited

Loading