Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9379] Update calcite to 1.26 #14729

Merged
merged 23 commits into from
Sep 2, 2021
Merged

Conversation

apilloud
Copy link
Member

@apilloud apilloud commented May 5, 2021

This replaces #12962.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented May 5, 2021

Codecov Report

Merging #14729 (965154a) into master (3a7b8e7) will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #14729   +/-   ##
=======================================
  Coverage   83.74%   83.75%           
=======================================
  Files         442      442           
  Lines       60050    60050           
=======================================
+ Hits        50290    50295    +5     
+ Misses       9760     9755    -5     
Impacted Files Coverage Δ
sdks/python/apache_beam/internal/metrics/metric.py 90.42% <0.00%> (-1.07%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.64% <0.00%> (+0.12%) ⬆️
sdks/python/apache_beam/io/localfilesystem.py 92.24% <0.00%> (+0.77%) ⬆️
...che_beam/runners/interactive/interactive_runner.py 92.52% <0.00%> (+1.86%) ⬆️
.../python/apache_beam/testing/test_stream_service.py 93.02% <0.00%> (+4.65%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a7b8e7...965154a. Read the comment docs.

@apilloud apilloud force-pushed the updatecalcite branch 2 times, most recently from 522c3f6 to ea51944 Compare May 5, 2021 05:02
@apilloud
Copy link
Member Author

apilloud commented May 5, 2021

run sql postcommit

@ibzib
Copy link
Contributor

ibzib commented Jun 10, 2021

Cannot cast "byte[]" to "ByteString"

Seems to be because of this commit: apache/calcite@dae53ef#diff-2095e662d66d6c1b2851915fa0d73934c3a847550545396835e02d88a9f5e2daR2858-R2871

Calcite 1.24+ casts UDF return values to their expected Java types. Calcite's default JavaTypeFactoryImpl maps VARBINARY to ByteString. We can fix it by overriding this mapping to return byte[] instead.

By the way, org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlJavaUdfTest.testNullArgumentIsNotTypeChecked tests for bad behavior, so having it fail is actually a good thing. I'm guessing the casts added in that same commit fixed the bug (late type checking) it was testing for. So we can just update the test to reflect that.

I committed fixes for these issues to my branch: https://github.com/ibzib/beam/commits/updatecalcite-udf-types

@apilloud apilloud force-pushed the updatecalcite branch 2 times, most recently from 3b5a1f5 to c93f18c Compare July 1, 2021 20:54
@apilloud apilloud force-pushed the updatecalcite branch 8 times, most recently from f3e85e4 to 41d714c Compare August 13, 2021 01:39
@apilloud apilloud force-pushed the updatecalcite branch 3 times, most recently from f29da0a to a42956e Compare August 20, 2021 18:16
@apilloud apilloud changed the title [WIP][BEAM-9379] Update calcite to 1.26 [BEAM-9379] Update calcite to 1.26 Aug 20, 2021
@apilloud
Copy link
Member Author

run java precommit

@apilloud
Copy link
Member Author

This is finally ready for review! @ibzib @nielsbasjes

@apilloud apilloud requested a review from ibzib August 20, 2021 19:37
@apilloud
Copy link
Member Author

Sorry for the massive size. You might have better luck reviewing if you drop the first (generated) commit: 303c959...54881ce

You'll still have to comment on the PR directly.

@apilloud
Copy link
Member Author

Run Java_Examples_Dataflow PreCommit

@apilloud
Copy link
Member Author

Run Java PreCommit

@apilloud apilloud merged commit cff331b into apache:master Sep 2, 2021
@apilloud apilloud deleted the updatecalcite branch September 2, 2021 22:29
@ibzib
Copy link
Contributor

ibzib commented Sep 2, 2021

🥳

@aromanenko-dev
Copy link
Contributor

Please, don't forget to squash the commits before merge or just use GitHub's "Squash and Merge" button.

@apilloud
Copy link
Member Author

apilloud commented Sep 6, 2021

These commits include valuable information. I am -1 on squashing commits.

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Sep 6, 2021

Do you mean all these 23 commits merged with this PR? Then, every commit requires Jira Id prefix to make it easier to track commits history.

Tbh, I doubt that commits like "Update CHANGES.md" and "Up spotbug stack size" can't be squashed with main PR's commit(s). It should make reading commits history easier.

@apilloud
Copy link
Member Author

apilloud commented Sep 7, 2021

I'm not sure what tools you are using to view history, but many of them have ways to give your preferred view. For example git log --first-parent, git blame --first-parent etc.

I squashed fixup commits and put effort into keeping the commits clean, if I hadn't it would easily be 100 commits. I could have been more aggressive on squashing in a few cases (I thought about squashing "Make it functional
" into "Update to vendored Calcite to 1.26.0"), but I would also argue that at least "Update to vendored Calcite to 1.26.0" is over squashed here. A single commit would not have been appropriate in this case.

For the two specific examples you've given: The "Update CHANGES.md" doesn't represent any single commit in the PR so I left it separate. The "Up spotbug stack size" is an issue exposed by this change that is mostly unrelated, it could have been a separate PR (like #15362). Perhaps I should have broken more of this out into separate PRs. This PR has been in the works for over a year, and resulted in many split off changes: #13930 #14146 #14518 possibly others I'm not remembering. (Also some of the commits in this change are being reverted in #15457.)

I don't think we are going to agree on the proper curation of git history, I don't like small "fixup" changes but I also think there are many cases where a PR can benefit from more than a single commit. I proposed banning the "Squash and Merge" button in 2018: https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2%40%3Cdev.beam.apache.org%3E

@aromanenko-dev
Copy link
Contributor

I see your reasons but I still don't see why it can't be squashed into several atomic and independent commits that reflect every major change of this PR and can be rolled back independently if required? What kind of additional value the tiny commits can bring?

Personally, taking into account that it was a long work on this feature and it required more changes than expected initially, I think it had to be split into several PRs or maybe even Jiras. In ideal world, every feature should have one Jira issue, one PR and one independent commit. If it requires more then it had to be split into more granular parts. Of course, there always can be some exceptions, but the goal is to keep a clear commit history and independent rollback.

In our "Commiter guide" we have the requirements for granularity of changes that we discussed before and we have to follow. So my initial point was mostly about that that should help us to make a git history clear. Now I don't see that it's always a case but we can easily make it better.

PS: Regarding the tools. I usually use tig for CLI and Intellij IDEA internal Git client to view annotated lines (git blame) or file history. Using lines annotations with many tiny commits signed by short commit messages and without Jira prefix makes it quite hard to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants