[WIP][BEAM-9379] Update calcite to 1.26 #12962

Closed
wants to merge 12 commits into from

Conversation

nielsbasjes
Contributor

In these commits I have done the following:

  1. Updated the vendored Calcite build to version 1.25.
  2. Switched Beam to that new Calcite version.
  3. Attempted (!!) to adapt Beam to all the changes that have occurred in Calcite over the last year.

To test this, I ran the following commands to install the new Calcite build locally:

./gradlew -p vendor/calcite-1_25_0 check
./gradlew -p vendor/calcite-1_25_0 -PvendoredDependenciesOnly publishToMavenLocal

Current state of this initial pull request:

  1. WORK IN PROGRESS. I have reached the limits of my knowledge of this part of the system, so help is needed.
  2. It probably won't build in CI because the vendored Calcite has not been released yet.
  3. I made many changes just to get it to compile, and I'm not sure they are all correct.
  4. If I run this locally (i.e. ./gradlew build -p sdks/java/extensions/sql/), everything builds and about 90% of the tests pass (about 10% fail).

R: @amaliujia


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.


@nielsbasjes
Contributor Author

R: @amaliujia

@codecov

codecov bot commented Sep 29, 2020

Codecov Report

Merging #12962 (de8ff95) into master (80248d0) will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #12962      +/-   ##
==========================================
- Coverage   82.72%   82.69%   -0.03%     
==========================================
  Files         466      466              
  Lines       57518    57518              
==========================================
- Hits        47582    47567      -15     
- Misses       9936     9951      +15     
Impacted Files Coverage Δ
sdks/python/apache_beam/utils/interactive_utils.py 90.47% <0.00%> (-4.77%) ⬇️
...hon/apache_beam/runners/direct/test_stream_impl.py 91.91% <0.00%> (-2.21%) ⬇️
...pache_beam/runners/interactive/interactive_beam.py 74.30% <0.00%> (-1.12%) ⬇️
sdks/python/apache_beam/internal/metrics/metric.py 86.45% <0.00%> (-1.05%) ⬇️
...runners/interactive/display/pcoll_visualization.py 85.34% <0.00%> (-0.53%) ⬇️
...eam/runners/interactive/interactive_environment.py 89.92% <0.00%> (-0.36%) ⬇️
.../python/apache_beam/typehints/trivial_inference.py 89.18% <0.00%> (-0.27%) ⬇️
sdks/python/apache_beam/io/iobase.py 84.81% <0.00%> (-0.27%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.44% <0.00%> (-0.26%) ⬇️
sdks/python/apache_beam/transforms/util.py 95.66% <0.00%> (-0.18%) ⬇️


@nielsbasjes nielsbasjes changed the title [BEAM-9379] Update calcite to 1.25 (WIP/INCOMPLETE) [WIP][BEAM-9379] Update calcite to 1.25 Sep 29, 2020
@kennknowles
Member

Can you publish a build scan by running your second step with --scan ? Then we can see what is broken.

@kennknowles
Member

It might be a huge amount of work. We can just add the new vendored module and then release it, but we will want to know that we have enough people available to also port to it.

@vectorijk
Contributor

It might be a huge amount of work. We can just add the new vendored module and then release it, but we will want to know that we have enough people available to also port to it.

Agreed. We could vendor the module first and then use the released library for the porting work.

Over the last few weeks I tested this on my branch https://github.com/vectorijk/beam/tree/calcite-1-25 by running
./gradlew -Ppublishing -PvendoredDependenciesOnly -PjavaLinkageArtifactIds=beam-vendor-calcite-1_25_0:0.1-SNAPSHOT :checkJavaLinkage
It still has some unresolved issues.

vendor/calcite-1_25_0/build.gradle Outdated Show resolved Hide resolved
vendor/calcite-1_25_0/build.gradle Outdated Show resolved Hide resolved
@nielsbasjes
Contributor Author

Can you publish a build scan by running your second step with --scan ? Then we can see what is broken.

https://scans.gradle.com/s/ofcfnu774du4y

@kennknowles
Member

https://scans.gradle.com/s/ofcfnu774du4y/tests/:sdks:java:extensions:sql:test/org.apache.beam.sdk.extensions.sql.BeamSqlDslAggregationTest/testWindowOnNonTimestampField#1

This is just a bad test. It checks the exact string instead of the key elements of the string so it is sensitive to irrelevant changes.
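
A minimal sketch of the difference, in plain Java with no test framework. The helper `containsAll` and both error messages are hypothetical illustrations and are not taken from BeamSqlDslAggregationTest or from Calcite; they only show why checking key elements survives a rewording that an exact comparison does not.

```java
final class MessageAssertions {
    /** Returns true when the message contains every expected fragment, in any order. */
    static boolean containsAll(String message, String... fragments) {
        for (String fragment : fragments) {
            if (!message.contains(fragment)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical error messages: the wording differs between two library
        // versions, but the key elements (function name, offending type) do not.
        String before = "Cannot apply 'TUMBLE' to arguments of type 'TUMBLE(<INTEGER>, <INTERVAL HOUR>)'";
        String after = "Invalid call: TUMBLE does not accept an argument of type INTEGER";

        // Brittle: an exact comparison breaks on a purely cosmetic rewording.
        System.out.println("exact match survives upgrade: " + before.equals(after));

        // Robust: only the key elements are checked.
        System.out.println("key elements (before): " + containsAll(before, "TUMBLE", "INTEGER"));
        System.out.println("key elements (after): " + containsAll(after, "TUMBLE", "INTEGER"));
    }
}
```

In a real test the same idea would normally be expressed with the matcher library the test already uses, rather than with a hand-written helper.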

@iemejia
Copy link
Member

iemejia commented Oct 9, 2020

Can we consider moving this one to 1.26 because of https://lists.apache.org/thread.html/r0b0fbe2038388175951ce1028182d980f9e9a7328be13d52dab70bb3%40%3Cannounce.apache.org%3E

Beam's use of Calcite is probably not affected by this, but I can imagine automated vulnerability scanners complaining about it soon.

@nielsbasjes nielsbasjes changed the title [WIP][BEAM-9379] Update calcite to 1.25 [WIP][BEAM-9379] Update calcite to 1.26 Oct 29, 2020
@nielsbasjes nielsbasjes force-pushed the BEAM-9379-UpdateCalcite branch 2 times, most recently from 8f79ca3 to c7b4b7f Compare December 5, 2020 10:24
@nielsbasjes
Contributor Author

nielsbasjes commented Dec 6, 2020

I am in need of advice / assistance.

Some tests in the current state of this pull request fail because of a change in Calcite.
I am specifically talking about https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java#L63 and many other tests in this file that run a CREATE EXTERNAL TABLE or a DROP TABLE statement.

I traced the exception back to Calcite, where @julianhyde added this default method a few months ago: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql/parser/SqlParserImplFactory.java#L58

This default implementation simply returns a DdlExecutor (a new interface) that always fails with an UnsupportedOperationException if no valid implementation has been provided.
Looks good so far.

The problem is that I don't know how to correctly override this method so that it returns a Beam implementation that makes CREATE EXTERNAL TABLE work.

A pointer on how to fix this correctly would be highly appreciated.


I found this code in the generated BeamSqlParserImpl (shortened a bit):

    public static final SqlParserImplFactory FACTORY = new SqlParserImplFactory() {
        public SqlAbstractParserImpl getParser(Reader reader) {
            final BeamSqlParserImpl parser = new BeamSqlParserImpl(reader);
...
            return parser;
        }
    };

This is generated from a template in Calcite itself and, as far as I have been able to find, offers no way to override the default DdlExecutor getDdlExecutor() method.
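
The mechanism described above can be sketched in plain Java. This is only an illustration of how a default interface method behaves, under loud assumptions: `DdlExecutor` and `ParserFactory` below are simplified stand-ins defined inside the sketch, not Calcite's real `DdlExecutor` and `SqlParserImplFactory`, and their signatures are invented for the example. It does not show how the generated `BeamSqlParserImpl.FACTORY` can actually be customized.

```java
import java.util.ArrayList;
import java.util.List;

final class DdlExecutorSketch {
    /** Stand-in for Calcite's DdlExecutor; the real interface has a different signature. */
    interface DdlExecutor {
        void executeDdl(String statement);
    }

    /** Stand-in for SqlParserImplFactory with a default executor that rejects all DDL. */
    interface ParserFactory {
        String parserName();

        default DdlExecutor getDdlExecutor() {
            return statement -> {
                throw new UnsupportedOperationException("DDL not supported: " + statement);
            };
        }
    }

    /** Statements accepted by the overriding executor, recorded for inspection. */
    static final List<String> EXECUTED = new ArrayList<>();

    /** Mirrors the generated FACTORY: nothing is overridden, so the default applies. */
    static final ParserFactory GENERATED_FACTORY = () -> "generated";

    /** Same factory shape, but getDdlExecutor() is overridden with a working executor. */
    static final ParserFactory OVERRIDING_FACTORY = new ParserFactory() {
        @Override
        public String parserName() {
            return "overriding";
        }

        @Override
        public DdlExecutor getDdlExecutor() {
            return statement -> EXECUTED.add(statement);
        }
    };

    public static void main(String[] args) {
        String ddl = "CREATE EXTERNAL TABLE t"; // placeholder text, never parsed here
        try {
            GENERATED_FACTORY.getDdlExecutor().executeDdl(ddl);
        } catch (UnsupportedOperationException e) {
            System.out.println("default executor: " + e.getMessage());
        }
        OVERRIDING_FACTORY.getDdlExecutor().executeDdl(ddl);
        System.out.println("overriding executor ran: " + EXECUTED);
    }
}
```

The sketch shows why a factory that only implements the abstract method silently inherits the failing executor: the override has to be written in the factory itself, which is the part the generated code does not expose.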

@aaltay
Member

aaltay commented Dec 18, 2020

I am in need of advice / assistance.

@tysonjh @amaliujia @kennknowles - Who could help on this PR?

@nielsbasjes
Contributor Author

Yes please.
I have tried over the last week and the main problems that remain are beyond me.
I'm stuck and unable to proceed.
At this point I do not understand well enough how Calcite and Beam (should) work together to fix this.
So I'm totally fine if someone takes over from here.

I see these broad categories of failing tests:

  1. Tests that try to create a table fail with DDL not supported.
  2. Tests that have nested records fail with class cast exceptions in SchemaCoder::encode.
    It looks to me like the nesting of schemas is handled incorrectly.
  3. BeamCalcRel::castOutputTime tries to call java.time.LocalDate.ofEpochDay(long), yet the underlying Calcite code searches for java.time.LocalDate.ofEpochDay(java.lang.Integer), which doesn't exist.
    This looks to me like an internal change in how Calcite chooses data types.
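
The symptom in item 3 can be reproduced with plain reflection. This is a sketch of the mismatch only, not of Calcite's actual method-lookup code: the class `EpochDayLookup` and its helpers are hypothetical, and the explicit widening in `fromBoxedDays` is one possible workaround, not necessarily what the eventual fix does.

```java
import java.time.LocalDate;

final class EpochDayLookup {
    /** Returns true when LocalDate has a public ofEpochDay method with this exact parameter type. */
    static boolean hasOfEpochDay(Class<?> parameterType) {
        try {
            // getMethod matches parameter types exactly: no unboxing, no widening.
            LocalDate.class.getMethod("ofEpochDay", parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Converts a boxed day count by widening it to long before the call. */
    static LocalDate fromBoxedDays(Integer days) {
        return LocalDate.ofEpochDay(days.longValue());
    }

    public static void main(String[] args) {
        System.out.println("ofEpochDay(long) exists: " + hasOfEpochDay(long.class));
        System.out.println("ofEpochDay(Integer) exists: " + hasOfEpochDay(Integer.class));
        System.out.println("day 0 is " + fromBoxedDays(0));
    }
}
```

A direct call such as `LocalDate.ofEpochDay(someInteger)` compiles because the compiler unboxes and widens the argument, whereas an exact reflective lookup by `java.lang.Integer` finds nothing.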

@apilloud
Member

Thanks for all the work you've done on this so far! Hopefully you saw my response to your email to dev@ about the DDL issues. The other two things you are seeing are probably related to changes that need to be made to the type translation code in BeamCalcRel.java.

I don't have time to pick this up right now, but I do have time planned for this in late January/early February.

@nielsbasjes
Contributor Author

@apilloud Yes, I read and followed the links from your email.
I tried to understand this, but with my current knowledge I failed to fix the issues at hand.
When you pick this up in a few weeks, remember that I'm still willing to help out where I can.

@apilloud apilloud mentioned this pull request Mar 4, 2021
4 tasks
@apilloud apilloud mentioned this pull request May 5, 2021
4 tasks
@nielsbasjes
Contributor Author

All of this was fixed in #14729

@nielsbasjes nielsbasjes closed this Sep 3, 2021
@apilloud
Member

apilloud commented Sep 3, 2021

Thanks for all the work you did to make this happen!

@nielsbasjes nielsbasjes deleted the BEAM-9379-UpdateCalcite branch January 6, 2024 16:13

6 participants