Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update confluent dependency version to 7.6.0 #30638

Merged
merged 8 commits into from Mar 19, 2024
Merged

Conversation

tilgalas
Copy link
Contributor

@tilgalas tilgalas commented Mar 14, 2024

fixes #30610


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@Abacn
Copy link
Contributor

Abacn commented Mar 14, 2024

There are test failures related to avro code generation dependency change:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.sdk.io.gcp.bigquery.AvroGenericRecordToStorageApiProtoTest
...

and

java.lang.AssertionError: null definition should throw an exception: unexpected exception type thrown; expected:<java.lang.NullPointerException> but was:<org.apache.avro.SchemaParseException>
	at org.junit.Assert.assertThrows(Assert.java:1020)
	at org.apache.beam.sdk.io.gcp.pubsub.PubsubClientTest.fromPubsubSchema(PubsubClientTest.java:206)
Caused by: org.apache.avro.SchemaParseException: Cannot parse <null> schema
	at org.apache.avro.Schema.parse(Schema.java:1692)

see https://github.com/apache/beam/runs/22672196056

@Abacn
Copy link
Contributor

Abacn commented Mar 14, 2024

Thanks for the contribution! It involves avro dependency and things get tricky. As avro dependency in Beam is known to be hard to change.

Is it possible to avoid avro dependency changes and still update confluent dependency?

@tilgalas
Copy link
Contributor Author

Thanks for the contribution! It involves avro dependency and things get tricky. As avro dependency in Beam is known to be hard to change.

Is it possible to avoid avro dependency changes and still update confluent dependency?

Thanks for reviewing! I'll try fixing the problems that the tests discovered, if it gets too tricky or risky I imagine we might want to update the confluent version gradually, ie instead of going straight for the 7.6.0 maybe try first with something less radical like 6.x or even 5.4

@@ -262,8 +262,7 @@ private static TableFieldSchema fieldDescriptorFromAvroField(Schema.Field field)
elementType.getType() != Schema.Type.UNION,
"Multiple non-null union types are not supported.");
TableFieldSchema unionFieldSchema =
fieldDescriptorFromAvroField(
new Schema.Field(field.name(), elementType, field.doc(), field.defaultVal()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why defaultVal dropped here?

Copy link
Contributor Author

@tilgalas tilgalas Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the elementType here will be a schema of the union, but with the null type filtered out - in the "extreme" case of an "optionable" union type of [null, SomeSchema] it won't even be a union anymore, just the bare SomeSchema - if the defaultVal is null, it will not correspond to this derived schema anymore, and Avro will throw an exception when constructing the Field object - I'm not sure why this wasn't a problem before (maybe Avro didn't actually check this?)

avro : "org.apache.avro:avro:1.8.2",
avro_tests : "org.apache.avro:avro:1.8.2:tests",
avro : "org.apache.avro:avro:1.11.3",
avro_tests : "org.apache.avro:avro:1.11.3:tests",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was issue upgrading avro version and that was why it sticked to 1.8.2 for a long time. Might be good to ask in devlist about this proposal

sdks/java/extensions/avro/build.gradle Show resolved Hide resolved
@Abacn
Copy link
Contributor

Abacn commented Mar 15, 2024

also, it's good to run https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_Avro_Versions.yml in this PR. You can add an empty file named .github/trigger_files/beam_PostCommit_Java_Avro_Versions.json to trigger the test, see README here: https://github.com/apache/beam/tree/master/.github/workflows

@tilgalas
Copy link
Contributor Author

Run Java_GCP_IO_Direct PreCommit

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests now all passed.

wait for another day and will merge if no objection in the devlist mail thread

@Abacn Abacn merged commit 764fcda into apache:master Mar 19, 2024
32 checks passed
@Abacn
Copy link
Contributor

Abacn commented Mar 22, 2024

There is (only) one integration test failing, not too bad:

testBigQueryStorageReadProjectionPushdown (org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOStorageReadIT) failed

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Error converting field Field{name=string_field, description=, type=STRING, options={{}}}: Not a valid schema field: string_field
...
Caused by: org.apache.avro.AvroRuntimeException: Not a valid schema field: string_field
	at org.apache.avro.generic.GenericData$Record.get(GenericData.java:282)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.lambda$toBeamRow$5(BigQueryUtils.java:500)

See https://github.com/apache/beam/runs/22988286765

@Abacn Abacn mentioned this pull request Apr 2, 2024
3 tasks
hjtran pushed a commit to hjtran/beam that referenced this pull request Apr 4, 2024
* update confluent dependency version to 7.6.0

* make avroSchema field non-transient as the Schema class is now Serializable

* remove now unused org.codehaus.jackson package from GcpApiSurfaceTest

* update expected exception class in PubsubClientTest

* fix AvroGenericRecordToStorageApiProtoTest

* move org.tukaani:xz:1.9 dependency to testImplementation

* bring back the tukaani dep to the avroVersions configurations

* trigger the beam_PostCommitJava_Avro_Versions workflow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task]: Update confluent libraries versions from 5.3.2 to more recent in kafka io extension
2 participants