-
Notifications
You must be signed in to change notification settings - Fork 748
[GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS #3581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…cated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com>
…nd kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.
…#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ```
…cated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com>
…nd kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.
…#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ```
…cated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com>
…nd kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.
…#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ```
…#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ```
…to org.apache.hive
Codecov Report
@@ Coverage Diff @@
## master #3581 +/- ##
============================================
+ Coverage 46.87% 47.71% +0.83%
+ Complexity 10659 8598 -2061
============================================
Files 2117 1710 -407
Lines 82965 65380 -17585
Branches 9242 7089 -2153
============================================
- Hits 38893 31193 -7700
+ Misses 40507 31481 -9026
+ Partials 3565 2706 -859
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
gradle/scripts/repositories.gradle
Outdated
| maven { | ||
| url "https://linkedin.jfrog.io/artifactory/gobblin-hive/" | ||
| } | ||
| jcenter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace jcenter with maven central? We are trying to limit dependencies on jcenter because of its flakiness after becoming EoL. This makes our CI flakey
See #3554 for similar type changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed jcenter(). Will check if everything looks good from github CI side.
| for (Map.Entry<String, JsonNode> stringJsonNodeEntry : field.getJsonProps().entrySet()) { | ||
| newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue()); | ||
| // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps | ||
| for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getObjectProps() doesn't exist in Avro 1.7.7, so this is not Avro version agnostic. We did add a new API in AvroCompatibilityHelper to getAllPropNames(), so you should be able to use the AvroCompatibilityHelper here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I have updated the PR with the changes.
| newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue()); | ||
| // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps | ||
| for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) { | ||
| newField.addProp(objectEntry.getKey(), objectEntry.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for addProp(). Use the AvroCompatibilityHelper instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| @Test | ||
| public void testGetMultiConvertsStrings() throws IOException { | ||
| // The below error is due to invalid avro data. Type with "null" union must have "null" first and then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not strictly accurate (as far as Avro is concerned). A union with "null" doesn't have to have null as the first type. It's usually intended to signify optional-ness, and usually people do want to put null first, so that they can set "null" as the default value, but it's not required.
What is required (as per Avro) is that the default value must have the same type as the first entry in the union. So, for example, this is correct:
"type": ["null", "string"],
"default": null
Whereas this is wrong:
// default is null, but must be a string, since that's the first type in the union
"type": ["string", "null"],
"default": null
This is also wrong:
// default is a string ("null"), not null, which is the first type in the union
"type": ["null", "string"],
"default": "null"
And this is totally correct, though usually rare to see:
"type": ["string", "null"]
"default": "foobar"
So what you really want to say in this comment is that since the default value is null, you want to ensure that that's the first type in the union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation. I have updated the comment.
… the master branch
…h AvroCompatibilityHelper methods
* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com> * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. * Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458) * Use helper-all v0.2.74 to solve issues around default values. (apache#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ``` * [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com> * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. * Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458) * Use helper-all v0.2.74 to solve issues around default values. (apache#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ``` * [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349) * Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs. * Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries * Applied Gobblin codestyle formatting. Co-authored-by: Lei <autumnust@users.noreply.github.com> * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368) * Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. * Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458) * Use helper-all v0.2.74 to solve issues around default values. (apache#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ``` * Merging apache/gobblin master with avro_1_9 * Use helper-all v0.2.74 to solve issues around default values. (apache#3469) The latest version of helper-all fixes the issues seen before w.r.t. default values, so we can now revert the code and the *.avsc files back to how they used to be, with two minor exceptions: 1. Check Schema equality using their .toString() representations. Doing it the old way works for two out of the three instances, but one of them fails, for reasons I haven't figured out yet. 2. Add a `"default":null` piece to recursive_schema_1_converted.avsc. This is harmless, and is caused by the fact that the compatibility helper always adds it if it's a valid default for the schema. See the comments for FieldBuilder19.setDefault(): https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69 To verify that the files are otherwise the same as before: ``` $ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do > git show 928e018:$file > /tmp/before > diff <(jq . </tmp/before) <(jq . <$file) > done ``` * Added deprecated json method using AvroCompatibilityHelper * Removed unused import and replaced Integer.valueOf with Integer.parseInt * Exclude com.linkedin.hive dependency from gradle build files similar to org.apache.hive * Repalce direct avro field creation with AvroCompatibilityHelper.createSchemaField * Removed extra dependency. Addressed review comment - removed jcenter() repository * Upgrade AvroCompatHelper version * Removed the code that are actually moved to AvroHiveTypeUtils.java in the master branch * Addresssed review comments: replaced getObjectProps/getObjectProp with AvroCompatibilityHelper methods * Fix for test failure Co-authored-by: Lei <autumnust@users.noreply.github.com> Co-authored-by: Sreeram Ramachandran <sramachandran@linkedin.com>
* upstream/master: move dataset handler code before cleaning up staging data (apache#3594) [GOBBLIN-1730] Include flow execution id when try to cancel/submit job using SimpleKafkaSpecProducer (apache#3588) [GOBBLIN-1734] make DestinationDatasetHandler work on streaming sources (apache#3592) give option to cancel helix workflow through Delete API (apache#3580) [GOBBLIN-1728] Fix YarnService incorrect container allocation behavior (apache#3586) Support multiple node types in shared flowgraph, fix logs (apache#3590) Search for dummy file in writer directory (apache#3589) Use root cause for checking if exception is transient (apache#3585) [GOBBLIN-1724] Support a shared flowgraph layout in GaaS (apache#3583) [GOBBLIN-1731] Enable HiveMetadataWriter to override table schema lit… (apache#3587) [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS (apache#3581) [GOBBLIN-1725] Fix bugs in gaas warm standby mode (apache#3582) [GOBBLIN-1718] Define DagActionStoreMonitor to listen for kill/resume… (apache#3572) Add log line for committing/retrieving watermarks in streaming (apache#3578) [GOBBLIN-1707] Enhance `IcebergDataset` to detect when files already at dest then proceed with only delta (apache#3575) Ignore AlreadyExistsException in hive writer (apache#3579) Fail GMIP container for known transient exceptions to avoid data loss (apache#3576) GOBBLIN-1715: Support vectorized row batch pooling (apache#3574) [GOBBLIN-1696] Implement file based flowgraph that detects changes to the underlying… (apache#3548) GOBBLIN-1719 Replace moveToTrash with moveToAppropriateTrash for hadoop trash (apache#3573) [GOBBLIN-1703] avoid double quota increase for adhoc flows (apache#3550)
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Tests
Commits