Skip to content

Conversation

@abhisheknath2011
Copy link
Contributor

@abhisheknath2011 abhisheknath2011 commented Feb 2, 2022

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
  • Verified local build.
  • Verified the hive changes by pulling newly published artifacts.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@abhisheknath2011 abhisheknath2011 changed the title Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive Changes for Avro 1.9.2 upgrade and leverage hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive Feb 2, 2022
@abhisheknath2011 abhisheknath2011 marked this pull request as ready for review February 2, 2022 01:35
Copy link
Contributor

@sv2000 sv2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. A few clarification questions, and trying to understand what would break (if any) in production deployments as we do the migration.

Schema convertedSchema1 = convertSchema("/converter/recursive_schema_1.avsc", "YwchQiH.OjuzrLOtmqLW");
Schema expectedSchema1 = parseSchema("/converter/recursive_schema_1_converted.avsc");
Assert.assertEquals(convertedSchema1, expectedSchema1);
Assert.assertEquals(convertedSchema1.toString(), expectedSchema1.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are restricting the condition for equality of two Avro schemas. Why is this change needed after your changes to .avsc files? Also, what does this mean in production pipelines that uses the Schema#equals() checks?


@Test
public void testGetMultiConvertsStrings() throws IOException {
// The below error is due to invalid avro data. Type with "null" union must have "null" first and then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would this mean for production deployment with existing Avro schemas containing nullable unions with null not being the first member of the union?

SharedLimiterKey res1key = new SharedLimiterKey("res1");

Map<String, String> configMap = avro.shaded.com.google.common.collect.ImmutableMap.<String, String>builder()
Map<String, String> configMap = com.google.common.collect.ImmutableMap.<String, String>builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we no longer need to shade the guava dependency brought in by Avro 1.9?

Schema expectedOutputSchema1 =
new Schema.Parser().parse("{\"type\":\"record\", \"name\":\"test\", " + "\"fields\":["
+ "{\"name\": \"name\", \"type\": \"string\"}, " + "{\"name\": \"number\", \"type\": [\"null\", \"int\"]}"
+ "{\"name\": \"name\", \"type\": \"string\"}, " + "{\"name\": \"number\", \"type\": [\"null\", \"int\"],\"default\":null}]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment why the default is needed here?

@sv2000 sv2000 merged commit cae40b8 into apache:avro_1_9 Feb 15, 2022
Will-Lo pushed a commit that referenced this pull request Oct 18, 2022
* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (#3458)

* Use helper-all v0.2.74 to solve issues around default values. (#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (#3458)

* Use helper-all v0.2.74 to solve issues around default values. (#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (#3458)

* Use helper-all v0.2.74 to solve issues around default values. (#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* Merging apache/gobblin master with avro_1_9

* Use helper-all v0.2.74 to solve issues around default values. (#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* Added deprecated json method using AvroCompatibilityHelper

* Removed unused import and replaced Integer.valueOf with Integer.parseInt

* Exclude com.linkedin.hive dependency from gradle build files similar to org.apache.hive

* Repalce direct avro field creation with AvroCompatibilityHelper.createSchemaField

* Removed extra dependency. Addressed review comment - removed jcenter() repository

* Upgrade AvroCompatHelper version

* Removed the code that are actually moved to AvroHiveTypeUtils.java in the master branch

* Addresssed review comments: replaced getObjectProps/getObjectProp with AvroCompatibilityHelper methods

* Fix for test failure

Co-authored-by: Lei <autumnust@users.noreply.github.com>
Co-authored-by: Sreeram Ramachandran <sramachandran@linkedin.com>
phet pushed a commit to phet/gobblin that referenced this pull request Dec 3, 2022
* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458)

* Use helper-all v0.2.74 to solve issues around default values. (apache#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458)

* Use helper-all v0.2.74 to solve issues around default values. (apache#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* [Branch avro_1_9] Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs (apache#3349)

* Avro 1.9 upgrade compatible change - replaced deprecated public APIs with the compatible APIs.

* Avro 1.9 upgrade compatible change - Replaced guava library import from avro shaded with direct guava libraries

* Applied Gobblin codestyle formatting.

Co-authored-by: Lei <autumnust@users.noreply.github.com>

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency. (apache#3368)

* Avro 1.9 upgrade compatible change - upgraded jackson mapper to 2.x and kept jackson mapper 1.x for modules with hive, helix library dependency.

* Changes for upgrade Avro 1.9.2 and leverges hive with avro changes from https://linkedin.jfrog.io/artifactory/gobblin-hive (apache#3458)

* Use helper-all v0.2.74 to solve issues around default values. (apache#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* Merging apache/gobblin master with avro_1_9

* Use helper-all v0.2.74 to solve issues around default values. (apache#3469)

The latest version of helper-all fixes the issues seen before w.r.t.
default values, so we can now revert the code and the *.avsc files back
to how they used to be, with two minor exceptions:

1. Check Schema equality using their .toString() representations. Doing
   it the old way works for two out of the three instances, but one of
   them fails, for reasons I haven't figured out yet.

2. Add a `"default":null` piece to recursive_schema_1_converted.avsc.
   This is harmless, and is caused by the fact that the compatibility
   helper always adds it if it's a valid default for the schema. See
   the comments for FieldBuilder19.setDefault():
   https://github.com/linkedin/avro-util/blob/b9e89c55980ea8e5fd3c8d8da362d7195dd2a99c/helper/impls/helper-impl-19/src/main/java/com/linkedin/avroutil1/compatibility/avro19/FieldBuilder19.java#L69

To verify that the files are otherwise the same as before:
```
$ for file in gobblin-core-base/src/test/resources/converter/*.avsc; do
> git show 928e018:$file > /tmp/before
> diff <(jq . </tmp/before) <(jq . <$file)
> done
```

* Added deprecated json method using AvroCompatibilityHelper

* Removed unused import and replaced Integer.valueOf with Integer.parseInt

* Exclude com.linkedin.hive dependency from gradle build files similar to org.apache.hive

* Repalce direct avro field creation with AvroCompatibilityHelper.createSchemaField

* Removed extra dependency. Addressed review comment - removed jcenter() repository

* Upgrade AvroCompatHelper version

* Removed the code that are actually moved to AvroHiveTypeUtils.java in the master branch

* Addresssed review comments: replaced getObjectProps/getObjectProp with AvroCompatibilityHelper methods

* Fix for test failure

Co-authored-by: Lei <autumnust@users.noreply.github.com>
Co-authored-by: Sreeram Ramachandran <sramachandran@linkedin.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants