[HUDI-91][HUDI-12] Migrate to Spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types #1005
```diff
@@ -213,9 +213,9 @@
     <!-- Spark (Packages) -->
     <dependency>
-      <groupId>com.databricks</groupId>
+      <groupId>org.apache.spark</groupId>
       <artifactId>spark-avro_2.11</artifactId>
       <version>4.0.0</version>
       <scope>provided</scope>
```
Member:
I understand we can't bundle this since it's tied to a Spark version now.

Contributor (Author):
Yeah, this will be an additional jar the user would have to pass while starting the spark-shell. We would have to document it. I don't see any documentation for
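Since spark-avro is now `provided` rather than bundled, users have to supply it themselves when launching Spark. A hypothetical invocation sketching this (the exact coordinates, version, and bundle jar name are assumptions, not taken from this PR):

```sh
# Sketch: supply the Apache spark-avro package at launch, since the bundle
# no longer ships it. Coordinates/version here are assumed for illustration.
spark-shell \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --jars hudi-spark-bundle.jar
```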
```diff
     </dependency>

     <!-- Hadoop -->
@@ -239,8 +239,19 @@
     <!-- Hive -->
     <dependency>
       <groupId>${hive.groupid}</groupId>
-      <artifactId>hive-service</artifactId>
+      <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
+      <classifier>${hive.exec.classifier}</classifier>
+      <exclusions>
+        <exclusion>
+          <groupId>javax.mail</groupId>
+          <artifactId>mail</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.eclipse.jetty.aggregate</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
```
```diff
@@ -89,7 +89,7 @@
       </relocation>
       <relocation>
         <pattern>org.apache.avro.</pattern>
-        <shadedPattern>${mr.bundle.avro.shade.prefix}org.apache.avro.</shadedPattern>
+        <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
       </relocation>
     </relocations>
     <createDependencyReducedPom>false</createDependencyReducedPom>
@@ -143,17 +143,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
-      <scope>${mr.bundle.avro.scope}</scope>
+      <scope>compile</scope>
     </dependency>
   </dependencies>

-  <profiles>
-    <profile>
-      <id>mr-bundle-shade-avro</id>
```
|
Contributor:
@umehrot2 (cc @vinothchandar) I will get back on this by today EOD.

Contributor:
(cc @n3nash) Yeah, this means we need to apply the same package relocation in any jar carrying custom record payloads. As discussed in the earlier threads, there is no way around it. @umehrot2: we would need to document this caveat in the release notes and add documentation on how to shade it. Can you create a ticket to track this?

Contributor:
@vinothchandar @bvaradar Yes, this will affect the custom payload implementation on the reader side. But we are in any case going to make some changes to how the payload packages are loaded, so we should be able to absorb this change as part of those considerations.
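Per the discussion above, a jar carrying custom record payloads would need the same Avro relocation as the bundle itself so that payload classes reference the relocated classes. A sketch of what that shading might look like in the payload jar's build (the plugin configuration is an assumption; only the `shadedPattern` is taken from this diff):

```xml
<!-- Sketch: shade Avro in a user jar carrying custom record payloads,
     mirroring the relocated package used by the hudi-hadoop-mr bundle above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.avro.</pattern>
            <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```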
```diff
-      <properties>
-        <mr.bundle.avro.scope>compile</mr.bundle.avro.scope>
-        <mr.bundle.avro.shade.prefix>org.apache.hudi.</mr.bundle.avro.shade.prefix>
-      </properties>
-    </profile>
-  </profiles>
 </project>
```
```diff
@@ -94,8 +94,6 @@
       <include>org.apache.hive:hive-service-rpc</include>
       <include>org.apache.hive:hive-metastore</include>
       <include>org.apache.hive:hive-jdbc</include>
-      <include>com.databricks:spark-avro_2.11</include>
```
Member:
if we bundled

Contributor (Author):
I can give it a shot, but we need to carefully understand the consequences of shading a Spark library inside a jar that is run on Spark. I remember we had some issue on EMR earlier, but I don't have the exact details. Nevertheless, let me try and see if the tests pass.

Member:
I assume the tests will pass, but I realize what you are saying: the user could be running on a higher Spark version, say, and we would be bundling 2.4.4. Let's just open a JIRA to tackle this usability issue and keep it as-is for now. We can document the need for

Contributor (Author):
Yeah, that can be one of the problems. Created a JIRA for this issue: https://issues.apache.org/jira/browse/HUDI-516 About documentation of
```diff
     </includes>
   </artifactSet>
   <relocations>
```
|
```diff
@@ -139,10 +137,6 @@
       <pattern>org.apache.commons.codec.</pattern>
       <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
     </relocation>
-    <relocation>
-      <pattern>com.databricks.</pattern>
-      <shadedPattern>org.apache.hudi.com.databricks.</shadedPattern>
-    </relocation>
     <!-- TODO: Revisit GH ISSUE #533 & PR#633-->
   </relocations>
   <filters>
```
|