Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior #31623

Closed
wants to merge 4 commits into from

Conversation

shardulm94
Copy link
Contributor

@shardulm94 shardulm94 commented Feb 23, 2021

What changes were proposed in this pull request?

SPARK-33084 added the ability to use ivy coordinates with SparkContext.addJar. PR #29966 claims to mimic Hive behavior although I found a few cases where it doesn't

  1. The default value of the transitive parameter is false, both in case of parameter not being specified in coordinate or parameter value being invalid. The Hive behavior is that transitive is true if not specified in the coordinate and false for invalid values. Also, regardless of Hive, I think a default of true for the transitive parameter also matches ivy's own defaults.

  2. The parameter value for transitive parameter is regarded as case-sensitive based on the understanding that Hive behavior is case-sensitive. However, this is not correct, Hive treats the parameter value case-insensitively.

I propose that we be compatible with Hive for these behaviors

Why are the changes needed?

To make ADD JAR with ivy coordinates compatible with Hive's transitive behavior

Does this PR introduce any user-facing change?

The user-facing changes here are within master as the feature introduced in SPARK-33084 has not been released yet

  1. Previously an ivy coordinate without transitive parameter specified did not resolve transitive dependency, now it does.
  2. Previously an transitive parameter value was treated case-sensitively. e.g. transitive=TRUE would be treated as false as it did not match exactly true. Now it will be treated case-insensitively.

How was this patch tested?

Modified existing unit tests to test new behavior
Add new unit test to cover usage of exclude with unspecified transitive

@HyukjinKwon
Copy link
Member

cc @AngersZhuuuu @wangyum FYI

@github-actions github-actions bot added the SQL label Feb 23, 2021
Copy link
Contributor

@xkrogen xkrogen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great finds @shardulm94 ! I'm glad we were able to catch this before it went out in a release (since #29966 missed the 3.1.1 release). Thanks a lot for doing this and PR #31591 to really polish up this feature!

Comment on lines 1227 to 1230
// exclude org.pentaho:pentaho-aggdesigner-algorithm as this transitive dependency does
// not exist on mavencentral and hence cannot be found in the test environment
sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion" +
"?exclude=org.pentaho:pentaho-aggdesigner-algorithm")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Any idea why hive-hcatalog-core has a dependency on a package that doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very infamous dependency brought in by calcite. It does not exist on mavencentral but it does exist on conjars.org. Here is a Stack overflow asking about the same issue. https://stackoverflow.com/questions/32587769/dependency-resolution-issue-in-gradle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes.

@AngersZhuuuu
Copy link
Contributor

Thanks for your double check, nice catch, I miss some point when check hive's behavior.

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

cc @maropu and @dongjoon-hyun too FYI

@maropu
Copy link
Member

maropu commented Feb 24, 2021

Oh, I see. Nice catch.

The Hive behavior is that transitive is true if not specified in the coordinate and false for invalid values.

Just to check; all the released Hive versions having this feature have the same default behaviour?

@maropu
Copy link
Member

maropu commented Feb 24, 2021

I left some minor comments though, the proposal itself seems okay.

@SparkQA
Copy link

SparkQA commented Feb 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39980/

@SparkQA
Copy link

SparkQA commented Feb 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39980/

@SparkQA
Copy link

SparkQA commented Feb 24, 2021

Test build #135400 has finished for PR 31623 at commit c63b605.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shardulm94
Copy link
Contributor Author

shardulm94 commented Feb 24, 2021

Just to check; all the released Hive versions having this feature have the same default behaviour?

@maropu Yes, the behaviour has been this way since it was added in HIVE-9664.

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135439 has finished for PR 31623 at commit 61db81b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 26, 2021

If no one has more comments, I'll merge this in a few days.

@maropu maropu closed this in 0216051 Mar 1, 2021
@maropu
Copy link
Member

maropu commented Mar 1, 2021

Thanks! Merged to master.

@HyukjinKwon
Copy link
Member

Thanks @maropu for taking care of this!

@dongjoon-hyun
Copy link
Member

Thank you, @shardulm94 and all. Like @xkrogen said, it's really nice to fix this at the early stage.

xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
… with Hive transitive behavior

### What changes were proposed in this pull request?
SPARK-33084 added the ability to use ivy coordinates with `SparkContext.addJar`. PR apache#29966 claims to mimic Hive behavior although I found a few cases where it doesn't

1) The default value of the transitive parameter is false, both in case of parameter not being specified in coordinate or parameter value being invalid. The Hive behavior is that transitive is [true if not specified](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L169) in the coordinate and [false for invalid values](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L124). Also, regardless of Hive, I think a default of true for the transitive parameter also matches [ivy's own defaults](https://ant.apache.org/ivy/history/2.5.0/ivyfile/dependency.html#_attributes).

2) The parameter value for transitive parameter is regarded as case-sensitive [based on the understanding](apache#29966 (comment)) that Hive behavior is case-sensitive. However, this is not correct, Hive [treats the parameter value case-insensitively](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L122).

I propose that we be compatible with Hive for these behaviors

### Why are the changes needed?
To make `ADD JAR` with ivy coordinates compatible with Hive's transitive behavior

### Does this PR introduce _any_ user-facing change?

The user-facing changes here are within master as the feature introduced in SPARK-33084 has not been released yet
1. Previously an ivy coordinate without `transitive` parameter specified did not resolve transitive dependency, now it does.
2. Previously an `transitive` parameter value was treated case-sensitively. e.g. `transitive=TRUE` would be treated as false as it did not match exactly `true`. Now it will be treated case-insensitively.

### How was this patch tested?

Modified existing unit tests to test new behavior
Add new unit test to cover usage of `exclude` with unspecified `transitive`

Closes apache#31623 from shardulm94/spark-34506.

Authored-by: Shardul Mahadik <smahadik@linkedin.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants