[SPARK-37102][BUILD] Removed redundant exclusions in hadoop-cloud module#34383
[SPARK-37102][BUILD] Removed redundant exclusions in hadoop-cloud module#34383vmalakhin wants to merge 7 commits intoapache:masterfrom
hadoop-cloud module#34383Conversation
318d7ea to
96177b3
Compare
|
Thank you for making a PR, @vmalakhin . Could you enable GitHub Action on your repository? Apache Spark is utilizing your free open source GitHub Action runner quota. |
|
Could you review this, @sunchao ? |
sure! |
|
@vmalakhin can you put more details in the PR description?
This doesn't fit the description "What changes were proposed in this pull request"
Hm can you share more details? what missing dependency and how is that related to Spark?
the PR restores transitive dependency for also cc @steveloughran |
OK - there are some details posted under SPARK-37102, but if I try to access ADLS Gen2 then following exception happens: So org.codehaus.jackson.map.ObjectMapper related jar is not presented on class path (ie under jars dir). |
hadoop-cloud module
sunchao
left a comment
There was a problem hiding this comment.
Thanks, could you add more info to the PR description (esp. why the changes are required), so it can go into the git commit eventually.
hadoop-cloud/pom.xml
Outdated
There was a problem hiding this comment.
Because I don't see it in the dependency:tree output:
[INFO] +- org.apache.hadoop:hadoop-azure:jar:3.3.1:compile
[INFO] | +- com.microsoft.azure:azure-storage:jar:7.0.1:compile
[INFO] | | \- com.microsoft.azure:azure-keyvault-core:jar:1.0.0:compile
[INFO] | +- org.apache.hadoop.thirdparty:hadoop-shaded-guava:jar:1.1.1:compile
[INFO] | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] | \- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.3.1:compile
[INFO] | +- org.apache.hadoop:hadoop-aliyun:jar:3.3.1:compile
[INFO] | | \- com.aliyun.oss:aliyun-sdk-oss:jar:3.4.1:compile
[INFO] | | +- org.jdom:jdom:jar:1.1:compile
[INFO] | | +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] | | | \- stax:stax-api:jar:1.0.1:compile
[INFO] | | +- com.aliyun:aliyun-java-sdk-core:jar:3.4.0:compile
[INFO] | | +- com.aliyun:aliyun-java-sdk-ram:jar:3.0.0:compile
[INFO] | | +- com.aliyun:aliyun-java-sdk-sts:jar:3.0.0:compile
[INFO] | | \- com.aliyun:aliyun-java-sdk-ecs:jar:4.2.0:compile
[INFO] | +- org.apache.hadoop:hadoop-azure-datalake:jar:3.3.1:compile
[INFO] | | \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.3.9:compile
[INFO] | \- org.apache.hadoop:hadoop-cos:jar:3.3.1:compile
[INFO] | \- com.qcloud:cos_api-bundle:jar:5.6.19:compile
But happy to get it added back. Shall we do it?
There was a problem hiding this comment.
Hadoop is using hadoop-shaded-guava since 3.3.1 so I think the com.google.guava inclusion is unnecessary.
There was a problem hiding this comment.
Do you mean exclusion?
There was a problem hiding this comment.
I mean the change on the following:
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
needs to be reverted from the PR.
There was a problem hiding this comment.
OK, will do. Thanks.
There was a problem hiding this comment.
That's done. @sunchao - do you mind to take a look please?
hadoop-cloud/pom.xml
Outdated
71d6984 to
e08ca45
Compare
|
Hm, I'm worried that the exclusions are there to ensure that a newer version of Jackson 'wins' in the build. This may change that. What does the result of dev/test-dependencies.sh --replace-manifest show? |
But that's the problem - exclusions dropped required jar completely. Another option could be to add explicit dependency on asl artifact with required version... Then versions can be managed. |
|
codehaus jackson is 1.x, so is more "OK" to add back. That said it's probably excluded because otherwise unused, and triggers security warnings on static analysis, so probably why it wasn't excluded. fasterxml jackson is probably specifically excluded because it is included at a newer version in the Spark build. That isn't related to the error you show. Neither is Guava, which is in a similar situation. Those shouldn't be changed. Can you just add the dependencies that ABFS requires to your app? I don't think this profile is meant to support third party libraries, though ABFS connector could be arguably a special case. hadoop-cloud isn't published as part of the binary release so is more "OK" to change in this way though. All in all I could see adding back codehaus jackson |
Yep, the only difference is just asl jar in the output. If it's OK to go ahead can we merge this in? Or please let me know next steps. Thanks! |
|
No, the fasterxml changes need to be reverted |
Sure, will do. |
That's done. @srowen - do you mind to take another look? |
|
Unless @steveloughran has thoughts on this one, I think it's OK |
|
Jenkins test this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #144783 has finished for PR 34383 at commit
|
|
Merged to master |
|
Thank you, @vmalakhin , @srowen , @sunchao . |
What changes were proposed in this pull request?
Redundant exclusions were removed for hadoop-cloud module so the build output contains required dependency for hadoop-azure artifact (ackson-mapper-asl).
Why are the changes needed?
Currently Hadoop ABFS connector (for Azure Data Lake Storage Gen2) is broken due to missing dependency. So required dependencies for hadoop-azure artifact should be included into distribution output if hadoop-cloud module enabled.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unfortunately Microsoft does not provide support for Data Lake Storage Gen2 within azurite emulator - so the change was tested manually and the diff was checked to see if anything else was picked up for build outputs (before and after the change). So the only change is inclusion of jackson-mapper-asl-1.9.13.jar.