[SPARK-23654][BUILD] remove jets3t as a dependency of spark #22081
Test build #94632 has finished for PR 22081 at commit
pom.xml
Outdated
this changes from jets3t.version>0.9.4?
Yeah, really the change is to remove jets3t; I think Steve also thought it necessary to add back javax.activation, which jets3t brought in but which Hadoop didn't otherwise depend on. That is, I think this patches a gap in the Hadoop pom?
The reason for the activation dependency is that it still comes in from somewhere, and the version pulled in drops from 1.1.1 to 1.1, meaning it'd be an accidental downgrade of a JAR. I don't know exactly what uses javax.activation: it is one of those historical artifacts whose main role, mapping MIME types, is potentially used somewhere important.
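For anyone trying to pin down where javax.activation "comes in from" at runtime, a minimal sketch along these lines may help. It is not part of the patch: the `ClassOrigin` class and its `describe` helper are hypothetical names, and the approach only reports which JAR or directory a class was loaded from. It does not say which dependency pulled that JAR in; Maven's dependency tree is the tool for that.

```java
import java.security.CodeSource;

// Hypothetical helper (not from the PR): reports where a class was loaded from.
class ClassOrigin {

    // Returns a human-readable description of the class's origin.
    static String describe(String className) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes that ship with the JDK usually report no code source.
                return className + ": no code source reported (typically a JDK class)";
            }
            return className + ": loaded from " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": not found on the classpath";
        }
    }

    public static void main(String[] args) {
        // MimetypesFileTypeMap is the MIME-type mapping class in javax.activation.
        System.out.println(describe("javax.activation.MimetypesFileTypeMap"));
    }
}
```

Run with the same classpath as the application. What it prints depends on the JDK version and on which JARs are present, so no expected output is shown here.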
Test build #4243 has finished for PR 22081 at commit
Hm @steveloughran looks like the Kinesis tests fail reliably. That makes me suspicious that jets3t is needed, given that's the AWS dependency here. WDYT?
I've just pushed up my PR, which is roughly in sync with this one; I'll close that one now and this can be the one to use.

Assume: Kinesis uses bouncy castle somewhere. There are some hints in the AWS docs. "Encrypt and Decrypt Amazon Kinesis Records Using AWS KMS" covers end-to-end encryption of Kinesis records. For this you need the AWS encryption SDK, whose docs say you need bouncy castle. And it looks like the AWS encryption SDK does explicitly depend on bouncy castle.

Imagine if somehow the removal of bouncy castle as a Java crypto provider was stopping that round trip from working, with some of the encrypt/decrypt not happening. In which case adding bouncy castle should fix things. It worked before because jets3t in spark-core added bouncy castle, and the last bouncy-castle version update made it in sync with Kinesis (and broke jets3t, but nobody has noticed...).

But it could just be that a strong Java crypto provider is needed, and in the absence of the unlimited Java crypto JAR in the JDK lib dir (where it's needed for Kerberos to work), bouncy-castle needs to be on the CP. What to do?
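One way to tell the two hypotheses apart on a given machine is to ask the JVM directly. The sketch below is an illustration rather than anything from the PR: the `JceCheck` class and its method names are hypothetical, and it assumes that an AES key-length limit of 128 bits or less indicates the limited-strength policy. It uses only standard JDK calls (`Cipher.getMaxAllowedKeyLength`, `Security.getProvider`) plus a reflective lookup of the bouncy castle provider class.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import javax.crypto.Cipher;

// Hypothetical diagnostic (not from the PR): reports the crypto setup of the running JVM.
class JceCheck {

    // Maximum AES key length, in bits, permitted by the installed policy files.
    static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available in this JVM", e);
        }
    }

    // Assumption: anything above 128 bits means the unlimited-strength policy is active.
    static boolean hasUnlimitedJce() {
        return maxAesKeyLength() > 128;
    }

    // True if a provider named "BC" (bouncy castle's provider name) is registered.
    static boolean isBouncyCastleRegistered() {
        return Security.getProvider("BC") != null;
    }

    // True if the bouncy castle provider class can be found, registered or not.
    static boolean isBouncyCastleOnClasspath() {
        try {
            Class.forName("org.bouncycastle.jce.provider.BouncyCastleProvider",
                    false, JceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Max AES key length: " + maxAesKeyLength());
        System.out.println("Unlimited JCE: " + hasUnlimitedJce());
        System.out.println("Bouncy castle registered: " + isBouncyCastleRegistered());
        System.out.println("Bouncy castle on classpath: " + isBouncyCastleOnClasspath());
    }
}
```

Running this on a test machine with and without bouncy castle on the classpath would show whether the JVM itself already permits strong keys. It does not by itself prove what the Kinesis client needs.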
Hm, I wonder, does the (newer) Kinesis SDK pull in bouncy castle? That's fine if so; that would make sense. If #22099 works, then we'll see if this then passes. I suppose it's possible it's not actually related, and the Kinesis bit hasn't worked for a while because it's just never changed and so not tested, and needed an SDK update for some other reason. Who knows.
No, the SDKs don't pull in bouncy-castle. Checked via mvnrepo.
I've checked with Shane: the Jenkins systems do not have the unlimited javax crypto in, so I suspect that bouncy-castle is just needed for testing.
I see. I guess I'm trying to figure out whether it's reasonable or not to pull in bouncy castle (just in the Kinesis module, I guess) on behalf of the user then? That's the default conservative thing to do, but having just gone through the ECCN process, I realize it's not trivial for us to redistribute bouncy castle. Worth trimming if it's really about all the same to users.
I think it should be stripped; maybe add a note to the docs: "you need unlimited JCE for kinesis to work".
Makes sense, but then I wonder how the tests work? Do we need a test dependency on it?
Making it a test-time option is a reasonable idea. Getting the unlimited JCE on the test machines (they don't have it right now) would remove the need for this.
…ions

This PR has been superseded by apache#22081

## What changes were proposed in this pull request?

Increment the kinesis client, producer and transient AWS SDK versions to a more recent release.

This is to help with the move off bouncy castle of apache#21146 and apache#22081; the goal is that moving up to the new SDK will allow a JVM with unlimited JCE but without bouncy castle to work with Kinesis endpoints.

Why this specific set of artifacts? It syncs up with the 1.11.271 AWS SDK used by hadoop 3.0.3, hadoop-3.1. and hadoop 3.1.1; that's been stable for the uses there (s3, STS, dynamo).

## How was this patch tested?

Running all the external/kinesis-asl tests via maven with java 8.121 & unlimited JCE, without bouncy castle (apache#21146); default endpoint of us-west-2.

Without this SDK update I was getting http cert validation errors; with it they went away.

# This PR is not ready without

* Jenkins test runs to see what it is happy with
* more testing: repeated runs, another endpoint
* looking at the new deprecation warnings and selectively addressing them (the AWS SDKs are pretty aggressive about deprecation, but sometimes they increase the complexity of the client code or block some codepaths off completely)

Closes apache#22099 from steveloughran/cloud/SPARK-25111-kinesis.

Authored-by: Steve Loughran <stevel@hortonworks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
…e licenses and deps
Test build #94849 has finished for PR 22081 at commit
Merged to master
Thanks. Two fewer JARs on the CP to keep up to date: what more can anyone want?
Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps

Note this just takes over apache#21146

Existing tests.

Closes apache#22081 from srowen/SPARK-23654.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps
Note this just takes over #21146
How was this patch tested?
Existing tests.