[CELEBORN-1351] Introduce SSLFactory and enable TLS support #2438
mridulm wants to merge 7 commits into apache:main
The test failure does not look related to this PR (and the same tests passed here).

+CC @otterc, @FMX, @RexXiong, @SteNicholas, @waitinfuture

@mridulm, this pull request contains unnecessary documentation changes. Please check them.

Oh no, this is some weird artifact of pulling from my other branch. Let me revert those, sorry about that!

    if (channelsLimiter != null) {
      pipeline.addLast("limiter", channelsLimiter);
    }
The rate limiter is something that does not exist in Spark. This looks fine to me, but reviewers, please do take a look at this!

    @Override
    public X509Certificate[] getAcceptedIssuers() {
      return EMPTY_CERT_ARRAY;
Note: This fixes a bug in Spark, and I will backport it there.
I assume this surfaced as part of the usage of wantClientAuth? What was the bug?
The API expects an empty array, not null, to be returned.
Yes, this surfaced while testing various combinations, though I don't recall whether it was specifically related to wantClientAuth. It probably was.
We need something similar to the SSL connectivity test in Spark as well, to help drive the matrix of combinations for testing (there are a lot more combinations in Spark).
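To illustrate the contract mentioned above, here is a minimal sketch of a trust manager whose `getAcceptedIssuers()` returns an empty array rather than null. The class name `PermissiveTrustManager` and the accept-everything behavior of the two check methods are assumptions made for illustration; they are not taken from this PR.

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// Hypothetical class name, for illustration only; not the class changed in this PR.
final class PermissiveTrustManager implements X509TrustManager {
  private static final X509Certificate[] EMPTY_CERT_ARRAY = new X509Certificate[0];

  @Override
  public void checkClientTrusted(X509Certificate[] chain, String authType) {
    // Assumption: accept every client chain. Only suitable for tests.
  }

  @Override
  public void checkServerTrusted(X509Certificate[] chain, String authType) {
    // Assumption: accept every server chain. Only suitable for tests.
  }

  @Override
  public X509Certificate[] getAcceptedIssuers() {
    // The X509TrustManager contract asks for a non-null (possibly empty) array.
    return EMPTY_CERT_ARRAY;
  }
}
```

Returning a shared empty array avoids a null check on the caller's side when the list of acceptable issuers is read.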

    public SSLEngine createSSLEngine(boolean isClient, ByteBufAllocator allocator) {
      SSLEngine engine = createEngine(isClient, allocator);
      engine.setUseClientMode(isClient);
      engine.setWantClientAuth(true);
Unlike in Spark, where clients do not have an SSL certificate, here we allow an SSL client to present its certificate when one is available.
This is mainly relevant when the master and workers communicate, and less so between applications and the master/workers (it is not common for Spark/Flink/MR apps to have a certificate). It lets the master and workers validate each other's identity. In the future, this can be the basis for SASL EXTERNAL between server-to-server (s2s) components!
See SslConnectivitySuiteJ.testUntrustedClientCertFails, where this validation failure is tested.
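As a rough sketch of what "want" (as opposed to "need") client auth means at the JDK level: the server-side engine requests a client certificate, and the handshake still proceeds if the client presents none. The helper below uses the JDK default `SSLContext` instead of the PR's `createEngine(isClient, allocator)`, and the class and method names are hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical helper, for illustration only; the PR builds its engine differently.
final class WantClientAuthSketch {
  static SSLEngine newEngine(boolean isClient) {
    try {
      // Assumption: the JDK default context stands in for a configured one.
      SSLEngine engine = SSLContext.getDefault().createSSLEngine();
      engine.setUseClientMode(isClient);
      // Request, but do not require, a certificate from the peer client.
      engine.setWantClientAuth(true);
      return engine;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("Default SSLContext is unavailable", e);
    }
  }
}
```

With `setNeedClientAuth(true)` instead, a client without a certificate would fail the handshake, which is why the optional variant fits application clients that typically have no signed certificate.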
> Unlike in spark, where clients do not have ssl cert, here we allow for ssl client to present their certificate - when available.
Would be great to do this on Spark too. I had a PR to do this that I never got around to submitting as part of other feature tests, but I wasn't sure if it would have value in isolation beyond the other changes I had in mind (those proved infeasible).
I was not sure there is a lot of value in it for Spark. In Celeborn this is mainly useful for s2s auth between the master and workers; it is not that common for client apps to have signed certs.

      clientFactory.close();
      context.close();
      testData.cleanup();
    }
The existing test suites, such as RpcIntegrationSuite and TransportClientFactorySuite, are modified so that an SSL subclass can run all the same cases, but over SSL.
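The pattern described here can be sketched as a base class that holds the shared cases and exposes one overridable hook, with an SSL subclass flipping that hook. All names below are hypothetical stand-ins and do not reflect the actual Celeborn suites.

```java
// Hypothetical names, for illustration only; not the actual Celeborn test suites.
class PlainTransportCases {
  /** Subclasses override this to switch the shared cases to TLS. */
  protected boolean sslEnabled() {
    return false;
  }

  /** Stand-in for a shared test case: reports which transport it ran over. */
  String runRoundTripCase() {
    return sslEnabled() ? "round-trip over TLS" : "round-trip over plain TCP";
  }
}

class SslTransportCases extends PlainTransportCases {
  @Override
  protected boolean sslEnabled() {
    // Same cases as the parent, but with TLS turned on.
    return true;
  }
}
```

The benefit of this layout is that every case added to the base suite is automatically exercised over SSL as well.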

      pipeline.addLast("loggingHandler", nettyLogger.getLoggingHandler());
    }
    if (sslEncryptionEnabled()) {
      if (!isClient && !sslFactory.hasKeyManagers()) {
Could the !sslFactory.hasKeyManagers check move into createSslFactory? Also, does this need to throw an exception when it is not a client connection? What about not adding the SslHandler for non-client connections?
If I understood your query: for non-client connections, we need a private key in order to do SSL encryption.
A client connection can also optionally provide a certificate, but it is not required.
In other words, on the client side, keys are not required (the client uses the server cert provided as part of the handshake), but for the server, keys are mandatory and are used for encryption/decryption until a session key is negotiated.
Given this, we cannot pull this check into createSslFactory, as that is common to both clients and servers. Please let me know if I am missing something here.
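The role-dependent rule explained above can be sketched as a small standalone check. The class name, method name, and error message are hypothetical, and the choice to throw `IllegalStateException` is an assumption; the PR's actual handling inside the `if` shown in the diff may differ.

```java
// Hypothetical helper, for illustration only; not the code from this PR.
final class TlsRoleCheck {
  /**
   * Servers must have key material to terminate TLS; clients may omit it
   * because they rely on the certificate the server presents in the handshake.
   */
  static void requireKeysForServer(boolean isClient, boolean hasKeyManagers) {
    if (!isClient && !hasKeyManagers) {
      // Assumption: failing fast is the chosen behavior for a misconfigured server.
      throw new IllegalStateException("TLS is enabled but the server has no key managers");
    }
  }
}
```

Because the same factory is built for both roles, the check has to live where the role (`isClient`) is known, which matches the reasoning in the comment above.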

To answer the query about which modules are relevant: all modules can be secured using TLS. I will add more details on this in the docs PR (WIP).
…-and-related-changes
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2438      +/-   ##
==========================================
+ Coverage   39.24%   40.17%   +0.94%
==========================================
  Files         217      218       +1
  Lines       13432    13600     +168
  Branches     1173     1195      +22
==========================================
+ Hits         5270     5463     +193
+ Misses       7863     7823      -40
- Partials      299      314      +15

@SteNicholas, please do let me know if I can help clarify better. Hopefully the details above help.

Overall LGTM. Thanks @mridulm for the explanation and updates.
otterc left a comment
Just some nits. Looks good to me.

I will merge it tomorrow if it is still open.

LGTM. Merging to main (v0.5.0).

Thanks for the reviews @SteNicholas, @otterc!
What changes were proposed in this pull request?
Add SSLFactory and wire up TLS support with the rest of Celeborn to enable secure over-the-wire communication.
Why are the changes needed?
Add support for TLS to secure wire communication.
This is the last PR to add basic support for TLS.
There will be a follow-up for CELEBORN-1356, and documentation of course!
Does this PR introduce any user-facing change?
Yes, completes basic support for TLS in Celeborn.
How was this patch tested?
Existing tests, augmented with additional unit tests.