-
Notifications
You must be signed in to change notification settings - Fork 13.9k
[FLINK-9816][network] add option to configure SSL engine provider for TM communication #7688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
pnowojski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NicoK for the change. Couple of rather minor comments. The only bigger one is about running tests "if openSSL is available".
| final SSLHandlerFactory serverSSLHandlerFactory = SSLUtils.createInternalServerSSLEngineFactory(serverConfig); | ||
| final SslHandler sslHandler = serverSSLHandlerFactory.createNettySSLHandler(); | ||
| // note: a 'null' allocator seems to work here (probably only because we do not use the ssl engine!) | ||
| final SslHandler sslHandler = serverSSLHandlerFactory.createNettySSLHandler(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io.netty.buffer.UnpooledByteBufAllocator#DEFAULT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, a later commit in the PR already fixed that :)
| // SSL should be the first handler in the pipeline | ||
| if (sslFactory != null) { | ||
| ch.pipeline().addLast("ssl", sslFactory.createNettySSLHandler()); | ||
| ch.pipeline().addLast("ssl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this [FLINK-9816][network] netty-fy SSL configuration commit. What does it do? There are no new tests, no changes in the existing tests, no documentation and nothing in the commit message :(
Is it refactor? If so please explain in the commit message what are you refactoring and why.
| .list( | ||
| TextElement.text("%s: default Java-based SSL engine", TextElement.code("JDK")), | ||
| TextElement.text("%s: openSSL-based SSL engine using system libraries" | ||
| + " (falls back to JDK if not available)", TextElement.code("OPENSSL")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be safer to fail instead of fall back if users specifies the openSSL explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think, it would then be a good idea to also have AUTO with the behaviour that it chooses openSSL if available, otherwise Java SSL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be dangerous though if this would result in a heterogeneous cluster where some nodes had to fall back to JavaSSL while others are using openSSL. Maybe let's leave it with the user having to decide manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. I think an AUTO mode (even as the default value) would make sense. But I leave it up to you to decide whether this is worth the extra effort or not.
Problem in non heterogeneous shouldn't be that frequent. I was mainly concerned with the situation where somebody is explicitly setting the value to OpenSSL and it is silently not being respected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it without AUTO since, especially with standalone clusters, it may actually quite simple to accidentally get into a heterogeneous SSL setup by missing some dependencies on some nodes. I fear that debugging such scenarios would be painful
| } else if (providerString.equalsIgnoreCase("JDK")) { | ||
| return JDK; | ||
| } else { | ||
| throw new IllegalArgumentException("Unknown SSL provider: " + providerString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IllegalArgumentException runtime exception that doesn't use our exception hierarchy? Did you mean IllegalConfigurationException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, why not
| public static final List<String> AVAILABLE_SSL_PROVIDERS; | ||
| static { | ||
| if (OpenSsl.isAvailable()) { | ||
| AVAILABLE_SSL_PROVIDERS = Arrays.asList("JDK", "OPENSSL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we test for this somehow on travis? Manually? Is there some kind of instruction how to make OpenSsl available during testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, we'd have to have a custom flink-shaded build around*. There are licensing issues that prevent us from going the easy way and the actual way of providing openSSL in a safe way for the Flink distribution is not set in stone yet. I'll create a follow-up PR with proper documentation once this has been defined.
- I have two WIP branches for this: https://github.com/NicoK/flink-shaded/commits/flink-9816.static-6.0 and https://github.com/NicoK/flink-shaded/tree/flink-9816
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enriched this PR with a commit that enables the custom flink-shaded builds with static openSSL builds for Travis, just to see whether the tests run through. We would have to drop that commit during the merge though.
| // session context is only be available after a session was setup -> this should be true after data was sent | ||
| SSLSessionContext sessionContext = sslHandler.engine().getSession().getSessionContext(); | ||
| assertNotNull("bug in unit test setup: session context not available", sessionContext); | ||
| // TODO: sessionContext is from the client side which may not have a session cache at all (with openSSL it behaves that way) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done - by a little extra work here and there :(
2255050 to
7c4cfd0
Compare
|
Thanks for the review, I addressed all issues you found and rebased onto the latest |
7b7bcb6 to
486102a
Compare
pnowojski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM however to avoid having a dead/un-tested code in the master I think it would be better to merge this only once we resolve flink-shaded-netty issue and once we will be able to test this on travis.
8232b85 to
8922340
Compare
|
hi @pnowojski I did some updates to this PR:
|
pnowojski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % one comment
| source "$(dirname "$0")"/common_s3.sh | ||
|
|
||
| # randomly set up openSSL with dynamically/statically linked libraries | ||
| OPENSSL_LINKAGE=$(if (( RANDOM % 2 )) ; then echo "dynamic"; else echo "static"; fi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an echo message directly after the RANDOM what was chosen to be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you think it is enough to print this one (during setup, in common_ssl.sh)?
echo "Setting up SSL with: ${type} ${provider} ${provider_lib}"
(I added this line there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently yes, but it would be safer to tie those two things together (best hidden behind some function like:
String x = new Random().choice("dynamic", "static");
System.out.println(format("Executing test with following value of x = [%s] that was randomly selected", x));
), so that there is obvious connection between randomness and printing the random result.
Otherwise:
- someone might remove the logging message, without realising the implication of that fact
- there is a higher chance of someone executing the test and not realising why is not deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced but also don't object and therefore added a fixup
-> I'm now waiting for the flink-shaded 7.0 release before merging
549cb87 to
71323e8
Compare
This allows line breaks in text block elements and may be useful, for example, in starting a new line inside a list description element.
Refactor the SSL configuration done for Netty to have it more like the way Netty intends it to be: using its SslContextBuilder. This will make it much easier to set a different Netty SSL engine provider. [hotfix][network] extract key and trust manager factory creation
[FLINK-9816][network][tests] add openSSL-based SSL tests if available [FLINK-9816][network] use OpenSslX509KeyManagerFactory for openSSL back-end According to https://netty.io/news/2018/07/10/4-1-26-Final.html, this will vastly reduce handshake latency and CPU use. [FLINK-9816][network][tests] allow forcing openSSL tests to run For forcing openSSL tests to run (or fail if not available), specify the following system property: '-D flink.tests.force-openssl'
…opt/ Please note that there is also a static version of netty-tcnative but we currently do not distribute it due to licensing issues. Once openSSL completes its switch to Apache License v2, we can provide this as well and maybe even make that one default (by putting it into lib/). Since there are to many things which may go wrong with the dynamically-linked library (based on the system you run on), we provide this only in opt/.
…shaded version This uses the dynamically-linked openSSL for the unit tests since this is the artifact that we distribute. TODO: e2e tests for dynamically and statically linked openSSL
This test (run nightly) will use a dynamically or statically linked openSSL library at random during runtime, in order to eventually verify both.
Mention all steps necessary to get openSSL-based SSL running, based on flink-shaded 7.0.
What is the purpose of the change
Netty has the ability to run with different
SSLEngineimplementations but with our current setup, we are fixed to the JDK implementation, although one based on OpenSSL is expected to be faster [1].We should make this configurable and ideally also provide everything needed to run with OpenSSL in the future (the last part is not part of this PR).
[1] https://netty.io/wiki/requirements-for-4.x.html#benefits-of-using-openssl
This PR subsumes #6328.
Brief change log
SSLUtilsby using Netty'sSslContextBuilder(only a few places do not use netty SSL setups - provide a workaround there)security.ssl.providerOpenSslX509KeyManagerFactoryfor openSSL back-endVerifying this change
This change can be verified as follows:
JDKSSL engine andOPENSSL-> all using a custom build using
netty-tcnativewith statically linked boringssl libraries from http://netty.io/wiki/forked-tomcat-native.html (see https://github.com/NicoK/flink-shaded/tree/flink-9816.static-6.0)Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation