HDDS-15176. Ozone SCM fails to start when gRPC cipher policy list includes unsupported cipher#10192
Conversation
…ludes unsupported cipher
|
@octachoron could you please review it? thanks! |
octachoron
left a comment
There was a problem hiding this comment.
Points double-checked (most of which we discussed offline as well):
- Silently removing cipher suites still keeps the server in line with the user-supplied configuration (and one specific suite is selected for each connection anyway).
- This is a convenient way to filter out exactly the unsupported suites, more so than asking users to manage the list externally.
- No specification seems to exist that requires failing to start when the list contains unsupported cipher suites, so we should be good here too. (I did not find any such requirement either, and there is no test for it.)
- Setting the cipher list is idiomatic, and the new test follows this idiom, so it should effectively cover the whole change.
- Netty usage looks good.
- All four usages of the configuration parameter are covered.
There is one edge case where we may see surprises: when more than one TLS version is enabled, and one of them is left with zero ciphers after the filter. This can have different results, possibly depending on the selected SslProvider:
- A hard error despite having allowed and valid combinations in the configuration, like with
openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V ''. - Falling back to defaults on the given TLS version, like with
openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V. - Disabling the corresponding TLS version.
- Something else I am not considering.
Are you aware of any reason to prefer a specific one, or anything that guarantees the outcome? (I am checking too, but maybe this has been investigated already.)
|
Thanks for reviewing it @octachoron and for verifying the listed points.
Regarding this corner case I'm not sure if we should handle it (if yes how) in Ozone. I tried testing this with unit tests with the help of Cursor. These are passing on this branch, so in the second case, when we only have 1.2 supported cipher and try to connect with 1.3 it'll throw an exception. That's the first point in your list and I think it makes sense. Not a 100% sure I understood your case, please let me know if I missed something. The |
|
Yes, @dombizita, thank you, this definitely helps confirm the behavior in the corner case. As long as an empty (sub-)list doesn't result in a fallback to defaults, we are good, and this points to that. In addition, we checked this kind of test out together as well, with both This fails with both providers (and It would probably be nice to have official test coverage for these assumptions (just in case the underlying libraries change), but these should give us reasonable confidence that we don't inadvertently enable ciphers, or create confusion. In this case, there should be no need to handle this corner case specifically in Ozone. Thank you again for the investigation efforts and patience. |
octachoron
left a comment
There was a problem hiding this comment.
We have addressed the only worry I had, concluding that we don't need any changes. Thank you again, looks good to me.
|
Thanks @dombizita for the patch, @octachoron for extensive review/testing. |
|
Thanks for the review @adoroszlai @octachoron! |
What changes were proposed in this pull request?
The gRPC server TLS setup applies the configured cipher list directly when building the Netty OpenSSL context. If any configured cipher is unsupported (and there is no supported cipher in the list before that), TLS context creation throws an error and SCM startup fails. Unsupported ciphers in the configured list should be filtered out and service startup should continue if at least one valid cipher remains.
Instead of this:
It should use Netty
SupportedCipherSuiteFilter.INSTANCEwhen applying configured cipher lists in gRPC server TLS context builders:What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15176
How was this patch tested?
Added a unit test for this scenario, which was failing before applying the fix. Green CI on my fork: https://github.com/dombizita/ozone/actions/runs/25378217331