HDDS-15176. Ozone SCM fails to start when gRPC cipher policy list includes unsupported cipher #10192

Merged
adoroszlai merged 1 commit into apache:master from dombizita:HDDS-15176
May 7, 2026
@dombizita
Contributor

What changes were proposed in this pull request?

The gRPC server TLS setup passes the configured cipher list directly to the builder of the Netty OpenSSL context. If the list contains an unsupported cipher (and no supported cipher precedes it in the list), TLS context creation throws an error and SCM fails to start. Unsupported ciphers should instead be filtered out of the configured list, and service startup should continue as long as at least one valid cipher remains.

Instead of this:

sslContextBuilder.ciphers(securityConfig.getGrpcTlsCiphers()); 

The gRPC server TLS context builders should pass Netty's SupportedCipherSuiteFilter.INSTANCE when applying the configured cipher list:

sslContextBuilder.ciphers(
    securityConfig.getGrpcTlsCiphers(),
    SupportedCipherSuiteFilter.INSTANCE); 
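
For readers unfamiliar with the filter, the idea can be approximated with the JDK alone. This is a minimal sketch, not Netty's implementation: the class CipherFilterSketch and its methods are hypothetical names introduced for illustration, and the "supported" set comes from the default JDK SSLContext rather than from the SslProvider a gRPC server would use.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration only; not part of Ozone or Netty.
final class CipherFilterSketch {

  private CipherFilterSketch() { }

  /** Keeps the requested suites found in `supported`, preserving the requested order. */
  static List<String> retainSupported(List<String> requested, Set<String> supported) {
    List<String> kept = new ArrayList<>();
    for (String suite : requested) {
      if (supported.contains(suite)) {
        kept.add(suite);
      }
    }
    return kept;
  }

  /** Suites that the default JDK TLS provider reports as supported. */
  static Set<String> jdkSupportedSuites() throws NoSuchAlgorithmException {
    return new HashSet<>(Arrays.asList(
        SSLContext.getDefault().getSupportedSSLParameters().getCipherSuites()));
  }

  public static void main(String[] args) throws Exception {
    // One made-up suite and one real TLS 1.3 suite, as in the scenario described above.
    List<String> configured =
        Arrays.asList("TLS_FAKE_CIPHER_SUITE", "TLS_AES_256_GCM_SHA384");
    System.out.println(retainSupported(configured, jdkSupportedSuites()));
  }
}
```

An empty result from such a filter corresponds to the "no valid cipher remains" case, which this change does not try to recover from.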

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15176

How was this patch tested?

Added a unit test for this scenario, which was failing before applying the fix. Green CI on my fork: https://github.com/dombizita/ozone/actions/runs/25378217331

@dombizita dombizita requested review from adoroszlai and fapifta May 5, 2026 14:31
@dombizita
Contributor Author

@octachoron could you please review it? thanks!

Contributor

@octachoron octachoron left a comment

Points double-checked (most of which we discussed offline as well):

  • Silently removing cipher suites still keeps the server in line with the user-supplied configuration (and one specific suite is selected for each connection anyway).
  • This is a convenient way to filter out exactly the unsupported suites, more so than asking users to manage the list externally.
  • No specification seems to exist that requires failing to start when the list contains unsupported cipher suites, so we should be good here too. (I did not find any such requirement either, and there is no test for it.)
  • Setting the cipher list is idiomatic, and the new test follows this idiom, so it should effectively cover the whole change.
  • Netty usage looks good.
  • All four usages of the configuration parameter are covered.

There is one edge case where we may see surprises: when more than one TLS version is enabled, and one of them is left with zero ciphers after the filter. This can have different results, possibly depending on the selected SslProvider:

  • A hard error despite having allowed and valid combinations in the configuration, like with openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V ''.
  • Falling back to defaults on the given TLS version, like with openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V.
  • Disabling the corresponding TLS version.
  • Something else I am not considering.

Are you aware of any reason to prefer a specific one, or anything that guarantees the outcome? (I am checking too, but maybe this has been investigated already.)
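
The edge case can be detected ahead of time with a small check. The following is a rough sketch under stated assumptions: EmptyVersionCheckSketch and protocolsWithoutSuites are hypothetical names, the TLS 1.3 suites are identified by the five names defined in RFC 8446, and every other suite is treated as usable with any pre-1.3 protocol, which is a simplification (for example, GCM suites need TLS 1.2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Ozone or Netty.
final class EmptyVersionCheckSketch {

  // The five TLS 1.3 cipher suites defined in RFC 8446.
  private static final Set<String> TLS13_SUITES = new HashSet<>(Arrays.asList(
      "TLS_AES_128_GCM_SHA256",
      "TLS_AES_256_GCM_SHA384",
      "TLS_CHACHA20_POLY1305_SHA256",
      "TLS_AES_128_CCM_SHA256",
      "TLS_AES_128_CCM_8_SHA256"));

  private EmptyVersionCheckSketch() { }

  /**
   * Returns the enabled protocols that have no usable suite in the
   * already-filtered cipher list.
   */
  static List<String> protocolsWithoutSuites(
      List<String> enabledProtocols, List<String> filteredSuites) {
    boolean hasTls13Suite = false;
    boolean hasPre13Suite = false;
    for (String suite : filteredSuites) {
      if (TLS13_SUITES.contains(suite)) {
        hasTls13Suite = true;
      } else {
        hasPre13Suite = true;  // simplification: any other suite counts for pre-1.3
      }
    }
    List<String> uncovered = new ArrayList<>();
    for (String protocol : enabledProtocols) {
      boolean covered = "TLSv1.3".equals(protocol) ? hasTls13Suite : hasPre13Suite;
      if (!covered) {
        uncovered.add(protocol);
      }
    }
    return uncovered;
  }
}
```

With protocols TLSv1.3 and TLSv1.2 enabled and only TLS_AES_256_GCM_SHA384 left after filtering, the check reports TLSv1.2 as the version with no ciphers. What the TLS stack then does for that version is exactly the open question here.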

@dombizita
Contributor Author

Thanks for reviewing it @octachoron and for verifying the listed points.

There is one edge case where we may see surprises: when more than one TLS version is enabled, and one of them is left with zero ciphers after the filter. This can have different results, possibly depending on the selected SslProvider:

  • A hard error despite having allowed and valid combinations in the configuration, like with openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V ''.
  • Falling back to defaults on the given TLS version, like with openssl ciphers -ciphersuites TLS_AES_256_GCM_SHA384 -V.
  • Disabling the corresponding TLS version.
  • Something else I am not considering.

Are you aware of any reason to prefer a specific one, or anything that guarantees the outcome? (I am checking too, but maybe this has been investigated already.)

Regarding this corner case, I'm not sure whether we should handle it in Ozone (and if so, how). I tried testing it with unit tests, written with the help of Cursor.

  private boolean isProtocolSupported(String protocol) throws Exception {
    return Arrays.asList(SSLContext.getDefault().getSupportedSSLParameters()
        .getProtocols()).contains(protocol);
  }

  private boolean isCipherSupported(String cipher) throws Exception {
    return Arrays.asList(SSLContext.getDefault().getSupportedSSLParameters()
        .getCipherSuites()).contains(cipher);
  }

  @Test
  public void testMixedProtocolsWithFilteredCipherStillServesCompatibleProtocol()
      throws Exception {
    String tls12Cipher = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256";
    assumeTrue(isProtocolSupported("TLSv1.2"));
    assumeTrue(isProtocolSupported("TLSv1.3"));
    assumeTrue(isCipherSupported(tls12Cipher));

    HttpServer2 server = buildServer(null, tls12Cipher, "TLSv1.2,TLSv1.3");
    server.start();
    try {
      InetSocketAddress addr = server.getConnectorAddress(0);
      SSLSocketFactory tls12Factory = createSocketFactory(
          new String[]{tls12Cipher}, new String[]{"TLSv1.2"});
      int responseCode = connectWithFactory(tls12Factory, addr);
      assertEquals(HttpURLConnection.HTTP_OK, responseCode);
    } finally {
      server.stop();
    }
  }

  @Test
  public void testMixedProtocolsWithFilteredCipherRejectsProtocolWithoutCipherOnSunJsse()
      throws Exception {
    String tls12Cipher = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256";
    assumeTrue(isProtocolSupported("TLSv1.2"));
    assumeTrue(isProtocolSupported("TLSv1.3"));
    assumeTrue(isCipherSupported(tls12Cipher));
    assumeTrue("SunJSSE".equals(SSLContext.getDefault().getProvider().getName()),
        "Provider-specific expectation: only asserted for SunJSSE");

    HttpServer2 server = buildServer(null, tls12Cipher, "TLSv1.2,TLSv1.3");
    server.start();
    try {
      InetSocketAddress addr = server.getConnectorAddress(0);
      SSLSocketFactory tls13Factory =
          createSocketFactory(null, new String[]{"TLSv1.3"});
      assertThrows(Exception.class, () -> connectWithFactory(tls13Factory, addr));
    } finally {
      server.stop();
    }
  }

Both tests pass on this branch. In the second case, where the server has only a TLS 1.2 cipher and the client tries to connect with TLS 1.3, the connection attempt throws an exception. That corresponds to the first point in your list, and I think it makes sense. I'm not 100% sure I understood your case, so please let me know if I missed something.

The SupportedCipherSuiteFilter javadoc says "This class will filter all requested ciphers out that are not supported by the current SSLEngine.", which I think is sufficient for Ozone.

@octachoron
Contributor

Yes, @dombizita, thank you, this definitely helps confirm the behavior in the corner case. As long as an empty (sub-)list doesn't result in a fallback to defaults, we are good, and this points to that. In addition, we checked this kind of test out together as well, with both OPENSSL and JDK as the SslProvider:

  @Test
  public void testNoCiphersForVersionAfterFilter() throws Exception {
    Server server = null;
    ManagedChannel channel = null;
    try {
      String[] configuredCiphers = {
          "TLS_FAKE_CIPHER_SUITE",
          "TLS_AES_256_GCM_SHA384"
      };
      server = setupServer(new String[]{"TLSv1.3", "TLSv1.2"}, configuredCiphers);
      server.start();
      channel = setupClient(server.getPort(), new String[]{"TLSv1.2"},
          new String[]{"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"});
      XceiverClientProtocolServiceStub asyncStub = XceiverClientProtocolServiceGrpc.newStub(channel);
      ContainerCommandResponseProto response = sendRequest(asyncStub);
      assertEquals(SUCCESS, response.getResult());
    } finally {
      shutdown(channel, server);
    }
  }

This fails with both providers (and OPENSSL_REFCNT is not supported), but passes if the server also has TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 enabled. A debugging session revealed that the fake suite is correctly interpreted as a non-1.3 one, further validating the test. We did not determine whether the failure was caused by a mismatch between supported versions (third scenario) or by an error because no ciphers were enabled for a version that was itself enabled (first scenario). The errors differed across providers, but we consistently got errors on the client side. So we know the server isn't falling back to defaults, and that behavior will be consistent for users, even when it is the filter that leaves zero ciphers for a version. I then tried 1.2 and 1.3 clients (with both providers) against a server cipher list that becomes completely empty after filtering, and it showed the same behavior.

It would probably be nice to have official test coverage for these assumptions (in case the underlying libraries change), but these results should give us reasonable confidence that we don't inadvertently enable ciphers or create confusion.

In this case, there should be no need to handle this corner case specifically in Ozone. Thank you again for the investigation efforts and patience.

Contributor

@octachoron octachoron left a comment

We have addressed the only worry I had, concluding that we don't need any changes. Thank you again, looks good to me.

@adoroszlai adoroszlai merged commit debc8ab into apache:master May 7, 2026
47 checks passed
@adoroszlai
Contributor

Thanks @dombizita for the patch, @octachoron for extensive review/testing.

@dombizita
Contributor Author

Thanks for the review @adoroszlai @octachoron!
