Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with Liberty Java21 #27665

Open
tam512 opened this issue Feb 16, 2024 · 21 comments
Labels
bug This bug is not present in a released version of Open Liberty SVT

Comments

@tam512
Copy link

tam512 commented Feb 16, 2024

Test InstantOn with Open Liberty 24.0.0.2 Java21 on Power9 OCP 4.14 cluster. I deployed the checkpoint app image to the cluster using Open Liberty Operator 1.3.1 with JIT enable (.spec.semeruCloudCompiler.enable: true)

apiVersion: apps.openliberty.io/v1
kind: OpenLibertyApplication
metadata:
  name: myapp
spec:
  applicationImage: <my-checkpoint-app-image>
  semeruCloudCompiler:
    enable: true
    replicas: 1
    ...............
  • View the semeru compiler pod log and see it has the following error:
#INFO:  StartTime: Feb 16 14:37:39 2024
#INFO:  TimeZone: UTC (UTC)

JITServer is ready to accept incoming requests
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
#JITServer: t= 21652 A new client (clientUID=18080393426921679733) connected. Server allocated a new client session.
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
A0F04FE4FE7F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:ssl/record/rec_layer_s3.c:320:
@tam512
Copy link
Author

tam512 commented Feb 19, 2024

I also see the same errors on Azure Kubernetes Service (AKS) cluster

@mpirvu
Copy link

mpirvu commented Feb 20, 2024

I'll have a look. A question though: is this seen only on Power (i.e. is x86-64 running fine)?

@tam512
Copy link
Author

tam512 commented Feb 20, 2024

@mpirvu I also see the errors on AKS cluster which is x86-64

@mpirvu
Copy link

mpirvu commented Feb 20, 2024

Thanks. I have a few more questions to limit my search space:

  • Is it seen only with Java21, or does Java 17 have the same problem?
  • If, Java 17 also shows the problem, what was the first OpenJ9 release that started to show the problem?

@mpirvu
Copy link

mpirvu commented Feb 21, 2024

Another question: did we ever had a combination of InstantOn + Semeru Cloud Compiler that worked ok in OCP when deployed with the Liberty Operator?

I have tested Liberty 24.0.0.1 + Java 21 release 0.42.0 with InstantON and Semeru Cloud Compiler in containers, but not in OCP and it's working ok. This is why I am trying to understand whether this test using the Liberty Operator has ever passed.

@tam512
Copy link
Author

tam512 commented Feb 21, 2024

this is the first time I test InstantOn + Semeru Cloud Compiler on OCP4.14

@mpirvu
Copy link

mpirvu commented Feb 21, 2024

this is the first time I test InstantOn + Semeru Cloud Compiler on OCP4.14

Then, it's possible that there is something wrong at the Operator level on how the client and server are setup. The symptoms indicate that the certificate at the client does not match the one at the server.

@mpirvu
Copy link

mpirvu commented Feb 21, 2024

@leochr could you please weigh-in on this issue?

@tam512 tam512 changed the title unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable JIT with InstantOn checkpoint app image unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn checkpoint app image Feb 21, 2024
@tam512 tam512 changed the title unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn checkpoint app image unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn Feb 21, 2024
@tam512
Copy link
Author

tam512 commented Feb 22, 2024

I also see the error on x86 OCP 4.14 cluster with Java 21 but not with Java 17

@mpirvu
Copy link

mpirvu commented Feb 22, 2024

@tam512 In another issue(performance related) you deployed Liberty+Semeru Cloud Compiler with Java 21 in OCP using yaml files. Did that experiment use encryption? If so, I would rather say it's an Operator issue rather than a JVM issue. Thanks

@leochr
Copy link
Member

leochr commented Feb 22, 2024

InstantOn app with Liberty Operator and Semeru Cloud Compiler was validated on OCP as part of this issue: OpenLiberty/open-liberty-operator#438 (comment)

I am not sure which Java version was used, but Liberty images with Java 21 weren't available then.

Liberty Operator generates the certificate and sets up the server and client the same way whether the app is InstantOn or not. The only difference is that the InstantOn app client gets the server details via the OPENJ9_RESTORE_JAVA_OPTIONS env.

@leochr
Copy link
Member

leochr commented Feb 22, 2024

I'll suggest trying this scenario on OCP without Liberty Operator.

I also wonder whether InstantOn could possibly be overridding the certificate, so instead of using the certificate mounted at runtime (by the Operator), a certificate from the snapshot is used.

@vijaysun-omr
Copy link

I agree with @leochr last thought, and one way to verify if InstantOn could be failing to pick up from the restore environment is to try the same test (using Operator and Semeru Cloud Compiler) that @tam512 was running without InstantOn being used.

@hanczaryk hanczaryk added bug This bug is not present in a released version of Open Liberty SVT labels Feb 23, 2024
@tam512
Copy link
Author

tam512 commented Feb 23, 2024

I tried the same test, deploy app without InstantOn using Open Liberty operator+semeruCloudCompiler, I still see the same errors so it's not instantOn related

@tam512 tam512 changed the title unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn Java21 Feb 23, 2024
@tam512 tam512 changed the title unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with InstantOn Java21 unexpected eof while reading:ssl/record/rec_layer_s3.c:320 when enable semeruCloudCompiler with Liberty Java21 Feb 23, 2024
@mpirvu
Copy link

mpirvu commented Feb 23, 2024

Once, I'll get my hands on a K8 cluster I'll try to reproduce. Just using containers I don't see the issue.

@mpirvu
Copy link

mpirvu commented Feb 26, 2024

I created a Kubernetes cluster based on Microk8s and a container with Liberty 24.0.0.1 and AcmeAir app.
If I deploy the container with OLO using Java 21 I see the problem mentioned in this issue.
If I deploy the container with OLO using Java 17, everything is fine.
If I deploy the container with yaml, using either Java 17 or Java 21 everything is fine.

Thinking that the certificates created by OLO+Java21 may be wrong, I deployed my client+server solution with yaml files, but pointing at the certificates from the TLS secret created by OLO. This was also successful which means that the certificates are good.

In another experiment, I connected my AcmeAir container deployed with yaml to the server deployed by OLO. This also fails, so this may point at a difference between the server launched by OLO and the server launched with yaml, but so far I haven't found any.

@mpirvu
Copy link

mpirvu commented Feb 27, 2024

I added verbose options to the client and I see that it successfully connects to the server and the server successfully compiles methods even though those "unexpected eof" messages appear at the server.

I googled about those messages and found this text that may or may be not relevant:

OpenSSL 3 (a major release) changed some behaviour compared to 1.1.1 with respect to peers that 
fail to shutdown a TLS connection cleanly.
Previously, if a peer unexpectedly shutdown a connection an OpenSSL IO function (such as SSL_read()) 
would report an error and SSL_get_error() would report SSL_ERROR_SYSCALL and errno would be 0. 
This was considered a bug in 1.1.1 (you should never get SSL_ERROR_SYSCALL but with errno as 0). 
However fixing it in 1.1.1 broke some apps. We delayed the fix until the next major version (OpenSSL 3.0).
In OpenSSL 3.0 this error is now reported from SSL_get_error() as SSL_ERROR_SSL and the 
unexpeced eof while reading error is put on the OpenSSL error stack. 
We also added a new option SSL_OP_IGNORE_UNEXPECTED_EOF which treats an unexpected EOF 
from the peer as if they had performed an orderly shutdown.

In order to further debug this issue, it would be very helpful to enable logging at the server by defining an environment variable:
_JAVA_OPTIONS="-Xjit:verbose={JITServer},verbose={compilePerformance}
@leochr Is there a way to inject JVM options to the Semeru Cloud Compiler pod?
I tried

  semeruCloudCompiler:
    enable: true
    replicas: 1
    env:
      - name: _JAVA_OPTIONS
        value: "-Xjit:verbose"

but spec.semeruCloudCompiler.env is not defined.

@mpirvu
Copy link

mpirvu commented Feb 27, 2024

When I create my containers with Java 21 I see

#JITServer: Successfully initialized SSL context (OpenSSL 3.0.7 1 Nov 2022)

but when I create my containers with Java 17 I see

#JITServer: Successfully initialized SSL context (OpenSSL 1.1.1k  FIPS 25 Mar 2021)

Based on the text that I pasted in the previous comment, it's very likely that the change from OpenSSL 1.1.1 to OpenSSL 3.0.7 made those "unexpected eof" messages appear. I still cannot explain why those messages do not appear when I start JITServer with yaml instead of OLO.

@leochr
Copy link
Member

leochr commented Feb 28, 2024

@mpirvu It's not possible to set env on Semeru Cloud Compiler via the OpenLibertyApplication CR. But you could set it directly on the Semeru Cloud Compiler Deployment (i.e. <CR_name>-semeru-compiler-1) that's created by OLO. Make sure to scale down the OLO Deployment olo-controller-manager to 0, otherwise OLO will revert the manual changes to the compiler's Deployment.
Once debug is complete, scale OLO Deployment back to 1 to resume Operator reconciliation.

Please attach the yamls that were using. I'll take a look to see if I can spot any difference that could cause the behaviour change. Thank you

@mpirvu
Copy link

mpirvu commented Mar 4, 2024

I think I understand what's causing those 'unexpected eof' messages : the server defined by OLO adds liveness and readiness probes. I believe the probes establish a connection on the 38400 port, but not a full ssl connection. I managed to replicate the errors on my JITServer deployed with yaml files by adding liveness/readiness probes like OLO does.
I was puzzled by the fact that the server shows some connection errors, while the client does not show any trace of a connection problem. The probing mechanism explains the mystery.

@mpirvu
Copy link

mpirvu commented Mar 5, 2024

Searching the net I did not find a way to send a probe on a TCP port using encryption.

There are two possible solutions here:

  1. Filter out the error message (basically going for an OpenSSL 1.1.1 solution).
  2. Add another port that will be dedicated to receiving unencrypted probes.

The second solution is used in Hazelcast. Citing from their documentation:

since Kubernetes liveness/readiness probes cannot use SSL, you need to prepare a Hazelcast configuration with a separate non-secured port opened for health checks. 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This bug is not present in a released version of Open Liberty SVT
Projects
None yet
Development

No branches or pull requests

5 participants