-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional encryption to the JITServer metrics server #15217
Conversation
20ff61b
to
b38560a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed it entirely, but I wanted to submit the few suggestions that I have.
Also, since we allocate an HttpGetRequest
when sending, I am thinking that maybe we should embed one such data structure in MetricsServer
, basically transform _incompleteRequests
from HttpGetRequest *
to HttpGetRequest
. This will increase the memory by 4K, but we avoid the constant allocation and deallocation.
@@ -1217,6 +1221,27 @@ static bool JITServerParseCommonOptions(J9JavaVM *vm, TR::CompilationInfo *compI | |||
return false; | |||
} | |||
|
|||
// key and cert have to be set as a pair for the metrics server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this section should stay at the point where we parse metrics server options, and execute the code only if the metrics server is enabled.
TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "MetricsServer: Socket %d timed-out while writing\n", _pfd[i].fd); | ||
} | ||
closeSocket(i); | ||
} | ||
} | ||
} | ||
_numActiveSockets = 1; // All sockets are closed, but the first one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer true, because we reArmSocketForWriting
, so the number of active sockets does not decrement.
In OpenSSL 1.0.2 the library's global state needs to be explicitly initialized and freed, and this needs to happen before and after, respectively, any threads using SSL start. Would you happen to know a good place to do this? The relevant calls don't need any arguments - they just need to be called at those times. |
Typically, things that need to be initialized once and early can be done in The listener and metrics threads are destroyed in DLLMain.cpp |
The client initializes SSL in
It's not clear to me where the SSL data structures are destroyed, but I think it can be done at the end of |
I believe that's the latest changes all done. In particular the SSL connection accepting and communication has been finished. I've done some basic testing with curl and Prometheus and things seem to be working. To enable SSL for a particular scrape config in Prometheus, something like this is needed: scrape_configs:
- job_name: 'jitserver0'
scheme: https
tls_config:
ca_file: cert.pem
insecure_skip_verify: true # needed if self-signed testing certificates are used
static_configs:
- targets: ['localhost:9404'] For our particular prometheus testing setup I had to include the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few inline comments.
Is it at all possible to test the SSL scenario where we are doing a read, but the error code is WANT_WRITE? It looks somewhat strange that we arm the socket for a write operation, but, after poll returns, we continue with a read operation.
} | ||
else if (SSL_ERROR_WANT_WRITE == err) | ||
{ | ||
return WANT_WRITE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you seen these types of conditions? I wonder under what circumstances they appear.
HttpGetRequest::~HttpGetRequest() | ||
{ | ||
if (_ssl) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to do anything with that _incompleteSSLConnection ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's freed by the OBIO_free_all
; I took that from the existing Listener.cpp
code. I can double-check the documentation.
} | ||
else if ((*OBIO_should_write)(_ssl)) | ||
{ | ||
return WANT_WRITE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks odd that we are doing a read, but we want to continue with a write in this case. Have you managed to trigger this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't triggered it myself. The documentation for BIO_read notes that a single BIO_read
on an SSL BIO can cause multiple reads and writes internally (possibly for re-negotiation?), and those internal writes might fail due to insufficient outgoing capacity.
75fb7fd
to
0af8788
Compare
Those should be all of the changes, other than the separate PR for |
The symbol loading does work using 1.0.2 and 3.0.3. I've also tested the |
TR_Memory::jitPersistentFree(_incompleteRequests[sockIndex]); | ||
_incompleteRequests[sockIndex] = NULL; | ||
} | ||
_requests[sockIndex].clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBIO_free_all needs to be done before closing the underlying socket. See #15392
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to just move _requests[sockIndex].clear();
at the top of this function. As it stands, we set _pfd[sockIndex].fd = -1;
and then try to close descriptor -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, made that change a little too quickly.
ee0c0f6
to
8e0aef2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk17 |
plinuxjit test aborted due to timeout. This is the test that had the problem:
Note that in the past I have seen this test failing with the same symptoms, but on x86-64 |
@cjjdespres Please rebase to resolve conflicts. Thanks |
The new options -XX:JITServerMetricsSSLKey and -XX:JITServerMetricsSSLCert behave like their JITServer counterparts, automatically enabling SSL/TLS communication between the JITServer metrics server and clients like Prometheus. Fixes: eclipse-openj9#15052 Signed-off-by: Christian Despres <despresc@ibm.com>
8e0aef2
to
e3128a0
Compare
jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk17 |
On x86 we have this timeout:
On PPC we have the timeout that we have seen before:
|
jenkins test sanity plinuxjit,xlinuxjit jdk17 |
Build on Power failed due to infra |
jenkins test sanity plinuxjit jdk17 |
1 similar comment
jenkins test sanity plinuxjit jdk17 |
One test on PPC failed:
It doesn't appear that this failure can be caused by the code in this PR |
The new options
-XX:JITServerMetricsSSLKey
and-XX:JITServerMetricsSSLCert
behave like their JITServer counterparts, automatically enabling SSL/TLS communication between the JITServer metrics server and clients like Prometheus.Fixes: #15052