New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oom custom trust manager #928
Oom custom trust manager #928
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor comments, but I don't see anything objectionable. I hope others can take a look at this PR too since I'm not too familiar with the code.
int nss_code = Cert.MatchExceptionToNSSError(excpt); | ||
|
||
if (seen_exception) { | ||
if (logger.isDebugEnabled() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be written as:
if (!logger.isDebugEnabled()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. It's a habit I formed a long time ago. I'll change it.
Note. Sonar doesn't like that this method returns nss_code in 2 places. It's saying it's a "Blocker".
https://sonarcloud.io/project/issues?pullRequest=928&issues=AYWX5DBvciWndDK-ESS5&open=AYWX5DBvciWndDK-ESS5&id=dogtagpki_jss
I could change it to:
if (logger.isDebugEnabled()) {
String msg = ...
logger.debug(msg, excpt);
}
return nss_code;
I wouldn't ordinarily write code like that though, and the code wasn't originally ordered like that either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sonar isn't marking it as a blocker any more...
private int assignException(Exception excpt, PK11Cert[] chain) { | ||
private int logException(Exception excpt, PK11Cert[] chain) { | ||
int nss_code = Cert.MatchExceptionToNSSError(excpt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does assign an NSS code to the exception (instead of just logging the exception), so probably the original method name is more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Assign" isn't really the verb I'd use as nothing is being set on the Exception or any other object.
Exceptions are being mapped to codes.
09183ca
to
eb7b777
Compare
Break the cycle so the garbage collector doesn't need to figure out it can release these together and then let the JSSEngineReferenceImpl.finalize() run.
This breaks the cycle of SSLFD -> CertValidationTask -> JSSEngineReferenceImpl.
eb7b777
to
52ed2ea
Compare
Kudos, SonarCloud Quality Gate passed! |
I am reviewing this patch but the above sentence is not clear to me. If I get correctly with the patch we have no loops so |
I'm not sure I have much meaningful to contribute to this discussion in a reasonable timeframe, but I seem to recall there was some reason why we wanted to facilitate ExtendedTMs in the first place, that talked to the SSLEngine. What that was, I do not recall any more. Might be worth double checking it isn't used in Dogtag, and if so, feel free to remove it. |
Yes, I meant to say "Without the patch". I updated the PR comment. |
I have looked at Dogtag and it seems the X509ExtendedTrustManager is not used. However, if we remove it then some configuration could not work because the ExtendedTrustManager is required (X509ExtendedTrustManager documentation), such as the TLS1.2 with custom trust manager. |
@fmarco76 wrote:
I'm not really familiar with this. According to the linked doc, it's highly recommended to use |
@uplogix-mmcclain I would do some tests of this patch but the custom trust manager does not get loaded correctly. Have you modified the tomcatjss or do I miss some configurations? Could you provide some hints on how to replicate your setup? Thanks |
In server.xml for the Connector, I have: In that I have an implementation of this:
MyJssSslUtil overrides JSSUtil with this:
I can probably get you a more complete example later this week. |
@uplogix-mmcclain Thanks! This is the information I was looking for. Reading your first comment post I was thinking you did not extend tomcatjss so I was looking to set the trust manager using some configuration (it is possible for JSSE) without success. |
@uplogix-mmcclain before to analyse in more detail the proposed changes I was trying to replicate the problem highlighted in this PR. I have tested using a fedora 37 container with tomcat and jss as follow:
I have created my dummy I initially started in debug mode and follow the execution steps to verify if the new trust manager is used. Then I have verified if the Then, I thought the problem could be related to some race conditions so I run >130000 Is there something I am missing to replicate the problem? Additional consideration, I have thought about the cycle problem solved modifying the nested class of Your solution could solve the problem in any case because remove a reference but it could be not necessary. @edewata I confirm this scenario is never present in dogtag, the |
My application is currently limited to Java 8, so it can't run JSS 5.0+. I just set up a Fedora 37 environment with a minimal configuration. I should be able to investigate Monday. |
I know with the other issue that didn't require a client cert I used ab (Apache benchmark) to test with. I don't remember if I tested this with that. |
I did not know |
If I use ab, I see every connection leak.
I have 10k JSSEngineReferenceImpl$CertValidationTask, 20k PK11InternalTokenCert, and 40k of class ObjectIdentifier. If I use curl, I see very few instances left after the run; these probably just haven't been garbage collected yet. 33 JSSEngineReferenceImpl$CertValidationTask
test-jsp.sh:
|
@uplogix-mmcclain Just curious, does this leak go away on JDK9? I seem to recall there was some difference in our memory handling in JDK9 vs JDK8 that made freeing native references easier... |
In the Fedora 37 environment I just created I was testing with JDK17 (java-17-openjdk-headless-17.0.6.0.10-1.fc37.x86_64). |
@uplogix-mmcclain at least with curl we get the same results and I have debugged also a single connection to see what happen and it is OK. Not clear to me what happen when ab is used. I can't imagine any reason for this divergent behaviour but I'll do some more investigation. Do you or @cipherboy have some idea what it could be? |
My best guess, if you don't see this with your own custom trust manager, but perhaps there's something different about this custom trust manager that causes it to hold and persist references afterwards? The TM itself should be nicely scoped to just the session validation, I don't really see in general what'd cause it to persist the reference afterwards as, after validation, the task should be released as it isn't necessary any more...? That said, using But its been forever since I've looked at this code :-) |
This seems to trigger it too, so I don't think it's a race condition.
I don't know why cleanup() isn't getting called. Since I couldn't trust it to be called, I looked into what was pinning the memory. |
@uplogix-mmcclain yes, it is not a race condition. After several tests I have noticed that the In your tests, using default JSS trust manager, do all the JSSEngine objects get properly removed? From the Java side of JSS I do not get the reason so I need to debug the C side. |
@uplogix-mmcclain @fmarco76 I think, from your discovery, Marco, (i.e., it could be that Tomcat doesn't even finalize/ |
Yes, all JSSEngine objects are removed if JSSNativeTrustManager is used. I see 2 JSSEngineReferenceImpls in a heap dump after running The issue doesn't happen when JSSNativeTrustManager is used because the JSSEngineReferenceImpl never instantiates a CertValidationTask in that code path. |
@uplogix-mmcclain I am back on this PR and doing some test. I have created this draft PR to verify if the problem is really with In my tests the @cipherboy this is not a final solution because several tests failed. If |
Since the leak fixed by this PR were generating by the problem solved by PR #970 I am closing this. @uplogix-mmcclain if not all the leaks are solved and this PR is still relevant we can reopen and evaluate this with the new changes. Thanks! |
The following is similar in nature to CVE-2021-4213, but requires a
non-standard configuration.
We are running Tomcat 9 with JSS 4.9.3 and Tomcat-JSS 7.7.1 (OpenJDK 8),
but the underlying code is unchanged in the master branch on Github.
If a Tomcat endpoint is configured to use JSS with a custom TrustManager
(anything other than JSSNativeTrustManager) and requires client
certificates, then the server will rapidly run out of memory.
If a client establishes a TLS connection with a client cert that passes
validation of the trust manager, the data associated with the connection
will be "leaked": the 2 large buffers (4096 to 18713 bytes each), plus
JNI references (SSLFD, etc), the JSSSession, JSSEngineReferenceImpl, the
CertValidationTask, etc.
The data is not leaked if a client tries to establish a TLS connection,
but never presents a cert or presents a cert that fails the
X509TrustManager.checkClientTrusted() check. In either case, an
exception is internally created and handled, closeInbound() and
closeOutbound() are called followed by tryCleanup(). This releases the
JSSSession and later allows for the JSSEngineReferenceImpl to be garbage
collected.
The attached patches work for our use cases. A more conservative change
could be made to CertValidationTask to store a WeakReference to the
JSSEngineReferenceImpl, but then the code would need to handle if the
JSSEngineReferenceImpl was garbage collected before the
CertValidationTask (check would need to return a non-zero value). This
more conservative change could still support X509ExtendedTrustManager.
Being part of a library, I don't know if anyone relies on such support
though.
Red Hat Security asked if this could be fixed in the custom
TrustManager.
Unfortunately, this can't be fixed in the custom TrustManager.
The issue doesn't happen when JSSNativeTrustManager is used because the
JSSEngineReferenceImpl never instantiates a CertValidationTask in that
code path.
The patch provided changes CertValidationTask to a static inner class so
it no longer holds a reference to JSSEngineReferenceImpl. The
CertValidationTask is held by SSLFDProxy and SSLFDProxy is held by
JSSEngineReferenceImpl completing a cycle.
Without the patch: JSSEngineReferenceImpl -> SSLFDProxy ->
CertValidationTask -> JSSEngineReferenceImpl...
If the TrustManager rejects the certificate, exception handling causes
cleanup() to be called which frees the SSLFDProxy object. When the
TrustManager accepts the certificate, cleanup() doesn't get called until
the JSSEngineReferenceImpl is garbage collected which can't happen
because of the object cycle when a custom TrustManager is used.
In an earlier patch I tried to release the reference to the
CertValidationTask in SSLFDProxy after
SSLFDProxy.invokeCertAuthHandler() had run. This still ended up
leaking memory, but it didn't happen on every request.
Below is the code path JSSEngineReferenceImpl takes when
JSSNativeTrustManager which bypasses the problem.
in https://github.com/dogtagpki/jss/blob/master/base/src/main/java/org/mozilla/jss/ssl/javax/JSSEngineReferenceImpl.java#L549-L569