Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in master branch: replica fails to retrieve subca key #4032

Closed
flo-renaud opened this issue Jun 20, 2022 · 15 comments · Fixed by #4088
Closed

Regression in master branch: replica fails to retrieve subca key #4032

flo-renaud opened this issue Jun 20, 2022 · 15 comments · Fixed by #4088

Comments

@flo-renaud
Copy link

FreeIPA CI detected a regression using the nightly build from @pki/master copr repo, see PR#1784 with the test test_replica_promotion_TestSubCAkeyReplication (Details, report).

Test scenario:

  • install master with CA
  • install replica with CA clone
  • configure a subCA on the master with ipa ca-add test_subca_master --subject cn=test_subca_master --desc subca
  • check that the subca is available on the replica:
[root@replica]# ipa ca-show test_subca_master
ipa: ERROR: The certificate for test_subca_master is not available on this server.
  Name: test_subca_master
  Description: subca
  Authority ID: 6f909275-aa1d-4ec8-9171-88c746897fb5
  Subject DN: CN=test_subca_master
  Issuer DN: CN=Certificate Authority,O=IPA.TEST

The logs in /var/log/pki/pki-tomcat/ca/debug show that PKI properly sees the CA LDAP entry on the replica but it fails to retrieve the key.

2022-06-20 11:22:42 [AuthorityMonitor] INFO: CAEngine: Loading authority record cn=6f909275-aa1d-4ec8-9171-88c746897fb5,ou=authorities,ou=ca,o=ipaca
2022-06-20 11:22:42 [AuthorityMonitor] INFO: CertificateAuthority: Initializing authority 6f909275-aa1d-4ec8-9171-88c746897fb5
2022-06-20 11:22:42 [AuthorityMonitor] INFO: CertificateAuthority: Initializing cert signing unit
2022-06-20 11:22:42 [AuthorityMonitor] SEVERE: Object certificate not found. Error Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
        at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:95)
        at com.netscape.ca.CertificateAuthority.initCertSigningUnit(CertificateAuthority.java:1302)
        at com.netscape.ca.CertificateAuthority.init(CertificateAuthority.java:409)
        at org.dogtagpki.server.ca.CAEngine.readAuthority(CAEngine.java:1432)
        at com.netscape.ca.AuthorityMonitor.run(AuthorityMonitor.java:140)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.mozilla.jss.crypto.ObjectNotFoundException: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
        at org.mozilla.jss.CryptoManager.findCertByNicknameNative(Native Method)
        at org.mozilla.jss.CryptoManager.findCertByNickname(CryptoManager.java:988)
        at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:91)
        ... 5 more

2022-06-20 11:22:42 [AuthorityMonitor] WARNING: CertificateAuthority: CA signing key and cert not (yet) present in NSS database
2022-06-20 11:22:42 [AuthorityMonitor] INFO: CertificateAuthority: Starting KeyRetriever for authority 6f909275-aa1d-4ec8-9171-88c746897fb5
2022-06-20 11:22:43 [AuthorityMonitor] INFO: CAEngine: Loading authority record cn=6f909275-aa1d-4ec8-9171-88c746897fb5,ou=authorities,ou=ca,o=ipaca
2022-06-20 11:22:43 [AuthorityMonitor] INFO: CertificateAuthority: Initializing authority 6f909275-aa1d-4ec8-9171-88c746897fb5
2022-06-20 11:22:43 [AuthorityMonitor] INFO: CertificateAuthority: Initializing cert signing unit
2022-06-20 11:22:43 [AuthorityMonitor] SEVERE: Object certificate not found. Error Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
        at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:95)
        at com.netscape.ca.CertificateAuthority.initCertSigningUnit(CertificateAuthority.java:1302)
        at com.netscape.ca.CertificateAuthority.init(CertificateAuthority.java:409)
        at org.dogtagpki.server.ca.CAEngine.readAuthority(CAEngine.java:1432)
        at com.netscape.ca.AuthorityMonitor.run(AuthorityMonitor.java:149)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.mozilla.jss.crypto.ObjectNotFoundException: Certificate not found: caSigningCert cert-pki-ca 6f909275-aa1d-4ec8-9171-88c746897fb5
        at org.mozilla.jss.CryptoManager.findCertByNicknameNative(Native Method)
        at org.mozilla.jss.CryptoManager.findCertByNickname(CryptoManager.java:988)
        at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:91)
        ... 5 more

2022-06-20 11:22:43 [AuthorityMonitor] WARNING: CertificateAuthority: CA signing key and cert not (yet) present in NSS database
2022-06-20 11:22:43 [AuthorityMonitor] INFO: CertificateAuthority: KeyRetriever already running for authority 6f909275-aa1d-4ec8-9171-88c746897fb5
2022-06-20 11:22:44 [KeyRetrieverRunner-6f909275-aa1d-4ec8-9171-88c746897fb5] WARNING: Caught exception while executing command: missing "certificate" field
java.lang.RuntimeException: missing "certificate" field
        at com.netscape.ca.ExternalProcessKeyRetriever.parseResult(ExternalProcessKeyRetriever.java:92)
        at com.netscape.ca.ExternalProcessKeyRetriever.retrieveKey(ExternalProcessKeyRetriever.java:71)
        at com.netscape.ca.KeyRetrieverRunner._run(KeyRetrieverRunner.java:102)
        at com.netscape.ca.KeyRetrieverRunner.run(KeyRetrieverRunner.java:69)
        at java.base/java.lang.Thread.run(Thread.java:833)

The issue was first detected with dogtag-pki-ca-11.3.0-0.1.alpha1.20220613143838UTC.6bc4706b.fc36.noarch (the run from the previous week was ok, with dogtag-pki-ca-11.3.0-0.1.alpha1.20220526131710UTC.1ca4e256.fc36.noarch).

Companion issue on freeipa side: https://pagure.io/freeipa/issue/9182

@flo-renaud
Copy link
Author

@ckelleyRH the commits between the 2 builds were the following:

$ git log --pretty=reference 1ca4e25..6bc4706b
6bc4706b1 (Replace %package_option macro, 2022-06-10)
4a88cdd4f (Remove redundant type specification, 2022-06-10)
4e893243d (Disable access to external entities when parsing XML, 2022-06-10)
e1eb44a96 (Update CMake VERSION variable, 2022-06-09)
749e38215 (Update CMake NSS_DEFAULT_DB_TYPE, 2022-06-10)
3659ef28c (Move CTest into %build stage, 2022-06-10)
2056910d0 (Fix incorrect Obsoletes: for symkey, 2022-06-09)
47b3affc0 (Replace raw use of ObjectMapper with JSONObject, 2022-06-08)
2e5417224 (Fix serial type in nss-cert-issue request in nssdb.py, 2022-06-09)
57b063516 (Make remaining classes with to/fromJSON methods implement JSONSerializer, 2022-06-09)
1857aab92 (Fix pylint warnings in upgrade scripts, 2022-06-08)
dee60f0bb (Fix pylint warnings in pki.util module, 2022-06-08)
45a601d1b (Fix pylint warnings in pki.upgrade module, 2022-06-08)
80a7591fd (Fix pylint warnings in pki.server.subsystem module, 2022-06-08)
c96e02c23 (Fix pylint warnings in pki.server.pkidestroy module, 2022-06-08)
2ced95b10 (Fix pylint warnings in pki.server.instance module, 2022-06-08)
ce18f87df (Fix pylint warnings in pki.server.deployment.scriptlets module, 2022-06-08)
673932a5e (Fix pylint warnings in pki.server.deployment.pkiparser module, 2022-06-08)
13368b9bb (Fix pylint warnings in pki.server.deployment.pkimanifest module, 2022-06-08)
70e75f40c (Fix pylint warnings in pki.server.deployment.pkihelper module, 2022-06-08)
e082a530e (Fix pylint warnings in pki.server.deployment module, 2022-06-08)
6af42a75b (Fix pylint warnings in pki.server.cli.subsystem module, 2022-06-08)
fd8e5f02a (Fix pylint warnings in pki.server.cli.nuxwdog module, 2022-06-08)
1b266888e (Fix pylint warnings in pki.server.cli.nss module, 2022-06-08)
21b0f688d (Fix pylint warnings in pki.server.cli.cert module, 2022-06-08)
4f95ffc94 (Fix pylint warnings in pki.server.cli.banner module, 2022-06-08)
ee77d9d83 (Fix pylint warnings in pki.server.cli.audit module, 2022-06-08)
87d07c02f (Fix pylint warnings in pki.server module, 2022-06-08)
a8fe9f4f5 (Fix pylint warnings in pki.profile module, 2022-06-08)
a3528366e (Fix pylint warnings in pki.pkcs12 module, 2022-06-08)
29a9a8cdb (Fix pylint warnings in pki.nssdb module, 2022-06-08)
952629186 (Fix pylint warnings in pki.cli.pkcs12 module, 2022-06-08)
d1d5f2be7 (Fix pylint warnings in pki module, 2022-06-08)
86a9d285d (Update log messages in UGSubsystem, 2022-05-27)
37560131c (Update log messages in CACertRequestSubmitCLI, 2022-04-27)
a658e1354 (Remove unused NSSDatabase.create_self_signed_ca_cert(), 2022-06-08)
04379e99e (Remove unused Configurator constants, 2022-04-27)
92f787ee8 (Remove unused Configurator subclasses, 2022-04-27)
b8b11e076 (Remove unused CMSEngine.createConfigurator(), 2022-04-27)
a30f1b99d (Remove unused SystemConfigService subclasses, 2022-04-27)
24a0f68b8 (Update OpenLDAP clients to use -H option, 2022-06-06)
f7c831e3c (Fix pki-lint script, 2022-06-07)
5db76dc98 (Make TokenData implement JSONSerializer, 2022-05-27)
d55aad6d5 (Make AuthenticatorData implement JSONSerializer, 2022-05-26)
8318ff9d0 (Fix pki ca-cert-request-unassign, 2022-05-27)
820088408 (Update OCSP test, 2022-06-06)
70f08778f (Add SonarCloud analysis action (#4013), 2022-06-07)

47b3aff looks like a good candidates to investigate.

@frasertweedale
Copy link
Contributor

custodia-related logs look fine.

base/ca/src/main/java/com/netscape/ca/ExternalProcessKeyRetriever.java is the code that
receives and processes the key data. Modify it to dump the process stdout and inspect its structureas a first
step. The acutal JSON is generated by ipa-custodia (relevant file: ipaserver/secrets/handlers/nsswrappedcert.py) but I can't see that anything would have changed on that side either.

@edewata
Copy link
Contributor

edewata commented Jun 21, 2022

How is the key retriever configured in IPA? Does the command in features.authority.keyRetrieverConfig.executable return an XML or JSON? We've been converting our code to use JSON instead of XML, so that command might need to be updated as well. Also, we don't have a key retriever in regular PKI installations without IPA so we don't have tests or examples.

edewata added a commit to edewata/pki that referenced this issue Jun 21, 2022
The ExternalProcessKeyRetriever has been updated to provide
more detailed log messages and to improve JSON validation of
the result.

A new constructor has been added to create a JSONObject from
a String.

dogtagpki#4032
edewata added a commit to edewata/pki that referenced this issue Jun 21, 2022
The ExternalProcessKeyRetriever has been updated to provide
more detailed log messages for troubleshooting and to improve
the validation of the JSON result.

A new constructor has been added to create a JSONObject from
a String.

dogtagpki#4032
@edewata
Copy link
Contributor

edewata commented Jun 21, 2022

I created a PR to update the log messages and the validation: #4037
The build is available from my COPR repo: https://copr.fedorainfracloud.org/coprs/edewata/pki/builds/
The official COPR build will be available from @pki/master once the PR is merged.

Please try again with this build. Please feel free to comment directly in the PR as well. Thanks.

edewata added a commit to edewata/pki that referenced this issue Jun 21, 2022
The ExternalProcessKeyRetriever has been updated to provide
more detailed log messages and to perform proper JsonNode
validations of the result to help troubleshooting.

A new constructor has been added to create a JSONObject from
a String.

dogtagpki#4032
@frasertweedale
Copy link
Contributor

@edewata the command is required to return JSON. The actual program is part of the IPA package, and FreeIPA configures Dogtag to use that command. I'll continue investigating.

edewata added a commit that referenced this issue Jun 22, 2022
The ExternalProcessKeyRetriever has been updated to provide
more detailed log messages and to perform proper JsonNode
validations of the result to help troubleshooting.

A new constructor has been added to create a JSONObject from
a String.

#4032
@edewata
Copy link
Contributor

edewata commented Jun 22, 2022

Just FYI, the PR has been merged and the COPR build is available from @pki/master. Thanks!

@amore17
Copy link

amore17 commented Aug 1, 2022

Its still failing in report

@frasertweedale
Copy link
Contributor

frasertweedale commented Aug 3, 2022

@ckelleyRH I strongly suspect this is a regression from commit 47b3aff (June 10, which fits the timeline of this issue).

What was the motivation for the above commit? I don't (yet) fully understand the change but the usage in ExternalProcessKeyRetriever is different from other sites where JSONObject is used to parse an input. For example, from SubsystemClient.java:

JsonNode responseNode = parser.getJsonNode().get("Response"); 
String status = responseNode.get("Status").asText();          
logger.debug("Status: " + status);                            

Compared with ExternalProcessKeyRetriever:

JsonNode root = new JSONObject(result).getRootNode();                                                 
JsonNode certNode = root.path("certificate");        
...

Perhaps we just need to replace .getRootNode() with .getJsonNode()?

@frasertweedale
Copy link
Contributor

@ckelleyRH yeah, that was the issue. There was another regression caused by the more recent commit to emit more diagnostic info from ExternalProcessKeyRetriever. I'll have a PR ready soon.

frasertweedale added a commit to frasertweedale/pki that referenced this issue Aug 3, 2022
Commit 47b3aff refactors the JSON
handling in ExternalProcessKeyRetriever.  It introduced a regression
that causes lightweight CA key replication to fail.  It scrutinises
an empty node instead of the JSON node containing the object loaded
from the InputStream.  Update the code to scrutinise the correct
node.

Separately, commit 6c116bf enhanced
the ExternalProcessKeyRetriever diagnostic output.  It also broke
key retrieval.  The `wrapped_key` field value is expected to be a
text node containing base64-encoded data.  We retrieve its decoded
value using `.binaryValue()`.  That commit added a check that throws
an exception if `node.isBinary()` is false.  However, the node is
parsed as a com.fasterxml.jackson.databind.node.TextNode, for which
`.isBinary()` returns false.  This commit removes the check.
`.binaryValue()` will (as before) throw an `IOException` if base64
decoding fails.

Fixes: dogtagpki#4032
@ckelleyRH
Copy link
Contributor

It was part of a series of changes to encapsulate JSON parsing and reduce boilerplate, to pave the road for the eventual total removal of XML, thanks for figuring this one out!

@flo-renaud
Copy link
Author

@frasertweedale @ckelleyRH
IPA nightly test show that the subca key is now properly retrieved from the other server, but the test is failing because of an (orphan) key seen on both sides.
See for instance this report:

DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 certutil: Checking token "NSS Certificate DB" in slot "NSS User Private Key and Certificate Services"
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 0> rsa      5f512ffae0797bf9cd3ff0dc8102c88b7adb64c3   (orphan)
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 1> rsa      a5975e9539211c6f072bc8534727f4a7642de66a   NSS Certificate DB:caSigningCert cert-pki-ca
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 2> rsa      d812397a8ab2c251e5060bda215065f85c25a29e   NSS Certificate DB:ocspSigningCert cert-pki-ca
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 3> rsa      09ea976d3ce9fa60f7b6a0d5ae15ca0b7e175b93   NSS Certificate DB:Server-Cert cert-pki-ca
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 4> rsa      d9ceb900681aa496cb9ce58bd75c8d021f788994   NSS Certificate DB:subsystemCert cert-pki-ca
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 5> rsa      b7493e285ced9bdf4d221bd58b9242130a6c7686   NSS Certificate DB:auditSigningCert cert-pki-ca
DEBUG    ipatests.pytest_ipa.integration.host.Host.master.cmd80:transport.py:557 < 6> rsa      1c85182759573736226a4d1af681ee942c742394   caSigningCert cert-pki-ca fe8bf529-bc74-4b44-af25-d8cf92c7e67a

We can of course modify the test code to ignore this unexpected orphan key but the behavior is different with pki 11.1.0-1 (no orphan key detected). Does it ring any bell, do you have any idea if it could be caused by new pki code from the master branch?

@flo-renaud
Copy link
Author

Update: the orphan key is not present with dogtag-pki-base-11.2.0-2.fc37.noarch, it only happens when using pki from the copr repo @pki/master.

@ckelleyRH
Copy link
Contributor

ckelleyRH commented Aug 15, 2022

@flo-renaud I think this might be another consequence of the change to have JSS issue the temp cert on server startup rather than NSS, there is no orphan when deploying the main CA. I have not tried with a clone yet but the PKI CI doesn't seem to have any orphans.

@flo-renaud
Copy link
Author

Ticket #4103 opened to track the orphan key.

@frasertweedale
Copy link
Contributor

frasertweedale commented Aug 17, 2022

@flo-renaud just brain-dumping my thoughts about how the IPA test could be made more resilient. I don't suggest any particular course of action, just sharing observations.

  • The relevant file is test_replica_promotion.py, class TestSubCAkeyReplication.
  • Currently we test equality of contents of nssdb against (a copy of) the dict EXPECTED_CERTS, whose keys are nicknames and values are expected trust flags. check_certdb() is the method that implements this check.
  • Alternative approach: instead of dict equality check, perform a subset check, i.e. that every key/value in expected_keys is present in the output of self.get_certinfo(master|replica).

This would make the test resilient if more, unrelated certificates were to appear in the NSSDB (for whatever reason - maybe new features, but also maybe bugs, as in the current case).

So it is a tradeoff. On the one hand, the test spuriously fails in the presence of new certificates (which may be a legitimate change). On the other hand, the failure can reveal real issues, although those issues are probably not related to LWCA key replication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants