RHDS rarely crashes/shuts down somewhere in a pkidestroy and pkispawn workflow when re-provisioning slaves #2030

Closed
pki-bot opened this issue Oct 3, 2020 · 5 comments

Comments


pki-bot commented Oct 3, 2020

This issue was migrated from Pagure Issue #1471. Originally filed by dminnich (@dminnich) on 2015-07-09 16:47:23:

  • Closed as Duplicate
  • Assigned to nobody

I've had this happen to me twice total out of the hundreds of installs I've done.

The infrastructure is a master CA with RHDS running on a separate machine, and a slave CA with RHDS running on a separate machine: 4 machines/unique instances in total. What I recall happening is being unhappy with the slave install for some reason, then doing a pkidestroy, then doing a pkispawn on the slave. The pkispawn never completes because it is unable to contact the LDAP server to set up the replication agreement. When I log in to the RHDS node that the slave is pointing at, sure enough RHDS is no longer running. Once I start it back up and run pkispawn on the slave again, things work as they should.
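
For reference, the rough command sequence is this (instance and config file names below are illustrative, not the exact ones from my environment):

# on the slave CA machine:
pkidestroy -s CA -i pki-tomcat        # tear down the unsatisfactory slave
pkispawn -s CA -f clone-ca.cfg        # re-provision; fails while setting up the replication agreement
# on the RHDS machine the slave points at:
systemctl status dirsrv@ldap02        # dirsrv instance name is a guess; ns-slapd is no longer running
systemctl start dirsrv@ldap02
# back on the slave CA machine:
pkispawn -s CA -f clone-ca.cfg        # completes normally this time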

I get the feeling that it may have something to do with the replication agreements, but I can't reproduce it reliably enough and haven't spent the time digging in the logs to prove it. I'm not sure whether removing the replication agreement or trying to create it is what causes the crash.

It's also possible that it's something else, or that I'm doing something weird to cause the problem, but I'm never interacting directly with the RHDS box and nothing else is doing any LDAP operations against it, so it definitely seems like something in RHCS is causing the problem.

Should the problem occur again, or if I figure out how to reproduce it reliably, I'll flesh out this bug some more. Otherwise, I'm curious whether other people might chime in and say they've experienced something similar and can provide more info. If there is no input after a while, feel free to close the bug.

This isn't a blocker or big deal for us. Just wanted to put it out there in case others are seeing it.

@pki-bot pki-bot added this to the 10.3.0 milestone Oct 3, 2020
@pki-bot pki-bot closed this as completed Oct 3, 2020

pki-bot commented Oct 3, 2020

Comment from mharmsen (@mharmsen) at 2015-07-13 23:51:34

Per CS/DS meeting of 07/13/2015: 10.3


pki-bot commented Oct 3, 2020

Comment from dminnich (@dminnich) at 2015-07-23 17:51:12

Attachment: rhds_crash.tar.gz


pki-bot commented Oct 3, 2020

Comment from dminnich (@dminnich) at 2015-07-23 17:51:26

I just saw this happen again.
The setup was like this. Each entity is a separate machine.
Master ca01 -> ldap01
Clone ca02 -> ldap02
Several KRAs, OCSPs and a 3rd CA also existed, but I don't think any of that is relevant.

A full install of these components had taken place in the past and was working fine.
I decided to do a re-install with a new version of RHCS. To do that I pkidestroy'ed and yum removed everything.

I then yum installed the latest release and pkispawn'ed on ca01 and ca02. The pkispawn on ca02 failed because RHDS on ldap02 went down. I had not touched the LDAP server between the uninstall and re-install, and I noticed that RHDS on ldap02 was in fact running before I issued the pkispawn on ca02. So something in the pkispawn of a re-install of a clone CA seems to kill RHDS.
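
Roughly, the re-install looked like this (package and config file names are approximate):

# on ca01 and ca02:
pkidestroy -s CA -i pki-tomcat
yum remove pki-ca                     # plus the rest of the old RHCS packages
yum install pki-ca                    # the new release, pki-ca-10.2.6-2
# on ca01:
pkispawn -s CA -f master-ca.cfg       # master install succeeds
# on ca02:
pkispawn -s CA -f clone-ca.cfg        # fails; RHDS on ldap02 has gone down mid-install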

Note that this is happening with pki-ca-10.2.6-2 and redhat-ds-base-10.0.0-1.el7dsrv.x86_64.

Attached are:
  • pkispawn config of the clone
  • pkispawn log of the clone
  • debug log of the clone
  • access logs for RHDS on ldap02
  • error logs for RHDS on ldap02

Both the RHCS debug log and the RHDS error log mention VLV.

It almost looks like RHCS tells RHDS to delete some data so that it can import it again. The problem is that RHDS shuts down to delete the data, so the RHCS install never finishes.
One thing I will mention is that if RHDS is supposed to go down and bring itself back up, the way we are calling pkispawn through puppet may not be allowing a long enough wait for this to happen. I'd try to test this theory by using pkispawn directly, but I can't get this to happen often enough or know the exact steps to reproduce it.
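
If the timing theory holds, something like the following wrapper around the retry should mask it by waiting for ns-slapd to come back before pkispawn runs again. This is only a sketch (the config file name is a placeholder; the host and port are the ones from the logs below):

# wait up to ~5 minutes for the LDAP server to accept LDAPS connections again;
# assumes the DS certificate is trusted by this client (a plain ldap:// probe on 389 would also do for liveness)
for i in $(seq 1 60); do
    ldapsearch -x -H ldaps://intca02.ldap.qa.int.phx1.redhat.com:636 -b "" -s base "(objectclass=*)" >/dev/null 2>&1 && break
    sleep 5
done
pkispawn -s CA -f clone-ca.cfg        # clone-ca.cfg stands in for the attached pkispawn config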

RHDS:
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRenewal-pki-tomcat).
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Index.
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Index.
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRevocation-pki-tomcat).
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRevocation-pki-tomcat).
[23/Jul/2015:15:10:40 +0000] - ldbm: Bringing intca02.pki.qa.int.phx1.redhat.com offline...

RHCS:
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: initializing with mininum 3 and maximum 15 connections to host intca02.ldap.qa.int.phx1.redhat.com port 636, secure connection, true, authentication type 1
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: increasing minimum connections by 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: new total available connections 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: new number of connections 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: In LdapBoundConnFactory::getConn()
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: masterConn is connected: true
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: getConn: conn is connected true
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: getConn: mNumConns now 2
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS: param=preop.internaldb.post_ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): ldif file = /usr/share/pki/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): ldif file copy to /var/lib/pki/pki-tomcat/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): LDAP Errors in importing /var/lib/pki/pki-tomcat/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config:netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allExpiredCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInvalidCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInValidCertsNotBefore-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config:netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allNonRevokedCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
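
To tell whether ns-slapd actually crashed or shut itself down cleanly around 15:10:40, the kind of thing to check on ldap02 looks like this (the dirsrv instance name is a guess, so adjust the paths):

systemctl status dirsrv@ldap02                               # exit code / signal recorded by systemd
grep -i -e shutdown -e crash /var/log/dirsrv/slapd-ldap02/errors
coredumpctl list 2>/dev/null | grep ns-slapd                 # only useful if systemd-coredump/abrt is collecting cores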


pki-bot commented Oct 3, 2020

Comment from dminnich (@dminnich) at 2017-02-27 14:07:42

Metadata Update from @dminnich:

  • Issue set to the milestone: 10.3.0
