RHDS rarely crashes/shuts down somewhere in a pkidestroy and pkispawn workflow when re-provisioning slaves #2030
Comments
Comment from mharmsen (@mharmsen) at 2015-07-13 23:51:34 Per CS/DS meeting of 07/13/2015: 10.3 |
Comment from dminnich (@dminnich) at 2015-07-23 17:51:12 attachment |
Comment from dminnich (@dminnich) at 2015-07-23 17:51:26 I just saw this happen again. A full install of these components had taken place in the past and was working fine. I then yum installed the latest release and ran pkispawn on ca01 and ca02. The pkispawn on ca02 failed because RHDS on ldap02 went down. I had not touched the LDAP server between the uninstall and the re-install, and I noticed that RHDS on ldap02 was in fact running before I issued the pkispawn on ca02. So something in the pkispawn of a re-installed clone CA seems to kill RHDS.

Note that this is happening with pki-ca-10.2.6-2 and redhat-ds-base-10.0.0-1.el7dsrv.x86_64.

Attached are:

Both the RHCS debug log and the RHDS error log talk about vlv. It almost looks like RHCS tells RHDS to delete some data so that it can import it again. The problem is that RHDS shuts down to delete the data, so the RHCS install never finishes.

RHDS:

RHCS:

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allExpiredCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInvalidCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInValidCertsNotBefore-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config:netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allNonRevokedCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1) |
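For anyone triaging a similar failure, the RHCS debug-log lines quoted above can be summarized with a small script. This is only a sketch: the regular expression is modeled on the four lines in this comment, and the function name `failed_imports` is made up for illustration. It is not part of RHCS or RHDS tooling.

```python
import re

# Pattern modeled on the "LDAPUtil:importLDIF" lines quoted in this issue.
# Assumption: the entry DN contains no colon, so the first colon followed by
# "netscape.ldap.LDAPException" ends the DN.
IMPORT_FAILURE = re.compile(
    r"LDAPUtil:importLDIF: exception in adding entry\s+(?P<dn>.+?):\s*"
    r"netscape\.ldap\.LDAPException:\s*(?P<error>.+)$"
)


def failed_imports(lines):
    """Return (entry_dn, error_text) pairs for every failed LDIF entry import."""
    failures = []
    for line in lines:
        match = IMPORT_FAILURE.search(line)
        if match:
            failures.append((match.group("dn").strip(), match.group("error").strip()))
    return failures
```

Passing an open debug log file (or any iterable of lines) to `failed_imports` gives a list of which entries failed and why, which makes it easier to see whether all failures share the same socket error.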
Comment from mharmsen (@mharmsen) at 2015-10-20 20:36:22 From IRC conversation of 10/20/2015: Filed 389 TRAC Ticket 48315 - RHDS rarely crashes/shuts down somewhere in a pkidestroy and pkispawn workflow when re-provisioning slaves; closing this ticket |
Comment from dminnich (@dminnich) at 2017-02-27 14:07:42 Metadata Update from @dminnich:
|
This issue was migrated from Pagure Issue #1471. Originally filed by dminnich (@dminnich) on 2015-07-09 16:47:23:
I've had this happen to me twice in total out of the hundreds of installs I've done.
The infrastructure is a Master CA with RHDS running on a separate machine, and a Slave CA with RHDS running on a separate machine: four machines (unique instances) in total. What I recall happening is being unhappy with the slave install for some reason, then doing a pkidestroy followed by a pkispawn on the slave. The pkispawn never completes because it is unable to contact the LDAP server to set up the replication agreement. When I log in to the RHDS node that the slave is pointing at, sure enough, RHDS is no longer running. Once I start it back up and run pkispawn on the slave again, things work as they should.
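As a stopgap, the workaround described above (confirm RHDS is up, then re-run pkispawn) could be preceded by a simple reachability check. The following is a minimal sketch and not something pkispawn provides: the function name `ldap_port_open` is hypothetical, and port 389 is only the conventional LDAP default; the port your RHDS instance actually listens on may differ.

```python
import socket


def ldap_port_open(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something is listening on the port; it does not
    confirm that the directory server is healthy or accepting binds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `ldap_port_open("ldap02")` (using the host name from the comments on this issue) before re-running pkispawn on the clone would at least catch the case where RHDS has already gone down.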
I get the feeling that it may have something to do with the replication agreements, but I can't reproduce it reliably enough and haven't spent the time digging in the logs to prove it. I'm not sure whether removing the replication agreement or trying to create it causes the crash.
It's also possible that it's something else, or that I'm doing something weird to cause the problem. But I'm never interacting directly with the RHDS box and nothing else is doing any LDAP operations against it, so it definitely seems like something in RHCS is causing the problem.
Should the problem occur again, or if I figure out how to reproduce it reliably, I'll flesh out this bug some more. Otherwise, I'm curious whether other people might chime in to say they've experienced something similar and can provide more info. If there is no input after a while, feel free to close the bug.
This isn't a blocker or big deal for us. Just wanted to put it out there in case others are seeing it.