
IPA master system initiated more than a dozen simultaneous replication sessions, shut itself down and wiped out its DB #442

Closed
389-ds-bot opened this issue Sep 12, 2020 · 2 comments
Labels: closed: duplicate (Migration flag - Issue)

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/442


https://bugzilla.redhat.com/show_bug.cgi?id=852202 (Red Hat Enterprise Linux 6)

Description of problem:
The IPA LDAP server shut itself down while running, and the admin was unable to
restart it.  Dev indicates the system initiated more than a dozen simultaneous
replication sessions where only one should have been running.  The DB was wiped
nearly clean.  The server currently cannot be recovered with
ipa-replica-manage re-initialize (see the example below).
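
For reference, the recovery attempt on the wiped master would look roughly like
the following; the source hostname is taken from this report's test environment,
and the exact invocation used by the reporter is an assumption:

# assumed form of the re-initialize attempt (source hostname from this environment)
[root@sti-high-1 ~]# ipa-replica-manage re-initialize --from sti-high-2.testrelm.com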

Test Env:
2 Ipa Masters
2 Ipa Clients

What was Active:
-Sudo client running 10 runtime-allowed sudo command threads, each thread with a
5 sec delay.
-One admin client creating 10 sudo rules, with a 5 min delay after each rule
(see the sketch below).
-2 UI sessions were up; no load applied via the IPA UI.
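
A minimal sketch of what the admin-client load might have looked like; the rule
names and the use of the admin principal are assumptions, not taken from the
report:

# hypothetical reconstruction of the admin-client load (rule names assumed)
kinit admin
for i in $(seq 1 10); do
    ipa sudorule-add "stress-rule-$i"
    sleep 300   # 5 minute delay after each rule
done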

Version-Release number of selected component (if applicable):
ipa-server-2.2.0-16.el6.x86_64
389-ds-base-1.2.10.2-19.el6_3.x86_64

How reproducible:
Intermittent; have not yet reproduced.

Symptoms:

* Sudo client load began failing: Can't contact LDAP server
* UI not connecting, kinit failing
* Admin load not connecting, kinit failing

*Could not restart via ipactl:
[root@sti-high-1 slapd-TESTRELM-COM]# ipactl start
Starting Directory Service
Starting dirsrv:
    PKI-IPA...[  OK  ]
    TESTRELM-COM...[  OK  ]
Failed to read data from Directory Service: Failed to get list of services to
probe status!
Configured hostname 'sti-high-1.testrelm.com' does not match any master server
in LDAP:
No master found because of error: {'matched': 'dc=testrelm,dc=com', 'desc': 'No
such object'}
Shutting down
Shutting down dirsrv:
    PKI-IPA...[  OK  ]
    TESTRELM-COM...[  OK  ]



***/var/log/dirsrv/slapd-TESTRELM-COM/errors, last few messages as it went
down...

[27/Aug/2012:12:32:21 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Warning: unable to send
endReplication extended operation (Timed out)
[27/Aug/2012:12:32:26 -0400] NSMMReplicationPlugin -
agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with
GSSAPI auth resumed
[27/Aug/2012:12:35:45 -0400] NSMMReplicationPlugin -
multimaster_be_state_change: replica dc=testrelm,dc=com is going offline;
disabling replication
[27/Aug/2012:12:36:11 -0400] NSMMReplicationPlugin -
replica_replace_ruv_tombstone: failed to update replication update vector for
replica dc=testrelm,dc=com: LDAP error - 1
[27/Aug/2012:12:36:12 -0400] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the
database
[27/Aug/2012:12:36:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:36:52 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:12 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:37:52 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:12 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:32 -0400] - import userRoot: Processed 0 entries -- average
rate 0.0/sec, recent rate 0.0/sec, hit ratio 0%
[27/Aug/2012:12:38:46 -0400] - ERROR bulk import abandoned
[27/Aug/2012:12:38:46 -0400] - import userRoot: Aborting all Import threads...
[27/Aug/2012:12:38:52 -0400] - import userRoot: Import threads aborted.
[27/Aug/2012:12:38:52 -0400] - import userRoot: Closing files...
[27/Aug/2012:12:38:52 -0400] - import userRoot: Import failed.


*LDAP DB was wiped out on one of the IPA master servers
[root@sti-high-1 slapd-TESTRELM-COM]# ldapsearch -x -D "cn=Directory Manager"
-w Secret123 -b "dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# compat, testrelm.com
dn: cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: compat

# groups, compat, testrelm.com
dn: cn=groups,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: groups

# ng, compat, testrelm.com
dn: cn=ng,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: ng

# users, compat, testrelm.com
dn: cn=users,cn=compat,dc=testrelm,dc=com
objectClass: extensibleObject
cn: users

# sudoers, testrelm.com
dn: ou=sudoers,dc=testrelm,dc=com
objectClass: extensibleObject
ou: sudoers

# search result
search: 2
result: 32 No such object

# numResponses: 6
# numEntries: 5


*The other IPA master server info:
[root@sti-high-2 ~]# ldapsearch -x -D "cn=Directory Manager" -w Secret123 -b
"cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# masters, ipa, etc, testrelm.com
dn: cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
cn: masters
objectClass: nsContainer
objectClass: top

# sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
cn: sti-high-1.testrelm.com
objectClass: top
objectClass: nsContainer

# CA, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=CA,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=c
 om
cn: CA
ipaConfigString: enabledService
ipaConfigString: startOrder 50
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# KDC, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KDC,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
cn: KDC
ipaConfigString: enabledService
ipaConfigString: startOrder 10
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# KPASSWD, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KPASSWD,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm
 ,dc=com
cn: KPASSWD
ipaConfigString: enabledService
ipaConfigString: startOrder 20
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# MEMCACHE, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=MEMCACHE,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrel
 m,dc=com
cn: MEMCACHE
ipaConfigString: enabledService
ipaConfigString: startOrder 39
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# HTTP, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=HTTP,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc
 =com
cn: HTTP
ipaConfigString: enabledService
ipaConfigString: startOrder 40
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# DNS, sti-high-1.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=DNS,cn=sti-high-1.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
cn: DNS
ipaConfigString: enabledService
ipaConfigString: startOrder 30
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top

# sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=com
objectClass: top
objectClass: nsContainer
cn: sti-high-2.testrelm.com

# KDC, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KDC,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 10
cn: KDC

# KPASSWD, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=KPASSWD,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm
 ,dc=com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 20
cn: KPASSWD

# MEMCACHE, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=MEMCACHE,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrel
 m,dc=com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 39
cn: MEMCACHE

# HTTP, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=HTTP,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc
 =com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 40
cn: HTTP

# DNS, sti-high-2.testrelm.com, masters, ipa, etc, testrelm.com
dn: cn=DNS,cn=sti-high-2.testrelm.com,cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=
 com
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
ipaConfigString: enabledService
ipaConfigString: startOrder 30
cn: DNS

# search result
search: 2
result: 0 Success

# numResponses: 15
# numEntries: 14


Additional Dev Comments:
Rich Megginson's chat comments indicate the issue looks like a combination of:
https://fedorahosted.org/389/ticket/374
https://fedorahosted.org/freeipa/ticket/2842

Comment from nkinder (@nkinder) at 2012-09-25 02:43:09

This is a duplicate of ticket 374.


Comment from nkinder (@nkinder) at 2017-02-11 22:54:58

Metadata Update from @nkinder:

  • Issue assigned to richm
  • Issue set to the milestone: N/A
