Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

389-ds-base-1.3.2.21-1.fc20 crashed over the weekend #1202

Closed
389-ds-bot opened this issue Sep 12, 2020 · 12 comments
Closed

389-ds-base-1.3.2.21-1.fc20 crashed over the weekend #1202

389-ds-bot opened this issue Sep 12, 2020 · 12 comments
Labels
closed: fixed Migration flag - Issue
Milestone

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47871


On Aug 8th I run installed FreeIPA server with 389-ds-base-1.3.2.21-1.fc20 which finished OK. The instance was left running when it suddenly crashed over the weekend:

# systemctl status dirsrv@MKOSEK-FEDORA20-TEST.service 
dirsrv@MKOSEK-FEDORA20-TEST.service - 389 Directory Server MKOSEK-FEDORA20-TEST.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled)
   Active: failed (Result: signal) since Sun 2014-08-10 11:43:40 CEST; 1 day 21h ago
 Main PID: 7656 (code=killed, signal=SEGV)

Aug 10 11:13:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2
Aug 10 11:13:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3
Aug 10 11:28:11 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 1
Aug 10 11:28:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2
Aug 10 11:28:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3
Aug 10 11:43:12 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 1
Aug 10 11:43:13 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 2
Aug 10 11:43:13 ipa.mkosek-fedora20.test ns-slapd[7656]: GSSAPI server step 3
Aug 10 11:43:40 ipa.mkosek-fedora20.test systemd[1]: dirsrv@MKOSEK-FEDORA20-TEST.service: main pro...EGV
Aug 10 11:43:40 ipa.mkosek-fedora20.test systemd[1]: Unit dirsrv@MKOSEK-FEDORA20-TEST.service ente...te.
Hint: Some lines were ellipsized, use -l to show in full.

Stacktrace attached.

@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 12, 2020
@389-ds-bot 389-ds-bot added this to the 1.3.2.23 milestone Sep 12, 2020
@389-ds-bot
Copy link
Author

Comment from mkosek at 2014-08-12 13:34:09

attachment
stacktrace.1407828638.txt.gz

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2014-08-12 14:23:09

it seems the crash is happening because getting the entry for a changenumber returns a corrputed entry, changelog trimming doesn't check for errors in that place.
Is the error log still available, does the server start and could you perform a search on cn=changelog ?

@389-ds-bot
Copy link
Author

Comment from mkosek at 2014-08-12 14:46:15

No, in the error log, I only see this last entry in /var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors.20140804-102138:

[08/Aug/2014:11:53:02 +0200] ipapwd_pre_bind - [file prepost.c, line 1437]: kerberos principal in uid=tuser1,cn=users,cn=accounts,dc=mkosek-fedora20,dc=test is expired
[08/Aug/2014:11:53:13 +0200] - Entry "uid=tuser1,cn=users,cn=accounts,dc=mkosek-fedora20,dc=test" -- attribute "random" not allowed

and then just starting the new DS in /var/log/dirsrv/slapd-MKOSEK-FEDORA20-TEST/errors, though with a database error:

        389-Directory/1.3.2.21 B2014.219.1850
        ipa.mkosek-fedora20.test:636 (/etc/dirsrv/slapd-MKOSEK-FEDORA20-TEST)

[12/Aug/2014:10:41:36 +0200] - 389-Directory/1.3.2.21 B2014.219.1850 starting up
[12/Aug/2014:10:41:36 +0200] - WARNING: changelog: entry cache size 2097152B is less than db size 9732096B; We recommend to increase the entry cache size nsslapd-cachememsize.
[12/Aug/2014:10:41:36 +0200] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=mkosek-fedora20,dc=test
[12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=mkosek-fedora20,dc=test
[12/Aug/2014:10:41:38 +0200] schema-compat-plugin - warning: no entries set up under ou=sudoers,dc=mkosek-fedora20,dc=test
[12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target ou=sudoers,dc=mkosek-fedora20,dc=test does not exist
[12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=ad,cn=etc,dc=mkosek-fedora20,dc=test does not exist
[12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=mkosek-fedora20,dc=test does not exist
[12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=mkosek-fedora20,dc=test does not exist
[12/Aug/2014:10:41:38 +0200] NSACLPlugin - The ACL target cn=automember rebuild membership,cn=tasks,cn=config does not exist
[12/Aug/2014:10:41:38 +0200] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=mkosek-fedora20,dc=test--no CoS Templates found, which should be added before the CoS Definition.
[12/Aug/2014:10:41:38 +0200] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=mkosek-fedora20,dc=test--no CoS Templates found, which should be added before the CoS Definition.
[12/Aug/2014:10:41:38 +0200] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[12/Aug/2014:10:41:38 +0200] - Listening on All Interfaces port 636 for LDAPS requests
[12/Aug/2014:10:41:38 +0200] - Listening on /var/run/slapd-MKOSEK-FEDORA20-TEST.socket for LDAPI requests
[12/Aug/2014:10:42:09 +0200] - database error -30993
[12/Aug/2014:10:42:09 +0200] - database error -30993

DS did not crash during ldapsearch:

# ldapsearch -D "cn=Directory Manager" -x -w "kokos123" -b cn=changelog
...
# 7837, changelog
dn: changenumber=7837,cn=changelog
objectClass: top
objectClass: changelogentry
objectClass: extensibleObject
targetuniqueid: 72c4d509-1edf11e4-9b00fe62-30505311
changeNumber: 7837
targetDn: fqdn=ipa.mkosek-fedora20.test,cn=computers,cn=accounts,dc=mkosek-fed
 ora20,dc=test
changeTime: 20140812084259Z
changeType: modify
changes:: cmVwbGFjZToga3JiTGFzdFN1Y2Nlc3NmdWxBdXRoCmtyYkxhc3RTdWNjZXNzZnVsQXV0
 aDogMjAxNDA4MTIwODQzMDBaCi0KcmVwbGFjZTogbW9kaWZpZXJzbmFtZQptb2RpZmllcnNuYW1lO
 iBjbj1JUEEgTG9ja291dCxjbj1wbHVnaW5zLGNuPWNvbmZpZwotCnJlcGxhY2U6IG1vZGlmeXRpbW
 VzdGFtcAptb2RpZnl0aW1lc3RhbXA6IDIwMTQwODEyMDg0MjU5WgotCnJlcGxhY2U6IGVudHJ5dXN
 uCmVudHJ5dXNuOiAyMjI4MAotCgA=

# search result
search: 2
result: 0 Success

# numResponses: 31
# numEntries: 30

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2014-08-13 00:03:50

I wonder if there is not a bug in handle_getchangerecord_search. This callback function stores the entry return by the search into a temporary structure (crt->crt_entry). When returning from the internal search, get_changerecord returns that entry.

According to https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.1/html/Plug-in_Guide/Plugin_Programming_Guide-Function_Reference-slapi_search_internal_callback_pb.html, if we want to access the entry outside of the callback we need to do slapi_entry_dup.

@389-ds-bot
Copy link
Author

@389-ds-bot
Copy link
Author

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2014-08-26 16:30:33

I set up 3 freeipa instances on F20, two with the fix, one without.

Ran freeipa unit tests, then let them idle (this was the conditions when the crashed occurred). To increase the probability of crash, it trims the retro changelog each 30s and it evaluates ~ 5000 entries at each trimming.

dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
nsslapd-changelogmaxage: 8d
nsslapd-changelog-trim-interval: 30s

The tests started Friday 22nd. No crash after 3 days.

@389-ds-bot
Copy link
Author

Comment from nhosoi (@nhosoi) at 2014-08-27 23:14:14

Looks good to me.

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2014-08-27 23:33:27

Push to '''master'''

git merge ticket47871
Updating 1c888bf..495151b
Fast-forward
dirsrvtests/tickets/ticket47871_test.py | 306 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ldap/servers/plugins/retrocl/retrocl.h | 2 +-
ldap/servers/plugins/retrocl/retrocl_trim.c | 163 +++++++++++++++++++++++--------------------
3 files changed, 393 insertions(+), 78 deletions(-)
create mode 100644 dirsrvtests/tickets/ticket47871_test.py

git push origin master
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (11/11), 5.29 KiB, done.
Total 11 (delta 7), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
1c888bf..495151b master -> master

commit 495151b
Author: Thierry bordaz (tbordaz) tbordaz@redhat.com
Date: Thu Aug 21 11:49:18 2014 +0200

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2014-08-27 23:41:59

Push to '''1.3.2'''

git cherry-pick 495151b
[389-ds-base-1.3.2 47fbe25] Ticket 47871 - 389-ds-base-1.3.2.21-1.fc20 crashed over the weekend
3 files changed, 393 insertions(+), 78 deletions(-)
create mode 100644 dirsrvtests/tickets/ticket47871_test.py

git push origin 389-ds-base-1.3.2
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 5.17 KiB, done.
Total 11 (delta 7), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
f237812..47fbe25 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

commit 47fbe25
Author: Thierry bordaz (tbordaz) tbordaz@redhat.com
Date: Thu Aug 21 11:49:18 2014 +0200

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2016-09-09 13:25:42

adding the crashing thread backstack to help to retrieve that bug

Thread 1 (Thread 0x7f83a2cb2700 (LWP 30103)):
0  attrlist_find (a=0x105, type=0x7f8395818716 "changetime") at ldap/servers/slapd/attrlist.c:144
1  0x00007f83a286bec2 in slapi_entry_attr_find (e=0x105, type=0x7f8395818716 "changetime", a=0x7f83a2cb1d58) at ldap/servers/slapd/entry.c:2362
2  0x00007f83958176b0 in trim_changelog () at ldap/servers/plugins/retrocl/retrocl_trim.c:287
3  changelog_trim_thread_fn (arg=<optimized out>) at ldap/servers/plugins/retrocl/retrocl_trim.c:348
4  0x00007f83a0c9be5b in _pt_root (arg=0x7f837c164fb0) at ../../../nspr/pr/src/pthreads/ptthread.c:212
5  0x00007f83a063bf33 in start_thread (arg=0x7f83a2cb2700) at pthread_create.c:309
6  0x00007f83a0369ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111    

@389-ds-bot
Copy link
Author

Comment from nhosoi (@nhosoi) at 2017-02-11 23:02:21

Metadata Update from @nhosoi:

  • Issue assigned to tbordaz
  • Issue set to the milestone: 1.3.2.23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed: fixed Migration flag - Issue
Projects
None yet
Development

No branches or pull requests

1 participant