cleanallruv hangs shutdown if not all replicas online #1548

Closed
389-ds-bot opened this issue Sep 12, 2020 · 13 comments

@389-ds-bot

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/48217


There are race conditions in some of the cleanallruv code where a thread can go to sleep without checking whether the server is shutting down, for example when checking whether replicas are online:

Thread 2 (Thread 0x7f2a4efe5700 (LWP 29721)):
0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:218
1  0x00007f2a86c73217 in pt_TimedWait (cv=cv@entry=0x7f2a8ab43e78, ml=0x7f2a8ab72e20, timeout=timeout@entry=320000) at ../../../nspr/pr/src/pthreads/ptsynch.c:260
2  0x00007f2a86c736de in PR_WaitCondVar (cvar=0x7f2a8ab43e70, timeout=320000) at ../../../nspr/pr/src/pthreads/ptsynch.c:387
3  0x00007f2a7e8282dc in check_agmts_are_alive (replica=0x7f2a8ab8ee40, rid=300, task=0x7f29fc0111f0) at ldap/servers/plugins/replication/repl5_replica_config.c:2275
4  0x00007f2a7e827015 in replica_cleanallruv_thread (arg=0x7f29fc010bc0) at ldap/servers/plugins/replication/repl5_replica_config.c:1816
5  0x00007f2a86c78b46 in _pt_root (arg=0x7f29fc014950) at ../../../nspr/pr/src/pthreads/ptthread.c:204
6  0x00007f2a8661bd14 in start_thread (arg=0x7f2a4efe5700) at pthread_create.c:309
7  0x00007f2a8613968d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f2a88d35800 (LWP 29195)):
0  0x00007f2a861329f3 in select () at ../sysdeps/unix/syscall-template.S:82
1  0x00007f2a888df81f in DS_Sleep (ticks=100) at ldap/servers/slapd/util.c:1035
2  0x00007f2a7e8277d0 in replica_cleanall_ruv_destructor (task=0x7f29fc0111f0) at ldap/servers/plugins/replication/repl5_replica_config.c:1993
3  0x00007f2a888d4396 in destroy_task (when=0, arg=0x7f29fc0111f0) at ldap/servers/slapd/task.c:621
4  0x00007f2a888d96c5 in task_shutdown () at ldap/servers/slapd/task.c:2539
5  0x00007f2a88d86537 in slapd_daemon (ports=0x7fffac8b2e90) at ldap/servers/slapd/daemon.c:1387
6  0x00007f2a88d8f05d in main (argc=7, argv=0x7fffac8b2fc8) at ldap/servers/slapd/main.c:1115
@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 12, 2020
@389-ds-bot 389-ds-bot added this to the 1.3.4.3 milestone Sep 12, 2020
@389-ds-bot

Comment from lkrispen (@elkris) at 2015-07-09 19:49:08

I don't think the missing shutdown check before sleeping is a big deal: it is checked at every iteration in the while() conditions, so there is only a small window you miss, and after the sleep the loop will be terminated. Still, an extra check before sleeping doesn't hurt.

The problem was the missing stop_ruv_cleaning() calls, that's ok now.

But can the stop_ruv_cleaning() call in multimaster_stop be removed? We could stop the plugin without a shutdown (in theory).
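
The window described in this comment can be illustrated with a minimal sketch. It is not the 389-ds-base code: `shutting_down`, `replica_is_online()`, `sleep_ms()` and `wait_for_replica()` are hypothetical names, and a plain atomic flag stands in for the server's real shutdown check. The loop condition tests the flag on every iteration, and the extra test before the sleep only narrows the window in which a shutdown request is missed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the server's shutdown indicator. */
static atomic_bool shutting_down = false;

/* Hypothetical probe: counts its calls and never reports the replica online. */
static int probe_calls = 0;
static bool replica_is_online(void)
{
    probe_calls++;
    return false;
}

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    assert(ms >= 0);
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Poll until the replica is online or the server shuts down.
 * Returns true if the replica came online, false if interrupted.
 */
static bool wait_for_replica(long interval_ms)
{
    while (!atomic_load(&shutting_down)) {
        if (replica_is_online()) {
            return true;
        }
        /* Extra check: the flag may have been set while probing. */
        if (atomic_load(&shutting_down)) {
            break;
        }
        sleep_ms(interval_ms);
    }
    return false;
}
```

Even with the extra check, a request that arrives during the sleep itself is only noticed after the sleep ends, which is why the thread goes on to discuss waking sleepers explicitly.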

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-07-09 19:57:32

Replying to [comment:2 elkris]:

I don't think the missing shutdown check before sleeping is a big deal: it is checked at every iteration in the while() conditions, so there is only a small window you miss, and after the sleep the loop will be terminated. Still, an extra check before sleeping doesn't hurt.

I was just being overly cautious :-)

The problem was the missing stop_ruv_cleaning() calls, that's ok now.

Correct.

But can the stop_ruv_cleaning() call in multimaster_stop be removed? We could stop the plugin without a shutdown (in theory).

I'm not sure we need the plugin running for cleanallruv to finish, but I can add it back (it doesn't hurt). New patch in the works...

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-07-09 20:01:04

New patch attached...

@389-ds-bot

Comment from rmeggins (@richm) at 2015-07-09 20:18:18

The problem is if slapd is shutting down while it is waiting on a condvar. There needs to be a way that something can detect shutdown and do a notifycondvar to wake up those waits immediately upon shutdown.
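
A minimal sketch of the wake-on-shutdown pattern described here, written with POSIX threads rather than the NSPR calls the server actually uses. All names (`stop_requested`, `interruptible_wait()`, `request_stop()`, `waiter()`) are hypothetical and chosen for illustration only. The waiter sleeps on a condition variable with a long timeout; the stop routine sets a flag under the same lock and broadcasts, so the waiter returns at once instead of sleeping out the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state; names are illustrative, not from 389-ds-base. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested = false;

/*
 * Wait up to timeout_sec seconds, returning early if a stop is requested.
 * Returns true if the wait ended because of the stop request.
 */
static bool interruptible_wait(int timeout_sec)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&lock);
    int rc = 0;
    /* Re-check the predicate on every wakeup; wakeups may be spurious. */
    while (!stop_requested && rc != ETIMEDOUT) {
        rc = pthread_cond_timedwait(&cond, &lock, &deadline);
        assert(rc == 0 || rc == ETIMEDOUT);
    }
    bool stopped = stop_requested;
    pthread_mutex_unlock(&lock);
    return stopped;
}

/* Called on shutdown: set the flag and wake every waiter immediately. */
static void request_stop(void)
{
    pthread_mutex_lock(&lock);
    stop_requested = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Thread entry point: waits with a long timeout and records the outcome. */
static void *waiter(void *arg)
{
    *(bool *)arg = interruptible_wait(300);
    return NULL;
}
```

Because the flag is set while holding the lock, a waiter either sees the flag before it sleeps or is already asleep when the broadcast arrives, so the notification cannot be lost.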

@389-ds-bot

Comment from lkrispen (@elkris) at 2015-07-09 20:25:07

Replying to [comment:6 richm]:

The problem is if slapd is shutting down while it is waiting on a condvar. There needs to be a way that something can detect shutdown and do a notifycondvar to wake up those waits immediately upon shutdown.

Yes, but Mark's fix does that now. Shutdown was hanging in replica_cleanall_ruv_destructor(), which now calls stop_ruv_cleaning().

@389-ds-bot

Comment from rmeggins (@richm) at 2015-07-09 20:36:52

Replying to [comment:7 elkris]:

Replying to [comment:6 richm]:

The problem is if slapd is shutting down while it is waiting on a condvar. There needs to be a way that something can detect shutdown and do a notifycondvar to wake up those waits immediately upon shutdown.

Yes, but Mark's fix does that now. Shutdown was hanging in replica_cleanall_ruv_destructor(), which now calls stop_ruv_cleaning().

ok

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-07-09 22:00:21

fdf4681..d6269f2 master -> master
commit d6269f2
Author: Mark Reynolds mreynolds389@redhat.com
Date: Thu Jul 9 09:59:46 2015 -0400

41dff5b..0bb881a 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 0bb881a

@389-ds-bot

Comment from nhosoi (@nhosoi) at 2015-07-10 05:46:17

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1241723

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-09-18 21:32:23

This fix introduced a regression. When the server was stopped during a clean task, the task would conclude that every replica had been cleaned (when in fact they had not). The task needs to properly detect the shutdown when it finishes.
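
The kind of regression described here can be sketched as follows. This is an illustration under assumptions, not the actual patch: `task_result`, `shutdown_after`, `server_shutting_down()` and `clean_all_replicas()` are hypothetical names. The point is that a loop which exits either on completion or on shutdown must check which of the two happened before reporting success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes; not taken from 389-ds-base. */
typedef enum { TASK_CLEANED, TASK_INTERRUPTED } task_result;

/* Hypothetical test hook: shutdown begins once this many replicas are cleaned. */
static size_t shutdown_after = (size_t)-1;
static size_t cleaned_count = 0;

/* Hypothetical stand-in for the server's shutdown check. */
static bool server_shutting_down(void)
{
    return cleaned_count >= shutdown_after;
}

/* Clean replicas one by one, stopping early if the server shuts down. */
static task_result clean_all_replicas(size_t total)
{
    cleaned_count = 0;
    while (cleaned_count < total && !server_shutting_down()) {
        cleaned_count++; /* stand-in for cleaning one replica */
    }
    assert(cleaned_count <= total);

    /* The loop exits for two different reasons; report success only
     * when every replica was actually processed. */
    if (cleaned_count < total) {
        return TASK_INTERRUPTED;
    }
    return TASK_CLEANED;
}
```

With `shutdown_after` set below the replica count, the function reports `TASK_INTERRUPTED` rather than claiming that everything was cleaned.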

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-09-18 22:00:12

Fix regression with server shutdown
0001-Ticket-48217-cleanallruv-fix-regression-with-server-.patch

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2015-09-18 23:27:49

a8130ab..c41d36d master -> master
commit c41d36d
Author: Mark Reynolds mreynolds389@redhat.com
Date: Fri Sep 18 11:56:29 2015 -0400

8cd4f45..d9f03f5 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit d9f03f5

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2017-02-11 23:08:48

Metadata Update from @mreynolds389:

  • Issue assigned to mreynolds389
  • Issue set to the milestone: 1.3.4.3
