clock thread safety: was hr etime by default #2375

389-ds-bot · 2020-09-13T02:20:42Z

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49316

Created at 2017-07-07 07:04:01 by firstyear (@Firstyear)
Closed at 2017-07-13 05:47:28 as fixed
Assigned to firstyear (@Firstyear)

Issue Description

During an investigation it became apparent that DS clock handling was not thread safe. In an attempt to "speed up" clock access, the architecture used a time thread that would wake every second (or more) and update a global value in time.c. However, this value was not atomically updated, or atomically read. To update this to use atomics would likely cause a performance regression. Worse, these values were relied upon by csngen to generate csns for replication.

With hr log timestamps we showed that ns precision is virtually free (if not faster) on linux today. With this in mind, there is no reason not to enable HR etime by default now. As well, by using clock_gettime in calls to the clock, we take advantage of VDSO's in clock_gettime, that are going to be faster than atomics. Finally, we can use monotonic clock for timers, and utc for all other calls to remove performance and timezone issues.

389-ds-bot · 2020-09-13T02:20:44Z

Comment from firstyear (@Firstyear) at 2017-07-07 07:04:07

Metadata Update from @Firstyear:

Issue assigned to Firstyear

389-ds-bot · 2020-09-13T02:20:45Z

Comment from firstyear (@Firstyear) at 2017-07-10 03:41:31

Metadata Update from @Firstyear:

Custom field type adjusted to defect
Issue priority set to: major
Issue set to the milestone: 1.3.7.0
Issue tagged with: Complex

389-ds-bot · 2020-09-13T02:20:45Z

Comment from firstyear (@Firstyear) at 2017-07-10 03:51:20

Additionally, since it's been 24 years since clock_gettime was released, I'm making it a hard requirement of DS from now.

389-ds-bot · 2020-09-13T02:20:46Z

Comment from tbordaz (@tbordaz) at 2017-07-10 11:56:34

Regarding replication, we have only one time thread that periodically (each second) calls time(&currenttime). If the same thread calls time(&t2) after time(&t1), is there a chance that t1>t2 ?
If not then all DS components (including csn generation) that rely on the timethread are protected.

Now the others components that directly call time() are impacted and there are many of them.
I thought that time(2) was thread safe, so it is not ?

389-ds-bot · 2020-09-13T02:20:47Z

Comment from firstyear (@Firstyear) at 2017-07-10 12:06:26

time(2) is safe. the time thread is not. The write to the g_current_time was not atomic, and the read from current_time was also not atomic. As a result, it's possible for the update from the time thread to never be made visible to other threads (or more practically, it's delayed until the next barrier, and you have to hope it's a store one else our time thread write is blown away), and on the read side, we may never see the update to the time (or more practically, until by some accident we get a barrier elsewhere, assuming the write thread actually barriered as well). So as a result, this is not a safe interface.

Linux clock_gettime is implemented as a VDSO, so there is almost 0 latency to just calling clock_gettime, so instead my fix will wrap this.

389-ds-bot · 2020-09-13T02:20:48Z

Comment from firstyear (@Firstyear) at 2017-07-10 12:07:56

Ohh and even better @tbordaz - most of our timeouts were only accurate to a second, so if you happen to start an op with a time out of 1 second, at the time point "0.9" second, and then the op runs to "1.000001" second, you'll be timed out - even though you only were in the op for 0.1 second, the "timeout" sees the second change and says "nope"

389-ds-bot · 2020-09-13T02:20:49Z

Comment from firstyear (@Firstyear) at 2017-07-12 06:04:30

389-ds-bot · 2020-09-13T02:20:50Z

Comment from firstyear (@Firstyear) at 2017-07-12 06:04:31

Metadata Update from @Firstyear:

Custom field reviewstatus adjusted to review

389-ds-bot · 2020-09-13T02:20:50Z

Comment from firstyear (@Firstyear) at 2017-07-12 06:05:22

This patch has been extensively tested with the full test suite twice now.

389-ds-bot · 2020-09-13T02:20:51Z

Comment from mreynolds (@mreynolds389) at 2017-07-13 04:29:33

A few minor issues....

indentation issue:

 498 diff --git a/ldap/servers/plugins/chainingdb/cb_search.c b/ldap/servers/plugins/chainingdb/cb_search.c
 499 index dad2785..e2aa163 100644
 500 --- a/ldap/servers/plugins/chainingdb/cb_search.c
 501 +++ b/ldap/servers/plugins/chainingdb/cb_search.c
 502 @@ -40,9 +40,8 @@ chainingdb_build_candidate_list ( Slapi_PBlock *pb )
 503  »       const char          *target = NULL;
 504  »       char                *filter;
 505  »       char                **attrs = NULL;
 506 -»       struct timeval      timeout;
 507 -»       time_t              optime;
 508 -»       time_t              endbefore = 0;
 509 +»       struct timespec     expire_time;
 510 +»       struct timeval     timeout;
 511  »       time_t              endtime = 0;
 512  »       char                *matched_msg, *error_msg;
 513  »       char                **referrals = NULL;

Logging issue:

1631 @@ -4422,17 +4460,45 @@ static int checkpoint_threadmain(void *param)
1632                  if (!db || rc ) {
1633                      continue;
1634                  }
1635 -                slapi_log_err(SLAPI_LOG_BACKLDBM, "checkpoint_threadmain", "Compacting DB start: %s\n",
1636 +                slapi_log_err(SLAPI_LOG_NOTICE, "checkpoint_threadmain", "Compacting DB start: %s\n",
1637                                inst->inst_name);
1638 +
1639 +                /*
1640 +                 * It's possible for this to heap us after free because when we access db
1641 +                 * *just* as the server shut's down, we don't know it. So we should probably
1642 +                 * do something like wrapping access to the db var in a rwlock, and have "read"
1643 +                 * to access, and take writes to change the state. This would prevent the issue.
1644 +                 */
1645 +                DBTYPE type;
1646 +                rc = db->get_type(db, &type);
1647 +                if (rc) {
1648 +                    slapi_log_err(SLAPI_LOG_ERR,
1649 +                              "compactdb: failed to determine db type for %s: db error - %d %s\n",
1650 +                              inst->inst_name, rc, db_strerror(rc));

There is no subsystem specified for the logging call, it should be:

slapi_log_err(SLAPI_LOG_ERR, "checkpoint_threadmain",
                            "compactdb: failed to determine db type for %s: db error - %d %s\n",
                             inst->inst_name, rc, db_strerror(rc));

Indentation issue:

2677 +
2678 +  /*
2679 +   * If the duration set is larger than time_t max - current time, we probably have
2680 +   * made it to the heat death of the universe. Congratulations on finding this bug.
2681 +   */
2682 +  if ( errno == ERANGE || duration <= 0 || duration > (MAX_ALLOWED_TIME_IN_SECS_64 - slapi_current_utc_time()) ) {
2683 +    slapi_create_errormsg(errorbuf, SLAPI_DSE_RETURNTEXT_SIZE, "password lockout duration \"%s\" is invalid. ", value);
2684 +    retVal = LDAP_OPERATIONS_ERROR;
2685 +    return retVal;
2686    }

The rest looks good, and no compiler warnings!! I'll ack this, but please fix the above issues.

389-ds-bot · 2020-09-13T02:20:52Z

Comment from mreynolds (@mreynolds389) at 2017-07-13 04:29:34

Metadata Update from @mreynolds389:

Custom field reviewstatus adjusted to ack (was: review)

389-ds-bot · 2020-09-13T02:20:53Z

Comment from firstyear (@Firstyear) at 2017-07-13 05:01:57

I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these any commit :)

Thanks so much,

389-ds-bot · 2020-09-13T02:20:54Z

Comment from mreynolds (@mreynolds389) at 2017-07-13 05:25:38

I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these any commit :)

I swear I don't set out just looking for these indentation issues, but they just really stand out to me and it makes me cringe :-) I'm just glad I caught the missing logging subsystem though. Since it's a macro there is no compiler warning/error, but I'm not trying to reignite that conversation ;-) Anyway, vacation time for me, talk to you next week!

389-ds-bot · 2020-09-13T02:20:55Z

Comment from firstyear (@Firstyear) at 2017-07-13 05:47:36

Fixed all those issues, thanks again!

commit c70baf1d509f6dc144d839d71e874e3d59d72150
To ssh://git@pagure.io/389-ds-base.git
4940c5f..680a1a5 master -> master

389-ds-bot · 2020-09-13T02:20:56Z

Comment from firstyear (@Firstyear) at 2017-07-13 05:47:36

Metadata Update from @Firstyear:

Issue close_status updated to: fixed
Issue status updated to: Closed (was: Open)

389-ds-bot · 2020-09-13T02:20:56Z

Comment from tbordaz (@tbordaz) at 2017-07-13 13:53:36

@Firstyear just for curiosity, time() and clock_gettime() are thread safe. What is the reason the time thread was not thread safe ? is it because it updated/returned a global variable ?. If this is the reason would a fix like this one being thread safe ?

diff --git a/ldap/servers/slapd/time.c b/ldap/servers/slapd/time.c
index 16467af..8315dda 100644
--- a/ldap/servers/slapd/time.c
+++ b/ldap/servers/slapd/time.c
@@ -120,22 +120,13 @@ free_timestring(char *timestr)
 time_t
 poll_current_time()
 {
-    if ( !currenttime_set ) {
-       currenttime_set = 1;
-    }
-
-    time( &currenttime );
-    return( currenttime );
+    return 0;
 }
 
 time_t
 current_time( void )
 {
 -    if ( currenttime_set ) {
 -        return( currenttime );
 -    } else {
 -        return( time( (time_t *)0 ));
 -    }
+    return( time( (time_t *)0 ));
 }

389-ds-bot · 2020-09-13T02:20:57Z

Comment from firstyear (@Firstyear) at 2017-07-14 02:31:59

If you did not have current time set, this would be thread safe, but there is a caveat, and that is you and I the human's.

What is current_time? Thoughout the code I saw current time as:

utc
localtime
relative system time

People who used the api either had it change without knowing underthem, or they didn't know what they were "getting" and assumed. As well, we had a slapi_current_time wrapper that just wrapped the function in a (pointless) extra fn call.

So my solution was to explicitly name the replacement, and make it part of the slapi api so we just get exactly what we want. I'm hoping to avoid human error in the future.

389-ds-bot · 2020-09-13T02:20:58Z

Comment from lkrispen (@elkris) at 2017-07-17 13:21:08

When testing on top of current master I noticed that the event queue thread is consuming 100 % cpu.

Could you revisit your changes in the eventq code. There are loops calling functions ... and taking time, but you use curtime looked up before starting the loop, wheras the previous code was using current_time() inside the loop

389-ds-bot · 2020-09-13T02:20:59Z

Comment from lkrispen (@elkris) at 2017-07-17 13:44:30

I think there was an incorrect macro replacement applying the NOT only to the first and component, the following patch seems to fix it

389-ds-bot · 2020-09-13T02:21:00Z

Comment from firstyear (@Firstyear) at 2017-07-18 02:07:18

I think you're right. I'll apply this now (with my ack)

commit 1b95045
To ssh://git@pagure.io/389-ds-base.git
b95f5f2..1b95045 master -> master

389-ds-bot · 2020-09-13T02:21:01Z

Comment from lkrispen (@elkris) at 2017-07-24 22:06:13

Metadata Update from @elkris:

Custom field component adjusted to None
Custom field origin adjusted to None
Custom field version adjusted to None

389-ds-bot · 2020-09-13T02:21:02Z

Comment from lkrispen (@elkris) at 2017-07-24 22:07:43

this change broke checkpointing, fix attached

389-ds-bot · 2020-09-13T02:21:02Z

Comment from lkrispen (@elkris) at 2017-07-24 22:08:14

389-ds-bot · 2020-09-13T02:21:03Z

Comment from firstyear (@Firstyear) at 2017-07-25 00:52:14

ack

Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Security log buffering is disabled - Make mapping tree check more robust for case relates: 389ds#2375 Reviewed by: spichugi(Thanks!)

Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Security log buffering is disabled - Make mapping tree check more robust for case relates: #2375 Reviewed by: spichugi(Thanks!)

Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Make mapping tree check more robust for case relates: #2375 Reviewed by: spichugi(Thanks!)

mreynolds389 · 2023-06-23T20:18:29Z

0272d99..6981d08 389-ds-base-2.3 -> 389-ds-base-2.3
98644dc..835f12b 389-ds-base-2.2 -> 389-ds-base-2.2
6897dc5..f5112d9 389-ds-base-2.1 -> 389-ds-base-2.1
c06e225..ea3d4e8 389-ds-base-1.4.3 -> 389-ds-base-1.4.3

Description: Updated the healthcheck tests with nsslapd-accesslog-logbuffering settings so it does not report a warning. Also adjusted certificate tests so it covers days in leap year. Relates: 389ds#2375 Reviewed by: ???

@vashirov

Description: Updated the healthcheck tests with nsslapd-accesslog-logbuffering settings so it does not report a warning. Also adjusted certificate tests so it covers days in leap year. Relates: 389ds#2375 Reviewed by: @vashirov (Thanks!)

@vashirov

Description: Updated the healthcheck tests with nsslapd-accesslog-logbuffering settings so it does not report a warning. Also adjusted certificate tests so it covers days in leap year. Relates: #2375 Reviewed by: @vashirov (Thanks!)

Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Make mapping tree check more robust for case relates: 389ds/389-ds-base#2375 Reviewed by: spichugi(Thanks!)

389-ds-bot added closed: fixed Migration flag - Issue complex Fix is complex labels Sep 13, 2020

389-ds-bot added this to the 1.3.7.0 milestone Sep 13, 2020

389-ds-bot closed this as completed Sep 13, 2020

mreynolds389 mentioned this issue Jun 22, 2023

Issue 2375 - CLI - Healthcheck - revise and add new checks #5809

Merged

obaterspok mentioned this issue Jun 28, 2023

Increase sleep time to 250ms to not hog the cpu too much #5813

Closed

bsimonova mentioned this issue Jul 31, 2023

Issue 2375 - CLI - Healthcheck - revise and add new checks #5869

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clock thread safety: was hr etime by default #2375

clock thread safety: was hr etime by default #2375

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

mreynolds389 commented Jun 23, 2023

clock thread safety: was hr etime by default #2375

clock thread safety: was hr etime by default #2375

Comments

389-ds-bot commented Sep 13, 2020

Issue Description

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

mreynolds389 commented Jun 23, 2023