clock thread safety: was hr etime by default #2375
Comments
Comment from firstyear (@Firstyear) at 2017-07-07 07:04:07 Metadata Update from @Firstyear:
|
Comment from firstyear (@Firstyear) at 2017-07-10 03:41:31 Metadata Update from @Firstyear:
|
Comment from firstyear (@Firstyear) at 2017-07-10 03:51:20 Additionally, since it's been 24 years since clock_gettime was released, I'm making it a hard requirement of DS from now on. |
Comment from tbordaz (@tbordaz) at 2017-07-10 11:56:34 Regarding replication, we have only one time thread that periodically (each second) calls time(&currenttime). If the same thread calls time(&t2) after time(&t1), is there a chance that t1 > t2? Also, the other components that call time() directly are impacted, and there are many of them. |
Comment from firstyear (@Firstyear) at 2017-07-10 12:06:26 time(2) is safe; the time thread is not. The write to g_current_time was not atomic, and the read from current_time was not atomic either. As a result, the update from the time thread may never become visible to other threads (or, more practically, it is delayed until the next barrier, and you have to hope it is a store barrier, or else the time thread's write is blown away). On the read side, we may never see the updated time (or, more practically, not until we happen to hit a barrier elsewhere, assuming the writing thread also issued a barrier). So this is not a safe interface. On Linux, clock_gettime is implemented through the vDSO, so calling it directly has almost zero latency; my fix will wrap it instead. |
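Editor's sketch of the "wrap clock_gettime" idea described above. It is not the code from the patch: the function names `example_current_utc_time` and `example_current_time_hr` are hypothetical, and the only assumption is a POSIX system that provides `clock_gettime`. The point is that each caller reads the clock on demand, so no shared global is written by one thread and read by others.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical wrapper: wall-clock seconds since the epoch, read on demand.
 * No global is involved, so there is nothing to publish between threads.
 * Returns (time_t)-1 if the clock cannot be read.
 */
time_t example_current_utc_time(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
        return (time_t)-1;
    }
    return ts.tv_sec;
}

/*
 * Hypothetical wrapper: the same clock at nanosecond resolution.
 * On failure the result is zeroed (an assumption made for this sketch).
 */
struct timespec example_current_time_hr(void)
{
    struct timespec ts = {0, 0};
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
        ts.tv_sec = 0;
        ts.tv_nsec = 0;
    }
    return ts;
}
```

Naming each wrapper after the clock it reads keeps it clear at the call site which notion of "current time" is being used.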
Comment from firstyear (@Firstyear) at 2017-07-10 12:07:56 Ohh, and even better @tbordaz - most of our timeouts were only accurate to a second. So if you happen to start an op with a timeout of 1 second at the time point "0.9" second, and the op runs until "1.000001" second, you'll be timed out. Even though you were only in the op for about 0.1 second, the "timeout" sees the second change and says "nope". |
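Editor's sketch of the arithmetic in the example above, with hypothetical function names that are not taken from the DS source. It compares a timeout check on whole seconds with one on nanosecond timestamps for an operation that starts at 0.9 s and is checked at 1.000001 s with a 1 second limit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Whole-second check: only sees that the seconds counter changed. */
bool example_timed_out_seconds(time_t start, time_t now, time_t limit_sec)
{
    return (now - start) >= limit_sec;
}

/* Nanosecond check: compares the real elapsed time against the limit. */
bool example_timed_out_hr(const struct timespec *start,
                          const struct timespec *now,
                          time_t limit_sec)
{
    int64_t elapsed_ns =
        (int64_t)(now->tv_sec - start->tv_sec) * 1000000000LL +
        (int64_t)(now->tv_nsec - start->tv_nsec);
    return elapsed_ns >= (int64_t)limit_sec * 1000000000LL;
}
```

With a start of {0 s, 900000000 ns} and a check at {1 s, 1000 ns}, the whole-second version reports a timeout because 1 - 0 >= 1, while the nanosecond version computes 100001000 ns elapsed and does not.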
Comment from firstyear (@Firstyear) at 2017-07-12 06:04:30 |
Comment from firstyear (@Firstyear) at 2017-07-12 06:04:31 Metadata Update from @Firstyear:
|
Comment from firstyear (@Firstyear) at 2017-07-12 06:05:22 This patch has been extensively tested with the full test suite twice now. |
Comment from mreynolds (@mreynolds389) at 2017-07-13 04:29:33 A few minor issues.... indentation issue:
Logging issue:
There is no subsystem specified for the logging call, it should be:
Indentation issue:
The rest looks good, and no compiler warnings!! I'll ack this, but please fix the above issues. |
Comment from mreynolds (@mreynolds389) at 2017-07-13 04:29:34 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-07-13 05:01:57 I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these and commit :) Thanks so much, |
Comment from mreynolds (@mreynolds389) at 2017-07-13 05:25:38
I swear I don't set out just looking for these indentation issues, but they just really stand out to me and it makes me cringe :-) I'm just glad I caught the missing logging subsystem though. Since it's a macro there is no compiler warning/error, but I'm not trying to reignite that conversation ;-) Anyway, vacation time for me, talk to you next week! |
Comment from firstyear (@Firstyear) at 2017-07-13 05:47:36 Fixed all those issues, thanks again! commit c70baf1d509f6dc144d839d71e874e3d59d72150 |
Comment from firstyear (@Firstyear) at 2017-07-13 05:47:36 Metadata Update from @Firstyear:
|
Comment from tbordaz (@tbordaz) at 2017-07-13 13:53:36 @Firstyear just out of curiosity: time() and clock_gettime() are thread safe. Why was the time thread not thread safe? Is it because it updated/returned a global variable? If that is the reason, would a fix like this one be thread safe?
|
Comment from firstyear (@Firstyear) at 2017-07-14 02:31:59 If you did not have current time set, this would be thread safe, but there is a caveat, and that is you and I, the humans. What is current_time? Throughout the code I saw current time as:
People who used the API either had it change underneath them without knowing, or they didn't know what they were "getting" and assumed. As well, we had a slapi_current_time wrapper that just wrapped the function in a (pointless) extra function call. So my solution was to explicitly name the replacement and make it part of the slapi API, so we get exactly what we want. I'm hoping to avoid human error in the future. |
Comment from lkrispen (@elkris) at 2017-07-17 13:21:08 When testing on top of current master I noticed that the event queue thread is consuming 100% CPU. Could you revisit your changes in the eventq code? There are loops that call functions ... and take time, but you use a curtime looked up before starting the loop, whereas the previous code called current_time() inside the loop. |
Comment from lkrispen (@elkris) at 2017-07-17 13:44:30 I think there was an incorrect macro replacement that applied the NOT only to the first component of the AND; the following patch seems to fix it. |
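Editor's sketch of the class of bug described above. The actual patch is not reproduced in this thread, so the function names and conditions here are hypothetical; the sketch only shows how a negation that binds to the first operand differs from a negation of the whole conjunction.

```c
#include <stdbool.h>

/* Intended meaning: NOT applied to the whole (a AND b) expression. */
bool example_not_both(bool a, bool b)
{
    return !(a && b);
}

/* Faulty expansion: NOT binds only to the first operand of the AND. */
bool example_not_first_only(bool a, bool b)
{
    return !a && b;
}
```

The two forms disagree for three of the four input combinations. For example, with a true and b false, the intended form is true while the faulty form is false; they agree only when a is false and b is true.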
Comment from firstyear (@Firstyear) at 2017-07-18 02:07:18 I think you're right. I'll apply this now (with my ack) commit 1b95045 |
Comment from lkrispen (@elkris) at 2017-07-24 22:06:13 Metadata Update from @elkris:
|
Comment from lkrispen (@elkris) at 2017-07-24 22:07:43 this change broke checkpointing, fix attached |
Comment from lkrispen (@elkris) at 2017-07-24 22:08:14 |
Comment from firstyear (@Firstyear) at 2017-07-25 00:52:14 ack |
Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Security log buffering is disabled - Make mapping tree check more robust for case relates: 389ds#2375 Reviewed by: spichugi(Thanks!)
Description: Add check for - unauthorized binds are allowed - Access log buffering is disabled - Make mapping tree check more robust for case relates: #2375 Reviewed by: spichugi(Thanks!)
Description: Updated the healthcheck tests with nsslapd-accesslog-logbuffering settings so it does not report a warning. Also adjusted certificate tests so it covers days in leap year. Relates: 389ds#2375 Reviewed by: ???
Description: Updated the healthcheck tests with nsslapd-accesslog-logbuffering settings so it does not report a warning. Also adjusted certificate tests so it covers days in leap year. Relates: 389ds#2375 Reviewed by: @vashirov (Thanks!)
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49316
Issue Description
During an investigation it became apparent that DS clock handling was not thread safe. In an attempt to "speed up" clock access, the architecture used a time thread that would wake every second (or more) and update a global value in time.c. However, this value was neither atomically updated nor atomically read. Updating it to use atomics would likely cause a performance regression. Worse, these values were relied upon by csngen to generate CSNs for replication.
With HR log timestamps we showed that ns precision is virtually free (if not faster) on Linux today. With this in mind, there is no reason not to enable HR etime by default now. As well, by using clock_gettime for calls to the clock, we take advantage of the vDSO implementation of clock_gettime, which is going to be faster than atomics. Finally, we can use the monotonic clock for timers, and UTC for all other calls, to remove performance and timezone issues.
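A minimal sketch of the "monotonic clock for timers" idea, assuming a POSIX system with `CLOCK_MONOTONIC`. The helper names `example_deadline_after` and `example_deadline_passed` are hypothetical and not part of the DS API. Because the monotonic clock is not affected by wall-clock adjustments, a deadline computed from it cannot fire early or late when the system time is changed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: a deadline `seconds` from now on the monotonic clock.
 * If the clock cannot be read, the base time is zero (sketch assumption).
 */
struct timespec example_deadline_after(time_t seconds)
{
    struct timespec now = {0, 0};
    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) {
        now.tv_sec = 0;
        now.tv_nsec = 0;
    }
    now.tv_sec += seconds;
    return now;
}

/*
 * Hypothetical helper: true once the monotonic clock has reached the
 * deadline. A clock read failure is reported as "passed" so that a
 * caller does not wait forever (sketch assumption).
 */
bool example_deadline_passed(const struct timespec *deadline)
{
    struct timespec now;
    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) {
        return true;
    }
    if (now.tv_sec != deadline->tv_sec) {
        return now.tv_sec > deadline->tv_sec;
    }
    return now.tv_nsec >= deadline->tv_nsec;
}
```

Timestamps that are shown to users or written to logs would still come from `CLOCK_REALTIME`, since monotonic values have no defined relation to calendar time.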