New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer overflow in performance counters #2048
Comments
Comment from nhosoi (@nhosoi) at 2017-02-11 23:10:11 Metadata Update from @nhosoi:
|
Comment from mreynolds (@mreynolds389) at 2017-03-13 17:49:24 Metadata Update from @mreynolds389:
|
Comment from mreynolds (@mreynolds389) at 2017-03-13 17:49:49 Metadata Update from @mreynolds389: |
Comment from mreynolds (@mreynolds389) at 2017-03-20 16:13:58 Metadata Update from @mreynolds389:
|
Comment from mreynolds (@mreynolds389) at 2017-03-20 18:22:29 |
Comment from mreynolds (@mreynolds389) at 2017-03-20 18:22:39 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-03-21 09:14:40 Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic. We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead. For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect. Avoid ,"%llu", use the PRI types from inttypes.h Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this. |
Comment from mreynolds (@mreynolds389) at 2017-03-21 14:06:29
Neither is uint64_t - we already talked about this. We NEED to do this for printf'ing. That is the whole point of all of this - otherwise we get overflows or compiler errors.
Yeah I left that as is to keep the patch small - figured all of this could be replaced by another patch.
I was also wondering about __ATOMIC_SEQ_CST in slapi_counters.c. I saw that you changed this to atomic release/acquire in nunc-stans. Is that something we want to do here as well?
I can do that __PRI64_PREFIX + PRIu64 = "llu" - this needs testing though...
You're not, but I don't think you know what's going on exactly. If we don't use "%llu" for our logging functions/printfs we overflow. So although PRUint64 and uint64_t behave as unsigned long longs, the compiler thinks they are just unsigned longs. So to avoid the overflows we must use "%llu" for these types, but... this will cause compiler warnings about the wrong type being used because gcc thinks unint64_t is a unsigned long, etc. So to solve both the issues we have to cast. I would rather cast when we return the counter value, than cast every time we use printf formatting. Hence, slapi_long to make things consistent and easy. I'm all ears if you can tell me how to get this working without casting, AND without overflows & compiler warnings. Anyway I need to test inttypes.h string formats to see if it works as expected. Thanks. |
Comment from mreynolds (@mreynolds389) at 2017-03-21 16:58:52 I found the root cause of ALL of this... NSPR printf functions! The standard printf functions work correctly with uint64_t/PRUint64 (no overflows - no compiler warnings). This means we can remove slapi_long and all the casting. So I need to redo the entire fix, but this is good news and it will be significantly cleaner. |
Comment from mreynolds (@mreynolds389) at 2017-03-21 20:37:24 New patch |
Comment from firstyear (@Firstyear) at 2017-03-21 23:15:32 Ah hah! That would explain it. I think we should be reducing our reliance on NSPR, it adds a lot of overhead and it's relevance is not as high these days. I'm much happier with this patch: I think one thing is that we need to be sure we only read from this struct via the functions rather than the direct access, but I think this is a great start for now. |
Comment from firstyear (@Firstyear) at 2017-03-21 23:17:10 Metadata Update from @Firstyear:
|
Comment from mreynolds (@mreynolds389) at 2017-03-22 01:18:37 |
Comment from mreynolds (@mreynolds389) at 2017-03-22 01:18:46 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-03-22 06:21:35 |
Comment from firstyear (@Firstyear) at 2017-03-22 06:21:45 Metadata Update from @Firstyear:
|
Comment from firstyear (@Firstyear) at 2017-03-22 06:21:55 Metadata Update from @Firstyear:
|
Comment from mreynolds (@mreynolds389) at 2017-03-22 17:23:09 You removed the atomic counter assembly code for 32bit archs on linux, so is __atomic_store_8() actually atomic on 32bit envs? Meaning does it use its "backup" internal mutex, or is it truly atomic? I don't want to remove functionality for code cleanliness. The rest looks fine :) |
Comment from firstyear (@Firstyear) at 2017-03-22 23:07:35 It depends on the platform. IIRC there is capability to do 64bit atomics on x86 systems in some cases, but I may be remembering incorrectly. Regardless, the whole point of this patch is to fix the overflow, which is a 32bit limit. The hardcoded inline asm only works on a 32bit int. So it's wrong anyway. This way we have a guarantee that:
This is exactly what we want :) |
Comment from mreynolds (@mreynolds389) at 2017-03-23 15:02:55 OK, ack |
Comment from mreynolds (@mreynolds389) at 2017-03-23 15:03:06 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-03-24 00:37:25 commit 15ce646 |
Comment from firstyear (@Firstyear) at 2017-03-24 00:37:35 Metadata Update from @Firstyear:
|
Comment from firstyear (@Firstyear) at 2017-03-24 01:01:11 commit 1155d50 One line fix for missing return. |
Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:01 We have portability issues on RHEL ppc: libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv |
Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:08 We have portability issues on RHEL ppc:
|
Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:10 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-03-27 22:57:46 You mean 32bit ppc? or ppc64? Is there a link to a build you can send me? |
Comment from mreynolds (@mreynolds389) at 2017-03-27 23:01:41
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12870341 |
Comment from firstyear (@Firstyear) at 2017-03-28 05:52:21 |
Comment from firstyear (@Firstyear) at 2017-03-28 05:53:21 Metadata Update from @Firstyear:
|
Comment from mreynolds (@mreynolds389) at 2017-03-28 17:59:53 |
Comment from mreynolds (@mreynolds389) at 2017-03-28 18:00:05 Metadata Update from @mreynolds389:
|
Comment from mreynolds (@mreynolds389) at 2017-05-02 19:55:55 Reopening, since one of the origin l issues was not addressed. |
Comment from mreynolds (@mreynolds389) at 2017-05-02 19:55:59 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-05-03 02:31:59 ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :) |
Comment from firstyear (@Firstyear) at 2017-05-03 02:32:03 Metadata Update from @Firstyear:
|
Comment from mreynolds (@mreynolds389) at 2017-05-03 16:08:41
I hear you, but the rest of the stats are mostly pure Berkeley DB stats - which should not be removed. |
Comment from mreynolds (@mreynolds389) at 2017-05-03 16:08:46 Metadata Update from @mreynolds389:
|
Comment from firstyear (@Firstyear) at 2017-05-10 08:14:05 The backport to 1.3.5 was incomplete :( commit 0b5b193 |
Comment from vashirov (@vashirov) at 2017-08-31 16:42:07 Currently test for this issue is the slowest, it runs for 10-15 minutes. We can cheat a little bit and change the counter value to something close to 2^32 than 0. This change requires installed gdb and debuginfo packages on a system under test. It's not an elegant solution, so I'm open to other opinions on how we can improve this test run time. |
Comment from vashirov (@vashirov) at 2017-08-31 16:42:07 Metadata Update from @vashirov:
|
Comment from vashirov (@vashirov) at 2017-08-31 16:42:22 Metadata Update from @vashirov:
|
Comment from firstyear (@Firstyear) at 2017-09-01 00:04:28 @vashirov There is a cmocka test for counters in the code base: we don't need to test them from the lib389 tools. I think we can remove the lib389 test instead because this is really slow. |
Comment from vashirov (@vashirov) at 2017-09-01 01:32:11 Ah, great! I agree, cmocka is a better tool for testing this. New patch is attached. |
Comment from firstyear (@Firstyear) at 2017-09-01 02:29:09 Metadata Update from @Firstyear:
|
Comment from vashirov (@vashirov) at 2017-09-01 07:54:38 To ssh://pagure.io/389-ds-base.git |
Comment from vashirov (@vashirov) at 2017-09-01 07:54:38 Metadata Update from @vashirov:
|
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/48989
Description of problem:
The text was updated successfully, but these errors were encountered: