Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow in performance counters #2048

Closed
389-ds-bot opened this issue Sep 13, 2020 · 48 comments
Closed

Integer overflow in performance counters #2048

389-ds-bot opened this issue Sep 13, 2020 · 48 comments
Labels
closed: fixed Migration flag - Issue
Milestone

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/48989


Description of problem:

From https://bugzilla.redhat.com/show_bug.cgi?id=1349577

There is another overflow in performance counters:
./ldap/servers/slapd/back-ldbm/perfctrs.h:
 33     PRUint32    page_access_rate;

./ldap/servers/slapd/back-ldbm/perfctrs.c:
167             perf->cache_size_bytes = mpstat->st_gbytes * ONEG +
mpstat->st_bytes;
168             perf->page_access_rate = mpstat->st_cache_hit +
mpstat->st_cache_miss;       <===
169             perf->cache_hit = mpstat->st_cache_hit;
170             perf->cache_try = mpstat->st_cache_hit + mpstat->st_cache_miss;
<===

Also objects-locked has the value of page_access_rate:
260     { SLAPI_LDBM_PERFCTR_AT_PREFIX "objects-locked",
261             offsetof( performance_counters, page_access_rate ) },
262     { SLAPI_LDBM_PERFCTR_AT_PREFIX "page-create-rate",
263             offsetof( performance_counters, page_create_rate ) },

ldapsearch -x -D "cn=Directory Manager" -w Secret123  -b "cn=monitor,cn=ldbm
database,cn=plugins,cn=config"  -LLL dbcachetries  nsslapd-db-objects-locked
dn: cn=monitor,cn=ldbm database,cn=plugins,cn=config
dbcachetries: 509565

dn: cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-objects-locked: 509565

Version-Release number of selected component (if applicable):
389-ds-base-1.3.5.10-11.el7.x86_64
@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 13, 2020
@389-ds-bot 389-ds-bot added this to the 1.3.6.0 milestone Sep 13, 2020
@389-ds-bot
Copy link
Author

Comment from nhosoi (@nhosoi) at 2017-02-11 23:10:11

Metadata Update from @nhosoi:

  • Issue set to the milestone: 1.3.6.0

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-13 17:49:24

Metadata Update from @mreynolds389:

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-20 16:13:58

Metadata Update from @mreynolds389:

  • Issue assigned to mreynolds389

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-20 18:22:29

0001-Issue-48989-Overflow-in-counters-and-monitor.patch

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-20 18:22:39

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to review (was: new)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-21 09:14:40

Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic.

We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead.

For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect.

Avoid ,"%llu", use the PRI types from inttypes.h

Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this.

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-21 14:06:29

Don't use slapi_long, just use uint64_t from inttypes.h please. We should avoid long long, as it's not deterministic.

Neither is uint64_t - we already talked about this. We NEED to do this for printf'ing. That is the whole point of all of this - otherwise we get overflows or compiler errors.

We don't need 1866 -#define NSPRIu64» "llu", because inttypes.h provides PRIu64 for string formats, so we should use this instead.

Yeah I left that as is to keep the patch small - figured all of this could be replaced by another patch.

For now, don't use __ATOMIC_RELAXED, __ATOMIC_RELAXED, use SEQ_CST, we can relax these as we go with tests to be sure of the effect.

I was also wondering about __ATOMIC_SEQ_CST in slapi_counters.c. I saw that you changed this to atomic release/acquire in nunc-stans. Is that something we want to do here as well?

Avoid ,"%llu", use the PRI types from inttypes.h

I can do that __PRI64_PREFIX + PRIu64 = "llu" - this needs testing though...

Sorry to be so difficult about this, I think if we are going to do this, we should get it right. Thanks for your time on this.

You're not, but I don't think you know what's going on exactly. If we don't use "%llu" for our logging functions/printfs we overflow. So although PRUint64 and uint64_t behave as unsigned long longs, the compiler thinks they are just unsigned longs. So to avoid the overflows we must use "%llu" for these types, but... this will cause compiler warnings about the wrong type being used because gcc thinks unint64_t is a unsigned long, etc. So to solve both the issues we have to cast. I would rather cast when we return the counter value, than cast every time we use printf formatting. Hence, slapi_long to make things consistent and easy.

I'm all ears if you can tell me how to get this working without casting, AND without overflows & compiler warnings.

Anyway I need to test inttypes.h string formats to see if it works as expected.

Thanks.
Mark

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-21 16:58:52

I found the root cause of ALL of this... NSPR printf functions!

The standard printf functions work correctly with uint64_t/PRUint64 (no overflows - no compiler warnings). This means we can remove slapi_long and all the casting. So I need to redo the entire fix, but this is good news and it will be significantly cleaner.

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-21 20:37:24

New patch

0001-Issue-48989-Overflow-in-counters-and-monitor.patch

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-21 23:15:32

Ah hah! That would explain it.

I think we should be reducing our reliance on NSPR, it adds a lot of overhead and it's relevance is not as high these days.

I'm much happier with this patch: I think one thing is that we need to be sure we only read from this struct via the functions rather than the direct access, but I think this is a great start for now.

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-21 23:17:10

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to ack (was: review)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-22 01:18:37

fdf78dc..434a92f master -> master

ee63e40..70084f1 389-ds-base-1.3.5 -> 389-ds-base-1.3.5

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-22 01:18:46

Metadata Update from @mreynolds389:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-22 06:21:35

0001-Ticket-48989-Improve-counter-overflow-fix.patch

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-22 06:21:45

Metadata Update from @Firstyear:

  • Issue status updated to: Open (was: Closed)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-22 06:21:55

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to review (was: ack)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-22 17:23:09

You removed the atomic counter assembly code for 32bit archs on linux, so is __atomic_store_8() actually atomic on 32bit envs? Meaning does it use its "backup" internal mutex, or is it truly atomic? I don't want to remove functionality for code cleanliness.

The rest looks fine :)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-22 23:07:35

It depends on the platform. IIRC there is capability to do 64bit atomics on x86 systems in some cases, but I may be remembering incorrectly.

Regardless, the whole point of this patch is to fix the overflow, which is a 32bit limit. The hardcoded inline asm only works on a 32bit int. So it's wrong anyway. This way we have a guarantee that:

  • it's 64bit capable on all platforms
  • it's always atomic (by cpu instruction, or mutex)

This is exactly what we want :)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-23 15:02:55

OK, ack

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-23 15:03:06

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to ack (was: review)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-24 00:37:25

commit 15ce646
To ssh://git@pagure.io/389-ds-base.git
85807be..15ce646 master -> master

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-24 00:37:35

Metadata Update from @Firstyear:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-24 01:01:11

commit 1155d50
To ssh://git@pagure.io/389-ds-base.git
15ce646..1155d50 master -> master

One line fix for missing return.

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:01

We have portability issues on RHEL ppc:

libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv
./.libs/libslapd.so: undefined reference to __atomic_fetch_sub_8' ./.libs/libslapd.so: undefined reference to __atomic_load_8'
./.libs/libslapd.so: undefined reference to __atomic_store_8' ./.libs/libslapd.so: undefined reference to __atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:08

We have portability issues on RHEL ppc:

libtool: link: gcc -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m32 -mcpu=power7 -mtune=power7 -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/pwdhash-bin ldap/servers/slapd/tools/pwdhash_bin-pwenc.o  ./.libs/libslapd.so -L/usr/lib -ltcmalloc -lkrb5 -lk5crypto -lcom_err -lpcre -lpthread -lsystemd -lplc4 -lplds4 -lnspr4 -lssl3 -lnss3 -lsvrcore -lldap_r -llber -lsasl2 -ldl -Wl,-rpath -Wl,/usr/lib/dirsrv
./.libs/libslapd.so: undefined reference to `__atomic_fetch_sub_8'
./.libs/libslapd.so: undefined reference to `__atomic_load_8'
./.libs/libslapd.so: undefined reference to `__atomic_store_8'
./.libs/libslapd.so: undefined reference to `__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-27 21:23:10

Metadata Update from @mreynolds389:

  • Issue status updated to: Open (was: Closed)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-27 22:57:46

You mean 32bit ppc? or ppc64? Is there a link to a build you can send me?

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-27 23:01:41

You mean 32bit ppc? or ppc64? Is there a link to a build you can send me?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12870341

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-28 05:52:21

0001-Ticket-48989-Re-implement-lock-counter.patch

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-28 05:53:21

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to review (was: ack)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-28 17:59:53

3c869cc..e59420e master -> master

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-03-28 18:00:05

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to ack (was: review)
  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-05-02 19:55:55

Reopening, since one of the origin l issues was not addressed.

0001-Ticket-48989-fix-perf-counters.patch

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-05-02 19:55:59

Metadata Update from @mreynolds389:

  • Custom field reviewstatus adjusted to review (was: ack)
  • Issue status updated to: Open (was: Closed)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-05-03 02:31:59

ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-05-03 02:32:03

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to ack (was: review)

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-05-03 16:08:41

ack! I'm all for removing code like this. We should really check all our counters and see if they make sense, because there is a cost to using atomics that is not free. :)

I hear you, but the rest of the stats are mostly pure Berkeley DB stats - which should not be removed.

770bf3a..18a77e9 master -> master

0b116ee..f84cdfd 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2017-05-03 16:08:46

Metadata Update from @mreynolds389:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-05-10 08:14:05

The backport to 1.3.5 was incomplete :(

commit 0b5b193
commit abe6d51
commit 0ff9bb3
To ssh://git@pagure.io/389-ds-base.git
e2ee1c1..0ff9bb3 389-ds-base-1.3.5 -> 389-ds-base-1.3.5

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-08-31 16:42:07

Currently test for this issue is the slowest, it runs for 10-15 minutes. We can cheat a little bit and change the counter value to something close to 2^32 than 0. This change requires installed gdb and debuginfo packages on a system under test.

It's not an elegant solution, so I'm open to other opinions on how we can improve this test run time.
0001-Issue-48989-Improve-test-execution-time.patch

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-08-31 16:42:07

Metadata Update from @vashirov:

  • Custom field version adjusted to None
  • Issue status updated to: Open (was: Closed)

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-08-31 16:42:22

Metadata Update from @vashirov:

  • Custom field reviewstatus adjusted to review (was: ack)

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-09-01 00:04:28

@vashirov There is a cmocka test for counters in the code base: we don't need to test them from the lib389 tools. I think we can remove the lib389 test instead because this is really slow.

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-09-01 01:32:11

Ah, great! I agree, cmocka is a better tool for testing this. New patch is attached.
0001-Issue-48989-Delete-slow-lib389-test.patch

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-09-01 02:29:09

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to ack (was: review)

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-09-01 07:54:38

To ssh://pagure.io/389-ds-base.git
01272f7..e9dd75d master -> master

@389-ds-bot
Copy link
Author

Comment from vashirov (@vashirov) at 2017-09-01 07:54:38

Metadata Update from @vashirov:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed: fixed Migration flag - Issue
Projects
None yet
Development

No branches or pull requests

1 participant