Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log internal unindexed searches (notes=A) #4219

Closed
389-ds-bot opened this issue Sep 13, 2020 · 14 comments
Closed

Log internal unindexed searches (notes=A) #4219

389-ds-bot opened this issue Sep 13, 2020 · 14 comments
Labels
closed: fixed Migration flag - Issue
Milestone

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51166


Issue Description

At some point in time we stopped logging internal unindexed searches in the errors log because they were noisy and often false positives, but we have since made it more fine grained with notes=A. For a notes=A case we should log the search details in the errors log. This will help determine the root cause of some of the db lock exhaustion cases we are seeing.

@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 13, 2020
@389-ds-bot 389-ds-bot added this to the 1.4.2 milestone Sep 13, 2020
@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-06-18 21:37:46

Metadata Update from @mreynolds389:

  • Issue assigned to mreynolds389

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-06-18 21:38:05

Metadata Update from @mreynolds389:

  • Custom field origin adjusted to None
  • Custom field reviewstatus adjusted to None
  • Issue priority set to: normal
  • Issue set to the milestone: 1.4.2

@389-ds-bot
Copy link
Author

389-ds-bot commented Sep 13, 2020

Comment from mreynolds (@mreynolds389) at 2020-06-18 21:39:24

#4220

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-06-22 15:02:58

Commit 16e47bff relates to this ticket

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-06-22 15:16:53

Commit 16e47bf relates to this ticket

c7d81c2..9e1a88c 389-ds-base-1.4.3 -> 389-ds-base-1.4.3

a53c73b..3fdb2ed 389-ds-base-1.4.2 -> 389-ds-base-1.4.2

7cc0f1d..49cbd94 389-ds-base-1.3.10 -> 389-ds-base-1.3.10

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-06-22 15:16:53

Metadata Update from @mreynolds389:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

@389-ds-bot
Copy link
Author

Comment from tmihinto at 2020-07-02 10:48:51

Hi Mark,

I have built a patch on top of 1.3.10.1-5 using https://pagure.io/389-ds-base/c/49cbd94a0

After configuring the RHDS instance to trigger a full DB scan upon a deletion of an entry,
the DEL command does actually kill the server:

# ldapdelete -x -D "cn=Directory Manager" -w<PASSWORD> -h <HOST> -p <PORT> "uid=test2,ou=People,dc=example,dc=com"
ldap_result: Can't contact LDAP server (-1)
#

Running the instance under gdb gives a hint about the issue:

# gdb /usr/sbin/ns-slapd
...
(gdb) set args  -D /etc/dirsrv/slapd-test -i /var/run/dirsrv/slapd-test.pid -w /var/run/dirsrv/slapd-test.startpid -d 0
(gdb) run
Starting program: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-test -i /var/run/dirsrv/slapd-test.pid -w /var/run/dirsrv/slapd-test.startpid -d 0
...
[New Thread 0x7fffbbe29700 (LWP 31223)]
/usr/sbin/ns-slapd: symbol lookup error: /usr/lib64/dirsrv/plugins/libback-ldbm.so: undefined symbol: get_internal_conn_op
[Thread 0x7fffbbe29700 (LWP 31223) exited]
...
#

The function get_internal_conn_op() is used in the patch but its definition is missing.

It is defined in the 1.4 branch in the file ldap/servers/slapd/util.c

void
get_internal_conn_op (uint64_t *connid, int32_t *op_id, int32_t *op_internal_id, int32_t *op_nested_count) {
    struct slapi_td_log_op_state_t *op_state = slapi_td_get_log_op_state();

    if (op_state != NULL) {
        *connid = op_state->conn_id;
        *op_id = op_state->op_id;
        *op_internal_id = op_state->op_int_id;
        *op_nested_count = op_state->op_nest_count;

    } else {
        *connid = 0;
        *op_id = 0;
        *op_internal_id = 0;
        *op_nested_count = 0;
    }
}

The 1.3.10 branch doesn't seem to have the function definition:

$ find 389-ds-base-1.3.10/ -name *.c | xargs grep get_internal_conn_op
389-ds-base-1.3.10//389-ds-base/ldap/servers/slapd/back-ldbm/ldbm_search.c:            get_internal_conn_op(&connid, &op_id, &op_internal_id, &op_nested_count);
$

Would you mind to add the missing part to the patch?

Regards,
Têko.

@389-ds-bot
Copy link
Author

389-ds-bot commented Sep 13, 2020

Comment from tbordaz (@tbordaz) at 2020-07-02 11:23:05

@tmihinto this missing part comes from #2088. It allows to retrieve which client operation (op/conn) triggered the internal operation. 49029 was fixed in several commits that makes the backport a bit more complex.
For a hotfix, an option is to just log the unindexed internal search without the reference to the client operation. The goal is to know which attributes need to be indexed and a simple log will give the information. Knowing which client operation triggered it, is interesting but not crucial for reindexing.

@389-ds-bot
Copy link
Author

Comment from tmihinto at 2020-07-02 11:51:39

Thanks Thierry!

I would prefer if Engineering can fix the patch so that when CEE folks need to build a hotfix, they don't hit the same issue ;-)

Meanwhile could you please confirm that the following code changes should be enough?

get_internal_conn_op(&connid, &op_id, &op_internal_id, &op_nested_count);

            slapi_log_err(SLAPI_LOG_NOTICE, "ldbm_back_search",
                     "Internal unindexed search: source (%s) search base=\"%s\" scope=%d filter=\"%s\",
                     plugin_dn, base_dn, scope, filter_str);
             slapi_ch_free_string(&plugin_dn);

Regards,
Têko.

@389-ds-bot
Copy link
Author

Comment from tmihinto at 2020-07-03 14:35:54

@mreynolds389
Hi Mark,

Please let me know when the new patch is ready.

Thanks,
Têko.

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2020-07-03 15:38:36

@mreynolds389
Hi Mark,
Please let me know when the new patch is ready.
Thanks,
Têko.

Try this one:

diff --git a/ldap/servers/slapd/back-ldbm/ldbm_search.c b/ldap/servers/slapd/back-ldbm/ldbm_search.c
index 212b6cb61..81eef3ac9 100644
--- a/ldap/servers/slapd/back-ldbm/ldbm_search.c
+++ b/ldap/servers/slapd/back-ldbm/ldbm_search.c
@@ -831,10 +831,6 @@ ldbm_back_search(Slapi_PBlock *pb)
         char *plugin_dn;
         char *base_dn;
         int32_t internal_op = operation_is_flag_set(operation, OP_FLAG_INTERNAL);
-        uint64_t connid;
-        int32_t op_id;
-        int32_t op_internal_id;
-        int32_t op_nested_count;
 
         /*
          * Return error if nsslapd-require-index is set and
@@ -873,10 +869,9 @@ ldbm_back_search(Slapi_PBlock *pb)
                 slapi_pblock_get(pb, SLAPI_PLUGIN, &plugin);
             }
             plugin_dn = plugin_get_dn(plugin);
-            get_internal_conn_op(&connid, &op_id, &op_internal_id, &op_nested_count);
             slapi_log_err(SLAPI_LOG_NOTICE, "ldbm_back_search",
-                    "Internal unindexed search: source (%s) search base=\"%s\" scope=%d filter=\"%s\" conn=%" PRIu64 " op=%d (internal op=%d count=%d)\n",
-                    plugin_dn, base_dn, scope, filter_str, connid, op_id, op_internal_id, op_nested_count);
+                    "Internal unindexed search: source (%s) search base=\"%s\" scope=%d filter=\"%s\" conn=%" PRIu64 " op=%d\n",
+                    plugin_dn, base_dn, scope, filter_str, pb_op->o_connid, pb_op->o_opid);
             slapi_ch_free_string(&plugin_dn);
         } else {
             slapi_log_err(SLAPI_LOG_NOTICE, "ldbm_back_search",

@389-ds-bot
Copy link
Author

Comment from tmihinto at 2020-07-03 19:35:48

@mreynolds389
Thank you Mark!
It works now. Just had to figure out that it was cumulative with the previous version ;-)

[03/Jul/2020:19:17:07.121844143 +0200] - NOTICE - ldbm_back_search - Internal unindexed search: source (cn=referential integrity postoperation,cn=plugins,cn=config) search base="dc=example,dc=com" scope=2 filter="(description=uid=test2,ou=People,dc=example,dc=com)" conn=0 op=0

@vashirov
Copy link
Member

vashirov commented Dec 8, 2020

Hey @mreynolds389, could you please push this patch to 1.3.10 branch? It is still affected by this issue. And it also made it to downstream rpms...

diff --git a/ldap/servers/slapd/back-ldbm/ldbm_search.c b/ldap/servers/slapd/back-ldbm/ldbm_search.c
index 212b6cb61..81eef3ac9 100644
--- a/ldap/servers/slapd/back-ldbm/ldbm_search.c
+++ b/ldap/servers/slapd/back-ldbm/ldbm_search.c
@@ -831,10 +831,6 @@ ldbm_back_search(Slapi_PBlock *pb)
         char *plugin_dn;
         char *base_dn;
         int32_t internal_op = operation_is_flag_set(operation, OP_FLAG_INTERNAL);
-        uint64_t connid;
-        int32_t op_id;
-        int32_t op_internal_id;
-        int32_t op_nested_count;
 
         /*
          * Return error if nsslapd-require-index is set and
@@ -873,10 +869,9 @@ ldbm_back_search(Slapi_PBlock *pb)
                 slapi_pblock_get(pb, SLAPI_PLUGIN, &plugin);
             }
             plugin_dn = plugin_get_dn(plugin);
-            get_internal_conn_op(&connid, &op_id, &op_internal_id, &op_nested_count);
             slapi_log_err(SLAPI_LOG_NOTICE, "ldbm_back_search",
-                    "Internal unindexed search: source (%s) search base=\"%s\" scope=%d filter=\"%s\" conn=%" PRIu64 " op=%d (internal op=%d count=%d)\n",
-                    plugin_dn, base_dn, scope, filter_str, connid, op_id, op_internal_id, op_nested_count);
+                    "Internal unindexed search: source (%s) search base=\"%s\" scope=%d filter=\"%s\" conn=%" PRIu64 " op=%d\n",
+                    plugin_dn, base_dn, scope, filter_str, pb_op->o_connid, pb_op->o_opid);
             slapi_ch_free_string(&plugin_dn);
         } else {
             slapi_log_err(SLAPI_LOG_NOTICE, "ldbm_back_search",

mreynolds389 added a commit that referenced this issue Dec 8, 2020
Description:  In 1.3.x there is no function get_internal_conn_op()
              which was beign called and crashing the server.  Just
              use the regular conn and op numbers.

Fixes: #4219

Reviewed by:  tmihinto(Thanks!)
@mreynolds389
Copy link
Contributor

Pushed upstream:

92fdaba..f917145 389-ds-base-1.3.10 -> 389-ds-base-1.3.10

vashirov added a commit that referenced this issue Jan 10, 2021
Description:
Add a test case.

Relates: #4219

Reviewed by: @mreynolds389, @droideck, @tmihinto (Thanks!)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed: fixed Migration flag - Issue
Projects
None yet
Development

No branches or pull requests

3 participants