New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug in 389-ds-base-1.2.11.15-91.el6_9.x86_64 with large paged searches #2488
Comments
Comment from mreynolds (@mreynolds389) at 2017-10-26 21:11:23 Tickets of interest
|
Comment from mreynolds (@mreynolds389) at 2017-10-26 21:11:28 Metadata Update from @mreynolds389:
|
Comment from mreynolds (@mreynolds389) at 2017-10-30 18:47:41 Connection 2 is the paged result search, connection 1 is just performing search and search.
So there is a blocking event between the page being returned, and the next page being prepared. I had to use a large database and very large page size (20000) to get this to occur, but I would like test this over the network to see if I can get the same results (etime=20) but with a smaller page size. |
Comment from jeaton at 2017-10-30 18:54:57 Its likely that the particular search in question in my example triggers it more easily due to the large number of objects with isMemberOf attributes (~80K) as well as the high number of values for that attribute (some objects have upwards of 2000 values). I did try to recreate this on RH7 with 389-ds-base-1.3.6.1-21.el7_4.x86_64 and was not able to reproduce it there yet. |
Comment from mreynolds (@mreynolds389) at 2017-10-30 19:42:13 Actually I did not reproduce the issue, the problem was that my python test script was hanging, not the server. |
Comment from jeaton at 2017-10-30 20:42:51 I may be able to generate a sufficiently anonymized database sample to replicate the problem if you would like. |
Comment from mreynolds (@mreynolds389) at 2018-01-23 17:32:42 @jeaton any update on a reproducer? Thanks! |
Comment from mreynolds (@mreynolds389) at 2018-06-14 19:52:14 Metadata Update from @mreynolds389:
|
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49429
Issue Description
I recently updated to 389-ds-base-1.2.11.15-91.el6_9.x86_64 (from 389-ds-base-1.2.11.15-50.el6_6.x86_64) and have uncovered a bug with large paged searches blocking all other searches on the server.
The case I've discovered is as follows:
I have a search which returns about 28,000 entries. The system which generates this search does a paged search with 1000 results per page. I can replicate the behavior with a command line ldap search like the following
ldapsearch -H ldaps://servername:636 -D 'uid=binddn,ou=specials,dc=cmu,dc=edu' -x -W -E pr=1000/noprompt "(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))",
As soon as the first 1000 entries are returned, and the search requests the second page, all other searches being done to the same server block.
Package Version and Platform
Red Hat Enterprise Linux Server release 6.9 (Santiago)
389-ds-base-1.2.11.15-91.el6_9.x86_64
Steps to reproduce
Start a loop with a trivial query, eg: while true; do time ldapsearch -h servername -x uid=jeaton; sleep 1 ; done
Run a large paged query, eg: ldapsearch -H ldaps://servername:636 -D 'uid=binddn,ou=specials,dc=cmu,dc=edu' -x -W -E pr=1000/noprompt "(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))"
(this query returns a total of 28K entries, in pages of 1000
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:48 -0400] conn=202657 op=2 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:48 -0400] conn=202699 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:11:33 -0400] conn=202657 op=2 RESULT err=0 tag=101 nentries=1000 etime=45 notes=P pr_idx=0
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 RESULT err=0 tag=101 nentries=1 etime=0
Actual results
The server performs fine during the request for the first page of results. As soon as the second page is requested, all other requests to the server block until the response for the page is sent. The server continues to block for each additional page of responses, until the entire long search is completed.
Expected results
Other queries continue to get responses.
Full log excerpts below:
Everything looks good in the access log at first:
[26/Oct/2017:10:10:02 -0400] conn=202655 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 UNBIND
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:03 -0400] conn=202656 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 UNBIND
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:04 -0400] conn=202657 fd=85 slot=85 SSL connection from 10.0.0.2 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202657 TLS1.0 256-bit AES
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 BIND dn="uid=binddn,ou=specials,dc=cmu,dc=edu" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=binddn,ou=specials,dc=cmu,dc=edu"
[26/Oct/2017:10:10:04 -0400] conn=202657 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 UNBIND
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:06 -0400] conn=202659 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 UNBIND
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 fd=86 closed - U1
until we get to the point where the first page is returned and the client requests the second page of results:
[26/Oct/2017:10:10:46 -0400] conn=202697 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 UNBIND
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:48 -0400] conn=202657 op=1 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0
[26/Oct/2017:10:10:48 -0400] conn=202657 op=2 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:48 -0400] conn=202699 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:11:33 -0400] conn=202657 op=2 RESULT err=0 tag=101 nentries=1000 etime=45 notes=P pr_idx=0
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:11:33 -0400] conn=202657 op=3 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 UNBIND
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:14 -0400] conn=202657 op=3 RESULT err=0 tag=101 nentries=1000 etime=41 notes=P pr_idx=0
[26/Oct/2017:10:12:14 -0400] conn=202700 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:12:14 -0400] conn=202657 op=4 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 UNBIND
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:58 -0400] conn=202657 op=4 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0
Note the large gap in time between 10:10:48 and 10:11:33. You can see my client doing the trivial search connect at 10:10:48 with conn=202699. At that point the client just hangs, waiting on the server's response. Only after the server responds with the second page of results at 10:11:33, is my client able to send the bind request and search. This same pattern happens with every further page of searches, eg, 10:11:33 - 10:12:14.
Then when we get to the final page, as soon as that completes, the trivial searches once again complete at one per second:
[26/Oct/2017:10:25:30 -0400] conn=202657 op=28 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202737 op=0 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:30 -0400] conn=202738 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202740 fd=89 slot=89 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:25:58 -0400] conn=202657 op=28 RESULT err=0 tag=101 nentries=1000 etime=28 notes=P pr_idx=0
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 fd=87 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 fd=88 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:25:58 -0400] conn=202657 op=29 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202741 fd=87 slot=87 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 fd=89 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:08 -0400] conn=202657 op=29 RESULT err=0 tag=101 nentries=455 etime=10 notes=P pr_idx=0
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 fd=85 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 fd=87 closed - U1
[26/Oct/2017:10:26:09 -0400] conn=202742 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 UNBIND
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 fd=85 closed - U1
[26/Oct/2017:10:26:10 -0400] conn=202743 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:10 -0400] conn=202743 op=2 UNBIND
I did some testing and think that the bug was introduced around 389-ds-base-1.2.11.15-72.el6_7. I can probably upgrade one by one and find it specifically if desired.
The text was updated successfully, but these errors were encountered: