
SASL GSSAPI & GSS-SPNEGO performance issue #4506

Closed
tiran opened this issue Dec 17, 2020 · 29 comments · Fixed by #4516 or #4555

@tiran
Contributor

tiran commented Dec 17, 2020

Issue Description

Today I noticed again that LDAP SASL bind with GSSAPI and GSS-SPNEGO is sometimes slow. The issue is similar to https://pagure.io/freeipa/issue/6656. In about half the cases the SASL bind is fast; in the other cases the operation is more than an order of magnitude slower. I'm seeing the same performance spikes with GSS-SPNEGO. In the fast cases GSS-SPNEGO is a bit faster than GSSAPI; in the slow cases it degrades just like GSSAPI.

Package Version and Platform:

  • Platform: Fedora 32
  • Package and version: 389-ds-base-2.0.1-20201215gitae2da1018.fc32.x86_64 and 1.4.3.16-1.fc32, cyrus-sasl-2.1.27-4.fc32.x86_64, krb5-workstation-1.18.2-29.fc32.x86_64

Steps to Reproduce
Steps to reproduce the behavior:

  • Install FreeIPA server
  • kinit -kt /etc/named.keytab DNS/$(hostname)
  • run time ldapwhoami -H ldapi://%2Frun%2Fslapd-IPA-TEST.socket -Y GSSAPI multiple times
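
For repeated measurements, a small driver like the following can be used (a sketch, not part of the original report; it assumes ldapwhoami is on the PATH, a valid Kerberos ticket cache, and the LDAPI socket path from the steps above):

import statistics
import subprocess
import time

# Sketch of a measurement harness: run ldapwhoami repeatedly and record
# wall-clock time per invocation (assumes a valid ticket cache).
CMD = ["ldapwhoami", "-H", "ldapi://%2Frun%2Fslapd-IPA-TEST.socket", "-Y", "GSSAPI"]

timings = []
for _ in range(30):
    start = time.perf_counter()
    subprocess.run(CMD, check=True, capture_output=True)
    timings.append(time.perf_counter() - start)

print("median %.0f ms, max %.0f ms" % (statistics.median(timings) * 1000, max(timings) * 1000))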

Expected results
Command finishes in around 20ms consistently.

Actual result

Wall-clock time for the command is all over the place. Sometimes the command finishes in less than 20 ms, sometimes it takes around 250 ms, and every now and then it takes more than 500 ms.

real    0m0,030s
real    0m0,018s
real    0m0,017s
real    0m0,266s
real    0m0,267s
real    0m0,265s
real    0m0,017s
real    0m0,266s
real    0m0,264s
real    0m0,513s
real    0m0,266s
real    0m0,267s
real    0m0,017s
real    0m0,264s
real    0m0,266s
real    0m0,017s
real    0m0,266s
real    0m0,018s
real    0m0,264s
real    0m0,265s
real    0m0,016s
real    0m0,265s
real    0m0,266s
real    0m0,267s
real    0m0,267s
real    0m0,516s
real    0m0,018s
real    0m0,017s
real    0m0,266s

Additional context

For testing I stopped all IPA services except DS, disabled and removed all IPA plugins that might affect 389-DS performance, and restarted 389-DS. This did not affect performance.

fast bind

    [17/Dec/2020:12:18:24.059835167 +0100] conn=286 fd=109 slot=109 connection from local to /var/run/slapd-IPA-TEST.socket
    [17/Dec/2020:12:18:24.061934129 +0100] conn=286 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:24.064890914 +0100] conn=286 op=0 RESULT err=14 tag=97 nentries=0 wtime=0.000350396 optime=0.002962757 etime=0.003311237, SASL bind in progress
    [17/Dec/2020:12:18:24.065255570 +0100] conn=286 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:24.066715112 +0100] conn=286 op=1 RESULT err=14 tag=97 nentries=0 wtime=0.000065520 optime=0.001463546 etime=0.001527052, SASL bind in progress
    [17/Dec/2020:12:18:24.067034922 +0100] conn=286 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:24.068130645 +0100] conn=286 op=2 RESULT err=0 tag=97 nentries=0 wtime=0.000079306 optime=0.001160701 etime=0.001237869 dn="krbprincipalname=dns/server.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test"
    [17/Dec/2020:12:18:24.068538087 +0100] conn=286 op=3 EXT oid="1.3.6.1.4.1.4203.1.11.3" name="whoami-plugin"
    [17/Dec/2020:12:18:24.068629931 +0100] conn=286 op=3 RESULT err=0 tag=120 nentries=0 wtime=0.000232724 optime=0.000098566 etime=0.000328895
    [17/Dec/2020:12:18:24.069395429 +0100] conn=286 op=4 UNBIND
    [17/Dec/2020:12:18:24.069450050 +0100] conn=286 op=4 fd=109 closed error - U1

slow bind (delay between op=1 and op=2)

    [17/Dec/2020:12:18:36.625205796 +0100] conn=292 fd=109 slot=109 connection from local to /var/run/slapd-IPA-TEST.socket
    [17/Dec/2020:12:18:36.627115266 +0100] conn=292 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:36.630209431 +0100] conn=292 op=0 RESULT err=14 tag=97 nentries=0 wtime=0.000370102 optime=0.003100261 etime=0.003468232, SASL bind in progress
    [17/Dec/2020:12:18:36.630610597 +0100] conn=292 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:36.632120068 +0100] conn=292 op=1 RESULT err=14 tag=97 nentries=0 wtime=0.000076306 optime=0.001513595 etime=0.001587760, SASL bind in progress
    [17/Dec/2020:12:18:36.881177050 +0100] conn=292 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
    [17/Dec/2020:12:18:36.882648579 +0100] conn=292 op=2 RESULT err=0 tag=97 nentries=0 wtime=0.000174524 optime=0.001485382 etime=0.001657851 dn="krbprincipalname=dns/server.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test"
    [17/Dec/2020:12:18:36.883102483 +0100] conn=292 op=3 EXT oid="1.3.6.1.4.1.4203.1.11.3" name="whoami-plugin"
    [17/Dec/2020:12:18:36.883221013 +0100] conn=292 op=3 RESULT err=0 tag=120 nentries=0 wtime=0.000229879 optime=0.000150909 etime=0.000378423
    [17/Dec/2020:12:18:36.884186770 +0100] conn=292 op=4 UNBIND
    [17/Dec/2020:12:18:36.884207276 +0100] conn=292 op=4 fd=109 closed error - U1

At one point I also enabled tracing to trace function calls into the SASL stack. There is a long delay between the first and second ids_sasl_listmech call. You can find a full log at https://cheimes.fedorapeople.org/sasl.log

[17/Dec/2020:12:58:38.133255294 +0100] - DEBUG - ids_sasl_server_new - => (vm-171-075.abc.idm.lab.eng.brq.redhat.com)
[17/Dec/2020:12:58:38.145986011 +0100] - DEBUG - ids_sasl_getopt - plugin= option=log_level
[17/Dec/2020:12:58:38.147450940 +0100] - DEBUG - ids_sasl_getopt - plugin= option=auto_transition
[17/Dec/2020:12:58:38.149436807 +0100] - DEBUG - ids_sasl_getopt - plugin= option=mech_list
[17/Dec/2020:12:58:38.157977606 +0100] - DEBUG - ids_sasl_server_new - <=
[17/Dec/2020:12:58:39.043715851 +0100] - DEBUG - ids_sasl_listmech - =>
[17/Dec/2020:12:58:39.045289486 +0100] - DEBUG - ids_sasl_listmech - <=
[17/Dec/2020:12:58:39.841690920 +0100] - DEBUG - ids_sasl_listmech - =>
[17/Dec/2020:12:58:39.843231866 +0100] - DEBUG - ids_sasl_listmech - <=
tiran added the needs triage label Dec 17, 2020
@tiran
Contributor Author

tiran commented Dec 17, 2020

My gut feeling tells me that the issue could be related to a global lock. I see a lot of activity related to replication and index management in the trace log. ids_sasl_check_bind() is protected by the big lock: pthread_mutex_lock(&(pb_conn->c_mutex)); /* BIG LOCK */

mreynolds389 added this to the 2.0.0 milestone Dec 17, 2020
@Firstyear
Contributor

I doubt it's the lock you mention here - more likely it's related to #3032

The change mentioned here had to add a global krb lock to resolve an issue/race with freeing memory, and it was noted at the time that significant performance issues might arise from this change. Certainly this would also have interesting interleaving effects with the conn lock. The conn lock, however, is correctly placed and is per-connection.

It was also discussed that the "long term" solution would be to move to GSSAPI as the API rather than calling krb directly, but that is an extensive, invasive and complex piece of work.

@Firstyear
Contributor

See commit ff9387b

@Firstyear
Contributor

As a workaround, you may find that reducing your worker thread count to match the number of available CPUs will increase performance due to a reduction in lock contention.
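
As an illustration of that workaround (an editor's sketch, not something applied in this thread): the worker thread count is controlled by the nsslapd-threadnumber attribute on cn=config, which can be lowered to the CPU count, for example over LDAPI with python-ldap:

import ldap

# Illustration only: lower the 389-ds worker thread count via cn=config.
# nsslapd-threadnumber controls the number of worker threads; a restart may
# be needed for the new value to take effect.
conn = ldap.initialize("ldapi://%2Frun%2Fslapd-IPA-TEST.socket")
conn.sasl_non_interactive_bind_s("EXTERNAL")  # LDAPI autobind as root
conn.modify_s("cn=config", [(ldap.MOD_REPLACE, "nsslapd-threadnumber", [b"4"])])  # e.g. 4 CPUs
conn.unbind_s()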

@tiran
Contributor Author

tiran commented Dec 18, 2020

Thanks @Firstyear!

I have done my best to isolate the system, reduce any interference, and avoid lock contention from external sources.

  • Test host is a VM on RHEVM with 4 dedicated CPUs and 8 GB allocated RAM. The system is idling with load average 0.06 0.02 0.00 during a test.
  • The test setup has no replicas or clients enrolled. There are no external hosts or internal services that access 389-DS. cn=monitor connection attribute shows one connection.
  • I have stopped all FreeIPA services (ipactl stop, systemctl stop sssd.service) after I acquired the Kerberos service ticket. KRB5 authentication works without a running KDC when the client has the service ticket cached.
  • I'm running ldapwhoami sequentially.

@tiran
Contributor Author

tiran commented Dec 18, 2020

slapi_ldap_bind performs serialization for mech GSSAPI only. I can reproduce slow bind with mech GSS-SPNEGO, too.

if (mech && !strcmp(mech, "GSSAPI")) {
    krb5_serialized = 1;
}

tiran added a commit to tiran/389-ds-base that referenced this issue Dec 18, 2020
Add log trace for sasl_server_start() and sasl_server_step() to
investigate performance issues with SASL bind.

sasl_errstring() perform a simple and fast switch case mapping from
error code to const string.

See: 389ds#4506
Signed-off-by: Christian Heimes <cheimes@redhat.com>
@tiran
Contributor Author

tiran commented Dec 18, 2020

I just realized that the problem may also be on the client side. SASL GSSAPI authentication takes multiple round trips, and the client may be slow reading or responding to the SASL responses. I'm also seeing the same issue with python-ldap, which is just a wrapper for the OpenLDAP and Cyrus SASL client libs.

We need to keep in mind that it might be the client side, or an issue in the SASL library on either the server or the client side.

@tiran
Contributor Author

tiran commented Dec 19, 2020

I noticed that the performance of the SASL bind operation has a peculiar distribution. It's not random at all: slow bind operations are slower by exactly a multiple of 250 ms. This cannot be a coincidence. It looks like a sleep(0.250) or a background thread that blocks SASL for multiples of 250 ms.

3ms     329
4ms     1004
5ms     52
6ms     1
253ms   182
254ms   324
255ms   66
256ms   3
263ms   1
502ms   1
503ms   26
504ms   9
505ms   1
753ms   1

2000 SASL GSSAPI binds over LDAPI. I only measured the actual bind operation with a perf counter:

import time

import ldap


def profile(uri, meth):
    conn = ldap.initialize(uri)
    conn.whoami_s()  # start connection
    start = time.perf_counter()
    try:
        conn.sasl_interactive_bind_s('', meth)
        return time.perf_counter() - start
    finally:
        conn.unbind()
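
A driver along these lines would produce the distribution above (a hypothetical reconstruction; the actual test script is linked further down in the thread). It assumes the profile() function just shown and the LDAPI socket path from the reproduction steps:

import collections

import ldap.sasl

# Run profile() 2000 times and bucket the bind times into 1 ms bins,
# similar to the table above.
URI = "ldapi://%2Frun%2Fslapd-IPA-TEST.socket"
counts = collections.Counter()
for _ in range(2000):
    ms = round(profile(URI, ldap.sasl.gssapi()) * 1000)
    counts[ms] += 1
for ms in sorted(counts):
    print("%dms\t%d" % (ms, counts[ms]))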

@tiran
Contributor Author

tiran commented Dec 19, 2020

It's the wakeup timer! I changed the wakeup period from 250 to 25 and the performance of SASL bind operations changed dramatically. The fastest operations are a bit slower, and the slow operations are now quantized in multiples of 25 ms.

Is it possible that some SASL calls are either blocked by or executed in the main thread?

--- a/ldap/servers/slapd/daemon.c
+++ b/ldap/servers/slapd/daemon.c
@@ -61,9 +61,9 @@
 #endif /* ENABLE_LDAPI */
 
 #if defined(LDAP_IOCP)
-#define SLAPD_WAKEUP_TIMER 250
+#define SLAPD_WAKEUP_TIMER 25
 #else
-#define SLAPD_WAKEUP_TIMER 250
+#define SLAPD_WAKEUP_TIMER 25
 #endif
 
int slapd_wakeup_timer = SLAPD_WAKEUP_TIMER; /* time in ms to wakeup */

6ms     896
7ms     230
8ms     19
9ms     7
10ms    1
11ms    1
12ms    1
13ms    3
14ms    3
15ms    1
28ms    4
29ms    322
30ms    377
31ms    45
32ms    6
33ms    1
34ms    2
35ms    2
36ms    4
37ms    3
38ms    1
52ms    33
53ms    30
54ms    4
55ms    2
61ms    1
77ms    1

@Firstyear
Contributor

Riggghhhttttt, interesting. That would indicate to me that there is a problem with sasl/krb in signalling that the socket is ready for work.

@tiran
Contributor Author

tiran commented Jan 4, 2021

I have tried a different patch that only modifies the timeout of POLL_FN (PR_Poll()). I'm now convinced that the performance issue is related to the core connection handling loop in the slapd_daemon() function:

/* The meat of the operation is in a loop on a call to select */
while (!g_get_shutdown()) {
    int select_return = 0;
    PRErrorCode prerr;
    setup_pr_read_pds(the_connection_table, n_tcps, s_tcps, i_unix, &num_poll);
    select_return = POLL_FN(the_connection_table->fd, num_poll, pr_timeout);
    switch (select_return) {
    case 0: /* Timeout */
        break;
    case -1: /* Error */
        prerr = PR_GetError();
        slapi_log_err(SLAPI_LOG_TRACE, "slapd_daemon", "PR_Poll() failed, " SLAPI_COMPONENT_NAME_NSPR " error %d (%s)\n",
                      prerr, slapd_system_strerror(prerr));
        break;
    default: /* either a new connection or some new data ready */
        /* handle new connections from the listeners */
        handle_listeners(the_connection_table);
        /* handle new data ready */
        handle_pr_read_ready(the_connection_table, connection_table_size);
        clear_signal(the_connection_table->fd);
        break;
    }
}

@tbordaz
Contributor

tbordaz commented Jan 4, 2021

This is a very nice finding. I have to admit I am a bit puzzled.

Reducing pr_timeout to 0.025s means that if no operation (like a sasl_bind) comes in within 0.025s on an established connection, the loop will time out, recompute the fd set and poll again. So the gain is that new incoming connections are taken into account faster.
My understanding would be that:

3ms     329
4ms     1004
5ms     52
6ms     1
253ms   182
...

~1380 (out of 2000) connections were established on the first poll and their sasl_bind processed.
The remaining connections (e.g. the 182) had to wait for the poll timeout before being taken into account.
The poor response time is also because no new operation comes in on the first ~1380 connections (after their sasl_bind).

Is it possible to test a scenario with 2000 established connections, then 2000 SASL binds, to confirm whether or not pr_timeout impacts the response time?

At the moment, I have the feeling that reducing the timeout improves the response time of new incoming connections (the established connections being idle after their sasl_bind).

The benefit in response time will come at the cost of recomputing the fd set for the established connections more often.
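
A sketch of the proposed scenario with python-ldap (hypothetical, only to illustrate the test: establish all connections first, then measure nothing but the SASL binds):

import time

import ldap
import ldap.sasl

# Establish N connections first, then issue the SASL binds, so bind latency
# is measured only on already-established connections. N may need to be
# lowered (or fd limits raised) depending on the environment.
URI = "ldapi://%2Frun%2Fslapd-IPA-TEST.socket"
N = 2000

conns = []
for _ in range(N):
    conn = ldap.initialize(URI)
    conn.whoami_s()  # force the connection to be established
    conns.append(conn)

timings = []
for conn in conns:
    start = time.perf_counter()
    conn.sasl_interactive_bind_s("", ldap.sasl.gssapi())
    timings.append(time.perf_counter() - start)
    conn.unbind()

print("min %.1f ms, max %.1f ms" % (min(timings) * 1000, max(timings) * 1000))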

@tiran
Contributor Author

tiran commented Jan 4, 2021

I chose 25ms for testing only. My hack is not a proper solution.

You can find my trivial test script at https://gist.github.com/tiran/4b56faf5a8b14b9828ef8ba2ad9292ba

@Firstyear
Contributor

I think the problem here is that if there is no new work, we delay on the POLL_FN and wait up to pr_timeout, which delays the call to handle_listeners to accept new connections. Then in handle_new_connection, when we call accept_and_configure, we again call PR_Accept with pr_timeout set to the slapd_wakeup_timer.

So POLL_FN is not polling the listeners, only connections in the CT. That then cascades to:

  • while accepting, we are not polling the existing connections for new work.
  • while polling for new work, we are delaying the acceptance of new connections.
  • while accepting we may also need to timeout before we can go back to polling.

Which certainly would align with @tiran's observations that binds are delayed sporadically, because a bind generally comes in on a new connection, and since it completes "rapidly", the poll for new work hits the timeout because there are no longer-term connections active in the conntable.

The solution is probably not trivial, but we could consider breaking each listener fd out into its own accept thread and changing their timeout to PR_INTERVAL_NO_TIMEOUT, so that they wake and accept immediately on a new connection, and the core select loop is only for the CT and has no interaction with PR_Accept.

A slightly more "palatable" intermediate fix is, rather than a thread per listener, to break accept out into a single thread and have it iterate on just accepts, i.e. move handle_listeners to a unique thread in a while (!g_get_shutdown()) loop.
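
To illustrate the shape of that intermediate fix in a runnable, generic form (a conceptual sketch in Python, not the actual C patch): accept() blocks in its own thread, so new connections are picked up immediately, while the main loop only polls established connections.

import selectors
import socket
import threading

# Conceptual sketch only: a dedicated accept thread registers new
# connections as soon as they arrive; the main loop polls only established
# connections. The 0.25 s select timeout is kept so the loop can notice
# shutdown and newly registered sockets.
sel = selectors.DefaultSelector()
shutdown = threading.Event()


def accept_loop(listener):
    while not shutdown.is_set():
        conn, _addr = listener.accept()  # blocks; no wakeup timer needed
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)


def main_loop():
    while not shutdown.is_set():
        for key, _events in sel.select(timeout=0.25):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data)  # stand-in for real operation handling
            else:
                sel.unregister(conn)
                conn.close()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(128)
print("listening on", listener.getsockname())
threading.Thread(target=accept_loop, args=(listener,), daemon=True).start()
main_loop()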

@tbordaz Thoughts?

@tbordaz
Contributor

tbordaz commented Jan 5, 2021

I think the problem here is that if there is no new work, we delay on the POLL_FN and wait up to pr_timeout, which delays the call to handle_listeners to accept new connections. Then in handle_new_connection, when we call accept_and_configure, we again call PR_Accept with pr_timeout set to the slapd_wakeup_timer.

So POLL_FN is not polling the listeners, only connections in the CT. That then cascades to:

....

@tbordaz Thoughts?

This is exactly what puzzled me. IMHO the listeners are polled: ct->fd (the polled array) is filled with n_tcps, s_tcps, i_unix, then with established & !gettingber connections.
Before diving into patch review, do you mind showing me in which cases the listeners are excluded from the poll?
Also note that changing only the daemon/poll pr_timeout improves response time, so I am not sure how the pr_timeout in accept_and_configure contributes.

@Firstyear
Contributor

@tbordaz Yeah, @tiran reported my hack fix didn't work, so I'm going to re-read to see what I misunderstood. Something seems wrong here, but I think it's because we combine accept and read IO events in the one poll loop.

@tbordaz
Contributor

tbordaz commented Jan 6, 2021

slapi_ldap_bind performs serialization for mech GSSAPI only. I can reproduce slow bind with mech GSS-SPNEGO, too.

if (mech && !strcmp(mech, "GSSAPI")) {
    krb5_serialized = 1;
}

@tiran, this portion of code is specific to GSSAPI auth on outgoing connections (replication agreements, chaining). It is not related to the reported concern (the response time of incoming GSSAPI auth).

Firstyear added a commit to Firstyear/389-ds-base that referenced this issue Jan 12, 2021
Bug Description: While investigating 4506 it was noticed that
it was possible to exceed the capacity of the connection table
fd array if you had many listeners and a large number of
connections. The number of connections required and in the
correct state to cause this is in the thousands and would
be infeasible in reality, but it is still worth defending
from this.

Fix Description: Add the correct bound on the while loop
setting up the fd for polling.

relates: 389ds#4506

Author: William Brown <william@blackhats.net.au>

Review by: ???
Firstyear added a commit to Firstyear/389-ds-base that referenced this issue Jan 19, 2021
Bug Description: Previously we accepted connections and
selected for new work in the same event loop. This could
cause connection table polling to delay accepts, and
accepts to delay connection activity from being ready.

Fix Description: This seperates those functions allowing
accept to occur in parallel to our normal work.

fixes: 389ds#4506

Author: William Brown <william@blackhats.net.au>

Review by: ???
Firstyear added a commit that referenced this issue Jan 19, 2021
Issue 4506 - RFE - connection accept thread

Bug Description: Previously we accepted connections and
selected for new work in the same event loop. This could
cause connection table polling to delay accepts, and
accepts to delay connection activity from being ready.

Fix Description: This seperates those functions allowing
accept to occur in parallel to our normal work.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @progier389 (Thanks!)
@mreynolds389
Contributor

We have a major problem. This broke LDAPI, and with it the installer and CLI tools, which now hang indefinitely. Please fix this soon, or we will have to revert it.

mreynolds389 reopened this Jan 19, 2021
@Firstyear
Contributor

@mreynolds389 Please revert it, I can't reproduce this so it may take me some time to investigate and solve. I will re-submit once I can reproduce and have a fix. Are there more details about the hang?

@mreynolds389
Contributor

I took a stack trace:

Here are the only two threads of interest during the hang:

Thread 26 (Thread 0x7f9d9fdff700 (LWP 1639111)):
#0  0x00007f9df89a3aaf in poll () from target:/lib64/libc.so.6
#1  0x00007f9df867d416 in poll (__timeout=250, __nfds=2, __fds=0x7f9d9fdfcfc0) at /usr/include/bits/poll2.h:46
#2  _pr_poll_with_poll (pds=0x7f9da9a02000, npds=npds@entry=2, timeout=timeout@entry=250) at ../../.././nspr/pr/src/pthreads/ptio.c:4227
#3  0x00007f9df8680e29 in PR_Poll (pds=<optimized out>, npds=npds@entry=2, timeout=timeout@entry=250) at ../../.././nspr/pr/src/pthreads/ptio.c:4647
#4  0x0000555d51d61739 in accept_thread (vports=0x7ffd02f0a4f0) at ldap/servers/slapd/daemon.c:838
#5  0x00007f9df8682554 in _pt_root (arg=0x7f9df440ea00) at ../../.././nspr/pr/src/pthreads/ptthread.c:201
#6  0x00007f9df8614432 in start_thread () from target:/lib64/libpthread.so.0
#7  0x00007f9df89ae913 in clone () from target:/lib64/libc.so.6


Thread 1 (Thread 0x7f9df792a2c0 (LWP 1639086)):
#0  0x00007f9df89a3aaf in poll () from target:/lib64/libc.so.6
#1  0x00007f9df867d416 in poll (__timeout=250, __nfds=1, __fds=0x7ffd02f0a050) at /usr/include/bits/poll2.h:46
#2  _pr_poll_with_poll (pds=0x7f9daf4ed240, npds=npds@entry=1, timeout=timeout@entry=250) at ../../.././nspr/pr/src/pthreads/ptio.c:4227
#3  0x00007f9df8680e29 in PR_Poll (pds=<optimized out>, npds=npds@entry=1, timeout=timeout@entry=250) at ../../.././nspr/pr/src/pthreads/ptio.c:4647
#4  0x0000555d51d60355 in slapd_daemon (ports=<optimized out>) at ldap/servers/slapd/daemon.c:1469
#5  0x0000555d51d53f7f in main (argc=5, argv=0x7ffd02f0a918) at ldap/servers/slapd/main.c:1119

I'll try and do some more testing tonight if I get time...

@Firstyear
Contributor

What test produced the issue?

@mreynolds389
Contributor

What test produced the issue?

Running dscreate :-) Then I tried dsconf and it also just hung. It appears to be LDAPI specific... I'm rebuilding the server now to confirm the behavior. I should have confirmation soon...

@mreynolds389
Contributor

dscreate hangs when it tries to bind as root:

DEBUG: open(): Connecting to uri ldapi://%2Fvar%2Frun%2Fslapd-localhost.socket
DEBUG: Using dirsrv ca certificate /etc/dirsrv/slapd-localhost
DEBUG: Using external ca certificate /etc/dirsrv/slapd-localhost
DEBUG: Using external ca certificate /etc/dirsrv/slapd-localhost
DEBUG: Using /etc/openldap/ldap.conf certificate policy
DEBUG: ldap.OPT_X_TLS_REQUIRE_CERT = 2
DEBUG: open(): Using root autobind ...

Same thing for dsconf. The server is not detecting/accepting the LDAPI connection.

Firstyear added a commit to Firstyear/389-ds-base that referenced this issue Jan 20, 2021
Bug Description: during review it was requested that a piece
of code be changed which seemed quite innocent. The code was
moved but the logic around the code wasn't considered
causing the fd array for the accept thread to be allocated with
a size of zero, causing the values to be lost.

Fix Description: Move the allocation to the correct location.

fixes: 389ds#4506

Author: William Brown <william@blackhats.net.au>

Review by: ???
Firstyear added a commit that referenced this issue Jan 21, 2021
Bug Description: during review it was requested that a piece
of code be changed which seemed quite innocent. The code was
moved but the logic around the code wasn't considered
causing the fd array for the accept thread to be allocated with
a size of zero, causing the values to be lost.

Fix Description: Move the allocation to the correct location.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @droideck
@tbordaz
Contributor

tbordaz commented Jan 21, 2021

a4a53e1..f3bedfd Issue 4506 - BUG - fix oob alloc for fds (#4555)
9015bff..e4f282e Issue 4506 - Temporary fix for io issues (#4516)

mreynolds389 pushed a commit that referenced this issue Jan 28, 2021
Issue 4506 - RFE - connection accept thread

Bug Description: Previously we accepted connections and
selected for new work in the same event loop. This could
cause connection table polling to delay accepts, and
accepts to delay connection activity from being ready.

Fix Description: This seperates those functions allowing
accept to occur in parallel to our normal work.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @progier389 (Thanks!)
mreynolds389 pushed a commit that referenced this issue Jan 28, 2021
Bug Description: during review it was requested that a piece
of code be changed which seemed quite innocent. The code was
moved but the logic around the code wasn't considered
causing the fd array for the accept thread to be allocated with
a size of zero, causing the values to be lost.

Fix Description: Move the allocation to the correct location.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @droideck
mreynolds389 removed the needs triage label Jan 28, 2021
mreynolds389 added a commit to mreynolds389/389-ds-base that referenced this issue Apr 30, 2021
Description:

Converted all SLAPI_LOG_TRACE logging to Connection logging (SLAPI_LOG_CONNS).

sasl_errstring() perform a simple and fast switch case mapping from
error code to const string.

relates : 389ds#4506

Signed-off-by: Christian Heimes <cheimes@redhat.com>

Reviewed by: mreynolds
mreynolds389 added a commit that referenced this issue Jun 16, 2021
Description:

Converted all SLAPI_LOG_TRACE logging to Connection logging (SLAPI_LOG_CONNS).

sasl_errstring() perform a simple and fast switch case mapping from
error code to const string.

relates : #4506

Signed-off-by: Christian Heimes <cheimes@redhat.com>

Reviewed by: mreynolds
mreynolds389 added a commit that referenced this issue Jun 16, 2021
Description:

Converted all SLAPI_LOG_TRACE logging to Connection logging (SLAPI_LOG_CONNS).

sasl_errstring() perform a simple and fast switch case mapping from
error code to const string.

relates : #4506

Signed-off-by: Christian Heimes <cheimes@redhat.com>

Reviewed by: mreynolds
mreynolds389 added a commit that referenced this issue Jun 16, 2021
Description:

Converted all SLAPI_LOG_TRACE logging to Connection logging (SLAPI_LOG_CONNS).

sasl_errstring() perform a simple and fast switch case mapping from
error code to const string.

relates : #4506

Signed-off-by: Christian Heimes <cheimes@redhat.com>

Reviewed by: mreynolds
@mreynolds389
Contributor

1e3f32d..fa46922 389-ds-base-1.4.4 -> 389-ds-base-1.4.4
f8f616b..ddc2803 389-ds-base-1.4.3 -> 389-ds-base-1.4.3

jchapma pushed a commit to jchapma/389-ds-base that referenced this issue Aug 11, 2021
Issue 4506 - RFE - connection accept thread

Bug Description: Previously we accepted connections and
selected for new work in the same event loop. This could
cause connection table polling to delay accepts, and
accepts to delay connection activity from being ready.

Fix Description: This seperates those functions allowing
accept to occur in parallel to our normal work.

fixes: 389ds#4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @progier389 (Thanks!)
jchapma pushed a commit to jchapma/389-ds-base that referenced this issue Aug 11, 2021
Bug Description: during review it was requested that a piece
of code be changed which seemed quite innocent. The code was
moved but the logic around the code wasn't considered
causing the fd array for the accept thread to be allocated with
a size of zero, causing the values to be lost.

Fix Description: Move the allocation to the correct location.

fixes: 389ds#4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @droideck
jchapma added a commit that referenced this issue Aug 16, 2021
Issue 4506 - Temporary fix for io issues (#4516)

Bug Description: Previously we accepted connections and
selected for new work in the same event loop. This could
cause connection table polling to delay accepts, and
accepts to delay connection activity from being ready.

Fix Description: This seperates those functions allowing
accept to occur in parallel to our normal work.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @progier389 (Thanks!)

Issue 4506 - BUG - fix oob alloc for fds (#4555)

Bug Description: during review it was requested that a piece
of code be changed which seemed quite innocent. The code was
moved but the logic around the code wasn't considered
causing the fd array for the accept thread to be allocated with
a size of zero, causing the values to be lost.

Fix Description: Move the allocation to the correct location.

fixes: #4506

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389 @droideck

Co-authored-by: Firstyear <william@blackhats.net.au>