Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httpd/3.2.2 segfaults when accessing /admin/proc #3154

Closed
felixjogris opened this issue Aug 23, 2020 · 17 comments
Closed

httpd/3.2.2 segfaults when accessing /admin/proc #3154

felixjogris opened this issue Aug 23, 2020 · 17 comments
Labels
3.2 affects 3.2 3.3 affects 3.3 dev releases

Comments

@felixjogris
Copy link

Hi,

httpd of a non-murder 3.2.2 setup segfaults when accessing /admin/proc with a webbrowser to get a list of currently running services. Syslog on the server:

process type:SERVICE name:http path:/usr/local/cyrus/libexec/httpd age:0.586s pid:9828 signaled to death by signal 11 (Segmentation fault, core dumped)

lldb reveals:

$ lldb -c ./httpd.core /usr/local/cyrus/libexec/httpd                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
(lldb) target create "/usr/local/cyrus/libexec/httpd" --core "./httpd.core"
Core file '/root/httpd.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'httpd', stop reason = signal SIGSEGV
  * frame #0: 0x000000000022eb7a httpd`sort_procinfo(k=0x0000000802243700, pa=0x0000000802243708, pb=0x00007fffffff9ff7) at http_admin.c:505:9
    frame #1: 0x0000000800c5b193 libcyrus.so.0`qsort_r_compar_func(a=0x0000000802243700, b=0x0000000802243708) at cyr_qsort_r.c:24:12
    frame #2: 0x0000000801731a70 libc.so.7`qsort + 4256
    frame #3: 0x0000000800c5b13a libcyrus.so.0`qsort_r(base=0x0000000802243700, nmemb=5, size=8, thunk=0x00007fffffff9ff7, compar=(httpd`sort_procinfo at http_admin.c:494)) at cyr_qsort_r.c:33:5
    frame #4: 0x000000000022c16a httpd`action_proc(txn=0x00007fffffffa718) at http_admin.c:615:5
    frame #5: 0x000000000022b8c6 httpd`meth_get(txn=0x00007fffffffa718, params=0x0000000000000000) at http_admin.c:212:17
    frame #6: 0x000000000029a763 httpd`process_request(txn=0x00007fffffffa718) at httpd.c:1799:9
    frame #7: 0x00000000002a3220 httpd`http1_input(txn=0x00007fffffffa718) at httpd.c:1871:29
    frame #8: 0x000000000029826f httpd`cmdloop(conn=0x00007fffffffd9a8) at httpd.c:2015:13
    frame #9: 0x0000000000297c94 httpd`service_main(argc=1, argv=0x0000000801bf4000, envp=0x00007fffffffed20) at httpd.c:951:5
    frame #10: 0x00000000002b0205 httpd`main(argc=1, argv=0x00007fffffffed10, envp=0x00007fffffffed20) at service.c:0:5
    frame #11: 0x000000000022b10f httpd`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7

Trivial fix:

--- imap/http_admin.c.orig      2020-06-22 02:44:18.000000000 +0200
+++ imap/http_admin.c   2020-08-24 00:38:50.095298000 +0200
@@ -493,47 +493,47 @@
                          void *k)
 {
     int r;
-    const struct proc_info **a = (const struct proc_info**)pa;
-    const struct proc_info **b = (const struct proc_info**)pb;
+    const struct proc_info *a = (const struct proc_info*)pa;
+    const struct proc_info *b = (const struct proc_info*)pb;
     char *key = (char*)k;
     int rev = islower((int) *key);
 
     switch (toupper((int) *key)) {
     default:
     case 'P':
-        r = (*a)->pid - (*b)->pid;
+        r = a->pid - b->pid;
         break;
 
     case 'S':
-        r = strcmp((*a)->servicename, (*b)->servicename);
+        r = strcmp(a->servicename, b->servicename);
         break;
 
     case 'Q':
-        r = (*a)->state - (*b)->state;
+        r = a->state - b->state;
         break;
 
     case 'T':
-        r = (*a)->start - (*b)->start;
+        r = a->start - b->start;
         break;
 
     case 'V':
-        r = (*a)->vmsize - (*b)->vmsize;
+        r = a->vmsize - b->vmsize;
         break;
 
     case 'H':
-        r = strcmp((*a)->host, (*b)->host);
+        r = strcmp(a->host, b->host);
         break;
 
     case 'U':
-        r = strcmp((*a)->user, (*b)->user);
+        r = strcmp(a->user, b->user);
         break;
 
     case 'R':
-        r = strcmp((*a)->mailbox, (*b)->mailbox);
+        r = strcmp(a->mailbox, b->mailbox);
         break;
 
     case 'C':
-        r = strcmp((*a)->cmdname, (*b)->cmdname);
+        r = strcmp(a->cmdname, b->cmdname);
         break;
     }

My environment:

$ uname -v
FreeBSD 12.1-RELEASE-p8 GENERIC 
$ pkg info | grep -i cyrus
cyrus-imapd32-3.2.2            Cyrus mail server, supporting POP3 and IMAP4 protocols
cyrus-sasl-2.1.27_1            RFC 2222 SASL (Simple Authentication and Security Layer)
$ grep -E 'altname|unixhi|virtdom' /usr/local/etc/imapd.conf
altnamespace:                   1
unixhierarchysep:               0
virtdomains:                    userid
@elliefm
Copy link
Contributor

elliefm commented Aug 24, 2020

I think the double-dereferencing there is deliberate, and that the real problem here is that FreeBSD is differentish from other platforms in some way that we aren't properly accommodating.

Have a look at 8e23292 -- we're going to some lengths here to detect platform quirks and interact with various qsort_r()s correctly.

I don't understand qsort_r's portability concerns well enough to just intuit the correct fix, but I think it will involve changing the block of macros before sort_procinfo(), and not changing sort_procinfo() itself.

@elliefm
Copy link
Contributor

elliefm commented Aug 24, 2020

Ahh, those macros were later moved to lib/cyr_qsort_r.h by 3302c80, which is probably why you didn't see them.

Do you have cunit installed? There's a unit test for this function... what does make check-cyr_qsort_r report?

@felixjogris
Copy link
Author

After ./configure'ing with --enable-unit-tests:

/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2 # gmake check-cyr_qsort_r
depbase=`echo cunit/timeofday.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
cc -DHAVE_CONFIG_H -I.  -I/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2/com_err/et -I. -I./lib -I. -I./lib -DLIBEXEC_DIR=\"/usr/local/cyrus/libexec\" -DSBIN_DIR=\"/usr/local/cyrus/sbin\" -DSYSCONF_DIR=\"/usr/local/etc\" -DHAVE_CONFIG_H  -I/usr/local/include -I/usr/include -I/usr/local/include   -I/usr/local/include/libxml2   -I/usr/local/include   -I/usr/local/include  -I/usr/local/include   -I/usr/local/include  -I/usr/local/include  -I/usr/local/include   -I/usr/local/include   -I/usr/local/include -I/usr/local/include -DPIC -fPIC  -O2 -pipe  -fstack-protector-strong -fno-strict-aliasing  -MT cunit/timeofday.o -MD -MP -MF $depbase.Tpo -c -o cunit/timeofday.o cunit/timeofday.c &&\
mv -f $depbase.Tpo $depbase.Po
cunit/timeofday.c:213:2: error: "Don't know how to intercept gettimeofday for this libc"
#error "Don't know how to intercept gettimeofday for this libc"
 ^
1 error generated.
gmake: *** [Makefile:4863: cunit/timeofday.o] Error 1

After adding this to cunit/timeofday.c at line 213:

#elif defined(__FreeBSD__)
static int real_gettimeofday(struct timeval *tv, ...)
{
    return gettimeofday(tv, NULL);
}
/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2 # gmake check-cyr_qsort_r
./cunit/cunit.pl --project cunit/default.cunit --generate-wrapper cunit/imapurl.testc
cc -DHAVE_CONFIG_H -I.  -I/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2/com_err/et -I. -I./lib -I. -I./lib -DLIBEXEC_DIR=\"/usr/local/cyrus/libexec\" -DSBIN_DIR=\"/usr/local/cyrus/sbin\" -DSYSCONF_DIR=\"/usr/local/etc\" -DHAVE_CONFIG_H  -I/usr/local/include -I/usr/include -I/usr/local/include   -I/usr/local/include/libxml2   -I/usr/local/include   -I/usr/local/include  -I/usr/local/include   -I/usr/local/include  -I/usr/local/include  -I/usr/local/include   -I/usr/local/include   -I/usr/local/include -I/usr/local/include -DPIC -fPIC  -O2 -pipe  -fstack-protector-strong -fno-strict-aliasing  -c -o cunit/imapurl.o cunit/imapurl.testc-cunit.c
In file included from cunit/imapurl.testc-cunit.c:2:
In file included from ./cunit/imapurl.testc:1:
/usr/include/malloc.h:3:2: error: "<malloc.h> has been replaced by <stdlib.h>"
#error "<malloc.h> has been replaced by <stdlib.h>"
 ^
In file included from cunit/imapurl.testc-cunit.c:2:
./cunit/imapurl.testc:29:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:55:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:81:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:107:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:137:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:167:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:369:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl.freeme);
    ^
./cunit/imapurl.testc:395:5: warning: implicit declaration of function 'free' is invalid in C99 [-Wimplicit-function-declaration]
    free(iurl2.freeme);
    ^
8 warnings and 1 error generated.

Finally, after replacing a couple of #include <malloc.h> by #include <stdlib.h>:

/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2 # gmake check-cyr_qsort_r
Running unit tests for cyr_qsort_r


     CUnit - A unit testing framework for C - Version 2.1-3
     http://cunit.sourceforge.net/


Suite: cyr_qsort_r
  Test: qsort_r ...FAILED
    1. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=2,i=0)
    2. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=7,i=1)
    3. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=12,i=2)
    4. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=8,i=3)
    5. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=12,i=4)
    6. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=8,i=5)
    7. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=14,i=6)
    8. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=11,i=7)
    9. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=18,i=8)
    10. ./cunit/cyr_qsort_r.testc:24  - CU_ASSERT_EQUAL(data[i]=10,i=9)
    11. ./cunit/cyr_qsort_r.testc:26  - CU_ASSERT_NOT_EQUAL(count=0,0=0)

Run Summary:    Type  Total    Ran Passed Failed Inactive
              suites     42      1    n/a      0        0
               tests    486      1      0      1        0
             asserts     11     11      0     11      n/a

Elapsed time =    0.000 seconds
gmake: *** [Makefile:7122: check-cyr_qsort_r] Error 1

The FreeBSD ports system applies this patch before building:

/usr/ports/mail/cyrus-imapd32/files # cat patch-lib__cyr_qsort_r.c 
--- lib/cyr_qsort_r.c.orig      2019-12-04 02:17:01 UTC
+++ lib/cyr_qsort_r.c
@@ -18,14 +18,14 @@ EXPORTED void cyr_qsort_r(void *base, size_t nmemb, si
 // NOTE: this is kinda ugly, but it's OK if you're not multithreaded
 
 static void *qsort_r_thunk;
-static int (*qsort_r_compar)(const void *, const void *, void *);
+static int (*qsort_r_compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *);
 static int qsort_r_compar_func(const void *a, const void *b)
 {
     return qsort_r_compar(a, b, qsort_r_thunk);
 }
 
 EXPORTED void cyr_qsort_r(void *base, size_t nmemb, size_t size,
-                          int (*compar)(const void *, const void *, void *),
+                          int (*compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *),
                           void *thunk)
 {
     qsort_r_thunk = thunk;

I just added this to qsort_r_compar_func():

 static int qsort_r_compar_func(const void *a, const void *b)
 {
-    return qsort_r_compar QSORT_R_COMPAR_ARGS(a, b, qsort_r_thunk);
+    return qsort_r_compar(a, b, qsort_r_thunk);
 }

Now the unit test succeeds:

/usr/ports/mail/cyrus-imapd32/work/cyrus-imapd-3.2.2 # gmake check-cyr_qsort_r
Running unit tests for cyr_qsort_r


     CUnit - A unit testing framework for C - Version 2.1-3
     http://cunit.sourceforge.net/


Suite: cyr_qsort_r
  Test: qsort_r ...passed

Run Summary:    Type  Total    Ran Passed Failed Inactive
              suites     42      1    n/a      0        0
               tests    486      1      1      0        0
             asserts     11     11     11      0      n/a

Elapsed time =    0.000 seconds

@felixjogris
Copy link
Author

httpd has been running ok with the last change to qsort_r_compar_func() - no segfaults, and html process table is sorted correctly.

elliefm added a commit that referenced this issue Aug 26, 2020
malloc.h is nonstandard, and some platforms complain about it.
Reported in #3154.
elliefm added a commit that referenced this issue Aug 26, 2020
malloc.h is nonstandard, and some platforms complain about it.
Reported in #3154.
@elliefm
Copy link
Contributor

elliefm commented Aug 26, 2020

That's good to hear. I'm having trouble following the sequence of cyr_qsort_r changes in the patches (both the ports system's and your own). Can you please attach your current (working) cyr_qsort_r.h and cyr_qsort_r.c, so that I can diff them against our git sources locally? I think that will make it a bit easier for me to see what's going on.

@elliefm
Copy link
Contributor

elliefm commented Aug 26, 2020

I went looking around to see what other projects experiences of this are, and it's a bit of a minefield. Interestingly, FreeBSD appears to be moving towards using the glibc-style qsort_r argument order: https://reviews.freebsd.org/D17083. I don't completely understand their process but from the discussion it sounds like they're expecting to be glibc-compatible here for FreeBSD-13, so at some point we'll need to update our detection macros in cyr_qsort_r to cope with that.

I don't really understand the motivation behind the existing patch-lib__cyr_qsort_r.c. It would be very interesting to see what happens if you can build/test it without that specific patch -- like, maybe the bug is in the patch.

It would also be interesting to see what happens if you build from a pure source tarball and or a git clone, but skimming through the other patches in the port, it might be fiddly/distracting at the moment. So the main thing I'm interested in, in addition to your current working sources, is what happens when cyr_qsort_r.c specifically is completely unpatched.

@felixjogris
Copy link
Author

cyr_qsort.h gets not modified on FreeBSD:

/* cyr_qsort_r.h -- Cyrus compatibility header for qsort_r
 *
 * Always use the QSORT_R_COMPAR_ARGS macro to declare the
 * comparison function, e.g.
 *
 *     int mysort QSORT_R_COMPAR_ARGS(const void *pa,
 *                                    const void *pb,
 *                                    void *arg)
 *     {
 *          ... your code here ...
 *     }
 *
 */
#ifndef INCLUDED_CYR_QSORT_R_H
#define INCLUDED_CYR_QSORT_R_H

#include "config.h"

#include <stdlib.h>

#if defined(_GNU_SOURCE) && defined (__GLIBC__) && \
        ((__GLIBC__ > 2) || ((__GLIBC__ == 2) && (__GLIBC_MINOR__ >=0)))
#define HAVE_GLIBC_QSORT_R
#endif

#if defined(__NEWLIB__) && \
        ((__NEWLIB__ > 2) || ((__NEWLIB__ == 2) && (__NEWLIB_MINOR__ >= 2)))
#if defined(_GNU_SOURCE)
#define HAVE_GLIBC_QSORT_R
#else
#define HAVE_BSD_QSORT_R
#endif
#endif

#if !defined(HAVE_GLIBC_QSORT_R) && \
        (defined(__FreeBSD__) || defined(__DragonFly__) || defined(__APPLE__))
#define HAVE_BSD_QSORT_R
#endif

#ifdef HAVE_BSD_QSORT_R
#define QSORT_R_COMPAR_ARGS(a,b,c) (c,a,b)
#define cyr_qsort_r(base, nmemb, size, compar, thunk) qsort_r(base, nmemb, size, thunk, compar)
#else
#define QSORT_R_COMPAR_ARGS(a,b,c) (a,b,c)
#  if defined(HAVE_GLIBC_QSORT_R)
#define cyr_qsort_r(base, nmemb, size, compar, thunk) qsort_r(base, nmemb, size, compar, thunk)
#  elif defined(__GNUC__)
extern void cyr_qsort_r(void *base, size_t nmemb, size_t size,
                        int (*compar)(const void *, const void *, void *),
                        void *thunk);
#  else
#    error No qsort_r support
#  endif
#endif

#endif /* INCLUDED_CYR_QSORT_R_H */

Working cyr_qsort.c:

#include "cyr_qsort_r.h"

#ifdef HAVE_FUNCTION_NESTING

EXPORTED void cyr_qsort_r(void *base, size_t nmemb, size_t size,
                          int (*compar)(const void *, const void *, void *),
                          void *thunk)
{
    int compar_func(const void *a, const void *b)
    {
        return compar(a, b, thunk);
    }
    qsort(base, nmemb, size, compar_func);
}

#else

// NOTE: this is kinda ugly, but it's OK if you're not multithreaded

static void *qsort_r_thunk;
static int (*qsort_r_compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *);
static int qsort_r_compar_func(const void *a, const void *b)
{
    return qsort_r_compar QSORT_R_COMPAR_ARGS(a, b, qsort_r_thunk);
}

EXPORTED void cyr_qsort_r(void *base, size_t nmemb, size_t size,
                          int (*compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *),
                          void *thunk)
{
    qsort_r_thunk = thunk;
    qsort_r_compar = compar;
    qsort(base, nmemb, size, qsort_r_compar_func);
    qsort_r_thunk = NULL;
    qsort_r_compar = NULL;
}

#endif

qsort_r() on FreeBSD 12 expects argument void *thunk before function pointer int (*compar)(), and compar() expects thunk as first argument.

@elliefm elliefm added 3.2 affects 3.2 3.3 affects 3.3 dev releases labels Aug 28, 2020
@elliefm
Copy link
Contributor

elliefm commented Aug 28, 2020

This is quite strange: I'm stepping through the macros in cyr_qsort_r.h and I think that the cyr_qsort_r() function in cyr_qsort_r.c should not matter.

Like, we use that function-based implementation if-and-only-if neither "HAVE_GLIBC_QSORT_R" nor "HAVE_BSD_QSORT_R" are defined. If either are defined, then cyr_qsort_r() is a defined as a macro that calls qsort_r() with the appropriate argument re-ordering, and the actual function should be unused.

Notice that the cyr_qsort_r() function is an absolute bare-bones fallback that calls qsort() (NOT qsort_r()!), so we would only expect to be using this function at all on a platform that has no qsort_r!

So, maybe the problem is that the function needs to be not-compiled if we're using the macros... but that would explain compilation errors, but doesn't explain the crash. And it doesn't explain why patching the function fixes the crash. Hmm.

Can you please run something like gcc -I. -Ilib -Iimap -Icom_err/et -E -dD lib/cyr_qsort_r.c > some_file -- this will produce a huge amount of output, so please capture it to a file and attach it as a file here. You can attach a file using the comment box on the Github website, but I'm not sure if you can by email.

The -E argument is "just run the preprocessor, and spit out everything that would have been compiled", and the -dD is "also spit out all the macros that were defined along the way". I'm not sure if these are gcc-specific options or if they're more general, and I'm also not sure which compiler FreeBSD uses by default, so it might take some experimentation to figure out the right invocation that produces the desired output -- I need the preprocessor output including the macro definitions. I also needed to add all the -I (uppercase i) arguments because I was calling gcc directly, rather than having make do it; you might need to fiddle around with these as well.

@felixjogris
Copy link
Author

Please see attached file which is the output of cc -I. -Ilib -Iimap -Icom_err/et -E -dD lib/cyr_qsort_r.c > /tmp/lib_cyr_qsort_r.txt
lib_cyr_qsort_r.txt

@felixjogris
Copy link
Author

cc --version:

FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
Target: x86_64-unknown-freebsd12.1
Thread model: posix
InstalledDir: /usr/bin

@felixjogris
Copy link
Author

felixjogris commented Aug 30, 2020

As cyr_qsort_r() from cyr_qsort_r.c is not hidden by the macros from cyr_qsort_r.h, I'm suggesting this patch based on FreeBSD's ones:

--- lib/cyr_qsort_r.c.orig      2020-08-24 02:27:42.000000000 +0200
+++ lib/cyr_qsort_r.c   2020-08-30 14:29:39.684480000 +0200
@@ -13,19 +13,19 @@
     qsort(base, nmemb, size, compar_func);
 }
 
-#else
+#elif !defined cyr_qsort_r
 
 // NOTE: this is kinda ugly, but it's OK if you're not multithreaded
 
 static void *qsort_r_thunk;
-static int (*qsort_r_compar)(const void *, const void *, void *);
+static int (*qsort_r_compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *);
 static int qsort_r_compar_func(const void *a, const void *b)
 {
-    return qsort_r_compar(a, b, qsort_r_thunk);
+    return qsort_r_compar QSORT_R_COMPAR_ARGS(a, b, qsort_r_thunk);
 }
 
 EXPORTED void cyr_qsort_r(void *base, size_t nmemb, size_t size,
-                          int (*compar)(const void *, const void *, void *),
+                          int (*compar)QSORT_R_COMPAR_ARGS(const void *, const void *, void *),
                           void *thunk)
 {
     qsort_r_thunk = thunk;

It currently runs on my BSD box, avoids compiling cyr_qsort_r if defined by a macro, and honors reversed arguments of compar() on BSD when there's no native qsort_r() available.

@elliefm
Copy link
Contributor

elliefm commented Aug 31, 2020

What happens if you exclude the FreeBSD patch entirely, and only leave this change:

-#else
+#elif !defined cyr_qsort_r

I suspect /admin/proc will work correctly, but it will start breaking in a bunch of new places...

I'm looking very closely at the comment at the top of cyr_qsort_r.h, which reads:

Always use the QSORT_R_COMPAR_ARGS macro to declare the
comparison function, e.g.

int mysort QSORT_R_COMPAR_ARGS(const void *pa,
[...]

and I think its intent is this macro MUST be used for the "compar" functions, but not for cyr_qsort_r.

But, looking through our source, I can see a bunch of places where our compar function has not been declared using this macro (e.g. _contactsquery_cmp() in jmap_contact.c). Whereas sort_procinfo() is declared correctly.

My gut feeling at the moment is that maybe the FreeBSD patch was mistakenly added to fix the calls that were missing the macro. But then that has meant that the ones that are already correctly declared (sort_procinfo() et al) crash because their arguments are now being re-ordered twice.

If my hunch is correct, there will be three parts to this fixing this properly:

  1. compiling cyr_qsort_r() out entirely when it's already being handled as a macro, like your #elif ... change does
  2. fixing all the compar functions to use QSORT_R_COMPAR_ARGS correctly, like sort_procinfo() already does
  3. removing the patch from FreeBSD ports, because the problem it tried to solve has now been solved correctly by 2

I guess step 3 will require coordination with the ports maintainer, once there is a real release containing this set of fixes. I'm not currently sure how to reach them, but I guess the first step is confirming that 1 & 2 work, and we can worry about 3 once we're confident they do.

@elliefm
Copy link
Contributor

elliefm commented Aug 31, 2020

Actually I'm looking at the fallback cyr_qsort_r() function closely again and realising our version doesn't use QSORT_R_COMPAR_ARGS at all, and surely it must need to, otherwise it would still break on BSD's without qsort_r. I think I had misremembered the FreeBSD patch contents. So maybe the FreeBSD patch was already mostly-correct, and with your qsort_r_compar_func() fix it becomes fully-correct. Regardless, we still need to fix all our own compar functions that are missing the macro. I'm working on a patch for that now...

elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Aug 31, 2020
i.e. if the system has NO qsort_r function and we need to fake it
with qsort

Part of cyrusimap#3154
elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Aug 31, 2020
Based on a patch from FreeBSD ports and modified further by @felixjogris

Part of cyrusimap#3154
elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Aug 31, 2020
@elliefm
Copy link
Contributor

elliefm commented Aug 31, 2020

This is where I'm at: cyrus-imapd-3.2...elliefm:v32/3154-cyr_qsort_r

  • 6ad13a8 cyr_qsort_r: only compile fallback function if we need it ...

This is the equivalent of your "#elif !defined cyr_qsort_r" patch, but it also protects the nested-functions implementation, in case clang starts supporting these.

  • b497151 cyr_qsort_r: fallback func needs arguments ordered correctly ...

This is the FreeBSD patch plus your fix. I might still remove this, more discussion below.

  • cd43737 jmap_contact: cyr_qsort_r compar func needs args in correct order ...

And this one fixes up the bad compar function in jmap_contact, which turned out to be the only one that was missing the macro after all.

I have finally understood how the FreeBSD patch worked:

When we #define cyr_qsort_r(...) qsort_r(...) for whichever implementation, this was also rewriting the implementation of the fallback function, which would end up being compiled as qsort_r() -- which would then maybe or maybe not take precedence over the system qsort_r, depending on linking order and such. It seems like on FreeBSD it was taking precedence, which meant that we were never calling the real qsort_r, only our fallback. This explains why patching the fallback fixed the crash!

But with the new commit such that we only compile the fallback function if we need it, it will no longer accidentally replace the real qsort_r, and instead cyr_qsort_r will just be a wrapper macro as intended. So, now the fallback should really be a fallback! This should make the FreeBSD patch and your fix to it redundant for FreeBSD, since it will now use the BSD qsort_r and not the fallback. Actually, it might make it completely redundant for every system (since a system with no qsort_r at all cannot possibly have opinions about the ordering of the thunk argument)... I need to think about this more still.

But I think, all along, the problem was that we didn't make the fallback function conditional, and thus accidentally replaced qsort_r instead of just wrapping it.

Can you please try it out with this set of patches? (You can download any git commit as a patch file from github by just adding ".patch" to the end of the address.) They should apply cleanly on 3.2.2, or you can try with 3.2.3 which came out a few days ago. You'll still need your cunit/timeofday.c patch either way, but the "malloc.h -> stdlib.h" problem is fixed in 3.2.3.

(I didn't integrate the cunit/timeofday.c patch, because it doesn't replace the function, just drops the real one in place. This is adequate for the cyr_qsort_r test, which doesn't care about fiddling with dates/times anyway, but I think it won't work for many of the tests, which expect to be able to mock the system time.)

@felixjogris
Copy link
Author

What happens if you exclude the FreeBSD patch entirely, and only leave this change:

-#else
+#elif !defined cyr_qsort_r

Compiles and runs fine. Sorting on /admin/proc works ok.

@felixjogris
Copy link
Author

Can you please try it out with this set of patches? (You can download any git commit as a patch file from github by just adding ".patch" to the end of the address.) They should apply cleanly on 3.2.2, or you can try with 3.2.3 which came out a few days ago. You'll still need your cunit/timeofday.c patch either way, but the "malloc.h -> stdlib.h" problem is fixed in 3.2.3.

All 3 patches apply and run ok in regards to /admin/proc. FreeBSD uses 3.2.3 already.
I haven't been using JMAP, thus can not test if the changes to imap/jmap_contact.c are ok. But I guess imap/jmap_calendar.c needs patching, too:

--- imap/jmap_calendar.c.orig   2020-08-31 15:54:46.663555000 +0200
+++ imap/jmap_calendar.c        2020-08-31 15:56:04.689266000 +0200
@@ -3897,7 +3897,10 @@
     size_t nsort;
 };
 
-static int eventquery_cmp(const void *va, const void *vb, void *vrock)
+static int eventquery_cmp QSORT_R_COMPAR_ARGS(
+                          const void *va,
+                          const void *vb,
+                          void *vrock)
 {
     enum caldav_sort *sort = ((struct eventquery_cmp_rock*)vrock)->sort;
     size_t nsort = ((struct eventquery_cmp_rock*)vrock)->nsort;

@elliefm
Copy link
Contributor

elliefm commented Sep 1, 2020

Oh, I missed that one somehow, thanks!

Though it turns out the two JMAP ones were already fixed on master a few months, so I'll replace my bespoke commit with a cherry-pick of that one. :)

elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Sep 2, 2020
i.e. if the system has NO qsort_r function and we need to fake it
with qsort

Part of cyrusimap#3154
elliefm added a commit to elliefm/cyrus-imapd that referenced this issue Sep 4, 2020
i.e. if the system has NO qsort_r function and we need to fake it
with qsort

Part of cyrusimap#3154
@elliefm elliefm closed this as completed Sep 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.2 affects 3.2 3.3 affects 3.3 dev releases
Projects
None yet
Development

No branches or pull requests

2 participants