Unrecoverable memory corruption if db concurrency > 6 #542
Help would be appreciated because this issue is murdering us right now.
CC: @domenkozar
I haven't seen this one before, but I'd recommend capturing a core dump that can be inspected.
This should be fixed by NixOS/nix@24b7398. |
@edolstra thank you for the quick response! I will apply that patch to our configuration and test it today. For posterity, here is more context about our configuration. We are pinned to the following version of nixpkgs:

```json
{
  "url": "https://github.com/NixOS/nixpkgs.git",
  "rev": "74286ec9e76be7cd00c4247b9acb430c4bd9f1ce",
  "date": "2018-01-15T12:35:29-05:00",
  "sha256": "13ydgpzl5nix4gc358iy9zjd5nrrpbpwpxmfhis4aai2zmkja3ak",
  "fetchSubmodules": true
}
```

This is the version of Hydra we are running:

We have a few patches applied to the Hydra derivation: one for building pull requests off of our Enterprise GitHub instance, and one fixing a

Our Hydra configuration is:
Our PostgreSQL configuration (some of which is cribbed from https://github.com/NixOS/nixos-org-configurations/blob/master/delft/chef.nix) is:
The Nix configuration has

The host is a bare-metal machine with 251GB of high-speed RAM, two 500GB SSDs, and one 500GB NVMe SSD. The host's CPU hardware is:
We have four build slaves (one of which is the Hydra host itself) that Hydra is configured to use as well; here is the machines' configuration:
I tried to debug the

I will test this patch by not running

I will report back to this issue ticket if we see
I can confirm that

Additionally, another issue we were observing, but didn't think was related and have not seen at all today, was

More time will tell, but I think @edolstra's fix was the ticket.
It was holding on to a Value* (i.e. a std::shared_ptr&lt;ValidPathInfo&gt;*) outside of the pathInfoCache lock, so the std::shared_ptr could be destroyed between the release of the lock and the decrement of the std::shared_ptr refcount. This can happen if more than 'path-info-cache-size' paths are added in the meantime, *or* if clearPathInfoCache() is called. The hydra-queue-runner queue monitor thread periodically calls the latter, so it is likely to trigger a crash. Fixes NixOS/hydra#542. (cherry picked from commit 24b7398)
We're having a fairly serious issue with our Hydra deployment: occasionally (we haven't tracked down what tends to cause this), a double free or other memory-related crash occurs that subsequent restarts of the hydra-queue-runner can't recover from (i.e. they crash immediately upon start):

The only way we've been able to recover from this error is by reducing max_db_concurrency to 6, letting Hydra chew through builds for a while, then increasing max_db_concurrency again.
We can't leave the max_db_concurrency at 6 because Hydra can't otherwise keep up with build volume at all and ends up taking many hours to schedule new builds being added by the evaluator.
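As a sketch, the temporary workaround above amounts to setting something like the following in hydra.conf (option spelling taken from the text above; treat it as an assumption, not verified against the Hydra manual), then raising the value again once the queue drains:

```
# Temporarily throttle the queue runner's DB concurrency so Hydra can
# chew through the backlog, then restore the normal (higher) value.
max_db_concurrency = 6
```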
Has anyone else encountered this before? Is it a PostgreSQL configuration issue?
@edolstra could you weigh in? I've asked in IRC and no one else has any insight into these issues.