Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: init of ServerPolicies crashes on FreeBSD stable/14 #13766

Closed
omoerbeek opened this issue Feb 5, 2024 · 12 comments · Fixed by #13771
Closed

dnsdist: init of ServerPolicies crashes on FreeBSD stable/14 #13766

omoerbeek opened this issue Feb 5, 2024 · 12 comments · Fixed by #13771
Labels

Comments

@omoerbeek
Copy link
Member

omoerbeek commented Feb 5, 2024

Originally reported by jostreff

* thread #1, name = 'dnsdist', stop reason = signal SIGBUS: object-specific hardware error
    frame #0: 0x000035570a927bc5 dnsdist`ServerPolicy::ServerPolicy(ServerPolicy const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str="roundrobin") at string:899:20
   896 	  _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str)
   897 	      : __r_(__default_init_tag(), __alloc_traits::select_on_container_copy_construction(__str.__alloc())) {
   898 	    if (!__str.__is_long())
-> 899 	      __r_.first() = __str.__r_.first();
   900 	    else
   901 	      __init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
   902 	  }

The backtrace looks like this (using lldb):

* thread #1, name = 'dnsdist', stop reason = signal SIGBUS: object-specific hardware error
  * frame #0: 0x000035570a927bc5 dnsdist`ServerPolicy::ServerPolicy(ServerPolicy const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str="roundrobin") at string:899:20
    frame #1: 0x000035570a927bb4 dnsdist`ServerPolicy::ServerPolicy(this=0x00000000000824f8, (null)=0x0000355f2b9711a0) at dnsdist-lbpolicies.hh:32:7
    frame #2: 0x000035570a9279ce dnsdist`LuaContext::PushedObject LuaContext::Pusher<ServerPolicy, void>::push<ServerPolicy&>(state=0x0000000000076378, value=0x0000355f2b9711a0) at LuaContext.hpp:1647:37
    frame #3: 0x000035570a9278c0 dnsdist`void LuaContext::setTable<ServerPolicy, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, ServerPolicy&>(state=0x0000000000076378, (null)=Globals, index="roundrobin", data=0x0000355f2b9711a0) at LuaContext.hpp:1010:23
    frame #4: 0x000035570a882fd7 dnsdist`setupLuaBindings(LuaContext&, bool, bool) [inlined] void LuaContext::writeVariable<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, ServerPolicy&>(this=<unavailable>, data=<unavailable>, data=<unavailable>) at LuaContext.hpp:737:9
    frame #5: 0x000035570a882faf dnsdist`setupLuaBindings(luaCtx=<unavailable>, client=<unavailable>, configCheck=<unavailable>) at dnsdist-lua-bindings.cc:90:12
    frame #6: 0x000035570aa79c13 dnsdist`setupLua(luaCtx=0x000035570abe8a28, client=<unavailable>, configCheck=false, config="/usr/local/etc/dnsdist.conf") at dnsdist-lua.cc:3399:3
    frame #7: 0x000035570ab7bcc5 dnsdist`main(argc=1, argv=0x0000355f2b971e40) at dnsdist.cc:3272:17
    frame #8: 0x0000355f31eb15ca libc.so.7`__libc_start1(argc=1, argv=0x0000355f2b971e40, env=0x0000355f2b971e50, cleanup=<unavailable>, mainX=(dnsdist`main at dnsdist.cc:3181)) at libc_start1.c:157:7
    frame #9: 0x000035570a664cfd dnsdist`_start at crt1_s.S:83

It might be related to the use of placement new. During the call of the copy ct, it looks like the string library code assumes this is already initialized, which it isn't. When I have the opportunity, I'll try to isolate this into a smaller test.

Not using the full ServerPolicy object but a using a const pointer to it works around the issue, in the sense that dnsdist no longer crashes.

diff --git a/pdns/dnsdist-lua-bindings.cc b/pdns/dnsdist-lua-bindings.cc
index 69638e23e..16121b05d 100644
--- a/pdns/dnsdist-lua-bindings.cc
+++ b/pdns/dnsdist-lua-bindings.cc
@@ -78,7 +78,7 @@ void setupLuaBindings(LuaContext& luaCtx, bool client, bool configCheck)
   luaCtx.registerFunction("toString", &ServerPolicy::toString);
   luaCtx.registerFunction("__tostring", &ServerPolicy::toString);
 
-  ServerPolicy policies[] = {
+  static const ServerPolicy policies[] = {
     ServerPolicy{"firstAvailable", firstAvailable, false},
     ServerPolicy{"roundrobin", roundrobin, false},
     ServerPolicy{"wrandom", wrandom, false},
@@ -86,8 +86,8 @@ void setupLuaBindings(LuaContext& luaCtx, bool client, bool configCheck)
     ServerPolicy{"chashed", chashed, false},
     ServerPolicy{"leastOutstanding", leastOutstanding, false}
   };
-  for (auto& policy : policies) {
-    luaCtx.writeVariable(policy.d_name, policy);
+  for (const auto& policy : policies) {
+    luaCtx.writeVariable(policy.d_name, &policy);
   }
 
 #endif /* DISABLE_POLICIES_BINDINGS */

But I have little idea if that affects Lua code in the wild.
Update: not using a const pointer (removing the two instances of const in the diff) gets me modifiable standard ServerPolicies again. Though I would not know why you would want to change those.

So far I did not succeed to reproduce in a smaller program.

@omoerbeek
Copy link
Member Author

Another thing: only ServerPolicy objects are passed to writeVariable() on startup, there might be other objects that passed to writeVariable() during runtime. It has to be verfied these do not suffer from the same issues as ServerPolicy.

@omoerbeek
Copy link
Member Author

omoerbeek commented Feb 6, 2024

It is an alignment issue. The crash happens in:

    0x94ce77cdbc1 <+33>:  movaps (%r14), %xmm0
->  0x94ce77cdbc5 <+37>:  movaps %xmm0, (%rbx)
    0x94ce77cdbc8 <+40>:  jmp    0x94ce77cdc1e             ; <+126> at dnsdist-lbpolicies.hh:3
....
(lldb) register read 
General Purpose Registers:
       rax = 0x000000000003f788
       rbx = 0x000000000003b4f8
       rcx = 0x0000000000030928
       rdx = 0x0000000000045068
       rdi = 0x000000000003b4f8
       rsi = 0x0000233d17bbc4a0
       rbp = 0x0000233d17bbc360
       rsp = 0x0000233d17bbc330
        r8 = 0x0000000000000100
        r9 = 0x0000000000000100
       r10 = 0x00000000000480f8
       r11 = 0x0000000000047528
       r12 = 0x0000233d1d9288a0  libc.so.7`__stack_chk_guard
       r13 = 0x6568636143746567
       r14 = 0x0000233d17bbc4a0
       r15 = 0x0000233d17bbc4a0
       rip = 0x00002334f70d9bc5  dnsdist`ServerPolicy::ServerPolicy(ServerPolicy const&) + 37 [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 17 at string:899:20
  dnsdist`ServerPolicy::ServerPolicy(ServerPolicy const&) + 20 at dnsdist-lbpolicies.hh:32:7
    rflags = 0x0000000000010246
        cs = 0x0000000000000043
        fs = 0x0013
        gs = 0x001b
        ss = 0x000000000000003b
        ds = 0x003b
        es = 0x003b

rbx is not 16-byte aligned.

@omoerbeek
Copy link
Member Author

omoerbeek commented Feb 6, 2024

Some more info: luajit is indeed returning 8-byte aligned allocations, on this FreeBSD machine, but also on a debian test machine where I compiled dnsdist with clang++ 17.0.6. Why the FreeBSD compiler decides to use instructions that only work on 16-byte aligned data in the string copy constructor and the debian machine not, I don't know. It could be a different version of the c++lib used.

@jostreff
Copy link

jostreff commented Feb 7, 2024

link to other thread where the same bug is commented too https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276786

@omoerbeek omoerbeek changed the title dnsdist: init of ServerPolicies crashses on FreeBSD stable/14 dnsdist: init of ServerPolicies crashes on FreeBSD stable/14 Feb 7, 2024
@omoerbeek
Copy link
Member Author

omoerbeek commented Feb 8, 2024

In https://cgit.freebsd.org/src/commit/contrib/llvm-project/libcxx/include/string?h=stable/14&id=fe013be447cd855ccaf6094a1d06aea570450629 it shows that the string copy constructor was changed form non-inlined to inlined (search for basic_string(const basic_string& __str)
AFAIK, the non-linlined version cannot make any assumption about alignment. The inlined version can in principle, but does poorly in our case, it seems.

@rgacogne
Copy link
Member

rgacogne commented Feb 8, 2024

Nice finding! I'm really surprised this doesn't break all over the place.. Still I guess it's better to work around this issue in dnsdist before we get complaints about it. Otto, would you be able to submit a PR with your patch (as far as I can tell const is OK, these policies should not be modified)? Perhaps we could switch to a std::array in the process. I can handle the PR if you are busy, to be clear!

@omoerbeek
Copy link
Member Author

I can do it later today. But I'm worried other cases might break as well....

@omoerbeek
Copy link
Member Author

This program reproduces the issue:

#include <iostream>
#include <functional>

class ServerPolicy
{
public:
  ServerPolicy(const std::string& name_): d_name(name_)
  {
  }

public:
  std::string d_name;
  std::function<void(void)> f; // if I remove this field, the issue disappears
};

void *all(size_t sz)
{
  char *p = static_cast<char*>(malloc(sz + 8));
  p += 8;
  return static_cast<void*>(p);
}

int
main()
{
  ServerPolicy org = ServerPolicy("the first policy");
  void *p = all(sizeof(ServerPolicy));
  new (p) ServerPolicy(org);
  auto s = static_cast<ServerPolicy*>(p);
  std::cout << s->d_name << std::endl;
}

Compile with
c++ -g -Wall -Wextra -O2 -std=c++17 l.cc

lldb ./a.out
(lldb) target create "./a.out"
Current executable set to '/root/a.out' (x86_64).
(lldb) run
Process 2493 launched: '/root/a.out' (x86_64)
Process 2493 stopped
* thread #1, name = 'a.out', stop reason = signal SIGBUS: object-specific hardware error
    frame #0: 0x0000000000201ff4 a.out`main [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str="the first policy") at string:899:20
   896 	  _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string(const basic_string& __str)
   897 	      : __r_(__default_init_tag(), __alloc_traits::select_on_container_copy_construction(__str.__alloc())) {
   898 	    if (!__str.__is_long())
-> 899 	      __r_.first() = __str.__r_.first();
   900 	    else
   901 	      __init_copy_ctor_external(std::__to_address(__str.__get_long_pointer()), __str.__get_long_size());
   902 	  }
(lldb) disassemble 
a.out`main:
    0x201fc0 <+0>:   pushq  %rbp
    0x201fc1 <+1>:   movq   %rsp, %rbp
    0x201fc4 <+4>:   pushq  %r14
    0x201fc6 <+6>:   pushq  %rbx
    0x201fc7 <+7>:   subq   $0x60, %rsp
    0x201fcb <+11>:  movb   $0x20, -0x70(%rbp)
    0x201fcf <+15>:  movups -0x137d(%rip), %xmm0
    0x201fd6 <+22>:  movups %xmm0, -0x6f(%rbp)
    0x201fda <+26>:  movb   $0x0, -0x5f(%rbp)
    0x201fde <+30>:  movq   $0x0, -0x30(%rbp)
    0x201fe6 <+38>:  movl   $0x58, %edi
    0x201feb <+43>:  callq  0x202470                  ; symbol stub for: malloc
    0x201ff0 <+48>:  movaps -0x70(%rbp), %xmm0
->  0x201ff4 <+52>:  movaps %xmm0, 0x8(%rax)
    0x201ff8 <+56>:  movq   -0x60(%rbp), %rcx
    0x201ffc <+60>:  movq   %rcx, 0x18(%rax)
    0x202000 <+64>:  movq   $0x0, 0x48(%rax)
    0x202008 <+72>:  movzbl 0x8(%rax), %edi
    0x20200c <+76>:  leaq   0x9(%rax), %rsi
...
(lldb) register read rax
     rax = 0x00002870d3c09000
(lldb) 

@omoerbeek
Copy link
Member Author

Another finding:

12:21 < ottom> btw std::cout << alignof(std::string) << std::endl; prints 8
12:22 < ottom> hmm, but alignof(ServerPolicy) prints 16

@omoerbeek
Copy link
Member Author

omoerbeek commented Feb 8, 2024

So the summarize things:

  1. ServerPolicy has alignment requirement 16, because it contains a function object, which has alignment requirement 16, because the implementation of std::function has a std::__1::aligned_storage<24ul, 16ul>::type __buf_ field.
  2. luajit's allocator returns 8-byte aligned objects.
  3. The placement new constructs a ServerPolicy object in a non-16 byte aligned piece of memory using the string copy constructor.
  4. The compiler is smart enough to conclude from the 16-byte alignment requirement of ServerPolicy that the string field (which is the the first field of it) is (or at least should be) also 16-byte aligned.
  5. The inlined copy constructor uses this knowledge and uses fast xmm instructions that require 16-byte alignment.
  6. The data isn't actually 16-byte aligned -> SIGBUS.

The fundamental solution would be to to change the default alignment of the luajit allocator to agree to
what the C++ compiler and c++lib assume.

omoerbeek added a commit to omoerbeek/pdns that referenced this issue Feb 8, 2024
… lua(jit)

luajit aligns only to 8 bytes by default, and some objects require 16 byte alignment.
Fixes PowerDNS#13766
omoerbeek added a commit to omoerbeek/pdns that referenced this issue Feb 8, 2024
… lua(jit)

luajit aligns only to 8 bytes by default, and some objects require
16 byte alignment.

Fixes PowerDNS#13766

Note that the static assert in LuaContext.hpp is commented out in
one case.  This trips on some platforms, but does not seem to be
harmful right now.

The fundamental solution remains the have luajit agree with C++ on
minimal alignment of its allocators.
@omoerbeek
Copy link
Member Author

13771-for-dist.diff.txt

@jostreff
Copy link

jostreff commented Feb 8, 2024

I confirm that now it builds and runs fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants