Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: after partial write to TCP client, only send remaining data #7971

Closed
wants to merge 1 commit into from

Conversation

Habbie
Copy link
Member

@Habbie Habbie commented Jun 22, 2019

Short description

If dnsdist could not send all buffered data to a TCP client, it would later try to send the full size of the send buffer, from the correct non-zero position, leading it to also send whatever was -behind- the buffer.

I have a somewhat flakey setup for testing this. I haven't figured out how we could write a reliable test for this.

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)
  • checked that this code was merged to master

@Habbie
Copy link
Member Author

Habbie commented Jun 22, 2019

Valgrind output before this PR:

> partial write, res=5432, toWrite=6715, pos=0, sent=0
==20747== Thread 6 dnsdist/tcpClie:
==20747== Syscall param write(buf) points to uninitialised byte(s)
==20747==    at 0x5DEC1CD: ??? (syscall-template.S:84)
==20747==    by 0xBE6EF0: TCPIOHandler::tryWrite(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned long&, unsigned long) (tcpiohandler.hh:249)
==20747==    by 0xBE42A5: handleIO(std::shared_ptr<IncomingTCPConnectionState>&, timeval&) (dnsdist-tcp.cc:1077)
==20747==    by 0xBE4A73: handleIOCallback(int, boost::any&) (dnsdist-tcp.cc:1132)
==20747==    by 0xBF06D7: boost::detail::function::void_function_invoker2<void (*)(int, boost::any&), void, int, boost::any&>::invoke(boost::detail::function::function_buffer&, int, boost::any&) (function_template.hpp:118)
==20747==    by 0xC4346B: boost::function2<void, int, boost::any&>::operator()(int, boost::any&) const (function_template.hpp:771)
==20747==    by 0xC60111: EpollFDMultiplexer::run(timeval*, int) (epollmplexer.cc:165)
==20747==    by 0xBE554C: tcpClientThread(int) (dnsdist-tcp.cc:1192)
==20747==    by 0xBF3B39: void std::_Bind_simple<void (*(int))(int)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) (functional:1391)
==20747==    by 0xBF3A22: std::_Bind_simple<void (*(int))(int)>::operator()() (functional:1380)
==20747==    by 0xBF3853: std::thread::_State_impl<std::_Bind_simple<void (*(int))(int)> >::_M_run() (thread:197)
==20747==    by 0x60B2E6E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==20747==  Address 0x73e924b is 6,715 bytes inside a block of size 13,334 alloc'd
==20747==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==20747==    by 0x798479: __gnu_cxx::new_allocator<unsigned char>::allocate(unsigned long, void const*) (new_allocator.h:104)
==20747==    by 0x790C62: std::allocator_traits<std::allocator<unsigned char> >::allocate(std::allocator<unsigned char>&, unsigned long) (alloc_traits.h:436)
==20747==    by 0x785609: std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_allocate(unsigned long) (stl_vector.h:170)
==20747==    by 0xBEED9C: void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char const*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char const*, unsigned char const*, std::forward_iterator_tag) (vector.tcc:659)
==20747==    by 0xBEC473: void std::vector<unsigned char, std::allocator<unsigned char> >::_M_insert_dispatch<unsigned char const*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char const*, unsigned char const*, std::__false_type) (stl_vector.h:1375)
==20747==    by 0xBE9F70: __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > > std::vector<unsigned char, std::allocator<unsigned char> >::insert<unsigned char const*, void>(__gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char const*, unsigned char const*) (stl_vector.h:1100)
==20747==    by 0xBE14B2: sendResponse(std::shared_ptr<IncomingTCPConnectionState>&, timeval&) (dnsdist-tcp.cc:675)
==20747==    by 0xBE1955: handleResponse(std::shared_ptr<IncomingTCPConnectionState>&, timeval&) (dnsdist-tcp.cc:731)
==20747==    by 0xBE3246: handleDownstreamIO(std::shared_ptr<IncomingTCPConnectionState>&, timeval&) (dnsdist-tcp.cc:961)
==20747==    by 0xBE3B0A: handleDownstreamIOCallback(int, boost::any&) (dnsdist-tcp.cc:1022)
==20747==    by 0xBF06D7: boost::detail::function::void_function_invoker2<void (*)(int, boost::any&), void, int, boost::any&>::invoke(boost::detail::function::function_buffer&, int, boost::any&) (function_template.hpp:118)
==20747== 
partial write, res=5334, toWrite=6714, pos=0, sent=0
partial write, res=2749, toWrite=6714, pos=0, sent=0
partial write, res=2749, toWrite=6714, pos=0, sent=0

The 'partial write' message is generated with:

diff --git a/pdns/tcpiohandler.hh b/pdns/tcpiohandler.hh
index dd82281a7..76ffcb367 100644
--- a/pdns/tcpiohandler.hh
+++ b/pdns/tcpiohandler.hh
@@ -257,6 +259,7 @@ public:
           throw std::runtime_error(std::string("Error while writing message: ") + strerror(errno));
         }
       }
+      if (res < toWrite) cerr<<"partial write, res="<<res<<", toWrite="<<toWrite<<", pos="<<pos<<", sent="<<sent<<endl;
 
       pos += static_cast<size_t>(res);
       sent += static_cast<size_t>(res);

@Habbie
Copy link
Member Author

Habbie commented Jun 22, 2019

I haven't figured out how we could write a reliable test for this.

I suspect the issue can be reproduced by 'breaking' tryWrite to -pretend- that the socket would block. If we can make the socket block on purpose, from outside dnsdist, we can write a real test.

@rgacogne
Copy link
Member

Different fix merged in #7974, closing.

@rgacogne rgacogne closed this Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants