Q0017

Nigel Metheringham edited this page Nov 29, 2012 · 2 revisions
Clone this wiki locally

Q0017

Question

Whenever Exim tries to deliver a specific message to a particular server, it fails, giving the error Remote end closed connection after data or Broken pipe or a timeout. What's going on?

Answer

Broken pipe is the error you get on some OS when the remote host just drops the connection. The alternative is connection reset by peer. There are many potential causes. Here are some of them (see also `../Q0068`_):

  1. There are some firewalls that fall over on binary zero characters mail. Have a look, e.g. with hexdump -c mymail | tail to see if mail contains any binary zero characters.

  2. There are broken SMTP servers around that just drop the connection r the data has been sent if they don't like the message for some on (e.g. it is too big) instead of sending a 5xx error code. Have tried sending a small message to the same address?

It has been reported that some releases of Novell servers running NIMS are unable to handle lines longer than 1024 characters, and just close the connection. This is an example of this behaviour.

  1. If the problem occurs right at the start of the mail, then it could network problem with mishandling of large packets. Many emails are l and thus appear to propagate correctly, but big emails will rate big IP datagrams.

There have been problems when something in the middle of the network mishandles large packets due to IP tunnelling. In a tunnelled link, your IP datagrams gets wrapped in a larger datagram and sent over a network. This is how virtual private networks (VPNs), and some ISP transit circuits work. Since the datagrams going over the tunnel require a larger packet size, the tunnel needs a bigger maximum transfer unit (MTU) in the network handling the tunnelled packets. However, MTUs are often fixed, so the tunnel will try to fragment the packets. If the systems outside the tunnel are using path MTU discovery, (most Sun Sparc Solaris machines do by default), and set the DF (don't fragment) bit because they don't send packets larger than their local MTU, then ICMP control messages will be sent by the routers at the ends of the tunnel to tell them to reduce their MTU, since the tunnel can't fragment the data, and has to throw it away. If this mechanism stops working, e.g. a firewall blocks ICMP, then your host never knows it has hit the maximum path MTU, but it has received no ACK on the packet either, so it continues to resend the same packet and the connection stalls, eventually timing out. You can test the link using pings of large packets and see what works:

ping -s host 2048

Try reducing the MTU on the sending host:

ifconfig le0 mtu 1300

Alternatively, you can reduce the size of the buffer Exim uses for SMTP output by putting something like

DELIVER_OUT_BUFFER_SIZE=512

in your Local/Makefile and rebuilding Exim (the default is 8192). While this should not in principle have any effect on the size of packets sent, in practice it does seem to have an effect on some OS. You can also try disabling path MTU discovery on the sending host. On Linux, try:

echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc

For a general discussion and information about other operating systems, see http://www.netheaven.com/pmtu.html. If disabling path MTU discovery fixes the problem, try to find the broken or misconfigured router/firewall that swallows the ICMP-unreachable packets. Increasing timeouts on the receiving host will not work around the problem.