Rsync Fails over openfortivpn and ssh #184

Closed
robmukai opened this issue Sep 25, 2017 · 24 comments


@robmukai

robmukai commented Sep 25, 2017

This may be similar to #154; if so, please close it. I am running Ubuntu 16.04.3 with the latest version of openfortivpn, compiled from source. I run a backup over SSH through openfortivpn to a box behind a FortiGate. I am in a pretty remote location and my Internet access is over a microwave connection, although the signal is usually pretty good. Also, I can run this backup on a Windows 10 machine using FortiClient from Fortinet. It does disconnect there occasionally too, but more randomly.

What happens is, I can connect to the FortiGate through openfortivpn just fine. I can also start the rsync process just fine. However, at the same point in the backup of each directory, it seems to "hang", and openfortivpn closes. I back up a few different directories, and all of them "hang" this way. This happens with different source hard drives and different destination hard drives.

Here is the end of the session:
```
DEBUG: pppd ---> gateway (201 bytes)
pppd: 00 21 45 00 00 c7 7e 51 40 00 01 11 ff d9 0a 00 01 01 ef ff ff fa cf ee 07 6c 00 b3 68 54 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f 31 2e 31 0d 0a 48 4f 53 54 3a 20 32 33 39 2e 32 35 35 2e 32 35 35 2e 32 35 30 3a 31 39 30 30 0d 0a 4d 41 4e 3a 20 22 73 73 64 70 3a 64 69 73 63 6f 76 65 72 22 0d 0a 4d 58 3a 20 31 0d 0a 53 54 3a 20 75 72 6e 3a 64 69 61 6c 2d 6d 75 6c 74 69 73 63 72 65 65 6e 2d 6f 72 67 3a 73 65 72 76 69 63 65 3a 64 69 61 6c 3a 31 0d 0a 55 53 45 52 2d 41 47 45 4e 54 3a 20 47 6f 6f 67 6c 65 20 43 68 72 6f 6d 65 2f 36 31 2e 30 2e 33 31 36 33 2e 39 31 20 4c 69 6e 75 78 0d 0a 0d 0a

DEBUG: pppd ---> gateway (201 bytes)
pppd: 00 21 45 00 00 c7 7e b7 40 00 01 11 ff 73 0a 00 01 01 ef ff ff fa cf ee 07 6c 00 b3 68 54 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f 31 2e 31 0d 0a 48 4f 53 54 3a 20 32 33 39 2e 32 35 35 2e 32 35 35 2e 32 35 30 3a 31 39 30 30 0d 0a 4d 41 4e 3a 20 22 73 73 64 70 3a 64 69 73 63 6f 76 65 72 22 0d 0a 4d 58 3a 20 31 0d 0a 53 54 3a 20 75 72 6e 3a 64 69 61 6c 2d 6d 75 6c 74 69 73 63 72 65 65 6e 2d 6f 72 67 3a 73 65 72 76 69 63 65 3a 64 69 61 6c 3a 31 0d 0a 55 53 45 52 2d 41 47 45 4e 54 3a 20 47 6f 6f 67 6c 65 20 43 68 72 6f 6d 65 2f 36 31 2e 30 2e 33 31 36 33 2e 39 31 20 4c 69 6e 75 78 0d 0a 0d 0a

DEBUG: pppd ---> gateway (25 bytes)
pppd: c0 21 05 02 00 17 50 65 65 72 20 6e 6f 74 20 72 65 73 70 6f 6e 64 69 6e 67

DEBUG: pppd ---> gateway (25 bytes)
pppd: c0 21 05 03 00 17 50 65 65 72 20 6e 6f 74 20 72 65 73 70 6f 6e 64 69 6e 67

ERROR: read: Input/output error
INFO: Cancelling threads...
INFO: Setting ppp interface down.
INFO: Restoring routes...
DEBUG: ip route del to XX.XXX.XXX.XXX/255.255.255.255 via XXX.XXX.X.X dev wlp2s0
INFO: Removing VPN nameservers...
DEBUG: Waiting for pppd to exit...
DEBUG: waitpid: pppd exit status code 16
INFO: Terminated pppd.
INFO: Closed connection to gateway.
DEBUG: Gateway certificate validation failed.
DEBUG: Gateway certificate digest found in white list.
INFO: Logged out.
```

The last pppd message (an LCP Terminate-Request; its binary header bytes c0 21 05 03 00 17 do not render as text) is:
Peer not responding
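The hex dumps above can be decoded by hand: each starts with the 2-byte PPP protocol field, and for LCP (`c0 21`) the next four bytes are the code, identifier and length defined in RFC 1661. The C sketch below (the names `struct lcp_packet` and `parse_lcp` are made up for illustration) decodes the last 25-byte dump as code 5 (Terminate-Request), identifier 3, and the 19-byte text "Peer not responding".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of one LCP packet as it appears in the
 * "pppd ---> gateway" hex dumps: 2-byte PPP protocol field,
 * then the LCP header (code, identifier, 2-byte length) and data. */
struct lcp_packet {
    uint16_t protocol;    /* 0xc021 for LCP */
    uint8_t code;         /* 5 = Terminate-Request (RFC 1661) */
    uint8_t id;
    uint16_t length;      /* LCP length: 4-byte header + data */
    const uint8_t *data;  /* points into the caller's buffer */
    size_t data_len;
};

/* Returns 0 on success, -1 if the buffer is too short or the
 * length field does not fit inside it. */
int parse_lcp(const uint8_t *buf, size_t n, struct lcp_packet *out)
{
    if (n < 6)
        return -1;
    out->protocol = (uint16_t)((buf[0] << 8) | buf[1]);
    out->code = buf[2];
    out->id = buf[3];
    out->length = (uint16_t)((buf[4] << 8) | buf[5]);
    if (out->length < 4 || (size_t)out->length + 2 > n)
        return -1;
    out->data = buf + 6;
    out->data_len = (size_t)out->length - 4;
    return 0;
}
```

The two Terminate-Requests in the log differ only in the identifier byte (02, then 03), which is consistent with pppd retransmitting the request.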

The "ERROR: read: Input/output error" is the same as in #154, but the cause is different. Any and all help is appreciated. I am willing to do any testing that may be required.

@DimitriPapadopoulos
Collaborator

This message is printed by code recently added to openfortivpn (74dc069):
DEBUG: waitpid: pppd exit status code 16

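According to the pppd documentation, exit status 16 means "The link was terminated by the modem hanging up." The "Peer not responding" LCP messages logged just before it, however, correspond to exit status 15, which means: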

The link was terminated because the peer is not responding to echo requests.
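When debugging similar failures, the numeric status reported by `waitpid` can be mapped back to the EXIT STATUS section of pppd(8). Below is a minimal lookup-table sketch in C; the descriptions are paraphrased from the man page, and `pppd_exit_reason` is a hypothetical helper, not openfortivpn's own code. In this table the echo-request failure is status 15, while 16 is the modem hang-up.

```c
#include <stddef.h>

/* Exit statuses from the EXIT STATUS section of pppd(8), paraphrased.
 * The array index is the status code pppd exits with. */
static const char *const pppd_exit_reasons[] = {
    [0]  = "detached, or link established and terminated at the peer's request",
    [1]  = "immediately fatal error (e.g. an essential system call failed)",
    [2]  = "error in the options given",
    [3]  = "not setuid-root and the invoking user is not root",
    [4]  = "kernel does not support PPP",
    [5]  = "terminated by SIGINT, SIGTERM or SIGHUP",
    [6]  = "serial port could not be locked",
    [7]  = "serial port could not be opened",
    [8]  = "connect script failed",
    [9]  = "pty command could not be run",
    [10] = "PPP negotiation failed",
    [11] = "peer failed (or refused) to authenticate itself",
    [12] = "link terminated because it was idle",
    [13] = "link terminated because the connect time limit was reached",
    [14] = "callback negotiated, incoming call expected",
    [15] = "link terminated because the peer is not responding to echo requests",
    [16] = "link terminated by the modem hanging up",
    [17] = "PPP negotiation failed because serial loopback was detected",
    [18] = "init script failed",
    [19] = "failed to authenticate ourselves to the peer",
};

/* Returns a description for a pppd exit status, or NULL if unknown. */
const char *pppd_exit_reason(int status)
{
    size_t n = sizeof pppd_exit_reasons / sizeof pppd_exit_reasons[0];
    if (status < 0 || (size_t)status >= n)
        return NULL;
    return pppd_exit_reasons[status];
}
```

A caller that reaped pppd with `waitpid(pid, &status, 0)` would first check `WIFEXITED(status)` and then pass `WEXITSTATUS(status)` to `pppd_exit_reason`.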

For some reason pppd is not able to reach its peer, the gateway. It could be a pppd error in the worst case, or a problem with the VPN tunnel itself.

I really don't know how a microwave connection works. Is this a little bit like Wi-Fi, where the connection could be reset, resulting for example in a new DHCP lease? If so, could you check the logs of the microwave connection and find whether something happened when pppd failed?

@robmukai
Author

@DimitriPapadopoulos The microwave connection is pretty transparent. The antenna connects directly to my Wi-Fi router, so there really isn't anything on my end to look at. Looking at the router logs, nothing jumps out.

If I restart openfortivpn, the rsync continues until it hits another random spot, then the VPN dies again with the same message. So it appears that openfortivpn is somehow losing the connection? Or maybe it is timing out too quickly?

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Sep 26, 2017

Among possible causes:

  1. a timeout somewhere (but not in openfortivpn, possibly pppd),
  2. a pppd bug, but then it's a widely used piece of software so I doubt it,
  3. an openfortivpn bug, where openfortivpn fails unbeknownst to pppd,
  4. need to set some network parameters of the MTU and MRU kind, possibly related to the "exotic" microwave router.

I can't help much. Unless some other maintainer can help, I can only suggest:

  1. Instead of comparing openfortivpn/Ubuntu (SSL VPN only) with FortiClient/Windows (IPsec by default), you could compare openfortivpn/Ubuntu with FortiClient/Ubuntu, or alternatively with FortiClient/Windows in SSL mode rather than the IPsec default.
  2. You could also compare rsync with and without VPN (use a different destination server if you have to).

@robmukai
Author

@DimitriPapadopoulos Thanks for following up.

On your two suggestions

  1. openfortivpn/Ubuntu and FortiClient/Ubuntu show the same behavior. FortiClient/Windows is set to SSL VPN and seems to work.
  2. I'll have to see if I can find a machine to rsync to. I'll let you know what I come up with.

Thanks for your help!

@DimitriPapadopoulos
Collaborator

Thank you for trying these suggestions.

  1. Since openfortivpn/Ubuntu and FortiClient/Ubuntu share the same behavior, this is probably not an openfortivpn bug; at worst it is a "feature" shared by both clients! More seriously, my gut feeling is that this is related to networking parameters (such as MRU and MTU), cause 4 in my list of possible causes above.
    Since FortiClient/Windows in SSL VPN mode does not share the same behavior and works properly, it could be that these networking parameters are set properly on Windows. It could be interesting to investigate the network settings on both systems, though I don't know offhand how to collect them.

  2. If a direct rsync doesn't work, then that's definitely a network issue you need to debug without the VPN. But I believe it will work. By the way, it would have been better to rsync to the same server with and without the VPN, but that's probably not possible since you need a VPN in the first place!
    If I understand correctly, the problem is that each additional encapsulation adds its own headers to packets, which means you may need lower initial MTU/MRU values to leave room for that overhead.
    Unfortunately all this happens at network layer 2 (data link), which I'm even less familiar with than layer 3...
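The overhead reasoning above is simple subtraction: if each layer adds a fixed header to every packet, the room left for the inner packet is the outer MTU minus the sum of those headers. Here is a C sketch under that assumption; the layer list, and especially the 40-byte "tunnel framing" entry, are placeholders rather than measured values for a FortiGate SSL VPN. It is also only the per-packet view: a tunnel carried over TCP can split its byte stream across segments, which this does not model.

```c
#include <stddef.h>

/* One encapsulation layer and the header bytes it adds per packet. */
struct encap_layer {
    const char *name;
    int overhead;
};

/* Room left for the inner packet after subtracting every layer's
 * overhead from the outer MTU; -1 if nothing is left. */
int inner_mtu(int outer_mtu, const struct encap_layer *layers, size_t n)
{
    int mtu = outer_mtu;
    for (size_t i = 0; i < n; i++)
        mtu -= layers[i].overhead;
    return mtu > 0 ? mtu : -1;
}

/* Illustrative stack only: the tunnel framing size is a placeholder. */
const struct encap_layer example_stack[] = {
    {"outer IPv4 header", 20},
    {"outer TCP header", 20},
    {"tunnel framing (placeholder)", 40},
};
```

With the placeholder stack and a 1500-byte Wi-Fi MTU, `inner_mtu(1500, example_stack, 3)` returns 1420.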

@DimitriPapadopoulos
Collaborator

You could perhaps try this recipe, where you increase the size of packets sent by ping until it stops working:
Troubleshooting MTU size over IPSEC VPN
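The linked recipe increases the ping size step by step. If you script it, a binary search over the payload size (a swapped-in technique, not part of the recipe) finds the same boundary in far fewer probes, assuming that every size below a working size also works. This C sketch keeps the probe abstract: `probe_fn` is a hypothetical callback that in practice might run `ping -M do -s SIZE host`, and `simulated_path` exists only so the search can be exercised offline.

```c
#include <stdbool.h>

/* Hypothetical probe: true if a don't-fragment packet carrying `size`
 * bytes of payload gets through. */
typedef bool (*probe_fn)(int size, void *ctx);

/* Largest size in [lo, hi] for which probe succeeds, assuming success
 * is monotonic. Returns lo - 1 if even lo fails. */
int largest_working_size(int lo, int hi, probe_fn probe, void *ctx)
{
    int best = lo - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (probe(mid, ctx)) {
            best = mid;      /* mid works: look higher */
            lo = mid + 1;
        } else {
            hi = mid - 1;    /* mid fails: look lower */
        }
    }
    return best;
}

/* Simulated path for testing: anything up to *limit bytes passes. */
bool simulated_path(int size, void *ctx)
{
    return size <= *(const int *)ctx;
}
```

Searching 0 to 1472 (the largest payload for a 1500-byte MTU) takes about 11 probes instead of up to 1473.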

@robmukai
Author

I'll try playing around with that and see if it makes a difference. Thanks for the ideas.

@robmukai
Author

Ok, so I'm not sure what I am looking at, but

```
ping -M do -s 1326 XXX.XXX.XXX.XXX
```

gives a good ping:

```
PING XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX) 1326(1354) bytes of data.
1334 bytes from XXX.XXX.XXX.XXX: icmp_seq=1 ttl=63 time=274 ms
```

whereas

```
ping -M do -s 1327 XXX.XXX.XXX.XXX
ping: local error: Message too long, mtu=1354
```
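The numbers in this output follow from fixed header sizes: ping's `-s` sets only the ICMP payload, the ICMP echo header adds 8 bytes, and an option-less IPv4 header adds 20. A small C sketch of that arithmetic (the function names are illustrative):

```c
/* Header sizes in bytes: IPv4 without options, ICMP echo header. */
enum { IPV4_HEADER = 20, ICMP_ECHO_HEADER = 8 };

/* Total IP packet size for `ping -s payload`: the "(1354)" figure. */
int ping_packet_size(int payload)
{
    return payload + ICMP_ECHO_HEADER + IPV4_HEADER;
}

/* Size ping reports per reply ("1334 bytes from ..."): ICMP header + payload. */
int ping_reply_size(int payload)
{
    return payload + ICMP_ECHO_HEADER;
}

/* Largest -s value that fits in a given MTU with -M do (no fragmentation). */
int max_ping_payload(int mtu)
{
    return mtu - ICMP_ECHO_HEADER - IPV4_HEADER;
}
```

With these helpers, `max_ping_payload(1354)` gives 1326, matching the largest size that passed above, and `max_ping_payload(1500)` gives the familiar 1472 for a standard Ethernet MTU.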

So what would you suggest I set the MTU of the Wi-Fi connection to?

@DimitriPapadopoulos
Collaborator

I think the MTU is set on the inner encapsulated layer (here that would be pppd?), but again I'm not a specialist. For pppd, the MTU can be set in the relevant pppd options file (probably somewhere in /etc/ppp) or passed as a parameter to pppd (for that, the openfortivpn code would need to be modified to pass the proper options to pppd).
So I'd try modifying options in a pppd options file, probably under /etc/ppp. Unfortunately I don't have time to help much more right now, and I don't know how to set options in pppd.

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Sep 27, 2017

I'd try setting the MTU on the Wi-Fi only if rsync and/or ping fail also without VPN.

@robmukai
Author

@DimitriPapadopoulos I think we can close this. After testing for a day, changing the MTU on the Wi-Fi connection to 1326 makes it work as well as the Windows version does on my connection. Which is to say, it still closes, but only randomly and after it has run for a long time. Thanks for your help in thinking this through!

@DimitriPapadopoulos
Collaborator

@robmukai Thank you for coming back to us. This will hopefully help other users of the software.

This does look like an issue with your network setup after all; however, I'm not 100% certain there's nothing we could do to help within openfortivpn, such as adding an option to set the MTU for pppd or at least writing a paragraph about MTU in the documentation.

Also, how long is a long time in your case? Please note that there's a default timeout on the FortiGate server - set by default at 8 hours if I recall correctly.

@robmukai
Author

@DimitriPapadopoulos I'm wondering about that. I'd be surprised if Windows 10 handled fragmented packets better than Ubuntu. If that is not the case, is there something in the way openfortivpn handles fragmented packets that causes the shutdown? I don't know the answer to that, but the workaround seems to be working well.

I'm not sure what a "long time" is. I usually run it overnight, and it is down when I get to it in the morning. However, large (GB-sized) files have been transferred. It occasionally drops in less than 8 hours as well, but that could be due to instability of the microwave connection. Is there a way to log uptime on the connection? I'll see what the timeout is set to on the FortiGate. Also, is there a reason for the default timeout on the FortiGate?

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Sep 28, 2017

@robmukai I doubt Ubuntu handles fragmented packets any worse than Windows. I've read on some web pages these last few days that fragmented packets may be dropped by firewalls because they are a security issue (DoS); in this case the FortiGate could be dropping the fragmented packets.
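To make the fragmentation concern concrete: when an IPv4 packet larger than the link MTU is fragmented, every fragment repeats the 20-byte header, and every fragment except the last must carry a multiple of 8 data bytes. A C sketch counting fragments, assuming no IP options (`ipv4_fragment_count` is an illustrative name):

```c
/* Number of IPv4 fragments needed to carry `payload` bytes (everything
 * after a 20-byte option-less header) over a link with the given MTU.
 * Returns -1 if the MTU cannot carry even 8 data bytes. */
int ipv4_fragment_count(int payload, int mtu)
{
    const int ip_header = 20;
    int max_data = mtu - ip_header;         /* room in any one fragment */
    int per_fragment = (max_data / 8) * 8;  /* non-final: multiple of 8 */
    if (per_fragment <= 0 || payload < 0)
        return -1;
    int count = 1;
    while (payload > max_data) {            /* remainder doesn't fit yet */
        payload -= per_fragment;
        count++;
    }
    return count;
}
```

For instance, a full 1500-byte packet (1480 bytes after the header) crossing a 1354-byte link becomes 2 fragments (1328 + 152 data bytes); if a firewall on the path drops fragments, that packet is lost entirely.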

It could just be that the MTU is set correctly on Windows but not Ubuntu. Perhaps because there's some sort of driver for the microwave link on the Windows machine - which could perhaps properly set the MTU at 1326.

About the timeout, it's best to look at the logs on the FortiGate side and check whether they show a reason for the connection closing. Perhaps Fortinet support can help. Also ask them about the rationale behind the FortiGate-side timeout.

@robmukai
Author

@DimitriPapadopoulos So I ran a quick check on the Windows 10 box and get this:

```
netsh interface ipv4 show subinterfaces

       MTU  MediaSenseState  Bytes In  Bytes Out  Interface
      1354                1      3372      35244  fortissl
4294967295                1      1304      46329  Loopback Pseudo-Interface 1
      1500                5         0          0  Ethernet 4
      1500                1  13897964    1967937  Wi-Fi 3
      1500                5         0          0  Ethernet 5
      1500                5         0          0  Local Area Connection* 18
```

So the MTU of the FortiClient connection is 1354 (1354 less 28 is 1326), so something is setting the MTU correctly in Windows. I'm not sure whether it is FortiClient or Windows itself. You'll notice that the Wi-Fi connection is at 1500.

The microwave connection is completely transparent to the machines downstream. I actually run a small inn, and my guests don't have to do anything special to connect; they bring all manner of devices (Windows, Apple, Android, etc.). I don't have any drivers or anything installed for it.

I'll keep an eye on the connection. If I can find out when it drops, I can have my buddy, who owns the FortiGate, check the logs to see what the FortiGate sees at the time of the disconnect.

Thanks so much for your help!

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Sep 29, 2017

@robmukai Great to have all this information, it will help debug future issues.

For future reference, let's recap what we know so far:

  • MTU issues resulting in fragmentation may cause the dreaded Input/output error.
  • Some firewalls and perhaps FortiGate devices drop fragmented packets for security reasons (can be used for DoS).
  • FortiClient/Windows in SSL mode seems to set a lower MTU on the fortissl interface (1354 instead of the usual 1500), and you experience no errors with FortiClient/Windows.
  • FortiClient/Ubuntu does not seem to be setting a lower MTU, and you do experience errors similar to the openfortivpn errors. It looks like openfortivpn is doing no worse than FortiClient here.
  • On the other hand openfortivpn could do better than FortiClient!

What I don't know is whether MTU should always be set to a value lower than 1500, or only sometimes depending on MTU values along the path.

Also should MTU be set to a constant value, and if so which one, or variable values depending on MTU values along the path? In the latter case, how to discover MTU values along the path?

Other sources refer to setting MSS, not MTU.
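MTU and MSS are related by the header sizes: the TCP maximum segment size is the MTU minus the IP and TCP headers, which is 40 bytes for IPv4 and 60 for IPv6 when no options are present. Tools that "clamp" the MSS rewrite the value advertised in TCP SYN packets so that segments fit a smaller tunnel MTU. A C sketch of the arithmetic, under the no-options assumption (`tcp_mss_for_mtu` is an illustrative name):

```c
#include <stdbool.h>

/* Header sizes in bytes, without options. */
enum { IPV4_HDR = 20, IPV6_HDR = 40, TCP_HDR = 20 };

/* TCP MSS that fits in one packet of the given MTU; -1 if none. */
int tcp_mss_for_mtu(int mtu, bool ipv6)
{
    int mss = mtu - (ipv6 ? IPV6_HDR : IPV4_HDR) - TCP_HDR;
    return mss > 0 ? mss : -1;
}
```

For the 1354-byte ppp0 MTU seen in this thread, the IPv4 MSS would be 1314, versus the usual 1460 on a 1500-byte link.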

On Linux there are tools to discover MTU values along the path, for example tracepath. I have also read "MTU woes in IPsec tunnels and how you can fix it" and "Path MTU discovery in practice", and although I don't have time to really understand them, setting the MTU does not seem to be such a robust technique after all...
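tracepath probes the path itself; a program can also ask the Linux kernel for its current path-MTU estimate through the `IP_MTU` socket option on a connected UDP socket, with `IP_MTU_DISCOVER` set to `IP_PMTUDISC_DO` (the same don't-fragment behavior as `ping -M do`). This is a Linux-only C sketch; `query_path_mtu` is a hypothetical name, and the result is only the kernel's cached estimate (the route MTU until smaller values are learned from ICMP "fragmentation needed" messages), not a fresh measurement like tracepath's.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the kernel's current path-MTU estimate towards an IPv4
 * address, or -1 on error. Connecting a UDP socket sends no packets;
 * it only selects a route. */
int query_path_mtu(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int pmtudisc = IP_PMTUDISC_DO;  /* always set DF, never fragment */
    int mtu = -1;
    socklen_t len = sizeof mtu;
    int ok = setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
                        &pmtudisc, sizeof pmtudisc) == 0
          && connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0
          && getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == 0;
    close(fd);
    return ok ? mtu : -1;
}
```

Pointed at an address reached through the tunnel, this would typically report the ppp0 MTU unless the kernel has learned a smaller path MTU.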

Some links:

@robmukai
Copy link
Author

@DimitriPapadopoulos I agree with all 5 of your bullet points. Unfortunately, your questions go beyond my abilities. My thought would be to see if we can figure out how FortiClient/Windows determines the MTU and lowers it on the tunnel.

@DimitriPapadopoulos
Collaborator

@robmukai For what it's worth, I've just looked up MTU values of the different interfaces on Ubuntu 16.04 LTS:

  • Wi-Fi:
```
wlp2s0b1  Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:xxx.xxx.x.xx  Bcast:xxx.xxx.x.xxx  Mask:255.255.255.0
          inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
```
  • PPP connection created by openfortivpn:
```
ppp0      Link encap:Point-to-Point Protocol
          inet addr:xxx.xxx.xx.x  P-t-P:1.1.1.1  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1354  Metric:1
```

The MTU of the PPP connection is set to 1354 automatically. I haven't had to force or specify anything here. Isn't that the case when you run openfortivpn?
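The same MTU values that ifconfig prints can be read programmatically on Linux with the `SIOCGIFMTU` ioctl on any socket. A minimal C sketch; `interface_mtu` is a made-up helper name, and interface names such as `ppp0` or `wlp2s0` depend on the machine:

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the MTU of the named interface, or -1 if the name is too
 * long or the interface does not exist. */
int interface_mtu(const char *ifname)
{
    struct ifreq ifr;
    size_t len = strlen(ifname);
    if (len >= sizeof ifr.ifr_name)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    memcpy(ifr.ifr_name, ifname, len + 1);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket will do */
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, SIOCGIFMTU, &ifr);
    close(fd);
    return rc == 0 ? ifr.ifr_mtu : -1;
}
```

While openfortivpn is connected, `interface_mtu("ppp0")` should report the same value ifconfig shows for ppp0 (1354 above).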

@robmukai
Author

robmukai commented Oct 9, 2017

@DimitriPapadopoulos OK, this is really weird. I reset the MTU on the Wi-Fi connection back to "auto", and now the ppp0 connection is showing MTU:1354 as well. So, funny thing, it now works without changing the MTU. Not sure what would have caused it to not work before?

@DimitriPapadopoulos
Collaborator

Strange indeed. As far as I can see, openfortivpn does not set the MTU; it has to be handled by pppd.

From the PPPD(8) man page:

mtu n
Set the MTU [Maximum Transmit Unit] value to n. Unless the peer requests a smaller value via MRU negotiation, pppd will request that the kernel networking code send data packets of no more than n bytes through the PPP network interface. Note that for the IPv6 protocol, the MTU must be at least 1280.
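The man page rule can be written as a minimum: the MTU pppd applies is the configured `mtu n`, lowered to the peer's MRU if the peer requests a smaller one during LCP negotiation. A C sketch of just that rule, under stated assumptions: `peer_mru <= 0` stands for "the peer sent no MRU option", in which case the RFC 1661 default MRU of 1500 is used, and an unset `n` is also treated as 1500; the function names are illustrative and this does not reproduce pppd's actual code.

```c
#include <stdbool.h>

enum { PPP_DEFAULT_MRU = 1500, IPV6_MIN_MTU = 1280 };

/* MTU used for sending: configured value, unless the peer's MRU is
 * smaller. Non-positive arguments mean "not set". */
int ppp_effective_mtu(int configured_mtu, int peer_mru)
{
    int mtu = configured_mtu > 0 ? configured_mtu : PPP_DEFAULT_MRU;
    int peer = peer_mru > 0 ? peer_mru : PPP_DEFAULT_MRU;
    return peer < mtu ? peer : mtu;
}

/* The man page notes that IPv6 needs an MTU of at least 1280. */
bool ppp_mtu_ok_for_ipv6(int mtu)
{
    return mtu >= IPV6_MIN_MTU;
}
```

For example, `ppp_effective_mtu(1326, 1354)` keeps 1326, while `ppp_effective_mtu(1500, 1354)` is lowered to 1354.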

My guess is that this was a problem with path MTU discovery over pppd. Sometimes MTU discovery doesn't work because of poorly configured “security” appliances. Perhaps something changed along the path?

@DimitriPapadopoulos
Collaborator

If MTU discovery does not work as expected, users should probably work around the issue in the software responsible for MTU discovery, namely pppd as far as I can tell.

So one answer might be that this is not an openfortivpn issue.

On the other hand, openfortivpn could have an option to force the MTU, which would simply be passed to pppd as the mtu option.

@DimitriPapadopoulos
Collaborator

For the record the MRU is actually set by openfortivpn to 1354:

```c
char *args[] = {
	"/usr/sbin/pppd", "38400", "noipdefault", "noaccomp",
	"noauth", "default-asyncmap", "nopcomp", "receive-all",
	"nodefaultroute", ":1.1.1.1", "nodetach",
	"lcp-max-configure", "40", "mru", "1354",
	NULL, NULL, NULL, NULL,
	NULL, NULL, NULL, NULL,
	NULL
};
```
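If openfortivpn did grow an option to force the MTU, one way would be to fill spare NULL slots of the `args` array above with `"mtu"` and a value before starting pppd. The C sketch below is hypothetical (`add_pppd_option` does not exist in openfortivpn) and only shows the slot-filling logic while keeping the array NULL-terminated, as `execv` requires.

```c
#include <stddef.h>

/* Hypothetical helper: store `opt` (and `val`, if not NULL) in the first
 * free NULL slots of a NULL-terminated argv array holding `size`
 * entries, always leaving at least one trailing NULL.
 * Returns 0 on success, -1 if there is not enough room. */
int add_pppd_option(char **args, size_t size, char *opt, char *val)
{
    size_t i = 0;
    while (i < size && args[i] != NULL)  /* find the first free slot */
        i++;
    size_t needed = val != NULL ? 2 : 1;
    if (i + needed >= size)              /* keep room for the terminator */
        return -1;
    args[i] = opt;
    if (val != NULL)
        args[i + 1] = val;
    return 0;
}
```

For example, `add_pppd_option(args, sizeof args / sizeof args[0], "mtu", "1326")` would pass `mtu 1326` to pppd; whether that value is right for a given link is a separate question.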

This is a mystery! I'll close the issue for now, but do not hesitate to come back to us if needed.

@CristianCardosoA

I'm using Fedora 26. I installed version 1.5.0.

I'm getting this issue: pppd: The link was terminated because the peer is not responding to echo requests.

What can I do ?

@DimitriPapadopoulos
Collaborator

Two things you can do :-)

  1. Nothing if it works for you - I'm aware of this error message and I will find a workaround.
  2. Open a new issue if it doesn't work for you.
