Qubes 4.1 - VPN over Tor netvms: ARP request does not get resolved properly #7123

Closed
unknown-ter opened this issue Dec 15, 2021 · 25 comments · Fixed by QubesOS/qubes-core-agent-linux#345 or QubesOS/qubes-core-agent-linux#384
Labels
C: core C: networking diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. pr submitted A pull request has been submitted for this issue. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Comments

@unknown-ter

Qubes OS release

Qubes 4.1 RC2

Brief summary

A VPN Gateway over Tor worked fine with Qubes 4.0.
It does not work with Qubes 4.1 anymore. ARP requests aren't proxied properly.

Steps to reproduce

Configure VPN gateway as described in the Contrib docs.
Configure the Tor gateway according to the steps described in Whonix docs.

Then:

ping 1.1.1.1 # or 8.8.8.8

Qubes 4.0 -> OK
Qubes 4.1 -> No resolution

Expected behavior

Same behavior as Qubes 4.0

Actual behavior

ARP requests are not getting proxied properly / do not receive the MAC address fe:ff:ff:ff:ff:ff of the corresponding netVM.

Sample from user aUsername:

bash-5.1# ip neigh show
10.137.0.6 dev eth0 lladdr fe:ff:ff:ff:ff:ff PERMANENT
212.129.0.80 dev eth0  INCOMPLETE

Here 212.129.0.80 is the IP address of the remote VPN gateway.

aUsername also mentions a workaround:

arp -s <IP> -i eth0 fe:ff:ff:ff:ff:ff

This manual correction has not been necessary with Qubes 4.0.
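
For reference, the same pinning can also be expressed with iproute2 instead of net-tools. This is only a sketch: the helper name pin_neigh_cmd is hypothetical, eth0 and the MAC address are taken from the sample above, and the function prints the command instead of running it so it can be reviewed first.

```shell
#!/bin/sh
# MAC address of the netVM side, as quoted in this report.
GATEWAY_MAC="fe:ff:ff:ff:ff:ff"

# Print (do not run) an iproute2 command that creates a permanent
# neighbour entry mapping <ip> to the netVM MAC on <iface>.
# pin_neigh_cmd is a hypothetical helper name, not part of Qubes.
pin_neigh_cmd() {
    ip_addr="$1"
    iface="${2:-eth0}"   # eth0 is assumed when no interface is given
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' \
        "$ip_addr" "$GATEWAY_MAC" "$iface"
}
```

Something like `pin_neigh_cmd 212.129.0.80 | sudo sh` would then apply it inside the VPN qube.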

The Whonix maintainer states that there have been no related changes in the Whonix configuration:

You could report this issue on qubes-issues because nothing related in the Whonix configuration changed , no such issue with Non-Qubes-Whonix reported. Therefore some change between Qubes 4.0 and Qubes R4.1 might have caused this.

Related issues

@unknown-ter unknown-ter added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Dec 15, 2021
DemiMarie added a commit to DemiMarie/qubes-core-agent-linux that referenced this issue Dec 15, 2021
They cause all sorts of regressions, of which QubesOS/qubes-issues#7123
is just the most recent.  In the future, Qubes OS should switch to pure
layer-3 links between qubes, with no layer-2 addressing at all.  That is
for another day, however.
@andrewdavidwong andrewdavidwong added C: core diagnosed Technical diagnosis has been performed (see issue comments). pr submitted A pull request has been submitted for this issue. labels Dec 15, 2021
@andrewdavidwong andrewdavidwong added this to the Release 4.1 milestone Dec 15, 2021
@rudolfocode

Why is this marked as closed? It seems a lot of people are having the same issue, me included. Are we all expected to implement this hacky workaround every time?

@marmarek
Member

marmarek commented Feb 8, 2022

Are you sure you use up to date packages? The update linked above should fix the issue.

@rudolfocode

forgive me, how can I do that?

@marmarek
Member

marmarek commented Feb 8, 2022

You can check the package version with rpm -q qubes-core-agent (in Fedora) or dpkg -l qubes-core-agent (in Debian).

you can install updates as in https://www.qubes-os.org/doc/how-to-update/#routine-updates
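
A small sketch of how the right query could be picked automatically inside a template. The helper name agent_version_cmd and the OS_RELEASE override are hypothetical; the sketch assumes the template's os-release file identifies it as Fedora-like or Debian-like via ID or ID_LIKE.

```shell
#!/bin/sh
# Path to the os-release file; overridable for testing (assumption: the
# standard location /etc/os-release is present in the template).
OS_RELEASE="${OS_RELEASE:-/etc/os-release}"

# Print the package query that matches the distribution family.
# agent_version_cmd is a hypothetical helper name.
agent_version_cmd() {
    # Source os-release in a subshell so ID/ID_LIKE do not leak out.
    family=$( . "$OS_RELEASE" 2>/dev/null; echo "$ID $ID_LIKE" )
    case " $family " in
        *" fedora "*) echo "rpm -q qubes-core-agent" ;;
        *" debian "*) echo "dpkg -l qubes-core-agent" ;;
        *) echo "unsupported distribution: $family" >&2; return 1 ;;
    esac
}
```

Running the printed command, for example with `$(agent_version_cmd)`, shows the installed version.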

@rudolfocode

rudolfocode commented Feb 8, 2022

you can install updates as in https://www.qubes-os.org/doc/how-to-update/#routine-updates

yes that's as I thought, although it says I don't have any updates.. I am now forcing “Enable updates for qubes without known available updates,”

@adrelanos
Member

you can install updates as in https://www.qubes-os.org/doc/how-to-update/#routine-updates

yes that's as I thought, although it says I don't have any updates.. I am now forcing “Enable updates for qubes without known available updates,”

@rudolfocode

rudolfocode commented Feb 9, 2022

My initial issue still occurs, probably because nothing is updated after running updates, or because the updates haven't fixed the issue.

As a consequence I cannot resolve the original issue. To clarify, my issue occurs when using this config for PROXY-VM.

With my VPN provider's config file, the VPN tests fine: it connects and stays up. However, after implementing @tasket's Qubes-vpn-support or @andrewdavidwong's script above and restarting proxy-vm, I get continuous "RESOLVE: Cannot resolve host address: xxxx.yyyyy.net" errors. This occurs with either script, in both Fedora 34 and Debian 11. The VM then most often fails to shut down, even after multiple attempts, and has to be killed. Very occasionally, after restarting the proxy-vm, it connects briefly and then returns to not being able to resolve the host address.

@adrelanos
Member

Did you check package versions as per #7123 (comment)?

If package versions are outdated / there's a newer package version available but updates don't work for you then that's a separate issue and should be resolved as per the usual https://www.qubes-os.org/support/ process and not in this ticket which has a narrow scope.

@rudolfocode

I don't have the new packages, and according to the Qubes update tool there are no updates. Fortunately it's a fairly new install of rc3, so I will reinstall with the new 4.1 stable and try to install any further updates from there. Thanks for getting back to me even though you are a Whonix dev, kind of you.

@ew0k

ew0k commented Mar 30, 2022

I am experiencing the same issue fresh 4.1 install latest updates - and yes i verified latest version qubes core agent

the arp workaround fixes it

@andrewdavidwong andrewdavidwong added needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed diagnosed Technical diagnosis has been performed (see issue comments). labels Mar 30, 2022
@DemiMarie

I am experiencing the same issue fresh 4.1 install latest updates - and yes i verified latest version qubes core agent

the arp workaround fixes it

This looks like missing Proxy ARP on the upstream side. Does this reproduce with a non-sys-whonix NetVM?

In any case, I think I may have a workaround using a custom C program.

@marmarek
Member

This looks like missing Proxy ARP on the upstream side. Does this reproduce with a non-sys-whonix NetVM?

The routing table should point at a specific IP as the gateway, so proxy ARP should not be needed. At some point we used to have just ip r a default dev eth0, which did require proxy ARP, but that shouldn't be the case anymore. That's why I'm asking about the routing table.
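
To make the distinction above concrete, here is a rough sketch that classifies the default route from `ip route show` output. The function name default_route_kind is hypothetical; it only inspects text on stdin and does not change any routes.

```shell
#!/bin/sh
# Read `ip route show` output on stdin and report, for each default route,
# whether it names a gateway IP ("via <ip>") or only a device.
# A device-only default route is the case that depends on proxy ARP.
# default_route_kind is a hypothetical helper name.
default_route_kind() {
    awk '$1 == "default" {
        kind = "device-only"
        for (i = 2; i < NF; i++)
            if ($i == "via") kind = "via " $(i + 1)
        print kind
    }'
}
```

Usage in the affected qube would be `ip route show | default_route_kind`.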

In any case, I think I may have a workaround using a custom C program.

If making networking work requires a "custom C program", something went really wrong. IOW, please don't; let's find a proper solution.

@DemiMarie

Nevermind, the workaround does not work.

@qubesfan35267

"VPN over TOR" will work under Qubes 4.1 if you put a firewall-vm between the vpn-proxy and sys-whonix.

Best regards.

@DemiMarie

Summary of the problem:

  • Whonix disables Proxy ARP/Proxy NDP on its downstream interfaces for reasons that @adrelanos would need to explain.
  • Qubes works fine in spite of this because the route to upstream is declared as onlink.
  • However, the VPN does not create an onlink route, and breakage ensues.

In the long term, I think the best solution is to switch from Ethernet links to point-to-point links. Qubes uses them as point-to-point links anyway, and the current situation keeps causing problems.
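
The first point of the summary can be checked on a given qube by reading the per-interface proxy_arp sysctl. A minimal sketch, assuming the usual /proc/sys/net/ipv4/conf layout; the function name proxy_arp_status and the PROC_CONF override are hypothetical and only there so the check can be tried without touching a live system.

```shell
#!/bin/sh
# Directory holding per-interface IPv4 settings; overridable for testing.
PROC_CONF="${PROC_CONF:-/proc/sys/net/ipv4/conf}"

# Print "<iface> proxy_arp=<0|1>" for every interface directory found.
# proxy_arp_status is a hypothetical helper name.
proxy_arp_status() {
    for f in "$PROC_CONF"/*/proxy_arp; do
        [ -r "$f" ] || continue          # skip unreadable or unmatched globs
        iface=$(basename "$(dirname "$f")")
        printf '%s proxy_arp=%s\n' "$iface" "$(cat "$f")"
    done
}
```

A value of 0 on the downstream interfaces of sys-whonix would be consistent with the diagnosis above.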

@qubesfan35267

With this setup "app-vm -> vpn-proxy -> firewall-vm-2 -> sys-whonix -> firewall-vm-1 -> sys-net" I did not see any breakage. The VPN is routed over the onion circuit without any problems. Everything seems to work very stable without leaks or breaks.

@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). C: networking and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. pr submitted A pull request has been submitted for this issue. labels May 25, 2022
@Enteee

Enteee commented Jun 3, 2022

@DemiMarie: while enabling Proxy ARP in sys-whonix might be a workaround, I don't see why ARP got involved in this issue in the first place. Following up on @marmarek's thoughts, I dug a bit deeper, and I now believe the root cause is a scope host route entry on the downstream appvm connected to sys-whonix.

Reproduce

  1. Create fedora-34 appvm with net set to sys-whonix (called: downstream)
  2. Install tcpdump in downstream: sudo dnf install tcpdump
  3. In downstream:
$ ip r s
default via 10.137.0.12 dev eth0 onlink 
10.137.0.12 dev eth0 scope host onlink 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
  4. In downstream start pinging google: ping 8.8.8.8
  5. Verify with tcpdump that echo requests are sent. Note: We won't get a reply, but that's expected since sys-whonix does not support icmp.
$ sudo tcpdump -n -i eth0 icmp or arp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:54:39.846791 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 27, length 64
18:54:40.870820 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 28, length 64
18:54:41.894789 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 29, length 64
  6. Add a specific route for 8.8.8.8: $ sudo ip r a 8.8.8.8 via 10.137.0.12 dev eth0
  7. Downstream will start trying to resolve the MAC address of 8.8.8.8 using ARP:
18:56:14.055169 ARP, Request who-has 8.8.8.8 tell 10.137.0.26, length 28
18:56:15.078714 ARP, Request who-has 8.8.8.8 tell 10.137.0.26, length 28
18:56:16.103781 ARP, Request who-has 8.8.8.8 tell 10.137.0.26, length 28

-> This is very weird, since downstream knows perfectly well what the next hop for 8.8.8.8 should be. My assumption is that the kernel does detect that 8.8.8.8 should be reached via eth0, but when trying to resolve the via address it fails to honor the 10.137.0.12 dev eth0 scope host onlink rule.
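
To spot the suspicious entry quickly on other qubes, the routing table can be filtered for host-scope routes. A minimal sketch; the function name host_scope_routes is hypothetical, and it only parses `ip route show` text from stdin.

```shell
#!/bin/sh
# Read `ip route show` output on stdin and print the destination of every
# route whose scope is "host".
# host_scope_routes is a hypothetical helper name.
host_scope_routes() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "scope" && $(i + 1) == "host") { print $1; break }
    }'
}
```

With the table from step 3, `ip r s | host_scope_routes` would print only the gateway address 10.137.0.12.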

Workaround

Downstream does not fall back to ARP requests if the route to 10.137.0.12 has scope link instead of scope host:

$ sudo ip r d 10.137.0.12 dev eth0 scope host onlink 
$ sudo ip r a 10.137.0.12 dev eth0
$ ip r s
default via 10.137.0.12 dev eth0 onlink 
10.137.0.12 dev eth0 scope link 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown

and downstream starts sending icmp messages again:

19:19:56.690645 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 224, length 64
19:19:57.714632 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 225, length 64
19:19:58.738611 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 226, length 64
19:19:59.762614 IP 10.137.0.26 > 8.8.8.8: ICMP echo request, id 1, seq 227, length 64

⚠️ Note: if you change the scope of the 10.137.0.12 AFTER you added the 8.8.8.8 rule you need to re-create the 8.8.8.8 rule:

$ sudo ip r d 8.8.8.8 via 10.137.0.12 dev eth0 
$ sudo ip r a 8.8.8.8 via 10.137.0.12 dev eth0
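
The workaround and the note above can be combined into one sequence. The following is a sketch only: the function name scope_link_fix_cmds is hypothetical, and it prints the commands instead of executing them so the sequence can be inspected before anything is changed.

```shell
#!/bin/sh
# Print the workaround as a command sequence:
#   1. drop the host-scope gateway route,
#   2. re-add it with link scope,
#   3. delete and re-create each dependent route given as an extra argument.
# Arguments: <gateway> <iface> [destination ...]
# scope_link_fix_cmds is a hypothetical helper name.
scope_link_fix_cmds() {
    gw="$1"
    iface="$2"
    shift 2
    printf 'ip route del %s dev %s scope host onlink\n' "$gw" "$iface"
    printf 'ip route add %s dev %s scope link\n' "$gw" "$iface"
    for dst in "$@"; do
        printf 'ip route del %s via %s dev %s\n' "$dst" "$gw" "$iface"
        printf 'ip route add %s via %s dev %s\n' "$dst" "$gw" "$iface"
    done
}
```

For the example in this comment that would be `scope_link_fix_cmds 10.137.0.12 eth0 8.8.8.8 | sudo sh`.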

Conclusion

I am by no means an expert on linux routing. But sending ARP requests for any hosts sounds just wrong. So I am not quite sure if an ARP proxy in sys-whonix is the correct solution for this. Also I don't fully understand why the routing decision is affected that much by the scoping of the next hop route. I also don't know the reason why qubes adds a scope host route for sys-whonix when an appvm is connected to sys-whonix. But maybe changing this behavior could solve this issue?

@DemiMarie

@Enteee Thanks! I will be able to come up with a patch based on that.

@DemiMarie

@Enteee what name and email address would you like me to use in the Suggested-by in the commit message?

@andrewdavidwong andrewdavidwong added the pr submitted A pull request has been submitted for this issue. label Jun 5, 2022
@Enteee

Enteee commented Jun 6, 2022

@DemiMarie the commit looks good. Thanks for patching this! 🚀

@DemiMarie

@DemiMarie the commit looks good. Thanks for patching this! 🚀

You’re welcome! Thanks for figuring out the problem! That was the hard part; changing the shell script was trivial.

@adrelanos
Member

Seems like fix by @DemiMarie (thanks!) QubesOS/qubes-core-agent-linux#384 didn't land in any packages yet?

I am wondering why there aren't any notifications from the Qubes bot that packages were built and uploaded to the Qubes repository?

@adrelanos
Member

A user on an updated system reported to still have the old version of network/setup-ip. That means not having received the changes in PR QubesOS/qubes-core-agent-linux#384 yet. After manually updating /usr/lib/qubes/setup-ip the issue of broken
user -> Tor -> VPN -> destination
has been fixed.
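
A quick way to tell whether a template still carries the old script might look like the sketch below. It rests on an assumption that is not confirmed in this thread: that the unpatched setup-ip contains the literal text "scope host" and the patched one does not. The function name setup_ip_uses_host_scope is hypothetical.

```shell
#!/bin/sh
# Return success if the given script still mentions "scope host".
# Assumption: the old setup-ip contains that literal string and the
# patched version uses "scope link" instead.
# setup_ip_uses_host_scope is a hypothetical helper name.
setup_ip_uses_host_scope() {
    grep -q 'scope host' "${1:-/usr/lib/qubes/setup-ip}"
}
```

For example, `setup_ip_uses_host_scope && echo "old setup-ip"` run inside the template.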

@Enteee

Enteee commented Oct 24, 2022

Yeah i would also really welcome it if the change from the pull request could be released as a new version of qubes-core-agent-linux.

@marmarek

marmarek pushed a commit to QubesOS/qubes-core-agent-linux that referenced this issue Nov 29, 2022
Host scope means "only valid within this host", which is certainly not
correct for Xen virtual devices.  Link scope means "only valid in the
context of this link", which is correct.

Fixes: QubesOS/qubes-issues#7123.
Suggested-by: Ente <ducksource@duckpond.ch>
(cherry picked from commit 0220911)
@adrelanos
Member

In Qubes R4.2 at least /usr/lib/qubes/setup-ip seems to have the changes which were applied in the linked PR.
