
Tests for NAT-Traversal and Hole-Punching #357

Closed · wants to merge 2 commits
Conversation

@emmacasolin (Contributor) commented Mar 10, 2022

Description

The architecture for our NAT busting will be completed in #326. Once this is done, we should be able to run fully simulated tests for communication between nodes using hole-punching across NAT. This will involve writing Jest tests that create namespaces with iptables rules to simulate different kinds of NAT, and creating nodes inside of these namespaces. Specifically, we are only looking at endpoint-independent and endpoint-dependent NAT mapping, since our NAT busting should be able to traverse all of the different firewall (filtering) types. Thus, these tests will only be simulating port-restricted cone (endpoint-independent) and symmetric (endpoint-dependent) NAT.

The test cases that need to be considered here are:

  1. Node1 connects to Node2 - basic sanity test
  2. Node1 behind NAT connects to Node2 - here Node1 is acting like a client and it is behind a NAT, connecting to an open Node2 that isn't behind NAT
  3. Node1 connects to Node2 behind NAT - here Node1 is acting like a client, connecting to a closed Node2 that is behind a NAT
  4. Node1 behind NAT connects to Node2 behind NAT - here Node1 is acting like a client and it is behind NAT, and it is connecting to Node2 which is also behind NAT

These tests should cover all possible NAT combinations and should be repeated both with and without the use of a seed node as a signalling server (as such, some tests will be expected to fail, since the seed nodes are used in our NAT busting).

Relates to #148

Issues Fixed

Tasks

  • 1. Create util functions for creating namespaces and interfaces, as well as for setting up various iptables rules
  • 2. Write tests for basic connection (no NAT, sanity test)
  • 3. Write tests for one node behind NAT
  • 4. Write tests for both nodes behind NAT
  • 5. Write tests using a seed node
  • 6. Ensure tests are separated from the rest of the testing suite (i.e. are not run with npm run test), and make sure they check that the operating system is Linux before running
  • [ ] 7. Investigate no STDERR logs when running pk agent status or pk agent start in the beginning (#357 (comment)); could be resolved in Testnet Deployment #326 - resolved here: #357 (comment)
  • 8. Investigate the usage of unshare instead of ip netns which reduces the overhead of using sudo and makes our tests more accessible in other platforms like CI/CD.
  • [ ] 9. Authenticate the sender of hole-punch messages (Authenticate the sender of a hole-punching signalling message #148) - not relevant to this PR; we will need to review the security of the P2P system after testnet deployment

Final checklist

  • Domain specific tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@emmacasolin self-assigned this Mar 10, 2022
@emmacasolin (Contributor Author)

This PR is currently on top of master - once #326 is ready, however, it will need to be rebased on top of that in order to proceed with tests that involve using a seed node.

@CMCDragonkai mentioned this pull request Mar 10, 2022
@emmacasolin (Contributor Author)

I'm noticing some interesting behaviour when using ip netns attach instead of ip netns add in order to create namespaces. Since we want the nodes in our NAT tests to be running in their own separate network namespaces, I thought that the best way to achieve this would be to use ip netns attach instead of ip netns add, such that the network namespaces could be created with a node's pid.

ip netns add NAME - create a new named network namespace

              If NAME is available in /var/run/netns this command
              creates a new network namespace and assigns NAME.
ip netns attach NAME PID - create a new named network namespace

              If NAME is available in /var/run/netns this command
              attaches the network namespace of the process PID to NAME
              as if it were created with ip netns.

However, when doing this it looks like the created namespaces, as well as the parent (host) namespace, all share the same network configurations, and any changes to one of them propagate to the others as well. This does not happen for namespaces created using ip netns add, and we don't want it to happen since it defeats the whole purpose of using namespaces in the first place (to create isolated network configurations with their own individual settings and test the communication between them).

Looking at the descriptions of these commands however, it's likely that this is happening because ip netns attach doesn't create a new namespace at all - it just assigns the name to the network namespace that the process PID is already in, which here is the host namespace. So this definitely isn't what we want for our purposes.

So we're back to square 1 of needing to create the namespace before creating the node, and creating the node inside the created namespace, but I'm not sure how this could be achieved.

@CMCDragonkai (Member)

Not entirely sure, but if you see my gist example, you create the network namespace first using ip netns add, and then just create the nodes within the namespace by doing ip netns exec ....

@CMCDragonkai (Member) commented Mar 11, 2022

This should work at the very least.

# enter into the elevated nix-shell
sudo nix-shell
ip netns add node1
ip netns exec node1 ip link set dev lo up
ip netns exec node1 npm run polykey -- agent start --node-path=./tmp/node1 --password-file=<(echo 'abc') --verbose

Then run another elevated nix shell to run the npm run polykey -- agent status command.

@emmacasolin (Contributor Author) commented Mar 11, 2022

Since we need to use nix-shell in order to run polykey (and thus create a node), as well as sudo in order to run processes inside a namespace using ip netns exec, we need to nest these using sudo nix-shell. Once inside this elevated nix-shell, we can create a node inside a namespace using the following commands:

# Create a namespace for the node
ip netns add node1
# Bring up its loopback interface
ip netns exec node1 ip link set lo up
# Start the agent
ip netns exec node1 npm run polykey -- agent start --node-path=./tmp/node1 --password-file=<(echo 'abc') --verbose

If we create a veth pair between the node namespace and host namespace, we can run commands from the host that interact with the node in the other namespace, for example:

# Create veth pair (in host namespace)
ip link add veth0 type veth peer name veth1
# Set one end on the node namespace
ip link set veth1 netns node1
# Bring up both ends and assign ip addresses
ip link set veth0 up
ip netns exec node1 ip link set veth1 up
ip addr add 10.0.0.1/24 dev veth0
ip netns exec node1 ip addr add 10.0.0.2/24 dev veth1
# Check status of node from host
npm run polykey -- agent status --node-path=./tmp/node1 --password-file=<(echo 'abc') --client-host=10.0.0.2 --verbose
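To tear down this setup afterwards, deleting the namespace should be enough (a minimal sketch): removing node1 also removes the veth1 end inside it, which destroys its peer veth0 on the host.

# remove the namespace, its veth1 end, and therefore the host's veth0 peer
ip netns delete node1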

@CMCDragonkai (Member)

Some general notes:

  1. Remember that sudo runs an elevated subshell acting as the root user, which affects what environment variables are passed in. In our NixOS configuration, this automatically preserves things like $PATH when you do sudo bash -c 'echo $PATH'. Compare it with echo $PATH. However if you use sudo -i that's not the case, so this only works for immediate commands (non-login shells), not new login & interactive shells. See man sudo and look for the -i section.
  2. It is recommended to use sudo nix-shell then ip netns exec.... This results in a 4-layer shell, going from your default user profile shell, to the sudo shell, to the nix-shell, to the ip netns exec shell. This ensures that all commands run inside that nix-shell will be elevated. Avoid running nix-shell and then running sudo, as this can be confusing.
  3. Remember that files & directories created by the programs running under sudo will end up being owned by the root user. This can cause problems afterwards if you later run without sudo, which is what we normally do. Keep track of the files/directories created in ./tmp and in ./node_modules; you may need to delete them with sudo rm -r before restarting the process if random files/directories are left over from a prior sudo session (see the audit sketch after this list).
  4. Note that pk is just an "alias" for how you are supposed to run the polykey program. In development, you need to use npm run polykey inside a nix-shell, whereas for a deployed/installed polykey you can use pk or polykey after using npm install. If you use npm run build, you have to use ./dist/bin/polykey.js, but you must first do chmod u+x ./dist/bin/polykey.js to give it executable permissions; this would normally be done by npm install after npm publish, but we don't really "install" our development projects.
  5. An alternative to using sudo is to use "user namespaces". To do this, we have to use the unshare command rather than ip netns... etc, which creates root-level namespaces. I haven't tried this yet, but this could enable us to run these tests without having to enter sudo, and also means it could work in our CI/CD.
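Regarding point 3, a quick way to audit for root-owned leftovers before running unprivileged again (a minimal sketch; the paths are the ones mentioned above):

# list files/directories under ./tmp and ./node_modules still owned by root
find ./tmp ./node_modules -user root 2>/dev/null
# anything listed can then be removed with e.g. sudo rm -r ./tmp/node1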

@CMCDragonkai (Member)

Weird error that came up: STDERR logs are not being shown when we use pk agent start --password-file=<(echo 'abc')> --verbose. This also affects the pk agent status command. This is a bug that should be fixed. I have changes already in #326 but not sure if it affects this.

@CMCDragonkai (Member)

@emmacasolin after you have it working with ip netns, please investigate if you can instead use unshare to use user-level namespaces. It seems like it should work for localhost, but not sure how this combines with veth pairs, this may require some googling, so that we can avoid using sudo completely.

@emmacasolin (Contributor Author)

Weird error that came up: STDERR logs are not being shown when we use pk agent start --password-file=<(echo 'abc')> --verbose. This also affects the pk agent status command. This is a bug that should be fixed. I have changes already in #326 but not sure if it affects this.

Found the problem: there was an extra > after the ). The call should instead be pk agent start --password-file=<(echo 'abc') --verbose

@emmacasolin (Contributor Author)

@emmacasolin after you have it working with ip netns, please investigate if you can instead use unshare to use user-level namespaces. It seems like it should work for localhost, but not sure how this combines with veth pairs, this may require some googling, so that we can avoid using sudo completely.

From what I've been able to find, and through a bit of prototyping, I don't think this will work. While unshare --net --user npm run polykey -- agent start --node-path=./tmp/node1 --password-file=<(echo 'abc') --verbose does allow you to create a new network namespace for the node without using sudo, the actual namespace is not at the user level. What this call does is simply create two namespaces (a network namespace and a user namespace) that share the same pid - if you want to manipulate the network namespace you still need sudo to do so.

$ lsns
        NS TYPE   NPROCS    PID USER COMMAND
4026533265 user        2 101459 emma npm
4026533271 net         2 101459 emma npm
$ nsenter -t 101459 -n ip addr
nsenter: reassociate to namespace 'ns/net' failed: Operation not permitted
$ sudo nsenter -t 101459 --net ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

@CMCDragonkai (Member)

@emmacasolin please investigate #148 as part of this as well.

@CMCDragonkai self-requested a review March 11, 2022 06:57
@CMCDragonkai (Member)

@emmacasolin after you have it working with ip netns, please investigate if you can instead use unshare to use user-level namespaces. It seems like it should work for localhost, but not sure how this combines with veth pairs, this may require some googling, so that we can avoid using sudo completely.

From what I've been able to find, and through a bit of prototyping, I don't think this will work. While unshare --net --user npm run polykey -- agent start --node-path=./tmp/node1 --password-file=<(echo 'abc') --verbose does allow you to create a new network namespace for the node without using sudo, the actual namespace is not at the user level. What this call does is simply create two namespaces (a network namespace and a user namespace) that share the same pid - if you want to manipulate the network namespace you still need sudo to do so.

$ lsns
        NS TYPE   NPROCS    PID USER COMMAND
4026533265 user        2 101459 emma npm
4026533271 net         2 101459 emma npm
$ nsenter -t 101459 -n ip addr
nsenter: reassociate to namespace 'ns/net' failed: Operation not permitted
$ sudo nsenter -t 101459 --net ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

So nsenter cannot be used on the polykey agent status command? I'm curious, is nsenter the equivalent of ip netns exec?

@CMCDragonkai (Member)

Try this rootless-containers/slirp4netns#222.

The options of nsenter should be checked.

@CMCDragonkai (Member)

Also do you need --user for unshare?

@emmacasolin (Contributor Author)

So nsenter cannot be used on the polykey agent status command? I'm curious, is nsenter the equivalent of ip netns exec?

nsenter is used to enter a namespace using a program, so it should work with polykey agent status; however, it requires sudo permissions just like ip netns exec. The difference between ip netns exec and nsenter is that ip netns exec can only be used for namespaces created using ip netns, whereas nsenter can be used regardless of how the namespace was created (e.g. namespaces created using unshare). However, nsenter requires the pid of a process inside the namespace you want to enter, which can be difficult to find.
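A minimal sketch for finding that pid with lsns (the npm command name matches the unshare example above):

# list network namespaces with the pid and command of their owning process
lsns --type net --output NS,PID,USER,COMMAND
# narrow it down to the agent's npm process
lsns --type net --output NS,PID,USER,COMMAND | grep npm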

@emmacasolin (Contributor Author)

Try this rootless-containers/slirp4netns#222.

The options of nsenter should be checked.

I could try using --preserve-credentials. -U enters the user namespace that was created rather than the network namespace, so the network options we need for our tests wouldn't be available there.

@emmacasolin (Contributor Author)

Also do you need --user for unshare?

Yes and no. What --user is doing is creating a user namespace (as well as the network namespace specified by setting --net). We don't need to use the user namespace, however if we don't create one then unshare requires sudo.
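A minimal sketch of the difference (true is just a no-op command to test namespace creation):

# fails for an unprivileged user: creating a network namespace needs CAP_SYS_ADMIN
unshare --net true
# succeeds: the new user namespace grants CAP_SYS_ADMIN inside itself first
unshare --user --net true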

@emmacasolin (Contributor Author)

Try this rootless-containers/slirp4netns#222.
The options of nsenter should be checked.

I could try using --preserve-credentials. -U enters the user namespace that was created rather than the network namespace, so the network options we need for our tests wouldn't be available there.

Nope, just tried --preserve-credentials and it still requires sudo

@emmacasolin (Contributor Author)

Ok, I think I've figured out how to set up a rootless network namespace. Essentially, what we need to do is create a new user namespace, and then create the network namespace from inside the user namespace. This also means that any time we want to perform any operations on the network namespace we have to go through the user namespace. Aside from that, the only other downside is that the only way to identify these namespaces is through their pid, which isn't easy to find (what I've been doing is running lsns and then looking for the most recent pid that is shared by a network and user namespace and is associated with the command npm).

In terminal 1

# Create a new user namespace (--user) and map the current user to root inside it (--map-root-user)
# Inside the user namespace create a new network namespace (--net) and start a polykey agent inside of it
unshare --user --map-root-user unshare --net npm run polykey -- agent start --node-path=./tmp/node5 --password-file=<(echo 'abc') --verbose

In terminal 2

# Before this you need to find the pid of the namespaces using lsns
# Here the pid is 154500
# Enter the user namespace with pid 154500, preserving the credentials we gave it earlier (so we can enter the network namespace) (--target 154500 --user --preserve-credentials)
# Then enter the network namespace with pid 154500 from the user namespace we just entered and run the ip addr command
nsenter --target 154500 --user --preserve-credentials nsenter --target 154500 --net ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

# Can also run other commands here that work in network namespaces, e.g.
nsenter -t 154500 -U --preserve-credentials nsenter -t 154500 -n ip link set lo up
nsenter -t 154500 -U --preserve-credentials nsenter -t 154500 -n ip addr          
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
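Since the double nsenter is easy to get wrong, it could be wrapped in a small hypothetical helper (a sketch, assuming the pid has already been found via lsns):

# enter the user namespace (preserving credentials), then the network namespace
netns_exec() {
  local pid="$1"; shift
  nsenter --target "$pid" --user --preserve-credentials \
    nsenter --target "$pid" --net "$@"
}

# usage
netns_exec 154500 ip addr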

@CMCDragonkai (Member)

  1. Use spawn based off pkSpawn for long-running/non-terminating processes like agent start.
  2. It will be easier to get the node id and other information as soon as you do an agent start. This is available in Testnet Deployment #326. You can create a temporary branch off commit 2df87bb and then rebase this on top: git branch testnetpreptmp 2df87bb.

@CMCDragonkai (Member)

Then you'd use pk agent start with --format=json too to get the output in JSON format so you can parse the JSON. Check STDOUT, I think it goes to STDOUT while logs go to STDERR. And remember to use spawn!

Look at the existing tests to see how to process stdout/stderr streams from pkSpawn. The first line of output should be JSON.
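A rough sketch of what that looks like from the shell (assuming, as described, that the JSON status object is the first line on STDOUT while logs go to STDERR; jq is only used here for illustration):

# separate the JSON status (STDOUT) from the logs (STDERR)
npm run polykey -- agent start --node-path=./tmp/node1 \
  --password-file=<(echo 'abc') --format=json > status.json 2> agent.log &
# once the agent is up, the first line should parse as JSON
head -n 1 status.json | jq .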

@CMCDragonkai (Member)

It appears that master has already changed to getHost and getPort, so I should fix that up in #326, you can rebase on top again after squashing.

@CMCDragonkai (Member) commented Mar 14, 2022

BTW the nsenter and unshare programs are part of the utillinux package. This is installed in our NixOS configuration, but best practice would be to have it in our shell.nix as well under nativeBuildInputs. Just add utillinux at the end of the list so that the shell specifies the installation of these additional packages, which will be necessary when running these specific tests.

@CMCDragonkai (Member)

Finally it makes sense that to use a network namespace, we have to first create a user namespace. It's explained in this answer: https://unix.stackexchange.com/a/618956/56970.

Basically you need to first namespace the uid and gid, so you can masquerade as the root user within the user namespace. Then finally inside that namespace you can create the underlying network namespace.
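This is easy to confirm (a minimal sketch): with --map-root-user, uid 0 inside the namespace is mapped back to the real uid, which /proc/self/uid_map shows as a single "0 <real-uid> 1" line.

# outside: the real uid (e.g. 1000); inside: uid 0, mapped via /proc/self/uid_map
id -u
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'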

BTW this should mean you don't need to use nsenter twice right?

@emmacasolin (Contributor Author)

Rebased on 2df87bb in order to bring in the change from the Testnet deployment branch where pk agent start returns status info on startup.

@emmacasolin (Contributor Author)

Finally it makes sense that to use a network namespace, we have to first create a user namespace. It's explained in this answer: https://unix.stackexchange.com/a/618956/56970.

Basically you need to first namespace the uid and gid, so you can masquerade as the root user within the user namespace. Then finally inside that namespace you can create the underlying network namespace.

BTW this should mean you don't need to use nsenter twice right?

You do still need to use nsenter twice, since you need to enter the user namespace and then enter the network namespace. Unless you were to nsenter into a shell in the user namespace and run all subsequent commands from within that, although you're still having to nsenter twice.

@CMCDragonkai (Member)

Note that without the MASQUERADE rules, no conntrack entries are ever created. So this has to do with the MASQUERADE rules and the fact that they are creating a conntrack entry for packets coming from the external interface, which shouldn't happen.

@CMCDragonkai (Member) commented Apr 7, 2022

@tegefaulkes you're still on the old Linux right? It hasn't been updated to the latest Vostro 5410.

Can you try and see if the old iptables behaves differently?

Do this:

ip netns add agent1
ip netns add router1
ip netns add router2
ip netns add agent2

# show all network namespaces
ip netns list

# creates an interface a1 (otherside is r1-int)
ip netns exec agent1 ip link add a1 type veth peer name r1-int
# set r1-int to router1
ip netns exec agent1 ip link set r1-int netns router1

# creates an interface r1-ext (otherside is r2-ext)
ip netns exec router1 ip link add r1-ext type veth peer name r2-ext
# sets r2-ext to router2
ip netns exec router1 ip link set r2-ext netns router2

# creates an interface r2-int (otherside is a2)
ip netns exec router2 ip link add r2-int type veth peer name a2
# sets a2 to agent2
ip netns exec router2 ip link set a2 netns agent2

# set up interfaces for agent1
ip netns exec agent1 ip link set lo up
ip netns exec agent1 ip link set a1 up
ip netns exec agent1 ip addr add 10.0.0.2/24 dev a1
ip netns exec agent1 ip route add default via 10.0.0.1

# set up interfaces for router1
ip netns exec router1 ip link set lo up
ip netns exec router1 ip link set r1-int up
ip netns exec router1 ip link set r1-ext up
ip netns exec router1 ip addr add 10.0.0.1/24 dev r1-int
ip netns exec router1 ip addr add 192.168.0.1/24 dev r1-ext
ip netns exec router1 ip route add default via 192.168.0.2

# setup interfaces for router2
ip netns exec router2 ip link set lo up
ip netns exec router2 ip link set r2-int up
ip netns exec router2 ip link set r2-ext up
ip netns exec router2 ip addr add 10.0.0.1/24 dev r2-int
ip netns exec router2 ip addr add 192.168.0.2/24 dev r2-ext
ip netns exec router2 ip route add default via 192.168.0.1

# setup interfaces for agent2
ip netns exec agent2 ip link set lo up
ip netns exec agent2 ip link set a2 up
ip netns exec agent2 ip addr add 10.0.0.2/24 dev a2
ip netns exec agent2 ip route add default via 10.0.0.1

# Setting up port-restricted NAT for both routers (mapping will always be to port 55555 for easier testing)
ip netns exec router1 iptables -t nat -A POSTROUTING -p udp -o r1-ext -j MASQUERADE --to-ports 55555
ip netns exec router2 iptables -t nat -A POSTROUTING -p udp -o r2-ext -j MASQUERADE --to-ports 55555

Then on 2 terminals:

$ ip netns exec agent1 nc -u -p 55555 192.168.0.2 55555
FIRST
$ ip netns exec agent2 nc -u -p 55555 192.168.0.1 55555
SECOND

Then on a third terminal, check:

# make sure to use sudo here
sudo ip netns exec router1 conntrack -L 
sudo ip netns exec router2 conntrack -L 

Can you post the output of both commands? We are seeing something like:

$ sudo ip netns exec router1 conntrack -L
udp      17 14 src=10.0.0.2 dst=192.168.0.2 sport=55555 dport=55555 [UNREPLIED] src=192.168.0.2 dst=192.168.0.1 sport=55555 dport=55555 mark=0 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

$ sudo ip netns exec router2 conntrack -L
udp      17 16 src=192.168.0.1 dst=192.168.0.2 sport=55555 dport=55555 [UNREPLIED] src=192.168.0.2 dst=192.168.0.1 sport=55555 dport=55555 mark=0 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

And we would like to not see any conntrack entries on router2.

Use ip -all netns delete to remove all the infra above.

@CMCDragonkai (Member) commented Apr 7, 2022

Some possible directions:

  1. Find out if iptables legacy had a different behaviour
  2. What if you create the veth pairs outside in the host namespace and then add them into the namespaces?
  3. What changes to the iptables rules would ensure we only "MASQUERADE" packets coming from the internal interfaces?
    • Tried with -s 10.0.0.0/24, still didn't work
    • If a conntrack entry gets created for a packet from the external interface, this basically breaks things for packets coming from the internal interface. We must not create conntrack entries for packets coming from the external interface!
  4. Maybe the routing table should be configured differently to prevent packets coming out of the external interface triggering the "MASQUERADE" rule?

@CMCDragonkai (Member)

Pending @tegefaulkes' test on iptables legacy, we can then post a question about why and how to prevent iptables from creating conntrack entries for the external interface. If we can fix this, I reckon the whole system should work.

@tegefaulkes (Contributor)

the results I got were

$ sudo ip netns exec router1 conntrack -L
udp      17 20 src=10.0.0.2 dst=192.168.0.2 sport=55555 dport=55555 [UNREPLIED] src=192.168.0.2 dst=192.168.0.1 sport=55555 dport=55555 mark=0 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.
$ sudo ip netns exec router2 conntrack -L
udp      17 15 src=192.168.0.1 dst=192.168.0.2 sport=55555 dport=55555 [UNREPLIED] src=192.168.0.2 dst=192.168.0.1 sport=55555 dport=55555 mark=0 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

@CMCDragonkai (Member)

So yea, iptables-legacy has the same issue @emmacasolin. Can you also try with your nftable rules instead and see if you have more control there?

@CMCDragonkai (Member) commented Apr 7, 2022

@emmacasolin according to my gist: https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f

There are a couple of assumptions here.

  1. That net.ipv4.conf.all.forwarding is enabled. But how do we know this to be true for network namespaces and CI/CD? Since we cannot control these settings. Are we sure this all works even without forwarding enabled?
  2. In my original system I actually had forwarding rules also and the -s limitation:
    # use firewall rules to route packets between the internet interface and veth0 on the host
    # masquerade makes it so that packets going from namespace subnet will be made to seem as 
    # if the packet came from the internet_host
    iptables -t nat -A POSTROUTING -s "${namespace_subnet}${namespace_cidr}" -o $internet_host -j MASQUERADE
    iptables -A FORWARD -i $internet_host -o $veth_host -j ACCEPT
    iptables -A FORWARD -o $internet_host -i $veth_host -j ACCEPT
    
  3. Additionally I had actual rules to drop packets from the outside world...
    # use firewall rules inside the namespace to allow packets going to the gateway and DNS
    # but disallow packets heading to local and private networks
    ip netns exec $namespace iptables -A OUTPUT -d "$veth_host_ip" -j ACCEPT
    ip netns exec $namespace iptables -A OUTPUT -d "$dns_host" -j ACCEPT
    ip netns exec $namespace iptables -A OUTPUT -d 127.0.0.0/8 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 10.0.0.0/8 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 172.16.0.0/12 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 192.168.0.0/16 -j DROP
    

Why are FORWARD rules not necessary anymore? Maybe incorporating these would help?

I think you should also make the NAT work for TCP too and drop the UDP requirement.

@CMCDragonkai (Member)

Our test should check if the forwarding options are enabled https://unix.stackexchange.com/a/348534/56970

Before running the test:

[nix-shell:~/Projects/js-polykey]$ sysctl net.ipv4.conf.all.forwarding
net.ipv4.conf.all.forwarding = 1

[nix-shell:~/Projects/js-polykey]$ sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 0

You need them in "router mode", otherwise the tests might not work. I haven't tested with those turned off though; you can check what happens with the pre-flight sketch below.
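A minimal pre-flight check the tests could run (a sketch, assuming sysctl is available in the test environment):

# skip or fail early if the kernel is not in "router mode"
[ "$(sysctl -n net.ipv4.conf.all.forwarding)" = "1" ] || {
  echo 'ipv4 forwarding disabled; enable with: sudo sysctl -w net.ipv4.conf.all.forwarding=1' >&2
  exit 1
}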

@emmacasolin (Contributor Author)

@emmacasolin according to my gist: https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f

There's a couple assumptions here.

  1. That net.ipv4.conf.all.forwarding is enabled. But how do we know this to be true for network namespaces and CI/CD? Since we cannot control these settings. Are we sure this all works even without forwarding enabled?

During my original testing packet forwarding was always enabled by default on a new namespace (I just tested this again and it is indeed the case). I haven't checked in the CI/CD yet though.

  2. In my original system I actually had forwarding rules also and the -s limitation:
    # use firewall rules to route packets between the internet interface and veth0 on the host
    # masquerade makes it so that packets going from namespace subnet will be made to seem as 
    # if the packet came from the internet_host
    iptables -t nat -A POSTROUTING -s "${namespace_subnet}${namespace_cidr}" -o $internet_host -j MASQUERADE
    iptables -A FORWARD -i $internet_host -o $veth_host -j ACCEPT
    iptables -A FORWARD -o $internet_host -i $veth_host -j ACCEPT
    

Those FORWARD rules are being added to the host namespace, not a network namespace. This might be because the default policy for the FORWARD chain might not be ACCEPT on the host, so those rules ensure that packets can be sent between the internet-connected interface and the namespace-connected interface on the host. In a brand new namespace, the default policy for all chains is ACCEPT, so these rules wouldn't change anything.
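This is easy to verify (a minimal sketch): iptables -S prints each chain's default policy as -P lines, and a fresh namespace shows ACCEPT for all of them.

# print rules and default policies inside the namespace
ip netns exec router1 iptables -S
# expected on a fresh namespace:
# -P INPUT ACCEPT
# -P FORWARD ACCEPT
# -P OUTPUT ACCEPT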

  3. Additionally I had actual rules to drop packets from the outside world...
    # use firewall rules inside the namespace to allow packets going to the gateway and DNS
    # but disallow packets heading to local and private networks
    ip netns exec $namespace iptables -A OUTPUT -d "$veth_host_ip" -j ACCEPT
    ip netns exec $namespace iptables -A OUTPUT -d "$dns_host" -j ACCEPT
    ip netns exec $namespace iptables -A OUTPUT -d 127.0.0.0/8 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 10.0.0.0/8 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 172.16.0.0/12 -j DROP
    ip netns exec $namespace iptables -A OUTPUT -d 192.168.0.0/16 -j DROP
    

Those rules are disallowing outgoing packets, not incoming packets. But even if they were for incoming packets, since our namespaces aren't connected to the internet (or anything besides the other namespaces) it would be impossible for packets from the "outside world" to ever arrive on a namespace in the first place.

Why are FORWARD rules not necessary anymore? Maybe incorporate these and this may help?

I think you should also make the NAT work for TCP too and drop the UDP requirement.

This would require a separate rule to match TCP packets, since the --to-ports option requires the protocol to be specified as either TCP or UDP (I don't think you can match both TCP and UDP in the same rule).
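So covering both protocols would look something like this (a sketch for router1; router2 would mirror it):

# --to-ports needs an explicit protocol, so one rule per protocol
ip netns exec router1 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -p udp -o r1-ext -j MASQUERADE --to-ports 55555
ip netns exec router1 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -p tcp -o r1-ext -j MASQUERADE --to-ports 55555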

@emmacasolin (Contributor Author) commented Apr 8, 2022

I've found the solution to our problem courtesy of this article: https://blog.cloudflare.com/conntrack-tales-one-thousand-and-one-flows/

Going back to the diagram of how packets travel through the iptables tables and chains:
[Diagram: packet flow through the iptables tables and chains]

We know that forwarded packets always travel down the route on the right (i.e. raw: PREROUTING -> mangle: PREROUTING -> nat: PREROUTING -> mangle: FORWARD -> filter: FORWARD -> mangle: POSTROUTING -> nat: POSTROUTING), while packets that are destined for the local host go down the other side.

However, the weird packets that we were seeing in the conntrack entries looked something like this:

$ sudo ip netns exec router2 conntrack -L
udp      17 16 src=192.168.0.1 dst=192.168.0.2 sport=55555 dport=55555 [UNREPLIED] src=192.168.0.2 dst=192.168.0.1 sport=55555 dport=55555 mark=0 use=1
conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

The destination address of the packet is 192.168.0.2, which is Router 2's address. So this is a packet destined for the local network, which is not something we want.

The solution is to simply drop any packets that go down that "locally-destined" route with something simple like

ip netns exec router1 iptables -A INPUT -j DROP
ip netns exec router2 iptables -A INPUT -j DROP

And that's all we need! Our original MASQUERADE rules now work as expected (no need for -s), and can be tested using the same netcat test setup we were using before. Looking at the conntrack table now shows no sign of the weird packets anymore, and we can send return packets back from Router 2 to Router 1 after Router 1 sends its first packet (which gets dropped due to Router 2's NAT).

Note that with this fix we are no longer sending ICMP destination unreachable packets, however these are not necessarily needed for NAT anyway.

@CMCDragonkai (Member)

That's a good workaround. However, I believe that isn't the root cause of this problem, and the solution could affect situations where routers are also running their own services. This is not an issue for our NAT testing, so we can proceed with this solution. However we should continue to analyse this further (in the background); maybe someone will come up with a proper answer on Stack Exchange.

@CMCDragonkai (Member) commented Apr 8, 2022

Ok the better diagram is actually this: https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg


Here's the theory:

  1. The existence of the weird conntrack entry was preventing packets from agent2 because conntrack makes iptables think the port is occupied. The goal is to prevent this entry from being created.
  2. The default policy for each chain in every table is ACCEPT. This is so that by default, there is no firewall whatsoever. It is possible to change the default policy, however we should prefer to use an explicit DROP rule at the very end to reduce confusion.
  3. Packets entering router2 from the WAN or LAN interfaces go through conntrack (which creates the entry) before reaching the routing decision that sends them to the input or forward chains.
  4. The FIRST packet sent from agent1 would go to the input chains, not the forward chains. This is because there's no logic to forward that packet anywhere else; it's just destined for router2 itself.
  5. Upon reaching the input chains, since there is no local process, the packet just stops there and is discarded. However the default behaviour is to REJECT packets with an ICMP unreachable.
  6. Therefore the solution is to in fact drop (maybe reject?) packets in the input chains. The right place to do this is in the filter table, which is the default table. By dropping the packets, we prevent the conntrack entry from being created. I'm not sure if rejecting them would also prevent the conntrack entry, since the default behaviour was to reject, and that ended up creating a conntrack entry.
  7. This was proven by also dropping packets in the mangle input chain as well, and the same behaviour was observed. The result was that no conntrack entry was created.
  8. However dropping all packets in the input chain is too broad. In production routers, they would be running local processes to run services like DNS, DHCP, FTP, NTP... etc. Therefore we need to accept selectively.
  9. We should accept all packets coming in from the internal LAN interface with ip netns exec routerX iptables -A INPUT -i rX-int -j ACCEPT.
  10. We should also accept all packets that are already established or related according to conntrack. This allows the router itself to contact external services and get responses. This is done with ip netns exec routerX iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT.
  11. Finally drop all other packets. Note without ICMP unreachable, nc on agent1 won't terminate. However this is the same behaviour for production routers that don't bother rejecting packets.
  12. Our MASQUERADE rule should also be tweaked with -s 10.0.0.0/24 simply to ensure that we only masquerade packets sourced from the LAN. There's no need to masquerade packets sourced from the router itself.

Remember that iptables rules are applied in order. So -A appends, and you must ensure the correct order of execution.

The end result is changing the iptables rules to:

# accept from internal interface for both routers
ip netns exec router1 iptables -A INPUT -i r1-int -j ACCEPT
ip netns exec router2 iptables -A INPUT -i r2-int -j ACCEPT
# accept from where packets are "established" or related... (including external interface)
ip netns exec router1 iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
ip netns exec router2 iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# drop all other packets
ip netns exec router1 iptables -A INPUT -j DROP
ip netns exec router2 iptables -A INPUT -j DROP

# SNAT packets that are coming in from LAN and going out to the external interface
ip netns exec router1 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -p udp -o r1-ext -j MASQUERADE --to-ports 55555
ip netns exec router2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -p udp -o r2-ext -j MASQUERADE --to-ports 55555

Now you can punch through from agent1 to agent2, as well as send packets from the agents to the routers and vice-versa. Routers can also punch through to other routers.

@CMCDragonkai (Member) commented Apr 8, 2022

When you start testing with a third-party seed agent, you would remove the --to-ports option entirely (since the third-party agent would be signalling the mapped ports), and you also don't need the -p option either, since SNAT would work on all packets.

The expectation is that, since MASQUERADE behaves like SNAT with respect to port mapping, the same mapped port will continue to be used. Applying --random would change it to symmetric NAT, which we don't want here.
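A sketch of the two mapping behaviours side by side (for router1, with -p and --to-ports dropped as described above):

# endpoint-independent mapping (port-restricted cone): SNAT keeps the mapping stable
ip netns exec router1 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o r1-ext -j MASQUERADE
# endpoint-dependent mapping: --random randomises the source port per connection,
# which simulates symmetric NAT (useful for the symmetric NAT tests, unwanted here)
ip netns exec router1 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o r1-ext -j MASQUERADE --random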

@emmacasolin (Contributor Author)

NAT tests using a manual seed node are now passing for all types of NAT both locally and in the CI/CD. This is now blocked until #326 is merged, which will bring in node graph adding policies (needed to write tests involving proper symmetric NAT), relaying (needed to test symmetric NAT traversal), and the ability to write tests using third-party seed agents.

Comment on lines +15 to +40
const agent1Host = 'agent1';
const agent2Host = 'agent2';
const agent1RouterHostInt = 'router1-int';
const agent1RouterHostExt = 'router1-ext';
const agent2RouterHostInt = 'router2-int';
const agent2RouterHostExt = 'router2-ext';
const router1SeedHost = 'router1-seed';
const router2SeedHost = 'router2-seed';
const seedRouter1Host = 'seed-router1';
const seedRouter2Host = 'seed-router2';
// Subnets
const agent1HostIp = '10.0.0.2';
const agent2HostIp = '10.0.0.2';
const agent1RouterHostIntIp = '10.0.0.1';
const agent2RouterHostIntIp = '10.0.0.1';
const agent1RouterHostExtIp = '192.168.0.1';
const agent2RouterHostExtIp = '192.168.0.2';
const router1SeedHostIp = '192.168.0.1';
const seedHostIp = '192.168.0.3';
const router2SeedHostIp = '192.168.0.2';
// Subnet mask
const subnetMask = '/24';
// Ports
const agent1Port = '55551';
const agent2Port = '55552';
const mappedPort = '55555';
Member

These should be notated as constants. We usually write our constants like this:

const ABC_DEF = 'blah';

Also use docblock comments for these. You can use something like this:

/**
 * ...
 */

@@ -59,7 +59,7 @@ test $test_dir:
  interruptible: true
  script:
    - >
-     nix-shell -I nixpkgs=./pkgs.nix --packages nodejs --run '
+     nix-shell -I nixpkgs=./pkgs.nix --packages nodejs iproute2 utillinux nftables iptables --run '
Member

Since we are using iptables-legacy, this can just be changed to iptables-legacy.

@CMCDragonkai (Member)

Once this gets rebased on top of the merged #326, this PR can also include additional tests to the testnet.polykey.io as well. The test for testnet.polykey.io should be separated out, and if testnet.polykey.io is not contactable, then this should not be a test failure, but rather a "test debugging state". Just check if jest has a "third state" beyond success or failure. Maybe a conditional test can be done.

@emmacasolin mentioned that there may be a missing test that uses its own generated seed node. That should be checked.

The nftable related code went into the wiki. Our tests here will continue to use iptables-legacy. This works fine in our NixOS environments, and also in CI/CD. We can switch over in the future when both CI/CD and our NixOS environments start to use nftables too.

@tegefaulkes mentioned this pull request Jun 1, 2022
@CMCDragonkai (Member) commented Jun 3, 2022

The new CI/CD changes from demo-libs should be merged in first before this gets merged. This way we can observe the CI/CD doing integration testing, and also the deployment jobs. The testnet deployment can then help tasks 5 and 6 to be completed.

@CMCDragonkai (Member)

At this point we should also consider bringing in the benchmarking system we have, so we can bench our network too.

@CMCDragonkai (Member)

Closed in favour of #381.
