
[BUGFIX] fix iptables rules to make them idempotent #1214

Conversation

@MorningLightMountain713 (Contributor) commented Feb 10, 2024

Not sure what your workflow is. Usually I would branch off master for a bugfix and merge back into master (and rebase or merge onto development), but I have branched off develop here as I believe you will merge this into the next release.

Fixes #1213

Regarding the existing extra rules, I think they're okay to leave; they will get removed by attrition over time as nodes reboot, etc. (iptables rules get flushed on reboot). It's non-trivial to remove them manually as there are rules in between. A new rule only gets added when the FluxOS service is restarted - I only noticed because I'm doing testing on one of my nodes for some other work and have restarted many times.

I've updated this function to include a check for each iptables command using the -C option:

       -C, --check chain rule-specification
              Check whether a rule matching the specification does exist in the selected chain. This command uses the same logic as -D to find a matching entry, but does not alter the existing iptables configuration and uses its exit code to indicate success or failure.

There was also a problem where, if someone deleted one rule, it was impossible to know where to insert new rules, as you can't have the DROP rule first.

I've implemented a FLUX chain and jump to it immediately from the FORWARD chain. This way, we can INSERT the ACCEPT rules and APPEND the DROP rule. Of note, once the end of the FLUX chain is reached, there is an implicit RETURN back to the FORWARD chain, and rule evaluation continues if no match is found.
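
A rough sketch of that check-then-add pattern in Node (illustrative only - the exec helper, the FLUX rule strings, and the 172.16.32.0/24 host subnet are assumptions, not the PR's actual code):

    const util = require('util');
    const exec = util.promisify(require('child_process').exec);

    // Returns true if the iptables command exits 0, false otherwise.
    async function run(cmd) {
      try {
        await exec(cmd);
        return true;
      } catch (err) {
        return false;
      }
    }

    async function ensureFluxChain() {
      // Create the FLUX chain (a non-zero exit here typically just means it
      // already exists), then make sure FORWARD jumps to it.
      await run('sudo iptables -N FLUX');
      if (!(await run('sudo iptables -C FORWARD -j FLUX'))) {
        await run('sudo iptables -I FORWARD -j FLUX');
      }

      // INSERT the ACCEPT rule only if -C reports it as missing...
      const accept = 'FLUX -s 172.23.0.0/16 -d 172.16.32.0/24 -m state --state RELATED,ESTABLISHED -j ACCEPT';
      if (!(await run(`sudo iptables -C ${accept}`))) await run(`sudo iptables -I ${accept}`);

      // ...and APPEND the DROP rule so it always sits at the end of the chain.
      const drop = 'FLUX -s 172.23.0.0/16 -d 172.16.32.0/24 -j DROP';
      if (!(await run(`sudo iptables -C ${drop}`))) await run(`sudo iptables -A ${drop}`);
    }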

I've only done quick testing on this (I'm pretty sure it's good) by running the function on one of my nodes. I have to pop out for a couple of hours and will deploy it properly on a node when I get back.

@MorningLightMountain713 (Contributor Author) commented Feb 10, 2024

This needs a total rework - I would suggest disabling the following (in serviceManager) until a proper fix is applied:

      fluxNetworkHelper.removeDockerContainerAccessToHost();

This is only blocking access for containers that run on the default docker network - which Flux containers don't - so all it is currently doing is adding iptables rules every time FluxOS restarts.

Ideally, it needs to be done per docker network so each bridge would have its own filter rules. This can be done; it just means that every time a network is created / removed, iptables needs to be updated.

However, there is actually a much easier way to do this: if you remove the -i docker0 interface from the commands and just go with the assumption that all Flux networks are subnets of 172.23.0.0/16, you don't need to specify the interface, just the source. It just means the host can't use 172.23.0.0/16 - but that assumption has already been made, as these networks are hardcoded.

I'll go with the latter for now.

@MorningLightMountain713 (Contributor Author) commented:

Okay, I've fixed it.

It now works like it should, as long as Flux continues to use 172.23.0.0/16 for app networks. Also, if someone deletes the DROP rule, it will get appended instead of inserted (which would kill DNS if the host is using the local router for DNS).

Of note, it looks like FluxOS is starting before docker, as FluxOS inserts the docker chain rules.

Flux logs on reboot:

2024-02-10T16:38:38.659Z          DOCKER-USER iptables chain created
2024-02-10T16:38:38.673Z          New FORWARDED inserted to jump to DOCKER-USER chain in iptables
2024-02-10T16:38:38.699Z          Access to host from containers removed
2024-02-10T16:38:38.727Z          Access to containers from host accepted
2024-02-10T16:38:39.009Z          Access to host DNS from containers accepted
2024-02-10T16:38:39.616Z          Firewall adjusted for UPNP

Flux logs on FluxOS restart:

2024-02-10T16:35:30.854Z          DOCKER-USER iptables chain already created
2024-02-10T16:35:30.861Z          jump to DOCKER-USER chain already enabled in iptables
2024-02-10T16:35:30.872Z          Access to host from containers already removed
2024-02-10T16:35:30.883Z          Access to containers from host already accepted
2024-02-10T16:35:30.895Z          Access to host DNS from containers already accepted

iptables:

-A INPUT -j ufw-track-input
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
<snip>
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.32.0/24 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.32.0/24 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.32.0/24 -j DROP
-A DOCKER-USER -j RETURN

@MorningLightMountain713 (Contributor Author) commented Feb 10, 2024

Here is the container being blocked from my router (and home network) while still being able to ping google.com.

/app # ping -W 1 -c 3 google.com
PING google.com (142.250.200.46): 56 data bytes
64 bytes from 142.250.200.46: seq=0 ttl=114 time=14.068 ms
64 bytes from 142.250.200.46: seq=1 ttl=114 time=8.605 ms
64 bytes from 142.250.200.46: seq=2 ttl=114 time=12.328 ms

--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 8.605/11.667/14.068 ms

/app # ping -W 1 -c 3 172.16.32.1
PING 172.16.32.1 (172.16.32.1): 56 data bytes

--- 172.16.32.1 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

route to router on host:

davew@chud:~$ ip route
default via 172.16.32.1 dev ens18 proto static
172.16.32.0/24 dev ens18 proto kernel scope link src 172.16.32.12
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.23.191.0/24 dev br-72d1725e481c proto kernel scope link src 172.23.191.1
172.23.210.0/24 dev br-048fde111132 proto kernel scope link src 172.23.210.1

ping from host to container:

davew@chud:~$ ping 172.23.191.2
PING 172.23.191.2 (172.23.191.2) 56(84) bytes of data.
64 bytes from 172.23.191.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 172.23.191.2: icmp_seq=2 ttl=64 time=0.042 ms
^C
--- 172.23.191.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1020ms
rtt min/avg/max/mdev = 0.042/0.086/0.131/0.044 ms
davew@chud:~$

ip address of container:

/app # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:17:bf:02 brd ff:ff:ff:ff:ff:ff
    inet 172.23.191.2/24 brd 172.23.191.255 scope global eth0
       valid_lft forever preferred_lft forever
/app #

@MorningLightMountain713 (Contributor Author) commented:

I'll do a full app test tomorrow - unless someone else wants to verify

@MorningLightMountain713 (Contributor Author) commented:

I got rid of all the shell wizardry in favor of using the ip -json route command, which gives up its secrets as a JSON string.
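
For context, a minimal sketch of that approach (illustrative, not the actual fluxNetworkHelper code):

    const util = require('util');
    const exec = util.promisify(require('child_process').exec);

    // Parse the JSON emitted by `ip -json route` instead of grepping/awking text.
    async function getDefaultRoute() {
      const { stdout } = await exec('ip -json route');
      const routes = JSON.parse(stdout);
      // Entries look like: { dst: 'default', gateway: '172.16.32.1', dev: 'ens18', ... }
      return routes.find((route) => route.dst === 'default') || null;
    }

    getDefaultRoute().then((route) => {
      if (route) console.log(`default via ${route.gateway} dev ${route.dev}`);
    });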

}

const fluxSrc = '172.23.0.0/16';
Member commented:

Each app creates its own network using:

    const fluxNetworkOptions = {
      Name: `fluxDockerNetwork_${appname}`,
      IPAM: {
        Config: [{
          Subnet: `172.23.${number}.0/24`,
          Gateway: `172.23.${number}.1`,
        }],
      },
    };

That's why I don't understand this /16.

Contributor Author replied:


This is network subnetting. Flux docker networks allocate out of the 172.23.0.0/16 range. I.e., like you say, a Flux app docker network will have any /24 that is 172.23.x.0. These are all encapsulated by the /16.

172.23.0.0/16 is a "supernet" of the 172.23.x.0/24 networks.

What this means is that any Flux network will be included in the 172.23.0.0/16 subnet, which is why we can use it as the source address in the iptables rules. It would be better to do this per docker interface (br interface), but that is considerably more work.

It's just saying, "I want to match any address that falls in this range"
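
For illustration, a tiny standalone check (not part of the PR) showing the containment: a /16 only compares the first 16 bits, so every 172.23.x.0/24 address matches 172.23.0.0/16.

    // Sketch: is an address inside a given subnet?
    function toInt(ip) {
      return ip.split('.').reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
    }

    function inSubnet(ip, subnet, prefixLength) {
      const mask = prefixLength === 0 ? 0 : (~0 << (32 - prefixLength)) >>> 0;
      return (toInt(ip) & mask) === (toInt(subnet) & mask);
    }

    console.log(inSubnet('172.23.191.2', '172.23.0.0', 16)); // true  - a Flux app network address
    console.log(inSubnet('172.16.32.1', '172.23.0.0', 16));  // false - the host network gateway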

@MorningLightMountain713 (Contributor Author) commented:

Ps - I'll do up some tests for this after lunch (I'm on GMT)

@MorningLightMountain713 (Contributor Author) commented Feb 11, 2024

Thinking more about this - we can do this a lot more easily. It's not really the host network we're trying to block here - it's container access to non-routable address space (except the Flux container network, DNS, and established connections).

I.e. we should be blocking all RFC1918 private address space. There is no legitimate reason that a container should connect to this address space.

RFC1918 address space is as follows:

  • 10.0.0.0/8
  • 172.16.0.0/12
  • 192.168.0.0/16

Blocking the host network doesn't remedy the problem 100% - the node provider may have other private networks that they are routing, which would still be accessible. Granted, it would be hard for a container to know about them, but a determined operator could find them. Also, there is no guarantee that the host only has one default gateway, which makes things unpredictable.

I propose the following:

  • Allow containers to 172.23.0.0/16 (Flux docker network)
  • Allow containers to 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 for DNS
  • Deny all other traffic destined to 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.

Doing this means we can eliminate the requirement to find the default gateway and the shenanigans involved in determining the host net.

@MorningLightMountain713 (Contributor Author) commented Feb 11, 2024

I've updated it - no longer reliant on the ip route command. This is more robust and will protect all private operator networks.

davew@chud:~/zelflux$ node test_newcmd.js
IPTABLES: DOCKER-USER chain created
IPTABLES: New rule in FORWARD inserted to jump to DOCKER-USER chain
IPTABLES: Access to Flux containers from 10.0.0.0/8 accepted
IPTABLES: DNS access to 10.0.0.0/8 from Flux containers accepted
IPTABLES: Access to 10.0.0.0/8 from Flux containers removed
IPTABLES: Access to Flux containers from 172.16.0.0/12 accepted
IPTABLES: DNS access to 172.16.0.0/12 from Flux containers accepted
IPTABLES: Access to 172.16.0.0/12 from Flux containers removed
IPTABLES: Access to Flux containers from 192.168.0.0/16 accepted
IPTABLES: DNS access to 192.168.0.0/16 from Flux containers accepted
IPTABLES: Access to 192.168.0.0/16 from Flux containers removed

Second run:

davew@chud:~/zelflux$ node test_newcmd.js
IPTABLES: DOCKER-USER chain already created
IPTABLES: Jump to DOCKER-USER chain already enabled
IPTABLES: Access to Flux containers from 10.0.0.0/8 already accepted
IPTABLES: DNS access to 10.0.0.0/8 from Flux containers already accepted
IPTABLES: Access to 10.0.0.0/8 from Flux containers already removed
IPTABLES: Access to Flux containers from 172.16.0.0/12 already accepted
IPTABLES: DNS access to 172.16.0.0/12 from Flux containers already accepted
IPTABLES: Access to 172.16.0.0/12 from Flux containers already removed
IPTABLES: Access to Flux containers from 192.168.0.0/16 already accepted
IPTABLES: DNS access to 192.168.0.0/16 from Flux containers already accepted
IPTABLES: Access to 192.168.0.0/16 from Flux containers already removed

I'll do up some tests now, then throw it on a node for e2e.

@MorningLightMountain713 (Contributor Author) commented Feb 11, 2024

I think it also needs to be made clear to node operators, in whatever documentation they use, that the 172.23.0.0/16 network is reserved for Flux use only (and has been since before this pull).

@MorningLightMountain713 (Contributor Author) commented Feb 11, 2024

I need to check access to the 172.23.0.0/16 network - in theory I don't think it's needed, as you're using layer 2 - you don't need to cross the bridge (and iptables) to route within your own network. Will test a bit later.

@MorningLightMountain713 (Contributor Author) commented Feb 11, 2024

> I need to check access to the 172.23.0.0/16 network - in theory I don't think it's needed, as you're using layer 2 - you don't need to cross the bridge (and iptables) to route within your own network. Will test a bit later.

I was wrong here. Iptables was dropping this traffic, so I've added an allow rule for it. Also, I had to remove docker's explicit -A DOCKER-USER -j RETURN, as this gets re-added to the end every time a new network is created, which means some drop rules can end up after it - there is an implicit return anyway, so it's not necessary.

This solution isn't quite 100% - it still allows traffic between Flux applications (e.g. app1 <-> app2), which should be denied.
The only solution to this is to use the bridge interfaces (as the in/out interfaces of the -j ACCEPT rules), or use the per-network addresses. In order to do this, they need to be applied when the networks are created / removed.

So if that's the case, this would need to be reworked. Which I can do... and it probably makes sense in the long run to do it properly.

However, I think this is good enough for the meantime.

Due to these changes, 3 of the tests are now failing. I will fix these up tomorrow morning.

@MorningLightMountain713 (Contributor Author) commented:

After reflection, I'm not happy with how this works. Going to do it properly today; will have it done this evening (before tomorrow's release).

@MorningLightMountain713 (Contributor Author) commented Feb 12, 2024

Total refactor. Works 100% now.

  • Largely simplified how it works and removed a large block of the shell invocations.
  • No need to check for rules anymore, as we flush all rules every time (this is safe as the default is to RETURN); see the sketch after this list.
  • Blocks containers from any private address space (except DNS and established connections).
  • Allows containers free access within their bridge (i.e. an app with multiple containers can have them all talk to each other).
  • Blocks any inter-app traffic using local address space (they can still talk on public addresses).
  • Runs on FluxOS boot, and any time a new app / docker network is created.
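
A rough sketch of that flush-and-rebuild flow (illustrative only - the helper below is not the actual fluxNetworkHelper code, and the rule strings are abbreviated):

    const util = require('util');
    const exec = util.promisify(require('child_process').exec);

    const fluxSrc = '172.23.0.0/16';
    const privateNets = ['10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16'];

    // Flush DOCKER-USER and rebuild every rule. Flushing makes this idempotent,
    // and it is safe because an empty chain implicitly RETURNs to FORWARD.
    async function rebuildDockerUserChain(bridgeInterfaces) {
      await exec('sudo iptables -F DOCKER-USER');

      for (const dst of privateNets) {
        await exec(`sudo iptables -A DOCKER-USER -s ${fluxSrc} -d ${dst} -p udp -m udp --dport 53 -j ACCEPT`);
        await exec(`sudo iptables -A DOCKER-USER -s ${fluxSrc} -d ${dst} -m state --state RELATED,ESTABLISHED -j ACCEPT`);
      }

      // Allow intra-bridge traffic so containers of the same app can talk to each other.
      for (const bridge of bridgeInterfaces) {
        await exec(`sudo iptables -A DOCKER-USER -i ${bridge} -o ${bridge} -j ACCEPT`);
      }

      // Drop anything else from Flux containers destined for private address space.
      for (const dst of privateNets) {
        await exec(`sudo iptables -A DOCKER-USER -s ${fluxSrc} -d ${dst} -j DROP`);
      }
    }

    // e.g. rebuildDockerUserChain(['docker0', 'br-a9ebf17b6e8d']);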

Here is an example of the ruleset that gets created (this has one flux app network bridge on it):

-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -i docker0 -o docker0 -j ACCEPT
-A DOCKER-USER -i br-a9ebf17b6e8d -o br-a9ebf17b6e8d -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -j DROP
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -j DROP
-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -j DROP
-A DOCKER-USER -j RETURN

I haven't done e2e yet - and all the tests will be broken.

Have to head out for a few hours - will fix up after that.

@MorningLightMountain713 (Contributor Author) commented Feb 12, 2024

Live testing on node:

Ruleset added:

-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -p udp -m udp --dport 53 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-USER -i docker0 -o docker0 -j ACCEPT
-A DOCKER-USER -i br-098bac43a7f1 -o br-098bac43a7f1 -j ACCEPT
-A DOCKER-USER -i br-aaf87aa57b20 -o br-aaf87aa57b20 -j ACCEPT
-A DOCKER-USER -s 172.23.0.0/16 -d 10.0.0.0/8 -j DROP
-A DOCKER-USER -s 172.23.0.0/16 -d 172.16.0.0/12 -j DROP
-A DOCKER-USER -s 172.23.0.0/16 -d 192.168.0.0/16 -j DROP
-A DOCKER-USER -j RETURN

Active docker networks: (these match the rules above)

davew@chud:~$ docker network ls
NETWORK ID     NAME                       DRIVER    SCOPE
9d00ff270b68   bridge                     bridge    local
098bac43a7f1   fluxDockerNetwork_beaver   bridge    local
aaf87aa57b20   fluxDockerNetwork_chud     bridge    local
9728c8571954   host                       host      local
c50d38a94e0e   none                       null      local

Here is the access being removed on a new app deploy:

[Screenshot 2024-02-12 at 2:52:17 PM: access being removed on a new app deploy]

I've tested intra-network communications - all work fine.
I've tested inter-network communications - all blocked when using private addresses (as they should be) and work using public addresses.

Have tested a live app from the internet back in - works fine.
Have tested host communications to apps - works fine.

I'm pretty much done here I think.

@MorningLightMountain713 (Contributor Author) commented:

Yup - finished here. Happy to run someone through it... but ready for UA.

Cheers.

@MorningLightMountain713 (Contributor Author) commented:

> NODE_CONFIG_DIR=./ZelBack/config npx mocha -g "remove flux container access" tests/unit/fluxNetworkHelper.test.js


  fluxNetworkHelper tests
    remove flux container access to private address space tests
      ✔ should add the DOCKER-USER chain to iptables if it is missing
      ✔ should skip addding the DOCKER-USER chain to iptables if it already exists
      ✔ should bail out if there is an error addding the DOCKER-USER chain to iptables
      ✔ should add the jump to DOCKER-USER chain from FORWARD chain to iptables if it is missing
      ✔ should skip adding the jump to DOCKER-USER chain from FORWARD chain to iptables if it already exists
      ✔ should bail out if there is an error addding the DOCKER-USER chain to iptables
      ✔ should flush the DOCKER-USER chain
      ✔ should bail if there is an error flushing the DOCKER-USER chain
      ✔ should add two allow and one drop rule for each private network
      ✔ should add an allow for intra-network traffic per docker network
      ✔ should bail out as soon as a rule errors out


  11 passing (50ms)

@Cabecinha84 (Member) left a comment

Great work, I will start testing it after Tuesday's FluxOS release.

if (dockerUserChainExists) log.info('IPTABLES: DOCKER-USER chain already created');

const checkJumpToDockerChain = await cmdAsync(checkJumpChain).catch(async (checkErr) => {
if (checkErr.message.includes('Bad rule')) {
Member commented:

Is the error returned always in English? We have many Fluxnodes installed with different OS languages.

Contributor Author replied:

Good point - I'll check

Contributor Author:

Looking at the source for iptables 1.8.7 (installed on Ubuntu 22.04), the error responses are hardcoded in English:

	    { nft_chain_user_add, EEXIST, "Chain already exists" },
	    { nft_chain_user_rename, EEXIST, "File exists" },
	    { nft_rule_insert, E2BIG, "Index of insertion too big" },
	    { nft_rule_check, ENOENT, "Bad rule (does a matching rule exist in that chain?)" },
	    { nft_rule_replace, E2BIG, "Index of replacement too big" },
	    { nft_rule_delete_num, E2BIG, "Index of deletion too big" },
/*	    { TC_READ_COUNTER, E2BIG, "Index of counter too big" },
	    { TC_ZERO_COUNTER, E2BIG, "Index of counter too big" }, */
	    /* ENOENT for DELETE probably means no matching rule */
	    { nft_rule_delete, ENOENT,
	      "Bad rule (does a matching rule exist in that chain?)" },
	    { nft_chain_set, ENOENT, "Bad built-in chain name" },

However, just checking on a 20.04 box, it's using iptables 1.8.4 (legacy) which has a different response.

I will fix this up so that it just looks for an error and doesn't read the text. Thanks!
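
For illustration, keying off the exit code alone looks roughly like this sketch (the cmdAsync definition here is an assumption, not the real helper):

    const util = require('util');
    const cmdAsync = util.promisify(require('child_process').exec);

    // iptables -C exits 0 when a matching rule exists and non-zero otherwise,
    // so there is no need to parse the version- or locale-dependent error text.
    async function ruleExists(ruleSpec) {
      try {
        await cmdAsync(`sudo iptables -C ${ruleSpec}`);
        return true;
      } catch (err) {
        return false;
      }
    }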

Contributor Author:

Fixed

ZelBack/src/services/fluxNetworkHelper.js (resolved review thread)
return false;
}

const giveContainerAccessToDNS = baseAllowDnsCmd.replace('#DST', network);
Member commented:

Shouldn't this be '/#DST/g'?

Contributor Author replied:

It could be, but there is only one #DST, so I just went with the simpler non-regex replace. The ones with the regex replace twice.
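
For reference, a quick illustration of the difference (the template string below is made up, not the real baseAllowDnsCmd):

    const template = 'iptables -I DOCKER-USER -s 172.23.0.0/16 -d #DST -j ACCEPT';

    // A plain-string replace only swaps the first occurrence - fine here,
    // as there is only one #DST in the template.
    console.log(template.replace('#DST', '10.0.0.0/8'));

    // A global regex is needed when the placeholder appears more than once.
    const twoPlaceholders = 'echo #DST && echo #DST';
    console.log(twoPlaceholders.replace(/#DST/g, '10.0.0.0/8'));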

}

// This always gets appended, so the drop is at the end
const dropAccessToHostNetwork = baseDropCmd.replace('#DST', network);
Member commented:

Shouldn't this be '/#DST/g'?

Contributor Author replied:

It could be, but there is only one #DST, so I just went with the simpler non-regex replace. The ones with the regex replace twice.

David White added 14 commits February 20, 2024 15:57
This commit now blocks 100% of access to private address space, while
maintaining isolation for each Flux docker network. Now apps can be
sure no other app is snooping their traffic, and operators can be sure
that apps do not have access to ANY private network they are routing.

Tests will all be broken - I'll fix up in next commit.
Older iptables (legacy) on Ubuntu 20.04 operates slightly
differently than the nf_tables version; the check output command
doesn't return anything.

Some of the output strings are different, so we don't check those
anymore.

Have also added a check to make sure the iptables binary is in the
root user's path.
@Cabecinha84 merged commit 28fe33e into RunOnFlux:development on Feb 21, 2024
1 check passed