IPTables Tables basic introduction

man iptables-extensions

Iptables elements

Tables->Chains->Rules

Tables

iptables -L -v -n lists the filter table by default. To list the nat table, use iptables -L -t nat -v -n. iptables has the following built-in tables: filter, nat, mangle, raw, and security.

1. Filter Table

Filter is the default table for iptables: if you do not specify a table, the filter table is used. The filter table has the following built-in chains.
INPUT chain - Incoming to firewall. For packets coming to the local server.
OUTPUT chain - Outgoing from firewall. For packets generated locally and going out of the local server.
FORWARD chain - Packet for another NIC on the local server. For packets routed through the local server.

2. NAT table

The nat table has the following built-in chains.

PREROUTING chain - Alters packets before routing, i.e. packet translation happens immediately after the packet comes to the system (and before routing). This helps to translate the destination IP address of the packets to something that matches the routing on the local server. This is used for DNAT (destination NAT).
POSTROUTING chain - Alters packets after routing, i.e. packet translation happens when the packets are leaving the system. This helps to translate the source IP address of the packets to something that might match the routing on the destination server. This is used for SNAT (source NAT).
OUTPUT chain - NAT for locally generated packets on the firewall.

3. Mangle table

The mangle table is for specialized packet alteration. For example, it can alter the QoS (TOS) bits in the IP header. The mangle table has the following built-in chains.
PREROUTING chain
OUTPUT chain
FORWARD chain
INPUT chain
POSTROUTING chain

4. Raw table

The raw table is for configuring exemptions (for example, exempting packets from connection tracking). The raw table has the following built-in chains.

PREROUTING chain
OUTPUT chain

IPTABLES CHAINS

One table contains multiple chains. You can create and delete your own chains:
iptables -t nat -N REDSOCKS   # create a new chain "REDSOCKS" in table nat
iptables -t nat -F REDSOCKS   # flush all the rules in chain "REDSOCKS" in table nat
iptables -t nat -X REDSOCKS   # delete chain "REDSOCKS" in table nat

IPTABLES RULES

Following are the key points to remember for iptables rules.
Each rule contains matching criteria and a target.
If the criteria match, processing jumps to the chain named in the target, or executes the special value given as the target.
If the criteria do not match, processing moves on to the next rule. The order of the rules is therefore important.

adding rules to a chain in a table

iptables -t nat -A REDSOCKS -p tcp -d <dstipaddr> -s <srcipaddr> --dport <dstp> --sport <srcp>

deleting rules from a chain in a table

iptables -t nat -D REDSOCKS -p tcp -d <dstipaddr> -s <srcipaddr> --dport <dstp> --sport <srcp>

iptables -t nat -D REDSOCKS <rule num in the list>
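The append and delete forms above take the same rule specification and differ only in -A versus -D. A minimal sketch of that symmetry, assuming a hypothetical helper named rule_args that only builds the argument list and never runs iptables (the address 203.0.113.7 and the RETURN target are illustrative placeholders, not values from these notes):

```python
def rule_args(action, chain, table="filter", proto=None, src=None, dst=None,
              sport=None, dport=None, target=None):
    """Build an iptables argv list; action is "-A" (append) or "-D" (delete)."""
    if action not in ("-A", "-D"):
        raise ValueError("action must be -A or -D")
    args = ["iptables", "-t", table, action, chain]
    if proto:
        args += ["-p", proto]
    if src:
        args += ["-s", src]
    if dst:
        args += ["-d", dst]
    # --sport/--dport are only valid together with -p tcp or -p udp.
    if sport is not None:
        args += ["--sport", str(sport)]
    if dport is not None:
        args += ["--dport", str(dport)]
    if target:
        args += ["-j", target]
    return args


# Hypothetical rule: the same spec is used to add and later delete it.
spec = dict(chain="REDSOCKS", table="nat", proto="tcp",
            dst="203.0.113.7", dport=80, target="RETURN")
add_cmd = rule_args("-A", **spec)
del_cmd = rule_args("-D", **spec)
print(" ".join(add_cmd))
print(" ".join(del_cmd))
```

Deleting by specification only works when the specification matches the rule that was added, which is why building both commands from one spec is convenient.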

listing rules in a table

The following iptables example shows rules defined in the INPUT and FORWARD chains of the filter table; the OUTPUT chain is empty.

Chain INPUT (policy ACCEPT)
num  target               prot  opt  source     destination
1    RH-Firewall-1-INPUT  all   --   0.0.0.0/0  0.0.0.0/0

Chain FORWARD (policy ACCEPT)
num  target               prot  opt  source     destination
1    RH-Firewall-1-INPUT  all   --   0.0.0.0/0  0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
num  target               prot  opt  source     destination

Chain RH-Firewall-1-INPUT (2 references)
num  target  prot  opt  source     destination
1    ACCEPT  all   --   0.0.0.0/0  0.0.0.0/0
2    ACCEPT  icmp  --   0.0.0.0/0  0.0.0.0/0    icmp type 255
5    ACCEPT  udp   --   0.0.0.0/0  224.0.0.251  udp dpt:5353
6    ACCEPT  udp   --   0.0.0.0/0  0.0.0.0/0    udp dpt:631
7    ACCEPT  tcp   --   0.0.0.0/0  0.0.0.0/0    tcp dpt:631
8    ACCEPT  all   --   0.0.0.0/0  0.0.0.0/0    state RELATED,ESTABLISHED
9    ACCEPT  tcp   --   0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:22
10   REJECT  all   --   0.0.0.0/0  0.0.0.0/0    reject-with icmp-host-prohibited

The rules in the iptables --list command output contain the following fields:

num - Rule number within the particular chain
target - The target (special value or chain name), discussed below
prot - Protocol: tcp, udp, icmp, etc.
opt - Special options for that specific rule
source - Source IP address of the packet
destination - Destination IP address of the packet

Target Values

Following are the possible special values that you can specify in the target.
ACCEPT - Firewall will accept the packet.
DROP - Firewall will drop the packet.
QUEUE - Firewall will pass the packet to userspace.
RETURN - Firewall will stop executing the next set of rules in the current chain for this packet. Control is returned to the calling chain.

REDIRECT - redirects matching packets to a specified port on the local machine
For forwarded packets, the rule belongs in the PREROUTING chain:
/bin/iptables -t nat -A PREROUTING -p tcp -m set --match-set gfwlist dst -j REDIRECT --to-ports 1090
For packets generated by the host itself, the rule belongs in the OUTPUT chain:
/bin/iptables -t nat -A OUTPUT -p tcp -m set --match-set gfwlist dst -j REDIRECT --to-ports 1090

#### Each of the following three rules only needs to be set once: iptables automatically translates traffic in the reverse direction back to the original source/destination.
MASQUERADE - source-NATs the packet without specifying the new source address (the address of the outgoing interface is used)
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE
SNAT - changes the source address to the IP given as parameter
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j SNAT --to-source <ip>
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -p tcp -j SNAT --to-source <ip>:<port>   # specifying a port requires the -p option (e.g. -p tcp)
DNAT - changes the destination address to <ip>. The kernel tracks the translated connection, so replies automatically have their source rewritten back to the original address.
iptables -t nat -A PREROUTING -d 192.168.1.0/24 -j DNAT --to-destination <ip>

CONNMARK --set-mark 1 - sets the connection mark to 1
NFLOG --nflog-group 30 - sends matching packets to netlink log group 30 (the virtual interface nflog:30)
LOG --log-level, --log-prefix - matching packets are written to the kernel log, which typically ends up in /var/log/syslog, e.g.
sudo iptables -t nat -I DOCKER -m limit --limit 2/min -j LOG --log-level 4 --log-prefix 'DOCKER CHAIN '

e.g. iptables -t nat -A REDSOCKS -p tcp -j REDIRECT --to-ports 31338   # redirect all TCP packets to port 31338
<ChainName> - the target can also be a chain name; matching packets are then run through the rules in that chain
e.g. iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner linuxaria -j REDSOCKS   # TCP packets owned by user linuxaria are checked against the rules in chain "REDSOCKS"
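The note above says a NAT rule only needs to be set once, because the reverse direction is translated back automatically. A toy model of that idea follows; the class ToyNat is a hypothetical illustration and not how the kernel's connection tracking is implemented. The addresses and ports are taken from the TRACE logs later in these notes.

```python
class ToyNat:
    """Toy stand-in for connection tracking: one entry per translated flow."""

    def __init__(self):
        # Maps (reply source, reply destination) -> original destination.
        self.flows = {}

    def dnat(self, src, dst, new_dst):
        """Apply a DNAT rule to the first packet of a flow and remember it."""
        self.flows[(new_dst, src)] = dst
        return src, new_dst

    def reply(self, src, dst):
        """Un-translate a reply; no rule is consulted, only the flow table."""
        original = self.flows.get((src, dst))
        return (original if original is not None else src), dst


nat = ToyNat()
client = ("192.168.31.75", 60772)
server = ("207.38.70.46", 80)
proxy = ("127.0.0.1", 1085)

out_src, out_dst = nat.dnat(client, server, proxy)   # outgoing SYN is rewritten
back_src, back_dst = nat.reply(proxy, client)        # reply appears to come from server
print(out_src, "->", out_dst)
print(back_src, "->", back_dst)
```

The reply path needs no second rule: the lookup in the flow table is enough to restore the address the client originally connected to.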

rules in order

Rules in a chain are checked in order. If a rule's criteria match, processing either jumps to the chain named in the target or executes the special target value; if they do not match, processing moves on to the next rule. The order of the rules therefore matters.

$IPTABLES -t nat -A REDSOCKS_FILTER -m iprange --dst-range 192.168.0.10-192.168.0.30 -j REDSOCKS
$IPTABLES -t nat -A REDSOCKS_FILTER -d 126.0.0.0/8 -j REDSOCKS
$IPTABLES -t nat -A REDSOCKS_FILTER -j RETURN
# The first rule is checked, then the second. If either matches, the packet goes through the (ordered) rules in REDSOCKS, and the last rule is skipped.
# If neither of the first two rules matches, the last rule runs and returns without doing anything special. (whitelist option)

## Do not redirect LAN traffic and some other reserved addresses. (blacklist option)
#$IPTABLES -t nat -A REDSOCKS_FILTER -d 240.0.0.0/4 -j RETURN
#$IPTABLES -t nat -A REDSOCKS_FILTER -j REDSOCKS

### The whitelist and blacklist above cannot be used together.
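The difference between the two options can be sketched as two first-match decision functions. This is only an illustration of the rule order above, using Python's standard ipaddress module; the function names are hypothetical, and True stands for "jumps to REDSOCKS" while False stands for "hits RETURN".

```python
import ipaddress


def in_range(addr, first, last):
    """True if addr lies in the inclusive range first..last (like -m iprange)."""
    a = ipaddress.ip_address(addr)
    return ipaddress.ip_address(first) <= a <= ipaddress.ip_address(last)


def whitelist_redirects(dst):
    """Whitelist: only listed destinations reach REDSOCKS; the rest RETURN."""
    if in_range(dst, "192.168.0.10", "192.168.0.30"):
        return True                      # rule 1: -j REDSOCKS
    if ipaddress.ip_address(dst) in ipaddress.ip_network("126.0.0.0/8"):
        return True                      # rule 2: -j REDSOCKS
    return False                         # rule 3: -j RETURN


def blacklist_redirects(dst):
    """Blacklist: listed destinations RETURN; everything else reaches REDSOCKS."""
    if ipaddress.ip_address(dst) in ipaddress.ip_network("240.0.0.0/4"):
        return False                     # rule 1: -j RETURN
    return True                          # rule 2: -j REDSOCKS


for dst in ("192.168.0.20", "126.1.2.3", "8.8.8.8", "240.0.0.1"):
    print(dst, whitelist_redirects(dst), blacklist_redirects(dst))
```

The two variants end with opposite catch-all rules (RETURN versus REDSOCKS), which is why they cannot share one chain.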

A Deep Dive into iptables and Netfilter

What Are IPTables and Netfilter?

The basic firewall software most commonly used in Linux is called iptables. The iptables firewall works by interacting with the packet filtering hooks in the Linux kernel’s networking stack. These kernel hooks are known as the netfilter framework.

Every packet that enters the networking system (incoming or outgoing) will trigger these hooks as it progresses through the stack, allowing programs that register with these hooks to interact with the traffic at key points. The kernel modules associated with iptables register at these hooks in order to ensure that the traffic conforms to the conditions laid out by the firewall rules.

Netfilter Hooks

common filter Hooks

There are five netfilter hooks that programs can register with. As packets progress through the stack, they will trigger the kernel modules that have registered with these hooks. The hooks that a packet will trigger depends on whether the packet is incoming or outgoing, the packet’s destination, and whether the packet was dropped or rejected at a previous point.

The following hooks represent various well-defined points in the networking stack:

(PREROUTING) NF_IP_PRE_ROUTING: This hook will be triggered by any incoming traffic very soon after entering the network stack. This hook is processed before any routing decisions have been made regarding where to send the packet.
(INPUT) NF_IP_LOCAL_IN: This hook is triggered after an incoming packet has been routed if the packet is destined for the local system.
(FORWARD) NF_IP_FORWARD: This hook is triggered after an incoming packet has been routed if the packet is to be forwarded to another host.
(OUTPUT) NF_IP_LOCAL_OUT: This hook is triggered by any locally created outbound traffic as soon as it hits the network stack.
(POSTROUTING) NF_IP_POST_ROUTING: This hook is triggered by any outgoing or forwarded traffic after routing has taken place and just before being put out on the wire.

Which Tables are Available?

Let’s step back for a moment and take a look at the different tables that iptables provides. These represent distinct sets of rules, organized by area of concern, for evaluating packets.

filter table

The filter table is one of the most widely used tables in iptables. It is used to make decisions about whether to let a packet continue to its intended destination or to deny its request. In firewall parlance, this is known as "filtering" packets. This table provides the bulk of functionality that people think of when discussing firewalls.

nat table

The nat table is used to implement network address translation rules. As packets enter the network stack, rules in this table will determine whether and how to modify the packet's source or destination addresses in order to impact the way that the packet and any response traffic are routed. This is often used to route packets to networks when direct access is not possible.

mangle table

The mangle table is used to alter the IP headers of the packet in various ways. For instance, you can adjust the TTL (Time to Live) value of a packet, either lengthening or shortening the number of valid network hops the packet can sustain. Other IP headers can be altered in similar ways. This table can also place an internal kernel "mark" on the packet for further processing in other tables and by other networking tools. This mark does not touch the actual packet, but adds the mark to the kernel's representation of the packet.

The iptables firewall is stateful, meaning that packets are evaluated in regards to their relation to previous packets. The connection tracking features built on top of the netfilter framework allow iptables to view packets as part of an ongoing connection or session instead of as a stream of discrete, unrelated packets. The connection tracking logic is usually applied very soon after the packet hits the network interface.

raw table

The raw table has a very narrowly defined function. Its only purpose is to provide a mechanism for marking packets in order to opt-out of connection tracking.

Security Table

The security table is used to set internal SELinux security context marks on packets, which will affect how SELinux or other systems that can interpret SELinux security contexts handle the packets. These marks can be applied on a per-packet or per-connection basis.

order in which tables, chains, and rules are applied to a packet

In effect, iptables is a large sieve for packets. A packet passes through a sequence of chains (PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING) that depends on its direction and on the routing decision. At each chain, the tables that provide that chain are consulted in the order raw, mangle, nat (DNAT), filter, security, nat (SNAT), and within each table the rules of the chain are checked one by one until a rule reaches a verdict such as ACCEPT or DROP.

ACCEPT stops traversal of the current chain, but the packet is still examined by the chains and tables that come later in the order above.
DROP stops all traversal immediately, including the chains and tables that would come later. The packet is dropped on the floor without any response.
REJECT is similar to DROP, but it sends a response indicating that the packet has been rejected.
RETURN means stop traversing this chain and resume at the next rule in the previous (calling) chain.

overall table order from top to bottom

Tables/Chains | PREROUTING | INPUT | FORWARD | OUTPUT | POSTROUTING
raw           |     ✓      |       |         |   ✓    |
mangle        |     ✓      |   ✓   |    ✓    |   ✓    |      ✓
nat (DNAT)    |     ✓      |       |         |   ✓    |
filter        |            |   ✓   |    ✓    |   ✓    |
security      |            |   ✓   |    ✓    |   ✓    |
nat (SNAT)    |            |   ✓   |         |        |      ✓

The hooks (columns) that a packet will trigger depend on whether it is an incoming or outgoing packet, the routing decisions that are made, and whether the packet passes filtering criteria.

                                  local process (protocol stack)
                                    ^                      |
                                    |                      v
                                  INPUT                  OUTPUT
                                  mangle                 raw
                                  filter                 mangle
                                    ^                    nat (Dst)
                                    |                    filter
                                    |                      |
                                    |                      v
ingress   --> PREROUTING      --> route  --> FORWARD --> route  --> POSTROUTING --> egress
interface     raw (conntrack)     choice     mangle      choice     mangle          interface
              mangle                         filter                 nat (Src)
              nat (Dst)

chain traversal order

Incoming packets destined for the local system: PREROUTING -> INPUT
Incoming packets destined to another host: PREROUTING -> FORWARD -> POSTROUTING
Locally generated packets: OUTPUT -> POSTROUTING
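Combining the chain order above with the table order gives the full sequence of table:chain steps for each kind of packet. The sketch below is a lookup table written for illustration. It assumes the table placement used by current kernels (in particular, nat at INPUT for SNAT and the security table after filter); a table that is not loaded on a given system will simply not appear in real traces.

```python
# Tables consulted at each chain, in priority order (an assumption, see above).
TABLE_ORDER = {
    "PREROUTING":  ["raw", "mangle", "nat"],
    "INPUT":       ["mangle", "filter", "security", "nat"],
    "FORWARD":     ["mangle", "filter", "security"],
    "OUTPUT":      ["raw", "mangle", "nat", "filter", "security"],
    "POSTROUTING": ["mangle", "nat"],
}

# Chains visited for each kind of packet, as listed in the text.
PATHS = {
    "incoming_local":    ["PREROUTING", "INPUT"],
    "forwarded":         ["PREROUTING", "FORWARD", "POSTROUTING"],
    "locally_generated": ["OUTPUT", "POSTROUTING"],
}


def traversal(path):
    """Return the ordered list of "table:chain" steps for a packet path."""
    return [f"{table}:{chain}"
            for chain in PATHS[path]
            for table in TABLE_ORDER[chain]]


for name in PATHS:
    print(name, "->", " ".join(traversal(name)))
```

The "table:chain" form matches the prefix used in kernel TRACE log lines, which makes it easy to compare a real trace against the expected sequence.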

Jul 6 10:48:36 wen-Default-string kernel: [ 3722.041065] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=192.168.31.88 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=60361 DF PROTO=ICMP TYPE=8 CODE=0 ID=2565 SEQ=1 UID=0 GID=0
Jul 6 10:48:36 wen-Default-string kernel: [ 3722.042291] TRACE: raw:PREROUTING:policy:2 IN=enp1s0 OUT= MAC=00:e0:4c:68:27:62:ce:81:da:91:eb:88:08:00 SRC=192.168.31.88 DST=192.168.31.75 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=1476 PROTO=ICMP TYPE=0 CODE=0 ID=2565 SEQ=1
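TRACE log lines like these carry the prefix "TRACE: tablename:chainname:type:rulenum", where type is rule, return, or policy. A small parser makes long traces easier to read. This is a sketch, with parse_trace being a hypothetical helper name; the sample line is copied from the log above.

```python
import re

TRACE_RE = re.compile(
    r"TRACE: (?P<table>\w+):(?P<chain>[\w-]+):"
    r"(?P<kind>rule|return|policy):(?P<num>\d+) (?P<rest>.*)"
)


def parse_trace(line):
    """Split one kernel TRACE line into its prefix parts and KEY=VALUE fields."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["num"] = int(info["num"])
    fields = {}
    for token in info.pop("rest").split():
        if "=" in token:
            key, _, value = token.partition("=")
            # A key that appears twice (e.g. ID) keeps its last value.
            fields[key] = value
    info["fields"] = fields
    return info


sample = ("Jul 6 10:48:36 wen-Default-string kernel: [ 3722.041065] "
          "TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 "
          "DST=192.168.31.88 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=60361 DF "
          "PROTO=ICMP TYPE=8 CODE=0 ID=2565 SEQ=1 UID=0 GID=0")
parsed = parse_trace(sample)
print(parsed["table"], parsed["chain"], parsed["kind"], parsed["num"])
print(parsed["fields"]["SRC"], "->", parsed["fields"]["DST"])
```

Printing only table, chain, type, and the address fields for each line reduces a trace to the path the packet took, which is the part these notes are interested in.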

Route traversed by a packet generated on the local host:

packet SEQ=2232357149
OUTPUT -> POSTROUTING (raw->nat->filter) (nat), out=enp1s0, using nat
kernel: [49988.219931] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=216.58.200.36 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=80 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307) UID=0 GID=0
kernel: [49988.219962] TRACE: nat:OUTPUT:rule:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=216.58.200.36 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=80 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307) UID=0 GID=0
#### the rule above DNATs the destination address to 127.0.0.1:1085
kernel: [49988.219988] TRACE: filter:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307) UID=0 GID=0

kernel: [49988.220001] TRACE: nat:POSTROUTING:policy:1 IN= OUT=lo SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307) UID=0 GID=0
#### by the time the packet reaches POSTROUTING, the route table has been checked again for the new destination, giving the correct out interface (OUT=lo)

PREROUTING -> INPUT
kernel: [49988.220037] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307)
kernel: [49988.220053] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=40719 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357149 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A119B3B6D0000000001030307)


kernel: [49988.220124] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=216.58.200.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40720 DF PROTO=TCP SPT=56284 DPT=80 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK URGP=0 OPT (0101080A119B3B6E78DE73EE) UID=0 GID=0
kernel: [49988.220141] TRACE: filter:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40720 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK URGP=0 OPT (0101080A119B3B6E78DE73EE) UID=0 GID=0
kernel: [49988.220159] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40720 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK URGP=0 OPT (0101080A119B3B6E78DE73EE)
kernel: [49988.220173] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40720 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK URGP=0 OPT (0101080A119B3B6E78DE73EE)
kernel: [49988.220268] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=216.58.200.36 LEN=129 TOS=0x00 PREC=0x00 TTL=64 ID=40721 DF PROTO=TCP SPT=56284 DPT=80 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK PSH URGP=0 OPT (0101080A119B3B6E78DE73EE) UID=0 GID=0
kernel: [49988.220285] TRACE: filter:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=127.0.0.1 LEN=129 TOS=0x00 PREC=0x00 TTL=64 ID=40721 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK PSH URGP=0 OPT (0101080A119B3B6E78DE73EE) UID=0 GID=0
kernel: [49988.220303] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=129 TOS=0x00 PREC=0x00 TTL=64 ID=40721 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK PSH URGP=0 OPT (0101080A119B3B6E78DE73EE)
kernel: [49988.220317] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=129 TOS=0x00 PREC=0x00 TTL=64 ID=40721 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357150 ACK=279668517 WINDOW=229 RES=0x00 ACK PSH URGP=0 OPT (0101080A119B3B6E78DE73EE)
kernel: [49990.360989] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=216.58.200.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40722 DF PROTO=TCP SPT=56284 DPT=80 SEQ=2232357227 ACK=279669057 WINDOW=237 RES=0x00 ACK URGP=0 OPT (0101080A119B43CA78DE7C4A) UID=0 GID=0
kernel: [49990.361079] TRACE: filter:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=40722 DF PROTO=TCP SPT=56284 DPT=1085 SEQ=2232357227 ACK=279669057 WINDOW=237 RES=0x00 ACK URGP=0 OPT (0101080A119B43CA78DE7C4A) UID=0 GID=0

locally generated packet to a local destination

locally generated packet being sent out

OUTPUT -> POSTROUTING (raw->mangle->filter) (mangle), out=lo, using mangle
kernel: [52679.696178] TRACE: raw:OUTPUT:policy:2 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307) UID=0 GID=0
kernel: [52679.696194] TRACE: mangle:OUTPUT:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307) UID=0 GID=0
kernel: [52679.696207] TRACE: filter:OUTPUT:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307) UID=0 GID=0
kernel: [52679.696219] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307) UID=0 GID=0

when the interface receives the packet

PREROUTING -> INPUT (raw->mangle) (mangle->filter)
kernel: [52679.696249] TRACE: raw:PREROUTING:rule:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307)
kernel: [52679.696263] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307)
kernel: [52679.696276] TRACE: mangle:PREROUTING:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307)
kernel: [52679.696290] TRACE: mangle:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307)
kernel: [52679.696304] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3070 DF PROTO=TCP SPT=38932 DPT=80 SEQ=2191382873 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080AAECEE2970000000001030307)

kernel: [52679.696340] TRACE: raw:OUTPUT:policy:2 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=38932 SEQ=1168447734 ACK=2191382874 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080AAECEE297AECEE29701030307)
kernel: [52679.696351] TRACE: mangle:OUTPUT:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=38932 SEQ=1168447734 ACK=2191382874 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080AAECEE297AECEE29701030307)
kernel: [52679.696362] TRACE: filter:OUTPUT:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=38932 SEQ=1168447734 ACK=2191382874 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080AAECEE297AECEE29701030307)
kernel: [52679.696372] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=38932 SEQ=1168447734 ACK=2191382874 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080AAECEE297AECEE29701030307)

redirect target is similar to DNAT target

nat table:

Chain OUTPUT (policy ACCEPT 1797 packets, 108K bytes)
pkts  bytes  target    prot  opt  in   out  source    destination
6     360    REDIRECT  tcp   --   any  any  anywhere  anywhere     match-set gfwlist dst redir ports 1085

raw table:

root@wen-Default-string:/home/wen/ruijian_cocimg# iptables -v -L -t raw
Chain PREROUTING (policy ACCEPT 9 packets, 640 bytes)
pkts  bytes  target  prot  opt  in   out  source    destination
0     0      TRACE   all   --   any  any  anywhere  anywhere     ctstate DNAT

Chain OUTPUT (policy ACCEPT 8 packets, 840 bytes)
pkts  bytes  target  prot  opt  in   out  source    destination
141   18467  TRACE   tcp   --   any  any  anywhere  anywhere     match-set gfwlist dst

curl 207.38.70.46 ## triggers OUTPUT rule 1, since this IP is in gfwlist

generate the packet by curl

OUTPUT -> POSTROUTING (raw->mangle->nat->filter) (mangle->nat)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700752] TRACE: raw:OUTPUT:policy:2 IN= OUT=enp1s0 SRC=192.168.31.75 DST=207.38.70.46 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=80 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000
Jul 5 14:38:59 wen-Default-string kernel: [64365.700783] TRACE: mangle:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=207.38.70.46 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=80 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000
Jul 5 14:38:59 wen-Default-string kernel: [64365.700797] TRACE: nat:OUTPUT:rule:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=207.38.70.46 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=80 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000
Jul 5 14:38:59 wen-Default-string kernel: [64365.700822] TRACE: filter:OUTPUT:policy:1 IN= OUT=enp1s0 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000
Jul 5 14:38:59 wen-Default-string kernel: [64365.700834] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=lo SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000
Jul 5 14:38:59 wen-Default-string kernel: [64365.700846] TRACE: nat:POSTROUTING:policy:1 IN= OUT=lo SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307) UID=1000 GID=1000

###### since out=lo, the packet will be in the lo interface and trigger PREROUTING

ss-redir gets the packet from lo

PREROUTING -> INPUT (raw->mangle) (mangle->filter)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700882] TRACE: raw:PREROUTING:rule:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700896] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700910] TRACE: mangle:PREROUTING:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700926] TRACE: mangle:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.700939] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.168.31.75 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9195 DF PROTO=TCP SPT=60772 DPT=1085 SEQ=2711700657 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080ABC2B5F6B0000000001030307)

####### Finally, the INPUT chain of the filter table accepts the packet, so it reaches the server on port 1085. The server then replies with an ACK such as IN= OUT=lo SRC=127.0.0.1 SPT=1085 DST=192.168.31.75 DPT=60772 ACK=2711700658, which NAT translates back to SRC=207.38.70.46 SPT=80 DST=192.168.31.75 DPT=60772 ACK=2711700658.

get the packet generated by ss-redir

Jul 5 14:38:59 wen-Default-string kernel: [64365.700991] TRACE: raw:PREROUTING:policy:2 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=207.38.70.46 DST=192.168.31.75 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=60772 SEQ=1918215521 ACK=2711700658 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080A79B9D5F6BC2B5F6B01030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.701006] TRACE: mangle:PREROUTING:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=207.38.70.46 DST=192.168.31.75 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=60772 SEQ=1918215521 ACK=2711700658 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080A79B9D5F6BC2B5F6B01030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.701020] TRACE: mangle:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=207.38.70.46 DST=192.168.31.75 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=60772 SEQ=1918215521 ACK=2711700658 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080A79B9D5F6BC2B5F6B01030307)
Jul 5 14:38:59 wen-Default-string kernel: [64365.701034] TRACE: filter:INPUT:policy:1 IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=207.38.70.46 DST=192.168.31.75 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=60772 SEQ=1918215521 ACK=2711700658 WINDOW=43690 RES=0x00 ACK SYN URGP=0 OPT (0204FFD70402080A79B9D5F6BC2B5F6B01030307)

rules order within a specific chain

rule order in a chain in general

Chain OUTPUT (policy ACCEPT)
num  target  prot  opt  source    destination
1    DROP    all   --   anywhere  !127.0.0.0/8   ADDRTYPE match dst-type LOCAL
2    ACCEPT  all   --   anywhere  127.0.0.0/8    ADDRTYPE match dst-type LOCAL
3    RETURN  all   --   anywhere  172.0.0.1/24
4    DROP    all   --   anywhere  172.0.0.1/24

With output like this, a packet is checked against rule 1 first, then rules 2, 3, and 4, and finally against the chain policy (ACCEPT, shown next to the chain name). The policy acts as the last rule of the chain.
If rule 1 matches, the packet is dropped and all further traversal stops.
If rule 2 matches, the packet is accepted in this chain (no later rule in this chain is checked), but it still goes on to the other chains and tables.
If rule 3 matches, RETURN ends traversal of this chain, so rule 4 never gets a chance to match. Because OUTPUT is a built-in chain, the chain policy then decides.
If rule 4 does not match, the packet falls through to the last rule (policy ACCEPT) of the OUTPUT chain, and then continues to the chains and tables that follow.

rule order with a sub-chain

Chain OUTPUT (policy ACCEPT)
num  target  prot  opt  source    destination
1    DOCKER  all   --   anywhere  !127.0.0.0/8   ADDRTYPE match dst-type LOCAL
2    DROP    all   --   anywhere  !127.0.0.0/8   ADDRTYPE match dst-type LOCAL

Chain DOCKER (2 references)
num  target  prot  opt  source    destination
1    RETURN  all   --   anywhere  anywhere
====================================================
When a packet hits the OUTPUT chain and matches rule 1, it jumps to the sub-chain DOCKER. If it matches the first rule of chain DOCKER, RETURN sends it back to the rule after the jump to DOCKER, that is, rule 2 in chain OUTPUT.

targets and jumps

The LOG, ULOG, TOS, and TRACE targets do not interrupt traversal at all: after they act, processing continues with the next rule.

Other targets change where traversal goes next:
ACCEPT - all other rules in this chain are ignored.
RETURN - all later rules in this chain are ignored, and processing resumes at the rule after the one that jumped to this chain.
DROP - all other rules in this chain, and all other chains and tables, are ignored.
<user-defined chain name> - processing jumps to this sub-chain.
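The behaviour of these targets can be made concrete with a toy chain evaluator. This is a simplified model written for these notes (names such as run_chain and final_verdict are hypothetical), not the kernel's algorithm. It replays the OUTPUT/DOCKER example above: a jump into DOCKER, a RETURN back, and then the DROP in rule 2.

```python
TERMINAL = {"ACCEPT", "DROP"}
NON_TERMINATING = {"LOG", "ULOG", "TOS", "TRACE"}


def run_chain(chains, name, pkt, log):
    """Walk one chain; return "ACCEPT"/"DROP", or None on RETURN or end of chain."""
    for num, (match, target) in enumerate(chains[name]["rules"], start=1):
        if not match(pkt):
            continue                      # criteria not met: next rule
        log.append(f"{name}:{num}:{target}")
        if target in TERMINAL:
            return target                 # verdict reached, stop here
        if target in NON_TERMINATING:
            continue                      # acts, then traversal goes on
        if target == "RETURN":
            return None                   # back to the calling chain
        sub = run_chain(chains, target, pkt, log)   # jump to user-defined chain
        if sub is not None:
            return sub
    return None


def final_verdict(chains, builtin, pkt):
    """Evaluate a built-in chain, falling back to its policy."""
    log = []
    result = run_chain(chains, builtin, pkt, log)
    if result is None:
        result = chains[builtin]["policy"]
        log.append(f"{builtin}:policy:{result}")
    return result, log


def local_non_loopback(pkt):
    # Stands in for: destination !127.0.0.0/8 with dst-type LOCAL.
    return pkt["dst_local"] and not pkt["dst"].startswith("127.")


chains = {
    "OUTPUT": {"policy": "ACCEPT",
               "rules": [(local_non_loopback, "DOCKER"),
                         (local_non_loopback, "DROP")]},
    "DOCKER": {"rules": [(lambda pkt: True, "RETURN")]},
}

dropped = final_verdict(chains, "OUTPUT", {"dst": "192.168.31.75", "dst_local": True})
accepted = final_verdict(chains, "OUTPUT", {"dst": "8.8.8.8", "dst_local": False})
print(dropped)
print(accepted)
```

A packet that matches no rule ends at the chain policy, which is the "last rule" behaviour described for built-in chains.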

Extensions to iptables: New Matches

iptables is extensible, meaning that both the kernel and the iptables tool can be extended to provide new features.

Some of these extensions are standard, and others are more exotic. Extensions can be made by other people and distributed separately for niche users.

Kernel extensions normally live in the kernel module subdirectory, such as /lib/modules/2.4.0-test10/kernel/net/ipv4/netfilter. They are demand loaded if your kernel was compiled with CONFIG_KMOD set, so you should not need to manually insert them.

Extensions to the iptables program are shared libraries which usually live in /usr/local/lib/iptables, although a distribution would put them in /lib/iptables or /usr/lib/iptables.

Extensions come in two types: new targets, and new matches (we’ll talk about new targets a little later). Some protocols automatically offer new tests: currently these are TCP, UDP and ICMP as shown below.

For these you will be able to specify the new tests on the command line after the `-p’ option, which will load the extension. For explicit new tests, use the `-m’ option to load the extension, after which the extended options will be available.

To get help on an extension, use the option to load it (`-p’, `-j’ or `-m’) followed by `-h’ or `--help’, e.g.:

#
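A few help invocations of the kind described above, as a sketch. The choice of icmp, limit and LOG is an assumption made for illustration (all three appear elsewhere in these notes); RUN defaults to echo so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

extension_help() {
    $RUN iptables -p icmp --help    # tests offered by a protocol
    $RUN iptables -m limit --help   # an explicit match extension
    $RUN iptables -j LOG --help     # a target extension
}
extension_help
```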

IPTables and Connection Tracking

We introduced the connection tracking system implemented on top of the netfilter framework when we discussed the raw table and connection state matching criteria. Connection tracking allows iptables to make decisions about packets viewed in the context of an ongoing connection. The connection tracking system provides iptables with the functionality it needs to perform “stateful” operations.

Connection tracking is applied very soon after packets enter the networking stack. The raw table chains and some basic sanity checks are the only logic that is performed on packets prior to associating the packets with a connection.

The system checks each packet against a set of existing connections. It will update the state of the connection in its store if needed and will add new connections to the system when necessary. Packets that have been marked with the NOTRACK target in one of the raw chains will bypass the connection tracking routines.

The State Match of conntrack

The most useful match criterion is supplied by the `state’ extension, which interprets the connection-tracking analysis of the `ip_conntrack’ module. This is highly recommended.

Specifying `-m state’ allows an additional `--state’ option, which is a comma-separated list of states to match (the `!’ flag indicates not to match those states). These states are:

NEW: A packet which creates a new connection.

ESTABLISHED: A packet which belongs to an existing connection (i.e., a reply packet, or outgoing packet on a connection which has seen replies).

RELATED: A packet which is related to, but not part of, an existing connection, such as an ICMP error, or (with the FTP module inserted), a packet establishing an ftp data connection.

INVALID: A packet which could not be identified for some reason: this includes running out of memory and ICMP errors which don’t correspond to any known connection. Generally these packets should be dropped.

An example of this powerful match extension would be:
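The example itself is missing from these notes. One common way to use the state match is sketched below: it is not the original example, and the interface name ppp0 and the FORWARD chain are assumptions. RUN defaults to echo, so the rules are printed rather than installed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

state_rules() {
    # Let replies and related traffic (ICMP errors, FTP data) through.
    $RUN iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Drop anything conntrack could not identify.
    $RUN iptables -A FORWARD -m state --state INVALID -j DROP
    # Refuse new connections arriving from the outside interface.
    $RUN iptables -A FORWARD -i ppp0 -m state --state NEW -j DROP
}
state_rules
```

The order matters: established traffic is accepted first, so only connection attempts coming in on ppp0 reach the last rule.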

Available States

Connections tracked by the connection tracking system will be in one of the following states:

NEW: When a packet arrives that is not associated with an existing connection, but is not invalid as a first packet, a new connection will be added to the system with this label. This happens for both connection-aware protocols like TCP and for connectionless protocols like UDP.

ESTABLISHED: A connection is changed from NEW to ESTABLISHED when it receives a valid response in the opposite direction. For TCP connections, this means a SYN/ACK and for UDP and ICMP traffic, this means a response where source and destination of the original packet are switched.

RELATED: Packets that are not part of an existing connection, but are associated with a connection already in the system are labeled RELATED. This could mean a helper connection, as is the case with FTP data transmission connections, or it could be ICMP responses to connection attempts by other protocols.

INVALID: Packets can be marked INVALID if they are not associated with an existing connection and aren’t appropriate for opening a new connection, if they cannot be identified, or if they aren’t routable among other reasons.

UNTRACKED: Packets can be marked as UNTRACKED if they’ve been targeted in a raw table chain to bypass tracking.

SNAT: A virtual state set when the source address has been altered by NAT operations. This is used by the connection tracking system so that it knows to change the source addresses back in reply packets.

DNAT: A virtual state set when the destination address has been altered by NAT operations. This is used by the connection tracking system so that it knows to change the destination address back when routing reply packets.

The states tracked in the connection tracking system allow administrators to craft rules that target specific points in a connection’s lifetime. This provides the functionality needed for more thorough and secure rules.

TCP Extensions

The TCP extensions are automatically loaded if `-p tcp’ is specified. They provide the following options (none of which match fragments).

--tcp-flags

Followed by an optional `!’, then two strings of flags, allows you to filter on specific TCP flags. The first string of flags is the mask: a list of flags you want to examine. The second string of flags tells which one(s) should be set. For example,
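A rule of the kind this passage describes might look like the sketch below. The mask ALL and the flag list SYN,ACK come from the explanation that follows; the INPUT chain and the DROP target are assumptions. RUN defaults to echo, so the rule is only printed.

```shell
# Dry run by default; RUN=sudo executes the command.
RUN="${RUN:-echo}"

tcp_flags_rule() {
    # Mask: examine ALL flags. Comparison: only SYN and ACK may be set.
    $RUN iptables -A INPUT --protocol tcp --tcp-flags ALL SYN,ACK -j DROP
}
tcp_flags_rule
```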

This indicates that all flags should be examined (`ALL’ is synonymous with `SYN,ACK,FIN,RST,URG,PSH’), but only SYN and ACK should be set. There is also an argument `NONE’ meaning no flags.

--syn

Optionally preceded by a `!’, this is shorthand for `--tcp-flags SYN,RST,ACK SYN’ (current iptables man pages give the equivalent as `--tcp-flags SYN,RST,ACK,FIN SYN’).

--source-port

Followed by an optional `!’, then either a single TCP port, or a range of ports. Ports can be port names, as listed in /etc/services, or numeric. Ranges are either two port names separated by a `:’, or (to specify greater than or equal to a given port) a port with a `:’ appended, or (to specify less than or equal to a given port), a port preceded by a `:’.

--sport

Synonymous with `--source-port’.

--destination-port and --dport

The same as above, only they specify the destination, rather than source, port to match.

--tcp-option

Followed by an optional `!’ and a number, matches a packet with a TCP option equaling that number. A packet which does not have a complete TCP header is dropped automatically if an attempt is made to examine its TCP options.

An Explanation of TCP Flags

It is sometimes useful to allow TCP connections in one direction, but not the other. For example, you might want to allow connections to an external WWW server, but not connections from that server.

The naive approach would be to block TCP packets coming from the server. Unfortunately, TCP connections require packets going in both directions to work at all.

The solution is to block only the packets used to request a connection. These packets are called SYN packets (ok, technically they’re packets with the SYN flag set, and the RST and ACK flags cleared, but we call them SYN packets for short). By disallowing only these packets, we can stop attempted connections in their tracks.

The `--syn’ flag is used for this: it is only valid for rules which specify TCP as their protocol. For example, to specify TCP connection attempts from 192.168.1.1:

-p TCP -s 192.168.1.1 --syn

This flag can be inverted by preceding it with a `!’, which means every packet other than the connection initiation.

UDP Extensions

These extensions are automatically loaded if `-p udp’ is specified. They provide the options `--source-port’, `--sport’, `--destination-port’ and `--dport’ as detailed for TCP above.

ICMP Extensions

This extension is automatically loaded if `-p icmp’ is specified. It provides only one new option:

--icmp-type

Followed by an optional `!’, then either an icmp type name (e.g. `host-unreachable’), or a numeric type (e.g. `3’), or a numeric type and code separated by a `/’ (e.g. `3/3’). A list of available icmp type names is given using `-p icmp --help’.

Other Match Extensions

The other extensions in the netfilter package are demonstration extensions, which (if installed) can be invoked with the `-m’ option.

mac

This module must be explicitly specified with `-m mac’ or `--match mac’. It is used for matching an incoming packet’s source Ethernet (MAC) address, and is thus only useful for packets traversing the PREROUTING and INPUT chains. It provides only one option:

--mac-source

Followed by an optional `!’, then an ethernet address in colon-separated hexbyte notation, e.g. `--mac-source 00:60:08:91:CC:B7’.

limit

This module must be explicitly specified with `-m limit’ or `--match limit’. It is used to restrict the rate of matches, such as for suppressing log messages. It will only match a given number of times per second (by default 3 matches per hour, with a burst of 5). It takes two optional arguments:

--limit

Followed by a number; specifies the maximum average number of matches to allow per second. The number can specify units explicitly, using `/second’, `/minute’, `/hour’ or `/day’, or parts of them (so `5/second’ is the same as `5/s’).

--limit-burst

Followed by a number, indicating the maximum burst before the above limit kicks in.

This match can often be used with the LOG target to do rate-limited logging. To understand how it works, let’s look at the following rule, which logs packets with the default limit parameters:
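The rule itself did not survive in these notes. A sketch of a rate-limited LOG rule with the default parameters follows; the FORWARD chain is an assumption, and the second line only spells out the defaults quoted above (3 per hour, burst of 5). RUN defaults to echo, so nothing is installed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

limit_log_rules() {
    # Default limit parameters.
    $RUN iptables -A FORWARD -m limit -j LOG
    # Same behaviour with the defaults written out explicitly.
    $RUN iptables -A FORWARD -m limit --limit 3/hour --limit-burst 5 -j LOG
}
limit_log_rules
```

Only one of the two rules would be installed in practice; they are shown together to make the defaults visible.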

The first time this rule is reached, the packet will be logged; in fact, since the default burst is 5, the first five packets will be logged. After this, it will be twenty minutes before a packet will be logged from this rule, regardless of how many packets reach it. Also, every twenty minutes which passes without matching a packet, one of the burst will be regained; if no packets hit the rule for 100 minutes, the burst will be fully recharged; back where we started.

Note: you cannot currently create a rule with a recharge time greater than about 59 hours, so if you set an average rate of one per day, then your burst rate must be less than 3.

You can also use this module to avoid various denial of service attacks (DoS) with a faster rate to increase responsiveness.

Syn-flood protection:

Furtive port scanner:

Ping of death:
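The commands for the three cases above are missing here. One way they could be written is sketched below, assuming the FORWARD chain and a rate of one match per second; both are assumptions. RUN defaults to echo, so the rules are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

dos_limits() {
    # Syn-flood protection: accept at most one new connection attempt per second.
    $RUN iptables -A FORWARD -p tcp --syn -m limit --limit 1/s -j ACCEPT
    # Furtive port scanner: limit bare RST packets.
    $RUN iptables -A FORWARD -p tcp --tcp-flags SYN,ACK,FIN,RST RST -m limit --limit 1/s -j ACCEPT
    # Ping of death: limit ICMP echo requests.
    $RUN iptables -A FORWARD -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
}
dos_limits
```

These are ACCEPT rules, so they only limit anything when packets that exceed the rate are dropped later, for example by a DROP policy on the chain.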

This module works like a “hysteresis door”, as shown in the graph below.

       rate (pkt/s)
             ^        .---.
             |       / DoS \
             |      /       \
Edge of DoS -|.....:.........\.......................
 = (limit *  |    /:          \
limit-burst) |   / :           \         .-.
             |  /  :            \       /   \
End of DoS  -|/....:..............:..../.....\..../.
 = limit     |     :              :`-'       `--'
-------------+-----+--------------+------------------> time (s)
   LOGIC =>  Match | Didn't Match |    Match

Say we say match one packet per second with a five packet burst, but packets start coming in at four per second, for three seconds, then start again in another three seconds.

        <--Flood 1-->           <---Flood 2--->

Total  ^                Line __--         YNNN
Packets|            Rate __--         YNNN
       |         mum __--         YNNN
    10 |    Maxi __--            Y
       |     __--               Y
       | __--   YNNN
       |-   YNNN
     5 |   Y
       |  Y                                  Key:  Y -> Matched Rule
       | Y                                         N -> Didn't Match Rule
       |Y
     0 +--------------------------------------------------> Time (seconds)
        0   1   2   3   4   5   6   7   8   9  10  11  12

You can see that the first five packets are allowed to exceed the one packet per second, then the limiting kicks in. If there is a pause, another burst is allowed but not past the maximum rate set by the rule (1 packet per second after the burst is used).

owner

This module attempts to match various characteristics of the packet creator, for locally-generated packets. It is only valid in the OUTPUT chain, and even then some packets (such as ICMP ping responses) may have no owner, and hence never match.

--uid-owner userid

Matches if the packet was created by a process with the given effective (numerical) user id.

--gid-owner groupid

Matches if the packet was created by a process with the given effective (numerical) group id.

--pid-owner processid

Matches if the packet was created by a process with the given process id.

--sid-owner sessionid

Matches if the packet was created by a process in the given session group.

unclean

This experimental module must be explicitly specified with `-m unclean’ or `--match unclean’. It does various random sanity checks on packets. This module has not been audited, and should not be used as a security device (it probably makes things worse, since it may well have bugs itself). It provides no options.

Using iptables to flag certain kinds of packets

Capture packets with a specific address attribute

Capture tcp packets from/to port 80
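No command is given for this case. A sketch using the NFLOG target, in the same style as the owner-based capture later in this section: the group number 40 and the file name port-80.pcap are borrowed from that example, and the choice of chains assumes the local host is the client talking to remote port 80. RUN defaults to echo, so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

capture_port_80() {
    # Copy traffic to and from remote TCP port 80 to netlink log group 40.
    $RUN iptables -A OUTPUT -p tcp --dport 80 -j NFLOG --nflog-group 40
    $RUN iptables -A INPUT  -p tcp --sport 80 -j NFLOG --nflog-group 40
    # Read the logged packets from that group and write them to a pcap file.
    $RUN dumpcap -i nflog:40 -w port-80.pcap
}
capture_port_80
```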

Capture packets from/to a program

Capture packets from/to a process by user id

To capture only the packets sent by or to a specific process, iptables can mark them so that they can be logged separately. For example, capture packets generated by uid 13 to the file uid-13.pcap, where the process runs as the user proxy:

ps aux | grep squid    // squid runs as user proxy
$ id -u proxy
13
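The commands for the uid case are not shown. Following the pattern of the pid-based commands in the next example, a sketch could look like this; the mark value 2 and NFLOG group 40 are taken from that example and are otherwise arbitrary. RUN defaults to echo, so nothing is executed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

capture_uid_13() {
    # Mark every connection opened by a process running as uid 13.
    $RUN iptables -A OUTPUT -m owner --uid-owner 13 -j CONNMARK --set-mark 2
    # Log both directions of the marked connections to group 40.
    $RUN iptables -A INPUT  -m connmark --mark 2 -j NFLOG --nflog-group 40
    $RUN iptables -A OUTPUT -m connmark --mark 2 -j NFLOG --nflog-group 40
    # Write the logged packets to the capture file.
    $RUN dumpcap -i nflog:40 -w uid-13.pcap
}
capture_uid_13
```

The owner match only sees locally generated packets, which is why the connection mark is needed to pick up the replies in INPUT.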

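Capture packets from/to a process by process id (pid)

iptables -A OUTPUT -m owner --pid-owner 13 -j CONNMARK --set-mark 2
iptables -A INPUT -m connmark --mark 2 -j NFLOG --nflog-group 40
iptables -A OUTPUT -m connmark --mark 2 -j NFLOG --nflog-group 40
dumpcap -i nflog:40 -w port-80.pcap

Note: the INPUT rule must test the same mark (2) that the CONNMARK rule sets. Recent kernels no longer support --pid-owner; see man iptables-extensions for the owner options available on your system.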

TARGET LOG example

iptables -L --line-number -t nat
31  RETURN    all  --  anywhere  240.0.0.0/4
32  LOG       tcp  --  anywhere  anywhere     LOG level warning prefix "rule fore"
33  RETURN    all  --  anywhere  anywhere     match-set chnroute dst
34  LOG       tcp  --  anywhere  anywhere     LOG level warning prefix "rule after"
35  REDIRECT  tcp  --  anywhere  anywhere     redir ports 1085

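### Insert the LOG rule after rule 33. -I places the new rule at the given position, so the rule that should follow rule 33 goes in at position 34:
root@wen-Default-string:/home/wen# iptables -t nat -I SHADOWSOCKS 34 -p tcp -j LOG --log-prefix "rule after"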

iptables -L -v --line-number -t nat
Chain SHADOWSOCKS (2 references)
pkts bytes target    prot opt in   out  source    destination
   1    60 RETURN    all  --  any  any  anywhere  li1703-142.members.linode.com
   1    60 RETURN    all  --  any  any  anywhere  api.dynu.com
   0     0 RETURN    all  --  any  any  anywhere  api.dynu.com
   0     0 RETURN    all  --  any  any  anywhere  api.dynu.com
   0     0 RETURN    all  --  any  any  anywhere  240.0.0.0/4
  61  3660 LOG       tcp  --  any  any  anywhere  anywhere     LOG level warning prefix "rule fore"
   4   240 RETURN    all  --  any  any  anywhere  anywhere     match-set chnroute dst
  48  2880 LOG       tcp  --  any  any  anywhere  anywhere     LOG level warning prefix "rule after"
 121  7298 REDIRECT  tcp  --  any  any  anywhere  anywhere     redir ports 1085

curl 172.217.160.68    ##### google.com
root@wen-Default-string:/home/wen/ruijian_cocimg# grep "172.217.160.68" /var/log/syslog
Jul 2 14:15:01 wen-Default-string kernel: [316674.936638] rule foreIN= OUT=enp1s0 SRC=192.168.31.75 DST=172.217.160.68 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=61558 DF PROTO=TCP SPT=59122 DPT=8
Jul 2 14:24:52 wen-Default-string kernel: [317265.612122] rule foreIN= OUT=enp1s0 SRC=192.168.31.75 DST=172.217.160.68 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27926 DF PROTO=TCP SPT=59740 DPT=8
Jul 2 14:24:52 wen-Default-string kernel: [317265.612142] rule afterIN= OUT=enp1s0 SRC=192.168.31.75 DST=172.217.160.68 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27926 DF PROTO=TCP SPT=59740 DPT=

The Google IP does not match the chnroute set, so it passes the RETURN rule, is logged by both LOG rules, and goes on to the next rule, the REDIRECT to port 1085.

curl 182.140.245.49    ##### taobao.com
root@wen-Default-string:/home/wen/ruijian_cocimg# grep "182.140.245.49" /var/log/syslog
Jul 2 14:25:04 wen-Default-string kernel: [317277.679435] rule foreIN= OUT=enp1s0 SRC=192.168.31.75 DST=182.140.245.49 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=61086 DF PROTO=TCP SPT=38608 DPT=8

The Taobao IP only gets as far as the match-set chnroute dst rule: it matches there and RETURNs, so only "rule fore" is logged.

TARGET TRACE example

The TRACE target only works in the raw table. Using it in another table fails with a kernel message like this:

x_tables: ip_tables: TRACE target: only valid in raw table, not nat

sudo iptables -t raw -A PREROUTING -p tcp --dport 25 -j TRACE
sudo iptables -t raw -A OUTPUT -p tcp --dport 25 -j TRACE
modprobe ipt_LOG
modprobe nf_log_ipv4
sysctl net.netfilter.nf_log.2=nf_log_ipv4

/var/log/syslog will then show the traced packet, starting from the raw table PREROUTING chain, as it passes through each table and chain:

Jul 4 16:53:34 wen-Default-string kernel: [498977.349233] TRACE: mangle:PREROUTING:policy:1 IN=enp1s0 OUT= MAC=00:e0:4c:68:27:62:f0:b4:29:d7:c3:c6:08:00 SRC=101.204.228.187 DST=192.168.31.75 LEN=80 TOS=0x00 PREC=0x00 TTL=50 ID=11478 PROTO=TCP SPT=16337 DPT=22 SEQ=2610936745 ACK=4213764086 WINDOW=65535 RES=0x00 ACK PSH URGP=0 OPT (0101080AE0639FA0598E5567)
Jul 4 16:53:34 wen-Default-string kernel: [498977.349317] TRACE: mangle:INPUT:policy:1 IN=enp1s0 OUT= MAC=00:e0:4c:68:27:62:f0:b4:29:d7:c3:c6:08:00 SRC=101.204.228.187 DST=192.168.31.75 LEN=80 TOS=0x00 PREC=0x00 TTL=50 ID=11478 PROTO=TCP SPT=16337 DPT=22 SEQ=2610936745 ACK=4213764086 WINDOW=65535 RES=0x00 ACK PSH URGP=0 OPT (0101080AE0639FA0598E5567) UID=0 GID=0
Jul 4 16:53:34 wen-Default-string kernel: [498977.349388] TRACE: filter:INPUT:policy:1 IN=enp1s0 OUT= MAC=00:e0:4c:68:27:62:f0:b4:29:d7:c3:c6:08:00 SRC=101.204.228.187 DST=192.168.31.75 LEN=80 TOS=0x00 PREC=0x00 TTL=50 ID=11478 PROTO=TCP SPT=16337 DPT=22 SEQ=2610936745 ACK=4213764086 WINDOW=65535 RES=0x00 ACK PSH URGP=0 OPT (0101080AE0639FA0598E5567) UID=0 GID=0

Adding an iptables trace on CentOS 7

How to trace IPTables in RHEL7 / CENTOS7

If you are debugging IPTables, it is handy to be able to trace a packet while it traverses the various chains. I was trying to find out why port forwarding from the external NIC to a virtual machine attached to a virtual bridge device was not working.

You need to perform the following preparations:

Load the (IPv4) netfilter log kernel module:

Enable logging for the IPv4 (AF Family 2):

reconfigure rsyslogd to log kernel messages (kern.*) to /var/log/messages:

kern.*;*.info;mail.none;authpriv.none;cron.none /var/log/messages

restart rsyslogd:
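The commands for these preparation steps are missing above. A sketch: the modprobe and sysctl lines are the ones used in the TRACE example earlier in these notes, while the rsyslog service name for systemctl is an assumption for a stock CentOS 7 install. RUN defaults to echo, so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

trace_prep() {
    # Load the IPv4 netfilter log module.
    $RUN modprobe nf_log_ipv4
    # Use it as the logger for address family 2 (IPv4).
    $RUN sysctl net.netfilter.nf_log.2=nf_log_ipv4
    # After adding the kern.* line to the rsyslog configuration, restart the daemon.
    $RUN systemctl restart rsyslog
}
trace_prep
```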

Now check the raw tables – you’ll see that there are already entries coming from firewalld:
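The listing command and the commands that add the TRACE rules are not shown. A sketch that would produce rules like the ones in the listing that follows (TRACE for TCP port 80, placed first in PREROUTING and OUTPUT, ahead of the firewalld chains); the exact invocation is an assumption. RUN defaults to echo, so nothing is changed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

add_trace_rules() {
    # Show what firewalld already put into the raw table.
    $RUN iptables -t raw -L
    # Insert TRACE rules at position 1 for packets to TCP port 80.
    $RUN iptables -t raw -I PREROUTING 1 -p tcp --dport 80 -j TRACE
    $RUN iptables -t raw -I OUTPUT 1 -p tcp --dport 80 -j TRACE
}
add_trace_rules
```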

The rules now look as follows:

Chain PREROUTING (policy ACCEPT)
target             prot opt source    destination
TRACE              tcp  --  anywhere  anywhere     tcp dpt:http
PREROUTING_direct  all  --  anywhere  anywhere

Chain OUTPUT (policy ACCEPT)
target             prot opt source    destination
TRACE              tcp  --  anywhere  anywhere     tcp dpt:http
OUTPUT_direct      all  --  anywhere  anywhere

Chain OUTPUT_direct (1 references)
target             prot opt source    destination

Chain PREROUTING_direct (1 references)
target             prot opt source    destination

Now accesses to that specific machine’s TCP port 80 are logged to /var/log/messages:

May 27 19:57:54 storm3 kernel: TRACE: mangle:PRE_public:rule:3 IN=em1 OUT= MAC=ec:f4:bb:f1:4e:f0:00:25:46:70:2e:41:08:00 SRC=10.36.7.11 DST=10.32.105.30 LEN=64 TOS=0x00 PREC=0x00 TTL=59 ID=10953 DF PROTO=TCP SPT=54451 DPT=80 SEQ=1779626624 ACK=0 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (020404D8010303050101080A124E9DB70000000004020000)
May 27 19:57:54 storm3 kernel: TRACE: mangle:PRE_public_allow:return:1 IN=em1 OUT= MAC=ec:f4:bb:f1:4e:f0:0

How to configure a Linux server as a router

If the destination IP address is not on the local system, the packet goes through PREROUTING -> FORWARD -> POSTROUTING.

The server has two interfaces: eth0 for the WAN and eth1 for the LAN.

// packets leaving through the WAN interface will have MASQUERADE applied

MASQUERADE performs source NAT on the packet: it changes the source IP and records the mapping so that replies can be translated back. Before that, make sure the FORWARD chain of the filter table allows the packet:

#iptables --append FORWARD --in-interface eth1 -j ACCEPT // all packets coming in on interface eth1 will be forwarded, assuming the default policy is DROP
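Putting the pieces together, a minimal router setup might look like the sketch below. It assumes eth0 is the WAN and eth1 the LAN as stated above; the rule for returning traffic is an addition of this sketch, needed when the FORWARD policy is DROP because replies arrive on eth0, not eth1. RUN defaults to echo, so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

router_rules() {
    # Let the kernel forward packets between interfaces.
    $RUN sysctl -w net.ipv4.ip_forward=1
    # Source-NAT everything that leaves through the WAN interface.
    $RUN iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    # LAN to WAN: forward everything.
    $RUN iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
    # WAN to LAN: only replies to connections the LAN side opened.
    $RUN iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
}
router_rules
```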

A configuration of NAT with iptables for a router function

Step-By-Step Configuration of NAT with iptables

This tutorial shows how to set up network-address-translation (NAT) on a Linux system with iptables rules so that the system can act as a gateway and provide internet access to multiple hosts on a local network using a single public IP address. This is achieved by rewriting the source and/or destination addresses of IP packets as they pass through the NAT system.

Requirements:

CPU - PII or more
OS - Any Linux distribution
Software - Iptables
Network Interface Cards: 2

Here are my considerations:

Replace xx.xx.xx.xx with your WAN IP

Replace yy.yy.yy.yy with your LAN IP

(i.e. 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8 as suggested by Mr. tzs)

WAN = eth0 with public IP xx.xx.xx.xx
LAN = eth1 with private IP yy.yy.yy.yy / 255.255.0.0

Step by Step Procedure

Step #1. Add 2 Network cards to the Linux box

Step #2. Verify the network cards, whether they are installed properly or not

ls /etc/sysconfig/network-scripts/ifcfg-eth* | wc -l

( The output should be “2”)

Step #3. Configure eth0 for the Internet with a public IP (external network or Internet)

cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
BROADCAST=xx.xx.xx.255 # Optional Entry
HWADDR=00:50:BA:88:72:D4 # Optional Entry
IPADDR=xx.xx.xx.xx
NETMASK=255.255.255.0 # Provided by the ISP
NETWORK=xx.xx.xx.0 # Optional
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
GATEWAY=xx.xx.xx.1 # Provided by the ISP

Step #4. Configure eth1 for LAN with a Private IP (Internal private network)

cat /etc/sysconfig/network-scripts/ifcfg-eth1

BOOTPROTO=none
PEERDNS=yes
HWADDR=00:50:8B:CF:9C:05 # Optional
TYPE=Ethernet
IPV6INIT=no
DEVICE=eth1
NETMASK=255.255.0.0 # Specify based on your requirement
BROADCAST=""
IPADDR=192.168.2.1 # Gateway of the LAN
NETWORK=192.168.0.0 # Optional
USERCTL=no
ONBOOT=yes

Step #5. Host Configuration (Optional)

cat /etc/hosts

127.0.0.1 nat localhost.localdomain localhost

Step #6. Gateway Configuration

cat /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=nat
GATEWAY=xx.xx.xx.1 # Internet Gateway, provided by the ISP

Step #7. DNS Configuration

cat /etc/resolv.conf

nameserver 203.145.184.13 # Primary DNS Server provided by the ISP
nameserver 202.56.250.5 # Secondary DNS Server provided by the ISP

Step #8. NAT configuration with IP Tables

# Flush all the rules in the filter and nat tables

iptables --flush

iptables --table nat --flush

iptables --delete-chain

iptables --table nat --delete-chain

iptables --table nat --append POSTROUTING --out-interface eth0 -j MASQUERADE

iptables --append FORWARD --in-interface eth1 -j ACCEPT

echo 1 > /proc/sys/net/ipv4/ip_forward

# Apply the configuration. On Red Hat style systems, run "service iptables save" first: a restart reloads the saved rule file and discards unsaved rules.

service iptables restart

Step #9. Testing

ping 192.168.2.1

Try it on your client systems

ping google.com

Configuring PCs on the network (Clients)

• All PCs on the private office network should set their “gateway” to be the local private network IP address of the Linux gateway computer.
• The DNS should be set to that of the ISP on the internet.

Windows ’95, 2000, XP Configuration:

• Select “Start” + “Settings” + “Control Panel”
• Select the “Network” icon
• Select the tab “Configuration” and double click the component “TCP/IP” for the ethernet card. (NOT the TCP/IP -> Dial-Up Adapter)
• Select the tabs:
  o “Gateway”: Use the internal network IP address of the Linux box. (192.168.2.1)
  o “DNS Configuration”: Use the IP addresses of the ISP Domain Name Servers. (Actual internet IP address)
  o “IP Address”: The IP address (192.168.XXX.XXX - static) and netmask (typically 255.255.0.0 for a small local office network) of the PC can also be set here.

Using netfilter

iptables can flag packets which are sent by a specific process

Capture packets generated by uid: 13 to file uid-13.pcap

For example, squid runs as the user proxy:
id -u proxy
13

Capture tcp packets from/to port 80

Newer Linux kernels support TRACE logging to /var/log/kern.log or /var/log/syslog

modprobe nf_log_ipv4
sysctl net.netfilter.nf_log.2=nf_log_ipv4

Iptables further reference

Using iptables

iptables has a fairly detailed manual page (man iptables); consult it if you need more detail on particulars. Those of you familiar with ipchains may simply want to look at Differences Between iptables and ipchains; they are very similar.

There are several different things you can do with iptables. You start with three built-in chains INPUT, OUTPUT and FORWARD which you can’t delete. Let’s look at the operations to manage whole chains:

Create a new chain (-N).
Delete an empty chain (-X).
Change the policy for a built-in chain (-P).
List the rules in a chain (-L).
Flush the rules out of a chain (-F).
Zero the packet and byte counters on all rules in a chain (-Z).

There are several ways to manipulate rules inside a chain:

Append a new rule to a chain (-A).
Insert a new rule at some position in a chain (-I).
Replace a rule at some position in a chain (-R).
Delete a rule at some position in a chain, or the first that matches (-D).
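The chain and rule operations listed above can be walked through on a scratch chain. In this sketch the chain name mychain and the rules are hypothetical, chosen only to exercise each option; RUN defaults to echo, so the commands are printed rather than executed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

chain_ops() {
    $RUN iptables -N mychain                                # create a new chain
    $RUN iptables -A mychain -p icmp -j ACCEPT              # append a rule
    $RUN iptables -I mychain 1 -p tcp --dport 22 -j ACCEPT  # insert at position 1
    $RUN iptables -R mychain 2 -p icmp -j DROP              # replace rule 2
    $RUN iptables -L mychain -n -v --line-numbers           # list with counters
    $RUN iptables -Z mychain                                # zero the counters
    $RUN iptables -D mychain 1                              # delete rule 1
    $RUN iptables -F mychain                                # flush remaining rules
    $RUN iptables -X mychain                                # delete the empty chain
    $RUN iptables -P FORWARD ACCEPT                         # policy: built-in chains only
}
chain_ops
```

Since nothing jumps to mychain, its rules never see a packet; the chain exists only to show the syntax.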

7.1 What You’ll See When Your Computer Starts Up

iptables may be a module, called `iptable_filter.o’, which should be automatically loaded when you first run iptables. It can also be built into the kernel permanently.

Before any iptables commands have been run (be careful: some distributions will run iptables in their initialization scripts), there will be no rules in any of the built-in chains (`INPUT’, `FORWARD’ and `OUTPUT’), and all the chains will have a policy of ACCEPT. You can alter the default policy of the FORWARD chain by providing the `forward=0’ option to the iptable_filter module.

7.2 Operations on a Single Rule

This is the bread-and-butter of packet filtering; manipulating rules. Most commonly, you will probably use the append (-A) and delete (-D) commands. The others (-I for insert and -R for replace) are simple extensions of these concepts.

Each rule specifies a set of conditions the packet must meet, and what to do if it meets them (a `target’). For example, you might want to drop all ICMP packets coming from the IP address 127.0.0.1. So in this case our conditions are that the protocol must be ICMP and that the source address must be 127.0.0.1. Our target is `DROP’.

127.0.0.1 is the `loopback’ interface, which you will have even if you have no real network connection. You can use the `ping’ program to generate such packets (it simply sends an ICMP type 8 (echo request) which all cooperative hosts should obligingly respond to with an ICMP type 0 (echo reply) packet). This makes it useful for testing.

PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.2 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.2/0.2/0.2 ms

PING 127.0.0.1 (127.0.0.1): 56 data bytes

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
#

You can see here that the first ping succeeds (the `-c 1’ tells ping to only send a single packet).

Then we append (-A) to the `INPUT’ chain, a rule specifying that for packets from 127.0.0.1 (`-s 127.0.0.1’) with protocol ICMP (`-p icmp’) we should jump to DROP (`-j DROP’).

Then we test our rule, using the second ping. There will be a pause before the program gives up waiting for a response that will never come.

We can delete the rule in one of two ways. Firstly, since we know that it is the only rule in the input chain, we can use a numbered delete, as in:

#

To delete rule number 1 in the INPUT chain.

The second way is to mirror the -A command, but replacing the -A with -D. This is useful when you have a complex chain of rules and you don’t want to have to count them to figure out that it’s rule 37 that you want to get rid of. In this case, we would use:

#

The syntax of -D must have exactly the same options as the -A (or -I or -R) command. If there are multiple identical rules in the same chain, only the first will be deleted.
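The commands in this walkthrough were lost from the transcript above. Based on the options the text names (`-c 1’, `-A’, `-s 127.0.0.1’, `-p icmp’, `-j DROP’, `-D’), the sequence can be sketched as follows. RUN is a helper of this sketch and defaults to echo, so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

drop_ping_demo() {
    $RUN ping -c 1 127.0.0.1                              # first ping: succeeds
    $RUN iptables -A INPUT -s 127.0.0.1 -p icmp -j DROP   # append the DROP rule
    $RUN ping -c 1 127.0.0.1                              # second ping: no reply
    # Delete by mirroring the -A command with -D ...
    $RUN iptables -D INPUT -s 127.0.0.1 -p icmp -j DROP
    # ... or, if it is the only rule in INPUT, by number:
    #   iptables -D INPUT 1
}
drop_ping_demo
```

Only one of the two delete forms is needed; the by-specification form is the active one here because it does not depend on the rule being number 1.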

7.3 Filtering Specifications

We have seen the use of `-p’ to specify protocol, and `-s’ to specify source address, but there are other options we can use to specify packet characteristics. What follows is an exhaustive compendium.

Specifying Source and Destination IP Addresses

Source (`-s’, `--source’ or `--src’) and destination (`-d’, `--destination’ or `--dst’) IP addresses can be specified in four ways. The most common way is to use the full name, such as `localhost’ or `www.linuxhq.com’. The second way is to specify the IP address such as `127.0.0.1’.

The third and fourth ways allow specification of a group of IP addresses, such as `199.95.207.0/24’ or `199.95.207.0/255.255.255.0’. These both specify any IP address from 199.95.207.0 to 199.95.207.255 inclusive; the digits after the `/’ tell which parts of the IP address are significant. `/32’ or `/255.255.255.255’ is the default (match all of the IP address). To specify any IP address at all `/0’ can be used, like so:

[ NOTE: `-s 0/0’ is redundant here. ]

#
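The four address forms and the `/0’ case can be sketched side by side. To keep the sketch harmless even if executed, the rules go into a hypothetical chain named addrdemo that nothing jumps to; the chain name and the DROP target are assumptions. RUN defaults to echo, so the commands are only printed.

```shell
# Dry run by default; RUN=sudo executes the commands.
RUN="${RUN:-echo}"

address_forms() {
    $RUN iptables -N addrdemo                                        # scratch chain
    $RUN iptables -A addrdemo -s localhost -j DROP                   # 1. full name
    $RUN iptables -A addrdemo -s 127.0.0.1 -j DROP                   # 2. IP address
    $RUN iptables -A addrdemo -s 199.95.207.0/24 -j DROP             # 3. prefix length
    $RUN iptables -A addrdemo -s 199.95.207.0/255.255.255.0 -j DROP  # 4. netmask
    $RUN iptables -A addrdemo -s 0/0 -j DROP                         # any address at all
}
address_forms
```

The last rule behaves the same as one with no `-s’ option, which is why the note above calls `-s 0/0’ redundant.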

This is rarely used, as the effect above is the same as not specifying the `-s’ option at all.

Specifying Inversion

Many flags, including the `-s’ (or `--source’) and `-d’ (`--destination’) flags can have their arguments preceded by `!’ (pronounced `not’) to match addresses NOT equal to the ones given. For example, `-s ! localhost’ matches any packet not coming from localhost. (Current iptables releases expect the `!’ before the option instead, as in `! -s localhost’.)

Specifying Protocol

The protocol can be specified with the `-p’ (or `--protocol’) flag. Protocol can be a number (if you know the numeric protocol values for IP) or a name for the special cases of `TCP’, `UDP’ or `ICMP’. Case doesn’t matter, so `tcp’ works as well as `TCP’.

The protocol name can be prefixed by a `!’, to invert it, such as `-p ! TCP’ to specify packets which are not TCP.

Specifying an Interface

The `-i’ (or `--in-interface’) and `-o’ (or `--out-interface’) options specify the name of an interface to match. An interface is the physical device the packet came in on (`-i’) or is going out on (`-o’). You can use the ifconfig command to list the interfaces which are `up’ (i.e., working at the moment).

Packets traversing the INPUT chain don’t have an output interface, so any rule using `-o’ in this chain will never match. Similarly, packets traversing the OUTPUT chain don’t have an input interface, so any rule using `-i’ in this chain will never match.

Only packets traversing the FORWARD chain have both an input and output interface.

It is perfectly legal to specify an interface that currently does not exist; the rule will not match anything until the interface comes up. This is extremely useful for dial-up PPP links (usually interface ppp0) and the like.

As a special case, an interface name ending with a `+’ will match all interfaces (whether they currently exist or not) which begin with that string. For example, to specify a rule which matches all PPP interfaces, the -i ppp+ option would be used.

The interface name can be preceded by a `!’ with spaces around it, to match a packet which does not match the specified interface(s), e.g. -i ! ppp+.

Specifying Fragments

Sometimes a packet is too large to fit down a wire all at once. When this happens, the packet is divided into fragments, and sent as multiple packets. The other end reassembles these fragments to reconstruct the whole packet.

The problem with fragments is that the initial fragment has the complete header fields (IP + TCP, UDP and ICMP) to examine, but subsequent packets only have a subset of the headers (IP without the additional protocol fields). Thus looking inside subsequent fragments for protocol headers (such as is done by the TCP, UDP and ICMP extensions) is not possible.

If you are doing connection tracking or NAT, then all fragments will get merged back together before they reach the packet filtering code, so you need never worry about fragments.

Please also note that in the INPUT chain of the filter table (or any other table hooking into the NF_IP_LOCAL_IN hook) is traversed after defragmentation of the core IP stack.

Otherwise, it is important to understand how fragments get treated by the filtering rules. Any filtering rule that asks for information we don't have will not match. This means that the first fragment is treated like any other packet. Second and further fragments won't be. Thus a rule -p TCP --sport www (specifying a source port of `www') will never match a fragment (other than the first fragment). Neither will the opposite rule -p TCP --sport ! www.

However, you can specify a rule specifically for second and further fragments, using the `-f' (or `--fragment') flag. It is also legal to specify that a rule does not apply to second and further fragments, by preceding the `-f' with ` ! '.
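The fragment behaviour above can be sketched as a small Python model. It is only an illustration under simplifying assumptions: the Packet and Rule classes and the matches function are invented here, and a "later fragment" is modelled simply as a packet with a non-zero fragment offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    proto: str
    frag_offset: int = 0           # 0: unfragmented packet or first fragment
    sport: Optional[int] = None    # unknown when the transport header is absent


@dataclass
class Rule:
    proto: Optional[str] = None
    sport: Optional[int] = None
    sport_negated: bool = False      # models "--sport ! www"
    fragment: Optional[bool] = None  # True: -f, False: ! -f, None: not given


def matches(rule: Rule, pkt: Packet) -> bool:
    later_fragment = pkt.frag_offset > 0
    if rule.fragment is not None and rule.fragment != later_fragment:
        return False
    if rule.proto is not None and rule.proto != pkt.proto:
        return False
    if rule.sport is not None:
        if later_fragment:
            # The port is not available, so neither the rule nor its
            # negation can match.
            return False
        if (pkt.sport == rule.sport) == rule.sport_negated:
            return False
    return True


if __name__ == "__main__":
    first = Packet("tcp", frag_offset=0, sport=80)
    later = Packet("tcp", frag_offset=185)
    www = Rule(proto="tcp", sport=80)
    not_www = Rule(proto="tcp", sport=80, sport_negated=True)
    print(matches(www, first), matches(www, later), matches(not_www, later))
```

The point of the model is that a port test and its negation both fail on a later fragment, while a rule using only -f matches it.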

Usually it is regarded as safe to let second and further fragments through, since filtering will affect the first fragment, and thus prevent reassembly on the target host; however, bugs have been known to allow crashing of machines simply by sending fragments. Your call.

Note for network-heads: malformed packets (TCP, UDP and ICMP packets too short for the firewalling code to read the ports or ICMP code and type) are dropped when such examinations are attempted. So are TCP fragments starting at position 8.

As an example, the following rule will drop any fragments going to 192.168.1.1:

# iptables -A OUTPUT -f -d 192.168.1.1 -j DROP

7.4 Target Specifications

Now we know what examinations we can do on a packet, we need a way of saying what to do to the packets which match our tests. This is called a rule’s target.

There are two very simple built-in targets: DROP and ACCEPT. We’ve already met them. If a rule matches a packet and its target is one of these two, no further rules are consulted: the packet’s fate has been decided.

There are two types of targets other than the built-in ones: extensions and user-defined chains.

User-defined chains

One powerful feature which iptables inherits from ipchains is the ability for the user to create new chains, in addition to the three built-in ones (INPUT, FORWARD and OUTPUT). By convention, user-defined chains are lower-case to distinguish them (we’ll describe how to create new user-defined chains below in Operations on an Entire Chain).

When a packet matches a rule whose target is a user-defined chain, the packet begins traversing the rules in that user-defined chain. If that chain doesn’t decide the fate of the packet, then once traversal on that chain has finished, traversal resumes on the next rule in the current chain.

Time for more ASCII art. Consider two (silly) chains: INPUT (the built-in chain) and test (a user-defined chain).

         `INPUT'                         `test'
----------------------------    ----------------------------
| Rule1: -p ICMP -j DROP   |    | Rule1: -s 192.168.1.1    |
|--------------------------|    |--------------------------|
| Rule2: -p TCP -j test    |    | Rule2: -d 192.168.1.1    |
|--------------------------|    ----------------------------
| Rule3: -p UDP -j DROP    |
----------------------------

Consider a TCP packet coming from 192.168.1.1, going to 1.2.3.4. It enters the INPUT chain, and gets tested against Rule1 - no match. Rule2 matches, and its target is test, so the next rule examined is the start of test. Rule1 in test matches, but doesn’t specify a target, so the next rule is examined, Rule2. This doesn’t match, so we have reached the end of the chain. We return to the INPUT chain, where we had just examined Rule2, so we now examine Rule3, which doesn’t match either.

So the packet path is:

                        v    __________________________
         `INPUT'        |   /    `test'                v
------------------------|--/    -----------------------|----
| Rule1                 | /|    | Rule1                |   |
|-----------------------|/-|    |----------------------|---|
| Rule2                 /  |    | Rule2                |   |
|--------------------------|    -----------------------v----
| Rule3                 /--+___________________________/
------------------------|---
                        v
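The walk through INPUT and test can also be replayed with a short Python sketch. This is a toy model, not how the kernel implements traversal: the traverse function and the dictionary layout are assumptions made for the example, and a rule without a target is modelled as target None.

```python
def traverse(chains, chain, packet, trail):
    """Walk `chain` and return 'ACCEPT' or 'DROP', or None when the end
    of the chain is reached without a verdict."""
    for index, (match, target) in enumerate(chains[chain], start=1):
        trail.append(f"{chain}:Rule{index}")
        if not match(packet):
            continue
        if target in ("ACCEPT", "DROP"):
            return target
        if target in chains:
            # Jump to a user-defined chain; resume here if it gives no verdict.
            verdict = traverse(chains, target, packet, trail)
            if verdict is not None:
                return verdict
        # target is None: the rule matched but decides nothing.
    return None


chains = {
    "INPUT": [
        (lambda p: p["proto"] == "icmp", "DROP"),
        (lambda p: p["proto"] == "tcp", "test"),
        (lambda p: p["proto"] == "udp", "DROP"),
    ],
    "test": [
        (lambda p: p["src"] == "192.168.1.1", None),
        (lambda p: p["dst"] == "192.168.1.1", None),
    ],
}

packet = {"proto": "tcp", "src": "192.168.1.1", "dst": "1.2.3.4"}
trail = []
verdict = traverse(chains, "INPUT", packet, trail)

if __name__ == "__main__":
    print(" -> ".join(trail), "| verdict:", verdict)
```

The trail visits INPUT Rule1, INPUT Rule2, test Rule1, test Rule2 and then INPUT Rule3; the verdict stays None, which in a built-in chain means the chain policy decides.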

User-defined chains can jump to other user-defined chains (but don't make loops: your packets will be dropped if they're found to be in a loop).

Extensions to iptables: New Targets

The other type of extension is a target. A target extension consists of a kernel module, and an optional extension to iptables to provide new command line options. There are several extensions in the default netfilter distribution:

LOG

This module provides kernel logging of matching packets. It provides these additional options:

--log-level

Followed by a level number or name. Valid names are (case-insensitive) `debug', `info', `notice', `warning', `err', `crit', `alert' and `emerg', corresponding to numbers 7 through 0. See the man page for syslog.conf for an explanation of these levels. The default is `warning'.

--log-prefix

Followed by a string of up to 29 characters, this message is sent at the start of the log message, to allow it to be uniquely identified.
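To make the name-to-number mapping concrete, here is a small Python sketch. The helper names log_level_number and prefix_fits are hypothetical; the level order and the 29-character limit are taken from the text above.

```python
# Index in this list equals the numeric level: emerg is 0, debug is 7.
LOG_LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
MAX_PREFIX_LEN = 29


def log_level_number(level) -> int:
    """Translate a level name (case-insensitive) or number into 0..7."""
    text = str(level).lower()
    if text.isdigit():
        number = int(text)
        if 0 <= number <= 7:
            return number
        raise ValueError(f"level out of range: {level}")
    return LOG_LEVELS.index(text)  # raises ValueError for unknown names


def prefix_fits(prefix: str) -> bool:
    """Check a log prefix against the documented length limit."""
    return len(prefix) <= MAX_PREFIX_LEN


if __name__ == "__main__":
    print(log_level_number("warning"), prefix_fits("mangle-postrouting"))
```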

This module is most useful after a limit match, so you don't flood your logs.

REJECT

This module has the same effect as `DROP’, except that the sender is sent an ICMP `port unreachable’ error message. Note that the ICMP error message is not sent if (see RFC 1122):

- The packet being filtered was an ICMP error message in the first place, or some unknown ICMP type.
- The packet being filtered was a non-head fragment.
- We've sent too many ICMP error messages to that destination recently (see /proc/sys/net/ipv4/icmp_ratelimit).

REJECT also takes a `–reject-with’ optional argument which alters the reply packet used: see the manual page.

Special Built-In Targets

There are two special built-in targets: RETURN and QUEUE.

RETURN has the same effect of falling off the end of a chain: for a rule in a built-in chain, the policy of the chain is executed. For a rule in a user-defined chain, the traversal continues at the previous chain, just after the rule which jumped to this chain.

QUEUE is a special target, which queues the packet for userspace processing. For this to be useful, two further components are required:

- a "queue handler", which deals with the actual mechanics of passing packets between the kernel and userspace; and
- a userspace application to receive, possibly manipulate, and issue verdicts on packets.

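The standard queue handler for IPv4 iptables is the ip_queue module, which is distributed with the kernel and marked as experimental. (Later mainline kernels dropped ip_queue in favour of the NFQUEUE target and nfnetlink_queue.)

The following is a quick example of how to use iptables to queue packets for userspace processing:

# modprobe iptable_filter
# modprobe ip_queue
# iptables -A OUTPUT -p icmp -j QUEUE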

With this rule, locally generated outgoing ICMP packets (as created with, say, ping) are passed to the ip_queue module, which then attempts to deliver the packets to a userspace application. If no userspace application is waiting, the packets are dropped.

To write a userspace application, use the libipq API. This is distributed with iptables. Example code may be found in the testsuite tools (e.g. redirect.c) in CVS.

The status of ip_queue may be checked via:

/proc/net/ip_queue

The maximum length of the queue (i.e. the number packets delivered to userspace with no verdict issued back) may be controlled via:

/proc/sys/net/ipv4/ip_queue_maxlen

The default value for the maximum queue length is 1024. Once this limit is reached, new packets will be dropped until the length of the queue falls below the limit again. Nice protocols such as TCP interpret dropped packets as congestion, and will hopefully back off when the queue fills up. However, it may take some experimenting to determine an ideal maximum queue length for a given situation if the default value is too small.
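The drop-when-full behaviour can be pictured with a minimal Python model. It is only a sketch: PacketQueue and its methods are invented for illustration and do not mirror the ip_queue or libipq interfaces.

```python
from collections import deque


class PacketQueue:
    """Toy model of a bounded queue of packets awaiting a userspace verdict."""

    def __init__(self, maxlen: int = 1024):
        self.maxlen = maxlen
        self.pending = deque()
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        """Queue a packet; drop it when no verdict slots are free."""
        if len(self.pending) >= self.maxlen:
            self.dropped += 1
            return False
        self.pending.append(packet)
        return True

    def issue_verdict(self):
        """Userspace decides on the oldest packet, which frees one slot."""
        return self.pending.popleft()


if __name__ == "__main__":
    queue = PacketQueue(maxlen=2)
    print([queue.enqueue(n) for n in range(3)], queue.dropped)
```

With maxlen=2 the third packet is dropped, and a later packet is accepted again only after a verdict has been issued.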

7.5 Operations on an Entire Chain

A very useful feature of iptables is the ability to group related rules into chains. You can call the chains whatever you want, but I recommend using lower-case letters to avoid confusion with the built-in chains and targets. Chain names can be up to 31 letters long.

Creating a New Chain

Let's create a new chain. Because I am such an imaginative fellow, I'll call it test. We use the `-N' or `--new-chain' options:

# iptables -N test

It's that simple. Now you can put rules in it as detailed above.

Deleting a Chain

Deleting a chain is simple as well, using the `-X' or `--delete-chain' options. Why `-X'? Well, all the good letters were taken.

# iptables -X test

There are a couple of restrictions to deleting chains: they must be empty (see Flushing a Chain below) and they must not be the target of any rule. You can't delete any of the three built-in chains. If you don't specify a chain, then all user-defined chains will be deleted, if possible.

Flushing a Chain

There is a simple way of emptying all rules out of a chain, using the `-F' (or `--flush') commands.

# iptables -F FORWARD

If you don't specify a chain, then all chains will be flushed.

Listing a Chain

You can list all the rules in a chain by using the `-L' (or `--list') command.

The `refcnt’ listed for each user-defined chain is the number of rules which have that chain as their target. This must be zero (and the chain be empty) before this chain can be deleted.

If the chain name is omitted, all chains are listed, even empty ones.

There are three options which can accompany `-L’. The `-n’ (numeric) option is very useful as it prevents iptables from trying to lookup the IP addresses, which (if you are using DNS like most people) will cause large delays if your DNS is not set up properly, or you have filtered out DNS requests. It also causes TCP and UDP ports to be printed out as numbers rather than names.

The `-v' option shows you all the details of the rules, such as the packet and byte counters, the TOS comparisons, and the interfaces. Otherwise these values are omitted.

Note that the packet and byte counters are printed out using the suffixes `K', `M' or `G' for 1000, 1,000,000 and 1,000,000,000 respectively. Using the `-x' (expand numbers) flag as well prints the full numbers, no matter how large they are.

Resetting (Zeroing) Counters

It is useful to be able to reset the counters. This can be done with the `-Z' (or `--zero') option.

Consider the following:

# iptables -L FORWARD
# iptables -Z FORWARD

In the above example, some packets could pass through between the `-L' and `-Z' commands. For this reason, you can use the `-L' and `-Z' together, to reset the counters while reading them.

Setting Policy

We glossed over what happens when a packet hits the end of a built-in chain when we discussed how a packet walks through chains earlier. In this case, the policy of the chain determines the fate of the packet. Only built-in chains (INPUT, OUTPUT and FORWARD) have policies, because if a packet falls off the end of a user-defined chain, traversal resumes at the previous chain.

The policy can be either ACCEPT or DROP, for example:

# iptables -P FORWARD DROP

tcpdump to capture packets generated recently

Capture TCP packets with the SYN or FIN flag set

On the Raspberry Pi, 216.58.194.164 is www.google.com. Traffic to it is DNATed (redirected) to local port 1085, where ss-redir listens:

/usr/bin/ss-redir -s us0.0bad.com -p 31856 -l 1085    #### us0.0bad.com is 45.79.93.169, port 31856

curl 216.58.194.164

root@wen-Default-string:/home/wen# tcpdump -i any -n 'tcp[tcpflags] & (tcp-syn|tcp-fin) != 0'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:58:28.296597 IP 192.168.31.75.42606 > 127.0.0.1.1085: Flags [S], seq 944619112, win 29200, options [mss 1460,sackOK,TS val 1742167111 ecr 0,nop,wscale 7], length 0
15:58:28.296643 IP 216.58.194.164.80 > 192.168.31.75.42606: Flags [S.], seq 3860745401, ack 944619113, win 43690, options [mss 65495,sackOK,TS val 1378123321 ecr 1742167111,nop,wscale 7], length 0
15:58:28.296856 IP 192.168.31.75.48540 > 45.79.93.169.31856: Flags [S], seq 3907401065, win 29200, options [mss 1460,sackOK,TS val 2208452572 ecr 0,nop,wscale 7], length 0
15:58:28.513248 IP 45.79.93.169.31856 > 192.168.31.75.48540: Flags [S.], seq 2870573123, ack 3907401066, win 63443, options [mss 1460,nop,wscale 6,nop,nop,sackOK], length 0
15:58:28.764522 IP 192.168.31.75.42606 > 127.0.0.1.1085: Flags [F.], seq 944619191, ack 3860745942, win 237, options [nop,nop,TS val 1742167579 ecr 1378123788], length 0
15:58:28.765153 IP 192.168.31.75.48540 > 45.79.93.169.31856: Flags [F.], seq 102, ack 557, win 237, length 0
15:58:28.765307 IP 216.58.194.164.80 > 192.168.31.75.42606: Flags [F.], seq 541, ack 80, win 342, options [nop,nop,TS val 1378123789 ecr 1742167579], length 0
15:58:28.980662 IP 45.79.93.169.31856 > 192.168.31.75.48540: Flags [F.], seq 557, ack 103, win 16407, length 0
^C
8 packets captured
12 packets received by filter
0 packets dropped by kernel
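What the capture filter tests can be spelled out in a few lines of Python. This is an illustrative sketch only: it assumes a plain 20-byte TCP header, uses the standard flag bit values (FIN 0x01, SYN 0x02) and the standard flags offset of 13 bytes, and the helper names are made up.

```python
import struct

TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_FLAGS_OFFSET = 13  # byte offset of the flags field in the TCP header


def syn_or_fin(tcp_header: bytes) -> bool:
    """Equivalent of the filter: tcp[tcpflags] & (tcp-syn|tcp-fin) != 0."""
    return (tcp_header[TCP_FLAGS_OFFSET] & (TCP_SYN | TCP_FIN)) != 0


def build_header(flags: int) -> bytes:
    """Build a minimal 20-byte TCP header carrying the given flag byte."""
    data_offset = 5 << 4  # header length of 5 32-bit words, no options
    return struct.pack("!HHIIBBHHH", 42606, 1085, 0, 0, data_offset, flags, 29200, 0, 0)


if __name__ == "__main__":
    for name, flags in (("[S]", 0x02), ("[S.]", 0x12), ("[F.]", 0x11), ("[.]", 0x10)):
        print(name, syn_or_fin(build_header(flags)))
```

Segments shown as [S], [S.] and [F.] pass the filter, while a pure ACK does not, which is why the capture lists only connection setup and teardown.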

iptables for docker packet flow

The notes below assume that the docker container and the docker host see the same iptables configuration. (Netfilter rules are normally kept per network namespace, so it is worth checking this inside the container.) Both sides have an interface on the 172.17.0.0/16 network: eth0 in the container and docker0 on the host.

docker container config

[root@0593e928b872 /]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
[root@0593e928b872 /]# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0

docker host config

[root@TeamCI-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.56.233.130   0.0.0.0         UG    100    0        0 eno1
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@TeamCI-1 ~]# ifconfig |head -14
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.56.233.181  netmask 255.255.255.128  broadcast 10.56.233.255

iptables configuration, same in container and host

[root@TeamCI-1 ~]# iptables -t filter -L -v
Chain INPUT (policy ACCEPT 2155K packets, 193M bytes)
 pkts bytes target  prot opt in      out  source    destination
    0     0 ACCEPT  udp  --  virbr0  any  anywhere  anywhere     udp dpt:domain
    0     0 ACCEPT  tcp  --  virbr0  any  anywhere  anywhere     tcp dpt:domain
    0     0 ACCEPT  udp  --  virbr0  any  anywhere  anywhere     udp dpt:bootps
    0     0 ACCEPT  tcp  --  virbr0  any  anywhere  anywhere     tcp dpt:bootps

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in  out  source  destination
    4   336 DOCKER-USER  all  --  any  any  anywhere  anywhere
    4   336 DOCKER-ISOLATION-STAGE-1  all  --  any  any  anywhere  anywhere
    2   168 ACCEPT  all  --  any      docker0   anywhere          anywhere          ctstate RELATED,ESTABLISHED
    0     0 DOCKER  all  --  any      docker0   anywhere          anywhere
    2   168 ACCEPT  all  --  docker0  !docker0  anywhere          anywhere
    0     0 ACCEPT  all  --  docker0  docker0   anywhere          anywhere
    0     0 ACCEPT  all  --  any      virbr0    anywhere          192.168.122.0/24  ctstate RELATED,ESTABLISHED
    0     0 ACCEPT  all  --  virbr0   any       192.168.122.0/24  anywhere
    0     0 REJECT  all  --  any      virbr0    anywhere          anywhere          reject-with icmp-port-unreachable
    0     0 REJECT  all  --  virbr0   any       anywhere          anywhere          reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT 347K packets, 5088M bytes)
 pkts bytes target  prot opt in   out     source    destination
    0     0 ACCEPT  udp  --  any  virbr0  anywhere  anywhere     udp dpt:bootpc

Chain DOCKER (1 references)
 pkts bytes target  prot opt in  out  source  destination

[root@TeamCI-1 ~]# iptables -t nat -L -v
Chain PREROUTING (policy ACCEPT 42392 packets, 3349K bytes)
 pkts bytes target  prot opt in   out  source    destination
 8411  496K DOCKER  all  --  any  any  anywhere  anywhere     ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 8427 packets, 499K bytes)
 pkts bytes target  prot opt in  out  source  destination

Chain OUTPUT (policy ACCEPT 1564 packets, 94986 bytes)
 pkts bytes target  prot opt in   out  source    destination
    1    84 DOCKER  all  --  any  any  anywhere  !loopback/8  ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 1563 packets, 94902 bytes)
 pkts bytes target      prot opt in   out       source            destination
    2   168 MASQUERADE  all  --  any  !docker0  172.17.0.0/16     anywhere
    6  1255 RETURN      all  --  any  any       192.168.122.0/24  224.0.0.0/24
    0     0 RETURN      all  --  any  any       192.168.122.0/24  255.255.255.255
    0     0 MASQUERADE  tcp  --  any  any       192.168.122.0/24  !192.168.122.0/24  masq ports: 1024-65535
    0     0 MASQUERADE  udp  --  any  any       192.168.122.0/24  !192.168.122.0/24  masq ports: 1024-65535
    0     0 MASQUERADE  all  --  any  any       192.168.122.0/24  !192.168.122.0/24

Chain DOCKER (2 references)
 pkts bytes target  prot opt in       out  source    destination
    1    84 RETURN  all  --  docker0  any  anywhere  anywhere

An ICMP message sent from the docker container to an internet IP

When the docker container runs ping <ip-addr>, the packet is sent towards <docker0-ip>, the default gateway in the container's routing table.

The docker container sends the packet out through OUTPUT and POSTROUTING

( <docker-container-ip>, <internet-ip> ) ---------> nat OUTPUT

    1    84 DOCKER  all  --  any  any  anywhere  !loopback/8  ADDRTYPE match dst-type LOCAL

The DOCKER target is hit (pkts is 1).

Chain DOCKER (2 references)
 pkts bytes target  prot opt in       out  source    destination
    1    84 RETURN  all  --  docker0  any  anywhere  anywhere

The RETURN target is hit.

------------------------> nat POSTROUTING

    2   168 MASQUERADE  all  --  any  !docker0  172.17.0.0/16  anywhere

The MASQUERADE target is hit: the out interface is eth0, not docker0, so the source IP is replaced with the IP of the out interface.

( <docker0-ip>, <internet-ip> ) ---------> egress eth0

Docker's virtual link then delivers this ( <docker0-ip>, <internet-ip> ) packet from the container's eth0 egress to the docker0 ingress on the host.

The docker host receives the packet on ingress interface docker0; it passes through PREROUTING, FORWARD and POSTROUTING

Chain PREROUTING (policy ACCEPT 42392 packets, 3349K bytes)
 8411  496K DOCKER  all  --  any  any  anywhere  anywhere  ADDRTYPE match dst-type LOCAL

Not hit, since dst is <internet-ip>.

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    4   336 DOCKER-USER  all  --  any  any  anywhere  anywhere
    4   336 DOCKER-ISOLATION-STAGE-1  all  --  any  any  anywhere  anywhere
    2   168 ACCEPT  all  --  any  docker0  anywhere  anywhere  ctstate RELATED,ESTABLISHED

------------------------> nat POSTROUTING

Chain POSTROUTING (policy ACCEPT 1563 packets, 94902 bytes)
    2   168 MASQUERADE  all  --  any  !docker0  172.17.0.0/16  anywhere

src is in 172.17.0.0/16 and the out interface is eno1, not docker0, so the MASQUERADE target is hit and the source IP is replaced with the IP of the out interface eno1.

( <host-ip>, <internet-ip> ) is sent to the internet IP via the host's eno1 egress interface.

The docker host receives the reply on ingress interface eno1; it passes through PREROUTING, FORWARD and POSTROUTING

( <internet-ip>, <host-ip> )

ingress interface ----> PREROUTING --route--> FORWARD -----> POSTROUTING ------> egress interface

No PREROUTING rule is hit. Because the outgoing packet was MASQUERADEd (SNAT), the reply's dst <host-ip> is replaced with <docker0-ip> before routing; since the new dst is not eno1's address, the packet is FORWARDed.

Chain POSTROUTING (policy ACCEPT 1563 packets, 94902 bytes)
 pkts bytes target      prot opt in   out       source            destination
    2   168 MASQUERADE  all  --  any  !docker0  172.17.0.0/16     anywhere      ### not hit: dst is <docker0-ip>, so the out interface is docker0
    6  1255 RETURN      all  --  any  any       192.168.122.0/24  224.0.0.0/24  ### it will return

So ( <internet-ip>, <docker0-ip> ) leaves via the docker0 interface.

The docker container receives the packet on interface eth0; it passes through PREROUTING and INPUT

( <internet-ip>, <docker0-ip> ) is received on eth0 in the docker container.

ingress interface ----> PREROUTING --route--> INPUT

No PREROUTING rule is hit. Because the outgoing packet was MASQUERADEd (SNAT), the dst <docker0-ip> is replaced with <docker-container-ip>, and the packet goes through INPUT up to the protocol stack of the process that sent the ping.

iptables debug/trace

Tracing with only the LOG target

Compare the packet counters of every iptables rule to figure out the path a packet traverses. An ICMP packet is convenient to trace, because there is no ICMP traffic unless you ping a host. pkts is the number of packets that matched the rule; we can add a LOG target to every chain.

Configure the mangle table with LOG targets to trace all chains (only the mangle table has all 5 built-in chains).

pi@raspberrypi:~ $ sudo iptables -t mangle -L -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   21  3408 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-prerouting"

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   21  3408 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-input"

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
    0     0 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-forward"

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   18  1680 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-output"

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   18  1680 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-postrouting"
pi@raspberrypi:~ $

pi@raspberrypi:~ $ ping -c 1 192.168.31.1
PING 192.168.31.1 (192.168.31.1) 56(84) bytes of data.
64 bytes from 192.168.31.1: icmp_seq=1 ttl=64 time=0.753 ms

Log file

pi@raspberrypi:~ $ sed -n 2758,2767p /var/log/aka.log
Apr 15 14:09:05 raspberrypi kernel: [1301291.540592] mangle-outputIN= OUT=eth0 SRC=192.168.31.85 DST=192.168.31.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=19795 DF PROTO=ICMP TYPE=8 CODE=0 ID=16052 SEQ=1
Apr 15 14:09:05 raspberrypi kernel: [1301291.540648] mangle-postroutingIN= OUT=eth0 SRC=192.168.31.85 DST=192.168.31.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=19795 DF PROTO=ICMP TYPE=8 CODE=0 ID=16052 SEQ=1

#### On the Pi, whose eth0 address is 192.168.31.85, ping 192.168.31.1 generates an ICMP packet; the routing table selects OUT=eth0 and the packet passes the OUTPUT chain,
#### then the POSTROUTING chain (where any SNAT would happen), then eth0 egress.

#### Assuming 192.168.31.1 answers with an ICMP echo reply, eth0 ingress receives it.

Apr 15 14:09:05 raspberrypi kernel: [1301291.541304] mangle-preroutingIN=eth0 OUT= SRC=192.168.31.1 DST=192.168.31.85 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=13397 PROTO=ICMP TYPE=0 CODE=0 ID=16052 SEQ=1
Apr 15 14:09:05 raspberrypi kernel: [1301291.541367] mangle-inputIN=eth0 OUT= SRC=192.168.31.1 DST=192.168.31.85 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=13397 PROTO=ICMP TYPE=0 CODE=0 ID=16052 SEQ=1

### After the INPUT chain, the packet is delivered to the upper layer that sent the ICMP request.

pkts increases by one after the ping:

pi@raspberrypi:~ $ sudo iptables -t mangle -L -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   22  3408 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-prerouting"

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   22  3408 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-input"

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
    0     0 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-forward"

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   19  1680 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-output"

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
   19  1680 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning prefix "mangle-postrouting"
pi@raspberrypi:~ $

Packets may be dropped by any rule. For a packet generated by the system, the path is OUTPUT ----> POSTROUTING -----> egress. If the mangle OUTPUT chain was traversed but the mangle POSTROUTING chain was not, the packet never reached egress; most likely a rule in the nat or filter table's OUTPUT chain dropped it. If both the mangle OUTPUT and the mangle POSTROUTING chains were traversed, the packet probably reached egress (as long as the nat table's POSTROUTING chain did not drop it).

Packets received on an ingress interface traverse PREROUTING, then the routing decision chooses INPUT or FORWARD. The FORWARD chain's policy may be DROP, so make sure the packet is ACCEPTed there. If the mangle PREROUTING and FORWARD chains were both hit but the mangle POSTROUTING chain was not, then the filter table's FORWARD chain probably dropped it.

Only the mangle table has all 5 chains, so a packet should normally show up in every mangle chain along its path. If a chain is missing from the log, go back to the last chain that was logged to find which rule dropped the packet.
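Reading the chain sequence out of the kernel log can be automated with a few lines of Python. This is a sketch under assumptions: the regular expression and the name parse_hops are invented here, the log prefix is assumed to contain no spaces, and the sample lines are the four entries from the log excerpt above.

```python
import re

# prefix, IN=, OUT=, optional MAC=, then SRC= and DST=
LOG_LINE = re.compile(
    r"\] (?P<prefix>\S*?)IN=(?P<in_if>\S*) OUT=(?P<out_if>\S*) "
    r"(?:MAC=\S* )?SRC=(?P<src>\S+) DST=(?P<dst>\S+)"
)

SAMPLE = [
    "Apr 15 14:09:05 raspberrypi kernel: [1301291.540592] mangle-outputIN= OUT=eth0 "
    "SRC=192.168.31.85 DST=192.168.31.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=19795 DF "
    "PROTO=ICMP TYPE=8 CODE=0 ID=16052 SEQ=1",
    "Apr 15 14:09:05 raspberrypi kernel: [1301291.540648] mangle-postroutingIN= OUT=eth0 "
    "SRC=192.168.31.85 DST=192.168.31.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=19795 DF "
    "PROTO=ICMP TYPE=8 CODE=0 ID=16052 SEQ=1",
    "Apr 15 14:09:05 raspberrypi kernel: [1301291.541304] mangle-preroutingIN=eth0 OUT= "
    "SRC=192.168.31.1 DST=192.168.31.85 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=13397 "
    "PROTO=ICMP TYPE=0 CODE=0 ID=16052 SEQ=1",
    "Apr 15 14:09:05 raspberrypi kernel: [1301291.541367] mangle-inputIN=eth0 OUT= "
    "SRC=192.168.31.1 DST=192.168.31.85 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=13397 "
    "PROTO=ICMP TYPE=0 CODE=0 ID=16052 SEQ=1",
]


def parse_hops(lines):
    """Return one dict per LOG line: prefix, in_if, out_if, src, dst."""
    hops = []
    for line in lines:
        found = LOG_LINE.search(line)
        if found:
            hops.append(found.groupdict())
    return hops


if __name__ == "__main__":
    for hop in parse_hops(SAMPLE):
        print(hop["prefix"], hop["in_if"] or "-", hop["out_if"] or "-", hop["src"], "->", hop["dst"])
```

A chain that is missing from the resulting sequence marks the point after which the packet was lost.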

SNAT another network namespace's IP to the normal eth0 address

pi@raspberrypi:~ $ sudo iptables -L -t nat -v
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target      prot opt in   out  source            destination
    4   336 LOG         icmp --  any  any  anywhere          anywhere     LOG level warning
   17  3304 MASQUERADE  all  --  any  any  192.168.163.0/24  anywhere

For example, the eth0 address is 192.168.31.85. When an ICMP packet goes from 192.168.163.1 to 192.168.31.1, the out interface is eth0; MASQUERADE means SNAT using the out interface's address as the source, so 192.168.163.1 is replaced with 192.168.31.85.

pi@raspberrypi:~ $ sudo ip route show
default via 192.168.31.1 dev eth0 proto dhcp src 192.168.31.85 metric 202
169.254.0.0/16 dev docker0 scope link src 169.254.191.248 metric 204
169.254.0.0/16 dev vethb82cf1e scope link src 169.254.177.27 metric 213
169.254.0.0/16 dev br1_test scope link src 169.254.59.219 metric 218
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.31.0/24 dev eth0 proto dhcp scope link src 192.168.31.85 metric 202
192.168.163.0/24 dev br1_test proto kernel scope link src 192.168.163.10
192.168.163.0/24 dev veth-b scope link src 192.168.163.10 metric 216

But this packet (IN=br1_test) cannot reach the POSTROUTING chain unless the filter table's FORWARD chain ACCEPTs it:

 pkts bytes target  prot opt in        out  source    destination
    4   336 ACCEPT  icmp --  br1_test  any  anywhere  anywhere

pi@raspberrypi:~ $ sudo iptables -t filter -L -v |head -20
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in   out  source    destination
  263 42313 LOG     icmp --  any  any  anywhere  anywhere     LOG level warning

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target  prot opt in        out       source    destination
    0     0 ACCEPT  icmp --  any       br1_test  anywhere  anywhere
    4   336 ACCEPT  icmp --  br1_test  any       anywhere  anywhere
    5   420 LOG     icmp --  any       any       anywhere  anywhere     LOG level warning prefix "filter-forward "

reverse direction of SNAT

The ICMP packet from 192.168.163.1 to 192.168.31.1 is SNATed to SRC=192.168.31.85 DST=192.168.31.1 OUT=eth0. When 192.168.31.1 replies, an ICMP echo reply arrives on eth0 ingress:

SRC=192.168.31.1 DST=192.168.31.85 IN=eth0 OUT=

Its destination is translated back to the original source 192.168.163.1:

SRC=192.168.31.1 DST=192.168.163.1 IN=eth0 OUT=br1_test

so another FORWARD rule needs to accept this packet as well:

 pkts bytes target  prot opt in   out       source    destination
    0     0 ACCEPT  icmp --  any  br1_test  anywhere  anywhere

SRC=192.168.31.1 DST=192.168.163.1 IN=eth0 OUT=br1_test is then forwarded to the br1_test egress.
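The forward and reverse translation can be summarised in a toy Python model. It is a sketch only: MasqueradeModel is an invented name, flows are keyed by a simple id standing in for the ICMP echo id, and real connection tracking keeps far more state.

```python
import ipaddress


class MasqueradeModel:
    """Toy SNAT with a connection table, keyed by (remote address, flow id)."""

    def __init__(self, masq_net: str, out_if_ip: str):
        self.masq_net = ipaddress.ip_network(masq_net)
        self.out_if_ip = out_if_ip
        self.conntrack = {}

    def postrouting(self, pkt: dict) -> dict:
        """Outbound: replace a masqueraded source with the out interface address."""
        if ipaddress.ip_address(pkt["src"]) in self.masq_net:
            self.conntrack[(pkt["dst"], pkt["id"])] = pkt["src"]
            return {**pkt, "src": self.out_if_ip}
        return pkt

    def prerouting(self, pkt: dict) -> dict:
        """Inbound reply: restore the original source as the destination."""
        original = self.conntrack.get((pkt["src"], pkt["id"]))
        if original is not None and pkt["dst"] == self.out_if_ip:
            return {**pkt, "dst": original}
        return pkt


if __name__ == "__main__":
    nat = MasqueradeModel("192.168.163.0/24", "192.168.31.85")
    request = nat.postrouting({"src": "192.168.163.1", "dst": "192.168.31.1", "id": 16052})
    reply = nat.prerouting({"src": "192.168.31.1", "dst": "192.168.31.85", "id": 16052})
    print(request, reply)
```

The request leaves with source 192.168.31.85, and the reply's destination is rewritten back to 192.168.163.1 before the forwarding decision, which is why a FORWARD rule is needed in both directions.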

owner

From man iptables-extensions:

This module attempts to match various characteristics of the packet creator, for locally generated packets. This match is only valid in the OUTPUT and POSTROUTING chains. Forwarded packets do not have any socket associated with them. Packets from kernel threads do have a socket, but usually no owner.

[!] --uid-owner username
[!] --uid-owner userid[-userid]
    Matches if the packet socket's file structure (if it has one) is owned by the given user. You may also specify a numerical UID, or a UID range.

[!] --gid-owner groupname
[!] --gid-owner groupid[-groupid]
    Matches if the packet socket's file structure is owned by the given group. You may also specify a numerical GID, or a GID range.

[!] --socket-exists
    Matches if the packet is associated with a socket.

iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner linuxaria -j REDSOCKS   // TCP packets owned by user linuxaria are checked by the rules in chain "REDSOCKS"

capture all the packets generated in

pi@raspberrypi:~ $ sudo grep test /etc/passwd
test:x:1000:1000::/home/test:/bin/bash

## To track responses to outgoing traffic, a connection mark has to be set in OUTPUT and matched in INPUT.

tproxy

https://powerdns.org/tproxydoc/tproxy.md.html

The routing part

When a packet enters a Linux system it is routed, dropped, or if the destination address matches a local address, accepted for processing by the system itself. Local addresses can be specific, like 192.0.2.1, but can also match whole ranges. This is for example how all of 127.0.0.0/8 is considered as ‘local’. It is entirely possible to tell Linux 0.0.0.0/0 (‘everything’) is local, but this would leave it unable to connect to any remote address. However, with a separate routing table, we can enable this selectively:

configuration

iptables -t mangle -I PREROUTING -p udp --dport 5301 -j MARK --set-mark 1
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

This says: mark all UDP packets coming in to the system to port 5301 with '1'. The next line sends those marked packets to routing table 100. And finally, the last line declares all of IPv4 as local in routing table 100.

Intercepting packets: the userspace part

code to intercept

With the routing rule and table above, the following simple code intercepts all packets routed through the system destined for 5301, regardless of destination IP address:

Socket s(AF_INET, SOCK_DGRAM, 0);
ComboAddress local("0.0.0.0", 5301);
ComboAddress remote(local);

SBind(s, local);

for(;;) {
  string packet=SRecvfrom(s, 1500, remote);
  cout<<"Received a packet from "<<remote.toStringWithPort()<<endl;
}

The two roles of IP_TRANSPARENT

The IP_TRANSPARENT socket option enables:

- Binding to addresses that are not (usually) considered local
- Receiving connections and packets from iptables TPROXY redirected sessions

Binding to non-local IP addresses

Regular sockets are used for transparent proxying, but a special flag, IP_TRANSPARENT, is set to indicate that this socket might receive data destined for non-local addresses.

Note: as explained above, we can declare 0.0.0.0/0 as “local” (or ::/0), but if this is not in a default routing table, we still need this flag to convince the kernel we know what we are doing when we bind to a non-local IP address.

The following code spoofs a UDP address from 1.2.3.4 to 198.41.0.4:

Socket s(AF_INET, SOCK_DGRAM, 0);
SSetsockopt(s, IPPROTO_IP, IP_TRANSPARENT, 1);
ComboAddress local("1.2.3.4", 5300);
ComboAddress remote("198.41.0.4", 53);

SBind(s, local);
SSendto(s, "hi!", remote);

Note: this requires root or CAP_NET_ADMIN to work.

With tcpdump we can observe that an actual packet leaves the host:

tcpdump -n host 1.2.3.4
21:29:41.005856 IP 1.2.3.4.5300 > 198.41.0.4.53: [|domain]

IP_TRANSPARENT is mentioned in ip(7).

The iptables part

In the code examples above, traffic had to be delivered to a socket bound to the exact port of the intercepted traffic. We also had to bind the socket to 0.0.0.0 (or ::) for it to see all traffic.

TPROXY target in iptable

iptables has a target called TPROXY which gives us additional flexibility to send intercepted traffic to a specific local IP address and simultaneously mark it too.

The basic syntax is:

iptables -t mangle -A PREROUTING -p tcp --dport 25 -j TPROXY \
  --tproxy-mark 0x1/0x1 --on-port 10025 --on-ip 127.0.0.1

This says: take everything destined for a port 25 on TCP and deliver this for a process listening on 127.0.0.1:10025 and mark the packet with 1.

This mark then makes sure the packet ends up in the right routing table.

With the iptables line above, we can now bind to 127.0.0.1:10025 and receive all traffic destined for port 25. Note that the IP_TRANSPARENT option still needs to be set for this to work, even when we bind to 127.0.0.1.

Getting the original destination address

For TCP sockets, the original destination address and port of a socket is available via getsockname(). This is needed for example to setup a connection to the originally intended destination.

An example piece of code:

Socket s(AF_INET, SOCK_STREAM, 0);
SSetsockopt(s, IPPROTO_IP, IP_TRANSPARENT, 1);
ComboAddress local("127.0.0.1", 10025);

SBind(s, local);
SListen(s, 128);

ComboAddress remote(local), orig(local);
int client = SAccept(s, remote);
cout<<"Got connection from "<<remote.toStringWithPort()<<endl;

SGetsockname(client, orig);
cout<<"Original destination: "<<orig.toStringWithPort()<<endl;

For UDP, the IP_RECVORIGDSTADDR socket option can be set with setsockopt(). To actually get to that address, recvmsg() must be used which will then pass the original destination as a cmsg with index IP_ORIGDSTADDR containing a struct sockaddr_in.
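For the UDP case, a hedged Python sketch of reading that control message follows. The numeric constants are the values from <linux/in.h> (IP_ORIGDSTADDR is 20, and IP_RECVORIGDSTADDR shares that value); the function names are invented for this example, and actually receiving redirected traffic still needs the TPROXY rules and the IP_TRANSPARENT option described in this section.

```python
import socket
import struct

SOL_IP = 0           # same value as IPPROTO_IP
IP_ORIGDSTADDR = 20  # <linux/in.h>; IP_RECVORIGDSTADDR has the same value


def original_destination(ancdata):
    """Extract (ip, port) from an IP_ORIGDSTADDR control message, if present.

    `ancdata` is the list of (level, type, data) tuples returned by
    socket.recvmsg(); `data` holds a struct sockaddr_in.
    """
    for level, ctype, data in ancdata:
        if level == SOL_IP and ctype == IP_ORIGDSTADDR:
            # sockaddr_in: 2-byte family, then port and IPv4 address in
            # network byte order, followed by padding.
            port, raw_addr = struct.unpack_from("!2xH4s", data)
            return socket.inet_ntoa(raw_addr), port
    return None


def receive_with_original_destination(sock, bufsize=1500):
    """Receive one datagram and report where it was originally addressed.

    Assumes the caller already enabled the option on a UDP socket with
    sock.setsockopt(SOL_IP, IP_ORIGDSTADDR, 1).
    """
    payload, ancdata, _flags, sender = sock.recvmsg(bufsize, socket.CMSG_SPACE(16))
    return payload, sender, original_destination(ancdata)
```

The parsing is kept separate from the socket call so it can be exercised with hand-built control data, without root privileges.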

Note: as of May 2017, many recently deployed Linux kernels have a bug which breaks IP_RECVORIGDSTADDR.

Many TPROXY iptables examples on the internet contain an unexplained refinement that uses -m socket -p tcp. The socket module of iptables matches packets that correspond to a local socket, which may be more precise or faster than navigating a set of specific rules.

The setup you’ll find everywhere sets up a redirect chain which marks and accepts packets:

TPROXY configuration

iptables -t mangle -N DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT

The following then makes sure that everything that corresponds to an established local socket gets sent there, followed by what should happen to new packets:

iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A PREROUTING -p tcp --dport 25 -j TPROXY \
  --tproxy-mark 0x1/0x1 --on-port 10025 --on-ip 127.0.0.1
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \
  --tproxy-mark 0x1/0x1 --on-port 10080 --on-ip 127.0.0.1
