Italo Cunha edited this page Feb 10, 2021 · 4 revisions

Downstream

BIRD installs routes from all clients in table 20000. Packets received on the upstream interface are routed according to this table by an ip rule (with priority 20000).

Upstream

We create one macvlan interface (called upstream{peer}) for each upstream network. These macvlans are attached to the OpenVPN tunnel and directly accessible by clients. Each macvlan gets assigned IP addresses in the following format:

100.{64+mux}.{upstream >> 8}.{upstream % 256}
2804:269c:ffff:{mux}::{upstream}
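
As a minimal sketch of the IPv4 format above, the following Python computes a macvlan address from a mux and an upstream identifier. The function name macvlan_ipv4 and the bound on upstream are assumptions made for illustration; the authoritative logic lives in the platform's own code. Only IPv4 is shown, because the format above does not say whether the identifiers in the IPv6 form are written in decimal or hexadecimal.

```python
import ipaddress


def macvlan_ipv4(mux: int, upstream: int) -> ipaddress.IPv4Address:
    """Return 100.{64+mux}.{upstream >> 8}.{upstream % 256} (hypothetical helper)."""
    # Assumption: upstream identifiers fit in 15 bits, so the third octet
    # stays below 128 (the lower half of the mux's /16).
    if not 0 <= upstream < 2**15:
        raise ValueError("upstream id out of the assumed range")
    if mux < 0:
        raise ValueError("mux id must be non-negative")
    # The high byte of the identifier becomes the third octet,
    # the low byte becomes the fourth octet.
    third = upstream >> 8
    fourth = upstream % 256
    # ipaddress rejects the result if 64 + mux does not fit in one octet.
    return ipaddress.IPv4Address(f"100.{64 + mux}.{third}.{fourth}")


print(macvlan_ipv4(3, 300))  # 100.67.1.44
```

For example, upstream 300 on mux 3 splits into 300 >> 8 = 1 and 300 % 256 = 44, giving 100.67.1.44.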

BIRD establishes upstream sessions and populates one routing table for each upstream network. Routes from upstream i are installed in table 10000 + i. When BIRD announces routes from upstream i to clients, it sets bgp_next_hop to the IP address on interface upstream{i}. When a client wants to send packets to upstream i, ARP/NDP resolves the next hop to the MAC address of the macvlan interface upstream{i}. (The main trick is to let the mux know which upstream network to use based on the destination MAC on packets received from clients.) Packets arriving on macvlan interface upstream{i} are routed using table 10000 + i by an ip rule.
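
The mapping from upstream to routing table can be sketched as below. This is an illustration only: the helper names are hypothetical, and the generated command uses standard iproute2 syntax (ip rule add iif ... lookup ...) rather than the exact command the mux runs, which this page does not show. An IPv6 rule would be added the same way with ip -6 rule.

```python
UPSTREAM_TABLE_BASE = 10000  # from the text: upstream i uses table 10000 + i


def upstream_table(upstream: int) -> int:
    """Routing table that holds the routes learned from one upstream."""
    return UPSTREAM_TABLE_BASE + upstream


def upstream_rule_command(upstream: int) -> str:
    """Build an iproute2 command that sends packets arriving on
    upstream{i} to table 10000 + i (assumed form, not the mux's script)."""
    return f"ip rule add iif upstream{upstream} lookup {upstream_table(upstream)}"


print(upstream_rule_command(7))  # ip rule add iif upstream7 lookup 10007
```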

  • Mux and upstream identifiers are global, so there is no collision of macvlan IP addresses among different upstreams.

  • Normally, Linux would reply to ARP/NDP requests for the IP addresses on the macvlan interfaces using the MAC address of the OpenVPN tap interface (where the ARP packets arrive). To force the kernel to reply to ARP requests with the macvlan's MAC, we set net.ipv4.conf.{all,default}.arp_ignore = 1 and net.ipv4.conf.{all,default}.arp_announce = 2. NDP works as expected (although this was not tested without the above two configurations).
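
The brace notation in the ARP note above expands to four sysctl keys. A small sketch that spells them out in sysctl.conf syntax follows; the function name is hypothetical, and how the mux actually applies these settings is not described here.

```python
from typing import List

# Values taken from the text above.
ARP_SETTINGS = {"arp_ignore": 1, "arp_announce": 2}
SCOPES = ("all", "default")


def arp_sysctl_lines() -> List[str]:
    """Expand net.ipv4.conf.{all,default}.<key> into one line per key."""
    return [
        f"net.ipv4.conf.{scope}.{key} = {value}"
        for scope in SCOPES
        for key, value in ARP_SETTINGS.items()
    ]


for line in arp_sysctl_lines():
    print(line)
```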

OpenVPN tunnels

Each mux has a tap0 interface where all OpenVPN tunnels terminate. To make all the above macvlan addresses reachable to the client, the OpenVPN server at the mux tells clients to route the entirety of 100.{64+mux}.0.0/16 and 2804:269c:ffff:{mux}::/64 through the tunnel. If a client connects to multiple muxes, each tunnel will be attached to different /16s and /64s.

Muxes and clients are allocated IP addresses on the upper half of prefixes, i.e., 100.{64 + mux}.128.0/17 and 2804:269c:ffff:{mux}:0:1::/80. (Note that upstream macvlans are allocated IP addresses on the lower half of the prefixes, i.e., 100.{64 + mux}.0.0/17 and 2804:269c:ffff:{mux}:0:0::/80. Note also that IPv4 limits the maximum number of peers the platform can support to approximately 2^15.)

The mux is always the first address on the upper half: 100.{64 + mux}.128.1 and 2804:269c:ffff:{mux}:0:1::1. Clients are allocated subsequent addresses by DHCP. If a client establishes multiple simultaneous connections to a mux (all previous instances of this happened by accident), each connection will have a separate tunnel and be allocated a different IP address.
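
The IPv4 side of this split can be checked with Python's ipaddress module. The sketch below is illustrative: the function name and dictionary keys are made up for the example, and only IPv4 is covered.

```python
import ipaddress


def mux_ipv4_plan(mux: int) -> dict:
    """Split a mux's /16 into the two /17 halves described above."""
    tunnel = ipaddress.ip_network(f"100.{64 + mux}.0.0/16")
    # subnets() yields the lower /17 first, then the upper /17.
    lower, upper = tunnel.subnets(prefixlen_diff=1)
    return {
        "tunnel": tunnel,        # routed through the OpenVPN tunnel
        "upstreams": lower,      # macvlan addresses
        "clients": upper,        # mux and client addresses
        "mux_address": upper.network_address + 1,
    }


plan = mux_ipv4_plan(3)
print(plan["upstreams"], plan["clients"], plan["mux_address"])
# 100.67.0.0/17 100.67.128.0/17 100.67.128.1
print(plan["upstreams"].num_addresses)  # 32768, i.e. 2^15
```

The lower /17 holds 2^15 addresses, which is where the approximate limit on the number of peers comes from.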

AL2S addressing

The first /24 inside 100.127.0.0/16 is allocated to AL2S interfaces. Muxes connected to AL2S use IPv4 address 100.127.0.{{ id }} and IPv6 address 2804:269c:ff01::{{ id }}. For remote upstreams, we reserved a separate IP range so that each remote upstream gets a unique address in the format 100.126.X.X/16, where X.X follows the same formula as for local upstreams (based on peer.id). For IPv6, we have reserved ff02/48. Only the mux that connects to a specific peer has the corresponding IP address added to its sub-VLAN interface.
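
A short sketch of the two IPv4 formats in this section, assuming that "the same formula" means the {upstream >> 8}.{upstream % 256} split used for local upstreams. The function names and range checks are hypothetical.

```python
import ipaddress


def al2s_ipv4(mux_id: int) -> ipaddress.IPv4Address:
    """AL2S address of a mux: 100.127.0.{id}."""
    # Assumption: the id must fit in the last octet of the /24.
    if not 0 <= mux_id <= 255:
        raise ValueError("mux id does not fit in 100.127.0.0/24")
    return ipaddress.IPv4Address(f"100.127.0.{mux_id}")


def remote_upstream_ipv4(upstream: int) -> ipaddress.IPv4Address:
    """Remote upstream address: 100.126.{upstream >> 8}.{upstream % 256}."""
    # Assumption: the identifier fits in the two low octets.
    if not 0 <= upstream < 2**16:
        raise ValueError("upstream id does not fit in two octets")
    return ipaddress.IPv4Address(f"100.126.{upstream >> 8}.{upstream % 256}")


print(al2s_ipv4(5))               # 100.127.0.5
print(remote_upstream_ipv4(300))  # 100.126.1.44
```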

Container virtual interfaces

We use a veth pair to connect user containers running on muxes to the bird/openvpn network namespace. These veths use a /30 prefix (the .0 address is the network address, the .1 is the mux's end of the veth pair, the .2 is the container's end of the veth pair, and .3 is the broadcast address). The /30 is generated as 100.125.{M<<3}.{E*4}/30, where M is the mux id and E is the experiment's container id. For v6 we use 2804:269c:ff03:M::E:0/112, with similar association of IPs and interfaces.
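
The IPv4 /30 layout can be sketched as follows, taking the formula above literally. The function name is hypothetical, and the limits on M and E are assumptions derived only from keeping each octet at or below 255; the real limits are whatever settings.py defines.

```python
import ipaddress


def veth_ipv4(mux: int, container: int) -> dict:
    """Return the /30 for 100.125.{M<<3}.{E*4} and its two usable ends."""
    # Assumed bounds so that both octets stay within 0..255.
    if not 0 <= mux < 32:
        raise ValueError("mux id too large for the third octet")
    if not 0 <= container < 64:
        raise ValueError("container id too large for the fourth octet")
    net = ipaddress.ip_network(f"100.125.{mux << 3}.{container * 4}/30")
    return {
        "network": net,                              # .0 of the /30
        "mux_end": net.network_address + 1,          # .1, mux side of the veth
        "container_end": net.network_address + 2,    # .2, container side
        "broadcast": net.broadcast_address,          # .3
    }


v = veth_ipv4(2, 5)
print(v["network"], v["mux_end"], v["container_end"], v["broadcast"])
# 100.125.16.20/30 100.125.16.21 100.125.16.22 100.125.16.23
```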

Notes

  • The authoritative source for addressing information is settings.py on the website. Several constants starting with SRVMGR_ define the output of the Python code and Jinja2 templates.