Notes on "High Performance Browser Networking" (second reading, this time trying out stuff on the Raspberry Pi and/or my MacBook in the process).

## `traceroute`

* `traceroute` traces the route to a web address. The mechanism by which it does this is intriguing.

  The IP protocol has a TTL field, whose maximum value is 255 (one octet). Every router along the packet's route decrements the TTL by one. When the TTL reaches 0, the router discards the packet and returns an ICMP message (Internet Control Message Protocol; an Internet standard encapsulated in IP that carries control and error signals) reporting `TIME_EXCEEDED`.
  
  Network time-to-live was implemented to prevent "immortal packets" cycling between different hosts forever.
  
  `traceroute` cleverly sends probe packets with incrementally larger TTLs: 1, 2, ..., N. Since each ICMP message contains identifying information about the router that sent it (in particular, its IP address), this will, assuming a stable network topology, allow you to "illuminate" the route to the target.
  
  `traceroute` sends either an empty UDP packet or an ICMP echo packet. The former results in an ICMP "port unreachable" reply upon reaching the intended target (the probe is addressed to a port that is unlikely to be in use); the latter results in an echo reply upon reaching the intended target. Once `traceroute` gets this result, its work is done and it exits.
  
  `traceroute` output looks like so:
  
    ```
    Alexs-MacBook:browser-networking-notes alex$ traceroute google.com
    traceroute to google.com (172.217.0.46), 64 hops max, 52 byte packets
     1  www.routerlogin.com (192.168.7.1)  2.072 ms  0.987 ms  1.029 ms
     2  192.168.0.1 (192.168.0.1)  2.724 ms  3.380 ms  2.855 ms
     3  96.120.90.145 (96.120.90.145)  16.982 ms  15.098 ms  12.252 ms
     4  po-302-1209-rur01.oakland.ca.sfba.comcast.net (68.86.249.113)  13.496 ms  104.963 ms  26.555 ms
     5  be-214-rar01.santaclara.ca.sfba.comcast.net (162.151.78.93)  56.329 ms  27.718 ms  16.772 ms
     6  be-299-ar01.santaclara.ca.sfba.comcast.net (68.86.143.93)  20.916 ms  25.936 ms  22.927 ms
     7  96.112.146.18 (96.112.146.18)  27.572 ms  18.707 ms  30.109 ms
     8  * * *
     9  209.85.252.250 (209.85.252.250)  59.955 ms
        209.85.248.34 (209.85.248.34)  24.171 ms  40.714 ms
    10  108.170.243.13 (108.170.243.13)  23.903 ms  29.063 ms  30.077 ms
    11  74.125.253.151 (74.125.253.151)  22.753 ms
        74.125.253.190 (74.125.253.190)  30.559 ms
        lga15s43-in-f14.1e100.net (172.217.0.46)  25.599 ms
    ```

  Note that for certain hops, the result includes domain names in addition to IP addresses. `traceroute` performs a lightweight reverse DNS lookup on the IP addresses it finds. Specifically, it asks DNS for a PTR (pointer) record for the address. This pointer record will only exist if the owner of the address has published one. Intermediate "switchboard" routers often do so; hence the three Comcast regional routers that show a well-known name. Many wide-area network and local-area network routers have no such record associated with them.
  
  This is a "lightweight" reverse DNS lookup because a more thorough method for reverse DNS exists: traversing the entire DNS service tree in a top-down manner, from the ISP lead network center all the way down, to get to the server which is responsible for this IP address, which can then report its A record. This is an expensive operation. It is described in detail in [this StackOverflow post](https://stackoverflow.com/questions/23981098/how-forward-and-reverse-dns-works).
  
  The first hop is to the local router, which injects its login page as the network address (I think this is a NETGEAR router because mine has the same login page).
  
  Note that one of the hops (hop 8) shows only `* * *`: no reply came back for any of its three probes. That appears to be due to a router silently dropping the probes instead of returning an ICMP `TIME_EXCEEDED` message like it's supposed to. It's also possible that this is occurring due to network firewall rules blocking this type of traffic.
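
The TTL trick is easy to see in a toy simulation. The sketch below sends no real packets (that would need raw sockets and root privileges); it only models routers decrementing a TTL. The shortened `ROUTE` list and the function names are made up for illustration, with a few addresses borrowed from the output above. `ptr_query_name` shows the name a PTR lookup would query for each hop, using the standard library's `ipaddress` module.

```python
import ipaddress

# Hypothetical, shortened route: three routers followed by the target host.
ROUTE = ["192.168.7.1", "96.120.90.145", "68.86.249.113", "172.217.0.46"]


def send_probe(route, ttl):
    """Simulate one probe and return (replying_ip, reply_type)."""
    for hop_ip in route[:-1]:
        ttl -= 1  # every router decrements the TTL by one
        if ttl == 0:
            return hop_ip, "TIME_EXCEEDED"
    # The probe survived every router and reached the target.
    return route[-1], "REACHED_TARGET"


def traceroute(route, max_hops=64):
    """Send probes with TTL 1, 2, ... until the target itself answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        ip, reply = send_probe(route, ttl)
        hops.append(ip)
        if reply == "REACHED_TARGET":
            break
    return hops


def ptr_query_name(ip):
    """Name that a reverse DNS (PTR) lookup for this address would query."""
    return ipaddress.ip_address(ip).reverse_pointer


for n, ip in enumerate(traceroute(ROUTE), start=1):
    print(n, ip, ptr_query_name(ip))
```

Each probe with a larger TTL gets one router further before it expires, so the list of repliers is the route in order.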

## TCP flow control and congestion control

* There is an initial three-way handshake:
  1. `SYN` message, in which your machine picks a random sequence number and sends it to the server.
  2. `SYN ACK` message from the server, which increments that number by one and appends a random number of its own.
  3. `ACK` message from your machine, which increments both numbers by one.
  
  Only upon completing this handshake will the target server begin to return data. The implication is that every TCP connection requires a full roundtrip of latency before any data transfer can occur. This is a big part of the reason why TCP connection reuse is a thing.
  
  From our reading of "Data Intensive Applications" we know that three-way handshakes do not provide strong guarantees against data loss, but this is a simple protocol and it makes sense for an unreliable transport medium like the Internet, on which reconnect and packet resend is absolutely a thing.
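
To make the number bookkeeping concrete, here is a small sketch of the three messages as `(flags, seq, ack)` tuples. It is only a model of the exchange described above, not a network implementation; the function name and tuple layout are assumptions chosen for illustration.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake messages as (flags, seq, ack) tuples."""
    x = random.getrandbits(32) if client_isn is None else client_isn
    y = random.getrandbits(32) if server_isn is None else server_isn
    wrap = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("SYN", x, None)                         # client picks x
    syn_ack = ("SYN-ACK", y, (x + 1) % wrap)       # server acks x+1, picks y
    ack = ("ACK", (x + 1) % wrap, (y + 1) % wrap)  # client acks y+1
    return [syn, syn_ack, ack]


for message in three_way_handshake(client_isn=100, server_isn=300):
    print(message)
```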


* Network traffic throughput on a TCP connection is controlled by two windows: the congestion window and the receive window.

* The **receive window** provides **flow control** for the data sender and receiver, i.e. a way for these two entities to reconcile incoming traffic with their processing load. TCP packets sent from A to B are "cleared" when A receives an `ACK` message from B stating that B has received the messages. Once the sender gets this `ACK`, it is allowed to send the next `rwnd` bytes' worth of payload.
  
  To reduce its receive window, the recipient may send an `ACK` with a smaller `rwnd` set.
  
  The current maximum `rwnd` value is 1 GB (this relies on the window scaling option; the raw header field is 16 bits, which tops out at 64 KB). The minimum `rwnd` value is 0 bytes, which serves as a signal to stop all traffic until the receiver sends a new `ACK` packet with a non-zero `rwnd` value.
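
A rough sketch of how an advertised window throttles a sender, including the zero-window pause. This is a simplified model in which each round trip carries one advertised `rwnd`; the function name and the sample numbers are made up for illustration.

```python
def send_with_flow_control(total_bytes, advertised_windows):
    """Return the number of bytes sent in each round trip.

    advertised_windows[i] is the rwnd (in bytes) the receiver advertised
    before round i; a value of 0 pauses the sender for that round.
    """
    sent_per_round = []
    remaining = total_bytes
    for rwnd in advertised_windows:
        if remaining == 0:
            break
        burst = min(rwnd, remaining)  # never exceed the advertised window
        sent_per_round.append(burst)
        remaining -= burst
    return sent_per_round


# The receiver shrinks its window, closes it entirely, then reopens it.
print(send_with_flow_control(10_000, [4000, 2000, 0, 4000, 4000]))
```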

* The **congestion window** provides **congestion control** for the underlying transport network. This is a separate concern from flow control because neither the sender nor the receiver is entirely aware of the transport capacity of the underlying network.

  The congestion window is a `cwnd` value that is only known by the data sender. Interestingly, whilst the receive window is stated in terms of bytes of data, the congestion window is stated in terms of number of packets.
  
  Congestion control is performed using exponential growth, multiplicative backoff. When the sender receives an `ACK` from the receiver stating that all `N` packets in the current window were received, the `cwnd` value is doubled. Thus the value goes from 4 packets in-flight, to 8 packets in-flight, to 16 packets in-flight, and so on. As soon as a packet loss occurs and a retransmit request is made, the `cwnd` value is halved; and the cycle begins anew.
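
The doubling and halving can be traced with a few lines of code. This sketch follows the simplified description above exactly; real TCP stacks use more elaborate algorithms, so treat it as an illustration only. The function name and the choice of which round loses a packet are assumptions.

```python
def cwnd_trace(initial_cwnd, loss_rounds, rounds):
    """Return cwnd (in packets) at the start of each round trip."""
    cwnd = initial_cwnd
    trace = []
    for rnd in range(rounds):
        trace.append(cwnd)
        if rnd in loss_rounds:
            cwnd = max(1, cwnd // 2)  # packet loss: halve the window
        else:
            cwnd *= 2  # whole window acknowledged: double it
    return trace


# Start at 4 packets and lose a packet in the fourth round (index 3).
print(cwnd_trace(initial_cwnd=4, loss_rounds={3}, rounds=6))
```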

* The maximum number of bytes/packets in flight is always the minimum of the `rwnd` and `cwnd` values.
* Of course, achievable throughput depends on both window size and network delay: at most one window's worth of data can be delivered per round trip. If a network is slow but very reliable, very large segment sizes are better because `ACK` message transmit time is proportionately important and message retransmit is proportionally unimportant. If the opposite is true, small segment sizes are better for the reverse reasons.
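
As a back-of-the-envelope check, the upper bound on throughput is one window per round trip. The sketch below assumes a hypothetical 1460-byte segment size to convert `cwnd` from packets to bytes; the function name and the sample values are illustrative.

```python
def max_throughput_bps(rwnd_bytes, cwnd_packets, mss_bytes, rtt_seconds):
    """Upper bound on throughput in bits per second: one window per round trip."""
    # The effective window is the smaller of the two limits, in bytes.
    window_bytes = min(rwnd_bytes, cwnd_packets * mss_bytes)
    return window_bytes * 8 / rtt_seconds


# A 64 KB receive window over a 100 ms round trip caps out near 5.2 Mbps,
# no matter how fast the underlying link is.
bps = max_throughput_bps(rwnd_bytes=65_535, cwnd_packets=100,
                         mss_bytes=1460, rtt_seconds=0.1)
print(f"{bps / 1e6:.2f} Mbps")
```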

* TCP is a well-ordered protocol. TCP sockets will only serve data to applications in packet order. If an early packet is delayed, the packets that arrived after it are held in a queue until the late packet arrives; only then will more data be readable from the socket. This behavior is known as **head-of-line blocking**, and it's responsible for the unpredictable, bursty delays in reads from TCP connections known as "jitter".
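
Head-of-line blocking falls out of a simple reorder buffer. The sketch below is a toy model with whole packets numbered from 0 (real TCP tracks byte offsets); the function name is an assumption.

```python
def deliver_in_order(arrivals):
    """For each arriving packet, list the sequence numbers released to the app."""
    buffered = set()
    next_expected = 0
    released_per_arrival = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        # Release the longest in-order run that is now complete.
        while next_expected in buffered:
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        released_per_arrival.append(released)
    return released_per_arrival


# Packet 0 arrives late: packets 1 and 2 sit in the queue until it shows up.
print(deliver_in_order([1, 2, 0, 3]))
```

The application sees nothing for two arrivals and then three packets at once, which is the burstiness described above.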


## NAT

* NAT stands for **Network Address Translation**. A NAT box performs **IP masquerading**: it intercepts messages bound for certain public IP addresses, maps those to a corresponding private IP address, and forwards the message to that address. On the return trip, it strips the private IP address and re-injects the public IP address.

  NAT exists to deal with **IPv4 exhaustion**. Without NAT, an endpoint must hold a public IP address to be able to communicate with other endpoints on the Internet, because that address must appear in its IP protocol header. With NAT, an endpoint may stay on a private IP address (i.e. not take up a public IP address); the NAT will replace the private IP address with its own public IP address, and inject a new port number into the packet that is unique to the given endpoint, before passing the message upstream. On the return trip, the NAT will resolve the port number to the corresponding endpoint, perform the necessary header substitutions, and route the traffic accordingly.
  
  NAT allows endpoints that would otherwise need a public IP address to communicate with the public Internet to use non-unique private IP addresses instead. These private IP addresses come from one of three address ranges reserved specifically for this purpose, and are reused across many different endpoints on many different private networks. NAT boxes can be stacked hierarchically to obviate the need for huge blocks of IP assignments.
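
The port-substitution bookkeeping can be sketched as a pair of lookup tables. This is a toy model, not how any particular NAT box is implemented; the `Nat` class, its method names, the starting port, and the example addresses are all assumptions for illustration.

```python
class Nat:
    """Toy NAT: maps (private_ip, private_port) pairs to ports on one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet; create a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, public_port):
        """Resolve a reply's destination port to the private endpoint, if any."""
        return self.back.get(public_port)  # None when no mapping exists


nat = Nat("203.0.113.5")
print(nat.outbound("192.168.0.10", 5000))
print(nat.outbound("192.168.0.11", 5000))
print(nat.inbound(40001))
```

Two endpoints using the same private port end up on different public ports, which is what lets replies find their way back.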


## UDP

* UDP is an unreliable message delivery protocol. It omits connection handshakes, congestion control, flow control, receipt acknowledgement, packet retransmit, and well-ordered packet delivery. This allows for a simple, performant, lossy protocol. UDP is rarely the first choice for network engineering because the guarantees provided by TCP are useful. If you want the things that TCP offers, you can implement them on top of UDP...or you can just use TCP to begin with.

  UDP sees the most use in contexts where partial delivery as fast as possible is important. For example, it's the protocol of choice for video game multiplayer. It is left up to the server software system and the local copy of the game to reconstruct state transitions from partial information in the event of packet loss, but doing this as-soon-as-possible is important for minimizing lag.
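
To see how little ceremony UDP involves, here is a minimal sketch that sends one datagram over the loopback interface using Python's standard `socket` module. There is no handshake and no acknowledgement; the helper name and the two-second timeout are arbitrary choices for illustration.

```python
import socket


def udp_loopback_roundtrip(payload):
    """Send one datagram to a local socket and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        # No connect() and no handshake: the datagram is simply sent.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_loopback_roundtrip(b"ping"))
```

Over a real network the `recvfrom` call could time out instead, and it would be up to the application to decide what to do about the missing datagram.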
  
* The big problem with UDP on the open Internet is NAT. The management of cache entries in NAT requires a connection state machine. TCP provides such a machine (connection handshake and connection termination), so NAT boxes know exactly when to create and remove cache entries. UDP, meanwhile, is stateless. A NAT box knows when to create a cache entry, but doesn't know when to remove it. If a NAT box removes a map entry before the data is finished being communicated, routing from the recipient to the sender will fail.
* Another major issue is that some servers may choose to block UDP traffic outright.
* There is an RFC called ICE that specifies a well-known methodology for working around these issues. The protocol is:
  1. Attempt to connect point-to-point using UDP directly.
  2. If this fails, fall back to using STUN servers to get through NATs. A STUN server sits on the public Internet and tells a peer which public IP address and port its traffic appears to come from, i.e. the mapping its NAT created. The peers exchange these public addresses and use them to reach each other directly, sending periodic keepalive packets so that the NAT boxes do not expire the mappings mid-conversation.
  3. If this fails, usually due to firewall rules that block UDP traffic completely, fall back to using TURN servers. TURN servers act as a relay: both peers connect to the TURN server, falling back to TCP if UDP is blocked, and the server forwards the traffic between them.
  
  Note that STUN servers are preferable to TURN servers: they require an additional handshake, but the connection is still peer-to-peer. TURN servers require an additional handshake *and* must receive and forward all of the traffic. This increases latency, as the route followed is now indirect and no longer peer-to-peer. It also requires the TURN server to have enough inbound and outbound network bandwidth to handle the traffic being routed.
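
The fallback order can be sketched as a loop over connection strategies, cheapest first. This only mirrors the ordering in these notes: real ICE implementations gather candidates of every type and test them by priority rather than strictly one after another. The function name and the stand-in attempt functions are hypothetical.

```python
def connect_with_fallback(strategies):
    """Try each (name, attempt) pair in order; return the first name that works."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except OSError:
            pass  # treat a network error as "this strategy failed"
    return None


# Stand-ins for real connection attempts (assumptions for this sketch).
def direct_udp():
    return False  # e.g. blocked by a NAT


def via_stun():
    raise OSError("UDP blocked by firewall")


def via_turn():
    return True  # relayed connection succeeds


print(connect_with_fallback([
    ("direct", direct_udp),
    ("stun", via_stun),
    ("turn", via_turn),
]))
```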