# Transport Services and Protocols

- provide **logical communication** between app processes running on different hosts
- transport protocols run in end systems
    - sender: breaks application messages into *segments*, passes to network layer
    - receiver: reassembles segments into messages and passes to application layer
- more than one transport protocol available to applications
    - Internet: TCP and UDP

<img src="img/Snip20191122_3.png" width=60%/>

## UDP - User Datagram Protocol

- no reliability
- chosen when reliability is not worth its cost, since reliability requires more resources and incurs delay (e.g., RIP, or SNMP: Simple Network Management Protocol)

## TCP - Transmission Control Protocol

- reliable data delivery (FTP: File Transfer Protocol, HTTP)
    - no bit error
    - no packet loss
    - no reordering
    - no duplication



# Multiplexing and Demultiplexing

<img src="img/Snip20191122_5.png" width=80%/>

## Ports

- programs use *port numbers* to *address* other programs

<img src="img/Snip20191122_4.png" width=80%/>

- e.g., "Host #1: IP addr x, Port # n1" and "Host #2: IP addr y, Port # n2" is a pair that uniquely identifies the association between the two apps

- 3 types of addresses
    - MAC address for 1-hop communication
    - IP address for multi-hop communication
    - ports for app-app communications

- **ports are identified by a 16-bit ID**
    - There are $2^{16}$ UDP ports and $2^{16}$ TCP ports
- **reserved ports** (a few 1000s from the lower end, including some well known ports like TCP: #80/HTTP, #179/BGP, #20,21/FTP and UDP #520/RIP)
- **free ports**: allocated by the system admin
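
As a rough illustration of how the (IP address, port) pairs above identify an association, the sketch below models a demultiplexing table. The `Association` tuple, the `demultiplex` helper, and the example addresses are hypothetical names chosen for illustration; they are not part of any real socket API.

```python
from typing import NamedTuple, Optional

MAX_PORT = 2**16 - 1  # ports are 16-bit IDs: 0..65535


class Association(NamedTuple):
    """Both endpoints of one app-to-app association (hypothetical model)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def demultiplex(table: dict, dst_ip: str, dst_port: int,
                src_ip: str, src_port: int) -> Optional[str]:
    """Return the app registered for this association, or None if there is none."""
    if not (0 <= dst_port <= MAX_PORT and 0 <= src_port <= MAX_PORT):
        raise ValueError("port numbers must fit in 16 bits")
    return table.get(Association(dst_ip, dst_port, src_ip, src_port))


# Two client programs on the same host talk to the same server port,
# yet the pairs differ, so their segments reach different handlers.
table = {
    Association("10.0.0.2", 80, "10.0.0.1", 50001): "handler A",
    Association("10.0.0.2", 80, "10.0.0.1", 50002): "handler B",
}
print(demultiplex(table, "10.0.0.2", 80, "10.0.0.1", 50002))
```

The only point of the model is that the full pair of endpoints, not the server port alone, selects the receiving program.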




# 3.3 Connectionless Transport: UDP

- **Connectionless**: no handshaking between UDP sender and receiver, each UDP segment handled independently of others
- "best effort" service, UDP segments may be lost, or delivered out-of-order to the application
- UDP usage
    - streaming multimedia applications (loss tolerant, rate sensitive)
    - DNS
    - SNMP (Simple Network Management Protocol)
- reliable transfer over UDP
    - add reliability at application layer
    - application-specific error recovery
    
## UDP Segment Header

<img src="img/Snip20191122_6.png" width=60%/>

- **checksum**: computed the same way as the checksum explained for the network layer
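
A minimal sketch of the 16-bit one's-complement Internet checksum follows, assuming that is the checksum referred to above. It covers only the bytes passed in; a real UDP checksum also covers a pseudo-header with the IP addresses, which this sketch leaves out. The function names are illustrative.

```python
def _ones_complement_sum(data: bytes) -> int:
    """Add the data as big-endian 16-bit words, wrapping carries around."""
    if len(data) % 2:              # pad odd-length input with one zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Sender side: the checksum is the one's complement of the sum."""
    return ~_ones_complement_sum(data) & 0xFFFF


def checksum_ok(data: bytes, checksum: int) -> bool:
    """Receiver side: sum of the words plus the checksum must be all ones."""
    total = _ones_complement_sum(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


segment = bytes.fromhex("666055558f0c")   # three 16-bit words
csum = internet_checksum(segment)
print(hex(csum), checksum_ok(segment, csum))
```

A passing check does not prove the data is intact, since some multi-bit errors cancel out, but a failing check always means corruption.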

## UDP Advantage

- no connection establishment and thus lower delay
- simple: no connection state at sender and receiver
- small header size 
- no congestion control: UDP can blast away as fast as desired



# 3.4 Principles of Reliable Data Transfer

- packets are transmitted over unreliable channels which can
    1. corrupt the packets
        - bit errors; solve with checksum & feedback (acknowledgement)
    2. deliver the same packet more than once (duplication)
        - solve with a sequence number in the packet
    3. deliver the packets out of order
        - solve with a sequence number in the packet
    4. lose packets
        - packet loss; use timers and timeouts


- Automatic Repeat Request (ARQ) protocols are used to provide reliable data transfer
- Reliable data transfer requires **feedback** from the receiver to the sender (acknowledgement)
- **Sequence numbers** are used to solve the problem of out of order and duplicate packets
- **Timers** are used to solve the problem of lost packets


## Selective Repeat (ARQ Protocol)

- one of the ARQ protocols
- sender can transmit up to $N$ unacked packets in the pipeline
- receiver *individually* acknowledges all correctly received packets
    - buffer packets, as needed, for eventual in-order delivery to upper layer
    - any out of order packet (at the receiver) or ACK (at the sender) will be buffered
- sender maintains timer for each unacked packet
    - when timer expires, sender only resends packets for which ACK not received
- sender window
    - $N$ consecutive sequence numbers
    - limits sequence numbers of sent, unacked packets

<img src="img/Snip20191122_7.png" width=80%/>

- sequence number space is modulo 9

### Selective Repeat Dilemma

- example: sequence number space: 0, 1, 2, 3; window size = 3

correct behavior | erroneous
---|---
<img src="img/Snip20191122_8.png"/> | <img src="img/Snip20191122_9.png"/>

- the receiver sees no difference between the two scenarios, but in the erroneous scenario duplicated data is accepted as new

- **window size should be equal to or less than half of the sequence number space**
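
The rule above can be checked with a small sketch. It assumes the worst case from the dilemma: the receiver has accepted a full window and advanced, but every ACK was lost, so the sender retransmits its old window. The helper name `ambiguous_seq_numbers` is hypothetical.

```python
def ambiguous_seq_numbers(seq_space: int, window: int) -> set:
    """Sequence numbers the receiver cannot classify as old or new.

    old: the window the sender still holds and may retransmit.
    new: the window the receiver has already moved on to.
    """
    old = {i % seq_space for i in range(window)}
    new = {i % seq_space for i in range(window, 2 * window)}
    return old & new


# Sequence numbers 0..3 as in the example above.
print(ambiguous_seq_numbers(seq_space=4, window=3))   # overlap exists
print(ambiguous_seq_numbers(seq_space=4, window=2))   # no overlap
```

With a window of 3 the two windows share sequence numbers, so a retransmission can be mistaken for new data; with a window of 2 (half the space) they are disjoint.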


## Performance of Sliding Window w/o Errors

\begin{equation}
\textrm{efficiency} = {\textrm{the period used to transmit actual data} \over \textrm{the period the transmitter has to wait before being able to transmit again}}
\end{equation}

<img src="img/Snip20191125_27.png" width=80%/>

- RTT is defined here as the time, measured at the transmitter, from sending the last bit of a packet until the ACK for that packet is received
- in this example, the window size $N = 3$

\begin{equation}
\textrm{Link Utilization} = \textrm{efficiency} = U = {N {L \over R} \over RTT + {L \over R}}
\end{equation}
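
The formula can be evaluated directly. The sketch below is a worked example with made-up link parameters (8000-bit packets, a 1 Gbps link, 30 ms RTT); capping the result at 1 is an added assumption, since a link cannot be more than fully used.

```python
def link_utilization(n: int, packet_bits: float, rate_bps: float, rtt_s: float) -> float:
    """U = N * (L/R) / (RTT + L/R), capped at 1.0 (the cap is an assumption)."""
    transmit_time = packet_bits / rate_bps          # L / R
    return min(1.0, n * transmit_time / (rtt_s + transmit_time))


# Hypothetical numbers: L = 8000 bits, R = 1 Gbps, RTT = 30 ms.
for n in (1, 3, 1000):
    print(n, link_utilization(n, 8000, 1e9, 0.030))
```

With these numbers a window of 3 triples the utilization of stop-and-wait (N = 1), yet the link is still mostly idle, which is why larger windows matter on long, fast paths.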



# 3.5 Connection-oriented Transport: TCP

- **connection-oriented**: handshaking (exchange of control messages) inits sender/receiver state before data exchange; not circuit switching (intermediate nodes do not keep state of connection)
- **point-to-point**: one sender, one receiver
- **reliable, in-order byte stream**: no "message boundaries"
- **full duplex data**: bi-directional data flow in the same connection
    - MSS: maximum segment size
- **pipelined**: TCP congestion and flow control set window size
- **flow controlled**: sender will not overwhelm receiver



## TCP Segment Structure

<img src="img/Snip20191125_28.png" width=90%/>

### Sequence Numbers & ACKs

- **sequence numbers**: byte stream "number" of first byte in segment's data

- **acknowledgements**: sequence number of next byte expected from the other side
     - cumulative ACK

sender | receiver
---|---
<img src="img/Snip20191125_29.png"> | <img src="img/Snip20191125_30.png"/>

<img src="img/Snip20191125_31.png" width=60%/>

- An ACK for client-to-server data that is carried in a server-to-client data segment is called *piggybacking*; it is intended to **eliminate separate ACKs**

<img src="img/Snip20191125_32.png" width=80%/>

## TCP Reliable Data Transfer

Protocol|Selective Repeat|TCP
---|---|---
pipelined| Yes| Yes
Timers|One for each packet | one timer for the oldest unacked packet
ACK| for every packet (individual ACK) | cumulative ACK: ACK number $n$ acknowledges all bytes up to $n-1$
Retransmission| Timeout| Timeout/Triple duplicate ACK

- TCP creates RDT service on top of IP's unreliable service
    - pipelined segments
    - cumulative acks
    - single retransmission timer
- Retransmissions triggered by
    - timeout events
    - duplicate acks

### TCP Sender Events

1. Data received from application layer
    - create segment with sequence number
    - sequence number is byte-stream number of the first data byte in the segment
    - start timer if not already running
        - timer is for the oldest unacked segment
        - expiration interval: $TimeOutInterval$
2. Timeout
    - retransmit segment that caused timeout
    - restart timer
3. Acknowledgement received
    - if ACK acknowledges previously unacked segments
        - update what is known to be ACKed
        - start timer if there are still unacked segments
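
The three events above can be sketched as a small state machine. This is a simplified model under stated assumptions: there is no real clock (the timer is just a flag), sequence numbers never wrap, and the class and attribute names are hypothetical.

```python
class TcpSenderSketch:
    """Simplified TCP sender: one timer, cumulative ACKs, no congestion control."""

    def __init__(self, isn: int = 0):
        self.send_base = isn        # sequence number of the oldest unacked byte
        self.next_seq = isn         # sequence number for the next new byte
        self.unacked = {}           # seq -> payload, sent but not yet ACKed
        self.timer_running = False  # stands in for the single retransmission timer
        self.sent = []              # log of (seq, payload) passed to the network layer

    def on_app_data(self, payload: bytes) -> None:
        """Event 1: data received from the application layer."""
        seq = self.next_seq         # byte-stream number of the first data byte
        self.unacked[seq] = payload
        self.sent.append((seq, payload))
        self.next_seq += len(payload)
        if not self.timer_running:  # start timer only if not already running
            self.timer_running = True

    def on_timeout(self) -> None:
        """Event 2: retransmit the oldest unacked segment and restart the timer."""
        if not self.unacked:
            return
        seq = min(self.unacked)
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, ack_num: int) -> None:
        """Event 3: a cumulative ACK covers every byte below ack_num."""
        if ack_num > self.send_base:
            self.send_base = ack_num
            for seq in [s for s, p in self.unacked.items() if s + len(p) <= ack_num]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)  # keep timer only if data is outstanding


sender = TcpSenderSketch()
sender.on_app_data(b"hello")
sender.on_app_data(b"world")
sender.on_ack(5)        # acknowledges "hello" only
sender.on_timeout()     # "world" is the oldest unacked segment, so it is resent
print(sender.sent)
```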

### TCP Retransmission Scenarios

Lost ACK | Premature Timeout | Cumulative ACK
---|---|---
<img src="img/Snip20191125_33.png">|<img src="img/Snip20191125_34.png">|<img src="img/Snip20191125_35.png">

### TCP ACK Generation

Event at Receiver | TCP Receiver Action
---|---
arrival of in-order segment with expected sequence number; all data up to expected sequence number already ACKed | delayed ACK, wait up to 500 ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected sequence number; one other segment has ACK pending| immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment higher-than-expected sequence number, thus gap detected | immediately send *duplicate ACK*, indicating sequence number of next expected byte
arrival of segment that partially or completely fills the gap | immediately send ACK, provided that segment starts at lower end of gap
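
The ACK numbers in the table can be traced with a short sketch. It models only which ACK number the receiver sends, not the 500 ms delayed-ACK timing, and it assumes segments never overlap partially. The function name `receiver_acks` is hypothetical.

```python
def receiver_acks(segments, expected: int = 0) -> list:
    """Return the cumulative ACK number sent after each arriving (seq, length) segment."""
    buffered = {}                       # out-of-order segments: seq -> length
    acks = []
    for seq, length in segments:
        if seq == expected:             # in-order segment
            expected += length
            while expected in buffered: # the arrival may have filled a gap
                expected += buffered.pop(expected)
        elif seq > expected:            # gap detected: buffer it, ACK stays the same
            buffered[seq] = length
        # seq < expected is a duplicate; the receiver just re-ACKs
        acks.append(expected)
    return acks


# The segment starting at byte 100 arrives last, after two later segments.
print(receiver_acks([(0, 100), (200, 100), (300, 100), (100, 100)]))
```

The repeated ACK number while the gap is open is what the sender sees as duplicate ACKs.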



### TCP Fast Retransmit

- time-out period often relatively long: long delay before resending lost packet
- detect lost segments via duplicate ACKs
    - sender often sends many segments back-to-back
    - if segment is lost, there will likely be many duplicate ACKs

- **TCP fast retransmit**: if sender receives 3 ACKs for the same data (triple duplicate ACKs), resend unacked segment with smallest sequence number
    - likely that unacked segment is lost, so no need to wait for timeout
    
<img src="img/Snip20191127_46.png" width=40%/>
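
A minimal sketch of the duplicate-ACK counter at the sender follows. It assumes "triple duplicate" means three duplicates arriving after the original ACK for the same byte; the function name is hypothetical.

```python
def fast_retransmit_points(acks) -> list:
    """Return the ACK numbers at which the sender would fast-retransmit."""
    last_ack = None
    dup_count = 0
    triggers = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:          # third duplicate: resend the segment starting at `ack`
                triggers.append(ack)
        else:                           # a new ACK number resets the counter
            last_ack = ack
            dup_count = 0
    return triggers


print(fast_retransmit_points([100, 200, 200, 200, 200, 500]))
```

Here the segment starting at byte 200 is resent as soon as the third duplicate arrives, without waiting for the timer.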

## TCP Flow Control

- flow control: receiver controls sender, so sender will not overflow receiver's buffer by transmitting too much data too fast

<img src="img/Snip20191127_47.png" width=80%/>


- receiver "advertises" free buffer space by including **rwnd** value in TCP header of receiver-to-sender segments
    - **RcvBuffer** size set via socket options (typical default is 4096 bytes)
    - many operating systems autoadjust **RcvBuffer**
- sender limits amount of unacked (in-flight) data to receiver's **rwnd** value
- guarantees receive buffer will not overflow

rwnd|receiver-side buffering
---|---
<img src="img/Snip20191127_52.png">|<img src="img/Snip20191127_53.png">
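
The advertised window can be sketched as simple bookkeeping at the receiver. The names `last_byte_rcvd` and `last_byte_read` follow the usual textbook formulation and are assumptions here, since the notes above do not name them.

```python
def advertised_rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free buffer space: rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    buffered = last_byte_rcvd - last_byte_read   # bytes waiting for the application
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered bytes must fit in the receive buffer")
    return rcv_buffer - buffered


RCV_BUFFER = 4096
print(advertised_rwnd(RCV_BUFFER, 0, 0))        # empty buffer: full window
print(advertised_rwnd(RCV_BUFFER, 3000, 1000))  # application lags 2000 bytes behind
print(advertised_rwnd(RCV_BUFFER, 4096, 0))     # buffer full: sender must stop
```

As the application reads data, `last_byte_read` advances and the advertised window opens again.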


## TCP Connection Management

### Connection Establishment: 3-way Handshake

- **R**: RST (reset), connection does not exist
- **S**: SYN (synchronize), request to open a TCP connection (full-duplex)
    - if **S = 1**, then the sequence number is the initial sequence number (ISN)
- **F**: FIN (finish), request to close the TCP connection (full-duplex)


<img src="img/Snip20191127_54.png"/>

1. Client sends the first handshaking request packet with **SYN** set (client wants to open a TCP connection)
    - **SYN = 1**
    - a random **initial sequence number** (ISN): chosen randomly in order to avoid confusion with previous TCP connections


2. The server replies to the request with **ACK** and **SYN** set (server accepts the connection from client to server, and wants to open a TCP connection from server to client)
    - **SYN = 1**
    - A random **initial sequence number** (ISN)
    - **ACK = 1**
    - acknowledgement number: the sequence number of the first data byte to be received (client ISN + 1)
    - **RWND**: receive window size


3. The client replies to the server's request with an **ACK**
    - **ACK**s the second SYN segment
    - **RWND**
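
The sequence and acknowledgement numbers in the three steps can be traced with a short sketch. Segments are modeled as plain dictionaries, the `rwnd` default of 4096 is only illustrative, and the function name is hypothetical. It relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other's ISN + 1.

```python
import random

SEQ_SPACE = 2**32   # TCP sequence numbers are 32 bits wide


def three_way_handshake(client_isn=None, server_isn=None, rwnd: int = 4096) -> list:
    """Return the three handshake segments as dictionaries."""
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)   # random ISN
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)
    syn = {"SYN": 1, "ACK": 0, "seq": client_isn}
    syn_ack = {"SYN": 1, "ACK": 1, "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE, "rwnd": rwnd}
    ack = {"SYN": 0, "ACK": 1, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE, "rwnd": rwnd}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```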

### TCP Connection Closing

- TCP is duplex and thus connection needs to be closed on both sides
- client, server each close their side of connection
    - sending TCP segment with **FIN = 1**
- respond to received FIN with **ACK**
    - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled

<img src="img/Snip20191127_55.png" width=60%/>

<img src="img/Snip20191127_56.png" width=80%/>

<img src="img/Snip20191127_58.png" width=80%/>

# 3.7 TCP Congestion Control

- congestion: too many sources sending too much data too fast for the network to handle
    - queue builds up for the outgoing link
    - router starts dropping packets
    - manifestations: lost packets (buffer overflow at routers) and long delays (queueing in router buffers)
    
- TCP Congestion Control has two features: slow start & congestion avoidance

## Additive Increase & Multiplicative Decrease

- sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    - **additive increase**: increase **cwnd** (congestion window) by 1 **MSS** (maximum segment size) every RTT until loss detected
    - **multiplicative decrease**: cut **cwnd** in half after loss

<img src="img/Snip20191127_59.png" width=80%/>

<img src="img/Snip20191127_60.png" width=40%/>

- sender limits transmission: $\textrm{LastByteSent} - \textrm{LastByteAcked} \le \min(\textrm{cwnd}, \textrm{rwnd})$
    - **rwnd** is usually very large at the receiver
    - **cwnd** is dynamic, function of perceived network congestion
- TCP sending rate: send *cwnd* bytes, wait RTT for ACKs, then send more bytes
\begin{equation}
\textrm{rate} = \frac{\textrm{cwnd}}{RTT} \textrm{ bytes/sec}
\end{equation}
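
As a small worked example of the limit and the rate formula above, the sketch below computes how many more bytes may be sent and the approximate sending rate. The function names and the sample numbers are hypothetical.

```python
def sendable_bytes(last_byte_sent: int, last_byte_acked: int, cwnd: int, rwnd: int) -> int:
    """Bytes the sender may still put in flight under min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate(cwnd_bytes: int, rtt_s: float) -> float:
    """rate = cwnd / RTT, in bytes per second."""
    return cwnd_bytes / rtt_s


# 3000 bytes are in flight; cwnd (4000) is the tighter limit, so 1000 more may go out.
print(sendable_bytes(last_byte_sent=5000, last_byte_acked=2000, cwnd=4000, rwnd=65535))
print(approx_rate(cwnd_bytes=14600, rtt_s=0.5))
```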

## TCP Slow Start

- initial rate is slow, but ramps up exponentially fast
- when connection **begins**, increase the rate exponentially until first loss event
    - initially set **cwnd = 1** MSS
    - double **cwnd** every RTT
    - done by incrementing **cwnd** for every ACK received

<img src="img/Snip20191127_61.png" width=40%/>

## TCP Congestion Avoidance

- once the congestion window (cwnd) **reaches the slow start threshold** (ssthresh), the TCP connection goes into congestion avoidance (CA) phase
- in CA, the sender will increase its congestion window by 1 MSS when all the segments in the previous cwnd have been ACKed

<img src="img/Snip20191127_62.png" width=60%/>

## TCP Detecting & Reacting to Loss

- when the loss is indicated by *timeout*
    - **cwnd** is set to 1 MSS
    - set $\textrm{ssthresh} = \frac{\textrm{cwnd}_{old}}{2}$
    - window then grows exponentially (as in slow start) to threshold, then grows linearly


- when the loss is indicated by the 3 duplicate ACKs: TCP RENO & TCP Tahoe
    - RENO: duplicate ACKs indicate the network is still capable of delivering some segments; **cwnd** is cut in half, and the window then grows linearly
    - Tahoe: same as timeout, drop **cwnd** to 1

Reason for Loss | &nbsp; &nbsp; Timeout Events &nbsp; &nbsp; | Triple-duplicate ACKs
---|---|---
TCP Tahoe|$\textrm{cwnd} = 1$ <br> $\textrm{ssthresh} = \frac{\textrm{cwnd}_{old}}{2}$; <br>enter **slow start**| $\textrm{cwnd} = 1$ <br> $\textrm{ssthresh} = \frac{\textrm{cwnd}_{old}}{2}$; <br>enter **slow start**
TCP RENO|$\textrm{cwnd} = 1$ <br> $\textrm{ssthresh} = \frac{\textrm{cwnd}_{old}}{2}$; <br>enter **slow start**| $\textrm{cwnd} = \frac{\textrm{cwnd}_{old}}{2}$ <br> $\textrm{ssthresh} = \frac{\textrm{cwnd}_{old}}{2}$; <br>enter **congestion avoidance**


<img src='img/Snip20191129_99.png' width=60%/>

- on a loss event, the variable $\textrm{ssthresh}$ is set to 1/2 of the value of **cwnd** just before the loss event
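
The Tahoe and RENO reactions can be compared with a per-RTT simulation. This is a rough sketch under several assumptions: cwnd is counted in whole MSS, one event happens per RTT, slow start is capped at ssthresh, halving rounds down with a floor of 1, and the initial ssthresh of 8 is arbitrary. The function name is hypothetical.

```python
def simulate_cwnd(events, variant: str = "reno", ssthresh: int = 8) -> list:
    """Return cwnd (in MSS) after each RTT.

    events: one of "ack" (whole window ACKed), "timeout", or "3dup" per RTT.
    variant: "tahoe" or "reno".
    """
    if variant not in ("tahoe", "reno"):
        raise ValueError("variant must be 'tahoe' or 'reno'")
    cwnd = 1
    trace = [cwnd]
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
            else:
                cwnd += 1                        # congestion avoidance: +1 MSS per RTT
        elif event in ("timeout", "3dup"):
            ssthresh = max(cwnd // 2, 1)         # half of cwnd just before the loss
            if event == "3dup" and variant == "reno":
                cwnd = ssthresh                  # RENO: continue in congestion avoidance
            else:
                cwnd = 1                         # timeout, or Tahoe on any loss
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append(cwnd)
    return trace


events = ["ack"] * 7 + ["3dup"] + ["ack"] * 2
print("reno ", simulate_cwnd(events, "reno"))
print("tahoe", simulate_cwnd(events, "tahoe"))
```

Both variants follow the same curve until the triple duplicate ACK; after it, RENO resumes linear growth from half the old window while Tahoe restarts from 1 MSS.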




## Slow Start vs Congestion Avoidance

&nbsp; &nbsp; Slow Start  &nbsp; &nbsp;| Congestion Avoidance
---|---
initially, $\textrm{cwnd} = 1$; send 1 segment (MSS) <br> if ACK received before TO, $\textrm{cwnd} = 2$; send 2 segments <br> if ACKs received before TO, $\textrm{cwnd} = 4$; send 4 segments <br>...<br>continue **until** the **ssthresh** is hit| each time the whole window of segments is ACKed, <br> $\textrm{cwnd} = \textrm{cwnd} + 1$ ($\textrm{cwnd}_{max}$ = RWND); <br> if TO occurs, $\textrm{ssthresh} = \frac{\textrm{cwnd}}{2}$ and $\textrm{cwnd} = 1$ <br> if triple duplicate ACKs: <br> Tahoe: $\textrm{ssthresh} = \frac{\textrm{cwnd}}{2}$ and $\textrm{cwnd} = 1$ <br> RENO: $\textrm{ssthresh} = \frac{\textrm{cwnd}}{2}$ and $\textrm{cwnd} = \textrm{ssthresh}$

## TCP Throughput

- average TCP throughput as a function of window size and RTT (ignoring slow start):
\begin{equation}
\textrm{AVG TCP Throughput} = \frac{3}{4} \frac{\textrm{W}}{\textrm{RTT}} \textrm{ bytes/sec}
\end{equation}

- W: window size (measured in bytes) where loss occurs
    - throughput fluctuates between max of $\frac{\textrm{W}}{\textrm{RTT}}$ (congestion) and $\frac{\textrm{W}/2}{\textrm{RTT}}$ (after congestion, drop rate to half)
    - average window size (number of in-flight bytes) is $\frac{3}{4} W$
    - average throughput is average window size per RTT

<img src='img/Snip20191129_100.png' width=60%/>
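
The 3/4 factor can be checked numerically. The sketch below assumes an idealized sawtooth in which the window climbs by 1 MSS per RTT from W/2 up to W and then halves; the function names and sample numbers are hypothetical.

```python
def sawtooth_mean_window(w_mss: int) -> float:
    """Average window over one sawtooth cycle from W/2 up to W (W even, in MSS)."""
    windows = range(w_mss // 2, w_mss + 1)   # one window value per RTT
    return sum(windows) / len(windows)


def average_throughput(w_bytes: float, rtt_s: float) -> float:
    """AVG TCP throughput = (3/4) * W / RTT, in bytes per second."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_mean_window(20))          # compare with 0.75 * 20
print(average_throughput(20000, 0.5))
```

Because the window rises linearly, its mean is the midpoint of W/2 and W, which is where the factor 3/4 comes from.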


<img src='img/Snip20191129_101.png'/>

# A day in the life of a web request

<img src='img/Snip20191129_102.png'/>