# CS 456

# 1. Computer Networks and the Internet


## 1.1 What is the Internet

### 1.1.1 A Nuts-and-Bolts Description

Devices connected to the Internet are **hosts** (equivalently, **end systems**).

End systems are connected together via **communication links** and **packet switches**.  

**Transmission rate** (or **bandwidth**), measured in bits/second, is the rate of data being transmitted over a link.  
**Packets** are packages of information sent over a link.

A _packet switch_ takes a _packet_ arriving on one of its incoming communication links and forwards that packet on one of its outgoing communication links.  
Two prominent packet switches are **routers** and **link-layer switches**.  
_Link-layer switches_ are typically used in access networks, while _routers_ are typically used in the network core.

The sequence of communication links and packet switches traversed by a packet from the sending end system to the receiving end system is known as a **route** or **path** through the network.

End systems access the Internet through **Internet Service Providers** (**ISPs**).  
The Internet is all about connecting end systems to each other, so the ISPs that provide access to end systems must also be interconnected.  
Lower-tier ISPs are interconnected through national and international upper-tier ISPs.  
An upper-tier ISP consists of high-speed routers interconnected with high-speed fiber-optic links.

End systems, packet switches, and other pieces of the Internet run **protocols** that control the sending and receiving of information within the Internet.  
The two most important protocols are the **Transmission Control Protocol** (**TCP**) and the **Internet Protocol** (**IP**).  
IP specifies the format of the packets.  
The Internet’s principal protocols are collectively known as **TCP/IP**.

**Internet standards** are developed by the Internet Engineering Task Force (IETF).  
Standards ensure that systems and products can interoperate with each other.  
The IETF standards documents are called **requests for comments** (**RFCs**).

### 1.1.2 A Services Description

The Internet from another angle: _an infrastructure that provides services to applications_.  
**Distributed applications** are applications that involve multiple end systems that exchange data with each other.

End systems attached to the Internet provide an **Application Programming Interface** (**API**) that specifies how a program running on one end system asks the Internet infrastructure to _deliver data_ to a _specific destination_ program running on another end system.


### 1.1.3 What is a Protocol?

> A **protocol** defines the _format_ and the _order_ of messages exchanged between two or more communicating entities, as well as the _actions taken_ on the transmission and/or receipt of a message or other event.

## 1.2 The Network Edge

Applications and end systems are at the _edge of the network_.

_Host = end system._

Hosts can be divided into two categories: **clients** and **servers**.

### 1.2.1 Access Networks

_Access network_ -- the network that physically connects an end system to the first router (also known as _edge router_) on a path from the end system to any other distant end system.

#### Home Access: DSL, Cable, FTTH, Dial-up, and Satellite

The two most prevalent types of broadband residential access are **digital subscriber line** (**DSL**) and cable.

When DSL is used, a customer's telco is also its ISP.  
Each customer's DSL modem uses the existing telephone line to exchange data with a **digital subscriber line access multiplexer** (**DSLAM**) located in the telco's local central office (CO).

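A residential telephone line carries both data and telephone signals simultaneously by encoding them at different frequencies; because the downstream and upstream rates differ, the access is said to be asymmetric: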

* A high-speed downstream channel, in the 50 kHz to 1 MHz band
* A medium-speed upstream channel, in the 4 kHz to 50 kHz band
* An ordinary two-way telephone channel, in the 0 to 4 kHz band

---

**Cable Internet access** uses the cable television company's existing cable television infrastructure.  
Fiber optics connect the cable head end to neighborhood-level junctions, from which traditional coaxial cable is then used to reach individual houses and apartments.  
Because both fiber and coaxial cable are employed in this system, it is often referred to as **hybrid fiber coax** (**HFC**).

Cable Internet access requires a cable modem.  
At the cable head end, the **cable modem termination system** (**CMTS**) serves a similar function as a DSLAM -- turning the analog signal sent from the cable modems in many downstream homes back into digital format.  
Cable modems divide the HFC network into two channels, a downstream and an upstream channel, with asymmetric access (the downstream channel is typically allocated a higher rate than the upstream channel).

One important characteristic of cable Internet access is that it is a _shared broadcast medium_.  
If several users are simultaneously using the downstream channel, the actual rate at which each user receives its content will be significantly lower than the aggregate cable downstream rate.  
Because the upstream channel is also shared, a distributed multiple access protocol is needed to coordinate transmissions and avoid collisions.

---

**Fiber to the home** (**FTTH**) provides even higher speed.  
The FTTH concept is simple -- provide an optical fiber path from the CO directly to the home.

The simplest optical distribution network is _direct fiber_, with one fiber leaving the CO for each home.  
More commonly, each fiber leaving the central office is actually shared by many homes;
it is not until the fiber gets relatively close to the homes that it is split into individual customer-specific fibers.

Two competing optical-distribution network architectures that perform splitting:

* active optical networks (AONs) -- essentially switched Ethernet
* passive optical networks (PONs)

In a PON, each home has an **optical network terminator** (**ONT**), which is connected by dedicated optical fiber to a neighbourhood splitter.  
The splitter connects to an **optical line terminator** (**OLT**) in the telco's CO.  
In the PON architecture, all packets sent from OLT to the splitter are replicated at the splitter.

---

In locations where DSL, cable, and FTTH are not available, a satellite link can be used to connect a residence to the Internet at speeds of more than 1 Mbps.

Dial-up access over traditional phone lines is based on the same model as DSL -- a home modem connects over a phone line to a modem in the ISP.  
Dial-up access is excruciatingly slow at 56 kbps.

#### Access in the Enterprise (and the Home): Ethernet and WiFi

Wireless LAN access based on IEEE 802.11 technology, more colloquially known as WiFi, is now just about everywhere.

#### Wide-Area Wireless Access: 3G and LTE

3G -- third-generation wireless

LTE -- Long-term Evolution

### 1.2.2 Physical Media

**Bit**: propagates between transmitter/receiver pairs.

**Physical link**: what lies between transmitter & receiver.

For each transmitter-receiver pair, the bit is sent by propagating electromagnetic waves or optical pulses across a **physical medium**.

Physical media fall into two categories:

* **guided media**: the waves are guided along a solid medium
* **unguided media**: the waves propagate in the atmosphere and in outer space

#### Twisted-Pair Copper Wire

Least expensive and most commonly used guided transmission medium.  
Two insulated copper wires

* Category 5: 100 Mbps, 1 Gbps Ethernet
* Category 6: 10 Gbps

A wire pair constitutes a single communication link.

**Unshielded twisted pair** (**UTP**) is commonly used for computer networks within a building.

#### Coaxial Cable

Two concentric copper conductors.  
Bidirectional.  
Can be used as a guided **shared medium**.

Multiple channels on cable; HFC.

#### Fiber Optics

Glass fiber carrying light pulses, each pulse represents a bit.  
High-speed operation, high-speed point-to-point transmission.

Low error rate:

* repeaters spaced far apart
* immune to electromagnetic noise

#### Terrestrial Radio Channels

Bidirectional.

Radio channels carry signals in the electromagnetic spectrum.  
No installation of physical wires, can penetrate walls, provide connectivity to a mobile user, and can potentially carry a signal for long distances.

Propagation environment effects:

* reflection
* obstruction by objects
* interference

#### Satellite Radio Channels

A communication satellite links two or more Earth-based microwave transmitter/receivers, known as ground stations.  
The satellite receives transmissions on one frequency band, regenerates the signal using a repeater, and transmits the signal on another frequency.

Two types of satellites used:

* **geostationary satellites**
    * permanently remain above the same spot on Earth
    * end-to-end delay of 280 ms
* **low-earth orbiting** (**LEO**) **satellites**
    * rotate around Earth and may communicate with each other
    * many satellites required to continuously provide coverage to an area

## 1.3 The Network Core

### 1.3.1 Packet Switching

Long messages are broken into smaller chunks of data known as **packets**, of length $L$.  
Each packet travels through communication links and **packet switches** from source to destination.  
Packets are transmitted over each communication link at a rate equal to the _full_ transmission rate of the link, at rate $R$ bits/s.

$$\text{packet transmission delay} = \text{time needed to transmit } L\text{-bit packet into link} = \frac{L\text{ (bits)}}{R\text{ (bits/sec)}}$$

Link transmission rate = link **capacity** = **link bandwidth**.

#### Store-and-Forward Transmission

The packet switch must receive the _entire_ packet before it can begin to transmit the first bit of the packet onto the outbound link.

End-to-end delay of $N$ links each of rate $R$ is

$$ d_{\text{end-to-end}} = N \frac{L}{R}$$
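
As a quick numeric check of the two formulas above, here is a minimal sketch. The function names and the example values (a 12,000-bit packet, 1.5 Mbps links, 3 links) are illustrative assumptions, and propagation, processing, and queuing delays are ignored.

```python
def transmission_delay(packet_bits: float, rate_bps: float) -> float:
    """Time to push an L-bit packet onto a link of rate R: L / R seconds."""
    return packet_bits / rate_bps


def store_and_forward_delay(packet_bits: float, rate_bps: float, num_links: int) -> float:
    """End-to-end delay for one packet over N links of equal rate: N * L / R."""
    return num_links * transmission_delay(packet_bits, rate_bps)


# Hypothetical values: L = 12,000 bits, R = 1.5 Mbps, N = 3 links.
one_hop = transmission_delay(12_000, 1.5e6)
three_hops = store_and_forward_delay(12_000, 1.5e6, 3)
print(f"one hop: {one_hop * 1e3:.1f} ms, three hops: {three_hops * 1e3:.1f} ms")
```

Each switch adds a full $L/R$ because it cannot start forwarding until the last bit of the packet has arrived.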

#### Queuing Delays and Packet Loss

If an arriving packet needs to be transmitted onto a link but the link is busy transmitting another packet, the arriving packet must wait in the **output buffer** and suffers **queuing delays**.  
Since the output buffer is finite in space, **packet loss** can occur when the buffer is full -- either the arriving packet or one of the already-queued packets will be dropped.

#### Forwarding Tables and Routing Protocols

When a source end system sends a packet to a destination end system, it includes the destination's IP address in the packet's header.  
The router examines a portion of the packet's destination address and forwards the packet to an adjacent router.  
Each router has a **forwarding table** that maps destination addresses (or portions of the destination addresses) to that router's outbound links.
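
A forwarding table that matches on "portions" of a destination address can be sketched with Python's standard `ipaddress` module. The prefixes, the link names, and the rule of preferring the longest matching prefix are assumptions made for this illustration; the notes do not specify how ties between matching entries are resolved.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outbound link name.
FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "link-1",
    ipaddress.ip_network("10.1.0.0/16"): "link-2",
    ipaddress.ip_network("0.0.0.0/0"): "link-0",  # default entry, matches everything
}


def lookup(destination: str) -> str:
    """Return the outbound link for the most specific prefix containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if address in net]
    # Assumption: the entry with the longest prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


for dest in ("10.1.2.3", "10.9.9.9", "192.0.2.1"):
    print(dest, "->", lookup(dest))
```

A real router implements the same mapping in specialized hardware; the dictionary scan here only illustrates the idea.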

The Internet has a number of special **routing protocols** that are used to automatically set the forwarding tables.

### 1.3.2 Circuit Switching

An _alternate_ approach to moving data through a network of links and switches.  
Commonly used in traditional telephone networks.

End-to-end resources along a path (buffers, link transmission rate) reserved for the duration of the communication session between the end systems.

Before the sender can send the information, the network must establish a connection between the sender and the receiver.  
This is a _bona fide_ connection for which the switches along the path maintain connection state for that connection.  
This connection is called a **circuit**.  
Circuit segment idle if not used, i.e. no sharing.

A constant transmission rate is reserved, such that the data transfers at the _guaranteed_ constant rate.

#### Multiplexing in Circuit-Switched Networks

A circuit in a link is implemented in one of two ways:

* **frequency-division multiplexing** (**FDM**)
    * a frequency band is dedicated for the duration of the connection
    * the width of the band is called **bandwidth**
* **time-division multiplexing** (**TDM**)
    * time is divided into frames of fixed duration, and each frame is divided into a fixed number of time slots
    * each circuit gets all of the bandwidth periodically during brief intervals of time
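
To make the TDM description concrete, the sketch below computes the rate of a single circuit in two equivalent ways. It assumes each circuit owns exactly one slot per frame; the function names and the example numbers (8,000 frames per second with 8-bit slots, or a 1.536 Mbps link with 24 slots) are illustrative.

```python
def tdm_rate_from_frames(frames_per_second: float, bits_per_slot: int) -> float:
    """One slot per frame, so the circuit rate is frame rate * slot size."""
    return frames_per_second * bits_per_slot


def tdm_rate_from_link(link_rate_bps: float, slots_per_frame: int) -> float:
    """Equivalently, each circuit gets an equal share of the link rate."""
    return link_rate_bps / slots_per_frame


print(tdm_rate_from_frames(8000, 8))    # circuit rate in bits/s
print(tdm_rate_from_link(1.536e6, 24))  # same circuit rate, seen from the link
```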

#### Packet Switching Vs. Circuit Switching

Pros of packet switching:

* offers better sharing of transmission capacity
* simpler, more efficient, and less costly to implement
* great for burst-y data
    * resource sharing

Cons of packet switching:

* not suitable for real-time services
* excessive congestion possible: packet delay and loss
    * protocols needed for reliable data transfer, congestion control
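
The "better sharing" argument can be made quantitative with a small binomial calculation. The scenario below is hypothetical: a 1 Mbps link, users who each need 100 kbps while active, and each user active 10% of the time independently of the others.

```python
from math import comb


def prob_more_than_active(limit: int, users: int, p_active: float) -> float:
    """P(more than `limit` of `users` are active at once), binomial model."""
    return sum(
        comb(users, k) * p_active**k * (1 - p_active) ** (users - k)
        for k in range(limit + 1, users + 1)
    )


# Circuit switching must reserve 100 kbps per user at all times.
circuit_users = 1_000_000 // 100_000

# Packet switching only runs into trouble when more than 10 users are active.
overload = prob_more_than_active(circuit_users, 35, 0.1)
print(f"circuit switching supports {circuit_users} users")
print(f"with 35 packet-switched users, P(more than 10 active) = {overload:.6f}")
```

Under these assumptions the link can serve far more bursty users with packet switching, at the cost of occasional queuing when too many are active at once.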

### 1.3.3 A Network of Networks

Review slides 33 +

## 1.4 Delay, Loss, and Throughput in Packet-Switched Networks

Packets _queue_ in router buffers when the packet arrival rate to a link (temporarily) exceeds the output link capacity.  
Packets queue and wait for their turn.

### 1.4.1 Overview of Delay in Packet-Switched Networks

A packet can suffer from several types of delays at _each_ node along the path.  
The most important delays are:

* **nodal processing delay**, $d_{\text{proc}}$
    * examine the packet's header and determine output link
    * check for bit-level errors in the packet during receiving
    * typically on the order of microseconds or less
* **queuing delay**, $d_{\text{queue}}$
    * time waiting at output link for transmission
    * depends on congestion level of router, typically order of microseconds to milliseconds
* **transmission delay**, $d_{\text{trans}}$
    * amount of time to push all of the packet's bits into the link
    * $L$: _packet length_ in bits
    * $R$: _link bandwidth_ in bps
    * $\frac{L}{R}$
    * typically order of microseconds to milliseconds
* **propagation delay**, $d_{\text{prop}}$
    * time needed to _physically_ propagate from one node to another node
    * $d$: _length of physical link_
    * $s$: propagation speed (in the range from $2 \cdot 10^{8}$ m/s to $3 \cdot 10^{8}$ m/s)
    * $\frac{d}{s}$

Combined, they accumulate to **total nodal delay**, $d_{\text{nodal}}$,
$$d_{\text{nodal}} = d_{\text{proc}} + d_{\text{queue}} + d_{\text{trans}} + d_{\text{prop}}$$
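
The sum above can be evaluated directly. In this sketch the function name and all example values are assumptions chosen for illustration; the default propagation speed of $2 \cdot 10^{8}$ m/s is the lower end of the range given above.

```python
def nodal_delay(d_proc, d_queue, packet_bits, rate_bps, link_m, speed_mps=2e8):
    """Return (d_trans, d_prop, d_nodal), all in seconds."""
    d_trans = packet_bits / rate_bps   # L / R
    d_prop = link_m / speed_mps        # d / s
    return d_trans, d_prop, d_proc + d_queue + d_trans + d_prop


# Hypothetical hop: 2 microseconds of processing, an empty queue,
# a 12,000-bit packet, a 100 Mbps link, and 1,000 km of cable.
d_trans, d_prop, d_nodal = nodal_delay(2e-6, 0.0, 12_000, 100e6, 1_000_000)
print(f"d_trans = {d_trans * 1e6:.0f} us, d_prop = {d_prop * 1e3:.0f} ms, "
      f"d_nodal = {d_nodal * 1e3:.3f} ms")
```

On this long, fast link the propagation delay dominates; on a short, slow link the transmission delay would dominate instead.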

### 1.4.2 Queuing Delay and Packet Loss

$R$: link bandwidth (bps)  
$L$: packet length (bits)  
$a$: _average_ packet arrival rate

The average rate at which bits arrive at the queue is $La$ bits/sec.  
The ratio $\frac{La}{R}$, called **traffic intensity**, plays an important role in estimating the extent of the queuing delay.

If $\frac{La}{R} > 1$, the queuing delay will approach infinity!  
More bits arrive into the queue than the bits can be transmitted from the queue.

If $\frac{La}{R} \to 1$ (from below), average queuing delay large.

If $\frac{La}{R} \sim 0$, average queuing delay small.
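
A small simulation illustrates how the average queuing delay grows with traffic intensity. This is a sketch under stated assumptions: a single FIFO queue with an unbounded buffer, fixed-size packets, and exponentially distributed interarrival times (the notes do not specify an arrival process). The function names and numbers are illustrative.

```python
import random


def traffic_intensity(packet_bits, arrivals_per_s, rate_bps):
    """La/R: average bit arrival rate divided by the link transmission rate."""
    return packet_bits * arrivals_per_s / rate_bps


def average_queuing_delay(packet_bits, arrivals_per_s, rate_bps,
                          num_packets=20_000, seed=1):
    """Average time packets wait before their transmission starts."""
    rng = random.Random(seed)
    service = packet_bits / rate_bps   # transmission time of one packet
    clock = 0.0                        # arrival time of the current packet
    link_free_at = 0.0                 # when the link finishes earlier packets
    total_wait = 0.0
    for _ in range(num_packets):
        clock += rng.expovariate(arrivals_per_s)  # assumed random arrivals
        start = max(clock, link_free_at)
        total_wait += start - clock
        link_free_at = start + service
    return total_wait / num_packets


# Hypothetical 1 Mbps link carrying 1,000-bit packets.
for a in (100, 500, 900):
    rho = traffic_intensity(1000, a, 1e6)
    delay = average_queuing_delay(1000, a, 1e6)
    print(f"La/R = {rho:.1f}: average queuing delay = {delay * 1e3:.3f} ms")
```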

#### Packet Loss

Because queue capacity is finite, packet delays do not actually approach infinity as the traffic intensity approaches 1.  
Instead, a router will **drop** the packet, resulting in a **packet loss**.

The fraction of lost packets increases as the traffic intensity increases.

Lost packet _may_ be retransmitted by previous node, by source end system, or not at all.

### 1.4.3 End-to-End Delay

`traceroute` program: provides delay measurement from source to router along path towards destination.  
For all $i$:

* sends three packets that will reach router $i$ on path towards destination
* router $i$ will return packets to sender
* sender times the interval between transmission and reply

### 1.4.4 Throughput in Computer Networks

**throughput**: rate (bits/time unit) at which bits transferred between sender/receiver

* **instantaneous**: rate at a given point in time
* **average**: rate over longer period of time

The link with the lowest transmission rate along the path is the **bottleneck link**; it limits the end-to-end throughput.
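
The bottleneck idea reduces to taking a minimum over the link rates on the path. The sketch below assumes no competing traffic; the function names and example values (a 32-million-bit file over a 2 Mbps link followed by a 1 Mbps link) are illustrative.

```python
def end_to_end_throughput(link_rates_bps):
    """With no other traffic, throughput is limited by the slowest link."""
    return min(link_rates_bps)


def transfer_time(file_bits, link_rates_bps):
    """Approximate time to move a large file across the path."""
    return file_bits / end_to_end_throughput(link_rates_bps)


path = [2e6, 1e6]  # hypothetical server-side and client-side link rates
print(end_to_end_throughput(path), "bits/s")
print(transfer_time(32e6, path), "seconds")
```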

## 1.5 Protocol Layers and Their Service Models

The Internet is a complicated system with many pieces: hosts, routers, links of various media, applications, protocols, hardware, and software.

### 1.5.1 Layered Architecture

A layered architecture allows us to discuss a well-defined, specific part of a large and complex system.  
Modularization eases maintenance, updating of system (change of implementation of layer's service transparent to rest of system).

The ability to change the implementation of a service without affecting other components of the system is another important advantage of layering.

Each layer implements a service (provided to the layer above) via its own internal-layer actions, while relying on services provided by the layer below.

#### Protocol Layering

Each protocol belongs to one of the layers.  
Interested in the **services** that a layer offers to the layer above -- the so-called **service model** of a layer.

A protocol layer can be implemented in software, in hardware, or both.

Application-layer protocols are almost always implemented in software in the end systems; so are transport-layer protocols.

When taken together, the protocols of the various layers are called the **protocol stack**.

#### Internet Protocol Stack

* **Application**
    * supporting network applications
        * FTP, SMTP, HTTP
    * distributed over multiple hosts
        * packet of information exchanged amongst hosts is a **message**
* **Transport**
    * process data transfer
        * TCP, UDP
    * a transport-layer packet is a **segment**
* **Network**
    * routing of **datagrams** (network-layer packets) from source to destination
        * IP, routing protocols
* **Link**
    * data transfer between neighbouring network elements
        * Ethernet, 802.11 (WiFi), PPP
    * a link-layer packet is a **frame**
* **Physical**
    * bits "on the wire"

#### The ISO/OSI Model

* **Application**
* **Presentation**
    * allow applications to interpret meaning of data
        * e.g. encryption, compression, machine-specific conventions
* **Session**
    * synchronization, checkpointing, recovery of data exchange
* **Transport**
* **Network**
* **Link**
* **Physical**

### 1.5.2 Encapsulation

![encapsulation](Assets/network-1.24.png)

Routers implement only the Network, Link, and Physical layers.  
Link-layer switches implement only the Link and Physical layers, so they are unable to recognize IP addresses.  
Hosts implement all 5 layers.

At each layer, a packet has two types of fields: header fields and a **payload field**.  
The payload is typically a packet from the layer above.

The process of encapsulation can be more complex than that described above.  
For example, a large message may be divided into multiple transport-layer segments (which might themselves each be divided into multiple network-layer datagrams).  
At the receiving end, such a segment must then be reconstructed from its constituent datagrams.
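
Encapsulation can be pictured as nesting: each layer wraps the packet from the layer above as its payload and adds its own header. The following is a toy model only; the `Packet` class, the header strings, and the addresses are hypothetical and do not reflect real header formats.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    layer: str       # layer that produced this packet
    header: str      # simplified, human-readable header (made-up fields)
    payload: object  # packet from the layer above, or the application message


def encapsulate(message: str) -> Packet:
    """Wrap an application message in a segment, a datagram, then a frame."""
    segment = Packet("transport", "src_port=50000 dst_port=80", message)
    datagram = Packet("network", "src_ip=192.0.2.1 dst_ip=198.51.100.7", segment)
    frame = Packet("link", "dst_mac=aa:bb:cc:dd:ee:ff", datagram)
    return frame


def decapsulate(packet) -> str:
    """Strip headers layer by layer until the original message is left."""
    while isinstance(packet, Packet):
        packet = packet.payload
    return packet


frame = encapsulate("GET /index.html")
print(frame.layer, "->", frame.payload.layer, "->", frame.payload.payload.layer)
print(decapsulate(frame))
```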

## 1.6 Networks Under Attack

Fields in network security:

* how bad guys can attack computer networks
* how we can defend networks against attacks
* how to design architectures that are immune to attacks

Internet not originally designed with (much) security in mind.  
Original vision,
> a group of mutually trusting users attached to a transparent network

Internet protocol designers playing "catch-up".  
Security considerations in _all_ layers.

#### The bad guys can put malware into your host via the Internet

Malware can get in the host from:

* **virus**: self-replicating infection by receiving/executing object (e.g., email attachment)
* **worm**: self-replicating infection by passively receiving object that gets itself executed

**Spyware** can record keystrokes, websites visited, upload info to collection site, etc.

Infected hosts can be enrolled in **botnet**, used for spam and **distributed** DoS (**DDoS**) attacks.

#### The bad guys can attack servers and network infrastructure

**Denial-of-service** (**DoS**) **attacks**: renders a network, host, or other piece of infrastructure unusable by legitimate traffic by overwhelming resource with bogus traffic.

Steps of attack:

1. Select target
2. Break into hosts around the network
3. Send packets to target from compromised hosts

In a DDoS attack, the attacker controls multiple sources and has each source blast traffic at the target.

Most DoS attacks fall into one of three categories:

* _Vulnerability attack_
    * exploits vulnerable applications
* _Bandwidth flooding_
    * prevent legitimate packets from reaching the server
* _Connection flooding_
    * establishes a large number of half-open or fully open TCP connections, preventing new legitimate connections

#### The bad guys can sniff packets

**packet sniffer**: a passive receiver that records a copy of every packet that passes by in a network.  
Sniffed packets contain sensitive information!

Some of the best defences against packet sniffing involve cryptography.

#### The bad guys can masquerade as someone you trust

The ability to inject packets into the Internet with a false source address is **IP spoofing**, and is but one of many ways in which one user can masquerade as another user.

To solve this problem, need _end-point authentication_.  
This is a mechanism to determine with certainty whether a message originates from where we think it does.

## 1.7 History of Computer Networking and the Internet

### 1.7.1  The Development of Packet Switching: 1961 - 1972

* 1961: Kleinrock -- queuing theory shows effectiveness of packet-switching
* 1964: Baran -- packet-switching in military nets
* 1967: ARPAnet conceived by Advanced Research Projects Agency
* 1969: first ARPAnet node operational
* 1972
    * ARPAnet public demo
    * NCP (Network Control Protocol) first host-host protocol
    * first email program
    * ARPAnet has 15 nodes

### 1.7.2 Proprietary Networks and Internetworking: 1972 - 1980

* 1970: ALOHAnet satellite network in Hawaii
* 1974: Cerf and Kahn - architecture for interconnecting networks
    * minimalism, autonomy -- no internal changes required to interconnect networks
    * best effort service model
    * stateless routers
    * decentralized control
* 1976: Ethernet at Xerox PARC
* late 70s
    * proprietary architectures: DECnet, SNA, XNA
    * switching fixed length packets (ATM precursor)
* 1979: ARPAnet has 200 nodes

Cerf and Kahn's internetworking principles define today's Internet architecture.

### 1.7.3 A Proliferation of Networks: 1980 - 1990

* 1982: SMTP email protocol defined
* 1983
    * deployment of TCP/IP
    * DNS defined for name-to-IP-address translation
* 1985: FTP protocol defined
* 1988: TCP congestion control

New national networks: CSnet, BITnet, NSFnet, Minitel.

100,000 hosts connected to confederation of networks.

### 1.7.4 The Internet Explosion: 1990s

* early 1990s: ARPAnet decommissioned
* 1991: NSF lifts restrictions on commercial use of NSFnet (decommissioned, 1995)
* early 1990s: Web
    * hypertext
    * HTML, HTTP
    * 1994: Mosaic, later Netscape
    * late 1990s: commercialization of the Web
* late 1990s to 2000s
    * more killer apps: instant messaging, P2P file sharing
    * network security to forefront
    * estimated 50 million hosts, 100+ million users
    * backbone links running at Gbps

### 1.7.5 The New Millennium

* 2005 to present
    * ~5B devices attached to Internet (2016)
        * includes smartphones and tablets
    * aggressive deployment of broadband access
    * increasing ubiquity of high-speed wireless access
    * emergence of online social networks
    * service providers (Google, Microsoft) create their own networks
        * bypass Internet, providing "instantaneous" access to search, video content, email, etc.
    * e-commerce, universities, enterprises running their services in "cloud" (e.g. Amazon EC2)

# 2. Application Layer

## 2.1 Principles of Network Applications

Core of network app. dev. is writing programs that run on _different_ end systems that communicate over the network.

Network-core devices (e.g. routers or link-layer switches) _do not_ run user applications; they function only at the network layer and below.

### 2.1.1 Network Application Architectures

Network architecture is fixed and provides a specific set of services to applications.  
The **application architecture** is designed by the application developer and dictates how the application is structured over the various end systems.  
Two predominant architectural paradigms:

* Client-server
    * server
        * always-on
        * permanent IP address
        * data centers for scaling
    * client
        * communicate with server
        * may be intermittently connected
        * may have dynamic IP addresses
        * do not communicate directly with each other
* Peer-to-peer (P2P)
    * no always-on server
    * arbitrary end systems directly communicate
    * peers request service from other peers, provide service in return to other peers
        * self-scalability -- new peers bring new server capacity, as well as new service demands
    * peers are intermittently connected and change IP addresses
        * complex management
    * three major challenges
        1. ISP-friendly. Residential ISPs provision more downstream than upstream bandwidth, so P2P content distribution, which relies on peers' upstream traffic, puts stress on the ISPs.
        2. Security
        3. Incentives. Success depends on convincing users to volunteer bandwidth, storage, and computation resources to the applications.

### 2.1.2 Processes Communicating

**Process**: program running within a host

* within same host, two processes communicate using **inter-process communication** (defined by OS)
* processes in different hosts communicate by exchanging **messages**

#### Client and Server Processes

**Client process**: process that initiates communication.

**Server process**: process that waits to be contacted.

Aside: applications with P2P architectures have client processes & server processes.

#### The Interface Between the Process and the Computer Network

A process sends/receives messages to/from its **socket**, a software interface.

![socket diagram](Assets/network-2.3.png)

#### Addressing Processes

Process must have **identifier** to receive messages.  
Host device has unique 32-bit IP address.  
_Identifier_ includes both **IP address** and **port numbers** associated with the process on host.

Example port numbers:

* HTTP server: 80
* Mail server (via SMTP): 25
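
The (IP address, port number) identifier is exactly what the socket API takes as an address. The sketch below uses UDP on the loopback interface; for simplicity both the "server" and "client" sockets live in one script, and port 0 (let the OS pick a free port) stands in for a well-known port such as 80.

```python
import socket

# "Server" side: bind a UDP socket to an (IP address, port) pair.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: ask the OS for any free port
server.settimeout(2.0)                 # avoid blocking forever if nothing arrives
server_address = server.getsockname()  # (IP address, port number) actually bound

# "Client" side: the destination identifier is the server's (IP, port) pair.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", server_address)

# The server learns the sender's (IP, port) along with the message.
message, client_address = server.recvfrom(1024)
print(f"server {server_address} received {message!r} from {client_address}")

client.close()
server.close()
```

The client never chose a port explicitly; the OS assigned one when it first sent, which is why `client_address` still contains a port number the server could reply to.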

### 2.1.3 Transport Services Available to Applications

A socket is the interface between the application process and the transport-layer protocol.  
The app. pushes messages through the socket;
the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process.

What are the services that a transport-layer protocol can offer to applications?

#### Reliable Data Transfer

i.e. Data integrity

If a protocol provides a guaranteed data delivery service (i.e. data is delivered correctly and completely), it is said to provide **reliable data transfer**.  
Some apps _require_ 100% reliable data transfer.

Another potential service is process-to-process reliable data transfer.

If reliable data transfer is not provided, this may be acceptable for **loss-tolerant applications**.

#### Throughput

Apps with throughput requirements are said to be **bandwidth-sensitive applications**.  
e.g., multimedia applications.

**Elastic applications** can make use of as much, or as little, throughput as happens to be available; although more throughput is always better.  
e.g., email, file transfer.

#### Timing

Real-time applications require tight timing constraints on data delivery in order to be effective.  
e.g., virtual environments, interactive games.

For non-real-time applications, no tight constraints on end-to-end delays (but lower delay is better).

#### Security

A transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the protocol can decrypt the data before delivering them to the receiving process.  
i.e. provides confidentiality.

A transport protocol can also provide services such as data integrity and end-point authentication.

### 2.1.4 Transport Services Provided by the Internet

#### TCP Services

Provides:

* Connection-oriented
    * setup required between client and server processes before sending application messages
    * creates a **TCP connection** between sockets of the two processes
    * connection is full-duplex (messages can be sent from both sides at the same time)
* Reliable data transfer
    * data is delivered without missing or duplicated bytes
* Congestion control
    * throttles a sending process when the network is overloaded
* Flow control
    * sender will not overwhelm receiver

Not provided:

* Timing
* Minimum throughput guarantee
* Security (albeit there is **Secure Sockets Layer**, **SSL**)

#### UDP Services

* No-frills, lightweight, provides minimal services
* _Unreliable_ data transfer

Not provided:

* Flow control
* Congestion control
* Timing
* Throughput guarantee
* Security
* Connection setup

#### Securing TCP

* TCP & UDP
    * no encryption, e.g., clear text passwords sent into socket traverse Internet in clear text
* SSL
    * provides encrypted TCP connection
    * data integrity
    * end-point authentication
* SSL is at application layer
    * apps use SSL libraries, that "talk" to TCP
* SSL socket API
    * encrypts clear text

### 2.1.5 Application-Layer Protocols

An **application-layer protocol** defines:

* Type of messages exchanged
    * e.g., request, response
* Message syntax
    * what fields in messages & how fields are delineated
* Message semantics
    * meaning of information in fields
* _Rules_ for when and how processes send & respond to messages

Open protocols:

* defined in RFCs
* allows for interoperability
* e.g., HTTP, SMTP

Proprietary protocols:

* e.g., Skype

## 2.2 The Web and HTTP

Web page consists of **objects**, e.g., HTML file, images, Java applet, etc.  
Each page consists of **base HTML-file** which includes _several referenced objects_.  
Each object addressable by **URL**.

### 2.2.1 Overview of HTTP

**HTTP** - **HyperText Transfer Protocol**  
Web's application layer protocol.

Implements client/server model:

* Client
    * requests, receives (using HTTP) and displays Web objects
* Server
    * sends (using HTTP) objects in response to requests

Uses _TCP_:

* client initiates TCP connection (creates socket) to server, default port 80
* server accepts TCP connection from client
* HTTP messages exchanged between browser and web server
* TCP connection closed

HTTP is a **stateless** protocol.  
Server maintains no info about past client requests.

Protocols that maintain "state" are complex!

* Past history (state) must be maintained
* If server/client crashes, their views of "state" may be inconsistent, must be reconciled

### 2.2.2 Non-Persistent and Persistent Connections

HTTP uses persistent connections by default.

#### HTTP with Non-Persistent Connections

Each TCP connection is closed after the server sends the object (and only _one_ object).  
For each connection, TCP buffers must be allocated and TCP variables must be kept in both the client and server; possibly significant burden on server.

Modern browsers can open multiple (5 to 10) TCP connections in _parallel_, and each connection handles one request-response transaction.

**Round-trip time** (**RTT**): time it takes for a small packet to travel from client to server and then back to client.  
This time includes: packet-propagation delays, packet queuing delays, and packet-processing delays.

Response time for one object (the connection setup is part of TCP's "three-way handshake"):

1. One RTT to initiate TCP connection
2. One RTT for HTTP request and first few bytes of HTTP response to return
3. Message transmission time

$$\text{Non-persistent HTTP response time} = 2 \text{RTT} + \text{message transmission time}$$
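
The formula above is easy to evaluate. In this sketch the function name and the example values (100 ms RTT, a 100,000-bit object, a 10 Mbps path) are assumptions for illustration.

```python
def nonpersistent_response_time(rtt_s: float, object_bits: float, rate_bps: float) -> float:
    """One RTT for TCP setup, one RTT for request/response, plus transmission."""
    return 2 * rtt_s + object_bits / rate_bps


# Hypothetical values: RTT = 100 ms, 100,000-bit object, 10 Mbps.
t = nonpersistent_response_time(0.100, 100_000, 10e6)
print(f"{t * 1e3:.0f} ms")
```

For small objects the two round trips dominate, which is the motivation for reusing connections.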

#### HTTP with Persistent Connections

Server leaves TCP connection open after sending response.  
_Subsequent_ requests and responses between the same client and server sent over the _same_ open connection.

**Pipelining**: requests for objects made back-to-back, without waiting for replies to pending requests.

TCP connection closed after a configurable timeout interval.

Response time is as little as one RTT plus the transmission time per object.
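
To compare the two connection styles on a whole page, the sketch below adds up per-object costs. It assumes objects are fetched strictly one after another (no pipelining and no parallel connections) and that every object has the same transmission time; the function name and the numbers are illustrative.

```python
def page_fetch_time(num_objects: int, rtt_s: float, transmit_s: float,
                    persistent: bool) -> float:
    """Total time to fetch num_objects sequentially from one server."""
    if persistent:
        # One RTT to set up the connection once, then one RTT per object.
        return rtt_s + num_objects * (rtt_s + transmit_s)
    # Non-persistent: every object pays connection setup plus request RTT.
    return num_objects * (2 * rtt_s + transmit_s)


# Hypothetical page: base HTML file plus 10 referenced objects.
objects = 11
print(f"non-persistent: {page_fetch_time(objects, 0.1, 0.01, False):.2f} s")
print(f"persistent:     {page_fetch_time(objects, 0.1, 0.01, True):.2f} s")
```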

### 2.2.3 HTTP Message Format

Two types of messages:

* request
* response

#### HTTP Request Message

Written in ASCII.  
First line is **request line**; three fields: method, URL, and HTTP version.  
Subsequent lines are **header lines**.  
Some requests are followed by an **entity body**.

Method types:

* `GET`
    * requests an object specified in the URL field
* `POST`
    * request with user data input (from form fields) in the entity body
* `HEAD`
    * same as `GET`, but the server leaves the requested object out of the response
* `PUT`
    * uploads an object in entity body to path specified in URL field
* `DELETE`
    * deletes an object specified in the URL field

![general format of HTTP request](Assets/network-2.8.png)

Example:

```
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
```
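As a rough illustration of the format, the request above could be assembled as follows. `build_request` is a hypothetical helper, not part of any library; it only shows that the request line and header lines are each terminated by CRLF and that a blank line separates the headers from the optional entity body.

```python
def build_request(method: str, path: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: request line, header lines, blank line, entity body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line marks the end of the headers
    return head.encode("ascii") + body


if __name__ == "__main__":
    request = build_request(
        "GET",
        "/somedir/page.html",
        {"Host": "www.someschool.edu", "Connection": "close"},
    )
    print(request.decode("ascii"))
```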

#### HTTP Response Message

Written in ASCII.  
First line is **status line**; three fields: protocol version, status code, and corresponding status message.  
Rest are the same as request message.

Common status codes and messages:

* `200 OK`: request succeeded and information is returned in the response
* `301 Moved Permanently`: requested object has been permanently moved; the new URL is specified in the `Location` header of the response message
* `400 Bad Request`: generic error code indicating that the request cannot be understood by the server
* `404 Not Found`: requested document does not exist on this server
* `505 HTTP Version Not Supported`: requested HTTP protocol version is not supported by the server

![general format of HTTP response](Assets/network-2.9.png)

Example:

```
HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)
```
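A minimal sketch of how a client might split such a response into its parts. `parse_response` is a hypothetical helper that assumes the whole response is already in memory and ignores details such as repeated headers or chunked transfer encoding.

```python
def parse_response(raw: bytes):
    """Split a response into (version, status code, status message, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the header section
    lines = head.decode("ascii").split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)  # status line has three fields
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Connection: close\r\n"
        b"Content-Length: 4\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"data"
    )
    print(parse_response(raw))
```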

### HTTP/2

RFC 7540, 7541

Introduces a new, non-backwards-compatible **binary framing layer**.

Aims to decrease latency and improve page load speed through:

* data compression of HTTP headers
* multiplexing multiple async requests/responses (**streams and frames**) over a single TCP connection
* HTTP/2 Server PUSH
* stream prioritization

Fixes **Head-Of-Line blocking** in HTTP/1.x.

#### Binary Framing

![http/2 binary framing](Assets/network-binary-framing.png)

#### Headers Compression

![http/2 headers compression](Assets/network-headers-compression.png)


#### Multiplexing Requests/Responses

![http/2 multiplexing](Assets/network-multiplexing-requests-responses.png)

Break down an HTTP message into independent frames, interleave them, and then reassemble them on the other end

* Interleave multiple requests in parallel without blocking on any one
* Interleave multiple responses in parallel without blocking on any one
* Single connection to deliver multiple requests and responses in parallel
* Resolves the head-of-line blocking problem in HTTP/1.x and eliminates the need for multiple connections to enable parallel processing and delivery of requests and responses

_Applications faster, simpler, and cheaper to deploy_.
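The break-down, interleave, and reassemble steps can be sketched with plain tuples. This is a toy model, not the real HTTP/2 wire format: a "frame" here is just a `(stream_id, payload)` pair, the frame size is arbitrary, and the round-robin order is an assumption made for illustration.

```python
from itertools import zip_longest


def to_frames(stream_id: int, message: bytes, frame_size: int) -> list:
    """Break one message into frames tagged with its stream ID."""
    return [(stream_id, message[i:i + frame_size])
            for i in range(0, len(message), frame_size)]


def interleave(*frame_lists) -> list:
    """Round-robin frames from several streams onto one connection."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: list) -> dict:
    """Rebuild each message on the receiving end by grouping frames by stream ID."""
    messages = {}
    for stream_id, payload in wire:
        messages[stream_id] = messages.get(stream_id, b"") + payload
    return messages


if __name__ == "__main__":
    big = to_frames(1, b"a large response body", 4)
    small = to_frames(3, b"tiny", 4)
    wire = interleave(big, small)
    print(wire)              # the small response is not stuck behind the large one
    print(reassemble(wire))
```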

#### Server PUSH

![http/2 server PUSH](Assets/network-server-PUSH.png)

Pushed resources can be cached by the client, reused across different pages, multiplexed alongside other resources, prioritized by the server, and declined by the client.

Server push streams are initiated via `PUSH_PROMISE` frames (HTTP headers of the promised resource) which signal the server's intent to push the described resources.

Once the client receives a `PUSH_PROMISE` frame it can decline the stream (`RST_STREAM` frame) if it wants to (e.g., the resource is already in cache).

### 2.2.4 User-Server Interaction: Cookies

HTTP servers are stateless, but it is often desirable for a site to identify users,
either to restrict user access or to serve content as a function of user identity.  
Cookies (RFC 6265) are used for this purpose.

Four components of a cookie:

1. cookie header line of HTTP response message
2. cookie header line in next HTTP request message
3. cookie file kept on user's host, managed by user's browser
4. back-end database at server
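The four components can be seen working together in a small simulation. Everything here is a hypothetical sketch: headers are plain dictionaries, the cookie value is a bare ID rather than the `name=value` pairs real cookies carry, and the starting ID is arbitrary.

```python
import itertools


class CookieServer:
    """Toy server: issues an ID on first contact and logs visits per ID."""

    def __init__(self):
        self._ids = itertools.count(1678)  # arbitrary starting ID (assumption)
        self.database = {}                 # component 4: back-end database

    def handle(self, path: str, request_headers: dict) -> dict:
        response_headers = {}
        cookie = request_headers.get("Cookie")  # component 2: header in the request
        if cookie not in self.database:
            cookie = str(next(self._ids))
            self.database[cookie] = []
            response_headers["Set-Cookie"] = cookie  # component 1: header in the response
        self.database[cookie].append(path)
        return response_headers


if __name__ == "__main__":
    server = CookieServer()
    cookie_file = {}  # component 3: cookie file kept by the browser

    reply = server.handle("/", {})
    cookie_file["shop.example"] = reply["Set-Cookie"]

    server.handle("/cart", {"Cookie": cookie_file["shop.example"]})
    print(server.database)
```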

What cookies can be used for:

* authorization
* shopping carts
* recommendations
* user session state

How to keep "state":

* protocol endpoints: maintain state at sender/receiver over multiple transactions
* cookies: HTTP messages carry state

_Aside_, cookies and privacy:

* cookies permit sites to learn about you; user supplies personal info to sites

### 2.2.5 Web Caching

A **Web cache** -- also called a **proxy server** -- is a network entity that satisfies HTTP requests on behalf of an origin Web server.

* user configures browser to direct Web accesses through the cache
* browser sends all HTTP requests to cache
    * object in cache: cache returns object
    * else cache requests object from origin server, then returns object to client

Cache acts as _both_ client and server!  
Server for original requesting client.  
Client to origin server.

Cache typically installed by ISP.

Reasons for caching:

* reduces response time for client request
* reduces traffic on an institution's access link
* Internet dense with caches: enables "poor" content providers to effectively deliver content
    * **Content Distribution Networks** (**CDNs**)

### 2.2.6 The Conditional GET

Goal: don't send object if cache has up-to-date cached version

* no object transmission delay
* lower link utilization

Cache: specifies the date of its cached copy in the HTTP request with `If-modified-since: <date>`.  
Server: response contains no object if the cached copy is up-to-date: `HTTP/1.0 304 Not Modified`.
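The server-side decision can be sketched as below. `conditional_get` is a hypothetical function, the object body is a placeholder, and only the date comparison is modeled; the date parsing uses the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get(last_modified: datetime, request_headers: dict):
    """Return (status, body); the body is None when the cached copy is still fresh."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return "304 Not Modified", None       # no object sent
    return "200 OK", b"<object bytes>"        # placeholder for the real object


if __name__ == "__main__":
    modified = datetime(2011, 8, 9, 15, 11, 3, tzinfo=timezone.utc)
    headers = {"If-Modified-Since": "Tue, 09 Aug 2011 15:11:03 GMT"}
    print(conditional_get(modified, headers))
    print(conditional_get(modified, {}))
```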

## 2.3 File Transfer: FTP

Not in slides?

### 2.3.1 FTP Commands and Replies

## 2.4 Electronic Mail in the Internet

Three major components:

* User agents
    * a.k.a. "mail reader"
    * composing, editing, reading mail messages
    * outgoing, incoming messages stored on server
* Mail servers
    * **mailbox**
        * contains incoming messages for users
    * **message queue**
        * queue of outgoing (to be sent) mail messages
* SMTP
    * protocol between mail servers to send email messages

### 2.4.1 SMTP

Uses TCP to reliably transfer email messages from client to server; port 25.

Direct transfer: sending server to receiving server.

Three phases of transfer:

1. Handshaking
2. Transfer of messages
3. Closure

Command/Response interaction (like HTTP):

* **commands**: ASCII text
* **response**: status code and phrase

Messages must be in 7-bit ASCII.

SMTP uses _persistent_ connections.  
SMTP server uses `CRLF.CRLF` to determine end of message.
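A rough sketch of what the client side of one transfer puts on the wire, covering the three phases. The host names and addresses are made up, server replies are not modeled, and details such as escaping message lines that begin with a dot are ignored.

```python
def smtp_client_lines(client_host: str, sender: str, recipient: str, message_lines: list):
    """Yield the client's lines for one message: handshake, transfer, closure."""
    yield f"HELO {client_host}"          # handshaking
    yield f"MAIL FROM:<{sender}>"        # transfer of message
    yield f"RCPT TO:<{recipient}>"
    yield "DATA"
    yield from message_lines
    yield "."                            # a lone dot: end of message (CRLF.CRLF)
    yield "QUIT"                         # closure


def wire_format(lines) -> bytes:
    """Terminate every line with CRLF; encoding fails if the text is not 7-bit ASCII."""
    return "".join(line + "\r\n" for line in lines).encode("ascii")


if __name__ == "__main__":
    lines = smtp_client_lines(
        "client.example.org", "alice@example.org", "bob@example.com",
        ["Subject: hello", "", "Do you like ketchup?"],
    )
    print(wire_format(lines).decode("ascii"))
```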

### 2.4.2 Comparison with HTTP

HTTP: pull (mostly) protocol  
SMTP: push protocol

Both have ASCII command/response interactions, status codes.

HTTP: each object encapsulated in its own response message.  
SMTP: multiple objects sent in multi-part message (MIME RFC 1341).

### 2.4.3 Mail Message Format

SMTP: protocol for exchanging email messages.  
RFC 822: standard for text message format,

* Header lines
    * e.g., To:, From:, Subject:
    * different from SMTP MAIL FROM, RCPT TO commands
* Body
    * the "message"
    * ASCII characters only

RFC 1341: Multipurpose Internet Mail Extensions (MIME)

### 2.4.4 Mail Access Protocols

SMTP: delivery/storage to receiver's server.

Mail access protocol: retrieval from server

* **POP**: Post Office Protocol (RFC 1939)
    * authorization, download
* **IMAP**: Internet Message Access Protocol (RFC 1730)
    * more features, including manipulation of stored messages on server
* **HTTP**: gmail, Hotmail, etc.

#### POP3

Authorization phase

* client commands
    * `user`: declare username
    * `pass`: password
* server responses
    * `+OK`
    * `-ERR`

Transaction phase, client:

* `list`: list message numbers
* `retr`: retrieve message by number
* `dele`: delete
* `quit`

POP3 "download-and-keep": copies of messages on different clients.  
POP3 is _stateless_ across sessions.

#### IMAP

Keeps all messages in one place: at server.

Allows user to organize messages in folders.

Keeps user state across sessions:

* names of folders and mappings between message IDs and folder name

## 2.5 DNS - The Internet's Directory Service

**Hosts** have 

* **IP address** (32-bit) -- used for addressing datagrams
* **name** -- used by humans

### 2.5.1 Services Provided by DNS

The **Domain Name System** (**DNS**) is a:

1. _distributed database_
    * implemented in hierarchy of many **DNS servers** (_name servers_)
2. application-layer protocol
    * hosts and name servers communicate to _resolve_ names (address/name translation)
    * note: core Internet function, implemented as application-layer protocol
    * complexity at network's "edge"

DNS services:

* hostname to IP address translation
* host aliasing
    * canonical, alias names
* mail server aliasing
* load distribution
    * replicated Web servers
        * many IP addresses correspond to one name

### 2.5.2 Overview of How DNS Works

Question: why not centralize DNS?

* single point of failure
    * if the DNS server crashes, so does the entire Internet
* traffic volume
    * one server needs to handle all DNS queries
* distant centralized database
    * cannot be close to all querying clients, thus delays
* maintenance
    * need to keep records for all Internet hosts, thus needs to be frequently updated to account for new hosts

Answer: does _not_ scale!

#### A Distributed, Hierarchical Database

![DNS hierarchy of servers](Assets/network-2.19.png)

Three classes of DNS servers in the hierarchy:

* **Root DNS servers**
    * contacted by local name server that cannot resolve name
    * root name server:
        * contacts authoritative name server if name mapping not known
        * gets mapping
        * returns mapping to local name server
* **Top-level domain (TLD) servers**
    * responsible for top-level domains (e.g., com, org)
    * com top-level domain maintained by Verisign Global Registry Services
    * edu top-level domain maintained by Educause
* **Authoritative DNS servers**
    * each organization with publicly accessible hosts on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses
    * can be maintained by the organization or a service provider

**Local DNS server**: does not strictly belong to the hierarchy but is still central to the DNS architecture.  
Each ISP has one, also called "default name server".  
When host makes DNS query, query is sent to its local DNS server:

* has local cache of recent name-to-address translation pairs (but may be out of date)
* acts as proxy, forwards query into hierarchy

Example interaction of various DNS servers
![interaction of various DNS servers](Assets/network-2.21.png)

**Iterated query**: contacted server replies with name of server to contact.

**Recursive query**: puts burden of name resolution on contacted name server.

![recursive queries in DNS](Assets/network-2.22.png)
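The iterated pattern can be sketched with a toy hierarchy, where each server either answers or refers the local DNS server to the next server to contact. The server names, the hostname, and the address below are all invented, and the lookup tables are a stand-in for real DNS messages.

```python
# Each toy server maps a name suffix to ("referral", next_server) or ("answer", ip).
SERVERS = {
    "root": {"edu": ("referral", "tld-edu")},
    "tld-edu": {"example.edu": ("referral", "dns.example.edu")},
    "dns.example.edu": {"www.example.edu": ("answer", "192.0.2.10")},
}


def query(server: str, hostname: str):
    """Ask one server; it replies with an answer or with the name of a server to contact."""
    for suffix, record in SERVERS[server].items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return record
    raise LookupError(f"{server} knows nothing about {hostname}")


def resolve_iteratively(hostname: str, start: str = "root"):
    """Local DNS server's loop: follow referrals until some server returns an address."""
    server, contacted = start, []
    while True:
        contacted.append(server)
        kind, value = query(server, hostname)
        if kind == "answer":
            return value, contacted
        server = value  # referral: contact this server next


if __name__ == "__main__":
    print(resolve_iteratively("www.example.edu"))
```

In a recursive query the loop would instead run inside each contacted server, which is what shifts the burden of resolution onto it.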

#### DNS Caching

Once (any) name server learns mapping, it _caches_ mapping

* cache entries timeout after some time (TTL; Time To Live)
* TLD servers typically cached in local name servers
    * thus root name servers not often visited

Cached entries may be out-of-date!

* if a named host changes its IP address, the change may not be known Internet-wide until all TTLs expire
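A minimal sketch of a TTL-based cache, assuming a single value per name. The `DnsCache` class and its injectable clock are illustrative choices, not how any particular name server is implemented.

```python
import time


class DnsCache:
    """Name-to-value cache whose entries expire once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (value, expiry time)

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # timed out: must be resolved again
            return None
        return value                 # may be stale if the host changed address


if __name__ == "__main__":
    cache = DnsCache()
    cache.put("www.example.com", "192.0.2.1", ttl=60)
    print(cache.get("www.example.com"))
```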

Update/notify mechanisms are a proposed IETF standard (RFC 2136).

### 2.5.3 DNS Records and Messages

DNS: distributed database storing **resource records** (**RRs**).  
Format:
$$\left(\texttt{Name}, \texttt{Value}, \texttt{Type}, \texttt{TTL}\right)$$

TTL is the time to live of the RR;
it determines when a resource should be removed from a cache.

$\texttt{Type}$ determines meaning of $\texttt{Name}$ and $\texttt{Value}$:

* $\texttt{Type = A}$
    * $\texttt{Name}$ is hostname
    * $\texttt{Value}$ is IP address
* $\texttt{Type = NS}$
    * $\texttt{Name}$ is domain
    * $\texttt{Value}$ is hostname of authoritative name server for this domain
* $\texttt{Type = CNAME}$
    * $\texttt{Name}$ is alias name for some "canonical" (the real) name
    * $\texttt{Value}$ is canonical name
* $\texttt{Type = MX}$
    * $\texttt{Value}$ is name of mailserver associated with $\texttt{Name}$
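To see how the types combine, here is a small in-memory record set using the `(Name, Value, Type, TTL)` layout. The names, addresses, and TTLs are invented, and `resolve_address` is a hypothetical helper that follows `CNAME` aliases until it reaches an `A` record.

```python
# (Name, Value, Type, TTL) tuples; all values are made up for illustration.
RECORDS = [
    ("example.com", "dns.example.com", "NS", 86400),
    ("dns.example.com", "192.0.2.53", "A", 86400),
    ("www.example.com", "web1.example.com", "CNAME", 3600),
    ("web1.example.com", "192.0.2.10", "A", 3600),
    ("example.com", "mail.example.com", "MX", 3600),
    ("mail.example.com", "192.0.2.25", "A", 3600),
]


def lookup(name: str, rtype: str) -> list:
    """Return the Value of every record matching Name and Type."""
    return [value for n, value, t, _ttl in RECORDS if n == name and t == rtype]


def resolve_address(name: str, max_aliases: int = 8) -> list:
    """Follow CNAME aliases to the canonical name, then return its A records."""
    for _ in range(max_aliases):
        addresses = lookup(name, "A")
        if addresses:
            return addresses
        aliases = lookup(name, "CNAME")
        if not aliases:
            return []
        name = aliases[0]  # continue with the canonical name
    return []


if __name__ == "__main__":
    print(resolve_address("www.example.com"))
    print([resolve_address(mx) for mx in lookup("example.com", "MX")])
```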

#### DNS Messages

**Query** and **reply** messages, both with same **message format**.

![DNS message format](Assets/network-2.23.png)

Message header

* **identification**
    * 16-bit number for query, reply to query uses same number
* **flags**
    * query or reply
    * recursion desired
    * recursion available
    * reply is authoritative
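The header fields can be packed and unpacked with `struct`. The bit positions of the flags and the four 16-bit count fields that follow them are taken from RFC 1035 rather than from these notes; the function names are illustrative.

```python
import struct

# Flag bit masks within the 16-bit flags field (positions per RFC 1035).
QR_REPLY = 0x8000             # 0 = query, 1 = reply
AUTHORITATIVE = 0x0400        # reply is authoritative
RECURSION_DESIRED = 0x0100
RECURSION_AVAILABLE = 0x0080


def pack_header(ident: int, flags: int, questions: int = 0, answers: int = 0,
                authority: int = 0, additional: int = 0) -> bytes:
    """Pack the 12-byte header: identification, flags, and four record counts."""
    return struct.pack("!6H", ident, flags, questions, answers, authority, additional)


def unpack_header(data: bytes) -> dict:
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": ident,
        "is_reply": bool(flags & QR_REPLY),
        "authoritative": bool(flags & AUTHORITATIVE),
        "recursion_desired": bool(flags & RECURSION_DESIRED),
        "recursion_available": bool(flags & RECURSION_AVAILABLE),
        "counts": (qd, an, ns, ar),
    }


if __name__ == "__main__":
    query = pack_header(0x1234, RECURSION_DESIRED, questions=1)
    print(unpack_header(query))
```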

#### Inserting Records into the DNS Database

1. Register domain at a **registrar**
    * registrar: a commercial entity that verifies the uniqueness of the domain name
2. Provide names and IP addresses of primary and secondary authoritative DNS servers
    * for each server, a Type NS and a Type A record are entered into the TLD servers
    * e.g. `(networkutopia.com, dns1.networkutopia.com, NS)` and `(dns1.networkutopia.com, 212.212.212.1, A)`

#### Attacking DNS

DDoS attacks

* bombard root server with traffic
    * not successful to date
    * traffic filtering
    * local DNS servers cache IPs of TLD servers, allowing root server bypass
* bombard TLD servers
    * potentially more dangerous

Redirect attacks

* man-in-middle
    * intercept queries
* DNS poisoning
    * send bogus replies to DNS servers, which cache them

Exploit DNS for DDoS

* send queries with spoofed source address: target IP
* requires amplification

## 2.6 P2P Applications

No always-on server.  
Arbitrary hosts directly communicate.  
Peers are intermittently connected and change IP addresses.

### 2.6.1 P2P File Distribution

#### File Distribution Time: Client-Server

Server transmission: must sequentially upload $N$ copies of a file of size $F$ at server upload rate $u_{s}$:

* time to send one copy: $\frac{F}{u_{s}}$
* time to send $N$ copies: $\frac{NF}{u_{s}}$

Client: each client must download file copy:

* $d_{\min}$ is min. client download rate
* min. client download time: $\frac{F}{d_{\min}}$

Time to distribute $F$ to $N$ clients using client-server approach:
$$D_{\text{C-S}} \geq \max\left\{\frac{NF}{u_{s}}, \frac{F}{d_{\min}}\right\}$$

#### File Distribution Time: P2P

Server transmission: must upload at least one copy:

* time to send one copy: $\frac{F}{u_{s}}$

Client: each client must download file copy:

* min. client download time: $\frac{F}{d_{\min}}$

Clients: as aggregate must download $NF$ bits

* max upload rate (limiting max download rate) is $u_{s} + \sum_{i=1}^{N}u_{i}$, where $u_{i}$ is the upload rate of peer $i$

Time to distribute $F$ to $N$ clients using P2P approach:
$$D_{\text{P2P}} \geq \max\left\{\frac{F}{u_{s}}, \frac{F}{d_{\min}}, \frac{NF}{u_{s} + \sum_{i=1}^{N}u_{i}}\right\}$$
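Plugging numbers into the two lower bounds shows how differently they scale. The values below are hypothetical (file size in Mbits, rates in Mbps), and the functions return the bounds themselves, not an achieved distribution time.

```python
def client_server_bound(F: float, u_s: float, d_min: float, N: int) -> float:
    """Lower bound on distribution time when the server uploads every copy."""
    return max(N * F / u_s, F / d_min)


def p2p_bound(F: float, u_s: float, d_min: float, peer_uploads: list) -> float:
    """Lower bound when peers also contribute their upload rates u_i."""
    N = len(peer_uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))


if __name__ == "__main__":
    F, u_s, d_min = 1000, 100, 20          # hypothetical: Mbits, Mbps, Mbps
    for N in (10, 100):
        uploads = [10] * N                 # every peer uploads at 10 Mbps
        print(N, client_server_bound(F, u_s, d_min, N), p2p_bound(F, u_s, d_min, uploads))
```

The client-server bound grows linearly in $N$, while the P2P bound grows more slowly because each added peer also adds upload capacity.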

#### P2P File Distribution: BitTorrent

File divided into 256 KB chunks.  
Peers in torrent send/receive file chunks.

**Tracker**: tracks peers participating in swarm.

**Swarm**: group of peers exchanging chunks of a file.

---

* Peer joining swarm
    * has no chunks, but will accumulate them over time from other peers
    * registers with tracker to get list of peers, connects to subset of peers ("neighbors")
* While downloading, peer uploads chunks to other peers
* Peer may change peers with whom it exchanges chunks
* **Churn**: peers may come and go
* Once peer has entire file, it may leave or remain in torrent

Requesting chunks:

* at any given time, different peers have different subsets of file chunks
* periodically, peer asks each neighbor for the list of chunks it has
* peer requests missing chunks from peers, rarest first
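Rarest-first selection can be sketched as a count over the neighbors' chunk lists. The peer names and chunk numbers are invented, and breaking ties by chunk number is an assumption made only to keep the output deterministic.

```python
from collections import Counter


def rarest_first(my_chunks: set, neighbor_chunks: dict) -> list:
    """Order the chunks this peer is missing by how few neighbors hold them."""
    counts = Counter(chunk for chunks in neighbor_chunks.values() for chunk in chunks)
    missing = [chunk for chunk in counts if chunk not in my_chunks]
    return sorted(missing, key=lambda chunk: (counts[chunk], chunk))


if __name__ == "__main__":
    neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {1, 3}}
    print(rarest_first({0}, neighbors))  # chunk 3 is held by only one neighbor
```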

Sending chunks: tit-for-tat

* peer sends chunks to 4 peers that are sending chunks at highest rate
    * other peers are choked by peer
    * re-evaluate top 4 every 10 seconds
* every 30 seconds, randomly select another peer and start sending it chunks
    * "optimistically unchoke" this peer
    * newly chosen peer may join top 4
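One round of this selection might look like the sketch below. The 10-second and 30-second timers are not modeled, the peer names and rates are invented, and `choose_unchoked` is a hypothetical helper rather than code from any BitTorrent client.

```python
import random


def choose_unchoked(rates: dict, top_n: int = 4, rng=random):
    """Pick the top_n fastest senders, plus one random other peer to unchoke optimistically.

    `rates` maps each neighbor to the rate at which it is currently sending us chunks.
    """
    top = sorted(rates, key=rates.get, reverse=True)[:top_n]
    others = [peer for peer in rates if peer not in top]   # these peers are choked
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    rates = {"A": 50, "B": 10, "C": 40, "D": 30, "E": 20, "F": 5}
    print(choose_unchoked(rates))
```

If the optimistically unchoked peer starts sending at a high rate, it shows up in `rates` with a larger value and can displace one of the current top four at the next re-evaluation.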

### 2.6.2 Distributed Hash Tables (DHTs)

## 2.7 Socket Programming: Creating Network Applications

### 2.7.1 Socket Programming with UDP

### 2.7.2 Socket Programming with TCP