## 17.9 Summary

A **graph** consists of a set of items called **nodes** and a set of pairs of nodes,
called **edges**.
Each edge connects a node with a different node and can be directed or undirected.
A **directed graph** (or **digraph**) only has directed edges;
an **undirected graph** only has undirected edges.
Between each pair of nodes, there's at most one undirected edge,
or at most two directed edges, one in each direction.

A relation between pairs of items is a **binary relation**. A binary relation is
**symmetric** if it's reciprocal: if A is related to B, then B is related to A.

Since edges connect pairs of nodes, graphs can model any set of entities
and a binary relation over them. This includes many kinds of networks.
We can use an undirected graph only if the relation is symmetric.

### 17.9.1 Terminology

Two nodes are **adjacent** if connected by an edge. If there's an edge from A to B,
then A is an **in-neighbour** of B and B is an **out-neighbour** of A.
An undirected edge between A and B counts as an edge both from A to B and from B to A.

A node's **neighbours** are its adjacent nodes.
The **degree** of a node is the number of edges attached to it.

The **in- and out-degree** of a node are the number of its in- and out-neighbours,
respectively. In an undirected graph, the in-neighbours and the out-neighbours
of a node are the same, so its in- and out-degrees are the same as the degree.

Every undirected graph can be represented as a digraph with twice the edges,
by replacing each undirected edge with two opposing directed edges.
Algorithms based on in- and out-neighbours and in- and out-degrees can thus be applied to
directed *and* undirected graphs.
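The replacement of undirected edges by pairs of directed edges can be sketched as follows. This is only an illustration: the function name `to_directed` and the representation of edges as a set of pairs are assumptions, not part of the M269 code.

```python
def to_directed(undirected_edges: set) -> set:
    """Return the directed edges equivalent to the given undirected edges.

    Assumption: each undirected edge is a pair (a, b) of different nodes.
    """
    directed = set()
    for (a, b) in undirected_edges:
        directed.add((a, b))  # directed edge from a to b
        directed.add((b, a))  # opposing directed edge from b to a
    return directed


edges = {("A", "B"), ("B", "C")}
print(sorted(to_directed(edges)))
# [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B')]
```

The resulting digraph has twice as many edges, and each node's in- and out-neighbours are its neighbours in the undirected graph.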

A **path** of length *k* from node A to node B is a sequence of *k* + 1 nodes,
starting with A and ending with B.
There must be an edge from each node in the sequence to the next one.
In addition, all nodes and edges in a path must be different.
Node B is **reachable** from node A if there's a path from A to B.
A **shortest path** has the fewest possible edges among all paths from A to B.

An undirected graph is **connected** if all pairs of nodes are mutually reachable;
otherwise it's **disconnected**. If there's a node S from which all other nodes are
reachable, then all nodes are mutually reachable via S: the graph is connected.
A digraph is connected (or disconnected) if ignoring all edge directions
yields a connected (or disconnected) undirected graph.

A **cycle** is a path with the exception that the first and last node are the same.
A graph with at least one cycle is **cyclic**; otherwise it's **acyclic**.
The acronym **DAG** stands for directed acyclic graph.
A **tree** is a connected acyclic undirected graph.
In a tree, there's a single path from one node to any other node.
A tree with *n* nodes has *n* – 1 edges.

The **empty graph** has neither nodes nor edges.
A **null graph** has nodes but no edges.
A **path graph** is a non-empty undirected graph that can be laid out in a line,
with each node except the last connected to the next one. If we take
a path graph with at least three nodes and connect the last node to the first,
we obtain a **cycle graph**.
A **complete graph** has all possible edges.
In a **random graph**, each possible edge is present with the same probability.
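As a rough illustration of some of these special graphs, the sketch below generates the edges of path, cycle and complete undirected graphs. The function names are hypothetical, and we assume the nodes are the integers 0 to *n* – 1 and that an undirected edge is a pair with the smaller node first.

```python
def path_edges(n: int) -> set:
    """Edges of a path graph: each node except the last is linked to the next."""
    return {(i, i + 1) for i in range(n - 1)}


def cycle_edges(n: int) -> set:
    """Edges of a cycle graph with n >= 3 nodes: a path plus one closing edge."""
    return path_edges(n) | {(0, n - 1)}  # connect the last node to the first


def complete_edges(n: int) -> set:
    """Edges of a complete graph: one edge per pair of different nodes."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}


print(len(path_edges(4)), len(cycle_edges(4)), len(complete_edges(4)))  # 3 4 6
```

A null graph with the same nodes would simply have an empty set of edges.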

A graph is **dense** if it has many of the possible edges;
it's **sparse** if it has few of the possible edges. There's no precise
definition to classify each graph as sparse, dense or neither.
Complete graphs are the densest graphs; null graphs are the sparsest graphs.

A **subgraph** of a graph G consists of only a subset of the nodes and edges of G.

### 17.9.2 Data structures

An **edge list** representation consists of one collection containing the nodes and
one collection containing the edges. Several data structures are possible,
depending on the type of the collections, e.g. sequences or sets, and how they are stored,
e.g. as arrays, linked lists, lookup tables or hash tables.

An **adjacency matrix** representation consists of an *n* by *n* matrix that
represents all possible edges and a lookup table that maps the indices to nodes.
The cell in the row for node A and in the column for node B is a Boolean
indicating if there's an edge from A to B.
Adjacency matrices waste memory for sparse graphs, because most cells are false,
and for undirected graphs, because each edge is stored twice.
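A minimal sketch of an adjacency matrix for a small digraph might look like this. The variable and function names are hypothetical, and the reverse map from nodes to indices is an extra assumption added for convenience.

```python
nodes = ["A", "B", "C"]  # lookup table: index -> node
index = {node: i for i, node in enumerate(nodes)}  # assumed reverse map: node -> index
n = len(nodes)
matrix = [[False] * n for _ in range(n)]  # n by n matrix, initially without edges


def add_edge(a, b) -> None:
    """Record a directed edge from a to b."""
    matrix[index[a]][index[b]] = True


def has_edge(a, b) -> bool:
    """Check the cell in the row for a and the column for b."""
    return matrix[index[a]][index[b]]


add_edge("A", "B")
add_edge("C", "A")
print(has_edge("A", "B"), has_edge("B", "A"))  # True False
```

Even in this tiny example, seven of the nine cells only record the absence of an edge.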

An **adjacency list** representation consists of a map where the keys are nodes
and the values are their out-neighbours (and possibly their in-neighbours too).
Each value is the adjacency list for the corresponding node.
Several data structures are possible to store the map and each adjacency list.
For example, the map can be implemented with a lookup table, hash table or BST,
and an adjacency list can be represented as a sequence or a set.

Of all these possibilities, the most efficient for most graphs is
an adjacency list representation in which the map and the adjacency lists are
stored in hash tables. This requires the representation of nodes to be hashable.
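One possible way to sketch such a representation in Python is a dictionary that maps each node to the set of its out-neighbours. The class below is a simplified illustration under that assumption, not the M269 implementation; its name and method names are hypothetical.

```python
class DiGraph:
    """A digraph stored as a dictionary of nodes to sets of out-neighbours."""

    def __init__(self) -> None:
        self.out = dict()

    def add_node(self, node) -> None:
        if node not in self.out:
            self.out[node] = set()

    def add_edge(self, start, end) -> None:
        """Add an edge; this sketch assumes both nodes are already in the graph."""
        self.out[start].add(end)

    def has_edge(self, start, end) -> bool:
        return end in self.out[start]

    def out_neighbours(self, node) -> set:
        return set(self.out[node])  # copy, so callers can't modify the graph

    def in_neighbours(self, node) -> set:
        # Only out-neighbours are stored, so every node must be checked.
        return {other for other in self.out if node in self.out[other]}

    def out_degree(self, node) -> int:
        return len(self.out[node])

    def in_degree(self, node) -> int:
        return len(self.in_neighbours(node))


graph = DiGraph()
for node in "ABC":
    graph.add_node(node)
graph.add_edge("A", "B")
graph.add_edge("C", "B")
print(sorted(graph.in_neighbours("B")), graph.out_degree("B"))  # ['A', 'C'] 0
```

Because only the out-neighbours are stored, operations on in-neighbours have to go through all nodes, whereas operations on out-neighbours look at a single dictionary entry.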

In M269, the undirected and directed graph ADTs have the following operations,
listed with their worst-case complexities.
When referring to graphs,
we use *n* for the number of nodes and *e* for the number of edges.

Operation | Written as | Undirected | Directed
:-|:-|:-|:-
new | let G be an empty graph | Θ(1)| Θ(1)
has node | A in G | Θ(1)| Θ(1)
add node | add A to G | Θ(1)| Θ(1)
remove node | remove A from G | Θ(*n*) | Θ(*n*)
has edge | (A, B) in G | Θ(1)| Θ(1)
add edge | add (A, B) to G | Θ(1)| Θ(1)
remove edge | remove (A, B) from G | Θ(1)| Θ(1)
nodes | nodes of G | Θ(*n*) | Θ(*n*)
edges | edges of G | Θ(*e*) | Θ(*e*)
in-neighbours | in-neighbours of A in G | Θ(degree(A)) | Θ(*n*)
out-neighbours | out-neighbours of A in G | Θ(degree(A)) | Θ(out-degree(A))
neighbours | neighbours of A in G | Θ(degree(A)) | Θ(*n*)
in-degree | in-degree of A in G | Θ(1) | Θ(*n*)
out-degree | out-degree of A in G | Θ(1) | Θ(1)
degree | degree of A in G | Θ(1) | Θ(*n*)

### 17.9.3 Traversals

A **traversal** of a graph starts from a given node and follows the outgoing edges
to visit every node reachable from the start.
The algorithm keeps a collection of yet-unprocessed edges.
While the collection isn't empty, it removes one edge from the collection.
If it leads to a yet-unvisited node, the edge is followed, the node is visited,
and its outgoing edges are added to the unprocessed collection.
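The general algorithm can be sketched as follows, using a set for the unprocessed edges. The function name is hypothetical and, to keep the example self-contained, we assume the graph is given as a dictionary mapping each node to a collection of its out-neighbours.

```python
def traversal(graph: dict, start) -> set:
    """Return the nodes reachable from start, including start itself."""
    visited = {start}
    # Each edge is a pair (node, out-neighbour of that node).
    unprocessed = {(start, neighbour) for neighbour in graph[start]}
    while unprocessed:
        (previous, current) = unprocessed.pop()  # remove an arbitrary edge
        if current not in visited:
            visited.add(current)  # follow the edge and visit the node
            for neighbour in graph[current]:
                unprocessed.add((current, neighbour))
    return visited


graph = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": {"A"}, "E": {"A"}}
print(sorted(traversal(graph, "A")))  # ['A', 'B', 'C', 'D']
```

Node E isn't visited because, although it has an edge to A, there's no path from A to E.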

If the collection is a set, the traversal visits nodes in no particular order.

If the collection is a FIFO sequence (queue), the nodes are visited in
breadth-first order: first the out-neighbours of the start node, then
their out-neighbours, and so on.
This finds the shortest paths from the start node to each other reachable node.
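A breadth-first traversal can be sketched with `deque` from module `collections` as the queue. This hypothetical function returns the length of a shortest path to each reachable node, and again assumes the graph is a dictionary of nodes to collections of out-neighbours.

```python
from collections import deque


def bfs_distances(graph: dict, start) -> dict:
    """Return the fewest edges needed to reach each node from start."""
    distance = {start: 0}  # also serves as the collection of visited nodes
    unprocessed = deque((start, neighbour) for neighbour in graph[start])
    while unprocessed:
        (previous, current) = unprocessed.popleft()  # oldest edge first (FIFO)
        if current not in distance:
            distance[current] = distance[previous] + 1
            for neighbour in graph[current]:
                unprocessed.append((current, neighbour))
    return distance


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(bfs_distances(graph, "A"))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```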

If the collection is a LIFO sequence (stack), the nodes are visited in
depth-first order: first one out-neighbour of the start node, then
one of its out-neighbours, etc.
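A depth-first traversal can be sketched in the same way, with a Python list as the stack. The function name and the dictionary-based graph are assumptions made for this example.

```python
def dfs_order(graph: dict, start) -> list:
    """Return the nodes reachable from start, in the order they are visited."""
    order = [start]
    visited = {start}  # a set allows fast membership checks
    unprocessed = [(start, neighbour) for neighbour in graph[start]]
    while unprocessed:
        (previous, current) = unprocessed.pop()  # newest edge first (LIFO)
        if current not in visited:
            visited.add(current)
            order.append(current)
            for neighbour in graph[current]:
                unprocessed.append((current, neighbour))
    return order


graph = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}
print(dfs_order(graph, "A"))  # ['A', 'C', 'D', 'B']
```

In this example, the traversal goes from A to C and then on to D before it returns to A's other out-neighbour, B.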

All three variants traverse an acyclic subgraph of the input graph.
They all take constant time in the best case and Θ(*n* + *e*) in the worst case.
They can all be used to decide if a non-empty undirected graph is connected:
it is if a traversal from any node visits all nodes.
A connected graph with *n* nodes is a tree if it also has *n* – 1 edges.
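The connectivity check can be sketched as below: traverse from any node and compare the number of visited nodes to the number of nodes. The tree check relies on the fact that a connected undirected graph with *n* nodes and *n* – 1 edges is acyclic. All function names are hypothetical, and we assume a non-empty undirected graph given as a dictionary mapping each node to the set of its neighbours.

```python
def reachable(graph: dict, start) -> set:
    """Return the nodes visited by a traversal from start."""
    visited = {start}
    unprocessed = [(start, neighbour) for neighbour in graph[start]]
    while unprocessed:
        (previous, current) = unprocessed.pop()
        if current not in visited:
            visited.add(current)
            unprocessed.extend((current, neighbour) for neighbour in graph[current])
    return visited


def is_connected(graph: dict) -> bool:
    """Check if all nodes are reachable from an arbitrary start node."""
    start = next(iter(graph))  # assumes the graph is non-empty
    return len(reachable(graph, start)) == len(graph)


def is_tree(graph: dict) -> bool:
    """Check if the graph is connected and has n - 1 edges."""
    # Each undirected edge appears in the neighbour sets of both its nodes.
    edges = sum(len(neighbours) for neighbours in graph.values()) // 2
    return is_connected(graph) and edges == len(graph) - 1


star = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
print(is_tree(star), is_tree(triangle))  # True False
```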

### 17.9.4 Python

Class `Hashable` in module `typing` can be used to annotate function parameters
as being hashable objects.
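For example, a function that adds a node to a dictionary-based graph could annotate the node as hashable, because it's used as a dictionary key. The function below is a hypothetical sketch; like other type annotations, `Hashable` documents the intent but isn't checked when the code runs.

```python
from typing import Hashable


def add_node(graph: dict, node: Hashable) -> None:
    """Add the node to the graph, without neighbours, if it isn't there yet."""
    if node not in graph:
        graph[node] = set()


graph = dict()
add_node(graph, "A")     # strings are hashable
add_node(graph, (1, 2))  # so are tuples of hashable items
print(graph)  # {'A': set(), (1, 2): set()}
```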

Each call to function `random` in module `random` returns a random
floating-point number from 0 (inclusive) to 1 (exclusive).
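One use of `random` is to generate a random graph, by including each possible edge with the same probability. The sketch below is only an illustration: the function name is hypothetical, the nodes are assumed to be the integers 0 to *n* – 1, and the graph is assumed to be undirected.

```python
from random import random


def random_edges(n: int, probability: float) -> set:
    """Return the edges of a random undirected graph with nodes 0 to n-1."""
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):  # each pair of different nodes, once
            if random() < probability:
                edges.add((i, j))
    return edges


print(random_edges(5, 0.5))  # the output varies from run to run
```

Since the returned number is never 1, a probability of 1 produces a complete graph, while a probability of 0 produces a null graph.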

In a method call, replacing `self.` with `super().` allows a subclass to
call the methods of its superclass, even those it overrides.
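As an illustration, an undirected graph class could inherit from a digraph class and add each edge in both directions. Both classes below are simplified, hypothetical sketches rather than the M269 code.

```python
class DiGraph:
    """A minimal digraph: a dictionary of nodes to sets of out-neighbours."""

    def __init__(self) -> None:
        self.out = dict()

    def add_node(self, node) -> None:
        self.out.setdefault(node, set())

    def add_edge(self, start, end) -> None:
        self.out[start].add(end)

    def has_edge(self, start, end) -> bool:
        return end in self.out[start]


class UndirectedGraph(DiGraph):
    """An undirected graph, stored as a digraph with opposing edges."""

    def add_edge(self, node1, node2) -> None:
        # self.add_edge(...) would call this same method again, endlessly;
        # super().add_edge(...) calls the method defined in DiGraph.
        super().add_edge(node1, node2)
        super().add_edge(node2, node1)


graph = UndirectedGraph()
graph.add_node("A")
graph.add_node("B")
graph.add_edge("A", "B")
print(graph.has_edge("B", "A"))  # True
```

The subclass only redefines `add_edge`; methods like `add_node` and `has_edge` are inherited unchanged.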
