In [1]:
# automatically reload dependent notebooks
%load_ext autoreload
%autoreload 2
import import_ipynb

# Graph Representation

Mathematically, a graph $G = (V, E)$ is a tuple of the vertex set $V$ and the edge set $E$. So, we begin by defining the `Vert` and the `Edge` types and their respective sets. There are two common representations of graphs: adjacency matrix and adjacency list. The adjacency matrix representation is convenient for dense graphs. But most graph algorithms operate on relatively sparse graphs. So, we shall use the adjacency list representation, here. See §20.1 *Representations of graphs* p.549.
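
To make the trade-off concrete, here is a minimal sketch that stores the same small digraph both ways. The four-vertex graph and the names `vertices`, `edges`, `matrix`, and `adj` are made up for illustration and are not part of this notebook's `Vert`/`Edge` types.

```python
# Hypothetical four-vertex digraph, used only for illustration.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]

# Adjacency matrix: always |V| x |V| cells, however few edges exist.
index = {tag: i for i, tag in enumerate(vertices)}
matrix = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
  matrix[index[u]][index[v]] = 1

# Adjacency list: one entry per vertex plus one entry per edge.
adj = {tag: [] for tag in vertices}
for u, v in edges:
  adj[u].append(v)

print(matrix)  # 16 cells, only 3 of them set
print(adj)     # 4 lists holding 3 neighbours in total
```

The matrix grows with $|V|^2$ whereas the lists grow with $|V| + |E|$, which is why the list form suits sparse graphs.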

Do note that because this notebook defines the fundamental graph representations, we factored out the tests, which perform printing and drawing, into the [`graphtest.ipynb`](./graphtest.ipynb) notebook.

## *vertex*

First, we define the `Vert` type. Because graph algorithms label the vertices and edges, we derive `Vert` from the `Tagged` base type, which is defined in the [`util.ipynb`](./util.ipynb) notebook. These algorithms also store some state data in each vertex to keep track of progress as they explore the input graph. The `par` variable points to the parent vertex, the one from which the current vertex was discovered. But since a root vertex has no parent, we assign this variable the `Option[Vert]` type. The `Option` utility type is also defined in the `util` notebook. During construction of the vertex, we initialise `par` to `None`. The `dis` variable is the time the vertex was discovered (BFS stores the edge distance from the source here, instead). It is set to `Infinity` during initialisation. The `fin` variable is the time when processing of the vertex finished. It is used only by DFS, and is initialised to `-Infinity`, since a valid finish time can never be smaller than a valid discover time. And the `col` variable is the vertex colour. `White` means the vertex has not yet been discovered. `Gray` means the vertex has been discovered. And `Black` means the vertex has been fully processed. All vertices are `White` at the start of a graph search algorithm. Here, we added `init()`, so that we may call it to reinitialise the vertex, later.

If we were being pedantic, we would define the `BFVertex` type for breadth-first search (BFS) and the `DFVertex` type for depth-first search (DFS), and `fin` would be a member only of `DFVertex`. But since these are simple, elementary algorithms used everywhere, it is better not to overcomplicate the matter and to settle for a less elegant but more convenient combined type `Vert`.

In [2]:
from util import *

class VColor:
  White = "White"
  Gray = "Gray"
  Black = "Black"

class Vert(Tagged):
  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.par: Option[Vert] = None
    self.dis = Infinity
    self.fin = -Infinity
    self.col = VColor.White
  def init(self) -> None: self.__init__(self.tag)

  def __str__(self) -> str: return f"{self.tag} {self.showParent()} {self.showTimes()}"
  def show(self) -> str: return f"{self.tag}{f' {self.showTimes()}' if self.dis != Infinity else ''}"
  def showParent(self) -> str: return "^" + self.par.tag if not self.isRoot() else "None"
  def showTimes(self) -> str: return f"{self.dis if self.dis != Infinity else ''}{f'/{self.fin}' if self.fin != -Infinity else ''}"

  def isRoot(self) -> bool: return isNone(self.par)

importing Jupyter notebook from util.ipynb


## *edge*

An edge merely points to the from-vertex $u$ from which it originates and the to-vertex $v$ at which it terminates. For debugging and logging convenience, we define the edge tag to be `utag` and `vtag` joined by `-`. We use the `makeETag()` utility function to make the edge tag from the vertex tags, during the construction of the edge.

Also, some graph search algorithms produce a forest of trees, and their edges must be categorised as described in *Classification of edges* p.569: tree edge `T`, back edge `B`, forward edge `F`, and cross edge `C`. These four classification labels are defined in the `EClass` enumeration, and they are stored in the `cls` attribute. We additionally define the `X` label to initialise `cls` during the construction of an edge.

As we did with `Vert`, we inject these oft-used attributes into `Edge`, instead of polluting the code with additional edge types.

In [3]:
from typing import Tuple

class EClass:
  X = "X"  # don't care
  T = "T"  # tree edge
  B = "B"  # back edge
  F = "F"  # forward edge
  C = "C"  # cross edge

class Edge(Tagged):
  def __init__(self, u: Vert, v: Vert):
    super().__init__(makeETag(u, v))
    self.u = u
    self.v = v
    self.cls = EClass.X
  def init(self) -> None: self.__init__(self.u, self.v)

  def __str__(self) -> str: return f"{self.tag} {self.showClassification()}"
  def show(self) -> str: return f"{self.showClassification()}"
  def showClassification(self) -> str: return self.cls if self.cls != EClass.X else ""

  def isSelfLoop(self) -> bool: return self.u == self.v

def makeETag(u: Vert, v: Vert) -> Tag: return f"{u.tag}-{v.tag}"

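def parseETag(etag: Tag) -> Tuple[Tag, Tag]:
  utag, vtag = etag.split("-")  # exactly one "-" is expected in an edge tag
  return (utag, vtag)  # return a tuple, as the annotation promises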
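
Because the edge tag is built by joining two vertex tags with `-`, the scheme only round-trips when the vertex tags themselves contain no hyphen. The sketch below illustrates this with plain strings. The snake-case names `make_etag` and `parse_etag` are stand-ins chosen for this example, not the notebook's own functions.

```python
from typing import Tuple

def make_etag(utag: str, vtag: str) -> str:
  # join the two vertex tags with a hyphen
  return f"{utag}-{vtag}"

def parse_etag(etag: str) -> Tuple[str, str]:
  # unpacking raises ValueError unless the tag splits into exactly two parts
  utag, vtag = etag.split("-")
  return utag, vtag

print(parse_etag(make_etag("u", "v")))  # round trip succeeds

try:
  parse_etag(make_etag("my-vertex", "v"))  # a hyphenated vertex tag is ambiguous
except ValueError as err:
  print("cannot parse:", err)
```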

## *graph*

With `Vert` and `Edge` in hand, we now define `VSet` and `ESet`. We could use the Python `Set` type here, but since we wish to retrieve vertices and edges by their tags, we use the Python `Dict` type instead.

In [4]:
from typing import Dict, TypeVar

β = TypeVar("β")
VSet = Dict[Tag, Vert]

ϵ = TypeVar("ϵ")
ESet = Dict[Tag, Edge]

Since graph search algorithms produce trees as their results, we need to define the `Tree` type, as well. But a tree is conveniently represented as a graph. So, we define the `VE` type as the base class for both `Graph` and `Tree`. We could use `Tuple[VSet, ESet]` to represent `VE`; that would be mathematically accurate. Computationally, though, it is more convenient to derive `VE` from `Tagged`. We may then use the tag in debugging and logging graphs and trees.

In [5]:
from functools import reduce  # used by isAncestor() below
from typing import Generic, List

class VE(Tagged, Generic[β, ϵ]):
  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.vv: VSet = {}
    self.ee: ESet = {}
  def init(self) -> None:
    for u in self.getVV(): u.init()
    for e in self.getEE(): e.init()

  def makeVE(self, vt: List[Tag], et: List[Tag]) -> None:
    self.makeV(vt)
    self.makeE(et)
  def makeV(self, vt: List[Tag]) -> None:
    for vtag in vt: self.vv[vtag] = Vert(vtag)
  def makeE(self, et: List[Tag]) -> None:
    for etag in et:
      [utag, vtag] = parseETag(etag)
      e = Edge(self.getV(utag), self.getV(vtag))
      self.ee[e.tag] = e

  def __str__(self) -> str:
    return self.tag + "\n" + self.showVertices() + "\n" + self.showEdges()
  def showVertices(self) -> str:
    def neighbors(u: β) -> str: return ",".join([v.tag for v in self.adj(u)])
    return "\n".join([f"  {u}\n    [{neighbors(u)}]" for u in self.getVV()])
  def showEdges(self) -> str:
    return "\n".join([f"  {e}" for e in self.getEE()])

  # vertex

  def insV(self, v: β) -> None:
    self.vv[v.tag] = v
  def delV(self, v: β) -> None:
    self.vv.pop(v.tag)
  def dupVV(self, vv: VSet) -> None:
    self.vv = {**vv}
  def getV(self, vtag: Tag) -> β:
    return self.vv[vtag]
  def getVV(self) -> List[β]:
    return list(self.vv.values())
  def numVV(self) -> int:
    return len(self.getVV())
  def adj(self, u: β) -> List[β]:
    return [self.getV(e.v.tag) for e in self.getEE() if e.u.tag == u.tag]
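  def path(self, s: β, v: β) -> List[β]:
    # see p.562; vertices are listed from v back to s
    if v == s: return [s]  # test for s before isRoot(), since the source is itself a root
    return [] if v.isRoot() else [v, *self.path(s, v.par)]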
  def isAncestor(self, u: β, v: β) -> bool:
    # check if vertex u is the ancestor of vertex v (there exists a path from u ~> v)
    def reachable(a: β) -> bool:
      # check if a can reach v
      aa = self.adj(a)  # edge a -> v
      return True if v in aa else reduce(lambda acc, b: acc or reachable(b), aa, False)  # path a ~> v
    return True if u == v else reachable(u)  # self-loop or path
  def isDescendant(self, v: β, u: β) -> bool:
    return self.isAncestor(u, v)
  def hasV(self, vtag: Tag) -> bool:
    return vtag in self.vv

  # edge

  def insE(self, e: ϵ) -> None:
    self.ee[e.tag] = e
  def delE(self, e: ϵ) -> None:
    self.ee.pop(e.tag)
  def dupEE(self, ee: ESet) -> None:
    self.ee = {**ee}
  def getE(self, etag: Tag) -> ϵ:
    return self.ee[etag]
  def getEE(self) -> List[ϵ]:
    return list(self.ee.values())
  def numEE(self) -> int:
    return len(self.getEE())
  def hasE(self, etag: Tag) -> bool:
    return etag in self.ee

Note that we use the double-letter variables `vv` and `ee` to represent the plurals "vertices" and "edges". This mirrors a convention in the [*Blue Book* legal citation](https://guides.library.harvard.edu/law/bluebook), where a single-page citation is written "p.*n*", and a multi-page citation is written either "pp.*m*,*n*" for discontiguous pages or "pp.*m*-*n*" for contiguous pages.

Above, `adj(u)` returns vertex $u$'s adjacency list, which is a list of all the neighbouring vertices. To be consistent with our own naming convention, we should call it `adjVV()`. But we chose to follow CLRS's naming convention and call it `adj()`, instead. For convenience, we wish to construct graphs and trees from textual configuration data comprising a list of vertex tags `List[Tag] = ["u", "v", ...]` and a list of edge tags `List[Tag] = ["u-a", "u-b", "v-c", ...]`. The utility function `makeVE()` interprets the configuration data and constructs the vertices and the edges. Since `makeVE()` returns `None`, we create and configure a graph in two steps: `g = Graph("g")` followed by `g.makeVE(vt, et)`.

Now that we have an adequate `VE`, we may define `Graph` and `Tree` atop it. Since all the functionalities have been implemented in the base class, the derived classes are just empty tags.

In [6]:
class Graph(VE): pass
class Tree(VE): pass

## *visualisation*

In tests, we use the *Graphviz* graph visualisation library to draw the graph `bg`. We use the `sfdp` scalable force-directed placement layout engine, because it yields the most pleasing arrangement of vertices. See [*Graphviz documentation*](https://graphviz.org/) for details.

In [7]:
from typing import Union
import graphviz as V

def draw(g: Union[Graph, Tree], directed: bool, label: str = "", engine: str = "sfdp") -> Union[V.Graph, V.Digraph]:
  # returned gv must be evaluated in the top level scope for it to render in the notebook
  gv = V.Digraph(engine=engine) if directed else V.Graph(engine=engine)
  gv.attr(label=label if label != "" else g.tag)
  for v in g.getVV(): gv.node(v.tag, label=v.show(), shape=f"{'rectangle' if v.isRoot() else 'ellipse'}")
  ed = {}  # already drawn edges
  for e in g.getEE():
    vutag = makeETag(e.v, e.u)
    if directed or vutag not in ed.keys():  # for undirected graph, avoid drawing (u, v) if (v, u) has already been drawn
      gv.edge(e.u.tag, e.v.tag, label=e.show())
      ed[e.tag] = e
  return gv

# Elementary Graph Algorithms

If you have not read CLRS 4ed Chapter 20 *Elementary Graph Algorithms*, read it now, before proceeding with this notebook. The chapter presents two graph algorithms, breadth-first search (BFS) and depth-first search (DFS), which are used by just about every graph application. These algorithms work on both directed and undirected graphs.

## *breadth-first search*

BFS discovers a breadth-first tree (BFT) within graph $G = (V, E)$, starting from the source vertex $s$, exploring outward one edge-distance at a time, until all the vertices reachable from $s$ have been explored. See §20.2 *Breadth-first search* p.554.

Since the `Graph` constructor has already initialised the vertex and edge attributes, it may seem superfluous to reinitialise the vertices, here. But this step is actually necessary, since we may wish to call `bfs()` multiple times on the same graph. The `Gray` vertices, which have been discovered but have not finished processing, are stored in a first-in, first-out (FIFO) queue described in §10.1.3 *Stacks and queues* p.254.

The purpose of BFS is to discover a breadth-first tree (BFT) within the graph. So, we implement `bft()`. This function accepts a graph, runs `bfs()` on the graph, and extracts a BFT therefrom. Since both BFS and DFS use the same initialisation sequence, we will implement the `init()` function, first.

In [8]:
from queue import Queue

def init(g: Graph) -> None:
  for u in g.getVV():
    u.par = None
    u.dis = Infinity
    u.col = VColor.White

def bfs(g: Graph, s: Vert) -> Graph:
  def explore() -> Graph:
    if q.empty(): return g
    u = q.get()
    for v in g.adj(u):
      if v.col == VColor.White:
        # v discovered
        v.par = u
        v.dis = u.dis + 1
        v.col = VColor.Gray
        q.put(v)
    # u finished
    u.col = VColor.Black
    return explore()

  # initialize
  init(g)
  # s discovered
  s.par = None
  s.dis = 0
  s.col = VColor.Gray
  # search g
  q = Queue()
  q.put(s)
  return explore()

def bft(g: Graph, s: Vert) -> Tree:
  g = bfs(g, s)
  t = Tree(f"{g.tag}†")
  for u in g.getVV():
    if u == s or not u.isRoot(): t.insV(u)
  for u in t.getVV():
    if not u.isRoot():
      e = g.getE(makeETag(u.par, u))
      t.insE(e)
  return t

Above, we call the `explore()` inner function recursively, instead of using the `while` loop as CLRS does. When studying algorithms, indeed for all things mathematical, it is essential to be comfortable with recursion. Recursive expressions can be understood by visual inspection, whereas to understand loopy statements, one must mentally execute the sequence of statements. Familiarity with recursion is also becoming important to programmers now, since many modern programming languages are OO-FP hybrids with good compilers that can optimise tail-call recursions into jump instructions, thereby eliminating the attendant function call overhead. Python's support for recursion is pitiful, for sure: CPython performs no tail-call optimisation and caps the recursion depth (1000 frames by default), and since `explore()` recurses once per dequeued vertex, this `bfs()` suits only small graphs. But here, the emphasis is on studying algorithms, not efficiency. If efficiency is our top priority, we should not use Python in the first place.
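
For comparison, the loop-based formulation that CLRS uses can be sketched as follows. This is an illustrative variant over a plain dictionary of adjacency lists, not the notebook's `Graph` type. The name `bfs_iterative` and the sample graph are assumptions made for this example, and `collections.deque` stands in for the FIFO queue.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

def bfs_iterative(adj: Dict[str, List[str]], s: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
  # dis[v] is the edge distance from s; par[v] is the vertex from which v was discovered
  dis: Dict[str, float] = {v: float("inf") for v in adj}
  par: Dict[str, Optional[str]] = {v: None for v in adj}
  dis[s] = 0
  q = deque([s])  # FIFO queue of discovered but unfinished vertices
  while q:
    u = q.popleft()
    for v in adj[u]:
      if dis[v] == float("inf"):  # v is still undiscovered
        dis[v] = dis[u] + 1
        par[v] = u
        q.append(v)
  return dis, par

# Hypothetical graph: x is unreachable from s.
adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": [], "x": []}
dis, par = bfs_iterative(adj, "s")
print(dis)
print(par)
```

The loop needs no call stack, so its depth does not grow with the number of vertices.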

## *depth-first search*

DFS discovers a depth-first forest (DFF) within the graph by starting from some vertex $u$ and following the outbound edge $(u, v)$ to the neighbour vertex $v$, then proceeding as far as possible before backtracking to $u$ and trying another neighbour, and continuing in this manner until all the vertices have been explored. See §20.3 *Depth-first search* p.563.

Since the point of DFS is to discover a DFF, we implement `dff()`, as well. And since a DFF is pointless without edge classification, we implement edge classification within `dfs()` as described on CLRS p.570. Our implementation of DFS, therefore, is slightly more complicated than that described on CLRS p.565, but this is a necessary, minor departure.

In [9]:
from functools import reduce

def dfs(g: Graph) -> Graph:
  def explore(u: Vert) -> None:
    time[0] += 1
    # u discovered
    u.dis = time[0]
    u.col = VColor.Gray
    for v in g.adj(u):
      e = g.getE(makeETag(u, v))
      if v.col == VColor.Black:
        e.cls = EClass.F if u.dis < v.dis else EClass.C
      elif v.col == VColor.Gray:
        e.cls = EClass.B
      elif v.col == VColor.White:
        v.par = u
        e.cls = EClass.T
        explore(v)
    time[0] += 1
    # u finished
    u.fin = time[0]
    u.col = VColor.Black

  # initialize
  init(g)
  # search g
  time = [0]  # use a one-element list instead of a scalar, so that explore() can mutate time
  for u in g.getVV():
    if u.col == VColor.White: explore(u)
  return g

def dff(g: Graph) -> Graph:
  g = dfs(g)
  f = Graph(f"{g.tag}†")
  f.dupVV(g.vv)
  for v in f.getVV():
    if not v.isRoot():
      e = g.getE(makeETag(v.par, v))
      f.insE(e)
  return f

As we had done with BFS, we use the `explore()` recursive inner function to perform the search. CLRS uses the `time: int` global variable. We do not. Instead, we define a local `time: List[int]` in `dfs()`, and update it from within `explore()`. The inner function `explore()` can access variables defined in its outer function `dfs()`. This is called the *closure* property of functions. Python allows an inner function to read closure variables, but a plain assignment to such a name inside the inner function creates a new local variable instead of rebinding the outer one, unless the name is declared `nonlocal`. If a closure variable references an object, however, the inner function may mutate the contents of that object without any declaration. This is the case here: the `time` variable in `explore()` references a list object allocated in `dfs()`, so we may mutate the element `time[0]` without rebinding `time` itself. This is an inelegant solution, but it allows us to follow the CLRS description.
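
As an alternative to the one-element list, Python's `nonlocal` statement lets an inner function rebind a variable of its enclosing function. The sketch below applies it to a stripped-down DFS over a plain dictionary of adjacency lists, with the same edge classification rules as `dfs()` above. The name `dfs_times`, the dictionaries `dis`, `fin`, and `cls`, and the sample graph are assumptions made for this example.

```python
from typing import Dict, List, Tuple

def dfs_times(adj: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, int], Dict[Tuple[str, str], str]]:
  dis: Dict[str, int] = {}  # discovery times
  fin: Dict[str, int] = {}  # finish times
  cls: Dict[Tuple[str, str], str] = {}  # edge classification
  time = 0

  def explore(u: str) -> None:
    nonlocal time  # rebind the variable defined in dfs_times()
    time += 1
    dis[u] = time
    for v in adj[u]:
      if v not in dis:  # undiscovered: tree edge
        cls[(u, v)] = "T"
        explore(v)
      elif v not in fin:  # discovered but unfinished: back edge
        cls[(u, v)] = "B"
      else:  # finished: forward or cross edge
        cls[(u, v)] = "F" if dis[u] < dis[v] else "C"
    time += 1
    fin[u] = time

  for u in adj:
    if u not in dis: explore(u)
  return dis, fin, cls

# Hypothetical digraph that exhibits all four edge classes.
adj = {"a": ["b", "c"], "b": ["c", "a"], "c": [], "d": ["c"]}
dis, fin, cls = dfs_times(adj)
print(dis, fin, cls)
```

Without the `nonlocal` declaration, `time += 1` inside `explore()` would raise `UnboundLocalError`, because the assignment makes `time` local to `explore()`.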

# Applications of Elementary Graph Algorithms

CLRS Chapter 20 presents two applications: topological sort (TSort) and strongly connected components (SCC).

## *topological sort*

TSort applies DFS to a directed acyclic graph (DAG) to obtain a linear ordering of the vertices. Many activities, from simple tasks like cooking and cleaning to intricate processes like surgical procedures and software compilation, depend on steps being performed in a particular order. A DAG can describe task dependencies, where vertices $u$ and $v$ are tasks and an edge $(u, v)$ indicates that $u$ must be performed before $v$. TSort produces a sensible order of such tasks by applying DFS to the DAG `sg`, then sorting the vertices in the descending order of their finish times. See §20.4 *Topological sort* p.573.

In [10]:
def tsort(g: Graph) -> List[Vert]:
  g = dfs(g)
  return sorted(g.getVV(), key=lambda u: u.fin, reverse=True)
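
To see the ordering on a concrete case, here is a minimal sketch over a plain dictionary of adjacency lists. The function `topological_order` and the cooking tasks are made up for this example, and an edge $(u, v)$ is taken to mean that $u$ comes before $v$. Appending each vertex as it finishes and then reversing the list is equivalent to sorting by descending finish time.

```python
from typing import Dict, List, Set

def topological_order(adj: Dict[str, List[str]]) -> List[str]:
  finished: List[str] = []  # vertices in increasing order of finish time
  seen: Set[str] = set()

  def explore(u: str) -> None:
    seen.add(u)
    for v in adj[u]:
      if v not in seen: explore(v)
    finished.append(u)  # u finishes after all its descendants

  for u in adj:
    if u not in seen: explore(u)
  return finished[::-1]  # descending finish time

# Hypothetical task dependencies; the input is assumed to be acyclic.
adj = {"shop": ["chop", "boil"], "chop": ["cook"], "boil": ["cook"], "cook": ["serve"], "serve": []}
order = topological_order(adj)
print(order)
```

Every task appears in `order` ahead of the tasks that depend on it.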

## *strongly connected components*

Strongly connected components (SCC) of a directed graph (digraph) are maximal sets of vertices that are reachable from each other. See Appendix B.4 *Graphs* pp.1164-1168. CLRS §20.5 *Strongly connected components* presents an algorithm that applies DFS twice to extract the SCCs from a digraph. But first, we define the `Comp` component type, which is a `Vert` that contains a set of strongly connected vertices. We also define the `makeCTag()` utility function that forms the component's tag by joining the tags of its constituent vertices.

In [11]:
class Comp(Vert):
  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.vv: VSet = {}  # strongly connected vertices
  def init(self) -> None: self.__init__(self.tag)

  def insVV(self, vv: List[Vert]) -> None:
    for u in vv: self.vv[u.tag] = u
  def getVV(self) -> List[Vert]: return list(self.vv.values())

def makeCTag(vv: List[Vert]) -> Tag: return "+".join([v.tag for v in vv])

Now, we implement the SCC algorithm. To compute SCC according to the CLRS algorithm presented on p.577, we require three utility functions: `transpose()`, which reverses the edges of a digraph; `sort()`, which sorts the vertices by some attribute; and `contract()`, which merges the strongly connected vertices into components by contracting the edges. `contract()` uses the DFF to contract its input graph. The contraction of an undirected graph is given on p.1168.

In [12]:
from typing import Callable

def scc(g: Graph) -> Graph:
  g = dfs(g)
  r = transpose(g)
  s = sort(r, attr=lambda u: u.fin, reverse=True)  # descending sort of vertices by finish times
  s = dfs(s)
  f = dff(s)
  return contract(g, f)

def transpose(g: Graph) -> Graph:
  # reverse edges
  r = Graph(f"{g.tag}!")
  r.dupVV(g.vv)
  for e in g.getEE(): r.insE(Edge(e.v, e.u))  # flip (u, v) to (v, u)
  return r

def sort(g: Graph, attr: Callable[[Vert], int], reverse: bool = False) -> Graph:
  # sort vertices
  s = Graph(f"{g.tag}§")
  for u in sorted(g.getVV(), key=attr, reverse=reverse): s.insV(u)  # sorted vertices
  s.dupEE(g.ee)
  return s

def contract(g: Graph, f: Graph) -> Graph:
  # contract DFS g using DFF f
  def scv(u: Vert) -> List[Vert]:
    # all descendants of vertex u in DFF f, in preorder; a DFF tree may branch, so every child is followed
    return [w for v in f.adj(u) for w in [v, *scv(v)]]

  c = Graph(f"{g.tag}₵")  # SCC c
  # create vertices of SCC c
  for r in [v for v in g.getVV() if v.isRoot()]:  # for each root vertex r in DFS g
    vv = [r, *scv(r)]  # strongly connected vertices rooted at vertex r
    x = Comp(makeCTag(vv))  # create component x by merging strongly connected vertices vv
    x.insVV(vv)
    c.insV(x)  # insert component x into SCC c
  # create edges of SCC c
  cc: List[Comp] = c.getVV()  # components of SCC c
  for x in cc:  # for each component x in SCC c
    aa: VSet = {}  # adjacent vertices of component x in DFS g
    vv = x.getVV()  # constituent vertices of component x
    for u in vv:  # for each constituent vertex u of component x
      for v in [a for a in g.adj(u) if a not in vv]: aa[v.tag] = v  # for each (u, v) leaving component x
    for a in aa.values():  # for each adjacent vertex a of component x
      for y in [b for b in cc if b != x]:  # for every other component y in SCC c
        if a in y.getVV():  # component y is adjacent to component x
          c.insE(Edge(x, y))  # insert edge (x, y) into SCC c
  return c
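
The two-pass idea can also be seen in isolation, without the `Graph`, `Comp`, and contraction machinery. The following is a minimal sketch over a plain dictionary of adjacency lists. The name `strongly_connected` and the sample digraph are assumptions made for this example, and it returns only the vertex sets of the components, not the contracted component graph that `scc()` builds.

```python
from typing import Dict, List, Set

def strongly_connected(adj: Dict[str, List[str]]) -> List[List[str]]:
  seen: Set[str] = set()

  def explore(u: str, graph: Dict[str, List[str]], out: List[str]) -> None:
    # depth-first visit that appends each vertex to out as it finishes
    seen.add(u)
    for v in graph[u]:
      if v not in seen: explore(v, graph, out)
    out.append(u)

  # pass 1: DFS on the graph, recording vertices in increasing finish time
  finished: List[str] = []
  for u in adj:
    if u not in seen: explore(u, adj, finished)

  # transpose: flip every edge (u, v) to (v, u)
  rev: Dict[str, List[str]] = {u: [] for u in adj}
  for u in adj:
    for v in adj[u]: rev[v].append(u)

  # pass 2: DFS on the transpose in decreasing finish time; each tree is one component
  seen.clear()
  comps: List[List[str]] = []
  for u in reversed(finished):
    if u not in seen:
      comp: List[str] = []
      explore(u, rev, comp)
      comps.append(sorted(comp))
  return comps

# Hypothetical digraph: a cycle a -> b -> c -> a that leads into a cycle d <-> e.
adj = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
comps = strongly_connected(adj)
print(comps)
```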

# Conclusion

In this notebook, we implemented the graph data structure, the BFS and DFS elementary graph algorithms, and the two graph applications described in CLRS Chapter 20. Other, more advanced graph algorithms build upon BFS and DFS. Tests for these elementary graph algorithms and their applications are in [`graphtest.ipynb`](./graphtest.ipynb).