In [1]:
# automatically reload dependent notebooks
%load_ext autoreload
%autoreload 2
import import_ipynb

# Graph Representation

Mathematically, a graph $G = (V, E)$ is a tuple of the vertex set $V$ and the edge set $E$. So, we begin by defining the `Vert` and the `Edge` types and their respective sets. There are two common representations of graphs: adjacency matrix and adjacency list. The adjacency matrix representation is convenient for dense graphs. We will not come across dense graphs until the later chapters. So, we shall begin with the adjacency list representation. See §20.1 *Representations of graphs* p.549.
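To make the two representations concrete, here is a minimal sketch in plain Python, independent of the types defined later in this notebook. The four-vertex graph and the names `adj_list` and `adj_matrix` are illustrative assumptions only.

```python
# A tiny directed graph with vertices 1..4 (illustrative data only).
vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 3), (4, 1)]

# Adjacency list: one list of neighbours per vertex; space grows with |V| + |E|.
adj_list = {u: [] for u in vertices}
for u, v in edges:
  adj_list[u].append(v)

# Adjacency matrix: one cell per ordered vertex pair; space grows with |V|^2.
n = len(vertices)
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
  adj_matrix[u - 1][v - 1] = 1  # vertex numbers start at 1, indices at 0

print(adj_list)
print(adj_matrix)
```

For a sparse graph such as this one, most cells of the matrix are zero, which is why the list form is the natural starting point.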

Do note that because this notebook defines the fundamental graph representations, we factored out the tests, which perform printing and drawing, into the [`graphtest.ipynb`](./graphtest.ipynb) notebook.

By the way, the Python `import` statement can import only Python modules, not Jupyter notebooks. This is why you must recite those magical incantations at the very top of this notebook. Indeed, they must appear in every notebook that imports other notebooks. This notebook, for instance, imports [`util.ipynb`](./util.ipynb). The `import_ipynb` module, which was installed when you ran the `pip3 install -r ~/Documents/clrs/requirements.txt` command during JLD installation, patches the Jupyter runtime to allow importing notebooks.

## *vertex*

First, we define the `Vert` type. Because graph algorithms label the vertices and edges, we derive `Vert` from the `Tagged` base type, which is defined in the [`util.ipynb`](./util.ipynb) notebook. These algorithms also store some state in each vertex to keep track of their progress as they explore the input graph. The `par` variable points to the parent vertex, the one from which the current vertex was discovered. But since a root vertex has no parent, we give this variable the `Option[Vert]` type. The `Option` utility type is also defined in the `util` notebook. During construction of the vertex, we initialise `par` to `None`. The `dis` variable is the time at which the vertex was discovered. It is set to `Infinity` during initialisation. The `fin` variable is the time at which processing of the vertex finished. It is used only by DFS, and is initialised to `-Infinity`, since a valid finish time can never be smaller than a valid discovery time. And the `col` variable is the vertex colour. `White` means the vertex has not been explored. `Gray` means the vertex has been discovered. And `Black` means the vertex has been fully processed. All vertices are `White` at the start of a graph search algorithm. Finally, we add `init()`, so that we may call it later to reinitialise the vertex.

If we were being pedantic, we would define the `BFVert` type for breadth-first search (BFS) and the `DFVert` type for depth-first search (DFS), and `fin` would be a member only of `DFVert`. But since these are simple, elementary algorithms used everywhere, it is better not to overcomplicate the matter and to settle for a less elegant but more convenient combined type `Vert`.
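For comparison, the pedantic split might look something like the sketch below. The names `BaseVert`, `BFVert`, and `DFVert` as written here are hypothetical, `math.inf` stands in for `Infinity`, and nothing in this notebook uses these classes.

```python
from math import inf
from typing import Optional

class BaseVert:  # hypothetical holder of the state shared by both searches
  def __init__(self, tag: str):
    self.tag = tag
    self.par: Optional["BaseVert"] = None  # parent vertex; None for a root
    self.dis = inf  # discovery time
    self.col = "White"

class BFVert(BaseVert):  # BFS needs nothing beyond the shared state
  pass

class DFVert(BaseVert):  # only DFS records a finish time
  def __init__(self, tag: str):
    super().__init__(tag)
    self.fin = -inf

b = BFVert("s")
d = DFVert("u")
print(hasattr(b, "fin"), hasattr(d, "fin"))  # False True
```

The cost of this tidiness is that every algorithm and utility must then be told which vertex type it works on, which is the complication the combined `Vert` avoids.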

In [2]:
from util import *

class VCol:
  White = "White"
  Gray = "Gray"
  Black = "Black"

class Vert(Tagged):
  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.par: Option[Vert] = None
    self.dis = Infinity
    self.fin = -Infinity
    self.col = VCol.White
  def init(self) -> None: self.__init__(self.tag)

  def __str__(self) -> str: return f"{self.tag} {self.showParent()} {self.showTimes()}"
  def show(self) -> str: return f"{self.tag}{f' {self.showTimes()}' if self.dis != Infinity else ''}"
  def showParent(self) -> str: return "^" + self.par.tag if not self.isRoot() else "None"
  def showTimes(self) -> str: return f"{self.dis if self.dis != Infinity else ''}{f'/{self.fin}' if self.fin != -Infinity else ''}"

  def isRoot(self) -> bool: return isNone(self.par)

importing Jupyter notebook from util.ipynb


## *edge*

An edge merely points to the from-vertex $u$, where it originates, and the to-vertex $v$, where it terminates. For debugging and logging convenience, we define the edge tag to be `utag` and `vtag` joined by `-`. We use the `makeETag()` utility function to make the edge tag from the vertex tags during the construction of the edge.

Also, some graph search algorithms produce a forest of trees, and their edges must be categorised as described in *Classification of edges* p.569: tree edge `T`, back edge `B`, forward edge `F`, and cross edge `C`. These four classification labels are defined in the `ECls` enumeration, and an edge's label is stored in its `cls` attribute. We additionally define the `X` label, with which we initialise `cls` during the construction of an edge.

As we did with `Vert`, we inject these oft-used attributes into `Edge`, instead of polluting the code with additional edge types.

In [3]:
class ECls:
  X = "X"  # don't care
  T = "T"  # tree edge
  B = "B"  # back edge
  F = "F"  # forward edge
  C = "C"  # cross edge

class Edge(Tagged):
  def __init__(self, u: Vert, v: Vert):
    super().__init__(makeETag(u, v))
    self.u = u
    self.v = v
    self.cls = ECls.X
  def init(self) -> None: self.__init__(self.u, self.v)

  def __str__(self) -> str: return f"{self.tag} {self.showClassification()}"
  def show(self) -> str: return f"{self.showClassification()}"
  def showClassification(self) -> str: return self.cls if self.cls != ECls.X else ""

  def isSelfLoop(self) -> bool: return self.u == self.v

def makeETag(u: Vert, v: Vert) -> Tag: return f"{u.tag}-{v.tag}"

def parseETag(etag: Tag) -> [Tag, Tag]: return etag.split("-")

def indicesOfETag(etag: Tag) -> [int, int]:
  [utag, vtag] = parseETag(etag)
  return [int(utag) - 1, int(vtag) - 1]  # vtag starts at 1

def etagOfIndices(i: int, j: int) -> Tag: return f"{i + 1}-{j + 1}"
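As a quick check of the tag helpers, the sketch below repeats them in standalone form and round-trips a tag through its matrix indices. The snake-case functions take tag strings rather than `Vert` objects, an assumption made only so that the block runs on its own. It also shows a limitation worth keeping in mind: a vertex tag that itself contains `-` no longer splits into exactly two parts.

```python
def make_etag(utag: str, vtag: str) -> str:  # standalone stand-in for makeETag()
  return f"{utag}-{vtag}"

def parse_etag(etag: str) -> list:  # same splitting rule as parseETag()
  return etag.split("-")

def indices_of_etag(etag: str) -> list:
  utag, vtag = parse_etag(etag)
  return [int(utag) - 1, int(vtag) - 1]  # tags start at 1, indices at 0

def etag_of_indices(i: int, j: int) -> str:
  return f"{i + 1}-{j + 1}"

etag = make_etag("2", "5")
i, j = indices_of_etag(etag)
print(etag, [i, j], etag_of_indices(i, j))  # 2-5 [1, 4] 2-5

# A tag containing the separator yields three parts, not two.
print(parse_etag(make_etag("a-b", "c")))  # ['a', 'b', 'c']
```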

## *graph*

With `Vert` and `Edge` in hand, we now define `VSet` and `ESet`. We could use the Python `set` type here, but since we wish to retrieve vertices and edges by their tags, we use the Python `dict` type instead. And in anticipation of later chapters that use the adjacency matrix representation, we will go ahead and define `WMtx`.

In [4]:
VSet = {Tag, Vert}  # vertex set
ESet = {Tag, Edge}  # edge set
WMtx = [[float]]  # adjacency matrix of edge weights
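Note that, to Python, `{Tag, Vert}` is a set literal holding two type objects. It reads well as notation, but a type checker would not treat it as a dictionary type. If checkable aliases are wanted, a sketch using the standard `typing` module might look as follows. Here `Tag = str` and the bare `Vert` and `Edge` classes are stand-ins assumed only so that the block runs on its own.

```python
from typing import Dict, List

Tag = str  # assumption: tags are plain strings

class Vert:  # minimal stand-in for the notebook's Vert
  def __init__(self, tag: Tag):
    self.tag = tag

class Edge:  # minimal stand-in for the notebook's Edge
  def __init__(self, u: Vert, v: Vert):
    self.u, self.v, self.tag = u, v, f"{u.tag}-{v.tag}"

VSet = Dict[Tag, Vert]  # vertex set, keyed by vertex tag
ESet = Dict[Tag, Edge]  # edge set, keyed by edge tag
WMtx = List[List[float]]  # adjacency matrix of edge weights

vv: VSet = {t: Vert(t) for t in ["u", "v"]}
ee: ESet = {}
e = Edge(vv["u"], vv["v"])
ee[e.tag] = e
print(ee["u-v"].v.tag)  # retrieval by tag, which a plain set would not offer
```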

Since graph search algorithms produce trees as their results, we need to define the tree type, as well. But a tree is conveniently represented as a graph. So, we define the `VE` type as the base class for both graph and tree. Furthermore, since we will need both the adjacency list and the adjacency matrix representations, we make `VE` an abstract base class, and derive `LstVE` and `MtxVE` from it. Note that we could use `Tuple[VSet, ESet]` to represent `VE`; that would be mathematically accurate. Computationally, though, it is more convenient to derive `VE` from `Tagged`. We may then use the tag in debugging and logging graphs and trees.

We use the type variable `β` to stand for the `Vert` family of types and the type variable `ϵ` to stand for the `Edge` family of types. This way, when algorithms that use types derived from `Vert` and `Edge` call `getV()` and `getE()`, they receive objects of the correct derived types, instead of those of the base `Vert` and `Edge` types.
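The effect of the type variables can be seen in a much smaller setting. In the sketch below, `WVert` and `VStore` are hypothetical names, `B` plays the role of `β`, and the `bound=Vert` argument is an added assumption (the notebook's own type variables are unbounded).

```python
from typing import Dict, Generic, TypeVar

class Vert:  # minimal stand-in for the notebook's Vert
  def __init__(self, tag: str):
    self.tag = tag

class WVert(Vert):  # hypothetical derived vertex carrying extra state
  def __init__(self, tag: str, weight: float):
    super().__init__(tag)
    self.weight = weight

B = TypeVar("B", bound=Vert)  # plays the role of β

class VStore(Generic[B]):  # hypothetical container, far smaller than VE
  def __init__(self):
    self.vv: Dict[str, B] = {}
  def insV(self, v: B) -> None: self.vv[v.tag] = v
  def getV(self, vtag: str) -> B: return self.vv[vtag]

g: VStore[WVert] = VStore()
g.insV(WVert("a", 2.5))
print(g.getV("a").weight)  # a type checker sees WVert here, so .weight is allowed
```

Had `getV()` been annotated to return plain `Vert`, the last line would still run, but a type checker would reject the access to `weight`.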

On a side note, the Greek letter $β$ stands for the English letter $b$ as well as $v$, depending on the context. The Greek letter $𝜈$ stands for the English letter $n$. Hence, we use the type variable `β` for `Vert`.

Well, we have gone off onto a side track, so let us plod along a bit further. In mathematical, scientific, and engineering computing (unlike in business [CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete)ing) we use short variable names, often single-letter names. And our functions tend to have very short names, too. The reason is plain. In technical computing, we always know what we must implement, because we have either derived and proved the equations and the algorithms ourselves or lifted them from an authoritative textbook. So that we may readily verify the correctness of our code by visual inspection, we make our code resemble the equations and the algorithms in the source as closely as possible.

Moreover, we document our code with chapter, section, and page citations to the source. In this line of work, we do not subscribe to that abhorrent twin motto of IT: "Real men do not document; the code is the document. And real men do not comment; long variable names are the comments."

There are other differences between the two fields. In business CRUDing, the data structures tend to be rather complicated, but there really are no algorithms to speak of, let alone complicated ones. By contrast, in technical computing, lists and queues are the common data structures, but the algorithms can be quite intricate. In other words, business CRUDing complexities are in the data structure, whereas technical computing complexities are in the code structure. Using short names makes our code compact, and compact code allows us to see the whole algorithm on the screen at once, thereby enabling us to gain a better purchase upon the code.

In [5]:
from typing import Generic, TypeVar
from functools import reduce

β = TypeVar("β")
ϵ = TypeVar("ϵ")

class VE(Tagged, Generic[β, ϵ]):  # abstract base for graph and tree types
  def __init__(self, tag: Tag):
    super().__init__(tag)

  def makeVE(self, vt: [Tag], et: [Tag]) -> None:
    self.makeV(vt)
    self.makeE(et)
  def makeV(self, vt: [Tag]) -> None: todo()
  def makeE(self, et: [Tag]) -> None: todo()

  def __str__(self) -> str: return self.tag + "\n" + self.showVertices() + "\n" + self.showEdges()
  def showVertices(self) -> str:
    def neighbors(u: β) -> str: return ",".join([v.tag for v in self.adj(u)])
    return "\n".join([f"  {u}\n    [{neighbors(u)}]" for u in self.getVV()])
  def showEdges(self) -> str: return "\n".join([f"  {e}" for e in self.getEE()])

  # vertex

  def insV(self, v: β) -> None: todo()
  def delV(self, v: β) -> None: todo()
  def dupVV(self, vv: VSet) -> None: todo()
  def getV(self, vtag: Tag) -> β: todo()
  def getVV(self) -> [β]: todo()
  def numVV(self) -> int: todo()
  def adj(self, u: β) -> [β]: todo()
  def hasV(self, vtag: Tag) -> bool: todo()

  # edge

  def insE(self, e: ϵ) -> None: todo()
  def delE(self, e: ϵ) -> None: todo()
  def dupEE(self, ee: ESet) -> None: todo()
  def getE(self, etag: Tag) -> ϵ: todo()
  def getEE(self) -> [ϵ]: todo()
  def numEE(self) -> int: todo()
  def hasE(self, etag: Tag) -> bool: todo()

Now, let us define `LstVE`, the adjacency list implementation of the abstract base class `VE`.

In [6]:
class LstVE(VE):  # adjacency list representation of graphs and trees
  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.vv: VSet = {}
    self.ee: ESet = {}

  def makeV(self, vt: [Tag]) -> None:
    for vtag in vt: self.vv[vtag] = Vert(vtag)
  def makeE(self, et: [Tag]) -> None:
    for etag in et:
      [utag, vtag] = parseETag(etag)
      e = Edge(self.getV(utag), self.getV(vtag))
      self.ee[e.tag] = e

  def pathSV(self, s: β, v: β) -> [β]:
    # path between source vertex s and vertex v, listed from v back to s; see p.562
    return [s] if v.isRoot() or v == s else [v, *self.pathSV(s, v.par)]
  def isAncestor(self, u: β, v: β) -> bool:
    # check if vertex u is the ancestor of vertex v (there exists a path from u ~> v)
    def reachable(a: β) -> bool:
      # check if a can reach v
      aa = self.adj(a)  # edge a -> v
      return True if v in aa else reduce(lambda acc, b: acc or reachable(b), aa, False)  # path a ~> v
    return True if u == v else reachable(u)  # self-loop or path
  def isDescendant(self, v: β, u: β) -> bool: return self.isAncestor(u, v)

  def insV(self, v: β) -> None: self.vv[v.tag] = v
  def delV(self, v: β) -> None: self.vv.pop(v.tag)
  def dupVV(self, vv: VSet) -> None: self.vv = {**vv}
  def getV(self, vtag: Tag) -> β: return self.vv[vtag]
  def getVV(self) -> [β]: return list(self.vv.values())
  def numVV(self) -> int: return len(self.getVV())
  def adj(self, u: β) -> [β]: return [self.getV(e.v.tag) for e in self.getEE() if e.u.tag == u.tag]
  def hasV(self, vtag: Tag) -> bool: return vtag in self.vv

  def insE(self, e: ϵ) -> None: self.ee[e.tag] = e
  def delE(self, e: ϵ) -> None: self.ee.pop(e.tag)
  def dupEE(self, ee: ESet) -> None: self.ee = {**ee}
  def getE(self, etag: Tag) -> ϵ: return self.ee[etag]
  def getEE(self) -> [ϵ]: return list(self.ee.values())
  def numEE(self) -> int: return len(self.getEE())
  def hasE(self, etag: Tag) -> bool: return etag in self.ee
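Note that `pathSV()`, as written above, builds its list starting at `v` and ending at `s`, because it follows the `par` pointers backwards. If the source-first order is wanted, one option is to walk the parents iteratively and reverse the result, as in the sketch below. The `Node` class and the `path_sv()` function are hypothetical stand-ins, not part of this notebook.

```python
from typing import List, Optional

class Node:  # hypothetical stand-in for Vert; only tag and par matter here
  def __init__(self, tag: str, par: Optional["Node"] = None):
    self.tag = tag
    self.par = par

def path_sv(s: Node, v: Node) -> List[Node]:
  # Walk from v up through the parents, then reverse to get source-first order.
  path = []
  cur: Optional[Node] = v
  while cur is not None:
    path.append(cur)
    if cur is s: return path[::-1]
    cur = cur.par
  return []  # ran past a root without meeting s: no path from s to v

s = Node("s")
a = Node("a", s)
b = Node("b", a)
x = Node("x")  # a root in a different tree
print([n.tag for n in path_sv(s, b)])  # ['s', 'a', 'b']
print(path_sv(s, x))  # []
```

Unlike the recursive version, this sketch returns an empty list when `v` is not reachable from `s`, rather than a list holding `s` alone.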

Similarly, we define `MtxVE`, the adjacency matrix implementation of the abstract base class `VE`. Below, we derive `MtxVE` from the adjacency list version `LstVE`, because we may then continue to use `draw()` and other convenience functions without having to provide matrix-specific versions thereof. All we need do is add the `ww` matrix and modify the `makeV()` function. This expedient, but shoddy, shortcut is acceptable here, because the dense graphs used in the later chapters are very small. In a practical situation where space conservation is important for large, dense graphs, we would derive `MtxVE` directly from the abstract class `VE`, and provide matrix-specific implementations.

In [7]:
WMtx = [[float]]  # adjacency matrix of edge weights

class MtxVE(LstVE):  # adjacency matrix representation of graphs and trees
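  def __init__(self, tag: Tag):
    super().__init__(tag)
    self.ww: WMtx = []  # edge weight matrix
    self.dd: WMtx = []  # shortest-path distances; assumed to be filled in by a later all-pairs algorithm
    self.pp: [[int]] = []  # predecessor indices; assumed to use -Infinity for "no predecessor"

  def pathASP(self, i: int, j: int) -> [int]:
    # vertex indices along the shortest path from i to j, read off the predecessor matrix pp
    if not self.dd or not self.pp: raise Exception("shortest paths not yet computed")
    if i == j: return [i]
    elif self.pp[i][j] == -Infinity: return []  # no path from i to j
    else: return [*self.pathASP(i, self.pp[i][j]), j]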

  def makeV(self, vt: [Tag]) -> None:
    for vtag in vt: self.vv[vtag] = Vert(vtag)
    n = len(vt)
    r = range(0, n)
    # see Equation 23.1 p.647
    self.ww = [[Infinity] * n for _ in r]
    for i in r: self.ww[i][i] = 0
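The `makeV()` method above sets up only the diagonal and the `Infinity` entries; this notebook does not show where the edge weights are written into `ww`. A minimal standalone sketch of that step follows, assuming integer vertex tags that start at 1 (as `indicesOfETag()` does) and a hypothetical `weights` dictionary keyed by edge tag. The `make_ww()` function is likewise an assumption for illustration.

```python
from math import inf
from typing import Dict, List

def make_ww(n: int, weights: Dict[str, float]) -> List[List[float]]:
  # Start as makeV() does: inf everywhere, 0 on the diagonal.
  ww = [[inf] * n for _ in range(n)]
  for i in range(n): ww[i][i] = 0
  # Then place each edge weight at [u - 1][v - 1], since tags start at 1.
  for etag, w in weights.items():
    utag, vtag = etag.split("-")
    ww[int(utag) - 1][int(vtag) - 1] = w
  return ww

ww = make_ww(3, {"1-2": 4.0, "2-3": -1.0, "3-1": 7.0})
for row in ww: print(row)
```

Entries left at `inf` mark vertex pairs with no edge between them.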

Note that we use double-letter variables `vv` and `ee` to represent the plurals "vertices" and "edges". This mirrors a convention of [*Bluebook* legal citation](https://guides.library.harvard.edu/law/bluebook), where a single-page citation is written "p.*n*", and a multi-page citation is written either "pp.*m*,*n*" for discontiguous pages or "pp.*m*-*n*" for contiguous pages.

Above, `adj(u)` returns vertex $u$'s adjacency list, which is a list of all the neighbouring vertices. To be consistent with our own naming convention, we should call it `adjVV()`. But we chose to follow CLRS's naming convention and call it `adj()`, instead.

For convenience, we wish to construct graphs and trees from textual configuration data comprising a list of vertex tags `[Tag] = ["u", "v", ...]` and a list of edge tags `[Tag] = ["u-a", "u-b", "v-c", ...]`. The utility function `makeVE()` interprets the configuration data and constructs the vertices and the edges. Because `makeVE()` returns `None`, we create and configure a graph in two steps: `g = LstGraph("g")`, and then `g.makeVE(vt, et)`.

Finally, we define `LstGraph`, `LstTree`, `MtxGraph`, and `MtxTree`. Since all the functionalities have been implemented in the base classes, the derived classes are just empty tags.

In [8]:
class LstGraph(LstVE): pass
class LstTree(LstVE): pass

class MtxGraph(MtxVE): pass
class MtxTree(MtxVE): pass

## *visualisation*

In tests, we use the *Graphviz* graph visualisation library to draw the graph `bg`. We use the `sfdp` scalable force-directed placement layout engine, because it yields the most pleasing arrangement of vertices. See the [*Graphviz documentation*](https://graphviz.org/) for details.

In [9]:
from typing import Union  # needed by the signature of draw()
import graphviz as V

def draw(g: Union[LstGraph, MtxGraph, LstTree, MtxTree], directed: bool, label: str = "", engine: str = "sfdp") -> Union[V.Graph, V.Digraph]:
  # returned gv must be evaluated in the top level scope for it to render in the notebook
  gv = V.Digraph(engine=engine) if directed else V.Graph(engine=engine)
  gv.attr(label=label if label != "" else g.tag)
  for v in g.getVV(): gv.node(v.tag, label=v.show(), shape=f"{'rectangle' if v.isRoot() else 'ellipse'}")
  ed = {}  # already drawn edges
  for e in g.getEE():
    vutag = makeETag(e.v, e.u)
    if directed or vutag not in ed.keys():  # for undirected graph, avoid drawing (u, v) if (v, u) has already been drawn
      gv.edge(e.u.tag, e.v.tag, label=e.show())
      ed[e.tag] = e
  return gv
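The rule that `draw()` uses to avoid drawing an undirected edge twice can be tried on its own, without Graphviz. The sketch below works on plain edge-tag strings; the `undirected_once()` function is a hypothetical helper written only for this illustration.

```python
from typing import List, Set

def undirected_once(etags: List[str]) -> List[str]:
  # Keep an edge only if its reverse has not already been kept.
  drawn: Set[str] = set()
  kept = []
  for etag in etags:
    utag, vtag = etag.split("-")
    if f"{vtag}-{utag}" not in drawn:
      kept.append(etag)
      drawn.add(etag)
  return kept

print(undirected_once(["a-b", "b-a", "b-c", "c-b", "a-c"]))  # ['a-b', 'b-c', 'a-c']
```

An undirected graph stored as a pair of opposite directed edges is thus drawn with one line per pair, whichever of the two edges happens to come first.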

# Conclusion

In this notebook, we implemented the graph data structures described in CLRS Chapter 20.