# Graphs
In this lecture we will discuss several common problems in graph theory. The algorithms we discuss are not only of theoretical interest; they are widely used in real-life applications. We will cover:
1. A brief introduction to graphs
2. Topological Sort
3. Shortest-Path Algorithms
4. Network Flow Problems
5. Minimum Spanning Tree
6. Depth First Search

## Definitions
A **graph** $G = (V,E)$ consists of a set of **vertices**, $V$, and a set of **edges**, $E$. Each edge is a pair $(v,w)$, where $v,w \in V$. If the pairs are ordered, the graph is **directed**. A vertex $w$ is **adjacent** to $v$ if and only if $(v,w) \in E$. In an **undirected** graph the pairs are unordered, so $(v,w) \in E \rightarrow (w,v) \in E$; in other words, the matrix of edges is symmetric. Sometimes an edge has a third component, known as either a **weight** or a **cost**. <br>

A **path** in a graph is a sequence of vertices $w_1,w_2,w_3,...,w_n$ such that $(w_i,w_{i+1}) \in E$ for $1 \leq i < n$. The **length** of a path is the number of edges on the path, which is equal to $n-1$. We also allow a path from a vertex to itself; if this path contains no edges, it has length 0. If the graph contains an edge $(v,v)$ from a vertex to itself, then the path $v,v$ is referred to as a **loop**. A **simple path** is a path such that all vertices are distinct, except that the first and last vertex could be the same. <br>

A **cycle** in a directed graph is a path of length at least 1 such that $w_1 = w_n$. This cycle is simple if the path is simple. For undirected graphs, we additionally require that the edges be distinct, so that the path $u,v,u$ is not a cycle: $(u,v)$ and $(v,u)$ are the same edge. So a cycle in an undirected graph is a path $p = w_1,w_2,...,w_n$ where $w_1 = w_n$ and no edge is traversed more than once, in either direction. In a directed graph, however, $(u,v)$ and $(v,u)$ are different edges, so the path $p=u,v,u$ is a cycle whenever $(u,v),(v,u) \in E$. A directed graph is **acyclic** if it has no cycles; we also refer to such a graph as a **DAG** (directed acyclic graph). <br>

An undirected graph is **connected** if there is a path from every vertex to every other vertex. A directed graph with this property is called **strongly connected**. If a directed graph is not strongly connected, but the underlying graph (without directed edges) is connected then it is said to be **weakly connected**. A **complete graph** is a graph in which there is an edge between every pair of vertices. <br>

A real-world example that can be modeled as a graph is a road system. Each intersection is a vertex and each street is a directed edge. You could also associate a cost with each edge, such as the speed limit, the distance, or the time it takes to travel from one intersection to another. 

## Representations of Graphs
One simple way to represent a graph is to use a two-dimensional array known as an **adjacency matrix**. For each edge $(u,v) \in E$, we set A\[u\]\[v\] to 1; otherwise the entry is 0. If the edges have weights, we can store the weight instead, and mark edges that don't exist with $\infty$ or $-\infty$ (depending on the problem we are trying to solve) or null. The space requirement is $\Theta(\lvert V\rvert^2)$ regardless of how many edges there are. <br>

However, if the graph is **sparse**, meaning $\mid E \mid \ll \mid V \mid^2$, a better solution is to use an **adjacency list**: for each vertex we keep a list of its adjacent vertices, which takes $O(\lvert E\rvert + \lvert V\rvert)$ space. The adjacency matrix of a sparse graph is mostly empty entries, taking up more space than necessary. 
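
To make the two representations concrete, here is a minimal sketch that builds both from the same edge list. The function names, the integer vertex numbering, and the sample edges are assumptions made for illustration only.

```python
import math

def build_adjacency_matrix(num_vertices, edges):
    """Return a |V| x |V| matrix where entry [u][v] holds the weight of
    edge (u, v), or math.inf when the edge does not exist."""
    matrix = [[math.inf] * num_vertices for _ in range(num_vertices)]
    for u, v, weight in edges:
        matrix[u][v] = weight
    return matrix

def build_adjacency_list(num_vertices, edges):
    """Return a list where entry u holds (neighbor, weight) pairs for
    every edge leaving u."""
    adj = [[] for _ in range(num_vertices)]
    for u, v, weight in edges:
        adj[u].append((v, weight))
    return adj

# Hypothetical directed graph with 4 vertices and 4 weighted edges
edges = [(0, 1, 2), (0, 3, 1), (1, 3, 3), (3, 2, 2)]
matrix = build_adjacency_matrix(4, edges)
adj = build_adjacency_list(4, edges)
```

The matrix always stores 16 entries for this graph, while the list stores only one entry per edge, which is why the list is preferred when the graph is sparse.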

## Topological Sort
A **topological sort** is an ordering of the vertices in a directed acyclic graph such that if there is a path from $v_i$ to $v_j$, then $v_j$ appears *after* $v_i$ in the ordering. The ordering is not necessarily unique; a directed acyclic graph may have several legal orderings. A topological ordering cannot exist for a graph with a cycle, because for two vertices $v$ and $w$ on the cycle, $v$ precedes $w$ and $w$ precedes $v$. If we look at the graph below $v_1, v_2, v_5, v_4, v_3, v_7, v_6$ and $v_1, v_2, v_5, v_4, v_7, v_6, v_3$ are both topological orderings.
<img src="./files/Graphs/topological_sort.png" width="400"/>

We can create a simple algorithm to find a topological ordering by first finding any vertex with no incoming edges. We define the **indegree** of a vertex $v$ as $\mid\{(u,v) \in E\}\mid$, the number of edges going into $v$. Assume each vertex keeps track of its indegree, which is easy to compute. The idea is to find a vertex with indegree 0 that has not yet been assigned an ordering, assign it the next position, and decrease the indegree of each of its adjacent vertices by one, which is equivalent to removing the vertex and its outgoing edges from the graph. We repeat this until all the vertices have an ordering. Let's look at the code below:
```python
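def find_next_vertex(V):
    # Scan every vertex for one that has no remaining incoming edges
    # and has not been placed in the ordering yet
    for v in V:
        if v.indegree == 0 and v.ordering is None:
            return v

    return None

def top_sort(V):
    for i in range(len(V)):
        v = find_next_vertex(V)
        if v is None:
            # Unordered vertices remain but none has indegree 0
            raise ValueError("graph has a cycle")
        v.ordering = i
        for w in v.adjacent:
            w.indegree -= 1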
```

While this code works, it is inefficient: the running time is $O(\lvert V\rvert^2)$, since we scan the whole list of vertices each time we want to find a new vertex. Because we only decrease the indegree of vertices adjacent to the current vertex, any vertex whose indegree newly becomes 0 must be among those neighbors. Therefore we can use a queue to hold every vertex whose indegree has dropped to 0 but which has not yet been processed. Each vertex enters the queue at most once, so we never encounter a vertex that already has an ordering. If the graph has a cycle, the vertices on the cycle never reach indegree 0, so the queue empties before every vertex has been ordered. The topological ordering is the order in which the vertices are dequeued. This algorithm runs in $O(\lvert V\rvert + \lvert E\rvert)$ time if we use an adjacency list. Let's look at the algorithm with a queue.
```python
from collections import deque

def top_sort(V):
    q = deque()
    counter = 0  # keeps track of our ordering

    # Gather every vertex that starts with no incoming edges
    for v in V:
        if v.indegree == 0:
            q.append(v)

    while q:
        v = q.popleft()
        v.ordering = counter
        counter += 1

        for w in v.adjacent:
            w.indegree -= 1
            if w.indegree == 0:
                q.append(w)

    # Vertices on a cycle never reach indegree 0, so they are never ordered
    if counter != len(V):
        raise ValueError("graph has a cycle")
```
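
For readers who want to run the queue-based idea directly, here is a self-contained sketch. It assumes the graph is given as a dictionary mapping every vertex to the list of vertices it points to; that representation, the function name, and the course names are illustrative assumptions, not part of the lecture's vertex class.

```python
from collections import deque

def topological_order(adj):
    """Return one topological ordering of the graph as a list.

    adj maps every vertex to the list of vertices it has edges to.
    Raises ValueError if the graph contains a cycle.
    """
    # Compute the indegree of every vertex
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1

    # Start with every vertex that has no incoming edges
    queue = deque(v for v in adj if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    if len(order) != len(adj):
        raise ValueError("graph has a cycle")
    return order

# Hypothetical course-prerequisite graph
courses = {
    "intro": ["data structures", "discrete math"],
    "discrete math": ["algorithms"],
    "data structures": ["algorithms"],
    "algorithms": [],
}
order = topological_order(courses)
```

Here `order` lists "intro" first and "algorithms" last, since every prerequisite must come before the course that needs it.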

## Shortest-Path Algorithms
A popular problem in graph theory is finding the shortest path from an initial vertex $v$ to another vertex $w$. You can think of this as how Google Maps finds the shortest route when you look up directions. In these problems the input is a weighted graph where each edge $(v_i,v_j)$ has an associated cost $c_{i,j}$ to traverse that edge. The cost of a path $v_1v_2...v_n$ is $\sum_{i=1}^{n-1} c_{i,i+1}$. We refer to this as the **weighted path length**. The **unweighted path length** is just the number of edges on the path, namely $n-1$. <br>

##### Single-Source Shortest-Path Problem
Given as input a weighted graph, G = (V,E), and a distinguished vertex, s, find the shortest weighted path from s to every other vertex in G. <br>

For example, in the following graph the shortest weighted path from $v_1$ to $v_6$ is $v_1,v_4,v_7,v_6$, which has a cost of 6.
<img src="./files/Graphs/ex_one_shortest_path.png" width="400"/>

However what happens if the graph has negative cost edges like the graph below?
<img src="./files/Graphs/ex_two_shortest_path_neg.png" width="400"/>

The path from $v_5$ to $v_4$ has cost 1; however, there exists a shorter path $v_5, v_4, v_2, v_5, v_4$ which has cost -5. This path is still not the shortest, as we could go around the loop arbitrarily many times and keep decreasing the cost. Thus the shortest path between the two points is undefined. This loop is known as a **negative-cost cycle**, and when one exists in a graph the shortest paths are undefined. Negative-cost edges are not necessarily bad in the way that negative-cost cycles are, but their presence makes the problem harder to solve. In the absence of a negative-cost cycle the shortest path from a vertex to itself has cost 0. 

### Unweighted Shortest Paths
When a graph is unweighted, we are only interested in the number of edges on the path. This is a special case of the weighted problem in which every edge has weight 1. For now we are only interested in the lengths of the shortest paths and not the actual paths themselves; keeping track of the paths is a simple addition. In the following graph suppose we choose s to be $v_3$. 

<img src="./files/Graphs/unweighted_shortest.png" width="400"/>

Here we immediately know that the shortest path from s to $v_3$ has length 0, so we mark it down. Next we look at all vertices adjacent to s, which are a distance of 1 away. In this case $v_1$ and $v_6$ are both 1 away from s. We continue in this way, finding all vertices at distance 2, 3, ... until the shortest paths from s to every vertex are known. What we are doing here is performing a **breadth-first search (BFS)**. It operates by processing vertices in layers: the vertices closest to the start are evaluated first, and the most distant vertices are evaluated last. Another way to think about it is with trees. We can perform BFS on a tree by looking at the children of the current node first, then the children's children, until we reach the lowest level, which contains only leaves. <br>

<img src="./files/Graphs/unweighted_shortest_table.png" width="200"/>

With this strategy we can create an initial algorithm. We create a table, as above, to keep track of information during the algorithm. For each vertex we maintain three pieces of information. First, its distance from s in the entry $d_v$. Initially all vertices have distance $\infty$ except for s, whose distance is 0. The entry $p_v$ is used to recover the actual paths from s to each vertex. The entry *known* is set to **true** after a vertex has been processed. Initially all vertices are not *known*, including s. When a vertex is marked *known*, we have a guarantee that no cheaper path will ever be found, which means processing for that vertex is complete. Now let's look at the following algorithm in code:
```python
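import math

def shortest_unweighted(G, s):
    # Tables are keyed by vertex so they work with the v.adjacent lists
    dist = {v: math.inf for v in G.V}
    known = {v: False for v in G.V}
    path = {v: None for v in G.V}

    dist[s] = 0
    # Pass i marks every vertex at distance i as known and labels
    # its undiscovered neighbors with distance i + 1
    for i in range(len(G.V)):
        for v in G.V:
            if not known[v] and dist[v] == i:
                known[v] = True
                for u in v.adjacent:
                    if dist[u] == math.inf:
                        dist[u] = i + 1
                        path[u] = v

    return dist, path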
```
We can easily find the paths by back-tracking through the path variable. The running time of this algorithm is $O(\lvert V\rvert^2)$ because of the doubly nested for loops. The obvious inefficiency is that the outer loop runs $\lvert V\rvert$ times even if all the vertices become *known* much earlier, and each pass scans every vertex. We can remove this inefficiency by using a queue. At the start of each pass the queue contains only vertices at distance current\_distance. As we process them, we enqueue their undiscovered neighbors, which are at distance current\_distance + 1. We can also drop the *known* variable, since once a vertex has been processed it never enters the queue again. Now let's look at the refined algorithm:
```python
import math
from collections import deque

def shortest_unweighted(G, s):
    q = deque()

    # Tables are keyed by vertex so they work with the v.adjacent lists
    dist = {v: math.inf for v in G.V}
    path = {v: None for v in G.V}

    dist[s] = 0
    q.append(s)

    while q:
        v = q.popleft()

        for w in v.adjacent:
            if dist[w] == math.inf:  # w has not been reached yet
                dist[w] = dist[v] + 1
                path[w] = v
                q.append(w)

    return dist, path
```

We can see that with a queue the running time improves to $O(\lvert E\rvert + \lvert V\rvert)$, as long as we use an adjacency list. 
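
The back-tracking step mentioned earlier can be sketched as follows. This is a minimal, self-contained version that assumes the graph is a dictionary of adjacency lists; the function names and the sample graph are hypothetical.

```python
from collections import deque

def bfs_shortest_paths(adj, source):
    """Return (dist, parent) for every vertex reachable from source."""
    dist = {source: 0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
    return dist, parent

def reconstruct_path(parent, target):
    """Follow parent links back to the source; None if target is unreachable."""
    if target not in parent:
        return None
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    path.reverse()
    return path

# Hypothetical unweighted directed graph
roads = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "f": ["a"],
}
dist, parent = bfs_shortest_paths(roads, "a")
```

Calling `reconstruct_path(parent, "e")` walks the parent links from "e" back to "a" and reverses them, while an unreachable vertex such as "f" yields `None`.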

### Dijkstra's Algorithm
If the graph is weighted, the problem becomes harder, but we can still use the general idea of the unweighted case. We keep the same information table as before: each vertex is either *known* or *unknown*; a tentative distance $d_v$ is kept for each vertex (the length of the shortest path from s to $v$ using only *known* vertices as intermediates); and we record $p_v$, the last vertex to cause a change to $d_v$. <br>

The general method to solve the single-source shortest-path problem is known as **Dijkstra's algorithm**. This solution is a **greedy algorithm**. A greedy algorithm solves a problem by always choosing what appears to be the best option at each stage. For example, when making change, a cashier starts with the largest bill and works down to the 1 dollar bill; with US currency this uses the minimum number of bills. <br>

Dijkstra's algorithm works in stages. At each stage we select a vertex $v$ that has the smallest $d_v$ among all the *unknown* vertices and declare the shortest path from s to $v$ to be *known*. The remainder of the stage consists of updating the values of $d_w$ for the vertices $w$ adjacent to $v$. In the unweighted case we set $d_w = d_v + 1$ if $d_w = \infty$; that is, we lowered $d_w$ if $v$ offered a shorter path. Here we apply the same logic: we set $d_w = d_v + c_{v,w}$ if this new value for $d_w$ is less than its current value. In other words, it is only a good idea to use $v$ on the path to $w$ if it offers a lower cost. We will use the following graph for our example, followed by our initial table, with $s = v_1$. 
<img src="./files/Graphs/dijkstra_graph.png" width="400"/>
<img src="./files/Graphs/dijkstra_initial_ex.png" width="200"/>

The first vertex selected is $v_1$, with a path length 0 (since the path from a vertex to itself is 0). We then mark this vertex as *known*. Now that $v_1$ is *known*, some entries need to be adjusted. The vertices adjacent to $v_1$ are $v_2$ and $v_4$, both these vertices get their entries adjusted as follows:

<img src="./files/Graphs/dijkstra_1.png" width="200"/>

Next $v_4$ is selected, since it has the smallest distance of the *unknown* vertices, and marked as *known*. Vertices $v_3, v_5, v_6$ and $v_7$ are adjacent and are updated accordingly. 

<img src="./files/Graphs/dijkstra_2.png" width="200"/>

Next we select $v_2$. Here $v_4$ is adjacent but it is already *known*, so no work is performed on it. $v_5$ is adjacent but it is not adjusted since it would raise the cost from 3 to 12. So no changes occur other than changing $v_2$ to *known*. Next we select $v_5$ which only has $v_7$ as adjacent, but it is not adjusted as it would raise the cost. Then we select $v_3$, and we adjust $v_6$ since 8 < 9 resulting in the following table.

<img src="./files/Graphs/dijkstra_4.png" width="200"/>

Next we select $v_7$ and $v_6$ gets updated from 8 to 6.

<img src="./files/Graphs/dijkstra_5.png" width="200"/>

Lastly $v_6$ is selected; no changes are made, resulting in the final table:

<img src="./files/Graphs/dijkstra_6.png" width="200"/>

Now let's look at some code for the algorithm itself:
```python
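import math

def dijkstra(G, s):
    # G.c is assumed to map an edge (v, w) to its cost, as in G.c[v, w]
    dist = {v: math.inf for v in G.V}
    known = {v: False for v in G.V}
    path = {v: None for v in G.V}

    dist[s] = 0

    for _ in range(len(G.V)):
        # Select the unknown vertex with the smallest tentative distance
        v = min((u for u in G.V if not known[u]), key=lambda u: dist[u])
        if dist[v] == math.inf:
            break  # every remaining vertex is unreachable from s
        known[v] = True

        for w in v.adjacent:
            if not known[w] and dist[w] > dist[v] + G.c[v, w]:
                dist[w] = dist[v] + G.c[v, w]
                path[w] = v

    return dist, path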
```

This algorithm always works as long as no edge has a negative cost; if any edge has negative cost, the algorithm may produce the wrong answer. The total running time of this algorithm is $O(\lvert E\rvert + \lvert V \rvert^2) = O(\lvert V \rvert^2)$. Each stage takes $O(\lvert V \rvert)$ time to find the minimum vertex, so $O(\lvert V \rvert^2)$ time is spent finding minimums over the course of the algorithm, and the updates take $O(\lvert E\rvert)$ time in total since there is at most one update per edge. We can optimize this by keeping the *unknown* vertices in a heap keyed on $d_v$: selecting the minimum becomes a delete-min, and each update becomes a **decrease\_key** operation. This gives a total running time of $O(\lvert E \rvert \log\lvert V\rvert)$, which is an improvement when the graph is sparse. 
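
Here is a sketch of the heap-based variant using Python's `heapq` module. Since `heapq` has no decrease\_key operation, this version swaps in a common alternative: it pushes a new entry whenever a distance improves and skips stale entries when they are popped. The dictionary-based graph, the function name, and the sample costs are assumptions for illustration.

```python
import heapq
import math

def dijkstra_heap(adj, source):
    """adj maps every vertex to a list of (neighbor, cost) pairs
    with non-negative costs. Returns (dist, prev)."""
    dist = {v: math.inf for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale entry: v was already finalized with a smaller distance
        for w, cost in adj[v]:
            if d + cost < dist[w]:
                dist[w] = d + cost
                prev[w] = v
                heapq.heappush(heap, (dist[w], w))
    return dist, prev

# Hypothetical weighted directed graph
graph = {
    "s": [("a", 2), ("b", 1)],
    "a": [("c", 10)],
    "b": [("a", 3), ("c", 2), ("d", 8)],
    "c": [("d", 1)],
    "d": [],
}
dist, prev = dijkstra_heap(graph, "s")
```

Because the heap can hold one entry per edge, this variant runs in $O(\lvert E\rvert\log\lvert E\rvert)$ time, which is the same as $O(\lvert E\rvert\log\lvert V\rvert)$ up to a constant factor.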

### Graphs with Negative Cost Edges
As we've stated before, if the graph has negative-cost edges then Dijkstra's algorithm no longer works. The problem is that once a vertex $u$ is declared *known*, it is possible that from some other, *unknown* vertex $v$ there is a path back to $u$ that is very negative. In such a case, taking the path from s to $v$ and then back to $u$ is better than going from s to $u$ without using $v$. <br>

One solution is to combine the weighted and unweighted algorithms, at the price of a drastic increase in running time. We forget about the concept of *known* vertices, since if we find a lower-cost option we need to be able to change our mind. We begin by placing s on a queue. Then at each stage we dequeue a vertex $v$ and find all vertices $w$ adjacent to $v$ such that $d_w > d_v + c_{v,w}$. We update $d_w$ and $p_w$ and place $w$ on the queue if it is not already there. We repeat this until the queue is empty. The running time of this algorithm is $O(\lvert E\rvert\cdot\lvert V\rvert)$, considerably worse than Dijkstra's algorithm. If there are negative-cost cycles, however, the algorithm as described loops forever, so we need a stopping condition. Without negative-cost cycles each vertex is dequeued at most $\lvert V\rvert$ times, so we can stop the algorithm if any vertex has been dequeued $\lvert V\rvert+1$ times. This guarantees termination.
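
The queue-based procedure described above can be sketched as follows. The dictionary-based graph, the function name, and the sample costs are hypothetical; the stopping rule shown raises an error once any vertex has been dequeued more than $\lvert V\rvert$ times.

```python
import math
from collections import deque

def shortest_paths_negative(adj, source):
    """adj maps every vertex to a list of (neighbor, cost) pairs;
    costs may be negative. Returns (dist, prev).
    Raises ValueError if a negative-cost cycle is reachable from source."""
    n = len(adj)
    dist = {v: math.inf for v in adj}
    prev = {v: None for v in adj}
    dequeue_count = {v: 0 for v in adj}
    in_queue = {v: False for v in adj}

    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True

    while queue:
        v = queue.popleft()
        in_queue[v] = False
        dequeue_count[v] += 1
        if dequeue_count[v] > n:
            raise ValueError("negative-cost cycle reachable from the source")
        for w, cost in adj[v]:
            if dist[v] + cost < dist[w]:
                dist[w] = dist[v] + cost
                prev[w] = v
                if not in_queue[w]:  # avoid duplicate queue entries
                    queue.append(w)
                    in_queue[w] = True
    return dist, prev

# Hypothetical graph with one negative-cost edge and no negative cycle
graph = {
    "s": [("a", 4), ("b", 5)],
    "a": [("c", 3)],
    "b": [("a", -3)],
    "c": [],
}
dist, prev = shortest_paths_negative(graph, "s")
```

In this example the cheapest route to "a" goes through "b" (cost 5 - 3 = 2), which is only discovered after "a" has already been processed once, so "a" is placed on the queue a second time.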

### Acyclic Graphs
If the graph is known to be acyclic, we can improve Dijkstra's algorithm by changing the order in which vertices are declared *known*, otherwise known as the **vertex selection rule**. The new rule is to select vertices in topological order. The algorithm can be done in one pass, since the selections and updates can take place as the topological sort is being performed. This selection rule works because when a vertex $v$ is selected, its distance $d_v$ can no longer be lowered, since by the topological ordering rule it has no incoming edges from *unknown* vertices. This also removes the need for a priority queue, since we already have a predefined method of selecting vertices. The running time is $O(\lvert E\rvert + \lvert V\rvert)$, since each selection takes constant time. <br>
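
As a sketch of the one-pass idea, the following code updates distances while it performs a queue-based topological sort. The dictionary-based graph, the function name, and the sample costs (including a negative edge, which is allowed here) are assumptions for illustration.

```python
import math
from collections import deque

def dag_shortest_paths(adj, source):
    """adj maps every vertex to a list of (neighbor, cost) pairs.
    Returns a dict of shortest distances from source.
    Raises ValueError if the graph is not acyclic."""
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w, _ in adj[v]:
            indegree[w] += 1

    dist = {v: math.inf for v in adj}
    dist[source] = 0

    queue = deque(v for v in adj if indegree[v] == 0)
    processed = 0
    while queue:
        v = queue.popleft()  # vertices leave the queue in topological order
        processed += 1
        for w, cost in adj[v]:
            if dist[v] + cost < dist[w]:
                dist[w] = dist[v] + cost
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    if processed != len(adj):
        raise ValueError("graph has a cycle")
    return dist

# Hypothetical weighted DAG
dag = {
    "s": [("a", 2), ("b", 6)],
    "a": [("b", -1), ("c", 5)],
    "b": [("c", 1)],
    "c": [],
}
dist = dag_shortest_paths(dag, "s")
```

No priority queue is needed: by the time a vertex is dequeued, every edge into it has already been examined.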

### All-Pairs Shortest Path
Sometimes we want to find the shortest paths between all pairs of vertices in the graph. One simple solution is to run Dijkstra's algorithm $\lvert V\rvert$ times, once from each vertex. With the $O(\lvert V\rvert^2)$ implementation this takes $O(\lvert V\rvert^3)$ time in total. 

## Network Flow Problems
Suppose we have a directed graph G with edge capacities $c_{v,w}$. These capacities could represent the amount of water that can flow through a pipe or the amount of traffic that can flow on a street between two intersections. We have two distinguished vertices: s, which we call the **source**, and t, which is the **sink**. Through any edge $(v,w)$, at most $c_{v,w}$ units of *flow* may pass. At any vertex $v$ that is neither s nor t, the total flow coming in must equal the total flow going out. The **maximum flow problem** is to determine the maximum amount of flow that can pass from s to t. 

### Simple Maximum Flow Algorithm
We will attempt to solve the problem in stages. We start with our graph, G, and construct a flow graph $G_f$. $G_f$ records the flow that has been attained at any stage in the algorithm. Initially all edges in $G_f$ have no flow, and when the algorithm terminates, $G_f$ contains a maximum flow. We also construct a graph $G_r$, called the **residual graph**. $G_r$ tells us, for each edge, how much more flow can be added. We can calculate this by subtracting the current flow from the capacity for each edge. An edge in $G_r$ is known as a **residual edge**. Let's look at an example graph:<br>

<img src="./files/Graphs/flow_graph_1.png" width="600"/>

The maximum flow of this graph is 5. The edges entering t have capacities 3 and 3, which might suggest that the maximum flow could be 6. However, we can show that the maximum flow is indeed 5. We do this by cutting the graph into two parts: one contains s and some other vertices; the other contains t. Since the flow must cross the cut, the total capacity of all edges (u,v) where u is in s's partition and v is in t's partition is a bound on the maximum flow. These edges are (a,c) and (d,t) with a total capacity of 5, so the maximum flow cannot exceed 5. Any graph has a large number of cuts, and each one provides an upper bound on the maximum flow; the cut with the minimum capacity gives the tightest bound. In fact, the minimum cut capacity is exactly equal to the maximum flow. We can see an example below. 

<img src="./files/Graphs/flow_cut.png" width="250"/>


At each stage, we find a path in $G_r$ from s to t. This path is known as an **augmenting path**. The minimum residual capacity of the edges on this path is the amount of flow that can be added to every edge on the path. We do this by adjusting $G_f$ and recomputing $G_r$. When we find no path from s to t in $G_r$, we terminate. This algorithm is nondeterministic, in that we are free to choose any path from s to t. Now let's look at an example in the following graphs. In order, the graphs are G, $G_f$, and $G_r$; the figure below shows the initial state.

<img src="./files/Graphs/flow_1.png" width="600"/>

There are many paths from s to t in the residual graph. Suppose we select s,b,d,t. Then we can send two units of flow through every edge on this path. We will adopt the convention that once we have filled (**saturated**) an edge it is removed from the residual graph. After choosing this path and sending the flow we have the following graph:

<img src="./files/Graphs/flow_2.png" width="600"/>

Next we arbitrarily select s,a,c,t, which allows two units of flow, resulting in the following graph.

<img src="./files/Graphs/flow_3.png" width="600"/>

The only path left in $G_r$ is now s,a,d,t which allows one unit of flow. The resulting graph is seen below:

<img src="./files/Graphs/flow_4.png" width="600"/>

The algorithm would terminate at this point as there is no longer a path from s to t in $G_r$. The resulting flow of 5 happens to be the maximum: 2 units arrive along $c \rightarrow t$ and 3 along $d \rightarrow t$. Now suppose instead we took s,a,d,t as the first path, which allows three units of flow. The algorithm would then terminate after two iterations, as the only viable path afterwards is s,a,c,t, which allows one more unit. The resulting flow of 4 is not optimal, since the maximum flow is 5. This is an example of how a greedy algorithm can fail: we greedily took the path that allowed the most flow along it. <br>

In order to make this algorithm work, we need to allow the algorithm to change its mind. To do this, for every edge (v,w) with flow $f_{v,w}$ in the flow graph, we add an edge (w,v) of capacity $f_{v,w}$ to the residual graph. In effect, we are allowing the algorithm to undo its decisions by sending flow back in the opposite direction. With this addition, if we select s,a,d,t we obtain the following graph. 

<img src="./files/Graphs/impr_flow_1.png" width="600"/>

Notice that in the residual graph there are now edges in both directions between a and d. Either one more unit of flow can go from a to d, or up to three units can be pushed back (undoing flow). Suppose that after choosing s,a,d,t the algorithm chooses s,b,d,a,c,t with two units of flow. By pushing two units of flow from d to a, the algorithm takes two units of flow away from the edge (a,d), essentially changing its mind. The result is shown in the following graph:

<img src="./files/Graphs/impr_flow_2.png" width="600"/>
<img src="./files/Graphs/impro_flow_cut.png" width="400"/>

There is no augmenting path left in the residual graph, so the algorithm terminates. Again we can see that the maximum flow is correctly 5, which we can confirm by taking the minimum cut. Although the graph in this example was acyclic, this is not a requirement; the algorithm also works for graphs with cycles. However, if we choose augmenting paths nondeterministically, we do not always make the best choice, and the algorithm may need many iterations. A simple way around this is to always choose the augmenting path that allows the largest increase in flow, which is again a greedy choice. <br>

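Another rule is to always choose the augmenting path with the fewest edges, which can be found with the unweighted shortest-path algorithm. Using this rule it can be shown that $O(\lvert E\rvert\cdot\lvert V\rvert)$ augmenting steps are required. Each step takes $O(\lvert E\rvert)$, yielding an $O(\lvert E\rvert^2\lvert V\rvert)$ bound on the running time. <br>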

The algorithm we just described is known as the Ford-Fulkerson algorithm. Let's look at some pseudo-code for it:
```python
def ford_fulkerson(G, s, t):
    # Residual graph: every edge starts with its full capacity available,
    # and every reverse edge starts with capacity 0
    g_r = {u: {v: 0 for v in G.V} for u in G.V}
    for (u, v) in G.E:
        g_r[u][v] = G.c[u][v]

    # Flow graph: initially no edge carries any flow
    g_f = {u: {v: 0 for v in G.V} for u in G.V}

    # find_path is assumed to return an augmenting path from s to t in the
    # residual graph as a list of edges (u, v), or None if no path exists
    p = find_path(g_r, s, t)
    while p is not None:
        # The smallest residual capacity on the path is the flow we can add
        minimum_flow = min(g_r[u][v] for (u, v) in p)
        for (u, v) in p:
            g_f[u][v] += minimum_flow
            g_f[v][u] -= minimum_flow
            g_r[u][v] -= minimum_flow  # less room left on (u, v)
            g_r[v][u] += minimum_flow  # lets a later path undo this flow
        p = find_path(g_r, s, t)

    return g_f
```
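
To see the augmenting-path loop run end to end, here is a self-contained sketch. It picks each augmenting path with a breadth-first search (the fewest-edges rule), which is one possible choice rather than the only one. The nested-dictionary capacity format, the function name, and the sample network are assumptions for illustration.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """capacity[u][v] is the capacity of edge (u, v); every vertex with
    outgoing edges must be a key. Returns the value of a maximum flow."""
    # residual[u][v]: how much more flow can still be pushed from u to v
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, room in residual[u].items():
                if room > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left

        # Walk back from the sink to collect the edges on the path
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]

        # Push as much flow as the tightest edge on the path allows
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck  # allows this flow to be undone later
        total += bottleneck

# Hypothetical network; the cut {(a, c), (d, t)} has capacity 2 + 3 = 5
network = {
    "s": {"a": 4, "b": 2},
    "a": {"c": 2, "d": 4},
    "b": {"d": 2},
    "c": {"t": 3},
    "d": {"t": 3},
    "t": {},
}
flow_value = max_flow(network, "s", "t")
```

The reverse edges added to `residual` are what let a later path push flow back, as in the s,b,d,a,c,t step of the walk-through.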

A related, even more difficult problem is the **min-cost flow** problem. Each edge has not only a capacity but also a cost per unit of flow. The problem is to find, among all maximum flows, the one of minimum cost. We do not explore algorithms for this problem, as they are beyond the scope of this class. 

## Minimum Spanning Tree
Now we consider the problem of finding a **minimum spanning tree** in an undirected graph. A minimum spanning tree of an undirected graph G is a tree formed from graph edges that connects all the vertices of G at the lowest total cost. A minimum spanning tree exists if and only if G is connected. In the following algorithms we will assume that G is connected and that there already exists an algorithm that checks whether a graph is connected. Let's look at an example below:<br>

<img src="./files/Graphs/mst_example.png" width="500"/>

In the graph above the second graph is the minimum spanning tree of the first. Notice that the number of edges in the minimum spanning tree is $\lvert V\rvert - 1$. The minimum spanning tree is a *tree* because it is acyclic, and it is *spanning* because it covers every vertex. <br>

For any spanning tree T, if an edge $e$ that is not in T is added, a cycle is created. The removal of any edge on the cycle reinstates the spanning tree property. The cost of the spanning tree is lowered if $e$ has lower cost than the edge that was removed. If, as a spanning tree is built, each edge that is added is one of minimum cost that avoids creating a cycle, then the cost of the resulting spanning tree cannot be improved, because any replacement edge would cost at least as much as an edge already in the spanning tree. This suggests that greedy algorithms can work. We will discuss two different greedy algorithms that find the minimum spanning tree of a graph. 

### Prim's Algorithm
One way to compute a minimum spanning tree is to grow the tree in successive stages. One node is picked as the root, and in each stage we add an edge, and thus an associated vertex, to the tree. At any point in the algorithm, we have a set of vertices that have already been included in the tree; the rest of the vertices have not. At each stage the algorithm finds a new vertex to add to the tree by choosing the edge (u,v) such that the cost of (u,v) is the smallest among all edges where u is in the tree and v is not. Let's look at the following example, where we start with $v_1$ as the root and the tree has no edges. Each step then adds one edge and one vertex.

<img src="./files/Graphs/prim_tree.png" width="600"/>

We can see that Prim's algorithm is essentially identical to Dijkstra's algorithm. As before, we keep for each vertex the values $d_v$ and $p_v$ and an indicator of whether it is *known* or *unknown* (meaning whether it has already been added to the tree). $d_v$ is the weight of the cheapest edge connecting $v$ to a *known* vertex, and $p_v$ is the last vertex that caused a change to $d_v$. The only difference is the update rule. After a vertex $v$ is selected, for each *unknown* vertex $w$ adjacent to $v$, $d_w = \min(d_w,c_{w,v})$. Below we can see the initial table:

<img src="./files/Graphs/prim_1.png" width="200"/>

Initially we select $v_1$, and $v_2$, $v_3$, and $v_4$ are updated, resulting in the following table:

<img src="./files/Graphs/prim_2.png" width="200"/>

The next vertex selected is $v_4$ because it has the minimum $d_v$ of the *unknown* vertices. Every vertex adjacent to $v_4$ is examined except $v_1$, because it is already *known*. All of them are updated except $v_2$, since its $d_v$ is already smaller than $c_{v_4,v_2}$, resulting in the following table:

<img src="./files/Graphs/prim_3.png" width="200"/>

The next vertex chosen is $v_2$; the choice is arbitrary since $v_2$ and $v_3$ both have the minimum $d_v$. Selecting $v_2$ does not update any vertices, so we next select $v_3$, which causes a change in $v_6$, resulting in the following table:

<img src="./files/Graphs/prim_4.png" width="200"/>

Then we see the resulting table after selecting $v_7$ which updates $v_6$ and $v_5$.

<img src="./files/Graphs/prim_5.png" width="200"/>

Finally $v_6$ and $v_5$ are selected completing the algorithm. 

<img src="./files/Graphs/prim_6.png" width="200"/>

So based on the final table above the edges in the minimum spanning tree are $(v_1,v_2),(v_3,v_4),(v_4,v_1),(v_5,v_7),(v_6,v_7),(v_7,v_4)$ where the total cost of the tree is 16. Remember that Prim's algorithm only runs on undirected graphs. The running time is $O(\lvert V\rvert^2)$. Similar to Dijkstra's we can improve this algorithm by using a binary heap which improves the running time to $O(\lvert E\rvert\log\lvert V\rvert)$.
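
A heap-based version of Prim's algorithm can be sketched as follows. Instead of a decrease\_key operation it pushes candidate edges onto the heap and skips those whose endpoint is already in the tree, which is a common substitute. The adjacency-list format, the function name, and the sample graph are hypothetical.

```python
import heapq

def prim_mst(adj, root):
    """adj maps every vertex of a connected undirected graph to a list of
    (neighbor, cost) pairs, with each edge listed in both directions.
    Returns (tree_edges, total_cost)."""
    in_tree = {root}
    tree_edges = []
    total_cost = 0

    # Candidate edges leaving the tree, ordered by cost
    heap = [(cost, root, w) for w, cost in adj[root]]
    heapq.heapify(heap)

    while heap and len(in_tree) < len(adj):
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both endpoints are in the tree; this edge would form a cycle
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        total_cost += cost
        for w, next_cost in adj[v]:
            if w not in in_tree:
                heapq.heappush(heap, (next_cost, v, w))

    return tree_edges, total_cost

# Hypothetical undirected graph: a-b 1, a-c 4, b-c 2, b-d 5, c-d 3
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 5), ("c", 3)],
}
tree_edges, total_cost = prim_mst(graph, "a")
```

As in the tables above, each step adds exactly one edge and one vertex, so the loop accepts $\lvert V\rvert - 1$ edges in total.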

### Kruskal's Algorithm
A different greedy strategy is to continually select edges in order of smallest weight and accept an edge if it does not cause a cycle. Kruskal's algorithm maintains a forest (a collection of trees). Initially there are $\lvert V\rvert$ single-node trees. Adding an edge merges two trees into one. When the algorithm finishes there is only one tree, and this is the minimum spanning tree. The algorithm terminates when $\lvert V\rvert - 1$ edges have been accepted. To determine whether an edge $(u,v)$ should be accepted or rejected we can use disjoint sets. Two vertices belong to the same set if and only if they are connected in the current forest. If u and v are in the same set we reject the edge, since adding it would create a cycle; otherwise we accept it and merge the two sets. To select edges in order of weight we can sort them up front. If the weights are integers drawn from a small range, a linear-time sort such as counting sort or radix sort can be used. <br>
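
The accept/reject rule above can be sketched with a small disjoint-set structure. This version sorts the edges with Python's built-in comparison sort; the function name, the edge format `(cost, u, v)`, and the sample graph are assumptions for illustration.

```python
def kruskal_mst(vertices, edges):
    """vertices is a list of vertex names; edges is a list of
    (cost, u, v) tuples for an undirected graph.
    Returns (tree_edges, total_cost)."""
    parent = {v: v for v in vertices}  # each vertex starts in its own set

    def find(v):
        # Follow parent links to the set representative,
        # shortening the chain along the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree_edges = []
    total_cost = 0
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different trees, so no cycle is created
            parent[root_u] = root_v  # merge the two trees
            tree_edges.append((u, v, cost))
            total_cost += cost
            if len(tree_edges) == len(vertices) - 1:
                break  # enough edges have been accepted
    return tree_edges, total_cost

# Hypothetical undirected graph
vertices = ["a", "b", "c", "d"]
edges = [(1, "a", "b"), (4, "a", "c"), (2, "b", "c"), (5, "b", "d"), (3, "c", "d")]
tree_edges, total_cost = kruskal_mst(vertices, edges)
```

Here the edges of cost 1, 2, and 3 are accepted in that order, and the loop stops before the remaining edges are examined.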

With a comparison-based sort such as heapsort or mergesort, the worst-case running time of Kruskal's algorithm is $O(\lvert E\rvert\log\lvert E\rvert)$, which is dominated by the sorting. If the weights allow a linear-time sort, the remaining cost is the $O(\lvert E\rvert)$ disjoint-set operations, which take nearly linear time in total. The following shows an example of performing Kruskal's algorithm:

<img src="./files/Graphs/kruskal_ex.png" width="800"/>

## Depth-First Search
Depth-first search **(DFS)** is a search in which, starting at some vertex $v$, we process $v$ and then recursively traverse all vertices adjacent to $v$. We can think of this as a generalization of preorder traversal on a tree. If we perform this on a tree, all vertices are visited in a total of $O(\lvert E\rvert)$ time. If we perform dfs on an arbitrary graph, we need to be careful about cycles. To avoid getting stuck in a cycle, when we visit a vertex $v$ we *mark* it as visited, to show that we have already explored it, and recursively call dfs only on adjacent vertices that have not been visited. Let's look at some template code for dfs and breadth-first search (bfs) to spot the differences:

```python
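from collections import deque

def dfs(v):
    v.visited = True  # mark v so a cycle cannot lead us back here
    for w in v.adjacent:
        if not w.visited:
            dfs(w)

def bfs(v):
    q = deque([v])
    v.visited = True  # mark on enqueue so no vertex enters the queue twice

    while q:
        v = q.popleft()
        for w in v.adjacent:
            if not w.visited:
                w.visited = True
                q.append(w)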
```

As we can see, dfs recursively follows one path all the way to its end before backing up and starting another, whereas bfs processes vertices in the order they were discovered, visiting everything adjacent to the current vertex before moving deeper. If the graph is undirected and not connected, or directed and not strongly connected, a single search may fail to visit some nodes. We can then restart the search from an unmarked node and repeat until all nodes are marked. On a graph stored with adjacency lists, the running time of dfs is $O(\lvert E\rvert + \lvert V\rvert)$.
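
To make the restart step concrete, here is a small runnable sketch. It assumes the graph is stored as a dictionary mapping each vertex to a list of its neighbors instead of as vertex objects, and the names `dfs_all` and `searches` are made up for this example.

```python
def dfs(graph, start, visited):
    """Mark every vertex reachable from start and return them in visit order."""
    visited.add(start)
    order = [start]
    for w in graph[start]:
        if w not in visited:
            order.extend(dfs(graph, w, visited))
    return order


def dfs_all(graph):
    """Restart dfs from each unmarked vertex; each inner list is one search."""
    visited = set()
    searches = []
    for v in graph:
        if v not in visited:
            searches.append(dfs(graph, v, visited))
    return searches


# An undirected graph with two connected components.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["E"],
    "E": ["D"],
}
searches = dfs_all(graph)
print(searches)
```

For an undirected graph, the number of searches equals the number of connected components, so a single search means the graph is connected.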

### Undirected Graphs
Knowing how dfs works, we can also say that an undirected graph is connected if and only if a dfs starting from any node visits every node. Let's look at an example of dfs on an undirected graph, starting at vertex A:
<img src="./files/Graphs/dfs_graph.png" width="400"/>
<img src="./files/Graphs/dfs_example.png" width="400"/>

The second graph shows a dfs of the first. We refer to it as a **depth-first spanning tree**, and its root is A. Every edge $(u,v)$ of the graph appears in this drawing. If $v$ is unmarked when we process $(u,v)$, we record $(u,v)$ as a tree edge. If $v$ is already marked when we process $(u,v)$, we draw a dashed line, called a **back edge**, to indicate that the edge is not really part of the tree. If the graph is not connected we have to perform multiple searches, each of which generates a tree; together these form a **depth-first spanning forest**.
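
The tree-edge and back-edge distinction can be sketched as follows. This is only an illustration on a made-up four-vertex graph (not the one pictured), stored as a dictionary of neighbor lists; `classify_edges` and `seen_edges` are names chosen for the example. Each undirected edge is processed once, from whichever endpoint reaches it first.

```python
def classify_edges(graph, root):
    """Return (tree_edges, back_edges) found by a dfs from root."""
    visited = set()
    seen_edges = set()  # undirected edges already processed
    tree_edges, back_edges = [], []

    def visit(v):
        visited.add(v)
        for w in graph[v]:
            edge = frozenset((v, w))
            if edge in seen_edges:
                continue  # (w, v) is the same undirected edge as (v, w)
            seen_edges.add(edge)
            if w not in visited:
                tree_edges.append((v, w))  # w was unmarked: tree edge
                visit(w)
            else:
                back_edges.append((v, w))  # w was already marked: back edge

    visit(root)
    return tree_edges, back_edges


# A four-cycle A-B-C-D-A.
graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
tree_edges, back_edges = classify_edges(graph, "A")
print(tree_edges, back_edges)
```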

### Biconnectivity
A connected undirected graph is **biconnected** if there is no vertex whose removal disconnects the rest of the graph. For example, the graph we just used to create a depth-first spanning tree is biconnected. Biconnectivity matters when building a network: if the network is biconnected and one node goes down, the remaining nodes are still connected to each other. <br>

If a graph is not biconnected, the vertices whose removal would disconnect the graph are known as **articulation points**. The following graph is not biconnected and its articulation points are C and D: removing C would disconnect G, and removing D would disconnect E and F.

<img src="./files/Graphs/biconnected_ex.png" width="400"/>

DFS provides a linear-time algorithm to find all the articulation points in a connected graph. First, starting at any vertex, we perform a dfs and number the vertices in the order they are visited. For each vertex $v$, we call this number $Num(v)$. Then, for every vertex $v$ in the depth-first spanning tree, we compute the lowest-numbered vertex, denoted $Low(v)$, that is reachable from $v$ by taking zero or more tree edges and then possibly one back edge. Let's look at the depth-first spanning tree of the above graph:

<img src="./files/Graphs/biconnected_dfst.png" width="500"/>

The lowest-numbered vertex reachable from A, B, and C is vertex 1, because they can follow tree edges down to D and then take the back edge from D. We can efficiently calculate $Low(v)$ by performing a postorder traversal of the depth-first spanning tree. By definition $Low(v)$ is the minimum of:
1. $Num(v)$
2. The lowest $Num(w)$ among all back edges $(v,w)$
3. The lowest $Low(w)$ among all tree edges $(v,w)$

All that is left to do is to find the articulation points. The root is an articulation point if and only if it has more than one child, because removing the root then disconnects the subtrees rooted at its children from each other. Any other vertex $v$ is an articulation point if and only if $v$ has some child $w$ such that $Low(w) \geq Num(v)$. This condition is always satisfied for the root, which is why the root has its own special rule. Let's look at the result if we had started at C:

<img src="./files/Graphs/biconnected_c.png" width="500"/>

We can see that C, the root, has two children, so it is an articulation point. We can also see that D has a child E with $Low(E) \geq Num(D)$, making D an articulation point as well. Let's look at some code for generating these numbers:
```python
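# Assumes every vertex starts with visited = False and parent = None.
def assign_num(v, counter=1):
    """Number the vertices in dfs order and return the last number used."""
    v.num = counter
    v.visited = True
    for w in v.adjacent:
        if not w.visited:
            w.parent = v
            counter = assign_num(w, counter + 1)  # recurse on the child w

    return counter

def assign_low(v, articulation_points=None):
    """Compute low for v's subtree and collect the articulation points."""
    if articulation_points is None:  # avoid a shared mutable default argument
        articulation_points = []
    v.low = v.num  # try rule 1 first
    is_root = v.parent is None
    children = 0

    for w in v.adjacent:
        if w.parent is v:  # tree edge to a child of v
            children += 1
            assign_low(w, articulation_points)
            if not is_root and w.low >= v.num and v not in articulation_points:
                articulation_points.append(v)  # v itself is the articulation point
            v.low = min(v.low, w.low)  # rule 3
        elif w is not v.parent and w.num < v.num:  # back edge
            v.low = min(v.low, w.num)  # rule 2

    if is_root and children > 1:  # special rule for the root
        articulation_points.append(v)
    return articulation_points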
```
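
To try the idea on a concrete graph, here is a self-contained sketch that computes $Num$ and $Low$ in a single dfs over a dictionary of neighbor lists. It is one possible arrangement, not the only one. The example graph is a hypothetical one chosen to match the description above (articulation points C and D) and is not necessarily the graph pictured; the function name `articulation_points` is also an assumption of this example.

```python
def articulation_points(graph, root):
    """Return the set of articulation points of a connected undirected graph."""
    num, low, parent = {}, {}, {root: None}
    points = set()

    def visit(v):
        num[v] = low[v] = len(num) + 1  # rule 1: Low(v) starts at Num(v)
        children = 0
        for w in graph[v]:
            if w not in num:  # tree edge: w becomes a child of v
                parent[w] = v
                children += 1
                visit(w)
                low[v] = min(low[v], low[w])  # rule 3
                if parent[v] is not None and low[w] >= num[v]:
                    points.add(v)
            elif w != parent[v]:  # back edge
                low[v] = min(low[v], num[w])  # rule 2
        if parent[v] is None and children > 1:  # special rule for the root
            points.add(v)

    visit(root)
    return points


graph = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D", "G"],
    "D": ["A", "C", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
    "G": ["C"],
}
print(articulation_points(graph, "A"))
print(articulation_points(graph, "C"))
```

Both calls report the same set, which illustrates that the answer does not depend on the starting vertex even though the numbering does.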

### Euler Circuits
A popular problem in graph theory, and one you may have seen before, is this: starting at any vertex $v$, can we trace every edge exactly once and end up back where we started? Let's look at the three example graphs below and see intuitively whether this is possible for any of them.

<img src="./files/Graphs/euler_1.png" width="500"/>

This is not possible in either the first or the third graph, but it is possible in the second. This problem is called the **Euler circuit problem**. The first observation we can make is that the graph must be connected and every vertex must have an even degree. This is because a vertex is entered and then left on different edges. If some vertex $v$ has an odd degree, then eventually we will reach the point where only one edge into $v$ is unvisited, and taking it will strand us at $v$. There is a second problem called the **Euler tour** (often called an Euler path), which is similar to the Euler circuit problem except that we do not have to start and stop at the same vertex. In this case, a connected graph with exactly two odd-degree vertices still admits a tour, but only if we start at one of the odd-degree vertices and end at the other. If more than two vertices have an odd degree, an Euler tour is not possible. <br>

With this information we can easily tell whether a connected graph has an Euler circuit, since any connected graph whose vertices all have an even degree must have one. We can also find the circuit in linear time. Assuming an Euler circuit exists (which we can check in linear time), we can run a depth-first search on the graph to find it. In this application of depth-first search we would stop when we finally return to the vertex we started at. However, this is not as trivial as it sounds: depending on the path the dfs takes, it may return to the starting vertex prematurely, before traversing all the edges.
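
The existence check itself is short. The sketch below assumes a simple undirected graph stored as a dictionary of neighbor lists, so the degree of a vertex is the length of its list. The name `has_euler_circuit` is chosen for the example, and vertices with no edges are ignored when testing connectivity, which is a design choice of this sketch.

```python
def has_euler_circuit(graph):
    """Check that every degree is even and that all edges lie in one component."""
    if any(len(neighbors) % 2 for neighbors in graph.values()):
        return False  # some vertex has odd degree

    start = next((v for v in graph if graph[v]), None)
    if start is None:
        return True  # no edges at all, so there is nothing to trace

    # Iterative dfs from a vertex that has at least one edge.
    visited = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return all(v in visited for v in graph if graph[v])


square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(has_euler_circuit(square), has_euler_circuit(path))
```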

There are algorithms that find an Euler circuit in linear time, $O(\lvert E\rvert)$. The one we will discuss is **Hierholzer's algorithm**, which dates to 1873. It works as follows:
1. Start at any vertex $v$ and follow a trail of edges from that vertex until returning to $v$. It is not possible to get stuck at any vertex other than $v$, because the even degree of every vertex ensures that whenever we enter a vertex there is an unused edge on which to leave it. The tour formed in this way may not cover all vertices and edges of the initial graph.
2. As long as there exists a vertex $u$ that belongs to the current tour but has adjacent edges that are not part of the tour, start another trail from $u$, following unused edges until returning to $u$, and splice this tour into the previous one.

When all edges have been traversed, the resulting tour is the Euler circuit.
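
The two steps above can be sketched with an explicit stack, a common way of writing Hierholzer's algorithm in which the splicing happens implicitly as vertices are popped. This is a sketch under assumptions: the graph is a dictionary of neighbor lists, an Euler circuit is assumed to exist, and `euler_circuit` and `remaining` are names chosen here. The call to `list.remove` costs time proportional to the degree of the vertex, so this version is simple, not strictly linear.

```python
def euler_circuit(graph, start):
    """Return an Euler circuit as a list of vertices, assuming one exists.

    Each undirected edge must appear once in both endpoints' neighbor lists.
    """
    remaining = {v: list(neighbors) for v, neighbors in graph.items()}  # unused edges
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            w = remaining[v].pop()  # follow an unused edge (v, w)
            remaining[w].remove(v)  # mark the same edge as used from w's side
            stack.append(w)
        else:
            circuit.append(stack.pop())  # no unused edges at v: it joins the circuit
    return circuit[::-1]


# Two triangles sharing vertex A; every vertex has even degree.
graph = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C"],
    "C": ["B", "A"],
    "D": ["A", "E"],
    "E": ["D", "A"],
}
circuit = euler_circuit(graph, "A")
print(circuit)
```

The returned list starts and ends at the starting vertex and has one more entry than the graph has edges.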