# Graphs
In this lecture we will discuss several common problems in graph theory. The algorithms we discuss are widely used in real-life applications. We will cover:
1. A brief introduction to graphs
2. Topological Sort
3. Shortest-Path Algorithms
4. Network Flow Problems
5. Minimum Spanning Tree
6. Depth First Search

## Definitions
A **graph** $G = (V,E)$ consists of a set of **vertices**, $V$, and a set of **edges**, $E$. Each edge is a pair $(v,w)$, where $v,w \in V$. If the pairs are ordered, the graph is **directed**, and the matrix of edges need not be symmetric. A vertex $w$ is **adjacent** to $v$ if and only if $(v,w) \in E$. In an **undirected** graph the pairs are unordered, so $(v,w) \in E \rightarrow (w,v) \in E$; in other words, the matrix of edges is symmetric. Sometimes an edge has a third component, known as either a **weight** or a **cost**. <br>

A **path** in a graph is a sequence of vertices $w_1,w_2,w_3,...,w_n$ such that $(w_i,w_{i+1}) \in E$ for $1 \leq i < n$. The **length** of a path is the number of edges on the path, which is equal to $n-1$. We also allow a path from a vertex to itself that contains no edges; such a path has length 0. If the graph contains an edge $(v,v)$ from a vertex to itself, then the path $v,v$ is referred to as a **loop**. A **simple path** is a path such that all vertices are distinct, except that the first and last vertex could be the same. <br>
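
The path definitions above can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a Python set of ordered pairs; the helper names (`is_path`, `path_length`, `is_simple_path`) and the edge set `E` are made up for illustration.

```python
def is_path(edges, vertices):
    """True if every consecutive pair of vertices is joined by an edge."""
    return all((a, b) in edges for a, b in zip(vertices, vertices[1:]))


def path_length(vertices):
    """Number of edges on the path: n - 1 for a sequence of n vertices."""
    return len(vertices) - 1


def is_simple_path(vertices):
    """True if all vertices are distinct, except that first and last may coincide."""
    closed = len(vertices) > 1 and vertices[0] == vertices[-1]
    interior = vertices[:-1] if closed else vertices
    return len(set(interior)) == len(interior)


# Hypothetical directed edge set, used only for illustration.
E = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "c")}

print(is_path(E, ["a", "b", "c"]), path_length(["a", "b", "c"]))
print(is_path(E, ["a", "c"]))
print(is_simple_path(["a", "b", "c", "a"]))
```

A single vertex such as `["a"]` counts as a path of length 0, and `["c", "c"]` is the loop on `c`.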

A **cycle** in a directed graph is a path of length at least 1 such that $w_1 = w_n$. This cycle is simple if the path is simple. For undirected graphs, we additionally require that the edges be distinct, so that the path $u,v,u$ is not a cycle: $(u,v)$ and $(v,u)$ are the same edge. So we define a cycle in an undirected graph as a path $p = w_1,w_2,...,w_n$ where $w_1 = w_n$ and $\forall(w_i,w_{i+1}) \in p\,,(w_{i+1},w_i) \not\in p$. In a directed graph, however, $(u,v)$ and $(v,u)$ are different edges, so the path $p=u,v,u$ with $(u,v),(v,u) \in E$ is a cycle. A directed graph is **acyclic** if it has no cycles; we also refer to such a graph as a **DAG** (directed acyclic graph). <br>

An undirected graph is **connected** if there is a path from every vertex to every other vertex. A directed graph with this property is called **strongly connected**. If a directed graph is not strongly connected, but the underlying graph (with edge directions ignored) is connected, then it is said to be **weakly connected**. A **complete graph** is a graph in which there is an edge between every pair of vertices. <br>

A real-world example that can be modeled as a graph is a road system. Each intersection is a vertex and each street is a directed edge. You could also associate a cost with each edge, such as the speed limit, the distance, or the time it takes to travel from one intersection to another.

## Representations of Graphs
One simple way to represent a graph is to use a two-dimensional array known as an **adjacency matrix**. For each edge $(u,v) \in E$, we set A\[u\]\[v\] to 1; every other entry is 0. If the edges have weights, we can store the weight in A\[u\]\[v\] instead, and for edges that don't exist store $\infty$ or $-\infty$ (depending on the problem we are trying to solve) or null. <br>
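
As a rough illustration of the adjacency matrix, the sketch below builds a weighted matrix from a list of edges. It assumes the vertices are numbered `0..num_vertices-1` and uses `math.inf` for missing edges; the function name `build_adjacency_matrix` and the sample edges are assumptions made for this example.

```python
import math


def build_adjacency_matrix(num_vertices, edges, directed=True):
    """Build a weighted adjacency matrix from (u, v, weight) triples."""
    # Start with "no edge" everywhere; math.inf is one possible sentinel.
    A = [[math.inf] * num_vertices for _ in range(num_vertices)]
    for u, v, w in edges:
        A[u][v] = w
        if not directed:
            A[v][u] = w  # an undirected graph gives a symmetric matrix
    return A


# Hypothetical edges, for illustration only.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2)]
A = build_adjacency_matrix(3, edges)
print(A[0][1], A[1][0])
```

Looking up whether an edge exists is a single index operation, but the matrix always takes $\mid V \mid^2$ entries regardless of how many edges the graph has.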

However, if the graph is **sparse**, meaning $\mid E \mid \ll \mid V \mid^2$, a better solution is to use an **adjacency list**: for each vertex we keep a list of its adjacent vertices. This is because the adjacency matrix of a sparse graph has mostly empty entries, taking up more space than necessary.
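
A minimal sketch of an adjacency list, assuming a plain Python dictionary that maps each vertex to the list of its adjacent vertices. The function name `build_adjacency_list` and the sample edges are illustrative only.

```python
def build_adjacency_list(edges, directed=True):
    """Build a dict mapping each vertex to a list of its adjacent vertices."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])  # make sure v appears even with no outgoing edges
        if not directed:
            adj[v].append(u)
    return adj


# Hypothetical edges, for illustration only.
edges = [("a", "b"), ("a", "c"), ("c", "b")]
adj = build_adjacency_list(edges)
print(adj)
```

The storage used is proportional to $\mid V \mid + \mid E \mid$, which is why this representation suits sparse graphs.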

## Topological Sort
A **topological sort** is an ordering of vertices in a directed acyclic graph, such that if there is a path from $v_i$ to $v_j$, then $v_j$ appears *after* $v_i$ in the ordering. The ordering is not necessarily unique; a directed acyclic graph may have multiple legal orderings. It is easy to see that a topological ordering cannot exist for a graph with a cycle, because for two vertices $v$ and $w$ on the cycle, $v$ precedes $w$ and $w$ precedes $v$. If we look at the graph below, $v_1, v_2, v_5, v_4, v_3, v_7, v_6$ and $v_1, v_2, v_5, v_4, v_7, v_6, v_3$ are both topological orderings.
<img src="./files/Graphs/topological_sort.png" width="400"/>

A simple algorithm to find a topological ordering starts by finding any vertex with no incoming edges. We define the **indegree** of a vertex $v$ as $\mid\{(u,v) \in E\}\mid$, the number of edges going into $v$. Assume that each vertex keeps track of its indegree, which we can easily calculate. The idea is to look for a vertex with an indegree of 0 and assign it the next position in the ordering. Whenever we find one, we decrease the indegree of each of its adjacent vertices by one, then find another vertex that has indegree 0 and does not already have an ordering. We repeat this until all the vertices have an ordering. Let's look at the code below:
```python
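def find_next_vertex(V):
    # Scan for a vertex with indegree 0 that has not been ordered yet
    for v in V:
        if v.indegree == 0 and v.ordering is None:
            return v

    return None

def top_sort(V):
    # V is the list of vertex objects, each with indegree, ordering and adjacent
    for i in range(len(V)):
        v = find_next_vertex(V)
        if v is None:
            # No unordered vertex has indegree 0, so the graph has a cycle
            raise ValueError("graph has a cycle")
        v.ordering = i
        for w in v.adjacent:
            w.indegree -= 1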
```

While this code works, it is inefficient: the running time is $O(V^2)$, since we scan the entire list of vertices each time we want to find a new vertex. Because we only decrease the indegrees of the vertices adjacent to the current vertex, any vertex whose indegree has just dropped to 0 must be among those neighbors. Therefore we can use a queue to hold every vertex we encounter with an indegree of 0 while iterating through the list of neighbors. A vertex is enqueued only once, at the moment its indegree reaches 0, so we never process a vertex that already has an ordering. If the graph has a cycle, the vertices on the cycle never reach indegree 0, so the queue empties before every vertex has an ordering. The topological ordering is then the order in which the vertices are dequeued. This algorithm is also faster: it runs in $O(V+E)$ if we use an adjacency list. Let's look at the algorithm with queues.
```python
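from collections import deque

def top_sort(V):
    # V is the list of vertex objects, each with indegree, ordering and adjacent
    q = deque()
    counter = 0  # keeps track of our ordering

    # Gather the initial vertices: those with no incoming edges
    for v in V:
        if v.indegree == 0:
            q.append(v)

    while q:
        v = q.popleft()
        v.ordering = counter
        counter += 1

        for w in v.adjacent:
            w.indegree -= 1
            if w.indegree == 0:
                q.append(w)

    # With a cycle, the queue empties before every vertex is ordered
    if counter != len(V):
        raise ValueError("graph has a cycle")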
```
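
To try the queue-based approach end to end, here is a self-contained sketch of the same idea (commonly known as Kahn's algorithm). It assumes the graph is a dictionary mapping every vertex to a list of its adjacent vertices, and it computes the indegrees itself rather than storing them on vertex objects. The function name `topological_order` and the sample graph are made up for illustration.

```python
from collections import deque


def topological_order(adj):
    """Return a topological ordering of the vertices in adj.

    adj maps every vertex (including those with no outgoing edges)
    to a list of its adjacent vertices.
    """
    # Count incoming edges for every vertex.
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1

    queue = deque(v for v in adj if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    # Vertices on a cycle never reach indegree 0, so they are never ordered.
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical DAG, for illustration only.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_order(dag))  # ['a', 'b', 'c', 'd']
```

Each vertex is enqueued and dequeued at most once and each edge is examined a constant number of times, which matches the $O(V+E)$ bound discussed above.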

## Shortest-Path Algorithms
A popular problem in graph theory is finding the shortest path from an initial vertex $v$ to another vertex $w$. You can think of this as how Google Maps finds the shortest route when you look up directions. In these problems the input is a weighted graph where each edge $(v_i,v_j)$ has an associated cost $c_{i,j}$ to traverse (or take) that edge. The cost of a path $v_1v_2...v_n$ is $\sum_{i=1}^{n-1} c_{i,i+1}$. We refer to this as the **weighted path length**. The **unweighted path length** is just the number of edges on the path, which is $n-1$. <br>
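
The two path lengths can be computed directly from their definitions. This is a small sketch, assuming the edge costs are stored in a dictionary keyed by `(u, v)` pairs; the function names and the sample costs are hypothetical.

```python
def weighted_path_length(cost, path):
    """Sum of the edge costs c[i, i+1] along the path."""
    return sum(cost[(u, v)] for u, v in zip(path, path[1:]))


def unweighted_path_length(path):
    """Number of edges on the path: n - 1 for n vertices."""
    return len(path) - 1


# Hypothetical edge costs, for illustration only.
cost = {("s", "a"): 2, ("a", "b"): 3, ("s", "b"): 7}

print(weighted_path_length(cost, ["s", "a", "b"]))  # 5
print(weighted_path_length(cost, ["s", "b"]))       # 7
print(unweighted_path_length(["s", "a", "b"]))      # 2
```

In this example the path with more edges is the cheaper one, which is why the weighted and unweighted problems can have different answers.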

##### Single-Source Shortest-Path Problem
Given as input a weighted graph, $G = (V,E)$, and a distinguished vertex, $s$, find the shortest weighted path from $s$ to every other vertex in $G$. <br>

For example, in the following graph the shortest weighted path from $v_1$ to $v_6$ is $v_1,v_4,v_7,v_6$, with a cost of 6.
<img src="./files/Graphs/ex_one_shortest_path.png" width="400"/>

However, what happens if the graph has negative-cost edges, like the graph below?
<img src="./files/Graphs/ex_two_shortest_path_neg.png" width="400"/>