# Graph

A **graph** is a pair of sets `(V, E)`, where `V` is the set of vertices (nodes) and `E` is the set of edges. 

Example:

`V = {a,b,c,d,e,f,g,h,i}`
 
`E ={(a,b); (b,c); (c,e); (e,h); (h,i); (c,i)}`

## Adjacency Matrix

![alternatvie text](https://www.programiz.com/sites/tutorial2program/files/adjacency-matrix-graph.png)

![image.png](https://www.programiz.com/sites/tutorial2program/files/adjacency-matrix-representation_1.png)
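
The matrix representation from the figures above can be sketched as follows. This is only an illustration: it assumes the vertices are numbered `0..n-1`, and the helper name `adjacency_matrix` is introduced here, not taken from the notebook. The cell that follows stores the graph as an adjacency list (a dict of neighbour lists) instead.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected graph.

    Assumption of this sketch: vertices are the integers 0..n-1.
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected graph: the matrix is symmetric
    return matrix


edges = [(0, 1), (1, 2), (0, 2), (0, 3)]
for row in adjacency_matrix(4, edges):
    print(row)
```

The matrix needs $O(|V|^2)$ memory but answers "is there an edge (u, v)?" in constant time, while the adjacency list needs $O(|V|+|E|)$ memory.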

In [6]:
class UndirectedGraph:
    def __init__(self):
        self.graph = {}
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        if v not in self.graph:
            self.graph[v] = []
        self.graph[u].append(v)
        self.graph[v].append(u)

        
g = UndirectedGraph()
g.addEdge(0, 1)
g.addEdge(1, 2)
g.addEdge(0, 2)
g.addEdge(0, 3)
print(g.graph)

{0: [1, 2, 3], 1: [0, 2], 2: [1, 0], 3: [0]}


## Definitions

**A directed graph**, or digraph, is a graph in which edges have orientations.

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Directed_graph_no_background.svg/1280px-Directed_graph_no_background.svg.png)


In [11]:
class DirectedGraph:
    def __init__(self):
        self.graph = {}
        
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append(v)

        
g = DirectedGraph()
g.addEdge(0, 1)
g.addEdge(1, 2)
g.addEdge(0, 2)
g.addEdge(0, 3)
print(g.graph)

{0: [1, 2, 3], 1: [2]}


**A weighted graph** is a graph in which a number (the weight) is assigned to each edge.

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Weighted_network.svg/1920px-Weighted_network.svg.png)


In [12]:
class WeightedGraph:
    def __init__(self):
        self.graph = {}
        
    def addEdge(self, u, v, w):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append([v,w])

        
g = WeightedGraph()
g.addEdge(0, 1, 1)
g.addEdge(1, 2, -4)
g.addEdge(0, 2, 2)
g.addEdge(0, 3, 1)
print(g.graph)

{0: [[1, 1], [2, 2], [3, 1]], 1: [[2, -4]]}


**A multigraph** is a graph that is permitted to have multiple edges (also called parallel edges), that is, edges that have the same end nodes. Thus, two vertices may be connected by more than one edge.

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Multi-pseudograph.svg/800px-Multi-pseudograph.svg.png)



## Node degree

**The degree** of a vertex is the number of edges incident to this vertex.

**Odd vertex** v: degree(v) is odd

**Even vertex** v: degree(v) is even

For directed graphs:

**Indegree** - the number of incoming edges

**Outdegree** - the number of outgoing edges

A node is **balanced** if its indegree equals its outdegree

A node is **semi-balanced** if its indegree differs from its outdegree by exactly 1

A graph is balanced if all of its nodes are balanced.

In [21]:
class DirectedGraph:
    def __init__(self):
        self.graph = {}
        self.degree = {}
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        if u not in self.degree:            
            self.degree[u] = [0, 0] # [outdegree, indegree]
        if v not in self.degree:            
            self.degree[v] = [0, 0] # [outdegree, indegree]
            
        self.graph[u].append(v)
        self.degree[u][0] += 1 # increment u outdegree
        self.degree[v][1] += 1 # increment v indegree
    def isBalanced(self, u):
        return self.degree[u][0] == self.degree[u][1]
    def isSemiBalanced(self, u):
        return abs(self.degree[u][0] - self.degree[u][1]) == 1

            
g = DirectedGraph()
g.addEdge(0, 1)
g.addEdge(1, 2)
g.addEdge(0, 2)
g.addEdge(0, 3)

print(g.degree[2])
print(g.isBalanced(2))
print(g.isBalanced(1))
print(g.isSemiBalanced(3))

[0, 2]
False
True
True


# DFS

**Depth-first search (DFS)** is an algorithm for traversing or searching tree or graph data structures. 

![alternatvie text](https://www.codesdope.com/staticroot/images/algorithm/dfs.gif)

Time complexity: $O(V+E)$

In [1]:
class Graph:
    def __init__(self):
        self.graph = {}
        
    # function to add an edge to graph
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append(v)
 
    def dfs(self, v, visited):
        visited.add(v)
        for neighbour in self.graph.get(v, []):  # a vertex without outgoing edges has no key
            if neighbour not in visited:
                self.dfs(neighbour, visited)
 
    def DFSMain(self, v):
        visited = set()
        self.dfs(v, visited)
        return visited
        
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.addEdge(4, 5)
g.addEdge(5, 6)
g.addEdge(6, 5)


print(g.DFSMain(2))


{0, 1, 2, 3}


In [10]:
visited = set()
for node in g.graph:
    if node not in visited:
        visited.update(g.DFSMain(node))

print(visited)

{0, 1, 2, 3, 4, 5, 6}


## Connectivity 

A **path** is a finite or infinite sequence of edges that joins a sequence of vertices.

Example:

Path from $v_0$ to $v_4$: 

{$v_0e_1v_1e_2v_2e_3v_3e_4v_4$}

A **simple path** is a path in a graph which does not have repeating vertices.

### Undirected graph connectivity

In an undirected graph G, two vertices u and v are called **connected** if G contains a path from u to v.

A graph is said to be **connected** if every pair of vertices in the graph is connected.

### Directed graph connectivity

A directed graph is called **weakly connected** if replacing all of its directed edges with undirected edges produces a connected (undirected) graph.

It is **strongly connected**, or simply strong, if it contains a directed path from u to v and a directed path from v to u for every pair of vertices u, v.

A **connected component** is a maximal connected subgraph of an undirected graph. Each vertex belongs to exactly one connected component.

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e1/Scc-1.svg/440px-Scc-1.svg.png)


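**We can count connected components in an undirected graph using the DFS algorithm:**

We start a DFS from every vertex that has not been visited yet. Each such start discovers exactly one new component, so we increase the number of connected components by 1. We repeat this until all vertices have been visited.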
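
A minimal sketch of this counting procedure, assuming the undirected graph is given as a dict that maps every vertex to the list of its neighbours (both directions stored). The function name `count_components` is illustrative, and the DFS is written iteratively with an explicit stack.

```python
def count_components(adj):
    """Count connected components of an undirected graph.

    adj maps every vertex to a list of its neighbours.
    """
    visited = set()
    components = 0
    for start in adj:
        if start in visited:
            continue
        components += 1      # start lies in a component not seen before
        visited.add(start)
        stack = [start]      # iterative DFS over this component
        while stack:
            v = stack.pop()
            for to in adj[v]:
                if to not in visited:
                    visited.add(to)
                    stack.append(to)
    return components


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3], 5: []}
print(count_components(adj))  # 3
```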

## Cycles

**A cycle** in a graph is a non-empty trail in which only the first and last vertices are equal. A directed cycle in a directed graph is a non-empty directed trail in which only the first and last vertices are equal. A graph without cycles is called an acyclic graph.

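**We can find cycles using DFS**

If during DFS we reach a gray vertex (one that has been entered but not yet left), the edge leading to it is a back edge, so a cycle exists. In an undirected graph the edge back to the DFS parent does not count.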
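
A sketch of this check for a directed graph, assuming an adjacency-list dict. The colour constants and the name `has_cycle` are introduced here for illustration.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # not visited / entered but not left / left


def has_cycle(adj):
    """Return True if the directed graph contains a cycle."""
    color = {}

    def dfs(v):
        color[v] = GRAY
        for to in adj.get(v, []):
            state = color.get(to, WHITE)
            if state == GRAY:                  # back edge to a vertex on the current path
                return True
            if state == WHITE and dfs(to):
                return True
        color[v] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and dfs(v) for v in adj)


print(has_cycle({0: [1], 1: [2], 2: [0]}))      # True
print(has_cycle({0: [1, 2], 1: [2], 2: []}))    # False
```

An edge to a black vertex is not a back edge, which is why two colours (visited / not visited) are not enough for directed graphs.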




## Spanning tree 

**A spanning tree T** of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G.

- We can get a spanning tree of a graph as a set of tree edges during DFS:

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Tree_edges.svg/1280px-Tree_edges.svg.png)
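
Collecting the tree edges can be sketched like this, assuming a connected undirected graph stored as an adjacency-list dict. The name `dfs_spanning_tree` is hypothetical.

```python
def dfs_spanning_tree(adj, root):
    """Return the tree edges (parent, child) found by a DFS from root."""
    visited = {root}
    tree_edges = []

    def dfs(v):
        for to in adj[v]:
            if to not in visited:
                visited.add(to)
                tree_edges.append((v, to))  # edge that first discovers `to`
                dfs(to)

    dfs(root)
    return tree_edges


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(dfs_spanning_tree(adj, 0))  # [(0, 1), (1, 2), (2, 3)]
```

For a connected graph the result always has $|V|-1$ edges.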


# Breadth first search

Vertices are processed in order of increasing distance from the starting vertex. A queue is used to store the nodes to visit.

- Black - vertex that has been extracted from the queue

- Gray - vertex that is in the queue

- White - vertex that has not yet been processed

![Title](https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif)

Time complexity: $O(V+E)$

In [17]:
from collections import deque
class Graph:
    def __init__(self):
        self.graph = {}
        self.vertices = set()
 
    # function to add an edge to graph
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append(v)
        self.vertices.add(u)
        self.vertices.add(v)

    def bfs(self, v):
        visited = {}
        for u in self.vertices:
            visited[u] = False
        queue = deque([])
        queue.append(v)
        visited[v] = True
        while queue:
            current_node = queue.popleft()
            print(current_node, end=' ')
            for u in self.graph.get(current_node, []):  # a vertex without outgoing edges has no key
                if not visited[u]:
                    queue.append(u)
                    visited[u] = True
                
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.bfs(2)

2 0 3 1 

#  Eulerian walk

The task is to find a path (or cycle) in a graph that passes through every edge exactly once. Such a path or cycle is called **Eulerian**.

![alternatvie text](https://networkx.org/nx-guides/_images/part1.png)



### G - connected undirected graph

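An Euler path exists if and only if G has at most two odd vertices (zero or two, since the number of odd vertices is always even).

An Euler cycle exists if and only if all vertices are even.


=> 

Each time the walk passes through an intermediate vertex it uses 2 of its edges (one to enter, one to leave), so every intermediate vertex of the walk must be even. Only the start and the finish of a path may be odd.

<=

The walk can be constructed explicitly, see "How to find an Eulerian path" below.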

###  G - connected directed graph

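An Euler cycle exists if and only if all vertices are balanced.

An Euler path exists if and only if at most two vertices are semi-balanced (the start with outdegree = indegree + 1 and the finish with indegree = outdegree + 1) and all other vertices are balanced.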


In [25]:
class DirectedGraph:
    def __init__(self):
        self.graph = {}
        self.degree = {}
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        if u not in self.degree:            
            self.degree[u] = [0, 0] # [outdegree, indegree]
        if v not in self.degree:            
            self.degree[v] = [0, 0] # [outdegree, indegree]
            
        self.graph[u].append(v)
        self.degree[u][0] += 1 # increment u outdegree
        self.degree[v][1] += 1 # increment v indegree
    def getDegreeCount(self):
        n_balanced = 0
        n_semibalanced = 0
        n_other = 0
        for u in self.degree:
            if self.degree[u][0] == self.degree[u][1]:
                n_balanced +=1
            elif abs(self.degree[u][0] - self.degree[u][1]) == 1:
                n_semibalanced +=1
            else:
                n_other+=1
        return n_balanced, n_semibalanced, n_other

g = DirectedGraph()
g.addEdge(0, 1)
g.addEdge(1, 2)
g.addEdge(0, 2)
g.addEdge(0, 3)

n_balanced, n_semibalanced, n_other = g.getDegreeCount()
print(n_balanced)
print(n_semibalanced)
print(n_other)

isConnected = True # we need to check if a directed graph is weakly connected

if isConnected and n_other == 0 and n_semibalanced == 2:
    print("Eulerian path exists")

if isConnected and n_other == 0 and n_semibalanced == 0:
    print("Eulerian cycle exists")


1
1
2


## How to find an Eulerian path: 

- Connect the two odd vertices (in a directed graph: the two semi-balanced vertices, with an edge from the path finish to the path start) with a new edge. The resulting graph now contains an Eulerian cycle.
- Start walking from an arbitrary vertex and move along some outgoing edge, removing this edge from the graph. Add the current node to the result path.
...
- Rotate the result so that it starts at the path start vertex and ends at the path finish vertex.




![alternatvie text](img/eulerian_walk.png)


In [39]:
class DirectedGraph:
    def __init__(self):
        self.graph = {}
        self.degree = {}
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        if u not in self.degree:            
            self.degree[u] = [0, 0] # [outdegree, indegree]
        if v not in self.degree:            
            self.degree[v] = [0, 0] # [outdegree, indegree]
            
        self.graph[u].append(v)
        self.degree[u][0] += 1 # increment u outdegree
        self.degree[v][1] += 1 # increment v indegree
        
    def getDegreeCount(self):
        n_balanced = 0
        n_semibalanced = 0
        n_other = 0
        for u in self.degree:
            if self.degree[u][0] == self.degree[u][1]:
                n_balanced +=1
            elif abs(self.degree[u][0] - self.degree[u][1]) == 1:
                n_semibalanced +=1
            else:
                n_other+=1
        return n_balanced, n_semibalanced, n_other
    def hasEulerianPath(self):
        # use self, not the global variable g
        n_balanced, n_semibalanced, n_other = self.getDegreeCount()
        return n_other == 0 and n_semibalanced == 2
    def hasEulerianCycle(self):
        n_balanced, n_semibalanced, n_other = self.getDegreeCount()
        return n_other == 0 and n_semibalanced == 0
    
    def isEulerian(self):
        n_balanced, n_semibalanced, n_other = self.getDegreeCount()
        isConnected = True # we need to check if a directed graph is weakly connected
        if isConnected and n_other == 0 and n_semibalanced == 2:
            return True
        if isConnected and n_other == 0 and n_semibalanced == 0:
            return True
        return False
    def FindStart(self):
        for u in self.degree:
            if self.degree[u][0] - self.degree[u][1] == 1:
                return u
    def FindFinish(self):
        for u in self.degree:
            if self.degree[u][1] - self.degree[u][0] == 1:
                return u
            
    def eulerianPath(self):
        if not self.isEulerian():
            return {}
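        # copy the adjacency lists too: dict.copy() is shallow and pop() below would empty self.graph
        g = {u: list(vs) for u, vs in self.graph.items()}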
        result_eulerian_path = []

        # 1. if eulerianPath connect start node with finish to create cycle
        if self.hasEulerianPath():
            self.start_node = self.FindStart()
            self.finish_node = self.FindFinish()
            if self.finish_node not in g:
                g[self.finish_node] = []
            g[self.finish_node].append(self.start_node)

        # 2. get Eulerian cycles
        start = next(iter(self.graph)) # pick arbitrary starting node

        def __visit(n):
            while len(g[n]) > 0:
                dst = g[n].pop()
                __visit(dst)
            result_eulerian_path.append(n)

        __visit(start)
        result_eulerian_path = result_eulerian_path[::-1][:-1]

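        if self.hasEulerianPath():
            # Rotate the cycle so that it is cut at the added edge finish -> start:
            # the start node may occur several times, so pick the occurrence preceded by finish
            n = len(result_eulerian_path)
            sti = next(i for i in range(n)
                       if result_eulerian_path[i] == self.start_node
                       and result_eulerian_path[i - 1] == self.finish_node)
            result_eulerian_path = result_eulerian_path[sti:] + result_eulerian_path[:sti]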

        # Return result_eulerian_path
        return result_eulerian_path

g = DirectedGraph()
g.addEdge(0, 1)
g.addEdge(1, 5)
g.addEdge(1, 2)
g.addEdge(2, 3)
g.addEdge(3, 1)

g.eulerianPath()

[0, 1, 2, 3, 1, 5]

# De novo Assembly

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/b/b6/Types_of_sequencing_assembly.png)




## Shortest common superstring

Given a set of strings S, find the shortest string that contains all strings in S as substrings.

The problem is NP-complete.

This problem is similar to building an assembly from reads.

The result of merging depends on the order in which we concatenate the strings:

![alternative text](img/scs_not_greedy.png)


For n strings there are n! different orderings to check.


## Suffix-prefix match

Reads that come from the same region of the genome can overlap:

![alternatvie text](img/suf_pref.png)

Overlaps may contain mismatches because of:

- sequencing errors
- polyploidy

Higher coverage gives more and longer overlaps.
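
A minimal sketch of an exact suffix-prefix match. It ignores mismatches, which a real overlapper would have to tolerate. The function name `overlap` and the parameter `min_length` are assumptions of this example.

```python
def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_length characters exists.
    Exact matching only: mismatches are not tolerated in this sketch.
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


print(overlap("TTACGT", "CGTGTGC"))      # 3  ("CGT")
print(overlap("TTACGT", "CGTGTGC", 4))   # 0  (overlap is shorter than 4)
```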

## Overlap graph

Vertices are sequences (reads). Two vertices are connected by a directed edge if they overlap: the edge goes from the vertex that has the common substring as a suffix to the vertex that has it as a prefix. Each edge is labeled with the length of the overlap.

We set a threshold on the minimum length of an overlap between nodes.

![alternative text](img/overlap_graph.png)



The sequence of the original genome can be built by walking a path that visits every vertex in this graph.


**Hamiltonian path** - a path that visits every vertex in the graph exactly once. Finding one is an NP-complete problem.

## Greedy algorithm

At each step we pick the edge that represents the longest overlap in the overlap graph and merge the two reads it connects.

The result is not always optimal.

![alternatvie text](img/scs_greedy.png)
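
The greedy merging can be sketched as below. This is an illustration under simplifying assumptions: overlaps are exact, all pairs are rescanned at every step (quadratic per step), and the names `overlap` and `greedy_scs` are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b (exact match)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_scs(reads, min_length=1):
    """Greedy superstring: repeatedly merge the pair with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            length = overlap(a, b, min_length)
            if length > best_len:
                best_len, best_a, best_b = length, a, b
        if best_len == 0:
            break                      # nothing overlaps any more
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return "".join(reads)              # leftovers are simply concatenated


print(greedy_scs(["ABC", "BCA", "CAB"]))  # CABCA
```

The result always contains every read, but it is a superstring, not necessarily the shortest one.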

**The overlap graph can collapse repeats that are present in the original string (genome) -> we want to update our model** 

![alternatvie text](img/overlap_graph_repeats.png)


##  De Bruijn graph

Building a De Bruijn graph helps us to overcome this repeat-collapsing problem.

Let's assume that our reads are k-mers from the genome.

Split each k-mer into its left and right (k-1)-mers, L and R (they are shifted by 1 base), and draw an edge from L to R.

Each k-mer in the genome corresponds to one **edge** in this graph.

![alternative text](img/de_bruijn.png)

To reconstruct the original sequence from the De Bruijn graph, we find an Eulerian path in the graph.

What if there is more than one Eulerian path in the graph?
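
A sketch of the construction, assuming error-free k-mers taken from every position of the genome. The helper names `kmers_of` and `de_bruijn` are illustrative. The result is a directed multigraph stored as an adjacency-list dict.

```python
def kmers_of(text, k):
    """All k-mers of text, in order of appearance."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def de_bruijn(kmers):
    """Build the De Bruijn graph: one edge L -> R per k-mer,
    where L and R are the left and right (k-1)-mers of the k-mer."""
    graph = {}
    for kmer in kmers:
        left, right = kmer[:-1], kmer[1:]
        graph.setdefault(left, []).append(right)
        graph.setdefault(right, [])    # make sure every node has a key
    return graph


print(de_bruijn(kmers_of("AAABBBA", 3)))
# {'AA': ['AA', 'AB'], 'AB': ['BB'], 'BB': ['BB', 'BA'], 'BA': []}
```

A k-mer that occurs several times in the genome yields several parallel edges, which is why repeats are not collapsed.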


## Topological sorting

Directed graph $G(V,E)$

Topological sorting is an ordering of the vertices V such that if $(u,v) \in E$ then u appears before v in the ordered array. There can be many valid topological orderings.

Analogy: vertices are actions. Some actions can occur in parallel, other actions must follow one after the other. We need to build a sequence of actions (schedule).

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Directed_acyclic_graph_2.svg/610px-Directed_acyclic_graph_2.svg.png)

- 5, 7, 3, 11, 8, 2, 9, 10 (visual top-to-bottom, left-to-right)
- 3, 5, 7, 8, 11, 2, 9, 10 (smallest-numbered available vertex first)
- 5, 7, 3, 8, 11, 10, 9, 2 (fewest edges first)
- 7, 5, 11, 3, 10, 8, 9, 2 (largest-numbered available vertex first)
- 5, 7, 11, 2, 3, 8, 9, 10 (attempting top-to-bottom, left-to-right)
- 3, 7, 8, 5, 11, 10, 2, 9 (arbitrary)

![alternatvie text](img/topological_sort_dfs.png)


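The graph is required to be acyclic.

To do topological sorting we run DFS and record the vertices in the order in which they turn black (are left). Reversing this order gives a topological order.


For each vertex v, we can set $\phi(v) = |V|+1-leave[v]$ as its position in the topologically sorted array, where $leave[v] \in \{1, \dots, |V|\}$ is the rank of v in the order of leaving.

Then $\phi(u) < \phi(v)$ for each $(u,v) \in E$.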
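
A sketch of DFS-based topological sorting, assuming the graph is a DAG given as an adjacency-list dict. The edges below are those of the example DAG in the figure above, and the name `topological_sort` is illustrative. No cycle check is performed.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in a topological order."""
    visited = set()
    leave_order = []

    def dfs(v):
        visited.add(v)
        for to in adj.get(v, []):
            if to not in visited:
                dfs(to)
        leave_order.append(v)      # v turns black

    for v in adj:
        if v not in visited:
            dfs(v)
    return leave_order[::-1]       # reversed order of leaving


adj = {5: [11], 7: [11, 8], 3: [8, 10], 11: [2, 9, 10], 8: [9]}
order = topological_sort(adj)
print(order)
```

Which of the valid orderings is produced depends on the order in which vertices and neighbours are iterated.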
 

## Kosaraju's algorithm

How to find strongly connected components in graph $G(V,E)$?

1) We construct a transpose graph $H = G^t$

2) DFS(H), collect all $leave_H[v]$

3) DFS(G), iterating over the start vertices in descending order of their $leave_H[v]$. Each DFS tree built by this call contains the vertices of exactly one strongly connected component.

![alternatvie text](img/kosaraju.png)
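
The three steps can be sketched as follows, assuming an adjacency-list dict for G. The function name `kosaraju` and the inner helpers are hypothetical. The recursion is kept for readability, so very deep graphs may hit Python's recursion limit.

```python
def kosaraju(adj):
    """Return the strongly connected components of a directed graph."""
    vertices = set(adj)
    for targets in adj.values():
        vertices.update(targets)

    # 1) transpose graph H = G^t
    transposed = {v: [] for v in vertices}
    for u, targets in adj.items():
        for v in targets:
            transposed[v].append(u)

    # 2) DFS(H): remember the order in which vertices are left
    visited = set()
    leave_order = []

    def dfs_h(v):
        visited.add(v)
        for to in transposed[v]:
            if to not in visited:
                dfs_h(to)
        leave_order.append(v)

    for v in vertices:
        if v not in visited:
            dfs_h(v)

    # 3) DFS(G) in descending order of leave_H: every tree is one SCC
    visited.clear()
    components = []

    def dfs_g(v, component):
        visited.add(v)
        component.append(v)
        for to in adj.get(v, []):
            if to not in visited:
                dfs_g(to, component)

    for v in reversed(leave_order):
        if v not in visited:
            component = []
            dfs_g(v, component)
            components.append(component)
    return components


adj = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [5], 5: [3]}
print(sorted(sorted(c) for c in kosaraju(adj)))  # [[0, 1, 2], [3, 4, 5]]
```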

<details>
  <summary>Proof (click to expand)</summary>
    
- First, we will prove that each strongly connected component (SCC) is completely contained in a tree:

Let s and t be vertices from one SCC. Then paths from s to t and from t to s exist. Let v be the first vertex that the DFS enters on the cycle s -> t -> s. Then at the moment entry[v], s and t are reachable from v via white paths, and according to Lemma 2 they will be processed in the same tree.

![alternatvie text](img/Kosaraju_1.png)

- Second, one tree contains only one SCC

If C is an SCC, we define leave[C] = max(leave[v]) over all $v \in C$.

Lemma

Let C and C' be SCCs and let an edge (u,v) go from C to C'. Then leave[C] > leave[C']. 

a) If C was entered before C': let w be the first vertex of C entered by the DFS. At the time of entering C, the entire component C' is white and reachable from w, so it is fully processed before w is left: leave[w] > leave[C'].

b) If C' was entered before C: no path from C' to C exists (otherwise C and C' would be one SCC) => the whole of C' will be processed before C is entered, and leave[C] > leave[C'].

![alternatvie text](img/Kosaraju_2.png)

Let's show that one tree T contains only one SCC.

Suppose T contains two SCCs, C and C', where C is the first component processed in DFS(G), and an edge (u,v) from C to C' exists.

$leave_H[C] > leave_H[C'] $ (This follows from how we constructed T: its root was chosen in descending order of $leave_H$.)

However, $leave_H[C] < leave_H[C'] $ follows from the Lemma applied to DFS(H), because in H the edge is reversed and goes from C' to C. This is a contradiction.

</details>

# Bridges

## Edge-connection

Vertices u and v in an *undirected graph* are **edge-connected** if there are two paths between them that share no edges (**edge-disjoint paths**).

![alternatvie text](img/edge_connection.png)

Edge-connectivity is an equivalence relation:

- $u \sim u$
- $u \sim v \implies v \sim u$
- $u \sim v, v \sim w \implies u \sim w$


Let's show transitivity:

1) $u \sim v \implies$ We define the cycle $C$ as the union of two edge-disjoint paths from u to v

2) $v \sim w \implies$  Let P1 and P2 be the two edge-disjoint paths from w to v. P1 and P2 first meet $C$ in 2 vertices, a and b; a or b can be the same vertex as v

We can build 2 edge-disjoint paths from w to u:

- u -> a -> w

- u -> v -> b -> w

![alternatvie text](img/edge_conn_trans.png)


All graph vertices are partitioned into edge-connected equivalence classes:

![alternatvie text](img/bridges.png)

A **bridge** is:

1) an edge connecting 2 different edge-connected components

2) an edge whose removal breaks up its connected component

3) an edge (u,v) that lies on every path connecting u with v.

$1 \implies 2$ : If we delete a bridge, we won't find a second path between the two vertices that it connected

$2 \implies 3$ : If another path from u to v that does not contain the bridge existed, then the connected component would not break up

$3 \implies 1$ : If u and v belonged to the same edge-connected component, then 2 edge-disjoint paths would exist between u and v, and one of them would avoid the edge


### Finding bridges in graph

We can find bridges in graph using DFS: 

Suppose that during DFS, we pass an edge from vertex v to vertex to: $(v,to)$

If, while performing DFS from the vertex $to$, we find no back edge that leads to the vertex v or to any of its ancestors, then $(v,to)$ is a bridge: there is no other path from v to *to* except the edge $(v,to)$.

![alternatvie text](img/dfs_bridges.png)

To find a bridge we need to check that this condition is met.

During DFS for each vertex v compute value $lowest[v]$:

$$lowest[v]=\min\begin{cases}
entry[v] \\
entry[p] \\
lowest[to]
\end{cases}$$

where:
- $entry$ is the entry time in DFS,
- $p$ is a gray vertex reached by a back edge $(v,p)$,
- $to$ is a child vertex of v in the spanning tree of the DFS.

$(v,to)$ is a bridge if, when leaving vertex $to$,

$lowest[to] > entry[v]$

That is, we failed to find **any vertex in the subtree of $to$** that has a back edge to a vertex whose entry time is at most $entry[v]$ (to $v$ itself or to one of its ancestors).

![alternatvie text](img/dfs_bridges_lowest.png)


In [1]:
class Graph:
    def __init__(self):
        self.graph = {}
        self.time = 0  # shared DFS clock, so that entry times are unique

    # function to add an undirected edge to graph (bridges are defined for undirected graphs)
    def addEdge(self, u, v):
        self.graph.setdefault(u, []).append(v)
        self.graph.setdefault(v, []).append(u)
 
    def dfs(self, v, visited, entry, lowest, p = -1):
        visited.add(v)
        self.time += 1
        entry[v] = self.time
        lowest[v] = self.time
        print(v, end=' ')
        for to in self.graph[v]:
            if to == p:
                continue  # do not go back along the tree edge to the parent
            if to not in visited:
                self.dfs(to, visited, entry, lowest, v)
                lowest[v] = min(lowest[v], lowest[to])
                if lowest[to] > entry[v]:
                    print(f"Bridge found ({v},{to})")
            else:
                # back edge: (v, to) closes a cycle
                lowest[v] = min(lowest[v], entry[to])
 
    def DFSMain(self, v):
        visited = set()
        entry = {}
        lowest = {}
        self.time = 0
        self.dfs(v, visited, entry, lowest)
        
g = Graph()
g.addEdge(0, 1)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(3, 4)
g.addEdge(4, 5)
g.addEdge(5, 3)
g.addEdge(0, 3)
g.DFSMain(2)

2 1 0 3 4 5 Bridge found (0,3)


# Articulation points

## Vertex-connection

Two edges in a graph are **vertex-connected** if two vertex-disjoint paths that connect their ends exist.

![alternatvie text](img/vertex_connection.png)

Vertex-connectivity is an equivalence relation on the edges of the graph:

- $(a,b) \sim (a,b)$
- $(a,b) \sim (c,d) \implies (c,d) \sim (a,b)$
- $(a,b) \sim (c,d), (c,d) \sim (e,f) \implies (a,b) \sim (e,f)$

Let's show transitivity:

- A path from f to d and a path from e to c form a cycle $C$
- P is the path from a to c and Q is the path from b to d, P and Q intersect the cycle C in vertices x and y

We can find two vertex-disjoint paths that connect b and e and a and f:

- Path1: b -> y -> e
- Path2: a -> x -> c -> d -> f

![alternatvie text](img/vert_conn_trans.png)


All graph edges are partitioned into vertex-connected equivalence classes:

![alternatvie text](img/art_points.png)


An **articulation point** is 

1) a vertex that is incident to edges belonging to two or more vertex-connected components

2) a vertex whose removal, together with the edges incident to it, breaks up its connected component

$1 \implies 2$ : Take any pair of vertices joined to the articulation point by edges from different vertex-connected components. No path connects these two vertices without passing through the articulation point, so if we remove the articulation point, the connected component breaks up

$2 \implies 1$ : Suppose that all edges incident to the removed vertex belong to one vertex-connected component. Then for any two neighbors of this vertex a path exists that connects them but does not contain the vertex. It means that all neighbors of the vertex stay in one connected component after its removal, which contradicts 2.

### Finding articulation points in graph

We can find articulation points in graph using DFS: 

1) Check all vertices except the starting one

2) Check starting vertex

- Let $v$ not be the starting vertex. If $v$ has a child in the DFS tree whose subtree has no back edge to any ancestor of $v$, then $v$ is an articulation point.
- If $v$ is the starting vertex, then it is an articulation point if and only if some neighbors of $v$ are still unvisited after the DFS from the first neighbor of $v$ has finished.


$$lowest[v]=\min\begin{cases}
entry[v] \\
entry[p] \\
lowest[to]
\end{cases}$$

If $v$ is not the starting vertex, $v$ is an articulation point if a child vertex $to$ exists such that, when leaving vertex $to$:

$lowest[to] \geq entry[v]$

If $v$ is the starting vertex, then it is an articulation point if, after processing one adjacent vertex, the graph still has unprocessed vertices connected to $v$. 

![alternatvie text](img/dfs_art_point_lowest.png)
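
Both rules can be sketched in one DFS, assuming a simple undirected graph (no parallel edges) stored as an adjacency-list dict with both directions. The name `articulation_points` is illustrative, and only the component of `start` is examined.

```python
def articulation_points(adj, start):
    """Return the articulation points in the component of start."""
    entry, lowest = {}, {}
    points = set()
    timer = 0

    def dfs(v, parent):
        nonlocal timer
        timer += 1
        entry[v] = lowest[v] = timer
        children = 0                      # children of v in the DFS tree
        for to in adj[v]:
            if to == parent:
                continue
            if to in entry:               # already visited: back edge
                lowest[v] = min(lowest[v], entry[to])
            else:
                dfs(to, v)
                children += 1
                lowest[v] = min(lowest[v], lowest[to])
                if parent is not None and lowest[to] >= entry[v]:
                    points.add(v)         # subtree of `to` cannot bypass v
        if parent is None and children > 1:
            points.add(v)                 # starting vertex with several DFS children

    dfs(start, None)
    return points


adj = {0: [1, 2, 3], 1: [0, 2], 2: [1, 0], 3: [4, 5, 0], 4: [3, 5], 5: [4, 3]}
print(articulation_points(adj, 2))  # {0, 3}
```

The starting vertex is handled by counting its children in the DFS tree: more than one child means that some neighbours were still unvisited after the first child's DFS finished.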


## Minimum spanning tree

**A minimum spanning tree** is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

![alternatvie text](https://he-s3.s3.amazonaws.com/media/uploads/146b47a.jpg)


## Prim's algorithm


At each step, we select the node with the minimum distance (edge weight) to the already constructed part of the minimum spanning tree.

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/9/9b/PrimAlgDemo.gif)

Prim's algorithm can be implemented using priority queue.


Steps:

1) Initialize the tree with an arbitrarily chosen vertex. Allocate a priority queue and add all nodes adjacent to the starting node. The priority of each item is the weight of the corresponding edge, and the edge itself is saved with the item

2) Dequeue the node with the highest priority (the smallest edge weight) and add the corresponding edge to the minimum spanning tree

3) For the node last added to the tree: add all adjacent vertices that are not yet in the tree to the queue. If a vertex is already in the queue and the new edge has a smaller weight, update its priority and rewrite its saved edge

Repeat steps 2 and 3 until all vertices are in the tree.

![alternatvie text](img/prim.png)

Time complexity = $O((|V|+|E|) \cdot \log |V|)$ for a binary heap.
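
A sketch of Prim's algorithm with the standard-library `heapq` module. Since `heapq` has no operation to update a priority, this variant (an assumption of the sketch, not the update-in-place scheme described above) pushes every candidate edge and skips stale entries when they are popped. The graph is assumed to be connected, undirected, and stored as `vertex -> [(neighbour, weight), ...]`.

```python
import heapq


def prim(adj, start):
    """Return (total weight, list of tree edges) of a minimum spanning tree."""
    in_tree = {start}
    tree_edges = []
    total = 0
    heap = [(w, start, to) for to, w in adj[start]]   # (weight, from, to)
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                 # stale entry: v was reached by a cheaper edge
        in_tree.add(v)
        tree_edges.append((u, v, w))
        total += w
        for to, weight in adj[v]:
            if to not in in_tree:
                heapq.heappush(heap, (weight, v, to))
    return total, tree_edges


adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2)],
    2: [(1, 2), (0, 4), (3, 3)],
    3: [(2, 3)],
}
print(prim(adj, 0))  # (6, [(0, 1, 1), (1, 2, 2), (2, 3, 3)])
```

The heap may hold up to $|E|$ entries, so this variant runs in $O(|E| \cdot \log |E|)$, which is the same order as $O(|E| \cdot \log |V|)$.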


## Kruskal's algorithm

1) Sort all edges of the graph by weight in ascending order

2) Init |V| trees - each vertex is a tree

3) Iterate over the sorted list of edges:

- If an edge connects nodes from different trees - add it to the spanning tree and merge the two trees into one

- If an edge connects nodes from the same tree - skip the edge

![alternative text](img/kruskal.png)

We want to quickly check whether two vertices are in the same tree.

We want to quickly merge two sets of vertices.
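
A deliberately naive sketch of the steps above, assuming vertices `0..n-1` and edges given as `(weight, u, v)` tuples. Each vertex carries the label of its tree and merging relabels one whole tree, which costs $O(|V|)$ per merge. The name `kruskal` and the `tree_id` list are illustrative.

```python
def kruskal(n, edges):
    """Return the edges (u, v, weight) of a minimum spanning forest."""
    tree_id = list(range(n))          # tree_id[v] = label of the tree containing v
    mst = []
    for w, u, v in sorted(edges):     # ascending by weight
        if tree_id[u] == tree_id[v]:
            continue                  # same tree: the edge would close a cycle
        mst.append((u, v, w))
        old, new = tree_id[v], tree_id[u]
        for x in range(n):            # naive merge: relabel the whole tree of v
            if tree_id[x] == old:
                tree_id[x] = new
    return mst


edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (3, 2, 3)]
print(kruskal(4, edges))  # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```

The linear-time relabelling is exactly the part that a faster set structure is meant to replace.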


##  Disjoint Set Union (DSU)

Interface:

1) Create(u) - create a set with one element u

2) Find(u) - get the identifier of the set to which the element u belongs. Follow parent pointers back to the root

3) Union(u,v) - get the union of two sets: a set with an element u and a set with an element v. Make one root a child of the other. 

Each set is stored as a tree. Each element in the set is connected to the reference element.

![alternatvie text](https://www2.hawaii.edu/~nodari/teaching/s18/Notes/Topic-16/disjoint-set-union-alt.jpg)


### Path Compression

While the find method is running, all vertices encountered on the path will update their reference element. They will point to the root.

![alternatvie text](https://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture18/img035.gif)


### Union by Rank
How to select the root element when merging two trees?

The **rank** of a tree is its height (an upper bound on the height once path compression is used). When merging, make the root of the tree with the smaller rank a child of the root of the tree with the larger rank.

- When two trees of equal rank are united, the rank of the resulting tree is incremented by one.

- When two trees of unequal rank are united, the tree with the lower rank becomes the child, and the ranks are unchanged. 

![alternatvie text](img/dsu_rank.png)
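
The interface with both optimizations can be sketched as a small class. The class name `DSU` and the lower-case method names are assumptions of this example. `union` returns whether a merge actually happened, which is convenient for Kruskal's algorithm.

```python
class DSU:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def create(self, u):
        """Create a set with the single element u."""
        self.parent[u] = u
        self.rank[u] = 0

    def find(self, u):
        """Return the root (identifier) of the set containing u."""
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:       # path compression
            next_u = self.parent[u]
            self.parent[u] = root
            u = next_u
        return root

    def union(self, u, v):
        """Merge the sets of u and v; return False if they were already one set."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru                 # ru is now the root of larger rank
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        return True


dsu = DSU()
for x in range(5):
    dsu.create(x)
print(dsu.union(0, 1), dsu.union(1, 2), dsu.union(0, 2))  # True True False
print(dsu.find(0) == dsu.find(2), dsu.find(3) == dsu.find(0))  # True False
```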


##  Shortest path problem

**The shortest path** problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized.

In an unweighted graph (the weight of each edge is 1) we can use a modified version of the BFS algorithm. 
At each moment during BFS, the queue first holds vertices at distance k from the start, followed by vertices at distance k + 1. While doing the breadth-first search we store the predecessor of each vertex so that the path can be restored. 


In [25]:
from collections import deque
class Graph:
    def __init__(self):
        self.graph = {}
        self.vertices = set()
 
    # function to add an edge to graph
    def addEdge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append(v)
        self.vertices.add(u)
        self.vertices.add(v)

    def bfs(self, v):
        dist = {} # shortest distance to given vertex
        pi = {} # previous node from the shortest path
        for u in self.vertices:
            dist[u] = float("inf")
            pi[u] = -1
        queue = deque([])
        queue.append(v)
        dist[v] = 0
        pi[v] = -1
        while queue:
            current_node = queue.popleft()
            for u in self.graph.get(current_node, []):  # a vertex without outgoing edges has no key
                if dist[u] > dist[current_node]+1:
                    dist[u] = dist[current_node]+1
                    pi[u] = current_node
                    queue.append(u)
        print(dist)
                
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.bfs(2)

{0: 1, 1: 2, 2: 0, 3: 1}
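
Restoring the path from the stored predecessors can be sketched as a standalone function. It assumes an adjacency-list dict, and the name `shortest_path` is introduced here for illustration. The predecessor map doubles as the visited set.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return a shortest path source -> target as a list of vertices, or None."""
    pi = {source: None}               # predecessor on a shortest path
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            break
        for to in adj.get(v, []):
            if to not in pi:          # first visit is along a shortest path
                pi[to] = v
                queue.append(to)
    if target not in pi:
        return None                   # target is unreachable
    path = []
    v = target
    while v is not None:              # walk the predecessors back to the source
        path.append(v)
        v = pi[v]
    return path[::-1]


adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: [3]}
print(shortest_path(adj, 1, 3))  # [1, 2, 3]
print(shortest_path(adj, 3, 0))  # None
```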


## Dijkstra's algorithm

Dijkstra's algorithm can be used to solve shortest path problem in weighted graph.

- $G$ - directed weighted graph
- all weights are **non-negative**.


Dijkstra's algorithm works like BFS, but uses a priority queue instead of a regular queue. Distance to the node is set as a priority of this node in a priority queue. If we reach a vertex that is already in the queue using shorter path, then this vertex will change its position in the priority queue.

The next vertex extracted from the queue is the unvisited vertex closest to the start.

Relaxation function - updates the minimum known distance to a vertex v if going through the edge (u,v) gives a shorter path

In [10]:
# Relaxation
# d - shortest path weight, w -weights, pi - parent from the shortest path, u, v - vertices
def relax(d, w, pi, u, v):
    if d[v] > d[u] + w[(u,v)]:
        d[v] = d[u] + w[(u,v)]
        pi[v] = u

![alternatvie text](https://upload.wikimedia.org/wikipedia/commons/5/57/Dijkstra_Animation.gif)


Look at the implementation at https://github.com/denizetkar/priority-queue/blob/main/graph.py

T = adding each node to the queue + updating a node in the priority queue for each edge. We use a binary heap to implement the priority queue.

$T = O(E \cdot \log V + V \cdot \log V) = O((E+V) \cdot \log V)$
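
A self-contained sketch with the standard-library `heapq` module. It is not the linked implementation: instead of changing a vertex's position in the queue, it pushes a new entry and ignores outdated ones when they are popped (an assumption of this sketch). The graph is assumed to be a dict `vertex -> [(neighbour, weight), ...]` with non-negative weights.

```python
import heapq


def dijkstra(adj, source):
    """Return (dist, pi): shortest distances from source and predecessors."""
    dist = {source: 0}
    pi = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # outdated entry, u was already settled
        for v, w in adj.get(u, []):
            if v not in dist or dist[v] > d + w:   # relaxation of edge (u, v)
                dist[v] = d + w
                pi[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pi


adj = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
dist, pi = dijkstra(adj, 0)
print(dist[1], dist[2], dist[3])  # 3 1 4
```

Unreachable vertices simply do not appear in `dist`.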

### Shortest path problem classification

- Single pair shortest path problem - finding the shortest path between two vertices:

Dijkstra's algorithm, A*, IDA*
- Single source shortest paths problem - finding the shortest paths from the selected vertex to all the others:

Dijkstra's algorithm, Bellman–Ford algorithm
- All pairs shortest paths problem - finding shortest paths between all pairs of vertices:

Floyd–Warshall algorithm, Johnson's algorithm

