# <center>Graph Search, Shortest Paths, and Data Structures</center>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Breath-First-Search-(BFS)" data-toc-modified-id="Breath-First-Search-(BFS)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Breath First Search (BFS)</a></span><ul class="toc-item"><li><span><a href="#Algorithm-description" data-toc-modified-id="Algorithm-description-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Algorithm description</a></span></li><li><span><a href="#Shortest-Paths" data-toc-modified-id="Shortest-Paths-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Shortest Paths</a></span></li><li><span><a href="#Connected-Components-via-BFS" data-toc-modified-id="Connected-Components-via-BFS-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Connected Components via BFS</a></span></li></ul></li><li><span><a href="#Deep-First-Search-(DFS)" data-toc-modified-id="Deep-First-Search-(DFS)-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Deep First Search (DFS)</a></span><ul class="toc-item"><li><span><a href="#Algorithm-description" data-toc-modified-id="Algorithm-description-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Algorithm description</a></span></li><li><span><a href="#Strictly-Connected-Components-(SCC)" data-toc-modified-id="Strictly-Connected-Components-(SCC)-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Strictly Connected Components (SCC)</a></span></li><li><span><a href="#Course-example" data-toc-modified-id="Course-example-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Course example</a></span></li></ul></li><li><span><a href="#Dijkstra" data-toc-modified-id="Dijkstra-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Dijkstra</a></span><ul class="toc-item"><li><span><a href="#Presentation" data-toc-modified-id="Presentation-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Presentation</a></span></li><li><span><a href="#Properties" data-toc-modified-id="Properties-3.2"><span 
class="toc-item-num">3.2&nbsp;&nbsp;</span>Properties</a></span></li><li><span><a href="#Tests" data-toc-modified-id="Tests-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Tests</a></span></li></ul></li><li><span><a href="#Heaps-&amp;-Balanced-Binary-Search-Trees" data-toc-modified-id="Heaps-&amp;-Balanced-Binary-Search-Trees-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Heaps &amp; Balanced Binary Search Trees</a></span><ul class="toc-item"><li><span><a href="#Heap-properties" data-toc-modified-id="Heap-properties-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Heap properties</a></span></li><li><span><a href="#Median-Maintenance" data-toc-modified-id="Median-Maintenance-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Median Maintenance</a></span></li></ul></li><li><span><a href="#Hashing-and-Bloom-Filters" data-toc-modified-id="Hashing-and-Bloom-Filters-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Hashing and Bloom Filters</a></span></li></ul></div>

# Graph Search And Connectivity
<img src="Images/BFS_vs_DFS.png" width="600"/>

## Breadth-First Search (BFS)

### Algorithm description
<img src="Images/BFS_Code.png" width="600"/>

In [1]:
from BFS import *

graph_example = {'s': ['a', 'b'],
                 'a': ['s', 'c'],
                 'b': ['s', 'c', 'd'],
                 'c': ['a', 'b', 'd', 'e'],
                 'd': ['b', 'c', 'e'],
                 'e': ['c', 'd']}

# Example as in the course 
start_vertex = 's'; BFS_display(graph_example, start_vertex)
start_vertex = 'a'; BFS_display(graph_example, start_vertex)
start_vertex = 'e'; BFS_display(graph_example, start_vertex)

Start exploration with the node s
- Exploring Order: ['s', 'a', 'b', 'c', 'd', 'e']
- Layers: {0: ['s'], 1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']} 

Start exploration with the node a
- Exploring Order: ['a', 's', 'c', 'b', 'd', 'e']
- Layers: {0: ['a'], 1: ['s', 'c'], 2: ['b', 'd', 'e']} 

Start exploration with the node e
- Exploring Order: ['e', 'c', 'd', 'a', 'b', 's']
- Layers: {0: ['e'], 1: ['c', 'd'], 2: ['a', 'b'], 3: ['s']} 
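The `BFS` module itself is not shown here, so the following is only a minimal sketch of how a layered BFS could be written, not the module's actual code. The function name `bfs_layers` is hypothetical. It uses a FIFO queue and records, for each vertex, the layer (distance from the start) at which it was discovered.

```python
from collections import deque

def bfs_layers(graph, start):
    """Return (exploration order, layers) for a BFS from `start`.

    `graph` is an adjacency-list dict, as in the cells above.
    """
    dist = {start: 0}          # vertex -> layer index; doubles as the "seen" set
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()    # FIFO: oldest discovered vertex first
        order.append(v)
        for w in graph[v]:
            if w not in dist:  # first time we see w
                dist[w] = dist[v] + 1
                queue.append(w)
    layers = {}
    for v in order:
        layers.setdefault(dist[v], []).append(v)
    return order, layers

graph_example = {'s': ['a', 'b'],
                 'a': ['s', 'c'],
                 'b': ['s', 'c', 'd'],
                 'c': ['a', 'b', 'd', 'e'],
                 'd': ['b', 'c', 'e'],
                 'e': ['c', 'd']}

order, layers = bfs_layers(graph_example, 's')
```

Because every vertex enters the queue at most once and every adjacency list is scanned once, the running time is linear in the number of vertices plus edges.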



### Shortest Paths    
<img src="Images/BFS_ShortestPath.png" width="600"/>

In [2]:
start_vertex = 's'; target_vertex = 'e'
BFS_display(graph_example, start_vertex, target_vertex)

Start exploration with the node s
- Exploring Order: ['s', 'a', 'b', 'c', 'd', 'e']
- Layers: {0: ['s'], 1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']} 

BFS shortest path: start vertex s and target vertex e
Distance path: 3 and shortest path: ['s', 'b', 'c', 'e']
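One way to recover an actual shortest path, and not only its length, is to store the vertex from which each vertex was first discovered and walk those parent pointers back from the target. The sketch below assumes an unweighted adjacency-list graph; `bfs_shortest_path` is a hypothetical name, not a function from the `BFS` module. Shortest paths are not unique in general, so a correct implementation may return a different path of the same length as the one printed above.

```python
from collections import deque

def bfs_shortest_path(graph, start, target):
    """Return one shortest path (fewest edges) from start to target, or None."""
    parent = {start: None}       # vertex -> vertex it was discovered from
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            break
        for w in graph[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    if target not in parent:     # target unreachable from start
        return None
    path = []
    node = target
    while node is not None:      # walk parent pointers back to the start
        path.append(node)
        node = parent[node]
    return path[::-1]

graph_example = {'s': ['a', 'b'],
                 'a': ['s', 'c'],
                 'b': ['s', 'c', 'd'],
                 'c': ['a', 'b', 'd', 'e'],
                 'd': ['b', 'c', 'e'],
                 'e': ['c', 'd']}

path = bfs_shortest_path(graph_example, 's', 'e')
distance = len(path) - 1         # number of edges on the path
```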


### Connected Components via BFS
<img src="Images/BFS_Connected_Components.png" width="600"/>

In [3]:
graph_SCC_example = {1: [3, 5],
                     2: [4],
                     3: [1, 5],
                     4: [2],
                     5: [1, 3, 7, 9],
                     6: [8, 10],
                     7: [5],
                     8: [6, 10],
                     9: [5],
                     10:[6, 8]
}

number_SCCs, SCCs = BFS_SCC(graph_SCC_example)
print(f'There are {number_SCCs} connected components in the graph:')
for SCC in SCCs:
    print(f'*{SCC}')

There are 3 connected components in the graph:
*[2, 4]
*[1, 3, 5, 7, 9]
*[6, 8, 10]
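For an undirected graph, the components can be found by looping over all vertices and starting a fresh BFS from each vertex that has not been reached yet. The following is a sketch under that assumption; `connected_components` is a hypothetical name, and the order in which components are reported depends on the iteration order of the dictionary, so it may differ from the output above.

```python
from collections import deque

def connected_components(graph):
    """Return the connected components of an undirected graph as sorted lists."""
    seen = set()
    components = []
    for source in graph:
        if source in seen:
            continue                 # already part of an earlier component
        seen.add(source)
        component = []
        queue = deque([source])
        while queue:                 # plain BFS restricted to unseen vertices
            v = queue.popleft()
            component.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components

graph_SCC_example = {1: [3, 5], 2: [4], 3: [1, 5], 4: [2],
                     5: [1, 3, 7, 9], 6: [8, 10], 7: [5],
                     8: [6, 10], 9: [5], 10: [6, 8]}

components = connected_components(graph_SCC_example)
```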


## Depth-First Search (DFS)

### Algorithm description
<img src="Images/DFS_overview.png" width="600"/>
<img src="Images/DFS_Code.png" width="600"/>

In [4]:
from DFS import *

graph_example = {'s': ['a', 'b'],
                 'a': ['s', 'c'],
                 'b': ['s', 'c', 'd'],
                 'c': ['a', 'e', 'd'],
                 'd': ['b', 'c', 'e'],
                 'e': ['c', 'd']}

start_vertex = 's'
exploring_params = DFS_simple_exploration(graph_example, start_vertex)
print('The exploring order with DFS is:', exploring_params.explored)

The exploring order with DFS is: ['s', 'a', 'c', 'e', 'd', 'b']
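As an illustration only (the `DFS` module's code is not reproduced here), DFS can be sketched either recursively or with an explicit stack. Both function names below are hypothetical. The stack version pushes neighbours in reverse order and marks a vertex only when it is popped, which makes it visit vertices in the same order as the recursive version.

```python
def dfs_recursive(graph, start, seen=None, order=None):
    """Recursive DFS; returns vertices in the order they are first visited."""
    if seen is None:
        seen, order = set(), []
    seen.add(start)
    order.append(start)
    for w in graph[start]:
        if w not in seen:
            dfs_recursive(graph, w, seen, order)
    return order

def dfs_stack(graph, start):
    """Same traversal with an explicit LIFO stack instead of recursion."""
    seen, order = set(), []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in seen:            # a vertex may be pushed more than once
            continue
        seen.add(v)
        order.append(v)
        # Reverse so the first neighbour ends up on top of the stack.
        for w in reversed(graph[v]):
            if w not in seen:
                stack.append(w)
    return order

graph_example = {'s': ['a', 'b'],
                 'a': ['s', 'c'],
                 'b': ['s', 'c', 'd'],
                 'c': ['a', 'e', 'd'],
                 'd': ['b', 'c', 'e'],
                 'e': ['c', 'd']}

recursive_order = dfs_recursive(graph_example, 's')
stack_order = dfs_stack(graph_example, 's')
```

The explicit stack avoids Python's recursion limit, which matters on graphs with long paths.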


In [5]:
# On a graph with multiple connected components:
params = DFS_loop(graph_SCC_example)
print('Exploring order:', params.explored)
print('SCCs:', params.SCCs)

Exploring order: [10, 6, 8, 9, 5, 1, 3, 7, 4, 2]
SCCs: {10: [10, 6, 8], 9: [9, 5, 1, 3, 7], 4: [4, 2]}


<img src="Images/DFS_properties.png" width="600"/>

### Strongly Connected Components (SCC)
<img src="Images/DFS_SCC.png" width="600"/>
<img src="Images/DFS_Kosaraju.png" width="600"/>
<img src="Images/DFS_SCC_Code.png" width="600"/>

The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge: the vertex label in the first column is the tail and the vertex label in the second column is the head (recall the graph is directed, and the edges are directed from the first-column vertex to the second-column vertex). For example, the 11th row looks like "2 47646", which means that the vertex with label 2 has an outgoing edge to the vertex with label 47646.

Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.

Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of sizes, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be "500,400,300,200,100" (without the quotes). If your algorithm finds fewer than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be "400,300,100,0,0" (without the quotes).

WARNING: This is the most challenging programming assignment of the course. Because of the size of the graph you may have to manage memory carefully. The best way to do this depends on your programming language and environment, and we strongly suggest that you exchange tips for doing this on the discussion forums.

### Course example
<img src="Images/korasaju_example1.png" width="500"/>
<img src="Images/korasaju_example2.png" width="500"/>

In [6]:
from korasaju import *

G = {1: [4],
     2: [8],
     3: [6], 
     4: [7],
     5: [2],
     6: [9],
     7: [1],
     8: [5, 6],
     9: [3, 7]}

SCCs = korasaju(G, verbose=True).SCCs

---------Graphs---------
Input  :                         {1: [4], 2: [8], 3: [6], 4: [7], 5: [2], 6: [9], 7: [1], 8: [5, 6], 9: [3, 7]}
Step 1 - Reverse graph :         {1: [7], 2: [5], 3: [9], 4: [1], 5: [8], 6: [3, 8], 7: [4, 9], 8: [2], 9: [6]}
Step 2 - Compute finishing time: {1: 7, 2: 3, 3: 1, 4: 8, 5: 2, 6: 5, 7: 9, 8: 4, 9: 6}
Step 3 - Find the leaders :      {1: [5], 2: [3], 3: [4], 4: [2, 5], 5: [6], 6: [1, 9], 7: [8], 8: [9], 9: [7]}
---------Results--------
Leader for each node:            {9: 9, 7: 9, 8: 9, 6: 6, 1: 6, 5: 6, 4: 4, 2: 4, 3: 4}
Leaders of SCCs in the graphs:   {9, 4, 6}
SCCs (grouped by leader):        {9: [9, 7, 8], 6: [6, 1, 5], 4: [4, 2, 3]}


In [7]:
korasaju(G, use_stack=True).SCCs # Also works with an explicit stack instead of recursion

{9: [8, 7, 9], 6: [5, 1, 6], 4: [3, 2, 4]}
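The two-pass idea can be sketched as follows. This is not the code of the `korasaju` module: the function name `kosaraju_scc` and its return format are assumptions, vertex labels are assumed to be sortable, and components are reported with the original vertex labels, not with the finishing-time relabeling shown in the verbose output above. The first pass runs DFS on the reversed graph to get a finishing order; the second pass runs DFS on the original graph in decreasing finishing order, and each search tree is one SCC.

```python
def kosaraju_scc(graph):
    """Return {leader: [members]} for the SCCs of a directed graph."""
    vertices = set(graph)
    for heads in graph.values():
        vertices.update(heads)           # include vertices with no outgoing edge

    reversed_graph = {v: [] for v in vertices}
    for tail, heads in graph.items():
        for head in heads:
            reversed_graph[head].append(tail)

    def dfs(adj, source, seen, finished):
        """Iterative DFS appending vertices to `finished` in post-order."""
        seen.add(source)
        stack = [(source, iter(adj.get(source, [])))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj.get(w, []))))
                    break
            else:                        # all neighbours explored: v is finished
                stack.pop()
                finished.append(v)

    # Pass 1: finishing order on the reversed graph (highest label first).
    finish_order, seen = [], set()
    for v in sorted(vertices, reverse=True):
        if v not in seen:
            dfs(reversed_graph, v, seen, finish_order)

    # Pass 2: original graph, in decreasing finishing order.
    sccs, seen = {}, set()
    for leader in reversed(finish_order):
        if leader not in seen:
            members = []
            dfs(graph, leader, seen, members)
            sccs[leader] = members
    return sccs

G = {1: [4], 2: [8], 3: [6], 4: [7], 5: [2],
     6: [9], 7: [1], 8: [5, 6], 9: [3, 7]}

sccs = kosaraju_scc(G)
```

The explicit stack is used in both passes so that the same sketch would not hit Python's recursion limit on a graph as large as the one in the assignment.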

In [8]:
import os

SCC_file = '../Data/SCC.txt'
filepath = os.path.join(os.getcwd(), SCC_file)
#graph, graph_rev = load_data(filepath)
#nbr_nodes = len(graph.keys())
#print(f'The number of nodes in the graph is: {nbr_nodes}')
#SCCs = korasaju(graph, graph_rev, use_stack=True) # 434821,968,459,313,211

## Dijkstra
### Presentation
<img src="Images/dijkstra_pseudocode.png" width="600"/>
<img src="Images/dijkstra_example.png" width="600"/>

In [9]:
from dijkstra import *

graph_example1 = {'s': {'v':1, 'w':4},
                 'v': {'s':1, 'w':2, 't':6},
                 'w': {'s':4, 't':3},
                 't': {'v':6, 'w':3}}

In [10]:
dijkstra_straightforward(graph_example1, 's')

({'s': 0, 'v': 1, 'w': 3, 't': 6},
 {'s': ['s'],
  'v': ['s', 'v'],
  'w': ['s', 'v', 'w'],
  't': ['s', 'v', 'w', 't']})
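For comparison, here is a minimal heap-based sketch built on the standard `heapq` module. It is not the `dijkstra` module's implementation: the name `dijkstra_heap` is hypothetical, and instead of deleting entries from the heap it uses lazy deletion, meaning outdated heap entries are simply skipped when popped. Edge lengths are assumed to be non-negative, and vertex labels must be mutually comparable so that ties between equal distances can be broken.

```python
import heapq

def dijkstra_heap(graph, source):
    """Return (dist, parent) for shortest paths from `source`.

    `graph` maps each vertex to a dict {neighbour: edge_length}.
    """
    dist = {source: 0}
    parent = {source: None}
    done = set()                       # vertices whose distance is final
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:                  # stale entry left by an earlier update
            continue
        done.add(v)
        for w, length in graph[v].items():
            candidate = d + length
            if w not in dist or candidate < dist[w]:
                dist[w] = candidate
                parent[w] = v
                heapq.heappush(heap, (candidate, w))
    return dist, parent

graph_example1 = {'s': {'v': 1, 'w': 4},
                  'v': {'s': 1, 'w': 2, 't': 6},
                  'w': {'s': 4, 't': 3},
                  't': {'v': 6, 'w': 3}}

dist, parent = dijkstra_heap(graph_example1, 's')
```

Following `parent` back from `'t'` gives the path s, v, w, t, which agrees with the output above.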

### Properties
<img src="Images/dijkstra_invariants.png" width="600"/>
<img src="Images/dijkstra_invariants2.png" width="600"/>

### Tests
The file contains an adjacency list representation of an undirected weighted graph with 200 vertices labeled 1 to 200.  Each row consists of the node tuples that are adjacent to that particular vertex along with the length of that edge. For example, the 6th row has 6 as the first entry indicating that this row corresponds to the vertex labeled 6. The next entry of this row "141,8200" indicates that there is an edge between vertex 6 and vertex 141 that has length 8200.  The rest of the pairs of this row indicate the other vertices adjacent to vertex 6 and the lengths of the corresponding edges.

Your task is to run Dijkstra's shortest-path algorithm on this graph, using 1 (the first vertex) as the source vertex, and to compute the shortest-path distances between 1 and every other vertex of the graph. If there is no path between a vertex v and vertex 1, we'll define the shortest-path distance between 1 and v to be 1000000.

You should report the shortest-path distances to the following ten vertices, in order: 7,37,59,82,99,115,133,165,188,197.  You should encode the distances as a comma-separated string of integers. So if you find that all ten of these vertices except 115 are at distance 1000 away from vertex 1 and 115 is 2000 distance away, then your answer should be 1000,1000,1000,1000,1000,2000,1000,1000,1000,1000. Remember the order of reporting DOES MATTER, and the string should be in the same order in which the above ten vertices are given. The string should not contain any spaces.  Please type your answer in the space provided.

IMPLEMENTATION NOTES: This graph is small enough that the straightforward O(mn) time implementation of Dijkstra's algorithm should work fine. OPTIONAL: For those of you seeking an additional challenge, try implementing the heap-based version. Note this requires a heap that supports deletions, and you'll probably need to maintain some kind of mapping between vertices and their positions in the heap.
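The row format described above could be parsed along these lines. This is a sketch, not the `load_data` function used in the next cell: `parse_adjacency_row` is a hypothetical helper, fields are assumed to be separated by whitespace, and only the pair `141,8200` comes from the description (the second pair in the example is made up for illustration).

```python
def parse_adjacency_row(row):
    """Parse one row such as '6 141,8200 ...' into (vertex, {neighbour: length})."""
    fields = row.split()               # split on any whitespace (spaces or tabs)
    vertex = int(fields[0])
    edges = {}
    for pair in fields[1:]:
        neighbour, length = pair.split(',')
        edges[int(neighbour)] = int(length)
    return vertex, edges

# '141,8200' is taken from the description; '98,5594' is a made-up pair.
vertex, edges = parse_adjacency_row("6\t141,8200\t98,5594\t\n")
```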

In [11]:
import os
dijkstra_file = '../Data/dijkstra.txt'
filepath = os.path.join(os.getcwd(), dijkstra_file)
graph = load_data(filepath)

nodes = [7,37,59,82,99,115,133,165,188,197]
dijkstra_shortest_path(graph, 1, nodes)

[2599, 2610, 2947, 2052, 2367, 2399, 2029, 2442, 2505, 3068]

In [12]:
# Also works with a heap data structure
dijkstra_shortest_path(graph, 1, nodes, heap=True)

[2599, 2610, 2947, 2052, 2367, 2399, 2029, 2442, 2505, 3068]

## Heaps & Balanced Binary Search Trees
### Heap properties
<img src="Images/heap_properties.png" width="600"/>
<img src="Images/heap_supported_operations.png" width="600"/>

### Median Maintenance
<img src="Images/median_maintenance.png" width="600"/>

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 3 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the ith number of the file, the kth median m_k is defined as the median of the numbers x_1,...,x_k. (So, if k is odd, then m_k is the ((k+1)/2)th smallest number among x_1,...,x_k; if k is even, then m_k is the (k/2)th smallest number among x_1,...,x_k.)

In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits).  That is, you should compute (m_1+m_2+m_3 + ... + m_10000) mod 10000.

OPTIONAL EXERCISE: Compare the performance achieved by heap-based and search-tree-based implementations of the algorithm.
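A common way to maintain the median is with two heaps: a max-heap holding the smaller half of the numbers seen so far and a min-heap holding the larger half. The sketch below follows the definition of m_k given above; `running_medians` is a hypothetical name and not necessarily how the `median_maintenance` module works. Since `heapq` only provides min-heaps, the max-heap is simulated by storing negated values.

```python
import heapq

def running_medians(stream):
    """Return [m_1, ..., m_n], where m_k is the ceil(k/2)-th smallest of the first k items."""
    low = []    # max-heap (negated values): the smaller half
    high = []   # min-heap: the larger half
    medians = []
    for x in stream:
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)
        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        medians.append(-low[0])        # top of the max-heap is the median
    return medians

medians = running_medians([5, 2, 8, 1, 9])
answer = sum(medians) % 10000          # same reduction as in the assignment
```

Each insertion costs O(log k), compared with re-sorting the prefix at every step.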

In [13]:
from median_maintenance import *
import os

median_file = os.path.join(os.getcwd(), '../Data/Median.txt')
data = load_data(median_file)

median_maintenance(data)

1213

## Hashing and Bloom Filters

In [None]:
from two_sum_hashset import *
import os  

filepath = os.path.join(os.getcwd(), '../Data/algo1-programming_prob-2sum.txt')
data = load_data(filepath)
two_sum_hash(data)
