# Data structures

## 1) Matrix

A common approach is to treat every cell containing 0 as a vertex, with implicit edges to its left, right, down, and up neighbors. A cell containing 1 is blocked: it has no edges.

*Example:*

In [2]:
graph=[
    [0,1,1,0],
    [0,0,1,0],
    [1,1,0,0],
    [0,0,0,0],
]
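As an illustration of the implicit edges, the sketch below lists the open neighbors of a cell. The helper name `open_neighbors` is hypothetical and not part of the original notebook.

```python
def open_neighbors(grid, row, col):
    """Return the in-bounds, unblocked (0) cells adjacent to (row, col)."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for dr, dc in ((0, -1), (0, 1), (1, 0), (-1, 0)):  # left, right, down, up
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
            result.append((r, c))
    return result

graph = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(open_neighbors(graph, 1, 1))  # [(1, 0)]
```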

## 2) Adjacency matrix

In this case, `M[i][j]` (row *i*, column *j*) is nonzero when an edge exists from vertex *i* to vertex *j*. This structure is not that common because it always takes O(V<sup>2</sup>) space, even though the graph has only *V* vertices and possibly far fewer than V<sup>2</sup> edges.
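A minimal sketch of building such a matrix from a list of directed edges, assuming vertices are numbered `0..n-1`. The function name `build_adjacency_matrix` is an illustrative choice. Note that, unlike the grid above, a 1 here means "edge present".

```python
def build_adjacency_matrix(n, edges):
    """Build an n x n matrix where matrix[i][j] == 1 iff there is an edge i -> j."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix

M = build_adjacency_matrix(3, [(0, 1), (1, 2), (2, 0)])
print(M)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```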

## 3) Adjacency list

In [3]:
class Vertice:
    def __init__(self, val):
        self.val = val
        self.neighbors = []

# Matrix DFS

Question: count unique paths from top left to bottom right. Only 0s are allowed, no more than 1 visit per node for a given path.

## Architecture:

```
dfs(grid: list[list[int]], row: int, col: int, visited: set[tuple[int, int]]):

    # Base case: invalid
    if <OOB> or <Blocked> or <Visited>:
        return 0
    # Base case: valid (target reached)
    if <Reached>:
        return 1

    # Recursive call
    visited.add((row, col))
    count = 0
    count += dfs(<above|left|right|below>)
    visited.remove((row, col))
    
    return count
```

## Complexity

### Time complexity

Intuition: a path can cover the entire matrix, i.e. up to `n*m` nodes. At each step of a path, up to 4 recursive calls are made. Hence the worst-case time complexity is $O(4^{n \cdot m})$.

### Space complexity

A given path is composed of up to `n*m` nodes, so the recursive call stack (and the `visited` set) can grow to $O(n \cdot m)$ in the worst case.

In [3]:
# DFS: implementation

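def dfs(grid: list[list[int]], row: int = 0, col: int = 0, visited: set[tuple[int, int]] | None = None) -> int:
    # Avoid a mutable default argument: create a fresh set for each top-level call
    if visited is None:
        visited = set()
    R, C = len(grid), len(grid[0])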
    # OOB or blocked or visited
    base_case_invalid = min(row, col) < 0\
        or row >= R or col >= C\
        or grid[row][col] == 1\
        or (row, col) in visited

    # Reached target
    base_case_valid = ((row, col) == (R - 1, C - 1))

    if base_case_invalid:
        return 0

    if base_case_valid:
        return 1
    
    # Recursive calls
    visited.add((row, col))
    
    count = 0
    count += dfs(grid, row - 1, col, visited)
    count += dfs(grid, row + 1, col, visited)
    count += dfs(grid, row, col - 1, visited)
    count += dfs(grid, row, col + 1, visited)

    visited.remove((row, col))

    return count

from numpy import array
m = array([[0]*4 for _ in range(4)])
m[1,0] = m[1, 1] = m[3, 1] = m[2, 3] = 1
m

array([[0, 0, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0]])

In [4]:
dfs(m)

2

# Matrix BFS

Question: Length of shortest path from top left to bottom right.

## Architecture:

```
bfs():
    queue, visit = deque([(0, 0)]), {(0, 0)}
    length = -1  # becomes 0 for the start node, so the result counts edges

    while queue:
        length += 1
        for _ in [1..len(queue)]:

            row, col = queue.popleft()

            if <Reached>:
                return length

            for neighbor in neighbors(row, col):
                if <OOB> or <Blocked> or <Visited>:
                    continue
                queue.append(neighbor)
                visit.add(neighbor)

    return -1  # target unreachable
```

## Complexity

### Time complexity

In the worst case, every node is visited exactly once (thanks to the maintained `set`), so the time complexity is $O(n \cdot m)$.

### Space complexity

The queue and the visited set each hold at most `n*m` cells, so the space complexity is $O(n \cdot m)$.


In [5]:
# BFS: implementation

from collections import deque

def bfs(grid: list[list[int]]) -> int:
    queue = deque()
    visited = set()
    R, C = len(grid), len(grid[0])

    visited.add((0, 0))
    queue.append((0, 0))
    length = -1
    while queue:
        length += 1
        for _ in range(len(queue)):
            row, col = queue.popleft()
            if (row, col) == (R - 1, C - 1):
                return length
            
            deltas = [(1, 0), (-1, 0), (0, 1), (0, -1)]
            for dr, dc in deltas:
                neighbor = (row + dr, col + dc)

                # Index with two subscripts so this works for nested lists as well as NumPy arrays
                invalid_neighbor = min(*neighbor) < 0 or neighbor[0] >= R or neighbor[1] >= C \
                    or grid[neighbor[0]][neighbor[1]] == 1 \
                    or neighbor in visited
                if invalid_neighbor:
                    continue
                
                visited.add(neighbor)  # Mark when enqueuing so a cell is never queued twice
                queue.append(neighbor)

    return -1  # Target unreachable

In [6]:
bfs(m)

6

# Adjacency lists

## Implementation

### 1) Graph nodes

In [7]:
class GraphNode:
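    def __init__(self, val, neighbors: list | None = None):
        self.val = val
        # Avoid a shared mutable default: each node gets its own list
        self.neighbors = neighbors if neighbors is not None else []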

### 2) Hash maps

In [12]:
# Example
adj = {"A": ["B", "C"], "B": ["A"]}

# Build an adjacency list from a list of directed edges
edges = [["A", "B"], ["B", "C"], ["B", "E"], ["C", "E"], ["E", "D"]]

# defaultdict is not that useful here (it saves only 1 line)
# because a vertex that is only a destination (e.g. "D") does not get added as a key
adj = {}
for src, dst in edges:
    if src not in adj:
        adj[src] = []
    if dst not in adj:
        adj[dst] = []
    adj[src].append(dst)

adj

{'A': ['B'], 'B': ['C', 'E'], 'C': ['E'], 'E': ['D'], 'D': []}
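For comparison, here is a sketch of the `defaultdict` variant mentioned in the comment above. Accessing `adj[dst]` without assigning is what creates the empty list for destination-only vertices; this is standard `collections.defaultdict` behavior, though relying on it may read as less explicit than the plain-dict version.

```python
from collections import defaultdict

edges = [["A", "B"], ["B", "C"], ["B", "E"], ["C", "E"], ["E", "D"]]

adj = defaultdict(list)
for src, dst in edges:
    adj[src].append(dst)
    adj[dst]  # touching the key creates an empty list for destination-only vertices

print(dict(adj))
# {'A': ['B'], 'B': ['C', 'E'], 'C': ['E'], 'E': ['D'], 'D': []}
```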

## Traversal algorithms

In [24]:
# DFS

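def dfs(adj: dict[str, list[str]], src: str, dst: str, visited: set[str] | None = None) -> int:
    # Avoid a mutable default argument: create a fresh set for each top-level call
    if visited is None:
        visited = set()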

    if src == dst:
        return 1
    
    if src in visited:
        return 0
    
    count = 0
    visited.add(src)
    for neighbor in adj[src]:
        count += dfs(adj, neighbor, dst, visited)
    visited.remove(src)
    return count

In [25]:
dfs(adj, "A", "E")

2

In [35]:
# BFS
from collections import deque
def bfs(adj: dict[str, list[str]], src: str, dst: str) -> int:
    queue = deque()
    visited = set()

    queue.append(src)
    visited.add(src)
    length = -1

    while queue:
        length += 1
        for _ in range(len(queue)):
            current = queue.popleft()
            if current == dst:
                return length
            for neighbor in adj[current]:
                if neighbor not in visited:
                    queue.append(neighbor)
                    visited.add(neighbor)
    return -1


In [37]:
bfs(adj, "A", "E")

2

# Shortest path algorithms

## Dijkstra

Implementation: `heapq` does not support updating the key of an element already in the priority queue, so instead the same node can be pushed several times with different keys. A node is marked as visited only the first time it is popped from the queue; at that point, its key is guaranteed to be minimal (assuming non-negative edge weights).
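The "push again, skip stale entries" idea can be sketched in isolation. The node names and distances below are made up for illustration; only `heapq.heappush` and `heapq.heappop` from the standard library are used.

```python
import heapq

heap = []
heapq.heappush(heap, (10, "B"))  # first route found to B
heapq.heappush(heap, (3, "C"))
heapq.heappush(heap, (7, "B"))   # a better route to B is found later: push again

visited, settled = set(), []
while heap:
    dist, node = heapq.heappop(heap)
    if node in visited:
        continue  # stale entry: this node was already popped with a smaller key
    visited.add(node)
    settled.append((node, dist))

print(settled)  # [('C', 3), ('B', 7)]
```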

In [3]:
import heapq

def dijkstra_with_set(n: int, edges: list[tuple[int, int, int]], src: int) -> dict[int, float]:
  # Convert edges to adjacency list
  adj_list = {v: [] for v in range(n)}
  for v, neighbor, distance in edges:
      adj_list[v].append((neighbor, distance))
  # Dijkstra
  visited, shortest = set(), {v: float("inf") for v in adj_list}
  heap = [(0, src)]
  while heap:
      path_to_v, v = heapq.heappop(heap)
      if v in visited:
          continue
      visited.add(v)
      shortest[v] = path_to_v
      for neighbor, distance in adj_list[v]:
          # if neighbor not in visited: # Optional, this can just reduce the size of the heap 
          heapq.heappush(heap, (path_to_v + distance, neighbor))
  return shortest

print(dijkstra_with_set(n=5, edges=[[0,1,10],[0,2,3],[1,3,2],[2,1,4],[2,3,8],[2,4,2],[3,4,5]], src=0))

{0: 0, 1: 7, 2: 3, 3: 9, 4: 5}


In [4]:
# Using the dictionary `shortest` as the set of visited nodes:

def dijkstra(n: int, edges: list[tuple[int, int, int]], src: int) -> dict[int, int]:
  # Convert edges to adjacency list
  adj_list = {v: [] for v in range(n)}
  for v, neighbor, distance in edges:
      adj_list[v].append((neighbor, distance))
  # Dijkstra
  shortest = {}
  heap = [(0, src)]
  while heap:
      path_to_v, v = heapq.heappop(heap)
      if v in shortest:
          continue
      shortest[v] = path_to_v
      for neighbor, distance in adj_list[v]:
          heapq.heappush(heap, (path_to_v + distance, neighbor))
  return shortest

print(dijkstra(n=5, edges=[[0,1,10],[0,2,3],[1,3,2],[2,1,4],[2,3,8],[2,4,2],[3,4,5]], src=0))

{0: 0, 2: 3, 4: 5, 1: 7, 3: 9}


# Topological Sort

For Directed Acyclic Graphs:
* Directed: an undirected edge (u, v) does not say which of u or v comes first
* Acyclic: same problem, the order of the nodes in a cycle is ambiguous
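One way to make the requirement concrete: an ordering is a valid topological sort when every edge (u, v) has u placed before v. The checker below is a small sketch; the name `is_topological_order` is hypothetical.

```python
def is_topological_order(order, edges):
    """Return True if every directed edge (u, v) has u before v in `order`."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "F"), ("E", "F"), ("G", "H")]
print(is_topological_order(["G", "H", "A", "C", "E", "B", "D", "F"], edges))  # True
print(is_topological_order(["F", "D", "B", "E", "C", "A", "H", "G"], edges))  # False
```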

In [4]:
# Input graph
edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "D"),
    ("C", "E"),
    ("D", "F"),
    ("E", "F"),
    ("G", "H")
]

# Adjacency list
graph = {}
for src, dst in edges:
    if src not in graph:
        graph[src] = []
    if dst not in graph:
        graph[dst] = []
    graph[src].append(dst)

# Connected graph, knowing start node
def top_sort_with_start(graph: dict[str, list[str]], start_node: str) -> list[str]:
    res = []
    visited = set()

    def dfs_postorder(v: str) -> None:
        if v in visited:
            return
        visited.add(v)
        for nei in graph[v]:
            dfs_postorder(nei)
        res.append(v)
    
    dfs_postorder(start_node)
    res.reverse()
    return res

print(top_sort_with_start(graph=graph, start_node="A"))

# General case
def top_sort_no_cycle(graph: dict[str, list[str]]) -> list[str]:

    def dfs_postorder(v: str) -> None:
        if v in visited:
            return
        visited.add(v)
        for nei in graph[v]:
            dfs_postorder(nei)
        res.append(v)

    res, visited = [], set()
    for v in graph:
        dfs_postorder(v)
    res.reverse()
    return res

print(top_sort_no_cycle(graph))

# Topological sort with cycle detection

class CycleDetectedError(Exception):
    pass

def top_sort(graph: dict[str, list[str]]) -> list[str]:
    def dfs(v: str, path: set[str]) -> None:
        if v in path:
            raise CycleDetectedError("Could not run topological sort: cycle detected in input graph")
        if v in visited:
            return
        path.add(v)
        visited.add(v)
        for nei in graph[v]:
            dfs(nei, path)
        path.remove(v)
        topsort.append(v)
        return

    visited = set()
    topsort = []
    for v in graph:
        path = set()
        dfs(v, path)
    topsort.reverse()
    return topsort

print(top_sort(graph))

['A', 'C', 'E', 'B', 'D', 'F']
['G', 'H', 'A', 'C', 'E', 'B', 'D', 'F']
['G', 'H', 'A', 'C', 'E', 'B', 'D', 'F']


# Cycle detection - Directed graphs

*Algorithms tested on LeetCode's Course Schedule problem*

With directed graphs, the nodes must be marked in two distinct ways, e.g. 'explored' and 'being_explored', as opposed to the usual single `visited` set. This is necessary to avoid wrongly identifying a cycle when two paths merely converge on the same node, as in this configuration: `{A: [B, C], B: [D], C: [D], D: []}`
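To see the failure mode, here is a deliberately naive sketch that uses a single `visited` set and treats any revisit as a cycle. The function name `naive_has_cycle` is hypothetical; it is shown only to illustrate the false positive on the configuration above.

```python
def naive_has_cycle(graph):
    """Incorrect for directed graphs: treats any revisit as a cycle."""
    visited = set()

    def dfs(v):
        if v in visited:
            return True
        visited.add(v)
        return any(dfs(nei) for nei in graph[v])

    return any(dfs(v) for v in graph if v not in visited)

diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(naive_has_cycle(diamond))  # True, although the graph has no cycle
```

`D` is reached a second time through `C` after being fully explored through `B`, which a single set cannot distinguish from a back edge.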

## Using Topological Sort

See previous section
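As a self-contained alternative, the sketch below uses Kahn's algorithm (an in-degree, BFS-based topological sort) rather than the DFS version from the previous section: if fewer than V vertices can be ordered, the graph contains a cycle. It assumes every vertex appears as a key of `graph`; the name `has_cycle_kahn` is an illustrative choice.

```python
from collections import deque

def has_cycle_kahn(graph):
    """Detect a cycle in a directed graph via Kahn's topological sort."""
    indegree = {v: 0 for v in graph}
    for v in graph:
        for nei in graph[v]:
            indegree[nei] += 1

    # Start from vertices with no incoming edge
    queue = deque(v for v in graph if indegree[v] == 0)
    processed = 0
    while queue:
        v = queue.popleft()
        processed += 1
        for nei in graph[v]:
            indegree[nei] -= 1
            if indegree[nei] == 0:
                queue.append(nei)

    # Vertices on a cycle never reach in-degree 0, so they are never processed
    return processed < len(graph)

print(has_cycle_kahn({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}))  # False
print(has_cycle_kahn({"A": ["B"], "B": ["C"], "C": ["A"]}))               # True
```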

## Using standard DFS

In [4]:
# Check `being_explored` before `explored`!
# A node can be in `explored` without being in `being_explored`, but never the other way around.
# To rule out this mistake, the current vertex can instead be marked as explored only at the end of the procedure.

def has_cycle_set(graph: dict[str, list[str]]) -> bool:
    
    def dfs(v: str) -> bool:
        if v in being_explored:
            return True
        if v in explored:
            return False
        explored.add(v)
        being_explored.add(v)
        for nei in graph[v]:
            if dfs(nei):
                return True
        being_explored.remove(v)
        return False

    # Shared across start vertices so each vertex is explored only once overall
    explored, being_explored = set(), set()
    for v in graph:
        if dfs(v):
            return True
    return False

# Feels a bit less natural but some recursive calls are avoided
def has_cycle_set2(graph: dict[str, list[str]]) -> bool:
    
    def dfs(v: str) -> bool:
        explored.add(v)
        being_explored.add(v)
        back_edge = False
        for nei in graph[v]:
            if nei in being_explored:
                return True
            if nei in explored:
                continue
            back_edge = back_edge or dfs(nei)
        being_explored.remove(v)
        return back_edge

    # Shared across start vertices so each vertex is explored only once overall
    explored, being_explored = set(), set()
    for v in graph:
        if v not in explored and dfs(v):
            return True
    return False

In [None]:
from enum import Enum

class Status(Enum):
    NOT_EXPLORED, EXPLORED, BEING_EXPLORED = range(1, 4)

def has_cycle_map(graph: dict[str, list[str]]) -> bool:

    def dfs(v: str) -> bool:
        status[v] = Status.BEING_EXPLORED
        for neighbor in graph[v]:
            if status[neighbor] == Status.EXPLORED:
                continue
            if status[neighbor] == Status.BEING_EXPLORED:
                return True  # Back edge: neighbor is on the current path
            if dfs(neighbor):
                return True
        status[v] = Status.EXPLORED
        return False

    # One status map for the whole graph, shared across start vertices
    status = {node: Status.NOT_EXPLORED for node in graph}
    for v in graph:
        if status[v] == Status.NOT_EXPLORED and dfs(v):
            return True
    return False

# Cycle detection - Undirected graphs

If the graph is connected, just count the number of edges $E$: a connected acyclic graph is a tree with exactly $V - 1$ edges, so if $E \geq V$, then there must be a cycle.

Otherwise, the DFS method is simpler than for directed graphs because any encounter with a visited node means a cycle, except if the already visited node is the parent of the current node.
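The edge-count shortcut for connected graphs can be sketched as follows. It assumes the graph is connected and the edge list has no duplicates or self-loops; the function name `connected_graph_has_cycle` is hypothetical.

```python
def connected_graph_has_cycle(num_vertices, edges):
    """Assumes a connected undirected graph with no duplicate edges or self-loops."""
    # A connected acyclic graph (a tree) has exactly num_vertices - 1 edges
    return len(edges) >= num_vertices

tree_edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E"), ("D", "G"), ("E", "F")]
print(connected_graph_has_cycle(7, tree_edges))                 # False
print(connected_graph_has_cycle(7, tree_edges + [("D", "F")]))  # True
```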

In [6]:
# Adjacency list
def build_graph(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    graph = {}
    for src, dst in edges:
        if src not in graph:
            graph[src] = []
        if dst not in graph:
            graph[dst] = []
        graph[src].append(dst)
        graph[dst].append(src)
    return graph
    
def has_cycle_undirected(graph: dict[str, list[str]]) -> bool:
    def dfs(v: str, p: str | None) -> bool:
        if v in visited:
            return True
        visited.add(v)
        for nei in graph[v]:
            if nei == p:
                continue
            if dfs(nei, v):
                return True
        return False

    visited = set()
    for v in graph:
        if v in visited:
            continue
        if dfs(v, None):
            return True
    return False

acyclic = [
    ("A", "B"),
    ("A", "C"),
    ("C", "D"),
    ("D", "E"),
    ("D", "G"),
    ("E", "F")
]

cyclic_connected = [
    ("A", "B"),
    ("A", "C"),
    ("C", "D"),
    ("D", "E"),
    ("D", "F"),
    ("D", "G"),
    ("E", "F")
]

cyclic_unconnected = [
    ("A", "B"),
    ("A", "C"),
    ("C", "D"),
    ("D", "E"),
    ("D", "F"),
    ("D", "G"),
    ("E", "F"),
    ("X", "Y"),
    ("Y", "Z")
]
graphs = [build_graph(edges) for edges in (acyclic, cyclic_connected, cyclic_unconnected)]
print([has_cycle_undirected(graph) for graph in graphs])

[False, True, True]
