# Graphs 

Graphs are a nonlinear data structure consisting of a collection of nodes (also called vertices) connected by edges. Graphs can have a very symmetrical and balanced structure (a BST, for example, is a tree, which is a special kind of graph), or they can be very convoluted and contain many circular connections (cycles). Graphs have many applications. For example, in a social network every user is a vertex, and each user is connected to their friends via edges. Mapping software also uses graphs to represent locations and the routes connecting them.

Graphs can either be directed (edges go one way) or undirected (edges go both ways). 
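As a minimal sketch, assuming each vertex is mapped (in a plain Python dict) to the list of vertices it has an edge to, a directed edge is recorded once, while an undirected edge is recorded in both directions:

```python
# Each dict maps a vertex to the list of vertices it has an edge to.
directed = {"A": [], "B": []}
undirected = {"A": [], "B": []}

# Directed edge A -> B: only A's list changes.
directed["A"].append("B")

# Undirected edge A - B: stored in both directions.
undirected["A"].append("B")
undirected["B"].append("A")

print(directed)    # {'A': ['B'], 'B': []}
print(undirected)  # {'A': ['B'], 'B': ['A']}
```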

There are two main ways in which you can implement a graph under the hood:

1. adjacency matrix
2. adjacency list 



## Adjacency Matrix

An adjacency matrix is a $V \times V$ matrix (where $V$ is the number of vertices) with one row and one column for each vertex in the graph. To check whether two vertices have an edge between them, you look at the entry where the row of one intersects the column of the other. That value is 1 if there is an edge between the two, or 0 if not.
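A minimal sketch of this lookup using plain Python lists instead of pandas. The helpers `build_adjacency_matrix` and `has_edge` are hypothetical, and the sketch assumes each vertex label is mapped to a row/column number, with rows holding the source vertex and columns the target:

```python
def build_adjacency_matrix(vertices, edges):
    # Hypothetical helper: map each vertex label to a row/column number.
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]  # V x V grid of zeros
    for u, v in edges:
        matrix[index[u]][index[v]] = 1  # row = source, column = target
    return matrix, index


def has_edge(matrix, index, u, v):
    # A single indexed lookup, so O(1).
    return matrix[index[u]][index[v]] == 1


vertices = ["A", "B", "C"]
matrix, index = build_adjacency_matrix(vertices, [("A", "B"), ("B", "C")])
print(matrix)                          # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(has_edge(matrix, index, "A", "B"))  # True
print(has_edge(matrix, index, "B", "A"))  # False
```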

The adjacency matrix implementation also lets you represent a weighted graph (where each edge carries a value such as a distance or cost, so some edges are weighted higher than others). To do this, instead of storing a 1 or 0 for an edge, you store the weight of that edge.
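A small sketch of a weighted matrix, assuming (as the paragraph above does) that 0 means "no edge", so every real edge must have a non-zero weight. The `add_weighted_edge` helper is hypothetical:

```python
vertices = ["A", "B", "C"]
index = {v: i for i, v in enumerate(vertices)}  # label -> row/column number
weights = [[0] * len(vertices) for _ in vertices]  # 0 = no edge (assumption)


def add_weighted_edge(u, v, w):
    # Store the edge's weight instead of a 1.
    weights[index[u]][index[v]] = w


add_weighted_edge("A", "B", 5)
add_weighted_edge("B", "C", 2)

print(weights)                          # [[0, 5, 0], [0, 0, 2], [0, 0, 0]]
print(weights[index["A"]][index["B"]])  # 5
```

If 0 can be a legitimate weight, a different "no edge" marker such as `None` or `float("inf")` would be needed instead.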

In an undirected graph, the matrix is symmetric about its main diagonal: if vertex *u* is connected to vertex *v*, then *v* is also connected to *u*, so arr[u][v] and arr[v][u] always hold the same value. In a directed graph, however, the matrix can record one-way edges: if an edge goes from *u* to *v* but not from *v* to *u*, the matrix will hold a 1 at arr[u][v] but a 0 at arr[v][u].
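A minimal sketch contrasting the two cases, assuming vertices are simply numbered 0 to V-1. The helpers `add_undirected_edge`, `add_directed_edge`, and `is_symmetric` are hypothetical:

```python
n = 3  # vertices are numbered 0, 1, 2 for simplicity
undirected = [[0] * n for _ in range(n)]
directed = [[0] * n for _ in range(n)]


def add_undirected_edge(m, u, v):
    m[u][v] = 1
    m[v][u] = 1  # mirror entry: an undirected edge goes both ways


def add_directed_edge(m, u, v):
    m[u][v] = 1  # only (u, v); (v, u) stays 0


def is_symmetric(m):
    # True when every entry equals its mirror across the main diagonal.
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


add_undirected_edge(undirected, 0, 1)
add_directed_edge(directed, 0, 1)

print(is_symmetric(undirected))  # True
print(is_symmetric(directed))    # False: directed[0][1] == 1 but directed[1][0] == 0
```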

Adjacency matrices let you check whether an edge exists in $O(1)$ time, and add a new edge in $O(1)$ time as well. However, adding or removing a vertex requires allocating a new matrix that is one row and one column larger (or smaller) and copying the old entries over, which takes $O(V^2)$ time. The matrix also takes up a lot of memory: even if the graph has few edges, the space complexity is still $O(V^2)$.
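To see where the $O(V^2)$ cost of adding a vertex comes from, here is a sketch that treats the matrix as a fixed-size 2D array (as it would be in C or NumPy). The hypothetical `add_vertex` helper allocates a larger grid and copies every old entry across:

```python
def add_vertex(matrix):
    n = len(matrix)
    # Allocate a new (V + 1) x (V + 1) grid of zeros...
    bigger = [[0] * (n + 1) for _ in range(n + 1)]
    # ...and copy every old entry across: V * V assignments, hence O(V^2).
    for i in range(n):
        for j in range(n):
            bigger[i][j] = matrix[i][j]
    return bigger


m = [[0, 1], [0, 0]]  # 2 vertices, one edge 0 -> 1
m = add_vertex(m)
print(m)  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Python lists can grow in place, so in pure Python you could instead append a 0 to each row and add one new row; that avoids the full copy, but the matrix still needs $O(V^2)$ memory.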

Below I implement an adjacency matrix for an unweighted, directed graph using pandas:


```python
import pandas as pd
from collections import deque


class Graph_Adj_Matrix():
    """Unweighted, directed graph stored as a V x V pandas DataFrame.

    matrix.loc[u, v] == 1 means there is an edge from u to v
    (row = source vertex, column = target vertex).
    """

    def __init__(self, vertices):
        self.matrix = pd.DataFrame(0, index=vertices, columns=vertices)

    def add_vertex(self, v):
        # Add one new row and one new column, both filled with 0.
        # reindex builds a new DataFrame, so this copies the whole matrix (O(V^2)).
        # (DataFrame.append was removed in pandas 2.0.)
        labels = list(self.matrix.index) + [v]
        self.matrix = self.matrix.reindex(index=labels, columns=labels, fill_value=0)

    def add_edge(self, u, v):
        # .loc[row, column] sets the single cell in place; the chained form
        # self.matrix[u][v] would select column u, row v (the reverse edge).
        self.matrix.loc[u, v] = 1

    def neighbors(self, u):
        # Labels of the columns holding a 1 in row u, i.e. the targets of u's edges.
        row = self.matrix.loc[u]
        return list(row[row == 1].index)

    def BFS(self, value):
        visited = set()
        # Start a new BFS from every vertex not reached yet, so that
        # disconnected vertices are searched too.
        for start in self.matrix.index:
            if start in visited:
                continue
            visited.add(start)
            queue = deque([start])
            while queue:
                curr_vertex = queue.popleft()
                if curr_vertex == value:
                    return True
                for v in self.neighbors(curr_vertex):
                    if v not in visited:
                        visited.add(v)  # mark on enqueue so no vertex is queued twice
                        queue.append(v)
        return False

    def DFS(self, value):
        visited = set()
        # Restart from every unvisited vertex so disconnected vertices are searched too.
        for start in self.matrix.index:
            if start not in visited and self.DFS_r(value, start, visited):
                return True
        return False

    def DFS_r(self, value, curr_vertex, visited):
        visited.add(curr_vertex)
        print(curr_vertex)  # show the order in which vertices are visited
        if curr_vertex == value:
            return True
        # Try each unvisited neighbour in turn; return False only once every branch fails.
        for v in self.neighbors(curr_vertex):
            if v not in visited and self.DFS_r(value, v, visited):
                return True
        return False

    def print_graph(self):
        print(self.matrix)


####################################################

vertices = ["A", "B", "C", "D"]

graph = Graph_Adj_Matrix(vertices)

graph.add_vertex("E")
graph.add_vertex("F")
graph.add_vertex("G")
graph.add_vertex("H")

graph.add_edge("A", "B")
graph.add_edge("B", "C")
graph.add_edge("C", "D")
graph.add_edge("D", "B")
graph.add_edge("D", "E")
graph.add_edge("B", "F")
graph.add_edge("F", "G")

graph.print_graph()

print()
print("BFS --------------------------")
print("Does the graph contain A?")
print(graph.BFS("A"))
assert graph.BFS("A")
print("How about Z?")
print(graph.BFS("Z"))
assert not graph.BFS("Z")
print("How about H? (a vertex with no edges)")
print(graph.BFS("H"))
assert graph.BFS("H")

print()
print("DFS---------------------------")
print("Does the graph contain G?")
print(graph.DFS("G"))
```

```
   A  B  C  D  E  F  G  H
A  0  1  0  0  0  0  0  0
B  0  0  1  0  0  1  0  0
C  0  0  0  1  0  0  0  0
D  0  1  0  0  1  0  0  0
E  0  0  0  0  0  0  0  0
F  0  0  0  0  0  0  1  0
G  0  0  0  0  0  0  0  0
H  0  0  0  0  0  0  0  0

BFS --------------------------
Does the graph contain A?
True
How about Z?
False
How about H? (a vertex with no edges)
True

DFS---------------------------
Does the graph contain G?
A
B
C
D
E
F
G
True
```


## Adjacency List

Another way to implement a graph is with adjacency lists. Each vertex is assigned a list containing all of the vertices it is connected to by an edge (in a directed graph, the targets of its outgoing edges).

With this method you can add a vertex in $O(1)$ time, since it only needs a new, empty list. Removing a vertex is more expensive: it may appear in any other vertex's list, so every list has to be scanned, which takes $O(V + E)$ time. Looking up an edge is also slower than with an adjacency matrix: to check for an edge from *u* to *v* you have to scan *u*'s list, which takes $O(\deg(u))$ time, or $O(V)$ in the worst case where *u* has an edge to every other vertex. The space complexity, $O(V + E)$, is typically much less than a matrix's $O(V^2)$, since you only store information for edges that actually exist.
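The costs above can be seen in a small sketch of a dict-of-lists adjacency list for a directed graph. The functions `add_vertex`, `has_edge`, and `remove_vertex` here are hypothetical standalone helpers written for illustration:

```python
graph = {"A": ["B"], "B": ["C"], "C": ["A"]}


def add_vertex(graph, v):
    graph[v] = []  # a single dict insertion: O(1) on average


def has_edge(graph, u, v):
    return v in graph[u]  # linear scan of u's list: O(deg(u)), at most O(V)


def remove_vertex(graph, v):
    del graph[v]  # drop v's own list
    # v may appear in any remaining list, so every list is rebuilt: O(V + E)
    for neighbours in graph.values():
        neighbours[:] = [w for w in neighbours if w != v]


add_vertex(graph, "D")
print(has_edge(graph, "A", "B"))  # True
remove_vertex(graph, "C")
print(graph)  # {'A': ['B'], 'B': [], 'D': []}
```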

```python
from collections import deque


class Graph_Adj_List():
    """Unweighted, directed graph: each vertex maps to a list of the
    vertices its outgoing edges point to."""

    def __init__(self):
        self.graph = {}

    def add_vertex(self, v):
        self.graph[v] = []

    def add_edge(self, u, v):
        self.graph[u].append(v)

    # O(V + E): each vertex is enqueued at most once and each list is scanned once
    def BFS(self, value):
        visited = set()
        # start a new BFS from every key not reached yet (handles disconnected vertices)
        for start in self.graph:
            if start in visited:
                continue
            visited.add(start)
            queue = deque([start])
            while queue:
                curr_vertex = queue.popleft()
                if curr_vertex == value:
                    return True
                # mark adjacent vertices as visited when they are enqueued,
                # so no vertex is added to the queue twice
                for element in self.graph[curr_vertex]:
                    if element not in visited:
                        visited.add(element)
                        queue.append(element)
        return False

    def DFSrec(self, value, curr_vertex, visited):
        visited.add(curr_vertex)
        if curr_vertex == value:
            return True
        # recursively search each unvisited adjacent vertex in turn;
        # give up on this vertex only after every branch has failed
        for element in self.graph[curr_vertex]:
            if element not in visited and self.DFSrec(value, element, visited):
                return True
        return False

    # O(V + E)
    def DFS(self, value):
        visited = set()
        # account for disconnected vertices: start a new DFS from every unvisited key
        for key in self.graph:
            if key not in visited and self.DFSrec(value, key, visited):
                return True
        return False

    def print_graph(self):
        for key, value in self.graph.items():
            print(key, value)


####################################################

graph = Graph_Adj_List()

graph.add_vertex("A")
graph.add_vertex("B")
graph.add_vertex("C")
graph.add_vertex("D")
graph.add_vertex("E")
graph.add_vertex("F")
graph.add_vertex("G")
graph.add_vertex("H")

graph.add_edge("A", "B")
graph.add_edge("B", "C")
graph.add_edge("C", "D")
graph.add_edge("D", "B")
graph.add_edge("D", "E")
graph.add_edge("B", "F")
graph.add_edge("F", "G")

graph.print_graph()
print()
print("BFS --------------------------")
print("Does the graph contain A?")
print(graph.BFS("A"))
assert graph.BFS("A")
print("How about Z?")
print(graph.BFS("Z"))
assert not graph.BFS("Z")
print("How about H? (a vertex with no edges)")
print(graph.BFS("H"))
assert graph.BFS("H")

print()
print("DFS---------------------------")
print("Does the graph contain A?")
print(graph.DFS("A"))
assert graph.DFS("A")
print("How about Z?")
print(graph.DFS("Z"))
assert not graph.DFS("Z")
print("How about H? (a vertex with no edges)")
print(graph.DFS("H"))
assert graph.DFS("H")
```

```
A ['B']
B ['C', 'F']
C ['D']
D ['B', 'E']
E []
F ['G']
G []
H []

BFS --------------------------
Does the graph contain A?
True
How about Z?
False
How about H? (a vertex with no edges)
True

DFS---------------------------
Does the graph contain A?
True
How about Z?
False
How about H? (a vertex with no edges)
True
```


# Breadth First Search (BFS) vs. Depth First Search (DFS)

There are two main ways to search through a graph to find a particular vertex:

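1) Breadth-first search starts at a node, then checks all of its adjacent nodes, then all of *their* adjacent nodes, and so on, working outward one level at a time.

2) Depth-first search starts at a node, then keeps following a single path of adjacent nodes until it reaches a dead end, then backtracks to the most recent node that still has unvisited neighbours and continues from there.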
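A small sketch of the two visiting orders, assuming a hypothetical five-vertex directed graph stored as a dict of lists. `bfs_order` uses a FIFO queue, so it finishes each level before moving deeper; `dfs_order` (written recursively here) follows each branch to its end before backtracking:

```python
from collections import deque

# Hypothetical example graph: A has two branches, A-B-D and A-C-E.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}


def bfs_order(graph, start):
    visited, order = {start}, []
    queue = deque([start])  # FIFO: the oldest discovered vertex comes out first
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order


def dfs_order(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for v in graph[start]:  # follow each branch to its end before trying the next
        if v not in visited:
            order += dfs_order(graph, v, visited)
    return order


print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C', 'E']
```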

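Both algorithms run in $O(V + E)$ time when the graph is stored as an adjacency list (with an adjacency matrix, finding a vertex's neighbours means scanning a whole row, so both become $O(V^2)$). However, one might be better suited to your particular graph structure than the other. For example, if you know the target is not far from the starting vertex, breadth-first search is usually the better choice, since it examines vertices in order of their distance from the start. But if targets are generally buried deep in the graph, DFS might be better in practice.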
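Why BFS suits targets near the start can be shown with a sketch that also records each vertex's distance (number of edges) from the start vertex. The `bfs_distance` function and the example graph are hypothetical. Because BFS finishes each level before the next, the first time it reaches the target is along a path with the fewest edges:

```python
from collections import deque


def bfs_distance(graph, start, target):
    """Number of edges on a shortest path from start to target, or None if unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1  # one level further than u
                queue.append(v)
    return None


# E is one edge from A directly, but four edges away along A-B-C-D-E.
graph = {"A": ["B", "E"], "B": ["C"], "C": ["D"], "D": ["E"], "E": []}
print(bfs_distance(graph, "A", "E"))  # 1
print(bfs_distance(graph, "A", "D"))  # 3
```

A DFS that happened to take the `B` branch first would walk A, B, C, D before reaching E, whereas BFS reaches E right after checking A's immediate neighbours.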



