# Graphs

In [1]:
from pathlib import Path

In [2]:
p = Path('/Users/olli/Desktop/PythonNotebooks')
sub_dir = 'pics'
pics = p/sub_dir
pics

WindowsPath('/Users/olli/Desktop/PythonNotebooks/pics')

### Overview 
* Graphs are a more general structure than trees
* A node (vertex) can have a key and a payload 
* Edges may be one-way or two-way 
* Edges may be weighted representing the cost to go from one node to another
* A graph with no cycles is an acyclic graph 

$G = (V,E)$

Where $V$ is a set of vertices and $E$ is a set of edges.
Each edge is a tuple $(v,w)$ where $v,w \in V$. An edge can also carry a weight, attached to the tuple as a third component.
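
As a small illustration of this definition, a graph can be written down directly as a set of vertices and a set of weighted edge tuples. The vertex names and weights below are made up for the example.

```python
# Hypothetical example graph: vertex names and weights are illustrative only
V = {'v0', 'v1', 'v2'}
# each edge is (from, to, weight); the edge runs from the first vertex to the second
E = {('v0', 'v1', 5), ('v1', 'v2', 4), ('v2', 'v0', 1)}

# every endpoint of every edge must belong to V
endpoints_ok = all(v in V and w in V for v, w, _ in E)

# total cost of following v0 -> v1 -> v2
cost = {(v, w): weight for v, w, weight in E}
total = cost[('v0', 'v1')] + cost[('v1', 'v2')]
print(endpoints_ok, total)  # True 9
```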

![Image](md_images/graph.png)

### Adjacency matrix 

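* An adjacency matrix stores the edge from vertex v to vertex w in the cell at row v, column w
* When there are few edges most cells stay empty, so a matrix is not the best implementation for such a graph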
![Image](md_images/adjacency-matrix.png)

* An adjacency list maintains a list of the other connected vertices

![Image](md_images/adjacency-list.png)
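
To make the trade-off concrete, here is a minimal sketch that stores the same small directed graph both ways. The edge list and variable names are assumptions chosen for the example.

```python
# Hypothetical 4-vertex directed graph with weighted edges: (from, to, weight)
edges = [(0, 1, 2), (0, 2, 2), (1, 0, 3), (2, 3, 6)]
n = 4

# adjacency matrix: n x n cells, most of them empty (None) when edges are few
matrix = [[None] * n for _ in range(n)]
for src, dst, weight in edges:
    matrix[src][dst] = weight

# adjacency list: one dictionary per vertex, storing only the edges that exist
adj_list = {v: {} for v in range(n)}
for src, dst, weight in edges:
    adj_list[src][dst] = weight

matrix_cells = n * n
list_entries = sum(len(nbrs) for nbrs in adj_list.values())
print(matrix_cells, list_entries)  # 16 4
```

The matrix answers "is there an edge from v to w?" with a single index lookup, while the adjacency list only spends memory on edges that are actually present.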

# Implementation of a Graph as an Adjacency List


Using dictionaries, it is easy to implement the adjacency list in Python. In our implementation of the Graph abstract data type we will create two classes: **Graph**, which holds the master list of vertices, and **Vertex**, which will represent each vertex in the graph.

Each Vertex uses a dictionary to keep track of the vertices to which it is connected, and the weight of each edge. This dictionary is called **connectedTo**. The constructor simply initializes the id, which will typically be a string, and the **connectedTo** dictionary. The **addNeighbor** method is used to add a connection from this vertex to another. The **getConnections** method returns all of the vertices in the adjacency list, as represented by the **connectedTo** instance variable. The **getWeight** method returns the weight of the edge from this vertex to the vertex passed as a parameter.

In [3]:
class Vertex:
    def __init__(self,key):
        self.id = key
        self.connectedTo = {}

    def addNeighbor(self,nbr,weight=0):
        self.connectedTo[nbr] = weight

    def __str__(self):
        return str(self.id) + ' connectedTo: ' + str([x.id for x in self.connectedTo])

    def getConnections(self):
        return self.connectedTo.keys()

    def getId(self):
        return self.id

    def getWeight(self,nbr):
        return self.connectedTo[nbr]

To implement a Graph as an adjacency list, we first define the methods the Graph object will have:

* **Graph()** creates a new, empty graph.
* **addVertex(key)** creates a Vertex with the given key and adds it to the graph.
* **addEdge(fromVert, toVert)** adds a new, directed edge to the graph that connects two vertices.
* **addEdge(fromVert, toVert, weight)** adds a new, weighted, directed edge to the graph that connects two vertices.
* **getVertex(vertKey)** finds the vertex in the graph named vertKey.
* **getVertices()** returns the keys of all vertices in the graph. 
* **in** returns True for a statement of the form `vertex in graph` if the given vertex is in the graph, False otherwise.

In [4]:
# advantage of using a dictionary 

# 1. This can be very useful for weighted graphs 
# 2. useful methods to get values and keys

a = {'a' : 1}

print(a.values())
print(a.keys())

dict_values([1])
dict_keys(['a'])


In [5]:
# Adjacency List implementation 
class Graph:
    def __init__(self):
        self.vertList = {}
        self.numVertices = 0

    def addVertex(self,key):
        self.numVertices = self.numVertices + 1
        newVertex = Vertex(key)
        self.vertList[key] = newVertex
        return newVertex

    def getVertex(self,n):
        if n in self.vertList:
            return self.vertList[n]
        else:
            return None

    def __contains__(self,n):
        return n in self.vertList

    def addEdge(self,f,t,cost=0):
        if f not in self.vertList:
            nv = self.addVertex(f)
        if t not in self.vertList:
            nv = self.addVertex(t)
        # add connecting edge from f to t
        self.vertList[f].addNeighbor(self.vertList[t], cost)

    def getVertices(self):
        return self.vertList.keys()

    def __iter__(self):
        return iter(self.vertList.values())

Let's see a simple example of how to use this:

In [6]:
g = Graph()
for i in range(6):
    g.addVertex(i)

In [7]:
g.vertList

{0: <__main__.Vertex at 0x2323df1cb48>,
 1: <__main__.Vertex at 0x2323df1cbc8>,
 2: <__main__.Vertex at 0x2323df1cc88>,
 3: <__main__.Vertex at 0x2323df1ccc8>,
 4: <__main__.Vertex at 0x2323df1cd08>,
 5: <__main__.Vertex at 0x2323df1cc48>}

In [8]:
g.addEdge(0,1,2)
g.addEdge(0,2,2)
g.addEdge(1,0,3)
g.addEdge(2,5,6)
g.addEdge(4,1,6)

In [9]:
# check the vertex 0 and what is it connected to 
g.vertList[0].connectedTo

{<__main__.Vertex at 0x2323df1cbc8>: 2, <__main__.Vertex at 0x2323df1cc88>: 2}

In [10]:
for vertex in g:
    print(vertex)
    print(vertex.getConnections())
    print('\n')

0 connectedTo: [1, 2]
dict_keys([<__main__.Vertex object at 0x000002323DF1CBC8>, <__main__.Vertex object at 0x000002323DF1CC88>])


1 connectedTo: [0]
dict_keys([<__main__.Vertex object at 0x000002323DF1CB48>])


2 connectedTo: [5]
dict_keys([<__main__.Vertex object at 0x000002323DF1CC48>])


3 connectedTo: []
dict_keys([])


4 connectedTo: [1]
dict_keys([<__main__.Vertex object at 0x000002323DF1CBC8>])


5 connectedTo: []
dict_keys([])
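
The output above shows the neighbors but not the edge weights. As a sketch of how **getWeight** could be used to list every weighted edge, the block below repeats a trimmed copy of the Vertex class so that it runs on its own. The helper name `list_edges` is hypothetical and not part of the original classes.

```python
class Vertex:
    # trimmed copy of the Vertex class above, so this block is self-contained
    def __init__(self, key):
        self.id = key
        self.connectedTo = {}

    def addNeighbor(self, nbr, weight=0):
        self.connectedTo[nbr] = weight

    def getConnections(self):
        return self.connectedTo.keys()

    def getWeight(self, nbr):
        return self.connectedTo[nbr]


def list_edges(vertices):
    """Return (from_id, to_id, weight) for every edge (hypothetical helper)."""
    return [(v.id, nbr.id, v.getWeight(nbr))
            for v in vertices
            for nbr in v.getConnections()]


v0, v1, v2 = Vertex(0), Vertex(1), Vertex(2)
v0.addNeighbor(v1, 2)
v0.addNeighbor(v2, 2)
v1.addNeighbor(v0, 3)

print(list_edges([v0, v1, v2]))  # [(0, 1, 2), (0, 2, 2), (1, 0, 3)]
```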




# Implementation of Graph Overview

In this lecture we will implement a simple graph by focusing on the Node class. Refer to this lecture for the solution to the Interview Problem

___
The graph will be directed and the edges can hold weights.

We will have three classes, a State class, a Node class, and finally the Graph class.

We're going to be taking advantage of two built-in tools here, [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict) and [Enum](https://docs.python.org/3/library/enum.html)

In [11]:
from enum import Enum  

class State(Enum):
    unvisited = 1 #White
    visited = 2 #Black
    visiting = 3 #Gray

For the Node class we will take advantage of the OrderedDict object, in case we want to keep track of the order in which keys are added to the dictionary.

In [12]:
from collections import OrderedDict

class Node:

    def __init__(self, num):
        self.num = num
        self.visit_state = State.unvisited
        self.adjacent = OrderedDict()  # key = node, val = weight

    def __str__(self):
        return str(self.num)

Then finally the Graph:

In [14]:
class Graph:

    def __init__(self):
        self.nodes = OrderedDict()  # key = node id, val = node

    def add_node(self, num):
        node = Node(num)
        self.nodes[num] = node
        return node

    def add_edge(self, source, dest, weight=0):
        if source not in self.nodes:
            self.add_node(source)
        if dest not in self.nodes:
            self.add_node(dest)
        self.nodes[source].adjacent[self.nodes[dest]] = weight

In [15]:
# this is an example of a directed graph 
g = Graph()
g.add_edge(0, 1, 5)

In [16]:
g.nodes

OrderedDict([(0, <__main__.Node at 0x2323df31a08>),
             (1, <__main__.Node at 0x2323df31e08>)])
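
The `visit_state` field is not used by the classes themselves. One plausible use, sketched below under that assumption, is a breadth-first check for whether a directed path exists between two nodes. The `path_exists` helper is hypothetical and is not necessarily the lecture's solution. The classes are repeated so the block runs on its own.

```python
from collections import OrderedDict, deque
from enum import Enum


class State(Enum):
    unvisited = 1
    visited = 2
    visiting = 3


class Node:
    def __init__(self, num):
        self.num = num
        self.visit_state = State.unvisited
        self.adjacent = OrderedDict()  # key = node, val = weight


class Graph:
    def __init__(self):
        self.nodes = OrderedDict()  # key = node id, val = node

    def add_node(self, num):
        node = Node(num)
        self.nodes[num] = node
        return node

    def add_edge(self, source, dest, weight=0):
        if source not in self.nodes:
            self.add_node(source)
        if dest not in self.nodes:
            self.add_node(dest)
        self.nodes[source].adjacent[self.nodes[dest]] = weight


def path_exists(graph, start, end):
    """Hypothetical helper: is there a directed path from start to end?"""
    if start not in graph.nodes or end not in graph.nodes:
        return False
    for node in graph.nodes.values():
        node.visit_state = State.unvisited  # reset marks from earlier searches
    queue = deque([graph.nodes[start]])
    graph.nodes[start].visit_state = State.visiting
    while queue:
        node = queue.popleft()
        if node.num == end:
            return True
        for neighbor in node.adjacent:
            if neighbor.visit_state == State.unvisited:
                neighbor.visit_state = State.visiting  # queued, not yet expanded
                queue.append(neighbor)
        node.visit_state = State.visited  # all neighbors have been queued
    return False


g = Graph()
g.add_edge(0, 1, 5)
g.add_edge(1, 2, 3)
print(path_exists(g, 0, 2))  # True: edges run 0 -> 1 -> 2
print(path_exists(g, 2, 0))  # False: no edge leads back to 0
```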

# Word Ladder Example Code

* Transform the word "FOOL" into the word "SAGE"
* In a word ladder puzzle you can only change one letter at a time
* At each step you must transform one word into another viable word
* In the graph, two words are joined by an edge when they differ by one letter, so any path from one word to another is a solution

![Image](md_images/word-ladder.png)

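* A bucket approach connects words that differ by one letter: each word is filed under every pattern obtained by replacing one of its letters with a wildcard
* For example, the bucket pop_ holds pops, pope ... 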
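
The bucket idea can be tried out on an in-memory word list before bringing in the graph classes. This is a minimal sketch; the word list is illustrative, and `defaultdict` is used here for brevity.

```python
from collections import defaultdict

# illustrative word list; no file is read in this sketch
words = ['pope', 'rope', 'ripe', 'sage']

buckets = defaultdict(list)
for word in words:
    for i in range(len(word)):
        # replace letter i with a wildcard: 'pope' -> '_ope', 'p_pe', 'po_e', 'pop_'
        buckets[word[:i] + '_' + word[i + 1:]].append(word)

# words that share a bucket differ by exactly one letter
shared = {pattern: group for pattern, group in buckets.items() if len(group) > 1}
print(shared)  # {'_ope': ['pope', 'rope'], 'r_pe': ['rope', 'ripe']}
```

Only buckets holding more than one word produce edges, which is why "sage" ends up unconnected here.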

Below is the Vertex and Graph class used for the Word Ladder example code:

In [17]:
class Vertex:
    def __init__(self,key):
        self.id = key
        self.connectedTo = {}

    def addNeighbor(self,nbr,weight=0):
        self.connectedTo[nbr] = weight

    def __str__(self):
        return str(self.id) + ' connectedTo: ' + str([x.id for x in self.connectedTo])

    def getConnections(self):
        return self.connectedTo.keys()

    def getId(self):
        return self.id

    def getWeight(self,nbr):
        return self.connectedTo[nbr]

In [18]:
class Graph:
    def __init__(self):
        self.vertList = {}
        self.numVertices = 0

    def addVertex(self,key):
        self.numVertices = self.numVertices + 1
        newVertex = Vertex(key)
        self.vertList[key] = newVertex
        return newVertex

    def getVertex(self,n):
        if n in self.vertList:
            return self.vertList[n]
        else:
            return None

    def __contains__(self,n):
        return n in self.vertList

    def addEdge(self,f,t,cost=0):
        if f not in self.vertList:
            nv = self.addVertex(f)
        if t not in self.vertList:
            nv = self.addVertex(t)
        self.vertList[f].addNeighbor(self.vertList[t], cost)

    def getVertices(self):
        return self.vertList.keys()

    def __iter__(self):
        return iter(self.vertList.values())

Code for buildGraph function:

In [19]:
def buildGraph(wordFile):
    d = {}
    g = Graph()
    
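    # strip() drops the trailing newline; line[:-1] would cut a letter off
    # the last word if the file does not end with a newline
    with open(wordFile, 'r') as wfile:
        words = [line.strip() for line in wfile if line.strip()]
    # create buckets of words that differ by one letter
    for word in words: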
        # wildcard approach for letters
        # p_ll -> pull, poll, pill all in the same bucket
        for i in range(len(word)):
            bucket = word[:i] + '_' + word[i+1:]
            print(bucket)
            if bucket in d:
                d[bucket].append(word)
            else:
                d[bucket] = [word]
                
        print('-'*10)
    # add vertices and edges for words in the same bucket
    for bucket in d.keys():
        for word1 in d[bucket]:
            for word2 in d[bucket]:
                if word1 != word2:
                    g.addEdge(word1,word2)
    return g

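In short, the first loop files each word under every wildcard pattern it matches, and the second loop adds an edge between every pair of distinct words that share a bucket. Please reference the video for a full explanation!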

In [20]:
# build my graph 
print('-'*50)
print('buckets of words')
print('-'*50)
g = buildGraph('words.txt')
print('\n')
print('-'*50)
print('graph')
print('-'*50)

for vertex in g:
    print(vertex)
    print(vertex.getConnections())
    print('\n')

--------------------------------------------------
buckets of words
--------------------------------------------------
_ope
p_pe
po_e
pop_
----------
_ope
r_pe
ro_e
rop_
----------
_age
s_ge
sa_e
sag_
----------
_est
b_st
be_t
bes_
----------
_ipe
r_pe
ri_e
rip_
----------
_ip
p_p
pi_
----------


--------------------------------------------------
graph
--------------------------------------------------
pope connectedTo: ['rope']
dict_keys([<__main__.Vertex object at 0x000002323DE95808>])


rope connectedTo: ['pope', 'ripe']
dict_keys([<__main__.Vertex object at 0x000002323DE958C8>, <__main__.Vertex object at 0x000002323DE95D08>])


ripe connectedTo: ['rope']
dict_keys([<__main__.Vertex object at 0x000002323DE95808>])




# Breadth First Search

Makes use of a **queue** to do BFS


Breadth-First Search (BFS) is an alternative to DFS: it reaches the same set of vertices, with the added guarantee that it finds a **shortest path** (fewest edges) first. 

* Given a graph **G** and a starting vertex **s**, explores edges in the graph to find all the vertices in **G** for which there is a path from **s** 
* Finds all the vertices that are a distance **k** from **s** before it finds any vertices that are a distance **k+1**

A good way to visualize what BFS does is that it **builds a tree structure one level at a time**!

For the word ladder problem BFS would look like :

![Image](md_images/bfs-wl.png)

We'll assume our Graph is in the form:

In [21]:
graph = {'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])}

## Connected Component
As in the iterative DFS implementation, pending vertices wait in a list; the only alteration required is to remove the next item from the beginning of the list (queue behavior) instead of from the end (stack behavior).

In [22]:
def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        # use of a queue in contrast to dfs which uses a stack
        vertex = queue.pop(0)
        if vertex not in visited:
            print(vertex)
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)
    return visited

bfs(graph, 'F')

F
C
E
A
B
D


{'A', 'B', 'C', 'D', 'E', 'F'}
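
The "distance k before distance k+1" property can be made visible by recording how many edges separate each vertex from the start. The sketch below is a variant of the function above, not a replacement for it: the name `bfs_levels` is hypothetical, and `collections.deque` is swapped in because `popleft()` runs in constant time whereas `list.pop(0)` shifts every remaining element.

```python
from collections import deque

graph = {'A': {'B', 'C'},
         'B': {'A', 'D', 'E'},
         'C': {'A', 'F'},
         'D': {'B'},
         'E': {'B', 'F'},
         'F': {'C', 'E'}}


def bfs_levels(graph, start):
    """Return {vertex: number of edges from start} (hypothetical helper)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in distance:
                # first time we reach neighbor, so this is its shortest distance
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance


print(bfs_levels(graph, 'F'))
```

Starting from 'F', the vertices 'C' and 'E' are at distance 1, 'A' and 'B' at distance 2, and 'D' at distance 3, which matches the level-by-level order printed above.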

## Paths
This implementation can again be altered slightly to return all possible paths between two vertices; the first path yielded is one of the shortest.

In [23]:
def bfs_paths(graph, start, goal):
    queue = [(start, [start])]
    while queue:
        (vertex, path) = queue.pop(0)
        for nxt in graph[vertex] - set(path):  # nxt avoids shadowing the built-in next()
            if nxt == goal:
                yield path + [nxt]
            else:
                queue.append((nxt, path + [nxt]))

list(bfs_paths(graph, 'A', 'F'))

[['A', 'C', 'F'], ['A', 'B', 'E', 'F']]

Knowing that the BFS path generator yields a shortest path first, we can write a helper that simply returns the first path found, or `None` if no path exists. Because a generator is lazy, this should in theory perform about as well as breaking out of the BFS loop and returning the first matching path.

In [24]:
def shortest_path(graph, start, goal):
    try:
        return next(bfs_paths(graph, start, goal))
    except StopIteration:
        return None

shortest_path(graph, 'A', 'F')

['A', 'C', 'F']
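
Storing a full path with every queue entry is simple but copies lists repeatedly. A common alternative, sketched here as an assumption rather than as part of the original notebook, is to remember only each vertex's parent and rebuild the path once the goal is reached. The function name `shortest_path_parents` is hypothetical.

```python
from collections import deque

graph = {'A': {'B', 'C'},
         'B': {'A', 'D', 'E'},
         'C': {'A', 'F'},
         'D': {'B'},
         'E': {'B', 'F'},
         'F': {'C', 'E'}}


def shortest_path_parents(graph, start, goal):
    """BFS that records one parent per vertex and rebuilds the path at the end."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # walk the parent links back to the start, then reverse
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in graph[vertex]:
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None  # goal is not reachable from start


print(shortest_path_parents(graph, 'A', 'F'))  # ['A', 'C', 'F']
```

Unlike `bfs_paths`, this version returns only one shortest path and visits each vertex at most once.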

# Knight's Tour Problem

[Knights Tour reference](https://bradfieldcs.com/algos/graphs/knights-tour/)

[Geeks for Geeks reference](https://www.geeksforgeeks.org/the-knights-tour-problem-backtracking-1/)

### Problem statement 

* Chess board with a single knight 
* Find a sequence of moves that allows the knight to visit every square on the board exactly once

* Represent the legal moves of a knight on a chessboard as a graph 
* Use a graph algorithm to find a path of  **$length=rows \times columns -1$** where every vertex on the graph is visited exactly once

![image](md_images/knight-prob.png)

### Example of how it works
![image](md_images/knights-tour.gif)

In [30]:
class Vertex:
    def __init__(self, key):
        self.id = key
        self.connectedTo = {}
        self.setColor('white')  # every vertex starts out unvisited

    def addNeighbor(self, nbr, weight=0):
        self.connectedTo[nbr] = weight

    def setColor(self, color):
        self.__color = color

    def getColor(self):
        return self.__color

    def __str__(self):
        return str(self.id) + ' connectedTo: ' + str([x.id for x in self.connectedTo])

    def getConnections(self):
        return self.connectedTo.keys()

    def getId(self):
        return self.id

    def getWeight(self, nbr):
        return self.connectedTo[nbr]


class Graph:
    def __init__(self):
        self.vertList = {}
        self.numVertices = 0

    def addVertex(self, key):
        self.numVertices = self.numVertices + 1
        newVertex = Vertex(key)
        self.vertList[key] = newVertex
        return newVertex

    def getVertex(self, n):
        if n in self.vertList:
            return self.vertList[n]
        else:
            return None

    def __contains__(self, n):
        return n in self.vertList

    def addEdge(self, f, t, cost=0):
        if f not in self.vertList:
            nv = self.addVertex(f)
        if t not in self.vertList:
            nv = self.addVertex(t)
        self.vertList[f].addNeighbor(self.vertList[t], cost)

    def getVertices(self):
        return self.vertList.keys()

    def __iter__(self):
        return iter(self.vertList.values())

In [31]:
def knightGraph(bdSize):
    ktGraph = Graph()
    for row in range(bdSize):
        for col in range(bdSize):
            nodeId = posToNodeId(row,col,bdSize)
            newPositions = genLegalMoves(row,col,bdSize)
            for e in newPositions:
                nid = posToNodeId(e[0],e[1],bdSize)
                ktGraph.addEdge(nodeId,nid)
    return ktGraph

def posToNodeId(row, column, board_size):
    return (row * board_size) + column

def nodeIdToPos(nodeId, board_size):
    return (nodeId//board_size, nodeId%board_size)
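
A quick check of the two conversion helpers, repeated here so the block runs on its own: each square of the board maps to one node id, and converting back recovers the same square. The 5x5 board size is chosen to match the run further down.

```python
def posToNodeId(row, column, board_size):
    return (row * board_size) + column


def nodeIdToPos(nodeId, board_size):
    return (nodeId // board_size, nodeId % board_size)


board_size = 5
print(posToNodeId(1, 2, board_size))  # 7
print(nodeIdToPos(7, board_size))     # (1, 2)

# converting every square to an id and back should give the same square
round_trip_ok = all(
    nodeIdToPos(posToNodeId(r, c, board_size), board_size) == (r, c)
    for r in range(board_size)
    for c in range(board_size)
)
print(round_trip_ok)  # True
```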


In [32]:
def genLegalMoves(x,y,bdSize):
    newMoves = []
    moveOffsets = [(-1,-2),(-1,2),(-2,-1),(-2,1),
                   ( 1,-2),( 1,2),( 2,-1),( 2,1)]
    for i in moveOffsets:
        newX = x + i[0]
        newY = y + i[1]
        if legalCoord(newX,bdSize) and \
                        legalCoord(newY,bdSize):
            newMoves.append((newX,newY))
    return newMoves

def legalCoord(x,bdSize):
    # a coordinate is legal when it lies on the board
    return 0 <= x < bdSize
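
To see what the move generator produces, the sketch below restates it as a single comprehension (the name `legal_moves` is hypothetical) and compares a corner square with the center of a 5x5 board.

```python
def legal_moves(row, col, board_size):
    """Illustrative restatement of genLegalMoves using a comprehension."""
    offsets = [(-1, -2), (-1, 2), (-2, -1), (-2, 1),
               (1, -2), (1, 2), (2, -1), (2, 1)]
    return [(row + dr, col + dc)
            for dr, dc in offsets
            if 0 <= row + dr < board_size and 0 <= col + dc < board_size]


print(legal_moves(0, 0, 5))       # [(1, 2), (2, 1)]
print(len(legal_moves(2, 2, 5)))  # 8
```

A corner square has only two legal moves while the center has all eight, so vertices near the edge of the board have far fewer edges in the knight graph.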

In [33]:
def knightTour(n,path,u,limit):
    u.setColor('gray')
    path.append(u)
    if n < limit:
        nbrList = list(u.getConnections())
        i = 0
        done = False
        while i < len(nbrList) and not done:
            if nbrList[i].getColor() == 'white':
                done = knightTour(n+1, path, nbrList[i], limit)
            i = i + 1
        if not done:  # prepare to backtrack
            path.pop()
            u.setColor('white')
    else:
        done = True
    return done

In [35]:
dimensions = 5

ktGraph = knightGraph(dimensions)

limit = dimensions**2
start_point = ktGraph.vertList[0]
path = []
depth = 1

knights_tour = knightTour(depth, path, start_point, limit)

print('MOVE SEQUENCE')
print('-'*60)
for node in path:
    print(
        'node id : {:<10d} position : {:<20s}'.format(node.getId(), str(nodeIdToPos(node.id, dimensions))))


MOVE SEQUENCE
------------------------------------------------------------
node id : 0          position : (0, 0)              
node id : 7          position : (1, 2)              
node id : 4          position : (0, 4)              
node id : 13         position : (2, 3)              
node id : 6          position : (1, 1)              
node id : 3          position : (0, 3)              
node id : 14         position : (2, 4)              
node id : 23         position : (4, 3)              
node id : 16         position : (3, 1)              
node id : 5          position : (1, 0)              
node id : 2          position : (0, 2)              
node id : 9          position : (1, 4)              
node id : 12         position : (2, 2)              
node id : 15         position : (3, 0)              
node id : 22         position : (4, 2)              
node id : 19         position : (3, 4)              
node id : 8          position : (1, 3)              
node id : 1          pos

# Depth-First Search

Uses a **stack** for backtracking.
Depth-First Search (DFS), as the name hints, explores from a supplied root as far as possible down each branch before backtracking. This property allows the algorithm to be implemented succinctly in both iterative and recursive forms. Below is a listing of the actions performed upon each visit to a node.

**Explores one branch of the tree as deeply as possible**

* Mark the current vertex as being visited.
* Explore each adjacent vertex that is not included in the visited set.

We will assume a simplified version of a graph in the following form:

In [36]:
graph = {'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])}

In [37]:
set(['B','C']) - set('B')

{'C'}

## Connected Component

The implementation below uses a stack to build up and return the set of vertices that are reachable within the start vertex's connected component. Using Python's overloading of the subtraction operator to compute a set difference, we add only the unvisited adjacent vertices to the stack.

In [38]:
def dfs(graph, start):
    visited, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            # print only on the first visit; a vertex can sit on the stack more than once
            print(vertex)
            visited.add(vertex)
            # add the unvisited adjacent vertices
            stack.extend(graph[vertex] - visited)
    return visited

dfs(graph, 'A')

A
B
D
E
F
C


{'A', 'B', 'C', 'D', 'E', 'F'}

The second implementation provides the same functionality as the first, this time in the more succinct recursive form. Because of a common Python gotcha (default parameter values are created only once, when the function is defined), we use `None` as the default and create a new visited set on each top-level call. Python also passes references to objects rather than copies, so the mutable visited set is shared by all recursive calls and does not have to be reassigned after each one.

In [39]:
def dfs(graph, start, visited=None):
    print(start)
    if visited is None:
        visited = set()
    visited.add(start)
    for nxt in graph[start] - visited:
        # visited may have grown during earlier recursive calls, so check again
        if nxt not in visited:
            dfs(graph, nxt, visited)
    return visited

dfs(graph, 'A')

A
C
F
E
B
D


{'A', 'B', 'C', 'D', 'E', 'F'}
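
The default-argument gotcha mentioned above can be seen in isolation. In this minimal sketch (function names are hypothetical), the first function shares one list across calls while the second uses the `None` sentinel pattern that the recursive `dfs` relies on.

```python
def append_shared(item, bucket=[]):
    # the default list is created once, when the function is defined
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None as a sentinel gives every call its own new list
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared('A')
second = append_shared('B')
print(second)           # ['A', 'B'], because both calls wrote into the same list
print(first is second)  # True

print(append_fresh('A'), append_fresh('B'))  # ['A'] ['B']
```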

## Paths
We are able to tweak both of the previous implementations to return all possible paths between a start and goal vertex. The implementation below uses the stack data structure again to solve the problem iteratively, yielding each possible path when we locate the goal. Using a generator allows the user to compute only the desired number of alternative paths.

In [40]:
def dfs_paths(graph, start, goal):
    stack = [(start, [start])]
    while stack:
        (vertex, path) = stack.pop()
        for nxt in graph[vertex] - set(path):
            if nxt == goal:
                yield path + [nxt]
            else:
                stack.append((nxt, path + [nxt]))

list(dfs_paths(graph, 'A', 'F'))

[['A', 'B', 'E', 'F'], ['A', 'C', 'F']]
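
Only the iterative variant is shown above. A recursive counterpart might look like the following sketch, which uses `yield from` to pass paths up from deeper calls. The function name is hypothetical, and unlike the iterative version it also yields the trivial path when start and goal are the same vertex.

```python
graph = {'A': {'B', 'C'},
         'B': {'A', 'D', 'E'},
         'C': {'A', 'F'},
         'D': {'B'},
         'E': {'B', 'F'},
         'F': {'C', 'E'}}


def dfs_paths_recursive(graph, start, goal, path=None):
    """Yield every simple path from start to goal (hypothetical recursive variant)."""
    if path is None:
        path = [start]
    if start == goal:
        yield path
        return
    # only step to vertices that are not already on the current path
    for nxt in graph[start] - set(path):
        yield from dfs_paths_recursive(graph, nxt, goal, path + [nxt])


print(sorted(dfs_paths_recursive(graph, 'A', 'F')))
# [['A', 'B', 'E', 'F'], ['A', 'C', 'F']]
```

The result is sorted before printing because the order in which paths are found depends on set iteration order.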

### Resources
* [Depth- and Breadth-First Search](http://jeremykun.com/2013/01/22/depth-and-breadth-first-search/)
* [Connected component](https://en.wikipedia.org/wiki/Connected_component_(graph_theory))
* [Adjacency matrix](https://en.wikipedia.org/wiki/Adjacency_matrix)
* [Adjacency list](https://en.wikipedia.org/wiki/Adjacency_list)
* [Python Gotcha: Default arguments and mutable data structures](https://developmentality.wordpress.com/2010/08/23/python-gotcha-default-arguments/)
* [Generators](https://wiki.python.org/moin/Generators)