# DAY 2
### Defeating Word GOLF with Graph Search in $O(m+n)$ time


Reminder: Here are the rules of **Word Golf**.  You are given two words of equal length, and you must get from the first word to the second by changing a single letter at a time, so that every step along the path is a **real word**.

For example:  If you are given these two words: **CARS** and **DRAT**, you might find a path like this:

    CARS -> CART -> DART -> DAFT -> DEFT -> DEBT -> DEET -> BEET -> BEAT -> BRAT -> DRAT
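
The one-letter rule can be checked mechanically. The following is a minimal sketch, assuming words are plain strings; `differs_by_one` and `is_valid_ladder` are hypothetical helper names, and the check covers only the single-letter change, not whether each word is a real word:

```python
def differs_by_one(a, b):
    """Return True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_valid_ladder(path):
    """Return True if every consecutive pair of words differs by exactly one letter."""
    return all(differs_by_one(a, b) for a, b in zip(path, path[1:]))


example = ["CARS", "CART", "DART", "DAFT", "DEFT", "DEBT",
           "DEET", "BEET", "BEAT", "BRAT", "DRAT"]
print(is_valid_ladder(example))  # True
```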

### COPY your function from Day 1, Part 5 here to allow you to quickly load your text file into an Adjacency List dictionary 


In [1]:
# Function to load a text file into an adjacency list graph in the form of a dictionary

def load_graph(file = "4_letter_graph.txt"):
    '''
    Reads a file in and returns a graph in the form of an adjacency list dictionary
    inputs: file name
    outputs: the adjacency list dictionary

    '''
    letter_dict = {}
    with open(file, 'r') as f:
        for line in f:
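            # split the line once: the first word is the node, the rest are its neighbors
            words = line.split()
            if words:  # skip blank lines instead of raising an IndexError
                letter_dict[words[0]] = words[1:]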
    return letter_dict
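
The loader above appears to assume one node per line: the word first, followed by its neighbors, all separated by whitespace. Here is a minimal sketch of that assumed format, parsing an in-memory sample instead of a file. The sample lines and the `parse_adjacency_lines` helper are illustrative and not taken from Day 1:

```python
def parse_adjacency_lines(lines):
    """Build an adjacency-list dict from lines shaped like 'WORD NEIGHBOR1 NEIGHBOR2 ...'."""
    graph = {}
    for line in lines:
        words = line.split()
        if not words:  # skip blank lines
            continue
        graph[words[0]] = words[1:]
    return graph


# Hypothetical three-word sample in the assumed file format
sample = ["CARS CART", "CART CARS DART", "", "DART CART"]
graph = parse_adjacency_lines(sample)
print(graph["CART"])  # ['CARS', 'DART']
```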



## Day 2:  Breadth First Search (BFS) to Defeat Word Golf

Today you will write a function that uses the Breadth First Search algorithm to explore the graph of Scrabble words created on Day 1.

### We're going to do this in Four Parts

### Part 1: Searching the Graph


### Part 2: Finding how many hops or layers from start to end word


### Part 3: Solution: Finding the Path from Start Node to End Node


### Part 4:  Putting it all together to solve Word Golf


### Optional Additional Exploration

    1) Finding words with the longest path between them (with no shorter path possible)
    2) Finding words that cannot be reached



### Day 2, Part 1: Searching the Graph

We're going to use a new data structure called a Queue for this function.  A Queue works on a First In First Out (FIFO) policy.  We could implement this with a list by appending items to the end and removing them from the front, but removing from the front of a list takes O(n) time.  A Queue removes from the front in O(1) time while still appending in O(1) time.

We'll use the deque class from the collections library like this:

```python
from collections import deque
```
Once you define a Queue, you'll use the `.append` and `.popleft` methods.  [See this page for more detail about using deque](https://www.geeksforgeeks.org/deque-in-python/)
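
As a quick illustration of the FIFO behavior, here is a minimal sketch using only `.append` and `.popleft` (the node names are placeholders):

```python
from collections import deque

Q = deque(["s"])      # queue starts with one item
Q.append("a")         # enqueue on the right
Q.append("b")
first = Q.popleft()   # dequeue from the left: the oldest item comes out first
print(first)          # s
print(list(Q))        # ['a', 'b']
```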

In addition, we'll be using a data structure called a SET to keep track of explored nodes.  A set can be initialized like this:
```python
E = set() # initialized empty set of explored nodes
# or
E = {s} # initialized set of explored nodes with the start node
```

Look-ups like ```'s' in E``` on a set take O(1) time on average, so sets are a good fit when you need frequent, fast membership checks.  To add to a set, use the `.add` method rather than `.append`, which is what you would use with a list.
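
A short sketch of these set operations, including one pitfall worth knowing when the nodes are strings: `set()` iterates over its argument, so passing a bare word splits it into letters. The words here are placeholders:

```python
E = {"CARS"}            # set containing one whole word
E.add("CART")           # sets use .add, not .append
print("CART" in E)      # True
print("DART" in E)      # False

# Pitfall: set() iterates over its argument, so a bare string is split into characters.
print(set("CARS") == {"C", "A", "R", "S"})   # True
print(set(["CARS"]) == {"CARS"})             # True
```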


For this part, you just want to successfully search through all the nodes you can reach given a single starting point.  You will simply return a set of all the nodes that can be reached from that given starting point.

To do this, you'll implement Breadth First Search (BFS) as [described in this Tim Roughgarden video](https://youtu.be/73qCvXsYkfk)


In [8]:
d = set('a')
d

{'a'}

In [2]:
# Basic BFS function to search a graph and return a set of
# the nodes that can be found from given starting node, s 
from collections import deque

def BFS(G, s):
    '''
    Breadth First Search of an adjacency list graph
    inputs: graph of words, starting node
    outputs: the set of all explored nodes

    '''
    E = set([s])
    # starts with s as the starting node
    Q = deque([s])
    while Q:
        v = Q.popleft()
        for w in G[v]:
            if w not in E:
                Q.append(w)
                E.add(w)


    return E # the set of all the found nodes




# This is the graph from the video to use for testing
# your function should return a set of all the nodes reachable from s
# I also added f, g, and h, which are not reachable from s

G = {'s':['a','b'],
    'a': ['s','b','c'],
    'b':['s','c','d'],
    'c':['a','b','d'],
    'd':['b','c','e'],
    'e':['c','d'],
     'f':['g','h'],
     'g':['f'],
     'h':['f']
    }

BFS(G, 's')


{'a', 'b', 'c', 'd', 'e', 's'}
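
For the optional exploration on words that cannot be reached, one possible approach is to subtract the reachable set from the set of all nodes. The sketch below repeats a compact BFS so that it runs on its own. `bfs_reachable`, `unreachable_from`, and the small graph `G_small` are hypothetical names used only for illustration:

```python
from collections import deque


def bfs_reachable(G, s):
    """Return the set of nodes reachable from s (same idea as BFS above)."""
    explored = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in G[v]:
            if w not in explored:
                explored.add(w)
                queue.append(w)
    return explored


def unreachable_from(G, s):
    """Return the nodes of G that cannot be reached from s."""
    return set(G) - bfs_reachable(G, s)


# Two disconnected components: {s, a} and {f, g}
G_small = {'s': ['a'], 'a': ['s'], 'f': ['g'], 'g': ['f']}
print(sorted(unreachable_from(G_small, 's')))  # ['f', 'g']
```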

### Day 2, Part 2: Finding how many hops or layers from start to end word

[Watch this 7 minute video on Calculating Shortest Path](https://youtu.be/AhEZ4yjkVxA)

Note: One difference I recommend in your implementation is to use a Python dictionary to record the layer (the distance from the start) of each node.  The starting node has a distance of zero, for example:

```python
dist = {s:0}
```

Copy your BFS code from above, but now return the distance dictionary showing the distance from s to every node reachable from s.  For instance, for the sample graph it should contain these key-value pairs:

```python
dist = {'s': 0,
        'a': 1,
        'b': 1,
        'c': 2}
```

In [3]:
def BFS_with_distance(G, s):
    '''
    finds the distance between all discovered nodes from starting item
    inputs: graph, start point
    outputs: dictionary of all discovered nodes and distances
    '''
    dist = {s:0}
    # starts with s as the starting node
    E = set([s])  # wrap s in a list so set() doesn't split the string into characters
    Q = deque([s])
    while Q:
        v = Q.popleft()
        for w in G[v]:
            if w not in E:
                Q.append(w)
                E.add(w)
                dist[w] = dist[v] + 1
    return dist


BFS_with_distance(G, 's')

{'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
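
Once you have the distance dictionary, it can be regrouped into BFS layers, which is one way to see how many hops separate the start node from everything else. Here is a small sketch using the output shown above (`group_by_layer` is a hypothetical helper, not part of the assignment):

```python
from collections import defaultdict


def group_by_layer(dist):
    """Invert a {node: distance} dict into {distance: sorted list of nodes}."""
    layers = defaultdict(list)
    for node, d in dist.items():
        layers[d].append(node)
    return {d: sorted(nodes) for d, nodes in sorted(layers.items())}


dist = {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}  # output shown above
layers = group_by_layer(dist)
print(layers)              # {0: ['s'], 1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']}
print(max(dist.values()))  # 3, the number of hops to the farthest node
```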

### Day 2, Part 3: Solution: Finding the Path from Start Node to End Node

This is the final step!  This time, you are going to write a version of BFS that finds the shortest path between two nodes s1 and s2.  The function will return a list starting from s1 and ending at s2.

For the sample graph provided, for instance:
```python
BFS_path(G, 's', 'e')
```
might return `['s', 'b', 'd', 'e']` or another viable list of the same length.

This one will require you to use a similar technique to Part 2, where you kept track of the distance to every node.  This time, I would suggest using a dictionary to keep track of each word that has been explored and which node discovered it.  This dictionary could then be used to backtrack from the end node to the start node to reveal the shortest path.

If no path is possible, the function should return an empty list.
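
The backtracking step on its own can be sketched as follows. This assumes the search has already filled in a dictionary mapping each discovered node to the node that discovered it, with the start node mapped to itself. The `backtrack` helper and the hand-written `parent` dictionary are hypothetical and only illustrate the idea on the sample graph:

```python
def backtrack(parent, s1, s2):
    """Walk parent links from s2 back to s1; return [] if s2 was never discovered."""
    if s2 not in parent:
        return []
    path = [s2]
    while path[-1] != s1:
        path.append(parent[path[-1]])
    path.reverse()
    return path


# Hypothetical parent links for the sample graph:
# each node maps to the node that discovered it.
parent = {'s': 's', 'a': 's', 'b': 's', 'c': 'a', 'd': 'b', 'e': 'd'}
print(backtrack(parent, 's', 'e'))  # ['s', 'b', 'd', 'e']
print(backtrack(parent, 's', 'h'))  # []
```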



In [6]:
def BFS_path(G, s1, s2):
    '''
    finds the optimal path from s1 to s2
    inputs: graph, starting point, end point
    returns: the path from s1 to s2
    '''
    by = {s1: s1}
    dist = {s1:0}
    E = set([s1])
    # the queue starts with s1 as the only node
    Q = deque([s1])
    while Q:
        v = Q.popleft()
        for w in G[v]:
            if w not in E:
                # update the queue and the explored set
                Q.append(w)
                E.add(w)
                # distance tracking isn't needed for the path, but is kept for possible future changes
                dist[w] = dist[v] + 1
                # record which node discovered w
                by[w] = v
    # backtrack from s2 to s1 using the "discovered by" links
    path = deque()  # deque allows appendleft, so no reversal is needed at the end
    if s2 in E:  # s2 was reached from s1
        path.append(s2)
        while path[0] != s1:  # stop once the start node is at the front
            path.appendleft(by[path[0]])
    return list(path)

BFS_path(G, 's', 'e')

['s', 'b', 'd', 'e']
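
Because BFS discovers nodes in order of distance, the search could also stop as soon as the end node has been discovered. Here is a sketch of that variant, which is not the solution above: `bfs_path_early_exit` is a hypothetical name, and the use of `G.get` to tolerate words missing from the graph is an assumption about how you might want missing words handled:

```python
from collections import deque


def bfs_path_early_exit(G, s1, s2):
    """Shortest path from s1 to s2 as a list, or [] if none exists."""
    if s1 not in G:
        return []
    parent = {s1: None}  # doubles as the explored set
    queue = deque([s1])
    # stop when the queue is empty or s2 has been discovered
    while queue and s2 not in parent:
        v = queue.popleft()
        for w in G.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    if s2 not in parent:
        return []
    # follow parent links back from s2 until the start node (parent None)
    path = []
    node = s2
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


G = {'s': ['a', 'b'], 'a': ['s', 'b', 'c'], 'b': ['s', 'c', 'd'],
     'c': ['a', 'b', 'd'], 'd': ['b', 'c', 'e'], 'e': ['c', 'd'],
     'f': ['g', 'h'], 'g': ['f'], 'h': ['f']}
print(bfs_path_early_exit(G, 's', 'e'))  # ['s', 'b', 'd', 'e']
print(bfs_path_early_exit(G, 's', 'f'))  # []
```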

### Day 2, Part 4:  Putting it all together to solve Word Golf

Use your functions from Day 1 to load the graph (4 letter, 5 letter, 6 letter, etc) and solve some Word Golf Puzzles.


In [7]:
# load the 4- and 5-letter graphs used by the test cases below
letter_graph_4 = load_graph('4_letter_graph.txt')
letter_graph_5 = load_graph('5_letter_graph.txt')

In [8]:
# Here are some 5 and 4 letter test cases to try to solve and the length of their shortest path solutions
# including the start and end word in the path

# you'll use your load_graph function from Day 1 for this

test_cases5 = [('HONAN', 'ICERS', 15),
 ('ACKEE', 'RIGOL', 16),
 ('GAITT', 'IDOLA', 17),
 ('APPEL', 'LUNET', 18),
 ('HERTZ', 'INPUT', 21),
 ('MUSCA', 'UNCOY', 22),
 ('GLAUM', 'UNAIS', 24),
 ('GLAUM', 'yeet', 0)]  # unreachable target: expect an empty path

test_cases4 = [('KLIK', 'OFAY', 9),
 ('DHAK', 'EDHS', 10),
 ('IDOL', 'JEDI', 12),
 ('IDOL', 'JIAO', 13),
 ('ASHY', 'ODIC', 14),
 ('EGAL', 'UNAU', 15)]


for test in test_cases5:
    if len(BFS_path(letter_graph_5, test[0], test[1])) == test[2]:
        print(True, test[0], test[1], BFS_path(letter_graph_5, test[0], test[1]))
    else:
        print(False)
        
for test in test_cases4:
    if len(BFS_path(letter_graph_4, test[0], test[1])) == test[2]:
        print(True, test[0], test[1], BFS_path(letter_graph_4, test[0], test[1]))
    else:
        print(False)

True HONAN ICERS ['HONAN', 'HOGAN', 'HOGEN', 'HOSEN', 'HOSES', 'HOKES', 'HYKES', 'TYKES', 'TYEES', 'TYERS', 'EYERS', 'EGERS', 'AGERS', 'ACERS', 'ICERS']
True ACKEE RIGOL ['ACKEE', 'ACKER', 'OCKER', 'OAKER', 'BAKER', 'BAKES', 'BASES', 'BASIS', 'BASIN', 'BASON', 'BISON', 'VISON', 'VISOR', 'VIGOR', 'RIGOR', 'RIGOL']


True GAITT IDOLA ['GAITT', 'GAITS', 'BAITS', 'BAILS', 'VAILS', 'VRILS', 'ARILS', 'AXILS', 'AXELS', 'AVELS', 'OVELS', 'OVALS', 'ODALS', 'ODYLS', 'IDYLS', 'IDOLS', 'IDOLA']
True APPEL LUNET ['APPEL', 'APPAL', 'APPAY', 'APPLY', 'AMPLY', 'AMPLE', 'AMOLE', 'ANOLE', 'ANILE', 'ANILS', 'ARILS', 'ARIAS', 'ARRAS', 'AURAS', 'AURES', 'AUNES', 'LUNES', 'LUNET']
True HERTZ INPUT ['HERTZ', 'NERTZ', 'NERTS', 'CERTS', 'CERES', 'CEDES', 'CEDER', 'CIDER', 'EIDER', 'ENDER', 'ENDEW', 'ENSEW', 'UNSEW', 'UNSET', 'ONSET', 'ONCET', 'ONCES', 'ONCUS', 'INCUS', 'INCUT', 'INPUT']
True MUSCA UNCOY ['MUSCA', 'MUSHA', 'MUSHY', 'BUSHY', 'BUSKY', 'BUSKS', 'BISKS', 'BISES', 'BIDES', 'BIDER', 'EIDER', 'ENDER', 'ENDEW', 'ENSEW', 'UNSEW', 'UNSET', 'ONSET', 'ONCET', 'ONCES', 'UNCES', 'UNCOS', 'UNCOY']
True GLAUM UNAIS ['GLAUM', 'GLAUR', 'GLAIR', 'FLAIR', 'FLAIL', 'FRAIL', 'BRAIL', 'BRAIN', 'BLAIN', 'SLAIN', 'SPAIN', 'SPAIT', 'SPLIT', 'UPLIT', 'UNLIT', 'UNLET', 'UNSET', 'ONSET', 'ONCET', 'ONCES', 'ONCUS', 'UNCUS', 'UNAUS', '

True KLIK OFAY ['KLIK', 'KAIK', 'KAIS', 'SAIS', 'SKIS', 'SKAS', 'OKAS', 'OKAY', 'OFAY']
True DHAK EDHS ['DHAK', 'DHAL', 'CHAL', 'CHAT', 'SHAT', 'STAT', 'ETAT', 'ETAS', 'ETHS', 'EDHS']
True IDOL JEDI ['IDOL', 'IDYL', 'ODYL', 'ODAL', 'ODAS', 'ODDS', 'OUDS', 'CUDS', 'CADS', 'CADI', 'CEDI', 'JEDI']
True IDOL JIAO ['IDOL', 'IDYL', 'ODYL', 'ODAL', 'ODAS', 'OBAS', 'ABAS', 'AIAS', 'AITS', 'CITS', 'CITO', 'CIAO', 'JIAO']
True ASHY ODIC ['ASHY', 'ACHY', 'ACHE', 'ACME', 'ALME', 'ALAE', 'SLAE', 'SPAE', 'SPIE', 'SPIC', 'EPIC', 'ETIC', 'OTIC', 'ODIC']
True EGAL UNAU ['EGAL', 'EGAD', 'ECAD', 'ECOD', 'ECOS', 'EPOS', 'APOS', 'APTS', 'ANTS', 'ANTE', 'ANCE', 'UNCE', 'UNCI', 'UNAI', 'UNAU']
