<h1>Algorithms Reference Book</h1>
<h2>A personal reference to accompany the <a href="https://www.manning.com/books/grokking-algorithms">Grokking Algorithms textbook</a></h2>
This Jupyter notebook serves as a reference to the Grokking Algorithms illustrated guide and lays out examples of the algorithms discussed in its text, demonstrated in Python. It is intended as a helpful guide for myself and others to refer to for revision.

<h3>Breadth-first search</h3>
Breadth-first search is used to find out whether a path exists between two nodes and, if one does, the shortest such path (fewest edges) in an unweighted graph. The function below is an implementation of breadth-first search in Python.

In [None]:
# %load breadth_first.py

import collections

def search(start_point, query_function, graph):
    """Params:
        start_point: starting node, must be an existing key in graph
        query_function: function to pass search queries, must return boolean value
        graph: graph to search. Expects hash table datatype"""
    queue = collections.deque()
    queue += graph[start_point]
    searched = []

    while queue:
        query = queue.popleft()
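        if query not in searched:
            if query_function(query):
                print("Matching node found!")
                # "path" holds the nodes checked before the match, in search order
                return {"query_node": query, "path": searched}
            # No match: queue this node's neighbours and mark it as searched
            queue += graph[query]
            searched.append(query)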
    return False


Notes on breadth first search:
* Tells you if there is a path from A to B
* If a path exists, it will find the shortest path
* If a problem is about finding the shortest distance between two things, try modelling it as a graph
* A directed graph has arrows, and the relationship follows the arrows
* Undirected graphs don't have arrows and the relationship goes both ways
* Queues are FIFO (first in first out)
* Stacks are LIFO (last in first out)
* Breadth-first search needs a queue: nodes must be checked in the order they were added to the search list, otherwise you won't get the shortest path
* Make sure not to check a node twice, or else you may end up in an infinite loop
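
The notes above say that breadth-first search finds the shortest path, so here is a minimal sketch of how that path could be recovered by remembering each node's parent. The function name `shortest_path` and the example graph are my own assumptions for illustration, not code from the book.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return the shortest list of nodes from start to goal, or None."""
    # parents doubles as the "already seen" record, so no node is queued twice
    parents = {start: None}
    queue = deque([start])

    while queue:
        node = queue.popleft()  # FIFO: the oldest queued node is checked first
        if node == goal:
            # Walk the parent links back to the start, then reverse them
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical directed graph, used only for this example
graph = {
    "you": ["alice", "bob", "claire"],
    "alice": ["peggy"],
    "bob": ["anuj", "peggy"],
    "claire": ["thom", "jonny"],
    "anuj": [], "peggy": [], "thom": [], "jonny": [],
}
print(shortest_path(graph, "you", "peggy"))  # ['you', 'alice', 'peggy']
```

Swapping `popleft()` for `pop()` would turn the queue into a stack, and the path found would no longer be guaranteed to be the shortest.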

<h3>Dijkstra's Algorithm</h3>
Dijkstra's (pronounced deyek-stra) algorithm finds the shortest possible path in a weighted graph. Cycles are fine as long as every edge weight is non-negative. NOTE: Dijkstra's algorithm does not work on graphs with negative-weight edges. For those, see the Bellman-Ford algorithm.

In [None]:
# %load Dijkstras_Algorithm.py
class Dijkstra:
    def __init__(self, graph):
        """Initialise Dijkstra search.
        Params:
            graph: should be a nested dictionary detailing nodes, neighbours
            and weights for edges connecting nodes and neighbours"""
        # Fail early if the graph is not a dictionary
        if not isinstance(graph, dict):
            raise TypeError("Graph of invalid datatype")
        self.graph = graph
        self.infinity = float("inf")

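    def init_costs(self, start_node):
        # We only know the costs for the neighbours of the starting node
        # Set all other nodes to infinity
        costs = {}
        for node, neighbours in self.graph.items():
            costs[node] = self.infinity
            for neighbour in neighbours:
                costs[neighbour] = self.infinity
        # Neighbours of the start node cost the weight of the connecting edge
        costs.update(self.graph[start_node])
        # Reaching the start node itself costs nothing
        costs[start_node] = 0
        return costs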

    def init_parents(self, start_node):
        #Set the parent of neighbours to start node
        parents = {}
        for k, v in self.graph[start_node].items():
            parents[k] = "start"
        return parents

    def find_lowest_cost_node(self, costs, processed_nodes):
        lowest_cost = float("inf")
        lowest_cost_node = None
        for node in costs:
            cost = costs[node]
            if cost < lowest_cost and node not in processed_nodes:
                lowest_cost = cost
                lowest_cost_node = node
        return lowest_cost_node

    def find_shortest_path(self, start_node):
        costs = self.init_costs(start_node)
        parents = self.init_parents(start_node)
        processed = []
        node = self.find_lowest_cost_node(costs, processed)

        while node is not None:
            cost = costs[node]
            # Nodes without outgoing edges may be missing as keys in the graph
            neighbours = self.graph.get(node, {})
            for n in neighbours.keys():
                new_cost = cost + neighbours[n]
                # Going through this node is cheaper than the best route known so far
                if costs.get(n, self.infinity) > new_cost:
                    costs[n] = new_cost
                    parents[n] = node
            processed.append(node)
            node = self.find_lowest_cost_node(costs, processed)
        return parents, costs


Notes on Dijkstra's Algorithm:
* Breadth-first search is used to calculate the shortest path for an unweighted graph
* Dijkstra's algorithm is used to calculate the shortest path for a weighted graph
* Dijkstra's algorithm only works when all weights are non-negative; otherwise use the Bellman-Ford algorithm
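
As a compact illustration of the notes above, here is a sketch of Dijkstra's algorithm as a single function that also rebuilds the path. It swaps the linear "find the lowest cost node" scan for a priority queue from Python's standard `heapq` module, which is my own choice rather than the approach used in the class above. The function name `dijkstra` and the example graph are assumptions, and node labels are assumed to be comparable (e.g. strings) so that ties in the heap can be broken.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (cost, path) for the cheapest route, or (inf, None) if unreachable."""
    costs = {start: 0}
    parents = {start: None}
    heap = [(0, start)]  # entries are (cost so far, node)
    processed = set()

    while heap:
        cost, node = heapq.heappop(heap)  # cheapest entry in the heap
        if node in processed:
            continue  # stale entry: a cheaper route was already handled
        processed.add(node)
        if node == goal:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                parents[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))

    if goal not in costs:
        return float("inf"), None

    # Follow the parent links back from the goal to rebuild the path
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parents[node]
    return costs[goal], path[::-1]

# Small example graph (nested dictionary: node -> neighbour -> edge weight)
graph = {
    "start": {"a": 6, "b": 2},
    "a": {"fin": 1},
    "b": {"a": 3, "fin": 5},
    "fin": {},
}
print(dijkstra(graph, "start", "fin"))  # (6, ['start', 'b', 'a', 'fin'])
```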

<h3>Greedy Algorithms</h3>
Sometimes you cannot find the optimal solution to a problem in a reasonable amount of time, so you want an algorithm that is just good enough but also quick. This is where greedy algorithms come into play. Greedy algorithms pick the best-looking option at each step, aiming to give an approximation to the optimal solution whilst also being fast.

<h4>The Set-covering Problem</h4>
For example, say you have a radio show and you want to reach listeners in all 50 of the US states. You need to decide which stations to play on to reach all those listeners. It costs money to be on each station, so you want to minimise the number of stations you play on while still covering every state.

You could list every possible subset of stations (the power set) and pick the set with the smallest number of stations that covers all 50 states, but with n stations there are 2^n possible subsets, and this could take a long time to calculate. So it would be better to use a greedy algorithm that approximates a reasonable answer. We can do this by picking the station that covers the most states that haven't been covered yet (even if it overlaps with other stations on states that are already covered) and then repeating until we have coverage over every state.


In [5]:
#Setup
states_needed = set(["mt", "wa", "or", "id", "nv", "ut"])
stations = {
    "kone": set(["id", "nv", "ut"]),
    "ktwo": set(["wa", "id", "mt"]),
    "kthree": set(["or", "nv", "ca"]),
    "kfour": set(["nv", "ut"]),
    "kfive": set(["ca", "az"])
}
final_stations = []

In [6]:
while states_needed:
    best_station = None
    states_covered = set()
    for station, states in stations.items():
        covered = states_needed & states #This is called set intersection, see below!
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered
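    if best_station is None:
        break #No station covers any of the remaining states, so stop instead of looping forever
    states_needed -= states_covered
    final_stations.append(best_station)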

In [7]:
final_stations

['kone', 'ktwo', 'kthree']
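
For comparison, the exhaustive approach described earlier (trying subsets of stations until one covers everything) can be sketched as below. This is only an illustration under my own assumptions: the function name `exact_cover` is hypothetical, the data is copied from the setup cell so the block runs on its own, and checking subsets smallest-first is one possible ordering.

```python
from itertools import combinations

def exact_cover(states_needed, stations):
    """Try subsets of stations, smallest first; return the first that covers everything."""
    names = list(stations)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Union of all states reached by the stations in this subset
            covered = set().union(*(stations[name] for name in combo))
            if states_needed <= covered:  # <= tests "is a subset of"
                return list(combo)
    return None  # no combination of stations covers every state

# Same data as the setup cell above, repeated so this block is self-contained
needed = {"mt", "wa", "or", "id", "nv", "ut"}
stations_example = {
    "kone": {"id", "nv", "ut"},
    "ktwo": {"wa", "id", "mt"},
    "kthree": {"or", "nv", "ca"},
    "kfour": {"nv", "ut"},
    "kfive": {"ca", "az"},
}
print(exact_cover(needed, stations_example))  # ['kone', 'ktwo', 'kthree']
print(2 ** len(stations_example))             # 32 subsets in the worst case
```

With only five stations the exact answer matches the greedy one, but the number of subsets doubles with every station added, which is why the greedy version is preferred for larger inputs.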

**Set union/intersection/difference**

In [8]:
set_one = set([1,3,5,8,19,25,34])
set_two = set([1,3,15,19,23,25,42,68])    

In [9]:
#Set union
set_one | set_two

{1, 3, 5, 8, 15, 19, 23, 25, 34, 42, 68}

In [10]:
#Set intersection
set_one & set_two

{1, 3, 19, 25}

In [11]:
#Set difference
set_one - set_two

{5, 8, 34}

<h4>NP-Complete Problems</h4>

The set-covering problem above is very similar to the 'travelling salesperson' problem. For both, no fast algorithm is known that guarantees the optimal solution; the straightforward approach is to calculate every possible solution and pick the smallest/shortest one. Problems like this are called NP-complete problems. (Btw, NP stands for "nondeterministic polynomial time").

As soon as you realise that a problem is NP-complete, you should stop trying to solve it perfectly and instead focus your efforts on making a good approximation. There is no easy way of telling if a problem is NP-complete, but here are some giveaways:
* Your algorithm is quick when processing a few items, but slows down dramatically as you add more items
* "All combinations of X" usually points to an NP-complete problem
* You can't break something down into sub-problems and so you have to calculate "every possible version of X".
* The problem involves a sequence (such as the travelling salesperson problem) and is difficult to solve
* The problem involves a set (like the set-covering problem) and is difficult to solve
* If you can restate the problem like it is the travelling salesperson problem or the set-covering problem, then it is definitely NP-complete
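
To make the first giveaway concrete, here is a small sketch of how quickly "every possible version of X" grows. It assumes that every ordering of n cities counts as a separate travelling salesperson route (n! routes) and that every subset of n stations is a candidate for set covering (2^n subsets). The helper names `route_count` and `subset_count` are my own.

```python
from math import factorial

def route_count(n_cities):
    """Number of orderings of n cities, treating each ordering as a distinct route."""
    return factorial(n_cities)

def subset_count(n_items):
    """Number of subsets of n items (the size of the power set)."""
    return 2 ** n_items

for n in (5, 10, 15, 20):
    print(n, subset_count(n), route_count(n))
# 5 32 120
# 10 1024 3628800
# 15 32768 1307674368000
# 20 1048576 2432902008176640000
```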

Notes on Greedy Algorithms:
* They optimise locally, hoping to end up with a global optimum
* NP-complete problems have no known fast solution
* If you have an NP-complete problem, your best bet is to use an approximation algorithm
* Greedy algorithms are simple and fast to run, so they make for good approximation algorithms