> Again, you can use [py tutor](http://pythontutor.com/visualize.html#mode=edit) to inspect, step by step, how the algorithm works.

### Chapter 05

Let's talk about the *Hash Table*.
- O(1) access on average.
- Other names: [*hash map*, *map*, *dict*, *associative array*].

Big-O for read/insert/delete
- Best (and average) case: all O(1)
- Worst case: all O(N)
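
To see where those two cases come from, here is a minimal sketch of a hash table that uses separate chaining. The class name `ChainedHashTable` and its methods are made up for illustration; Python's built-in `dict` is implemented differently. A lookup only scans one bucket, so it is fast when keys spread out evenly, and it degrades to O(N) if every key lands in the same bucket.

```python
class ChainedHashTable:
    """Toy hash table (illustrative only): a list of buckets, each a list of pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() turns the key into an integer; the modulo picks one bucket
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # new key: append to the chain

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):   # scans one bucket only
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put('Stat', 'ThinkStats')
table.put('Crypto', 'Crypto 101')

print(table.get('Stat'))      # ThinkStats
print(table.get('Missing'))   # None
```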

In [55]:
book = dict()

book['Stat']    = 'ThinkStats'
book['Crypto']  = 'Crypto 101'
book['Recipe']  = 'Py Cookbook'

book

{'Stat': 'ThinkStats', 'Crypto': 'Crypto 101', 'Recipe': 'Py Cookbook'}

In [56]:
# voting 

voted = dict()

def check_voter(name):
    if voted.get(name):        # got nothin' => False for 'if'
        print("Nope. Leave!")  
    else:
        voted[name] = True     # add'em to dict :)
        print("Let'em vote!")
        

check_voter('Tom')
check_voter('Alice')
check_voter('Alice')

voted

Let'em vote!
Let'em vote!
Nope. Leave!


{'Tom': True, 'Alice': True}

### Chapter 06

Let's talk about *Graphs*.
- Then write our first graph algorithm: *Breadth-first Search*.
- Put another way: "Find the shortest path to XXX".

What are these terms?
- The names are ***Nodes***.
- The lines are ***Edges***.
- ...

![01](./img/example_of_graph.jpg)

There are two common questions:
- Is there a path from *Node A* to *Node B*?
- What is the ***shortest path*** from *Node A* to *Node B*?

Two more terms:
- *FIFO*: First In, First Out (a *queue*).
- *LIFO*: Last In, First Out (a *stack*).
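
A quick sketch of the difference, using `collections.deque` as the queue and a plain list as the stack (the variable names are just for illustration). Breadth-first search relies on the FIFO behaviour: people added first are checked first.

```python
from collections import deque

# FIFO (queue): add on the right, take from the left
queue = deque()
queue.append('alice')
queue.append('bob')
queue.append('claire')
first_out = queue.popleft()
print(first_out)   # alice

# LIFO (stack): add and take at the same end
stack = []
stack.append('alice')
stack.append('bob')
stack.append('claire')
last_in = stack.pop()
print(last_in)     # claire
```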

Here's the code example:
> Well, suppose we need to find a *book seller* among our friends.

In [57]:
from collections import deque 

graph = {}

graph['you']    = ['alice', 'bob', 'claire']

graph['alice']  = ['peggy']
graph['bob']    = ['anuj', 'peggy']
graph['claire'] = ['thom', 'jonny']

graph['anuj']   = []
graph['peggy']  = []
graph['thom']   = []
graph['jonny']  = []

![02](./img/example_of_graph_02.jpg)

In [58]:
def person_is_seller(name): # you can specify your own rule :)
    return name[-1] == 'm'  # last letter of the name (e.g. 'thom')


def search(name):

    search_queue = deque()
    search_queue += graph[name]    # start with the direct friends

    searched = []  # people who have already been checked

    while search_queue:

        person = search_queue.popleft()             # FIFO: closest friends come out first

        if person not in searched:                  # skip people already checked (avoids an infinite loop)

            if person_is_seller(person):
                print(person, "is a book seller!")
                return True
            else:
                search_queue += graph[person]       # queue up this person's friends
                searched.append(person)             # mark this person as checked

    return False

In [59]:
search('you')  # 0729-Todo: Add details 

thom is a book seller!


True
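
The `search` function above only answers "is there a path?". A hedged sketch of how the same breadth-first idea can also answer "what is the shortest path?": remember who introduced each person, then walk back from the seller. The function name `shortest_path_to_seller` and the `parents` bookkeeping are additions for illustration, not part of the original notes.

```python
from collections import deque

graph = {
    'you':    ['alice', 'bob', 'claire'],
    'alice':  ['peggy'],
    'bob':    ['anuj', 'peggy'],
    'claire': ['thom', 'jonny'],
    'anuj':   [], 'peggy': [], 'thom': [], 'jonny': [],
}

def person_is_seller(name):
    return name[-1] == 'm'

def shortest_path_to_seller(graph, start, is_seller):
    parents = {start: None}          # who led us to each person
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person != start and is_seller(person):
            path = []
            while person is not None:    # walk back to the start
                path.append(person)
                person = parents[person]
            return path[::-1]
        for friend in graph[person]:
            if friend not in parents:    # not seen yet
                parents[friend] = person
                queue.append(friend)
    return None                      # nobody reachable is a seller

path = shortest_path_to_seller(graph, 'you', person_is_seller)
print(path)   # ['you', 'claire', 'thom']
```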

### Chapter 07 

Let's talk about *Dijkstra's algorithm*.

The *Graph* we talked about before:
- **Only** cares about the ***fewest number of segments***.
- That gives the **shortest** path, but it may not be the **fastest** path.

Let's add some **weights** to the *Graph*'s **Edges**.
> Here, a weight is the '*amount of time*' 🙂.

![03](./img/weight_of_graph.jpg)

Here's the code (and the *graph*)!

![04](./img/weight_of_graph_02.jpg)

In [60]:
# Hey, just a reminder:
#   the 'cost(s)' we mentioned is the 'time we spent' (not money!)

''' The Graph '''

graph = {}

graph['start'] = {}
graph['start']['A'] = 6
graph['start']['B'] = 2 

graph['A'] = {}
graph['A']['end'] = 1 

graph['B'] = {}
graph['B']['A'] = 3 
graph['B']['end'] = 5 

graph['end'] = {}


''' the costs table '''

infinity = float('inf')
costs = {}
costs['A'] = 6
costs['B'] = 2 
costs['end'] = infinity


''' the parents table ''' 

parents = {}
parents['A'] = 'start'
parents['B'] = 'start'
parents['end'] = None 

' The Graph '

' the costs table '

' the parents table '

In [61]:
processed = []

def find_lowest_cost_node(costs):
    lowest_cost      = float('inf')
    lowest_cost_node = None 
    
    for node in costs:
        cost = costs[node]
        
        if cost < lowest_cost and node not in processed:
            lowest_cost      = cost 
            lowest_cost_node = node 
    
    return lowest_cost_node

In [62]:
node = find_lowest_cost_node(costs)

while node is not None:
    
    cost      = costs[node]
    neighbors = graph[node]
    
    for n in neighbors.keys():
        new_cost = cost + neighbors[n]
        
        if costs[n] > new_cost:
            
            costs[n]   = new_cost
            parents[n] = node 
    
    processed.append(node)
    
    node = find_lowest_cost_node(costs)

Let me show it again 🙃

![05](./img/weight_of_graph_02.jpg)

*One* more thing:
- I strongly recommend watching the [viz](http://pythontutor.com/visualize.html#mode=display) of the procedure.
- TODO: *add more details*.

In [63]:
# the lowest total cost found for each trip:
#   start -> A 
#   start -> B 
#   start -> (...) -> end
costs  

{'A': 5, 'B': 2, 'end': 6}
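
The `costs` table says how long the fastest route takes, and the `parents` table says which route that is. A small sketch of reading the route back out, assuming the target is reachable. The helper name `build_path` is made up here, and the `parents` dict below is the state the loop above leaves behind for this graph.

```python
def build_path(parents, target, source='start'):
    """Follow parent links from target back to source, then reverse."""
    path = [target]
    while path[-1] != source:
        path.append(parents[path[-1]])
    return path[::-1]

# parents table after the Dijkstra loop has finished on the graph above
parents = {'A': 'B', 'B': 'start', 'end': 'A'}

route = build_path(parents, 'end')
print(' -> '.join(route))   # start -> B -> A -> end
```

The total for that route is 2 + 3 + 1 = 6, which matches `costs['end']`.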

### Chapter 08

Let's talk about *Greedy Algorithms*.
- And *NP-Complete* problems (no notes for those XD).

Here's a "real world" example:
- You are starting a radio show.
- You want to reach listeners in all 50 <del>(U.S.)</del> states. 

And
- You have to decide ***what stations to play on***. 
- You want to spend less money (***minimize the number of stations***).
- Each station covers a region, and there's **overlap**!

In [64]:
# here's the list of abbreviations:
#   https://simple.wikipedia.org/wiki/List_of_U.S._states

# well, I've changed them to places in China (sorry).

states_needed = set([
    'BeiJing', 'JiangSu',  'ShangHai',  'SiChuan', 
    'NanJing', 'XinJiang', 'ChongQing', 'ShenYang',
])

stations = {}

stations['k_one']   = set(['SiChuan',   'NanJing',  'XinJiang'])
stations['k_two']   = set(['JiangSu',   'SiChuan',  'BeiJing'])
stations['k_three'] = set(['ShangHai',  'NanJing',  'ChongQing'])
stations['k_four']  = set(['NanJing',   'XinJiang']) 
stations['k_five']  = set(['ChongQing', 'ShenYang'])

In [65]:
final_stations = set()


while states_needed:
    
    best_station   = None 
    states_covered = set()
    
    for station, states in stations.items():
        
        covered = states_needed & states  # intersection: needed states this station covers
        
        if len(covered) > len(states_covered):
            
            best_station   = station 
            states_covered = covered
    
    states_needed = states_needed - states_covered 
    final_stations.add(best_station)

In [66]:
final_stations

{'k_five', 'k_one', 'k_three', 'k_two'}
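
For comparison, here is a hedged sketch of the exact (brute-force) approach: try every combination of stations, smallest first. The function name `smallest_cover` is made up for illustration. With 5 stations this is instant, but the number of combinations doubles with every extra station, which is why the greedy approximation above is the practical choice.

```python
from itertools import combinations

states_needed = {
    'BeiJing', 'JiangSu',  'ShangHai',  'SiChuan',
    'NanJing', 'XinJiang', 'ChongQing', 'ShenYang',
}

stations = {
    'k_one':   {'SiChuan',   'NanJing',  'XinJiang'},
    'k_two':   {'JiangSu',   'SiChuan',  'BeiJing'},
    'k_three': {'ShangHai',  'NanJing',  'ChongQing'},
    'k_four':  {'NanJing',   'XinJiang'},
    'k_five':  {'ChongQing', 'ShenYang'},
}

def smallest_cover(states_needed, stations):
    names = list(stations)
    for size in range(1, len(names) + 1):          # smallest sets first
        for combo in combinations(names, size):
            covered = set().union(*(stations[n] for n in combo))
            if states_needed <= covered:           # everything is covered
                return set(combo)
    return None                                    # no combination covers it all

exact = smallest_cover(states_needed, stations)
print(sorted(exact))   # ['k_five', 'k_one', 'k_three', 'k_two']
```

On this small input the greedy answer happens to be as small as the exact one; in general that is not guaranteed.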

### Chapter 09

Let's talk about *Dynamic Programming*.

It's a technique for solving a problem
- by **breaking it up** into several subproblems
- and solving those **subproblems first**.

I plan to study this chapter from another tutorial, so it isn't covered here.
- Here's the tutorial: [Dynamic Programming Made Easy](https://nbviewer.jupyter.org/github/younlonglin/Top-Ten-Algorithms/blob/master/dp_made_easy.ipynb)
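
Just to make the "solve the subproblems first" idea concrete, here is a minimal sketch using the classic 0/1 knapsack problem. The items, weights, and values are hypothetical, picked only for illustration. `best[cap]` holds the answer to the subproblem "what is the most value that fits in a bag of capacity `cap`?", and each new item reuses those smaller answers.

```python
def knapsack(items, capacity):
    """items: list of (weight, value); returns the best total value."""
    best = [0] * (capacity + 1)           # best[cap] = best value for that capacity
    for weight, value in items:
        # go from large to small capacity so each item is used at most once
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]

# hypothetical items: (weight, value)
items = [(1, 1500), (4, 3000), (3, 2000)]

result = knapsack(items, 4)
print(result)   # 3500
```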

### Chapter 10

#### Let's talk about *KNN* and its applications!

Well, KNN means <q>*k*-***nearest*** neighbors</q>.<br>
( <small>k can be any positive integer: 2, 10, or 10000 are all fine</small> )


Two rules before we begin
- The features you choose should ***directly correlate*** to the thing you're trying to recommend.
- Choose features that don't have a bias.
    - If you ask the users to rate only comedy movies,
    - the ratings don't (and cannot) tell you whether they like sci-fi movies.
    
Three terms to be explained
1. KNN is used for *<u>classification</u>* and *<u>regression</u>* and involves *<u>looking at the k-nearest neighbors</u>*.
    1. Classification == categorization into a group 
    2. Regression == predicting a response (like a number)
2. *<u>Feature extraction</u>* means ***converting an item*** into a list of ***numbers that can be compared***.
3. Picking good features is a crucial part of KNN.

[**knn-example**]<br>
Suppose we're identifying a thing (e.g. a fruit)

![ ](./img/knn_fruit_01.jpg)

If it's just a single pic, how do we know what it is?
- Well, our brain can *guess*,
- and it can also compare the pic with other fruits that we've seen before.

Here's an example

![ ](./img/knn_fruit_02_colrow.jpg)

In normal cases,
- the options for *what it is* are easy to narrow down.
- Here we suppose it's either an orange or a grapefruit.

From the pic we can tell
- Among the nearest neighbors, there are ***more*** *oranges* than *grapefruits*.
- So (hmm), it's ***probably*** an *orange* rather than a *grapefruit*.
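
The "count the neighbors" step can be sketched like this. The feature values (size, redness) and the sample fruits are hypothetical, since the real ones only appear in the picture. `math.dist` needs Python 3.8+.

```python
from collections import Counter
from math import dist

def knn_classify(samples, query, k=3):
    """samples: list of (features, label); returns the majority label of the k nearest."""
    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical (size, redness) features
samples = [
    ((3.0, 4.0), 'orange'),
    ((3.5, 3.5), 'orange'),
    ((4.0, 4.5), 'orange'),
    ((7.0, 8.0), 'grapefruit'),
    ((7.5, 7.0), 'grapefruit'),
]

mystery_fruit = (4.5, 5.0)
guess = knn_classify(samples, mystery_fruit, k=3)
print(guess)   # orange
```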

[**knn-example**]<br>
Take a real-world example 

How about a *movie recommendation* system :)

0x03 (reverse)
- A, B, C may all have similar taste in some movies.
- So, if A watched a movie (***and he likes it***), we recommend it to B (or C).

0x02
- From *0x03* we only know that they are *similar*, but not ***how similar they are***.
- Next, we extract the features of the *thing*.

![ ](./img/knn_fruit_03_feature01.jpg)

Then plot it 

![ ](./img/knn_fruit_03_feature02.jpg)

It's easy to **see** that <q> A and B are *quite similar* </q>
- Or you may want to calculate it. 
- We already know the coordinates of A, B, and C on each axis. 
    - So it's not hard to calculate the distance between any two of them. (omitted)

0x01
- From the previous example, <br>we found that we could **apply** this method **to movie recommendations** !!

First we've got the **ratings**

![ ](./img/knn_movie_04_step01.jpg)

Let's recall the previous example

![ ](./img/knn_movie_04_step02.jpg)

Then we calculate it

![ ](./img/knn_movie_04_step03.jpg)

Because there are *five* numbers (sort-of-**dimensions**), we can't plot them directly
- Even so, we're still able to calculate the distance
- For the calculation, we assume
    - a<small>1</small> is PRIYANKA -- (user1)
    - a<small>2</small> is JUSTIN   -- (user2)
    
The result tells us **how similar they are** (P and J): the smaller the distance, the more similar
- Now we can say: <q> PRIYANKA and JUSTIN do have **similar taste**</q>. 
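
A sketch of that calculation with five ratings per user. The rating values below are hypothetical (the real ones live in the pictures above), and `morpheus` is an extra made-up user added for contrast. The distance is the usual Euclidean one: square the differences, add them up, take the square root.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Distance between two equal-length lists of numbers."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# hypothetical ratings for five movie categories
priyanka = [3, 4, 4, 1, 4]
justin   = [4, 3, 5, 1, 5]
morpheus = [2, 5, 1, 3, 1]

d_justin   = euclidean_distance(priyanka, justin)
d_morpheus = euclidean_distance(priyanka, morpheus)

print(d_justin)                # 2.0
print(d_justin < d_morpheus)   # True: Justin's taste is closer to Priyanka's
```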

( The movie recommendation system ends here )

[**knn-example**]<br>
Let's talk about *regression*

Suppose you run a bakery
- You *make fresh bread* every day.
- You're trying to **predict how many loaves to make** for today.

You've got a set of features

| Feature | Value | 
| :-- | :-- | 
| Weather | 1 ~ 5 (bad -> great) | 
| Weekend / Holiday | 1 (yes), 0 (otherwise) | 
| Is-there-a-game-on | 1 (yes), 0 (no) |

And these are your stats from the past

![ ](./img/knn_bread_05_feature.jpg)

Today's status
- *Weekend* / *Good Weather* / *No-Game*

Let's use KNN again
- The days with the most *similar status* are your neighbors. (You've got A, B, D, E, which sold <small>300, 225, 200, 150</small> loaves)
- The <u>average of the loaves</u> sold *on those days* is **218.75**. 

Then 
- you should probably make 219 loaves for today, at least.
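
Here is a hedged sketch of that regression. The loaves for days A, B, D, E come from the notes above; the feature vectors (weather, weekend/holiday, game) and the numbers for days C and F are assumptions, because the real table is only in the picture. `math.dist` needs Python 3.8+.

```python
from math import dist

# (weather, weekend_or_holiday, game_on) -> loaves sold
# feature values, and days C and F entirely, are assumed for illustration
history = {
    'A': ((5, 1, 0), 300),
    'B': ((3, 1, 1), 225),
    'C': ((1, 1, 0), 75),
    'D': ((4, 0, 1), 200),
    'E': ((4, 0, 0), 150),
    'F': ((2, 0, 0), 50),
}

def predict_loaves(history, today, k=4):
    """Average the loaves sold on the k most similar past days."""
    nearest = sorted(history.values(), key=lambda day: dist(day[0], today))[:k]
    return sum(loaves for _, loaves in nearest) / k

today = (4, 1, 0)   # good weather, weekend, no game
prediction = predict_loaves(history, today)
print(prediction)   # 218.75
```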

Of course, it's not that simple.
- You might want to make *more loaves* than the prediction says.
- You might want to make *more loaves* <u>specifically on Mondays</u>.
- And there are still more things to consider; choose for yourself :)

#### Intro to Machine Learning (sort of)

OCR works (roughly) like this
1. Take a lot of images
2. Extract their features 
3. Get a new image -> extract its features 
4. Find the old images with the most similar features (nearest neighbors!)

For numbers, 

![ ](./img/knn_ocrnum_05_feature.jpg)

The first step of OCR, 
- i.e. <q>*go through images of numbers and extract features*</q>,
- is called ***training***.
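
A toy sketch of those steps, very far from real OCR. The 3x3 "images", the function names, and the choice of raw pixels as features are all made up for illustration. Training just extracts and stores features; recognizing finds the stored example nearest to the new image (k = 1).

```python
from math import dist

def extract_features(image):
    # flatten the rows of pixels into one list of numbers
    return [pixel for row in image for pixel in row]

def train(labelled_images):
    # "training" here = extract features from every known image and keep them
    return [(extract_features(image), label) for image, label in labelled_images]

def recognize(model, image):
    features = extract_features(image)
    nearest = min(model, key=lambda entry: dist(entry[0], features))
    return nearest[1]

# hypothetical 3x3 bitmaps (1 = ink, 0 = blank)
ONE   = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
SEVEN = [[1, 1, 1],
         [0, 0, 1],
         [0, 0, 1]]

model = train([(ONE, '1'), (SEVEN, '7')])

smudged_one = [[0, 1, 0],
               [0, 1, 0],
               [0, 1, 1]]   # one stray pixel
digit = recognize(model, smudged_one)
print(digit)   # 1
```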

This gets a lot more complicated in practice.
- But it's important to understand <br>that ***even complex techs build on simple ideas, like KNN*** .
- The ideas could also be used in *face recognition* or *speech recognition* etc.

***The End***