# Chapter 6 - Questions

### 1. Using the hash table performance formulas given in the chapter, compute the average number of comparisons necessary when the table is

* ### 10% full

* ### 25% full

* ### 50% full

* ### 75% full

* ### 90% full

* ### 99% full

### At what point do you think the hash table is too small? Explain.

Let's first define some helper functions that implement the given formulas.

In [1]:
def successful_search(lf):
    return (1 / 2) * (1 + 1 / (1 - lf))

def unsuccessful_search(lf):
    return (1 / 2) * (1 + ((1 / (1 - lf)) ** 2))

def successful_search_chaining(lf):
    # With chaining, a successful search needs 1 + lf / 2 comparisons on average.
    return 1 + (lf / 2)

def unsuccessful_search_chaining(lf):
    return lf

Let's now compute the average number of comparisons.

In [2]:
for lf in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    print(f"Successful (no chaining) for load factor {lf}: {successful_search(lf)}")

print("\n")

for lf in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    print(f"Unsuccessful (no chaining) for load factor {lf}: {unsuccessful_search(lf)}")

print("\n")

for lf in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    print(f"Successful (with chaining) for load factor {lf}: {successful_search_chaining(lf)}")

print("\n")

for lf in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    print(f"Unsuccessful (with chaining) for load factor {lf}: {unsuccessful_search_chaining(lf)}")


Successful (no chaining) for load factor 0.1: 1.0555555555555556
Successful (no chaining) for load factor 0.25: 1.1666666666666665
Successful (no chaining) for load factor 0.5: 1.5
Successful (no chaining) for load factor 0.75: 2.5
Successful (no chaining) for load factor 0.9: 5.500000000000001
Successful (no chaining) for load factor 0.99: 50.49999999999996


Unsuccessful (no chaining) for load factor 0.1: 1.117283950617284
Unsuccessful (no chaining) for load factor 0.25: 1.3888888888888888
Unsuccessful (no chaining) for load factor 0.5: 2.5
Unsuccessful (no chaining) for load factor 0.75: 8.5
Unsuccessful (no chaining) for load factor 0.9: 50.50000000000002
Unsuccessful (no chaining) for load factor 0.99: 5000.499999999992


Successful (with chaining) for load factor 0.1: 1.05
Successful (with chaining) for load factor 0.25: 1.125
Successful (with chaining) for load factor 0.5: 1.25
Successful (with chaining) for load factor 0.75: 1.375
Successful (with chaining) for load factor 0.9: 1.45
Successful (with chaining) for load factor 0.99: 1.495


Unsuccessful (with chaining) for load factor 0.1: 0.1
Unsuccessful (with chaining) for load factor 0.25: 0.25
Unsuccessful (with chaining) for load factor 0.5: 0.5
Unsuccessful (with chaining) for load factor 0.75: 0.75
Unsuccessful (with chaining) for load factor 0.9: 0.9
Unsuccessful (with chaining) for load factor 0.99: 0.99

Without chaining, performance degrades quickly once the load factor goes beyond 0.5 (the table is half full): an unsuccessful search needs 2.5 comparisons on average at 0.5, 8.5 at 0.75, and about 5000 at 0.99. I would therefore consider the table too small once it is more than about half full.
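
As a worked check of the two open-addressing (no chaining) formulas defined above, here is the arithmetic for a load factor of $\lambda = 0.75$. The symbol $\lambda$ is just shorthand for the `lf` argument.

```latex
\begin{align*}
\text{successful:}   \quad & \frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)
  = \frac{1}{2}\left(1 + \frac{1}{0.25}\right) = \frac{1}{2}(1 + 4) = 2.5 \\
\text{unsuccessful:} \quad & \frac{1}{2}\left(1 + \left(\frac{1}{1 - \lambda}\right)^{2}\right)
  = \frac{1}{2}\left(1 + 4^{2}\right) = \frac{1}{2}(1 + 16) = 8.5
\end{align*}
```

The squared term is what makes unsuccessful searches blow up so much faster as the table fills.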

### 2. Modify the hash function for strings to use positional weightings.

We just need to multiply each character's ordinal value by its position within the string, and then still apply the modulo operation.

In [3]:
def hash_str(a_string, table_size):
    return sum([ord(c) for c in a_string]) % table_size

def hash_str_pos_w(a_string, table_size):
    return sum([ord(c) * i for i, c in enumerate(a_string)]) % table_size
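
As a quick check of why positional weights matter, the sketch below hashes two anagrams with both functions. It repeats the two definitions so that it runs on its own; the example words and the table size of 11 are arbitrary choices for illustration.

```python
def hash_str(a_string, table_size):
    return sum(ord(c) for c in a_string) % table_size

def hash_str_pos_w(a_string, table_size):
    return sum(ord(c) * i for i, c in enumerate(a_string)) % table_size

# Anagrams contain the same characters, so the plain sum cannot tell them apart.
plain = [hash_str(word, 11) for word in ("listen", "silent")]
# With positional weights the order of the characters changes the sum.
weighted = [hash_str_pos_w(word, 11) for word in ("listen", "silent")]

print(plain)     # [6, 6]
print(weighted)  # [9, 5]
```

Different slots for this pair do not mean the weighted function is collision-free in general; it only stops being blind to character order.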


### 3. We used a hash function for strings that weighted the characters by position. Devise an alternative weighting scheme. What are the biases that exist with these functions?

We might weight each character by its position in the alphabet (so the weight would be `w = ord(ch) - ord("a")`), but since that weight depends only on the character itself, anagrams would still hash to the same slot. An improvement to the positional weighting is to start with `i = 1` rather than `i = 0`: with `i = 0`, strings such as `"what"` and `"that"` collide, because the first character is multiplied by zero and contributes nothing.

In [4]:
def hash_str_pos_w_better(a_string, table_size):
    return sum([ord(c) * (i + 1) for i, c in enumerate(a_string)]) % table_size
    

Even though both the letters and their positions now matter, we can still find combinations of words that collide. With all of these approaches, another common source of collisions is a hash table that is too small, so choosing an appropriate size also helps to reduce collisions.
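
The sketch below illustrates these biases with concrete strings. The function names `hash_zero_based` and `hash_one_based` are hypothetical stand-ins for the two positional variants discussed here, and the table size of 11 is an arbitrary choice.

```python
def hash_zero_based(a_string, table_size):
    # Weights start at 0, so the first character is ignored.
    return sum(ord(c) * i for i, c in enumerate(a_string)) % table_size

def hash_one_based(a_string, table_size):
    # Weights start at 1, so every character contributes.
    return sum(ord(c) * (i + 1) for i, c in enumerate(a_string)) % table_size

size = 11  # arbitrary table size chosen for the example

# Words differing only in their first letter collide with zero-based weights...
print(hash_zero_based("what", size), hash_zero_based("that", size))  # 8 8
# ...but not with one-based weights.
print(hash_one_based("what", size), hash_one_based("that", size))    # 4 1
# One-based weights still collide: 97*1 + 98*2 == 99*1 + 97*2 == 293.
print(hash_one_based("ab", size), hash_one_based("ca", size))        # 7 7
```

The last pair has the same weighted sum before the modulo is applied, so it collides for every table size.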

### 4. Research perfect hash functions. Using a list of names (classmates, family members, etc.), generate the hash values using the perfect hash algorithm.

A perfect hash function is a hash function that maps the distinct elements of a set to a set of `m` distinct integers, so that no collisions occur. In Python, [Perfect Hash](https://github.com/ilanschnell/perfect-hash) is an example of a perfect hash function generator: for a given set of words, it generates a hash function that maps each word to a distinct integer. You can install it with pip and try it yourself.
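
To make the idea concrete without relying on any library, here is a naive brute-force sketch. It is not the algorithm used by the linked project. The helper `find_perfect_hash`, the sample names, and the prime modulus are all assumptions made for illustration: the function simply tries multipliers for a polynomial string hash until every key lands in its own slot.

```python
def find_perfect_hash(keys, max_multiplier=10_000):
    """Brute-force a multiplier that maps every key to its own slot."""
    size = len(keys)
    for multiplier in range(1, max_multiplier):
        def candidate(key, multiplier=multiplier):
            value = 0
            for ch in key:
                # Polynomial string hash, kept small with a large prime modulus.
                value = (value * multiplier + ord(ch)) % 1_000_003
            return value % size

        # Perfect for this key set if all slots are distinct.
        if len({candidate(key) for key in keys}) == size:
            return candidate
    raise ValueError("no collision-free multiplier found")

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]  # made-up sample names
perfect = find_perfect_hash(names)
for name in names:
    print(name, perfect(name))
```

The resulting function is only perfect for this fixed set of keys; adding a new name may require searching for a new multiplier.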

#### Disclaimer

The code I linked above is not mine, so all credit goes to the original creator, i.e. the repository owner.

### 5. Generate a random list of integers. Show how this list is sorted by the following algorithms:

* ### bubble sort

* ### selection sort

* ### insertion sort

* ### shell sort (you decide on the increments)

* ### merge sort

* ### quick sort (you decide on the pivot value)

First let's write the different sorting algorithms.

In [171]:
import random

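def bubble_sort(a, debug=False):
    if debug:
        print(a)

    # After each pass the largest remaining element has bubbled to the end.
    for end in range(len(a) - 1, 0, -1):
        for j in range(end):
            if debug:
                print(a)
            # Compare adjacent elements and swap them if they are out of order.
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]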

def selection_sort(a, debug=False):
    if debug:
        print(a)
    
    for i, _ in enumerate(a):
        min_idx = i
        for j in range(i, len(a)):
            if debug:
                print(a)
            if a[j] < a[min_idx]:
                min_idx = j 
        if min_idx != i:
            a[min_idx], a[i] = a[i], a[min_idx]

def insertion_sort(a, debug=False):
    if debug:
        print(a)

    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            if debug:
                print(a)
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1

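def shell_sort(a, increment=None, debug=False):
    if debug:
        print(a)

    # Start from the given increment (at least 1), or from half the list length.
    gap = max(1, increment) if increment is not None else len(a) // 2
    while gap > 0:
        # Gapped insertion sort: this covers every sublist with stride `gap`.
        for i in range(gap, len(a)):
            if debug:
                print(a)
            j = i
            while j >= gap and a[j] < a[j - gap]:
                a[j], a[j - gap] = a[j - gap], a[j]
                j -= gap
        # Halve the gap; the final pass with gap 1 is a plain insertion sort.
        gap //= 2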

def merge_sort(a, debug=False):
    if len(a) <= 1:  # also covers the empty list, which would otherwise recurse forever
        return a
    else:
        mid = len(a) // 2
        return merge(merge_sort(a[:mid], debug), merge_sort(a[mid:], debug), debug)

def merge(a1, a2, debug=False):
    p1 = 0
    p2 = 0
    new_a = []

    while p1 < len(a1) and p2 < len(a2):
        if a1[p1] <= a2[p2]:
            new_a += [a1[p1]]
            p1 += 1
        else:
            new_a += [a2[p2]]
            p2 += 1

    while p1 < len(a1):
        new_a += [a1[p1]]
        p1 += 1

    while p2 < len(a2):
        new_a += [a2[p2]]
        p2 += 1
    
    if debug:
        print(f"Merging {a1} and {a2} into {new_a}")

    return new_a

def quick_sort(a, low=None, high=None, pivot=None, debug=False):
    if low is None:
        low = 0 
    if high is None:
        high = len(a) - 1

    if low < high:
        separator = partition(a, low, high, pivot, debug)
        quick_sort(a, low, separator - 1, pivot, debug) 
        quick_sort(a, separator + 1, high, pivot, debug)

def partition(a, low, high, pivot=None, debug=False):
    if pivot == "middle":
        pivot = (high + low) // 2
    else:
        pivot = random.randint(low, high)

    if debug:
        print(f"Partitioning {a[low:high + 1]}")
        print(f"Pivoting on {a[pivot]}")
        
    la = [n for n in a[low:high + 1] if n < a[pivot]]
    ma = [n for n in a[low:high + 1] if n == a[pivot]]
    ha = [n for n in a[low:high + 1] if n > a[pivot]]

    new_a = la + ma + ha
    a[low:high + 1] = new_a[:]

    if debug:
        print(f"After pivot: {a}")
        print(f"New split point is {low + len(la)}")
    
    return low + len(la)

Now we can see how each of these algorithms sorts a list. Let's first generate a random list of five distinct elements whose values range from 0 to 99. Then we can run the algorithms, in order, and see each step (by passing the argument `debug=True`).

In [163]:
a = random.sample(range(0, 100), 5)

Now we can either run the algorithms here (but the output will be collapsed) or run the script from the command line (the algorithms are saved in [this file](./sorting_algorithms.py)) and redirect the output to the file [steps_q5.txt](./steps_q5.txt).

In [166]:
! python sorting_algorithms.py >> steps_q5.txt

### 6. Consider the following list of integers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Show how this list is sorted by the following algorithms:

* ### bubble sort

* ### selection sort

* ### insertion sort

* ### shell sort (you decide on the increments)

* ### merge sort

* ### quick sort (you decide on the pivot value)

Let's reuse the previous strategy, by saving the output in the file [steps_q6.txt](./steps_q6.txt).

In [167]:
! python sorting_algorithms.py >> steps_q6.txt

The list is already sorted, and the algorithms take more or fewer steps to "figure it out". Insertion Sort is the quickest: its inner loop never runs, so it makes a single pass over the list.
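
To put numbers on this, the following sketch counts the comparisons and swaps insertion sort performs. `insertion_sort_counting` is a hypothetical helper written for this note (it is not part of `sorting_algorithms.py`), and it works on a copy so the input is left untouched.

```python
def insertion_sort_counting(a):
    """Return (sorted copy, comparisons, swaps) for insertion sort."""
    a = list(a)
    comparisons = swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j] >= a[j - 1]:
                break  # element is already in place
            a[j], a[j - 1] = a[j - 1], a[j]
            swaps += 1
            j -= 1
    return a, comparisons, swaps

print(insertion_sort_counting(range(1, 11))[1:])      # (9, 0)
print(insertion_sort_counting(range(10, 0, -1))[1:])  # (45, 45)
```

On the sorted list each element needs a single comparison, while the reversed list forces every element to travel all the way to the front.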

### 7. Consider the following list of integers: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]. Show how this list is sorted by the following algorithms:

* ### bubble sort

* ### selection sort

* ### insertion sort

* ### shell sort (you decide on the increments)

* ### merge sort

* ### quick sort (you decide on the pivot value)

Let's again reuse the previous strategy, by saving the output in the file [steps_q7.txt](./steps_q7.txt).

In [168]:
! python sorting_algorithms.py >> steps_q7.txt

It's interesting to notice that, for a given input size, Merge Sort always performs the same splits and merges, regardless of the initial order of the elements.
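
A small sketch can confirm this by counting merge operations for different orderings of ten elements. `merge_sort_count` is a hypothetical variant written for this note; it counts merges only, not the individual comparisons inside each merge, which can still vary with the input.

```python
import random

def merge_sort_count(a):
    """Return a sorted copy of `a` and the number of merge operations."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, left_merges = merge_sort_count(a[:mid])
    right, right_merges = merge_sort_count(a[mid:])

    # Standard two-pointer merge of the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_merges + right_merges + 1

for data in (list(range(1, 11)), list(range(10, 0, -1)), random.sample(range(100), 10)):
    _, merges = merge_sort_count(data)
    print(data, "->", merges, "merges")
```

For a list of `n` elements the count is always `n - 1`, because the way the list is split depends only on its length.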

### 8. Consider the list of characters: ["P", "Y", "T", "H", "O", "N"]. Show how this list is sorted using the following algorithms:

* ### bubble sort

* ### selection sort

* ### insertion sort

* ### shell sort (you decide on the increments)

* ### merge sort

* ### quick sort (you decide on the pivot value)

Let's run the algorithms, and save the output in [steps_q8.txt](./steps_q8.txt).

In [169]:
! python sorting_algorithms.py >> steps_q8.txt

### 9. Devise alternative strategies for choosing the pivot value in quick sort. For example, pick the middle item. Re-implement the algorithm and then execute it on random data sets. Under what criteria does your new strategy perform better or worse than the strategy from this chapter?

We can call `quick_sort` with an explicit pivot strategy, for example `pivot="middle"`; any other value falls back to a random pivot.

In [173]:
! python sorting_algorithms.py >> steps_q9.txt

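The middle (or random) pivot only improves things when the list is already sorted, or nearly so: in this case picking the first element as the pivot is very inefficient, because every partition is completely unbalanced and the running time degrades to O(n^2). On random data the strategies perform about the same on average.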
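
The comparison can be sketched with a simplified, copy-based quick sort that counts how many elements each partition step examines. `quick_sort_count` and the `strategies` dictionary are hypothetical names introduced here, and the list size of 100 is arbitrary; this is a rough cost model, not a benchmark of the functions above.

```python
import random

def quick_sort_count(a, choose_pivot):
    """Return a sorted copy of `a` and the number of elements examined."""
    if len(a) <= 1:
        return list(a), 0
    pivot = a[choose_pivot(len(a))]
    lower = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    higher = [x for x in a if x > pivot]
    sorted_lower, cost_lower = quick_sort_count(lower, choose_pivot)
    sorted_higher, cost_higher = quick_sort_count(higher, choose_pivot)
    # Each partition step examines every element of the current sublist once.
    return sorted_lower + equal + sorted_higher, len(a) + cost_lower + cost_higher

# Each strategy maps the sublist length to the index of the pivot.
strategies = {
    "first": lambda n: 0,
    "middle": lambda n: n // 2,
    "random": lambda n: random.randrange(n),
}

already_sorted = list(range(100))
shuffled = random.sample(already_sorted, len(already_sorted))

for name, choose in strategies.items():
    _, cost_sorted = quick_sort_count(already_sorted, choose)
    _, cost_shuffled = quick_sort_count(shuffled, choose)
    print(f"{name:>6}: sorted input {cost_sorted}, shuffled input {cost_shuffled}")
```

With the first element as pivot, the sorted input costs 100 + 99 + ... + 2 = 5049 examinations, while the middle pivot splits the same list evenly and stays far below that. On the shuffled input the three strategies should land in the same range.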