# ADS 2023 spring Week 6 Exercises
Exercises for Algorithms and Data Structures at ITU. The exercises are from Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne unless otherwise specified. Color-coding of difficulty level and alterations to the exercises (if any) are made by the teachers of the ADS course at ITU.

**<span style="background: LimeGreen">3.1.10 - Green</span>**  Give a trace of the process of inserting the keys E A S Y Q U E S T I O N into an initially empty table using SequentialSearchST. How many compares are involved?  

*Solution*:  
E: 0 compares  
A: 1 compare  
S: 2 compares  
Y: 3 compares  
Q: 4 compares  
U: 5 compares  
E: 1 compare  
S: 3 compares  
T: 6 compares  
I: 7 compares  
O: 8 compares  
N: 9 compares  
  
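In total, 49 compares. This count assumes new keys are appended at the end of the list. The book's SequentialSearchST inserts new nodes at the front, in which case the second E costs 6 compares and the second S costs 4, for 55 compares in total.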
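
The trace can be reproduced with a short counting sketch. The function `count_compares` and its `insert_at_front` flag are illustrative names, not part of the book's code. The flag switches between appending new keys at the end of the list, which is what the trace above assumes, and inserting them at the front, as the book's SequentialSearchST does.

```python
def count_compares(keys, insert_at_front):
    """Return the number of key compares each put() makes on an unordered list."""
    items = []            # keys in list order, first node at index 0
    compares = []
    for key in keys:
        if key in items:
            # Search hit: one compare per node up to and including the match.
            compares.append(items.index(key) + 1)
        else:
            # Search miss: the key is compared against every node.
            compares.append(len(items))
            if insert_at_front:
                items.insert(0, key)
            else:
                items.append(key)
    return compares

keys = "E A S Y Q U E S T I O N".split()
tail = count_compares(keys, insert_at_front=False)
head = count_compares(keys, insert_at_front=True)
print(tail, sum(tail))  # appending at the end: total 49
print(head, sum(head))  # inserting at the front: total 55
```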

**<span style="background: LimeGreen">3.1.11 - Green</span>**  Give a trace of the process of inserting the keys E A S Y Q U E S T I O N into an initially empty table using BinarySearchST. How many compares are involved?

*Solution*:  
E: 0 compares  
A: 1 compare  
S: 1 compare  
Y: 2 compares  
Q: 2 compares  
U: 3 compares  
E: 2 compares  
S: 1 compare  
T: 3 compares  
I: 3 compares  
O: 3 compares  
N: 3 compares  
  
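In total, 24 compares. This trace picks the upper middle element when the search range has even length. The book's rank() computes `mid = lo + (hi - lo) / 2`, which rounds down and gives 0, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4 compares, 29 in total. Neither count includes the extra equality test that put() performs after rank() returns.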
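
A small sketch for checking such a trace by machine. The names `rank_compares` and `trace` and the `round_up` flag are assumptions made for illustration: `True` takes the upper middle element of an even-length range, which reproduces the counts in the trace above, while `False` rounds down like `lo + (hi - lo) / 2` in the book's rank(). Only the three-way compares inside the binary search are counted.

```python
def rank_compares(sorted_keys, key, round_up):
    """Binary search for key; return (rank, number of three-way compares)."""
    lo, hi = 0, len(sorted_keys) - 1
    compares = 0
    while lo <= hi:
        # round_up picks the upper middle of an even-length range
        mid = (lo + hi + 1) // 2 if round_up else (lo + hi) // 2
        compares += 1
        if key < sorted_keys[mid]:
            hi = mid - 1
        elif key > sorted_keys[mid]:
            lo = mid + 1
        else:
            return mid, compares
    return lo, compares   # miss: lo is the number of keys smaller than key

def trace(keys, round_up):
    table, counts = [], []
    for key in keys:
        i, c = rank_compares(table, key, round_up)
        counts.append(c)
        if i == len(table) or table[i] != key:
            table.insert(i, key)  # new key: shift larger keys one slot right
    return counts

keys = "E A S Y Q U E S T I O N".split()
up, down = trace(keys, round_up=True), trace(keys, round_up=False)
print(up, sum(up))      # total 24
print(down, sum(down))  # total 29
```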

**<span style="background: LimeGreen">3.4.1 - Green</span>**  Insert the keys E A S Y Q U T I O N in that order into an initially empty table of M = 5 lists, using separate chaining. Use the  hash function $11k \% M$ to transform the *k*th letter of the alphabet into a table index. 

*Solution*:  

```python
class Node:
    def __init__(self, key, value) -> None:
        self.next = None
        self.key = key
        self.value = value

    def __str__(self) -> str:
        return f"<{self.key}:{self.value}> → {self.next}"

    # Special methods are looked up on the class, so alias __repr__ here.
    __repr__ = __str__

class sepchan:
    def __init__(self, M: int, hash) -> None:
        self.bin = [None] * M  # one chain per table index
        self.hash = hash

    def insert(self, key, value):
        index = self.hash(key)
        item = self.bin[index]

        if item is None:
            self.bin[index] = Node(key, value)
            return

        # Walk the chain; update in place if the key is already present.
        while True:
            if item.key == key:
                item.value = value
                return
            if item.next is None:
                break
            item = item.next

        item.next = Node(key, value)  # append a new key at the end of the chain

    def get(self, key):
        item = self.bin[self.hash(key)]
        while item is not None:
            if item.key == key:
                return item.value
            item = item.next
        return None  # key not in the table

    def __str__(self) -> str:
        return "{\n" + "\n".join(f"  Bin {i}: {x}" for i, x in enumerate(self.bin)) + "\n}"
```

```python
chan = sepchan(5, lambda x: 11*(ord(x)-64) % 5)
for key in "EASYQUTION":
    chan.insert(key, key)
print(chan)
```

```
{
  Bin 0: <E:E> → <Y:Y> → <T:T> → <O:O> → None
  Bin 1: <A:A> → <U:U> → None
  Bin 2: <Q:Q> → None
  Bin 3: None
  Bin 4: <S:S> → <I:I> → <N:N> → None
}
```


**<span style="background: LimeGreen">3.4.10 - Green</span>**  Insert the keys E A S Y Q U T I O N in that order into an initially empty table of size M = 16 using linear probing. Use the hash function $11k \% M$ to transform the *k*th letter of the alphabet into a table index. Then redo this exercise for M = 10.

*Solution*:  

```python
class linprobe:
    def __init__(self, M: int, hash) -> None:
        self.M = M
        self.bin = [None] * M  # fixed-size table, no resizing
        self.hash = hash

    def insert(self, key, value):
        index = self.hash(key)
        # Probe at most M slots, wrapping around at the end of the table.
        for _ in range(self.M):
            if self.bin[index] is None or self.bin[index][0] == key:
                self.bin[index] = (key, value)
                return
            index = (index + 1) % self.M
        raise RuntimeError("table is full")

    def get(self, key):
        index = self.hash(key)
        for _ in range(self.M):
            if self.bin[index] is None:
                return None  # reached an empty slot: key is absent
            if self.bin[index][0] == key:
                return self.bin[index][1]
            index = (index + 1) % self.M
        return None  # full table scanned without finding the key

    def __str__(self) -> str:
        return str(self.bin)
```

```python
for M in (16, 10):
    def hsh(x, M=M):
        return 11*(ord(x)-64) % M
    lin = linprobe(M, hsh)
    for key in "EASYQUTION":
        lin.insert(key, hsh(key))  # store the home index as the value
    print(f"M = {M}:", lin)
```

```
M = 16: [None, ('S', 1), None, ('Y', 3), ('I', 3), ('O', 5), None, ('E', 7), ('U', 7), None, ('N', 10), ('A', 11), ('Q', 11), ('T', 12), None, None]
M = 10: [('T', 0), ('A', 1), ('U', 1), ('I', 9), ('N', 4), ('E', 5), ('Y', 5), ('Q', 7), ('O', 5), ('S', 9)]
```

Each entry shows the key and its home index. For M = 10 the table ends up completely full, and I wraps around from index 9 to the first free slot at index 3.


**<span style="background: LimeGreen">3.1.13 - Green</span>**  Would you use a sequential search ST or a binary search ST for an application that does $10^3$ put() operations and $10^6$ get() operations, randomly intermixed? Justify your answer.  

*Solution*:  
Binary search. A sequential search ST takes about $n$ compares for both search and insert, while a binary search ST takes about $\log_2n$ compares for search and up to $n$ array moves for insert. With $10^6$ get() operations dominating the workload, the logarithmic search makes binary search clearly better.  

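**<span style="background: LimeGreen">3.1.14 - Green</span>**  Would you use a sequential search ST or a binary search ST for an application that does $10^6$ put() operations and $10^3$ get() operations, randomly intermixed? Justify your answer.  

*Solution*:  
Binary search again. Insert is linear in both implementations (sequential search has to scan the whole list to check whether the key is already present, binary search has to shift array entries), so sequential search gains nothing on the $10^6$ put() operations, while the $10^3$ get() operations are still logarithmic rather than linear.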

**<span style="background: Yellow">3.4.3 - Yellow</span>**  Design an algorithm to find values of a and M, with M as small as possible, such that the hash function (a * k) % M for transforming the kth letter of the alphabet into a table index produces distinct values (no collisions) for the keys S E A R C H X M P L. The result is known as a perfect hash function.

*Solution*:  

```python
def findM(keys):
    # k-th letter of the alphabet: A -> 1, ..., Z -> 26
    ks = [ord(key) - 64 for key in keys]
    M = len(keys)  # fewer slots than keys can never be collision-free
    while True:
        # a and a + M give the same indices, so a = 1..M-1 covers every case
        for a in range(1, M):
            if len({(a * k) % M for k in ks}) == len(ks):
                return a, M
        M += 1
    # The search terminates: a = 1 with M = 25 always works,
    # because the largest key value is 24 ("X").

keys = ["S", "E", "A", "R", "C", "H", "X", "M", "P", "L"]
print(findM(keys))
```

```
(1, 20)
```

No M below 20 can work for any a: for each M from 10 to 19, two of the keys differ by exactly M (for example C = 3 and M = 13 for M = 10), so they collide whatever a is.

**<span style="background: Yellow">3.4.5 - Yellow</span>**  Is the following implementation of getting the hash code of an object legal? If so, describe the effect of using it. If not, explain why.
```python
# Python

def __hash__(self):
    return 17
```

*Solution*:  
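Yes, it is legal: equal objects get equal hash codes, which is all the contract requires. The effect is that every key collides. With separate chaining all keys end up in a single list, and with linear probing they form a single cluster, so search and insert take linear time instead of constant time.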
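
A minimal sketch of the effect in Python. The class `Key` and its `eq_calls` counter are hypothetical names introduced for this illustration, and the quadratic growth relies on how CPython's `dict` resolves collisions (it compares the new key against every stored key that has the same hash).

```python
class Key:
    eq_calls = 0                 # class-wide counter of __eq__ invocations

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 17                # legal: equal keys still get equal hashes

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name

n = 200
table = {Key(i): i for i in range(n)}
print(len(table), table[Key(7)])  # the dict still stores and finds every key
print(Key.eq_calls)               # grows roughly like n*n/2 rather than n
```

The table stays correct, but each insert has to be compared against all keys already stored, which is the linear-time behaviour described above.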

**<span style="background: Yellow">3.4.13 - Yellow</span>**  Which of the following scenarios leads to expected linear running time for a random search hit in a linear-probing hash table?  
a) All keys hash to the same index.  
b) All keys hash to different indices.  
c) All keys hash to an even-numbered index.  
d) All keys hash to different even-numbered indices.  

*Solution*:  
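a). All keys form a single cluster, so a search hit for a random key probes about half of the $N$ keys on average. In b) and d) every key sits at its own home index, so a search hit takes one probe, and c) alone does not force the keys into one long cluster.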

**<span style="background: Yellow">3.4.15 - Yellow</span>**  How many compares could it take, in the worst case, to insert N keys into an initially empty table of size N, using linear probing with array resizing?

*Solution*:  
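$1+2+\dots+n=\frac{1}{2}n(n+1)$ if every key hashes to the same index, since the $i$th insert then probes $i$ slots.  
When we also account for resizing, it is $\frac{1}{2}n(n+1)+\frac{1}{4}n\cdot(\frac{1}{2}n+1)$. The extra term is $1+2+\dots+\frac{n}{2}$, the worst-case cost of re-inserting the $n/2$ keys that are in the table when it is resized.  
  
Resizing:  
We resize when the array is half full, so when a table of size $M$ holds $M/2$ items we create an array of size $2M$.  
At every resize, we re-hash every element, as the array size influences the hash function.  
Scanning the old array takes $M$ operations (we also look at the $M/2$ empty slots).  
We resize once or twice (depending on implementation).  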
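
The two terms can be checked with a small simulation. This is a sketch under the assumptions used in the solution: every key hashes to index 0, the table doubles when it is half full (checked before each insert), and a probe is counted for every slot inspected, including the empty one where the key is placed. The name `worst_case_probes` is made up for this example.

```python
def worst_case_probes(n):
    """Insert n keys that all hash to index 0 into a table of size n.

    Returns (probes made by the n inserts, probes made while re-hashing).
    """
    M = n
    table = [None] * M
    insert_probes = rehash_probes = 0

    def place(table, key):
        i, probes = 0, 1                 # every key hashes to index 0
        while table[i] is not None:
            i = (i + 1) % len(table)
            probes += 1
        table[i] = key
        return probes

    count = 0
    for key in range(n):
        if count >= M // 2:              # half full: double before inserting
            M *= 2
            old, table = table, [None] * M
            for k in old:
                if k is not None:
                    rehash_probes += place(table, k)
        insert_probes += place(table, key)
        count += 1
    return insert_probes, rehash_probes

n = 8
print(worst_case_probes(n))                        # (36, 10)
print(n * (n + 1) // 2, n * (n // 2 + 1) // 4)     # 36 10
```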

**<span style="background: Yellow">3.4.26 - Yellow</span>**  Lazy delete for linear probing. Add to LinearProbingHashST a delete() method that deletes a key-value pair by setting the value to null (but not removing the key) and later removing the pair from the table in resize(). Your primary challenge is to decide when to call resize(). Note: You should overwrite the null value if a subsequent put() operation associates a new value with the key. Make sure that your program takes into account the number of such tombstone items, as well as the number of empty positions, in making the decision whether to expand or contract the table.

*Solution*:  
Delete: find the key and set its value to null, leaving the key in place as a tombstone.  
Get: walk over tombstones as bridges and stop only at an empty slot (not at a tombstone). A key whose value is null counts as absent.  
Put: probe from the home index, remembering the first tombstone seen. If we find our key, replace the value. If we reach an empty slot first, store the pair in the remembered tombstone slot if there is one, otherwise in the empty slot.  
  
Resizing:  
Let $n$ be the number of live keys, $t$ the number of tombstones and $M$ the table size.  
Double the size when $n+t=M/2$.  
When resizing, we rehash only the non-tombstone elements.  
Halve the size when $n = M/8$.
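
A minimal Python sketch of the idea, not the book's Java class. The class name `LazyLinearProbing`, the counters `live` and `tombstones`, the minimum size of 16 and the exact thresholds are assumptions chosen for illustration, and Python's built-in `hash()` stands in for the book's hash function. As a simplification, a tombstone slot is only reused by its own key; a new key always goes into an empty slot.

```python
class LazyLinearProbing:
    """Linear probing where delete() leaves a tombstone (key kept, value None)."""

    def __init__(self, M=16):
        self.M = M
        self.live = 0         # keys with a non-None value
        self.tombstones = 0   # keys whose value was set to None by delete()
        self.keys = [None] * M
        self.vals = [None] * M

    def _find(self, key):
        """Return the slot holding key, or the first empty slot on its probe path."""
        i = hash(key) % self.M
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.M
        return i

    def _resize(self, M):
        old = [(k, v) for k, v in zip(self.keys, self.vals) if v is not None]
        self.M, self.live, self.tombstones = M, 0, 0
        self.keys, self.vals = [None] * M, [None] * M
        for k, v in old:      # tombstones are dropped here
            self.put(k, v)

    def put(self, key, val):
        if val is None:
            raise ValueError("None is reserved for tombstones")
        if self.live + self.tombstones >= self.M // 2:
            # Grow only if live keys need the room; otherwise just purge tombstones.
            self._resize(2 * self.M if self.live >= self.M // 4 else self.M)
        i = self._find(key)
        if self.keys[i] is None:      # new key in an empty slot
            self.keys[i] = key
            self.live += 1
        elif self.vals[i] is None:    # reviving this key's own tombstone
            self.tombstones -= 1
            self.live += 1
        self.vals[i] = val

    def get(self, key):
        return self.vals[self._find(key)]  # None for an empty slot or a tombstone

    def delete(self, key):
        i = self._find(key)
        if self.keys[i] is None or self.vals[i] is None:
            return                    # absent or already deleted
        self.vals[i] = None
        self.live -= 1
        self.tombstones += 1
        if self.M > 16 and self.live <= self.M // 8:
            self._resize(self.M // 2)
```

Counting tombstones together with live keys in the half-full test keeps at least half of the slots empty, so every probe sequence is guaranteed to end.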

**<span style="background: Red">3.4.6 - Red</span>**  Suppose that keys are binary integers. For a modular hash function with prime m > 2, prove that any two binary integers that differ in  exactly one bit have different hash values. 

*Solution*:  
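Changing one bit adds or subtracts a power of 2, so the two keys $x$ and $y$ satisfy $|x-y|=2^i$ for some $i \geq 0$.  
They have the same hash value only if $m$ divides $x-y$, i.e. only if $m$ divides $2^i$. The only prime factor of $2^i$ is 2, and $m$ is a prime greater than 2, so $m$ cannot divide $2^i$ and the hash values differ. For example, with $m=3$: $(2^2+2^1) \% 3 = 0$ and $2^2 \% 3 = 1$.  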
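
The claim can also be checked by brute force for small keys. This is only a sanity check, not a proof; the function name `one_bit_flips_collide` and the 12-bit key range are arbitrary choices for the example.

```python
def one_bit_flips_collide(m, bits=12):
    """Return True if two keys below 2**bits that differ in one bit collide mod m."""
    for x in range(2 ** bits):
        for i in range(bits):
            y = x ^ (1 << i)        # flip bit i of x
            if x % m == y % m:
                return True
    return False

for m in (3, 5, 7, 11, 13, 31, 97):
    print(m, one_bit_flips_collide(m))   # False for every odd prime

print(2, one_bit_flips_collide(2))       # True: 0 and 2 differ in one bit and collide
```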

**<span style="background: Red">3.4.32 - Red</span>**  Hash attack. Find 2^N strings, each of length 2^N , that have the same hashCode() value, supposing that the hashCode() implementation for String is the following:  
```python
# Python
def hash_code(str):
    hash = 0
    for i in range(len(str)):
        hash = (hash * 37) + ord(str[i])
    return hash
```  
Strong hint: ef and fA have the same value.

*Solution*:  
Lowercase f is exactly 37 code points after uppercase A (102 and 65), and e is 1 lower than f (101).  
Hence ef and fA have the same hash code: $37 \cdot 101 + 102 = 37 \cdot 102 + 65 = 3839$.  
Appending ef to one string and fA to an identical string therefore gives two strings with the same hash code, e.g. aaef and aafA. The same holds when the prefixes merely have equal hash codes, so efef, effA, fAef and fAfA all collide.  
So, for every even length, any combination of the pairs ef and fA yields the same hash code.  
With 6 letters (3 pairs) we can create $2^3=8$ strings:  
efefef, efeffA, effAef, effAfA, fAefef, fAeffA, fAfAef, fAfAfA.  
With strings of length $2^3$ (4 pairs), we can create $2^4$ strings.  
In general, strings of length $2^N$ consist of $2^{N-1}$ pairs, which gives $2^{2^{N-1}} \geq 2^N$ colliding strings, so we can pick any $2^N$ of them.  
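
The construction is easy to verify in code. The helper `colliding_strings` is a name made up for this sketch; `hash_code` is the function from the exercise, written with a direct loop over the characters.

```python
from itertools import product

def hash_code(s):
    h = 0
    for ch in s:
        h = h * 37 + ord(ch)
    return h

def colliding_strings(pairs):
    """All strings built from `pairs` blocks, each block being 'ef' or 'fA'."""
    return ["".join(blocks) for blocks in product(["ef", "fA"], repeat=pairs)]

strings = colliding_strings(4)              # length 8 = 2**3
hashes = {hash_code(s) for s in strings}
print(len(strings), len(hashes))            # 16 strings, 1 distinct hash code
```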

**<span style="background: Red">3.4.33 - Red</span>**  Bad hash function. Consider the following hashCode() implementation for String, which was used in early versions of Java:  
```python
def hash_code(str):
    hash = 0
    skip = max(1, len(str) // 8)
    for i in range(0, len(str), skip):
        hash = (hash * 31) + ord(str[i])
    return hash
```
Explain why you think the designers chose this implementation and then why you think it was abandoned in favor of the one in the previous exercise.  

*Solution*:  
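The function samples the string instead of reading all of it: it looks at every `skip`-th character, where `skip` is `len(str) // 8`, so it never examines more than 15 characters. That bounds the cost of hashing, which is presumably why the designers chose it: hashing a very long string is as cheap as hashing a short one.  
The price is that characters at skipped positions do not influence the hash. Strings shorter than 16 characters are hashed in full, but from length 16 on, strings that differ only at skipped positions collide. For lengths 16 to 23, for instance, `skip` is 2 and every character at an odd index is ignored.  
Long keys with a common structure, such as file paths or URLs, often differ in only a few positions, so many of them can end up with the same hash, which is problematic. Examining every character, as in the previous exercise, avoids this.  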
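
A short demonstration of such a collision, using the `hash_code` function from the exercise. The example strings are arbitrary and chosen only to hit the two cases `skip == 2` and `skip == 1`.

```python
def hash_code(s):
    h = 0
    skip = max(1, len(s) // 8)
    for i in range(0, len(s), skip):
        h = h * 31 + ord(s[i])
    return h

a = "abcdefghijklmnop"          # length 16, so skip == 2
b = "aXcXeXgXiXkXmXoX"          # differs from a only at odd indices
print(hash_code(a) == hash_code(b))   # True: odd indices are never read

c = "abcdefghijklmno"           # length 15, so skip == 1
d = "aXcdefghijklmno"           # differs from c at index 1
print(hash_code(c) == hash_code(d))   # False: every character is read
```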