# Same hash

You are given a hash table size $N$. Your task is to find two strings that are assigned to the same location in the hash table using the Python hash function.

In other words, find two different strings `x` and `y` such that `hash(x) % N == hash(y) % N`.
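The condition above can be checked directly. The following is a minimal sketch, assuming the two strings are meant to be different from each other; the helper name `same_location` is hypothetical and not part of the assignment.

```python
def same_location(x, y, N):
    """Return True if the different strings x and y map to the same slot
    of a hash table of size N (hypothetical helper for checking answers)."""
    return x != y and hash(x) % N == hash(y) % N


if __name__ == "__main__":
    # With N = 1 every hash value is 0 modulo N, so any two different strings collide.
    print(same_location("abc", "aybabtu", 1))
```

A checker like this can be applied to the pair returned by `find` to confirm that the result is valid for the current interpreter run.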

You may assume that $N$ is at most $100$. Your solution should be efficient for this case.

In a file `samehash.py`, implement a function `find` that returns the two strings as a pair.

In [None]:
def find(N):
    # TODO
    pass

if __name__ == "__main__":
    print(find(42)) # e.g. ('abc', 'aybabtu')

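Note that Python's `hash` function for strings is randomized each time the interpreter is started (unless the `PYTHONHASHSEED` environment variable fixes the seed). A pair of strings that collides in one run will usually not collide in the next, so the function must compute the strings at run time instead of returning a hard-coded pair.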
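The effect of the per-run randomization can be observed by starting fresh interpreters. The sketch below assumes that spawning a subprocess with `sys.executable` is allowed in your environment; the helper name `hash_in_fresh_interpreter` is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a newly started Python interpreter.

    If seed is given, it is passed through PYTHONHASHSEED to fix the hash seed;
    otherwise the variable is removed so the child uses a random seed.
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        capture_output=True, text=True, check=True, env=env,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Two separate runs with random seeds usually print different values.
    print(hash_in_fresh_interpreter("abc"))
    print(hash_in_fresh_interpreter("abc"))
    # With a fixed seed the value is reproducible across runs.
    print(hash_in_fresh_interpreter("abc", seed=1) == hash_in_fresh_interpreter("abc", seed=1))
```

Fixing the seed this way is useful for debugging only; a submitted `find` should work for whatever seed the interpreter picks.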

## Attempt 1

In [16]:
import random
import string

def find(N):
    # This dictionary will store the hash values as keys and the strings as values
    hashes = {}
    
    # We'll try random strings until we find a collision
    while True:
        # Generate a random string
        x = ''.join(random.choices(string.ascii_lowercase, k=5))
        # Calculate its hash modulo N
        hash_value = hash(x) % N
        
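        # If a different string already has this hash, we have a collision
        # (the same random string may be generated twice, which does not count)
        if hash_value in hashes and hashes[hash_value] != x: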
            return (hashes[hash_value], x)
        # Otherwise, store this string in the dictionary
        hashes[hash_value] = x

if __name__ == "__main__":
    print(find(100)) # e.g. ('abc', 'aybabtu')

('jbwkr', 'vrtko')


In [24]:
from random import choices, randint
from string import ascii_lowercase

def find(N):
    hashes = {}
    
    while True:
        wordlen = randint(1,10)
        x = ''.join(choices(ascii_lowercase, k=wordlen))
        hash_value = hash(x) % N
        
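        # Short random strings repeat often, so skip a repeat of the same string
        if hash_value in hashes and hashes[hash_value] != x: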
            return (hashes[hash_value], x)

        hashes[hash_value] = x

if __name__ == "__main__":
    print(find(100))

('olhav', 'caunzjag')


## Attempt 2

In [27]:
def find(N):
    hashes = {}
    
    with open('words.txt') as f:
        for line in f:
            word = line.strip()
            hash_value = hash(word) % N
            if hash_value in hashes:
                return (hashes[hash_value], word)
        
            hashes[hash_value] = word

        
if __name__ == "__main__":
    print(find(20))

('aah', 'aahing')


## Attempt 3

In [19]:
def find(N):
    # This dictionary will store the hash values as keys and the strings as values
    hashes = {}
    
    # Generate strings systematically; N + 1 distinct strings guarantee a collision
    for i in range(N + 1):
        # Convert the integer to a string
        x = str(i)
        # Calculate its hash modulo N
        hash_value = hash(x) % N
        
        # If the hash is already in the dictionary, we have a collision
        if hash_value in hashes:
            return (hashes[hash_value], x)
        # Otherwise, store this string in the dictionary
        hashes[hash_value] = x

if __name__ == "__main__":
    print(find(100)) # e.g. ('abc', 'aybabtu')

('0', '11')


## Solution

The following solution goes through strings that consist only of the letter `a`: `a`, `aa`, `aaa`, and so on. This is sufficient because, by the pigeonhole principle, at most $N+1$ distinct strings are needed to guarantee that two of them get the same location.

In [28]:
def find(N):
    string = ""
    strings = {}

    while True:
        string += "a"
        place = hash(string) % N

        if place in strings:
            return (string, strings[place])

        strings[place] = string

if __name__ == "__main__":
    print(find(100)) # e.g. ('abc', 'aybabtu')

('aaaaaaaaaaaaaaaa', 'aaaaaaa')
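
The $N+1$ bound used by the solution can be checked empirically. The sketch below uses a hypothetical variant, `find_with_count`, which follows the same loop as the solution but also reports how many strings were hashed before a collision appeared.

```python
def find_with_count(N):
    """Same search as the solution, but also return the number of strings tried
    (hypothetical variant for checking the pigeonhole bound)."""
    seen = {}      # location -> first string that landed there
    string = ""
    tried = 0

    while True:
        string += "a"
        tried += 1
        place = hash(string) % N

        if place in seen:
            return (string, seen[place]), tried

        seen[place] = string


if __name__ == "__main__":
    for N in range(1, 101):
        (x, y), tried = find_with_count(N)
        assert x != y                          # lengths differ, so the strings differ
        assert hash(x) % N == hash(y) % N      # both land in the same location
        assert tried <= N + 1                  # pigeonhole bound
    print("bound holds for N = 1..100")
```

The bound is a worst case: in the run shown above a collision for $N = 100$ already appeared at the 16th string.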
