### Applications of Hashing:

- Implementing dictionaries (key-value lookups)
- Database indexing to find records (via trees or hashing)
- Cryptography (e.g. storing a hash of a password instead of the password itself)
- Caches
- Symbol tables in compilers/interpreters
- Routers (looking up addresses in forwarding tables)
    
 ----

### Direct Address Tables as Arrays:

- Can handle search, insert and delete in $\mathcal{O}(1)$ time when the keys are integers from a small range, because each key is used directly as an array index.
- However, what about keys from a very large range? The array would need one slot for every possible key.
- What about floating-point numbers as keys?
- Strings or addresses?
- This clearly motivates the need for hashing!
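
A minimal sketch of a direct address table, assuming integer keys in a known range `0..size-1` (the class name `DirectAddressTable` is hypothetical, chosen for illustration). The key itself is the array index, so every operation is a single array access:

```python
class DirectAddressTable:
    """Direct address table for integer keys in the range [0, size)."""

    def __init__(self, size):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * size

    def insert(self, key, value):
        self.slots[key] = value   # O(1): the key is the index

    def search(self, key):
        return self.slots[key]    # O(1); None if the key is absent

    def delete(self, key):
        self.slots[key] = None    # O(1)


table = DirectAddressTable(1000)  # keys must lie in 0..999
table.insert(42, "answer")
print(table.search(42))           # answer
table.delete(42)
print(table.search(42))           # None
```

The catch is memory: the array needs one slot per possible key, which is impractical for a huge key range and does not work directly for floats or strings.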
    
 ----

### Hashing Introduction:
- A hash function takes keys from a large universe and converts them to small values that can be used as indices into an array called a hash table.
- Now, how do you come up with such hash functions, and what properties should a typical hash function have?
    1. It should be deterministic: always map a given key to the same small value.
    2. It should only generate values from $0$ to $m-1$, where $m$ is the size of the hash table.
    3. It should be fast, i.e., $\mathcal{O}(1)$ for integers and $\mathcal{O}(\text{len}(\text{string}))$ for strings.
    4. It should distribute keys uniformly across the hash-table slots.
        
### Typical Examples of Hash Functions:

- $h(k) = k \bmod m$ for an integer key $k$ (where $m$ is typically chosen as a prime number)


- For strings, a weighted sum of the character codes is a decent hash function. For str = "abcd", with some constant $x$:
    1. $(\text{str}[0] \cdot x^{0} + \text{str}[1] \cdot x^{1} + \text{str}[2] \cdot x^{2} + \text{str}[3] \cdot x^{3}) \bmod m$


- Universal hashing (choose the hash function at random from a family of hash functions)
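
The three ideas above can be sketched in Python as follows. The function names, the table size `m = 101` and the multiplier `x = 31` are illustrative assumptions, not values from the text; the universal family shown is the classic `((a*k + b) mod p) mod m` construction for integer keys, assuming a prime `p` larger than every key.

```python
import random

# Division method for integer keys: the result always lies in 0..m-1.
def mod_hash(key, m=101):
    return key % m

# Weighted sum of character codes with weights x^0, x^1, x^2, ... (mod m).
# Reducing mod m at every step keeps the numbers small but gives the same
# result as computing the full sum first and reducing once.
def string_hash(s, m=101, x=31):
    total, power = 0, 1
    for ch in s:
        total = (total + ord(ch) * power) % m
        power = (power * x) % m
    return total

# Universal hashing: pick ONE function at random from the family
# h(k) = ((a*k + b) mod p) mod m, where p is a prime larger than every key.
def make_universal_hash(m, p=2_147_483_647):  # p = 2^31 - 1, a prime
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda key: ((a * key + b) % p) % m

print(mod_hash(1234))       # 22
print(string_hash("abcd"))  # 5
h = make_universal_hash(10)
print(h(1234))              # some slot in 0..9, depending on the random a and b
```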
----   

### Collisions are bound to happen. How do you handle them?
- Chaining
- Open Addressing
    - Linear Probing
    - Quadratic Probing
    - Double Hashing
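
A minimal sketch of the first two strategies for integer keys, assuming the division-method hash `key % m` (the class names are hypothetical). Chaining keeps a list per slot; linear probing stores keys directly in the array and, on a collision, tries the next slot `(h + i) % m`. Deletion in the probing table uses a tombstone marker so that later searches do not stop early at a freed slot.

```python
class ChainingHashTable:
    def __init__(self, m=7):
        self.m = m
        self.buckets = [[] for _ in range(m)]  # one chain (list) per slot

    def insert(self, key):
        chain = self.buckets[key % self.m]
        if key not in chain:
            chain.append(key)

    def search(self, key):
        return key in self.buckets[key % self.m]

    def delete(self, key):
        chain = self.buckets[key % self.m]
        if key in chain:
            chain.remove(key)


_DELETED = object()  # tombstone: marks a slot whose key was removed

class LinearProbingHashTable:
    def __init__(self, m=7):
        self.m = m
        self.slots = [None] * m  # None = slot never used

    def _probe(self, key):
        # Yield slot indices h, h+1, h+2, ... (mod m), each slot once.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        if self.search(key):
            return True          # already present
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] is _DELETED:
                self.slots[index] = key
                return True
        return False             # table is full

    def search(self, key):
        for index in self._probe(key):
            if self.slots[index] is None:
                return False     # an empty slot ends the probe sequence
            if self.slots[index] == key:
                return True
        return False

    def delete(self, key):
        for index in self._probe(key):
            if self.slots[index] is None:
                return False
            if self.slots[index] == key:
                self.slots[index] = _DELETED
                return True
        return False


chained = ChainingHashTable()
for k in (50, 21, 58, 17, 15, 49, 56):
    chained.insert(k)
print(chained.buckets)  # [[21, 49, 56], [50, 15], [58], [17], [], [], []]

probing = LinearProbingHashTable()
for k in (50, 700, 76, 85, 92, 73, 101):
    probing.insert(k)
print(probing.slots)    # [700, 50, 85, 92, 73, 101, 76]
```

Quadratic probing and double hashing differ only in the probe sequence: `(h + i*i) % m`, or `(h1 + i*h2) % m` with a second hash `h2` that never returns 0.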
    
----  

### Some popular Python set methods:
- set.add()
- set.remove()
- set.clear()

### Some popular Python dictionary methods:
- dict.keys()
- dict.values()
- dict.items()
- del dict["key"]
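
A short demonstration of the methods listed above (the sample values and variable names are arbitrary). Note that `set.remove()` raises `KeyError` for a missing element, and `dict.keys()`, `values()` and `items()` return views, converted to lists here for printing:

```python
seen = set()
seen.add(10)
seen.add(20)
seen.add(10)                 # duplicates are ignored
print(seen)                  # {10, 20}
seen.remove(20)
print(seen)                  # {10}
seen.clear()
print(seen)                  # set()

ages = {"alice": 30, "bob": 25}
print(list(ages.keys()))     # ['alice', 'bob']
print(list(ages.values()))   # [30, 25]
print(list(ages.items()))    # [('alice', 30), ('bob', 25)]
del ages["bob"]
print(ages)                  # {'alice': 30}
```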

----  

### Question - 1 (Count Distinct Elements):

- I/P: [10, 10, 10]
- O/P: 1


- I/P: [10, 11, 12]
- O/P: 3


- I/P: [15, 12, 13, 12, 13, 13, 18]
- O/P: 4

```python
# theta(n) time, O(n) space (average case for set operations)!
def distinct_elements(array):
    cache = set()
    for number in array:
        cache.add(number)   # adding a value that is already in the set has no effect
    return len(cache)

array = [10, 10, 10]
print(distinct_elements(array))

array = [10, 11, 12]
print(distinct_elements(array))

array = [15, 12, 13, 12, 13, 13, 18]
print(distinct_elements(array))
```

```
1
3
4
```
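
Since a set keeps only one copy of each value, the loop can be replaced by Python's `set` constructor: a compact alternative that is still $\Theta(n)$ average time and $O(n)$ space (the name `count_distinct` is illustrative):

```python
def count_distinct(array):
    # set() keeps one copy of each value; len() counts them
    return len(set(array))

print(count_distinct([10, 10, 10]))                  # 1
print(count_distinct([10, 11, 12]))                  # 3
print(count_distinct([15, 12, 13, 12, 13, 13, 18]))  # 4
```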


### Question - 2 (Frequency of Every Element in the Array):

- I/P: [10, 10, 10, 10]
- O/P: 10: 4


- I/P: [10, 20]
- O/P: 10: 1, 20: 1


- I/P: [10, 12, 10, 15, 10, 20, 12, 12]
- O/P: 10: 3, 12: 3, 15: 1, 20: 1

```python
# theta(n) time, O(n) space!
def frequencies(array):
    cache = {}   # value -> number of occurrences
    for number in array:
        if number not in cache:
            cache[number] = 0
        cache[number] += 1

    return cache

array = [10, 10, 10, 10]
print(frequencies(array))

array = [10, 20]
print(frequencies(array))

array = [10, 12, 10, 15, 10, 20, 12, 12]
print(frequencies(array))
```

```
{10: 4}
{10: 1, 20: 1}
{10: 3, 12: 3, 15: 1, 20: 1}
```
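
The standard library's `collections.Counter` builds the same frequency table in one call ($\Theta(n)$ average time). A sketch (the wrapper name is illustrative); `Counter` is a `dict` subclass that keeps first-seen order, so `dict(...)` turns it back into the same plain dictionary:

```python
from collections import Counter

def frequencies_with_counter(array):
    return dict(Counter(array))

print(frequencies_with_counter([10, 12, 10, 15, 10, 20, 12, 12]))
# {10: 3, 12: 3, 15: 1, 20: 1}
```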


### Question - 3 (No. of elements in the intersection of Two Unsorted Arrays):

- I/P: [10, 15, 20, 5, 30], [30, 5, 30, 80]
- O/P: 2 


- I/P: [10, 20], [20, 30]
- O/P: 1


- I/P: [10, 10, 10], [10, 10, 10]
- O/P: 1

```python
# theta(n*m) time, O(min(n, m)) extra space, where n = len(array1), m = len(array2)!
def naive_intersection(array1, array2):
    answer, cache = 0, set()   # cache: common values already counted
    for outerIndex in range(len(array1)):
        for innerIndex in range(len(array2)):
            if (array1[outerIndex] == array2[innerIndex]) and (array1[outerIndex] not in cache):
                answer += 1
                cache.add(array1[outerIndex])
    return answer

array1, array2 = [10, 15, 20, 5, 30], [30, 5, 30, 80]
print(naive_intersection(array1, array2))

array1, array2 = [10, 20], [20, 30]
print(naive_intersection(array1, array2))

array1, array2 = [10, 10, 10], [10, 10, 10]
print(naive_intersection(array1, array2))


print(" ")


# Efficient Solution: 1
# theta(n + m) time, O(n + m) space!
def intersection(array1, array2):
    cache1, cache2 = set(), set()
    for number in array1:
        cache1.add(number)
    for num in array2:
        cache2.add(num)

    answer = 0
    for number in cache1:
        if number in cache2:
            answer += 1

    return answer

array1, array2 = [10, 15, 20, 5, 30], [30, 5, 30, 80]
print(intersection(array1, array2))

array1, array2 = [10, 20], [20, 30]
print(intersection(array1, array2))

array1, array2 = [10, 10, 10], [10, 10, 10]
print(intersection(array1, array2))


print(" ")


# Efficient Solution: 2
# theta(n + m) time, O(min(n, m)) space: the set is built from the shorter array!
def better_intersection(array1, array2):
    if len(array1) > len(array2):
        array1, array2 = array2, array1   # make array1 the shorter array
    cache1 = set()
    for number in array1:
        cache1.add(number)

    count = 0
    for number in array2:
        if number in cache1:
            count += 1
            cache1.remove(number)   # count each common value only once
    return count

array1, array2 = [10, 15, 20, 5, 30], [30, 5, 30, 80]
print(better_intersection(array1, array2))

array1, array2 = [10, 20], [20, 30]
print(better_intersection(array1, array2))

array1, array2 = [10, 10, 10], [10, 10, 10]
print(better_intersection(array1, array2))
```

```
2
1
1
 
2
1
1
 
2
1
1
```
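
Python sets also support intersection directly through the `&` operator (equivalently `set.intersection()`), which gives a one-line version of the same $\Theta(n + m)$ average-time idea (the function name is illustrative):

```python
def intersection_size(array1, array2):
    # & keeps only the values present in both sets
    return len(set(array1) & set(array2))

print(intersection_size([10, 15, 20, 5, 30], [30, 5, 30, 80]))  # 2
print(intersection_size([10, 20], [20, 30]))                     # 1
print(intersection_size([10, 10, 10], [10, 10, 10]))             # 1
```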


### Question - 4 (Union of Two Unsorted Arrays):

- I/P: [15, 20, 5, 15], [9, 9, 9, 20, 10]
- O/P: 5


- I/P: [15, 20, 5, 15], [15, 15, 15, 20, 10]
- O/P: 4


- I/P: [10, 12, 15], [18, 12]
- O/P: 4


- I/P: [3, 3, 3], [3, 3]
- O/P: 1

```python
# theta(n + m) time, O(n + m) space!
def union(array1, array2):
    cache = set()   # every distinct value from either array
    for number in array1:
        cache.add(number)

    for number in array2:
        cache.add(number)

    return len(cache)

array1, array2 = [15, 20, 5, 15], [9, 9, 9, 20, 10]
print(union(array1, array2))

array1, array2 = [15, 20, 5, 15], [15, 15, 15, 20, 10]
print(union(array1, array2))

array1, array2 = [10, 12, 15], [18, 12]
print(union(array1, array2))

array1, array2 = [3, 3, 3], [3, 3]
print(union(array1, array2))
```

```
5
4
4
1
```
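
Likewise, the `|` operator (equivalently `set.union()`) computes the union of two sets in one expression (the function name is illustrative):

```python
def union_size(array1, array2):
    # | keeps every value that appears in either set
    return len(set(array1) | set(array2))

print(union_size([15, 20, 5, 15], [9, 9, 9, 20, 10]))      # 5
print(union_size([15, 20, 5, 15], [15, 15, 15, 20, 10]))   # 4
print(union_size([10, 12, 15], [18, 12]))                  # 4
print(union_size([3, 3, 3], [3, 3]))                       # 1
```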


### Question - 5 (Pair with given sum in unsorted array, a.k.a. Two Sum):

- I/P: [3, 2, 8, 15, -8], 17
- O/P: True


- I/P: [2, 1, 6, 3], 6 
- O/P: False


- I/P: [5, 8, -3, 6], 3
- O/P: True

```python
# O(n) time, O(n) space: cache holds the complement (total - number) of every
# element seen so far; meeting one of those complements completes a pair!
def pair_with_sum(array, total):
    cache = set()
    for number in array:
        if number in cache:
            return True
        cache.add(total - number)
    return False

array1, total = [3, 2, 8, 15, -8], 17
print(pair_with_sum(array1, total))

array1, total = [2, 1, 6, 3], 6
print(pair_with_sum(array1, total))

array1, total = [5, 8, -3, 6], 3
print(pair_with_sum(array1, total))
```

```
True
False
True
```
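
A common variant of this problem asks for the positions of the pair rather than a yes/no answer. A sketch under that assumption (the function name is illustrative): store each value's index in a dictionary and, for every element, look up its complement `total - number` among the elements already seen. Checking before storing ensures an element is never paired with itself.

```python
def pair_indices_with_sum(array, total):
    """Return (i, j) with i < j and array[i] + array[j] == total, or None."""
    seen = {}  # value -> index of an earlier occurrence
    for j, number in enumerate(array):
        complement = total - number
        if complement in seen:
            return seen[complement], j
        seen[number] = j
    return None

print(pair_indices_with_sum([3, 2, 8, 15, -8], 17))  # (1, 3): 2 + 15
print(pair_indices_with_sum([2, 1, 6, 3], 6))        # None
print(pair_indices_with_sum([5, 8, -3, 6], 3))       # (2, 3): -3 + 6
```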


### Question - 6 (SubArray with 0 Sum):

- I/P: [1, 4, 13, -3, -10, 5]
- O/P: True


- I/P: [-1, 4, -3, 5, 1]
- O/P: True


- I/P: [3, 1, -2, 5, 6]
- O/P: False


- I/P: [5, 6, 0, 8]
- O/P: True
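
The reasoning behind the prefix-sum approach below, written out. Let $P_j$ be the sum of the first $j+1$ elements. Any subarray sum is a difference of two prefix sums, so a repeated prefix value, or a prefix equal to $0$, means some subarray sums to zero:

$$
P_j = \sum_{t=0}^{j} \text{array}[t], \qquad \sum_{t=i+1}^{j} \text{array}[t] = P_j - P_i \quad (i < j)
$$

$$
P_i = P_j \;\Longrightarrow\; \sum_{t=i+1}^{j} \text{array}[t] = 0, \qquad P_j = 0 \;\Longrightarrow\; \sum_{t=0}^{j} \text{array}[t] = 0
$$

For the first input, the prefix sums are $1, 5, 18, 15, 5, 10$; since $P_1 = P_4 = 5$, the subarray $\text{array}[2..4] = [13, -3, -10]$ sums to $0$.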

```python
# sum of array[i..j] (inclusive) from the prefix-sum array, in O(1)
def subarray_sum(prefix, i, j):
    if i==0:
        return prefix[j]
    else:
        return prefix[j] - prefix[i-1]

# prefix[i] = array[0] + array[1] + ... + array[i]
def prefix_sum(array):
    prefix = []
    total = 0
    for number in array:
        total += number
        prefix.append(total)

    return prefix

# theta(n) time, O(n) space: if some prefix sum is 0, or the same prefix sum
# appears twice, the elements in between sum to 0
def somewhatnaive_subarraysum_with_zero(array):
    prefix = prefix_sum(array)
    cache = set()
    for number in prefix:
        if (number==0) or (number in cache):
            return True
        cache.add(number)

    return False

array = [1, 4, 13, -3, -10, 5]
print(prefix_sum(array))
print(" ")
print(somewhatnaive_subarraysum_with_zero(array))

array = [-1, 4, -3, 5, 1]
print(somewhatnaive_subarraysum_with_zero(array))

array = [3, 1, -2, 5, 6]
print(somewhatnaive_subarraysum_with_zero(array))

array = [5, 6, 0, 8]
print(somewhatnaive_subarraysum_with_zero(array))


print(" ")


# same idea, but the running prefix sum is computed on the fly instead of
# storing the whole prefix array; theta(n) time, O(n) space
def better_subarray_sum(array):
    cache, currPrefix = set(), 0
    for number in array:
        currPrefix += number

        if currPrefix==0: return True
        if currPrefix in cache: return True

        cache.add(currPrefix)

    return False

array = [1, 4, 13, -3, -10, 5]
print(better_subarray_sum(array))

array = [-1, 4, -3, 5, 1]
print(better_subarray_sum(array))

array = [3, 1, -2, 5, 6]
print(better_subarray_sum(array))

array = [5, 6, 0, 8]
print(better_subarray_sum(array))
```

```
[1, 5, 18, 15, 5, 10]
 
True
True
False
True
 
True
True
False
True
```


### Question - 7 (SubArray with Given Sum):
- I/P: [5, 8, 6, 13, 3, -1], 22
- O/P: True


- I/P: [15, 2, 8, 10, -5, -8, 6], 3
- O/P: True


- I/P: [3, 2, 5, 6], 10
- O/P: True

```python
# theta(n) time, O(n) space: a subarray ending at the current index sums to
# givenSum exactly when some earlier prefix equals currPrefix - givenSum
def subarray_with_given_sum(array, givenSum):
    cache, currPrefix = set(), 0
    for number in array:
        currPrefix += number

        if currPrefix==givenSum: return True   # subarray starting at index 0
        if currPrefix-givenSum in cache: return True

        cache.add(currPrefix)

    return False

array = [5, 8, 6, 13, 3, -1]
print(subarray_with_given_sum(array, 22))

array = [15, 2, 8, 10, -5, -8, 6]
print(subarray_with_given_sum(array, 3))

array = [3, 2, 5, 6]
print(subarray_with_given_sum(array, 10))

array = [1, 4, 13, -3, -10, 5]
print(subarray_with_given_sum(array, 0))

array = [-1, 4, -3, 5, 1]
print(subarray_with_given_sum(array, 0))

array = [3, 1, -2, 5, 6]
print(subarray_with_given_sum(array, 0))

array = [5, 6, 0, 8]
print(subarray_with_given_sum(array, 0))
```

```
True
True
True
True
True
False
True
```
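
If the bounds of the subarray are needed rather than a boolean, the set can become a dictionary from prefix sum to the index where it first occurred. Seeding it with `{0: -1}` (an empty prefix ending just before index 0) also removes the special case `currPrefix == givenSum`. A sketch; the function name is illustrative:

```python
def subarray_bounds_with_given_sum(array, givenSum):
    """Return (start, end) of one subarray summing to givenSum, or None."""
    first_index = {0: -1}   # prefix sum -> index where it first occurred
    currPrefix = 0
    for index, number in enumerate(array):
        currPrefix += number
        if currPrefix - givenSum in first_index:
            return first_index[currPrefix - givenSum] + 1, index
        first_index.setdefault(currPrefix, index)
    return None

print(subarray_bounds_with_given_sum([5, 8, 6, 13, 3, -1], 22))  # (2, 4): 6 + 13 + 3
print(subarray_bounds_with_given_sum([3, 2, 5, 6], 10))          # (0, 2): 3 + 2 + 5
```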


### Question - 8 (Longest SubArray with Given Sum):

Find the length of the longest subarray with the given sum (0 if there is none).

- I/P: [5, 8, -4, -4, 9, -2, 2], 0
- O/P: 3


- I/P: [3, 1, 0, 1, 8, 2, 3, 6], 5
- O/P: 4


- I/P: [8, 3, 7], 15
- O/P: 0
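
The reasoning behind the efficient version below, written out. Let $P_j$ be the prefix sum up to index $j$, with $P_{-1} = 0$ for the empty prefix, and let $s$ be the given sum. A subarray $\text{array}[i+1..j]$ has sum $s$ exactly when an earlier prefix equals $P_j - s$, and its length $j - i$ is largest when $i$ is the earliest such index, which is why only the first occurrence of each prefix sum needs to be stored:

$$
P_j = \sum_{t=0}^{j} \text{array}[t], \qquad \sum_{t=i+1}^{j} \text{array}[t] = P_j - P_i = s \iff P_i = P_j - s
$$

$$
\text{answer} = \max_{j} \Bigl( j - \min\{\, i : -1 \le i < j,\ P_i = P_j - s \,\} \Bigr)
$$

(taking $0$ if no such $i$ exists for any $j$). For $[3, 1, 0, 1, 8, 2, 3, 6]$ and $s = 5$, the prefix sums are $3, 4, 4, 5, 13, 15, 18, 24$; at $j = 3$ we have $P_3 - s = 0 = P_{-1}$, giving length $3 - (-1) = 4$.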

```python
# theta(n^2) time, O(1) space: try every start index and extend to every end index
def naive_longest_subarray(array, givenSum):

    maxLen = 0   # stays 0 when no subarray has the given sum
    for outerIndex in range(len(array)):
        currPrefix = 0
        for innerIndex in range(outerIndex, len(array)):
            currPrefix += array[innerIndex]
            if currPrefix==givenSum:
                maxLen = max(maxLen, innerIndex-outerIndex+1)
    return maxLen

array, givenSum = [5, 8, -4, -4, 9, -2, 2], 0
print(naive_longest_subarray(array, givenSum))

array, givenSum = [3, 1, 0, 1, 8, 2, 3, 6], 5
print(naive_longest_subarray(array, givenSum))

array, givenSum = [8, 3, 7], 15
print(naive_longest_subarray(array, givenSum))

array, givenSum = [5, 2, 3], 5
print(naive_longest_subarray(array, givenSum))

array, givenSum = [5, -4, 4, -4, 4, 5], 0
print(naive_longest_subarray(array, givenSum))


print(" ")


# theta(n) time, O(n) space
def longest_subarray(array, givenSum):

    # cache maps each prefix sum to the FIRST index where it occurs,
    # so that index - cache[...] is as long as possible
    cache, currPrefix, maxLength = {}, 0, 0
    for index in range(len(array)):
        currPrefix += array[index]

        if currPrefix==givenSum: maxLength = max(maxLength, index+1)

        # look up before inserting, so the current index is never paired with itself
        if currPrefix-givenSum in cache:
            maxLength = max(maxLength, index-cache[currPrefix-givenSum])

        if currPrefix not in cache:
            cache[currPrefix] = index

    return maxLength

array, givenSum = [5, 8, -4, -4, 9, -2, 2], 0
print(longest_subarray(array, givenSum))

array, givenSum = [3, 1, 0, 1, 8, 2, 3, 6], 5
print(longest_subarray(array, givenSum))

array, givenSum = [8, 3, 7], 15
print(longest_subarray(array, givenSum))

array, givenSum = [5, 2, 3], 5
print(longest_subarray(array, givenSum))

array, givenSum = [5, -4, 4, -4, 4, 5], 0
print(longest_subarray(array, givenSum))
```

```
3
4
0
2
4
 
3
4
0
2
4
```


### Question - 9 (Longest SubArray with equal no. of 0's and 1's):
Given a binary array, our task is to find the length of the longest subarray with an equal number of 0's and 1's.

- I/P: [1, 0, 1, 1, 1, 0, 0]
- O/P: 6


- I/P: [1, 1, 1, 1]
- O/P: 0


- I/P: [0, 0, 1, 1, 1, 1, 1, 0]
- O/P: 4


- I/P: [0, 0, 1, 0, 1, 1]
- O/P: 6

```python
# theta(n^2) time, O(1) space: count the 0's and 1's of every subarray
def naive_longest_binary_subarray(array):
    maxValue = 0
    for outerIndex in range(len(array)-1):
        count = {0:0, 1:0}
        count[array[outerIndex]] += 1
        for innerIndex in range(outerIndex+1, len(array)):
            count[array[innerIndex]] += 1
            if count[0]==count[1]:
                maxValue = max(maxValue, 2*count[0])
    return maxValue

array = [1, 0, 1, 1, 1, 0, 0]
print(naive_longest_binary_subarray(array))

array = [1, 1, 1, 1]
print(naive_longest_binary_subarray(array))

array = [0, 0, 1, 1, 1, 1, 1, 0]
print(naive_longest_binary_subarray(array))

array = [0, 0, 1, 0, 1, 1]
print(naive_longest_binary_subarray(array))


print(" ")


# idea here is to replace the zeros by -1 and find the longest subarray
# with sum equal to zero! theta(n) time, O(n) space
def better_longest_binary_subarray(array):

    # map 0 -> -1 in a new list, so the caller's array is not modified
    values = [1 if number==1 else -1 for number in array]

    cache, maxValue, currPrefix = {}, 0, 0
    for index in range(len(values)):
        currPrefix += values[index]

        if currPrefix==0:
            maxValue = max(maxValue, index + 1)

        if currPrefix in cache:
            maxValue = max(maxValue, index - cache[currPrefix])
        else:
            # store only the first index of each prefix sum, so lengths stay maximal
            cache[currPrefix] = index

    return maxValue

array = [1, 0, 1, 1, 1, 0, 0]
print(better_longest_binary_subarray(array))

array = [1, 1, 1, 1]
print(better_longest_binary_subarray(array))

array = [0, 0, 1, 1, 1, 1, 1, 0]
print(better_longest_binary_subarray(array))

array = [0, 0, 1, 0, 1, 1]
print(better_longest_binary_subarray(array))
```

```
6
0
4
6
 
6
0
4
6
```


### Question - 10 (Longest Common Span with same Sum in Binary Array)!
We are given two binary arrays of the same size. Find the length of the longest common span (i, j) such that the sum of array1[i..j] equals the sum of array2[i..j].


- I/P: [0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 1]
- O/P: 4


- I/P: [0, 1, 0, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0, 1]
- O/P: 6


- I/P: [0, 0, 0], [1, 1, 1]
- O/P: 0


- I/P: [0, 0, 1, 0], [1, 1, 1, 1]
- O/P: 1

```python
# theta(n^2) time, O(1) space: compare the running sums of every span [i..j]
def naive_longest_common(array1, array2):

    maxLen = 0   # stays 0 when no common span exists
    for outerIndex in range(len(array1)):
        currPrefix1, currPrefix2 = 0, 0
        for innerIndex in range(outerIndex, len(array2)):
            currPrefix1 += array1[innerIndex]
            currPrefix2 += array2[innerIndex]
            if currPrefix1==currPrefix2:
                maxLen = max(maxLen, innerIndex - outerIndex + 1)
    return maxLen

array1, array2 = [0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 1]
print(naive_longest_common(array1, array2))

array1, array2 = [0, 1, 0, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0, 1]
print(naive_longest_common(array1, array2))

array1, array2 = [0, 0, 0], [1, 1, 1]
print(naive_longest_common(array1, array2))

array1, array2 = [0, 0, 1, 0], [1, 1, 1, 1]
print(naive_longest_common(array1, array2))


print(" ")


# convert the problem to longest subarray with zero sum: equal sums over
# [i..j] <=> the differences array1[t] - array2[t] sum to 0 over [i..j]
# theta(n) time, O(n) space
def better_longest_common(array1, array2):

    newArray = [array1[index]-array2[index] for index in range(len(array1))]

    maxLen, cache = 0, {}
    currPrefix = 0
    for index in range(len(newArray)):
        currPrefix += newArray[index]

        if currPrefix==0: maxLen = max(maxLen, index+1)

        if currPrefix in cache:
            maxLen = max(maxLen, index-cache[currPrefix])
        else:
            cache[currPrefix] = index   # keep only the first occurrence

    return maxLen

array1, array2 = [0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 1]
print(better_longest_common(array1, array2))

array1, array2 = [0, 1, 0, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0, 1]
print(better_longest_common(array1, array2))

array1, array2 = [0, 0, 0], [1, 1, 1]
print(better_longest_common(array1, array2))

array1, array2 = [0, 0, 1, 0], [1, 1, 1, 1]
print(better_longest_common(array1, array2))
```

```
4
6
0
1
 
4
6
0
1
```
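
Questions 9 and 10 both reduce to the same core routine: the longest subarray whose sum is 0. A sketch that factors that routine out (all function names are illustrative) and shows the two reductions, mapping 0 to -1 for Question 9 and taking element-wise differences for Question 10:

```python
def longest_zero_sum(values):
    """Length of the longest contiguous run of `values` that sums to 0."""
    first_index = {0: -1}   # prefix sum -> earliest index (-1 = empty prefix)
    currPrefix, best = 0, 0
    for index, value in enumerate(values):
        currPrefix += value
        if currPrefix in first_index:
            best = max(best, index - first_index[currPrefix])
        else:
            first_index[currPrefix] = index
    return best

def longest_equal_zeros_ones(bits):
    # equal numbers of 0s and 1s  <=>  sum is 0 after mapping 0 -> -1
    return longest_zero_sum([1 if bit == 1 else -1 for bit in bits])

def longest_common_span(bits1, bits2):
    # equal sums over [i..j]  <=>  the differences sum to 0 over [i..j]
    return longest_zero_sum([a - b for a, b in zip(bits1, bits2)])

print(longest_equal_zeros_ones([1, 0, 1, 1, 1, 0, 0]))               # 6
print(longest_common_span([0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 1]))  # 4
```

Seeding the dictionary with `{0: -1}` plays the role of the separate `currPrefix == 0` check in the versions above.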


### Question - 11 (Longest Consecutive Subsequence)!
Given an array, we need to find the length of the longest subsequence whose elements are consecutive integers. These consecutive elements may appear in any order in the array.


- I/P: [1, 9, 3, 4, 2, 20]
- O/P: 4


- I/P: [8, 20, 7, 30]
- O/P: 2


- I/P: [20, 30, 40]
- O/P: 1

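```python
# theta(n log n) time: sort a copy, then scan for runs of consecutive values
def naive_longest_cons_subseq(array):
    if not array:
        return 0
    values = sorted(array)   # sorted copy; the caller's list is left unchanged
    count, maxLen = 1, 1
    for index in range(len(values)-1):
        if values[index+1]==values[index]:
            continue         # a duplicate neither extends nor breaks the run
        if values[index+1]==1+values[index]:
            count += 1
        else:
            count = 1
        maxLen = max(maxLen, count)
    return maxLen

array = [1, 9, 3, 4, 2, 20]
print(naive_longest_cons_subseq(array))

array = [8, 20, 7, 30]
print(naive_longest_cons_subseq(array))

array = [20, 30, 40]
print(naive_longest_cons_subseq(array))

print(" ")

# theta(n) average time, O(n) space (a sketch of the hash-set approach):
# only start counting at values x with x-1 absent, i.e. at the start of a run
def better_longest_cons_subseq(array):
    cache = set(array)
    maxLen = 0
    for number in cache:
        if number-1 not in cache:
            length = 1
            while number+length in cache:
                length += 1
            maxLen = max(maxLen, length)
    return maxLen

array = [1, 9, 3, 4, 2, 20]
print(better_longest_cons_subseq(array))

array = [8, 20, 7, 30]
print(better_longest_cons_subseq(array))

array = [20, 30, 40]
print(better_longest_cons_subseq(array))
```

```
4
2
1
 
4
2
1
```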


### Question - 12 (Count Distinct Elements In Every Window)!
Given an array and a window size k, we need to count the distinct elements in every window (contiguous subarray) of size k.

- I/P: [10, 20, 20, 10, 30, 40, 10], 4
- O/P: 2, 3, 4, 3


- I/P: [10, 10, 10, 10], 3
- O/P: 1, 1


- I/P: [10, 20, 30, 40], 3
- O/P: 3, 3

```python
# theta(n) time, O(k) space: slide a window of size k over the array and keep
# a frequency map of the values inside it; len(hashmap) = distinct values
def distinct_in_every_window(array, k):

    hashmap, answer = {}, []
    left, right = 0, 0

    while right<len(array):

        # add the incoming element to the window
        if array[right] not in hashmap:
            hashmap[array[right]] = 0
        hashmap[array[right]] += 1

        if right-left+1<k:
            right += 1   # window not full yet
        elif right-left+1==k:
            answer.append(len(hashmap))

            # remove the outgoing element before sliding the window
            hashmap[array[left]] -= 1
            if hashmap[array[left]] == 0:
                del hashmap[array[left]]

            left  += 1
            right += 1

    return answer

array, k = [10, 20, 20, 10, 30, 40, 10], 4
print(distinct_in_every_window(array, k))

array, k = [10, 10, 10, 10], 3
print(distinct_in_every_window(array, k))

array, k = [10, 20, 30, 40], 3
print(distinct_in_every_window(array, k))
```

```
[2, 3, 4, 3]
[1, 1]
[3, 3]
```


### Question - 13 (More than n/k occurrences)!

- I/P: 
- O/P: