# Hashing  
A hash table is a data structure that stores keys, optionally with corresponding values. Inserts, deletes, and lookups run in $O(1)$ time on average.
   
The underlying idea is to store keys in an array. Each key is stored in an array location (a "slot") determined by its "hash code". The hash code is an integer computed from the key by a hash function. If the hash function is chosen well, the objects are distributed uniformly across array locations.
   
If two keys map to the same location, a "collision" has occurred. The standard mechanism for dealing with collisions is to maintain a linked list of objects at each array location. If the hash function spreads objects well across the underlying array and takes $O(1)$ time to compute, then lookups, insertions, and deletions have $O(1 + n/m)$ time complexity on average, where $n$ is the number of objects and $m$ is the length of the array.
   
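If the "load" $n/m$ grows large, the table can be rehashed: a new array with more slots is allocated and every object is moved into it. Rehashing is expensive, taking $O(n + m)$ time, but if it is done infrequently its amortized cost is low.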
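
The chaining and rehashing ideas above can be sketched as a toy class. This is a minimal illustration, not a production design: the name `ChainedHashTable`, the `max_load` threshold, and the doubling policy are assumptions made for the example, and Python lists stand in for the linked lists at each slot.

```python
from typing import Any, Hashable, List, Optional, Tuple


class ChainedHashTable:
    """Toy hash table using separate chaining (hypothetical, for illustration)."""

    def __init__(self, num_slots: int = 4, max_load: float = 1.0) -> None:
        # Each slot holds a chain of (key, value) pairs.
        self._slots: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(num_slots)]
        self._size = 0
        self._max_load = max_load  # assumed threshold for n/m before rehashing

    def __len__(self) -> int:
        return self._size

    @property
    def num_slots(self) -> int:
        return len(self._slots)

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        # The hash code picks the slot; colliding keys share a chain.
        return self._slots[hash(key) % len(self._slots)]

    def insert(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._slots) > self._max_load:
            self._rehash(2 * len(self._slots))

    def lookup(self, key: Hashable) -> Optional[Any]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key: Hashable) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _rehash(self, new_num_slots: int) -> None:
        # O(n + m): allocate the new array, then move every stored pair.
        old_slots = self._slots
        self._slots = [[] for _ in range(new_num_slots)]
        for bucket in old_slots:
            for k, v in bucket:
                self._bucket(k).append((k, v))


table = ChainedHashTable(num_slots=4)
for i in range(10):
    table.insert(f"key{i}", i)
print(table.lookup("key7"))  # 7
print(table.num_slots)       # 16: the array doubled at the 5th and 9th insert
```

Doubling the array whenever the load exceeds the threshold keeps rehashes rare, which is why the occasional $O(n + m)$ rehash does not spoil the average $O(1)$ cost per insert.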


In [9]:
from collections import Counter, defaultdict, namedtuple
import functools
from typing import DefaultDict, List

from utils import run_tests

## Tips
- Hash tables have the **best theoretical and real-world performance** for lookup, insert and delete. Each of these operations has $O(1)$ time complexity on average. A single insert can still take $O(n)$ time if the hash table has to be resized.
- Consider using a hash code as a **signature** to enhance performance, e.g., to filter out candidates.  
- Consider using a precomputed lookup table instead of boilerplate if-then code for mappings, e.g., from character to value or character to character.
- When defining your own type that will be put in a hash table, be sure you understand the relationship between **logical equality** and the fields the hash function must inspect. Specifically, whenever equality is implemented, a hash function consistent with it must be implemented too. Otherwise, logically equivalent objects may land in different buckets, and a lookup can report that an item is absent even when an equal item is present.
- Sometimes you'll need a **multimap**, i.e., a map that contains multiple values for a single key, or a bi-directional map.
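
The tip about logical equality can be illustrated with a small sketch. The classes `Point` and `BadPoint` are hypothetical and exist only for this example; `BadPoint` deliberately hashes on object identity, which is inconsistent with its field-based equality.

```python
class Point:
    """Equality and hash both look at the same fields (x, y)."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        return hash((self.x, self.y))


class BadPoint(Point):
    """Same equality as Point, but the hash ignores the fields (a bug)."""

    def __hash__(self) -> int:
        return id(self)


print(Point(1, 2) in {Point(1, 2)})  # True: equal objects hash alike

a, b = BadPoint(1, 2), BadPoint(1, 2)
print(a == b)    # True: logically equal
print(b in {a})  # False: different hash codes, so the lookup misses
```

Note that in Python 3 a class that defines `__eq__` without defining `__hash__` becomes unhashable, so this particular mistake usually shows up as a `TypeError` rather than as a silent miss.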

## Libraries

### String Hash Function

In [7]:
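def string_hash(s: str, modulus: int) -> int:
    mult = 997
    # Polynomial hash: fold each character into the running value, reducing
    # modulo `modulus` at every step so the result stays in [0, modulus).
    return functools.reduce(lambda v, c: (v * mult + ord(c)) % modulus, s, 0)

print(string_hash('cat', 10))
print(string_hash('cats', 10))

6
7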
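
A polynomial string hash of this kind can also be written as an explicit loop, which may make the per-character update easier to follow. The sketch below is an assumed equivalent formulation; the name `string_hash_loop` is hypothetical, and the multiplier 997 is taken from the cell above.

```python
def string_hash_loop(s: str, modulus: int) -> int:
    mult = 997
    value = 0
    for ch in s:
        # Shift the running value by the multiplier, add the character code,
        # and reduce so the value stays in [0, modulus).
        value = (value * mult + ord(ch)) % modulus
    return value


print(string_hash_loop('cat', 10))   # 6
print(string_hash_loop('cats', 10))  # 7
print(string_hash_loop('act', 10))   # 2
```

Because each character is multiplied by a different power of `mult`, the hash depends on character order: here 'cat' and 'act' produce different values even though they contain the same letters.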


### Finding Anagrams
An anagram is a word formed by rearranging the letters of another word.  
Given a set of words, return the groups of words that are anagrams of each other.

In [11]:
def find_anagrams(words: List[str]) -> List[List[str]]:
    '''
    The key idea is to map each string to a representative.
    The representative can be the sorted version of the string, since
    anagrams have the same sorted representation.
    '''
    sorted_string_to_anagram: DefaultDict[str, List[str]] = defaultdict(list)

    for w in words:
        w_sorted = ''.join(sorted(w))       # sorted() returns a list of characters
        sorted_string_to_anagram[w_sorted].append(w)

    return [
        group for group in sorted_string_to_anagram.values() if len(group) >= 2
    ]

words = ['debitcard', 'elvis', 'silent', 'badcredit', 'lives', 'freedom', 'listen', 'levis', 'money']
find_anagrams(words)

[['debitcard', 'badcredit'], ['elvis', 'lives', 'levis'], ['silent', 'listen']]

In [10]:
w = sorted('cat')
w

['a', 'c', 't']
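
Sorting is one way to build the representative; another is to key on letter counts, which avoids the $O(k \log k)$ sort for a word of length $k$. The sketch below is an assumed variant, not the approach used above: the name `find_anagrams_by_counts` is hypothetical, and it assumes every word consists only of lowercase letters `a` to `z`.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def find_anagrams_by_counts(words: List[str]) -> List[List[str]]:
    # Assumption: words contain only lowercase ASCII letters a-z.
    counts_to_anagrams: DefaultDict[Tuple[int, ...], List[str]] = defaultdict(list)

    for w in words:
        counts = [0] * 26
        for ch in w:
            counts[ord(ch) - ord('a')] += 1
        # Lists are unhashable, so the key is the tuple of counts.
        counts_to_anagrams[tuple(counts)].append(w)

    return [group for group in counts_to_anagrams.values() if len(group) >= 2]


words = ['debitcard', 'elvis', 'silent', 'badcredit', 'lives', 'freedom', 'listen', 'levis', 'money']
print(find_anagrams_by_counts(words))
# [['debitcard', 'badcredit'], ['elvis', 'lives', 'levis'], ['silent', 'listen']]
```

The trade-off is a fixed 26-entry key per word, so for short words the sorted-string key is likely just as fast and simpler to read.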