# Hash Tables, Hash Functions, Hash Sets, and Hash Maps

## Hash Tables

A **hash table** (or hash map) is a data structure that implements an associative array, a structure that maps keys to values. It uses a **hash function** to compute an index into an array of buckets or slots, and the value is stored at (or found from) that index.

### Key Characteristics:
- **Average Time Complexity**: O(1) for search, insert, and delete operations
- **Worst Case Time Complexity**: O(n), when many keys collide into the same bucket or probe sequence
- **Space Complexity**: O(n)
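
The worst case can be made visible with a small experiment. This is a sketch only: `CollidingKey` is a hypothetical class (not part of these notes) whose hash is deliberately constant, so every instance lands on the same probe sequence. Looking up an absent key then has to compare against every stored key.

```python
class CollidingKey:
    """Hypothetical key type whose hash is deliberately constant."""

    eq_calls = 0  # counts how many equality comparisons were made

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # every instance collides with every other

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


n = 200
keys = {CollidingKey(i) for i in range(n)}

# Look up a key that is not in the set and count the comparisons needed.
CollidingKey.eq_calls = 0
found = CollidingKey(-1) in keys
comparisons = CollidingKey.eq_calls

print(found)              # False
print(comparisons >= n)   # True: the lookup degraded to a linear scan
```

The set still behaves correctly; only its speed suffers, which is why the quality of the hash function matters.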

## Hash Functions

A **hash function** is an algorithm that maps input data (keys) of arbitrary size to fixed-size values (hash codes). A good hash function has the following properties:

- **Deterministic**: Same input always produces the same output
- **Uniform Distribution**: Hash values should be evenly distributed across the range
- **Efficient**: Quick to compute
- **Minimize Collisions**: Reduce instances where different keys produce the same hash value
- **Avalanche Effect**: Small changes in input produce drastically different hash values (essential for cryptographic hashes, helpful but not strictly required for hash tables)

### Common Hash Function Methods:
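- Division Method: `hash(key) = key mod m`, where `m` is the number of buckets (a prime `m` is a common choice)
- Multiplication Method: `hash(key) = floor(m * (A*key mod 1))`, where `A` is a constant with `0 < A < 1`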
- Folding Method: Divide key into parts and combine them
- Mid-Square Method: Square the key and extract middle digits
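
The four methods above can be sketched for non-negative integer keys as follows. The function names, the default `A` (the fractional part of the golden ratio, a commonly suggested constant), the two-digit part size for folding, and the exact choice of "middle" digits are illustrative assumptions, not fixed definitions.

```python
import math


def division_hash(key: int, m: int) -> int:
    """Division method: remainder of the key divided by the table size m."""
    return key % m


def multiplication_hash(key: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: scale the fractional part of key * A to [0, m)."""
    fractional = (key * A) % 1.0
    return math.floor(m * fractional)


def folding_hash(key: int, m: int, part_size: int = 2) -> int:
    """Folding method: split the decimal digits into parts, sum them, reduce mod m."""
    digits = str(key)
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    return sum(parts) % m


def mid_square_hash(key: int, m: int, r: int = 2) -> int:
    """Mid-square method: square the key and take r digits from the middle."""
    squared = str(key * key)
    start = max(0, len(squared) // 2 - r // 2)
    return int(squared[start:start + r]) % m


print(division_hash(1234, 10))     # 4
print(folding_hash(123456, 100))   # 12 + 34 + 56 = 102, and 102 mod 100 = 2
print(mid_square_hash(1234, 100))  # 1234**2 = 1522756, middle digits "22" -> 22
print(multiplication_hash(1234, 16))
```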

## Hash Collisions and Resolution

When two different keys produce the same hash value, a collision occurs. Common resolution techniques:

- **Chaining**: Store colliding elements in a linked list at that index
- **Open Addressing**: Find another empty slot (Linear Probing, Quadratic Probing, Double Hashing)
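
Chaining can be illustrated with a toy implementation. This is a minimal sketch, not a production design: `ChainedHashMap` and its method names are hypothetical, Python lists stand in for the linked lists, the bucket count is fixed (no resizing), and the built-in `hash()` is used as the hash function.

```python
class ChainedHashMap:
    """Toy hash map using separate chaining; lists stand in for linked lists."""

    def __init__(self, num_buckets: int = 8):
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash code to one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: update in place
                return
        bucket.append((key, value))       # new key: append to the chain
        self._size += 1

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size


table = ChainedHashMap(num_buckets=2)
for k, v in [(0, 'a'), (2, 'b'), (4, 'c')]:  # even ints all land in bucket 0
    table.put(k, v)
print(table.get(2), len(table))  # b 3
```

With only two buckets, the keys 0, 2, and 4 share one chain, so each lookup walks that chain. A real implementation would grow the bucket array as the table fills to keep chains short.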

## Hash Sets

A **hash set** is a data structure that stores unique values. It uses a hash table internally, keeping only keys and no associated values.

### Characteristics:
- No duplicate elements
- No guaranteed order of elements
- Average O(1) time for add, remove, and lookup operations
- Useful for membership testing and removing duplicates

### Common Operations:
- `add(element)`: Add an element to the set
- `remove(element)`: Remove an element
- `contains(element)`: Check if element exists
- `size()`: Return the number of elements

## Hash Maps

A **hash map** (or hash dictionary) is a data structure that stores key-value pairs. It uses a hash function to map each key to the bucket where its value is stored.

### Characteristics:
- Each key is unique and maps to exactly one value
- Average O(1) time for insertion, deletion, and lookup
- No guaranteed order (though some implementations maintain insertion order)
- Unlike a hash set, stores a value alongside each key

### Common Operations:
- `put(key, value)`: Insert or update a key-value pair
- `get(key)`: Retrieve the value associated with a key
- `remove(key)`: Delete a key-value pair
- `containsKey(key)`: Check if a key exists
- `size()`: Return the number of key-value pairs

## Comparison Table

| Feature | Hash Table | Hash Set | Hash Map |
|---------|-----------|----------|----------|
| **Stores** | Key-value pairs | Unique values only | Key-value pairs |
| **Duplicates** | Keys unique, values can repeat | No duplicates | Keys unique, values can repeat |
| **Lookup** | O(1) average | O(1) average | O(1) average |
| **Use Case** | General key-value storage | Membership testing | Storing related data pairs |

## Advantages and Disadvantages

### Advantages:
- Fast average-case operations (O(1))
- Efficient for large datasets
- Flexible key-value associations

### Disadvantages:
- Poor worst-case performance (O(n)) with many collisions
- Memory overhead due to empty slots
- No ordering guarantees (in most implementations)
- Hash function quality is critical

## Hash Sets in Python

In [1]:
s = set()
print("Initial set:", s)

Initial set: set()


In [None]:
# Add items to the set - O(1)
s.add(1)
s.add(2)
s.add(3)
print("Set after adding elements:", s)

Set after adding elements: {1, 2, 3}


In [4]:
# Lookup items in the set - O(1)
print(2 in s)

True


In [5]:
# Remove an item from the set - O(1)
s.remove(2)
print("Set after removing element 2:", s)

Set after removing element 2: {1, 3}
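
One detail worth knowing about removal: `set.remove` raises `KeyError` when the element is absent, while `set.discard` silently does nothing. A short sketch, using a fresh set with the same contents as above:

```python
s = {1, 3}

# remove() raises KeyError for a missing element
try:
    s.remove(2)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# discard() ignores a missing element
s.discard(2)
print(s)  # {1, 3}
```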


In [14]:
string = 'aaaaaabbbbbbbcccccccccccccccccccccccee'
char_set = set(string) # Set creation O(S) - S is the size of the string
char_set

{'a', 'b', 'c', 'e'}

## Hash Maps in Python

In Python, the built-in `dict` is a hash map.

In [18]:
d = {'a': 6, 'b': 7, 'c': 21, 'e': 2}
d

{'a': 6, 'b': 7, 'c': 21, 'e': 2}

In [19]:
# Add an item to the dictionary - O(1)
d['d'] = 4

In [20]:
# Check for existence of a key - O(1)
if 'c' in d:
    print(True)

True


In [21]:
# Check the value associated with a key - O(1)
value_c = d['c']
value_c

21
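
Indexing with a missing key raises `KeyError`. When a key may be absent, `dict.get` returns `None` or a chosen default instead. A short sketch with the same dictionary contents as above:

```python
d = {'a': 6, 'b': 7, 'c': 21, 'e': 2}

# Indexing a missing key raises KeyError
try:
    d['z']
    raised = False
except KeyError:
    raised = True

missing = d.get('z')      # None when no default is given
fallback = d.get('z', 0)  # the supplied default otherwise
print(raised, missing, fallback)  # True None 0
```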

In [22]:
# defaultdict: a dict that creates a default value for missing keys (int() gives 0)
from collections import defaultdict
dd = defaultdict(int)
dd

defaultdict(int, {})
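
A typical use of `defaultdict(int)` is counting without checking whether the key exists first. The word `'mississippi'` below is just an illustrative input.

```python
from collections import defaultdict

word = 'mississippi'  # illustrative input
counts = defaultdict(int)
for ch in word:
    counts[ch] += 1  # a missing key starts at int() == 0

print(dict(counts))  # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Note: merely reading a missing key inserts it with the default value.
value = counts['z']
print(value, 'z' in counts)  # 0 True
```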

In [23]:
# Counter: a dict subclass that counts hashable items - O(S) to build
from collections import Counter
cnt = Counter(string)
cnt

Counter({'c': 23, 'b': 7, 'a': 6, 'e': 2})
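
Two `Counter` behaviours are handy in practice: `most_common(k)` returns the `k` highest counts, and looking up a missing key returns 0 without inserting it (unlike `defaultdict`). A short sketch on an illustrative word:

```python
from collections import Counter

word_counts = Counter('mississippi')  # illustrative input

top_two = word_counts.most_common(2)
print(top_two)  # [('i', 4), ('s', 4)]

# A missing key reads as 0 and is not added to the Counter.
print(word_counts['z'], 'z' in word_counts)  # 0 False
```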