# Hash Tables (Dictionaries) - Complete Interview Guide

## Master Hash Tables for Technical Interviews

Hash Tables (called dictionaries in Python) are one of the **most versatile** data structures. They enable O(1) average-case lookups and are used in countless interview problems.

### What You'll Learn
1. Hash table fundamentals and Python dictionaries
2. Hash functions and collision handling
3. Common operations and their complexities
4. Hash map patterns for interviews
5. Common interview problems using hash tables
6. Time/space complexity analysis
7. Interview tips and tricks

---


In [None]:
from typing import List, Dict, Set, Tuple
from collections import defaultdict, Counter, OrderedDict
import time

print("Hash Tables (Dictionaries) - Complete Interview Guide")
print("=" * 60)
print("\nHash tables are your best friend in coding interviews!")
print("Master these patterns to solve problems efficiently.\n")


## 1. Hash Table Fundamentals

### What is a Hash Table?

A **hash table** (dictionary in Python) stores key-value pairs using a hash function to map keys to array indices. This enables O(1) average-case access.

### Key Components:

1. **Hash Function**: Maps keys to array indices
2. **Buckets/Array**: Stores key-value pairs
3. **Collision Handling**: Resolves cases where multiple keys hash to the same index
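
The three components above can be sketched as a toy class. This is a minimal illustration assuming separate chaining (each bucket is a list of pairs); the class name `ChainedHashTable` and its methods are hypothetical, it never resizes, and it is not how CPython's `dict` is implemented (CPython uses open addressing).

```python
from typing import Any, Hashable, List, Tuple


class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, capacity: int = 8) -> None:
        # Buckets/array: each bucket is a list of (key, value) pairs.
        self._buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key: Hashable) -> int:
        # Hash function step: map the key to a bucket index.
        return hash(key) % len(self._buckets)

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # Existing key: update in place
                return
        bucket.append((key, value))  # New key: colliding keys share the bucket
        self._size += 1

    def get(self, key: Hashable, default: Any = None) -> Any:
        # Collision handling: scan the bucket and compare keys for equality.
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def __len__(self) -> int:
        return self._size


table = ChainedHashTable(capacity=2)  # Tiny capacity forces collisions
for word, n in [("apple", 5), ("banana", 3), ("cherry", 8)]:
    table.put(word, n)
table.put("apple", 10)  # Update, not a new entry
print(table.get("apple"), table.get("orange", 0), len(table))
```

With only two buckets, at least two of the three keys must collide, yet lookups still return the right values because `get` compares keys inside the bucket.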

### Python Dictionary Operations Complexity

| Operation | Time Complexity | Notes |
|-----------|----------------|-------|
| Access (dict[key]) | O(1) average | Hash lookup |
| Insert/Update | O(1) average | Hash and store |
| Delete | O(1) average | Hash and remove |
| Search (key in dict) | O(1) average | Hash lookup |
| Iterate | O(n) | Visit all items |

**Note**: The worst case is O(n) when many keys collide (for example, if every key produces the same hash value), but this is rare with well-distributed hashes.
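
To make the worst case concrete, the sketch below uses a hypothetical key class, `CollidingKey`, whose `__hash__` always returns the same number, and counts how many equality comparisons one lookup needs. The exact count depends on the interpreter's dict implementation, so treat it as an illustration rather than a benchmark.

```python
class CollidingKey:
    """Hypothetical key whose hash is constant, so every instance collides."""

    eq_calls = 0  # Class-level counter of equality comparisons

    def __init__(self, value: int) -> None:
        self.value = value

    def __hash__(self) -> int:
        return 1  # Deliberately terrible: same hash for every key

    def __eq__(self, other: object) -> bool:
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


n = 200
d = {CollidingKey(i): i for i in range(n)}

CollidingKey.eq_calls = 0          # Ignore comparisons made while building
found = d[CollidingKey(n - 1)]     # Look up the last key inserted
print(f"value={found}, equality checks for one lookup={CollidingKey.eq_calls}")
```

With a well-behaved hash, a lookup usually needs one or two comparisons; here the dict has to fall back on comparing against many stored keys, which is the O(n) behavior described above.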


In [None]:
# Python dictionary basics
hash_map = {}

# Insert/Update - O(1)
hash_map['apple'] = 5
hash_map['banana'] = 3
hash_map['cherry'] = 8
print(f"After inserts: {hash_map}")

# Access - O(1)
print(f"\nValue for 'apple': {hash_map['apple']}")

# Check if key exists - O(1)
print(f"'banana' in dict: {'banana' in hash_map}")
print(f"'orange' in dict: {'orange' in hash_map}")

# Update existing key - O(1)
hash_map['apple'] = 10
print(f"\nAfter update: {hash_map}")

# Delete - O(1)
del hash_map['banana']
print(f"After delete: {hash_map}")

# Get with default value
print(f"\nGet 'orange' with default 0: {hash_map.get('orange', 0)}")

# Iterate
print("\nAll items:")
for key, value in hash_map.items():
    print(f"  {key}: {value}")

# Useful: defaultdict (auto-creates missing keys)
dd = defaultdict(int)
dd['a'] += 1  # No KeyError!
dd['b'] += 2
print(f"\nDefaultDict: {dict(dd)}")

# Useful: Counter (count occurrences)
counts = Counter(['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'])
print(f"Counter: {counts}")
print(f"Most common: {counts.most_common(2)}")


## 2. Common Hash Table Patterns

### Pattern 1: Frequency Counting

One of the most common uses: count occurrences of elements.

**Use Cases:**
- Count character frequencies
- Count element occurrences in arrays
- Group similar items


In [None]:
def count_frequency(arr: List) -> Dict:
    """Count frequency of each element."""
    freq = {}
    for item in arr:
        freq[item] = freq.get(item, 0) + 1
    return freq

# Using Counter (Pythonic way)
def count_frequency_counter(arr: List) -> Counter:
    """Count frequency using Counter."""
    return Counter(arr)

# Test
arr = [1, 2, 3, 2, 1, 1, 4, 2, 2]
freq1 = count_frequency(arr)
freq2 = count_frequency_counter(arr)

print(f"Array: {arr}")
print(f"Frequency (manual): {freq1}")
print(f"Frequency (Counter): {dict(freq2)}")
print(f"\nMost common: {freq2.most_common(2)}")

# Character frequency
text = "hello world"
char_freq = Counter(text)
print(f"\nCharacter frequency in '{text}':")
print(f"  {dict(char_freq)}")
print(f"  Most common: {char_freq.most_common(3)}")


### Pattern 2: Two Sum (Hash Table Solution)

**Problem**: Find two numbers that add up to target (unsorted array).

**LeetCode**: [Two Sum](https://leetcode.com/problems/two-sum/)


In [None]:
def two_sum(nums: List[int], target: int) -> List[int]:
    """
    Two Sum using hash table.
    Time: O(n), Space: O(n)
    """
    seen = {}  # value -> index
    
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    
    return []

# Test
nums = [2, 7, 11, 15]
target = 9
result = two_sum(nums, target)
print(f"Array: {nums}, Target: {target}")
print(f"Indices: {result}")
print(f"Values: {nums[result[0]]}, {nums[result[1]]}")
print(f"\nTime: O(n) - single pass")
print(f"Space: O(n) - hash table storage")

print("\nKey insight:")
print("- Store seen values as we iterate")
print("- Check if complement exists before storing current")


### Pattern 3: Group Anagrams

**Problem**: Group strings that are anagrams of each other.

**LeetCode**: [Group Anagrams](https://leetcode.com/problems/group-anagrams/)


In [None]:
def group_anagrams(strs: List[str]) -> List[List[str]]:
    """
    Group anagrams together.
    Time: O(n * k log k) where k = max string length
    Space: O(n * k)
    """
    groups = defaultdict(list)
    
    for s in strs:
        # Use sorted string as key
        key = ''.join(sorted(s))
        groups[key].append(s)
    
    return list(groups.values())

# Test
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
result = group_anagrams(words)
print(f"Input: {words}")
print(f"\nGrouped Anagrams:")
for i, group in enumerate(result, 1):
    print(f"  Group {i}: {group}")

print("\nKey insight:")
print("- Use sorted string as hash key")
print("- All anagrams have the same sorted representation")
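
A possible variation on the sorted-string key: if the inputs are known to contain only lowercase `a`-`z` (an assumption, not something the problem statement above guarantees), a 26-slot letter-count tuple can serve as the key and avoids the `k log k` sort. The function name `group_anagrams_by_count` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_anagrams_by_count(strs: List[str]) -> List[List[str]]:
    """Group anagrams using a letter-count tuple as the hash key.

    Assumes every string contains only lowercase a-z.
    Time: O(n * k), Space: O(n * k)
    """
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for ch in s:
            counts[ord(ch) - ord('a')] += 1
        # Lists are unhashable, so convert to a tuple before using as a key
        groups[tuple(counts)].append(s)
    return list(groups.values())


anagram_groups = group_anagrams_by_count(["eat", "tea", "tan", "ate", "nat", "bat"])
print(anagram_groups)
```

For short strings the sorted-string key is usually simpler and just as fast in practice; the count key mainly matters when strings are long.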


## 3. Advanced Hash Table Patterns

### Pattern 4: First Unique Character

**Problem**: Find first non-repeating character in string.

**LeetCode**: [First Unique Character](https://leetcode.com/problems/first-unique-character-in-a-string/)


In [None]:
def first_uniq_char(s: str) -> int:
    """
    Find index of first unique character.
    Time: O(n), Space: O(1) if the alphabet is fixed (e.g. 26 lowercase letters),
    otherwise O(number of distinct characters)
    """
    # Count frequency
    count = Counter(s)
    
    # Find first character with count == 1
    for i, char in enumerate(s):
        if count[char] == 1:
            return i
    
    return -1

# Test
test_cases = ["leetcode", "loveleetcode", "aabb"]
for s in test_cases:
    result = first_uniq_char(s)
    char = s[result] if result != -1 else "None"
    print(f"'{s}' -> Index: {result}, Char: '{char}'")


### Pattern 5: Contains Duplicate

**Problem**: Check if array contains duplicates.

**LeetCode**: [Contains Duplicate](https://leetcode.com/problems/contains-duplicate/)


In [None]:
def contains_duplicate(nums: List[int]) -> bool:
    """
    Check for duplicates using hash set.
    Time: O(n), Space: O(n)
    """
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

# Alternative: Compare lengths
def contains_duplicate_set(nums: List[int]) -> bool:
    """Using set length comparison."""
    return len(nums) != len(set(nums))

# Test
test_cases = [
    [1, 2, 3, 1],
    [1, 2, 3, 4],
    [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
]

for nums in test_cases:
    result = contains_duplicate(nums)
    print(f"{nums} -> Has duplicates: {result}")


## 4. Common Interview Problems

### Problem 1: Longest Consecutive Sequence

**LeetCode**: [Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/)


In [None]:
def longest_consecutive(nums: List[int]) -> int:
    """
    Find length of longest consecutive sequence.
    Time: O(n), Space: O(n)
    """
    if not nums:
        return 0
    
    num_set = set(nums)
    max_length = 0
    
    for num in num_set:
        # Only start counting if this is the start of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            
            # Count consecutive numbers
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            
            max_length = max(max_length, current_length)
    
    return max_length

# Test
nums = [100, 4, 200, 1, 3, 2]
result = longest_consecutive(nums)
print(f"Array: {nums}")
print(f"Longest consecutive sequence: {result}")
print(f"Sequence: [1, 2, 3, 4]")

print("\nKey insight:")
print("- Use set for O(1) lookups")
print("- Only start counting from sequence start (num-1 not in set)")


### Problem 2: Top K Frequent Elements

**LeetCode**: [Top K Frequent Elements](https://leetcode.com/problems/top-k-frequent-elements/)


In [None]:
def top_k_frequent(nums: List[int], k: int) -> List[int]:
    """
    Find k most frequent elements.
    Time: O(n log k) - most_common(k) selects the top k with a heap; a full sort would be O(n log n)
    Space: O(n)
    """
    # Count frequencies
    count = Counter(nums)
    
    # Take the k most frequent elements
    return [num for num, freq in count.most_common(k)]

# Using bucket sort for O(n) time
def top_k_frequent_bucket(nums: List[int], k: int) -> List[int]:
    """O(n) solution using bucket sort."""
    count = Counter(nums)
    n = len(nums)
    
    # Buckets: index = frequency, value = list of numbers
    buckets = [[] for _ in range(n + 1)]
    for num, freq in count.items():
        buckets[freq].append(num)
    
    # Collect top k from buckets
    result = []
    for i in range(n, 0, -1):
        result.extend(buckets[i])
        if len(result) >= k:
            break
    
    return result[:k]

# Test
nums = [1, 1, 1, 2, 2, 3]
k = 2
result1 = top_k_frequent(nums, k)
result2 = top_k_frequent_bucket(nums, k)
print(f"Array: {nums}")
print(f"Top {k} frequent (Counter): {result1}")
print(f"Top {k} frequent (Bucket): {result2}")


## 5. Python Collections Module

### Useful Hash Table Variants

1. **defaultdict**: Auto-creates missing keys
2. **Counter**: Specialized for counting
3. **OrderedDict**: Maintains insertion order (plain dicts also do this since Python 3.7) and adds reordering methods
4. **set**: Hash set (no duplicates)


In [None]:
# defaultdict - auto-creates missing keys
dd_int = defaultdict(int)
dd_int['a'] += 1  # No KeyError!
print(f"defaultdict(int): {dict(dd_int)}")

dd_list = defaultdict(list)
dd_list['fruits'].append('apple')
dd_list['fruits'].append('banana')
print(f"defaultdict(list): {dict(dd_list)}")

# Counter - counting made easy
counter = Counter(['a', 'b', 'c', 'a', 'b', 'a'])
print(f"\nCounter: {dict(counter)}")
print(f"Most common 2: {counter.most_common(2)}")

# Set - hash set (unique elements)
my_set = {1, 2, 3, 2, 1}
print(f"\nSet (unique elements): {my_set}")
print(f"Union: {my_set | {3, 4, 5}}")
print(f"Intersection: {my_set & {2, 3, 4}}")

# OrderedDict - maintains order (Python 3.7+ dicts already do this)
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
print(f"\nOrderedDict: {dict(od)}")
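
Since plain dicts already keep insertion order, the main reason to reach for `OrderedDict` is its reordering methods. The sketch below uses `move_to_end` and `popitem(last=False)`, the two calls an LRU-cache style problem typically relies on; the variable names are illustrative.

```python
from collections import OrderedDict

recent = OrderedDict()
for key in ["a", "b", "c"]:
    recent[key] = True

recent.move_to_end("a")                 # Mark "a" as most recently used
oldest, _ = recent.popitem(last=False)  # Evict the least recently used key
print(f"Evicted: {oldest}, remaining order: {list(recent)}")
```

After the move, the order is `b, c, a`, so `b` is the entry that gets evicted.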


## 6. Interview Tips & Strategies

### When to Use Hash Tables

✅ **Use hash tables when:**
- Need O(1) lookups
- Counting frequencies
- Grouping/partitioning data
- Finding duplicates
- Mapping relationships
- Caching/memoization

❌ **Don't use when:**
- Need sorted data (use BST)
- Memory is extremely limited
- Need range queries (use trees)
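
The list above points to trees for sorted or range-based access. Python's standard library has no balanced BST, so this sketch substitutes a sorted list with the `bisect` module to show the kind of range query a hash table cannot answer efficiently. The helper name `count_in_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List


def count_in_range(sorted_values: List[int], low: int, high: int) -> int:
    """Count values v with low <= v <= high in a sorted list in O(log n)."""
    return bisect_right(sorted_values, high) - bisect_left(sorted_values, low)


values = sorted([15, 3, 9, 27, 9, 42])  # [3, 9, 9, 15, 27, 42]
in_range = count_in_range(values, 5, 20)
print(f"Values between 5 and 20: {in_range}")
```

A dict or set would have to scan every key to answer the same question, because hashing discards ordering.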

### Common Patterns

| Pattern | Solution |
|---------|----------|
| Frequency counting | Counter or dict |
| Two Sum | Store seen values, look up complement |
| Group by key | defaultdict |
| Remove duplicates | set |
| Cache/memoization | dict |
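
The cache/memoization row is the one pattern in the table without an example elsewhere in this guide. A minimal sketch, using Fibonacci as a stand-in problem and passing the cache dict explicitly (both choices are assumptions made for illustration):

```python
from typing import Dict


def fib(n: int, cache: Dict[int, int]) -> int:
    """Return the n-th Fibonacci number, memoized in a plain dict."""
    if n < 2:
        return n
    if n not in cache:
        # Each subproblem is computed once, then served from the dict in O(1)
        cache[n] = fib(n - 1, cache) + fib(n - 2, cache)
    return cache[n]


memo: Dict[int, int] = {}
fib_50 = fib(50, memo)
print(f"fib(50) = {fib_50}, cached entries: {len(memo)}")
```

Without the dict, the same recursion would take exponential time; with it, each value from 2 to `n` is computed exactly once.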


## 7. Summary & Key Takeaways

### Essential Concepts:

1. **Hash Tables**:
   - O(1) average-case operations
   - Key-value storage
   - Fast lookups, inserts, deletes

2. **Python Dictionaries**:
   - Built-in hash table implementation
   - Use Counter for frequency
   - Use defaultdict to avoid KeyError
   - Use set for unique elements

3. **Common Patterns**:
   - Frequency counting
   - Two Sum problems
   - Grouping/partitioning
   - Duplicate detection

### Time/Space Complexity:

- **Operations**: O(1) average, O(n) worst case
- **Space**: O(n) to store n key-value pairs

### Interview Checklist:

- [ ] Can use Counter for frequency problems
- [ ] Can solve Two Sum with hash table
- [ ] Know when to use set vs dict
- [ ] Understand defaultdict usage
- [ ] Can optimize problems with hash tables

### Practice Problems:

**Easy:**
- Two Sum, Contains Duplicate, First Unique Character, Design HashMap

**Medium:**
- Group Anagrams, Top K Frequent, Longest Consecutive

**Hard:**
- Substring with Concatenation of All Words

---

**Resources:**
- LeetCode Hash Table Tag
- "Cracking the Coding Interview" - Hash Tables Chapter

---

**Hash tables are powerful! Use them wisely. 🚀**
