# 🔐 Hashing in Data Structures

📌 What is Hashing?

Hashing is a technique that uses a hash function to map data (keys) to a fixed-size integer (hash code), which is then reduced to an index in an array.
The data is stored at that index in a hash table, allowing O(1) average time complexity for:

🔍 Search

➕ Insert

➖ Delete

📦 Why Use Hashing?

Efficient lookup and modification

Avoids the O(n) cost of linear search and the O(log n) cost of binary search (which also requires sorted data)

Excellent for large datasets and real-time lookups

🧮 Hash Table = Array + Hash Function

In [1]:
# Example using Python dictionary (built-in hash table)

hash_table = {}
hash_table["apple"] = 10
print(hash_table["apple"])  # Output: 10


10
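The same dictionary supports all three average O(1) operations listed at the top (search, insert, delete). A minimal sketch using only standard `dict` operations; the name `stock` and the fruit values are arbitrary examples:

```python
# Insert, search, and delete with Python's built-in hash table (dict)
stock = {}

# Insert: the key is hashed to decide where the pair is stored
stock["apple"] = 10
stock["banana"] = 5

# Search: both the membership test and .get() hash the key again
print("apple" in stock)        # True
print(stock.get("cherry", 0))  # 0 (missing key -> default value)

# Delete: removes the key-value pair
del stock["banana"]
print(stock)                   # {'apple': 10}
```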


# 🧠 Applications of Hashing
Checking for duplicates

Frequency counting

Caching (LRU Cache)

Dictionaries and sets

Cryptography (not in basic DSA)

Implementing hash-based maps and sets: unordered_map / unordered_set (C++), HashMap / HashSet (Java), dict / set (Python)
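Checking for duplicates, the first application above, becomes a single pass with a hash set instead of comparing every pair. A minimal sketch; the function name `has_duplicates` is just illustrative:

```python
def has_duplicates(items):
    """Return True if any value appears more than once (O(n) average)."""
    seen = set()              # hash set: O(1) average membership test
    for item in items:
        if item in seen:      # already hashed once before -> duplicate
            return True
        seen.add(item)
    return False


print(has_duplicates([3, 1, 4, 1, 5]))  # True
print(has_duplicates([2, 7, 1, 8]))     # False
```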



# 🧰 Collision Resolution Techniques in Hashing
1. Separate Chaining

Each index (bucket) holds a linked list of entries to handle collisions

Multiple keys that map to the same bucket are all stored in that bucket's list



In [2]:
# Conceptual structure in Python:
hash_table = [[] for _ in range(10)]
index = hash("key") % 10
hash_table[index].append(("key", "value"))
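The cell above only shows insertion. A minimal separate-chaining table with insert, search, and delete might look like the sketch below; the class name `ChainedHashTable` and its method names are hypothetical, and a Python list stands in for each bucket's linked list:

```python
class ChainedHashTable:
    """Illustrative separate-chaining hash table (not production code)."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per index

    def _index(self, key):
        return hash(key) % self.size  # built-in hash reduced to a bucket index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly a collision): append

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:  # scan only this chain
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(size=10)
table.put("key", "value")
table.put("apple", 10)
print(table.get("apple"))   # 10
table.remove("apple")
print(table.get("apple"))   # None
```

Only the chain at one index is scanned per operation, so the average cost stays O(1) as long as chains remain short.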


# 2. Open Addressing
If a collision occurs, find another open slot using probing

# Types:

Linear Probing

Quadratic Probing

Double Hashing
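The three probing schemes differ only in the sequence of indexes they try after the home index. A small sketch that lists each sequence for one key; the helper names are hypothetical, and the second hash `5 - (key % 5)` is an assumed choice (prime 5):

```python
def linear_probe(h, size):
    """Indexes tried by linear probing: h, h+1, h+2, ... (mod size)."""
    return [(h + i) % size for i in range(size)]


def quadratic_probe(h, size):
    """Indexes tried by quadratic probing: h, h+1², h+2², ... (mod size)."""
    return [(h + i * i) % size for i in range(size)]


def double_hash_probe(h1, h2, size):
    """Indexes tried by double hashing: h1, h1+h2, h1+2*h2, ... (mod size)."""
    return [(h1 + i * h2) % size for i in range(size)]


size = 7
key = 17
h = key % size        # primary hash -> 3
h2 = 5 - (key % 5)    # assumed second hash with prime 5 -> 3

print(linear_probe(h, size))           # [3, 4, 5, 6, 0, 1, 2]
print(quadratic_probe(h, size))        # [3, 4, 0, 5, 5, 0, 4]
print(double_hash_probe(h, h2, size))  # [3, 6, 2, 5, 1, 4, 0]
```

Note that the quadratic sequence repeats indexes and never visits some slots, while linear probing and this double-hashing step both cover every slot of the size-7 table.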






In [3]:
# 🔧 Custom Hash Function (Trivial Example)

def simple_hash(key, table_size):
    return sum(ord(c) for c in key) % table_size

⚠️ Limitations of Hashing

Hash collisions (multiple keys → same index)

Needs good hash function for uniform distribution

Doesn’t maintain sorted order of keys (unlike TreeMap in Java)

Hashing is not suitable for range queries (use BST or Segment Tree instead)
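To see why range queries are awkward, compare a dictionary scan with an ordered structure. A balanced BST or Segment Tree is the usual choice; as a simpler stand-in, this sketch uses a sorted list with the standard `bisect` module. The `prices` data is made up for illustration:

```python
import bisect

prices = {"pen": 12, "book": 250, "bag": 799, "cap": 150, "mug": 99}

# Hash table: no key order, so a range query must scan every entry -> O(n)
in_range = [k for k, v in prices.items() if 100 <= v <= 300]
print(in_range)                      # ['book', 'cap']

# Sorted list: binary search finds both range boundaries in O(log n)
sorted_values = sorted(prices.values())
lo = bisect.bisect_left(sorted_values, 100)
hi = bisect.bisect_right(sorted_values, 300)
print(sorted_values[lo:hi])          # [150, 250]
```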

In [4]:
# Frequency count
from collections import Counter
arr = [1, 2, 2, 3, 1, 2]
freq = Counter(arr)
print(freq)  # Output: Counter({2: 3, 1: 2, 3: 1})


Counter({2: 3, 1: 2, 3: 1})
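`Counter` is a dict subclass, so the same frequency count can be written by hand with a plain dictionary. A minimal sketch:

```python
arr = [1, 2, 2, 3, 1, 2]
freq = {}
for x in arr:
    freq[x] = freq.get(x, 0) + 1   # hash lookup + update, O(1) average each
print(freq)  # {1: 2, 2: 3, 3: 1}
```

The counts match the `Counter` result; only the display order differs, because a plain dict keeps insertion order while `Counter`'s repr lists the most common elements first.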




#  1️⃣ Hash Function

📌 Definition: A function that maps input data of arbitrary size (like a string) to a fixed-size integer (usually an index in a hash table).

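💡 Example (illustrative index values from a hypothetical hash function for a small table, not the output of Python's built-in hash()):

hash("Apple") → 18

hash("Appmillers") → 22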

✅ Used to determine the position where the data (key) will be stored in the hash table.
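Python's built-in `hash()` returns an arbitrary (possibly large or negative) integer, so a table still has to reduce it to a valid index. A minimal sketch, assuming a table of 24 cells. String hashes are randomized per process by default, so the index for "Apple" can change between runs:

```python
table_size = 24

h = hash("Apple")          # large integer; varies between runs for strings
index = h % table_size     # % with a positive divisor is never negative
print(0 <= index < table_size)  # True

print(hash(42))            # 42: in CPython, small ints hash to themselves
```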

# 2️⃣ Key

📌 Definition: The input data provided by the user to the hash function.

🧠 Examples:

"Apple" and "Appmillers" are the keys in the hash function examples above.

In my_dict = {"apple": 10}, "apple" is the key.

# 3️⃣ Hash Value (or Hash Code)

📌 Definition: The integer returned by the hash function after processing the key.

🧠 Also called:

Hash Code

Digest

Hash

💡 Example (illustrative value):

hash("Application") → 20

# 4️⃣ Hash Table

📌 Definition: A data structure that stores data in key-value pairs using the index returned by the hash function.

🧠 Implemented as:

Array or list

Backed by hash function for index mapping

💡 In Python, a dict is a built-in implementation of a hash table.

my_dict = {"apple": 10}  # internally uses hashing to store and retrieve
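Because a dict hashes its keys, every key must be hashable. A short sketch showing that immutable types such as tuples work as keys while a mutable list is rejected:

```python
my_dict = {"apple": 10}

# Immutable built-ins (str, int, tuple, ...) are hashable, so they can be keys
my_dict[(1, 2)] = "tuple key"

# A list is mutable and unhashable, so using it as a key raises TypeError
try:
    my_dict[[1, 2]] = "list key"
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'

print(len(my_dict))  # 2
```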

# 5️⃣ Collision

📌 Definition: A situation where two different keys generate the same hash value (i.e., the same index in the hash table).

💥 Example (illustrative values):

hash("Appmillers") → 22

hash("AnotherKey") → 22

🔁 Both try to occupy index 22 → this causes a collision.

⚠️ In this case, the second key cannot be placed at index 22 unless the collision is resolved.




# 🔑 What is a Hash Function?
A hash function converts input data (like numbers or strings) into a fixed-size index value, typically used to store or retrieve data in a hash table efficiently.

# 🧮 Two Sample Hash Functions:

1. Mod Hash Function (For Integers):

Input: integer, number of cells (size of hash table)

Formula: index = number % number_of_cells

Example:

400 % 24 = 16 → Store 400 at index 16

700 % 24 = 4 → Store 700 at index 4

Benefit: Simple and fast.

Time Complexity to access: O(1)
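The mod hash function above translates directly into code. A minimal sketch; the function name `mod_hash` is illustrative:

```python
def mod_hash(number, number_of_cells):
    """Mod hash: map an integer key to an index in [0, number_of_cells)."""
    return number % number_of_cells


cells = [None] * 24                 # hash table with 24 cells
for n in (400, 700):
    cells[mod_hash(n, 24)] = n      # O(1) placement at the computed index

print(mod_hash(400, 24), mod_hash(700, 24))  # 16 4
```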

2. ASCII-Based Hash Function (For Strings):

Input: string, number of cells

Steps:

Convert each character to its ASCII value using Python’s ord() function.

Sum up all ASCII values.

Use mod function to find index: sum % number_of_cells

Example:

"ABC" → ASCII: 65 + 66 + 67 = 198

198 % 24 = 6 → Store "ABC" at index 6
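The ASCII-based steps can be sketched as follows (the same idea as `simple_hash` earlier; the name `ascii_hash` is illustrative). The second call shows a weakness of plain summation: anagrams have the same character sum, so they always collide.

```python
def ascii_hash(string, number_of_cells):
    """Sum the ASCII codes of all characters, then reduce with mod."""
    total = sum(ord(ch) for ch in string)  # "ABC" -> 65 + 66 + 67 = 198
    return total % number_of_cells


print(ascii_hash("ABC", 24))  # 6
print(ascii_hash("CBA", 24))  # 6: same characters, same sum, same index
```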

# ✅ Characteristics of a Good Hash Function:

Uniform Distribution:

Hash values should spread data evenly across all indices.

Prevents clustering (many items going to the same index → collisions).

Uses All Input Data:

Should consider all characters in the input, not just a subset.

Example: If only the first 3 characters are used, "HELLO123" and "HELLO999" will always hash to the same value, because both start with "HEL".

Result: More collisions → Bad performance.
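A quick comparison of a hypothetical prefix-only hash against one that uses every character, both based on the ASCII-sum idea with 24 cells:

```python
def prefix_hash(string, number_of_cells):
    """Bad hash (hypothetical): looks only at the first 3 characters."""
    return sum(ord(ch) for ch in string[:3]) % number_of_cells


def full_hash(string, number_of_cells):
    """Better: every character contributes to the hash."""
    return sum(ord(ch) for ch in string) % number_of_cells


a, b = "HELLO123", "HELLO999"
print(prefix_hash(a, 24) == prefix_hash(b, 24))  # True: both start with "HEL"
print(full_hash(a, 24), full_hash(b, 24))        # 18 15
```

Using all characters separates these two keys, although a plain sum still collides for rearrangements such as "HELLO123" and "HELLO321".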

# 🚫 What is a Collision?
A collision occurs when two different inputs map to the same index in the hash table.

Example: If "CAT" and "CAR" both map to index 20, a collision occurs.


# 🧠 Scenario Setup:
Suppose we have a hash table of size 7 (indexes 0 to 6) and a simple hash function:

hash(key) = key % 7

# 🎯 Keys to Insert: 10, 17, 24
Now let’s apply each collision resolution technique

# 1. ✅ Direct Chaining (with Linked List)


hash(10) = 10 % 7 = 3   → insert 10 at index 3

hash(17) = 17 % 7 = 3   → collision! insert 17 in linked list at index 3

hash(24) = 24 % 7 = 3   → collision! insert 24 in linked list at index 3


🧾 Table:

Index | Values
--------------
  3   | 10 → 17 → 24

# 📌 Example:

Keys: [10, 20, 30], Size: 10

Hash Function: key % 10

10 % 10 = 0 → Bucket 0: [10]

20 % 10 = 0 → Bucket 0: [10, 20]

30 % 10 = 0 → Bucket 0: [10, 20, 30]

In [9]:
# Direct Chaining using dictionary of lists
size = 10
hash_table = {i: [] for i in range(size)}
keys = [10, 20, 30]

for key in keys:
    index = key % size
    hash_table[index].append(key)

# Print hash table
print("Index | Chain")
print("-------------")
for i in range(size):
    if hash_table[i]:
        print(f"{i:>5} | {hash_table[i]}")


Index | Chain
-------------
    0 | [10, 20, 30]


# 2. ✅ Linear Probing

hash(10) = 3 → index 3 empty → insert 10

hash(17) = 3 → collision! → try 4 → empty → insert 17

hash(24) = 3 → collision! → 4 taken → try 5 → empty → insert 24

🧾 Table:

Index | Values
--------------
  3   | 10
  4   | 17
  5   | 24


# 📌 Example:

Keys: [10, 20, 30], Size: 10

Hash Function: key % 10

10 % 10 = 0 → index 0 empty → index 0: [10]

20 % 10 = 0 → collision → check 1 → empty → index 1: [20]

30 % 10 = 0 → collision → check 1 (filled) → check 2 → empty → index 2: [30]

In [10]:
# Linear Probing
size = 10
hash_table = [None] * size
keys = [10, 20, 30]

for key in keys:
    index = key % size
    while hash_table[index] is not None:
        index = (index + 1) % size
    hash_table[index] = key

# Print hash table
print("Index | Value")
print("--------------")
for i in range(size):
    if hash_table[i] is not None:
        print(f"{i:>5} | {hash_table[i]}")


Index | Value
--------------
    0 | 10
    1 | 20
    2 | 30
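Searching a linear-probing table follows the same probe sequence as insertion and can stop at the first empty slot, since the key would have been placed there. A minimal sketch; the function name `linear_probe_search` is hypothetical:

```python
def linear_probe_search(table, key):
    """Return the index of key in a linear-probing table, or -1 if absent."""
    size = len(table)
    index = key % size
    for _ in range(size):             # at most one full pass over the table
        if table[index] is None:      # empty slot: key was never inserted
            return -1
        if table[index] == key:
            return index
        index = (index + 1) % size    # same step as insertion
    return -1


table = [10, 20, 30, None, None, None, None, None, None, None]
print(linear_probe_search(table, 30))  # 2
print(linear_probe_search(table, 40))  # -1
```

Because search stops at the first empty slot, deleting a key by writing `None` back would hide keys stored further along the probe sequence; a common workaround is a special "deleted" marker (tombstone).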


# 3. ✅ Quadratic Probing
📌 Idea: When a collision occurs, we look for the next index by jumping in squares:

Jump by: 1², 2², 3², ...

Formula:

index = (hash(key) + i²) % table_size

🔍 Example:

Table size: 7, keys: [10, 17, 24]

Hash function: key % 7

10 % 7 = 3 → Put at index 3

17 % 7 = 3 → Collision at 3 → try (3 + 1²) % 7 = 4 → OK → Put at 4

24 % 7 = 3 → Collision at 3 → try (3 + 1²) % 7 = 4 → Collision → try (3 + 2²) % 7 = 7 % 7 = 0 → OK → Put at 0

👉 Final Table:

Index | Values
--------------
  0   | 24
  3   | 10
  4   | 17

In [7]:
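# Quadratic Probing: on a collision, try (h + i²) % size for i = 1, 2, 3, ...
size = 7
hash_table = [None] * size
keys = [10, 17, 24]

for key in keys:
    h = key % size                  # home index: hash(key) = key % size
    index = h
    i = 1
    while hash_table[index] is not None:
        if i >= size:               # offsets repeat after i = size - 1, so give up
            raise RuntimeError(f"no free slot reachable for key {key}")
        index = (h + i * i) % size  # jump 1², 2², 3², ... from the home index
        i += 1
    hash_table[index] = key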

# Print table
print("Index | Values")
print("--------------")
for i in range(size):
    if hash_table[i] is not None:
        print(f"  {i}   | {hash_table[i]}")


Index | Values
--------------
  0   | 24
  3   | 10
  4   | 17
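Quadratic probing does not necessarily visit every slot, which is why a loop guard matters. This sketch lists the offsets i² % 7 and the slots reachable from home index 3 (the home index of 10, 17, and 24):

```python
size = 7
home = 3  # home index of keys 10, 17, 24 under key % 7

# Offsets i² % size for i = 0 .. size-1; larger i only repeats these values
offsets = sorted({(i * i) % size for i in range(size)})
reachable = sorted((home + off) % size for off in offsets)

print(offsets)    # [0, 1, 2, 4]
print(reachable)  # [0, 3, 4, 5]
```

Indexes 1, 2, and 6 are never probed for these keys: after a key such as 31 fills index 5, a further key such as 38 (also home index 3) finds no free slot even though three slots are empty. A commonly cited rule is that with a prime table size, quadratic probing always finds a free slot while the table is at most half full.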


# 4. ✅ Double Hashing

📌 Idea: On a collision, a second hash function determines the step size for the next probe.

Formula:

h1 = key % size

h2 = prime - (key % prime)   ← second hash (key % prime is between 0 and prime - 1, so h2 is between 1 and prime and never 0)

index = (h1 + i*h2) % size

🔍 Example:

Keys: [10, 17, 24], Size: 7, Prime: 5

10 % 7 = 3 → OK → Put at 3

17 % 7 = 3 → Collision → h2 = 5 - (17 % 5) = 3 → (3 + 1×3) % 7 = 6 → Put at 6

24 % 7 = 3 → Collision → h2 = 5 - (24 % 5) = 1 → (3 + 1×1) % 7 = 4 → Put at 4

👉 Final Table:

Index | Values
--------------
  3   | 10
  4   | 24
  6   | 17

In [8]:
# Double Hashing
size = 7
prime = 5
hash_table = [None] * size
keys = [10, 17, 24]

for key in keys:
    h1 = key % size
    h2 = prime - (key % prime)
    i = 0
    index = h1
    while hash_table[index] is not None:
        i += 1
        index = (h1 + i * h2) % size
    hash_table[index] = key

# Print table
print("Index | Values")
print("--------------")
for i in range(size):
    if hash_table[i] is not None:
        print(f"  {i}   | {hash_table[i]}")


Index | Values
--------------
  3   | 10
  4   | 24
  6   | 17
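Lookup in a double-hashing table recomputes both hashes and walks the same sequence h1, h1 + h2, h1 + 2·h2, ... until it finds the key or an empty slot. A minimal sketch, assuming the same `prime = 5` as above; the function name `double_hash_search` is hypothetical:

```python
def double_hash_search(table, key, prime=5):
    """Return the index of key in a double-hashing table, or -1 if absent."""
    size = len(table)
    h1 = key % size
    h2 = prime - (key % prime)        # step size, never 0
    for i in range(size):
        index = (h1 + i * h2) % size  # same probe sequence as insertion
        if table[index] is None:      # empty slot: key is not in the table
            return -1
        if table[index] == key:
            return index
    return -1


table = [None, None, None, 10, 24, None, 17]
print(double_hash_search(table, 17))  # 6
print(double_hash_search(table, 31))  # -1
```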
