# 12 - Hashing

Welcome to the twelfth notebook in our `dsa-in-python` series! In this notebook, we will learn about:

- What is Hashing?
- Hash Functions
- Collision Handling
- Python Dictionary Internals
- Applications of Hashing

Let's dive into this powerful concept!

## What is Hashing?

**Hashing** is a technique that maps data of arbitrary size to fixed-size values. A hash table uses those values as array indices, which gives fast (O(1) on average) access, insertion, and deletion.

A **hash function** takes an input and returns an integer called the **hash code** or **hash value**.
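As a minimal sketch of this idea, Python's built-in `hash()` returns an integer for any hashable object, and a modulo operation reduces it to a bucket index. The helper `bucket_index` and the bucket count of 8 are illustrative assumptions, not part of any library.

```python
def bucket_index(key, num_buckets=8):
    """Map a hashable key to a bucket index in range(num_buckets)."""
    return hash(key) % num_buckets

# Small non-negative integers hash to themselves in CPython.
print(hash(42))          # 42
print(bucket_index(42))  # 2, because 42 % 8 == 2

# String hashes are randomized per process (see PYTHONHASHSEED), so the exact
# index differs between runs, but it always falls inside the table.
idx = bucket_index("apple")
print(0 <= idx < 8)      # True
```

Within a single run the same key always maps to the same index, which is what a hash table relies on.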

## Properties of a Good Hash Function

- **Deterministic**: Same input always gives the same output.
- **Uniform distribution**: Spread keys uniformly to avoid clustering.
- **Efficient**: Should compute quickly.
- **Minimize collisions**: Different inputs should have different outputs whenever possible.
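
To make these properties concrete, here is a small sketch that compares two toy hash functions on the same words. Both `length_hash` and `polynomial_hash` are hypothetical helpers written for this illustration; the word list and the table size of 8 are arbitrary choices.

```python
def length_hash(s, size):
    """Poor choice: many different words share the same length."""
    return len(s) % size

def polynomial_hash(s, size, base=31):
    """Polynomial hash: every character and its position affects the result."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % size
    return h

words = ["apple", "grape", "lemon", "mango", "peach", "melon", "guava", "olive"]
size = 8

# All eight words have five letters, so length_hash sends them to one bucket.
print({length_hash(w, size) for w in words})

# The polynomial hash spreads the same words over several buckets.
print({polynomial_hash(w, size) for w in words})
```

Both functions are deterministic and cheap to compute, but only the second one distributes these keys across the table.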

## Collision Handling Techniques

Sometimes different inputs produce the same hash value (called a **collision**). Strategies to handle this:

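- **Chaining**: Each bucket holds a list (classically a linked list) of all entries that hash to it.
- **Open Addressing**: On a collision, store the entry in another free slot of the same array, chosen by a probe sequence (e.g., linear probing, quadratic probing).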
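
Chaining is implemented later in this notebook, so the sketch below illustrates open addressing with linear probing instead. It is a deliberately simplified version under stated assumptions: the class name `LinearProbingTable` is hypothetical, the capacity is fixed, and deletion and resizing are left out.

```python
class LinearProbingTable:
    """Fixed-capacity open-addressing table (no resizing, no deletion)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _probe(self, key):
        """Yield slot indices from the home slot onward, wrapping around once."""
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def insert(self, key, value):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table is full")

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:
                return None  # reached an empty slot: the key was never stored
            if slot[0] == key:
                return slot[1]
        return None

t = LinearProbingTable(capacity=8)
t.insert(1, "one")
t.insert(9, "nine")            # 9 % 8 == 1 collides with key 1, so it moves to slot 2
print(t.search(9))             # nine
print(t.slots[1], t.slots[2])  # (1, 'one') (9, 'nine')
```

Supporting deletion would require marking removed slots (often called tombstones) so that later searches do not stop too early.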

## Hashing in Python

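Python’s `dict` and `set` are implemented using **hash tables** internally. In CPython, a lookup or insertion roughly works as follows:

- The key’s hash value is computed with `hash(key)`.
- The hash is reduced to an index into the internal array.
- Collisions are resolved with **open addressing**: if the slot is occupied by a different key, further slots are probed.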
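
Because a lookup uses both the hash and an equality check, objects used as keys need consistent `__hash__` and `__eq__` methods. The sketch below shows this with a hypothetical `Point` class made up for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must produce equal hashes; hashing the tuple ensures that.
        return hash((self.x, self.y))

grid = {Point(0, 0): "origin"}

# A distinct but equal object finds the same entry.
print(grid[Point(0, 0)])  # origin

# Mutable built-ins such as list are unhashable and cannot be dict keys.
try:
    {[1, 2]: "nope"}
except TypeError:
    print("lists cannot be used as dict keys")
```

Keys should also stay unchanged while they are stored, since mutating a key would change its hash and make the entry unreachable.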

In [3]:
# Simple demonstration
my_dict = {'apple': 1, 'banana': 2, 'cherry': 3}

print(my_dict['apple'])  # Access is O(1) on average

# Under the hood, Python computes hash('apple') to locate the entry's slot.

1


## Example: Custom Hash Table Implementation

In [1]:
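class HashTable:
    """Hash table that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, size=10):
        self.size = size
        # Each bucket is a list of [key, value] pairs.
        self.table = [[] for _ in range(size)]

    def hash_function(self, key):
        # Reduce Python's built-in hash to a valid bucket index.
        return hash(key) % self.size

    def insert(self, key, value):
        idx = self.hash_function(key)
        for pair in self.table[idx]:
            if pair[0] == key:
                pair[1] = value  # Key already present: overwrite its value.
                return
        self.table[idx].append([key, value])

    def search(self, key):
        # Returns the stored value, or None if the key is absent.
        idx = self.hash_function(key)
        for pair in self.table[idx]:
            if pair[0] == key:
                return pair[1]
        return None

    def delete(self, key):
        # Returns True if the key was found and removed, otherwise False.
        idx = self.hash_function(key)
        for i, pair in enumerate(self.table[idx]):
            if pair[0] == key:
                self.table[idx].pop(i)
                return True
        return False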

In [2]:
# Example usage
h = HashTable()
h.insert('apple', 10)
h.insert('banana', 20)

print(h.search('apple'))  # Output: 10
h.delete('apple')
print(h.search('apple'))  # Output: None

10
None


## Applications of Hashing

- Database indexing
- Caching (memoization)
- Cryptography and integrity checks (e.g., SHA-256; MD5 is no longer considered secure)
- Load balancing
- Data deduplication
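
Three of these applications can be sketched with the standard library alone. The function `fib` and the sample `emails` list are illustrative assumptions; `functools.lru_cache`, `set`, and `hashlib.sha256` are the real standard-library tools.

```python
import hashlib
from functools import lru_cache

# Caching: lru_cache stores results keyed by the (hashable) arguments.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025

# Deduplication: a set keeps one copy of each distinct hashable value.
emails = ["a@x.com", "b@x.com", "a@x.com"]
print(len(set(emails)))  # 2

# Cryptographic hashing: a SHA-256 digest is always 256 bits (64 hex characters).
digest = hashlib.sha256(b"hello").hexdigest()
print(len(digest))  # 64
```

Note that cryptographic hashes such as SHA-256 are designed to resist deliberate collisions, which is a different goal from the fast, non-secure hashing used inside `dict` and `set`.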

## Summary

- Hashing provides **fast data retrieval**: O(1) on average for lookup, insertion, and deletion.
- Python dictionaries and sets use hash tables.
- Collisions are handled using chaining or open addressing.

Next up: **13 - Bit Manipulation**! ⚡️ Let's go!