
# CSCI 3143 - Lab 8: Hashing

**Name:** ______  

## Learning goals
- Define *hash table*, *hash function*, *bucket/table size (m)*, *load factor (α)*, and *collision*.
- Compute and interpret load factor and relate it to performance.
- Practice basic hashing with modulo arithmetic and simple separate chaining.
- Perform quick self-checks to confirm understanding.

**Reading:** Miller & Ranum (Runestone) § 5.5



## Key definitions (quick read, ~3 min)
- **Hash table:** An array-backed structure that stores `(key → value)` pairs using a hash function to choose an index.
- **Hash function `h(key)`:** Maps a key to an integer, often reduced modulo table size `m` to get an index `i = h(key) mod m`.
- **Table size `m`:** Number of buckets (slots). In Python dicts, this is managed internally; in our demo, we'll choose it.
- **Collision:** When two different keys hash to the same index.
- **Separate chaining:** Each bucket holds a small list of `(key, value)` pairs; collisions append to the list.
- **Open addressing (FYI):** Collisions are resolved by probing alternative indices (linear/quadratic probing, double hashing).
- **Load factor `α` (alpha):** `α = n / m` where `n` = number of stored keys, `m` = number of buckets.  
  - With **chaining**, expected chain length ≈ `α` (on average).  
  - With **open addressing**, high `α` (close to 1) sharply increases probe lengths.
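
The rest of this lab uses chaining, so here is a minimal sketch of the open-addressing idea (linear probing) for comparison. It assumes integer keys, `h(key) = key`, and a fixed-size table; the function name `linear_probe_insert` and the sample keys are hypothetical, chosen so that several keys collide at index 3.

```python
from typing import List, Optional


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key using linear probing; return how many slots were examined."""
    m = len(table)
    start = key % m  # i = h(key) mod m, with h(key) = key (assumption)
    for step in range(m):
        i = (start + step) % m  # try the next slot, wrapping around
        if table[i] is None or table[i] == key:
            table[i] = key
            return step + 1
    raise RuntimeError("table is full")


table: List[Optional[int]] = [None] * 8
probes = [linear_probe_insert(table, k) for k in [3, 11, 19, 4, 27]]
alpha = sum(slot is not None for slot in table) / len(table)  # load factor n / m

print(table)   # [None, None, None, 3, 11, 19, 4, 27]
print(probes)  # [1, 2, 3, 3, 5]
print(alpha)   # 0.625
```

Keys 3, 11, 19, and 27 all map to index 3 when `m = 8`, so each one has to walk further along the cluster than the last. Key 4 does not collide at its home index with an equal hash, yet it still needs three probes because the cluster has already spilled into its slot.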

## Hash Functions 

**Task 1:** Write a hash function that takes a 10-digit phone number, folds the digits into groups of two, adds the groups, and takes the sum modulo the hash table size, $k$.

In [29]:
# Example input and processing
number = "436-555-4601"
num_list = [int(i) for i in number if i.isdigit()]
num_groups = [num_list[i : i + 2] for i in range(0, len(num_list), 2)]
print(num_groups)

[[4, 3], [6, 5], [5, 5], [4, 6], [0, 1]]


In [None]:
numbers = list(map(lambda i: 10 * i[0] + i[1], num_groups))  # type: ignore
sum(numbers) % 11

1
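
As a check on the result above, the fold-and-add arithmetic for `436-555-4601` with $k = 11$ works out as follows:

$$
43 + 65 + 55 + 46 + 01 = 210, \qquad 210 \bmod 11 = 1 \quad (\text{since } 210 = 19 \cdot 11 + 1)
$$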

In [None]:
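def folding_hash(s: str, k: int) -> int:
    # Keep only the digits, dropping separators such as "-"
    num_list = [int(c) for c in s if c.isdigit()]
    # Fold into pairs; assumes an even number of digits (e.g. a 10-digit phone number)
    num_groups = [num_list[i : i + 2] for i in range(0, len(num_list), 2)]
    # Turn each pair [a, b] into the two-digit number 10*a + b
    nums = [10 * a + b for a, b in num_groups]
    return sum(nums) % k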


number = "436-555-4601"
folding_hash(number, 11)  # should return 1

1

**Task 2:** Modify your function above to accept a digit string of any length, fold it into groups of $m$ digits, add the groups, and take the sum modulo the hash table size, $k$.

In [None]:
# Example input and processing
m = 1
number = "436-555-4601"
num_list = [i for i in number if i.isdigit()]
num_groups = [num_list[i : i + m] for i in range(0, len(num_list), m)]
num_list = list(map(lambda i: int("".join(i)), num_groups))
print(num_list)
print(sum(num_list) % 8)

[4, 3, 6, 5, 5, 5, 4, 6, 0, 1]
7


In [None]:
def folding_hash2(s: str, m: int, k: int) -> int:
    pass


number = "436-555-4601"
folding_hash2(number, 1, 11)  # should return 6 (digit sum 39, and 39 % 11 == 6)

6

## Collisions
Below is a minimal educational implementation of a hash table with separate chaining, supporting `put`, `get`, and `__len__`.  

**Purpose**: see how collisions land in the same bucket and how `α` grows with inserts.

In [24]:
from typing import Any, List, Tuple, Optional


class HashTableChaining:
    def __init__(self, m: int = 8):
        assert m > 0
        self._m = m
        self._buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(m)]
        self._n = 0  # number of keys

    @property  # read-only value with attribute syntax: ht.m, not ht.m()
    def m(self) -> int:
        return self._m

    @property
    def n(self) -> int:
        return self._n

    @property
    def load_factor(self) -> float:
        return self._n / self._m

    def _index(self, key: Any) -> int:
        return hash(key) % self._m

    def put(self, key: Any, value: Any) -> None:
        i = self._index(key)
        bucket = self._buckets[i]
        for idx, (k, _) in enumerate(bucket):
            if k == key:
                bucket[idx] = (key, value)  # update value
                return
        bucket.append((key, value))
        self._n += 1

    def get(self, key: Any) -> Optional[Any]:
        i = self._index(key)
        for k, v in self._buckets[i]:
            if k == key:
                return v
        return None

    def __len__(self) -> int:
        return self._n

    def bucket_lengths(self) -> List[int]:
        return [len(b) for b in self._buckets]

    def __repr__(self) -> str:
        return (
            f"HashTableChaining(m={self._m}, n={self._n}, alpha={self.load_factor:.3f})"
        )

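**Demo**: Insert a small set of keys and inspect bucket lengths and `α`. By default Python randomizes string hashes per process (see `PYTHONHASHSEED`), so your bucket lengths may differ from the output shown.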

In [36]:
ht = HashTableChaining(m=8)
for k in ["art", "math", "cs", "bio", "chem", "music", "history", "physics"]:
    ht.put(k, k.upper())
ht, ht.bucket_lengths(), {"α": ht.load_factor}

(HashTableChaining(m=8, n=8, alpha=1.000),
 [3, 2, 0, 0, 1, 1, 1, 0],
 {'α': 1.0})

**Task 3:** Try changing $m$ to 16, 32, and 4 and reinserting the same keys. How do chain lengths change? Why?

In [26]:
ht = HashTableChaining(m=4)
for k in ["art", "math", "cs", "bio", "chem", "music", "history", "physics"]:
    ht.put(k, k.upper())
ht, ht.bucket_lengths(), {"α": ht.load_factor}

(HashTableChaining(m=4, n=8, alpha=2.000), [4, 3, 1, 0], {'α': 2.0})

## Chain Lengths

**Task 4.** Use our `HashTableChaining` class to write a function that inserts `n` integer keys `0..n-1` into an empty table of size `m`, then returns the maximum chain length observed.  

In [27]:
import random

random.seed(0)

ht = HashTableChaining(10)
keys = list(range(20))
random.shuffle(keys)  # randomize insertion order
for k in keys:
    ht.put(k, k)

list(ht.bucket_lengths())

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

In [None]:
import random


def max_chain_after_inserts(n: int, m: int, seed: int = 0) -> int:
    pass

In [None]:
max_chain_after_inserts(20, 10)

2

**Task 5.** For fixed `n = 200`, try `m ∈ {50, 100, 200, 400}`. Record `α` and the max chain length. What trend do you see?

**Task 6:** Repeat Tasks 4-5, this time randomly sampling `n` integers (with replacement) from `0, ..., n-1` and inserting them into an empty table of size `m`. What do you observe?

In [None]:
import random

keys = list(range(20))
print(keys)
n = 20
keys = random.choices(range(n), k=n)
print(sorted(keys))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[1, 1, 1, 2, 2, 2, 3, 3, 9, 10, 12, 12, 12, 13, 14, 14, 16, 16, 16, 19]
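
Note that `random.choices` samples with replacement, so the drawn keys can repeat, as the sorted output above shows. Because `put` overwrites an existing key instead of adding a new entry, the table's `n` equals the number of distinct keys drawn. A minimal sketch using only the standard library (the seed value is an arbitrary assumption, used only for reproducibility):

```python
import random

random.seed(0)  # arbitrary seed (assumption) so repeated runs match
n = 20

with_replacement = random.choices(range(n), k=n)    # duplicates are possible
without_replacement = random.sample(range(n), k=n)  # a permutation of 0..n-1

distinct = len(set(with_replacement))
print(f"choices(): drew {n} keys, {distinct} distinct")
print(f"sample():  drew {n} keys, {len(set(without_replacement))} distinct")
```

When comparing against Tasks 4-5, compute `α` from the table's reported `n` (for example `len(ht)`), not from the number of draws.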


## Summary Questions

1) Define **load factor** in your own words and explain why it matters.  
2) With chaining, if `m` doubles while `n` stays fixed, what happens to `α` and expected chain length?  
3) Suppose you want to keep `α ≤ 0.75`. If you currently have `m = 80` and `n = 60`, do you need to resize before inserting 10 more keys? Explain.

---

## Self‑Assessment
Please mark one option by editing the brackets to `[x]`:

- [ ] **10** – I completed all of this work on my own (learning from in‑class ideas/approaches).
- [ ] **8** – I completed most on my own, with some out‑of‑class help (peers/online).
- [ ] **6** – I needed significant help (peers/online/AI) to complete parts.
- [ ] **4** – I mostly copied code from others/AI and **do not** fully understand it.
- [ ] **2** – I copied almost everything without attempting to understand it.
