In [None]:
### Chapter 10 Data Structures and Algorithms
#### Donovan Manogue

**Table**: Map ADT behaviors and their definitions (consistent with Python’s `dict` class).

| Operation | Description |
|-----------|-------------|
| `M[k]` | Return the value `v` associated with key `k` in map `M`. Raises `KeyError` if `k` not found. Implemented with `__getitem__`. |
| `M[k] = v` | Associate value `v` with key `k`, replacing the existing value if `k` already exists. Implemented with `__setitem__`. |
| `del M[k]` | Remove the item with key `k`. Raises `KeyError` if `k` not found. Implemented with `__delitem__`. |
| `len(M)` | Return the number of items in map `M`. Implemented with `__len__`. |
| `iter(M)` | Iterate over keys in map `M`. Implemented with `__iter__`. Allows `for k in M:` loops. |
| `k in M` | Return `True` if key `k` exists in map `M`. Implemented with `__contains__`. |
| `M.get(k, d=None)` | Return `M[k]` if `k` exists; otherwise return default `d`. Avoids `KeyError`. |
| `M.setdefault(k, d)` | Return `M[k]` if `k` exists; otherwise set `M[k] = d` and return `d`. |
| `M.pop(k, d)` | Remove and return the value for `k`. If `k` is missing, return the default `d`; if no default was supplied, raise `KeyError`. |
| `M.popitem()` | Remove and return an arbitrary `(k, v)` pair. Raises `KeyError` if empty. |
| `M.clear()` | Remove all key-value pairs from the map. |
| `M.keys()` | Return a set-like view of all keys in the map. |
| `M.values()` | Return a view of all values in the map (not set-like, since values may repeat). |
| `M.items()` | Return a set-like view of all `(k, v)` pairs in the map. |
| `M.update(M2)` | Assign `M[k] = v` for every `(k, v)` pair in `M2`. |
| `M == M2` | Return `True` if maps `M` and `M2` have identical key-value pairs. |
| `M != M2` | Return `True` if maps `M` and `M2` differ in any key-value pair. |


**Table**: Example Map Operations and Results

| Operation | Return Value | Map State |
|-----------|--------------|-----------|
| `len(M)` | `0` | `{}` |
| `M['K'] = 2` | – | `{'K': 2}` |
| `M['B'] = 4` | – | `{'K': 2, 'B': 4}` |
| `M['U'] = 2` | – | `{'K': 2, 'B': 4, 'U': 2}` |
| `M['V'] = 8` | – | `{'K': 2, 'B': 4, 'U': 2, 'V': 8}` |
| `M['K'] = 9` | – | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `M['B']` | `4` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `M['X']` | `KeyError` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `M.get('F')` | `None` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `M.get('F', 5)` | `5` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `M.get('K', 5)` | `9` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `len(M)` | `4` | `{'K': 9, 'B': 4, 'U': 2, 'V': 8}` |
| `del M['V']` | – | `{'K': 9, 'B': 4, 'U': 2}` |
| `M.pop('K')` | `9` | `{'B': 4, 'U': 2}` |
| `M.keys()` | `B, U` | `{'B': 4, 'U': 2}` |
| `M.values()` | `4, 2` | `{'B': 4, 'U': 2}` |
| `M.items()` | `(B, 4), (U, 2)` | `{'B': 4, 'U': 2}` |
| `M.setdefault('B', 1)` | `4` | `{'B': 4, 'U': 2}` |
| `M.setdefault('A', 1)` | `1` | `{'A': 1, 'B': 4, 'U': 2}` |
| `M.popitem()` | `(B, 4)` | `{'A': 1, 'U': 2}` |
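
The sequence in the table can be replayed on Python's built-in `dict`. This is a minimal sketch; the variable names are illustrative only. One difference is worth noting: the table treats `popitem()` as removing an arbitrary pair, whereas a built-in `dict` (Python 3.7 and later) removes the most recently inserted pair, so the final step below differs from the table's last row.

```python
M = {}
M['K'] = 2
M['B'] = 4
M['U'] = 2
M['V'] = 8
M['K'] = 9                      # replaces the value stored for an existing key

try:
    M['X']                      # look up a missing key
except KeyError as err:
    missing = err.args[0]       # the key that was not found

got_default = M.get('F', 5)     # 5, and M is left unchanged
del M['V']
popped = M.pop('K')             # 9
keys_now = list(M.keys())       # ['B', 'U']
first = M.setdefault('B', 1)    # 4; the existing value is kept
second = M.setdefault('A', 1)   # 1; 'A' is inserted with value 1
last_pair = M.popitem()         # ('A', 1): dict removes the most recent insertion
print(M)                        # {'B': 4, 'U': 2}
```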


In [3]:
# Path of the text file to analyze (placeholder assumption; point this at a real file)
filename = 'sample.txt'

# Create an empty frequency map
freq = {}

# Read file, convert to lowercase, split into pieces
with open(filename) as f:
    pieces = f.read().lower().split()

for piece in pieces:
    # Only consider alphabetic characters within this piece
    word = ''.join(c for c in piece if c.isalpha())

    # Require at least one alphabetic character
    if word:
        freq[word] = 1 + freq.get(word, 0)

# Initialize tracking variables for the most frequent word
maxword = ""
maxcount = 0

# Examine the frequency map to find the most frequent word
for (w, c) in freq.items():  # (key, value) tuples represent (word, count)
    if c > maxcount:
        maxword = w
        maxcount = c

# Output results
print("The most frequent word is", maxword)
print("Its number of occurrences is", maxcount)


In [None]:
Code Fragment 10.2:
Extending the MutableMapping abstract base class to provide a non-public Item class for use in various map implementations.

In [None]:
from collections.abc import MutableMapping

class MapBase(MutableMapping):
    """Our own abstract base class that includes a non-public Item class."""

    # ------------------------------- nested Item class -------------------------------
    class _Item:
        """Lightweight composite to store key-value pairs as map items."""
        __slots__ = '_key', '_value'

        def __init__(self, k, v):
            self._key = k
            self._value = v

        def __eq__(self, other):
            return self._key == other._key  # compare items based on their keys

        def __ne__(self, other):
            return not (self == other)  # opposite of __eq__

        def __lt__(self, other):
            return self._key < other._key  # compare items based on their keys


In [None]:
Code Fragment 10.3:
An implementation of a map using a Python list as an unordered table.
The parent class MapBase is given in Code Fragment 10.2.

In [None]:
class UnsortedTableMap(MapBase):
    """Map implementation using an unordered list."""

    def __init__(self):
        """Create an empty map."""
        self._table = []  # list of _Item's

    def __getitem__(self, k):
        """Return value associated with key k (raise KeyError if not found)."""
        for item in self._table:
            if k == item._key:
                return item._value
        raise KeyError("Key Error: " + repr(k))

    def __setitem__(self, k, v):
        """Assign value v to key k, overwriting existing value if present."""
        for item in self._table:
            if k == item._key:  # Found a match
                item._value = v  # reassign value
                return           # and quit
        # did not find match for key
        self._table.append(self._Item(k, v))

    def __delitem__(self, k):
        """Remove item associated with key k (raise KeyError if not found)."""
        for j in range(len(self._table)):
            if k == self._table[j]._key:  # Found a match
                self._table.pop(j)        # remove item
                return                    # and quit
        raise KeyError("Key Error: " + repr(k))

    def __len__(self):
        """Return number of items in the map."""
        return len(self._table)

    def __iter__(self):
        """Generate iteration of the map’s keys."""
        for item in self._table:
            yield item._key  # yield the KEY


In [None]:
Table 10.1: Comparison of collision behavior for the cyclic-shift hash code applied to a list of 230,000 English words.
The “Total” column records the total number of words that collide with at least one other,
and the “Max” column records the maximum number of words colliding at any one hash code.
A cyclic shift of 0 reverts to a simple sum of all characters.

| Shift | Total Collisions | Max Collisions |
| ----- | ---------------- | -------------- |
| 0     | 234,735          | 623            |
| 1     | 165,076          | 43             |
| 2     | 38,471           | 13             |
| 3     | 7,174            | 5              |
| 4     | 1,379            | 3              |
| 5     | 190              | 3              |
| 6     | 502              | 2              |
| 7     | 560              | 2              |
| 8     | 5,546            | 4              |
| 9     | 393              | 3              |
| 10    | 5,194            | 5              |
| 11    | 11,559           | 5              |
| 12    | 822              | 2              |
| 13    | 900              | 4              |
| 14    | 2,001            | 4              |
| 15    | 19,251           | 8              |
| 16    | 211,781          | 37             |
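
A cyclic-shift hash code of the kind compared in Table 10.1 could be sketched as follows. This is an illustration under assumptions: the function name `cyclic_shift_hash`, the 32-bit word size, and the choice to mask after each addition are not taken from the table, and the sketch makes no attempt to reproduce the collision counts.

```python
def cyclic_shift_hash(s, shift=5, bits=32):
    """Return a hash code for string s using a cyclic shift of `shift` bits."""
    mask = (1 << bits) - 1
    h = 0
    for ch in s:
        # rotate h left by `shift` bits within a `bits`-wide word
        h = ((h << shift) & mask) | (h >> (bits - shift))
        # mix in the next character, keeping the result within the word
        h = (h + ord(ch)) & mask
    return h

# With a shift of 0 the code is a plain sum, so anagrams collide.
print(cyclic_shift_hash('stop', shift=0) == cyclic_shift_hash('pots', shift=0))  # True
# With a nonzero shift, character order affects the result.
print(cyclic_shift_hash('stop', shift=5) == cyclic_shift_hash('pots', shift=5))  # False
```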


In [None]:
Table 10.2: Comparison of the running times of the methods of a map realized by means of an unsorted list or a hash table.
We let n denote the number of items in the map, and we assume that the bucket array for the hash table is maintained so that its capacity is proportional to n.

| Operation | Unsorted List | Hash Table (Expected) | Hash Table (Worst Case) |
| --------- | ------------- | --------------------- | ----------------------- |
| `getitem` | O(n)          | O(1)                  | O(n)                    |
| `setitem` | O(n)          | O(1)                  | O(n)                    |
| `delitem` | O(n)          | O(1)                  | O(n)                    |
| `len`     | O(1)          | O(1)                  | O(1)                    |
| `iter`    | O(n)          | O(n)                  | O(n)                    |


In [None]:
HashMapBase Design Summary
The HashMapBase class serves as an abstract base for hash table map implementations.
It provides the common behaviors, leaving collision resolution strategies (e.g., separate chaining or open addressing) to subclasses.

Key features:

Bucket array stored in self.table (Python list), initialized with None.

self.n tracks the number of key-value pairs.

Load factor threshold: If n > capacity / 2, the table resizes to about double the capacity.

Hash function: Uses Python’s built-in hash() with the Multiply-Add-and-Divide (MAD) compression method.

Abstract methods to be implemented by subclasses:

bucket_getitem(j, k)

bucket_setitem(j, k, v)

bucket_delitem(j, k)

__iter__() for iterating keys

In [None]:
Code Fragment 10.4:
A base class for our hash table implementations, extending MapBase (from Code Fragment 10.2).

In [None]:
from random import randrange

class HashMapBase(MapBase):
    """Abstract base class for map using hash-table with MAD compression."""

    def __init__(self, cap=11, p=109345121):
        """Create an empty hash-table map."""
        self.table = [None] * cap       # bucket array
        self.n = 0                      # number of entries in the map
        self.prime = p                  # prime for MAD compression
        self.scale = 1 + randrange(p - 1)  # scale from 1 to p-1
        self.shift = randrange(p)          # shift from 0 to p-1

    def hash_function(self, k):
        """Return hash value for key k."""
        return (hash(k) * self.scale + self.shift) % self.prime % len(self.table)

    def __len__(self):
        """Return number of items in the map."""
        return self.n

    def __getitem__(self, k):
        """Retrieve value for key k (raise KeyError if not found)."""
        j = self.hash_function(k)
        return self.bucket_getitem(j, k)

    def __setitem__(self, k, v):
        """Insert or update key-value pair."""
        j = self.hash_function(k)
        self.bucket_setitem(j, k, v)
        if self.n > len(self.table) // 2:  # maintain load factor <= 0.5
            self.resize(2 * len(self.table) - 1)  # prefer capacity of form 2^x - 1

    def __delitem__(self, k):
        """Remove key k from the map (raise KeyError if not found)."""
        j = self.hash_function(k)
        self.bucket_delitem(j, k)
        self.n -= 1

    def resize(self, c):
        """Resize bucket array to capacity c and rehash all items."""
        old_items = list(self.items())  # copy existing items
        self.table = [None] * c
        self.n = 0
        for (k, v) in old_items:
            self[k] = v  # reinsert into new table


In [None]:
Code Fragment 10.5:
Concrete hash map class with separate chaining for collision resolution.

In [None]:


class ChainHashMap(HashMapBase):
    """Hash map implemented with separate chaining for collisions."""

    # ------------------------- per-bucket operations -------------------------

    def bucket_getitem(self, j, k):
        bucket = self.table[j]
        if bucket is None:
            raise KeyError("Key Error: " + repr(k))  # no match found
        return bucket[k]  # may raise KeyError from the bucket map

    def bucket_setitem(self, j, k, v):
        if self.table[j] is None:
            self.table[j] = UnsortedTableMap()  # create new bucket
        old_size = len(self.table[j])
        self.table[j][k] = v
        if len(self.table[j]) > old_size:       # key was new to this bucket
            self.n += 1                         # increase overall map size

    def bucket_delitem(self, j, k):
        bucket = self.table[j]
        if bucket is None:
            raise KeyError("Key Error: " + repr(k))  # no match found
        del bucket[k]  # may raise KeyError from the bucket map

    # ------------------------------ iteration -------------------------------

    def __iter__(self):
        for bucket in self.table:
            if bucket is not None:              # a nonempty bucket
                for key in bucket:
                    yield key


In [None]:
Code Fragment 10.6: Concrete ProbeHashMap class that uses linear probing for collision resolution.
(Continued in Code Fragment 10.7)

In [None]:

class ProbeHashMap(HashMapBase):
    """Hash map implemented with linear probing for collision resolution."""

    AVAIL = object()  # sentinel marks locations of previous deletions

    # ---------------------------- utilities ----------------------------

    def is_available(self, j):
        """Return True if index j is available in table (either empty or formerly used)."""
        return self.table[j] is None or self.table[j] is ProbeHashMap.AVAIL

    def find_slot(self, j, k):
        """Search for key k starting at index j using linear probing.

        Return (success, index) where:
          - If a match is found, success is True and index is the location.
          - If no match is found, success is False and index is the first available slot.
        """
        first_avail = None
        while True:
            if self.is_available(j):
                if first_avail is None:
                    first_avail = j  # mark first available slot
                if self.table[j] is None:
                    # Reached an empty slot: search has failed; return first available
                    return (False, first_avail)
            elif k == self.table[j]._key:
                return (True, j)  # found a match
            j = (j + 1) % len(self.table)  # continue probing cyclically


In [None]:
Code Fragment 10.7:
Concrete ProbeHashMap class that uses linear probing for collision resolution
(continued from Code Fragment 10.6).

In [None]:

class ProbeHashMap(HashMapBase):
    """Hash map implemented with linear probing for collision resolution."""

    AVAIL = object()  # sentinel to mark a vacated slot

    # ---------------------------- utilities (from 10.6) ----------------------------
    def is_available(self, j):
        """Return True if index j is available (empty or formerly used)."""
        return self.table[j] is None or self.table[j] is ProbeHashMap.AVAIL

    def find_slot(self, j, k):
        """Search for key k starting at index j using linear probing.

        Return (found, s) where:
          - found == True and s is the index with matching key
          - found == False and s is the first available slot
        """
        first_avail = None
        while True:
            if self.is_available(j):
                if first_avail is None:
                    first_avail = j
                if self.table[j] is None:            # empty slot: search fails
                    return (False, first_avail)
            elif k == self.table[j]._key:            # found a match
                return (True, j)
            j = (j + 1) % len(self.table)            # probe cyclically

    # ---------------------------- per-bucket ops ----------------------------

    def bucket_getitem(self, j, k):
        found, s = self.find_slot(j, k)
        if not found:
            raise KeyError("Key Error: " + repr(k))
        return self.table[s]._value

    def bucket_setitem(self, j, k, v):
        found, s = self.find_slot(j, k)
        if not found:                                 # insert new item
            self.table[s] = self._Item(k, v)
            self.n += 1
        else:                                         # overwrite existing
            self.table[s]._value = v

    def bucket_delitem(self, j, k):
        found, s = self.find_slot(j, k)
        if not found:
            raise KeyError("Key Error: " + repr(k))
        self.table[s] = ProbeHashMap.AVAIL            # mark as vacated

    # ---------------------------- iteration ----------------------------

    def __iter__(self):
        for j in range(len(self.table)):              # scan entire table
            if not self.is_available(j):
                yield self.table[j]._key
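
The role of the `AVAIL` sentinel can be seen in a much smaller setting. The sketch below is a hypothetical fixed-capacity set of integers (`TinyProbeSet` and the `use_tombstone` flag are invented for illustration and are not part of the fragments above). Unlike `find_slot`, its search loop is bounded to one pass over the table. It shows that clearing a deleted slot back to `None` can hide a key that was placed later in the same probe sequence.

```python
AVAIL = object()   # sentinel marking a slot whose item was deleted

class TinyProbeSet:
    """Hypothetical fixed-capacity set of ints using linear probing."""

    def __init__(self, cap=7):
        self._slots = [None] * cap

    def _find(self, k):
        """Return (found, index), following the same contract as find_slot."""
        j = hash(k) % len(self._slots)
        first_avail = None
        for _ in range(len(self._slots)):        # at most one pass over the table
            slot = self._slots[j]
            if slot is None or slot is AVAIL:
                if first_avail is None:
                    first_avail = j
                if slot is None:                 # truly empty: the search fails
                    return (False, first_avail)
            elif slot == k:
                return (True, j)
            j = (j + 1) % len(self._slots)
        return (False, first_avail)

    def add(self, k):
        found, s = self._find(k)
        if not found:
            if s is None:
                raise OverflowError("table is full")
            self._slots[s] = k

    def discard(self, k, use_tombstone=True):
        found, s = self._find(k)
        if found:
            self._slots[s] = AVAIL if use_tombstone else None

    def __contains__(self, k):
        return self._find(k)[0]

good = TinyProbeSet()
good.add(0); good.add(7)            # 7 collides with 0 and lands in slot 1
good.discard(0)                     # slot 0 becomes AVAIL
print(7 in good)                    # True: the probe continues past AVAIL

bad = TinyProbeSet()
bad.add(0); bad.add(7)
bad.discard(0, use_tombstone=False) # slot 0 becomes None
print(7 in bad)                     # False: the probe stops at the empty slot
```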


In [None]:
 In this section, we introduce an extension known as the sorted map ADT that
 includes all behaviors of the standard map, plus the following:

| Operation                   | Description |
|-----------------------------|-------------|
| `M.find_min()`              | Return the `(key, value)` pair with the **minimum key** (or `None` if the map is empty). |
| `M.find_max()`              | Return the `(key, value)` pair with the **maximum key** (or `None` if the map is empty). |
| `M.find_lt(k)`              | Return the `(key, value)` pair with the **greatest key strictly less than `k`** (or `None` if no such item exists). |
| `M.find_le(k)`              | Return the `(key, value)` pair with the **greatest key less than or equal to `k`** (or `None` if no such item exists). |
| `M.find_gt(k)`              | Return the `(key, value)` pair with the **least key strictly greater than `k`** (or `None` if no such item exists). |
| `M.find_ge(k)`              | Return the `(key, value)` pair with the **least key greater than or equal to `k`** (or `None` if no such item exists). |
| `M.find_range(start, stop)` | Iterate all `(key, value)` pairs with `start <= key < stop`. If `start` is `None`, iteration begins with the minimum key; if `stop` is `None`, iteration ends with the maximum key. |
| `iter(M)`                   | Iterate all keys of the map in **natural (ascending) order**. |
| `reversed(M)`               | Iterate all keys of the map in **reverse order**. |
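
On a sorted Python list, these inexact searches map naturally onto the standard `bisect` module. The following is a minimal sketch, assuming a plain sorted list of keys and returning keys only rather than `(key, value)` pairs; the standalone function names simply mirror the ADT methods.

```python
from bisect import bisect_left, bisect_right

def find_ge(keys, k):
    """Least key >= k, or None."""
    j = bisect_left(keys, k)
    return keys[j] if j < len(keys) else None

def find_gt(keys, k):
    """Least key > k, or None."""
    j = bisect_right(keys, k)
    return keys[j] if j < len(keys) else None

def find_le(keys, k):
    """Greatest key <= k, or None."""
    j = bisect_right(keys, k)
    return keys[j - 1] if j > 0 else None

def find_lt(keys, k):
    """Greatest key < k, or None."""
    j = bisect_left(keys, k)
    return keys[j - 1] if j > 0 else None

def find_range(keys, start, stop):
    """Keys with start <= key < stop; None leaves that end open."""
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_left(keys, stop)
    return keys[lo:hi]

keys = [10, 20, 30, 40]
print(find_ge(keys, 20), find_gt(keys, 20))   # 20 30
print(find_le(keys, 25), find_lt(keys, 10))   # 20 None
print(find_range(keys, 15, 40))               # [20, 30]
```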


In [None]:
Code Fragments 10.8–10.10:
SortedTableMap — a sorted map implemented with a sorted Python list and binary search.

In [None]:


class SortedTableMap(MapBase):
    """Map implementation using a sorted table (list of _Item), kept ordered by key."""

    # ----------------------------- nonpublic behaviors -----------------------------
    def _find_index(self, k, low, high):
        """Return index of leftmost item with key >= k within table[low:high+1].
        Return high+1 if no such item qualifies.

        That is, j will be returned such that:
          - all items in table[low : j]     have key < k
          - all items in table[j : high+1]  have key >= k
        """
        if high < low:
            return high + 1                       # no element qualifies
        mid = (low + high) // 2
        if k == self._table[mid]._key:
            return mid                            # found exact match
        elif k < self._table[mid]._key:
            return self._find_index(k, low, mid - 1)   # may return mid
        else:
            return self._find_index(k, mid + 1, high)  # answer is right of mid

    # ----------------------------- public behaviors -----------------------------
    def __init__(self):
        """Create an empty map."""
        self._table = []  # list of MapBase._Item, sorted by _key

    def __len__(self):
        """Return number of items in the map."""
        return len(self._table)

    def __getitem__(self, k):
        """Return value associated with key k (raise KeyError if not found)."""
        j = self._find_index(k, 0, len(self._table) - 1)
        if j == len(self._table) or self._table[j]._key != k:
            raise KeyError("Key Error: " + repr(k))
        return self._table[j]._value

    def __setitem__(self, k, v):
        """Assign value v to key k, overwriting existing value if present."""
        j = self._find_index(k, 0, len(self._table) - 1)
        if j < len(self._table) and self._table[j]._key == k:
            self._table[j]._value = v            # overwrite
        else:
            self._table.insert(j, self._Item(k, v))  # add new item at position j

    def __delitem__(self, k):
        """Remove item associated with key k (raise KeyError if not found)."""
        j = self._find_index(k, 0, len(self._table) - 1)
        if j == len(self._table) or self._table[j]._key != k:
            raise KeyError("Key Error: " + repr(k))
        self._table.pop(j)

    def __iter__(self):
        """Generate keys of the map ordered from minimum to maximum."""
        for item in self._table:
            yield item._key


In [None]:
 Table 10.3: Performance of a sorted map, as implemented with SortedTableMap.
 We use n to denote the number of items in the map at the time the operation is
 performed. The space requirement is O(n).

| Operation                      | Running Time                                                |
| ------------------------------ | ----------------------------------------------------------- |
| `len(M)`                       | **O(1)**                                                    |
| `k in M`                       | **O(log n)**                                                |
| `M[k] = v`                     | **O(n)** worst case; **O(log n)** if `k` exists             |
| `del M[k]`                     | **O(n)** worst case                                         |
| `M.find_min()`, `M.find_max()` | **O(1)**                                                    |
| `M.find_lt(k)`, `M.find_gt(k)` | **O(log n)**                                                |
| `M.find_le(k)`, `M.find_ge(k)` | **O(log n)**                                                |
| `M.find_range(start, stop)`    | **O(s + log n)**, where `s` is the number of items reported |
| `iter(M)`, `reversed(M)`       | **O(n)**                                                    |


In [None]:
Code Fragment 10.11:
A class maintaining a set of maximal (cost, performance) pairs using a sorted map.

In [None]:


class CostPerformanceDatabase:
    """Maintain a database of maximal (cost, performance) pairs."""

    def __init__(self):
        """Create an empty database."""
        self.M = SortedTableMap()  # or any more efficient sorted map

    def best(self, c):
        """Return (cost, performance) pair with largest cost not exceeding c.

        Return None if there is no such pair.
        """
        return self.M.find_le(c)

    def add(self, c, p):
        """Add new entry with cost c and performance p, maintaining only non-dominated pairs."""
        # Is (c, p) dominated by an existing pair?
        other = self.M.find_le(c)                 # other is at least as cheap as c
        if other is not None and other[1] >= p:   # performance is as good or better
            return                                # (c, p) is dominated; ignore

        # Otherwise, add (c, p) to the database
        self.M[c] = p

        # Remove any pairs that are dominated by (c, p)
        other = self.M.find_gt(c)                 # strictly more expensive than c
        while other is not None and other[1] <= p:
            del self.M[other[0]]
            other = self.M.find_gt(c)
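
To see the maxima-set behavior in action without depending on a sorted map class, the same `best`/`add` logic can be sketched over two parallel sorted lists. Everything here is an assumption made for illustration: the class name `MaximaPairs`, the `pairs` helper, and the use of `bisect` in place of `find_le` and `find_gt`.

```python
from bisect import bisect_right

class MaximaPairs:
    """Hypothetical stand-in keeping non-dominated (cost, performance) pairs sorted by cost."""

    def __init__(self):
        self._costs = []
        self._perfs = []

    def best(self, c):
        """Return the pair with largest cost not exceeding c, or None."""
        j = bisect_right(self._costs, c)          # number of stored costs <= c
        return None if j == 0 else (self._costs[j - 1], self._perfs[j - 1])

    def add(self, c, p):
        other = self.best(c)
        if other is not None and other[1] >= p:
            return                                # (c, p) is dominated; ignore it
        j = bisect_right(self._costs, c)
        if j > 0 and self._costs[j - 1] == c:
            self._perfs[j - 1] = p                # same cost: keep the better performance
        else:
            self._costs.insert(j, c)
            self._perfs.insert(j, p)
            j += 1                                # j now indexes the next costlier pair
        while j < len(self._costs) and self._perfs[j] <= p:
            self._costs.pop(j)                    # costlier but no better: dominated
            self._perfs.pop(j)

    def pairs(self):
        return list(zip(self._costs, self._perfs))

db = MaximaPairs()
db.add(10, 5)
db.add(20, 8)
db.add(15, 9)        # dominates (20, 8), which is removed
db.add(12, 4)        # dominated by (10, 5), so ignored
print(db.pairs())    # [(10, 5), (15, 9)]
print(db.best(14))   # (10, 5)
```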


In [None]:
Code Fragment 10.12: Algorithm to search a skip list S for key k.

In [None]:
Algorithm SkipSearch(k):
    Input:  A search key k
    Output: Position p in the bottom list S0 with the largest key such that key(p) ≤ k

    p = start                    # begin at the start position (top-left of skip list)

    while below(p) ≠ None:        # while there is a lower level
        p = below(p)              # drop down one level
        while k ≥ key(next(p)):   # scan forward while next node's key ≤ k
            p = next(p)

    return p                      # position in S0 where key(p) ≤ k


In [None]:
Explanation:

start refers to the top-left sentinel of the skip list.

The outer loop moves vertically down the skip list level-by-level (below(p)), starting from the highest level until we reach the bottom list S0.

The inner loop moves horizontally (next(p)) as long as the next node's key is ≤ the search key k.

Once you reach the bottom list, you return p, the position with the largest key ≤ k.
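
The search can be traced in Python on a small hand-built skip list. This is a sketch under assumptions: the `Node` class and the `build_level` helper are hypothetical scaffolding, levels are built deterministically rather than by coin flips, and the search key is assumed to be finite so the scan never runs past the `+∞` sentinel.

```python
import math

class Node:
    """One position in a skip list level (hypothetical minimal node)."""
    __slots__ = ('key', 'next', 'below')

    def __init__(self, key, below=None):
        self.key = key
        self.next = None
        self.below = below

def build_level(keys, lower_head=None):
    """Link sentinels and keys into one level; connect each node to its twin below."""
    twins = {}
    node = lower_head
    while node is not None:
        twins[node.key] = node
        node = node.next
    head = prev = None
    for key in [-math.inf, *keys, math.inf]:
        cur = Node(key, below=twins.get(key))
        if prev is None:
            head = cur
        else:
            prev.next = cur
        prev = cur
    return head

def skip_search(start, k):
    """Return the bottom-level node with the largest key <= k."""
    p = start
    while p.below is not None:        # while there is a lower level
        p = p.below                   # drop down one level
        while k >= p.next.key:        # scan forward on this level
            p = p.next
    return p

s0 = build_level([12, 17, 25, 31])    # bottom list S0
s1 = build_level([17, 31], s0)        # a sparser level above it
s2 = build_level([], s1)              # top level holds only the sentinels
print(skip_search(s2, 25).key)        # 25
print(skip_search(s2, 20).key)        # 17
print(skip_search(s2, 5).key)         # -inf
```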

In [None]:
Code Fragment 10.13: Insertion in a skip list. Method coinFlip() returns “heads” or
“tails”, each with probability 1/2. Instance variables n, h, and s hold the number
of entries, the height, and the start node of the skip list.

In [None]:
Algorithm SkipInsert(k, v):
    Input:  Key k and value v
    Output: Topmost position of the item inserted in the skip list

    p = SkipSearch(k)   # Find position where k should be inserted
    q = None            # q will be the top node in the new item's tower
    i = -1              # Level counter

    repeat:
        i = i + 1

        # If we need to add a new level to the skip list
        if i >= h:
            h = h + 1
            t = next(s)
            s = insertAfterAbove(None, s, (-∞, None))   # Grow leftmost tower
            insertAfterAbove(s, t, (+∞, None))          # Grow rightmost tower

        # Move left until there is a node above
        while above(p) is None:
            p = prev(p)

        # Move up one level
        p = above(p)

        # Insert new node above q in this level
        q = insertAfterAbove(p, q, (k, v))

    until coinFlip() == tails   # Continue with probability 0.5

    n = n + 1   # Increase number of elements
    return q


In [None]:
Explanation of Key Steps
SkipSearch(k): Finds where the new key k should go in the bottom list.

q: Tracks the most recently inserted node so that the next one is stacked above it (forming the “tower” of the skip list node).

h: Current height of the skip list.

insertAfterAbove(p, q, (k, v)): Inserts a new node after p on this level, directly above q from the level below.

Coin flip: Controls how tall the tower for (k, v) will be — with probability 0.5, we go up another level.
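
The effect of the coin flip on tower height can be checked with a short simulation. This is a sketch only: `tower_height` is a hypothetical helper that counts how many times the repeat/until body would run, and the fixed seed is used purely to make the run repeatable.

```python
import random

def tower_height(rng):
    """Return how many levels the repeat/until loop adds for one inserted item."""
    height = 1                       # the loop body always runs once
    while rng.random() < 0.5:        # "heads": go up another level
        height += 1
    return height

rng = random.Random(0)
heights = [tower_height(rng) for _ in range(20000)]
average = sum(heights) / len(heights)
print(average)                       # close to 2 for a fair coin
print(max(heights))                  # tall towers are possible but rare
```

Because each additional level requires another "heads", the number of towers reaching a given level roughly halves with each level, which is what keeps the expected height of the whole structure logarithmic in n.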



In [None]:
Table 10.4 — Performance of a Sorted Map Implemented with a Skip List

| Operation                   | Expected Running Time                                  |
| --------------------------- | ------------------------------------------------------ |
| `len(M)`                    | **O(1)**                                               |
| `k in M`                    | **O(log n)**                                           |
| `M[k] = v`                  | **O(log n)**                                           |
| `del M[k]`                  | **O(log n)**                                           |
| `M.find_min()`              | **O(1)**                                               |
| `M.find_max()`              | **O(1)**                                               |
| `M.find_lt(k)`              | **O(log n)**                                           |
| `M.find_gt(k)`              | **O(log n)**                                           |
| `M.find_le(k)`              | **O(log n)**                                           |
| `M.find_ge(k)`              | **O(log n)**                                           |
| `M.find_range(start, stop)` | **O(s + log n)**, where *s* = number of reported items |
| `iter(M)`                   | **O(n)**                                               |
| `reversed(M)`               | **O(n)**                                               |


In [None]:
Space requirement: O(n) expected.

In [None]:
Fundamental Behaviors of a Set S in Python

| Operation      | Description                                                             | Python Implementation         |
| -------------- | ----------------------------------------------------------------------- | ----------------------------- |
| `S.add(e)`     | Add element `e` to set `S`. Has no effect if `e` is already in `S`.     | `add` method                  |
| `S.discard(e)` | Remove element `e` from set `S` if present. No effect if `e` is absent. | `discard` method              |
| `e in S`       | Check if element `e` is in set `S`. Returns `True` or `False`.          | `__contains__` special method |
| `len(S)`       | Return the number of elements in set `S`.                               | `__len__` special method      |
| `iter(S)`      | Generate an iteration over all elements in set `S`.                     | `__iter__` special method     |


In [None]:
Notes:

Python provides two built-in set types:

set → mutable (backed by hash table)

frozenset → immutable (also hash table–backed)

The collections.abc module provides abstract base classes:

collections.abc.Set ≈ frozenset behavior (immutable)

collections.abc.MutableSet ≈ set behavior (mutable)
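
As a minimal sketch of how the abstract base class is used, the hypothetical `ListSet` below implements only the five fundamental behaviors and inherits comparisons and set-theory operators from `collections.abc.MutableSet`. The list-based storage is chosen for brevity, not efficiency.

```python
from collections.abc import MutableSet

class ListSet(MutableSet):
    """Hypothetical set backed by a Python list (membership tests take O(n) time)."""

    def __init__(self, iterable=()):
        self._data = []
        for e in iterable:
            self.add(e)

    def __contains__(self, e):
        return e in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def add(self, e):
        if e not in self._data:
            self._data.append(e)

    def discard(self, e):
        if e in self._data:
            self._data.remove(e)

a = ListSet([1, 2])
b = ListSet([1, 2, 3])
print(a < b)                 # True: proper subset, supplied by the base class
print(a == ListSet([2, 1]))  # True: equality ignores order
print(sorted(a | ListSet([5])))  # [1, 2, 5]: union, also supplied by the base class
```

The inherited operators build their results by calling the class with an iterable, which is why the constructor here accepts one.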

In [None]:
Python Set ADT — Complete Behavior Reference

| **Category**                                | **Operation**     | **Description**                                                | **Python Implementation / Special Method** |           |
| ------------------------------------------- | ----------------- | -------------------------------------------------------------- | ------------------------------------------ | --------- |
| **Fundamentals**                            | `S.add(e)`        | Add element `e` to set `S` (no effect if present)              | `add`                                      |           |
|                                             | `S.discard(e)`    | Remove element `e` if present (no error if absent)             | `discard`                                  |           |
|                                             | `e in S`          | Return `True` if `e` in set `S`                                | `__contains__`                             |           |
|                                             | `len(S)`          | Return number of elements in set `S`                           | `__len__`                                  |           |
|                                             | `iter(S)`         | Iterate over all elements in set `S`                           | `__iter__`                                 |           |
| **Element Removal**                         | `S.remove(e)`     | Remove `e` from set; raise `KeyError` if not found             | `remove`                                   |           |
|                                             | `S.pop()`         | Remove and return arbitrary element; raise `KeyError` if empty | `pop`                                      |           |
|                                             | `S.clear()`       | Remove all elements from the set                               | `clear`                                    |           |
| **Set Comparisons**                         | `S == T`          | `True` if `S` and `T` have identical contents                  | `__eq__`                                   |           |
|                                             | `S != T`          | `True` if `S` and `T` differ                                   | `__ne__`                                   |           |
|                                             | `S <= T`          | `True` if `S` is subset of `T`                                 | `__le__`                                   |           |
|                                             | `S < T`           | `True` if `S` is proper subset of `T`                          | `__lt__`                                   |           |
|                                             | `S >= T`          | `True` if `S` is superset of `T`                               | `__ge__`                                   |           |
|                                             | `S > T`           | `True` if `S` is proper superset of `T`                        | `__gt__`                                   |           |
|                                             | `S.isdisjoint(T)` | `True` if `S` and `T` share no elements                        | `isdisjoint`                               |           |
| **Set Theory Operations (Return New Set)**  | \`S               | T\`                                                            | Union of `S` and `T`                       | `__or__`  |
|                                             | `S & T`           | Intersection of `S` and `T`                                    | `__and__`                                  |           |
|                                             | `S ^ T`           | Symmetric difference (elements in exactly one set)             | `__xor__`                                  |           |
|                                             | `S - T`           | Difference (elements in `S` not in `T`)                        | `__sub__`                                  |           |
| **Set Theory Operations (Update In Place)** | `S \|= T`         | Update `S` to union with `T`                                   | `__ior__`                                  |           |
|                                             | `S &= T`          | Update `S` to intersection with `T`                            | `__iand__`                                 |           |
|                                             | `S ^= T`          | Update `S` to symmetric difference with `T`                    | `__ixor__`                                 |           |
|                                             | `S -= T`          | Update `S` to remove all elements also in `T`                  | `__isub__`                                 |           |


In [None]:
Code Fragment 10.14: A possible implementation of the `MutableSet.__lt__` method,
which tests if one set is a proper subset of another.

In [None]:
def __lt__(self, other):
    """Return True if this set is a proper subset of other."""
    
    # A proper subset must have strictly fewer elements
    if len(self) >= len(other):
        return False
    
    # Check that every element in self is also in other
    for e in self:
        if e not in other:
            return False  # Found an element missing in other
    
    return True  # All conditions for proper subset are met


In [None]:
Code Fragment 10.15: An implementation of the `MutableSet.__or__` method,
which computes the union of two existing sets.

In [None]:
def __or__(self, other):
    """Return a new set that is the union of two existing sets."""
    
    # Create a new instance of the same concrete class
    result = type(self)()
    
    # Add all elements from self
    for e in self:
        result.add(e)
    
    # Add all elements from other
    for e in other:
        result.add(e)
    
    return result


In [None]:
Code Fragment 10.16: An implementation of the `MutableSet.__ior__` method,
which performs an in-place union of one set with another.

In [None]:
def __ior__(self, other):
    """Modify this set to be the union of itself and another set."""
    
    # Add all elements from other into self
    for e in other:
        self.add(e)
    
    # Return self to satisfy the in-place operator protocol
    return self
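
One way to exercise the three fragments above is a small concrete set that supplies only the five core behaviors. The sketch below uses a hypothetical `ListSet` class (not from the text) built on Python's `collections.abc.MutableSet`, which already provides `__lt__`, `__or__`, and `__ior__` mixins with the same meaning as the fragments. The list-based storage is an assumption chosen for brevity, not for speed.

```python
from collections.abc import MutableSet

class ListSet(MutableSet):
    """Hypothetical list-backed set; only the five core behaviors are written here."""

    def __init__(self, iterable=()):
        self._data = []
        for e in iterable:
            self.add(e)

    def __contains__(self, e):
        return e in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def add(self, e):
        if e not in self._data:
            self._data.append(e)

    def discard(self, e):
        if e in self._data:
            self._data.remove(e)

s = ListSet([1, 2])
t = ListSet([1, 2, 3])
is_proper = s < t          # proper subset test, via the inherited __lt__
u = s | ListSet([9])       # new set, via the inherited __or__
s |= t                     # in-place union, via the inherited __ior__
print(is_proper, sorted(u), sorted(s))
```

Because `s | ...` builds its result through the class constructor, the constructor has to accept an iterable, which is why `__init__` takes one here.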


In [None]:
Code Fragment 10.17: An implementation of a MultiMap using a dict for storage.
The `__len__` method, which returns `self.n`, is included at the end of this listing.

In [None]:
class MultiMap:
    """A multimap class built upon use of an underlying map for storage."""
    
    MapType = dict  # Map type; can be redefined by subclass

    def __init__(self):
        """Create a new empty multimap instance."""
        self.map = self.MapType()  # create map instance for storage
        self.n = 0

    def __iter__(self):
        """Iterate through all (k, v) pairs in the multimap."""
        for k, secondary in self.map.items():
            for v in secondary:
                yield (k, v)

    def add(self, k, v):
        """Add pair (k, v) to multimap."""
        container = self.map.setdefault(k, [])
        container.append(v)
        self.n += 1

    def pop(self, k):
        """Remove and return arbitrary (k, v) with key k (or raise KeyError)."""
        secondary = self.map[k]  # may raise KeyError
        v = secondary.pop()
        if len(secondary) == 0:
            del self.map[k]
        self.n -= 1
        return (k, v)

    def find(self, k):
        """Return arbitrary (k, v) pair with given key (or raise KeyError)."""
        secondary = self.map[k]  # may raise KeyError
        return (k, secondary[0])

    def find_all(self, k):
        """Generate iteration of all (k, v) pairs with given key."""
        secondary = self.map.get(k, [])  # empty list by default
        for v in secondary:
            yield (k, v)

    def __len__(self):
        """Return total number of (k, v) pairs in multimap."""
        return self.n


In [None]:
### Chapter 10 Maps, Hash Tables, and Skip Lists: Exercises R-10.1 to R-10.27

In [None]:
 R-10.1 Give a concrete implementation of the pop method in the context of the
 MutableMapping class, relying only on the five primary abstract methods
 of that class.


In [None]:
def pop(self, key, default=None):
    try:
        value = self[key]
        del self[key]
        return value
    except KeyError:
        if default is not None:
            return default
        else:
            raise


In [None]:
R-10.2 Give a concrete implementation of the items() method in the context of
the MutableMapping class, relying only on the five primary abstract methods
of that class. What would its running time be if directly applied to the
UnsortedTableMap subclass?
 

In [None]:
def items(self):
    for key in self:             # uses __iter__()
        value = self[key]        # uses __getitem__()
        yield (key, value)


This `items()` method loops through the keys using `__iter__()`, which every MutableMapping subclass must provide. For each key, it retrieves the corresponding value with `__getitem__()` and yields a (key, value) pair.

Python's built-in dictionaries have their own efficient `items()` method, but here we are working in a generic abstract mapping class, so we can only use the five base methods.

We do not use shortcuts such as accessing an internal storage structure (`self._table` or `self._data`), because that would not be valid for every possible subclass.

In UnsortedTableMap, here's what happens:

`__iter__()` is O(n) in total (it just yields each key from the list of items).

`__getitem__(key)` is O(n) in the worst case (it searches linearly for the key).

So for each of the n keys, we could take up to O(n) time to get the value.

Thus, the total time is:
n * O(n) = O(n²)

This makes the `items()` method quadratic in the worst case if you apply it directly to UnsortedTableMap. Not ideal for performance, but still correct.
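
The quadratic behavior can be observed by counting key comparisons. The sketch below uses a hypothetical `CountingUnsortedMap`, a simplified stand-in for UnsortedTableMap that stores plain (key, value) tuples and adds a `comparisons` counter; neither the class nor the counter comes from the book.

```python
class CountingUnsortedMap:
    """Hypothetical stand-in for UnsortedTableMap that counts key comparisons."""

    def __init__(self):
        self._table = []          # list of (key, value) pairs
        self.comparisons = 0      # incremented by __getitem__ only

    def __setitem__(self, k, v):
        for i, (key, _) in enumerate(self._table):
            if key == k:
                self._table[i] = (k, v)
                return
        self._table.append((k, v))

    def __getitem__(self, k):
        for key, value in self._table:
            self.comparisons += 1
            if key == k:
                return value
        raise KeyError(k)

    def __iter__(self):
        for key, _ in self._table:
            yield key

    def items(self):
        """Generic version: one linear __getitem__ search per key."""
        for key in self:
            yield (key, self[key])

n = 100
m = CountingUnsortedMap()
for i in range(n):
    m[i] = str(i)

pairs = list(m.items())
print(m.comparisons)   # 1 + 2 + ... + n = n(n+1)/2, which is 5050 for n = 100
```

The key at position i costs i + 1 comparisons to look up, so the total grows as n²/2 rather than n.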

In [None]:
R-10.3 Give a concrete implementation of the items() method directly within the
 UnsortedTableMap class, ensuring that the entire iteration runs in O(n)
 time.


In [None]:
def items(self):
    for item in self._table:
        yield (item._key, item._value)


In [None]:
In UnsortedTableMap, all entries are already stored in a Python list called `self._table`. Each element of this list is an `_Item` object with `_key` and `_value` attributes.

This items() method just loops over each element of that list and yields the key-value pair.

There are n entries in the map.

We loop over them once.

We don’t do any searching inside the loop.

So the total time is O(n) — exactly what we want 

In [None]:
 R-10.4 What is the worst-case running time for inserting n key-value pairs into an
 initially empty map M that is implemented with the UnsortedTableMap
 class?


In [None]:
The worst-case running time to insert n key-value pairs into an initially empty UnsortedTableMap is O(n²), because each insertion may require a linear search through the list to check for duplicate keys.
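
The bound follows from adding up the duplicate-key scans. Assuming every inserted key is new, the i-th insertion compares against all i − 1 entries already in the list before appending:

```latex
\sum_{i=1}^{n} (i - 1) = \frac{n(n-1)}{2} = O(n^2)
```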

In [None]:
 R-10.5 Reimplement the UnsortedTableMap class from Section 10.1.5, using the
 PositionalList class from Section 7.4 rather than a Python list.


In [None]:
#skipped since we don't go over chapter 7

In [None]:
 R-10.6 Which of the hash table collision-handling schemes could tolerate a load
 factor above 1 and which could not?


In [None]:
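Separate chaining can tolerate a load factor above 1, because each bucket holds a secondary container that can store any number of entries.
Linear probing, quadratic probing, and double hashing cannot: they are open addressing schemes that store at most one entry per table cell, so the number of entries can never exceed the table size.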

In [None]:
R-10.7 Our Position classes for lists and trees support the `__eq__` method so that
two distinct position instances are considered equivalent if they refer to the
same underlying node in a structure. For positions to be allowed as keys
in a hash table, there must be a definition for the `__hash__` method that
is consistent with this notion of equivalence. Provide such a `__hash__`
method.
 

In [None]:
def __hash__(self):
    """Hash based on the unique identifier of the node this position wraps."""
    return hash(id(self._node))
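
To see why this hash is consistent with equality, here is a minimal sketch with hypothetical, stripped-down `Node` and `Position` classes (the book's real classes carry more state). Two distinct positions that wrap the same node compare equal, hash equally, and therefore find each other's entries in a `dict`.

```python
class Node:
    """Hypothetical minimal node holding one element."""
    def __init__(self, element):
        self._element = element

class Position:
    """Hypothetical simplified position that wraps a node."""
    def __init__(self, node):
        self._node = node

    def __eq__(self, other):
        return type(other) is type(self) and other._node is self._node

    def __hash__(self):
        return hash(id(self._node))

node = Node('A')
p, q = Position(node), Position(node)   # two distinct position objects
r = Position(Node('A'))                 # same element, different node
visited = {p: 'seen'}
print(p is q, p == q, hash(p) == hash(q), visited[q], r in visited)
```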


In [None]:
R-10.8 What would be a good hash code for a vehicle identification number that
 is a string of numbers and letters of the form “9X9XX99X9XX999999,”
 where a “9” represents a digit and an “X” represents a letter?

In [None]:
def hash_vin(vin, base=36, prime=109345121):
    hash_code = 0
    for char in vin:
        if char.isdigit():
            val = int(char)
        else:
            val = ord(char.upper()) - ord('A') + 10  # A-Z => 10-35
        hash_code = (hash_code * base + val) % prime
    return hash_code


In [None]:
R-10.9 Draw the 11-entry hash table that results from using the hash function,
 h(i)=(3i+5) mod 11, to hash the keys 12, 44, 13, 88, 23, 94, 11, 39, 20,
 16, and 5, assuming collisions are handled by chaining.


In [None]:
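Each key i is hashed with h(i) = (3i + 5) mod 11 into a table with 11 slots.

Collisions are handled by chaining, so each slot holds a list of all the keys that hash to it.

The resulting chains are: slot 0: 13; slot 1: 94, 39; slot 5: 44, 88, 11; slot 8: 12, 23; slot 9: 16, 5; slot 10: 20. Slots 2, 3, 4, 6, and 7 stay empty.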
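
The chains can be computed directly. This short sketch assumes that each new key is appended to the end of its slot's chain; an implementation that inserts at the front would list the same keys in reverse order.

```python
def h(i):
    """Hash function from the exercise."""
    return (3 * i + 5) % 11

keys = [12, 44, 13, 88, 23, 94, 11, 39, 20, 16, 5]
table = [[] for _ in range(11)]      # one chain (list) per slot
for k in keys:
    table[h(k)].append(k)

for index, chain in enumerate(table):
    print(index, chain)
```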

In [None]:
R-10.10 What is the result of the previous exercise, assuming collisions are handled
by linear probing?


| Index | Value |
| ----- | ----- |
| 0     | 13    |
| 1     | 94    |
| 2     | 39    |
| 3     | 16    |
| 4     | 5     |
| 5     | 44    |
| 6     | 88    |
| 7     | 11    |
| 8     | 12    |
| 9     | 23    |
| 10    | 20    |


In [None]:
 R-10.11 Show the result of Exercise R-10.9, assuming collisions are handled by
 quadratic probing, up to the point where the method fails.


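| Index | Value   |
| ----- | ------- |
| 0     | 13      |
| 1     | 94      |
| 2     | 39      |
| 3     | 11      |
| 4     | (empty) |
| 5     | 44      |
| 6     | 88      |
| 7     | 16      |
| 8     | 12      |
| 9     | 23      |
| 10    | 20      |

Key 5 cannot be placed. It hashes to index 9, and its probe sequence (9 + j²) mod 11 only ever visits indices 9, 10, 2, 7, 3, and 1, all of which are occupied, so quadratic probing fails even though index 4 is still empty.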


In [None]:
R-10.12 What is the result of Exercise R-10.9 when collisions are handled by double
hashing using the secondary hash function h'(k) = 7 - (k mod 7)?

| Index | Key |
| ----- | --- |
| 0     | 13  |
| 1     | 94  |
| 2     | 23  |
| 3     | 88  |
| 4     | 39  |
| 5     | 44  |
| 6     | 11  |
| 7     | 5   |
| 8     | 12  |
| 9     | 16  |
| 10    | 20  |
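
The open addressing results for R-10.10 through R-10.12 can be checked with a small simulation. The sketch below is not the book's ProbeHashMap; the `insert_all` helper and its `offset` parameter are hypothetical. It assumes the j-th probe for key k lands at (h(k) + offset(k, j)) mod N, with offset j for linear probing, j² for quadratic probing, and j·h'(k) for double hashing, and that a key is abandoned after N unsuccessful probes.

```python
def h(k):
    """Primary hash function from Exercise R-10.9."""
    return (3 * k + 5) % 11

def insert_all(keys, size, offset):
    """Insert keys with open addressing; offset(k, j) is the j-th probe offset.

    Returns (table, failed), where failed lists the keys that found no free
    slot within `size` probes.
    """
    table = [None] * size
    failed = []
    for k in keys:
        for j in range(size):
            pos = (h(k) + offset(k, j)) % size
            if table[pos] is None:
                table[pos] = k
                break
        else:                       # loop ended without finding a free slot
            failed.append(k)
    return table, failed

keys = [12, 44, 13, 88, 23, 94, 11, 39, 20, 16, 5]
linear = insert_all(keys, 11, lambda k, j: j)
quadratic = insert_all(keys, 11, lambda k, j: j * j)
double = insert_all(keys, 11, lambda k, j: j * (7 - k % 7))

for name, (table, failed) in [("linear", linear), ("quadratic", quadratic), ("double", double)]:
    print(name, table, "failed:", failed)
```

Under these assumptions, only quadratic probing leaves a key unplaced (key 5), because its probe sequence never reaches the one empty cell.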


In [None]:
 R-10.13 What is the worst-case time for putting n entries in an initially empty hash
 table, with collisions resolved by chaining? What is the best case?

| Case       | Time Complexity | Explanation                                                               |
| ---------- | --------------- | ------------------------------------------------------------------------- |
| Best Case  | O(n)            | All entries go into separate buckets, so there are no collisions and each insertion takes O(1). |
| Worst Case | O(n²)           | All entries hash to the same bucket, so the i-th insertion scans a chain of i − 1 entries to check for a duplicate key. |


In [None]:
 R-10.14 Show the result of rehashing the hash table shown in Figure 10.6 into a
 table of size 19 using the new hash function h(k)=3k mod 17.

In [None]:
I took the keys from the original table (Figure 10.6) and, for each key k, computed h(k) = 3k mod 17. I then placed each key into the new table of size 19 at that index. For example, a key of 5 gives h(5) = 15, so it would go into index 15. I repeated this process for all the keys so that every entry was rehashed with the new function. Because the function is taken mod 17, indices 17 and 18 of the new table are never used.

In [None]:
R-10.15 Our HashMapBase class maintains a load factor λ ≤ 0.5. Reimplement
 that class to allow the user to specify the maximum load, and adjust the
 concrete subclasses accordingly.

In [None]:
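from collections.abc import MutableMapping
from random import randrange

class HashMapBase(MutableMapping):
    def __init__(self, cap=11, p=109345121, max_load=0.5):
        self._table = [None] * cap
        self._n = 0
        self._prime = p
        self._scale = 1 + randrange(p - 1)
        self._shift = randrange(p)
        self._max_load = max_load  # <-- new attribute

    def _is_full(self):
        """Return True once the load factor exceeds the user-specified maximum."""
        return self._n > self._max_load * len(self._table)

    def __setitem__(self, k, v):
        # MutableMapping.__setitem__ is abstract, so delegate to the bucket
        # logic that the concrete subclass provides (as in the book's class).
        j = self._hash_function(k)
        self._bucket_setitem(j, k, v)
        if self._is_full():
            self._resize(2 * len(self._table) - 1)

    # _hash_function, _resize, __len__, __getitem__, and __delitem__ are
    # unchanged from the original HashMapBase and are omitted here.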


In [None]:
 R-10.16 Give a pseudo-code description of an insertion into a hash table that uses
 quadratic probing to resolve collisions, assuming we also use the trick of
 replacing deleted entries with a special “deactivated entry” object.

In [None]:
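Algorithm Insert(key, value):

1. index ← hash(key) mod table_size
2. i ← 0
3. first_free ← NONE

4. While i < table_size:
    a. pos ← (index + i^2) mod table_size

    b. If table[pos] is EMPTY:
        If first_free is NONE: first_free ← pos
        exit loop  // key cannot appear later in the probe sequence

    c. Else if table[pos] is DEACTIVATED:
        If first_free is NONE: first_free ← pos  // remember slot, keep searching

    d. Else if table[pos].key == key:
        table[pos].value ← value  // update existing key
        return

    e. i ← i + 1

5. If first_free is NONE:
    raise Exception("No free slot found")
6. table[first_free] ← (key, value)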


In [None]:
 R-10.17 Modify our ProbeHashMap to use quadratic probing.

In [None]:
from .hash_map_base import HashMapBase

class QuadraticProbeHashMap(HashMapBase):
    
    # sentinel for deleted slots
    _AVAIL = object()

    def _is_available(self, j):
        """Return True if index j is available in table."""
        return self._table[j] is None or self._table[j] is QuadraticProbeHashMap._AVAIL

    def _find_slot(self, j, k):
        """Search for key k in bucket at index j using quadratic probing.

        Return (success, index) tuple, where success is True if found, False otherwise.
        """
        first_avail = None
        i = 0
        while True:
            pos = (j + i*i) % len(self._table)
            if self._is_available(pos):
                if first_avail is None:
                    first_avail = pos
                if self._table[pos] is None:
                    return (False, first_avail)
            elif self._table[pos]._key == k:
                return (True, pos)
            i += 1
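            if i == len(self._table):
                if first_avail is not None:
                    return (False, first_avail)  # reuse a deleted slot seen earlier
                raise RuntimeError("Quadratic probing found no free slot")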

    def __getitem__(self, k):
        j = self._hash_function(k)
        found, s = self._find_slot(j, k)
        if not found:
            raise KeyError('Key Error: ' + repr(k))
        return self._table[s]._value

    def __setitem__(self, k, v):
        j = self._hash_function(k)
        found, s = self._find_slot(j, k)
        if found:
            self._table[s]._value = v
        else:
            self._table[s] = self._Item(k, v)
            self._n += 1
        if self._n > len(self._table) // 2:
            self._resize(2 * len(self._table) - 1)

    def __delitem__(self, k):
        j = self._hash_function(k)
        found, s = self._find_slot(j, k)
        if not found:
            raise KeyError('Key Error: ' + repr(k))
        self._table[s] = QuadraticProbeHashMap._AVAIL
        self._n -= 1


In [None]:
R-10.18 Explain why a hash table is not suited to implement a sorted map.


In [None]:
 I’d say that a hash table isn’t really a good choice for implementing a sorted map because hash tables don’t keep any order of the keys. They’re great when you want to quickly add, get, or remove things by key, but they don’t store the keys in any kind of sorted structure. So, if you wanted to do something like get the smallest or largest key, or go through the keys in order, a hash table would make that really hard or slow. You’d have to manually sort all the keys every time, which isn’t efficient. Instead, if you need to keep things sorted, it’s better to use something like a balanced binary search tree or another structure that keeps keys in order as you go. Hash tables are super useful, just not for sorted data.

In [None]:
 R-10.19 Describe how a sorted list implemented as a doubly linked list could be
 used to implement the sorted map ADT.


In [None]:
I’d say you could use a sorted list implemented as a doubly linked list to build a sorted map by keeping key-value pairs in order by their keys. Since it’s a doubly linked list, you can go forward and backward, which helps with operations like finding the next or previous key. When inserting a new key-value pair, you'd walk through the list to find the correct position to keep everything sorted. That takes O(n) time in the worst case, but once it’s in, everything stays in order. Getting, setting, or deleting a key also takes O(n) time because you might have to search linearly. It’s not the most efficient way for large maps, but it works well if you don’t have a ton of elements and still want to keep them sorted without using a tree structure.
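
As a rough illustration of this idea, the sketch below keeps the entries in a doubly linked list with head and tail sentinels. `LinkedSortedMap` and its method names are hypothetical and cover only a few sorted map operations; the point is that the search is an O(n) walk, while the splice itself and `find_min`/`find_max` take O(1).

```python
class LinkedSortedMap:
    """Hypothetical sorted map kept as a doubly linked list with sentinels."""

    class _Node:
        __slots__ = ('key', 'value', 'prev', 'next')

        def __init__(self, key=None, value=None):
            self.key, self.value = key, value
            self.prev = self.next = None

    def __init__(self):
        self._head = self._Node()          # sentinel before the smallest key
        self._tail = self._Node()          # sentinel after the largest key
        self._head.next, self._tail.prev = self._tail, self._head
        self._n = 0

    def _find_ge(self, k):
        """Return the first node with key >= k (or the tail sentinel)."""
        node = self._head.next
        while node is not self._tail and node.key < k:
            node = node.next
        return node

    def __len__(self):
        return self._n

    def __getitem__(self, k):
        node = self._find_ge(k)
        if node is self._tail or node.key != k:
            raise KeyError(k)
        return node.value

    def __setitem__(self, k, v):
        succ = self._find_ge(k)            # O(n) walk to the insertion point
        if succ is not self._tail and succ.key == k:
            succ.value = v
            return
        node = self._Node(k, v)            # O(1) splice in front of succ
        node.prev, node.next = succ.prev, succ
        succ.prev.next = node
        succ.prev = node
        self._n += 1

    def __delitem__(self, k):
        node = self._find_ge(k)
        if node is self._tail or node.key != k:
            raise KeyError(k)
        node.prev.next, node.next.prev = node.next, node.prev
        self._n -= 1

    def __iter__(self):
        node = self._head.next
        while node is not self._tail:
            yield node.key
            node = node.next

    def find_min(self):
        """Smallest (key, value) pair in O(1), or None if empty."""
        node = self._head.next
        return None if node is self._tail else (node.key, node.value)

    def find_max(self):
        """Largest (key, value) pair in O(1), or None if empty."""
        node = self._tail.prev
        return None if node is self._head else (node.key, node.value)

m = LinkedSortedMap()
for k in [30, 10, 20]:
    m[k] = str(k)
del m[20]
print(list(m), m.find_min(), m.find_max())
```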

In [None]:
R-10.20 What is the worst-case asymptotic running time for performing n deletions
from a SortedTableMap instance that initially contains 2n entries?


In [None]:
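The worst-case asymptotic running time is O(n²). Each deletion finds the key in O(log n) time by binary search but then shifts every later entry down one cell. If we always delete the smallest key, the deletions shift 2n − 1, 2n − 2, ..., n entries, which adds up to Θ(n²).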

In [None]:
R-10.21 Consider the following variant of the `_find_index` method from Code Fragment 10.8,
in the context of the SortedTableMap class:

    def _find_index(self, k, low, high):
        if high < low:
            return high + 1
        else:
            mid = (low + high) // 2
            if self._table[mid]._key < k:
                return self._find_index(k, mid + 1, high)
            else:
                return self._find_index(k, low, mid - 1)

Does this always produce the same result as the original version? Justify
your answer.


In [None]:
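Yes, this version produces the same result. Both versions return the index of the leftmost entry whose key is greater than or equal to k (or high + 1 if no such entry exists in the range). The variant simply does not stop early when it finds k at mid: it keeps searching to the left, where every key is smaller than k because a map's keys are unique, so the recursion ends by returning mid anyway. The only difference is cost: the variant always makes O(log n) recursive calls, even when the original would have found k right away.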
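
One way to test the question empirically is to run both versions on every small sorted table and compare the indices they return. In the sketch below, both functions work on a plain tuple of keys rather than on `_Item` objects, and `find_index_original` is a reconstruction in the style of Code Fragment 10.8; treat both as assumptions.

```python
from itertools import combinations

def find_index_original(table, k, low, high):
    """Three-way version in the style of Code Fragment 10.8 (keys only)."""
    if high < low:
        return high + 1
    mid = (low + high) // 2
    if k == table[mid]:
        return mid
    elif k < table[mid]:
        return find_index_original(table, k, low, mid - 1)
    else:
        return find_index_original(table, k, mid + 1, high)

def find_index_variant(table, k, low, high):
    """Two-way variant from the exercise."""
    if high < low:
        return high + 1
    mid = (low + high) // 2
    if table[mid] < k:
        return find_index_variant(table, k, mid + 1, high)
    else:
        return find_index_variant(table, k, low, mid - 1)

# Compare on every sorted table of distinct keys drawn from 0, 2, 4, ..., 12,
# searching for present keys, absent keys, and keys beyond both ends.
universe = range(0, 14, 2)
mismatches = 0
for size in range(len(universe) + 1):
    for table in combinations(universe, size):
        for k in range(-1, 14):
            a = find_index_original(table, k, 0, len(table) - 1)
            b = find_index_variant(table, k, 0, len(table) - 1)
            if a != b:
                mismatches += 1
print(mismatches)
```

The check relies on the keys being distinct, which a map guarantees; with duplicate keys the original could return any matching index while the variant returns the leftmost one.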

In [None]:
R-10.22 What is the expected running time of the methods for maintaining a maxima
set if we insert n pairs such that each pair has lower cost and performance
than one before it? What is contained in the sorted map at the end
of this series of operations? What if each pair had a lower cost and higher
performance than the one before it?


In [None]:
If we insert n pairs where each new pair has a lower cost and lower performance than the one before it, no pair dominates any other: an earlier pair performs better but costs more, and a later pair is cheaper but performs worse. Every pair is therefore a maximum, so each insertion adds one entry and removes none. With the sorted map implemented as a skip list, each insertion takes O(log n) expected time, so the whole series takes O(n log n) expected time. (With a SortedTableMap, each new pair would go at the front of the array, costing O(n) per insertion and O(n²) overall.) At the end, the sorted map contains all n pairs.

On the other hand, if each new pair has a lower cost and higher performance than the previous one, the new pair dominates everything already in the map. Each add inserts the new pair and then removes the single pair stored before it, so the map never holds more than two entries and every operation takes O(1) time. The total running time is O(n), and at the end the sorted map contains only the last pair inserted.
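
The contents of the map in both scenarios can be checked with a small simulation. The sketch below is a hypothetical list-based stand-in for the cost-performance database, not the book's class. It assumes the usual definition that a pair (a, b) dominates (c, d) when a ≤ c and b ≥ d. It shows what the map ends up holding, not the running times, since list insertion here is O(n).

```python
from bisect import bisect_left, bisect_right

class MaximaSet:
    """Hypothetical list-based stand-in for the cost-performance database."""

    def __init__(self):
        self._costs = []    # sorted list of costs
        self._perf = {}     # cost -> performance

    def add(self, c, p):
        # The most expensive pair with cost <= c dominates (c, p) if it performs as well
        i = bisect_right(self._costs, c)
        if i > 0 and self._perf[self._costs[i - 1]] >= p:
            return
        if c not in self._perf:
            self._costs.insert(bisect_left(self._costs, c), c)
        self._perf[c] = p
        # Remove more expensive pairs that (c, p) now dominates
        j = bisect_right(self._costs, c)
        while j < len(self._costs) and self._perf[self._costs[j]] <= p:
            del self._perf[self._costs.pop(j)]

    def pairs(self):
        return [(c, self._perf[c]) for c in self._costs]

# Scenario 1: each pair is cheaper and performs worse than the one before it
falling = MaximaSet()
for cost, perf in [(10, 10), (9, 9), (8, 8), (7, 7), (6, 6)]:
    falling.add(cost, perf)

# Scenario 2: each pair is cheaper and performs better than the one before it
improving = MaximaSet()
for cost, perf in [(10, 1), (9, 2), (8, 3), (7, 4), (6, 5)]:
    improving.add(cost, perf)

print(falling.pairs())     # every pair survives
print(improving.pairs())   # only the last pair survives
```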

In [None]:
R-10.23 Draw an example skip list S that results from performing the following
series of operations on the skip list shown in Figure 10.13: del S[38],
S[48] = x, S[24] = y, del S[55]. Record your coin flips, as well.


| Key | Coin Flips | Height |
| --- | ---------- | ------ |
| 48  | H, H, T    | 2      |
| 24  | H, T       | 1      |


In [None]:

R-10.24 Give a pseudo-code description of the `__delitem__` map operation when
using a skip list.


In [None]:
def __delitem__(self, key):
    """Remove the entry with the given key (raise KeyError if absent)."""
    # Step 1: Start at the head and record the last node visited on each level
    current = self._head
    update = [None] * self._height

    # Step 2: Search from the top level down
    for level in reversed(range(self._height)):
        while current._forward[level] is not None and current._forward[level]._key < key:
            current = current._forward[level]
        update[level] = current  # predecessor of key on this level

    # Step 3: The candidate is the next node on level 0
    current = current._forward[0]
    if current is None or current._key != key:
        raise KeyError("Key not found")

    # Step 4: Unlink the node from every level on which it appears
    for level in range(self._height):
        if update[level]._forward[level] is current:
            update[level]._forward[level] = current._forward[level]

    # Step 5: Drop top levels that are now empty
    while self._height > 1 and self._head._forward[self._height - 1] is None:
        self._height -= 1


In [None]:
R-10.25 Give a concrete implementation of the pop method, in the context of a
MutableSet abstract base class, that relies only on the five core set behaviors
described in Section 10.5.2.
 

In [None]:
def pop(self):
    """Remove and return an arbitrary element from the set."""
    for e in self:
        self.discard(e)
        return e
    raise KeyError("pop from an empty set")


In [None]:
R-10.26 Give a concrete implementation of the isdisjoint method in the context
of the MutableSet abstract base class, relying only on the five primary
abstract methods of that class. Your algorithm should run in O(min(n, m))
where n and m denote the respective cardinalities of the two sets.


In [None]:
def isdisjoint(self, other):
    """Return True if self and other have no elements in common."""
    # Loop over the smaller of the two sets
    if len(self) <= len(other):
        for e in self:
            if e in other:
                return False
    else:
        for e in other:
            if e in self:
                return False
    return True


In [None]:
R-10.27 What abstraction would you use to manage a database of friends' birthdays
in order to support efficient queries such as "find all friends whose
birthday is today" and "find the friend who will be the next to celebrate a
birthday"?

In [None]:
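birthday_map = SortedMap()   # placeholder for any sorted map, e.g. SortedTableMap
# Key: (month, day)
# Value: list of names, e.g. ['Alice', 'Bob']
# "Birthday today" is a lookup by key; "next birthday" is a find_ge query,
# wrapping around to the smallest key after December.
birthday_map[(10, 6)] = ['Donovan']   # no leading zeros: 06 is a syntax error in Python 3
birthday_map[(1, 20)] = ['Katelin']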
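
For something runnable without the book's classes, the same idea can be sketched with the standard `bisect` module. `BirthdayBook` and its methods are hypothetical names; the sorted list of (month, day) keys plays the role of the sorted map, and each key maps to a list of names so that friends can share a birthday.

```python
from bisect import bisect_left, insort

class BirthdayBook:
    """Hypothetical sketch: sorted (month, day) keys, each mapped to a list of names."""

    def __init__(self):
        self._days = []      # sorted list of (month, day) keys
        self._names = {}     # (month, day) -> list of names

    def add(self, month, day, name):
        key = (month, day)
        if key not in self._names:
            insort(self._days, key)
            self._names[key] = []
        self._names[key].append(name)

    def on(self, month, day):
        """All friends whose birthday is on the given date."""
        return list(self._names.get((month, day), []))

    def next_after(self, month, day):
        """Next birthday on or after the given date, wrapping past December."""
        if not self._days:
            return None
        i = bisect_left(self._days, (month, day))
        key = self._days[i % len(self._days)]
        return key, list(self._names[key])

book = BirthdayBook()
book.add(10, 6, 'Donovan')
book.add(1, 20, 'Katelin')
print(book.on(10, 6))
print(book.next_after(11, 1))   # wraps around to January
```

The modulo in `next_after` handles the end of the year: a query after the last stored date falls back to the earliest one.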
