# Maps, Hash Tables, and Skip Lists

Key topics will include:
* Maps and Dictionaries
* Hash Tables
* Sorted Maps
* Skip Lists
* Sets, Multisets, and Multimaps

# Notes

In [1]:
# packages and data
import pandas as pd, numpy as np

## Maps & Dictionaries

* Python’s dict class is arguably the most significant data structure in the language. Because dictionaries express a relationship between keys and values, they are commonly known as associative arrays or maps
* Example usage of dictionary features is below:

In [2]:
m = {}
print(m)

# add key-value pairs
m['a'] = '1'
m['b'] = '2'
print(m)

# return a value
print(m['a'])

# get length of dictionary
print(len(m))

# get keys
print(m.keys())

# get values
print(m.values())

{}
{'a': '1', 'b': '2'}
1
2
dict_keys(['a', 'b'])
dict_values(['1', '2'])


In [3]:
# sample application of a dictionary
# count how many times each word occurs in a piece of text
text = 'THE end of the world was near! There was little time and even less hope - what must the humans do? Humans had caused this anyways'

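def word_count(t: str) -> dict:
    # lowercase the text and split on whitespace; punctuation stays attached to words
    counts = {}
    for word in t.lower().split():
        # get() returns 0 the first time a word is seen
        counts[word] = 1 + counts.get(word, 0)

    return counts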


word_count(text)

{'the': 3,
 'end': 1,
 'of': 1,
 'world': 1,
 'was': 2,
 'near!': 1,
 'there': 1,
 'little': 1,
 'time': 1,
 'and': 1,
 'even': 1,
 'less': 1,
 'hope': 1,
 '-': 1,
 'what': 1,
 'must': 1,
 'humans': 2,
 'do?': 1,
 'had': 1,
 'caused': 1,
 'this': 1,
 'anyways': 1}
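
The output above keeps punctuation attached to words (`'near!'`, `'do?'`) and counts a lone `'-'` as a word. A possible refinement, sketched here with the standard library's `collections.Counter`, strips punctuation before tallying. The names `word_count_clean` and `sample` are illustrative and not part of the original notes.

```python
import string
from collections import Counter


def word_count_clean(t: str) -> Counter:
    # strip leading/trailing punctuation from each whitespace-separated token
    words = (token.strip(string.punctuation) for token in t.lower().split())
    # drop tokens that were pure punctuation, such as a lone "-"
    return Counter(word for word in words if word)


sample = "THE end of the world was near! What must the humans do? - Humans"
counts = word_count_clean(sample)
print(counts.most_common(2))  # [('the', 3), ('humans', 2)]
```

`Counter` is a `dict` subclass, so the usual dictionary operations (`counts['the']`, `len(counts)`, `counts.keys()`) still apply.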

## Hash Tables

* A hash table is a data structure that uses a hash function to locate the bucket that stores each key's value, allowing lookups, insertions, and deletions in expected constant time
* The goal of a hash function, h, is to map each key k to an integer in the range [0, N − 1], where N is the capacity of the bucket array for a hash table
* A hash function is commonly viewed as two steps: a hash code that maps a key k to an integer, and a compression function that maps that integer into the range [0, N − 1]. The second step is needed because the hash code will typically not be suitable for immediate use with a bucket array: it may be negative or may exceed the capacity of the bucket array
    * A good compression function is one that minimizes the number of collisions for a given set of distinct hash codes
    * Two common compression functions are:
        * Division method: maps an integer i to (i mod N), where N, the size of the bucket array, is a fixed positive integer
        * Multiply-Add-and-Divide (or “MAD”) method: maps an integer i to [(ai + b) mod p] mod N, where N is the size of the bucket array, p is a prime number larger than N, and a and b are integers chosen at random from the interval [0, p − 1], with a > 0
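
The two compression methods above can be sketched in a few lines of Python. This is a minimal illustration, not a full hash table: the function names, the bucket count `N = 11`, the fixed seed, and the choice of the Mersenne prime 2^31 − 1 for p are all assumptions made for the example.

```python
import random


def division_compress(hash_code: int, n: int) -> int:
    """Division method: map an integer hash code to an index in [0, n - 1]."""
    # Python's % yields a non-negative result for a positive modulus,
    # so negative hash codes are handled as well
    return hash_code % n


def make_mad_compress(n: int, p: int = 2**31 - 1, seed=None):
    """Build a MAD compression function ((a*i + b) mod p) mod n.

    p is assumed to be a prime larger than n (2**31 - 1 is a Mersenne prime).
    """
    rng = random.Random(seed)
    a = rng.randint(1, p - 1)  # a drawn from [1, p - 1], so a > 0
    b = rng.randint(0, p - 1)  # b drawn from [0, p - 1]

    def compress(hash_code: int) -> int:
        return ((a * hash_code + b) % p) % n

    return compress


N = 11  # illustrative bucket-array capacity
mad_compress = make_mad_compress(N, seed=0)

for key in ["a", "b", "the", 42, -7]:
    code = hash(key)  # built-in hash code; may be negative or far larger than N
    print(key, division_compress(code, N), mad_compress(code))
```

String hash codes are randomized per Python process by default, so the printed indices for the string keys will differ between runs. Both functions always return an index in the range [0, N − 1].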

## Sets, Multisets, and Multimaps

* A set is an unordered collection of elements, without duplicates, that typically supports efficient membership tests. In essence, elements of a set are like keys of a map, but without any auxiliary values
* A multiset (also known as a bag) is a set-like container that allows duplicates
* A multimap is similar to a traditional map, in that it associates values with keys; however, in a multimap the same key can be mapped to multiple values. For example, the index of a book maps a given term to one or more locations at which the term occurs in the book
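
One way to try out these three containers in Python is sketched below, using the built-in `set`, `collections.Counter` as a multiset, and `collections.defaultdict(list)` as a simple multimap. The terms and page numbers are made up for illustration.

```python
from collections import Counter, defaultdict

# set: duplicates are discarded, membership tests are fast
seen = set([3, 1, 3, 2, 1])
print(len(seen), 2 in seen, 5 in seen)  # 3 True False

# multiset (bag): Counter tracks how many times each element occurs
bag = Counter(["a", "b", "a"])
bag["a"] += 1
print(bag["a"], bag["b"], bag["z"])  # 3 1 0

# multimap: each key maps to a list of values
book_index = defaultdict(list)
entries = [("hash table", 12), ("skip list", 40), ("hash table", 57)]  # hypothetical
for term, page in entries:
    book_index[term].append(page)
print(book_index["hash table"])  # [12, 57]
```

Reading a missing key from a `Counter` returns 0 without storing it, whereas reading a missing key from a `defaultdict(list)` inserts an empty list. Use `in` to test multimap membership without that side effect.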