Hash Tables

	Time Complexity	Note
Worst Case:	O(n)	Linear, depending on collision handling
Best Case:	O(1)	Thanks to hashing

Data

Each item must have a unique key— or the value is converted into a unique key through hashing.
Underlying data structure is either an incredibly large array or a linked list

Operations

Hash (maps a unique key to a position in the list)
Insert
Search
Delete

Hashing

Converts a specified word into a unique key that can be used to determine its position in the underlying array.

The goal is to have a function that maps a unique key for any input value— this is to ensure no 'collisions' occur (that is, two differnt words having the same key). A good hash function requires the following:

type dependent (otherwise it cannot compute a value)
must return a value within the array range (otherwise it will return an index error when inserting)
not have too many operations (otherwise slowness)
use as much of the key as possible (minimise collisions)
avoid common factors (minimise collisions)

Below is a universal hash function implemented in Python. It is best to use prime numbers for TABLE_SIZE, a and b.

def hash_function(word, TABLE_SIZE):
    value = 0
    # random seeds
    a = 10651
    b = 10531
    for i in range(len(word)):
        value = (a*value + ord(word[i])) % TABLE_SIZE
        a = a * b % (TABLE_SIZE-1)
    return value

Insertion

Use the hash function to get a position
Try to insert at position
Otherwise, deal with the collision

Search ("Check if in hash table")

Use hash function to get a position
If the query matches, return True
Otherwise, deal with the collision until found
If not found, return False

Delete

Use the hash function to get a position
Search for the item, deal with collisions
If it exists and is found, remove the item

Collision Resolution

Clustering occurs when elements of similar keys create groups in a hash table array (called a probe chain)

This is a problem because when a search occurs for a particular key and the actual key location has another item that doesn't belong there, it has to traverse over the entire cluster— which would affect performance.

Linear Probing

Checking every item in the array until the position's conditions are perfect
Check query position, then look at the next, then the next... wrapping around the entire table until you're back to the original position

Quadratic Probing

Like Linear Probing, but moving in larger and larger steps, wrapping around the list
A problem with probing is that items with similar hash values tend to 'cluster'

Double Hashing

Using a second hash function to determine the probe 'step' to prevent clustering
Should not hash to 0, and the table size and step size are co-prime

Seperate Chaining

Using linked lists for every item in the large array
If a collision occurs, just append a node to the end of the chain in the array position and traverse

Example: Hash Table Dictionary Object

class Dictionary:

	def __init__(self, table_size, prime):
		self.prime = prime
		self.item_count = 0
		self.table_size = table_size
		self.array = [None] * table_size

	def hash(self, key):
		index = 0
		a = 2692
		b = 8523
		for i in range(len(key)):
			index = (a * index + ord(key[i])) % self.table_size
			a = a * b % (self.table_size-1)
		return index

	def is_full(self):
		return self.item_count >= self.table_size

	def insert(self, key, value):
		index = self.hash(key)
		if self.is_full():
			found = False
			end = (index - 1) % self.table_size
			while index != end:
				if self.array[index][0] == key:
					self.array[index] = [key, value]
					found = True
					break
				index = (index + 1) % self.table_size
			if not found:
				raise Exception("Dictionary is full!")
		else: 
			while not self.is_full():
				if self.array[index] != None and self.array[index][0] == key:
					self.array[index] = [key, value]
					break
				if self.array[index] == None:
					self.array[index] = [key, value]
					self.item_count += 1
					break
				index = (index + 1) % self.table_size

	def search(self, key):
		index = self.hash(key)

		if self.array[index][0] == key:
			return self.array[index][1]
		else:
			end = (index - 1) % self.table_size
			if self.array[end][0] == key:
				return self.array[end][1]
			index = (index + 1) % self.table_size
			while index != end:
				if self.array[index][0] == key:
					return self.array[index][1]
				else:
					index = (index + 1) % self.table_size
			raise KeyError(str(key) + " not found!")

Probability of Collision

Given 100 items, the probability of collision for:

23 items is ~50%
50 items is 97%
70 items is 99.9%

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hash-tables.md

hash-tables.md

Hash Tables

Data

Operations

Hashing

Insertion

Search ("Check if in hash table")

Delete

Collision Resolution

Linear Probing

Quadratic Probing

Double Hashing

Seperate Chaining

Example: Hash Table Dictionary Object

Probability of Collision

Files

hash-tables.md

Latest commit

History

hash-tables.md

File metadata and controls

Hash Tables

Data

Operations

Hashing

Insertion

Search ("Check if in hash table")

Delete

Collision Resolution

Linear Probing

Quadratic Probing

Double Hashing

Seperate Chaining

Example: Hash Table Dictionary Object

Probability of Collision