# 🔍 Hash Function Distribution Exploration
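This notebook compares how four hash functions (Python's built-in `hash()`, a length-based hash, an ASCII-sum hash, and SHA-256) spread 1,000 string keys across 10 buckets, then asks you to reflect on their behavior.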

## Setup

In [None]:
import matplotlib.pyplot as plt
from collections import defaultdict
import hashlib
import statistics

## Utility Function: Bucket Distribution

In [None]:
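def bucket_distribution(data, num_buckets, hash_fn):
    """Group items by bucket index, computed as hash_fn(item) % num_buckets."""
    buckets = defaultdict(list)
    for item in data:
        bucket = hash_fn(item) % num_buckets
        buckets[bucket].append(item)
    # Buckets that receive no items are absent until first accessed.
    return buckets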
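
To see what `bucket_distribution` returns, here is a minimal sketch on a toy input. The four sample strings, the bucket count of 3, and the use of `len` as the hash function are illustrative assumptions, not part of the notebook's dataset.

```python
from collections import defaultdict

def bucket_distribution(data, num_buckets, hash_fn):
    # Same logic as the utility above, repeated so this snippet runs on its own.
    buckets = defaultdict(list)
    for item in data:
        buckets[hash_fn(item) % num_buckets].append(item)
    return buckets

# Hypothetical toy input: hash each string by its length, using 3 buckets.
toy_buckets = bucket_distribution(["a", "bb", "ccc", "dd"], 3, len)
for index in sorted(toy_buckets):
    print(index, toy_buckets[index])
```

Here `"bb"` and `"dd"` land in the same bucket because both have length 2, which is the kind of clustering the later examples make visible at scale.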

## Hash Function Definitions

In [None]:
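def poor_hash(key):
    # Depends only on the key's length, so all same-length keys collide.
    return len(key)

def simple_ascii_sum(key):
    # Order-insensitive: keys made of the same characters in any order collide.
    return sum(ord(char) for char in key)

def hash_sha256(key):
    # Interpret the 256-bit SHA-256 digest of the UTF-8 bytes as an integer.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)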

## Dataset and Plotting Utility

In [None]:
data = [f"key{i}" for i in range(1000)]
num_buckets = 10

def plot_distribution(title, counts):
    plt.bar(range(len(counts)), counts)
    plt.title(title)
    plt.xlabel("Bucket")
    plt.ylabel("# of Keys")
    plt.show()

## Example 1: Python Built-in `hash()`

In [None]:
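# Note: str hashes are randomized per process unless PYTHONHASHSEED is set,
# so this plot can change after a kernel restart.
buckets = bucket_distribution(data, num_buckets, hash)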
counts = [len(buckets[i]) for i in range(num_buckets)]
plot_distribution("Bucket Distribution with Built-in hash()", counts)
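
The built-in `hash()` of a string is salted per interpreter process by default, so the chart for this example may differ between kernel restarts. The sketch below illustrates that by computing `hash("key0")` in fresh interpreter processes with different `PYTHONHASHSEED` values. The helper name `hash_in_fresh_interpreter` and the seed values are assumptions made for illustration.

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(key, seed):
    """Return hash(key) as computed by a new Python process with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({key!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

first = hash_in_fresh_interpreter("key0", seed=1)
repeat = hash_in_fresh_interpreter("key0", seed=1)
other = hash_in_fresh_interpreter("key0", seed=2)

print("same seed, same hash:", first == repeat)
print("different seed, same hash:", first == other)
```

Fixing `PYTHONHASHSEED` before starting the kernel is one way to make this example reproducible across runs.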

## Example 2: Poor Hash Function (`len(key)`)

In [None]:
buckets = bucket_distribution(data, num_buckets, poor_hash)
counts = [len(buckets[i]) for i in range(num_buckets)]
plot_distribution("Poor Hash Function: len(key)", counts)
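
One way to understand the plot for `len(key)` is to count how many distinct lengths the dataset contains. The sketch below assumes the same `key0` to `key999` dataset and 10 buckets used in this notebook.

```python
from collections import Counter

data = [f"key{i}" for i in range(1000)]

# Tally how many keys share each length; each length maps to exactly one bucket.
length_counts = Counter(len(key) for key in data)
for length, count in sorted(length_counts.items()):
    print(f"len={length}: {count} keys -> bucket {length % 10}")
```

Because the keys have only three distinct lengths, at most three of the ten buckets can ever be used, and most keys pile into a single bucket.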

## Example 3: ASCII Sum Hash

In [None]:
buckets = bucket_distribution(data, num_buckets, simple_ascii_sum)
counts = [len(buckets[i]) for i in range(num_buckets)]
plot_distribution("Simple ASCII Sum Hash Function", counts)
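
The ASCII-sum hash ignores character order, so keys that are rearrangements of one another always collide. The following sketch, which assumes the same dataset as above, checks one such pair and counts how many distinct hash values the 1,000 keys produce before the modulo is applied.

```python
def simple_ascii_sum(key):
    # Same definition as in the notebook, repeated so the snippet is self-contained.
    return sum(ord(char) for char in key)

# "key12" and "key21" contain the same characters in a different order.
print(simple_ascii_sum("key12") == simple_ascii_sum("key21"))

data = [f"key{i}" for i in range(1000)]
distinct_sums = {simple_ascii_sum(key) for key in data}
print(len(distinct_sums), "distinct hash values for", len(data), "keys")
```

A small number of distinct raw hash values means many keys are indistinguishable to the hash function regardless of how many buckets are available.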

## Example 4: SHA-256 Hash

In [None]:
buckets = bucket_distribution(data, num_buckets, hash_sha256)
counts = [len(buckets[i]) for i in range(num_buckets)]
plot_distribution("SHA-256 Hash Function", counts)

## 🧮 Metrics for Last Distribution

In [None]:
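# `counts` holds whichever distribution was computed most recently
# (SHA-256 if the cells above were run in order).
std_dev = statistics.stdev(counts)  # sample standard deviation of bucket sizes
max_bucket = max(counts)
# Count every key beyond the first in each bucket as one collision.
collisions = sum(count - 1 for count in counts if count > 1)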

print("=== Distribution Metrics ===")
print(f"Standard Deviation: {std_dev:.2f}")
print(f"Max Bucket Size: {max_bucket}")
print(f"Collisions: {collisions}")
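
Since the cell above only measures the most recent `counts`, it can help to compute the same three metrics for every hash function side by side. This is a minimal sketch under that assumption. The helper name `distribution_metrics` and the `hash_functions` dictionary are hypothetical names introduced here, not part of the original notebook.

```python
import hashlib
import statistics

def poor_hash(key):
    return len(key)

def simple_ascii_sum(key):
    return sum(ord(char) for char in key)

def hash_sha256(key):
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

def distribution_metrics(data, num_buckets, hash_fn):
    """Return the notebook's three metrics for one hash function."""
    counts = [0] * num_buckets
    for item in data:
        counts[hash_fn(item) % num_buckets] += 1
    return {
        "std_dev": statistics.stdev(counts),
        "max_bucket": max(counts),
        "collisions": sum(count - 1 for count in counts if count > 1),
    }

data = [f"key{i}" for i in range(1000)]
hash_functions = {
    "built-in hash()": hash,
    "poor_hash": poor_hash,
    "simple_ascii_sum": simple_ascii_sum,
    "hash_sha256": hash_sha256,
}

all_metrics = {
    name: distribution_metrics(data, 10, fn) for name, fn in hash_functions.items()
}
for name, m in all_metrics.items():
    print(f"{name:<18} std_dev={m['std_dev']:7.2f}  "
          f"max={m['max_bucket']:4d}  collisions={m['collisions']}")
```

With 1,000 keys and only 10 buckets, the collision count is at least 990 for any hash function, so the standard deviation and maximum bucket size are the more informative numbers in this setup.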

## ✍️ Reflective Prompt #1

**Observation-based Prompt:**

Looking at the bucket distributions across the different hash functions, which one surprised you the most and why? What patterns or anomalies did you observe that you did not expect?

In [None]:
# Write your reflection here


## ✍️ Reflective Prompt #2

**Curiosity Prompt:**

If you had to design your own hash function, what strategies would you use to ensure a balanced bucket distribution? What kind of inputs might break a poorly designed hash function?

In [None]:
# Write your response here
