# Applications of Hash Tables
Hash tables are a fundamental data structure in computer science, prized for their average constant time (O(1)) performance for insertion, deletion, and retrieval operations. 

## Bloom Filters and Analysis
In computing, a Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". [Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter#:~:text=In%20computing%2C%20a%20Bloom%20filter,a%20member%20of%20a%20set.)
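The cells in this section use a `BloomFilter` class whose definition is not shown here. The following is a minimal sketch of what such a class might look like, not the original implementation. It assumes the standard sizing formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2, truncated to integers, and derives the k bit positions from a single SHA-256 digest by double hashing. The method names (`add`, `info`, `in` support) mirror the calls used in the cells; everything else is an illustrative assumption.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; a sketch, not the notebook's own class."""

    def __init__(self, capacity, false_positive_rate):
        # Standard sizing formulas; truncating to int is an assumption.
        self.m = max(1, int(-capacity * math.log(false_positive_rate) / math.log(2) ** 2))
        self.k = max(1, int(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # m bits, packed into bytes
        self.count = 0                            # number of add() calls

    def _positions(self, item):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        # True only if every one of the k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def info(self):
        # Approximate false-positive probability: (1 - e^(-k*n/m))^k.
        fp = (1 - math.exp(-self.k * self.count / self.m)) ** self.k
        return {"m_bits": self.m, "k_hashes": self.k,
                "items": self.count, "fp_rate_estimate": round(fp, 6)}
```

Because `add` only ever sets bits and a lookup requires all k bits to be set, an inserted item can never be reported as absent. That is why false negatives are impossible, while unrelated items that happen to hit already-set bits produce false positives.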

### Example: How It Works


In [None]:
# With few items relative to the filter's capacity, the estimated false-positive rate is effectively 0
bf = BloomFilter(capacity=100, false_positive_rate=0.01)
print("Bloom filter params:", bf.info())

# Insert some items
for word in ["kartal", "savalan", "asena", "arsalan"]:
    bf.add(word)

# Membership checks
for q in ["kartal", "savalan", "asena", "arsalan"]:
    print(f"{q!r} in BloomFilter? ", (q in bf))

print("Updated stats:", bf.info())
# Expected Output:
# Bloom filter params: {'m_bits': 958, 'k_hashes': 6, 'items': 0, 'fp_rate_estimate': 0.0}
# 'kartal' in BloomFilter?  True
# 'savalan' in BloomFilter?  True
# 'asena' in BloomFilter?  True
# 'arsalan' in BloomFilter?  True
# Updated stats: {'m_bits': 958, 'k_hashes': 6, 'items': 4, 'fp_rate_estimate': 0.0}


In [None]:
# This cell reuses bf from the previous cell (capacity=100).
# Inserting far more items than the capacity makes the false-positive rate significant.



import random
import string

# Insert 500 random strings
inserted = set()
def rand_word():
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(8))

for _ in range(500):
    w = rand_word()
    inserted.add(w)
    bf.add(w)

print("Params:", bf.info())

# Now query 10_000 random words, skipping any that were inserted, and measure false positives
tests = 10_000
false_pos = 0
unseen = 0
for _ in range(tests):
    w = rand_word()
    if w in inserted:
        continue  # skip words that were actually inserted
    unseen += 1
    if w in bf:   # BloomFilter says True for an unseen word -> false positive
        false_pos += 1

observed_p = false_pos / unseen  # divide by the number of unseen queries only
print(f"Observed false-positive rate ≈ {observed_p:.3f}")

# Expected Output:
# Params: {'m_bits': 958, 'k_hashes': 6, 'items': 351, 'fp_rate_estimate': 0.493679}
# Observed false-positive rate ≈ 0.791
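
The `fp_rate_estimate` values in the outputs above are consistent with the standard approximation p ≈ (1 - e^(-k·n/m))^k, where m is the number of bits, k the number of hash functions, and n the number of inserted items. The sketch below evaluates that formula for the reported parameters. The function name `bloom_fp_estimate` is hypothetical, and whether the `BloomFilter` class computes its estimate exactly this way is an assumption.

```python
import math


def bloom_fp_estimate(m_bits, k_hashes, n_items):
    """Approximate false-positive probability: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Parameters taken from the outputs above: m = 958 bits, k = 6 hashes.
print(round(bloom_fp_estimate(958, 6, 4), 6))    # after the first 4 insertions
print(round(bloom_fp_estimate(958, 6, 351), 3))  # with the reported items = 351
print(round(bloom_fp_estimate(958, 6, 504), 3))  # with all 4 + 500 insertions
```

With n = 351 the formula gives about 0.494, matching the reported estimate. With n = 504 (all 4 + 500 insertions) it gives about 0.77, which is closer to the observed rate of 0.791. How the class arrives at its `items` value is not visible in this section.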


### String Matching Problem

Does a given pattern 'p' of size m occur in a string 'S' of size n?

Example:

p = 'ATGAT'
S = 'GTGTGTATATAT'

When p and S are large, the naive approach becomes expensive: comparing p against each of the n-m+1 possible positions in S takes Θ(m · (n-m+1)) time in the worst case.
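
As a baseline, here is a minimal sketch of the naive approach. The function name `naive_search` is illustrative, and positions are 0-based. The outer loop visits each of the n-m+1 alignments and the inner loop compares up to m characters, which is where the m · (n-m+1) bound comes from.

```python
def naive_search(pattern, text):
    """Return every 0-based index at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    matches = []
    for s in range(n - m + 1):          # each possible alignment
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1                      # up to m character comparisons
        if j == m:
            matches.append(s)
    return matches


print(naive_search("ATGAT", "GTGTGTATATAT"))  # the example above
print(naive_search("ATAT", "GTGTGTATATAT"))   # a pattern that does occur
```

For the example above, 'ATGAT' does not occur in S, so the first call returns an empty list.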

A hash function can reduce this cost.

The idea is:

#### Using a Rolling Hash Function

```
compute pattern hash: r = h(p)
for each s = 1, ..., n-m+1:
    compute: q = h(S[s]...S[s+m-1])
    if r = q:
        compare: S[s]...S[s+m-1] with p
```

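If no hash collisions occur, and each window's hash is updated from the previous one in constant time (the rolling step), the total running time is Θ(m + n).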
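
The pseudocode above can be sketched in Python as follows. This is one common way to realise it (a Rabin-Karp style polynomial hash). The function name `rabin_karp`, the base 256, and the modulus 1_000_000_007 are illustrative choices, not taken from the text. Positions are 0-based here, whereas the pseudocode counts from 1.

```python
def rabin_karp(pattern, text, base=256, mod=1_000_000_007):
    """Return every 0-based index at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the window's leading character

    # r = h(p), q = h(first window of S), both computed in O(m).
    r = q = 0
    for i in range(m):
        r = (r * base + ord(pattern[i])) % mod
        q = (q * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Compare characters only when the hashes agree (rules out collisions).
        if r == q and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Rolling step: drop text[s], append text[s + m], in O(1).
            q = ((q - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(rabin_karp("ATGAT", "GTGTGTATATAT"))
print(rabin_karp("ATAT", "GTGTGTATATAT"))
```

The explicit character comparison after a hash match keeps the result correct even when two different windows hash to the same value. Such collisions only cost extra time, up to the naive bound in the worst case.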


