# HLLSet Manifold - Quick Start Guide

This notebook walks through the basic functionality of the HLLSet Manifold system using the C/Cython backend: creating HLLSets, set operations, batch processing, kernel operations, and immutability.

## 1. Basic HLLSet Operations

An HLLSet is an immutable probabilistic set that uses HyperLogLog for cardinality estimation, so the sizes it reports (for example `|A|≈3.0`) are estimates rather than exact counts.

In [1]:
from core import HLLSet

# Create HLLSets from batches of tokens
hll1 = HLLSet.from_batch(['apple', 'banana', 'cherry'])
hll2 = HLLSet.from_batch(['banana', 'cherry', 'date', 'elderberry'])

print(f"HLL1: {hll1}")
print(f"HLL2: {hll2}")
print(f"\nBackend: {hll1.backend}")

HLL1: HLLSet(0c0af08e..., |A|≈3.0, backend=C/Cython)
HLL2: HLLSet(11a37d68..., |A|≈4.0, backend=C/Cython)

Backend: C/Cython
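
To make the idea of HyperLogLog cardinality estimation concrete, here is a minimal pure-Python sketch. It is not the C/Cython backend and makes no claim about how `HLLSet` is implemented: the function names `hll_registers` and `hll_estimate`, the precision `p=10`, and the use of SHA-1 as the hash are all assumptions chosen for illustration.

```python
import hashlib
import math


def hll_registers(tokens, p=10):
    """Build 2**p HyperLogLog registers from an iterable of string tokens."""
    m = 1 << p
    registers = [0] * m
    for token in tokens:
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)                 # top p bits select a register
        rest = h & ((1 << (64 - p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    return registers


def hll_estimate(registers):
    """Estimate cardinality, using linear counting for small sets."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)        # small-range correction
    return raw


if __name__ == "__main__":
    regs = hll_registers(["apple", "banana", "cherry"])
    print(f"Estimated cardinality: {hll_estimate(regs):.2f}")
```

Duplicate tokens hash to the same register and rank, so adding a token twice leaves the registers unchanged. That is what lets a fixed-size register array stand in for a set.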


## 2. Set Operations

All operations return new immutable instances.

In [2]:
# Union
union = hll1.union(hll2)
print(f"Union cardinality: {union.cardinality():.2f}")
print("Expected: ~5 (apple, banana, cherry, date, elderberry)")

# Similarity (Jaccard)
similarity = hll1.similarity(hll2)
print(f"\nJaccard similarity: {similarity:.2%}")
print("(2 common / 5 total = ~40%)")

# Cosine similarity
cosine = hll1.cosine(hll2)
print(f"\nCosine similarity: {cosine:.2%}")

Union cardinality: 5.01
Expected: ~5 (apple, banana, cherry, date, elderberry)

Jaccard similarity: 39.90%
(2 common / 5 total = ~40%)

Cosine similarity: 57.64%
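
For comparison, the same quantities can be computed exactly on ordinary Python sets. The sketch below uses inclusion-exclusion to get the intersection size from the union, which is the usual route for sketches that only support union and cardinality. The helper name `set_similarities` is hypothetical, and the cosine formula (intersection size over the geometric mean of the two sizes) is an assumption: it gives a value close to the notebook's output, but the library may compute cosine differently, for instance over register vectors.

```python
import math


def set_similarities(a, b):
    """Return (jaccard, cosine) for two exact sets."""
    union_size = len(a | b)
    if union_size == 0 or not a or not b:
        return 0.0, 0.0
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection_size = len(a) + len(b) - union_size
    jaccard = intersection_size / union_size
    # Assumed set-cosine definition: |A ∩ B| / sqrt(|A| * |B|)
    cosine = intersection_size / math.sqrt(len(a) * len(b))
    return jaccard, cosine


if __name__ == "__main__":
    a = {"apple", "banana", "cherry"}
    b = {"banana", "cherry", "date", "elderberry"}
    jaccard, cosine = set_similarities(a, b)
    print(f"Jaccard: {jaccard:.2%}")  # 40.00%
    print(f"Cosine:  {cosine:.2%}")   # 57.74%
```

The exact Jaccard value of 40.00% sits next to the estimated 39.90% above; the small gap comes from HyperLogLog estimation error.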


## 3. Batch Processing

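`HLLSet.from_batches` builds a single HLLSet from several batches of tokens. Pass `parallel=True` to process the batches concurrently; the result is the same either way.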

In [3]:
# Multiple batches
batches = [
    ['token1', 'token2', 'token3'],
    ['token4', 'token5', 'token6'],
    ['token7', 'token8', 'token9']
]

# Sequential processing
hll_seq = HLLSet.from_batches(batches, parallel=False)
print(f"Sequential: {hll_seq.cardinality():.2f} distinct tokens")

# Parallel processing (thread-safe C backend!)
hll_par = HLLSet.from_batches(batches, parallel=True)
print(f"Parallel: {hll_par.cardinality():.2f} distinct tokens")

Sequential: 9.04 distinct tokens
Parallel: 9.04 distinct tokens
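
One way to see why sequential and parallel processing agree is that merging sketches is commutative and associative, so the order in which per-batch results arrive does not matter. The sketch below illustrates this with exact `frozenset` objects standing in for HLL sketches (where the merge would be a register-wise maximum). The names `sketch_batch`, `merge`, and `from_batches_sketch` are hypothetical, and the use of `ThreadPoolExecutor` is an assumption, not a description of how `HLLSet.from_batches` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def sketch_batch(batch):
    """Stand-in for building one per-batch sketch; here an exact frozenset."""
    return frozenset(batch)


def merge(left, right):
    """Stand-in for the sketch merge; set union is order-independent."""
    return left | right


def from_batches_sketch(batches, parallel=False, max_workers=4):
    """Build per-batch sketches (optionally in threads), then merge them."""
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            partials = list(pool.map(sketch_batch, batches))
    else:
        partials = [sketch_batch(batch) for batch in batches]
    return reduce(merge, partials, frozenset())


if __name__ == "__main__":
    batches = [
        ["token1", "token2", "token3"],
        ["token4", "token5", "token6"],
        ["token7", "token8", "token9"],
    ]
    seq = from_batches_sketch(batches, parallel=False)
    par = from_batches_sketch(batches, parallel=True)
    print(len(seq), len(par), seq == par)  # 9 9 True
```

Because each batch produces its own immutable partial result and the merge happens afterwards, the workers share no mutable state in this sketch.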


## 4. Kernel Operations

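The `Kernel` provides stateless transformation operations: it builds HLLSets from tokens (`absorb`) and combines or compares them (`union`, `similarity`) without holding any state of its own.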

In [4]:
from core import Kernel

kernel = Kernel()

# Absorb tokens into HLLSet
hll_a = kernel.absorb(['hello', 'world', 'test'])
hll_b = kernel.absorb(['hello', 'python', 'code'])

print(f"HLL A: {hll_a.cardinality():.2f} tokens")
print(f"HLL B: {hll_b.cardinality():.2f} tokens")

# Union via kernel
hll_union = kernel.union(hll_a, hll_b)
print(f"\nUnion: {hll_union.cardinality():.2f} tokens")

# Similarity via kernel
sim = kernel.similarity(hll_a, hll_b)
print(f"Similarity: {sim:.2%}")

HLL A: 3.00 tokens
HLL B: 3.00 tokens

Union: 5.01 tokens
Similarity: 19.88%


## 5. Immutability

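HLLSets are immutable: operations always return new instances and leave the original untouched. Each instance also has a content-addressed name, so a set with different contents gets a different name.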

In [5]:
# Base HLLSet
base = HLLSet.from_batch(['a', 'b', 'c'])
print(f"Base: {base.cardinality():.2f}")

# Add more tokens (returns new instance)
modified = HLLSet.add(base, ['d', 'e', 'f'])
print(f"Modified: {modified.cardinality():.2f}")

# Base is unchanged
print(f"Base after 'add': {base.cardinality():.2f} (unchanged!)")

# Content-addressed names
print(f"\nBase name: {base.name}")
print(f"Modified name: {modified.name}")
print(f"Names are different: {base.name != modified.name}")

Base: 3.00
Modified: 6.02
Base after 'add': 3.00 (unchanged!)

Base name: ab84220c03670b6215d0ab3886ddf25a949ec7aa
Modified name: 1545507c0ebf3bc162a9024b6677b3d631c03471
Names are different: True
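
The combination of immutability and content-addressed naming can be sketched with a frozen dataclass. Everything here is hypothetical: the class `FrozenSketch`, its `merged_with` method, and the naming scheme (SHA-1 over the register bytes, chosen only because the names above are 40 hex characters long) are assumptions for illustration, not the library's actual design.

```python
import hashlib
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenSketch:
    """Hypothetical immutable sketch; `registers` stands in for HLL registers."""
    registers: tuple

    @property
    def name(self):
        # Assumed content address: SHA-1 hex digest of the register bytes.
        return hashlib.sha1(bytes(self.registers)).hexdigest()

    def merged_with(self, other):
        # Register-wise max; returns a new instance and mutates neither input.
        merged = tuple(max(x, y) for x, y in zip(self.registers, other.registers))
        return FrozenSketch(merged)


if __name__ == "__main__":
    base = FrozenSketch((0, 3, 1, 0))
    modified = base.merged_with(FrozenSketch((2, 1, 1, 5)))
    print(base.registers)                 # (0, 3, 1, 0)
    print(modified.registers)             # (2, 3, 1, 5)
    print(base.name != modified.name)     # True
    try:
        base.registers = (9, 9, 9, 9)     # attribute assignment is rejected
    except FrozenInstanceError:
        print("base cannot be modified in place")
```

Deriving the name from the contents means two sketches with identical registers share a name, which makes equality checks and deduplication cheap.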


## Next Steps

Explore other notebooks:
- **02_n_token_algorithm.ipynb** - N-token generation and disambiguation
- **03_adjacency_matrix.ipynb** - Order preservation with adjacency matrices
- **04_kernel_entanglement.ipynb** - Kernel operations and entanglement
- **05_manifold_os.ipynb** - ManifoldOS and driver management