# HashComb — Overview & API
A compact reference notebook: how HashComb works, CLI usage, and a minimal example for every public function.

## How HashComb works
- Build a balanced binary tree over a numeric range `[min, max]`.
- Each node represents a sub-interval of that range; the leaf nodes are the bins.
- A value maps to a root→leaf path, and each node on that path can be hashed into a token.
- Three signatures are available: leaf hash (a single token), prefix multihash (the first $k$ tokens of the path), and full-path multihash (all tokens).
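
The steps above can be sketched without the library. This is a minimal illustration, assuming each level halves the current interval and that the path starts just below the root; `interval_path` and `node_token` are hypothetical helpers, and the token format (first 8 bytes of SHA3-256 over the interval bounds) is an assumption rather than HashComb's actual hashing scheme.

```python
import hashlib


def interval_path(value, lo, hi, depth):
    """Return the (lo, hi) intervals visited from just below the root to the leaf."""
    if not lo <= value <= hi:
        raise ValueError("value lies outside the tree's range")
    path = []
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        if value < mid:
            hi = mid  # descend into the left half
        else:
            lo = mid  # descend into the right half
        path.append((lo, hi))
    return path


def node_token(interval, salt=""):
    """Hypothetical token: first 8 bytes of SHA3-256 over the salted bounds."""
    lo, hi = interval
    digest = hashlib.sha3_256(f"{salt}|{lo!r}|{hi!r}".encode()).digest()
    return str(int.from_bytes(digest[:8], "big"))


path = interval_path(3.7, 0.0, 10.0, depth=3)
tokens = [node_token(iv) for iv in path]
centers = [(a + b) / 2.0 for a, b in path]

leaf_hash = tokens[-1]       # leaf hash: a single token
prefix_multihash = tokens[:2]  # prefix multihash with k = 2
full_multihash = tokens      # full-path multihash

print(path)     # [(0.0, 5.0), (2.5, 5.0), (2.5, 3.75)]
print(centers)  # [2.5, 3.75, 3.125]
```

A shorter prefix reveals a coarser interval: the two-token prefix only pins the value to `[2.5, 5.0)`, while the leaf pins it to `[2.5, 3.75)`.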

## CLI usage (hashcomb)
Commands:
- `encode`: build an encoder and output a leaf token, path, or prefix.
- `decode`: decode a leaf token or path/prefix.

Key flags:
- `--channels`, `--min`, `--max`, `--value`
- `--mode leaf|path|prefix` (use `--prefix-length` for `prefix`)
- `--randomized`, `--p`, `--target-level`, `--seed`
- `--include-internal`, `--delta`, `--salt`

Entry point: `python -m hashcomb.cli ...`
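
As a rough illustration of how the flags combine, the sketch below assembles an `encode` command line from the flags listed above and prints it without running it. `build_encode_argv` is a hypothetical helper, and the assumption that these flags are sufficient for an `encode` call has not been checked against the CLI itself.

```python
import shlex


def build_encode_argv(channels, lo, hi, value, mode="leaf", prefix_length=None):
    """Assemble an argv list for the encode subcommand (hypothetical helper)."""
    argv = [
        "python", "-m", "hashcomb.cli", "encode",
        "--channels", str(channels),
        "--min", str(lo),
        "--max", str(hi),
        "--value", str(value),
        "--mode", mode,
    ]
    if mode == "prefix":
        # The flag list pairs --prefix-length with prefix mode.
        if prefix_length is None:
            raise ValueError("prefix mode needs a prefix length")
        argv += ["--prefix-length", str(prefix_length)]
    return argv


argv = build_encode_argv(3, 0.0, 10.0, 3.7, mode="prefix", prefix_length=2)
print(shlex.join(argv))  # shell-quoted command line, not executed here
```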

In [1]:
from pathlib import Path
import sys
import numpy as np
import pandas as pd

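# Walk up from the working directory (at most 6 levels) to find the repo root,
# identified by its pyproject.toml.
root = Path.cwd()
repo_root = root
for _ in range(6):
    if (repo_root / "pyproject.toml").exists():
        break
    if repo_root.parent == repo_root:  # reached the filesystem root
        break
    repo_root = repo_root.parent

# Make the in-repo package importable when it is not installed.
src_dir = repo_root / "src"
if src_dir.exists():
    sys.path.append(str(src_dir))

# Scratch directory for the files written by the cells below.
artifacts_dir = repo_root / "notebooks" / "artifacts"
artifacts_dir.mkdir(parents=True, exist_ok=True)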

from hashcomb import (
    Encoder,
    RandomizedEncoder,
    Decoder,
    Tree,
    Node,
    RoundContext,
    PklIO,
    CsvIO,
)
from hashcomb.core.hash import Hash
from hashcomb.addons import aggregate_ciphertexts, serialize_path, deserialize_path

## Encoder + Decoder
**Encoder**
- `Encoder.__init__`: build the tree and its config.
- `Encoder.from_pkl`: load an encoder from a pickled config file.
- `Encoder.from_config`: load an encoder from an in-memory config dict.
- `encode`, `encodePath`, `encodePrefix`.
- `encodeArray`, `encodePathArray`, `encodePrefixArray`.

**Decoder**
- `Decoder.__init__`: load the config from `configPath`.
- `decode`, `decodePath`.
- `decodeArray`, `decodePathArray`.

In [2]:
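config_path = artifacts_dir / "api_config.pkl"
# 3 channels over the range [0.0, 10.0]; the positional order appears to be
# (channels, max, min). The Decoder loads the same config from config_path.
enc = Encoder(3, 10.0, 0.0, configPath=str(config_path), includeInternal=True)
dec = Decoder(configPath=str(config_path))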

value = 3.7
leaf = enc.encode(value)
path = enc.encodePath(value)
prefix = enc.encodePrefix(value, length=2)

arr_leaf = enc.encodeArray([1.0, 2.0])
arr_path = enc.encodePathArray([1.0, 2.0])
arr_prefix = enc.encodePrefixArray([1.0, 2.0], length=2)

leaf_center = dec.decode(leaf)
prefix_center = dec.decodePath(prefix)
arr_centers = dec.decodeArray(arr_leaf)
arr_path_centers = dec.decodePathArray(arr_path)

enc2 = Encoder.from_pkl(str(config_path))
cfg = PklIO.loadConfig(str(config_path))
enc3 = Encoder.from_config(cfg)

leaf, path, prefix, leaf_center, prefix_center

('15463552',
 ['96646732', '215432105', '15463552'],
 ['96646732', '215432105'],
 3.125,
 3.75)
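
The decoded values above are interval centers: the leaf bin for 3.7 is `[2.5, 3.75)` with center 3.125, and the two-token prefix ends at `[2.5, 5.0)` with center 3.75. A minimal sketch of decoding as a table lookup follows, assuming a path or prefix is resolved by its last token; `node_token` and `build_center_table` are hypothetical, and the hashing shown is not the library's.

```python
import hashlib


def node_token(lo, hi):
    """Hypothetical token: first 8 bytes of SHA3-256 over the interval bounds."""
    digest = hashlib.sha3_256(f"{lo!r}|{hi!r}".encode()).digest()
    return str(int.from_bytes(digest[:8], "big"))


def build_center_table(lo, hi, depth):
    """Map token -> interval center for every node below the root."""
    table = {}
    level = [(lo, hi)]
    for _ in range(depth):
        children = []
        for a, b in level:
            mid = (a + b) / 2.0
            children.extend([(a, mid), (mid, b)])
        for a, b in children:
            table[node_token(a, b)] = (a + b) / 2.0
        level = children
    return table


table = build_center_table(0.0, 10.0, depth=3)

leaf_token = node_token(2.5, 3.75)       # leaf bin containing 3.7
prefix_last_token = node_token(2.5, 5.0)  # last token of the 2-token prefix

print(table[leaf_token])         # 3.125
print(table[prefix_last_token])  # 3.75
print(len(table))                # 14 nodes: 2 + 4 + 8
```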

## RandomizedEncoder
- `RandomizedEncoder.__init__`
- `RandomizedEncoder.from_pkl`, `RandomizedEncoder.from_config`
- `expected_level`, `compute_selection_probability`
- `encode`, `encodePath`, `encodePrefix`
- `encodeArray`, `encodePathArray`, `encodePrefixArray`

In [3]:
rand_config = artifacts_dir / "api_rand.pkl"
ctx = RoundContext(salt="roundA", seed=123)

renc = RandomizedEncoder(
    3,
    10.0,
    0.0,
    selectionProbability=0.6,
    roundContext=ctx,
    configPath=str(rand_config),
)

r_leaf = renc.encode(value)
r_path = renc.encodePath(value)
r_prefix = renc.encodePrefix(value, length=2)

r_arr_leaf = renc.encodeArray([1.0, 2.0])
r_arr_path = renc.encodePathArray([1.0, 2.0])
r_arr_prefix = renc.encodePrefixArray([1.0, 2.0], length=2)

p_est = RandomizedEncoder.compute_selection_probability(3, targetLevel=2.2)
exp_lvl = RandomizedEncoder.expected_level(3, p_est)

renc2 = RandomizedEncoder.from_pkl(str(rand_config))
rcfg = PklIO.loadConfig(str(rand_config))
renc3 = RandomizedEncoder.from_config(rcfg)

r_leaf, r_path, r_prefix, p_est, exp_lvl

('254911924',
 ['86878757', '166901263', '254911924'],
 ['86878757', '166901263'],
 0.5279762856666992,
 2.2000000004552747)

## Tree, Node, Hash, RoundContext
**Tree**: `__init__`, `round`, `insert`, `traverseLevelOrder`, `traverseInOrder`, `getHValues`.

**Node**: `__init__`, `getCenter`, `isLeaf`, `__str__`, `getValue`.

**Hash**: `buildHashTable`, `sha3_256_int64`, `hash_token`.

**RoundContext**: `generate`.

In [4]:
tree = Tree(3, 10.0, 0.0)
rounded = Tree.round(1.2345, 2)
tree.insert(tree.root)
count_level = tree.traverseLevelOrder(True)
count_inorder = tree.traverseInOrder(tree.root, True, 0)
path_tokens = tree.getHValues(3.7, True)

node = Node(0.0, 1.0, 0)
node_center = node.getCenter
node_leaf = node.isLeaf
node_str = str(node)
node_token = node.getValue(True)
node_path = node.getValue(0.2, True)

hash_table = Hash.buildHashTable(tree, include_internal=True)
sha_int = Hash.sha3_256_int64("abc")
hash_tok = Hash.hash_token("abc", "salt")

ctx2 = RoundContext.generate(salt_bytes=2, seed=7)

rounded, count_level, count_inorder, node_center, node_leaf, node_str, node_token

(1.23, 15, 15, 0.5, True, 'Min: 0.0    Max: 1.0', '196342134')

## PklIO + CsvIO
**PklIO**: `savePickle`, `loadPickle`, `saveConfig`, `loadConfig`, `writeLine`.

**CsvIO**: `sniffDialect`, `readCsv`, `encodeCsv`, `decodeCsv`.

In [5]:
obj_path = artifacts_dir / "obj.pkl"
PklIO.savePickle(obj_path, {"a": 1})
obj = PklIO.loadPickle(obj_path)

cfg_path = artifacts_dir / "config_min.pkl"
PklIO.saveConfig(cfg_path, {"schema": "hashcomb.config.v1", "params": {}})
cfg_loaded = PklIO.loadConfig(cfg_path)

lines_path = artifacts_dir / "lines.txt"
PklIO.writeLine(lines_path, ["line1", "line2"], append=False)

csv_in = artifacts_dir / "demo.csv"
PklIO.writeLine(csv_in, ["value", "1.0", "2.5"], append=False)

dialect = CsvIO.sniffDialect(str(csv_in))
rows = CsvIO.readCsv(str(csv_in), skipHeader=1)

csv_enc = artifacts_dir / "demo_encoded.csv"
csv_dec = artifacts_dir / "demo_decoded.csv"
CsvIO.encodeCsv(str(csv_in), str(csv_enc), enc, valueCol="value", hashCol="hash")
CsvIO.decodeCsv(str(csv_enc), str(csv_dec), dec, hashCol="hash", decodedValueCol="decoded_value")

obj, cfg_loaded, rows

({'a': 1}, {'schema': 'hashcomb.config.v1', 'params': {}}, [['1.0'], ['2.5']])

## Add-ons
- `aggregate_ciphertexts`
- `serialize_path`, `deserialize_path`

In [6]:
items = [("a", 1), ("a", 2), ("b", 5)]
agg = aggregate_ciphertexts(items)

ser = serialize_path(["x", "y"])
restored = deserialize_path(ser)

agg, ser, restored

({'a': 3, 'b': 5}, 'v1|2|1:x|1:y', ['x', 'y'])
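
The `serialize_path` output above, `'v1|2|1:x|1:y'`, reads like a version tag, a token count, and length-prefixed tokens. The sketch below reproduces that layout under this reading of the single example; the `_sketch` functions are hypothetical stand-ins, not the add-on's implementation, and lengths are counted in characters as an assumption.

```python
def serialize_path_sketch(tokens):
    """Encode tokens as 'v1|<count>|<len>:<token>|...' (assumed layout)."""
    parts = [f"{len(t)}:{t}" for t in tokens]
    return "|".join(["v1", str(len(tokens)), *parts])


def deserialize_path_sketch(text):
    """Parse the assumed layout, using the length prefixes to find token ends."""
    version, _, rest = text.partition("|")
    if version != "v1":
        raise ValueError(f"unsupported version: {version!r}")
    count_str, _, body = rest.partition("|")
    count = int(count_str)

    tokens = []
    pos = 0
    for _ in range(count):
        colon = body.index(":", pos)
        length = int(body[pos:colon])
        start = colon + 1
        token = body[start:start + length]
        if len(token) != length:
            raise ValueError("truncated token")
        tokens.append(token)
        pos = start + length
        if pos < len(body):
            if body[pos] != "|":
                raise ValueError("missing separator after token")
            pos += 1  # skip the '|' between entries
    if pos != len(body):
        raise ValueError("trailing data after last token")
    return tokens


ser = serialize_path_sketch(["x", "y"])
print(ser)                           # v1|2|1:x|1:y
print(deserialize_path_sketch(ser))  # ['x', 'y']
```

Length prefixes let a token contain the `|` separator itself: `["a|b"]` serializes to `v1|1|3:a|b` and parses back unchanged, which a plain split on `|` would not manage.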