Yc remove unnecessary by GordonYuanyc · Pull Request #32 · ProjectASAP/ASAPSketchLib

GordonYuanyc · 2026-03-30T06:36:50Z

Summary

Remove unused infrastructure, clean up the core sketch APIs, and add the OctoSketch multi-threaded framework.

Removed: orchestrator, Locher, Microscope, benchmarks

Deleted the entire sketch_framework/orchestrator/ module (6 files, ~1480 lines) — NodeOrchestrator, NodeCatalog, and all node types (EHNode, HashlayerNode, NitroNode, SketchNode).
Deleted sketches/locher.rs and sketches/microscope.rs.
Deleted all benches/ files (9 benchmark harnesses) and removed the criterion dev-dependency.
Added core_affinity and crossbeam-channel dependencies (used by Octo).

HashLayer: user-facing API

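HashLayer<H> (renamed to HashSketchEnsemble in the last commit of this PR, which is the name the example below uses) groups sketches that share a compatible hash. It hashes each input once and fans the result out to every sketch in the layer.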

Accepted sketch types (prehashed fast-path variants only):

CountMin<_, FastPath, _> — Count-Min Sketch
Count<_, FastPath, _> — Count Sketch
HyperLogLog<DataFusion> / HyperLogLog<Regular> / HyperLogLogHIP

All matrix-backed sketches in one layer must have the same dimensions (rows x cols). HLL can coexist because it only consumes the lower 64 bits of the shared hash.
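The hash-once, fan-out idea and the dimension check can be sketched with a small stand-alone toy. Everything here (`ToyEnsemble`, `ToySketch`, `ToyCountMin`, `ToyHll`, `shared_hash`, the 128-bit hash width, the 32-bit window per row) is a hypothetical illustration under stated assumptions, not the library's implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assumed 128-bit shared hash, built from two seeded 64-bit hashes.
fn shared_hash(key: u64) -> u128 {
    let half = |seed: u8| {
        let mut h = DefaultHasher::new();
        (seed, key).hash(&mut h);
        h.finish()
    };
    ((half(1) as u128) << 64) | (half(0) as u128)
}

/// Toy Count-Min matrix with `rows` x `cols` counters in row-major order.
struct ToyCountMin {
    rows: usize,
    cols: usize,
    counts: Vec<u64>,
}

impl ToyCountMin {
    fn new(rows: usize, cols: usize) -> Self {
        assert!(rows <= 4, "this toy reads one 32-bit window per row");
        Self { rows, cols, counts: vec![0; rows * cols] }
    }
    /// Each row reads its own 32-bit window of the shared hash.
    fn slot(&self, row: usize, hash: u128) -> usize {
        row * self.cols + ((hash >> (32 * row)) as u32 as usize) % self.cols
    }
    fn insert(&mut self, hash: u128) {
        for row in 0..self.rows {
            let i = self.slot(row, hash);
            self.counts[i] += 1;
        }
    }
    fn estimate(&self, hash: u128) -> u64 {
        (0..self.rows).map(|r| self.counts[self.slot(r, hash)]).min().unwrap_or(0)
    }
}

/// Toy HLL with 2^12 registers; it only looks at the low 64 bits.
struct ToyHll {
    registers: Vec<u8>,
}

impl ToyHll {
    const P: u32 = 12;
    fn new() -> Self {
        Self { registers: vec![0; 1 << Self::P] }
    }
    fn insert(&mut self, low64: u64) {
        let idx = (low64 >> (64 - Self::P)) as usize;
        let rank = ((low64 << Self::P).leading_zeros() + 1).min(64 - Self::P + 1) as u8;
        self.registers[idx] = self.registers[idx].max(rank);
    }
}

enum ToySketch {
    CountMin(ToyCountMin),
    Hll(ToyHll),
}

struct ToyEnsemble {
    sketches: Vec<ToySketch>,
}

impl ToyEnsemble {
    /// Rejects matrix-backed sketches whose dimensions disagree.
    fn new(sketches: Vec<ToySketch>) -> Result<Self, String> {
        let mut dims: Option<(usize, usize)> = None;
        for s in &sketches {
            if let ToySketch::CountMin(cm) = s {
                match dims {
                    None => dims = Some((cm.rows, cm.cols)),
                    Some(d) if d != (cm.rows, cm.cols) => {
                        return Err(format!("dimension mismatch: {:?} vs {:?}", d, (cm.rows, cm.cols)));
                    }
                    _ => {}
                }
            }
        }
        Ok(Self { sketches })
    }
    /// Hash once, then hand the same hash to every sketch.
    fn insert(&mut self, key: u64) {
        let hash = shared_hash(key);
        for s in &mut self.sketches {
            match s {
                ToySketch::CountMin(cm) => cm.insert(hash),
                ToySketch::Hll(hll) => hll.insert(hash as u64), // low 64 bits only
            }
        }
    }
    /// Frequency query; returns None for a sketch that is not matrix-backed.
    fn estimate(&self, index: usize, key: u64) -> Option<u64> {
        match self.sketches.get(index)? {
            ToySketch::CountMin(cm) => Some(cm.estimate(shared_hash(key))),
            ToySketch::Hll(_) => None,
        }
    }
}
```

The HLL variant can share the layer regardless of the matrix dimensions because it never indexes into a matrix; it only truncates the shared hash.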

Example usage:

use sketchlib_rust::*;
use sketchlib_rust::sketch_framework::HashSketchEnsemble;

// Two CMS + one HLL sharing one hash per insert
let mut ensemble = HashSketchEnsemble::<DefaultXxHasher>::new(vec![
    CountMin::<Vector2D<i32>, FastPath>::with_dimensions(3, 4096).into(),
    CountMin::<Vector2D<i32>, FastPath>::with_dimensions(3, 4096).into(),
    HyperLogLog::<DataFusion>::default().into(),
]).unwrap();

// Insert — hashes once, updates all 3 sketches
ensemble.insert(&SketchInput::U64(42));

// Query frequency (CMS at index 0)
let freq = ensemble.estimate(0, &SketchInput::U64(42)).unwrap();

// Query cardinality (HLL at index 2)
let card = ensemble.cardinality(2).unwrap();

// Pre-computed hash path for hot loops
let hash = ensemble.hash_input(&SketchInput::U64(42));
ensemble.insert_with_hash(&hash);
let freq = ensemble.estimate_with_hash(0, &hash).unwrap();

Full API:

| Method | Description |
| --- | --- |
| `new(Vec<HashLayerSketch>)` | Construct with validation |
| `push(sketch)` | Add a sketch (rejects incompatible) |
| `insert(&SketchInput)` | Hash once, insert to all |
| `insert_with_hash(&hash)` | Insert pre-computed hash to all |
| `insert_at(&[usize], &SketchInput)` | Insert to specific indices |
| `insert_at_with_hash(&[usize], &hash)` | Same with pre-computed hash |
| `bulk_insert`, `bulk_insert_with_hashes` | Batch variants |
| `bulk_insert_at`, `bulk_insert_at_with_hashes` | Batch + index variants |
| `estimate(index, &SketchInput)` | Frequency query (CMS/Count) |
| `estimate_with_hash(index, &hash)` | Frequency with pre-hash |
| `cardinality(index)` | Distinct-count (HLL) |
| `hash_input(&SketchInput)` | Expose the shared hash |
| `get` / `get_mut` / `len` / `is_empty` | Accessors |

Also cleaned up sketch_catalog.rs: removed all unused catalog enums (FreqSketch, CardinalitySketch, QuantileSketch, etc.). Only the adapter traits (CountMinFastOps, CountFastOps, etc.) used by HashLayer remain.

HLL: adjustable register storage

HllBucketList types are now backed by Box<[u8; N]> with a HllRegisterStorage trait providing PRECISION, REGISTER_BITS, NUM_REGISTERS, and slice access. Three precisions are supported: P12 (4096 registers), P14 (16384), P16 (65536).
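One way a trait with associated constants over fixed-size boxed arrays might look is sketched below. The trait name `RegisterStorage`, the `new_zeroed` constructor, the `register_storage!` macro, and the choice of `REGISTER_BITS = 8` (one byte per register) are assumptions made for illustration; the real `HllRegisterStorage` trait may differ:

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the storage trait described above.
trait RegisterStorage {
    const PRECISION: u32;
    const REGISTER_BITS: u32;
    /// Derived from the precision: 2^PRECISION registers.
    const NUM_REGISTERS: usize = 1 << Self::PRECISION;

    fn new_zeroed() -> Self;
    fn as_slice(&self) -> &[u8];
    fn as_mut_slice(&mut self) -> &mut [u8];
}

/// Generates one storage type per precision, backed by `Box<[u8; N]>`.
macro_rules! register_storage {
    ($name:ident, $p:expr) => {
        struct $name(Box<[u8; 1 << $p]>);

        impl RegisterStorage for $name {
            const PRECISION: u32 = $p;
            const REGISTER_BITS: u32 = 8; // assumption: one byte per register

            fn new_zeroed() -> Self {
                // Allocate on the heap first, then fix the length in the type.
                let boxed: Box<[u8]> = vec![0u8; 1 << $p].into_boxed_slice();
                let fixed = <Box<[u8; 1 << $p]>>::try_from(boxed)
                    .expect("length equals 1 << precision");
                $name(fixed)
            }
            fn as_slice(&self) -> &[u8] {
                &self.0[..]
            }
            fn as_mut_slice(&mut self) -> &mut [u8] {
                &mut self.0[..]
            }
        }
    };
}

register_storage!(P12, 12);
register_storage!(P14, 14);
register_storage!(P16, 16);

/// Code written against the trait works for any precision.
fn update_register<S: RegisterStorage>(storage: &mut S, index: usize, rank: u8) {
    let regs = storage.as_mut_slice();
    if rank > regs[index] {
        regs[index] = rank;
    }
}
```

Because the register count is a compile-time constant of each storage type, bounds and masks can be derived from `S::NUM_REGISTERS` without a runtime field.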

KLL: rearranged memory layout for speed

Pre-allocated flat buffer (Box<[f64]>) with level offsets instead of per-level Vecs.
MAX_LEVELS = 61 hard cap with compute_max_capacity sizing.
In-place randomly_halve_up and merge_sorted_runs with a reusable scratch buffer — avoids allocations during compaction.
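The layout and the two in-place helpers above could look roughly like the following sketch. `FlatLevels`, `halve_in_place`, and the signatures here are hypothetical; in particular the coin flip is passed in as `offset` rather than drawn inside the function, and capacity sizing via `compute_max_capacity` is not modeled:

```rust
/// All levels share one allocation; level `i` owns `buf[offsets[i]..offsets[i + 1]]`.
struct FlatLevels {
    buf: Box<[f64]>,
    offsets: Vec<usize>,
    lens: Vec<usize>,
}

impl FlatLevels {
    fn with_capacities(caps: &[usize]) -> Self {
        let mut offsets = vec![0usize];
        for &c in caps {
            let next = offsets[offsets.len() - 1] + c;
            offsets.push(next);
        }
        let total = offsets[offsets.len() - 1];
        Self {
            buf: vec![0.0; total].into_boxed_slice(),
            offsets,
            lens: vec![0; caps.len()],
        }
    }

    /// Appends to a level; returns false when that level is full.
    fn push(&mut self, level: usize, value: f64) -> bool {
        let cap = self.offsets[level + 1] - self.offsets[level];
        if self.lens[level] == cap {
            return false;
        }
        self.buf[self.offsets[level] + self.lens[level]] = value;
        self.lens[level] += 1;
        true
    }

    /// The occupied part of a level, as a mutable slice into the flat buffer.
    fn items_mut(&mut self, level: usize) -> &mut [f64] {
        let start = self.offsets[level];
        &mut self.buf[start..start + self.lens[level]]
    }
}

/// Keeps every other element starting at `offset` (0 or 1), packed at the
/// front of `run`. Returns how many elements survive.
fn halve_in_place(run: &mut [f64], offset: usize) -> usize {
    let mut kept = 0;
    let mut i = offset;
    while i < run.len() {
        run[kept] = run[i]; // kept <= i, so unread items are never overwritten
        kept += 1;
        i += 2;
    }
    kept
}

/// Merges the sorted runs `buf[..mid]` and `buf[mid..]` through a scratch
/// vector that the caller reuses, so steady-state merges do not allocate.
fn merge_sorted_runs(buf: &mut [f64], mid: usize, scratch: &mut Vec<f64>) {
    scratch.clear();
    let (mut i, mut j) = (0, mid);
    while i < mid && j < buf.len() {
        if buf[i] <= buf[j] {
            scratch.push(buf[i]);
            i += 1;
        } else {
            scratch.push(buf[j]);
            j += 1;
        }
    }
    scratch.extend_from_slice(&buf[i..mid]);
    scratch.extend_from_slice(&buf[j..]);
    buf.copy_from_slice(&scratch[..]);
}
```

Compared with one `Vec` per level, the flat buffer keeps neighbouring levels contiguous in memory and turns compaction into slice operations on a single allocation.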

OctoSketch: multi-threaded sketch framework

New sketch_framework/octo.rs module implementing the OctoSketch pattern — pin workers to cores, each with a local small sketch, and promote deltas to a shared parent sketch when local counters overflow.

OctoWorker / OctoAggregator traits for custom sketch types.
Built-in implementations: CmOctoWorker, CountOctoWorker, HllOctoWorker.
OctoRuntime manages worker threads with core_affinity pinning and crossbeam-channel communication.
OctoReadHandle for lock-free reads of the parent sketch during ingestion.
Delta types in sketches/octo_delta.rs: CmDelta, CountDelta, HllDelta with configurable promotion thresholds.
CMS and Count workers are generic over FastPath / RegularPath.
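The promote-on-overflow pattern can be illustrated with the standard library alone. This sketch swaps in `std::sync::mpsc` for crossbeam-channel, skips core pinning and the lock-free read handle, and uses hypothetical names throughout (`Worker`, `Delta`, `run_workers`, `col_for`, and the `THRESHOLD` value); it is not the module's actual API:

```rust
use std::sync::mpsc;
use std::thread;

const ROWS: usize = 2;
const COLS: usize = 64;
const THRESHOLD: u8 = 8; // assumed promotion threshold

/// One promoted counter: add `value` to the parent at (row, col).
struct Delta {
    row: usize,
    col: usize,
    value: u64,
}

/// Cheap per-row multiplicative hash, for illustration only.
fn col_for(row: usize, key: u64) -> usize {
    let mixed = key.wrapping_add(row as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    (mixed >> 32) as usize % COLS
}

/// Worker-local sketch with small counters and a channel to the aggregator.
struct Worker {
    local: [[u8; COLS]; ROWS],
    tx: mpsc::Sender<Delta>,
}

impl Worker {
    fn insert(&mut self, key: u64) {
        for row in 0..ROWS {
            let col = col_for(row, key);
            let counter = &mut self.local[row][col];
            *counter += 1;
            if *counter >= THRESHOLD {
                // Promote the accumulated count and reset the local counter.
                let value = *counter as u64;
                *counter = 0;
                self.tx.send(Delta { row, col, value }).expect("aggregator alive");
            }
        }
    }

    /// Sends whatever is still below the threshold once the input ends.
    fn flush(&mut self) {
        for row in 0..ROWS {
            for col in 0..COLS {
                let value = self.local[row][col] as u64;
                if value > 0 {
                    self.local[row][col] = 0;
                    self.tx.send(Delta { row, col, value }).expect("aggregator alive");
                }
            }
        }
    }
}

/// Spawns one worker per key list and folds all deltas into the parent.
fn run_workers(per_worker_keys: Vec<Vec<u64>>) -> Vec<[u64; COLS]> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = per_worker_keys
        .into_iter()
        .map(|keys| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut worker = Worker { local: [[0; COLS]; ROWS], tx };
                for key in keys {
                    worker.insert(key);
                }
                worker.flush();
            })
        })
        .collect();
    drop(tx); // the receive loop ends once every worker has dropped its sender

    let mut parent = vec![[0u64; COLS]; ROWS];
    for delta in rx {
        parent[delta.row][delta.col] += delta.value;
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    parent
}

/// Count-Min style point query against the parent sketch.
fn estimate(parent: &[[u64; COLS]], key: u64) -> u64 {
    (0..ROWS).map(|row| parent[row][col_for(row, key)]).min().unwrap_or(0)
}
```

The point of the small local counters is that most inserts touch only thread-local memory; the shared parent sees one message per `THRESHOLD` increments of a cell rather than one per insert.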

Made-with: Cursor

zzylol · 2026-03-31T20:55:25Z

src/sketches/kll.rs

 use crate::common::input::sketch_input_to_f64;
 use crate::{SketchInput, Vector1D};

+const MAX_LEVELS: usize = 61;


zzylol · Mar 31, 2026
Is the intent to use const variables, or to avoid them?

GordonYuanyc · Mar 31, 2026
Yes, using a const here is intentional, for performance.
In theory this cap is large enough (2^60 insertions or more).

zzylol · Apr 1, 2026
What will the user API be for configuring parameters such as K in KLL?

GordonYuanyc · Apr 1, 2026
This one:
pub fn init(k: usize, m: usize) -> Self
k is the bottom compactor size and m is the minimum compactor size.

GordonYuanyc added 14 commits March 29, 2026 00:47

remove node orchestrator; TODO: fix hash layer

62cb7b9

remove locher and microscope; TODO: fix hashlayer

d647324

clean up the hash layer

34c88a7

hll can adjust register list length now

0d01032

rearrange kll memory layout for speed

3b287ca

cargo fmt

aa37f18

merge octo contents from another branch

2256b50

Made-with: Cursor

cargo fmt

a3242ab

make Octo CMS and CS generic

9fcbd8a

update insert_emit_delta in octo to be aware of FastPath and RegularPath

bbc79ad

cargo fmt

0a16ef2

make octo threshold smaller

96bc7db

clean up api for hashlayer to make it more user friendly

2d1d229

clean up sketch_catalog.rs

72e13a8

GordonYuanyc marked this pull request as ready for review March 31, 2026 20:37

GordonYuanyc requested review from milindsrivastava1997 and zzylol March 31, 2026 20:37

zzylol reviewed Mar 31, 2026

rename: HashLayer -> HashSketchEnsemble

69aeb25

GordonYuanyc merged commit 8a99d57 into main Apr 1, 2026
1 check passed

GordonYuanyc merged 15 commits into main from yc-remove-unnecessary
