Klearu

A native Rust implementation of the SLIDE (Sub-LInear Deep learning Engine) family of papers, with extensions for LLM inference, transformer sparsity prediction, and private two-party computation.

License

AGPL-3.0 with additional terms. See LICENSE for details. Commercial use is restricted to the Quilibrium mainnet. Automated reproduction (including LLM-assisted "clean room" reimplementation) for commercial substitutes is expressly prohibited.

Workspace Overview

Klearu is organized as a Cargo workspace with 10 crates:

Crate	Description
klearu-core	Foundation: LSH hash families, sparse tensors, SLIDE network training
klearu-accel	SIMD vectorization (AVX2/NEON/scalar), BF16 quantization, cache-aligned memory
klearu-mongoose	Learnable hash functions, adaptive rebuild scheduling with drift detection
klearu-bolt	LSH hyperparameter autotuning, sparse inference optimizations
klearu-dejavu	Deja Vu transformer sparsity prediction (attention heads + MLP neurons)
klearu-llm	LLaMA-compatible LLM inference with optional sparsity
klearu	Facade crate with feature-gated re-exports
klearu-dpf	Distributed Point Functions (AES-based BGI construction) and DCF
klearu-mpc	2PC building blocks: Q16.16/Q32.32 fixed-point, Beaver triples, additive sharing
klearu-private	Private LLM inference via 2PC with Ferret OT and Ristretto255 OPRF

Building

The core crates build standalone. The klearu-private crate depends on the ferret crate from the Quilibrium monorepo via a relative path. To build the full workspace including private inference, clone both repositories as siblings:

your-workspace/
  klearu/       # this repository
  monorepo/     # git clone https://github.com/quilibriumnetwork/monorepo

Then build from inside klearu/:

# Full workspace (requires monorepo sibling for klearu-private)
cargo build --release

# With specific features via the facade crate
cargo build --release -p klearu --features full

# LLM inference only (no monorepo needed)
cargo build --release -p klearu-llm

# LLM with sparse inference (no monorepo needed)
cargo build --release -p klearu-llm --features sparse

# Private inference (requires monorepo sibling)
cargo build --release -p klearu-private

Testing

cargo test --workspace

Crate Details

klearu-core — SLIDE Primitives

The foundation crate provides LSH-based sub-linear training and inference.

Hash families (HashFamily trait): SimHash, WtaHash, DwtaHash, MinHash, SparseRandomProjection

LSH index (LshIndexTrait): query(), query_union(), query_with_counts() — with FIFO or reservoir-sampled buckets

Network: Full SLIDE training loop with configurable layers, optimizers, and sampling strategies.

use klearu_core::config::*;
use klearu_core::network::Network;

let config = SlideConfig {
    network: NetworkConfig {
        layers: vec![
            LayerConfig::hidden(784, 1024),
            LayerConfig::output(1024, 10),
        ],
        optimizer: OptimizerType::Adam,
        learning_rate: 0.001,
        batch_size: 128,
        num_threads: 4,
    },
    seed: 42,
    hogwild: true,
};

let mut network = Network::new(config);

Configurable Parameters

Parameter	Default	Description
`num_tables` (L)	50	Number of LSH hash tables
`num_hashes` (K)	6	Hash bits per table
`bucket_capacity`	128	Max neurons per bucket
`bucket_type`	FIFO	FIFO or Reservoir sampling
`hash_function`	SimHash	SimHash, WtaHash, DwtaHash, MinHash, SRP
`rebuild_interval_base`	100	Steps between LSH rebuilds
`rebuild_decay`	0.1	Exponential decay for rebuild interval
`optimizer`	Adam	Adam or SGD
`activation`	ReLU	ReLU, Sigmoid, Tanh, Softmax
`sampling`	Vanilla	Vanilla, TopK, Threshold
`hogwild`	false	Lock-free parallel training
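
To make `num_hashes` (K) and `num_tables` (L) concrete, here is a minimal, dependency-free sketch of a single SimHash table. It is an illustration of the general technique, not the klearu-core implementation: `SimHashTable`, `bucket`, and the SplitMix64 helper are hypothetical names chosen for this example. Each of the K random hyperplanes contributes one sign bit, so with the default K = 6 a table has 2^6 = 64 buckets; an index with L tables would hold L such structures built from different seeds.

```rust
/// Deterministic pseudo-random generator (SplitMix64), used only so the
/// sketch has no external dependencies.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// One SimHash table (hypothetical type): `k` random hyperplanes
/// whose entries are +1 or -1.
pub struct SimHashTable {
    planes: Vec<Vec<f32>>, // k rows, each of length `dim`
}

impl SimHashTable {
    pub fn new(dim: usize, k: usize, seed: u64) -> Self {
        let mut state = seed;
        let planes = (0..k)
            .map(|_| {
                (0..dim)
                    .map(|_| if splitmix64(&mut state) & 1 == 0 { 1.0 } else { -1.0 })
                    .collect()
            })
            .collect();
        Self { planes }
    }

    /// Pack the signs of the k projections into a bucket id in 0..2^k.
    pub fn bucket(&self, v: &[f32]) -> usize {
        self.planes.iter().fold(0usize, |acc, plane| {
            let dot: f32 = plane.iter().zip(v).map(|(p, x)| p * x).sum();
            (acc << 1) | (dot >= 0.0) as usize
        })
    }
}
```

Because only the sign of each projection is kept, scaling a vector by a positive constant never changes its bucket, which is why SimHash groups vectors by angle rather than by magnitude.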

klearu-accel — Hardware Acceleration

Platform-adaptive SIMD (AVX2 on x86, NEON on ARM, scalar fallback) for dot products and scatter-add. BF16 quantization with two modes: full BF16 or BF16-storage/FP32-gradient. ContiguousWeightStore provides cache-line-aligned (64-byte) weight layouts.
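
As background for the BF16 modes, the sketch below shows the standard scalar conversion between `f32` and bfloat16: BF16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits of an `f32`. The function names are hypothetical and this is not taken from klearu-accel, which may use SIMD paths or a different rounding mode; round-to-nearest-even is assumed here.

```rust
/// Convert an f32 to bfloat16 bits using round-to-nearest-even.
pub fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep NaN a NaN: force a mantissa bit that survives truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    // Ties therefore round toward the value whose last kept bit is 0.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Widen bfloat16 bits back to f32 by zero-filling the low mantissa bits.
pub fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}
```

In a BF16-storage/FP32-gradient arrangement, weights would be stored through `f32_to_bf16` and widened with `bf16_to_f32` before use, while gradient accumulation stays in `f32`.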

klearu-mongoose — Learnable Hashing

Trainable hash functions that adapt to data distribution, plus an AdaptiveScheduler that monitors hash-bucket drift via EMA and triggers rebuilds only when needed.

Parameter	Default	Description
`min_interval`	—	Minimum steps between rebuild checks
`max_interval`	—	Forced rebuild interval
`sample_fraction`	—	Fraction of neurons to sample for drift
`drift_threshold`	—	Drift level that triggers a rebuild
`ema_alpha`	0.3	Exponential moving average smoothing
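
The interaction between `ema_alpha` and `drift_threshold` can be sketched as follows. This is an illustrative guess at the general shape of EMA-based drift detection, not the AdaptiveScheduler itself: `DriftMonitor` and `observe` are hypothetical names, the drift measurement is assumed to be something like the fraction of sampled neurons whose bucket changed, and the `min_interval` and `max_interval` bounds are left out.

```rust
/// Hypothetical EMA-based drift monitor.
pub struct DriftMonitor {
    ema: Option<f32>, // None until the first measurement arrives
    alpha: f32,       // smoothing factor, e.g. 0.3
    threshold: f32,   // smoothed drift above this triggers a rebuild
}

impl DriftMonitor {
    pub fn new(alpha: f32, threshold: f32) -> Self {
        Self { ema: None, alpha, threshold }
    }

    /// Feed one drift measurement and report whether the smoothed
    /// drift now exceeds the threshold.
    pub fn observe(&mut self, drift: f32) -> bool {
        let ema = match self.ema {
            None => drift, // seed the average with the first sample
            Some(prev) => self.alpha * drift + (1.0 - self.alpha) * prev,
        };
        self.ema = Some(ema);
        ema > self.threshold
    }
}
```

A larger `alpha` reacts faster to a sudden change in bucket assignments, while a smaller one suppresses rebuilds caused by a single noisy measurement.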

klearu-bolt — Autotuning

Automatic LSH hyperparameter search over K and L to hit a target recall while minimizing query cost.

use klearu_bolt::autotune::LshAutotuner;

let tuner = LshAutotuner::new(0.9)   // target 90% recall
    .with_k_range(4, 16)
    .with_l_range(10, 200)
    .with_num_samples(100)
    .with_speedup_ratio(0.1);

let result = tuner.autotune(&neurons, &queries, 42);
// result.best_k, result.best_l, result.recall, result.query_cost
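
The trade-off being searched can be illustrated with the textbook LSH analysis: if two vectors agree on a single hash bit with probability p, a table of K bits places them in the same bucket with probability p^K, and at least one of L independent tables does so with probability 1 - (1 - p^K)^L. The helpers below are a sketch of that formula only; their names are hypothetical, and LshAutotuner itself may well estimate recall empirically from the supplied neurons and queries instead.

```rust
/// Probability that at least one of `l` tables puts two items in the
/// same bucket, given per-bit agreement probability `p` and `k` bits
/// per table.
pub fn retrieval_probability(p: f64, k: u32, l: u32) -> f64 {
    1.0 - (1.0 - p.powi(k as i32)).powi(l as i32)
}

/// Smallest number of tables (up to `max_l`) that reaches `target`
/// recall for a fixed `k`, or None if no value in range is enough.
pub fn min_tables_for_recall(p: f64, k: u32, target: f64, max_l: u32) -> Option<u32> {
    (1..=max_l).find(|&l| retrieval_probability(p, k, l) >= target)
}
```

Raising K makes buckets more selective and queries cheaper but lowers per-table recall, so more tables are needed to compensate; that tension is why K and L are tuned together.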

klearu-dejavu — Transformer Sparsity

Implementation of the Deja Vu paper: lightweight MLP predictors that identify which attention heads and FFN neurons are important for each token, enabling sparse transformer inference.

klearu-llm — LLM Inference

A LLaMA-compatible inference engine supporting GQA, RoPE, RMSNorm, and SwiGLU. Works with any HuggingFace-format model that uses the LLaMA architecture.

LLM Configuration

Parameter	Default	Description
`temperature`	0.7	Sampling temperature (0.0 = greedy)
`top_k`	40	Top-k filtering (0 = disabled)
`top_p`	0.9	Nucleus sampling (1.0 = disabled)
`repetition_penalty`	1.1	Penalize repeated tokens (1.0 = disabled)
`max_new_tokens`	512	Maximum tokens to generate
`template`	auto	Chat template (auto, zephyr, chatml, llama2, llama3, mistral, raw)
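
How `temperature`, `top_k`, and `top_p` combine can be sketched as a single filtering pass over the logits. The function below is a hypothetical illustration, not the klearu-llm sampler: the order in which the filters are applied is an assumption, `repetition_penalty` is omitted, and the final random draw is left to the caller.

```rust
/// Turn raw logits into a filtered, renormalized distribution.
/// Returns (token_id, probability) pairs sorted by descending probability.
pub fn filtered_distribution(
    logits: &[f32],
    temperature: f32,
    top_k: usize,
    top_p: f32,
) -> Vec<(usize, f32)> {
    if logits.is_empty() {
        return Vec::new();
    }
    // Token ids ordered from highest to lowest logit.
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    // Temperature 0 means greedy decoding: all mass on the argmax.
    if temperature <= 0.0 {
        return vec![(order[0], 1.0)];
    }
    // Softmax with temperature, shifted by the max logit for stability.
    let max = logits[order[0]];
    let mut probs: Vec<(usize, f32)> = order
        .iter()
        .map(|&i| (i, ((logits[i] - max) / temperature).exp()))
        .collect();
    let sum: f32 = probs.iter().map(|&(_, p)| p).sum();
    for (_, p) in probs.iter_mut() {
        *p /= sum;
    }
    // Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0 {
        probs.truncate(top_k);
    }
    // Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0 {
        let mut cumulative = 0.0;
        let mut cut = probs.len();
        for (n, &(_, p)) in probs.iter().enumerate() {
            cumulative += p;
            if cumulative >= top_p {
                cut = n + 1;
                break;
            }
        }
        probs.truncate(cut);
    }
    // Renormalize the survivors so they sum to 1 again.
    let sum: f32 = probs.iter().map(|&(_, p)| p).sum();
    for (_, p) in probs.iter_mut() {
        *p /= sum;
    }
    probs
}
```

Sampling then amounts to drawing a uniform number in [0, 1) and walking the returned list until the cumulative probability exceeds it.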

Sparse Inference (feature: `sparse`)

Parameter	Default	Description
`head_sparsity`	0.5	Fraction of attention heads to keep
`neuron_sparsity`	0.5	Fraction of FFN neurons to keep
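
Given per-head or per-neuron importance scores from a predictor, applying a keep fraction such as `head_sparsity` or `neuron_sparsity` reduces to a top-k selection. The helper below is a sketch under that assumption; `select_active` is a hypothetical name, and rounding the keep count up is a choice made for this example rather than documented klearu behavior.

```rust
/// Return the indices of the highest-scoring units, keeping
/// `keep_fraction` of them (rounded up), sorted in ascending order.
pub fn select_active(scores: &[f32], keep_fraction: f32) -> Vec<usize> {
    let keep = ((scores.len() as f32) * keep_fraction).ceil() as usize;
    let keep = keep.min(scores.len());
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // Sort by descending score; total_cmp gives a total order even with NaN.
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(keep);
    // Ascending index order keeps later memory access sequential.
    idx.sort_unstable();
    idx
}
```

With the default fraction of 0.5, half of the heads or FFN neurons would be evaluated for each token and the rest skipped.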

klearu-dpf — Distributed Point Functions

AES-based DPF using the BGI construction, plus DCF (Distributed Comparison Functions) via prefix decomposition into DPFs. Used as a building block for the MPC protocols.

klearu-mpc — Two-Party Computation

Fixed-point arithmetic in Q16.16 (u32 shares) and Q32.32 (u64 shares), additive secret sharing, Beaver triple multiplication, polynomial SiLU approximation, and reveal-and-correct RMSNorm. Provides a Transport trait for abstracting communication.
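
The core of Beaver triple multiplication over additive shares can be sketched in a few lines. The code below works in the ring of integers modulo 2^32 (matching the u32 shares used for Q16.16) and follows the standard protocol; the type and function names are hypothetical and are not the klearu-mpc API. It also stops at the ring product: multiplying two Q16.16 values yields 32 fractional bits, and the truncation step needed to return to Q16.16 under secret sharing is omitted.

```rust
/// Additive sharing in Z_{2^32}: party 0 holds `mask`, party 1 holds `x - mask`.
pub fn share(x: u32, mask: u32) -> (u32, u32) {
    (mask, x.wrapping_sub(mask))
}

pub fn reconstruct(s0: u32, s1: u32) -> u32 {
    s0.wrapping_add(s1)
}

/// Encode a real number as Q16.16 (two's complement stored in a u32).
pub fn to_q16_16(x: f64) -> u32 {
    (x * 65536.0).round() as i64 as u32
}

pub fn from_q16_16(v: u32) -> f64 {
    (v as i32) as f64 / 65536.0
}

/// One party's share of a Beaver triple (a, b, c) with c = a * b mod 2^32.
#[derive(Clone, Copy)]
pub struct TripleShare {
    pub a: u32,
    pub b: u32,
    pub c: u32,
}

/// Step 1 (local): mask the input shares with the triple shares.
/// The two parties exchange these values and add them to open
/// d = x - a and e = y - b, which reveal nothing about x or y.
pub fn mask(x_i: u32, y_i: u32, t: TripleShare) -> (u32, u32) {
    (x_i.wrapping_sub(t.a), y_i.wrapping_sub(t.b))
}

/// Step 2 (local): compute this party's share of x * y from the opened
/// d and e. Only party 0 adds the public d * e term.
pub fn beaver_mul(party: u8, d: u32, e: u32, t: TripleShare) -> u32 {
    let mut z = t
        .c
        .wrapping_add(d.wrapping_mul(t.b))
        .wrapping_add(e.wrapping_mul(t.a));
    if party == 0 {
        z = z.wrapping_add(d.wrapping_mul(e));
    }
    z
}
```

Correctness follows from expanding x * y = (d + a)(e + b) = d*e + d*b + e*a + a*b: the shares of c sum to a*b, the two cross terms are linear in the shared a and b, and d*e is public. In a real deployment the masks and triples must be fresh random values, for example produced by an OT-based preprocessing phase.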

klearu-private — Private LLM Inference

End-to-end private inference combining Ferret COT (Correlated Oblivious Transfer), Ristretto255 OPRF, and the MPC building blocks. Two security levels:

Level	Communication	Privacy	Speed
Lower	~4.6 KB/token	Server learns nothing; client embedding revealed then plaintext forward	Fast
High	~2 MB/token, ~34K triples	Only norms, queries, and gate values revealed	Slower

Running the LLM Demo

1. Download a Model

Klearu works with any HuggingFace LLaMA-architecture model in safetensors format. SmolLM models are a good starting point for testing:

# Install the HuggingFace CLI if you don't have it
pip install huggingface-hub

# Download SmolLM-135M-Instruct (~270 MB)
huggingface-cli download HuggingFaceTB/SmolLM-135M-Instruct \
    --local-dir SmolLM-135M-Instruct

# Or a larger model — SmolLM-360M-Instruct (~720 MB)
huggingface-cli download HuggingFaceTB/SmolLM-360M-Instruct \
    --local-dir SmolLM-360M-Instruct

# Or SmolLM-1.7B-Instruct (~3.4 GB)
huggingface-cli download HuggingFaceTB/SmolLM-1.7B-Instruct \
    --local-dir SmolLM-1.7B-Instruct

The model directory should contain at minimum:

config.json — HuggingFace model configuration
tokenizer.json — Tokenizer
*.safetensors — Model weights
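
Before launching a long run, it can be handy to check a downloaded directory against the list above. The helper below is a small standalone sketch using only the Rust standard library; `missing_model_files` is a hypothetical name and is unrelated to the checks performed by the diagnose binary.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the names of required files that are missing from `dir`.
pub fn missing_model_files(dir: &Path) -> io::Result<Vec<&'static str>> {
    let mut missing = Vec::new();
    // Fixed-name files listed in the README.
    for name in ["config.json", "tokenizer.json"] {
        if !dir.join(name).is_file() {
            missing.push(name);
        }
    }
    // At least one weight shard with the .safetensors extension.
    let mut has_weights = false;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("safetensors") {
            has_weights = true;
            break;
        }
    }
    if !has_weights {
        missing.push("*.safetensors");
    }
    Ok(missing)
}
```

An empty result means the directory has the minimum set of files; it says nothing about whether their contents are valid.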

2. Run the Chat Interface

# Basic chat (auto-detects chat template)
cargo run --release --bin chat -- ./SmolLM-135M-Instruct

# With custom sampling parameters
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --temp 0.8 --top-k 50 --top-p 0.95 --max-tokens 256

# With a system prompt
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --system "You are a helpful coding assistant."

# Force a specific chat template
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --template chatml

The chat binary starts an interactive loop — type your message and press Enter. Use Ctrl-D to quit.

3. Sparse Inference (Optional)

First calibrate sparsity predictors, then run with --sparse:

# Train predictors (requires sparse feature)
cargo run --release --features sparse --bin calibrate -- ./SmolLM-135M-Instruct \
    --samples 16 --epochs 100

# Chat with sparse inference
cargo run --release --features sparse --bin chat -- ./SmolLM-135M-Instruct \
    --sparse --head-sparsity 0.5 --neuron-sparsity 0.5

4. Model Diagnostics

Validate that a model loads and runs correctly:

cargo run --release --bin diagnose -- ./SmolLM-135M-Instruct

This checks config parsing, weight loading, tokenizer functionality, forward pass sanity, and greedy generation.

5. Private Two-Party Inference

Run inference where the server holds the model weights and the client's input tokens remain private:

# Terminal 1 — start the server
cargo run --release --bin private-server -- ./SmolLM-135M-Instruct \
    --port 9000 --security lower

# Terminal 2 — connect the client
cargo run --release --bin private-client -- ./SmolLM-135M-Instruct \
    --host localhost:9000 --security lower

For development and testing, add --dummy-triples to both sides to skip Ferret OT setup. For real security, omit this flag to use actual oblivious transfer.

Feature Flags

The facade crate (klearu) provides feature-gated access to all functionality:

Feature	Enables
`simd`	SIMD-accelerated dot products and scatter-add
`bf16`	BF16 quantization
`mongoose`	Learnable hashing and adaptive scheduling
`bolt`	LSH autotuning
`deja-vu`	Transformer sparsity prediction
`llm`	LLM inference engine
`full`	All of the above

The sparse feature on klearu-llm enables Deja Vu sparse inference and the calibrate binary.
