Public, runnable reference baseline for the resident-memory measurement methodology described in DreamNova's Technical Brief.
What this is. A self-contained benchmark harness measuring the resident-memory footprint and inference latency of a small open-weight language model under three regimes: dense baseline, top-K=50% sparse activation, and top-K=25% sparse activation.
What this is NOT. This script does not implement the DreamNova patent-novel runtime primitive. The top-K activation gating used here is the well-published Switch Transformer / Mixture-of-Experts reference baseline. The DreamNova primitive targets 75% weight-footprint reduction at strict iso-quality through a topology-aware reconstruction scheme — that is the patent claim, currently pre-USPTO Wave 1 (May 2026).
The point of this baseline is to give an external reviewer a runnable, transparent measurement methodology they can reproduce, modify, and apply to their own evaluations.
DreamNova's seed pitch makes a measurable claim: 75% reduction in resident model weight footprint at iso-quality. The Year-1 milestone (Q4 2026) is precisely to publish that measured number, on a Tier-1 inference partner's stack, with reproducible methodology.
Until then, this repository ships the measurement infrastructure publicly so that:
- Reviewers can verify our methodology is sound (PyTorch + transformers · standard tooling · single forward pass · no privileged hooks).
- Reviewers can reproduce the dense baseline numbers themselves on commodity hardware.
- The gap between the simple top-K activation baseline shipped here and the DreamNova design target is visible and quantified.
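The RSS-delta methodology the reviewers are asked to reproduce can be sketched in a few lines using `psutil` (the same dependency the harness lists). This is an illustrative sketch, not code from `bench.py`; the 64 MiB allocation and the helper name `rss_mb` are our own.

```python
import os

import psutil  # listed in the harness requirements; used here for RSS

_PROC = psutil.Process(os.getpid())

def rss_mb() -> float:
    """Resident set size of the current process, in MiB."""
    return _PROC.memory_info().rss / 2**20

# Measure an RSS delta around an allocation, mirroring the load-delta
# methodology. bytearray(n) is zero-filled, so the pages are committed.
before = rss_mb()
buf = bytearray(64 * 2**20)  # ~64 MiB, kept alive until after the reading
delta = rss_mb() - before
```

The same before/after pattern, wrapped around `from_pretrained(...)`, yields the "Process RSS delta from load" figure reported below.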
```
DreamNova reference baseline · gpt2-medium · mps · torch.float32
==================================================================================
Model · 354.8M params · theoretical weight footprint = 1353.5 MB
Process RSS delta from load = 1401.8 MB

Condition        Δ RSS (MB)   Latency   Overhead
----------------------------------------------------------------------------------
dense_baseline         0.36   266.4ms      1.00x
top_k_50pct            0.11   276.7ms      1.04x
top_k_25pct            0.09   257.6ms      0.97x
----------------------------------------------------------------------------------
DreamNova design target → 75% reduction of WEIGHT footprint at iso-quality,
≤1.10x latency overhead
(this baseline measures activation top-K, NOT weight reduction; gap is the patent claim)
```
Full results are written to `results.json`.
| Metric | What it tells you |
|---|---|
| Theoretical weight footprint (1353.5 MB) | The model's parameter count × dtype byte width. This is the floor: any inference framework must hold at least this much in fast memory. |
| Measured RSS load delta (1401.8 MB) | Process RSS growth after loading the model. The ~48 MB gap over the theoretical floor is ordinary PyTorch / transformers framework overhead. |
| Δ RSS during generate (~0.1–0.4 MB) | The activation memory delta during a single 24-token generation. Top-K activation gating (rows 2 + 3) does not reduce this meaningfully because the weights still occupy fast memory. |
| Latency overhead (~1.04x for top-K=50%) | The cost of applying the gating hook on every transformer block input. Modest. |
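The "theoretical weight footprint" row is nothing more than parameter count × dtype byte width, so it can be sanity-checked by hand (the helper name below is ours, not from `bench.py`):

```python
def weight_footprint_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Theoretical resident weight footprint in MiB: params × dtype byte width."""
    return n_params * bytes_per_param / 2**20

# gpt2-medium: 354.8M parameters stored in torch.float32 (4 bytes each)
fp32_mb = weight_footprint_mb(354.8e6)   # matches the 1353.5 MB in the table
target_mb = fp32_mb * 0.25               # what a 75% weight-footprint cut leaves resident
```

Any inference framework must keep at least `fp32_mb` in fast memory for the dense model; the design target is to shrink that floor itself, not the activations on top of it.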
The takeaway: Activation-level top-K, the standard published baseline, does not address the weight footprint — and on modern transformer architectures, weight footprint is the binding constraint. The DreamNova primitive targets this gap directly. The 75% design target is on the weight footprint axis, not the activation axis, which is why this baseline shows essentially zero memory savings.
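For concreteness, the published activation top-K baseline measured here amounts to masking each block's input activations. The following is a hedged sketch under our own naming (`topk_mask` is not the harness's actual hook), showing why weights stay resident regardless:

```python
import torch

def topk_mask(x: torch.Tensor, keep_frac: float) -> torch.Tensor:
    """Zero all but the top-K activations (by magnitude) along the last dim.

    Standard published activation gating: the weight tensors are untouched,
    which is why Δ RSS barely moves in the results table above.
    """
    k = max(1, int(x.shape[-1] * keep_frac))
    # k-th largest magnitude per position serves as the keep threshold
    thresh = x.abs().topk(k, dim=-1).values[..., -1:]
    return x * (x.abs() >= thresh)

# In a harness this could be attached as a forward pre-hook on each block, e.g.:
# block.register_forward_pre_hook(lambda m, args: (topk_mask(args[0], 0.5),) + args[1:])
torch.manual_seed(0)
x = torch.randn(2, 16, 64)   # (batch, seq, hidden) toy activations
y50 = topk_mask(x, 0.50)     # keeps 32 of 64 channels per position
```

Note that `topk_mask` allocates nothing persistent: it sparsifies a transient activation tensor, leaving the resident weight footprint exactly where it was.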
Requirements: Python 3.10+, PyTorch 2.0+, transformers, psutil. Tested on macOS / Linux.
```shell
git clone https://github.com/CodeNoLimits/dreamnova-bench
cd dreamnova-bench
pip install torch transformers psutil
python3 bench.py                  # default: gpt2-medium on auto-detected device
python3 bench.py --model gpt2     # smaller model
python3 bench.py --device cpu     # force CPU
```

The script downloads the model on first run via Hugging Face (no auth required for public models).
- Process RSS over `torch.mps.current_allocated_memory()` — on Apple Silicon's unified memory architecture, RSS is the effective resident footprint. The MPS allocator counter only reflects the caching allocator's state, not driver-level allocations, and was observed returning 0 in our measurement window.
- Median over 5 runs per condition — reduces noise from background processes.
- Greedy decoding (`do_sample=False`) — removes sampling variance.
- Fixed prompt + fixed `max_new_tokens` — output sequence length is deterministic.
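The median-of-5 timing loop described above needs nothing beyond the standard library. A minimal sketch (the helper name is ours; `fn` stands in for any measured callable, such as a `model.generate` closure with `do_sample=False`):

```python
import statistics
import time

def median_latency_ms(fn, runs: int = 5) -> float:
    """Median wall-clock latency of fn() over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)  # median is robust to one slow outlier

# e.g. median_latency_ms(lambda: model.generate(**inputs, do_sample=False,
#                                               max_new_tokens=24))
lat = median_latency_ms(lambda: sum(range(10_000)))
```

Reporting the median rather than the mean keeps a single background-process hiccup from skewing a 5-run sample.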
Not included in this repository:
- The DreamNova patent-novel routing scheme (pre-USPTO Wave 1)
- Topology parameters (held back until USPTO filing)
- Quantization, vLLM, FlashAttention, or MoE composition (Year-1 measurements will compose these on top)
The benchmark harness in this repository is released under the MIT License. The underlying language model weights are subject to their own licenses (see Hugging Face model cards).
David Amor · Founder, DreamNova · Jerusalem, Israel
dreamnovaultimate@gmail.com · +972-58-492-1492 · dreamnova.tech