Public, runnable reference baseline for the resident-memory measurement methodology described in DreamNova's Technical Brief.
What this is. A self-contained benchmark harness measuring the resident-memory footprint and inference latency of a small open-weight language model under three regimes: dense baseline, top-K=50% sparse activation, and top-K=25% sparse activation.
What this is NOT. This script does not implement the DreamNova patent-novel runtime primitive. The top-K activation gating used here is the well-published Switch Transformer / Mixture-of-Experts reference baseline. The DreamNova primitive targets 75% weight-footprint reduction at strict iso-quality through a topology-aware reconstruction scheme — that is the patent claim, currently pre-USPTO Wave 1 (May 2026).
The point of this baseline is to give an external reviewer a runnable, transparent measurement methodology they can reproduce, modify, and apply to their own evaluations.
DreamNova's seed pitch makes a measurable claim: 75% reduction in resident model weight footprint at iso-quality. The Year-1 milestone (Q4 2026) is precisely to publish that measured number, on a Tier-1 inference partner's stack, with reproducible methodology.
Until then, this repository ships the measurement infrastructure publicly so that:
- Reviewers can verify our methodology is sound (PyTorch + transformers · standard tooling · single forward pass · no privileged hooks).
- Reviewers can reproduce the dense baseline numbers themselves on commodity hardware.
- The gap between the simple top-K activation baseline shipped here and the DreamNova design target is visible and quantified.
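The RSS-delta methodology the reviewers are asked to reproduce can be sketched in a few lines using `psutil` (the same dependency the harness lists). This is an illustrative sketch, not code from `bench.py`; the 64 MiB allocation and the helper name `rss_mb` are our own.

```python
import os

import psutil  # listed in the harness requirements; used here for RSS

_PROC = psutil.Process(os.getpid())

def rss_mb() -> float:
    """Resident set size of the current process, in MiB."""
    return _PROC.memory_info().rss / 2**20

# Measure an RSS delta around an allocation, mirroring the load-delta
# methodology. bytearray(n) is zero-filled, so the pages are committed.
before = rss_mb()
buf = bytearray(64 * 2**20)  # ~64 MiB, kept alive until after the reading
delta = rss_mb() - before
```

The same before/after pattern, wrapped around `from_pretrained(...)`, yields the "Process RSS delta from load" figure reported below.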
```
DreamNova reference baseline · gpt2-medium · mps · torch.float32
==================================================================================
Model · 354.8M params · theoretical weight footprint = 1353.5 MB
Process RSS delta from load = 1401.8 MB

Condition        Δ RSS (MB)   Latency   Overhead
----------------------------------------------------------------------------------
dense_baseline         0.36   266.4ms      1.00x
top_k_50pct            0.11   276.7ms      1.04x
top_k_25pct            0.09   257.6ms      0.97x
----------------------------------------------------------------------------------
DreamNova design target → 75% reduction of WEIGHT footprint at iso-quality,
≤1.10x latency overhead
(this baseline measures activation top-K, NOT weight reduction; gap is the patent claim)
```
Full results are written to `results.json`.
| Metric | What it tells you |
|---|---|
| Theoretical weight footprint (1353.5 MB) | The model's parameter count × dtype byte width. This is the floor: any inference framework must hold at least this much in fast memory. |
| Measured RSS load delta (1401.8 MB) | Process RSS growth after loading the model. The ~48 MB gap over the theoretical floor is ordinary PyTorch / transformers framework overhead. |
| Δ RSS during generate (~0.1–0.4 MB) | The activation memory delta during a single 24-token generation. Top-K activation gating (rows 2 + 3) does not reduce this meaningfully because the weights still occupy fast memory. |
| Latency overhead (~1.04x for top-K=50%) | The cost of applying the gating hook on every transformer block input. Modest. |
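The "theoretical weight footprint" row is nothing more than parameter count × dtype byte width, so it can be sanity-checked by hand (the helper name below is ours, not from `bench.py`):

```python
def weight_footprint_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Theoretical resident weight footprint in MiB: params × dtype byte width."""
    return n_params * bytes_per_param / 2**20

# gpt2-medium: 354.8M parameters stored in torch.float32 (4 bytes each)
fp32_mb = weight_footprint_mb(354.8e6)   # matches the 1353.5 MB in the table
target_mb = fp32_mb * 0.25               # what a 75% weight-footprint cut leaves resident
```

Any inference framework must keep at least `fp32_mb` in fast memory for the dense model; the design target is to shrink that floor itself, not the activations on top of it.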
The takeaway: Activation-level top-K, the standard published baseline, does not address the weight footprint — and on modern transformer architectures, weight footprint is the binding constraint. The DreamNova primitive targets this gap directly. The 75% design target is on the weight footprint axis, not the activation axis, which is why this baseline shows essentially zero memory savings.
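For concreteness, the published activation top-K baseline measured here amounts to masking each block's input activations. The following is a hedged sketch under our own naming (`topk_mask` is not the harness's actual hook), showing why weights stay resident regardless:

```python
import torch

def topk_mask(x: torch.Tensor, keep_frac: float) -> torch.Tensor:
    """Zero all but the top-K activations (by magnitude) along the last dim.

    Standard published activation gating: the weight tensors are untouched,
    which is why Δ RSS barely moves in the results table above.
    """
    k = max(1, int(x.shape[-1] * keep_frac))
    # k-th largest magnitude per position serves as the keep threshold
    thresh = x.abs().topk(k, dim=-1).values[..., -1:]
    return x * (x.abs() >= thresh)

# In a harness this could be attached as a forward pre-hook on each block, e.g.:
# block.register_forward_pre_hook(lambda m, args: (topk_mask(args[0], 0.5),) + args[1:])
torch.manual_seed(0)
x = torch.randn(2, 16, 64)   # (batch, seq, hidden) toy activations
y50 = topk_mask(x, 0.50)     # keeps 32 of 64 channels per position
```

Note that `topk_mask` allocates nothing persistent: it sparsifies a transient activation tensor, leaving the resident weight footprint exactly where it was.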
Requirements: Python 3.10+, PyTorch 2.0+, transformers, psutil. Tested on macOS / Linux.
```shell
git clone https://github.com/CodeNoLimits/dreamnova-bench
cd dreamnova-bench
pip install torch transformers psutil
python3 bench.py                  # default: gpt2-medium on auto-detected device
python3 bench.py --model gpt2     # smaller model
python3 bench.py --device cpu     # force CPU
```

The script downloads the model on first run via Hugging Face (no auth required for public models).
- Process RSS over `torch.mps.current_allocated_memory()` — on Apple Silicon's unified memory architecture, RSS is the effective resident footprint. The MPS allocator counter only reflects the caching allocator's state, not driver-level allocations, and was observed returning 0 in our measurement window.
- Median over 5 runs per condition — reduces noise from background processes.
- Greedy decoding (`do_sample=False`) — removes sampling variance.
- Fixed prompt + fixed `max_new_tokens` — output sequence length is deterministic.
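The median-of-5 timing loop described above needs nothing beyond the standard library. A minimal sketch (the helper name is ours; `fn` stands in for any measured callable, such as a `model.generate` closure with `do_sample=False`):

```python
import statistics
import time

def median_latency_ms(fn, runs: int = 5) -> float:
    """Median wall-clock latency of fn() over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)  # median is robust to one slow outlier

# e.g. median_latency_ms(lambda: model.generate(**inputs, do_sample=False,
#                                               max_new_tokens=24))
lat = median_latency_ms(lambda: sum(range(10_000)))
```

Reporting the median rather than the mean keeps a single background-process hiccup from skewing a 5-run sample.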
Not included in this repository:
- The DreamNova patent-novel routing scheme (pre-USPTO Wave 1)
- Topology parameters (held back until USPTO filing)
- Quantization, vLLM, FlashAttention, or MoE composition (Year-1 measurements will compose these on top)
The benchmark harness in this repository is released under the MIT License. The underlying language model weights are subject to their own licenses (see Hugging Face model cards).
David Amor · Founder, DreamNova · Jerusalem, Israel
dreamnovaultimate@gmail.com · +972-58-492-1492 · dreamnova.tech