Grammar- and Resource-Aligned Certifiable Speculative Decoding
NeuroGRACE-Spec is a speculative decoding framework that injects grammatical/semantic constraints and resource costs directly into both the target conditional distribution q_λ and the proposal distribution r_λ. Combined with a per-token certifiable accept-residual sampler, the method preserves exact sampling from q_λ while enabling high-throughput block verification with a small proposal model.
- Grammar-constrained sampling: Inject feasibility masks g_i(a | h) to enforce grammatical and semantic constraints
- Cost-aligned distributions: Incorporate resource costs via cost-to-go estimates Ĵ(h) and incremental costs ∆_i(a)
- Certifiable accept-residual sampling: Per-token verification that preserves exact sampling from target distribution
- High-throughput block verification: Propose K tokens, verify with one teacher-forced pass, sample residual on rejection
- Adaptive λ control: Automatically adjust effort/precision weight λ based on cost constraints
- Numerically stable: All operations in log-space to prevent underflow/overflow
Installation:

```bash
pip install -r requirements.txt
```

Quick Start:

```python
from neurograce_spec import (
    TargetDistribution,
    ProposalDistribution,
    BlockVerification,
    CostToGoEstimator,
    GrammarMask
)
# Create your large and small models
large_model = ... # Your large target model
small_model = ... # Your small proposal model
# Create grammar mask and cost estimator
grammar_mask = GrammarMask(vocab, vocab_inv)
cost_estimator = CostToGoEstimator(vocab_size=vocab_size)
# Create distributions
target_dist = TargetDistribution(
    large_model=large_model,
    grammar_mask=grammar_mask,
    cost_estimator=cost_estimator,
    lambda_weight=1.0
)
proposal_dist = ProposalDistribution(
    small_model=small_model,
    grammar_mask=grammar_mask,
    cost_estimator=cost_estimator,
    lambda_weight=1.0
)
# Create block verifier
block_verifier = BlockVerification(
    target_dist=target_dist,
    proposal_dist=proposal_dist,
    block_size=4
)
# Generate sequence
prefix = [token1, token2, ...]
emitted_tokens, metadata = block_verifier.verify_block(prefix, verbose=True)
```

Target Distribution q_λ (Eq. 1):
q_λ(a | h) = P_T(a | h) * g_i(a | h) * e^(-λ * ∆_i(a)) / Σ_b P_T(b | h) * g_i(b | h) * e^(-λ * ∆_i(b))

Proposal Distribution r_λ (Eq. 2):

r_λ(a | h) = P_S(a | h) * g_i(a | h) * e^(-λ * ∆̂_i(a)) / Σ_b P_S(b | h) * g_i(b | h) * e^(-λ * ∆̂_i(b))
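Both distributions share the same reweighting structure, so they can be computed with one helper. The sketch below is illustrative rather than the package's internal implementation (the function name and array conventions are assumptions); it works entirely in log space, which is how the feasibility mask and the e^(-λ∆) factor stay numerically stable.

```python
import numpy as np

def cost_aligned_log_probs(log_p, feasible, delta, lam):
    """Log-probabilities of the constrained, cost-aligned distribution.

    log_p    : (V,) base log-probabilities, e.g. log P_T(· | h) or log P_S(· | h)
    feasible : (V,) boolean grammar/semantic mask g_i(· | h)
    delta    : (V,) incremental costs ∆_i(·) (or estimates ∆̂_i(·))
    lam      : effort/precision weight λ >= 0
    """
    logits = np.where(feasible, log_p - lam * delta, -np.inf)  # mask out infeasible tokens
    m = np.max(logits)
    log_z = m + np.log(np.sum(np.exp(logits - m)))  # stable log-sum-exp normalizer
    return logits - log_z

# Toy vocabulary of 4 tokens; token 2 is grammatically infeasible
log_p = np.log(np.array([0.5, 0.2, 0.2, 0.1]))
feasible = np.array([True, True, False, True])
delta = np.array([0.0, 1.0, 0.0, 2.0])
print(np.exp(cost_aligned_log_probs(log_p, feasible, delta, lam=1.0)))
```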
Block Verification (Algorithm 1):

- Generate K tokens with the small model: y_{1:K} ~ ∏ r_λ(· | h_{t-1})
- Run one teacher-forced pass of the large model to get {q_λ(· | h^(j))} for j = 1...K
- Verify sequentially (a minimal sketch of this loop follows below):
  - Compute the accept probability: α = min(1, q_λ(y_j | h^(j)) / r_λ(y_j | h^(j)))
  - If accepted: emit the token and continue
  - If rejected: sample from the residual q_res and stop the block
- Residual distribution (Eq. 4): q_res(a | h) ∝ q_λ(a | h) - min(q_λ(a | h), r_λ(a | h))
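A minimal sketch of that verification loop, assuming the per-position probability rows have already been gathered from both models (the names and explicit-vector interface are assumptions, not the package's BlockVerification API):

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_block(q_rows, r_rows, draft):
    """Accept-residual verification of one speculative block.

    q_rows : (K, V) target probabilities q_λ(· | h^(j)) from the teacher-forced pass
    r_rows : (K, V) proposal probabilities r_λ(· | h^(j)) used to draft the block
    draft  : proposed token ids y_1..y_K
    """
    emitted = []
    for j, y in enumerate(draft):
        q, r = q_rows[j], r_rows[j]
        if rng.random() < min(1.0, q[y] / r[y]):   # accept with probability α
            emitted.append(int(y))
            continue
        # Rejection: sample from the residual q_res ∝ q - min(q, r), then stop the block
        residual = np.maximum(q - np.minimum(q, r), 0.0)
        emitted.append(int(rng.choice(len(q), p=residual / residual.sum())))
        break
    return emitted
```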
Adaptive λ Control:

Maintains the cost constraint E[C] ≤ τ by updating:

λ ← [λ + η(Ê[C] - τ)]_+
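A sketch of that projected update, treating Ê[C] as a running estimate of the realized cost per block (the step size η and the names below are illustrative assumptions):

```python
def update_lambda(lam, est_cost, tau, eta=0.05):
    """Projected update λ ← [λ + η(Ê[C] - τ)]_+ enforcing E[C] ≤ τ."""
    return max(0.0, lam + eta * (est_cost - tau))

lam = 1.0
for est_cost in [3.2, 2.8, 2.1, 1.9]:   # cost estimates over successive blocks
    lam = update_lambda(lam, est_cost, tau=2.0)
# λ increases while the estimated cost exceeds τ and relaxes once the constraint is met
```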
Project Structure:

```
neurograce_spec/
├── __init__.py # Package exports
├── distributions.py # q_λ and r_λ distributions
├── block_verification.py # Algorithm 1: Block verification
├── cost_estimator.py # Cost-to-go estimation
├── grammar.py # Grammar/semantic masks
├── adaptive_control.py # Adaptive λ control
└── utils.py # Numerical stability utilities
examples/
└── example_usage.py # Example script
requirements.txt # Dependencies
README.md # This file
```
Notation:

- A: Token alphabet
- h ∈ H: Prefix/history state
- P_T(a | h): Large model conditional
- P_S(a | h): Small model conditional
- g_i(a | h) ∈ {0, 1}: Grammar/semantic feasibility mask
- Ĵ(h): Cost-to-go approximation
- ∆_i(a) = Ĵ(next(h, a)) - Ĵ(h): Incremental cost (see the toy sketch after this list)
- λ ≥ 0: Effort/precision weight
- K: Speculative block size
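To make the incremental-cost definition concrete, here is a toy example in which Ĵ is a simple length-based heuristic; the real CostToGoEstimator is a learned approximation, and the closed form below is purely an assumption for illustration.

```python
def j_hat(h):
    """Toy cost-to-go Ĵ(h): remaining effort shrinks as the prefix grows."""
    return max(0.0, 20.0 - len(h))

def incremental_cost(h, a):
    """∆_i(a) = Ĵ(next(h, a)) - Ĵ(h), where next(h, a) appends token a to h."""
    return j_hat(h + [a]) - j_hat(h)

print(incremental_cost([1, 2, 3], 7))   # -1.0: extending the prefix lowers the remaining cost
```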
The accept-residual mechanism ensures that the per-position marginals match q_λ exactly, preserving the target distribution while enabling efficient block verification.
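That exactness claim is easy to sanity-check empirically at a single position: propose from r_λ, accept with probability α, and fall back to the residual on rejection; the emitted marginal should reproduce q_λ. The snippet below is a standalone check on hand-written probability vectors, not part of the package.

```python
import numpy as np

rng = np.random.default_rng(1)
q = np.array([0.6, 0.3, 0.1])   # target q_λ(· | h) at one position
r = np.array([0.3, 0.5, 0.2])   # proposal r_λ(· | h)

residual = np.maximum(q - np.minimum(q, r), 0.0)
residual /= residual.sum()

counts = np.zeros(3)
for _ in range(200_000):
    y = rng.choice(3, p=r)                       # propose
    if rng.random() < min(1.0, q[y] / r[y]):     # accept-residual test
        counts[y] += 1
    else:
        counts[rng.choice(3, p=residual)] += 1

print(counts / counts.sum())   # ≈ [0.6, 0.3, 0.1], matching q_λ
```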
See examples/example_usage.py for a complete working example with dummy models.
```bash
python examples/example_usage.py
```

Test with actual transformer models from Hugging Face:
```bash
python examples/test_real_llm.py
```

This will:
- Load GPT-2 (large) and DistilGPT-2 (small) models
- Generate text using NeuroGRACE-Spec
- Show accept rate and compare with baseline generation
Custom Configuration:
```python
from examples.test_real_llm import test_with_real_models
test_with_real_models(
    large_model_name="gpt2",
    small_model_name="distilgpt2",
    prompt="Your prompt here",
    max_tokens=50,
    block_size=4,
    device="cpu"  # or "cuda" for GPU
)
```

See examples/README.md for more details.
References:

- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience.
- Lieder, F., & Griffiths, T. L. (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences.
- Leviathan, Y., Kalman, M., & Matias, Y. (2023). Fast inference from transformers via speculative decoding. ICML.