
RD-Net: Drift-Stabilized Inference for Frozen Large Language Models



Overview

This repository contains a minimal inference-time modification that reduces or delays repetition collapse in long-form text generation on frozen large language models.

The method is simple:
A small drift term is injected into an auxiliary fast-weight memory module during inference. No training, fine-tuning, KV cache manipulation, LoRA, or retraining is required. The underlying model remains frozen.
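The exact update rule lives in rd_demo_final.py and is not reproduced here. The sketch below is only an illustration of the general shape of the idea; every name in it (FastWeightMemory, drift_std, the Hebbian-style outer-product update) is an assumption made for this example, not the repository's implementation.

import torch

class FastWeightMemory:
    """Illustrative auxiliary fast-weight memory, kept separate from the frozen LLM.

    All names, shapes, and the Hebbian-style update are assumptions for this
    sketch; the actual module is defined in rd_demo_final.py.
    """

    def __init__(self, dim: int, decay: float = 0.99):
        self.W = torch.zeros(dim, dim)  # untrained fast weights
        self.decay = decay              # slow forgetting of old associations

    def step(self, h: torch.Tensor, drift_std: float) -> torch.Tensor:
        # Fast-weight update from the current hidden state h (shape: [dim]) ...
        self.W = self.decay * self.W + torch.outer(h, h)
        # ... plus the inference-time Gaussian drift term.
        self.W = self.W + drift_std * torch.randn_like(self.W)
        # Read-out vector that could be mixed back into the generation loop.
        return self.W @ h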

Preliminary results suggest that this drift mechanism keeps the rep-4 repetition measure substantially lower, for substantially longer, than baseline generation under identical settings.

This is early research and requires broader replication.


Abstract

Large language models often enter a repetitive attractor state during extended free-running generation, especially without conditioning or resets. This behavior emerges even with large context windows and sampling adjustments such as temperature scaling, nucleus (top-p) sampling, or top-k truncation.

We explore a lightweight inference-time perturbation method using a scheduled Gaussian drift applied to an untrained fast-weight memory module. Initial experiments on Llama-3.1-8B show that this approach delays repetition collapse over long generation sequences, preserving novelty beyond 100k tokens.

These findings are preliminary and require independent verification across model families, inference stacks, and settings.
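The drift schedule itself is defined in rd_demo_final.py and is not reproduced here. Purely as a hypothetical illustration of what a "scheduled Gaussian drift" could look like, the standard deviation of the injected noise might decay slowly with the number of tokens generated; sigma0 and tau below are placeholder values, not the settings the script uses.

import math

def drift_std(token_idx: int, sigma0: float = 0.1, tau: float = 200_000) -> float:
    """Hypothetical drift schedule: standard deviation of the Gaussian
    perturbation as a function of tokens generated so far. sigma0 and tau
    are placeholders, not the parameters used in rd_demo_final.py."""
    return sigma0 * math.exp(-token_idx / tau)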


Why This Might Matter

  • The effect occurs without touching model weights.
  • The method is architecture-agnostic in principle (so far tested only on Llama-3.1, plus an initial Mistral-7B replication).
  • It may offer a lightweight mitigation for collapse modes in:
    • agent loops
    • long-context narrative models
    • streaming or infinite-generation systems

Whether this scales or generalizes remains an open question.


Installation

git clone https://github.com/chazciii/rd-net
cd rd-net
pip install torch transformers accelerate tqdm

Run the Experiment

python rd_demo_final.py

The script generates two log files:

  • vanilla_log.txt
  • rdnet_log.txt

Both use identical sampling settings and context constraints. The only difference is whether drift is applied.
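A small helper like the one below can put the two logs side by side. It assumes the `Nk tokens | rep-4 = ... | drift = ...` line format shown in the example output further down; the script may log additional information beyond this.

import re

LINE = re.compile(r"(\d+)k tokens\s*\|\s*rep-4 = ([\d.]+)\s*\|\s*drift = ([\d.]+)")

def parse_log(path: str) -> dict[int, float]:
    """Map token count (in thousands) to rep-4, assuming the
    `Nk tokens | rep-4 = ... | drift = ...` line format shown below."""
    values = {}
    with open(path) as f:
        for line in f:
            m = LINE.search(line)
            if m:
                values[int(m.group(1))] = float(m.group(2))
    return values

vanilla = parse_log("vanilla_log.txt")
rdnet = parse_log("rdnet_log.txt")
for k in sorted(set(vanilla) | set(rdnet)):
    v = vanilla.get(k, float("nan"))
    r = rdnet.get(k, float("nan"))
    print(f"{k:>4}k tokens | vanilla rep-4 = {v:.4f} | rd-net rep-4 = {r:.4f}")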


Example Output

Example Run (RTX 4090 • CUDA 12.1 • Llama-3.1-8B)

Vanilla (no drift applied):

10k tokens  | rep-4 = 0.7421 | drift = 0.0000
20k tokens  | rep-4 = 0.8923 | drift = 0.0000

RD-Net (drift applied):

10k tokens  | rep-4 = 0.2814 | drift = 0.1123
20k tokens  | rep-4 = 0.2931 | drift = 0.0987
30k tokens  | rep-4 = 0.3012 | drift = 0.0876
40k tokens  | rep-4 = 0.3120 | drift = 0.0791
50k tokens  | rep-4 = 0.3198 | drift = 0.0723
60k tokens  | rep-4 = 0.3245 | drift = 0.0668
70k tokens  | rep-4 = 0.3291 | drift = 0.0621
80k tokens  | rep-4 = 0.3317 | drift = 0.0582
90k tokens  | rep-4 = 0.3340 | drift = 0.0549
100k tokens | rep-4 = 0.3356 | drift = 0.0520
110k tokens | rep-4 = 0.3369 | drift = 0.0495
120k tokens | rep-4 = 0.3378 | drift = 0.0473
130k tokens | rep-4 = 0.3385 | drift = 0.0454
140k tokens | rep-4 = 0.3391 | drift = 0.0437
150k tokens | rep-4 = 0.3396 | drift = 0.0422

Summary of Results

Condition                     | First Collapse Point | Approx. Final rep-4
Vanilla (frozen model)        | ~20–24k tokens       | ~0.89
RD-Net Drift (frozen model)   | >150k tokens         | ~0.34 (stable)

These values vary by run and hardware, and should not be treated as benchmarks.
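rep-4 is not formally defined in this README. One common reading is the fraction of 4-grams that duplicate a 4-gram seen earlier in the same sequence; the sketch below computes that interpretation and may differ from what rd_demo_final.py actually logs.

def rep_n(tokens: list[int], n: int = 4) -> float:
    """Fraction of n-grams that duplicate an n-gram seen earlier in the same
    sequence (one common definition of a rep-n score). The metric logged by
    rd_demo_final.py may be computed differently, e.g. over a sliding window."""
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    seen = set()
    repeated = 0
    for i in range(total):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            repeated += 1
        seen.add(gram)
    return repeated / total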

Replication Matrix

Model                             | Vanilla Collapse (rep-4 threshold > 0.85) | Drift Run Length | Final rep-4
Llama-3.1-8B                      | ~20–24k tokens                            | >150k tokens     | ~0.3396
Mistral-7B (initial replication)  | ~18–22k tokens                            | ≥150k tokens     | ~0.3288

More model families welcome (Qwen, Mistral-MoE, Falcon, Phi-3, Mixtral, GGUF/GPTQ variants).


Limitations / Caveats

  • Results currently rely on a single hardware setup and single model family.
  • No evaluation yet on coherence, semantics, or downstream task performance.
  • Effect may depend on sampling configuration.
  • Drift parameters are heuristic and unoptimized.
  • Unknown behavior on quantized models (GPTQ/GGUF).

Replication Requests

If you test this on:

  • Qwen
  • Falcon
  • Phi-3
  • GGUF / GPTQ
  • CPU-only inference
  • Agent frameworks

…please submit logs or open an issue. Positive, negative, or neutral results are all useful.


Citation

@misc{cook2025rdnet,
  title={RD-Net: Drift-Stabilized Inference for Frozen LLMs},
  author={Cook, Chaz},
  year={2025},
  url={https://github.com/chazciii/rd-net},
  note={Preprint, work in progress}
}

License

MIT License.

Replication and pull requests welcome.