
Encrux/simple_dlm


My own Diffusion Language Model

Free-Range, Organic, Hand-Crafted.

Noteworthy Gibberish

step 67000, loss: 1.2239, it/s: 0.7:

To be, and be of men?



Prown AMEN:

O yout aboars of

Ra':

Un

step 77000, loss: 1.0891, it/s: 0.8:

To be, fo hend!



First her sense ountier to Jupits,

be horse.

Wiser words have never been spoken. Trained on an M2 Air 16GB for... a while, idk. Be horse.

Setup

Install dependencies via uv:

uv sync

Add a training corpus (a single .txt file) to /data and name it input.txt. For example, the tiny Shakespeare dataset:

curl -o data/input.txt https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt

Running the code

Models are saved in checkpoints/checkpoint.pt by default.

Training

Existing checkpoints are overwritten during training.

uv run train --device cuda (or mps/cpu)
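The repo doesn't spell out the training objective here, so as a rough orientation: masked diffusion language models are commonly trained by corrupting a random fraction of tokens to a MASK token and scoring the denoiser's cross-entropy on only the masked positions. The sketch below is a minimal numpy illustration of that idea; the `MASK` id, vocab size, and `training_step`/`uniform_model` names are my assumptions, not this project's API.

```python
import numpy as np

VOCAB, MASK = 27, 0          # illustrative vocab size and [MASK] token id
rng = np.random.default_rng(0)

def training_step(tokens, model):
    """One masked-diffusion training step: corrupt, denoise, score."""
    t = rng.uniform(0.05, 1.0)                  # noise level for this step
    mask = rng.random(tokens.shape) < t         # positions to corrupt
    if not mask.any():
        mask[rng.integers(len(tokens))] = True  # corrupt at least one token
    corrupted = np.where(mask, MASK, tokens)
    logits = model(corrupted)                   # (seq_len, VOCAB)
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    # cross-entropy only on the positions that were masked out
    return -logp[mask, tokens[mask]].mean()

def uniform_model(corrupted):
    """Stand-in denoiser: uniform logits over the vocabulary."""
    return np.zeros((len(corrupted), VOCAB))

loss = training_step(np.arange(1, 11), uniform_model)  # ~= log(VOCAB)
```

With a uniform denoiser the loss is exactly log(VOCAB); a trained model drives it below that, which is roughly what the loss numbers in the samples above are tracking.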

Sampling

uv run sample --query "To be, "
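For intuition on what sampling from such a model does (the actual procedure in this repo may differ): a common scheme starts from an all-MASK sequence with the prompt pinned in place, then over a fixed number of steps reveals the most confident predictions while the rest stay masked. A toy numpy sketch, with a random-logits stand-in for the real denoiser and all names/settings assumed for illustration:

```python
import numpy as np

VOCAB, MASK, LENGTH, STEPS = 27, 0, 16, 8   # illustrative settings
rng = np.random.default_rng(0)

def toy_model(tokens):
    """Stand-in denoiser: random logits per position."""
    return rng.normal(size=(len(tokens), VOCAB))

def sample(prompt, length=LENGTH, steps=STEPS, model=toy_model):
    tokens = np.full(length, MASK)
    tokens[: len(prompt)] = prompt              # prompt positions stay fixed
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        logits = model(tokens)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        preds = probs[:, 1:].argmax(-1) + 1     # never predict MASK itself
        conf = probs[np.arange(length), preds]
        # reveal the most confident still-masked positions this step
        k = int(np.ceil(masked.size / (steps - step)))
        reveal = masked[np.argsort(-conf[masked])][:k]
        tokens[reveal] = preds[reveal]
    return tokens

out = sample(np.array([3, 5]))  # prompt tokens survive, MASKs are filled in
```

Unlike autoregressive decoding, every remaining position is predicted in parallel each step; the schedule just controls how many predictions get committed per step.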

Export to ONNX

uv run export-onnx --checkpoint checkpoints/checkpoint

About

How hard can it be to implement a diffusion language model by hand? Easier than I thought, actually.
