bsmith24/tiny_shakespeare_test

Tiny Shakespeare Model Comparison

Compares an RWKV-inspired "Student" model against a GPT baseline on character-level Tiny Shakespeare, both trained with nanoGPT.

Results (1000 iterations)

| Model | Parameters | Final Val Loss | Time/Iter |
|---|---|---|---|
| Student (RWKV-style) | 0.83M | 1.38 | ~155 ms |
| GPT | 10.65M | 1.32 | ~525 ms |

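The Student model has roughly 13x fewer parameters (0.83M vs 10.65M) and runs about 3.4x faster per iteration (~155 ms vs ~525 ms). After 1000 iterations its validation loss is 0.06 higher (1.38 vs 1.32).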
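The size and speed ratios can be recomputed from the results table. The snippet below is only a convenience check. The dictionary keys are illustrative names, not identifiers from the notebook, and the timings are the approximate values reported above.

```python
# Values copied from the results table (1000 iterations).
# Key names are illustrative, not taken from the notebook.
student = {"params_m": 0.83, "val_loss": 1.38, "ms_per_iter": 155}
gpt = {"params_m": 10.65, "val_loss": 1.32, "ms_per_iter": 525}

# How many times fewer parameters the Student model has.
size_ratio = gpt["params_m"] / student["params_m"]

# How many times faster one Student iteration is.
speed_ratio = gpt["ms_per_iter"] / student["ms_per_iter"]

# Absolute gap in final validation loss (Student minus GPT).
loss_gap = student["val_loss"] - gpt["val_loss"]

print(f"parameters: {size_ratio:.1f}x fewer")
print(f"time/iter:  {speed_ratio:.1f}x faster")
print(f"val loss:   +{loss_gap:.2f}")
```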

Model Configs

Student: 5 layers, 128 embed dim, linear attention with exponential decay

GPT: 6 layers, 6 heads, 384 embed dim, standard transformer attention
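For readers unfamiliar with "linear attention with exponential decay", here is a minimal pure-Python sketch of the general idea. It is not the Student model's actual code, which lives in the notebook. It assumes a single head, a fixed scalar `decay`, and no normalization or gating, whereas RWKV-style models typically learn per-channel decays. The function names are hypothetical. The recurrent form keeps a running state instead of attending over all past tokens, so each step costs the same regardless of sequence length.

```python
def decayed_linear_attention(queries, keys, values, decay):
    """Causal linear attention with exponential decay, recurrent form.

    state_t = decay * state_{t-1} + outer(k_t, v_t)
    out_t   = q_t @ state_t
    Assumes one head and a fixed scalar decay (illustrative simplification).
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fade the old state, then add the current key/value outer product.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay * state[i][j] + k[i] * v[j]
        # Read out with the query: one vector-matrix product per token.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def decayed_attention_reference(queries, keys, values, decay):
    """Same quantity as an explicit O(T^2) sum, for checking:

    out_t = sum over s <= t of decay**(t - s) * (q_t . k_s) * v_s
    """
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = sum(qi * ki for qi, ki in zip(q, keys[s]))
            weight = decay ** (t - s) * score
            for j, vj in enumerate(values[s]):
                out[j] += weight * vj
        outputs.append(out)
    return outputs


# Tiny example: two tokens, one-dimensional keys and values.
demo = decayed_linear_attention([[1.0], [1.0]], [[1.0], [2.0]], [[3.0], [4.0]], 0.5)
print(demo)  # [[3.0], [9.5]]
```

The recurrent and explicit forms compute the same outputs. The recurrent one avoids the quadratic pass over past tokens that standard transformer attention needs, which is one plausible reason for the lower time per iteration, alongside the smaller parameter count.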

Usage

Run the notebook in Google Colab with a GPU runtime.
