Hotswapping a llama attention layer with a Hyena convolution

calculating/hyena-llama

By saving the activations from just 400 samples run through Llama 7B, a Hyena operator can be trained and swapped in for an attention layer with only a minimal increase in perplexity.
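
A minimal sketch of how this distillation step could look in PyTorch, assuming a Hugging Face Llama checkpoint: forward hooks cache the hidden states entering and leaving one attention block, and a small gated long convolution (a stripped-down Hyena-style operator) is fit to those pairs with an MSE loss. The checkpoint name, layer index, filter length, and training hyperparameters below are illustrative assumptions, not this repository's actual settings.

```python
# Sketch only, not the repository's code. Assumptions: the "huggyllama/llama-7b"
# checkpoint, decoder layer 12 as the swap target, and a simplified single-order
# Hyena-style operator (gated depthwise long convolution) instead of the full
# Hyena filter parameterization.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

LAYER = 12
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.float16
)
attn = model.model.layers[LAYER].self_attn

inputs_buf, outputs_buf = [], []

def cache_activations(module, args, kwargs, output):
    # hidden_states may arrive positionally or as a kwarg depending on the
    # transformers version; output[0] is the attention block's output.
    hidden = kwargs.get("hidden_states", args[0] if args else None)
    inputs_buf.append(hidden.detach().float().cpu())
    outputs_buf.append(output[0].detach().float().cpu())

handle = attn.register_forward_hook(cache_activations, with_kwargs=True)
# ... run the ~400 calibration samples through `model` here ...
handle.remove()

class TinyHyenaOp(nn.Module):
    """Gated depthwise causal long convolution: a simplified stand-in for the
    Hyena operator (no implicit filters, single order)."""
    def __init__(self, d_model=4096, filter_len=128):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, filter_len,
                              padding=filter_len - 1, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        v, gate = self.in_proj(x).chunk(2, dim=-1)
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(gate))

# Fit the operator to the cached (input, output) pairs with an MSE loss.
op = TinyHyenaOp()
optimizer = torch.optim.AdamW(op.parameters(), lr=1e-3)
for epoch in range(10):
    for x, y in zip(inputs_buf, outputs_buf):
        loss = nn.functional.mse_loss(op(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```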

[Figure: comparison of the attention outputs from the original attention layer and the Hyena operator]

The minimally trained small Hyena operator increases perplexity from 1.55 to 1.58. For comparison, replacing the attention output with a matrix of ones increases perplexity to 10.11, and skipping the attention layer entirely increases it to 1.78.
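
Continuing the sketch above, one way to reproduce this kind of comparison is to wrap the trained operator in a shim that accepts the attention module's call signature, assign it over `self_attn`, and evaluate perplexity on held-out text. The shim's return tuple and the evaluation loop are assumptions meant to match recent transformers conventions, not this repository's evaluation code.

```python
# Sketch of the hotswap plus a simple perplexity check. The length of the
# returned tuple and the ignored kwargs mirror recent transformers
# LlamaAttention and can differ between library versions.
import math
from transformers import AutoTokenizer

class HyenaAttentionShim(nn.Module):
    def __init__(self, op):
        super().__init__()
        self.op = op

    def forward(self, hidden_states, **kwargs):
        # Masks, caches, and position ids are ignored by the convolution.
        out = self.op(hidden_states.float()).to(hidden_states.dtype)
        return out, None, None

model.model.layers[LAYER].self_attn = HyenaAttentionShim(op)

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

@torch.no_grad()
def perplexity(model, tokenizer, texts):
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss    # mean NLL over shifted tokens
        total_nll += loss.item() * (ids.size(1) - 1)
        total_tokens += ids.size(1) - 1
    return math.exp(total_nll / total_tokens)

# Usage: compare perplexity(model, tokenizer, eval_texts) before and after
# the swap on the same held-out texts.
```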
