Apply SVD to Transformer weights
Based on a publication by Conjecture.
[2] https://github.com/BerenMillidge/svd_directions
Examples are provided in svd_directions/examples.py
I wrote a short blog post on this topic.
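As a rough illustration of the general idea (not the code from this repository or from svd_directions), the sketch below decomposes a weight matrix with NumPy and looks up which embedding rows are most aligned with the leading singular directions. The function name `top_k_by_singular_direction`, the random matrices, and the choice of right singular vectors are all assumptions made for the example; a real model would supply its own weights and embedding table, and the appropriate side of the decomposition depends on how the weights are stored.

```python
import numpy as np

def top_k_by_singular_direction(weight, embeddings, k=3, n_directions=2):
    """Hypothetical helper: for each leading right singular vector of
    `weight`, return the k embedding rows with the highest cosine similarity."""
    # weight = U @ diag(S) @ Vt; the rows of Vt are unit-norm right singular vectors.
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    # Normalise embedding rows so a dot product with a unit vector is a cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit_emb = embeddings / np.clip(norms, 1e-12, None)
    results = []
    for i in range(n_directions):
        cos = unit_emb @ Vt[i]
        top = np.argsort(-cos)[:k]  # indices sorted by descending similarity
        results.append((float(S[i]), top.tolist(), cos[top].tolist()))
    return results

# Random stand-ins for a weight matrix and an embedding table (assumed shapes).
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
W = rng.normal(size=(d_model, d_model))
E = rng.normal(size=(vocab, d_model))

table = top_k_by_singular_direction(W, E)
for sigma, idx, cos in table:
    print(f"singular value {sigma:.3f}: top indices {idx}")
```

The sign of a singular vector is arbitrary, so it can be worth inspecting the most negative similarities as well as the most positive ones.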
- TopKTable should show information about the layer and head
- Possibly allow showing different embeddings at the same time
- Show a plot of the correlation between singular values and the cosine similarity of the embeddings (action, state, return, ...)