Classification of IMDB data by using model hiddenstates and their sparse encodings
I'm interested in finding out if the sparse autoencoded representations for hidden activations of LLMs are better for classification. The idea was inspired by this paper and my previous work.
You will have to clone the sparse coding repository to access the relevant class definitions. You also need to add the folder to your path.
git clone https://github.com/loganriggs/sparse_coding.git
In addition to that install the requirements
pip install -r requirements.txt
I tested with Python 3.11.5
You can run the notebook with either pythia-70m-deduped
or pythia-410m-deduped
which are the two models for which I had access to pretrained sparse autoencoders.
The sparse representations perform significantly worse on the classification task.