Thanks for this powerful interpretability tool; it helps me a lot in understanding the mechanics of LLMs. However, I found a slight difference between the gpt-neo-125m weights loaded with Hugging Face's `from_pretrained` and those loaded with `HookedTransformer.from_pretrained`, even for `wte.weight`. I would expect the two to be identical, since both are loaded from EleutherAI/gpt-neo-125m. Perhaps I am misunderstanding the source code of `HookedTransformer.from_pretrained`. Could you please check whether this problem exists, or point out where I went wrong? Thank you very much!
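Roughly, this is the comparison (a minimal sketch, assuming the TransformerLens model exposes the embedding as `W_E`, the Hugging Face model exposes it as `transformer.wte.weight`, and both libraries accept this model-name string):

```python
import torch
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

model_name = "EleutherAI/gpt-neo-125m"
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tl_model = HookedTransformer.from_pretrained(model_name)

hf_wte = hf_model.transformer.wte.weight  # [d_vocab, d_model] token embedding
tl_wte = tl_model.W_E                     # TransformerLens embedding matrix

# These do NOT match exactly with TransformerLens's default loading.
print(torch.allclose(hf_wte, tl_wte))
print((hf_wte - tl_wte).abs().max())
```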
If you look at the token probability distributions, they should be the same. TransformerLens performs many weight-processing operations under the hood to facilitate interpretability (see this explanation); these change the raw weights, including `wte.weight`, but leave the model's output distribution unchanged.
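For example (a sketch, assuming the same model-name string works for both libraries; note that the raw logits can differ by a per-position constant, e.g. from centering the unembed, so the right thing to compare is the distribution after softmax):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

model_name = "EleutherAI/gpt-neo-125m"
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tl_model = HookedTransformer.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokens = tokenizer("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logprobs = hf_model(tokens).logits.log_softmax(dim=-1)
    tl_logprobs = tl_model(tokens).log_softmax(dim=-1)

# The log-probabilities should agree up to small numerical noise,
# even though the underlying weights differ.
print(torch.allclose(hf_logprobs, tl_logprobs, atol=1e-4))
```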
If you don't want TL to perform this weight processing, use `.from_pretrained_no_processing` instead of `.from_pretrained`.
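A sketch of the unprocessed load (same attribute-name assumptions as above; with processing disabled, the embedding should match the Hugging Face checkpoint exactly):

```python
import torch
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

model_name = "EleutherAI/gpt-neo-125m"
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tl_raw = HookedTransformer.from_pretrained_no_processing(model_name)

# Without weight processing, the weights are loaded as-is from the checkpoint.
print(torch.allclose(hf_model.transformer.wte.weight, tl_raw.W_E))
```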