
[Question] Difference in gpt-neo-125m weights loaded with HuggingFace from_pretrained vs HookedTransformer.from_pretrained #557

Open
petezone opened this issue Apr 27, 2024 · 1 comment

Comments

@petezone

Hi Neel,

Thanks for this powerful interpretability tool; it has helped me a lot in understanding the mechanics of LLMs. However, I found that there is a slight difference between the gpt-neo-125m weights loaded with HuggingFace's from_pretrained and those loaded with HookedTransformer.from_pretrained, even for wte.weight. I would expect the two sets of weights to be identical, since both are loaded from EleutherAI/gpt-neo-125m. I am not sure whether I am misunderstanding the source code of HookedTransformer.from_pretrained. Could you please check whether this problem exists, or point out where I went wrong? Thank you very much!
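A minimal sketch of the comparison being described, assuming the attribute paths `hf_model.transformer.wte.weight` and `tl_model.W_E` for the embedding matrices, and assuming TransformerLens accepts the HuggingFace model-name string (the exact casing, e.g. `EleutherAI/gpt-neo-125M`, may differ):

```python
import torch
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

model_name = "EleutherAI/gpt-neo-125m"  # assumed name; TL may expect "EleutherAI/gpt-neo-125M"

hf_model = AutoModelForCausalLM.from_pretrained(model_name)
tl_model = HookedTransformer.from_pretrained(model_name, device="cpu")  # keep both on CPU for comparison

# HuggingFace stores the token embedding at transformer.wte.weight;
# TransformerLens exposes it as W_E. Both are [vocab_size, d_model].
hf_wte = hf_model.transformer.wte.weight.detach()
tl_wte = tl_model.W_E.detach()

# This is non-zero because TransformerLens processes the weights on load.
print("max |diff|:", (hf_wte - tl_wte).abs().max().item())
```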

@Butanium
Contributor

Hey,

If you look at the token probability distributions, they should be the same. TransformerLens performs a number of weight-processing operations under the hood to facilitate interpretability (see this explanation).
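Continuing the sketch from the question above (reusing `hf_model` and `tl_model`, assumed to be on the same device), one way to check this; the prompt and tolerance are arbitrary choices for illustration:

```python
import torch

prompt = "The quick brown fox"
tokens = tl_model.to_tokens(prompt)  # [1, seq_len]; prepends a BOS token by default

with torch.no_grad():
    tl_logits = tl_model(tokens)           # [1, seq_len, d_vocab]
    hf_logits = hf_model(tokens).logits    # [1, seq_len, vocab_size]

# Compare probabilities rather than raw logits: the weight processing can
# shift the logits by a per-position constant, which softmax removes.
tl_probs = torch.softmax(tl_logits[0, -1], dim=-1)
hf_probs = torch.softmax(hf_logits[0, -1], dim=-1)
print(torch.allclose(tl_probs, hf_probs, atol=1e-4))
```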

If you don't want TL to perform this weight processing, use .from_pretrained_no_processing instead of .from_pretrained.
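For example (again a sketch, with the same model-name and device assumptions as above), loading without processing should give weights that match the HuggingFace copy:

```python
tl_raw = HookedTransformer.from_pretrained_no_processing(
    "EleutherAI/gpt-neo-125m", device="cpu"
)
# With no processing, the embedding matrix should now match HuggingFace's
# (up to dtype/precision).
print((hf_wte - tl_raw.W_E.detach()).abs().max().item())  # expected ~0
```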
