Add a function to convert nanogpt weights #475

adamkarvonen · 2023-12-30T23:43:54Z

Description

I have been using TransformerLens on some models I trained using Karpathy's popular nanoGPT repository. There are two complications I had to deal with:

The first is that some state dicts saved after using torch.compile() have an unwanted `_orig_mod.` prefix on their keys that needs to be removed. Karpathy's nanoGPT code strips this prefix when loading a checkpoint, and I added the same prefix removal to the conversion function.
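A minimal sketch of that prefix removal as a standalone helper. The name `strip_compile_prefix` is hypothetical and is not part of TransformerLens or nanoGPT. It only touches dictionary keys, so the tensors in a real checkpoint pass through unchanged.

```python
from typing import Dict, TypeVar

V = TypeVar("V")

# torch.compile() wraps a module and keeps the original under the attribute
# `_orig_mod`, so a state dict saved from the compiled wrapper carries this prefix.
UNWANTED_PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict: Dict[str, V]) -> Dict[str, V]:
    """Hypothetical helper: return a copy with the compile prefix removed from every key that has it."""
    return {
        (key[len(UNWANTED_PREFIX):] if key.startswith(UNWANTED_PREFIX) else key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    checkpoint = {"_orig_mod.transformer.wte.weight": "W_E", "lm_head.weight": "W_U"}
    print(strip_compile_prefix(checkpoint))
    # {'transformer.wte.weight': 'W_E', 'lm_head.weight': 'W_U'}
```

Returning a new dict, rather than popping and reinserting keys in place, leaves the caller's loaded checkpoint unmodified.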

The second is that nanoGPT models can be created with or without bias. By default, there is no bias. This function handles both cases. To verify that my conversion function works as expected, I created this Colab:
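One way to handle both cases is to fall back to an all-zeros bias whenever a checkpoint has none, on the assumption that the target model's layers always expect a bias tensor. A minimal sketch follows; `bias_or_zeros` is a hypothetical helper, and NumPy arrays stand in for torch tensors so it runs without PyTorch (with torch, `torch.zeros` plays the same role).

```python
from typing import Dict

import numpy as np


def bias_or_zeros(state_dict: Dict[str, np.ndarray], prefix: str) -> np.ndarray:
    """Hypothetical helper: return the stored bias for `prefix`, or zeros if it has none.

    For nn.Linear (weight shape (out, in)) and LayerNorm (weight shape (d,)), the
    bias has shape (weight.shape[0],), so zeros of that length are a drop-in stand-in.
    """
    bias = state_dict.get(f"{prefix}.bias")
    if bias is not None:
        return bias
    weight = state_dict[f"{prefix}.weight"]
    return np.zeros(weight.shape[0], dtype=weight.dtype)


if __name__ == "__main__":
    d_model, d_mlp = 4, 16
    # A real checkpoint is all-bias or no-bias; the two are mixed here only
    # to exercise both branches of the helper.
    sd = {
        "transformer.h.0.ln_1.weight": np.ones(d_model, dtype=np.float32),
        "transformer.h.0.mlp.c_fc.weight": np.ones((d_mlp, d_model), dtype=np.float32),
        "transformer.h.0.mlp.c_fc.bias": np.full(d_mlp, 0.5, dtype=np.float32),
    }
    print(bias_or_zeros(sd, "transformer.h.0.ln_1"))            # [0. 0. 0. 0.]
    print(bias_or_zeros(sd, "transformer.h.0.mlp.c_fc").shape)  # (16,)
```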

https://colab.research.google.com/drive/1CqMRAezkc2vVJKPiKA3Q7ACdjgzH_y7K?authuser=0#scrollTo=4pB3Ecg7B0X-

In it, I take two models that I created and trained, one with bias and one without. For each stock nanoGPT model, I run a sample input of length 339 through the model and store the 339 output tokens as the expected output. Next, I convert the model to TransformerLens format using my conversion function, forward the same sample input again, and check that the outputs exactly match the expected output.
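The comparison step can be sketched generically, assuming the "output tokens" are the argmax prediction at each position. The `Forward` interface and both helper names are hypothetical, and wrapping a real nanoGPT model or a TransformerLens model to fit that interface is not shown; toy lookup-table "models" stand in for them.

```python
from typing import Callable, Sequence

import numpy as np

# Assumed interface: a forward function maps token ids to per-position logits
# of shape (seq_len, vocab_size).
Forward = Callable[[Sequence[int]], np.ndarray]


def predicted_tokens(forward: Forward, tokens: Sequence[int]) -> np.ndarray:
    """Greedy (argmax) next-token prediction at every input position."""
    return np.argmax(forward(tokens), axis=-1)


def outputs_match(reference: Forward, converted: Forward, tokens: Sequence[int]) -> bool:
    """True if both models predict exactly the same token at every position."""
    expected = predicted_tokens(reference, tokens)  # stored before conversion
    actual = predicted_tokens(converted, tokens)    # recomputed after conversion
    return bool(np.array_equal(expected, actual))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size = 50
    # Toy "model": the logits depend only on the current token.
    table = rng.normal(size=(vocab_size, vocab_size))

    def reference(toks: Sequence[int]) -> np.ndarray:
        return table[list(toks)]

    def scaled(toks: Sequence[int]) -> np.ndarray:
        # Scaling by 2.0 is exact in floating point, so every argmax is unchanged.
        return 2.0 * table[list(toks)]

    def broken(toks: Sequence[int]) -> np.ndarray:
        # Negating the logits turns each argmax into an argmin.
        return -table[list(toks)]

    sample = list(range(vocab_size))
    print(outputs_match(reference, scaled, sample))  # True
    print(outputs_match(reference, broken, sample))  # False
```

Comparing predicted tokens rather than raw logits tolerates tiny floating-point differences between the two implementations; comparing the logits with `np.allclose` would be a stricter check.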

In terms of tests, documentation, comments, code style conventions, etc., I matched the level of coverage of what shows up when I search 'mingpt' in the codebase. Please let me know if you want additional documentation or testing.

Type of change

[x] New feature (non-breaking change which adds functionality)

adamkarvonen · 2024-01-05T23:46:15Z

After a second look, it makes a lot more sense to just check if the model uses bias rather than adding a bias parameter that doesn't follow the pattern of other conversion functions. Not sure what I was thinking there.
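Inferring the flag from the checkpoint could look like the sketch below. `uses_bias` is a hypothetical name, and using `transformer.ln_f.bias` as the signal is an assumption of this sketch, not necessarily what the merged function checks.

```python
from typing import Mapping

# nanoGPT's `bias` flag adds (or omits) biases in every Linear and LayerNorm layer,
# so a single LayerNorm bias key is enough to tell the two kinds of checkpoint apart.
FINAL_LN_BIAS_KEY = "transformer.ln_f.bias"


def uses_bias(state_dict: Mapping[str, object]) -> bool:
    """Hypothetical helper: infer bias=True/False from the checkpoint itself.

    Assumes any `_orig_mod.` prefix has already been stripped from the keys.
    """
    return FINAL_LN_BIAS_KEY in state_dict


if __name__ == "__main__":
    with_bias = {"transformer.ln_f.weight": None, "transformer.ln_f.bias": None}
    # The causal-mask buffer is also named "bias"; a suffix check would be fooled here.
    without_bias = {"transformer.ln_f.weight": None, "transformer.h.0.attn.bias": None}
    print(uses_bias(with_bias), uses_bias(without_bias))  # True False
```

Checking one specific parameter, rather than any key ending in `.bias`, avoids a false positive: when flash attention is unavailable, nanoGPT registers its causal mask as a buffer named `attn.bias`, which is saved in the state dict even for bias-free models.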

I also updated the Colab and verified that both models still return the same outputs.

neelnanda-io · 2024-01-16T23:33:22Z

Looks good to me, thanks for adding this!

adamkarvonen added 2 commits December 30, 2023 17:27

Add a function to convert nanogpt weights (739ca08)

Remove need for bias parameter (8571e79)

neelnanda-io merged commit 5754a0b into TransformerLensOrg:main Jan 16, 2024

adamkarvonen mentioned this pull request Jan 21, 2024 in Remove redundant MLP bias assignment #485 (merged)
