
Conversation

@adamkarvonen
Contributor

Description

I have been using TransformerLens on some models I trained with Karpathy's popular nanoGPT repository. There are two complications I had to deal with:

The first is that state dicts saved after using torch.compile() have an unwanted prefix on their keys that needs to be removed. Karpathy deals with it like this. I added the same unwanted-prefix removal in the conversion function.
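For context, the prefix torch.compile() prepends to state-dict keys is `_orig_mod.`. A minimal sketch of the removal, mirroring nanoGPT's approach (function name is illustrative, not the one used in the PR):

```python
# Strip the key prefix that torch.compile() adds when a compiled model's
# state dict is saved. Keys without the prefix are passed through unchanged.
UNWANTED_PREFIX = "_orig_mod."

def strip_compile_prefix(state_dict):
    """Return a copy of state_dict with torch.compile's key prefix removed."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith(UNWANTED_PREFIX):
            cleaned[key[len(UNWANTED_PREFIX):]] = value
        else:
            cleaned[key] = value
    return cleaned
```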

The second is that the nanogpt models can be created with or without bias. By default, there is no bias. This function can handle both cases. To verify that my conversion function works as expected, I created this Colab:
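Whether a checkpoint was trained with bias can be inferred from the state dict itself, since nanoGPT only creates the bias parameters when `bias=True`. A hedged sketch of such a check (the key name is assumed from nanoGPT's module layout):

```python
# Detect whether a nanoGPT-style checkpoint was created with bias=True by
# looking for an attention bias key. Key name assumed from nanoGPT's layout;
# the actual conversion function may check differently.
def uses_bias(state_dict):
    """Return True if the state dict contains attention bias parameters."""
    return any(key.endswith("attn.c_attn.bias") for key in state_dict)
```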

https://colab.research.google.com/drive/1CqMRAezkc2vVJKPiKA3Q7ACdjgzH_y7K?authuser=0#scrollTo=4pB3Ecg7B0X-

In it, I take two models that I created and trained, one with bias and one without. For each stock nanoGPT model, I run a sample input of length 339 through the model and store the 339 output tokens as the expected output. Next, I convert the model to TransformerLens format using my conversion function, forward the same sample input again, and check that the outputs exactly match the expected output.
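The round-trip check above can be sketched as follows. This is an illustrative outline, not the Colab's actual code; it assumes both models are callables returning logits of shape `[batch, seq, vocab]`:

```python
import torch

def greedy_tokens(model, input_ids):
    """Greedy next-token prediction at every position, assuming the model
    returns logits of shape [batch, seq, vocab]."""
    with torch.no_grad():
        logits = model(input_ids)
    return logits.argmax(dim=-1)

# Round-trip check (model and variable names are illustrative):
# expected = greedy_tokens(nanogpt_model, sample_input)   # length-339 input
# actual = greedy_tokens(converted_model, sample_input)
# assert torch.equal(expected, actual)
```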

In terms of tests, documentation, comments, code style conventions, etc., I matched the level of coverage that shows up when I search 'mingpt' in the codebase. Please let me know if you want additional documentation or testing.

Type of change

  • [x] New feature (non-breaking change which adds functionality)

@adamkarvonen
Contributor Author

After a second look, it makes a lot more sense to just check whether the model uses bias rather than adding a bias parameter that doesn't follow the pattern of the other conversion functions. Not sure what I was thinking there.

I also updated the Colab and verified that both models still return the same outputs.

@neelnanda-io neelnanda-io merged commit 5754a0b into TransformerLensOrg:main Jan 16, 2024
@neelnanda-io
Collaborator

Looks good to me, thanks for adding this!
