Improve SmolGPT #73

bclarkson-code · 2024-06-13T14:41:27Z

SmolGPT (49M) does not perform well. I gave it the following prompt:

def add_one(x):

It completed it as follows:

def add_one(x):
    return 10
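One way to make this failure concrete is a functional check, similar in spirit to code-generation benchmarks: execute the prompt plus completion and test the resulting function. The sketch below assumes the completion is available as a string. The `passes_check` helper and its test inputs are hypothetical and not part of the project.

```python
def passes_check(source: str) -> bool:
    """Execute a generated `add_one` definition and test it on a few inputs."""
    namespace = {}
    try:
        # exec runs arbitrary code: only do this with trusted or sandboxed output.
        exec(source, namespace)
        add_one = namespace["add_one"]
        return all(add_one(x) == x + 1 for x in (-1, 0, 1, 41))
    except Exception:
        # Syntax errors, a missing function, or runtime errors all count as failures.
        return False


completion = "def add_one(x):\n    return 10\n"  # the model's output above
reference = "def add_one(x):\n    return x + 1\n"  # a correct completion
print(passes_check(completion))  # False
print(passes_check(reference))   # True
```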

I think there are a number of issues behind this.

First, the model could of course be bigger. With more optimised kernels, modern techniques like rotary embeddings, and multi-GPU support, we will hopefully be able to train a larger model in a reasonable amount of time.
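Rotary position embeddings (RoPE) encode position by rotating pairs of query/key channels through an angle proportional to the token's position, so the dot product between a rotated query and a rotated key depends only on their relative offset. Below is a minimal pure-Python sketch for a single vector. It assumes the commonly used base of 10000 and pairs adjacent channels (2i, 2i+1). It illustrates the technique and is not the project's kernel; `rotary_embed` is a hypothetical name.

```python
import math


def rotary_embed(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Apply a rotary position embedding to one query or key vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = [0.0] * dim
    for i in range(0, dim, 2):
        # Each channel pair rotates at its own frequency: base ** (-i / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out[i] = x0 * cos_t - x1 * sin_t
        out[i + 1] = x0 * sin_t + x1 * cos_t
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The same relative offset (3) at different absolute positions gives the same score.
s1 = dot(rotary_embed(q, 5), rotary_embed(k, 2))
s2 = dot(rotary_embed(q, 13), rotary_embed(k, 10))
print(abs(s1 - s2) < 1e-9)  # True
```

Because each pair is only rotated, vector norms are preserved and no learned position table is needed.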

Second, the dataset can probably be improved. More evaluation is needed, but I think adding some web text has the potential to make the model easier to use.

bclarkson-code · 2024-07-14T17:24:27Z

This was fixed by training GPT-2 (124M).

bclarkson-code closed this as completed Jul 14, 2024
