Improve SmolGPT #73

Closed
bclarkson-code opened this issue Jun 13, 2024 · 1 comment

@bclarkson-code
Owner

SmolGPT (49M) does not produce good completions. I gave it the following prompt:

def add_one(x):

It completed it as follows:

def add_one(x):
    return 10

I think that there are a number of issues.

First, the model could of course be bigger. With more optimised kernels, modern techniques like rotary embeddings, and multi-GPU support, we will hopefully be able to train a larger model in a reasonable amount of time.
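For reference, here is a rough sketch of what rotary embeddings do, written in plain NumPy rather than against this project's own tensor API. The function name `rotary_embed`, the `base` value of 10000, and the interleaved even/odd channel pairing are assumptions made for illustration, not code from the repository.

```python
import numpy as np


def rotary_embed(x, base=10000.0):
    """Rotate channel pairs of x (seq_len, head_dim) by position-dependent angles.

    Hypothetical helper for illustration; not part of the project.
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, head_dim = x.shape
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")

    # One rotation frequency per pair of channels.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # angles[p, i] = p * inv_freq[i]
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)

    # Treat (even, odd) channels as 2D points and rotate each one.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Applied to both queries and keys before attention, this makes their dot product depend only on the distance between the two positions, so no separate learned position embedding is needed.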

Second, the dataset can probably be improved. More evaluation is needed, but I think adding some web text has the potential to make the model easier to use.
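As one possible starting point for that evaluation, a completion like the `add_one` example can be checked automatically instead of by eye. This is only a sketch: `passes_add_one` and the test inputs are hypothetical, and `exec` should only be run on model output inside a sandbox.

```python
def passes_add_one(completion: str) -> bool:
    """Return True if the completed source defines a correct add_one.

    Hypothetical helper for illustration; not part of the project.
    """
    namespace = {}
    try:
        # Caution: exec runs arbitrary code, so sandbox untrusted model output.
        exec(completion, namespace)
        fn = namespace["add_one"]
        return all(fn(x) == x + 1 for x in (-1, 0, 1, 41))
    except Exception:
        # Syntax errors, missing definitions, and runtime errors all count as failures.
        return False


# The completion reported in this issue, and a correct one for comparison.
observed = "def add_one(x):\n    return 10\n"
expected = "def add_one(x):\n    return x + 1\n"

print(passes_add_one(observed))
print(passes_add_one(expected))
```

Running a handful of prompts like this after each training run would give a rough pass rate to compare models and datasets against.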

@bclarkson-code
Owner Author

This was fixed by training GPT-2 (124M).
