Example Code always produces Max Length Sequences #171

Open
tawheeler opened this issue Mar 10, 2024 · 0 comments

Thank you for this great package!

I tried modifying the example "Copy Task" code so that it has a 50% chance of producing a 9-token string and otherwise produces a 10-token string:

```julia
sample_data() = (d = join(map(string, rand(1:10, (rand() < 0.5 ? 9 : 10))), ' '); (d, d))
```
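As a quick sanity check (this snippet is my addition in plain Julia, not part of the example), the generator really does emit the two lengths roughly equally often:

```julia
# Sanity check: sample_data() should yield a ~50/50 mix of 9- and 10-token strings.
sample_data() = (d = join(map(string, rand(1:10, (rand() < 0.5 ? 9 : 10))), ' '); (d, d))

lengths = [length(split(first(sample_data()), ' ')) for _ in 1:10_000]
println(count(==(9), lengths), " nine-token, ", count(==(10), lengths), " ten-token")
```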

When I train this, the model learns to always produce a 10-token string:
[Screenshot (2024-03-09_16-39): training/inference output in which every generated sequence has 10 tokens]
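
Roughly how I counted the generated lengths (here `translate` is a stand-in for the example's decode function; the actual name in the example code may differ):

```julia
# `translate` is a placeholder for whatever greedy-decode function the
# Copy Task example defines; substitute the real one.
out_lengths = [length(split(translate(first(sample_data())), ' ')) for _ in 1:1_000]
println(count(==(9), out_lengths), " nine-token outputs, ",
        count(==(10), out_lengths), " ten-token outputs")  # after training: 0 vs 1000
```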

I originally noticed this when I changed the code to produce only 1- or 2-token sequences; after training, the model likewise only ever produced 2-token sequences. I suspect there is an issue with the masking or perhaps with the loss function, but I haven't figured it out yet.
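
To illustrate the kind of bug I suspect (a sketch in plain Julia with invented names, not the package's actual loss code): if per-token cross entropy is averaged over a padded batch without masking out the padding positions, a 9-token target padded to length 10 still penalizes the model at position 10, so always emitting a token there is rewarded:

```julia
# nll[t, b]      = per-token negative log-likelihood at position t, batch item b
# pad_mask[t, b] = 1.0 for real tokens, 0.0 for padding (both names invented)

# Correct: padding positions contribute nothing to the loss.
masked_loss(nll, pad_mask) = sum(nll .* pad_mask) / sum(pad_mask)

# Buggy: padding positions of short sequences still produce gradient,
# pushing the model to emit content at every position up to the max length.
unmasked_loss(nll) = sum(nll) / length(nll)

nll = rand(10, 4)  # toy values: 10 positions × 4 batch items
pad_mask = [(t <= 9 || iseven(b)) ? 1.0 : 0.0 for t in 1:10, b in 1:4]  # odd items have 9 real tokens
@show masked_loss(nll, pad_mask) unmasked_loss(nll)
```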

FWIW, the loss never gets extremely low (~1e-5) the way it does when training only on 10-token sequences; instead it plateaus at about 0.5.
