Would the training code be released? #14

Closed
kingnobro opened this issue Feb 18, 2023 · 2 comments

Comments

@kingnobro

Hi, I am interested in your work and would like to train a new model on my own dataset. Will the training code be released soon? Otherwise I will have to implement it myself :(

Also, could you kindly point me to any publicly available code for the same task? Thanks.

@dpfried
Owner

dpfried commented Feb 18, 2023

Hi,

Although the code base we used is an internal version of fairseq that we won't be able to release in full, I double-checked, and Armen's release of the CM3 code in his public fork of fairseq exactly matches the objective that we used: ArmenAg/fairseq@fdc2f7d. Some of it is specific to fairseq, but the causal_masked_dataset.py file (ArmenAg/fairseq@fdc2f7d#diff-a27fa7e989dec569c26d7303197cc14280ab08b831b969341a27c96d6f7dbdec) contains the token masking procedure, which should be portable to other frameworks too.
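
For anyone porting the masking step to another framework, here is a minimal sketch of the general causal masking idea: cut spans out of the token sequence, leave a sentinel in their place, and append each span after its sentinel at the end of the sequence. This is not the code from causal_masked_dataset.py. The function name `causal_mask`, the `<mask:k>` and `<eom>` sentinel strings, and the uniform span sampling are all assumptions made for illustration; check the linked file for the actual sampling scheme and special tokens.

```python
import random
from typing import List, Optional, Sequence


def causal_mask(
    tokens: Sequence[str],
    num_spans: int = 1,
    rng: Optional[random.Random] = None,
    eom: str = "<eom>",  # hypothetical end-of-mask marker
) -> List[str]:
    """Move `num_spans` random spans to the end of the sequence.

    Each span is replaced in place by a sentinel "<mask:k>" (hypothetical
    format) and re-emitted at the end as: "<mask:k>", span tokens, `eom`.
    """
    rng = rng or random.Random()
    n = len(tokens)
    # Too short to hold the requested number of non-empty spans: leave as-is.
    if num_spans < 1 or n < 2 * num_spans:
        return list(tokens)

    # Distinct, sorted cut points paired up give non-empty,
    # non-overlapping spans (an illustrative sampling choice).
    cuts = sorted(rng.sample(range(n + 1), 2 * num_spans))
    spans = [(cuts[i], cuts[i + 1]) for i in range(0, len(cuts), 2)]

    body: List[str] = []
    tail: List[str] = []
    prev = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<mask:{k}>"
        body.extend(tokens[prev:start])  # unmasked text before the span
        body.append(sentinel)            # placeholder where the span was
        tail.append(sentinel)            # same sentinel announces the infill
        tail.extend(tokens[start:end])   # the masked-out span itself
        tail.append(eom)
        prev = end
    body.extend(tokens[prev:])           # unmasked text after the last span
    return body + tail


if __name__ == "__main__":
    example = "def add ( a , b ) : return a + b".split()
    print(" ".join(causal_mask(example, num_spans=1, rng=random.Random(0))))
```

Training a standard left-to-right language model on sequences rearranged this way lets it condition on context from both sides of a span when generating the tail, which is what makes the result usable for infilling.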

@kingnobro
Author

Thanks a lot!!! I think this causal_masked_dataset.py is what I need. :)
