Hi and thank you for your amazing work! I would like to train GPT-2 on a Colab TPU on non-natural-language sequential categorical data such as server logs, medical records, or weather events. What do I have to change in your code to prepare a dataset with word-level encoding (instead of BPE) and run training successfully?
P.S. I think it would be very useful for the community to have a quick tutorial section on this in the README.
Thank you!
There already is a small tutorial in the README under the heading "Using Your Own Data". Unfortunately, it's not very beginner-friendly, due to how shoddy my code is overall. I haven't tested this code in Colab, and word-level encoding is not implemented. This project is somewhat in "archive" mode, as I currently have no time or intention to improve it. I would recommend looking into other, more mature LM implementations, such as Hugging Face's Transformers library. Hope that helps!
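For anyone who still wants to try this, the core idea of word-level encoding is simple to sketch independently of this repo: build a vocabulary mapping each distinct whitespace-separated token to an integer id, then encode sequences as lists of those ids instead of BPE subword ids. The snippet below is a minimal, hypothetical sketch (the function names and the `<unk>` convention are my own, not part of this project's code):

```python
def build_vocab(lines):
    """Assign an integer id to every distinct whitespace-separated token.
    Id 0 is reserved for unknown tokens seen only at encoding time."""
    vocab = {"<unk>": 0}
    for line in lines:
        for token in line.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(line, vocab):
    """Map a line to a list of token ids; unseen tokens map to <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in line.split()]

# Toy example with server-log-like categorical events:
lines = ["GET /index 200", "POST /login 403", "GET /index 500"]
vocab = build_vocab(lines)          # 8 entries: <unk> plus 7 distinct tokens
ids = encode("GET /index 403", vocab)  # → [1, 2, 6]
```

The resulting id sequences would then replace the BPE-encoded ids in whatever dataset-preparation step the training pipeline expects; the embedding-table size must also be set to `len(vocab)`.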