Add encoding params (default set to utf-8) for train #265
Thanks for the PR! I'll take a closer look later this week. It seems related to #253.
I agree: we should set utf-8 as the default encoding and allow users to choose which encoding is used. A few more changes are required before we merge this PR:
- TCTrainer.train() is called from HappyTextClassification.train(), so we need to add the "encoding" parameter to HappyTextClassification.train().
- We need to add this functionality for HappyTextClassification.eval() and HappyTextClassification.test().
- We also need to add this functionality for HappyGeneration, HappyQuestionAnswering, HappyTextToText and HappyWordPrediction.
- We need to update the documentation.
So, thanks again for creating this PR and feel free to help accomplish any of these tasks. I'll complete them if you're too busy, but this would be a great way to learn more about Happy Transformer and to contribute if you have time.
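The parameter-threading approach described in this review could look roughly like the sketch below: the public train()/eval() methods accept an encoding argument (defaulting to utf-8) and forward it down to the function that actually opens the file. This is a minimal illustration under assumptions, not Happy Transformer's code: SketchTCTrainer and SketchHappyTextClassification are hypothetical stand-ins for TCTrainer and HappyTextClassification, and the CSV columns "text" and "label" are assumed.

```python
import csv
import os
import tempfile


class SketchTCTrainer:
    """Hypothetical stand-in for TCTrainer (not the library's real class)."""

    def _get_data(self, filepath, encoding="utf-8"):
        # Open with an explicit encoding instead of the platform default.
        with open(filepath, newline="", encoding=encoding) as f:
            reader = csv.DictReader(f)
            # Assumed CSV layout: a "text" column and an integer "label" column.
            return [(row["text"], int(row["label"])) for row in reader]

    def train(self, input_filepath, encoding="utf-8"):
        data = self._get_data(input_filepath, encoding=encoding)
        # Tokenization and the actual training loop would go here.
        return len(data)


class SketchHappyTextClassification:
    """Hypothetical stand-in for HappyTextClassification."""

    def __init__(self):
        self._trainer = SketchTCTrainer()

    def train(self, input_filepath, encoding="utf-8"):
        # Forward the caller's encoding choice down to the trainer.
        return self._trainer.train(input_filepath, encoding=encoding)

    def eval(self, input_filepath, encoding="utf-8"):
        # eval() and test() need the same parameter, per the review comment.
        return self._trainer._get_data(input_filepath, encoding=encoding)


# Usage: a UTF-8 CSV with non-ASCII text works with the default encoding.
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "train.csv")
with open(utf8_path, "w", newline="", encoding="utf-8") as f:
    f.write("text,label\ncafé au lait,1\nnaïve résumé,0\n")

model = SketchHappyTextClassification()
n_rows = model.train(utf8_path)  # encoding defaults to "utf-8"

# A file saved in another encoding can still be read by overriding the default.
latin1_path = os.path.join(tmp_dir, "eval.csv")
with open(latin1_path, "w", newline="", encoding="latin-1") as f:
    f.write("text,label\ncafé,1\n")
latin1_rows = model.eval(latin1_path, encoding="latin-1")
```

Defaulting to utf-8 keeps the common case working on every platform, while the keyword argument lets users with legacy files override it without touching the trainer internals.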
OK, I will try to fix the encoding problems.
I added an encoding param for
Thanks for contributing! I just created a new PR (#271) that will be merged and published soon. I decided to go with your initial suggestion of hardcoding the encoding format, and I explain why in the description of the new PR. I included all of your commits to give you credit for your contributions. Thanks again for your help.
Version 4.4.0 is now live and contains these changes!
The initial _get_data function in trainer.py opens the file without setting an encoding, which may lead to a codec error.
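A minimal sketch of the problem and of the hardcoded-UTF-8 fix. When open() is called without an encoding argument, Python falls back to a platform-dependent locale encoding (cp1252 on many Windows setups), so a UTF-8 file with non-ASCII characters can fail to decode. get_data_lines is a hypothetical helper for illustration, not the actual _get_data from trainer.py; to keep the demonstration deterministic, the failure is simulated by decoding explicitly as ASCII rather than relying on the machine's real default.

```python
import locale
import os
import tempfile


def get_data_lines(filepath):
    """Hypothetical sketch of the fix: always read input files as UTF-8."""
    with open(filepath, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Write a small UTF-8 file containing non-ASCII characters.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("déjà vu\n日本語\n")

# The encoding open() would use here without an explicit argument
# (varies by OS, locale and Python's UTF-8 mode).
default_encoding = locale.getpreferredencoding(False)

# Simulate a non-UTF-8 default: decoding the same bytes as ASCII fails.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    simulated_error = None
except UnicodeDecodeError as exc:
    simulated_error = exc

# With the encoding stated explicitly, the file reads correctly everywhere.
lines = get_data_lines(path)
```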