Fix problem with newline in vocab files #619
@@ -256,7 +256,7 @@ Dictionary::Gather(const GatherDictionaryArgs& args,
   int token_id = 0;
   while (!vocab.eof()) {
     std::getline(vocab, str);
-    if (vocab.eof())
+    if (vocab.eof() && str.empty())
Looks good!
My assumption is that both formats are valid, with and without the trailing newline. Is that the case, or does it actually mean "new vocab file must be without the last newline;"?
@ofrei I crafted it without the trailing newline, and in the test I checked both files (the old one with the trailing newline and the new one without it) to make sure we handle both cases properly.
Ah, I see. If you mean the new test file (
Yes, it's good to keep these two files consistent. It's a pity they are copy-pasted like this.
I only wonder whether it would be a good idea to move this code into a separate function.
If you can refactor this code to avoid the duplication, sure, that would be great.
Not in this PR.
Agreed.
Sorry, I didn't add [PR:review OK]. You are welcome to address any of the comments, but either way this PR is good; feel free to merge.
Whoops, I clicked the wrong button |
@ofrei src/artm/core/dicitonary.cc and src/artm/core/collection_parser.cc?