Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problem with newline in vocab files #619

Merged
merged 1 commit into from Jul 22, 2016

Conversation

JeanPaulShapo
Copy link
Contributor

@ofrei

@@ -256,7 +256,7 @@ Dictionary::Gather(const GatherDictionaryArgs& args,
int token_id = 0;
while (!vocab.eof()) {
std::getline(vocab, str);
if (vocab.eof())
if (vocab.eof() && str.empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

new vocab file must be without the last newline;

My assumption is that both formats are valid -- both with and without the last newline. Is this the case, or actually "new vocab file must be without the last newline;"?

@JeanPaulShapo
Copy link
Contributor Author

JeanPaulShapo commented Jul 22, 2016

@ofrei I crafted it to be without the last newline, and in test I checked both files - the old with newline and the new without newline - to ensure we handle both situations properly.
I hope no one will edit this file to add newline.

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

Ah, I see - if you mean the new test file (test_data/vocab.parser_test_no_newline.txt) - sure, it must be without the new line. Thanks for adding a test for this!

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

is it good idea to have identical code in src/artm/core/dicitonary.cc and src/artm/core/collection_parser.cc?

Yes, it's good to keep these two files consistent. Sad that they are copy-pasted like this.

@JeanPaulShapo
Copy link
Contributor Author

I only wonder whether it's good idea to utilize this code in separate function
In that case we only need to fix bugs in one place.

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

If you could refactor this code to avoid duplication - sure, that would be great.

@JeanPaulShapo
Copy link
Contributor Author

Not in this PR.

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

Agree)

@ofrei
Copy link
Contributor

ofrei commented Jul 22, 2016

Sorry, I didn't put [PR:review OK]. You are welcome to fix any of the comments, but in either case this PR is good -- feel free to merge.

@JeanPaulShapo
Copy link
Contributor Author

Whoops, I clicked the wrong button

@JeanPaulShapo JeanPaulShapo merged commit 1a31180 into bigartm:master Jul 22, 2016
@JeanPaulShapo JeanPaulShapo deleted the parser_fix branch September 11, 2016 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vocab file in UCI bow require \n in the last line
2 participants