Bug Report: Tokenizer failed to handle the final line of listed projects when missing newline character #59

Open
Xiaoven opened this issue Nov 23, 2023 · 0 comments

Steps to reproduce

Prepare the file assigned to FILE_projects_list in config.ini so that it does NOT end with a newline character (no empty line after the last entry):

path/to/project1.zip
path/to/project2.zip

Run python tokenizer.py zip, then check the log file. It shows:

[INFO] (MainThread) Starting zip project <1, path/to/project1.zip> (process 0)
...
[INFO] (MainThread) Starting zip project <2, path/to/project2.zi> (process 0)

The last character of the final project path is cut off (path/to/project2.zi instead of path/to/project2.zip), so the project is not found.


This may be caused by proj_paths.append(line[:-1]) in tokenizers/file-level/tokenizer.py, which drops the last character of every line whether or not that character is a newline.

I recommend using line.strip() instead of line[:-1].
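
To illustrate the difference, here is a minimal sketch that is not taken from tokenizer.py. The function names read_paths_slicing and read_paths_strip are hypothetical, and the input is simulated with io.StringIO rather than a real projects list file. It assumes the tokenizer iterates over the file line by line, as the quoted append call suggests.

```python
import io

# Simulated projects list whose last line has no trailing newline.
PROJECTS_LIST = "path/to/project1.zip\npath/to/project2.zip"


def read_paths_slicing(handle):
    """Mimics the reported behaviour: always drop the last character."""
    proj_paths = []
    for line in handle:
        proj_paths.append(line[:-1])
    return proj_paths


def read_paths_strip(handle):
    """Suggested behaviour: strip surrounding whitespace, skip blank lines."""
    proj_paths = []
    for line in handle:
        path = line.strip()
        if path:  # ignore empty lines, e.g. a blank line at the end of the file
            proj_paths.append(path)
    return proj_paths


sliced = read_paths_slicing(io.StringIO(PROJECTS_LIST))
stripped = read_paths_strip(io.StringIO(PROJECTS_LIST))
print(sliced)
print(stripped)
```

With slicing, the last entry loses its final "p"; with strip() both paths stay intact, and the result is the same whether or not the file ends with a newline. Note that strip() also removes leading and trailing spaces, which may or may not be desired if paths can legitimately end in whitespace; line.rstrip("\n") would be a narrower alternative.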
