Make batchwise adding of evaluation data possible #717
All tests pass on my local machine. Not sure why
I like it. Would propose a different design though.
LGTM!
Hello,
Hey @Rob192 sure, we already use Pathlib in some parts of haystack. Would you like to create a PR with the proposed changes? I guess the change will be minor. Otherwise we can also take over.
OK @Timoeller! Will do!
@Timoeller Raised ticket #714 to catch these platform-dependent issues early. Currently, Windows users account for 15%-20% of overall haystack downloads.
When I tried to add evaluation data containing millions of documents, my computer ran out of memory.
To solve this problem, this PR adds a method that adds eval docs batch-wise, so that not all eval docs have to be loaded into memory at once. Furthermore, this PR introduces the method `squad_json_to_jsonl`, since the format required by `add_eval_data_batchwise` is JSONL.
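For readers unfamiliar with the idea, here is a minimal sketch of the general approach described above. It is not the code from this PR: the helper names `squad_to_jsonl_sketch` and `iter_batches` are hypothetical, and it assumes the usual SQuAD layout with a top-level `"data"` list and that one JSONL line per entry of that list is an acceptable target format.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Dict, Iterator, List


def squad_to_jsonl_sketch(squad_path: Path, jsonl_path: Path) -> int:
    """Write each entry of the SQuAD "data" list as one JSON line; return the count.

    Hypothetical helper for illustration, not the PR's squad_json_to_jsonl.
    """
    with open(squad_path, encoding="utf-8") as src:
        entries = json.load(src)["data"]
    with open(jsonl_path, "w", encoding="utf-8") as dst:
        for entry in entries:
            dst.write(json.dumps(entry) + "\n")
    return len(entries)


def iter_batches(jsonl_path: Path, batch_size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most batch_size parsed entries, reading the file lazily."""
    with open(jsonl_path, encoding="utf-8") as src:
        while True:
            # islice pulls at most batch_size lines from the open file handle.
            lines = list(islice(src, batch_size))
            if not lines:
                break
            batch = [json.loads(line) for line in lines if line.strip()]
            if batch:
                yield batch
```

Because each line of a JSONL file is a complete JSON value, a consumer can parse and index one batch, discard it, and move on, so only `batch_size` entries are held in memory at a time. Note that this sketch of the conversion step still parses the whole SQuAD file once; the memory saving applies to the later batch-wise reads.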