Implement a Natural Questions Style Question Answering System #334
Conversation
@tholor Before merging, it would also be good to check that the output of the SQuAD-style Inferencer is still compatible with Haystack. I have a suspicion that the output might be nested in another list() layer.
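The suspected extra nesting could be checked and removed with a small guard like this (a sketch; the shape of the Inferencer output here is an assumption, not the actual API):

```python
def flatten_one_level(predictions):
    """Remove one accidental layer of list nesting, if present.

    If every element is itself a list, return the concatenation of the
    inner lists; otherwise return the input unchanged.
    """
    if predictions and all(isinstance(p, list) for p in predictions):
        return [item for inner in predictions for item in inner]
    return predictions

# Hypothetical output wrapped in one extra list() layer:
nested = [[{"answer": "Paris", "score": 0.9}]]
print(flatten_one_level(nested))  # [{'answer': 'Paris', 'score': 0.9}]
```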
Great work! This is a huge PR. As discussed, I am happy to merge an initial version and do some clean-ups later (many of my comments). There is, however, at least one change required before merging: the Inferencer format doesn't comply with latest master (we moved num_processes from the methods to the constructor).
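The required change can be illustrated with a minimal before/after sketch (class and method bodies here are hypothetical stand-ins, not the real FARM Inferencer):

```python
class Inferencer:
    def __init__(self, model, num_processes=None):
        self.model = model
        # After the change on master, the pool size is fixed once at
        # construction time (previously a per-call method argument).
        self.num_processes = num_processes

    def inference_from_dicts(self, dicts):
        # Old signature: inference_from_dicts(self, dicts, num_processes=...)
        # Now the value stored on the instance is used instead. A real
        # implementation would spawn a multiprocessing.Pool of this size.
        pool_size = self.num_processes or 1
        assert pool_size >= 1
        return [self.model(d) for d in dicts]

# Usage sketch with a dummy "model":
inferencer = Inferencer(model=len, num_processes=2)
print(inferencer.inference_from_dicts(["ab", "abc"]))  # [2, 3]
```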
return "", -1

def unify_short_answers(self, short_answers, doc_text, tok_to_ch):
    """ In cases where an NQ sample has multiple disjoint short answers, this fn generates the single shortest
We should verify that this merge of short answers produces meaningful answers. If the disjoint short answers are in very different places of the passage, we will get a very long answer. Not a blocker for merging, but we should test it at some point before we train models for production.
Yes, I agree! Will put it on the backlog.
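The failure mode discussed above is easy to reproduce: taking the single shortest span that covers all disjoint short answers blows up when the answers sit far apart in the passage. A minimal sketch (the function name mirrors the one in the diff, but its internals here are an assumption):

```python
def unify_short_answers(spans, doc_text):
    """Merge disjoint (start, end) character spans into the single
    shortest span that covers all of them, as the PR's docstring describes."""
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return doc_text[start:end], (start, end)

# Two 4-5 character short answers far apart yield a 100+ character answer:
doc = "Alice was born in 1901. " + "x " * 50 + "She died in Paris."
answer, span = unify_short_answers([(18, 22), (136, 141)], doc)
print(len(answer))  # far longer than either short answer
```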
answers_clear.append(curr_answer_clear)
answers_tokenized.append(curr_answer_tokenized)
return answers_clear, answers_tokenized
def create_samples_qa(dictionary, max_query_len, max_seq_len, doc_stride, n_special_tokens):
Do we still need the above create_samples_squad(), or is this one used for both SQuAD and NQ now?
We don't need it anymore - will delete when all tests pass
* Flatten prediction lists in adaptive model
* Add SQuAD question id to output dict in inference mode
I have thoroughly tested the code, especially the changes to SQuAD processing, but also other tasks like NER and doc_classification. This looks good. Two NQ-related things:
…into natural_questions
There is still much missing, but I think we could safely merge this branch without destroying too many other tasks : ) We have trained a model on NQ and evaluated it on an NQ development set that has been converted into a format usable by Haystack (it comprises just short answers or unanswerable questions). I have hijacked the SQuAD evaluation scripts to evaluate both the NQ model and a SQuAD model on this NQ development subset. NQ model: SQuAD model:
This implements a Natural Questions-style QA system which can return [span, is_impossible, yes, no] predictions. This is done with two prediction heads (QA and text classification) whose outputs are merged using QuestionAnsweringHead.merge().
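The head-merging idea can be sketched roughly as follows (the merge rule shown here is an illustrative assumption, not the actual QuestionAnsweringHead.merge() implementation):

```python
from dataclasses import dataclass

@dataclass
class QAPrediction:
    label: str    # "span", "is_impossible", "yes", or "no"
    answer: str   # answer text for span predictions, "" otherwise
    score: float

def merge_heads(span_pred, cls_label, cls_score):
    """Combine the QA head's best span with the classification head's
    verdict into one NQ-style prediction.

    If the classifier votes yes/no/is_impossible with higher confidence
    than the span head, its label wins; otherwise the span is returned.
    """
    if cls_label != "span" and cls_score > span_pred.score:
        return QAPrediction(label=cls_label, answer="", score=cls_score)
    return span_pred

span = QAPrediction("span", "in 1901", 0.7)
print(merge_heads(span, "yes", 0.9))  # classifier wins: a "yes" answer
print(merge_heads(span, "yes", 0.4))  # span head wins: the span answer
```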
It is worth mentioning a few limitations of this implementation:
* Prediction objects are now implemented for QA but not for any other task
* The structure of Baskets and Samples differs between SQuAD and NQ and needs to be cleaned up in a future PR
* The apply_tokenization() functions of the SQuAD and NQ processors are very similar and may need to be merged in the future
* When performing NQ inference, no probs are returned from the text classification head
* While data flows end to end, it has not been tested for performance
* Preprocessing of the data is not optimized, so NQ is still very slow