Integrating support for GPT/T5/BART for Question Answering (#4532)
* added class for QA-related metrics
* removed BLEU code from QA metrics
* added classes for data handling and loading for BERT/T5/BART/GPT
* removed unnecessary main function
* added classes for BERT, S2S (T5/BART), and GPT question answering models
* created separate modules for model-specific input features
* moved non-model methods to QAMetrics and renamed methods to be more intuitive
* changed classmethods to staticmethods
* removed unnecessary copyright
* removed deprecated input features file
* abstracted cache filename, feature loading, and feature dumping to QADataset
* removed unused imports and added dataclass decorator
* removed unused imports and refactored method name
* added base class for QA models and abstracted out common methods
* moved non-model eval code and predictions file dump to metrics class
* added combined example of train/eval/test/inference for all QA models
* renamed QA example file
* fixed trailing whitespace
* added type casting to float for logger warning
* removed unused import
* converted cached filename creation to class method
* moved common code in dataset classes to base class, renamed Features class to Example
* converted base QA example class to dataclass
* reduced code repetition in prediction evaluation
* converted prediction output files to JSONL
* added flag for checking if ground truth is present in context spans
* converted predictions dump from JSON to JSONL
* converted nbest predictions dump from JSON to JSONL
* removed unused argument to no pad loss method
* added unit tests for QA metrics and dataset utilities
* applied style fix on new files
* added integration tests
* restored default values in QA config
* renamed stage to avoid duplicate
* added init files for new modules
* applied style fix for module init files
* added inline comments to make concise
* specified class as abstract
* specified .json format for output prediction files
* created separate variable for answer-in-context check for readability
* shifted stages to parallel
* applied style fix
* restored file modified by linter
* set transformers offline flag to true and moved all stages to parallel
* moved inference code inside test_ds check
* added script for converting MS MARCO dataset to SQuAD format
* added tutorial for question answering with generative models
* added copyright header
* renamed old QA docs with _squad suffix and added docs for new QA modules
* added generative QA architecture diagram
* modified tutorial with Colab testing changes, improved documentation
* changed branch name to main in tutorial
* deprecated old QA tutorial
* deprecated old QA docs
* deprecated old QA example
* removed deprecated CI test for old QA example
* removed additional deprecated CI tests

Signed-off-by: Ameya Mahabaleshwarkar <ameyasm1154@gmail.com>