
How can I predict on my own dataset? #16

Open
p-null opened this issue Oct 24, 2019 · 2 comments

Comments


p-null commented Oct 24, 2019

Suppose I have a document and a question, and I'd like to get the answer span and answer string.

What steps should I take to get that?

I tried to format it in the MultiQA format, something like this:

js_obj = [{
    "id": "HotpotQA_5a85ea095542994775f606a8",
    "context": {
        "documents": [{"text": "passage_sentences"}]
    },
    "qas": ["question_sentence?"]
}]

and dumped it to test.gz, then ran predict like:
python predict.py --model https://multiqa.s3.amazonaws.com/models/BERTBase/SQuAD1-1.tar.gz --dataset test.gz --dataset_name SQuAD --cuda_device 0

alontalmor (Owner) commented

You would need to save your dataset in the MultiQA format. This format is described in the datasets readme https://github.com/alontalmor/MultiQA/tree/master/datasets, and it also comes with a JSON-schema checker for the format you output. I think the fastest approach is to copy the code for one of the datasets that is close, say SQuAD1.1, make the changes needed, and build your dataset using:
python build_dataset.py --dataset_name MyDataset --split train --output_file path/to/output.jsonl.gz --n_processes 10
(as described in the main readme)

Hope this helps.
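
For reference, below is a minimal sketch of serializing examples in a MultiQA-style structure to a gzipped JSONL file from Python. The field names (id, context, documents, qas, qid, question) are assumptions based on the snippet above and the datasets readme, and the real format may require additional fields, so validate the output with the repo's JSON-schema checker before passing the file to predict.py.

import gzip
import json

# Minimal sketch (not the official build script): write MultiQA-style
# examples as gzipped JSONL. Field names are assumptions; check the
# datasets readme and JSON-schema checker for the exact required fields.
examples = [{
    "id": "MyDataset_example_0",
    "context": {
        "documents": [{"text": "passage text goes here"}]
    },
    "qas": [{
        "qid": "MyDataset_example_0_q0",
        "question": "question sentence?"
    }]
}]

with gzip.open("test.jsonl.gz", "wt", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

The resulting file can then be passed to predict.py via --dataset, as in the command above.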


p-null commented Oct 29, 2019

Thanks for the info. I followed the MultiQA format to build the dataset.
It seems that predict.py also calls the evaluation function, while we usually don't have gold labels for a test dataset.
I got the following error when running prediction on my own dataset. I think it is due to calling the evaluation function.

  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "predict.py", line 110, in <module>
    predict(args)
  File "predict.py", line 38, in predict
    curr_pred, full_predictions = predictor.predict_json(context)
  File "/content/MultiQA/models/multiqa_predictor.py", line 27, in predict_json
    min(offset+20, len(question_instances))])
  File "/usr/local/lib/python3.6/dist-packages/allennlp/predictors/predictor.py", line 213, in predict_batch_instance
    outputs = self._model.forward_on_instances(instances)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/models/model.py", line 153, in forward_on_instances
    outputs = self.decode(self(**model_input))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MultiQA/models/multiqa_bert.py", line 195, in forward
    f1_score = squad_eval.metric_max_over_ground_truths(squad_eval.f1_score, best_span_string, gold_answer_texts)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/tools/squad_eval.py", line 52, in metric_max_over_ground_truths
    return max(scores_for_ground_truths)
ValueError: max() arg is an empty sequence

It's good to have evaluation metrics in the evaluate command, but we usually don't have gold labels in test data. The data I have only contains the passage, question id, and question, so I fill fields like answers with "" and do not provide span information.
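
A possible workaround (a sketch, not the project's official fix) is to guard the metric computation so the SQuAD metrics are only computed when gold answers exist, e.g. around the call in models/multiqa_bert.py shown in the traceback:

from allennlp.tools import squad_eval

# Sketch of the guard: gold_answer_texts is assumed to be the (possibly
# empty) list of gold answer strings and best_span_string the predicted
# answer text, following the names in the traceback above.
best_span_string = "predicted answer"
gold_answer_texts = []  # empty for unlabeled test data

if gold_answer_texts:  # only evaluate when gold labels are available
    f1_score = squad_eval.metric_max_over_ground_truths(
        squad_eval.f1_score, best_span_string, gold_answer_texts)
else:
    f1_score = 0.0  # or skip updating the evaluation metrics entirely

# With empty gold_answer_texts, the unguarded call raises
# ValueError: max() arg is an empty sequence.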
