
Allow for model evaluation directly from cdqa #104

Closed
fmikaelian opened this issue Apr 25, 2019 · 17 comments

Comments

@fmikaelian
Collaborator

fmikaelian commented Apr 25, 2019

The idea is to implement the evaluate.py script inside the package under /utils

@fmikaelian
Collaborator Author

fmikaelian commented May 6, 2019

You should now be able to evaluate with:

from cdqa.utils.metrics import evaluate, evaluate_from_files

evaluate(dataset, predictions) # as json objects
evaluate_from_files(dataset_file='data/dev-v1.1.json', prediction_file='logs/bert_qa_squad_v1.1_sklearn/predictions.json') # as json files
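
For reference, assuming evaluate() mirrors the official SQuAD v1.1 evaluation script (dataset being the 'data' list of a SQuAD-format JSON, predictions a dict mapping question id to answer text), a minimal illustration with made-up content would be:

# Minimal sketch of the expected input shapes (SQuAD v1.1 style); ids and texts are invented.
dataset = [
    {
        "title": "Example article",
        "paragraphs": [
            {
                "context": "The capital of France is Paris.",
                "qas": [
                    {
                        "id": "q-001",
                        "question": "What is the capital of France?",
                        "answers": [{"text": "Paris", "answer_start": 25}]
                    }
                ]
            }
        ]
    }
]
predictions = {"q-001": "Paris"}

evaluate(dataset, predictions)  # -> {'exact_match': 100.0, 'f1': 100.0}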

fmikaelian added a commit that referenced this issue May 6, 2019
* Update CONTRIBUTING.md with new tree structure #112

* Build REST API using QAPipeline() #118

* start updating README with cdqa pipeline method

* Allow for model evaluation directly from cdqa #104

* add filter script in utils for all data cleaning tasks

* update badges pypi

* Allow for model evaluation directly from cdqa #104

* api style fixing + update demo notebook

* Build REST API using QAPipeline() #118

* update README and naming

* update README

* update tree structure
@fmikaelian
Collaborator Author

@andrelmfarias Are we evaluating the reader only or the full QAPipeline?

@fmikaelian
Collaborator Author

fmikaelian commented May 19, 2019

The evaluation of the reader can be done with the all_predictions object:

https://github.com/fmikaelian/cdQA/blob/3583256cbf73e8f3674f57182e824a5ca7c4f7be/cdqa/reader/bertqa_sklearn.py#L623-L634

https://github.com/fmikaelian/cdQA/blob/3583256cbf73e8f3674f57182e824a5ca7c4f7be/cdqa/reader/bertqa_sklearn.py#L649-L650

Here is a reproducible example on the SQuAD dev set:

import os
from ast import literal_eval
import pandas as pd
import joblib
import json

from cdqa.utils.filters import filter_paragraphs
from cdqa.reader.bertqa_sklearn import BertProcessor, BertQA
from cdqa.utils.metrics import evaluate

df = pd.read_csv('../data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})

df['paragraphs'] = df['paragraphs'].apply(filter_paragraphs)
df['content'] = df['paragraphs'].apply(lambda x: ' '.join(x))

reader = joblib.load('../models/bert_qa_squad_v1.1_sklearn/bert_qa_squad_v1.1_sklearn.joblib')
reader.output_dir = '../logs/'

processor = BertProcessor(do_lower_case=True, is_training=False)
examples, features = processor.fit_transform(X='../data/dev-v1.1.json') # replace with a custom dataset
preds = reader.predict((examples, features))

with open('../data/dev-v1.1.json') as dataset_file:
    dataset_json = json.load(dataset_file)
    dataset = dataset_json['data']

with open('../logs/predictions.json') as prediction_file:
    all_predictions = json.load(prediction_file)

evaluate(dataset, all_predictions) # as json objects

{'exact_match': 81.2488174077578, 'f1': 88.43242225358777}

To evaluate on a custom dataset, we just need to replace dev-v1.1.json with the custom input dataset.

Evaluating the whole pipeline seems to be a different story though... We will need to use our brains for a bit!

@fmikaelian
Collaborator Author

fmikaelian commented May 19, 2019

I think if QAPipeline's predict() returns something in the same format as all_predictions, corresponding to the custom input dataset labelled in JSON, we're good.

@andrelmfarias
Collaborator

Didn't we agree on #135 to create a method prepare_evaluation() in https://github.com/fmikaelian/cdQA/blob/develop/cdqa/utils/metrics.py to handle this instead of doing it directly in QAPipeline.predict()?

@fmikaelian
Collaborator Author

fmikaelian commented May 19, 2019

Yes, true. It should use QAPipeline and return an object similar to all_predictions, linked to the custom JSON dataset.

@andrelmfarias
Collaborator

I think it should just take as input the output of QAPipeline.predict(), and we will need to run:

predictions = QAPipeline.predict()
pred_for_eval = prepare_evaluation(predictions)

@fmikaelian
Collaborator Author

@andrelmfarias

I think in that case prepare_evaluation() must convert predictions to the all_predictions format?

That way we could do:

evaluate(dataset, pred_for_eval)

instead of:

evaluate(dataset, all_predictions)
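
For illustration only, a minimal sketch of such a conversion, assuming predictions came back as (question_id, answer) pairs (a hypothetical shape, not the current output of QAPipeline.predict()):

def prepare_evaluation(predictions):
    # Hypothetical helper: map each question id to its predicted answer text,
    # i.e. the {question_id: answer} dict that evaluate() expects as all_predictions.
    return {question_id: answer for question_id, answer in predictions}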

@andrelmfarias
Collaborator

The predictions output from cdqa does not have the question_id of the question being evaluated. This id is needed to convert predictions to the all_predictions format in order to evaluate.

We need BertQA to send the question_id with its predictions.

I will do a pull request with that feature.
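
For context, all_predictions is keyed by question id (the SQuAD prediction format), which is why the id has to travel with each prediction; the ids and answers below are purely illustrative:

all_predictions = {
    "56be4db0acb8001400a502ec": "Denver Broncos",
    "56be4db0acb8001400a502ed": "Carolina Panthers",
}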

@andrelmfarias
Collaborator

Just realized that we have a problem...

Our annotated dataset (from the annotator) does not have a question_id. The evaluate method needs it in order to link questions and answers.

However, the BertQA class does send a question_id as output, but I don't understand which id it is... Do you know where it comes from, @fmikaelian?

What I understand is that it is not the question_id we need...

@andrelmfarias
Collaborator

Just found where it comes from... generate_squad_examples. It is a randomly generated id:

https://github.com/fmikaelian/cdQA/blob/f824db1ad5dab2c2b2527faaa4a6d9f97a308e2a/cdqa/utils/converters.py#L97

We have to discuss it tomorrow, I don't really have an idea of how we can handle it.

@fmikaelian
Collaborator Author

fmikaelian commented Jun 4, 2019

You're right. It is something I didn't think of when building the annotator. The question_id is not appended. See https://github.com/fmikaelian/cdQA-annotator/blob/271e005d8ba0bd21277ebc752fa10b1ad8839d37/src/components/AnnotationsPage.vue#L114

What we can do is the following:

  1. Manually add a question_id to the current annotated json
  2. Provide generate_squad_examples tuples of questions + their question_id
  3. In generate_squad_examples set the question_id to the one provided if any
  4. Let predict(), probably write_predictions() also output the question_id

We can rediscuss this afternoon

@andrelmfarias
Collaborator

> You're right. It is something I didn't think of when building the annotator. The question_id is not appended. See https://github.com/fmikaelian/cdQA-annotator/blob/271e005d8ba0bd21277ebc752fa10b1ad8839d37/src/components/AnnotationsPage.vue#L114
>
> What we can do is the following:
>
>   1. Manually add a question_id to the current annotated json
>   2. Provide generate_squad_examples tuples of questions + their question_id
>   3. In generate_squad_examples set the question_id to the one provided if any
>   4. Let predict(), probably write_predictions() also output the question_id
>
> We can rediscuss this afternoon

  1. I agree
  2. Yes, it will be a major change in the code. We have to be attentive when doing it
  3. Actually we will need both ids... because we use the current randomly generated id to sort the paragraphs chosen by the retriever and send them to the reader
  4. We could do it as an option on predict, send_id=True, with a default value of False, as we will only need it for evaluation (see the sketch below)
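
To make option 4. concrete, here is a rough, self-contained sketch of the idea; the class, names and internals below are assumptions for illustration, not the actual BertQA code:

class ReaderSketch:
    """Hypothetical stand-in for the reader, only to illustrate the proposed send_id flag."""

    def _run_reader(self, X):
        # Placeholder internals: pretend the reader produced one answer and its question id.
        return "Denver Broncos", "56be4db0acb8001400a502ec"

    def predict(self, X, send_id=False):
        final_prediction, question_id = self._run_reader(X)
        if send_id:
            # Also expose the question id so the prediction can be joined back to the
            # annotated dataset when building all_predictions for evaluation.
            return final_prediction, question_id
        return final_prediction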

@andrelmfarias
Collaborator

@mamrou will work on 1.

@andrelmfarias
Collaborator

@fmikaelian

I will work on 2., 3. and 4.

@fmikaelian
Collaborator Author

fmikaelian commented Jun 10, 2019

Reporting @mamrou's solution for 1. (Manually add a question_id to the current annotated json)

import json
import uuid

with open("/path_to_json") as json_file:
    data = json.load(json_file)

# Assign a randomly generated unique id to every question of the annotated dataset
for article in data["data"]:
    for paragraph in article['paragraphs']:
        for question in paragraph['qas']:
            question['id'] = str(uuid.uuid4())

with open('path_to_output_json', 'w') as outfile:
    json.dump(data, outfile)
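
If useful, a quick sanity check can be appended to the script above to confirm that every question got a unique id:

# Optional check: every question in the annotated dataset should now carry a unique id.
ids = [qa['id']
       for article in data['data']
       for paragraph in article['paragraphs']
       for qa in paragraph['qas']]
assert len(ids) == len(set(ids)), "some questions share an id or are missing one"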

@fmikaelian
Collaborator Author

fmikaelian commented Jun 16, 2019

Thanks for your help!

fmikaelian added a commit that referenced this issue Jun 25, 2019