# COVID Misinformation QA

- Author: Douglas Raevan Faisal

**NOTES**: This notebook was originally made for a course project. Please use this only as a reference. The source data are available upon request.

****
QA method:
- Haystack
- Elasticsearch
- Dense Passage Retrieval (DPR)
- Pre-trained RoBERTa (extractive reader)

****
This notebook is adapted from the tutorials provided by Haystack.

## 1. Prepare Dependencies

In [None]:
!pip install -q rouge-score


In [None]:
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest master of Haystack
!pip install --upgrade pip
!pip install -q git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]


In [None]:
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30
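A fixed 30-second sleep can be too short on a slow machine and needlessly long on a fast one. As a minimal sketch, assuming Elasticsearch listens on its default address `http://localhost:9200`, the wait could instead poll until the server answers. The helper names `es_is_up` and `wait_until_ready` are hypothetical and not part of Haystack.

```python
import time
import urllib.error
import urllib.request


def es_is_up(url="http://localhost:9200"):
    """Return True if an HTTP GET to `url` succeeds (assumed default ES address)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Call `probe()` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Possible usage in place of the fixed sleep:
# if not wait_until_ready(es_is_up):
#     raise RuntimeError("Elasticsearch did not start in time")
```

Passing the probe as an argument keeps the polling loop independent of the network, so it can be checked with a fake probe.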

In [None]:
import os
import pandas as pd

from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader, DensePassageRetriever, BM25Retriever

from haystack.document_stores import FAISSDocumentStore
from haystack.document_stores import ElasticsearchDocumentStore

from haystack import Document
from haystack.pipelines import ExtractiveQAPipeline

INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.github.io/apex/
ERROR - root -  Failed to import 'magic' (from 'python-magic' and 'python-magic-bin' on Windows). FileTypeClassifier will not perform mimetype detection on extensionless files. Please make sure the necessary OS libraries are installed if you need this functionality.


## 2. Import Data

The articles were collected by web scraping Poynter's IFCN COVID-19 misinformation database (https://www.poynter.org/ifcn-covid-19-misinformation/), which lists over 10,000 misinformation claims that have been fact-checked.

In [None]:
BASE_DIR = '.'

Mounted at /content/drive


In [None]:
os.listdir(BASE_DIR)

['poynter_web_scraping_results.csv',
 'poynter_web_scraping_results_v2.csv',
 'misinfo_retrieval_test.csv']

In [None]:
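# Join with os.path.join so the path is valid whether or not BASE_DIR ends with a separator
df = pd.read_csv(os.path.join(BASE_DIR, 'poynter_web_scraping_results.csv'))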
df = df.drop_duplicates('link')
df

Unnamed: 0,title,link,body
0,MISLEADING: A new blood clot warning was added to the country’s tobacco pack...,https://www.poynter.org/?ifcn_misinformation=a-new-blood-clot-warning-was-ad...,"Explanation: The blood clot warning predates the pandemic, a tobacco control..."
1,FALSE: There was no epidemic of H3N2 in Brazil. Those were undercover cases ...,https://www.poynter.org/?ifcn_misinformation=there-was-no-epidemic-of-h3n2-i...,Explanation: Brazilian health agencies did recorded local epidemics of influ...
2,"FALSE: In a Pfizer vaccine clinical trial, all pregnant women lost their bab...",https://www.poynter.org/?ifcn_misinformation=in-a-pfizer-vaccine-clinical-tr...,Explanation: This is a misinterpretation of research data.
3,FALSE: An image of a miles-long line of cars and trucks shows the 2022 “Free...,https://www.poynter.org/?ifcn_misinformation=an-image-of-a-miles-long-line-o...,Explanation: The photographer who shot the aerial image says it was taken in...
4,FALSE: The cardiac unit of a children’s hospital in Toronto was “expanded” i...,https://www.poynter.org/?ifcn_misinformation=the-cardiac-unit-of-a-childrens...,Explanation: SickKids said the hospital has not expanded services within its...
...,...,...,...
12475,False: Saddam Hussein predicted the coronavirus outbreak 40 years ago.,https://www.poynter.org/?ifcn_misinformation=saddam-hussein-predicted-the-co...,Explanation: The video is edited and was shot before the Iraq war. In the vi...
12476,False: Eight COVID-19 patients in Ghana have recovered.,https://www.poynter.org/?ifcn_misinformation=eight-covid-19-patients-in-ghan...,Explanation: The Ghana Health Service has rejected reports that 8 coronaviru...
12477,Unproven: Persons with blood type A are more prone to coronavirus.,https://www.poynter.org/?ifcn_misinformation=a-photo-of-coffins-of-dead-peop...,Explanation: There is no scientific evidence to show that people with blood ...
12478,FALSE: People with type A blood are more prone to get the new coronavirus.,https://www.poynter.org/?ifcn_misinformation=people-with-type-a-blood-are-mo...,Explanation: The study is preliminary and there is no scientific evidence th...


## 3. Document Store

Elasticsearch is used as the document store for Haystack. Alternatively, you can use FAISS, although it is incompatible with the BM25 retriever (see Haystack's documentation).

Notes:
- Across several experiments, because Elasticsearch runs as a separate server, there were occasional problems related to HTTP connections.
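One common way to tolerate intermittent connection failures is to retry the failing call with a growing delay. The sketch below is an assumption about how that could look, not something the original notebook does. The helper `call_with_retries` is hypothetical, and the exception types to catch depend on the client library in use, so the built-in `ConnectionError` default is only a placeholder.

```python
import time


def call_with_retries(fn, attempts=5, base_delay=1.0,
                      exceptions=(ConnectionError,), sleep=time.sleep):
    """Call `fn()`; on a listed exception wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Possible usage (assumes `document_store` and `docs` exist as in this notebook):
# call_with_retries(lambda: document_store.write_documents(docs))
```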

In [None]:
# document_store = FAISSDocumentStore(faiss_index_factory_str="Flat")

In [None]:
# Connect to Elasticsearch
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

INFO - haystack.telemetry -  Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by calling disable_telemetry() or by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems on the documentation page. More information at https://haystack.deepset.ai/guides/telemetry


**Concatenate The Title and Body elements**

The text content is defined as the concatenation of title and body elements of the scraped articles.

In [None]:
df['content'] = df.apply(lambda row: row['title'] + " " + row['body'], axis=1)
df

Unnamed: 0,title,link,body,content
0,MISLEADING: A new blood clot warning was added to the country’s tobacco pack...,https://www.poynter.org/?ifcn_misinformation=a-new-blood-clot-warning-was-ad...,"Explanation: The blood clot warning predates the pandemic, a tobacco control...",MISLEADING: A new blood clot warning was added to the country’s tobacco pack...
1,FALSE: There was no epidemic of H3N2 in Brazil. Those were undercover cases ...,https://www.poynter.org/?ifcn_misinformation=there-was-no-epidemic-of-h3n2-i...,Explanation: Brazilian health agencies did recorded local epidemics of influ...,FALSE: There was no epidemic of H3N2 in Brazil. Those were undercover cases ...
2,"FALSE: In a Pfizer vaccine clinical trial, all pregnant women lost their bab...",https://www.poynter.org/?ifcn_misinformation=in-a-pfizer-vaccine-clinical-tr...,Explanation: This is a misinterpretation of research data.,"FALSE: In a Pfizer vaccine clinical trial, all pregnant women lost their bab..."
3,FALSE: An image of a miles-long line of cars and trucks shows the 2022 “Free...,https://www.poynter.org/?ifcn_misinformation=an-image-of-a-miles-long-line-o...,Explanation: The photographer who shot the aerial image says it was taken in...,FALSE: An image of a miles-long line of cars and trucks shows the 2022 “Free...
4,FALSE: The cardiac unit of a children’s hospital in Toronto was “expanded” i...,https://www.poynter.org/?ifcn_misinformation=the-cardiac-unit-of-a-childrens...,Explanation: SickKids said the hospital has not expanded services within its...,FALSE: The cardiac unit of a children’s hospital in Toronto was “expanded” i...
...,...,...,...,...
12475,False: Saddam Hussein predicted the coronavirus outbreak 40 years ago.,https://www.poynter.org/?ifcn_misinformation=saddam-hussein-predicted-the-co...,Explanation: The video is edited and was shot before the Iraq war. In the vi...,False: Saddam Hussein predicted the coronavirus outbreak 40 years ago. Expla...
12476,False: Eight COVID-19 patients in Ghana have recovered.,https://www.poynter.org/?ifcn_misinformation=eight-covid-19-patients-in-ghan...,Explanation: The Ghana Health Service has rejected reports that 8 coronaviru...,False: Eight COVID-19 patients in Ghana have recovered. Explanation: The Gha...
12477,Unproven: Persons with blood type A are more prone to coronavirus.,https://www.poynter.org/?ifcn_misinformation=a-photo-of-coffins-of-dead-peop...,Explanation: There is no scientific evidence to show that people with blood ...,Unproven: Persons with blood type A are more prone to coronavirus. Explanati...
12478,FALSE: People with type A blood are more prone to get the new coronavirus.,https://www.poynter.org/?ifcn_misinformation=people-with-type-a-blood-are-mo...,Explanation: The study is preliminary and there is no scientific evidence th...,FALSE: People with type A blood are more prone to get the new coronavirus. E...


In [None]:
# # Clear documents
# document_store.delete_all_documents()

In [None]:
docs = [Document(row['content'], 
                 meta={
                     'title': row['title'], 
                     'link': row['link'], 
                     'explanation': row['body']
                }) for _,row in df.iterrows()]
document_store.write_documents(docs)

## 4. Build Retriever

Two variations:
- Dense Passage Retriever
- BM25 Retriever
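To make the difference between the two concrete: BM25 is a sparse, term-matching scorer, whereas DPR compares learned dense embeddings of the query and the passage. The following is a minimal pure-Python sketch of Okapi BM25, assuming a simple lowercase alphanumeric tokenizer, a non-empty document list, and the commonly used parameters `k1=1.2` and `b=0.75`. Elasticsearch computes its scores internally with its own analyzers, so this only illustrates the idea and will not reproduce its numbers. The three example documents are shortened titles taken from this notebook's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase alphanumeric runs
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact term matches contribute
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "FALSE: Japan is now replacing vaccines with ivermectin.",
    "FALSE: Ivermectin cures the new coronavirus.",
    "False: Eight COVID-19 patients in Ghana have recovered.",
]
scores = bm25_scores("Can Ivermectin replace vaccines?", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Note that "replace" in the query does not match "replacing" in the first document, because this sketch does no stemming; a dense retriever does not depend on exact term overlap in this way.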

In [None]:
dpr_retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    max_seq_len_query=64,
    max_seq_len_passage=256,
    batch_size=16,
    use_gpu=True,
    embed_title=True,
    use_fast_tokenizers=True,
)

document_store.update_embeddings(dpr_retriever)

INFO - haystack.modeling.utils -  Using devices: CUDA:0
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-question_encoder-single-nq-base locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...
INFO - haystack.modeling.model.language_model -  Loaded facebook/dpr-question_encoder-single-nq-base
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-ctx_encoder-single-nq-base locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...
INFO - haystack.modeling.model.language_model -  Loaded facebook/dpr-ctx_encoder-single-nq-base
INFO - haystack.document_stores.elasticsearch -  Updating embeddings for all 12178 docs ...

In [None]:
bm25_retriever = BM25Retriever(document_store=document_store)

In [None]:
document_store.get_document_count()

12178

## 5. Reader (Extractive QA)

In [None]:
# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

INFO - haystack.modeling.utils -  Using devices: CUDA:0
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find deepset/roberta-base-squad2 locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...
INFO - haystack.modeling.model.language_model -  Loaded deepset/roberta-base-squad2
INFO - haystack.modeling.utils -  Using devices: CUDA
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.infer -  Got ya 2 parallel workers to do inference ...


## 6. Build a Pipeline

Haystack provides a default pipeline that chains a retriever and a reader. Here, however, I also build a custom pipeline that accounts for the two parts of each document: the title states the misinformation claim, while the body explains why it is false or how it was debunked.

In [None]:
pipe = ExtractiveQAPipeline(reader, dpr_retriever)

In [None]:
# You can configure how many candidates the reader and retriever shall return
# The higher top_k for retriever, the better (but also the slower) your answers.
prediction = pipe.run(
    query="Can Ivermectin replace vaccines?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

print_answers(prediction, details="minimum")



Query: Can Ivermectin replace vaccines?
Answers:
[   {   'answer': 'Japan is now replacing vaccines',
        'context': 'FALSE: Japan is now replacing vaccines with ivermectin. '
                   'Explanation: Japanese health authorities do not recommend '
                   'ivermectin as a treatment against CO'},
    {   'answer': 'Ivermectin can prevent COVID-19',
        'context': 'MISLEADING: Ivermectin can prevent COVID-19. Explanation: '
                   'Ivermectin was included by Resolution No. 259, in the '
                   'National List of Essential Medicines 2'},
    {   'answer': 'Ivermectin cures the new coronavirus',
        'context': 'FALSE: Ivermectin cures the new coronavirus. Explanation: '
                   'As of July 23, Ivermectin had not been tested in humans '
                   'with the new coronavirus.'},
    {   'answer': 'it is not possible at this time to claim that ivermectin is '
                  'a cure for COVID-19',
        'context': 'c




In [None]:
class CustomMisinformationQAPipeline():

  def __init__(self, retriever, reader):
    self.retriever = retriever
    self.reader = reader
  
  def run(self, query, params=None):
    # Start from the defaults and override them with any caller-supplied settings
    default_params = {"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
    params = {**default_params, **(params or {})}
    # Retrieve on the full content (title + body)
    retrieved_docs = self.retriever.retrieve(query, top_k=params['Retriever']['top_k'])
    # Let the reader extract answers from the explanation (body) part only
    explanations = [doc.meta['explanation'] for doc in retrieved_docs]
    result = self.reader.predict_on_texts(
        question=query,
        texts=explanations,
        top_k=params['Reader']['top_k']
    )
    result['documents'] = retrieved_docs
    return result

custom_pipeline = CustomMisinformationQAPipeline(dpr_retriever, reader)

In [None]:
prediction = custom_pipeline.run("Can Ivermectin replace vaccines?", 
                    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})



In [None]:
print_answers(prediction, details='minimum')


Query: Can Ivermectin replace vaccines?
Answers:
[   {   'answer': 'It cannot be used without this consent',
        'context': 'us (COVID-19), "under medical protocol and informed '
                   'consent," according to a note from the Ministry of Health. '
                   'It cannot be used without this consent.'},
    {   'answer': 'Japanese health authorities do not recommend ivermectin as '
                  'a treatment against COVID-19',
        'context': 'Explanation: Japanese health authorities do not recommend '
                   'ivermectin as a treatment against COVID-19, and almost 80% '
                   "of the country's population is va"},
    {   'answer': 'clinical trials replicating this result are still lacking. '
                  'Therefore, it is not possible at this time to claim that '
                  'ivermectin is a cure for COVID-19',
        'context': ' clinical trials replicating this result are still '
                   'lacking. Therefore

In [None]:
# bm25
custom_bm25_pipeline = CustomMisinformationQAPipeline(bm25_retriever, reader)
prediction = custom_bm25_pipeline.run("Can Ivermectin replace vaccines?", 
                    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})
print_answers(prediction, details='minimum')




Query: Can Ivermectin replace vaccines?
Answers:
[   {   'answer': 'Japan did not stop vaccinating its population, nor did it '
                  'authorize the use of ivermectin for COVID-19 cases',
        'context': 'Explanation: Japan did not stop vaccinating its '
                   'population, nor did it authorize the use of ivermectin for '
                   'COVID-19 cases.'},
    {   'answer': "doesn't replace the search for a vaccine",
        'context': "sting chlorine dioxide as a treatment for COVID-19, it's "
                   'dangerous. They do test hydroxychloroquine but this '
                   "doesn't replace the search for a vaccine."},
    {   'answer': 'it is extremely difficult to find infectious disease '
                  'experts who say it could be considered an alternative to '
                  'vaccines',
        'context': 'ial for Covid-19 patients, but it is extremely difficult '
                   'to find infectious disease experts who say it co

## 7. Retrieval Evaluation

In [None]:
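# Helpers: set-based retrieval metrics computed over document links

def ir_precision(true_docs_link, pred_docs_link):
  # Fraction of retrieved links that are relevant
  true = set(true_docs_link)
  pred = set(pred_docs_link)
  if len(pred) == 0:
    return 0.0  # nothing was retrieved
  tp = len(true.intersection(pred))
  return tp / len(pred)

def ir_recall(true_docs_link, pred_docs_link):
  # Fraction of relevant links that were retrieved
  true = set(true_docs_link)
  pred = set(pred_docs_link)
  if len(true) == 0:
    return 0.0  # no relevant links to find
  tp = len(true.intersection(pred))
  return tp / len(true)

def ir_f1(true_docs_link, pred_docs_link):
  # Harmonic mean of precision and recall
  prec = ir_precision(true_docs_link, pred_docs_link)
  rec = ir_recall(true_docs_link, pred_docs_link)
  if (prec + rec == 0):
    return 0.0
  return (2 * prec * rec) / (prec + rec)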
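As a worked example of these set-based metrics, the sketch below uses made-up link names (`link-a`, `link-b`, and so on are placeholders, not real data) and a compact helper, `precision_recall_f1`, that is a hypothetical stand-in for the three functions above.

```python
def precision_recall_f1(true_links, pred_links):
    """Set-based precision, recall, and F1 over document links."""
    true, pred = set(true_links), set(pred_links)
    tp = len(true & pred)  # links that are both relevant and retrieved
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


relevant = ["link-a", "link-b"]                                    # 2 relevant documents
retrieved = ["link-a", "link-x", "link-y", "link-z", "link-w"]     # top-5 results

# One hit among five results: precision = 1/5, recall = 1/2, F1 = 2/7
p, r, f1 = precision_recall_f1(relevant, retrieved)
print(p, r, f1)

# If exactly as many results are retrieved as there are relevant links,
# precision and recall share the same denominator and therefore coincide.
p_r, r_r, f1_r = precision_recall_f1(relevant, retrieved[:len(relevant)])
print(p_r, r_r, f1_r)
```

The second case is why precision, recall, and F1 come out identical whenever the number of retrieved documents equals the number of relevant ones.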



In [None]:
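# Join with os.path.join so the path is valid whether or not BASE_DIR ends with a separator
df_test = pd.read_csv(os.path.join(BASE_DIR, 'misinfo_retrieval_test.csv'))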
df_test.head(3)

Unnamed: 0,No,Keyword,Question,Relevant Link(s),Answer
0,1,Flurona,Is flurona a new COVID-19 variant?,https://www.poynter.org/?ifcn_misinformation=flurona-is-a-new-variant-of-the...,such cases are rare
1,2,Ivermectin,Can Ivermectin replace vaccines?,https://www.poynter.org/?ifcn_misinformation=japan-is-now-replacing-vaccines...,Japanese health authorities do not recommend ivermectin as a treatment again...
2,3,Sinovac,Is the Sinovac vaccine dangerous?,https://www.poynter.org/?ifcn_misinformation=the-sinovac-vaccine-caused-the-...,There is no evidence that the boy died because of the jab


In [None]:
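new_relevant_links = []
all_links = set(df['link'])  # set for fast membership checks

# Keep only the relevant links that are present in the scraped data
for _,relevant_links in df_test['Relevant Link(s)'].items():  # Series.iteritems() was removed in pandas 2.0
  new_relev_link = [i for i in relevant_links.split("\n") if i in all_links]
  new_relevant_links.append("\n".join(new_relev_link))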

df_test['Relevant Link(s)'] = new_relevant_links
df_test.head(3)

Unnamed: 0,No,Keyword,Question,Relevant Link(s),Answer
0,1,Flurona,Is flurona a new COVID-19 variant?,https://www.poynter.org/?ifcn_misinformation=flurona-is-a-new-variant-of-the...,such cases are rare
1,2,Ivermectin,Can Ivermectin replace vaccines?,https://www.poynter.org/?ifcn_misinformation=japan-is-now-replacing-vaccines...,Japanese health authorities do not recommend ivermectin as a treatment again...
2,3,Sinovac,Is the Sinovac vaccine dangerous?,,There is no evidence that the boy died because of the jab


In [None]:
# test

def eval_retriever(retriever, df_test):

  test_results = []

  for k in [1,5,10,-1]:

    precision_scores = []
    recall_scores = []
    f1_scores = []

    for _,row in df_test.iterrows():
      query = row['Question']
      k_val = k
      if row['Relevant Link(s)'].strip() == "":
        continue  
      if k == -1:
        # k = -1: retrieve as many documents as there are relevant links for this question
        k_val = len(row['Relevant Link(s)'].split('\n'))
      true_links = row['Relevant Link(s)'].split('\n')[:k_val]
      result = retriever.retrieve(query, top_k=k_val)
      pred_links = [i.meta['link'] for i in result]

      precision_scores.append(ir_precision(true_links, pred_links))
      recall_scores.append(ir_recall(true_links, pred_links))
      f1_scores.append(ir_f1(true_links, pred_links))
    
    test_results.append({
        "k": k,
        "precision": precision_scores,
        "recall": recall_scores,
        "f1": f1_scores
    })
  
  return test_results

def print_retrieval_scores(retr_result):
  for res in retr_result:
    print("=====================")
    if res['k'] == -1:
      print("P:", sum(res['precision'])/len(res['precision']))
      print("R:", sum(res['recall'])/len(res['recall']))
      print("F1:", sum(res['f1'])/len(res['f1']))
    else:
      print("P@{}:".format(res['k']), sum(res['precision'])/len(res['precision']))
      print("R@{}:".format(res['k']), sum(res['recall'])/len(res['recall']))
      print("F1@{}:".format(res['k']), sum(res['f1'])/len(res['f1']))
    print("=====================")

dpr_results = eval_retriever(dpr_retriever, df_test)
bm25_results = eval_retriever(bm25_retriever, df_test)

In [None]:
print_retrieval_scores(dpr_results)

P@1: 0.09090909090909091
R@1: 0.09090909090909091
F1@1: 0.09090909090909091
P@5: 0.03636363636363637
R@5: 0.18181818181818182
F1@5: 0.060606060606060615
P@10: 0.02727272727272728
R@10: 0.2727272727272727
F1@10: 0.04958677685950413
P: 0.09090909090909091
R: 0.09090909090909091
F1: 0.09090909090909091


In [None]:
print_retrieval_scores(bm25_results)

P@1: 0.45454545454545453
R@1: 0.45454545454545453
F1@1: 0.45454545454545453
P@5: 0.10909090909090909
R@5: 0.5
F1@5: 0.1774891774891775
P@10: 0.0818181818181818
R@10: 0.6341991341991341
F1@10: 0.1390062700223128
P: 0.46753246753246747
R: 0.46753246753246747
F1: 0.46753246753246747


## 8. Reader Evaluation

In [None]:
df['link'][0]



In [None]:
docs = document_store.query(query=None, filters={"link": df['link'][0]})

In [None]:
docs



In [None]:
test_context = []

for _,links in df_test['Relevant Link(s)'].items():  # Series.iteritems() was removed in pandas 2.0
  links = links.strip()
  if (links == ""):
    test_context.append(None)
    continue
  else:
    links_list = links.split("\n")
    main_link = links_list[0]

    docs = document_store.query(query=None, filters={"link": main_link})
    if (len(docs) == 0):
      test_context.append(None)
      continue
    else:
      test_context.append(docs[0].meta['explanation'])

df_test['test_context'] = test_context
df_test_qa = df_test.dropna()
df_test_qa

Unnamed: 0,No,Keyword,Question,Relevant Link(s),Answer,test_context
0,1,Flurona,Is flurona a new COVID-19 variant?,https://www.poynter.org/?ifcn_misinformation=flurona-is-a-new-variant-of-the...,such cases are rare,Explanation: Experts say the term refers to simultaneous but separate influe...
1,2,Ivermectin,Can Ivermectin replace vaccines?,https://www.poynter.org/?ifcn_misinformation=japan-is-now-replacing-vaccines...,Japanese health authorities do not recommend ivermectin as a treatment again...,Explanation: Japanese health authorities do not recommend ivermectin as a tr...
3,4,Pfizer,Does the Pfizer vaccine cause miscarriage?,https://www.poynter.org/?ifcn_misinformation=in-a-pfizer-vaccine-clinical-tr...,They occur without vaccination or with vaccination,Explanation: This is a misinterpretation of research data.
4,5,Microchip,Do vaccines contain microchip?,https://www.poynter.org/?ifcn_misinformation=pfizer-ceo-albert-bourla-said-t...,microrobots that are impossible to add to vaccines,Explanation: The video is from 2018 and has nothing to do with Covid-19 vacc...
5,6,Pets,Can pets get COVID?,https://www.poynter.org/?ifcn_misinformation=pets-can-also-contract-covid-19...,"animals can be infected by humans, but not the other way around","Explanation: Studies clarify that these animals can be infected by humans, b..."
6,7,Hydroxychloroquine,Is hydroxychloroquine effective against COVID-19?,https://www.poynter.org/?ifcn_misinformation=nebulization-with-hydroxychloro...,it has no benefit against the Covid-19,Explanation: There is no scientific evidence that hydroxychloroquine has act...
8,9,Indonesia,Did the chinese government accuse Indonesia for the coronavirus?,https://www.poynter.org/?ifcn_misinformation=chinese-government-accuses-indo...,No sentence was found from the Shaanxi government accusing Indonesia of spre...,"Explanation: In its official statement, the Shaanxi Health Commission, China..."
10,11,WHO booster,Is the booster dose meaningless?,https://www.poynter.org/?ifcn_misinformation=who-said-the-booster-dose-was-m...,WHO did not say that booster doses were ineffective or meaningless,Explanation: WHO did not say that booster doses were ineffective or meaningl...
11,12,5G,Is vaccines related to 5G?,https://www.poynter.org/?ifcn_misinformation=5g-testing-has-begun-to-check-h...,5G technology in communications and vaccination in healthcare are not related,Explanation: The conspiracy theory that the 5G network and some non-existent...
12,13,CDC,Is infection better than vaccines to improve immunity?,https://www.poynter.org/?ifcn_misinformation=cdc-study-shows-that-infection-...,vaccines are the most effective way to reduce hospitalizations and the sprea...,Explanation: The idea that infection is better than vaccination is brought u...


In [None]:
from rouge_score import rouge_scorer

# Use Rouge scores
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=False)
scores = scorer.score('The quick brown fox jumps over the lazy dog',
                      'The quick brown dog jumps on the log.')
scores

{'rouge1': Score(precision=0.75, recall=0.6666666666666666, fmeasure=0.7058823529411765),
 'rougeL': Score(precision=0.625, recall=0.5555555555555556, fmeasure=0.5882352941176471)}
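To see where these numbers come from, here is a pure-Python sketch of ROUGE-1 (clipped unigram overlap) and ROUGE-L (longest common subsequence). It assumes a simple lowercase alphanumeric tokenization without stemming and is meant only to illustrate the definitions, not to replace the `rouge_score` package.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase alphanumeric runs, no stemming
    return re.findall(r"[a-z0-9]+", text.lower())


def prf(overlap, n_ref, n_pred):
    precision = overlap / n_pred if n_pred else 0.0
    recall = overlap / n_ref if n_ref else 0.0
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f


def rouge1(reference, prediction):
    ref, pred = tokenize(reference), tokenize(prediction)
    # Clipped unigram overlap: each token counts at most as often as in both texts
    overlap = sum((Counter(ref) & Counter(pred)).values())
    return prf(overlap, len(ref), len(pred))


def lcs_length(a, b):
    # Dynamic programming over two rows for the longest common subsequence
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, prediction):
    ref, pred = tokenize(reference), tokenize(prediction)
    return prf(lcs_length(ref, pred), len(ref), len(pred))


reference = "The quick brown fox jumps over the lazy dog"    # 9 tokens
prediction = "The quick brown dog jumps on the log."         # 8 tokens

r1 = rouge1(reference, prediction)   # 6 shared unigrams: precision 6/8, recall 6/9
rl = rouge_l(reference, prediction)  # LCS "the quick brown jumps the": precision 5/8, recall 5/9
print(r1)
print(rl)
```

Because precision divides by the prediction length and recall by the reference length, a short extracted answer that is fully contained in a longer reference gets high precision but low recall.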

In [None]:
scores = []

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=False)

for _,row in df_test_qa.iterrows():
  q = row['Question']
  result = reader.predict_on_texts(q, texts=[row['test_context']], top_k=1)
  pred_answer = result['answers'][0].answer
  true_answer = row['Answer']

  rouge = scorer.score(true_answer, pred_answer)
  scores.append(rouge)



In [None]:
scores

[{'rouge1': Score(precision=0.0, recall=0.0, fmeasure=0.0),
  'rougeL': Score(precision=0.0, recall=0.0, fmeasure=0.0)},
 {'rouge1': Score(precision=1.0, recall=1.0, fmeasure=1.0),
  'rougeL': Score(precision=1.0, recall=1.0, fmeasure=1.0)},
 {'rouge1': Score(precision=0.0, recall=0.0, fmeasure=0.0),
  'rougeL': Score(precision=0.0, recall=0.0, fmeasure=0.0)},
 {'rouge1': Score(precision=0.2857142857142857, recall=0.25, fmeasure=0.26666666666666666),
  'rougeL': Score(precision=0.2857142857142857, recall=0.25, fmeasure=0.26666666666666666)},
 {'rouge1': Score(precision=1.0, recall=0.25, fmeasure=0.4),
  'rougeL': Score(precision=1.0, recall=0.25, fmeasure=0.4)},
 {'rouge1': Score(precision=0.23076923076923078, recall=0.375, fmeasure=0.2857142857142857),
  'rougeL': Score(precision=0.15384615384615385, recall=0.25, fmeasure=0.1904761904761905)},
 {'rouge1': Score(precision=1.0, recall=0.11764705882352941, fmeasure=0.21052631578947367),
  'rougeL': Score(precision=1.0, recall=0.117647058

In [None]:
def print_reader_eval(scores, metrics=('rouge1', 'rougeL')):
  """Print precision, recall and F1 for each metric, averaged over all questions."""
  for m in metrics:
    print("===================")
    print(m)
    print("===================")
    precision_scores = [score[m].precision for score in scores]
    recall_scores = [score[m].recall for score in scores]
    f1_scores = [score[m].fmeasure for score in scores]
    print("Precision:", sum(precision_scores)/len(precision_scores))
    print("Recall:", sum(recall_scores)/len(recall_scores))
    print("F1:", sum(f1_scores)/len(f1_scores))

print_reader_eval(scores)

rouge1
Precision: 0.5924075924075924
Recall: 0.22614514125209312
F1: 0.2728093278332513
rougeL
Precision: 0.5854145854145855
Recall: 0.21478150488845674
F1: 0.2641513191752426
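
The averages above are macro-averages: each question contributes one precision, recall and F1 value, and the three lists are averaged independently. As a result, the reported F1 is the mean of per-question F1 values, not the harmonic mean of the averaged precision and recall. The sketch below illustrates the difference. The `Score` namedtuple is a stand-in with the same field names as the `rouge_score` tuple, and `macro_average` and `toy_scores` are hypothetical names; the three toy entries are copied from the `scores` output shown earlier.

```python
from collections import namedtuple
from statistics import mean

# Stand-in for rouge_score's Score tuple so the sketch runs without the library
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])


def macro_average(scores, metric):
    """Average precision, recall and F1 for one metric over all questions."""
    if not scores:
        raise ValueError("no scores to average")
    return Score(
        precision=mean(s[metric].precision for s in scores),
        recall=mean(s[metric].recall for s in scores),
        fmeasure=mean(s[metric].fmeasure for s in scores),
    )


# Three per-question results taken from the scores list above
toy_scores = [
    {"rouge1": Score(0.0, 0.0, 0.0)},
    {"rouge1": Score(1.0, 1.0, 1.0)},
    {"rouge1": Score(1.0, 0.25, 0.4)},
]

avg = macro_average(toy_scores, "rouge1")
# Harmonic mean of the averaged precision and recall, for comparison only
f1_of_means = 2 * avg.precision * avg.recall / (avg.precision + avg.recall)

print("mean of per-question F1:", avg.fmeasure)
print("F1 of mean precision/recall:", f1_of_means)
```

The two printed values differ, which is why the F1 reported by `print_reader_eval` cannot be recomputed from the printed precision and recall alone.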


In [None]:
# Evaluate full context

test_context = []

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent call
for _, links in df_test['Relevant Link(s)'].items():
  links = links.strip()
  if links == "":
    # No link recorded for this question
    test_context.append(None)
    continue

  # Use only the first listed link as the gold document
  main_link = links.split("\n")[0]

  docs = document_store.query(query=None, filters={"link": main_link})
  # Keep the full text of the first matching document, or None if nothing matched
  test_context.append(docs[0].content if docs else None)

df_test['test_context'] = test_context
df_test_qa = df_test.dropna()
df_test_qa

| Unnamed: 0 | No | Keyword | Question | Relevant Link(s) | Answer | test_context |
|---|---|---|---|---|---|---|
| 0 | 1 | Flurona | Is flurona a new COVID-19 variant? | https://www.poynter.org/?ifcn_misinformation=flurona-is-a-new-variant-of-the... | such cases are rare | FALSE: Flurona is a new variant of the virus that causes Covid-19. Explanati... |
| 1 | 2 | Ivermectin | Can Ivermectin replace vaccines? | https://www.poynter.org/?ifcn_misinformation=japan-is-now-replacing-vaccines... | Japanese health authorities do not recommend ivermectin as a treatment again... | FALSE: Japan is now replacing vaccines with ivermectin. Explanation: Japanes... |
| 3 | 4 | Pfizer | Does the Pfizer vaccine cause miscarriage? | https://www.poynter.org/?ifcn_misinformation=in-a-pfizer-vaccine-clinical-tr... | They occur without vaccination or with vaccination | FALSE: In a Pfizer vaccine clinical trial, all pregnant women lost their bab... |
| 4 | 5 | Microchip | Do vaccines contain microchip? | https://www.poynter.org/?ifcn_misinformation=pfizer-ceo-albert-bourla-said-t... | microrobots that are impossible to add to vaccines | FALSE: Pfizer CEO Albert Bourla said the Covid-19 vaccines contain microchip... |
| 5 | 6 | Pets | Can pets get COVID? | https://www.poynter.org/?ifcn_misinformation=pets-can-also-contract-covid-19... | animals can be infected by humans, but not the other way around | False: Pets can also contract COVID-19 and can infect humans Explanation: St... |
| 6 | 7 | Hydroxychloroquine | Is hydroxychloroquine effective against COVID-19? | https://www.poynter.org/?ifcn_misinformation=nebulization-with-hydroxychloro... | it has no benefit against the Covid-19 | FALSE: Nebulization with hydroxychloroquine cures COVID-19. Explanation: The... |
| 8 | 9 | Indonesia | Did the chinese government accuse Indonesia for the coronavirus? | https://www.poynter.org/?ifcn_misinformation=chinese-government-accuses-indo... | No sentence was found from the Shaanxi government accusing Indonesia of spre... | Misleading: Chinese government accuses Indonesia of spreading the coronaviru... |
| 10 | 11 | WHO booster | Is the booster dose meaningless? | https://www.poynter.org/?ifcn_misinformation=who-said-the-booster-dose-was-m... | WHO did not say that booster doses were ineffective or meaningless | FALSE: WHO said the booster dose was “meaningless.” Explanation: WHO did not... |
| 11 | 12 | 5G | Is vaccines related to 5G? | https://www.poynter.org/?ifcn_misinformation=5g-testing-has-begun-to-check-h... | 5G technology in communications and vaccination in healthcare are not related | FALSE: 5G testing has begun! To check how far they are with the chipping wit... |
| 12 | 13 | CDC | Is infection better than vaccines to improve immunity? | https://www.poynter.org/?ifcn_misinformation=cdc-study-shows-that-infection-... | vaccines are the most effective way to reduce hospitalizations and the sprea... | MISLEADING: CDC study shows that infection is better at providing immunity t... |
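
The link-to-context lookup used to build `test_context` can be tried in isolation, without Elasticsearch. The following is a minimal sketch under stated assumptions: `lookup_context` is a hypothetical helper, and the plain dict `content_by_link` stands in for the document store (the notebook itself queries `document_store` with a `link` filter). The URLs and text are placeholders.

```python
from typing import Dict, Optional


def lookup_context(links: str, content_by_link: Dict[str, str]) -> Optional[str]:
    """Return the stored text for the first link in a newline-separated string.

    Returns None when no link is given or the link is not in the store,
    mirroring the None entries that are later removed with dropna().
    """
    links = links.strip()
    if links == "":
        return None
    # Only the first listed link is treated as the gold document
    main_link = links.split("\n")[0].strip()
    return content_by_link.get(main_link)


# Placeholder store: one known article keyed by its link
content_by_link = {"https://example.org/a": "FALSE: example claim. Explanation: example text."}

print(lookup_context("https://example.org/a\nhttps://example.org/b", content_by_link))
print(lookup_context("", content_by_link))
```

Questions whose first link has no stored document get `None` and drop out of the evaluation, which explains the gaps in the index column of the table above.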


In [None]:
# Same evaluation loop as before, now with the full article text as context
scores = []

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=False)

for _, row in df_test_qa.iterrows():
  q = row['Question']
  result = reader.predict_on_texts(q, texts=[row['test_context']], top_k=1)
  pred_answer = result['answers'][0].answer
  true_answer = row['Answer']

  # RougeScorer.score expects (target, prediction) in that order
  rouge = scorer.score(true_answer, pred_answer)
  scores.append(rouge)

print_reader_eval(scores)


rouge1
Precision: 0.23382034632034632
Recall: 0.15717738886188085
F1: 0.18295175838654096
rougeL
Precision: 0.21515151515151515
Recall: 0.13881191595095338
F1: 0.16485411528889787



