# Evaluating keyword extraction





We evaluated keyword extraction on the FEVER dataset, using the wiki URL attached to each claim. For every claim, the wiki page retrieved from the FEVER URL was treated as the ground truth. Keywords were then extracted from the claim, and a wiki page was retrieved for each keyword. If any of the retrieved wiki pages matched the ground-truth page, the extraction was counted as correct for that claim. The evaluation was performed on 1000 claims.
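
The matching rule described above can be sketched as a small, self-contained function. This is only an illustration under stated assumptions: `keyword_hits_ground_truth`, `accuracy`, and `fake_search` are hypothetical names that do not appear in the notebook, and `fake_search` is a toy stand-in for a real Wikipedia search call.

```python
from typing import Callable, Iterable, List, Tuple


def keyword_hits_ground_truth(
    keywords: Iterable[str],
    ground_truth_title: str,
    search: Callable[[str], List[str]],
) -> bool:
    """Return True if any keyword's top search result is the ground-truth page."""
    for keyword in keywords:
        results = search(keyword)
        # A keyword may return no pages at all; skip it in that case.
        if results and results[0] == ground_truth_title:
            return True
    return False


def accuracy(
    examples: Iterable[Tuple[List[str], str]],
    search: Callable[[str], List[str]],
) -> float:
    """Fraction of (keywords, ground_truth_title) pairs with at least one hit."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(keyword_hits_ground_truth(k, t, search) for k, t in examples)
    return hits / len(examples)


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a Wikipedia search; returns ranked page titles."""
    index = {"License to Wed": ["License to Wed"], "movie": ["Film"]}
    return index.get(query, [])
```

Passing the search function in as a parameter keeps the rule testable without network access; in the notebook, the search would be performed against Wikipedia itself.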

To clean the data, we removed rows with 'evidence_sentence_id' equal to -1, which indicates that the FEVER URL or wiki page contains no sentences relevant to the claim. We also removed rows whose FEVER URL does not lead to a wiki page, either because the page no longer exists or because the lookup raised an error.
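
As a rough sketch of these two cleaning steps on plain Python dictionaries: the field names 'claim', 'evidence_wiki_url', and 'evidence_sentence_id' come from the dataset as used in this notebook, while `clean_rows` and the `page_exists` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List


def clean_rows(
    rows: List[Dict],
    page_exists: Callable[[str], bool],
) -> List[Dict]:
    """Keep unique (claim, URL) pairs that have evidence and a reachable page."""
    kept = []
    seen = set()
    for row in rows:
        # -1 marks rows with no sentence relevant to the claim.
        if row["evidence_sentence_id"] == -1:
            continue
        key = (row["claim"], row["evidence_wiki_url"])
        # Drop duplicate (claim, URL) pairs.
        if key in seen:
            continue
        seen.add(key)
        # page_exists is an assumed callback, e.g. a wrapper around a wiki lookup.
        if not page_exists(row["evidence_wiki_url"]):
            continue
        kept.append({"claim": row["claim"], "evidence_wiki_url": row["evidence_wiki_url"]})
    return kept
```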

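The first attempt used named entity recognition (NER), which achieved an accuracy of 77.6%. The second attempt instructed an LLM (large language model) to extract one or two entity names or phrases from the claim, which achieved an accuracy of 71.2%. (The run recorded in the cells below, whose prompt asks for a single entity name or phrase, reports 77.3% for NER and 63.1% for the LLM.)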

Although named entity recognition was more accurate, it sometimes returned no keywords at all, whereas the LLM approach always returned keywords, albeit with lower accuracy. For example, for the claim "License to Wed is a movie," named entity recognition returned an empty list ([]), while the LLM returned ['License to Wed'], which was the correct keyword.

To address this, we combined the two approaches: keywords were extracted with both methods and then merged into a single list, which achieved an accuracy of 87.7%. (The run recorded in the cells below reports 87.2%.)
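
The merging step can be sketched as follows. This is a minimal illustration, not the notebook's exact code: `combine_keywords` is a hypothetical helper, and unlike a plain `set`, it preserves the order in which keywords were found.

```python
from typing import Iterable, List


def combine_keywords(ner_keywords: Iterable[str], llm_keywords: Iterable[str]) -> List[str]:
    """Merge two keyword lists, dropping duplicates but keeping first-seen order."""
    combined = []
    seen = set()
    for keyword in list(ner_keywords) + list(llm_keywords):
        # LLM answers may carry stray whitespace or surrounding quotation marks.
        cleaned = keyword.strip().strip('"')
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            combined.append(cleaned)
    return combined
```

With this helper, an empty NER result no longer leaves the claim without keywords, because whatever the LLM returned is still carried through.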

Note that the ground truth is based solely on the FEVER URL. A retrieved wiki page that does not match the ground truth may still contain information relevant to the claim, so the accuracy figures may understate how useful the extracted keywords are.






In [None]:
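!pip install chromadb tqdm fireworks-ai python-dotenv pandas wikipedia spacy
!pip install sentence-transformers
!pip install datasets
# the notebook loads the en_core_web_sm spaCy model, so download it as well
!python -m spacy download en_core_web_sm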

In [None]:
import fireworks.client
import os
import dotenv
import chromadb
import json
from tqdm.auto import tqdm
import pandas as pd
import random
import wikipedia
import spacy
nlp = spacy.load("en_core_web_sm")

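# Load environment variables from a local .env file (in Colab, secrets can be used instead)
dotenv.load_dotenv()

# Assumption: the key is stored under FIREWORKS_API_KEY; falls back to an empty string as before
fireworks.client.api_key = os.getenv("FIREWORKS_API_KEY", "")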

def get_completion(prompt, model=None, max_tokens=50):
    fw_model_dir = "accounts/fireworks/models/"
    if model is None:
        model = fw_model_dir + "llama-v2-7b"
    else:
        model = fw_model_dir + model
    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )
    return completion.choices[0].text

mistral_llm = "mistral-7b-instruct-4k"

In [None]:
# Test the LLM API

mistral_llm = "mistral-7b-instruct-4k"

prompt = """[INST]
Given the following wedding guest data, write a very short 3-sentences thank you letter:

{
  "name": "John Doe",
  "relationship": "Bride's cousin",
  "hometown": "New York, NY",
  "fun_fact": "Climbed Mount Everest in 2020",
  "attending_with": "Sophia Smith",
  "bride_groom_name": "Tom and Mary"
}

Use only the data provided in the JSON object above.

The senders of the letter is the bride and groom, Tom and Mary.
[/INST]"""

get_completion(prompt, model=mistral_llm, max_tokens=150)

" Dear John Doe,\n\nWe, Tom and Mary, would like to extend our heartfelt gratitude for your attendance at our wedding. It was a pleasure to have you there, and we truly appreciate the effort you made to be a part of our special day.\n\nWe were thrilled to learn about your fun fact - climbing Mount Everest is an incredible accomplishment! We hope you had a safe and memorable journey.\n\nThank you again for joining us on this special occasion. We hope to stay in touch and catch up on all the amazing things you've been up to.\n\nWith love,\n\nTom and Mary"

In [None]:
from datasets import load_dataset

dataset = load_dataset("fever", "v1.0")

In [None]:
X_train = dataset["train"]
X_valid = dataset["labelled_dev"]
X_test = dataset["paper_test"]

In [None]:

# Drop rows that have no sentence relevant to the claim
filtered_data = [row for row in X_train if row['evidence_sentence_id'] != -1]
data = pd.DataFrame(filtered_data)
data = data[:2800]
data = data.loc[:, ['claim', 'evidence_wiki_url']]
data = data.drop_duplicates()
data = data.to_dict(orient='records')
# Drop rows whose URL does not lead to a wiki page, either because the page no longer exists or because of an error
data = [row for row in data if wikipedia.search(row['evidence_wiki_url']) != []]
data = data[:1000]


In [None]:
#LLM only

mistral_llm = "mistral-7b-instruct-4k"


def prompt_key(claim):
    prompt_key = f"""[INST]
    Given the following claim, extract a query phrase to search on wikipedia for most relevant information:

    CLAIM: {claim}

    Extract ONE entity name or ONE phrase that when used to search on wikipedia, returns the most informative page on the claim.
    Return ONLY the name or the phrase. DO NOT return anything else.
    [/INST]"""
    return prompt_key



count = 0
n = 1000
for i in range(n):
    claim = data[i]["claim"]
    # Ground-truth page: top search result for the FEVER evidence URL (looked up once per claim)
    top_title_2 = wikipedia.search(data[i]['evidence_wiki_url'])[0]
    # Get keywords from the LLM
    keyword = get_completion(prompt_key(claim), model=mistral_llm, max_tokens=150)
    # The LLM sometimes puts whitespace in front of its answer or wraps it in quotation marks, so strip both
    keyword = keyword.strip().replace('"', '').split(', ')
    for word in keyword:
        # A keyword sometimes returns no wiki page; skip it so it doesn't cause an error
        page_titles_1 = wikipedia.search(word)
        if page_titles_1 == []:
            continue
        top_title_1 = page_titles_1[0]
        # Check whether the retrieved wiki page matches the ground truth
        if top_title_1 == top_title_2:
            count = count + 1
            # A keyword already matched the ground truth, so move on to the next claim
            break





print(f'Extracted correct keyword: {count}/{n}')
print(f'Did not extract correct keyword: {n-count}/{n}')
print('Accuracy: ',(count)/n)

Extracted correct keyword: 631/1000
Did not extract correct keyword: 369/1000
Accuracy:  0.631


In [None]:
# Named entity recognition only
def claim_extract_NER(claim):
    doc = nlp(claim)
    # spaCy returns an entity text and an entity label; we only want the text
    keyword = [entity.text for entity in doc.ents]
    return keyword

count = 0
n = 1000
for i in range(n):
    claim = data[i]["claim"]
    # Ground-truth page: top search result for the FEVER evidence URL (looked up once per claim)
    top_title_2 = wikipedia.search(data[i]['evidence_wiki_url'])[0]
    keywords = claim_extract_NER(claim)
    for word in keywords:
        page_titles_1 = wikipedia.search(word)
        # Skip keywords that return no wiki page; indexing an empty list would raise an IndexError
        if page_titles_1 == []:
            continue
        top_title_1 = page_titles_1[0]
        if top_title_1 == top_title_2:
            count = count + 1
            # A keyword already matched the ground truth, so move on to the next claim
            break




print(f'Extracted correct keyword: {count}/{n}')
print(f'Did not extract correct keyword: {n-count}/{n}')
print('Accuracy: ',(count)/n)

Extracted correct keyword: 773/1000
Did not extract correct keyword: 227/1000
Accuracy:  0.773


In [None]:

# Named entity recognition and LLM combined
def claim_extract(claim):
    prompt_key = f"""[INST]
    Given the following claim, extract a query phrase to search on wikipedia for most relevant information:

    CLAIM: {claim}

    Extract ONE entity name or ONE phrase that when used to search on wikipedia, returns the most informative page on the claim.
    Return ONLY the name or the phrase. DO NOT return anything else.
    [/INST]"""

    doc = nlp(claim)
    entities = [(entity.text, entity.label_) for entity in doc.ents]
    keyword_1 = [entity[0] for entity in entities]
    keyword_2=get_completion(prompt_key, model=mistral_llm, max_tokens=150)
    keyword_2=keyword_2.strip().replace('"', '').split(', ')
    keyword=keyword_1+keyword_2
    keyword=list(set(keyword))
    return keyword




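count = 0
n = 1000
for i in range(n):
    claim = data[i]["claim"]
    # Ground-truth page: top search result for the FEVER evidence URL (looked up once per claim)
    top_title_2 = wikipedia.search(data[i]['evidence_wiki_url'])[0]
    keyword = claim_extract(claim)
    for word in keyword:
        # Skip keywords that return no wiki page
        page_titles_1 = wikipedia.search(word)
        if page_titles_1 == []:
            continue
        top_title_1 = page_titles_1[0]
        if top_title_1 == top_title_2:
            count = count + 1
            # A keyword already matched the ground truth, so move on to the next claim
            break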





print(f'Extracted correct keyword: {count}/{n}')
print(f'Did not extract correct keyword: {n-count}/{n}')
print('Accuracy: ',(count)/n)

Extracted correct keyword: 872/1000
Did not extract correct keyword: 128/1000
Accuracy:  0.872
