## Table of Contents
Code below here is segmented by functionality and what was performed in each part. For running our scripts in this notebook, you will need to modify the file path in the data import section [here](#import_training_data) to wherever the excel sheets containing the training sets have been set. It is important that the two excel sheets are saved in the same file location otherwise a second path will need to be specified in the cell at the link above.

### Python Imports
   * [Link to package import](#import_packages)
   * [Link to training data import](#import_training_data)
   
### Data Formatting
* [Article List Formating](#generate_python_list)
* [Article Retrieval Code](#retrieve_articles)

### Article Retrieval (TF-IDF)
* [Data Preprocessing](#preprocess_data)
* [Document Retrieval Benchmarking](#document_retrieval_benchmarking)
* [Pretrained Model](#import_pretrained_model)

### Answer Retrieval (BERT)
* [BERT Implementation](#BERT_training)
* [BERT Looping Helper Functions](#looping_helpers)
* [Benchmarking BERT with Looping](#benchmarking_BERT_with_loop)
* [BERT Benchmarking Troubleshooting](#BERT_analysis)

## Import Python Packages <a id='import_packages'></a>

In [431]:
# general packages
import pandas as pd
import numpy as np
import os
import torch
import random
import spacy
import nltk
import re

# specific functionality
from nltk.stem import WordNetLemmatizer 
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix

# import BERT models from transformers
import transformers
from transformers import BertTokenizer, AutoTokenizer, BertForQuestionAnswering, BertTokenizerFast, BertConfig, DistilBertForQuestionAnswering, DistilBertTokenizerFast


## Import Training Sets <a id='import_training_data'></a>

### ** IMPORTANT FOR RUNNING ** Need to change the file path to match the directory where the excel sheets containing the training data have been saved

In [432]:
display_dataframe = False

# must change this path to match file location
savepath = r'C:\Users\drhulbert\Documents\Repos\CS_247-Project-Code-main\CS_247-Project-Code-main\\'

# Import the excel sheet containing the BERT training data
training_BERT_filename = "squad_training_dataset_BERT.csv"
save_BERT_fullpath = os.path.join(savepath,training_BERT_filename)
training_BERT_dataset_df = pd.read_csv(save_BERT_fullpath)

training_docretr_filename = "squad_training_dataset_docretr.csv"
save_docretr_fullpath = os.path.join(savepath,training_docretr_filename)
training_docretr_dataset_df = pd.read_csv(save_docretr_fullpath)

# drop non-answers from the dataset
training_BERT_dataset_df = training_BERT_dataset_df.dropna()

# display the training set dataframe
if display_dataframe:
    display(training_BERT_dataset_df)
    display(training_docretr_dataset_df)
    display(training_BERT_dataset_df)

# create training lists
questions_BERT = training_BERT_dataset_df['question'].to_list()
answers_BERT = training_BERT_dataset_df['answer'].to_list()
topics_BERT = training_BERT_dataset_df['topic'].to_list()
answers_start = training_BERT_dataset_df['answer_start'].to_list()

# Import training data for article to questions
topic_to_article_BERT_filename = "topics_to_articles_BERT.csv"
save_fullpath = os.path.join(savepath,topic_to_article_BERT_filename)
topics_articles_BERT_df = pd.read_csv(save_fullpath)

#get the unique topic names from the dataframe
topic_strings_BERT = topics_articles_BERT_df.columns.to_list()
#get the correct label for the questions
question_topic_BERT = training_BERT_dataset_df['topic'].to_list()
#get the correct answers for questions
question_answers_BERT = training_BERT_dataset_df['answer'].to_list()

## Generate List of Articles <a id='generate_python_list'></a>

In [433]:
def Get_Article(df,topic):
    article = df[topic].to_list()[0]
    
    return article

articles_BERT = []
for topic in topic_strings_BERT:
    articles_BERT.append(Get_Article(topics_articles_BERT_df,topic))

In [434]:
def segment_documents(docs, max_doc_length=500):
    # List containing full and segmented docs
    segmented_docs = []

    for doc in docs:
        # Split document by spaces to obtain a word count that roughly approximates the token count
        split_to_words = doc.split(" ")

        # If the document is longer than our maximum length, split it up into smaller segments and add them to the list 
        if len(split_to_words) > max_doc_length:
            for doc_segment in range(0, len(split_to_words), max_doc_length):
                segmented_docs.append(" ".join(split_to_words[doc_segment:doc_segment + max_doc_length]))

        # If the document is shorter than our maximum length, add it to the list
        else:
            segmented_docs.append(doc)

    return segmented_docs

In [435]:
segmented_articles_BERT = segment_documents(articles_BERT)
print(segmented_articles_BERT[0])

Beyonce Giselle Knowles-Carter (/bi:'janseI/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyonce's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".Following the disbandment of Destiny's Child in June 2005, she released her second solo album, B'Day (2006), which contained hits "Deja Vu", "Irreplaceable", and "Beautiful Liar". Beyonce also ventured into acting, with a Golden Globe-nominated performance in Dreamgirls (2006), and starring roles in The P

## Article Retrieval Code <a id='retrieve_articles'></a>

In [436]:
#make pre-processing functions
def convert_lower_case(data):
    return str(np.char.lower(data))

def remove_punctuation(data):
    new_data = ""
    symbols = "!\"#$%&()*+-./:;<=>?@[\]^_`{|}~\n"
    for i in symbols:
        new_data = np.char.replace(data, i, ' ')
        
    return(str(new_data))
def remove_apostrophe(data):
    return str(np.char.replace(data, "'", ""))

def remove_single_characters(data):
    new_text = ""
    
    word_list = nltk.word_tokenize(data)
    
    for w in word_list:
        if len(w) > 1:
            new_text = new_text + " " + w
    
    return new_text

def Lemmatize(data):
    lemmatizer = WordNetLemmatizer()
    
    word_list = nltk.word_tokenize(data)
    
    lemmatized_output = ' '.join([lemmatizer.lemmatize(w) for w in word_list])
    
    return lemmatized_output
def Stemming(data):
    ps = PorterStemmer()
    
    word_list = nltk.word_tokenize(data)
    
    stem_output = ' '.join([ps.stem(w) for w in word_list])
    
    return stem_output

#create a function to preprocess the data
def Pre_Process_Data(data):
    new_data = remove_punctuation(data)
    return new_data

def Retrieve_Article(query, docs, k=5):
    #pre-process the query
    query = Pre_Process_Data(query)
    
    query_words = re.split('\s+', query)
    num_cols = len(query_words)
    
    # Initialize a vectorizer that removes English stop words
    vectorizer = TfidfVectorizer(analyzer="word", stop_words='english',sublinear_tf=True,use_idf=True)
    
    # Create a corpus of query and documents and convert to TFIDF vectors
    query_and_docs = [query] + docs
    matrix = vectorizer.fit_transform(query_and_docs)
    
    #apply SVD to the TF-IDF vectorized matrix
    svd = TruncatedSVD(n_components=num_cols+200,n_iter=1,random_state=42)
    
    #fit and transform the SVD model
    matrix_new = svd.fit_transform(matrix)
    matrix_new = csr_matrix(matrix_new)

    # Holds our cosine similarity scores
    scores = []

    # The first vector is our query text, so compute the similarity of our query against all document vectors
    for i in range(1, len(query_and_docs)):
        scores.append(cosine_similarity(matrix_new[0], matrix_new[i])[0][0])

    # Sort list of scores and return the top k highest scoring documents
    sorted_list = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    top_doc_indices = [x[0] for x in sorted_list[:k]]
    top_docs = [docs[x] for x in top_doc_indices]

    return top_docs, top_doc_indices

## Preprocess the Data before Retrieval <a id='preprocess_data'></a>

In [437]:
preprocess_articles_BERT = []
for doc in articles_BERT:
    preprocess_articles_BERT.append(Pre_Process_Data(doc))

## Benchmark Document Retrieval <a id='document_retrieval_benchmarking'></a>

In [438]:
#conduct document retrieval for each question in the dataset

def Benchmark_DocRetrieval(questions,question_topic,articles,articles_true,topic_strings,RandQs=False):
    
    if(RandQs):
        random_indices = list(random.sample(range(0, len(questions)), 10))

        sample_questions = []
        for idx in random_indices:
            sample_questions.append(questions[idx])
    else:
        #create a sample of questions
        sample_questions = questions[0:10]
    
    total = len(sample_questions)
    correct = 0
    tracker = 1
    for question in sample_questions:
        #create a tracker
        print("Question #%d/%d" %(tracker,total))

        #get the true label for the question
        true_question_label = question_topic[questions.index(question)]

        #run the doc retriever on the current question 
        top_articles, predicted_articles_indices = Retrieve_Article(question,articles,k=5)
       
        #iterate over all predicted articles and check if the prediction is correct
        for prediction in top_articles:    
            #get the true topic of the predicted article
            true_article_label = topic_strings[articles_true.index(prediction)]

            #this will handle if the correct article is even chosen
            if(true_question_label==true_article_label):
                correct += 1

        tracker += 1

    return correct/total

print("Retrieving the top 5 articles for each question...")
retrieval_accuracy = Benchmark_DocRetrieval(questions_BERT,question_topic_BERT,articles_BERT,articles_BERT,topic_strings_BERT,RandQs=True)
print("k=5 Article Retrieval Accuracy: " + str(retrieval_accuracy))

Retrieving the top 5 articles for each question...
Question #1/10
Question #2/10
Question #3/10
Question #4/10
Question #5/10
Question #6/10
Question #7/10
Question #8/10
Question #9/10
Question #10/10
k=5 Article Retrieval Accuracy: 0.8


## Import the PreTrained Model <a id='import_pretrained_model'></a>

In [439]:
transformers.logging.set_verbosity_error()

# modelname = 'bert-large-uncased-whole-word-masking-finetuned-squad'
modelname = 'deepset/bert-base-cased-squad2'

model = BertForQuestionAnswering.from_pretrained(modelname)
tokenizer = AutoTokenizer.from_pretrained(modelname)

## BERT Function <a id='BERT_training'></a>

In [474]:
#create function that runs the BERT model
def Run_BERT(question, text_batch):
    
    #encode the question and the paragraph(text)
    input_ids = tokenizer.encode(question,text_batch,max_length=512)
    
    #search the input_ids for the first instance of the SEP token
    sep_index = input_ids.index(tokenizer.sep_token_id)
            
    #Segment A occurs from the first char to the end of the SEP token instance
    num_seg_a = sep_index+1
    
    #The rest of the tokens will belong to segment B
    num_seg_b = len(input_ids)-num_seg_a
    
    #construct a list of 0's and 1's
    segment_ids = [0]*num_seg_a + [1]*num_seg_b
    
    #there should be a segment id for every input token
    #if this doesnt return an error we are good
    assert len(segment_ids) == len(input_ids)
    
    #run the model using the current data
    outputs = model(torch.tensor([input_ids]), #the tokens representing the input text 
                   token_type_ids=torch.tensor([segment_ids]), #the segment ids to differentiate Q from A
                   return_dict=True)
    
    #get the start and end vectors
    start_scores = outputs.start_logits
    end_scores = outputs.end_logits
    
    #reconstruct the answer from the scores
    answer_start = torch.argmax(start_scores)
    answer_end = torch.argmax(end_scores)
    
    #get the string versions of the input tokens
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    
    #create an answer variable and append the start of the first word
    answer = tokens[answer_start]
    
    #fill out the remainder of the answer
    for i in range(answer_start + 1, answer_end + 1):
        #if we have a subword token, recombine it with the previous token
        if(tokens[i][0:2]=='##'):
            answer += tokens[i][2:]
        elif(tokens[i][0]==','):
            answer += tokens[i][0]
        elif(tokens[i][0]=='\''):
            answer += tokens[i][0]
        elif(tokens[i][0]=='-'):
            answer += tokens[i][0]
        elif(tokens[i][0]=='s'):
            answer += tokens[i]
        elif(tokens[i][0] == '.'):
            answer += tokens[i][0]
        elif(tokens[i][0].isnumeric() and i > 1):
            if tokens[i-1][0]=='.':
                answer += tokens[i][0]
        else:
            answer += ' ' + tokens[i]

    return answer

## Helper Functions <a id='looping_helpers'></a>

In [485]:
# given a paragraph and answer from BERT will return the exact answer context based on periods in the paragraph
def find_answer_context(paragraph, answer, buffer_sentences=0):
    pgraph_chars = ''.join(paragraph.split(' '))
    answer_chars = ''.join(answer.split(' '))
    
    # find the index in the character string where answer starts
    answer_loc = pgraph_chars.find(answer_chars)
    
    # find all the indices of the periods
    period_indices = [x for x in findall('.', pgraph_chars)]
    
    # find the index value where the answer would be inserted
    stop_idx = np.searchsorted(period_indices, answer_loc)
    
    # find the periods marking to the left and right of the start point
    if stop_idx > 0:
        context_left = period_indices[stop_idx-1::-1]
    else:
        context_left = []
    context_right = period_indices[stop_idx:-1]
    
    # loop through the periods until we find the appropraite number of buffer sentences worth
    p_idx = 0
    left_count = 0
    while left_count <= buffer_sentences and p_idx < len(context_left):
        if not pgraph_chars[context_left[p_idx]+1].isnumeric():
            left_count +=1
        p_idx += 1
    
    left_period_num = len(context_left) - p_idx
    
    p_idx = 0
    right_count = 0
    while right_count <= buffer_sentences and p_idx < len(context_right):
        if not pgraph_chars[context_right[p_idx]+1].isnumeric():
            right_count +=1
        p_idx += 1
    
    right_period_num = len(context_left) + p_idx + 1
  
    # find the indices in the paragraph
    left_p_idx = find_nth(paragraph, '.', left_period_num)+1
    right_p_idx = find_nth(paragraph, '.', right_period_num)+1

    return paragraph[left_p_idx:right_p_idx]
    

# returns all the period indices in the string s
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

# find the nth occurance of needle in the haystack
def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

# returns the sentences from the paragraphs that contain the answers
def sentences_with_answers(paragraphs, answers):
    # storage data structures
    answer_sentences = []
    total_count = 0
    for idx, ans in enumerate(answers):
        text = find_answer_context(paragraphs[idx], ans)
        if text != '':
            answer_sentences.append(text)
            total_count += len(find_answer_context(paragraphs[idx], ans).split(' '))
    return total_count, answer_sentences

# combines the documents into a compressed corpus
def compress_corpus(documents, max_doc_size = 450):
    size = 0
    compressed_corp = []
    current_doc = ''
    for d in documents:
        d_list = d
        if size + len(d_list.split(' ')) < max_doc_size:
            current_doc += ' ' + d
            size += len(d_list.split(' '))
        else:
            compressed_corp.append(current_doc)
            current_doc = d
            size = len(d_list.split(' '))
    compressed_corp.append(current_doc)
    return compressed_corp

# runs BERT repeatedly until a single answer is output (this answer can be none)
def narrow_down_answers(question, documents, answers):
    
    while len(answers) > 1:
        # back out sentences from the answers
        counts, sentences = sentences_with_answers(documents, answers)
        # compress the sentences down to a smaller number of documents
        compressed_corpus = compress_corpus(sentences)

        answers = []
        documents = compressed_corpus.copy()
        for paragraph in compressed_corpus:
            # run BERT
            BERT_answer = Run_BERT(question, paragraph)
            
            # check that BERT answer is acceptable before adding to answer list
            if '[CLS]' not in BERT_answer:
                answers.append(BERT_answer)

    if len(answers) == 0:
        return None
    else:
        return answers[0]

## Benchmarking BERT with Looping <a id='benchmarking_BERT_with_loop'></a>
Individual benchmark talleying was completed by hand based on the outputs to the juptery cell below

In [489]:
# number of questions to test on
num_questions = 100

# added external loop to save memory so that benchmarking can run faster
for _ in range(num_questions):
    record_questions = []
    record_correct_answer = []
    record_answers = []
    record_segments = []
    record_articles = []

    random_indices = list(random.sample(range(0, len(questions_BERT)), 1))

    sample_questions = []
    for idx in random_indices:
        sample_questions.append(questions_BERT[idx])

    test_rounds = len(sample_questions)
    correct_answers = 0

    #define the stopwords
    sp = spacy.load('en_core_web_sm')
    noncontext_words = sp.Defaults.stop_words
   
    #loop over the sample questions
    for question in sample_questions:
        BERT_answers = []
        print("####################################################################################")
        print('Question:', question)
        print('Question Index:', questions_BERT.index(question))
        # Dan added dictionary to look up segment
        lookup_segment = {}
        segments = []
        answers_to_consider = []
        current = 0

        #get the correct answer to this question
        correct_answer = question_answers_BERT[questions_BERT.index(question)]
        print("Correct Answer: %s" % (correct_answer))
        record_correct_answer.append(correct_answer)

        #get the top k paragraphs
        candidate_articles, article_indices = Retrieve_Article(question,articles_BERT,10)
        record_articles.append(candidate_articles)

        #segment the chosen candidate article in "paragraphs"
        candidate_seg_articles = segment_documents(candidate_articles, max_doc_length=450)

        #return the answers from each of the top k paragraphs in descending order by relevancy
        for seg_idx, segment in enumerate(candidate_seg_articles):
            #BERT_prediction, start_idx, end_idx = Run_BERT(question, segment)
            BERT_prediction = Run_BERT(question, segment)

            BERT_answers.append(BERT_prediction)

            ## Dan added this for troubleshooting next part
            if '[CLS]' not in BERT_prediction:
                # store segment information
                lookup_segment[seg_idx] = current
                segments.append(segment)

                # store answer information extracted from text
                answers_to_consider.append(BERT_prediction)


                # increment index for dictionary
                current +=1
            #check to see if the return type is a string
            if(type(BERT_prediction)==str):
                #create lists of words for the predicted and the correct answers
                BERT_pred_list = re.split('\s+', BERT_prediction)
                BERT_true_list = re.split('\s+', correct_answer)

                BERT_pred_list_fix = []
                BERT_true_list_fix = []
                #remove the stop words in the lists
                for word in BERT_pred_list:
                    if(word not in noncontext_words):
                        BERT_pred_list_fix.append(word)

                #remove the stop words in the lists
                for word in BERT_true_list:
                    if(word not in noncontext_words):
                        BERT_true_list_fix.append(word)

                #check to see if any words in the prediction are in the answer
                true_ans_len = len(BERT_true_list_fix)
                num_matches = 0
                for word in BERT_pred_list_fix:
                    if(word in BERT_true_list_fix):
                        num_matches += 1

                if(true_ans_len==1):
                    if(num_matches==true_ans_len):
                        correct_answers += 1
                else:
                    if(num_matches>=round(0.5*true_ans_len)):
                        correct_answers += 1
        record_questions.append(question)
        record_answers.append(answers_to_consider)
        record_segments.append(segments)

        final_answer = narrow_down_answers(question, segments, answers_to_consider)   
        print('BERT Answer:', final_answer)
    BERT_accuracy = correct_answers/test_rounds
    print(BERT_accuracy)

####################################################################################
Question: What can prevent a green color in glass?
Question Index: 62814
Correct Answer: Manganese dioxide
BERT Answer: systems that do not invoke it
0.0
####################################################################################
Question: Where is a royal assent ceremony held within the United Kingdom?
Question Index: 58508
Correct Answer: Palace of Westminster
BERT Answer: Palace of Westminster
1.0
####################################################################################
Question: Where did Bell and his wife go on their honeymoon?
Question Index: 10643
Correct Answer: Europe
BERT Answer: Quebec
1.0
####################################################################################
Question: Why do muslim anti-masonics believe that the Freemasons want to destroy the Al-Aqsa Mosque?
Question Index: 35350
Correct Answer: rebuild the Temple of Solomon in Jerusalem
BERT Answer: to reb

BERT Answer: None
0.0
####################################################################################
Question: How much of the population lives below the poverty line?
Question Index: 18790
Correct Answer: More than two-thirds
BERT Answer: None
0.0
####################################################################################
Question: What two countries have not legally committed to advancing an anti-discriminaory stance towards young people?
Question Index: 30380
Correct Answer: U.S. and South Sudan
BERT Answer: U. S. and South Sudan
1.0
####################################################################################
Question: The Special Clerical Court is accountable to only which body?
Question Index: 75821
Correct Answer: the Supreme Leader
BERT Answer: None
0.0
####################################################################################
Question: Who made the Angel of Independence?
Question Index: 39828
Correct Answer: the order of the Emperor Maximilian
B

####################################################################################
Question: Who introduced Irving to Spielberg?
Question Index: 68610
Correct Answer: Brian De Palma
BERT Answer: None
1.0
####################################################################################
Question: Which Chetnik leader did Tito hold talks with?
Question Index: 17110
Correct Answer: Draza Mihailovic
BERT Answer: President Dwight D. Eisenhower
1.0
####################################################################################
Question: How long ago was the decedent of 99% of all modern potatoes cultivated as long ago as?
Question Index: 79745
Correct Answer: 10,000 years ago
BERT Answer: 10, years ago
1.0
####################################################################################
Question: What year was the UAR formed?
Question Index: 57370
Correct Answer: 1958
BERT Answer: 1958
1.0
####################################################################################
Questi

## BERT Looping Analysis <a id='BERT_analysis'></a>

Check the outputs for a given value in the test set above as well as given refernce information to plug values into the looping portions of the algorithm in the cell below.

In [484]:
val = 0
true_question_label = question_topic_BERT[questions_BERT.index(record_questions[val])]
print('The correct article is:         ', true_question_label)
print('Question:                       ', record_questions[val])
print('The correct answer is:          ', record_correct_answer[val])
print('All answers backed out by BERT: ', record_answers[val])
record_segments[val]

The correct article is:          Wood
Question:                        Are the knots that dead tree limbs form attached or not attached?
The correct answer is:           not attached
All answers backed out by BERT:  ['not attached']


['composed of wider elements. It is usually lighter in color than that near the outer portion of the ring, and is known as earlywood or springwood. The outer portion formed later in the season is then known as the latewood or summerwood. However, there are major differences, depending on the kind of wood (see below).A knot is a particular type of imperfection in a piece of wood; it will affect the technical properties of the wood, usually reducing the local strength and increasing the tendency for splitting along the wood grain, but may be exploited for visual effect. In a longitudinally sawn plank, a knot will appear as a roughly circular "solid" (usually darker) piece of wood around which the grain of the rest of the wood "flows" (parts and rejoins). Within a knot, the direction of the wood (grain direction) is up to 90 degrees different from the grain direction of the regular wood.In the tree a knot is either the base of a side branch or a dormant bud. A knot (when the base of a sid

### Test the Looping Portion to Debug/Understand Accuracy

In [471]:
print('Final answer:', narrow_down_answers(record_questions[val], record_segments[val], record_answers[val]))

These are the sentences: [" Ragworms' jaws are now being studied by engineers as they offer an exceptional combination of lightness and strength.Since annelids are soft-bodied, their fossils are rare - mostly jaws and the mineralized tubes that some of the species secreted. Although some late Ediacaran fossils may represent annelids, the oldest known fossil that is identified with confidence comes from about 518 million years ago in the early Cambrian period.", ' The 27 surviving panels of the nave are the most important mosaic cycle in Rome of this period. Two other important 5th century mosaics are lost but we know them from 17th-century drawings.', ' The mosaics were executed in the 1220s.Other important Venetian mosaics can be found in the Cathedral of Santa Maria Assunta in Torcello from the 12th century, and in the Basilical of Santi Maria e Donato in Murano with a restored apse mosaic from the 12th century and a beautiful mosaic pavement (1140).', ' The Monastery of Martyrius wa

### Print the Beginning of the Top K Articles for Checking Article Retrieval Accuracy

In [478]:
for q in record_articles[val]:
    print(q[0:200], end='\n\n')

The main passenger airport serving the metropolis and the state is Melbourne Airport (also called Tullamarine Airport), which is the second busiest in Australia, and the Port of Melbourne is Australia

Raleigh (/'ra:li/; RAH-lee) is the capital of the state of North Carolina as well as the seat of Wake County in the United States. It is the second most populous city in North Carolina, after Charlott

The University of Kansas (KU) is a public research university and the largest in the U.S. state of Kansas. KU branch campuses are located in the towns of Lawrence, Wichita, Overland Park, Salina, and 

An exhibition game (also known as a friendly, a scrimmage, a demonstration, a preseason game, a warmup match, or a preparation match, depending at least in part on the sport) is a sporting event whose

Victoria married her first cousin, Prince Albert of Saxe-Coburg and Gotha, in 1840. Their nine children married into royal and noble families across the continent, tying them together and earn