# QA Model

Here I use a retriever model to embed passages from the source data and retrieve context for a question, Pinecone to store the embeddings, and a generator model to generate answers from the retrieved context.

# Install Dependencies

In [1]:
!pip install -qU datasets pinecone-client sentence-transformers torch


In [31]:
!pip install transformers==4.25.1



In [11]:
!pip install sentence-transformers==2.2.2

Collecting sentence-transformers==2.2.2
  Using cached sentence_transformers-2.2.2-py3-none-any.whl
Installing collected packages: sentence-transformers
  Attempting uninstall: sentence-transformers
    Found existing installation: sentence-transformers 2.5.1
    Uninstalling sentence-transformers-2.5.1:
      Successfully uninstalled sentence-transformers-2.5.1
Successfully installed sentence-transformers-2.2.2


In [7]:
!pip install pinecone-client



# Load and Prepare Dataset

Our source data is taken from the Wiki Snippets dataset, which contains over 17 million passages from Wikipedia. Since indexing the entire dataset may take some time, this demo uses only 23,040 passages whose `section_title` starts with "History". You can use the complete dataset if you want; the Pinecone vector database can manage millions of documents.

In [26]:
from datasets import load_dataset

# load the dataset from huggingface in streaming mode and shuffle it
wiki_data = load_dataset(
    'vblagoje/wikipedia_snippets_streamed',
    split='train',
    streaming=True
).shuffle(seed=960)

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


We are loading the dataset in the streaming mode so that we don't have to wait for the whole dataset to download (which is over 9GB). Instead, we iteratively download records one at a time.

In [2]:
# show the contents of a single document in the dataset
next(iter(wiki_data))

{'wiki_id': 'Q7649565',
 'start_paragraph': 20,
 'start_character': 272,
 'end_paragraph': 24,
 'end_character': 380,
 'article_title': 'Sustainable Agriculture Research and Education',
 'section_title': "2000s & Evaluation of the program's effectiveness",
 'passage_text': "preserving the surrounding prairies. It ran until March 31, 2001.\nIn 2008, SARE celebrated its 20th anniversary. To that date, the program had funded 3,700 projects and was operating with an annual budget of approximately $19 million. Evaluation of the program's effectiveness As of 2008, 64% of farmers who had received SARE grants stated that they had been able to earn increased profits as a result of the funding they received and utilization of sustainable agriculture methods. Additionally, 79% of grantees said that they had experienced a significant improvement in soil quality though the environmentally friendly, sustainable methods that they were"}

In [27]:
# filter only documents with History as section_title
history = wiki_data.filter(
    lambda d: d['section_title'].startswith('History')
)

In [16]:
next(iter(history))

{'wiki_id': 'Q2644349',
 'start_paragraph': 10,
 'start_character': 397,
 'end_paragraph': 10,
 'end_character': 534,
 'article_title': 'Taupo District',
 'section_title': 'History',
 'passage_text': 'was not until the 1950s that the region started to develop, with forestry and the construction of the Wairakei geothermal power station.'}

Let's iterate through the dataset and apply our filter to select the 23040 historical passages. We will extract `article_title`, `section_title` and `passage_text` from each document.

In [28]:
from tqdm.auto import tqdm  # progress bar

total_doc_count = 23039

counter = 0
docs = []
# iterate through the dataset and apply our filter
for d in tqdm(history, total=total_doc_count):
    # extract the fields we need
    doc = {
        "article_title": d["article_title"],
        "section_title": d["section_title"],
        "passage_text": d["passage_text"]
    }
    # add the dict containing fields we need to docs list
    docs.append(doc)

    # stop once total_doc_count + 1 (23,040) documents have been collected
    if counter == total_doc_count:
        break

    # increase the counter on every iteration
    counter += 1

  0%|          | 0/23039 [00:00<?, ?it/s]
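
The counter-and-break pattern above can also be expressed with `itertools.islice`, which stops pulling from a lazy stream after a fixed number of items. The sketch below is only an illustration: `fake_stream` and `take_history` are hypothetical names, and the generator is a made-up stand-in for the streamed dataset, so the record contents are not real.

```python
from itertools import islice


def fake_stream():
    """Stand-in for the streamed dataset: yields records one at a time, forever."""
    n = 0
    while True:
        title = "History" if n % 3 == 0 else "Geography"
        yield {
            "article_title": f"Article {n}",
            "section_title": title,
            "passage_text": f"passage {n}",
        }
        n += 1


def take_history(stream, limit):
    """Lazily keep records whose section title starts with 'History', up to `limit`."""
    matches = (d for d in stream if d["section_title"].startswith("History"))
    fields = ("article_title", "section_title", "passage_text")
    # islice stops requesting records as soon as `limit` matches were produced
    return [{k: d[k] for k in fields} for d in islice(matches, limit)]


sample_docs = take_history(fake_stream(), limit=5)
print(len(sample_docs))                 # 5
print(sample_docs[1]["article_title"])  # Article 3
```

Because nothing is downloaded until a record is requested, the same idea should carry over to the real streamed dataset, where `limit` would be the number of passages to index.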

In [29]:
docs

[{'article_title': 'Taupo District',
  'section_title': 'History',
  'passage_text': 'was not until the 1950s that the region started to develop, with forestry and the construction of the Wairakei geothermal power station.'},
 {'article_title': 'Sutarfeni',
  'section_title': 'History & Western asian analogues',
  'passage_text': 'Sutarfeni History strand-like pheni were Phenakas mentioned in various indian texts. Phenakas is a broad term which includes various dishes prepared by using layered fried dough. Vijayanagar records indicate that Pheni was another much relished sweet dish prepared from wheat flour and sugar, similar to phenaka of North India and had varieties like sugar pheni, milk pheni and vermicelli pheni Western asian analogues Sutarpheni is of the Indian analogs of the Turkish pismaniye, which uses wheat flour instead of rice flour, and the Persian pashmak, which substitutes sesame paste for wheat flour.  The choice of rice flour as the'},
 {'article_title': 'The Bishop 

In [15]:
import pandas as pd

# create a pandas dataframe with the documents we extracted
df = pd.DataFrame(docs)
df.head()

| | article_title | section_title | passage_text |
|---|---|---|---|
| 0 | Taupo District | History | was not until the 1950s that the region starte... |
| 1 | Sutarfeni | History & Western asian analogues | Sutarfeni History strand-like pheni were Phena... |
| 2 | The Bishop Wand Church of England School | History | The Bishop Wand Church of England School Histo... |
| 3 | Teufelsmoor | History & Situation today | made to preserve the original landscape, altho... |
| 4 | Surface Hill Uniting Church | History | in perpetual reminder that work and worship go... |


# Initialize Pinecone Index

In [2]:
import os

pc_api_key=os.getenv('pinecone_api_key')

In [3]:
from pinecone import Pinecone

pc = Pinecone(api_key=pc_api_key)

Now we create a new index. We name it "abstractive-question-answering", but you can name it anything you want. We specify the metric type as "cosine" and the dimension as 384 because the retriever we use to generate context embeddings is optimized for cosine similarity and outputs 384-dimension vectors. The cell below connects to the index once it exists.

In [4]:
index_name = "abstractive-question-answering"

# connect to abstractive-question-answering index we created
index = pc.Index(index_name)
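
The cell above only connects to an index that already exists. A minimal sketch of creating it first could look like the following. The helper name `ensure_index` is hypothetical, and the serverless cloud and region in the usage comment are assumptions that depend on your Pinecone project; the 384 dimension and cosine metric come from the text above.

```python
def ensure_index(pc, name, spec, dimension=384, metric="cosine"):
    """Create the index if it does not exist yet, then return a handle to it.

    `pc` is a Pinecone client, `spec` describes where the index is hosted.
    """
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return pc.Index(name)


# Usage with the real client (cloud and region are assumptions, adjust as needed):
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key=pc_api_key)
# index = ensure_index(
#     pc,
#     "abstractive-question-answering",
#     ServerlessSpec(cloud="aws", region="us-east-1"),
# )
```

Checking `list_indexes()` first makes the cell safe to re-run, since creating an index whose name is already taken would otherwise raise an error.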

# Initialize Retriever

Here we load the pre-trained Sentence Transformer model `all-MiniLM-L12-v2` from the Hugging Face model hub onto the specified device. It is a variant of MiniLM, a compact Transformer language model distilled from a larger one, which makes it smaller and faster to run. As the printed model shows, it truncates inputs to 128 tokens and produces 384-dimensional, normalized embeddings.

In [5]:
import torch
from sentence_transformers import SentenceTransformer

# set device to GPU if available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# load the retriever model from huggingface model hub
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2", device=device)
retriever

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
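
The printed modules show mean pooling over token vectors followed by a `Normalize()` step. The toy sketch below illustrates what those two steps do and why a dot product between normalized embeddings equals their cosine similarity. It is not the library's implementation: the helper names are hypothetical, the 3-dimensional vectors are made-up numbers, and it ignores padding and attention masks.

```python
import math


def mean_pool(token_vectors):
    """Average the token vectors component-wise into one sentence vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def l2_normalize(vector):
    """Scale the vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product; for unit-length vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# toy 3-dimensional "token embeddings" for two short texts (made-up numbers)
text_a = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
text_b = [[0.0, 2.0, 0.0], [4.0, 2.0, 0.0]]

emb_a = l2_normalize(mean_pool(text_a))
emb_b = l2_normalize(mean_pool(text_b))
print(emb_a)                        # [1.0, 0.0, 0.0]
print(round(dot(emb_a, emb_b), 4))  # 0.7071
```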

# Generate Embeddings and Upsert

Next, we need to generate embeddings for the context passages. We will do this in batches to help us more quickly generate embeddings and upload them to the Pinecone index. When passing the documents to Pinecone, we need an id (a unique value), context embedding, and metadata for each document representing context passages in the dataset. The metadata is a dictionary containing data relevant to our embeddings, such as the article title, section title, passage text, etc.

In [16]:
len(df)

23040

In [17]:
# we will use batches of 64
batch_size = 64
import time

# Record start time
start_time = time.time()

for i in tqdm(range(0, len(df), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(df))
    # extract batch
    batch = df.iloc[i:i_end]
    # generate embeddings for batch
    emb = retriever.encode(batch["passage_text"].tolist()).tolist()
    # get metadata
    meta = batch.to_dict(orient="records")
    # create unique IDs
    ids = [f"{idx}" for idx in range(i, i_end)]
    # add all to upsert list
    to_upsert = list(zip(ids, emb, meta))
    # upsert/insert these records to pinecone
    _ = index.upsert(vectors=to_upsert)

# record end time once all batches have been upserted
end_time = time.time()

# check that we have all vectors in index
index.describe_index_stats()
# calculate elapsed time
elapsed_time = end_time - start_time

print("Execution time:", elapsed_time, "seconds")

  0%|          | 0/360 [00:00<?, ?it/s]

Execution time: 9283.05222940445 seconds


In [18]:
print("Execution time:", (elapsed_time/3600), "hours")

Execution time: 2.5786256192790136 hours


In [17]:
batch["passage_text"]

0     was not until the 1950s that the region starte...
1     Sutarfeni History strand-like pheni were Phena...
2     The Bishop Wand Church of England School Histo...
3     made to preserve the original landscape, altho...
4     in perpetual reminder that work and worship go...
                            ...                        
59    see the Eldridge re-open as a hotel again in t...
60    at the southern end of Waitakere City and Cart...
61    TIAA Bank History TIAA Bank was created in 191...
62    built to act simultaneously as canals for boat...
63    the building was partially demolished in 1938,...
Name: passage_text, Length: 64, dtype: object

In [19]:
batch.to_dict(orient="records")

[{'article_title': 'Taupo District',
  'section_title': 'History',
  'passage_text': 'was not until the 1950s that the region started to develop, with forestry and the construction of the Wairakei geothermal power station.'},
 {'article_title': 'Sutarfeni',
  'section_title': 'History & Western asian analogues',
  'passage_text': 'Sutarfeni History strand-like pheni were Phenakas mentioned in various indian texts. Phenakas is a broad term which includes various dishes prepared by using layered fried dough. Vijayanagar records indicate that Pheni was another much relished sweet dish prepared from wheat flour and sugar, similar to phenaka of North India and had varieties like sugar pheni, milk pheni and vermicelli pheni Western asian analogues Sutarpheni is of the Indian analogs of the Turkish pismaniye, which uses wheat flour instead of rice flour, and the Persian pashmak, which substitutes sesame paste for wheat flour.  The choice of rice flour as the'},
 {'article_title': 'The Bishop 

# Initialize Generator

We will use ELI5 BART for the generator, a sequence-to-sequence model trained on the "Explain Like I'm 5" (ELI5) dataset. Sequence-to-sequence models take a text sequence as input and produce a different text sequence as output. Here I use the vblagoje/bart_lfqa checkpoint, which was trained on LFQA (Long-Form Question Answering), an updated dataset that goes beyond ELI5.

In [6]:
from transformers import BartTokenizer, BartForConditionalGeneration

# load bart tokenizer and model from huggingface
tokenizer = BartTokenizer.from_pretrained('vblagoje/bart_lfqa')
generator = BartForConditionalGeneration.from_pretrained('vblagoje/bart_lfqa').to(device)

All the components of our abstractive QA system are complete and ready to be queried. But first, let's write some helper functions to retrieve context passages from the Pinecone index and to format the query in the way the generator expects its input.

In [7]:
def query_pinecone(query, top_k):
    # generate embeddings for the query
    xq = retriever.encode([query]).tolist()
    # search pinecone index for context passage with the answer
    xc = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return xc

In [8]:
def format_query(query, context):
    # extract passage_text from Pinecone search result and add the <P> tag
    context = [f"<P> {m['metadata']['passage_text']}" for m in context]
    # concatenate all context passages
    context = " ".join(context)
    # concatenate the query and context passages
    query = f"question: {query} context: {context}"
    return query
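
The prompt layout can be checked without Pinecone by passing hand-made matches. In the sketch below the helper is restated so the block runs on its own, and `fake_matches` and the question are made-up stand-ins. It also keeps the raw question in its own variable: re-running a cell that overwrites `query` with the formatted prompt would prefix `question:` twice.

```python
def format_query(query, context):
    # same layout as the helper above: each passage gets a <P> tag
    passages = [f"<P> {m['metadata']['passage_text']}" for m in context]
    return f"question: {query} context: {' '.join(passages)}"


# hand-made matches in the shape Pinecone returns (contents are made up)
fake_matches = [
    {"metadata": {"passage_text": "first passage"}},
    {"metadata": {"passage_text": "second passage"}},
]

question = "Who built it?"
prompt = format_query(question, fake_matches)
print(prompt)
# question: Who built it? context: <P> first passage <P> second passage

# `question` is unchanged, so formatting again does not stack prefixes
assert not question.startswith("question:")
```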

In [9]:
def generate_answer(query):
    # tokenize the query to get input_ids, truncating inputs longer than 1024 tokens
    inputs = tokenizer([query], max_length=1024, truncation=True, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return answer

In [21]:
from datasets import load_dataset

qa_dataset=load_dataset("vblagoje/lfqa")

In [22]:
qa_dataset

DatasetDict({
    train: Dataset({
        features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'url', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
        num_rows: 226147
    })
    validation: Dataset({
        features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'url', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
        num_rows: 3020
    })
    test: Dataset({
        features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'url', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
        num_rows: 10000
    })
})

In [9]:
qa_dataset["train"]["title"]

["what's the difference between a forest and a wood?",
 'Are there any good source material on the Warsaw Ghetto to be had online?',
 'we do we instinctively grab a part of our body after it is hurt?',
 'Following the passing of the Thirteenth Amendment, were there any cases of slave-owners attempting to continue the practice illegally?',
 "In medieval and pre-modern times, political entities made marriage pacts between heirs in order to secure peace. Often times, this didn't last for more than 20 years, if not even less. Why did they even bother?",
 'What happened to German and Italian volunteers in the International Brigades of the Spanish Civil War after they were disbanded in 1938?',
 'What (if anything) did Native Americans think lay beyond the Pacific and Atlantic oceans?',
 'Is it possible that in the distant future, an organism that exists right now on Earth, will evolve to the point that it can be considered intelligent life? Which animal comes the closest? ',
 "How much can y

In [24]:
qa_dataset["train"]["answers"][3]

{'a_id': ['cbc0e13', 'cbc0gmv', 'cbc2ka2'],
 'score': [31, 58, 10],
 'text': ['All across the South during the years following the civil war, a series of "Black codes" were passed into law. Their purpose was to effectively re-enslave the freed slaves by justifying their forced labor by labeling them vagrants, essentially making unemployment illegal and thereby allowing a state to force a former slave to work by arresting them and then using them as convict labor.\n\nSometimes, the black codes were simply pre-civil war slave laws with the word "slave" replaced with "negro."',
  'It was less a few dark corners, and more a concerted effort by large swathes of society, who attempted to keep slavery alive in all but name. Here is the Fourth Circuit discussing some of this history:\n\n > The South was far from wholly reconciled to the abandonment of the system of forced labor that contributed significantly to the economic success of its agriculture. *See* [R. Fogel and S. Engerman, *Time on 

In [38]:
qa_dataset["train"]["answers"][3]["text"]

['All across the South during the years following the civil war, a series of "Black codes" were passed into law. Their purpose was to effectively re-enslave the freed slaves by justifying their forced labor by labeling them vagrants, essentially making unemployment illegal and thereby allowing a state to force a former slave to work by arresting them and then using them as convict labor.\n\nSometimes, the black codes were simply pre-civil war slave laws with the word "slave" replaced with "negro."',
 'It was less a few dark corners, and more a concerted effort by large swathes of society, who attempted to keep slavery alive in all but name. Here is the Fourth Circuit discussing some of this history:\n\n > The South was far from wholly reconciled to the abandonment of the system of forced labor that contributed significantly to the economic success of its agriculture. *See* [R. Fogel and S. Engerman, *Time on the Cross* (1974)](_URL_1_). Many planters felt strongly that they simply coul

In [14]:
query = "Following the passing of the Thirteenth Amendment, were there any cases of slave-owners attempting to continue the practice illegally?"
context = query_pinecone(query, top_k=10)
context

{'matches': [{'id': '11918',
              'metadata': {'article_title': 'Muri, Nigeria',
                           'passage_text': 'of the British administration a '
                                           'favorite route for the smuggling '
                                           'of slaves.',
                           'section_title': 'History'},
              'score': 0.492974043,
              'values': []},
             {'id': '22951',
              'metadata': {'article_title': 'Tuk band',
                           'passage_text': 'were forced to leave their drums '
                                           'behind, but found the Mahogany '
                                           'trees on the island well suited '
                                           'for the drum base, and they '
                                           'fashioned the drum skin from '
                                           'sheep, goats and cattle.\n'
                                   

In [17]:
from pprint import pprint

In [18]:
# format the query in the form generator expects the input
query = format_query(query, context["matches"])
pprint(query)

('question: question: Following the passing of the Thirteenth Amendment, were '
 'there any cases of slave-owners attempting to continue the practice '
 'illegally? context: <P> of the British administration a favorite route for '
 'the smuggling of slaves. <P> were forced to leave their drums behind, but '
 'found the Mahogany trees on the island well suited for the drum base, and '
 'they fashioned the drum skin from sheep, goats and cattle.\n'
 'The English slave owners instituted a law in the late 1600s to outlaw the '
 'playing of drums, with one of the penalties being death.\n'
 'The plantation owners were afraid the slaves would use the drums to "talk to '
 'each other", and organize rebellions.\n'
 'The drums were an intricate part of the African culture, and the African '
 'slaves could no more stop playing the drum as they could stop breathing.\n'
 'The slaves simply <P> to 45, while also including former military officers '
 'up to age 64, as codified in 10 U.S.C.\xa0§\xa024

In [19]:
answer=generate_answer(query)
answer

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


"I can't speak to the Thirteenth Amendment specifically, but I can speak to the practice of the Fugitive Slave Act of 1850. The Fugitive Slave Act was a law passed by"

To evaluate the model I have used ROUGE. Each question in the dataset has several reference answers, so I compare the generated answer against each of them.
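
To make the metric concrete, here is a simplified sketch of ROUGE-1 as unigram overlap between a generated answer and a reference, plus a helper that picks the best-matching reference. It assumes plain lowercase whitespace tokenization, so its numbers will not necessarily match those of the `rouge` package; `rouge_1` and `best_reference` are hypothetical names and the sentences are made up.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap recall, precision and F1 on whitespace tokens."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped count of shared unigrams
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(hyp.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"r": recall, "p": precision, "f": f1}


def best_reference(hypothesis, references):
    """Index of the reference with the highest ROUGE-1 recall."""
    recalls = [rouge_1(hypothesis, ref)["r"] for ref in references]
    return max(range(len(references)), key=recalls.__getitem__)


hyp = "the cat sat on the mat"
refs = ["a dog barked", "the cat lay on the mat"]
print(rouge_1(hyp, refs[1]))      # recall and precision are both 5/6
print(best_reference(hyp, refs))  # 1
```

Recall rewards covering the reference, so a long reference answer yields a low recall for a 40-token generation even when the content overlaps well.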

In [23]:
from rouge import Rouge

reference = qa_dataset["train"]["answers"][3]["text"]

# initialize the rouge object
rouge = Rouge()

rogue_scores=[]
rogue_1_scores=[]
# get the scores
for i in range(len(reference)):
    rogue_score=rouge.get_scores(answer, reference[i])
    rogue_scores.append(rogue_score)
    rogue_1_score=rogue_score[0]["rouge-1"]["r"]
    rogue_1_scores.append(rogue_1_score)
print(rogue_scores)
print(rogue_1_scores)

[[{'rouge-1': {'r': 0.125, 'p': 0.36363636363636365, 'f': 0.18604650782044357}, 'rouge-2': {'r': 0.0, 'p': 0.0, 'f': 0.0}, 'rouge-l': {'r': 0.109375, 'p': 0.3181818181818182, 'f': 0.1627906938669552}}], [{'rouge-1': {'r': 0.02952029520295203, 'p': 0.36363636363636365, 'f': 0.05460750714347288}, 'rouge-2': {'r': 0.005208333333333333, 'p': 0.07692307692307693, 'f': 0.009756096373111388}, 'rouge-l': {'r': 0.02952029520295203, 'p': 0.36363636363636365, 'f': 0.05460750714347288}}], [{'rouge-1': {'r': 0.04285714285714286, 'p': 0.2727272727272727, 'f': 0.07407407172687097}, 'rouge-2': {'r': 0.0, 'p': 0.0, 'f': 0.0}, 'rouge-l': {'r': 0.04285714285714286, 'p': 0.2727272727272727, 'f': 0.07407407172687097}}]]
[0.125, 0.02952029520295203, 0.04285714285714286]


In [24]:
reference

['All across the South during the years following the civil war, a series of "Black codes" were passed into law. Their purpose was to effectively re-enslave the freed slaves by justifying their forced labor by labeling them vagrants, essentially making unemployment illegal and thereby allowing a state to force a former slave to work by arresting them and then using them as convict labor.\n\nSometimes, the black codes were simply pre-civil war slave laws with the word "slave" replaced with "negro."',
 'It was less a few dark corners, and more a concerted effort by large swathes of society, who attempted to keep slavery alive in all but name. Here is the Fourth Circuit discussing some of this history:\n\n > The South was far from wholly reconciled to the abandonment of the system of forced labor that contributed significantly to the economic success of its agriculture. *See* [R. Fogel and S. Engerman, *Time on the Cross* (1974)](_URL_1_). Many planters felt strongly that they simply coul

In [26]:
import numpy as np

reference[np.argmax(np.array(rogue_1_scores))]

'All across the South during the years following the civil war, a series of "Black codes" were passed into law. Their purpose was to effectively re-enslave the freed slaves by justifying their forced labor by labeling them vagrants, essentially making unemployment illegal and thereby allowing a state to force a former slave to work by arresting them and then using them as convict labor.\n\nSometimes, the black codes were simply pre-civil war slave laws with the word "slave" replaced with "negro."'

Because a question does not have one fixed answer and can have several valid ones, it is not easy to evaluate the model this way. But if we go through the answers, it is giving decent results.

In [14]:
# draw a reproducible random sample of 100 questions from the training split

train_dataset = qa_dataset['train']
random_sample = train_dataset.shuffle(seed=42).select(range(100))

In [15]:
# Renaming the 'title' column to 'questions'
random_sample = random_sample.rename_column('title', 'questions')

In [16]:
random_sample

Dataset({
    features: ['q_id', 'questions', 'selftext', 'document', 'subreddit', 'url', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
    num_rows: 100
})

In [17]:
import pandas as pd

# build a DataFrame holding the sampled questions and their answers
dataset_dict = {
    'questions': random_sample['questions'],
    'answers': random_sample['answers']
}

df = pd.DataFrame(dataset_dict)

In [18]:
df

| | questions | answers |
|---|---|---|
| 0 | When our sun explodes into a supernova, how wo... | {'a_id': ['c4nkstt', 'c4nm4qc', 'c4p62pc'], 's... |
| 1 | what is pump and dump scam? | {'a_id': ['cw8qtwh', 'cw8rwno'], 'score': [3, ... |
| 2 | can somebody please explain cosmic microwave b... | {'a_id': ['c2fvfje', 'c2fvfje'], 'score': [6, ... |
| 3 | Were any of the pre-Colonial societies north o... | {'a_id': ['ccr9kto', 'ccrcqwz'], 'score': [3, ... |
| 4 | Are most venoms designed to affect specific ty... | {'a_id': ['cy5az0q'], 'score': [2], 'text': ['... |
| ... | ... | ... |
| 95 | What is the connection between electricity and... | {'a_id': ['cp9fwli'], 'score': [3], 'text': ['... |
| 96 | What is the simplest organism known to develop... | {'a_id': ['cjnco10'], 'score': [14], 'text': [... |
| 97 | Does gravity interact with electrons? | {'a_id': ['djlx2pa', 'djlx9vp', 'djlxvks'], 's... |
| 98 | What happened to the royal families of Europe ... | {'a_id': ['clpqjwd', 'clp1r2v'], 'score': [2, ... |


In [19]:
def extract_answer(answer_list):
    answers = answer_list["text"]
    return answers

# keep only the list of answer texts for each question
df['answers'] = df['answers'].apply(extract_answer)

In [20]:
df.head()

| | questions | answers |
|---|---|---|
| 0 | When our sun explodes into a supernova, how wo... | [Our sun won't actually go supernova, as it do... |
| 1 | what is pump and dump scam? | [Do you remember back early in this century, h... |
| 2 | can somebody please explain cosmic microwave b... | [When the universe was very young—as in, less ... |
| 3 | Were any of the pre-Colonial societies north o... | [One of the most warlike and military active N... |
| 4 | Are most venoms designed to affect specific ty... | [Venoms are usually tailored to a species prey... |


In [115]:
def get_answer(query):
    context = query_pinecone(query, top_k=10)
    query = format_query(query, context["matches"])
    answer=generate_answer(query)
    return answer

In [116]:
small_sample=df.sample(n=10, random_state=42)

In [117]:
small_sample

| | questions | answers |
|---|---|---|
| 83 | How does light actually interact with molecules? | [It's a bit hard to answer without knowing wha... |
| 53 | What record is there for Alexander's conquest ... | [I'm afraid I have to give you a really, reall... |
| 70 | What kind of spider did I just kill? | [Is it in [this lineup](_URL_0_)?, It probably... |
| 45 | do animals suffer from shrinking gene pools? | [It's not common but it can happen if it's sev... |
| 44 | How do they keep radiation from contaminating ... | [There are separate cooling loops. The water c... |
| 39 | why does sugar cause sleepiness and in some ac... | [Generally, the energy provided is stored for ... |
| 22 | why do we use udp when tcp exists? | [UDP: entire message not guaranteed to be deli... |
| 80 | Newton's Cradle (the 5 metal pendulums swingin... | [Momentum is conserved only when there are no ... |
| 10 | Have there been and recent significant geograp... | [Those of us who draw historic maps always hav... |
| 0 | When our sun explodes into a supernova, how wo... | [Our sun won't actually go supernova, as it do... |


In [118]:
import time

# Record start time
start_time = time.time()

small_sample["Model_Answer"]=small_sample['questions'].apply(get_answer)

# Record end time
end_time = time.time()

# Calculate elapsed time
elapsed_time = end_time - start_time

print("Execution time:", (elapsed_time/60), "minutes")

Execution time: 7.824950365225474 minutes


In [119]:
small_sample

| | questions | answers | Model_Answer |
|---|---|---|---|
| 83 | How does light actually interact with molecules? | [It's a bit hard to answer without knowing wha... | Light interacts with molecules in a variety of... |
| 53 | What record is there for Alexander's conquest ... | [I'm afraid I have to give you a really, reall... | I'm not sure if this is what you're looking fo... |
| 70 | What kind of spider did I just kill? | [Is it in [this lineup](_URL_0_)?, It probably... | It's probably a spider, but I'm not sure what ... |
| 45 | do animals suffer from shrinking gene pools? | [It's not common but it can happen if it's sev... | Yes, they do. The problem is that the gene poo... |
| 44 | How do they keep radiation from contaminating ... | [There are separate cooling loops. The water c... | The water evaporates out of the top of the pla... |
| 39 | why does sugar cause sleepiness and in some ac... | [Generally, the energy provided is stored for ... | Sugar is a sugar molecule. It is a sugar molec... |
| 22 | why do we use udp when tcp exists? | [UDP: entire message not guaranteed to be deli... | It's not. UDP is a protocol that allows you to... |
| 80 | Newton's Cradle (the 5 metal pendulums swingin... | [Momentum is conserved only when there are no ... | The pendulum is not breaking. It is just movin... |
| 10 | Have there been and recent significant geograp... | [Those of us who draw historic maps always hav... | I'm not sure if this is what you're looking fo... |
| 0 | When our sun explodes into a supernova, how wo... | [Our sun won't actually go supernova, as it do... | I'm not sure if this is what you're looking fo... |


In [128]:
small_sample = small_sample.reset_index(drop=True)

In [129]:
small_sample

| | questions | answers | Model_Answer |
|---|---|---|---|
| 0 | How does light actually interact with molecules? | [It's a bit hard to answer without knowing wha... | Light interacts with molecules in a variety of... |
| 1 | What record is there for Alexander's conquest ... | [I'm afraid I have to give you a really, reall... | I'm not sure if this is what you're looking fo... |
| 2 | What kind of spider did I just kill? | [Is it in [this lineup](_URL_0_)?, It probably... | It's probably a spider, but I'm not sure what ... |
| 3 | do animals suffer from shrinking gene pools? | [It's not common but it can happen if it's sev... | Yes, they do. The problem is that the gene poo... |
| 4 | How do they keep radiation from contaminating ... | [There are separate cooling loops. The water c... | The water evaporates out of the top of the pla... |
| 5 | why does sugar cause sleepiness and in some ac... | [Generally, the energy provided is stored for ... | Sugar is a sugar molecule. It is a sugar molec... |
| 6 | why do we use udp when tcp exists? | [UDP: entire message not guaranteed to be deli... | It's not. UDP is a protocol that allows you to... |
| 7 | Newton's Cradle (the 5 metal pendulums swingin... | [Momentum is conserved only when there are no ... | The pendulum is not breaking. It is just movin... |
| 8 | Have there been and recent significant geograp... | [Those of us who draw historic maps always hav... | I'm not sure if this is what you're looking fo... |
| 9 | When our sun explodes into a supernova, how wo... | [Our sun won't actually go supernova, as it do... | I'm not sure if this is what you're looking fo... |


In [132]:
small_sample["questions"]

0     How does light actually interact with molecules?
1    What record is there for Alexander's conquest ...
2                 What kind of spider did I just kill?
3         do animals suffer from shrinking gene pools?
4    How do they keep radiation from contaminating ...
5    why does sugar cause sleepiness and in some ac...
6                   why do we use udp when tcp exists?
7    Newton's Cradle (the 5 metal pendulums swingin...
8    Have there been and recent significant geograp...
9    When our sun explodes into a supernova, how wo...
Name: questions, dtype: object

In [131]:
for i in range(10):
    print("Question:",small_sample["questions"][i])
    print("\n")
    print("Predicted Answer:",small_sample["Model_Answer"][i])
    print("\n")

Question: How does light actually interact with molecules?


Predicted Answer: Light interacts with molecules in a variety of ways, but the most common way is via the electromagnetic force. The electromagnetic force is the force between two charged particles, and it is the force that


Question: What record is there for Alexander's conquest of India from non-Hellenic sources (contemporary)?


Predicted Answer: I'm not sure if this is what you're looking for, but I can tell you that there is a lot of evidence for Alexander's conquest of India. The first thing that comes to


Question: What kind of spider did I just kill?


Predicted Answer: It's probably a spider, but I'm not sure what kind of spider you're talking about. I'm not sure what kind of spider you're talking about, but I can tell you


Question: do animals suffer from shrinking gene pools?


Predicted Answer: Yes, they do. The problem is that the gene pool is so small that it's hard to tell the difference between the two. For

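As we can see, the model can generate some decent answers, although each one is cut off at the 40-token `max_length` limit and several are vague or repetitive.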