[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/search/question-answering/abstractive-question-answering.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/search/question-answering/abstractive-question-answering.ipynb)

# Abstractive Question Answering

Abstractive question-answering focuses on the generation of multi-sentence answers to open-ended questions. It usually works by searching massive document stores for relevant information and then using this information to synthetically generate answers. This notebook demonstrates how Pinecone helps you build an abstractive question-answering system. We need three main components:

- A vector index to store passage embeddings and run semantic search
- A retriever model for embedding context passages and questions
- A generator model to generate answers

# Install Dependencies

In [None]:
# pinecone-client is pinned below 3.0 because this notebook calls pinecone.init(), which later releases removed
!pip install -qU datasets "pinecone-client<3" sentence-transformers torch
# PyMuPDF provides the `fitz` module imported below; the unrelated `fitz` package on PyPI is not needed
!pip install -qU PyMuPDF
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [None]:
import os
import re
import string
import time
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize
from nltk.corpus import stopwords
wn = WordNetLemmatizer()

# Create the 'static/' directory if it does not already exist
static_directory = 'static/'
os.makedirs(static_directory, exist_ok=True)


import fitz  # PyMuPDF
import pandas as pd

# Extract the text of every page of a PDF as a list of strings (one per page)
def extract_text_from_pdf(pdf_path):
    # the context manager closes the document once all pages have been read
    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]

# Provide a list of PDF file paths
#pdf_file_paths = ["file1.pdf", "file2.pdf", "file3.pdf","file4.pdf", "file6.pdf", "file7.pdf"]

pdf_file_paths = ["MeatLife2.pdf"]

# Create a list to store documents
docs = []

for pdf_file_path in pdf_file_paths:
    pdf_texts = extract_text_from_pdf(pdf_file_path)
    docs.append({"passage_text": pdf_texts})

# Create a pandas DataFrame with the documents
df1 = pd.DataFrame(docs)
df1['passage_text3'] = df1['passage_text'].apply(lambda x: ' '.join(x))

# Split the joined text into sentence-like passages on periods
delimiter = '.'
split_data = df1['passage_text3'].str.split(delimiter, expand=True)
# stack() yields one passage per row while keeping the index of the source row
split_data = split_data.stack().reset_index(level=1, drop=True)
split_data.name = 'passage_text2'
# Join on the index so each passage is paired with the document it came from
df = df1.join(split_data)
df['passage_text2'] = df['passage_text2'].replace('\n', ' ', regex=True)

df.head()



| | passage_text | passage_text3 | passage_text2 |
|---|---|---|---|
| 0 | [Advantages of Masan MEATLife \nafter restruct... | Advantages of Masan MEATLife \nafter restructu... | Advantages of Masan MEATLife after restructur... |
| 0 | [Advantages of Masan MEATLife \nafter restruct... | Advantages of Masan MEATLife \nafter restructu... | Accordingly, MML will be restructured to sepa... |
| 0 | [Advantages of Masan MEATLife \nafter restruct... | Advantages of Masan MEATLife \nafter restructu... | The company will transform into a business p... |
| 0 | [Advantages of Masan MEATLife \nafter restruct... | Advantages of Masan MEATLife \nafter restructu... | Masan MEATLife (a member company of Masan Gro... |
| 0 | [Advantages of Masan MEATLife \nafter restruct... | Advantages of Masan MEATLife \nafter restructu... | The company invested in the rest of the suppl... |


In [None]:
df.drop(['passage_text','passage_text3'],axis=1,inplace=True)
df.reset_index(inplace=True,drop=True)
df.head()

| | passage_text2 |
|---|---|
| 0 | Advantages of Masan MEATLife after restructur... |
| 1 | Accordingly, MML will be restructured to sepa... |
| 2 | The company will transform into a business p... |
| 3 | Masan MEATLife (a member company of Masan Gro... |
| 4 | The company invested in the rest of the suppl... |
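
Splitting on every `.` also cuts through decimals such as `1.5`. As a minimal sketch of one alternative (not what this notebook runs), a regular expression can split only where sentence-ending punctuation is followed by whitespace. The helper name `split_sentences` and the sample text are made up for illustration, and abbreviations followed by a space would still be split.

```python
import re

def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Collapse newlines and repeated spaces left over from PDF extraction
    text = re.sub(r"\s+", " ", text).strip()
    # Split on whitespace that directly follows sentence-ending punctuation;
    # "1.5" stays intact because no whitespace follows its period
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]

# Made-up sample text, not taken from the PDF
sample = "Revenue grew 1.5 times.\nThe meat is chilled. Is it traceable?"
for sentence in split_sentences(sample):
    print(sentence)
```

The resulting list could be placed in the `passage_text2` column in the same way as the period-split passages.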


In [None]:
# from transformers import AutoProcessor, Pix2StructForConditionalGeneration
# import requests
# from PIL import Image

# model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")
# processor = AutoProcessor.from_pretrained("google/deplot")
# url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
# image = Image.open(requests.get(url, stream=True).raw)

# inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
# predictions = model.generate(**inputs, max_new_tokens=512)
#print(processor.decode(predictions[0], skip_special_tokens=True))


In [None]:
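# Requires the DePlot cell above to be uncommented and run, so that `processor` and `predictions` exist
new_row = processor.decode(predictions[0], skip_special_tokens=True)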

ValueError: ignored

In [None]:
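# pd.concat only accepts pandas objects, so wrap the decoded string in a one-row DataFrame
new_row_df = pd.DataFrame({"passage_text2": [new_row]})
df = pd.concat([df, new_row_df], axis=0, ignore_index=True)
df.tail(1)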

In [None]:
df.tail(1)

# Initialize Pinecone Index

The Pinecone index stores vector representations of our passages, which we can retrieve later using another vector (the query vector). To build our vector index, we must first establish a connection with Pinecone. For this, we need an API key from Pinecone. You can get one for free [here](https://app.pinecone.io/). We then initialize the connection as follows:

In [None]:
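import os
import pinecone

# connect to pinecone environment
pinecone.init(
    # read the key from an environment variable (the name PINECONE_API_KEY is an assumption)
    # instead of hard-coding a secret in the notebook
    api_key=os.environ.get("PINECONE_API_KEY", "YOUR_API_KEY"),
    environment="gcp-starter"  # find next to API key in console
)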

Now we create a new index. We name it "abstractive-question-answering", but you can choose any name. We set the metric to "cosine" and the dimension to 768 because the retriever that generates the context embeddings is optimized for cosine similarity and outputs 768-dimensional vectors. Because the dimension is read from the retriever, the index itself is created once the retriever has been loaded.

# Initialize Retriever

In [None]:
import torch
from sentence_transformers import SentenceTransformer

# set device to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# load the retriever model from huggingface model hub
#retriever = SentenceTransformer("flax-sentence-embeddings/all_datasets_v3_mpnet-base", device=device)
retriever = SentenceTransformer("sentence-transformers/msmarco-roberta-base-v2", device=device)
#retriever = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1", device=device)
#retriever = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1", device=device)
#retriever = SentenceTransformer("sentence-transformers/all-distilroberta-v1", device=device)


retriever

SentenceTransformer(
  (0): Transformer({'max_seq_length': 250, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

In [None]:
retriever.get_sentence_embedding_dimension()

768

In [None]:
index_name = "abstractive-question-answering"

# check if the abstractive-question-answering index exists
if index_name not in pinecone.list_indexes():
    # create the index if it does not exist
    pinecone.create_index(
        index_name,
        dimension=retriever.get_sentence_embedding_dimension(),
        metric="cosine"
    )

# connect to abstractive-question-answering index we created
index = pinecone.Index(index_name)

The retriever loaded above does two things:

- Generate embeddings for all passages (context vectors/embeddings)
- Generate embeddings for our questions (query vectors/embeddings)

The retriever creates embeddings such that questions and the passages that answer them lie close to one another in the vector space. We use `msmarco-roberta-base-v2`, a SentenceTransformer model built on RoBERTa, as our retriever. As its name suggests, it was trained on the MS MARCO passage data, so it is suited to comparing queries with documents. We use cosine similarity to compare the query and context vectors generated by this model (Pinecone computes this for us because the index uses the cosine metric).
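
To make the similarity measure concrete, here is a minimal sketch of cosine similarity in plain Python. It uses toy 3-dimensional vectors rather than real 768-dimensional embeddings, and the function name is illustrative. Pinecone performs the equivalent computation on the server side.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors for illustration only
query_vec = [1.0, 0.0, 1.0]
close_vec = [2.0, 0.0, 2.0]  # same direction as the query, different length
far_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, close_vec))  # close to 1: same direction
print(cosine_similarity(query_vec, far_vec))    # 0: no shared direction
```

Because only the direction of the vectors matters, a long passage and a short question can still score highly when they are about the same thing.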

# Generate Embeddings and Upsert

In [None]:
from tqdm.auto import tqdm

batch_size = 64

for i in tqdm(range(0, len(df), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(df))
    # extract batch
    batch = df.iloc[i:i_end]
    # generate embeddings for batch
    emb = retriever.encode(batch["passage_text2"].tolist()).tolist()
    # get metadata
    meta = batch.to_dict(orient="records")
    # create unique IDs
    ids = [f"{idx}" for idx in range(i, i_end)]
    # add all to upsert list
    to_upsert = list(zip(ids, emb, meta))
    # upsert/insert these records to pinecone
    _ = index.upsert(vectors=to_upsert)

# check that we have all vectors in index
index.describe_index_stats()

{'dimension': 768,
 'index_fullness': 0.03107,
 'namespaces': {'': {'vector_count': 3107}},
 'total_vector_count': 3107}
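
The batching pattern used in the upsert loop can be isolated and checked without a Pinecone connection. The sketch below is a simplified stand-in rather than the notebook's code. `build_upsert_batches` and `fake_embed` are hypothetical names, and `fake_embed` replaces `retriever.encode(...).tolist()` with tiny made-up vectors.

```python
def build_upsert_batches(passages, embed, batch_size=64):
    """Yield lists of (id, vector, metadata) tuples, one list per batch."""
    for start in range(0, len(passages), batch_size):
        batch = passages[start:start + batch_size]
        vectors = embed(batch)
        # IDs are the global row positions, so they stay unique across batches
        yield [
            (str(start + offset), vector, {"passage_text2": text})
            for offset, (text, vector) in enumerate(zip(batch, vectors))
        ]

def fake_embed(texts):
    # Stand-in for the real retriever: one 2-dimensional vector per text
    return [[float(len(t)), 1.0] for t in texts]

passages = ["first passage", "second passage", "third passage"]
for payload in build_upsert_batches(passages, fake_embed, batch_size=2):
    print([record_id for record_id, _, _ in payload])
```

Each yielded list has the shape that the loop above passes as `vectors=` to `index.upsert`.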

# Initialize Generator

We will use ELI5 BART for the generator which is a Sequence-To-Sequence model trained using the ‘Explain Like I’m 5’ (ELI5) dataset. Sequence-To-Sequence models can take a text sequence as input and produce a different text sequence as output.

The input to the ELI5 BART model is a single string which is a concatenation of the query and the relevant documents providing the context for the answer. The documents are separated by a special token &lt;P>, so the input string will look as follows:

>question: What is a sonic boom? context: &lt;P> A sonic boom is a sound associated with shock waves created when an object travels through the air faster than the speed of sound. &lt;P> Sonic booms generate enormous amounts of sound energy, sounding similar to an explosion or a thunderclap to the human ear. &lt;P> Sonic booms due to large supersonic aircraft can be particularly loud and startling, tend to awaken people, and may cause minor damage to some structures. This led to prohibition of routine supersonic flight overland.

More detail on how the ELI5 dataset was built is available [here](https://arxiv.org/abs/1907.09190) and how ELI5 BART model was trained is available [here](https://yjernite.github.io/lfqa.html).

Let's initialize the BART model using transformers.

In [None]:
from transformers import BartTokenizer, BartForConditionalGeneration

# load bart tokenizer and model from huggingface
tokenizer = BartTokenizer.from_pretrained('vblagoje/bart_lfqa')
generator = BartForConditionalGeneration.from_pretrained('vblagoje/bart_lfqa').to(device)

# Alternative models that were tried. They stay disabled so that the BART model above is the one used.
# from transformers import T5Tokenizer, T5ForConditionalGeneration
#tokenizer = T5Tokenizer.from_pretrained('google/flan-t5-xxl') #session crashed
#generator = T5ForConditionalGeneration.from_pretrained('google/flan-t5-xxl').to(device)

# XLNetModel is the bare XLNet encoder, not a sequence-to-sequence generator, so assigning it
# here would replace the BART model that generate_answer() relies on.
# from transformers import XLNetTokenizer, XLNetModel
#tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
#generator = XLNetModel.from_pretrained('xlnet-large-cased').to(device)

All the components of our abstractive QA system are now in place and ready to be queried. But first, let's write some helper functions to retrieve context passages from the Pinecone index and to format the query the way the generator expects.

In [None]:
def query_pinecone(query, top_k):
    # generate embeddings for the query
    xq = retriever.encode([query]).tolist()
    # search pinecone index for context passage with the answer
    xc = index.query(xq, top_k=top_k, include_metadata=True)
    return xc

In [None]:
def format_query(query, context):
    # extract passage_text2 from each Pinecone match and prefix it with the <P> tag
    context = [f"<P> {m['metadata']['passage_text2']}" for m in context]
    # concatenate all context passages
    context = " ".join(context)
    # concatenate the query and the context passages
    query = f"question: {query} context: {context}"
    return query

Let's test the helper functions. We query the Pinecone index we created earlier with `query_pinecone` to get context passages, then pass them to `format_query`.

In [None]:
query = "What is MeatDeli?"
result = query_pinecone(query, top_k=1)
result

{'matches': [{'id': '49',
              'metadata': {'passage_text': ['Ideate, innovate, &\n'
                                            'Increase mileage on your cloud i\n'
                                            'Podcasts\n'
                                            'YouTube\n'
                                            'Need to know\n'
                                            'Ad\n'
                                            'Sign up for Tuoi Tre Sao\n'
                                            'September 10, 2021 15:00 GMT+7\n'
                                            'Advantages of Masan MEATLife '
                                            'after\n'
                                            'restructuring\n'
                                            'On September 10, the Board of '
                                            'Directors of Masan MEATLife Joint '
                                            'Stock Company\n'
                                    

In [None]:
from pprint import pprint

In [None]:
# format the query in the form generator expects the input
query = format_query(query, result["matches"])
pprint(query)

('question: What is MeatDeli? context: <P>  MEATDeli clean meat is processed '
 'using European cool meat technology VIDEO NEWS WORLD LAW BUSINESS TECHNOLOGY '
 'CAR TOURISM LIFESTYLE YOUNG CULTURE ENTERTAINMENT SPORT EDUCATION REAL '
 'ESTATE HEALTH REAL FAKE YO  Bạn có biết Cô gái liên tục đạp chân')
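
With a large `top_k`, the formatted string can grow beyond what the generator accepts (this notebook tokenizes with `max_length=1024`). One option, sketched below under assumptions, is to stop adding passages once a word budget is spent. The helper name `format_query_with_budget` and the default budget of 300 are illustrative, and whitespace-separated words are only a rough proxy for tokens.

```python
def format_query_with_budget(query, passages, max_words=300):
    """Build 'question: ... context: <P> ...' from passages within a word budget."""
    kept = []
    used = 0
    for passage in passages:
        n_words = len(passage.split())
        # Stop at the first passage that would exceed the budget, so the
        # highest-ranked passages (listed first) are always preferred
        if used + n_words > max_words:
            break
        kept.append(f"<P> {passage.strip()}")
        used += n_words
    return f"question: {query} context: {' '.join(kept)}"

# Passages shortened from the query results shown in this notebook
passages = [
    "MEATDeli clean meat is processed using European technology",
    "Meat has become an independent business segment",
]
print(format_query_with_budget("What is MeatDeli?", passages, max_words=10))
```

Here only the first passage fits into the 10-word budget, so the second one is left out of the prompt.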


In [None]:
def generate_answer(query):
    # tokenize the query to get input_ids, truncating inputs longer than 1024 tokens
    inputs = tokenizer([query], max_length=1024, truncation=True, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    # pretty-print the answer (the function itself returns None)
    pprint(answer)

In [None]:
generate_answer(query)

With the BART generator loaded, the call above prints an answer generated from the retrieved context. Let's run some more queries.

In [None]:
# query = "When was MeatDeli launched"
# context = query_pinecone(query, top_k=5)
# query = format_query(query, context["matches"])
# generate_answer(query)

To check whether an answer is correct, we can inspect the contexts used to generate it.

In [None]:
# for doc in context["matches"]:
#     print(doc["metadata"]["passage_text2"], end='\n---\n')

When the retrieved contexts support the answer, it is usually correct. If we ask a question and no relevant contexts are retrieved, the generator will typically return nonsensical or false answers.

In [None]:
# query = "What is MeatDeli?"
# context = query_pinecone(query, top_k=10)
# query = format_query(query, context["matches"])
# generate_answer(query)

In [None]:
# for doc in context["matches"]:
#     print(doc["metadata"]["passage_text2"], end='\n---\n')
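
Because irrelevant contexts tend to produce unsupported answers, one possible safeguard is to drop low-scoring matches before formatting the query, and to skip generation when nothing remains. The sketch below uses hand-written matches in the same shape as the results shown in this notebook. The helper name `select_contexts` and the 0.4 threshold are assumptions that would need tuning on real data.

```python
def select_contexts(matches, min_score=0.4):
    """Return the passages of matches scoring at least min_score (may be empty)."""
    return [
        m["metadata"]["passage_text2"]
        for m in matches
        if m["score"] >= min_score
    ]

# Hand-written example matches; the second one is made up to show a weak hit
matches = [
    {"id": "21", "score": 0.50,
     "metadata": {"passage_text2": "Meat has become an independent business segment"}},
    {"id": "99", "score": 0.12,
     "metadata": {"passage_text2": "Sign up for the newsletter"}},
]

contexts = select_contexts(matches, min_score=0.4)
if contexts:
    print(contexts)
else:
    print("No sufficiently similar passage found; skipping generation.")
```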

Let’s finish with a final few questions.

In [None]:
query = "When was meatdeli launched?"
context = query_pinecone(query, top_k=15)
query = format_query(query, context["matches"])
generate_answer(query)

('The first MEATDeli opened in Hanoi, Vietnam in 2015. The first MEATDeli was '
 'opened in Hanoi, Vietnam in 2015. The first MEATDel')


In [None]:
context

{'matches': [{'id': '21',
              'metadata': {'passage_text2': ' Meat has become an independent '
                                            'business segment with  a '
                                            'signiﬁcant scale of MML, '
                                            'contributing 2,068 billion VND'},
              'score': 0.49969846,
              'values': []},
             {'id': '7',
              'metadata': {'passage_text2': ' Potential of meat segment With a '
                                            'population of nearly 100 million '
                                            'people, average income is  '
                                            'continuously improving, causing '
                                            'the demand for clean, traceable '
                                            'meat  to increase in Vietnam'},
              'score': 0.485234499,
              'values': []},
             {'id': '2',
              'metadata

In [None]:
query = "What is the most profitable/sold product of MeatDeli?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('MeatDeli has a lot of products that are profitable. The most profitable '
 'product is the "Clean Meat" brand. It\'s a clean meat brand that is sold at '
 'a lot of supermarkets')


In [None]:
query = "What does MeatDeli sell?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('MeatDeli is a company that sells clean meat. They sell it in a variety of '
 'flavors and textures.')


In [None]:
query = "What type of meat MeatDeli sell?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('MeatDeli sells a variety of meats, but the most common type is "clean meat" '
 'which is meat that has been processed using European cool meat technology.')


In [None]:
query = "Is MeatDeli profitable?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

("It's not profitable, but it's profitable. The company is profitable because "
 "it has a lot of cash. It's not profitable because it has a lot of debt. It's "
 'profitable because')


In [None]:
query = "How much did MeatDeli revenue grow?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

("I'm not sure if this is what you're looking for, but I can tell you that "
 "it's not that much. The revenue of MEATDeli is about 1.5 billion")


In [None]:
query = "Where did MeatDeli invest?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('MeatDeli is a subsidiary of the Masan Group, which is a conglomerate. The '
 'Masan Group owns a lot of businesses in Vietnam, including a lot of meat '
 'processing and meat distribution')


In [None]:
query = "Give me some information about pork industry from meatdeli?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('Pork comes from pigs that are raised for meat. Pigs are raised for meat '
 'because they are easy to domesticate. Pigs are raised for meat because they '
 'are easy to domesticate. Pigs')


In [None]:
query = "Give me a summary of MeatDeli's achievements , revenue growth and profit"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

("MeatDeli is the first clean meat restaurant in the world. It's been around "
 "for a few years now, but it's the first clean meat restaurant in the US. "
 "It's been")


In [None]:
query = "Give me some Potential of meat segment"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)


("I'm not sure if this is a good question, but I'll give it a shot. The "
 "potential of meat segment in Vietnam is very high, and it's not even close "
 'to the')


In [None]:
query = "How much revenue did Meatdeli make?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

("I'm not sure if this is what you're looking for, but here's a link to a "
 "report on MEATDeli's revenue.")


In [None]:
context

{'matches': [{'id': '21',
              'metadata': {'passage_text2': ' Meat has become an independent '
                                            'business segment with  a '
                                            'signiﬁcant scale of MML, '
                                            'contributing 2,068 billion VND'},
              'score': 0.49969846,
              'values': []},
             {'id': '7',
              'metadata': {'passage_text2': ' Potential of meat segment With a '
                                            'population of nearly 100 million '
                                            'people, average income is  '
                                            'continuously improving, causing '
                                            'the demand for clean, traceable '
                                            'meat  to increase in Vietnam'},
              'score': 0.485234499,
              'values': []},
             {'id': '2',
              'metadata

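As we can see, answer quality varies. Some answers are grounded in the retrieved passages, while others are repetitive or not supported by the context (for example, the answer to "Is MeatDeli profitable?"), so it is worth checking the retrieved contexts before trusting an answer.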
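
A rough way to flag unsupported answers is to measure how many of the answer's words also occur in the retrieved contexts. The sketch below is a crude heuristic under that assumption, not an evaluation method used by this notebook. A high overlap does not prove an answer is correct, but a very low overlap suggests the generator ignored the context. The function name `token_overlap` is illustrative.

```python
import re

def token_overlap(answer, contexts):
    """Fraction of distinct answer words that also occur in the contexts."""
    def tokenize(text):
        # Lowercase alphanumeric words only; punctuation is ignored
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = tokenize(" ".join(contexts))
    return len(answer_tokens & context_tokens) / len(answer_tokens)

# Context shortened from a passage shown earlier; the answers are shortened examples
contexts = ["MEATDeli clean meat is processed using European cool meat technology"]
grounded = "MEATDeli meat is processed using European technology"
ungrounded = "Pigs are easy to domesticate"

print(token_overlap(grounded, contexts))
print(token_overlap(ungrounded, contexts))
```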

# Example Application

To try out an application like this one, see this [example application](https://huggingface.co/spaces/pinecone/abstractive-question-answering).