# DEMO FILE
##### Kyle Sang


### Loading Haystack Pipeline
Download the database file here: https://drive.google.com/file/d/1PBxwupuJB5RqyjCnmMDCRDW29MK4fw_8/view?usp=share_link
Download the FAISS file here: https://drive.google.com/file/d/1QEqdGUj5MBcvfMMv37QVfzt5Z0CmiXoe/view?usp=share_link
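
Before loading the store, it can help to confirm the downloaded files are where the notebook expects them. The following is a minimal sketch: `missing_files` is a hypothetical helper, and the two file names are taken from the `FAISSDocumentStore.load` call in the first cell. The name of the database file is not stated in this notebook, so it is not checked here.

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# File names as used in the load call below; assumed to sit in the working directory.
required = ["QAindex.faiss", "QAconfig.json"]
missing = missing_files(required)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```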

In [1]:
from haystack.document_stores.faiss import FAISSDocumentStore

new_document_store = FAISSDocumentStore.load(index_path="QAindex.faiss", config_path="QAconfig.json")



In [2]:
from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(document_store=new_document_store,
                               embedding_model="yjernite/retribert-base-uncased",
                               model_format="retribert")

Some weights of RetriBertModel were not initialized from the model checkpoint at yjernite/retribert-base-uncased and are newly initialized: ['bert_query.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
from haystack.utils import print_answers, print_documents
from haystack.pipelines import DocumentSearchPipeline

question = "Who was the father of Arya Stark?"
p_retrieval = DocumentSearchPipeline(retriever)
res = p_retrieval.run(
    query=question,
    params={"Retriever":{"top_k":10}}
)
# Concatenate the retrieved passages into one context string for the readers below.
context = ""
for document in res['documents']:
    context += document.content
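
The loop above joins the document texts with no separator, so the last word of one passage runs directly into the first word of the next. A minimal sketch of an alternative follows, with plain strings standing in for `document.content`. `join_contents` is a hypothetical helper, not part of Haystack. Adding a separator shifts character offsets, so any `start`/`end` values computed on the unseparated context would no longer line up.

```python
def join_contents(contents, separator="\n"):
    """Join document texts with a separator, skipping empty ones."""
    return separator.join(text for text in contents if text)


# Stand-in strings; in the notebook these would be the document.content values.
sample = ["Arya accompanies her father Ned.", "", "Freya Stark was born in Paris."]
context_with_breaks = join_contents(sample)
print(context_with_breaks)
```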

In [12]:
print_documents(res, max_text_len=256)


Query: Who was the father of Arya Stark?

{   'content': 'would leave.\n'
               "Although the saga author puts stock in Melkorka's royal Irish "
               "heritage, as seen through both Höskuldr's reaction to the "
               'revelation ("Hoskuld said that she had too long concealed such '
               'a noble birth.") as well as Egill Skallagrímsson\'s when '
               'spea...',
    'name': 'Jorunn Bjarnadottir'}

{   'content': 'Freya Stark Early life and studies Stark was born on 31 '
               'January 1893 in Paris, where her parents were studying art. '
               'Her mother, Flora, was an Italian of Polish-German descent; '
               'her father, Robert, an English painter from Devon. Stark spent '
               'much of her child...',
    'name': 'Freya Stark'}

{   'content': 'Stark is portrayed by English actress Maisie Williams in the '
               "television adaption of the book series, this being Williams' "
               

### Base Hugging Face Model

In [13]:
from transformers import pipeline

qa_model = pipeline("question-answering")
qa_model(question=question, context=context)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.6770011186599731, 'start': 795, 'end': 801, 'answer': 'Robert'}
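
The `start` and `end` fields in the result appear to be character offsets into the context, with `end` exclusive: here `end - start` is 6, the length of `'Robert'`. A minimal sketch of a sanity check under that assumption follows; `span_matches` is a hypothetical helper and the context is a toy string, not the retrieved one.

```python
def span_matches(context, result):
    """Return True if context[start:end] equals the reported answer."""
    return context[result["start"]:result["end"]] == result["answer"]


# Toy context and result built by hand to mirror the pipeline's output shape.
toy_context = "Her father, Robert, was an English painter."
start = toy_context.index("Robert")
toy_result = {"start": start, "end": start + len("Robert"), "answer": "Robert"}

print(span_matches(toy_context, toy_result))
```

In the notebook, the same check could be applied to the dictionary returned by `qa_model` together with `context`.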

### RoBERTa Fine-Tuned on HotpotQA

In [14]:
from transformers import RobertaForQuestionAnswering, RobertaTokenizerFast, pipeline

# Assumption: load_adapter / set_active_adapters come from the adapter-transformers
# package, which is installed in place of the stock transformers library.
model = RobertaForQuestionAnswering.from_pretrained("roberta-base")
model.load_adapter("UKP-SQuARE/HotpotQA_Adapter_RoBERTa", source="hf")
model.set_active_adapters("HotpotQA")

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Run the adapter-equipped model on the same question and concatenated context.
pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
pipe({"question": question, "context": context})

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForQuestionAnswering: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use 

{'score': 0.9350637793540955,
 'start': 4280,
 'end': 4291,
 'answer': 'Earl Sigurd'}
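
Because the context is a plain concatenation of the retrieved passages, a character offset such as `4280` can be traced back to the passage it falls in. A minimal sketch follows, assuming the passages were joined with no separator, as in the retrieval cell; `locate_document` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_document(contents, offset):
    """Return the index of the text containing character `offset`
    when `contents` are concatenated with no separator."""
    # ends[i] is the exclusive end offset of contents[i] in the joined string.
    ends = list(accumulate(len(text) for text in contents))
    if not ends or not 0 <= offset < ends[-1]:
        raise ValueError("offset is outside the concatenated context")
    return bisect_right(ends, offset)


# Toy passages: "abc" covers offsets 0-2, "de" covers 3-4, "fghi" covers 5-8.
toy_contents = ["abc", "de", "fghi"]
print(locate_document(toy_contents, 4))
```

In the notebook this might be called as `locate_document([d.content for d in res["documents"]], 4280)` to see which retrieved passage produced the answer.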

### Haystack Model

In [15]:
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

In [16]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)

In [21]:
prediction = pipe.run(
    query=question,
    params={
        "Retriever": {"top_k": 10},
        "Reader": {"top_k": 1}
    }
)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 55.42 Batches/s]


In [22]:
from haystack.utils import print_answers

print_answers(
    prediction,
    details="minimum"
)

'Query: Who was the father of Arya Stark?'
'Answers:'
[   {   'answer': 'Ned',
        'context': ' among 300 actresses across England. Season 1 Arya '
                   "accompanies her father Ned and her sister Sansa to King's "
                   "Landing. Before their departure, Arya's h"}]

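The three readers return different answers to the same question, so a side-by-side view can make the comparison easier to scan. The sketch below hard-codes the answers and scores printed in the cells above. The row labels and `format_row` are illustrative, and the Haystack score is left empty because `details="minimum"` does not print it.

```python
# (label, answer, score) copied from the outputs above; None means not shown.
results = [
    ("distilbert-base-cased-distilled-squad", "Robert", 0.6770011186599731),
    ("roberta-base + HotpotQA adapter", "Earl Sigurd", 0.9350637793540955),
    ("deepset/roberta-base-squad2 (Haystack)", "Ned", None),
]


def format_row(model, answer, score):
    """Format one comparison row with a fixed-width layout."""
    score_text = "n/a" if score is None else f"{score:.3f}"
    return f"{model:<40} {answer:<12} {score_text}"


for row in results:
    print(format_row(*row))
```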