# Natural Language Processing

*Natural Language Processing: AI-based QA System for Game of Thrones [Qualitative Analysis of Different Models]*

We have built an interactive chatbot that can hold a conversation with the user (*text-based and voice-based*) and answer factoid-type questions about the '[Game of Thrones](https://en.wikipedia.org/wiki/Game_of_Thrones)' television series.

The tool lets you do a qualitative comparison of the following NLP models: [BERT](https://huggingface.co/deepset/bert-base-cased-squad2), [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2), [ELECTRA](https://huggingface.co/deepset/electra-base-squad2), and [DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad).

[This notebook](https://github.com/BenoyRNair/NLP_AI_QA) was developed and tested in ***Google Colab***.

## Credits

Libraries:
* [Haystack by Deepset](https://haystack.deepset.ai/overview/intro)
* [gTTS](https://pypi.org/project/gTTS/)
* [pydub](https://pypi.org/project/pydub/)
* [Azure Speech Service](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/)

References:
* [BERT](https://huggingface.co/deepset/bert-base-cased-squad2)
* [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2)
* [ELECTRA](https://huggingface.co/deepset/electra-base-squad2)
* [DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad)
* [Microsoft DialoGPT](https://huggingface.co/microsoft/DialoGPT-medium)
* [Haystack Tutorial](https://haystack.deepset.ai/tutorials/first-qa-system)

## Licence
@Author [Benoy R Nair](https://github.com/BenoyRNair)

@Author Chris Pryor

@Author Yolandie van der Westhuizen

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


# Setup

Select GPU as the 'Hardware accelerator' for the runtime (in Colab: *Runtime > Change runtime type*).

This step installs the required libraries and software. It may take several minutes, and the runtime may need to be restarted a couple of times.

In [1]:
# Make sure you have a GPU running
!nvidia-smi

Sat Jun 11 13:20:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Libraries

In [2]:
!pip install --upgrade pip > /dev/null
!pip install git+https://github.com/deepset-ai/haystack.git > /dev/null
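# Optional: uncomment to install python-magic, which Haystack uses for mimetype detection of
# extensionless files (the 'magic' import error in the output below appears when it is missing)
#!pip install python-magic > /dev/null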

from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader

from ipywidgets import interactive
import ipywidgets as widgets

!pip install gtts > /dev/null

from gtts import gTTS  # Google Text-to-Speech
from IPython.display import Audio  # audio player from IPython's display module

import time
import librosa
import gc

# -y answers apt-get's confirmation prompt, since the output is discarded
!apt-get install -y libsox-fmt-all libsox-dev sox > /dev/null
!python -m pip install torchaudio > /dev/null
!python -m pip install git+https://github.com/facebookresearch/WavAugment.git > /dev/null
!pip install ffmpeg-python > /dev/null

!pip install pydub > /dev/null
from pydub import AudioSegment

  Running command git clone --filter=blob:none --quiet https://github.com/deepset-ai/haystack.git /tmp/pip-req-build-uw1w8pts

INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.github.io/apex/
ERROR - root -  Failed to import 'magic' (from 'python-magic' and 'python-magic-bin' on Windows). FileTypeClassifier will not perform mimetype detection on extensionless files. Please make sure the necessary OS libraries are installed if you need this functionality.

  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/WavAugment.git /tmp/pip-req-build-cluqr9kr

### Azure Speech Service

You can enable the Speech service and find the credentials (key and region) for your account in the [Microsoft Azure portal](https://portal.azure.com/). You can open a free account [here](https://azure.microsoft.com/en-in/free/ai/).
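Rather than pasting the key into the notebook, the credentials can be read from environment variables. A minimal sketch, assuming variables named after the notebook's own constants (`AZURE_SPEECH_KEY`, `AZURE_SERVICE_REGION`); these names are a convention chosen here, not names the Azure SDK requires:

```python
import os
from typing import Optional, Tuple


def load_azure_speech_credentials() -> Optional[Tuple[str, str]]:
    """Return (key, region) from environment variables, or None if either is missing.

    The variable names are an assumption mirroring the notebook's constants.
    """
    key = os.environ.get("AZURE_SPEECH_KEY", "").strip()
    region = os.environ.get("AZURE_SERVICE_REGION", "").strip()
    if not key or not region:
        # Same condition the notebook uses to fall back to text-only input
        return None
    return key, region
```

In Colab, `getpass.getpass()` is another option: it prompts for the key without echoing it, so the key does not end up in the saved notebook.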

In [3]:
AZURE_SPEECH_KEY = '' # Specify the Azure Speech key
AZURE_SERVICE_REGION = '' # Specify the Azure service region (e.g. 'westeurope')


if (AZURE_SPEECH_KEY.strip() == '' or AZURE_SERVICE_REGION.strip() == ''):
  print ("The tool will be limited to text-based chat and text-to-speech capability.")
  print ("You need to define AZURE_SPEECH_KEY and AZURE_SERVICE_REGION to enable speech-to-text in this tool.")
else:
  !pip3 install azure-cognitiveservices-speech > /dev/null


## Utilities

### Microphone Input

Utility function for capturing microphone input in the browser (Google Colab only, as it relies on `google.colab.output.eval_js`).

In [4]:
# code taken from https://ricardodeazambuja.com/deep_learning/2019/03/09/audio_and_video_google_colab/
from IPython.display import HTML, Audio, display
from google.colab.output import eval_js
from base64 import b64decode
import numpy as np
import io
import ffmpeg
import tempfile
import pathlib
import torchaudio

AUDIO_INPUT_FILE = "audio-input.wav"

AUDIO_HTML = """
<script>
var my_div = document.createElement("DIV");
var my_p = document.createElement("P");
var my_btn = document.createElement("BUTTON");
var t = document.createTextNode("Press to start recording");

my_btn.appendChild(t);
//my_p.appendChild(my_btn);
my_div.appendChild(my_btn);
document.body.appendChild(my_div);

var base64data = 0;
var reader;
var recorder, gumStream;
var recordButton = my_btn;

var handleSuccess = function(stream) {
  gumStream = stream;
  var options = {
    //bitsPerSecond: 8000, //chrome seems to ignore, always 48k
    mimeType : 'audio/webm;codecs=opus'
    //mimeType : 'audio/webm;codecs=pcm'
  };            
  //recorder = new MediaRecorder(stream, options);
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function(e) {            
    var url = URL.createObjectURL(e.data);
    var preview = document.createElement('audio');
    preview.controls = true;
    preview.src = url;
    document.body.appendChild(preview);

    reader = new FileReader();
    reader.readAsDataURL(e.data); 
    reader.onloadend = function() {
      base64data = reader.result;
      //console.log("Inside FileReader:" + base64data);
    }
  };
  recorder.start();
  };

recordButton.innerText = "Recording... press to stop";

navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);


function toggleRecording() {
  if (recorder && recorder.state == "recording") {
      recorder.stop();
      gumStream.getAudioTracks()[0].stop();
      recordButton.innerText = "Saving the recording... please wait!"
  }
}

// https://stackoverflow.com/a/951057
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

var data = new Promise(resolve=>{
//recordButton.addEventListener("click", toggleRecording);
recordButton.onclick = ()=>{
toggleRecording()

sleep(2000).then(() => {
  // wait 2000ms for the data to be available...
  // ideally this should use something like await...
  //console.log("Inside data:" + base64data)
  resolve(base64data.toString())

  sleep (1000).then(() => {
    recordButton.innerText = "Recording saved."
  });

});

}
});
      
</script>
"""

def get_audio():
  """Record audio in the browser, convert it to WAV and save it as AUDIO_INPUT_FILE.

  Returns the waveform tensor and its sample rate, as loaded by torchaudio.
  """
  display(HTML(AUDIO_HTML))
  data = eval_js("data")
  # The JavaScript promise resolves to a data URL: "data:<mime>;base64,<payload>"
  binary = b64decode(data.split(',')[1])

  # Convert the browser recording (e.g. WebM/Opus) to WAV with ffmpeg, via pipes
  process = (ffmpeg
    .input('pipe:0')
    .output('pipe:1', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
  )
  output, err = process.communicate(input=binary)

  # Writing to a pipe, ffmpeg cannot seek back to fill in the RIFF chunk size,
  # so patch bytes 4:8 of the header with the real size (little-endian, 4 bytes).
  riff_chunk_size = len(output) - 8
  riff = output[:4] + riff_chunk_size.to_bytes(4, 'little') + output[8:]

  with tempfile.TemporaryDirectory() as tmpdirname:
    path = pathlib.Path(tmpdirname) / AUDIO_INPUT_FILE
    with open(path, 'wb') as f:
      f.write(riff)

    x, sr = torchaudio.load(path)

    # Re-export through pydub to write a clean WAV file into the working directory,
    # where azure_batch_stt() reads it; the input format is auto-detected by ffmpeg.
    sound = AudioSegment.from_file(path)
    sound.export(AUDIO_INPUT_FILE, format="wav")

  return x, sr
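
When ffmpeg writes WAV to a pipe it cannot seek back to fill in the header sizes, which is why `get_audio()` rewrites bytes 4 to 8 of its output. The same fix, isolated as a hypothetical helper (`patch_riff_chunk_size`, not part of any library) that uses `struct` for the little-endian packing so it can be checked on its own:

```python
import struct


def patch_riff_chunk_size(wav_bytes: bytes) -> bytes:
    """Rewrite the RIFF chunk-size field (bytes 4..8) to match the actual data length.

    In a RIFF/WAV file this field holds the total size minus 8 bytes
    (the 'RIFF' tag plus the size field itself), as a little-endian uint32.
    """
    if wav_bytes[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    return wav_bytes[:4] + struct.pack("<I", len(wav_bytes) - 8) + wav_bytes[8:]
```

Only the RIFF size is patched this way; the `data` chunk size left by ffmpeg is not, but re-exporting the audio through pydub, as `get_audio()` does, writes a fresh header.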

### Azure Speech-to-Text API

Single-shot recognition (`recognize_once()`) is used here, since the microphone input is first captured and saved as a WAV file.

In [5]:
if (AZURE_SPEECH_KEY.strip() == '' or AZURE_SERVICE_REGION.strip() == ''):
  print ("The tool will be limited to text-based chat and text-to-speech capability.")
  print ("You need to define AZURE_SPEECH_KEY and AZURE_SERVICE_REGION to enable speech-to-text in this tool.")
else:
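  import azure.cognitiveservices.speech as speechsdk
  from typing import Optional

  def azure_batch_stt(filename: str, lang: str, encoding: str) -> Optional[str]:
      """Recognize a single utterance from a WAV file with the Azure Speech SDK.

      `encoding` is currently unused: the WAV header already describes the audio format.
      Returns None if no speech could be recognized.
      """
      speech_config = speechsdk.SpeechConfig(
          subscription=AZURE_SPEECH_KEY,
          region=AZURE_SERVICE_REGION
      )
      # Apply the requested language (previously the `lang` argument was ignored)
      speech_config.speech_recognition_language = lang
      audio_input = speechsdk.AudioConfig(filename=filename)
      speech_recognizer = speechsdk.SpeechRecognizer(
          speech_config=speech_config,
          audio_config=audio_input
      )
      # recognize_once() returns after the first utterance in the file
      result = speech_recognizer.recognize_once()

      return result.text if result.reason == speechsdk.ResultReason.RecognizedSpeech else None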
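
After `get_audio()` has written `AUDIO_INPUT_FILE`, its format can be checked before it is sent to Azure. A minimal sketch with the standard-library `wave` module; `describe_wav` is a hypothetical helper, and it assumes the file is uncompressed PCM WAV (the `wave` module raises `wave.Error` for other encodings):

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format details from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from the frame count and the sample rate
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav(AUDIO_INPUT_FILE)` shows the sample rate and channel count of the last recording.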

## Elasticsearch

In [6]:
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30

from haystack.document_stores import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

INFO - haystack.telemetry -  Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by calling disable_telemetry() or by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems on the documentation page. More information at https://haystack.deepset.ai/guides/telemetry
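
The fixed `! sleep 30` above assumes Elasticsearch is ready after 30 seconds. A sketch of polling instead, using only the standard library; `wait_for_http` is a hypothetical helper, and it assumes Elasticsearch listens on its default HTTP port 9200 on localhost, as the `ElasticsearchDocumentStore(host="localhost")` call implies:

```python
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200, or give up after `timeout` seconds.

    Returns True if the service became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, connection refused/reset and socket timeouts are all OSErrors:
            # the server is not up yet, so try again after a short pause.
            pass
        time.sleep(interval)
    return False
```

Usage in the cell above would be `wait_for_http("http://localhost:9200")` in place of the fixed sleep, failing fast if it returns False.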


## Knowledge Base

In [7]:
# Let's first fetch some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial1"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert the text files to Haystack Document objects.
# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers);
# it must take a str as input and return a str.
docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)

# We now have a list of Documents that we can write to our document store.
# If your texts come from a different source (e.g. a DB), you can skip convert_files_to_docs()
# and pass dictionaries in the following format to write_documents() instead:
# {
#    'content': "<DOCUMENT_TEXT_HERE>",
#    'meta': {'name': "<DOCUMENT_NAME_HERE>", ...}
# }
# (Optionally, you can add more key-value pairs to 'meta'; they will be indexed as fields in Elasticsearch
# and can be used later for filtering or shown in the responses of the Pipeline.)

# Let's have a look at the first 3 entries:
#print(docs[:3])

# Now, write the documents to the document store.
document_store.write_documents(docs)

INFO - haystack.utils.import_utils -  Fetching from https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip to `data/tutorial1`
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/53_The_Lion_and_the_Rose.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/408_The_Last_of_the_Starks.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/360_List_of_Game_of_Thrones_episodes.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/22_The_Rains_of_Castamere__song_.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/136_Game_of_Thrones__Season_8__soundtrack_.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/460_Battle_of_the_Bastards.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/75_Blackwater__Game_of_Thrones_.txt
INFO - haystack.utils.preprocessing -  Converting data/tutorial1/85_Game_of_Thrones__Seven_Kingdoms.txt
INFO - haysta
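
The cell above notes that documents from other sources can be written as dictionaries. A sketch of building them from in-memory `(name, text)` pairs, splitting on blank lines in roughly the way `split_paragraphs=True` does; `to_haystack_dicts` is a hypothetical helper, not part of Haystack:

```python
from typing import Dict, Iterable, List, Tuple


def to_haystack_dicts(articles: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Turn (name, text) pairs into {'content': ..., 'meta': {'name': ...}} dictionaries.

    Each non-empty paragraph (separated by a blank line) becomes one document.
    """
    docs = []
    for name, text in articles:
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:  # skip empty or whitespace-only paragraphs
                docs.append({"content": paragraph, "meta": {"name": name}})
    return docs
```

The resulting list can then be passed to `document_store.write_documents(...)` in place of `docs`.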

# Models

## Selection

In [8]:
models_dict = {
    'BERT': 'deepset/bert-base-cased-squad2',
    'RoBERTa': 'deepset/roberta-base-squad2',
    'ELECTRA': 'deepset/electra-base-squad2',
    'DistilBERT': 'distilbert-base-uncased-distilled-squad',
}

In [9]:
#@markdown Specify the model.
def model(MODEL):
  return MODEL

model_widget = interactive (model, MODEL=models_dict.keys())
display (model_widget)

selected_model = ''


## Initialization

In [10]:
from haystack.nodes import BM25Retriever
retriever = BM25Retriever(document_store=document_store)

reader = FARMReader(model_name_or_path=models_dict[model_widget.result], use_gpu=True)
selected_model = model_widget.result

from haystack.pipelines import ExtractiveQAPipeline
pipe = ExtractiveQAPipeline(reader, retriever)

import transformers
nlp = transformers.pipeline("conversational", model="microsoft/DialoGPT-medium")
os.environ["TOKENIZERS_PARALLELISM"] = "true"

gc.collect()

INFO - haystack.modeling.utils -  Using devices: CUDA:0
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find deepset/roberta-base-squad2 locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...


INFO - haystack.modeling.model.language_model -  Loaded deepset/roberta-base-squad2
INFO - haystack.modeling.utils -  Using devices: CUDA
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.infer -  Got ya 2 parallel workers to do inference ...
INFO - haystack.modeling.infer -   0     0  
INFO - haystack.modeling.infer -  /w\   /w\ 
INFO - haystack.modeling.infer -  /'\   / \ 

49

## Configuration

In [11]:
#@markdown Specify whether you want to talk to the bot

def checkbox_talk_to_bot (TALK_TO_BOT):
    return TALK_TO_BOT

cb_talk_to_bot = interactive (checkbox_talk_to_bot, TALK_TO_BOT=False)

if (AZURE_SPEECH_KEY.strip() == '' or AZURE_SERVICE_REGION.strip() == ''):
  print ("The tool will be limited to text-based chat and text-to-speech capability.")
  print ("You need to define AZURE_SPEECH_KEY and AZURE_SERVICE_REGION to enable speech-to-text in this tool.")
else:
  display (cb_talk_to_bot)


In [12]:
#@markdown Specify whether you want the bot to read its responses out loud

def checkbox_play_out_loud (PLAY_OUT_LOUD):
    return PLAY_OUT_LOUD

cb_play_out_loud = interactive (checkbox_play_out_loud, PLAY_OUT_LOUD=False)
display (cb_play_out_loud)


# Bot

In [13]:
AUDIO_OUTPUT_FILE = "audio-output.mp3"  # gTTS always writes MP3 data

if (selected_model != model_widget.result):
  print ('It looks like you have selected a different model now.\nRun the \'Models > Initialization\' section above.')
else:
  ###############################################################################
  #                             CHATBOT                                         #
  ###############################################################################

  ## for data
  import os
  import datetime
  import numpy as np
  import re

  from IPython.utils import io

  # Build the AI
  class ChatBot():
      def __init__(self, name):
        print ("--- starting up", name, "---")
        print ("\nYou can write to me or talk to me.")
        print ("Type (or say) \'bye\' to exit the chat...\n")
        self.name = name
        self.bot_running = False

      def wake_up(self, text):
        return self.name.lower() in text.lower()

      def read_text (self):
        self.text = input("You: ")

      def listen_speech (self):
        x, sr = get_audio()
        self.text = azure_batch_stt (AUDIO_INPUT_FILE, 'en-US', 'LINEAR16')

        if not self.text:
          self.text = ""
        
        print ("You: " + self.text)

      @staticmethod
      def action_time():
        return datetime.datetime.now().time().strftime('%H:%M')
      
      def check_and_play_out_loud(self, text):
        if cb_play_out_loud.result:
          tts = gTTS (text) 
          tts.save (AUDIO_OUTPUT_FILE)
          wn = Audio (AUDIO_OUTPUT_FILE, autoplay = True)
          display (wn)
          time.sleep (librosa.get_duration (filename = AUDIO_OUTPUT_FILE))

      def output (self, text):
        # Replace empty or punctuation-only responses with a canned reply
        if not re.search ('[a-zA-Z0-9]', text):
            text = 'I got no comment!'

        print ("\tBot:", text)
        self.check_and_play_out_loud (text)

      def run_bot (self):
        while self.bot_running:
          require_conversation = True

          if cb_talk_to_bot.result:
            # listen to input speech
            self.listen_speech()
          else:
            # read from input text
            self.read_text()

          # normalized copy of the input, used for keyword matching
          text = self.text.strip().lower()

          ## no input
          if not text:
            res = "Is everything alright? You are so quiet today."

          ## wake up
          elif self.wake_up(text):
            res = "Hello, I am " + self.name + "; what can I do for you?"

          ## action time
          elif "time" in text:
            res = "The time is " + self.action_time()

          ## name
          elif any (i in text for i in ['your name', 'who are you', 'what are you']):
            res = "Hello, I'm " + self.name

          ## respond politely
          elif any (i in text for i in ["thank you", "thanks"]):
            res = np.random.choice(["you're welcome!",
                                    "anytime!",
                                    "no problem!",
                                    "cool!",
                                    "I'm here if you need me!",
                                    "peace out!"])

          ## exit
          elif any (i in text for i in ['bye', 'good night', 'see ya', 'see you', 'exit', 'c ya', 'cya']):
            self.output ('You have a good one.')
            break

          ## question answering, with a chit-chat fallback
          else:
            # suppress the pipeline's progress-bar output
            with io.capture_output() as captured:
              prediction = pipe.run (query = self.text, params = {"Retriever": {"top_k": 20}, "Reader": {"top_k": 5}})

            # use the top extractive answer only if the reader is confident enough
            if prediction['answers']:
              top_answer = prediction['answers'][0]
              if top_answer.score >= 0.85:
                res = top_answer.answer
                require_conversation = False

            # otherwise let DialoGPT reply conversationally
            if require_conversation:
              chat = nlp(transformers.Conversation(self.text), pad_token_id=50256)
              res = chat.generated_responses[-1].strip()

          self.output (res)
        
  # Run the AI
  if __name__ == "__main__":
      
      ai = ChatBot (name = "GoT Bot")

      if cb_talk_to_bot.result:
        btn = widgets.Button(description='Start')

        def button_eventhandler(obj):
          if not ai.bot_running:
            btn.layout.visibility = 'hidden'
            ai.bot_running = True
            ai.run_bot()

        btn.on_click(button_eventhandler)
        display(btn)
      else:
        ai.bot_running = True
        ai.run_bot()

--- starting up GoT Bot ---

You can write to me or talk to me.
Type (or say) 'bye' to exit the chat...

You: hi
	Bot: Hey! :D


You: who is arya stark's father?
	Bot: Eddard


You: Who is Jon's direwolf?
	Bot: Ghost


You: bye
	Bot: You have a good one.
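
The bot answers with the extractive pipeline only when the reader's top answer scores at least 0.85, and otherwise falls back to DialoGPT. That decision, isolated from Haystack and transformers so it can be tested, might look like this; `Answer` is a hypothetical stand-in with just the two fields the logic uses, and `chit_chat` stands for the DialoGPT call:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    """Hypothetical stand-in for a reader answer (only the fields used here)."""
    answer: str
    score: float


def route_reply(answers: List[Answer], chit_chat: Callable[[], str],
                threshold: float = 0.85) -> str:
    """Return the top extractive answer if it is confident enough, else a chit-chat reply.

    Answers are assumed to be sorted best-first, as in the pipeline's output.
    """
    if answers and answers[0].score >= threshold:
        return answers[0].answer
    return chit_chat()
```

Passing the fallback as a callable means the (comparatively slow) conversational model only runs when the reader is not confident enough.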


# Qualitative Assessment

A test set of the following twenty questions was used to assess the performance of the models.

Each correct response earned positive marks and each wrong response negative marks. The marks were also weighted by the confidence score the model reported for each response, and by the model's performance relative to the other models on each question.
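
The exact marks and weights are in the scoring worksheet and are not reproduced here. Purely as an illustration of how a scheme can combine correctness, confidence and relative performance, here is a hypothetical scoring function; the 0.5 weights and the linear rank bonus are arbitrary assumptions, not the values used in this assessment:

```python
from typing import Dict, List


def score_response(correct: bool, confidence: float, rank: int, n_models: int) -> float:
    """Hypothetical per-question score for one model.

    correct:    whether the model's answer matched the expected answer
    confidence: the model's confidence score for that answer, in [0, 1]
    rank:       the model's position among all models on this question (1 = best)
    """
    sign = 1.0 if correct else -1.0
    # Confident answers weigh more in both directions: right answers earn more,
    # wrong answers cost more. The 0.5 floor is an arbitrary illustrative choice.
    confidence_weight = 0.5 + 0.5 * confidence
    # Relative-performance bonus: 0.5 for the best-ranked model, 0 for the worst.
    relative_bonus = 0.5 * (n_models - rank) / max(n_models - 1, 1)
    return sign * confidence_weight + relative_bonus


def rank_models(totals: Dict[str, float]) -> List[str]:
    """Order model names from highest to lowest total score."""
    return sorted(totals, key=totals.get, reverse=True)
```

Summing `score_response` over the questions for each model and passing the totals to `rank_models` yields an ordering like the one below.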

The assessment scored the models in the following order (from best performing to least performing):

1. [ELECTRA](https://huggingface.co/deepset/electra-base-squad2)
2. [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2)
3. [BERT](https://huggingface.co/deepset/bert-base-cased-squad2)
4. [DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad)

Refer to the [scoring worksheet](https://github.com/BenoyRNair/NLP_AI_QA/blob/main/Assessment/GoT_QnA_Scoring.xlsx) for details of the assessment.

In [None]:
# Questions & answers sourced from https://www.funtrivia.com/en/Television/Game-of-Thrones-20275.html
# The correct answer to each question is included as a comment after each question.

query_txt = "Who is the father of Arya Stark?" #Eddard Stark
#query_txt = "Which chemical was used during the Battle of the Blackwater to destroy Stannis Baratheon's fleet?" #Wildfire
#query_txt = "In which Westeros constituency can we find The Dreadfort?" #The North
#query_txt = "Who is Myrcella betrothed to in season 2?" #Trystane Martell
#query_txt = "Who is Jon's direwolf?" #Ghost
#query_txt = "Which name is given to the bastards of The Reach?" #Flowers
#query_txt = "Which is the main color of House Tarly's sigil?" #Green
#query_txt = "The Water Gardens belong to which constituency of Westeros?" #Dorne
#query_txt = "What nasty creature felled a drunken King Robert?" #Boar
#query_txt = "What is the name of Arya Stark's sword?" #Needle
#query_txt = "What is the motto of House Tyrell?" #Growing Strong
#query_txt = "What is the nickname given to Brynden Tully?" #Blackfish
#query_txt = "Where can we find the Moon Door?" #The Eyrie
#query_txt = "Where was Catelyn Stark when she died?" #The Twins
#query_txt = "Who was Shagga's father?" #Dolf
#query_txt = "Which name is given to the inhabitants of The Neck?" #Crannogmen
#query_txt = "Who did Jon execute after his first general meeting as Lord Commander with the men of the Night's Watch?" #Janos Slynt
#query_txt = "Which house has a sigil of a silver trout?" #House Tully
#query_txt = "What is the name of the ancestral Valyrian steel sword belonging to House Tarly?" #Heartsbane
#query_txt = "Who was Hand of the King before Eddard Stark?" #Jon Arryn

prediction = pipe.run(
    query= query_txt, params={"Retriever": {"top_k": 20}, "Reader": {"top_k": 5}}
)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  3.82 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  9.18 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.63 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  4.58 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  4.87 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 10.27 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 32.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 29.73 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 22.41 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 18.98 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 13.61 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  8.30 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 11.56 Batches/s
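
Rather than commenting and uncommenting one `query_txt` at a time, the twenty questions can be run in one loop. A sketch, assuming the question/answer pairs from the comments above are collected into a list; `run_test_set` is hypothetical, and its substring-based `match` flag is only a rough aid for the manual assessment, not the scoring used in the worksheet:

```python
from typing import Callable, Dict, List, Tuple


def run_test_set(run_query: Callable[..., Dict], qa_pairs: List[Tuple[str, str]],
                 retriever_top_k: int = 20, reader_top_k: int = 5) -> List[Dict]:
    """Ask every question once and record the top answer and its score.

    `run_query` is expected to behave like the pipeline's run(query=..., params=...).
    """
    rows = []
    for question, expected in qa_pairs:
        prediction = run_query(
            query=question,
            params={"Retriever": {"top_k": retriever_top_k}, "Reader": {"top_k": reader_top_k}},
        )
        answers = prediction["answers"]
        top = answers[0] if answers else None
        predicted = top.answer if top else ""
        rows.append({
            "question": question,
            "expected": expected,
            "predicted": predicted,
            "score": top.score if top else 0.0,
            # Loose check: one answer contains the other, ignoring case
            "match": bool(predicted) and (expected.lower() in predicted.lower()
                                          or predicted.lower() in expected.lower()),
        })
    return rows
```

For example, `run_test_set(pipe.run, [("Who is the father of Arya Stark?", "Eddard Stark"), ("Who is Jon's direwolf?", "Ghost")])` returns one row per question, which can be copied into the scoring worksheet.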

In [None]:
print_answers(prediction, details="minimum")


Query: What is the nickname given to Brynden Tully?
Answers:
[   {   'answer': 'Blackfish',
        'context': " meet with Littlefinger, who reveals that Sansa's "
                   'great-uncle Brynden "Blackfish" Tully has captured House '
                   "Tully's home Riverrun from the Freys. When "},
    {   'answer': 'Blackfish',
        'context': 'l, where the young king is reunited with his great-uncle, '
                   'Ser Brynden "Blackfish" Tully, and his uncle, Edmure '
                   'Tully, the new lord of Riverrun. While '},
    {   'answer': 'Blackfish',
        'context': " in retaking Winterfell and mentions that Sansa's "
                   'great-uncle Brynden "Blackfish" Tully has seized Riverrun '
                   "from the Freys. Sansa refuses Baelish's of"},
    {   'answer': 'Blackfish',
        'context': '\n'
                   '====Season 6====\n'
                   'After Brynden "Blackfish" Tully captures Riverrun from '
                   

In [None]:
from pprint import pprint

print(prediction)

{'query': 'What is the nickname given to Brynden Tully?', 'no_ans_gap': 37.75481033325195, 'answers': [<Answer {'answer': 'Blackfish', 'type': 'extractive', 'score': 0.9999924302101135, 'context': ' meet with Littlefinger, who reveals that Sansa\'s great-uncle Brynden "Blackfish" Tully has captured House Tully\'s home Riverrun from the Freys. When ', 'offsets_in_document': [{'start': 875, 'end': 884}], 'offsets_in_context': [{'start': 71, 'end': 80}], 'document_id': 'dc6ed243e3d72f4a3748735a4049ccd', 'meta': {'name': '488_Brienne_of_Tarth.txt'}}>, <Answer {'answer': 'Blackfish', 'type': 'extractive', 'score': 0.9999918937683105, 'context': 'l, where the young king is reunited with his great-uncle, Ser Brynden "Blackfish" Tully, and his uncle, Edmure Tully, the new lord of Riverrun. While ', 'offsets_in_document': [{'start': 1816, 'end': 1825}], 'offsets_in_context': [{'start': 71, 'end': 80}], 'document_id': '500af9c33e7e9c60a892a74756e9e14a', 'meta': {'name': '349_List_of_Game_of_Thro