In [1]:
%%bash

pip install --upgrade pip
pip install farm-haystack[colab,inference]

Collecting pip
  Downloading pip-24.0-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 8.0 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-24.0
Collecting farm-haystack[colab,inference]
  Downloading farm_haystack-1.25.2-py3-none-any.whl.metadata (27 kB)
Collecting boilerpy3 (from farm-haystack[colab,inference])
  Downloading boilerpy3-1.0.7-py3-none-any.whl.metadata (5.8 kB)
Collecting events (from farm-haystack[colab,inference])
  Downloading Events-0.5-py3-none-any.whl.metadata (3.9 kB)
Collecting httpx (from farm-haystack[colab,inference])
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting lazy-imports==0.3.1 (from farm-haystack[colab,inference])
  Downloading lazy_imports-0.3.1-py3-none-any.whl.metadata (10 kB)
Collecting posthog (from farm-haystack[colab



In [2]:
from haystack.telemetry import tutorial_running

tutorial_running(1)

Set the logging level to INFO for the `haystack` logger (all other loggers stay at WARNING):

In [3]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)

In [4]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_bm25=True)

INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
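
Passing `use_bm25=True` asks the store to maintain a BM25 index, which the keyword-based Retriever uses later to rank documents against a query. For intuition only, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not Haystack's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenizer, and the three toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document (toy whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            tf = term_freq[term]
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Toy sentences made up for this sketch; they are not taken from the dataset.
docs = [
    "Arya Stark is the daughter of Eddard Stark",
    "The Dothraki language was created for the series",
    "Sansa Stark travels to the capital",
]
scores = bm25_scores("father of Arya Stark", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

A document that shares no terms with the query scores zero, which is why a keyword retriever of this kind cannot match a question to a passage that only uses synonyms.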


The DocumentStore is now ready. Now it's time to fill it with some Documents.

## Preparing Documents

1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in *data/build_your_first_question_answering_system* as a set of *.txt* files.

In [5]:
from haystack.utils import fetch_archive_from_http

doc_dir = "data/build_your_first_question_answering_system"

fetch_archive_from_http(
    url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip",
    output_dir=doc_dir,
)

INFO:haystack.utils.import_utils:Fetching from https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip to 'data/build_your_first_question_answering_system'


True

In [6]:
import os
from haystack.pipelines.standard_pipelines import TextIndexingPipeline

files_to_index = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]
indexing_pipeline = TextIndexingPipeline(document_store)
indexing_pipeline.run_batch(file_paths=files_to_index)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
INFO:haystack.pipelines.base:It seems that an indexing Pipeline is run, so using the nodes' run method instead of run_batch.
Converting files: 100%|██████████| 183/183 [00:01<00:00, 111.47it/s]
Preprocessing: 100%|██████████| 183/183 [00:03<00:00, 57.73docs/s]
Updating BM25 representation...: 100%|██████████| 2359/2359 [00:00<00:00, 4506.20 docs/s]


{'documents': [<Document: {'content': '\n\n"\'\'\'The Dragon and the Wolf\'\'\'" is the seventh and final episode of the seventh season of HBO\'s fantasy television series \'\'Game of Thrones\'\', and the 67th episode overall. It was written by series co-creators David Benioff and D. B. Weiss, and directed by Jeremy Podeswa.\n\nThe episode\'s plot includes a negotiation between Cersei and Daenerys, and a rift between Cersei and Jaime; Theon rededicates himself to Yara; Sansa and Arya unite against Littlefinger; Jon Snow is revealed to be the child of Lyanna Stark and Rhaegar Targaryen; Jon and Daenerys\'s romantic relationship comes to fruition; and the Army of the Dead penetrates the Wall. "The Dragon and the Wolf" received a positive reception from critics, who listed the meeting at the Dragonpit, the full revelation of Jon Snow\'s lineage, Cersei\'s lack of cooperation to defeat the White Walkers, Aidan Gillen\'s final performance as Littlefinger, and the demolition of the Wall as h
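
The progress bars above show 183 converted files turning into 2359 entries in the BM25 index, which suggests that the indexing pipeline splits each file into shorter passages before storing them. A rough sketch of that kind of splitting follows, assuming fixed-size word windows. The function name `split_into_passages` and the window size of 200 words are illustrative assumptions, not values read from the pipeline.

```python
def split_into_passages(text, words_per_passage=200):
    """Cut `text` into consecutive passages of at most `words_per_passage` words."""
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]


# A synthetic 450-word "article" stands in for one of the .txt files.
sample = " ".join(f"w{i}" for i in range(450))
passages = split_into_passages(sample)
print(len(passages), [len(p.split()) for p in passages])
```

Shorter passages keep each unit small enough for the Reader to process and make the Retriever's ranking more precise than scoring whole articles.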

In [7]:
from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store)

The Retriever is ready, but we still need to initialize the Reader.

In [13]:
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0
INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.
INFO:haystack.modeling.utils:Using devices: CPU - Number of GPUs: 0


We've initialized all the components for our pipeline. We're now ready to create the pipeline.

In [9]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)

The pipeline is ready. You can now go ahead and ask a question!

In [10]:
prediction = pipe.run(
    query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

Inferencing Samples: 100%|██████████| 1/1 [00:18<00:00, 18.70s/ Batches]


Here are some questions you could try:
- Who is the father of Arya Stark?
- Who created the Dothraki vocabulary?
- Who is the sister of Sansa?
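
To try all of these questions in one go, a small helper along the following lines may be handy. This is a sketch: `ask_all` is a hypothetical name, and it only assumes that the pipeline object exposes `run(query=..., params=...)` as used in the cell above. The `_EchoPipeline` class is a stand-in so that the snippet runs without loaded models; in the notebook you would pass `pipe` instead.

```python
def ask_all(pipeline, questions, retriever_top_k=10, reader_top_k=5):
    """Run every question through `pipeline` and collect the results by question."""
    params = {
        "Retriever": {"top_k": retriever_top_k},  # candidate documents to fetch
        "Reader": {"top_k": reader_top_k},        # answers to return per question
    }
    return {question: pipeline.run(query=question, params=params) for question in questions}


class _EchoPipeline:
    """Stand-in for the real pipeline; it returns no answers, only echoes its input."""

    def run(self, query, params):
        return {"query": query, "answers": [], "params": params}


questions = [
    "Who is the father of Arya Stark?",
    "Who created the Dothraki vocabulary?",
    "Who is the sister of Sansa?",
]
results = ask_all(_EchoPipeline(), questions)
print(list(results))
```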

2. Print out the answers the pipeline returned:

In [11]:
from pprint import pprint

pprint(prediction)

{'answers': [<Answer {'answer': 'Eddard', 'type': 'extractive', 'score': 0.993372917175293, 'context': "s Nymeria after a legendary warrior queen. She travels with her father, Eddard, to King's Landing when he is made Hand of the King. Before she leaves,", 'offsets_in_document': [{'start': 207, 'end': 213}], 'offsets_in_context': [{'start': 72, 'end': 78}], 'document_ids': ['9e3c863097d66aeed9992e0b6bf1f2f4'], 'meta': {'_split_id': 3}}>,
             <Answer {'answer': 'Ned', 'type': 'extractive', 'score': 0.9753612279891968, 'context': "k in the television series.\n\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before their departure, Arya's h", 'offsets_in_document': [{'start': 630, 'end': 633}], 'offsets_in_context': [{'start': 74, 'end': 77}], 'document_ids': ['7d3360fa29130e69ea6b2ba5c5a8f9c8'], 'meta': {'_split_id': 10}}>,
             <Answer {'answer': 'Lord Eddard Stark', 'type': 'extractive', 'score': 0.9177317023277283, 'context':
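
As the output above shows, each entry in `prediction["answers"]` carries an `answer`, a `score`, and a `context`. If you only need the answer text and its score, a helper like the one below could do. It is a sketch: `top_answers` and `min_score` are hypothetical names, and the `SimpleNamespace` objects are stand-ins that mimic the printed fields (with scores copied from the output above) so that the snippet runs on its own.

```python
from types import SimpleNamespace


def top_answers(prediction, min_score=0.0):
    """Return (answer text, rounded score) pairs with a score of at least `min_score`."""
    return [
        (ans.answer, round(ans.score, 3))
        for ans in prediction["answers"]
        if ans.score >= min_score
    ]


# Stand-in for the real prediction; in the notebook, pass `prediction` instead.
fake_prediction = {
    "answers": [
        SimpleNamespace(answer="Eddard", score=0.993372917175293),
        SimpleNamespace(answer="Ned", score=0.9753612279891968),
        SimpleNamespace(answer="Lord Eddard Stark", score=0.9177317023277283),
    ]
}
print(top_answers(fake_prediction, min_score=0.95))
```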

3. Simplify the printed answers:

In [12]:
from haystack.utils import print_answers

print_answers(prediction, details="minimum")  # Choose from `minimum`, `medium`, and `all`

'Query: Who is the father of Arya Stark?'
'Answers:'
[   {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. She travels '
                   "with her father, Eddard, to King's Landing when he is made "
                   'Hand of the King. Before she leaves,'},
    {   'answer': 'Ned',
        'context': 'k in the television series.\n'
                   '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's h"},
    {   'answer': 'Lord Eddard Stark',
        'context': 'rk daughters.\n'
                   '\n'
                   'During the Tourney of the Hand to honour her father Lord '
                   'Eddard Stark, Sansa Stark is enchanted by the knights '
                   'performing in the event.'},
    {   'answer': 'Ned',
        'context': ' girl disguised as a boy all along and is surprised to '
      