### Imports and prerequisites

In [84]:
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate, TransformersReader, BM25Retriever
from haystack.schema import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.utils import convert_files_to_docs
from haystack import Pipeline

### Test document with basic config

Test a single document against the default [Haystack](https://haystack.deepset.ai/) configuration.

In [26]:
# Read test file content (the context manager closes the file for us)
with open("test_article.txt", "r", encoding="utf8") as test_file:
    test_content = test_file.read()
print(test_content)

Every six weeks the members of THE Bank of England’s Monetary Policy Committee commute – I’d imagine from homes in Surrey (train to Waterloo then Waterloo & City Line) or Hampshire (train to Waterloo then Waterloo & City Line) or Sussex (train to Victoria then District/Circle Line to Mansion House and a short walk) – to Bank, where the BoE is situated, right next to the station. The members meet – every six weeks – to vote. To vote on what rate will be set as the Bank Rate.


In [43]:
# Create test Document
test_doc = [Document(test_content)]

In [None]:
# Setting up test
prompt_node = PromptNode()
question_answering_per_doc = PromptTemplate("deepset/question-answering-per-document", output_parser=AnswerParser())

In [42]:
# Generating answer from prompt
result = prompt_node.prompt(
    query="How often does the Bank of England’s Monetary Policy Committee meet?",
    documents=test_doc,
    prompt_template=question_answering_per_doc
)

In [34]:
# Printing result
print(result[0])

<Answer: answer='every six weeks', score=None, context=None>


That took a long time to run.
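To put a number on how long a cell takes, a small timing helper can wrap the slow call. This is only a sketch using the standard library; the names `timed` and `timings` are made up for this notebook and are not part of Haystack.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=None):
    """Time the enclosed block, print the elapsed seconds, and optionally store them in `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed
        print(f"{label}: {elapsed:.2f}s")


# Stand-in workload so the sketch runs without loading a model
timings = {}
with timed("sleep", timings):
    time.sleep(0.05)
```

In the notebook the same pattern would wrap the prompt call, e.g. `with timed("prompt", timings): result = prompt_node.prompt(...)`, which makes it easier to compare the default config with the custom model later on.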

### Adding custom model

Try the [tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2) QA model to see what we get.

In [39]:
HF_MODEL_NAME = 'deepset/tinyroberta-squad2' 

reader = TransformersReader(
    model_name_or_path=HF_MODEL_NAME,
    tokenizer=HF_MODEL_NAME,
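    use_gpu=False  # assuming use_gpu is a boolean flag in this Haystack version; -1 is truthy and would not force CPU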
)

In [57]:
result = reader.predict(
    query="How often does the Bank of England’s Monetary Policy Committee meet?",
    documents=test_doc,
    top_k=10
)

In [58]:
# Printing result
print(result['answers'][0])

<Answer: answer='Every six weeks', score=0.7004600167274475, context='Every six weeks the members of THE Bank of England...'>
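The call above asks for `top_k=10` but only the first answer is printed. A rough sketch of how the remaining candidates could be listed and filtered by score follows. It assumes each answer exposes `.answer` and `.score` attributes, as the printed `Answer` suggests; `top_answers` and `min_score` are hypothetical names, and the stand-in objects (with invented scores apart from the 0.70 seen above) are there only so the snippet runs without a model.

```python
from types import SimpleNamespace


def top_answers(answers, min_score=0.5):
    """Return (answer, score) pairs scoring at least `min_score`, best first."""
    scored = [
        (a.answer, a.score)
        for a in answers
        if a.score is not None and a.score >= min_score
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-ins for reader output; these are not real model results
fake_answers = [
    SimpleNamespace(answer="six weeks", score=0.21),
    SimpleNamespace(answer="Every six weeks", score=0.70),
    SimpleNamespace(answer="Bank Rate", score=None),
]

for text, score in top_answers(fake_answers):
    print(f"{score:.2f}  {text}")
```

With real output this would be called as `top_answers(result['answers'])`. Answers with `score=None`, like the one the prompt node returned earlier, are skipped rather than treated as zero.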


### Multiple documents

In [61]:
# Change to document store template
question_answering_with_scores = PromptTemplate("deepset/question-answering-with-document-scores")

In [62]:
# Document store from memory
document_store = InMemoryDocumentStore(use_bm25=True)

In [69]:
# Load docs into doc store
docs = convert_files_to_docs('ArticleStore')
document_store.write_documents(docs)

Updating BM25 representation...: 100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<?, ? docs/s]
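`use_bm25=True` makes the store maintain a BM25 index, which is what the "Updating BM25 representation" line refers to. As a rough illustration of what BM25 ranking does, here is a textbook Okapi BM25 scorer in plain Python. It is not Haystack's implementation: the tokenizer, the `k1` and `b` defaults, the IDF variant, and the two toy sentences are all assumptions made for the sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturation with document length normalisation
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, not the contents of ArticleStore
docs = [
    "The BoE was formed in 1694 to help finance the war",
    "The committee meets every six weeks to vote on the Bank Rate",
]
scores = bm25_scores("When was the BoE formed?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

The retriever's job in the pipeline is the same idea at scale: rank stored documents by keyword overlap so the reader only has to look at the most promising ones.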


In [76]:
# Test documents loaded
result = reader.predict(
    query="When was the BoE formed?",
    documents=document_store.get_all_documents(),  # predict expects a list of Documents, not the store itself
    top_k=10
)
print(result['answers'][0])

<Answer: answer='1694', score=0.9720327854156494, context='The BoE was formed in 1694 to help finance the war...'>


### Simple pipeline

In [82]:
# Create retriever and reader
retriever = BM25Retriever(document_store)
reader = TransformersReader(
    model_name_or_path=HF_MODEL_NAME,
    tokenizer=HF_MODEL_NAME,
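    use_gpu=False  # assuming use_gpu is a boolean flag in this Haystack version; -1 is truthy and would not force CPU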
)

In [86]:
# Create basic pipeline
p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reader, name="Reader", inputs=["Retriever"])

In [87]:
# Test the pipeline
result = p.run(query="When was the BoE formed?")

In [91]:
print(result['answers'][0])

<Answer: answer='1694', score=0.9720327854156494, context='The BoE was formed in 1694 to help finance the war...'>


TODO:

- More formal input pipeline
- Web scraper
- Formalise into classes
- Testing/playing
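As a starting point for the "more formal input pipeline" item, here is one possible sketch of a loader that reads text files from a folder into plain dicts. The function name `load_text_files`, its parameters, and the `name` meta field are hypothetical. The `content`/`meta` dict shape is what I believe Haystack 1.x document stores accept in `write_documents`, but that should be checked against the installed version.

```python
import tempfile
from pathlib import Path


def load_text_files(folder, pattern="*.txt", encoding="utf8"):
    """Read every matching file in `folder` into a dict with `content` and `meta` keys."""
    records = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding=encoding).strip()
        if not text:
            continue  # skip empty files instead of indexing blank documents
        records.append({"content": text, "meta": {"name": path.name}})
    return records


# Demo on a throwaway folder so the sketch runs without ArticleStore
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("The BoE was formed in 1694.", encoding="utf8")
    (Path(tmp) / "empty.txt").write_text("", encoding="utf8")
    records = load_text_files(tmp)

print(records)
```

Keeping the loader free of Haystack imports means it can be unit tested on its own, and a web scraper could later emit the same dict shape.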