# Make Your QA Pipelines Talk!

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/en/d/d8/Game_of_Thrones_title_card.jpg">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial17_Audio.ipynb)

Question answering works primarily on text, but Haystack provides some features for audio files that contain speech as well.

In this tutorial, we're going to see how to use `AnswerToSpeech` to convert answers into audio files, and `DocumentToSpeech` to do the same for the indexed documents.

### Prepare environment

#### Colab: Enable the GPU runtime
Make sure you enable the GPU runtime to experience decent speed in this tutorial.
**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/colab_gpu_runtime.jpg">

In [1]:
# Make sure you have a GPU running
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found
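
The `command not found` message above usually means that no NVIDIA driver is visible to the runtime. A minimal sketch of the same check in Python, using only the standard library, is shown here. The helper name `gpu_visible` is hypothetical and not part of Haystack.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits successfully."""
    executable = shutil.which("nvidia-smi")
    if executable is None:
        # Same situation as "command not found" in the shell
        return False
    try:
        result = subprocess.run(
            [executable],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False`, the rest of the tutorial should still run, only more slowly on CPU.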


In [2]:
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install Haystack from the text2speech branch with the colab and audio extras
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git@text2speech#egg=farm-haystack[colab,audio]

Collecting farm-haystack[audio,colab]
  Cloning https://github.com/deepset-ai/haystack.git (to revision text2speech) to /tmp/pip-install-fdtl7ojv/farm-haystack_87043446aa4842bdb3d78d2d18afa755
  Running command git clone --filter=blob:none --quiet https://github.com/deepset-ai/haystack.git /tmp/pip-install-fdtl7ojv/farm-haystack_87043446aa4842bdb3d78d2d18afa755
  Running command git checkout -b text2speech --track origin/text2speech
  Switched to a new branch 'text2speech'
  Branch 'text2speech' set up to track remote branch 'text2speech' from 'origin'.
  Resolved https://github.com/deepset-ai/haystack.git to commit 695908b8e21707763e8e70f9a45e1a059d3f5d2e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting grpcio==1.43.0
  Using cached grpcio-1.43.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
Building wheels for collected packages: farm-hays
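
Once the installation finishes, it can be useful to confirm which version ended up in the environment. A small sketch using the standard library's `importlib.metadata` is shown here. `installed_version` is a hypothetical helper, and the distribution name `farm-haystack` is taken from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("farm-haystack:", installed_version("farm-haystack"))
```

A result of `None` suggests that the install step did not complete in the current environment.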

### Set up Elasticsearch


In [4]:
# Recommended: Start Elasticsearch using Docker via the Haystack utility function
from haystack.utils import launch_es

launch_es()

docker: Error response from daemon: Conflict. The container name "/elasticsearch" is already in use by container "08d21e28c8d4a43d645a5aaf11ec41f2611f7397874014f599c51bbd89a7061a". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.


In [5]:
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30

chown: changing ownership of 'elasticsearch-7.9.2/modules/x-pack-ilm/x-pack-ilm-7.9.2.jar': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/x-pack-ilm/NOTICE.txt': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/x-pack-ilm/LICENSE.txt': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/x-pack-ilm/plugin-descriptor.properties': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/x-pack-ilm': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/search-business-rules/NOTICE.txt': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/search-business-rules/search-business-rules-7.9.2.jar': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/search-business-rules/LICENSE.txt': Operation not permitted
chown: changing ownership of 'elasticsearch-7.9.2/modules/search-business-rules

SubprocessError: Exception occurred in preexec_fn.
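
The cell above waits a fixed 30 seconds for Elasticsearch to start. A sketch of an alternative that polls the HTTP endpoint until it responds, using only the standard library, is shown here. `wait_for_http` is a hypothetical helper, not a Haystack utility, and `http://localhost:9200` assumes Elasticsearch is listening on its default port.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    # Bypass any configured proxy: the target is assumed to be a local service
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status (e.g. 401), so it is up
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not reachable yet, try again until the deadline
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Assumed default Elasticsearch address; short timeout for demonstration
    print("Elasticsearch reachable:", wait_for_http("http://localhost:9200", timeout=2.0))
```

Polling returns as soon as the service is up and reports a clear `False` when it never starts, which a fixed sleep cannot do.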

### Populate the document store with `SpeechDocuments`

First of all, we will populate the document store with a simple indexing pipeline. See [Tutorial 1](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial1_Basic_QA_Pipeline.ipynb) for more details about these steps.

To the basic version, we add a `DocumentToSpeech` node that also generates an audio file for each indexed document. At query time, this makes it possible to access the audio version of the documents the answers were extracted from, without having to generate it on the fly.

**Note**: this additional step can slow down your indexing quite a lot. Experiment with very small corpora at the start.

In [6]:
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.utils import fetch_archive_from_http, launch_es
from pathlib import Path
from haystack import Pipeline
from haystack.nodes import FileTypeClassifier, TextConverter, PreProcessor, DocumentToSpeech

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

# Get the documents
documents_path = "data/tutorial17"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt17.zip"
fetch_archive_from_http(url=s3_url, output_dir=documents_path)

# List all the paths
file_paths = [p for p in Path(documents_path).glob("**/*")]

# Note: In this example we're going to use only one text file from the wiki, as the DocumentToSpeech node is relatively slow
# on CPU machines. Comment out this line to use all documents from the dataset if your machine is powerful enough.
file_paths = [p for p in file_paths if "Arya_Stark" in p.name]

# Prepare some basic metadata for the files
files_metadata = [{"name": path.name} for path in file_paths]

# Here we create a basic indexing pipeline
indexing_pipeline = Pipeline()

# - Makes sure the file is a TXT file (FileTypeClassifier node)
classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="classifier", inputs=["File"])

# - Converts a file into text and performs basic cleaning (TextConverter node)
text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(text_converter, name="text_converter", inputs=["classifier.output_1"])

# - Pre-processes the text by performing splits and adding metadata to the text (PreProcessor node)
preprocessor = PreProcessor(
    clean_whitespace=True,
    clean_empty_lines=True,
    split_length=100,
    split_overlap=50,
    split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="preprocessor", inputs=["text_converter"])

#
# DocumentToSpeech
#
# This is where we convert all documents to be indexed into SpeechDocuments, which hold not only
# the text content but also its audio version.
doc2speech = DocumentToSpeech(
    model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./generated_audio_documents")
)
indexing_pipeline.add_node(doc2speech, name="doc2speech", inputs=["preprocessor"])

# - Writes the resulting documents into the document store (ElasticsearchDocumentStore node from the previous cell)
indexing_pipeline.add_node(document_store, name="document_store", inputs=["doc2speech"])

# Then we run it with the documents and their metadata as input
indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)

INFO - haystack.utils.import_utils -  Fetching from https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt17.zip to `data/tutorial17`
100%|██████████| 183/183 [00:01<00:00, 109.64docs/s]


{'documents': [<Document: {'content': '\n\nThe music for the fantasy TV series \'\'Game of Thrones\'\' is composed by Ramin Djawadi. The music is primarily non-diegetic and instrumental with the occasional vocal performances, and is created to support musically the characters and plots of the show. It features various themes, the most prominent being the "main title theme" that accompanies the series\' title sequence. In every season, a soundtrack album was released. The music for the show has won a number of awards, including a Primetime Emmy Award for Outstanding Music Composition for a Series in 2018.', 'content_type': 'text', 'score': None, 'meta': {'name': '454_Music_of_Game_of_Thrones.txt', '_split_id': 0}, 'embedding': None, 'id': '9a57926c1852af19e8d48032d3c113a'}>,
  <Document: {'content': 'It features various themes, the most prominent being the "main title theme" that accompanies the series\' title sequence. In every season, a soundtrack album was released. The music for the
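
The two documents in the output above share part of their text, which is the effect of `split_length=100` combined with `split_overlap=50`. A simplified sketch of a word-based sliding window illustrates the idea. This is not Haystack's implementation: `PreProcessor` also takes sentence boundaries into account when `split_respect_sentence_boundary=True`, and `sliding_window_split` is a hypothetical helper.

```python
from typing import List


def sliding_window_split(text: str, split_length: int, split_overlap: int) -> List[str]:
    """Split text into chunks of `split_length` words that overlap by `split_overlap` words."""
    if split_length <= 0 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length > 0 and 0 <= split_overlap < split_length")
    words = text.split()
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in sliding_window_split(sample, split_length=4, split_overlap=2):
    print(chunk)
```

With an overlap of half the window, every word away from the edges lands in two chunks, so `DocumentToSpeech` synthesizes roughly twice as much audio as the raw text length would suggest.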

### Querying
   
Now we will create a pipeline very similar to the basic `ExtractiveQAPipeline` of Tutorial 1,
with the addition of a node that converts our answers into audio files! Once the answer is retrieved, we can also listen to the audio version of the document where the answer came from.

In [7]:
from pathlib import Path
from haystack import Pipeline
from haystack.nodes import BM25Retriever, FARMReader, AnswerToSpeech

retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2-distilled", use_gpu=True)
answer2speech = AnswerToSpeech(
    model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_answers")
)

audio_pipeline = Pipeline()
audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
audio_pipeline.add_node(reader, name="Reader", inputs=["Retriever"])
audio_pipeline.add_node(answer2speech, name="AnswerToSpeech", inputs=["Reader"])

INFO - haystack.modeling.utils -  Using devices: CPU
INFO - haystack.modeling.utils -  Number of GPUs: 0
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find deepset/roberta-base-squad2-distilled locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...
INFO - haystack.modeling.model.language_model -  Loaded deepset/roberta-base-squad2-distilled
INFO - haystack.modeling.utils -  Using devices: CPU
INFO - haystack.modeling.utils -  Number of GPUs: 0
INFO - haystack.modeling.infer -  Got ya 15 parallel workers to do inference ...
INFO - haystack.modeling.infer -   0     0     0     0     0     0     0     0     0     0     0     0     0     0     0  
INFO - haystack.modeling.infer -  /w\   /w\   /w\   /w\   /w\   /w\   /w\   /|\  /w\   /w\   /w\   /w\   /w\   /w\   /|\
INFO - haystack.modeling.infer -  /'\   / \   /'\   /'\   / \   / \   /'\   /'\   /

## Ask a question!

In [8]:
# You can configure how many candidates the Reader and the Retriever should return.
# The higher the Retriever's top_k, the better (but also the slower) your answers.
prediction = audio_pipeline.run(
    query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

  start_indices = flat_sorted_indices // max_seq_len
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.15s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.03s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.02s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.29 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.09 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.01s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.04s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.05s/ Batches]


In [9]:
# Now you can either print the object directly...
from pprint import pprint

pprint(prediction)

# Sample output:
# {
#     'answers': [ <SpeechAnswer:
#                       answer_audio=PosixPath('generated_audio_answers/fc704210136643b833515ba628eb4b2a.wav'),
#                       answer="Eddard",
#                       context_audio=PosixPath('generated_audio_answers/8c562ebd7e7f41e1f9208384957df173.wav'),
#                       context='...'
#                       type='extractive', score=0.9919578731060028,
#                       offsets_in_document=[{'start': 608, 'end': 615}], offsets_in_context=[{'start': 72, 'end': 79}],
#                       document_id='cc75f739897ecbf8c14657b13dda890e', meta={'name': '43_Arya_Stark.txt'}}  >,
#                  <SpeechAnswer:
#                       answer_audio=PosixPath('generated_audio_answers/07d6265486b22356362387c5a098ba7d.wav'),
#                       answer="Ned",
#                       context_audio=PosixPath('generated_audio_answers/3f1ca228d6c4cfb633e55f89e97de7ac.wav'),
#                       context='...'
#                       type='extractive', score=0.9767240881919861,
#                       offsets_in_document=[{'start': 3687, 'end': 3801}], offsets_in_context=[{'start': 18, 'end': 132}],
#                       document_id='9acf17ec9083c4022f69eb4a37187080', meta={'name': '43_Arya_Stark.txt'}}>,
#                  ...
#                ]
#     'documents': [ <SpeechDocument:
#                        content_type='text', score=0.8034909798951382, meta={'name': '43_Arya_Stark.txt'}, embedding=None, id='d1f36ec7170e4c46cde65787fe125dfe',
#                        content_audio=PosixPath('generated_audio_documents/07d6265486b22356362387c5a098ba7d.wav'),
#                        content='\n===\'\'A Game of Thrones\'\'===\nSansa Stark begins the novel by being betrothed to Crown ...'>,
#                    <SpeechDocument:
#                        content_type='text', score=0.8002150354529785, meta={'name': '191_Gendry.txt'}, embedding=None, id='dd4e070a22896afa81748d6510006d2',
#                        content_audio=PosixPath('generated_audio_documents/07d6265486b22356362387c5a098ba7d.wav'),
#                        content='\n===Season 2===\nGendry travels North with Yoren and other Night's Watch recruits, including Arya ...'>,
#                    ...
#                  ],
#     'no_ans_gap':  11.688868522644043,
#     'node_id': 'Reader',
#     'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 10}},
#     'query': 'Who is the father of Arya Stark?',
#     'root_node': 'Query'
# }

{'answers': [GeneratedAudioAnswer(answer=PosixPath('audio_answers/e134aa0534fe984926b6677a2a892457.wav'), type='generative', score=0.994974821805954, context=PosixPath('audio_answers/ae4b8aad87818f1e5d0b6154dcfa4f85.wav'), offsets_in_document=[Span(start=239, end=252)], offsets_in_context=[Span(start=69, end=82)], document_id='d2defab44a22020532de6b0c65e621a', meta={'_split_id': 36, 'name': '347_Game_of_Thrones__season_2_.txt', 'audio_format': 'wav', 'sample_rate': 22050}, answer_transcript='Balon Greyjoy', context_transcript='raits, although her rivalry with Theon remained intact. Their father Balon Greyjoy was played by Patrick Malahide. Many of the characters involved in '),
             GeneratedAudioAnswer(answer=PosixPath('audio_answers/07d6265486b22356362387c5a098ba7d.wav'), type='generative', score=0.9942662119865417, context=PosixPath('audio_answers/3f1ca228d6c4cfb633e55f89e97de7ac.wav'), offsets_in_document=[Span(start=376, end=379)], offsets_in_context=[Span(start=74, end=77
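
The truncated output above suggests that, in this version, each answer keeps the path to its audio file in `answer` and the text in `answer_transcript`. A minimal sketch for tabulating the results under that assumption is shown here. `summarize_answers` is a hypothetical helper, and the stand-in objects only mimic the fields visible in the output.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def summarize_answers(prediction: Dict[str, Any]) -> List[Tuple[str, float, Path]]:
    """Return (transcript, score, audio path) for each answer, best score first."""
    rows = [
        (answer.answer_transcript, answer.score, answer.answer)
        for answer in prediction.get("answers", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in objects with the field names and values seen in the output above
fake_prediction = {
    "answers": [
        SimpleNamespace(
            answer=Path("audio_answers/07d6265486b22356362387c5a098ba7d.wav"),
            answer_transcript="Ned",
            score=0.9942662119865417,
        ),
        SimpleNamespace(
            answer=Path("audio_answers/e134aa0534fe984926b6677a2a892457.wav"),
            answer_transcript="Balon Greyjoy",
            score=0.994974821805954,
        ),
    ]
}

for transcript, score, audio_path in summarize_answers(fake_prediction):
    print(f"{score:.4f}  {transcript:<15} {audio_path}")
```

A compact listing like this makes it easy to notice that the top-scoring answer here is not the one a reader would expect for the question.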

In [10]:
from haystack.utils import print_answers

# ...or use a util to simplify the output
# Change `minimum` to `medium` or `all` to raise the level of detail
print_answers(prediction, details="minimum")


# Sample output:
#
# Query: Who is the father of Arya Stark?
# Answers:
# [   {   'answer_audio': PosixPath('generated_audio_answers/07d6265486b22356362387c5a098ba7d.wav'),
#         'answer': 'Eddard',
#         'context_audio': PosixPath('generated_audio_answers/3f1ca228d6c4cfb633e55f89e97de7ac.wav'),
#         'context': ' role of Arya Stark in the television series. '
#                    'Arya accompanies her father Eddard and her sister '
#                    'Sansa to King's Landing. Before their departure, Arya's h'},
#    {   'answer_audio': PosixPath('generated_audio_answers/83c3a02141cac4caffe0718cfd6c405c.wav'),
#        'answer': 'Lord Eddard Stark',
#        'context_audio': PosixPath('generated_audio_answers/8c562ebd7e7f41e1f9208384957df173.wav'),
#        'context': 'ark daughters. During the Tourney of the Hand '
#                   'to honour her father Lord Eddard Stark, Sansa '
#                   'Stark is enchanted by the knights performing in '
#                   'the event.'},
#    ...


Query: Who is the father of Arya Stark?
Answers:
[   {   'answer': PosixPath('audio_answers/e134aa0534fe984926b6677a2a892457.wav'),
        'answer_transcript': 'Balon Greyjoy',
        'context': PosixPath('audio_answers/ae4b8aad87818f1e5d0b6154dcfa4f85.wav'),
        'context_transcript': 'raits, although her rivalry with Theon remained '
                              'intact. Their father Balon Greyjoy was played '
                              'by Patrick Malahide. Many of the characters '
                              'involved in '},
    {   'answer': PosixPath('audio_answers/07d6265486b22356362387c5a098ba7d.wav'),
        'answer_transcript': 'Ned',
        'context': PosixPath('audio_answers/3f1ca228d6c4cfb633e55f89e97de7ac.wav'),
        'context_transcript': ' role of Arya Stark in the television series. '
                              'Arya accompanies her father Ned and her sister '
                              "Sansa to King's Landing. Before their "
                    

### Hear them out!

In [None]:
from IPython.display import display, Audio
import soundfile as sf

In [11]:
# The first answer in isolation

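# In the output above, `answer` holds the path to the audio file (the text is in `answer_transcript`)
speech, sample_rate = sf.read(prediction["answers"][0].answer)
display(Audio(speech, rate=sample_rate))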

In [None]:
# The context of the first answer

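# In the output above, `context` holds the path to the audio file (the text is in `context_transcript`)
speech, sample_rate = sf.read(prediction["answers"][0].context)
display(Audio(speech, rate=sample_rate))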

In [None]:
# The document the first answer was extracted from

original_document = [doc for doc in prediction["documents"] if doc.id == prediction["answers"][0].document_id][0]
speech, sample_rate = sf.read(original_document.content)
display(Audio(speech, rate=sample_rate))
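
The answer metadata printed earlier reports `'sample_rate': 22050`, so the rate given to `Audio` is best taken from the file itself rather than from a constant. A sketch that reads it from the WAV header with the standard library's `wave` module is shown here. `wav_sample_rate` is a hypothetical helper, and it assumes the generated files are PCM-encoded WAV, which `wave` requires.

```python
import tempfile
import wave
from pathlib import Path
from typing import Union


def wav_sample_rate(path: Union[str, Path]) -> int:
    """Read the sample rate (frames per second) from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


# Demonstration on a short silent file written on the fly
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.wav"
    with wave.open(str(demo_path), "wb") as out:
        out.setnchannels(1)      # mono
        out.setsampwidth(2)      # 16-bit samples
        out.setframerate(22050)  # same rate as reported in the answer metadata
        out.writeframes(b"\x00\x00" * 2205)  # 0.1 seconds of silence
    print("Sample rate:", wav_sample_rate(demo_path))
```

Playing a 22050 Hz file at 24000 Hz makes the speech faster and higher-pitched, which is easy to mistake for a model quality issue.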

## About us

This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.

We bring NLP to the industry via open source!  
Our focus: Industry specific language models & large scale QA systems.  
  
Some of our other work: 
- [German BERT](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)
- [FARM](https://github.com/deepset-ai/FARM)

Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

By the way: [we're hiring!](https://www.deepset.ai/jobs)
