# Better Retrieval via "Embedding Retrieval"

### Importance of Retrievers

The Retriever has a huge impact on the performance of our overall search pipeline.


### Different types of Retrievers
#### Sparse
A family of algorithms based on counting word occurrences (bag-of-words), which results in very sparse vectors whose length equals the vocabulary size.

**Examples**: BM25, TF-IDF

**Pros**: Simple, fast, well explainable

**Cons**: Relies on exact keyword matches between query and text
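To make the keyword-matching limitation concrete, here is a minimal sketch of TF-IDF-style scoring in plain Python. The three-sentence corpus and the `tfidf_score` helper are made up for illustration; this is not how `BM25Retriever` or `TfidfRetriever` are implemented, only the general idea behind them.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration).
docs = [
    "the car is parked in the garage",
    "the automobile was repaired yesterday",
    "bananas are rich in potassium",
]

def tokenize(text):
    return text.lower().split()

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))

def tfidf_score(query, doc):
    """Sum the TF-IDF weights of the query terms that literally occur in doc."""
    term_counts = Counter(tokenize(doc))
    score = 0.0
    for term in tokenize(query):
        if term in term_counts:
            idf = math.log(len(docs) / doc_freq[term])
            score += term_counts[term] * idf
    return score

scores = [tfidf_score("car", doc) for doc in docs]
print(scores)
```

Only the first document gets a non-zero score. The second one talks about an "automobile", but because the literal token "car" never appears in it, a sparse method scores it the same as the unrelated banana sentence.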


#### Dense
These retrievers use neural network models to create "dense" embedding vectors. Within this family, there are two different approaches:

a) Single encoder: Use a **single model** to embed both the query and the passage.
b) Dual-encoder: Use **two models**, one to embed the query and one to embed the passage.

**Examples**: REALM, DPR, Sentence-Transformers

**Pros**: Captures semantic similarity instead of "word matches" (for example, synonyms, related topics).

**Cons**: Computationally heavier to run, and the model needs initial training (though this is less of an issue nowadays, since many pre-trained models are available and training your own is usually unnecessary).
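The sketch below illustrates why dense vectors can capture synonyms. The three-dimensional "embeddings" are hand-made values chosen for illustration, not the output of any real model; an actual encoder would produce vectors with hundreds of dimensions.

```python
import math

# Hand-made 3-dimensional "embeddings" (illustrative values, not model output).
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_synonym = cosine_similarity(embeddings["car"], embeddings["automobile"])
sim_unrelated = cosine_similarity(embeddings["car"], embeddings["banana"])
print(f"car vs automobile: {sim_synonym:.2f}")
print(f"car vs banana:     {sim_unrelated:.2f}")
```

Although "car" and "automobile" share no characters as keywords, their vectors point in nearly the same direction, so the similarity is high. That is the property a trained encoder is meant to learn.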


### Embedding Retrieval

In this tutorial, we use an `EmbeddingRetriever` with [Sentence Transformers](https://www.sbert.net/index.html) models.

These models are trained to embed similar sentences close to each other in a shared embedding space.

Some models have been fine-tuned on massive Information Retrieval data and can be used to retrieve documents based on a short query (for example, `multi-qa-mpnet-base-dot-v1`). There are others that are more suited to semantic similarity tasks where you are trying to find the most similar documents to a given document (for example, `all-mpnet-base-v2`). There are even models that are multilingual (for example, `paraphrase-multilingual-mpnet-base-v2`). For a good overview of different models with their evaluation metrics, see the [Pretrained Models](https://www.sbert.net/docs/pretrained_models.html#) in the Sentence Transformers documentation.



## Preparing the Colab Environment

- [Enable GPU Runtime](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab)


## Installing Haystack

To start, let's install the latest release of Haystack with `pip`:

In [None]:
%%bash

pip install --upgrade pip
pip install farm-haystack[colab,faiss,inference,preprocessing,file-conversion]

Collecting pip
  Downloading pip-23.3-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 33.1 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.3
Collecting farm-haystack[colab,faiss,file-conversion,inference,preprocessing]
  Downloading farm_haystack-1.21.2-py3-none-any.whl.metadata (26 kB)
Collecting boilerpy3 (from farm-haystack[colab,faiss,file-conversion,inference,preprocessing])
  Downloading boilerpy3-1.0.6-py3-none-any.whl (22 kB)
Collecting events (from farm-haystack[colab,faiss,file-conversion,inference,preprocessing])
  Downloading Events-0.5-py3-none-any.whl.metadata (3.9 kB)
Collecting httpx (from farm-haystack[colab,faiss,file-conversion,inference,preprocessing])
  Downloading httpx-0.25.0-py3-none-any.whl.metadata (7.6 kB)
Collecting lazy-imports==0.3.1 (from farm-hay

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires openai, which is not installed.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.4.49 which is incompatible.
tensorflow 2.13.0 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible.


### Enabling Telemetry
Knowing you're using this tutorial helps us decide where to invest our efforts to build a better product, but you can always opt out by commenting out the following line. See [Telemetry](https://docs.haystack.deepset.ai/docs/telemetry) for more details.

In [None]:
from haystack.telemetry import tutorial_running

tutorial_running(6)

## Logging

Before importing Haystack, we configure how logging messages should be displayed and which log level should be used.
Example log message:
`INFO - haystack.utils.preprocessing -  Converting data/tutorial1/218_Olenna_Tyrell.txt`
The default log level in `basicConfig` is WARNING, so the explicit parameter is not necessary, but it makes the level easy to change:

In [None]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)
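As a quick check of what this format string produces, the following sketch formats a hand-built log record with the standard library only. The logger name `demo_logger_for_levels` is a hypothetical name used here so that the real Haystack loggers are left untouched.

```python
import logging

# Same format string as in the cell above.
formatter = logging.Formatter("%(levelname)s - %(name)s -  %(message)s")

# Build a record by hand instead of emitting it through a real logger.
record = logging.makeLogRecord(
    {
        "name": "haystack.utils.preprocessing",
        "levelname": "INFO",
        "levelno": logging.INFO,
        "msg": "Converting data/tutorial1/218_Olenna_Tyrell.txt",
    }
)
formatted = formatter.format(record)
print(formatted)

# Level filtering: a logger set to INFO passes INFO messages but drops DEBUG ones.
demo_logger = logging.getLogger("demo_logger_for_levels")  # hypothetical name
demo_logger.setLevel(logging.INFO)
info_enabled = demo_logger.isEnabledFor(logging.INFO)
debug_enabled = demo_logger.isEnabledFor(logging.DEBUG)
print(info_enabled, debug_enabled)
```

The printed line matches the example message above, and the last line shows why setting the `haystack` logger to INFO surfaces the conversion messages while the root logger stays at WARNING.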

## Initializing the DocumentStore

FAISS is a library for efficient similarity search and clustering of dense vectors.
The `FAISSDocumentStore` uses a SQL database (SQLite by default) under the hood
to store the document text and other metadata. The vector embeddings of the text are
indexed in a FAISS index that is later queried to find answers.
The default flavour of `FAISSDocumentStore` is "Flat", but it can also be set to "HNSW" for
faster search at the expense of some accuracy. Just set the `faiss_index_factory_str` argument in the constructor.
For more information on which index suits your use case, see https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

In [None]:
from haystack.document_stores import FAISSDocumentStore

document_store = FAISSDocumentStore(faiss_index_factory_str="Flat")

INFO:haystack.telemetry:Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems in the [documentation page](https://docs.haystack.deepset.ai/docs/telemetry#how-can-i-opt-out). More information at [Telemetry](https://docs.haystack.deepset.ai/docs/telemetry).
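A "Flat" index answers a query by comparing it against every stored vector, which is exact but grows linearly with the number of documents. The sketch below shows that idea in plain Python with a dot-product score. The vectors and the `flat_search` helper are illustrative assumptions, not FAISS code.

```python
import heapq

# Toy index: document id -> embedding (illustrative values, not model output).
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.2, 0.8, 0.1],
    "doc_c": [0.1, 0.2, 0.9],
    "doc_d": [0.7, 0.3, 0.1],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def flat_search(query_vector, top_k=2):
    """Score every stored vector against the query and return the top_k ids."""
    scored = ((dot(query_vector, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(top_k, scored)]

top_ids = flat_search([1.0, 0.0, 0.0], top_k=2)
print(top_ids)
```

An approximate index such as HNSW avoids scoring every vector by navigating a graph of neighbours, which is where the speed gain and the small loss in accuracy come from.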


### Option 2: Milvus

> As of version 1.15, MilvusDocumentStore has been deprecated in Haystack. It is deleted from the haystack repository as of version 1.17 and moved to [haystack-extras](https://github.com/deepset-ai/haystack-extras/tree/main). For more details, check out [Deprecation of MilvusDocumentStore](https://github.com/deepset-ai/haystack/discussions/4785).

Milvus is an open source database that, like FAISS, is optimized for vector similarity search.
Like FAISS, it has both a "Flat" and an "HNSW" mode, but it outperforms FAISS when it comes to dynamic data management.
It does require a little more setup, however, as it is run through Docker and requires the setup of some config files.
See [their docs](https://milvus.io/docs/v1.0.0/milvus_docker-cpu.md) for more details.

In [None]:
# Milvus cannot be run on Colab, so this cell is commented out.
# To run Milvus you need Docker (versions below 2.0.0) or a docker-compose (versions >= 2.0.0), neither of which is available on Colab.
# See Milvus' documentation for more details: https://milvus.io/docs/install_standalone-docker.md

# !pip install farm-haystack[milvus]==1.16.1

# from haystack.utils import launch_milvus
# from haystack.document_stores import MilvusDocumentStore

# launch_milvus()
# document_store = MilvusDocumentStore()

## Cleaning and Writing Documents

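Similarly to the previous tutorials, we convert some text files and write them to our DocumentStore. Instead of downloading the Game of Thrones articles, this notebook mounts Google Drive and reads the `.txt` files in the `Sentencias_2022` folder.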

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
ls

[0m[01;34mdrive[0m/  faiss_document_store.db  [01;34msample_data[0m/


In [None]:
cd drive/MyDrive/QA_sentencias_2022/

/content/drive/.shortcut-targets-by-id/14-qpyVS6wHYJKVRbZacpR4r95XmeT0ST/QA_sentencias_2022


In [None]:
ls

'Copia de 01_Basic_QA_Pipeline.ipynb'
'Copia de 03_Scalable_QA_System.ipynb'
'Copia de 08_Preprocessing.ipynb'
'Copia de 22_Pipeline_with_PromptNode.ipynb'
'Copia de 26_Hybrid_Retrieval.ipynb'
'Copia de Tutorial6_Better_Retrieval_via_Embedding_Retrieval.ipynb'
 [0m[01;34mSentencias_2022[0m/


In [None]:
doc_dir = "./Sentencias_2022"

In [None]:
doc_dir

'./Sentencias_2022'

In [None]:
# Convert the .txt files to Documents and store them in all_docs
from haystack.utils import convert_files_to_docs


all_docs = convert_files_to_docs(dir_path=doc_dir)

INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-001-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-002-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-003-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-004-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-005-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-006-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-007-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-008-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-009-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-010-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-011-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-012-22.txt
INFO:haystack.utils.preprocessing:Converting Sentencias_2022/T-013-22.txt
INFO:haystack.utils.preprocessing:Conv

## Preprocessing the Documents (the result is stored in *docs*)

In [None]:
from haystack.nodes import PreProcessor

preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=False,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
)
docs = preprocessor.process(all_docs)

print(f"n_files_input: {len(all_docs)}\nn_docs_output: {len(docs)}")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Preprocessing: 100%|██████████| 313/313 [00:23<00:00, 13.16docs/s]

n_files_input: 313
n_docs_output: 75668
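To give an intuition for what `split_by="word"`, `split_length=100` and `split_respect_sentence_boundary=True` do together, here is a simplified sketch that greedily packs whole sentences into chunks of at most `split_length` words. It is an approximation written for this explanation, with a naive regex sentence splitter as a stated assumption. Haystack's `PreProcessor` uses NLTK for sentence detection and handles more edge cases.

```python
import re

def split_by_words(text, split_length=100):
    """Greedily pack whole sentences into chunks of at most split_length words.

    A single sentence longer than split_length becomes its own oversized chunk.
    """
    # Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + n_words > split_length:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six seven. Eight nine. Ten eleven twelve thirteen."
chunks = split_by_words(sample, split_length=7)
for chunk in chunks:
    print(len(chunk.split()), chunk)
```

Because sentences are never cut in the middle, chunks usually end up somewhat shorter than the limit, which is one reason 313 input files can turn into tens of thousands of short documents.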





## Writing _docs_ to the document_store

In [None]:
# Now, let's write the dicts containing documents to our DB.
document_store.write_documents(docs)

Writing Documents: 80000it [03:28, 384.32it/s]


In [None]:
from haystack.nodes import EmbeddingRetriever, PromptNode

retriever = EmbeddingRetriever(document_store = document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1




## Updating the DocumentStore with embeddings

In [None]:
document_store.update_embeddings(retriever)

INFO:haystack.document_stores.faiss:Updating embeddings for 75555 docs...
Updating Embedding:   0%|          | 0/75555 [00:00<?, ? docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  13%|█▎        | 10000/75555 [02:08<14:04, 77.66 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  26%|██▋       | 20000/75555 [04:17<11:54, 77.79 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  40%|███▉      | 30000/75555 [06:25<09:46, 77.73 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  53%|█████▎    | 40000/75555 [08:35<07:38, 77.52 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  66%|██████▌   | 50000/75555 [10:43<05:28, 77.79 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  79%|███████▉  | 60000/75555 [12:51<03:19, 77.93 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  93%|█████████▎| 70000/75555 [15:00<01:11, 77.72 docs/s]

Batches:   0%|          | 0/174 [00:00<?, ?it/s]

Documents Processed: 80000 docs [16:11, 82.36 docs/s]
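The `Batches` counters in the progress output above (313 repeated seven times, then 174) can be reproduced with a little arithmetic. The two batch sizes below are assumptions inferred from the log, not values read from the Haystack configuration: documents appear to be processed in steps of 10,000, and each step is encoded in mini-batches of 32.

```python
import math

n_docs = 75555               # from the log line "Updating embeddings for 75555 docs..."
update_batch_size = 10_000   # assumption: inferred from the 10000-document progress steps
encoder_batch_size = 32      # assumption: chosen because ceil(10000 / 32) == 313

encoder_batches = []
remaining = n_docs
while remaining > 0:
    chunk = min(update_batch_size, remaining)
    # Number of encoder mini-batches needed for this chunk of documents.
    encoder_batches.append(math.ceil(chunk / encoder_batch_size))
    remaining -= chunk

print(encoder_batches)
```

Under these assumptions, the last step covers the remaining 5,555 documents, which explains the final `0/174` counter.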


In [None]:
# Configure the OpenAI (ChatGPT) API key
import os
from getpass import getpass

openai_api_key = os.getenv("OPENAI_API_KEY", None) or getpass("Enter OpenAI API key:")

Enter OpenAI API key:··········


**Note:** This is as far as I have gotten; I have not been able to configure the PromptNode yet. The code from here on is either repeated or a wait timer that keeps the Colab session from closing, so please ignore it.

In [None]:
from haystack.nodes import PromptTemplate, AnswerParser

rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
                             Provide a clear and concise response that summarizes the key points and information presented in the text.
                             Your answer should be in your own words and be no longer than 50 words.
                             \n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)

#prompt_node = PromptNode(model_name_or_path="google/flan-t5-large")

prompt_node = PromptNode(
    model_name_or_path="text-davinci-003", api_key=openai_api_key, default_prompt_template=rag_prompt, max_length=40097
)

In [None]:
prompt_node = PromptNode(model_name_or_path = "gpt-4",
                         api_key=openai_api_key,
                         default_prompt_template = "deepset/question-answering-with-references")

In [None]:
from haystack.pipelines import Pipeline

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

In [None]:
output = pipe.run(query="cuándo se tutela el derecho al trabajo con reintegro al puesto de trabajo?")

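# This lookup raised the KeyError shown below. A likely cause (an assumption, not verified here) is that the prompt
# template has no output parser, in which case PromptNode returns its text under "results" instead of "answers".
print(output["answers"][0].answer)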

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

KeyError: ignored

In [None]:
import time

# Pause execution for 8 minutes (480 seconds)
tiempo_espera = 8 * 60  # 8 minutes * 60 seconds/minute

print("Waiting 8 minutes...")
time.sleep(tiempo_espera)
print("Wait complete!")

Waiting 8 minutes...
Wait complete!


In [None]:
#from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first get some files that we want to use
#doc_dir = "./Sentencias_2022"
#s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt6.zip"
#fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert files to dicts
#docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True, )

# Now, let's write the dicts containing documents to our DB.
#document_store.write_documents(docs)


## Initializing the Retriever

**Here:** We use an `EmbeddingRetriever`.

**Alternatives:**

- `BM25Retriever` with custom queries (for example, boosting) and filters
- `DensePassageRetriever` which uses two encoder models, one to embed the query and one to embed the passage, and then compares the embedding for retrieval
- `TfidfRetriever` in combination with a SQL or InMemory DocumentStore for simple prototyping and debugging

In [None]:
from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
)
# Important:
# Now that we initialized the Retriever, we need to call update_embeddings() to iterate over all
# previously indexed documents and update their embedding representation.
# While this can be a time consuming operation (depending on the corpus size), it only needs to be done once.
# At query time, we only need to embed the query and compare it to the existing document embeddings, which is very fast.
document_store.update_embeddings(retriever)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1
INFO:haystack.document_stores.faiss:Updating embeddings for 75555 docs...
Updating Embedding:   0%|          | 0/75555 [00:00<?, ? docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  13%|█▎        | 10000/75555 [02:12<14:29, 75.39 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  26%|██▋       | 20000/75555 [04:21<12:04, 76.64 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  40%|███▉      | 30000/75555 [06:31<09:53, 76.70 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  53%|█████▎    | 40000/75555 [08:44<07:46, 76.20 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  66%|██████▌   | 50000/75555 [10:52<05:32, 76.77 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  79%|███████▉  | 60000/75555 [13:02<03:22, 76.96 docs/s]

Batches:   0%|          | 0/313 [00:00<?, ?it/s]

Documents Processed:  93%|█████████▎| 70000/75555 [15:11<01:11, 77.17 docs/s]

Batches:   0%|          | 0/174 [00:00<?, ?it/s]

Documents Processed: 80000 docs [16:22, 81.42 docs/s]


Add the generator here; do NOT continue with the Reader.

## Initializing the Reader

Similar to previous tutorials, we now initialize our Reader.

Here we use a FARMReader with the [*deepset/roberta-base-squad2*](https://huggingface.co/deepset/roberta-base-squad2) model.

In [None]:
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


## Initializing the Pipeline

With a Haystack `Pipeline`, you can combine your building blocks into a search pipeline.
Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.
To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions.
You can learn more about `Pipelines` in the [docs](https://docs.haystack.deepset.ai/docs/pipelines).

In [None]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)

## Asking a Question

We use the pipeline's `run()` method to ask a question. With `run()`, you can configure how many candidates the Retriever and the Reader return. The higher the Retriever's `top_k`, the better (but also the slower) your answers.

In [None]:
prediction = pipe.run(
    query="me hacen bullying en el colegio, me dicen mariquita, ¿qué puedo hacer?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  3.44 Batches/s]


In [None]:
from haystack.utils import print_answers


print_answers(prediction, details="minimum")

'Query: me hacen bullying en el colegio, me dicen mariquita, ¿qué puedo hacer?'
'Answers:'
[   {   'answer': 'agresión',
        'context': 'urisprudencia constitucional[94], el acoso o matoneo '
                   '\x96bullying\x96 es una\n'
                   'agresión que se caracteriza por ser: \x93(i) intencional, '
                   '(ii) representa un deseq'},
    {   'answer': '\x93(i)\nintencional',
        'context': 'o o\n'
                   'matoneo \x96bullying\x96 es una agresión que se '
                   'caracteriza por ser: \x93(i)\n'
                   'intencional, (ii) representa un desequilibrio de poder '
                   'entre el agresor\n'
                   '(indiv'},
    {   'answer': 'referencia al acoso digital o maltrato en las redes '
                  'sociales. La Corte ha\n'
                  'advertido que este fenómeno \x93consiste en el uso de '
                  'nuevas tecnologías de la información y\n'
                  'las comunicaciones 
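When the printed output gets long, a compact summary can be easier to scan. The helper below is a small sketch that pulls the answer text, score and source file name out of a prediction. `summarize_answers` is a hypothetical helper, and the stand-in objects only mimic the `answer`, `score` and `meta` fields visible in this notebook's output; in the notebook you would pass the real `prediction` dictionary instead.

```python
from types import SimpleNamespace

def summarize_answers(prediction, max_answers=5):
    """Return (answer text, rounded score, source file name) for each answer."""
    rows = []
    for ans in prediction["answers"][:max_answers]:
        rows.append((ans.answer, round(ans.score, 3), ans.meta.get("name")))
    return rows

# Stand-in objects mimicking the fields of the first two answers in this notebook.
demo_prediction = {
    "answers": [
        SimpleNamespace(answer="agresión", score=0.12982498109340668, meta={"name": "T-453-22.txt"}),
        SimpleNamespace(answer="\x93(i)\nintencional", score=0.11052682995796204, meta={"name": "T-453-22.txt"}),
    ]
}

rows = summarize_answers(demo_prediction)
for answer, score, source in rows:
    print(f"{score:.3f}  {source}  {answer!r}")
```

Seeing the score next to each answer also makes it easier to judge how confident the Reader was, which `details="minimum"` hides.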

In [None]:
from pprint import pprint

pprint(prediction)

{'answers': [<Answer {'answer': 'agresión', 'type': 'extractive', 'score': 0.12982498109340668, 'context': 'urisprudencia constitucional[94], el acoso o matoneo \x96bullying\x96 es una\nagresión que se caracteriza por ser: \x93(i) intencional, (ii) representa un deseq', 'offsets_in_document': [{'start': 190, 'end': 198}], 'offsets_in_context': [{'start': 71, 'end': 79}], 'document_ids': ['e7ee662651786a55f7b9ce50c290b1ee'], 'meta': {'name': 'T-453-22.txt', '_split_id': 74, 'vector_id': '67832'}}>,
             <Answer {'answer': '\x93(i)\nintencional', 'type': 'extractive', 'score': 0.11052682995796204, 'context': 'o o\nmatoneo \x96bullying\x96 es una agresión que se caracteriza por ser: \x93(i)\nintencional, (ii) representa un desequilibrio de poder entre el agresor\n(indiv', 'offsets_in_document': [{'start': 144, 'end': 160}], 'offsets_in_context': [{'start': 67, 'end': 83}], 'document_ids': ['fadd0cc0c4bf409998caf5ebb1b606f9'], 'meta': {'name': 'T-453-22.txt', '_split_id': 4, 'vecto

In [None]:
prediction = pipe.run(
    query="cuándo se tutela el derecho al trabajo con reintegro al puesto de trabajo?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  4.22 Batches/s]


In [None]:
from haystack.utils import print_answers


print_answers(prediction, details="minimum")

('Query: cuándo se tutela el derecho al trabajo con reintegro al puesto de '
 'trabajo?')
'Answers:'
[   {   'answer': 'en su artículo 6',
        'context': 'Al respecto, en su artículo 6\n'
                   'establece que los Estados Parte reconocen el derecho a '
                   'trabajar. Este comprende\n'
                   'el derecho de toda persona a tener la op'},
    {   'answer': 'el juez',
        'context': ' de una persona en condición de discapacidad. En\n'
                   'consecuencia, pide que el juez de tutela: (i) reintegrar '
                   'al trabajador en el\n'
                   'puesto de trabajo que de'},
    {   'answer': 'Los\nEstados Partes',
        'context': 'Los\n'
                   'Estados Partes en el presente Pacto reconocen el derecho a '
                   'trabajar, que\n'
                   'comprende el derecho de toda persona a tener la '
                   'oportunidad de ganarse la'},
    {   'answer': 'diferencias en el trato 

In [None]:
from pprint import pprint

pprint(prediction)

{'answers': [<Answer {'answer': 'en su artículo 6', 'type': 'extractive', 'score': 0.2898719310760498, 'context': 'Al respecto, en su artículo 6\nestablece que los Estados Parte reconocen el derecho a trabajar. Este comprende\nel derecho de toda persona a tener la op', 'offsets_in_document': [{'start': 13, 'end': 29}], 'offsets_in_context': [{'start': 13, 'end': 29}], 'document_ids': ['df89733b45de9e79255d97848a59d32c'], 'meta': {'name': 'T-293-22.txt', '_split_id': 37, 'vector_id': '65216'}}>,
             <Answer {'answer': 'el juez', 'type': 'extractive', 'score': 0.23388442397117615, 'context': ' de una persona en condición de discapacidad. En\nconsecuencia, pide que el juez de tutela: (i) reintegrar al trabajador en el\npuesto de trabajo que de', 'offsets_in_document': [{'start': 297, 'end': 304}], 'offsets_in_context': [{'start': 72, 'end': 79}], 'document_ids': ['38f99094c00734452bd877214790f8e4'], 'meta': {'name': 'T-425-22.txt', '_split_id': 163, 'vector_id': '12927'}}>,
     