# RAG: QA on a Single Document

Deep Search allows users to interact with documents using conversational AI, i.e. through a virtual assistant that answers questions using the information in the document.

In this example, we demonstrate how to achieve the same interaction programmatically.

### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io) if you are interested in exploring
these Deep Search capabilities.


### GenAI Integration required

When interacting with the virtual assistant, Deep Search requires a connection to a Generative AI API. Currently, we support connections to [watsonx.ai](https://www.ibm.com/products/watsonx-ai) or the IBM-internal GenAI platform BAM.

Deep Search allows a custom GenAI configuration for each project.
The following example therefore requires a project in which these GenAI capabilities are activated.

### Set notebook parameters


```python
from dsnotebooks.settings import DocQANotebookSettings

# notebook settings auto-loaded from .env / env vars
notebook_settings = DocQANotebookSettings()

PROFILE_NAME = notebook_settings.profile            # the profile to use
PROJ_KEY = notebook_settings.proj_key               # the project to use

# index key and document hash for QA on a document in a semantically indexed collection
SEM_ON_IDX_KEY = notebook_settings.sem_on_idx_key
SEM_ON_IDX_DOC_HASH = notebook_settings.sem_on_idx_doc_hash

# index key and document hash for QA on a document in a collection that is not semantically indexed
SEM_OFF_IDX_KEY = notebook_settings.sem_off_idx_key
SEM_OFF_IDX_DOC_HASH = notebook_settings.sem_off_idx_doc_hash
```
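
`DocQANotebookSettings` picks up its values automatically from a `.env` file or from environment variables. The sketch below is a minimal, hypothetical stand-in for that behaviour, useful for understanding what has to be configured before running the notebook. The `DS_NB_` prefix and the resulting variable names are assumptions, as is the precedence rule (real environment variables override `.env` entries, which is the convention used by pydantic-settings); check the `dsnotebooks` package for the actual names.

```python
import os
from dataclasses import dataclass, fields
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # drop optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip("\"'")
    return values


@dataclass
class NotebookSettingsSketch:
    """Hypothetical stand-in for DocQANotebookSettings; field names mirror the notebook."""
    profile: str
    proj_key: str
    sem_on_idx_key: str
    sem_on_idx_doc_hash: str
    sem_off_idx_key: str
    sem_off_idx_doc_hash: str

    @classmethod
    def load(cls, env_file: Path = Path(".env"), prefix: str = "DS_NB_") -> "NotebookSettingsSketch":
        merged = read_dotenv(env_file)
        # Assumed precedence: real environment variables override .env entries
        merged.update({k: v for k, v in os.environ.items() if k.startswith(prefix)})
        values, missing = {}, []
        for field_ in fields(cls):
            key = prefix + field_.name.upper()  # e.g. DS_NB_PROJ_KEY (hypothetical name)
            if key in merged:
                values[field_.name] = merged[key]
            else:
                missing.append(key)
        if missing:
            raise KeyError(f"missing settings: {', '.join(missing)}")
        return cls(**values)
```

Failing fast with the list of missing keys makes a misconfigured `.env` obvious before any call to Deep Search is attempted.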

### Import example dependencies

```python
# Import standard dependencies
import rich

# IPython utilities
from IPython.display import display, Markdown

# Import the deepsearch-toolkit
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.elastic import ElasticProjectDataCollectionSource
from deepsearch.cps.queries import DocumentQuestionQuery
```


### Connect to Deep Search

```python
api = CpsApi.from_env(profile_name=PROFILE_NAME)
```


### Utils

```python
from deepsearch.cps.client.components.queries import RunQueryResult

def display_qa_result(
        api: CpsApi,
        coords: ElasticProjectDataCollectionSource,
        question: str,
        qa_res: RunQueryResult,
):
    """Render a QA result, linking the first provenance entry in the Deep Search UI."""
    top_source = qa_res.outputs["provenance"][0]

    # compute URL to the cited passage of the document in the Deep Search UI
    doc_url = api.documents.generate_url(
        document_hash=top_source["doc_hash"],
        data_source=coords,
        item_index=top_source["pos_in_doc"],
    )
    display(Markdown(f"Question: {question}"))
    display(Markdown(f'Answer: {qa_res.outputs["answer"]}'))
    display(Markdown(f"The provenance of the answer can be inspected on the [source document]({doc_url})."))
    display(Markdown("Details:"))
    rich.print(qa_res.outputs)
```
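
`display_qa_result` reads only the first entry of `qa_res.outputs["provenance"]`, so it raises an `IndexError` if that list is ever empty (whether the service can return an answer without provenance is an assumption). A defensive variant of the extraction step could look like the sketch below; `first_provenance` and `summarize` are hypothetical helpers that operate on the plain `outputs` dict, using the same keys (`answer`, `provenance`, `doc_hash`, `pos_in_doc`) the helper above relies on.

```python
from typing import Any, Optional


def first_provenance(outputs: dict[str, Any]) -> Optional[tuple[str, int]]:
    """Return (doc_hash, pos_in_doc) of the first provenance entry, or None if absent."""
    provenance = outputs.get("provenance") or []
    if not provenance:
        return None
    entry = provenance[0]
    return entry["doc_hash"], int(entry["pos_in_doc"])


def summarize(outputs: dict[str, Any]) -> str:
    """One-line summary of a QA result that never fails on missing provenance."""
    answer = outputs.get("answer", "<no answer>")
    prov = first_provenance(outputs)
    if prov is None:
        return f"Answer: {answer} (no provenance returned)"
    doc_hash, pos = prov
    # shorten the hash for display; the full value is still in outputs
    return f"Answer: {answer} (document {doc_hash[:8]}..., item {pos})"
```

In the notebook this would be called as `summarize(qa_res.outputs)`, and `generate_url` would only be invoked when `first_provenance` returns a value.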


---

## QA on document within semantically indexed collection

If the document is part of a semantically indexed collection (see doc_collection_qa.ipynb), we can run QA on it directly:

```python
# prepare collection coordinates
coll_coords = ElasticProjectDataCollectionSource(
    proj_key=PROJ_KEY,
    index_key=SEM_ON_IDX_KEY,
)

question = "Where was the first European IBM lab located?"

# submit natural-language query on document
question_query = DocumentQuestionQuery(
    question=question,
    project=PROJ_KEY,
    index_key=SEM_ON_IDX_KEY,
    document_hash=SEM_ON_IDX_DOC_HASH,
)
question_results = api.queries.run(question_query)

display_qa_result(api=api, coords=coll_coords, question=question, qa_res=question_results)
```


Question: Where was the first European IBM lab located?

Answer: Adliswil, Switzerland

The provenance of the answer can be inspected on the [source document](https://sds.app.accelerate.science/projects/b09ae7561a01dc7c4b0fd21a43bfd93d140766d1/library/private/6b70072911ad2794a3844dd44d1705a5ba37ca0b?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjI2YjcwMDcyOTExYWQyNzk0YTM4NDRkZDQ0ZDE3MDVhNWJhMzdjYTBiJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyYjMwYmM2NjdhMzI0YWUxMTFkMDI1NTI2NTYzYjY3NGE4ZDNmZDg2OWJjMDdjOGZkMjA0YWE5NWIwNWQ0MWYwYyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMmIzMGJjNjY3YTMyNGFlMTExZDAyNTUyNjU2M2I2NzRhOGQzZmQ4NjliYzA3YzhmZDIwNGFhOTViMDVkNDFmMGMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0E3MSU3RCU3RA%3D%3D).

Details:
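
The `search` query parameter in the provenance link above is not opaque: judging from the link, it holds a JSON object that was percent-encoded and then Base64-encoded. This is an observation about these particular links rather than a documented format, so treat it as an assumption that may change. The hypothetical helpers below reverse (and, for checking, reproduce) the two encodings with the standard library only; in the decoded object, `recordSelection.itemIndex` appears to carry the `item_index` (taken from `pos_in_doc`) that `display_qa_result` passes to `generate_url`.

```python
import base64
import json
from urllib.parse import quote, unquote, urlparse


def decode_search_param(url: str) -> dict:
    """Decode the `search` parameter of a Deep Search UI link (assumed base64 of percent-encoded JSON)."""
    for part in urlparse(url).query.split("&"):
        key, _, value = part.partition("=")
        if key == "search":
            # unquote (not unquote_plus) so that any '+' in the Base64 text survives
            b64_text = unquote(value)
            percent_json = base64.b64decode(b64_text).decode("ascii")
            return json.loads(unquote(percent_json))
    raise KeyError("no 'search' parameter in URL")


def encode_search_param(payload: dict) -> str:
    """Inverse of decode_search_param, handy for checking the assumed format."""
    percent_json = quote(json.dumps(payload, separators=(",", ":")), safe="")
    return quote(base64.b64encode(percent_json.encode("ascii")).decode("ascii"), safe="")
```

For example, `decode_search_param(doc_url)["recordSelection"]["itemIndex"]` would return the item index encoded in a link produced by `generate_url`.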

## QA on document not in semantically indexed collection

### Ingestion

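If the document belongs to a collection that is not semantically indexed, it must first be semantically ingested. The cell below shows how to semantically index a single document: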

```python
from deepsearch.cps.client.components.documents import SemIngestPrivateDataDocumentSource

# prepare collection coordinates
coords = ElasticProjectDataCollectionSource(
    proj_key=PROJ_KEY,
    index_key=SEM_OFF_IDX_KEY,
)

# launch the ingestion of the document for DocumentQA
task = api.documents.semantic_ingest(
    project=PROJ_KEY,
    data_source=SemIngestPrivateDataDocumentSource(
        source=coords,
        document_hash=SEM_OFF_IDX_DOC_HASH,
    ),
)

# wait for the ingestion task to finish
api.tasks.wait_for(task.proj_key, task.task_id)
```

{'ing_out': {}}
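
`api.tasks.wait_for(task.proj_key, task.task_id)` blocks until the ingestion task completes; the `{'ing_out': {}}` shown above is its return value. Conceptually, waiting on a remote task is a polling loop, and the sketch below shows that general pattern with a timeout and capped exponential backoff. It is not the toolkit's implementation: `get_status` is a hypothetical callable, and the terminal status strings `"SUCCESS"` and `"FAILURE"` are assumptions.

```python
import time
from typing import Callable


class TaskTimeout(Exception):
    """Raised when a task has not reached a terminal state before the deadline."""


def wait_for_task(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in {"SUCCESS", "FAILURE"}:  # assumed terminal states
            return status
        if time.monotonic() + delay > deadline:
            raise TaskTimeout(f"task still {status!r} after {timeout_s} s")
        sleep(delay)
        # back off so long-running ingestions do not flood the API with requests
        delay = min(delay * 2, max_delay_s)
```

Injecting `sleep` keeps the loop testable without real waiting; in the notebook itself, the toolkit call above already handles this.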

### RAG

```python
question = "Which company created the first game console?"

# submit natural-language query on document
question_query = DocumentQuestionQuery(
    question=question,
    project=PROJ_KEY,
    document_hash=SEM_OFF_IDX_DOC_HASH,
)
question_results = api.queries.run(question_query)

display_qa_result(api=api, coords=coords, question=question, qa_res=question_results)
```

Question: Which company created the first game console?

Answer: Magnavox

The provenance of the answer can be inspected on the [source document](https://sds.app.accelerate.science/projects/b09ae7561a01dc7c4b0fd21a43bfd93d140766d1/library/private/b4edbe66a8b8fe2ebed7e20d4d7b9335c48b45b0?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjJiNGVkYmU2NmE4YjhmZTJlYmVkN2UyMGQ0ZDdiOTMzNWM0OGI0NWIwJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyMDI5MjEwZGY5MjljNzhlNzBkNzRlNmYxNDFhNDZkODMyNjkwNWNlNTg1NjJmMjA4MTgxOWM4MGMzOTIxZDVhMyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMjAyOTIxMGRmOTI5Yzc4ZTcwZDc0ZTZmMTQxYTQ2ZDgzMjY5MDVjZTU4NTYyZjIwODE4MTljODBjMzkyMWQ1YTMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0E5JTdEJTdE).

Details:
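
Retrieval-augmented generation (RAG) answers a question in two steps: retrieve the passages of the document most relevant to the question, then ask the generative model to answer using only those passages. The retrieved positions are presumably what the `provenance` entries (`doc_hash`, `pos_in_doc`) point back to. The toy sketch below illustrates the idea with bag-of-words cosine similarity standing in for the semantic embeddings a real system would use; the scoring, the value of `k`, and the prompt wording are all assumptions, not Deep Search's implementation.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Return (position, score) for the k passages most similar to the question."""
    q = _bag_of_words(question)
    scored = [(i, _cosine(q, _bag_of_words(p))) for i, p in enumerate(passages)]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep document order
    return scored[:k]


def build_prompt(question: str, passages: list[str], hits: list[tuple[int, float]]) -> str:
    """Assemble a grounded prompt from the retrieved passages (illustrative wording)."""
    context = "\n".join(f"[{i}] {passages[i]}" for i, _ in hits)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the passage position next to each retrieved snippet is what makes it possible to report provenance alongside the generated answer.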

We can ask an additional question on the same document:

```python
question = "Which company bought Magnavox and when?"

# submit natural-language query on document
question_query = DocumentQuestionQuery(
    question=question,
    project=PROJ_KEY,
    document_hash=SEM_OFF_IDX_DOC_HASH,
)
question_results = api.queries.run(question_query)

display_qa_result(api=api, coords=coords, question=question, qa_res=question_results)
```

Question: Which company bought Magnavox and when?

Answer: Since 1975, Magnavox has been a subsidiary of the Dutch electronics corporation Phillips.

The provenance of the answer can be inspected on the [source document](https://sds.app.accelerate.science/projects/b09ae7561a01dc7c4b0fd21a43bfd93d140766d1/library/private/b4edbe66a8b8fe2ebed7e20d4d7b9335c48b45b0?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjJiNGVkYmU2NmE4YjhmZTJlYmVkN2UyMGQ0ZDdiOTMzNWM0OGI0NWIwJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyMDI5MjEwZGY5MjljNzhlNzBkNzRlNmYxNDFhNDZkODMyNjkwNWNlNTg1NjJmMjA4MTgxOWM4MGMzOTIxZDVhMyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMjAyOTIxMGRmOTI5Yzc4ZTcwZDc0ZTZmMTQxYTQ2ZDgzMjY5MDVjZTU4NTYyZjIwODE4MTljODBjMzkyMWQ1YTMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0ExJTdEJTdE).

Details:
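
The two RAG cells differ only in the question string, so with several questions for the same document it can be convenient to loop over them and collect the answers in one table. The sketch below keeps the Deep Search call behind a callable so the loop has no toolkit dependency; `ask_many` and `to_markdown_table` are hypothetical helpers, and the commented wiring assumes the same `DocumentQuestionQuery` arguments used in the cells above.

```python
from typing import Any, Callable


def ask_many(questions: list[str], ask: Callable[[str], dict[str, Any]]) -> list[dict[str, Any]]:
    """Run each question through `ask` and collect the answer and first provenance position."""
    rows = []
    for question in questions:
        outputs = ask(question)
        provenance = outputs.get("provenance") or [{}]
        rows.append({
            "question": question,
            "answer": outputs.get("answer"),
            "pos_in_doc": provenance[0].get("pos_in_doc"),
        })
    return rows


def to_markdown_table(rows: list[dict[str, Any]]) -> str:
    """Render question/answer pairs as a Markdown table, escaping pipe characters."""
    lines = ["| Question | Answer |", "| --- | --- |"]
    for row in rows:
        q = row["question"].replace("|", "\\|")
        a = str(row["answer"]).replace("|", "\\|")
        lines.append(f"| {q} | {a} |")
    return "\n".join(lines)


# Possible wiring inside this notebook (not executed here):
# def ask(q):
#     query = DocumentQuestionQuery(question=q, project=PROJ_KEY, document_hash=SEM_OFF_IDX_DOC_HASH)
#     return api.queries.run(query).outputs
# display(Markdown(to_markdown_table(ask_many([...], ask))))
```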