# RAG and Semantic Retrieval on a Single Document

Deep Search allows users to interact with documents using conversational AI: you interact with a virtual assistant that answers your questions using the information in the document.

In this example we demonstrate how to achieve the same interaction programmatically.

### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io) if you are interested in exploring
these Deep Search capabilities.


### GenAI Integration required

When interacting with the virtual assistant, Deep Search requires a connection to a Generative AI API. Currently, we support connections to [watsonx.ai](https://www.ibm.com/products/watsonx-ai) or the IBM-internal GenAI platform BAM.

Deep Search allows custom GenAI configurations for each project.
To run the following example, you need to work in a project that has such GenAI capabilities activated.

### Set notebook parameters


In [1]:
from dsnotebooks.settings import DocQANotebookSettings

# notebooks settings auto-loaded from .env / env vars
notebook_settings = DocQANotebookSettings()

PROFILE_NAME = notebook_settings.profile     # the profile to use
PROJ_KEY = notebook_settings.proj_key        # the project to use

# index and doc for doc QA from semantically indexed collection
SEM_ON_IDX_KEY = notebook_settings.sem_on_idx_key
SEM_ON_IDX_DOC_HASH = notebook_settings.sem_on_idx_doc_hash

# index and doc for doc QA from not semantically indexed collection
SEM_OFF_IDX_KEY = notebook_settings.sem_off_idx_key
SEM_OFF_IDX_DOC_HASH = notebook_settings.sem_off_idx_doc_hash

SKIP_INGESTED_DOCS = notebook_settings.skip_ingested_docs  # whether to skip any already semantically ingested docs

RETR_K = notebook_settings.retr_k            # the number of search results to retrieve
TEXT_WEIGHT = notebook_settings.text_weight  # the weight of lexical search (0.0: semantic-only, 1.0: lexical-only, anything in between: hybrid search)
RERANK = notebook_settings.rerank            # whether to rerank the search results
RAISE = notebook_settings.raise_on_sem_err   # whether semantic operation errors should raise an exception or be reflected in response fields
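
To build intuition for `TEXT_WEIGHT`, the sketch below blends a lexical and a semantic relevance score with a simple linear weight. This is only an illustration of the `0.0` / `1.0` / in-between semantics described in the comment above; the function `hybrid_score` and the linear formula are assumptions, not the scoring that Deep Search actually applies.

```python
def hybrid_score(lexical: float, semantic: float, text_weight: float) -> float:
    """Blend two relevance scores linearly (illustrative assumption only)."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be within [0.0, 1.0]")
    # text_weight=0.0 keeps only the semantic score,
    # text_weight=1.0 keeps only the lexical score.
    return text_weight * lexical + (1.0 - text_weight) * semantic


# Hypothetical scores for a single retrieved item
print(hybrid_score(lexical=0.2, semantic=0.8, text_weight=0.0))  # semantic-only
print(hybrid_score(lexical=0.2, semantic=0.8, text_weight=1.0))  # lexical-only
print(hybrid_score(lexical=0.2, semantic=0.8, text_weight=0.5))  # hybrid
```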

### Import example dependencies

In [2]:
# Import standard dependencies
import rich

# IPython utilities
from IPython.display import display, Markdown

# Import the deepsearch-toolkit
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.elastic import ElasticProjectDataCollectionSource
from deepsearch.cps.queries import RAGQuery, SemanticQuery
from deepsearch.cps.queries.results import RAGResult, SearchResult, SearchResultItem


### Connect to Deep Search

In [3]:
api = CpsApi.from_env(profile_name=PROFILE_NAME)


### Utils

In [4]:
def render_provenance_url(
    api: CpsApi,
    coords: ElasticProjectDataCollectionSource,
    retr_item: SearchResultItem,
):
    ## compute URL to the document in the Deep Search UI
    item_index = int(retr_item.main_path[retr_item.main_path.rfind(".")+1:])
    doc_url = api.documents.generate_url(
        document_hash=retr_item.doc_hash,
        data_source=coords,
        item_index=item_index,
    )
    display(Markdown(f"The provenance of the answer can be inspected on the [source document]({doc_url})."))
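
The helper above derives `item_index` from the text after the last `.` in `main_path`. The standalone sketch below isolates that parsing step so it can be tried without a Deep Search connection. The function name `item_index_from_path` and the example value `"main-text.70"` are hypothetical; the real format of `main_path` may differ.

```python
def item_index_from_path(main_path: str) -> int:
    """Return the integer that follows the last '.' in a path string."""
    # rpartition splits on the last '.', mirroring the rfind-based slice above
    _, _, tail = main_path.rpartition(".")
    return int(tail)


example_path = "main-text.70"  # hypothetical path value, for illustration only
print(item_index_from_path(example_path))  # → 70
```

If the trailing segment is not numeric, `int()` raises a `ValueError`, which is also what the slicing expression in `render_provenance_url` would do.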

---

## QA on document within semantically indexed collection

### Prepare data source

In [5]:
from deepsearch.cps.client.components.documents import PrivateDataDocumentSource

coords = ElasticProjectDataCollectionSource(
    proj_key=PROJ_KEY,
    index_key=SEM_ON_IDX_KEY,
)
data_source = PrivateDataDocumentSource(
    source=coords,
    document_hash=SEM_ON_IDX_DOC_HASH,
)

### RAG

If the document is part of a semantically indexed collection (see [Document Collection QA](https://github.com/DS4SD/deepsearch-examples/tree/main/examples/qa_doc_collection) for details),
we can directly do RAG on it as shown below:

In [6]:
question = "Where was the first European IBM research lab located?"

# submit natural-language query on document
question_query = RAGQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional retrieval params
    retr_k=RETR_K,
)
api_output = api.queries.run(question_query)
rag_result = RAGResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(rag_result)

Additionally, we can generate a provenance URL to the document in the Deep Search UI:

In [7]:
render_provenance_url(api=api, coords=coords, retr_item=rag_result.answers[0].grounding.retr_items[0])

The provenance of the answer can be inspected on the [source document](https://cps.foc-deepsearch.zurich.ibm.com/projects/e0ea87922f4b732407fb3b9cf3475f0edb90cc2d/library/private/d70f151acff22f19f9cfaffb1f5baa810c8de3db?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjJkNzBmMTUxYWNmZjIyZjE5ZjljZmFmZmIxZjViYWE4MTBjOGRlM2RiJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyYjMwYmM2NjdhMzI0YWUxMTFkMDI1NTI2NTYzYjY3NGE4ZDNmZDg2OWJjMDdjOGZkMjA0YWE5NWIwNWQ0MWYwYyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMmIzMGJjNjY3YTMyNGFlMTExZDAyNTUyNjU2M2I2NzRhOGQzZmQ4NjliYzA3YzhmZDIwNGFhOTViMDVkNDFmMGMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0E3MCU3RCU3RA%3D%3D).
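
The call above indexes `answers[0]` and `retr_items[0]` directly, which would raise an `IndexError` if a query came back without answers or grounding items. A minimal defensive sketch follows. `first_retr_item` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `answers` / `grounding` / `retr_items` attribute chain used in this notebook, not the real `RAGResult` class.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_retr_item(rag_result: Any) -> Optional[Any]:
    """Return the first retrieved item of the first grounded answer, or None."""
    for answer in rag_result.answers:
        retr_items = answer.grounding.retr_items
        if retr_items:
            return retr_items[0]
    return None


# Stand-in objects for illustration; a real result comes from RAGResult.from_api_output
item = SimpleNamespace(doc_hash="abc", main_path="main-text.70")
grounded = SimpleNamespace(
    answers=[SimpleNamespace(grounding=SimpleNamespace(retr_items=[item]))]
)
empty = SimpleNamespace(answers=[])

print(first_retr_item(grounded))  # the stand-in item
print(first_retr_item(empty))     # None
```

With such a helper, `render_provenance_url` would only be called when the returned item is not `None`.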

Let us try out a different question on our document.
Here we also include various additional parameters, some of them commented out, that the user can optionally set:
- `retr_k`: number of items to retrieve
- `text_weight`: weight of lexical search (`0.0`: fully semantic search, `1.0`: fully lexical search, anything in-between: hybrid search)
- `rerank`: whether to rerank the retrieval results
- `gen_ctx_extr_method` (`Literal["window", "page"]`, optional): method for extracting the generation context from the document; defaults to `"window"`
- `gen_ctx_window_size` (`int`, optional): (relevant only if `gen_ctx_extr_method` is `"window"`) maximum number of characters to use for the extracted generation context (the actual extraction is quantized at the document-item level); defaults to `5000`
- `gen_ctx_window_lead_weight` (`float`, optional): (relevant only if `gen_ctx_extr_method` is `"window"`) weight of the leading text when distributing the window size that remains after extracting the `main_path`; defaults to `0.5` (centered around `main_path`)
- `return_prompt` (`bool`, optional): whether to return the instantiated prompt; defaults to `False`

For more details refer to `deepsearch.cps.queries.RAGQuery`.
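
One way to read the two window parameters listed above is as simple character-budget arithmetic: whatever remains of `gen_ctx_window_size` after the `main_path` item is split between leading and trailing text according to `gen_ctx_window_lead_weight`. The sketch below shows that arithmetic only. `split_window_budget` is a hypothetical function, and the real extraction is quantized at the document-item level, so actual context sizes may differ.

```python
from typing import Tuple


def split_window_budget(
    window_size: int, main_len: int, lead_weight: float = 0.5
) -> Tuple[int, int]:
    """Split the characters left after the main item into (lead, trail) budgets."""
    if not 0.0 <= lead_weight <= 1.0:
        raise ValueError("lead_weight must be within [0.0, 1.0]")
    remaining = max(window_size - main_len, 0)
    lead = int(remaining * lead_weight)  # budget for text before the main item
    trail = remaining - lead             # budget for text after the main item
    return lead, trail


# Default window of 5000 chars with a hypothetical 1000-char main item
print(split_window_budget(5000, 1000))        # → (2000, 2000)
print(split_window_budget(5000, 1000, 0.25))  # → (1000, 3000)
```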

In [8]:
question = "How many research labs does IBM have?"

# submit natural-language query on document
question_query = RAGQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional retrieval params
    retr_k=RETR_K,
    # text_weight=TEXT_WEIGHT,
    rerank=RERANK,

    ## optional generation params
    # model_id="ibm-mistralai/mixtral-8x7b-instruct-v01-q",
    # gen_params={"random_seed": 42, "max_new_tokens": 1024},
    # prompt_template="Answer the query based on the context.\n\nContext: {{ context }}\n\nQuery: {{ query }}",

    # gen_ctx_extr_method="window",
    # gen_ctx_window_size=5000,
    # gen_ctx_window_lead_weight=0.5,
    # return_prompt=True,
)
api_output = api.queries.run(question_query)
rag_result = RAGResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(rag_result)

In [9]:
render_provenance_url(api=api, coords=coords, retr_item=rag_result.answers[0].grounding.retr_items[0])

The provenance of the answer can be inspected on the [source document](https://cps.foc-deepsearch.zurich.ibm.com/projects/e0ea87922f4b732407fb3b9cf3475f0edb90cc2d/library/private/d70f151acff22f19f9cfaffb1f5baa810c8de3db?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjJkNzBmMTUxYWNmZjIyZjE5ZjljZmFmZmIxZjViYWE4MTBjOGRlM2RiJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyYjMwYmM2NjdhMzI0YWUxMTFkMDI1NTI2NTYzYjY3NGE4ZDNmZDg2OWJjMDdjOGZkMjA0YWE5NWIwNWQ0MWYwYyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMmIzMGJjNjY3YTMyNGFlMTExZDAyNTUyNjU2M2I2NzRhOGQzZmQ4NjliYzA3YzhmZDIwNGFhOTViMDVkNDFmMGMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0EzJTdEJTdE).

### Semantic retrieval

Besides RAG, which includes natural language generation, a user may only be interested in
the semantic retrieval part.

This can be obtained very similarly to RAG, as shown below:

In [10]:
question = "Where was the first European IBM lab located?"

# submit natural-language query on document
question_query = SemanticQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional params
    retr_k=RETR_K,
    # text_weight=TEXT_WEIGHT,
    # rerank=RERANK,
)
api_output = api.queries.run(question_query)
search_result = SearchResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(search_result)

---

## QA on document not in semantically indexed collection

### Prepare source

In [11]:
coords = ElasticProjectDataCollectionSource(
    proj_key=PROJ_KEY,
    index_key=SEM_OFF_IDX_KEY,
)
data_source = PrivateDataDocumentSource(
    source=coords,
    document_hash=SEM_OFF_IDX_DOC_HASH,
)

### Ingestion

In the cell below we show how to semantically index a single document:

In [12]:
# launch the ingestion of the document for DocumentQA
task = api.documents.semantic_ingest(
    project=PROJ_KEY,
    data_source=data_source,
    skip_ingested_docs=False,  # forcing re-indexing for the purpose of this example
)

# wait for the ingestion task to finish
api.tasks.wait_for(PROJ_KEY, task.task_id)

{'ing_out': {}}

Once the document has been semantically ingested, we can run both RAG and semantic retrieval queries against it, as shown below.

### RAG

In [13]:
question = "Which company created the first game console?"

# submit natural-language query on document
question_query = RAGQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional retrieval params
    retr_k=4,
    # text_weight=TEXT_WEIGHT,
    rerank=RERANK,

    ## optional generation params
    # model_id="ibm-mistralai/mixtral-8x7b-instruct-v01-q",
    # gen_params={"random_seed": 42, "max_new_tokens": 1024},
    # prompt_template="Answer the query based on the context.\n\nContext: {{ context }}\n\nQuery: {{ query }}",
)
api_output = api.queries.run(question_query)
rag_result = RAGResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(rag_result)


In [14]:
render_provenance_url(api=api, coords=coords, retr_item=rag_result.answers[0].grounding.retr_items[0])

The provenance of the answer can be inspected on the [source document](https://cps.foc-deepsearch.zurich.ibm.com/projects/e0ea87922f4b732407fb3b9cf3475f0edb90cc2d/library/private/d0c6b811fa75fb8154bf7659162089a42fdd0895?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjJkMGM2YjgxMWZhNzVmYjgxNTRiZjc2NTkxNjIwODlhNDJmZGQwODk1JTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyMDI5MjEwZGY5MjljNzhlNzBkNzRlNmYxNDFhNDZkODMyNjkwNWNlNTg1NjJmMjA4MTgxOWM4MGMzOTIxZDVhMyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMjAyOTIxMGRmOTI5Yzc4ZTcwZDc0ZTZmMTQxYTQ2ZDgzMjY5MDVjZTU4NTYyZjIwODE4MTljODBjMzkyMWQ1YTMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0E5JTdEJTdE).

### Semantic retrieval

In [15]:
question = "Which company created the first game console?"

# submit natural-language query on document
question_query = SemanticQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional params
    retr_k=4,
    # text_weight=TEXT_WEIGHT,
    rerank=RERANK,
)
api_output = api.queries.run(question_query)
search_result = SearchResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(search_result)

---

## QA on document from public collection

### Prepare source

In [16]:
from deepsearch.cps.client.components.elastic import ElasticDataCollectionSource
from deepsearch.cps.client.components.documents import PublicDataDocumentSource

index_key = "acl"
document_hash = "0002e4fc1cef1c98b411c75e484db0a3d32f6bc1b4058e2985e5f377721761fb"

coords = ElasticDataCollectionSource(
    elastic_id="default",
    index_key=index_key,
)
data_source = PublicDataDocumentSource(
    source=coords,
    document_hash=document_hash,
)

### RAG

In [17]:
question = "How many goals can a player achieve in the MP game?"

# submit natural-language query on document
question_query = RAGQuery(
    question=question,
    project=PROJ_KEY,
    data_source=data_source,

    ## optional retrieval params
    retr_k=RETR_K,
)
api_output = api.queries.run(question_query)
rag_result = RAGResult.from_api_output(api_output, raise_on_error=RAISE)

rich.print(rag_result)