# QA Deep Dive

In this QA Deep Dive notebook, we take a closer look at semantic ingestion, RAG, and retrieval, present the available customization options, and provide hints to help you get the most out of your QA application.

To get started with basic QA usage, check out [QA Quick Start](./qa_quick_start.ipynb).

### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io) if you are interested in exploring
these Deep Search capabilities.


### GenAI Integration required

When interacting with the virtual assistant, Deep Search requires a connection to a Generative AI API. Currently, we support connections to [watsonx.ai](https://www.ibm.com/products/watsonx-ai) or the IBM-internal GenAI platform BAM.

Deep Search allows custom GenAI configurations for each project.
The following examples require a project that has such GenAI capabilities activated.

### Set notebook parameters


In [1]:
import os
from dotenv import load_dotenv
from pydantic import TypeAdapter

load_dotenv()

PROFILE_NAME = os.environ.get("DS_NB_PROFILE")  # profile to use; defaults to active one
PROJ_KEY = os.environ["DS_NB_PROJ_KEY"]  # project to use
INDEX_KEY = os.environ["DS_NB_QA_IDX_KEY"]
DOC_HASH = os.environ.get("DS_NB_QA_DOC_HASH")  # set only when targeting a specific doc
QUESTION = os.environ["DS_NB_QUESTION"]

# whether to ingest incrementally:
SKIP_INGESTED_DOCS = TypeAdapter(bool).validate_python(
    os.environ.get("DS_NB_SKIP_INGESTED_DOCS", True)
)
# environment values are strings, so cast them to the numeric types the queries expect
RETR_K = int(os.environ.get("DS_NB_RETR_K", 3))  # number of search results to retrieve
GEN_TIMEOUT = float(os.environ.get("DS_NB_GEN_TIMEOUT", 10))  # generation timeout in seconds
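
Values read from `os.environ` are always strings, so numeric and boolean parameters need an explicit conversion before they are passed on. The following is a minimal standard-library sketch of such a conversion; the helper names `parse_bool` and `env_value` are hypothetical and not part of the Deep Search toolkit.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Interpret common textual spellings of a boolean (hypothetical helper)."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def env_value(name: str, cast: Callable[[str], T], default: T) -> T:
    """Read `name` from the environment and cast it; use `default` if it is unset."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)


# the variable names below are the ones used in this notebook
retr_k = env_value("DS_NB_RETR_K", int, 3)
gen_timeout = env_value("DS_NB_GEN_TIMEOUT", float, 10.0)
skip_ingested_docs = env_value("DS_NB_SKIP_INGESTED_DOCS", parse_bool, True)
```

Passing the cast as an argument keeps the default and the target type next to each other, which makes a mismatch between the two easy to spot.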

### Import example dependencies

In [2]:
import rich
from typing import Union
from IPython.display import display, Markdown
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.documents import create_private_data_source
from deepsearch.cps.client.components.elastic import (
    ElasticDataCollectionSource,
    ElasticProjectDataCollectionSource,
)
from deepsearch.cps.queries import RAGQuery, SemanticQuery
from deepsearch.cps.queries.results import RAGResult, SearchResult, SearchResultItem

### Connect to Deep Search

In [3]:
api = CpsApi.from_env(profile_name=PROFILE_NAME)

### Notebook utils

In [4]:
def render_provenance_url(
    api: CpsApi,
    coords: Union[ElasticDataCollectionSource, ElasticProjectDataCollectionSource],
    retr_item: SearchResultItem,
):
    # compute URL to the document in the Deep Search UI
    item_index = int(retr_item.main_path[retr_item.main_path.rfind(".") + 1 :])
    doc_url = api.documents.generate_url(
        document_hash=retr_item.doc_hash,
        data_source=coords,
        item_index=item_index,
    )
    display(
        Markdown(
            f"The provenance of the answer can be inspected on the [source document]({doc_url})."
        )
    )
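
The helper above derives `item_index` by taking everything after the last `.` in `retr_item.main_path` and converting it to an integer. The sketch below isolates that step with an explicit error for malformed input; the function name `item_index_from_path` and the sample path `main-text.71` are assumptions made for illustration only.

```python
def item_index_from_path(main_path: str) -> int:
    """Return the integer that follows the last '.' in a path such as 'main-text.71'."""
    _, sep, tail = main_path.rpartition(".")
    # without a separator or a numeric suffix there is no item index to extract
    if not sep or not tail.isdecimal():
        raise ValueError(f"no trailing item index in {main_path!r}")
    return int(tail)


index = item_index_from_path("main-text.71")  # sample path is an assumption
```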

### Prepare data source

All semantic operations, i.e. ingestion, RAG, and retrieval, require a *data source* that defines the docs to operate on.

The cell below shows how to configure a *private* data source, i.e. for operating on a whole private collection or a given doc within one.
- set `document_hash` only when targeting a specific doc; when targeting the whole private collection instead, omit it or set it to `None`
- to use a *public* data source instead, switch to the commented code at the bottom of the cell

In [5]:
# configuring a private data source:
data_source = create_private_data_source(
    proj_key=PROJ_KEY,
    index_key=INDEX_KEY,
    document_hash=DOC_HASH,
)

# # configuring a public data source:
# from deepsearch.cps.client.components.documents import create_public_data_source
# data_source = create_public_data_source(
#     elastic_id="default",
#     index_key=INDEX_KEY,
#     document_hash=DOC_HASH,
# )

### Ingestion

If your data source has not yet been semantically indexed, you can ingest it into the vector DB as shown below. Otherwise you can skip this step.

Whether already indexed docs are ingested again is controlled via the `skip_ingested_docs` parameter.

Note that ingestion time grows with the size of the data source, which matters particularly when indexing whole collections.

In [6]:
task = api.documents.semantic_ingest(
    project=PROJ_KEY,
    data_source=data_source,
    skip_ingested_docs=SKIP_INGESTED_DOCS,
)

# wait for the ingestion task to finish
api.tasks.wait_for(PROJ_KEY, task.task_id)

{'ing_out': {}}

### RAG

Besides the standard RAG usage shown in [QA Quick Start](./qa_quick_start.ipynb), `RAGQuery` has numerous additional parameters for customizing aspects of retrieval, generation, and the overall RAG pipeline:


In [7]:
help(RAGQuery)

Help on function RAGQuery in module deepsearch.cps.queries:

RAGQuery(question: str, *, project: Union[str, deepsearch.cps.client.components.projects.Project], data_source: Union[deepsearch.cps.client.components.documents.PrivateDataDocumentSource, deepsearch.cps.client.components.documents.PrivateDataCollectionSource, deepsearch.cps.client.components.documents.PublicDataDocumentSource], retr_k: int = 10, rerank: bool = False, text_weight: typing.Annotated[float, FieldInfo(default=PydanticUndefined, ge=0.0, le=1.0, multiple_of=0.1, extra={'strict': True})] = 0.1, model_id: Optional[str] = None, prompt_template: Optional[str] = None, gen_params: Optional[Dict[str, Any]] = None, gen_ctx_extr_method: Literal['window', 'page'] = 'window', gen_ctx_window_size: int = 5000, gen_ctx_window_lead_weight: float = 0.5, return_prompt: bool = False, gen_timeout: Optional[float] = None) -> deepsearch.cps.client.queries.query.Query
    Create a RAG query
    
    Args:
        question (str): the natu

For instance, below we set the `return_prompt` parameter to `True` so that the result also includes the instantiated prompt.

In [8]:
question_query = RAGQuery(
    question=QUESTION,
    project=PROJ_KEY,
    data_source=data_source,
    retr_k=RETR_K,
    return_prompt=True,
    gen_timeout=GEN_TIMEOUT,
)
api_output = api.queries.run(question_query)
rag_result = RAGResult.from_api_output(api_output)

rich.print(QUESTION)
rich.print(rag_result)

Additionally, we can generate a provenance URL to the document in the Deep Search UI:

In [9]:
render_provenance_url(
    api=api,
    coords=data_source.source,
    retr_item=rag_result.answers[0].grounding.retr_items[0],
)

The provenance of the answer can be inspected on the [source document](https://sds.app.accelerate.science/projects/b09ae7561a01dc7c4b0fd21a43bfd93d140766d1/library/private/6b70072911ad2794a3844dd44d1705a5ba37ca0b?search=JTdCJTIycHJpdmF0ZUNvbGxlY3Rpb24lMjIlM0ElMjI2YjcwMDcyOTExYWQyNzk0YTM4NDRkZDQ0ZDE3MDVhNWJhMzdjYTBiJTIyJTJDJTIydHlwZSUyMiUzQSUyMkRvY3VtZW50JTIyJTJDJTIyZXhwcmVzc2lvbiUyMiUzQSUyMmZpbGUtaW5mby5kb2N1bWVudC1oYXNoJTNBJTIwJTVDJTIyYjMwYmM2NjdhMzI0YWUxMTFkMDI1NTI2NTYzYjY3NGE4ZDNmZDg2OWJjMDdjOGZkMjA0YWE5NWIwNWQ0MWYwYyU1QyUyMiUyMiUyQyUyMmZpbHRlcnMlMjIlM0ElNUIlNUQlMkMlMjJzZWxlY3QlMjIlM0ElNUIlMjJfbmFtZSUyMiUyQyUyMmRlc2NyaXB0aW9uLmNvbGxlY3Rpb24lMjIlMkMlMjJwcm92JTIyJTJDJTIyZGVzY3JpcHRpb24udGl0bGUlMjIlMkMlMjJkZXNjcmlwdGlvbi5wdWJsaWNhdGlvbl9kYXRlJTIyJTJDJTIyZGVzY3JpcHRpb24udXJsX3JlZnMlMjIlNUQlMkMlMjJpdGVtSW5kZXglMjIlM0EwJTJDJTIycGFnZVNpemUlMjIlM0ExMCUyQyUyMnNlYXJjaEFmdGVySGlzdG9yeSUyMiUzQSU1QiU1RCUyQyUyMnZpZXdUeXBlJTIyJTNBJTIyc25pcHBldHMlMjIlMkMlMjJyZWNvcmRTZWxlY3Rpb24lMjIlM0ElN0IlMjJyZWNvcmQlMjIlM0ElN0IlMjJpZCUyMiUzQSUyMmIzMGJjNjY3YTMyNGFlMTExZDAyNTUyNjU2M2I2NzRhOGQzZmQ4NjliYzA3YzhmZDIwNGFhOTViMDVkNDFmMGMlMjIlN0QlMkMlMjJpdGVtSW5kZXglMjIlM0E3MSU3RCU3RA%3D%3D).
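
The cell above indexes `answers[0]` and `retr_items[0]` directly, which fails if a query yields no answer or no grounding items. Below is a minimal defensive sketch, assuming these attributes are list-like; the dataclasses are stand-ins that only mirror the attribute names used in this notebook and are not the `deepsearch` result classes, and `first_retr_item` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grounding:  # stand-in mirroring `answer.grounding`
    retr_items: List[str] = field(default_factory=list)


@dataclass
class Answer:  # stand-in mirroring an entry of `rag_result.answers`
    grounding: Grounding = field(default_factory=Grounding)


@dataclass
class Result:  # stand-in mirroring `rag_result`
    answers: List[Answer] = field(default_factory=list)


def first_retr_item(result) -> Optional[object]:
    """Return the first retrieved item of the first answer, or None if there is none."""
    if not result.answers:
        return None
    retr_items = result.answers[0].grounding.retr_items
    return retr_items[0] if retr_items else None


empty = first_retr_item(Result())
found = first_retr_item(Result([Answer(Grounding(["chunk-a", "chunk-b"]))]))
```

With such a guard, the provenance link would only be rendered when the returned item is not `None`.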

As shown below, we can also inspect the timing information of the query execution, including the time spent on each step of the pipeline:

In [10]:
rich.print(api_output.timings)

### Semantic retrieval

Besides the standard semantic retrieval usage shown in [QA Quick Start](./qa_quick_start.ipynb), `SemanticQuery` has numerous additional parameters:

In [11]:
help(SemanticQuery)

Help on function SemanticQuery in module deepsearch.cps.queries:

SemanticQuery(question: str, *, project: Union[str, deepsearch.cps.client.components.projects.Project], data_source: Union[deepsearch.cps.client.components.documents.PrivateDataDocumentSource, deepsearch.cps.client.components.documents.PrivateDataCollectionSource, deepsearch.cps.client.components.documents.PublicDataDocumentSource], retr_k: int = 10, rerank: bool = False, text_weight: typing.Annotated[float, FieldInfo(default=PydanticUndefined, ge=0.0, le=1.0, multiple_of=0.1, extra={'strict': True})] = 0.1) -> deepsearch.cps.client.queries.query.Query
    Create a semantic retrieval query
    
    Args:
        question (str): the natural-language query
        document_hash (str): hash of target document
        project (Union[str, Project]): project to use
        data_source (DataSource): the data source to query
        retr_k (int, optional): num of items to retrieve; defaults to 10
        rerank (bool, optional):

For instance, below we set:
- `rerank` to `True` to rerank the retrieved chunks, and
- `text_weight` to `0.8` to favor the lexical component of hybrid search

In [12]:
# submit natural-language query on document
question_query = SemanticQuery(
    question=QUESTION,
    project=PROJ_KEY,
    data_source=data_source,
    retr_k=RETR_K,
    rerank=True,
    text_weight=0.8,
)
api_output = api.queries.run(question_query)
search_result = SearchResult.from_api_output(api_output)

rich.print(QUESTION)
rich.print(search_result)
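
To build intuition for `text_weight`, the sketch below blends a lexical and a semantic score per item by linear interpolation. This is an assumption made purely for illustration: the notebook does not describe how Deep Search actually combines the two components, and the function `hybrid_rank` and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def hybrid_rank(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    text_weight: float,
) -> List[Tuple[str, float]]:
    """Rank item ids by text_weight * lexical + (1 - text_weight) * semantic."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    ids = set(lexical) | set(semantic)
    scored = {
        item_id: text_weight * lexical.get(item_id, 0.0)
        + (1.0 - text_weight) * semantic.get(item_id, 0.0)
        for item_id in ids
    }
    # highest blended score first; ties are broken by id for determinism
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# illustrative scores: "a" matches keywords well, "b" matches by meaning
lexical_scores = {"a": 1.0, "b": 0.0}
semantic_scores = {"a": 0.0, "b": 1.0}

favor_lexical = hybrid_rank(lexical_scores, semantic_scores, text_weight=0.8)
favor_semantic = hybrid_rank(lexical_scores, semantic_scores, text_weight=0.1)
```

Under this assumed scheme, a high `text_weight` ranks the keyword match first, while a low value ranks the semantic match first.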

Here too, we can inspect the time spent on each step of the pipeline:

In [13]:
rich.print(api_output.timings)