# QA Quick Start

This QA Quick Start notebook gives a first look at semantic ingestion, retrieval-augmented generation (RAG), and semantic retrieval, presenting basic usage patterns for each.

For more advanced QA examples, check out [QA Deep Dive](./qa_deep_dive.ipynb).

### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io) if you are interested in exploring
these Deep Search capabilities.


### GenAI Integration required

When interacting with the virtual assistant, Deep Search requires a connection to a Generative AI API. Currently, we support connections to [watsonx.ai](https://www.ibm.com/products/watsonx-ai) or the IBM-internal GenAI platform BAM.

Deep Search allows custom GenAI configurations for each project.
To run the following examples, you need to work in a project that has these GenAI capabilities activated.

### Set notebook parameters


In [1]:
import os
from dotenv import load_dotenv
from pydantic import TypeAdapter

load_dotenv()

PROFILE_NAME = os.environ.get("DS_NB_PROFILE")  # profile to use; defaults to active one
PROJ_KEY = os.environ["DS_NB_PROJ_KEY"]  # project to use
INDEX_KEY = os.environ["DS_NB_QA_IDX_KEY"]
DOC_HASH = os.environ.get("DS_NB_QA_DOC_HASH")  # set only when targeting a specific doc
QUESTION = os.environ["DS_NB_QUESTION"]

# whether to ingest incrementally:
SKIP_INGESTED_DOCS = TypeAdapter(bool).validate_python(
    os.environ.get("DS_NB_SKIP_INGESTED_DOCS", True)
)
# environment values arrive as strings, so convert numeric parameters explicitly
RETR_K = int(os.environ.get("DS_NB_RETR_K", 3))  # number of search results to retrieve
GEN_TIMEOUT = float(os.environ.get("DS_NB_GEN_TIMEOUT", 10))  # generation timeout in seconds
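
Values read from the environment are always strings, so boolean and numeric parameters need explicit conversion before they are passed on. The sketch below shows one way to do this with the standard library only. The helper names `env_bool` and `env_int` are hypothetical and not part of the Deep Search toolkit, and the accepted boolean spellings are an assumption.

```python
import os

# Accepted spellings are an assumption made for this sketch.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool) -> bool:
    """Read a boolean from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean")


def env_int(name: str, default: int) -> int:
    """Read an integer from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    # int() raises ValueError on malformed input, which surfaces config errors early.
    return default if raw is None else int(raw)
```

Failing loudly on an unrecognized value, rather than silently falling back to the default, makes a misconfigured `.env` file easier to spot.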

### Import example dependencies

In [2]:
import rich
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.documents import create_private_data_source
from deepsearch.cps.queries import RAGQuery, SemanticQuery
from deepsearch.cps.queries.results import RAGResult, SearchResult

### Connect to Deep Search

In [3]:
api = CpsApi.from_env(profile_name=PROFILE_NAME)

### Prepare data source

The cell below shows how to configure a private data source, i.e. either a whole private collection (in which case `document_hash` should be `None` or omitted) or a given doc within one.

For more details on data sources check out [QA Deep Dive](./qa_deep_dive.ipynb).

In [4]:
data_source = create_private_data_source(
    proj_key=PROJ_KEY,
    index_key=INDEX_KEY,
    document_hash=DOC_HASH,
)

### Ingestion

If your data source has not yet been semantically indexed, you can ingest it into the vector DB as shown below. Otherwise you can skip this step.

Whether already indexed docs are ingested again is controlled via the `skip_ingested_docs` parameter.

Note that ingestion time grows with the size of the data source, which matters particularly when indexing whole collections.

In [5]:
task = api.documents.semantic_ingest(
    project=PROJ_KEY,
    data_source=data_source,
    skip_ingested_docs=SKIP_INGESTED_DOCS,
)

# wait for the ingestion task to finish
api.tasks.wait_for(PROJ_KEY, task.task_id)


{'ing_out': {}}
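
To make the effect of `skip_ingested_docs` concrete, here is a purely conceptual sketch of incremental ingestion. It does not reflect how Deep Search implements the feature: the function `plan_ingestion` and the set of already indexed document hashes are assumptions made for illustration.

```python
def plan_ingestion(doc_hashes, already_ingested, skip_ingested_docs=True):
    """Return the document hashes that would be ingested.

    Hypothetical illustration only: `already_ingested` stands in for whatever
    bookkeeping a vector DB keeps about documents it has already indexed.
    """
    if not skip_ingested_docs:
        # Full re-ingestion: every document is processed again.
        return list(doc_hashes)
    # Incremental ingestion: only documents not yet indexed are processed.
    return [h for h in doc_hashes if h not in already_ingested]


docs = ["a1", "b2", "c3"]
indexed = {"a1"}
print(plan_ingestion(docs, indexed))  # ['b2', 'c3']
print(plan_ingestion(docs, indexed, skip_ingested_docs=False))  # ['a1', 'b2', 'c3']
```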

### RAG

The cell below demonstrates basic RAG usage.

For more advanced usage and parametrization, check out [QA Deep Dive](./qa_deep_dive.ipynb).

In [6]:
query = RAGQuery(
    question=QUESTION,
    project=PROJ_KEY,
    data_source=data_source,
    retr_k=RETR_K,  # optional
    gen_timeout=GEN_TIMEOUT,  # optional
)

api_output = api.queries.run(query)
result = RAGResult.from_api_output(api_output)

rich.print(QUESTION)
rich.print(result)
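
Conceptually, RAG first retrieves the passages most relevant to the question and then asks a generative model to answer from those passages. The sketch below illustrates only the step between the two: keeping the top `retr_k` passages and assembling a grounded prompt. The `Passage` type, the `build_prompt` function, and the prompt wording are all hypothetical; none of this reflects Deep Search internals or its actual prompt template.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    score: float  # relevance score; higher means more relevant


def build_prompt(question: str, passages: list[Passage], retr_k: int = 3) -> str:
    """Keep the retr_k best-scoring passages and format a grounded prompt."""
    top = sorted(passages, key=lambda p: p.score, reverse=True)[:retr_k]
    # Number the passages so a generated answer could refer back to them.
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(top, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

A larger `retr_k` gives the model more context to draw on, at the cost of a longer prompt and potentially more irrelevant material.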

### Semantic retrieval

In certain cases, a user may only be interested in the semantic retrieval part, instead of the whole RAG pipeline.

Basic semantic retrieval usage is shown below:

In [7]:
query = SemanticQuery(
    question=QUESTION,
    project=PROJ_KEY,
    data_source=data_source,
    retr_k=RETR_K,  # optional
)
api_output = api.queries.run(query)
result = SearchResult.from_api_output(api_output)

rich.print(QUESTION)
rich.print(result)
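
As a rough mental model of what retrieval with `retr_k` does, the sketch below ranks passages by cosine similarity to the question and keeps the top k. It uses bag-of-words counts as a stand-in for the learned embeddings a real semantic index would use. The functions `embed`, `cosine`, and `retrieve` are hypothetical and unrelated to the Deep Search API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a learned vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], retr_k: int = 3) -> list[tuple[float, str]]:
    """Return up to retr_k (score, passage) pairs, best match first."""
    q = embed(question)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:retr_k]


passages = [
    "The cat sat on the mat.",
    "Stock prices fell sharply.",
    "A cat chased a mouse.",
]
for score, passage in retrieve("Where is the cat?", passages, retr_k=2):
    print(f"{score:.2f}  {passage}")
```

Unlike this word-overlap toy, embedding-based retrieval can also match passages that share meaning with the question without sharing its exact words.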