# Document QA

Deep Search lets users interact with documents through conversational AI: you ask questions and a virtual assistant answers them using the information in the corpus.
This works on both public and private data libraries.

In this example we demonstrate how to achieve the same interaction programmatically.


### Access required

The content of this notebook requires access to Deep Search capabilities which are not
available on the public access system.

[Contact us](https://ds4sd.github.io/) if you are interested in exploring
the enterprise-level Deep Search capabilities.


### GenAI Integration required

When interacting with the virtual assistant, Deep Search requires a connection to a Generative AI API. Currently, we support connections to [watsonx.ai](https://www.ibm.com/products/watsonx-ai) or the IBM-internal GenAI platform BAM.

Deep Search allows custom GenAI configurations for each project.
For the following example you will need to work in a project that has these GenAI capabilities activated.

### Notebook parameters

The following block defines the parameters used to execute the notebook:

- `PROJ_KEY`: the Deep Search project to use


In [11]:
PROJ_KEY = "1234567890abcdefghijklmnopqrstvwyz123456"  # For this example we can use the Community project
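
If you prefer not to hard-code the project key in the notebook, one option is to read it from an environment variable and fall back to a default. The following is a minimal sketch of that idea; the variable name `DEEPSEARCH_PROJ_KEY` and the helper `resolve_proj_key` are hypothetical and are not defined by Deep Search or the toolkit.

```python
import os

# Hypothetical variable name chosen for this sketch; not part of the toolkit.
ENV_VAR = "DEEPSEARCH_PROJ_KEY"


def resolve_proj_key(default, env=None):
    """Return the project key from the environment, or `default` if unset or blank."""
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "").strip()
    return value or default


# Falls back to the placeholder key used in this notebook
PROJ_KEY = resolve_proj_key("1234567890abcdefghijklmnopqrstvwyz123456")
```

Passing `env` explicitly is only there to make the helper easy to test; in normal use the process environment is read.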


### Import example dependencies

In [2]:
# Import standard dependencies
import pandas as pd

# IPython utilities
from IPython.display import display, Markdown, HTML, display_html

# Import the deepsearch-toolkit
import deepsearch as ds
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.apis import public as sw_client
from deepsearch.cps.client.components.elastic import ElasticDataCollectionSource
from deepsearch.cps.client.queries import Query
from deepsearch.cps.queries import DataQuery, DocumentQuestionQuery
from deepsearch.cps.client.components.queries import RunQueryError

### Connect to Deep Search

In [3]:
api = CpsApi.from_env(profile_name="sds")


---

## Interact with a Red Hat Manual

In the following blocks we will:
1. Search for an interesting document in our collection of Red Hat manuals
2. Load the document into the DocumentQA engine
3. Ask questions


In [4]:
data_collection = "redhat"
data_instance = "default"
search_query = "\"OpenShift Container Platform 4.12 Getting started\""


# Prepare the data query
collection_coords = ElasticDataCollectionSource(elastic_id=data_instance, index_key=data_collection)
query = DataQuery(
    search_query,  # The search query to be executed
    source=["description.title", "file-info.document-hash"],  # Which fields of documents we want to fetch
    coordinates=collection_coords,  # The data collection to be queried
)


# Query Deep Search for the documents matching the query
results = []
query_results = api.queries.run(query)
for row in query_results.outputs["data_outputs"]:
    # Add row to results table
    results.append({
        "Title": row["_source"]["description"]["title"],
        "DocHash": row["_source"]["file-info"]["document-hash"],
    })

print(f'Finished fetching all data. Total is {len(results)} records.')

# Visualize the table with all results
df = pd.json_normalize(results)
display(df)

Finished fetching all data. Total is 1 records.


|   | Title | DocHash |
|---|-------|---------|
| 0 | Product Documentation for OpenShift Container ... | 6e017dcb29a0348e1f6e5554bb547b979e7e47256d9dc2... |
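
The next cell reads `results[0]`, which raises an `IndexError` if the search matched nothing. As a hedged illustration, a small guard can turn that into a clearer error. The helper name `first_doc_hash` is hypothetical, and the sample data below is a stand-in with the same shape as the `results` list built above.

```python
def first_doc_hash(results):
    """Return the DocHash of the first search hit, or raise if there are none."""
    if not results:
        raise LookupError(
            "The search returned no documents; check the query and the collection."
        )
    return results[0]["DocHash"]


# Stand-in data with the same keys as the `results` list in this notebook
sample_results = [{"Title": "Example manual", "DocHash": "abc123"}]
print(first_doc_hash(sample_results))
```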


In [5]:
# Here we launch the ingestion of the document for DocumentQA

doc_hash = results[0]["DocHash"]

ingest_api = sw_client.DocumentInspectionApi(api.client.swagger_client)

task = api.documents.ingest_documentqa(PROJ_KEY, collection_coords, doc_hash)


In [6]:
# Here we wait for the ingestion task to finish

api.tasks.wait_for(task.proj_key, task.task_id)

In [7]:
# Here we query the document for the question

question = "Can I use the Kubernetes command line utilities with an OpenShift cluster?"

question_query = DocumentQuestionQuery(
    question=question,
    document_hash=doc_hash,
    project=PROJ_KEY,
)
question_results = api.queries.run(question_query)
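
The imports include `RunQueryError`, but the notebook never handles it. If a query can fail transiently, one possible pattern is a small retry wrapper. This is only a sketch: `run_with_retry` is a hypothetical helper, and whether retrying is appropriate depends on why the query failed.

```python
import time


def run_with_retry(run, query, error_type=Exception, attempts=3, delay_s=2.0):
    """Call run(query), retrying up to `attempts` times when `error_type` is raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run(query)
        except error_type as exc:
            if attempt == attempts:
                # Out of attempts: re-raise the last error to the caller
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay_s}s")
            time.sleep(delay_s)
```

In this notebook it could be used as `question_results = run_with_retry(api.queries.run, question_query, error_type=RunQueryError)`, assuming `api.queries.run` raises `RunQueryError` on failure.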


In [10]:
# Print the answer to the question

# Unpack the answer
answer = question_results.outputs["answer"]

# Compute URL to the document in the Deep Search UI
doc_url = api.documents.generate_url(
    document_hash=doc_hash,
    data_source=collection_coords,
    item_index=question_results.outputs["provenance"][0]["entity_counter"],
)

# Display results
display(Markdown(f"Question: {question}"))
display(Markdown(f"Answer: {answer}"))
display(Markdown(f"The provenance of the answer can be inspected on the [source document]({doc_url})."))

Question: Can I use the Kubernetes command line utilities with an OpenShift cluster?

Answer: OpenShift Container Platform CLI tool (oc) is compatible with kubectl.

The provenance of the answer can be inspected on the [source document](https://sds.app.accelerate.science/projects/1234567890abcdefghijklmnopqrstvwyz123456/library/public?search=JTdCJTIyY29sbGVjdGlvbnMlMjIlM0ElNUIlMjJyZWRoYXQlMjIlNUQlMkMlMjJ0eXBlJTIyJTNBJTIyRG9jdW1lbnQlMjIlMkMlMjJleHByZXNzaW9uJTIyJTNBJTIyZmlsZS1pbmZvLmRvY3VtZW50LWhhc2glM0ElMjAlNUMlMjI2ZTAxN2RjYjI5YTAzNDhlMWY2ZTU1NTRiYjU0N2I5NzllN2U0NzI1NmQ5ZGMyZjBlZGY2Y2I5NDEwYWE1NzU2JTVDJTIyJTIyJTJDJTIyZmlsdGVycyUyMiUzQSU1QiU1RCUyQyUyMnNlbGVjdCUyMiUzQSU1QiUyMl9uYW1lJTIyJTJDJTIyZGVzY3JpcHRpb24uY29sbGVjdGlvbiUyMiUyQyUyMnByb3YlMjIlMkMlMjJkZXNjcmlwdGlvbi50aXRsZSUyMiUyQyUyMmRlc2NyaXB0aW9uLnB1YmxpY2F0aW9uX2RhdGUlMjIlMkMlMjJkZXNjcmlwdGlvbi51cmxfcmVmcyUyMiU1RCUyQyUyMml0ZW1JbmRleCUyMiUzQTAlMkMlMjJwYWdlU2l6ZSUyMiUzQTEwJTJDJTIyc2VhcmNoQWZ0ZXJIaXN0b3J5JTIyJTNBJTVCJTVEJTJDJTIydmlld1R5cGUlMjIlM0ElMjJzbmlwcGV0cyUyMiUyQyUyMnJlY29yZFNlbGVjdGlvbiUyMiUzQSU3QiUyMnJlY29yZCUyMiUzQSU3QiUyMmlkJTIyJTNBJTIyNmUwMTdkY2IyOWEwMzQ4ZTFmNmU1NTU0YmI1NDdiOTc5ZTdlNDcyNTZkOWRjMmYwZWRmNmNiOTQxMGFhNTc1NiUyMiU3RCUyQyUyMml0ZW1JbmRleCUyMiUzQTcwJTdEJTdE).
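
The cell above indexes `outputs["provenance"][0]` directly, which fails if the assistant returns an answer without provenance. A defensive variant might look like the sketch below. The helper `first_item_index` is hypothetical, the sample dictionaries are stand-in data, and only the keys `answer`, `provenance`, and `entity_counter` are taken from the code above.

```python
def first_item_index(outputs, default=None):
    """Return the entity_counter of the first provenance entry, or `default`."""
    provenance = outputs.get("provenance") or []
    if not provenance:
        return default
    return provenance[0].get("entity_counter", default)


# Stand-in outputs mirroring the keys used in this notebook
with_prov = {"answer": "example answer", "provenance": [{"entity_counter": 7}]}
without_prov = {"answer": "example answer", "provenance": []}

print(first_item_index(with_prov))
print(first_item_index(without_prov))
```

When the helper returns the default, the caller can decide whether to build a link to the source document at all.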