# Search Demo Corpus

We'll now continue the example with a recipe from the Vectara website in which we search our employee handbook.

The REST version of this recipe is available at https://docs.vectara.com/docs/api-recipes

In [1]:
%pip install -q vectara-skunk-client==0.2.0

Note: you may need to restart the kernel to use updated packages.


In [2]:
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')

In [3]:
from vectara.core import Factory

client = Factory().build()

17:28:47 +1100 INFO:initializing builder
17:28:47 +1100 INFO:Factory will load configuration from home directory
17:28:47 +1100 INFO:Loading configuration from users home directory [C:\Users\david]
17:28:47 +1100 INFO:Loading default configuration [default]
17:28:47 +1100 INFO:Parsing config
17:28:47 +1100 INFO:We are processing authentication type [OAuth2]
17:28:47 +1100 INFO:Using provided OAuth2 URL [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:28:47 +1100 INFO:OAuth2 URL is [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:28:47 +1100 INFO:initializing Client


In [4]:
admin_service = client.admin_service

# Delete any corpus left over from a previous run so the example starts clean.
corpora = admin_service.list_corpora("02-employee-handbook")

if len(corpora) >= 1:
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
else:
    logging.info("No existing corpus with the name 02-employee-handbook")

17:28:47 +1100 INFO:Current timestamp 2023-12-28 17:28:47.567823
17:28:47 +1100 INFO:First time requesting token, authenticating
17:28:49 +1100 INFO:Received OAuth token, will expire [12/28/2023, 18:28:49]
17:28:49 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748529]
17:28:49 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
17:28:51 +1100 INFO:Current timestamp 2023-12-28 17:28:51.571992
17:28:51 +1100 INFO:Expiry            2023-12-28 18:28:49
17:28:51 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748529]
17:28:51 +1100 INFO:URL for operation delete-corpus is: https://api.vectara.io/v1/delete-corpus
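Deleting every corpus that `list_corpora` returns is only safe if the name filter matches exactly. If the filter can also match other corpora, for example ones whose names merely contain the filter string (check the Vectara API documentation for how the filter is applied), a more defensive pattern keeps only exact name matches. The sketch below uses a hypothetical `CorpusInfo` stand-in with `id` and `name` fields and illustrative ids, not the client's real corpus type.

```python
from dataclasses import dataclass


@dataclass
class CorpusInfo:
    """Hypothetical stand-in for a corpus returned by list_corpora."""
    id: int
    name: str


def ids_to_delete(corpora, name):
    """Return the ids of corpora whose name equals `name` exactly."""
    return [c.id for c in corpora if c.name == name]


# Illustrative data: only the first corpus has exactly the target name.
listed = [
    CorpusInfo(id=110, name="02-employee-handbook"),
    CorpusInfo(id=111, name="02-employee-handbook-old"),
]
print(ids_to_delete(listed, "02-employee-handbook"))  # [110]
```

Using this in the cleanup cell would also require the client's corpus objects to expose a `name` attribute, which is an assumption here.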


## Create Our Demo Corpus
We now create our new corpus called "02-employee-handbook", where we'll upload our documents for this test.

In [5]:
create_corpus_result = admin_service.create_corpus("02-employee-handbook", description="Example Corpus for use from Jupyter")
logging.info(f"New corpus created {create_corpus_result}")
corpus_id = create_corpus_result.corpusId

17:28:54 +1100 INFO:Current timestamp 2023-12-28 17:28:54.689195
17:28:54 +1100 INFO:Expiry            2023-12-28 18:28:49
17:28:54 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748529]
17:28:54 +1100 INFO:URL for operation create-corpus is: https://api.vectara.io/v1/create-corpus
17:28:57 +1100 INFO:Created new corpus with 112
17:28:57 +1100 INFO:New corpus created CreateCorpusResponse(corpusId=112, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created'))
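The logged `CreateCorpusResponse` carries a `status` alongside the new `corpusId`. A minimal sketch of checking that status before using the id is below; `StatusCode`, `Status`, and `CreateCorpusResponse` here are hypothetical local stand-ins modelled on the printed repr, not imports from the client.

```python
from dataclasses import dataclass
from enum import Enum


class StatusCode(Enum):
    # Only the OK value (0) appears in the log; other members are omitted.
    OK = 0


@dataclass
class Status:
    code: StatusCode
    statusDetail: str


@dataclass
class CreateCorpusResponse:
    corpusId: int
    status: Status


def require_corpus_id(result):
    """Return the new corpus id, or raise if creation did not report OK."""
    if result.status.code != StatusCode.OK:
        raise RuntimeError(f"Corpus creation failed: {result.status.statusDetail}")
    return result.corpusId


# Values copied from the log line above.
result = CreateCorpusResponse(
    corpusId=112, status=Status(code=StatusCode.OK, statusDetail="Corpus Created")
)
print(require_corpus_id(result))  # 112
```

Failing fast here avoids uploading into a corpus id that was never created.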


## Upload Handbook to Corpus
We now upload our PDF to the corpus. Vectara parses and encodes the document, then stores the resulting embeddings in our vector database.

In [6]:
from pathlib import Path

handbook_path = Path("resources/vectara_employee_handbook.pdf")
indexer_service = client.indexer_service
indexer_service.upload(corpus_id, handbook_path)

17:28:57 +1100 INFO:Headers: {"c": "1623270172", "o": "112"}
17:28:57 +1100 INFO:Current timestamp 2023-12-28 17:28:57.332460
17:28:57 +1100 INFO:Expiry            2023-12-28 18:28:49
17:28:57 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748529]
vectara_employee_handbook.pdf: 52.7kB [00:04, 11.7kB/s]                                                              


UploadDocumentResponse(response=UploadDocumentResponseInner(status=None, quotaConsumed=StorageQuota(numChars='9215', numMetadataChars='4903')), document=None)
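The upload response reports the storage quota consumed, with `numChars` and `numMetadataChars` returned as strings rather than integers. If you want a single combined character count, convert before adding. `quota_used` is a hypothetical helper that takes the two values as plain strings rather than the client's `StorageQuota` object.

```python
def quota_used(num_chars: str, num_metadata_chars: str) -> int:
    """Combined character count: document text plus metadata."""
    return int(num_chars) + int(num_metadata_chars)


# Values copied from the UploadDocumentResponse above.
print(quota_used("9215", "4903"))  # 14118
```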

In [7]:
query_service = client.query_service
#response = query_service.query("Can I bring any birds to the Vectara office?", corpus_id)
response = query_service.query("How much PTO is offered to employees each year?", corpus_id)

17:29:02 +1100 INFO:Response Language set to [en]
17:29:02 +1100 INFO:Current timestamp 2023-12-28 17:29:02.023014
17:29:02 +1100 INFO:Expiry            2023-12-28 18:28:49
17:29:02 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748529]
17:29:02 +1100 INFO:URL for operation query is: https://api.vectara.io/v1/query


Query is:
{
    "query": [
        {
            "query": "How much PTO is offered to employees each year?",
            "numResults": 10,
            "corpusKey": [
                {
                    "customerId": 1623270172,
                    "corpusId": 112
                }
            ],
            "summary": [
                {
                    "summarizerPromptName": "vectara-summary-ext-v1.2.0",
                    "responseLang": "en",
                    "maxSummarizedResults": 5
                }
            ]
        }
    ]
}
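The client logs the JSON body it sends to `https://api.vectara.io/v1/query`. If you follow the REST version of the recipe instead, the same body can be built as a plain dictionary. The sketch below only constructs and serialises the payload shown above; `build_query_body` is a hypothetical helper, and sending the request (with the appropriate authentication headers) is left out.

```python
import json


def build_query_body(query, customer_id, corpus_id, num_results=10,
                     prompt="vectara-summary-ext-v1.2.0", lang="en",
                     max_summarized=5):
    """Mirror the query payload logged by the client above."""
    return {
        "query": [
            {
                "query": query,
                "numResults": num_results,
                "corpusKey": [{"customerId": customer_id, "corpusId": corpus_id}],
                "summary": [
                    {
                        "summarizerPromptName": prompt,
                        "responseLang": lang,
                        "maxSummarizedResults": max_summarized,
                    }
                ],
            }
        ]
    }


# Customer and corpus ids taken from the logged query.
body = build_query_body("How much PTO is offered to employees each year?", 1623270172, 112)
print(json.dumps(body, indent=4))
```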



In [8]:
print(response.summary[0].text)

The returned results did not contain sufficient information to be summarized into a useful answer for your query. Please try a different search or restate your query differently.
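Here the summarizer declined to answer, and the top-scoring results in the JSON dump of the next cell concern the company pet policy rather than PTO. When that happens, the ranked results themselves are still worth inspecting. The sketch below falls back to the top results by score and flattens a result's `metadata` list into a dict. The results are hypothetical plain-dict stand-ins copied from that JSON dump (the second result's metadata is omitted because it is cut off), not the client's response objects.

```python
# Prefix of the summarizer's "no answer" message, copied from the output above.
NO_ANSWER_PREFIX = "The returned results did not contain sufficient information"


def summary_or_top_results(summary_text, results, k=3):
    """Return the summary, or the top-k result texts by score if it declined."""
    if not summary_text.startswith(NO_ANSWER_PREFIX):
        return [summary_text]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [f'{r["score"]:.3f}  {r["text"]}' for r in ranked[:k]]


def metadata_dict(result):
    """Flatten a result's list of {"name", "value"} pairs into one dict."""
    return {m["name"]: m["value"] for m in result.get("metadata", [])}


# Hypothetical stand-in data copied from the JSON dump.
results = [
    {"text": "Employee Handbook - Company Pet Policy", "score": 0.6196232,
     "metadata": [{"name": "title_level", "value": "1"},
                  {"name": "is_title", "value": "true"},
                  {"name": "lang", "value": "eng"}]},
    {"text": "\u25cb   Employee safety and animal welfare are our top priorities.",
     "score": 0.5971849},
]
summary = ("The returned results did not contain sufficient information to be "
           "summarized into a useful answer for your query.")

for line in summary_or_top_results(summary, results):
    print(line)
print(metadata_dict(results[0]))
```

Matching on the message text is brittle, since the wording may change between summarizer versions or response languages, so treat it as a quick check for this notebook rather than a robust test.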


In [9]:
from dataclasses import asdict
from vectara.util import _custom_asdict_factory
import json

# Convert the response dataclass to a dict and pretty-print it as JSON.
print(json.dumps(asdict(response, dict_factory=_custom_asdict_factory), indent=4))

{
    "response": [
        {
            "text": "Employee Handbook - Company Pet Policy",
            "score": 0.6196232,
            "metadata": [
                {
                    "name": "title_level",
                    "value": "1"
                },
                {
                    "name": "is_title",
                    "value": "true"
                },
                {
                    "name": "lang",
                    "value": "eng"
                }
            ],
            "corpusKey": {
                "corpusId": 112,
                "customerId": 0,
                "semantics": "DEFAULT",
                "dim": [],
                "metadataFilter": "",
                "lexicalInterpolationConfig": null
            },
            "resultOffset": 0
        },
        {
            "text": "\u25cb   Employee safety and animal welfare are our top priorities.",
            "score": 0.5971849,
            "metadata": [
                {
                    