# Vectara-Skunk-Client Example
In the code below we show how easy it is to get started with the client. Before running these steps, please ensure you have:
1. Generated either an API Key or an OAuth2 App from within Vectara's console.
2. Put these credentials into a configuration file named ".vec_auth.yaml" in your home directory.

The configuration format should match the following:
```yaml
default:
  customer_id : "1999999999"
  # For API Key, you only need the API key
  api_key : "abcdabcdabcdabcdabcdabcdababcdabcd"
admin:
  customer_id : "1999999999" # Customer Id as a string
  # For OAuth2, you need app_client_id, app_client_secret, auth_url
  app_client_id : "abcdabcdabcdabcdabcdabcdab"
  app_client_secret : "abcdabcdabcdabcdabcdabcdababcdabcdabcdabcdabcdabcdab"
```
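
As a rough illustration of how the two profiles above differ, the sketch below inspects a profile dictionary and reports which authentication type its keys imply. The function name `detect_auth_type` and the labels it returns are hypothetical and are not part of the client; only the key names are taken from the YAML above.

```python
def detect_auth_type(profile: dict) -> str:
    """Guess the auth type implied by a profile's keys (illustrative only)."""
    if "customer_id" not in profile:
        raise ValueError("profile is missing customer_id")
    if profile.get("api_key"):
        return "ApiKey"
    if profile.get("app_client_id") and profile.get("app_client_secret"):
        return "OAuth2"
    raise ValueError("profile has neither an API key nor OAuth2 credentials")


# Profiles mirroring the sample configuration above (placeholder values).
config = {
    "default": {
        "customer_id": "1999999999",
        "api_key": "abcdabcdabcdabcdabcdabcdababcdabcd",
    },
    "admin": {
        "customer_id": "1999999999",
        "app_client_id": "abcdabcdabcdabcdabcdabcdab",
        "app_client_secret": "abcdabcdabcdabcdabcdabcdababcdabcdabcdabcdabcdabcdab",
    },
}

for name, profile in config.items():
    print(f"{name}: {detect_auth_type(profile)}")
```

Checking the keys up front like this can surface a half-filled profile before any network call is made.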

In [1]:
%pip install vectara-skunk-client==0.1.7

Note: you may need to restart the kernel to use updated packages.


## Setup Logging
The client has extensive logging, but we need to make sure it is activated within our Python environment.

In [2]:
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')
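
To preview what this format produces without touching the root logger, the sketch below attaches a handler with the same format string to a separately named logger that writes into an in-memory buffer. The logger name `format_demo` is arbitrary and chosen only for this illustration.

```python
import io
import logging

# Write log records into memory so the formatted text can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s:%(message)s", datefmt="%H:%M:%S %z")
)

demo_logger = logging.getLogger("format_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output away from the root logger
demo_logger.addHandler(handler)

demo_logger.info("initializing builder")
demo_logger.debug("filtered out because the level is INFO")

formatted = buffer.getvalue()
print(formatted, end="")
```

Only the INFO record reaches the buffer; lowering the level to `logging.DEBUG` would let the second message through as well.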

## Create the Client Factory
In the code below we create the client factory without any arguments, so it uses the configuration file in the user's home directory.

The full factory flow is shown here:

<img src="https://github.com/davidglevy/vectara-skunk-client/raw/main/resources/images/factory-build-flow.png" alt="Factory Build Flow" width="600px"/>

In [3]:
from vectara.core import Factory

client = Factory().build()

16:28:29 +1100 INFO:initializing builder
16:28:29 +1100 INFO:Factory will load configuration from home directory
16:28:29 +1100 INFO:Loading configuration from users home directory [C:\Users\david]
16:28:29 +1100 INFO:Loading default configuration [default]
16:28:29 +1100 INFO:Parsing config
16:28:29 +1100 INFO:We are processing authentication type [OAuth2]
16:28:29 +1100 INFO:Using provided OAuth2 URL [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
16:28:29 +1100 INFO:OAuth2 URL is [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
16:28:29 +1100 INFO:OAuth Application Client Id: 4faqg61ukqn03jkb0ntilb9ps1
16:28:29 +1100 INFO:OAuth Application Client Secret: 5fk0ofqu7k1j9b3jfef8ufh3olh0gg7sapj82c5pll1ssvbn9tc
16:28:29 +1100 INFO:initializing Client
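
The log above shows the factory looking in the user's home directory. A minimal sketch of that lookup, assuming the file name from the setup steps, might look like the following. The helper `default_config_path` is hypothetical and is not the factory's actual implementation.

```python
from pathlib import Path
from typing import Optional

CONFIG_FILENAME = ".vec_auth.yaml"  # file name taken from the setup steps


def default_config_path(home: Optional[Path] = None) -> Path:
    """Return the expected config location under the given (or current) home."""
    base = home if home is not None else Path.home()
    return base / CONFIG_FILENAME


path = default_config_path()
print(f"{path} exists: {path.is_file()}")
```

Running this before building the client is a quick way to confirm the file is where the factory is expected to look.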


In [4]:
# Quick test to verify we can see all corpora
admin_service = client.admin_service
corpora = admin_service.list_corpora()
for corpus in corpora:
    logging.info(f"Found corpus [{corpus.name}] with id [{corpus.id}]")

16:28:29 +1100 INFO:Current timestamp 2023-12-28 16:28:29.212296
16:28:29 +1100 INFO:First time requesting token, authenticating
16:28:33 +1100 INFO:Received OAuth token, will expire [12/28/2023, 17:28:33]
16:28:33 +1100 INFO:Already authenticated with non-expired token, expiry is [1703744913]
16:28:33 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
16:28:36 +1100 INFO:Found corpus [Australia Broadband] with id [6]
16:28:36 +1100 INFO:Found corpus [Australian Importation Laws] with id [10]
16:28:36 +1100 INFO:Found corpus [Building] with id [73]
16:28:36 +1100 INFO:Found corpus [CorpusFilterAttributesIntTest-testWithFilterAttrib] with id [72]
16:28:36 +1100 INFO:Found corpus [High Court of Australia] with id [17]
16:28:36 +1100 INFO:Found corpus [Public Reports] with id [8]
16:28:36 +1100 INFO:Found corpus [SE] with id [7]
16:28:36 +1100 INFO:Found corpus [South Australian State Law] with id [18]
16:28:36 +1100 INFO:Found corpus [Test] with id [80]
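
The log reports the token expiry both as a formatted time and as a Unix timestamp, `[1703744913]`. The sketch below converts that timestamp using the +11:00 offset seen in the log lines, and computes how long the token had left at a given moment. The helper `seconds_remaining` is hypothetical and only illustrates the arithmetic.

```python
from datetime import datetime, timedelta, timezone

expiry_epoch = 1703744913  # value taken from the log output above
log_tz = timezone(timedelta(hours=11))  # the +1100 offset shown in the log

expiry = datetime.fromtimestamp(expiry_epoch, tz=log_tz)
print(expiry.isoformat())  # 2023-12-28T17:28:33+11:00


def seconds_remaining(expiry_epoch: int, now: datetime) -> float:
    """Seconds until expiry; negative once the token has expired."""
    return expiry_epoch - now.timestamp()


# Time of the last log line in this cell's output.
now = datetime(2023, 12, 28, 16, 28, 36, tzinfo=log_tz)
print(seconds_remaining(expiry_epoch, now))  # 3597.0
```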


## Check for Existing Corpus
We'll now check if our test corpus exists, and if so, delete it so we start with a clean slate.

In [5]:
admin_service = client.admin_service

corpora = admin_service.list_corpora("01-first-client-example")

if len(corpora) >= 1:
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
else:
    logging.info("No existing corpus with the name client-example")

16:28:36 +1100 INFO:Current timestamp 2023-12-28 16:28:36.101322
16:28:36 +1100 INFO:Expiry            2023-12-28 17:28:33
16:28:36 +1100 INFO:Already authenticated with non-expired token, expiry is [1703744913]
16:28:36 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
16:28:38 +1100 INFO:No existing corpus with the name client-example
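
If the name filter can match more than the exact corpus name (an assumption; the filter's matching rules are not shown here), it may be safer to compare names before deleting. The sketch below demonstrates the idea against an in-memory stand-in. `FakeCorpus`, `FakeAdminService` and `delete_exact_matches` are hypothetical and only imitate the `list_corpora` and `delete_corpus` calls used above.

```python
from dataclasses import dataclass


@dataclass
class FakeCorpus:
    id: int
    name: str


class FakeAdminService:
    """In-memory stand-in, not the real vectara AdminService."""

    def __init__(self, corpora):
        self._corpora = list(corpora)

    def list_corpora(self, name_filter=None):
        if name_filter is None:
            return list(self._corpora)
        # Assumed behaviour: the filter matches substrings.
        return [c for c in self._corpora if name_filter in c.name]

    def delete_corpus(self, corpus_id):
        self._corpora = [c for c in self._corpora if c.id != corpus_id]


def delete_exact_matches(admin_service, name):
    """Delete only corpora whose name equals `name`; return their ids."""
    deleted = []
    for corpus in admin_service.list_corpora(name):
        if corpus.name == name:
            admin_service.delete_corpus(corpus.id)
            deleted.append(corpus.id)
    return deleted


service = FakeAdminService(
    [
        FakeCorpus(1, "01-first-client-example"),
        FakeCorpus(2, "01-first-client-example-backup"),
    ]
)
deleted_ids = delete_exact_matches(service, "01-first-client-example")
print(deleted_ids)  # [1]
```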


## Create New Corpus
We now use the simple signature of `vectara.admin.AdminService#create_corpus`.

In [6]:
create_corpus_result = admin_service.create_corpus("01-first-client-example", description="Example Corpus for use from Jupyter")
logging.info(f"New corpus created {create_corpus_result}")
corpus_id = create_corpus_result.corpusId

16:28:38 +1100 INFO:Current timestamp 2023-12-28 16:28:38.222428
16:28:38 +1100 INFO:Expiry            2023-12-28 17:28:33
16:28:38 +1100 INFO:Already authenticated with non-expired token, expiry is [1703744913]
16:28:38 +1100 INFO:URL for operation create-corpus is: https://api.vectara.io/v1/create-corpus
16:28:41 +1100 INFO:Created new corpus with 109
16:28:41 +1100 INFO:New corpus created CreateCorpusResponse(corpusId=109, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created'))


## Load the Corpus
We'll now load the corpus using a file on our computer. This file is a Word document, which Vectara parses automatically.

In [7]:
indexer_service = client.indexer_service
indexer_service.upload(corpus_id, "C:\\Users\\david\\OneDrive\\Documents\\Publications\\RAG Options.docx")

16:28:41 +1100 INFO:Headers: {"c": "1623270172", "o": "109"}
16:28:41 +1100 INFO:Current timestamp 2023-12-28 16:28:41.911244
16:28:41 +1100 INFO:Expiry            2023-12-28 17:28:33
16:28:41 +1100 INFO:Already authenticated with non-expired token, expiry is [1703744913]
RAG Options.docx: 695kB [00:20, 34.2kB/s]                                                                            


UploadDocumentResponse(response=UploadDocumentResponseInner(status=None, quotaConsumed=StorageQuota(numChars='7936', numMetadataChars='3592')), document=None)
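
Note that the quota figures in the response above are strings rather than integers. A small sketch of totalling them, using a stand-in dataclass with the same field names as the printed `StorageQuota` (the real class lives in the client and may differ), could look like this:

```python
from dataclasses import dataclass


@dataclass
class StorageQuota:
    """Stand-in mirroring the fields printed in the upload response."""

    numChars: str
    numMetadataChars: str


def total_chars(quota: StorageQuota) -> int:
    """Convert the string counters to integers and add them up."""
    return int(quota.numChars) + int(quota.numMetadataChars)


quota = StorageQuota(numChars="7936", numMetadataChars="3592")
print(total_chars(quota))  # 11528
```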

## Query the Corpus
Let's run a basic query on the corpus with only a single document in it. Answers improve with more relevant material, but this shows the complete loop.

In [14]:
query_service = client.query_service
query_result = query_service.query("Why is Vectara a good RAG system?", corpus_id)


16:38:37 +1100 INFO:Current timestamp 2023-12-28 16:38:37.392181
16:38:37 +1100 INFO:Expiry            2023-12-28 17:28:33
16:38:37 +1100 INFO:Already authenticated with non-expired token, expiry is [1703744913]
16:38:37 +1100 INFO:Headers: {"Customer-Id": "1623270172", "Authorization": "Bearer eyJraWQiOiJabjNsd3Q1ejVkR2pUVzV3UEVRYnpGQnFjcnBmeFpHNmN2azFvQmVDQWI4PSIsImFsZyI6IlJTMjU2In0.eyJzdWIiOiI0ZmFxZzYxdWtxbjAzamtiMG50aWxiOXBzMSIsInRva2VuX3VzZSI6ImFjY2VzcyIsInNjb3BlIjoiUXVlcnlTZXJ2aWNlXC9RdWVyeSBRdWVyeVNlcnZpY2VcL1N0cmVhbVF1ZXJ5IiwiYXV0aF90aW1lIjoxNzAzNzQxMzExLCJpc3MiOiJodHRwczpcL1wvY29nbml0by1pZHAudXMtd2VzdC0yLmFtYXpvbmF3cy5jb21cL3VzLXdlc3QtMl81QjI3MjdGbEsiLCJleHAiOjE3MDM3NDQ5MTEsImlhdCI6MTcwMzc0MTMxMSwidmVyc2lvbiI6MiwianRpIjoiNGM3ZWM5NjItNDdjMC00ZDZiLWE3MTctNDNmOTdiZGVjOWU3IiwiY2xpZW50X2lkIjoiNGZhcWc2MXVrcW4wM2prYjBudGlsYjlwczEifQ.UCckx9WmoIIs5YCpPGTDOuGo7cOORA2RYCay-9cyoQBI3MLX_JvQXPX79n1W9OEzkhBXZ-n2OW2QoGaXy9nDJjhped0UaWHOlX83P9KfRx1o5JxDT2LBXuareaHdB68XzCIXDxqPOM1ZJoeKnmTi3BMNC

Query is:
{
    "query": [
        {
            "query": "Why is Vectara a good RAG system?",
            "numResults": 10,
            "corpusKey": [
                {
                    "customerId": 1623270172,
                    "corpusId": 109
                }
            ],
            "summary": [
                {
                    "summarizerPromptName": "vectara-summary-ext-v1.2.0",
                    "responseLang": "en",
                    "maxSummarizedResults": 5
                }
            ]
        }
    ]
}
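
For readers who want to see how a request of this shape could be assembled by hand, the sketch below builds the same structure as a plain dictionary. The function `build_query_body` and its parameter names are hypothetical; the field names and default values are copied from the request printed above, and the client itself may construct the request differently.

```python
import json


def build_query_body(
    query,
    customer_id,
    corpus_id,
    num_results=10,
    summarizer="vectara-summary-ext-v1.2.0",
    response_lang="en",
    max_summarized_results=5,
):
    """Assemble a request dictionary shaped like the one logged above."""
    return {
        "query": [
            {
                "query": query,
                "numResults": num_results,
                "corpusKey": [{"customerId": customer_id, "corpusId": corpus_id}],
                "summary": [
                    {
                        "summarizerPromptName": summarizer,
                        "responseLang": response_lang,
                        "maxSummarizedResults": max_summarized_results,
                    }
                ],
            }
        ]
    }


body = build_query_body("Why is Vectara a good RAG system?", 1623270172, 109)
print(json.dumps(body, indent=4))
```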



In [15]:
print(query_result.responseSet[0].summary[0].text)

Vectara is considered a good RAG system because it is the only ready-to-go system specifically designed for serving RAG operationally [1]. It was founded in 2019 with a focus on being a dedicated RAG solution for application builders, offering easy APIs [2]. Vectara aims to be the Snowflake of RAG, providing a platform for application builders to easily implement Generative AI solutions [4]. Its features are highly regarded and praised for their advanced capabilities [5].
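
The bracketed numbers in the summary are citation markers. Assuming they are 1-based positions into the list of returned results (an assumption worth checking against the Vectara documentation), the sketch below extracts them so each can be traced back to its source passage. The helper `cited_indices` is hypothetical.

```python
import re

summary_text = (
    "Vectara is considered a good RAG system because it is the only "
    "ready-to-go system specifically designed for serving RAG operationally [1]. "
    "It was founded in 2019 with a focus on being a dedicated RAG solution for "
    "application builders, offering easy APIs [2]. Vectara aims to be the "
    "Snowflake of RAG, providing a platform for application builders to easily "
    "implement Generative AI solutions [4]. Its features are highly regarded "
    "and praised for their advanced capabilities [5]."
)


def cited_indices(text):
    """Return the distinct citation numbers found in the text, in ascending order."""
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", text)})


cited = cited_indices(summary_text)
print(cited)  # [1, 2, 4, 5]
```

Under that assumption, citation `n` would correspond to the result at index `n - 1` in the response list.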


In [16]:
from dataclasses import asdict
from vectara.util import _custom_asdict_factory
import json

print(json.dumps(asdict(query_result, dict_factory=_custom_asdict_factory),indent=4))

{
    "responseSet": [
        {
            "response": [
                {
                    "text": "Scoring Category Through the lens of \u201cSaaS specifically for serving RAG operationally\u201d, Vectara is the only ready-to-go system for serving these workloads.",
                    "score": 0.8000339,
                    "metadata": [
                        {
                            "name": "lang",
                            "value": "eng"
                        },
                        {
                            "name": "section",
                            "value": "1"
                        },
                        {
                            "name": "offset",
                            "value": "3786"
                        },
                        {
                            "name": "len",
                            "value": "155"
                        }
                    ],
                    "corpusKey": {
                        "corpusI