# Vectara-Skunk-Client Example
The code below shows how to get started with the client. Before running these steps, please ensure you have:
1. Generated either an API Key or an OAuth2 App from within Vectara's console.
2. Put these credentials into a configuration file named ".vec_auth.yaml" in your home directory.

The configuration format should match the following:
```yaml
default:
  customer_id : "1999999999"
  # For API Key, you only need the API key
  api_key : "abcdabcdabcdabcdabcdabcdababcdabcd"
admin:
  customer_id : "1999999999" # Customer Id as a string
  # For OAuth2, you need app_client_id, app_client_secret, auth_url
  app_client_id : "abcdabcdabcdabcdabcdabcdab"
  app_client_secret : "abcdabcdabcdabcdabcdabcdababcdabcdabcdabcdabcdabcdab"
```
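
As a rough illustration of how the two profiles above differ, the sketch below classifies a profile by the keys it carries. The helper name `detect_auth_type` and the `"ApiKey"` label are hypothetical and not part of the client; the key names come from the configuration above, and the client's own detection logic may differ.

```python
def detect_auth_type(profile: dict) -> str:
    """Guess the authentication type from the keys in one profile.

    Hypothetical helper for illustration only; not the client's API.
    """
    if "customer_id" not in profile:
        raise ValueError("profile is missing customer_id")
    if "api_key" in profile:
        return "ApiKey"
    if {"app_client_id", "app_client_secret"} <= profile.keys():
        return "OAuth2"
    raise ValueError("profile has neither an API key nor OAuth2 credentials")


# Same shape as the YAML above, written as a plain dictionary.
config = {
    "default": {"customer_id": "1999999999", "api_key": "abcd"},
    "admin": {
        "customer_id": "1999999999",
        "app_client_id": "abcd",
        "app_client_secret": "abcd",
    },
}

for name, profile in config.items():
    print(name, detect_auth_type(profile))
```

A check like this can catch a profile that mixes up or omits credentials before any request is made.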

In [9]:
%pip install vectara-skunk-client==0.2.0

Collecting vectara-skunk-client==0.2.0
  Downloading vectara_skunk_client-0.2.0-py3-none-any.whl.metadata (916 bytes)
Downloading vectara_skunk_client-0.2.0-py3-none-any.whl (34 kB)
Installing collected packages: vectara-skunk-client
  Attempting uninstall: vectara-skunk-client
    Found existing installation: vectara-skunk-client 0.1.9
    Uninstalling vectara-skunk-client-0.1.9:
      Successfully uninstalled vectara-skunk-client-0.1.9
Successfully installed vectara-skunk-client-0.2.0
Note: you may need to restart the kernel to use updated packages.


## Setup Logging
The client has extensive logging, but we need to make sure it's activated within our Python environment.

In [1]:
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')
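
To see what this format string produces, the sketch below attaches the same format to a separate logger that writes to an in-memory stream. The logger name `format_demo` is an assumption made for the example and has nothing to do with the client's own loggers.

```python
import io
import logging

# Write to an in-memory stream so the formatted line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter(
        fmt="%(asctime)s %(levelname)s:%(message)s",
        datefmt="%H:%M:%S %z",
    )
)

demo_logger = logging.getLogger("format_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo out of the root logger's output

demo_logger.info("initializing builder")
demo_logger.debug("this line is dropped because DEBUG is below INFO")

line = stream.getvalue().strip()
print(line)
```

The printed line has the same shape as the log output in the rest of this notebook: a time, a UTC offset, the level, then the message.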

## Create the Client Factory
In the code below we create the client factory without any arguments, so it uses the configuration file in the user's home directory.

The full factory flow is shown here:

<img src="https://github.com/davidglevy/vectara-skunk-client/raw/main/resources/images/factory-build-flow.png" alt="Factory Build Flow" width="600px"/>

In [2]:
from vectara.core import Factory

client = Factory().build()

17:26:29 +1100 INFO:initializing builder
17:26:29 +1100 INFO:Factory will load configuration from home directory
17:26:29 +1100 INFO:Loading configuration from users home directory [C:\Users\david]
17:26:29 +1100 INFO:Loading default configuration [default]
17:26:29 +1100 INFO:Parsing config
17:26:29 +1100 INFO:We are processing authentication type [OAuth2]
17:26:29 +1100 INFO:Using provided OAuth2 URL [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:26:29 +1100 INFO:OAuth2 URL is [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:26:29 +1100 INFO:initializing Client


In [3]:
# Quick test to verify we can see all corpora
admin_service = client.admin_service
corpora = admin_service.list_corpora()
for corpus in corpora:
    logging.info(f"Found corpus [{corpus.name}] with id [{corpus.id}]")

17:26:29 +1100 INFO:Current timestamp 2023-12-28 17:26:29.951260
17:26:29 +1100 INFO:First time requesting token, authenticating
17:26:32 +1100 INFO:Received OAuth token, will expire [12/28/2023, 18:26:32]
17:26:32 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
17:26:32 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
17:26:34 +1100 INFO:Found corpus [Australia Broadband] with id [6]
17:26:34 +1100 INFO:Found corpus [Australian Importation Laws] with id [10]
17:26:34 +1100 INFO:Found corpus [Building] with id [73]
17:26:34 +1100 INFO:Found corpus [CorpusFilterAttributesIntTest-testWithFilterAttrib] with id [72]
17:26:34 +1100 INFO:Found corpus [High Court of Australia] with id [17]
17:26:34 +1100 INFO:Found corpus [Public Reports] with id [8]
17:26:34 +1100 INFO:Found corpus [SE] with id [7]
17:26:34 +1100 INFO:Found corpus [South Australian State Law] with id [18]
17:26:34 +1100 INFO:Found corpus [Test] with id [80]


## Check for Existing Corpus
We'll now check if our test corpus exists, and if so, delete it so we start with a clean slate.

In [4]:
admin_service = client.admin_service

corpora = admin_service.list_corpora("01-first-client-example")

if len(corpora) >= 1:
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
else:
    logging.info("No existing corpus with the name 01-first-client-example")

17:26:34 +1100 INFO:Current timestamp 2023-12-28 17:26:34.139285
17:26:34 +1100 INFO:Expiry            2023-12-28 18:26:32
17:26:34 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
17:26:34 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
17:26:35 +1100 INFO:Current timestamp 2023-12-28 17:26:35.826940
17:26:35 +1100 INFO:Expiry            2023-12-28 18:26:32
17:26:35 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
17:26:35 +1100 INFO:URL for operation delete-corpus is: https://api.vectara.io/v1/delete-corpus
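
The check-then-delete step can be wrapped in a small helper. This is a minimal sketch under stated assumptions: `FakeCorpus`, `FakeAdminService`, and `delete_corpora_named` are hypothetical names, and the fake service only imitates the two calls used above (`list_corpora` and `delete_corpus`). The exact-name guard is there in case the real filter matches on substrings, which this notebook does not confirm.

```python
from dataclasses import dataclass


@dataclass
class FakeCorpus:
    id: int
    name: str


class FakeAdminService:
    """Hypothetical in-memory stand-in for client.admin_service."""

    def __init__(self, corpora):
        self._corpora = list(corpora)

    def list_corpora(self, name_filter=None):
        if name_filter is None:
            return list(self._corpora)
        return [c for c in self._corpora if name_filter in c.name]

    def delete_corpus(self, corpus_id):
        self._corpora = [c for c in self._corpora if c.id != corpus_id]


def delete_corpora_named(admin_service, name):
    """Delete every corpus whose name equals `name`; return the deleted ids."""
    deleted = []
    for corpus in admin_service.list_corpora(name):
        # Guard against a filter that matches on substrings.
        if corpus.name == name:
            admin_service.delete_corpus(corpus.id)
            deleted.append(corpus.id)
    return deleted


service = FakeAdminService([
    FakeCorpus(80, "Test"),
    FakeCorpus(110, "01-first-client-example"),
    FakeCorpus(112, "01-first-client-example-v2"),
])
deleted_ids = delete_corpora_named(service, "01-first-client-example")
print(deleted_ids)
```

Returning the deleted ids makes it easy to log what was removed, or to confirm that nothing was.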


## Create New Corpus
We now create the corpus using the simple signature of vectara.admin.AdminService#create_corpus.

In [5]:
create_corpus_result = admin_service.create_corpus("01-first-client-example", description="Example Corpus for use from Jupyter")
logging.info(f"New corpus created {create_corpus_result}")
corpus_id = create_corpus_result.corpusId

17:26:38 +1100 INFO:Current timestamp 2023-12-28 17:26:38.818151
17:26:38 +1100 INFO:Expiry            2023-12-28 18:26:32
17:26:38 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
17:26:38 +1100 INFO:URL for operation create-corpus is: https://api.vectara.io/v1/create-corpus
17:26:41 +1100 INFO:Created new corpus with 111
17:26:41 +1100 INFO:New corpus created CreateCorpusResponse(corpusId=111, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created'))


## Load the Corpus
We'll now load the corpus using a file on our computer. This file is a Word document, which Vectara parses automatically.

In [6]:
indexer_service = client.indexer_service
indexer_service.upload(corpus_id, "C:\\Users\\david\\OneDrive\\Documents\\Publications\\RAG Options.docx")

17:26:41 +1100 INFO:Headers: {"c": "1623270172", "o": "111"}
17:26:41 +1100 INFO:Current timestamp 2023-12-28 17:26:41.474492
17:26:41 +1100 INFO:Expiry            2023-12-28 18:26:32
17:26:41 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
RAG Options.docx: 695kB [00:08, 82.1kB/s]                                                                            


UploadDocumentResponse(response=UploadDocumentResponseInner(status=None, quotaConsumed=StorageQuota(numChars='7936', numMetadataChars='3592')), document=None)
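
Note that the quota figures in the response above are strings, not integers. The sketch below uses stand-in dataclasses that mirror the printed response (they are not the client's own classes) to show the conversion; `total_chars_consumed` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins shaped like the printed response; not imported from the client.
@dataclass
class StorageQuota:
    numChars: str
    numMetadataChars: str


@dataclass
class UploadDocumentResponseInner:
    status: Optional[object]
    quotaConsumed: StorageQuota


@dataclass
class UploadDocumentResponse:
    response: UploadDocumentResponseInner
    document: Optional[object]


def total_chars_consumed(upload_response):
    """Sum text and metadata characters, converting from strings first."""
    quota = upload_response.response.quotaConsumed
    return int(quota.numChars) + int(quota.numMetadataChars)


result = UploadDocumentResponse(
    response=UploadDocumentResponseInner(
        status=None,
        quotaConsumed=StorageQuota(numChars="7936", numMetadataChars="3592"),
    ),
    document=None,
)
total = total_chars_consumed(result)
print(total)
```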

## Query the Corpus
Let's run a basic query on the corpus with only a single document in it. Answers improve with more relevant material, but this shows the complete loop.

In [7]:
query_service = client.query_service
query_result = query_service.query("Why is Vectara a good RAG system?", corpus_id)


17:26:50 +1100 INFO:Response Language set to [en]
17:26:50 +1100 INFO:Current timestamp 2023-12-28 17:26:50.207380
17:26:50 +1100 INFO:Expiry            2023-12-28 18:26:32
17:26:50 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748392]
17:26:50 +1100 INFO:URL for operation query is: https://api.vectara.io/v1/query


Query is:
{
    "query": [
        {
            "query": "Why is Vectara a good RAG system?",
            "numResults": 10,
            "corpusKey": [
                {
                    "customerId": 1623270172,
                    "corpusId": 111
                }
            ],
            "summary": [
                {
                    "summarizerPromptName": "vectara-summary-ext-v1.2.0",
                    "responseLang": "en",
                    "maxSummarizedResults": 5
                }
            ]
        }
    ]
}
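
The logged request can be reproduced as a plain dictionary. This is a minimal sketch: `build_query_body` and its parameter names are hypothetical, the field names and default values are copied from the logged query above, and the client's own request building may differ.

```python
import json


def build_query_body(query, customer_id, corpus_id, num_results=10,
                     summarizer="vectara-summary-ext-v1.2.0",
                     response_lang="en", max_summarized_results=5):
    """Build a request body shaped like the logged query (hypothetical helper)."""
    return {
        "query": [
            {
                "query": query,
                "numResults": num_results,
                "corpusKey": [
                    {"customerId": customer_id, "corpusId": corpus_id}
                ],
                "summary": [
                    {
                        "summarizerPromptName": summarizer,
                        "responseLang": response_lang,
                        "maxSummarizedResults": max_summarized_results,
                    }
                ],
            }
        ]
    }


body = build_query_body("Why is Vectara a good RAG system?", 1623270172, 111)
print(json.dumps(body, indent=4))
```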



In [9]:
print(query_result.summary[0].text)

Vectara is considered a good RAG system because it was specifically designed to serve RAG workloads [1]. It is the only ready-to-go system for this purpose [1]. Vectara was founded in 2019 with the goal of being a focused RAG solution for application builders [2]. It offers easy APIs and aims to be the Snowflake of RAG [2][4]. The platform provides advanced RAG features and an easy path to Generative AI solutions [5]. Its focused delivery by application builders has resulted in a comprehensive and user-friendly system [5]. Overall, Vectara stands out as a reliable RAG system for its specialized approach and features [1][2][4][5].
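
The bracketed numbers in the summary are citations. A short sketch for pulling them out of the text follows; `cited_result_numbers` is a hypothetical helper, and the sample text is two sentences copied from the summary above.

```python
import re


def cited_result_numbers(summary_text: str) -> list[int]:
    """Return the distinct citation numbers in a summary, in ascending order."""
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", summary_text)})


summary = (
    "It offers easy APIs and aims to be the Snowflake of RAG [2][4]. "
    "Overall, Vectara stands out as a reliable RAG system for its "
    "specialized approach and features [1][2][4][5]."
)
numbers = cited_result_numbers(summary)
print(numbers)  # [1, 2, 4, 5]
```

If the citation numbers are 1-based positions in the result list (an assumption here, not something this notebook confirms), citation `n` would correspond to `query_result.response[n - 1]`.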


In [10]:
from dataclasses import asdict
from vectara.util import _custom_asdict_factory
import json

print(json.dumps(asdict(query_result, dict_factory=_custom_asdict_factory),indent=4))

{
    "response": [
        {
            "text": "Scoring Category Through the lens of \u201cSaaS specifically for serving RAG operationally\u201d, Vectara is the only ready-to-go system for serving these workloads.",
            "score": 0.8000339,
            "metadata": [
                {
                    "name": "lang",
                    "value": "eng"
                },
                {
                    "name": "section",
                    "value": "1"
                },
                {
                    "name": "offset",
                    "value": "3786"
                },
                {
                    "name": "len",
                    "value": "155"
                }
            ],
            "corpusKey": {
                "corpusId": 111,
                "customerId": 0,
                "semantics": "DEFAULT",
                "dim": [],
                "metadataFilter": "",
                "lexicalInterpolationConfig": null
            },
          