# Vectara-Skunk-Client Example
In the code below we show how easy it is to get started with the client. Before you begin, please ensure you have:
1. Generated either an API Key or an OAuth2 App from within Vectara's console.
2. Put these credentials into a configuration file named ".vec_auth.yaml" in your home directory.

The configuration format should match the following:
```yaml
default:
  customer_id : "1999999999"
  auth:
      # For API Key, you only need the API key
      api_key : "abcdabcdabcdabcdabcdabcdababcdabcd"
admin:
  customer_id : "1999999999" # Customer Id as a string
  # For OAuth2, you need app_client_id, app_client_secret, auth_url
  auth:
      app_client_id : "abcdabcdabcdabcdabcdabcdab"
      app_client_secret : "abcdabcdabcdabcdabcdabcdababcdabcdabcdabcdabcdabcdab"
```
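
The two profiles above differ only in which keys appear under `auth`. As a rough illustration of that distinction (not the client's actual loader), the sketch below classifies an already-parsed profile dictionary. The function name `detect_auth_type` and the labels it returns are hypothetical; only the key names come from the configuration shown above.

```python
def detect_auth_type(profile: dict) -> str:
    """Classify a parsed profile as 'api_key' or 'oauth2' (hypothetical labels)."""
    auth = profile.get("auth") or {}
    if "api_key" in auth:
        return "api_key"
    if {"app_client_id", "app_client_secret"}.issubset(auth):
        return "oauth2"
    raise ValueError(
        "auth section needs api_key, or app_client_id and app_client_secret"
    )


# Profiles mirroring the YAML above, written as plain dicts.
default_profile = {
    "customer_id": "1999999999",
    "auth": {"api_key": "abcdabcdabcdabcdabcdabcdababcdabcd"},
}
admin_profile = {
    "customer_id": "1999999999",
    "auth": {
        "app_client_id": "abcdabcdabcdabcdabcdabcdab",
        "app_client_secret": "abcdabcdabcdabcdabcdabcdababcdabcdabcdabcdabcdabcdab",
    },
}

print(detect_auth_type(default_profile))  # prints api_key
print(detect_auth_type(admin_profile))    # prints oauth2
```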

In [10]:
%pip install -q vectara-skunk-client==0.2.5

Note: you may need to restart the kernel to use updated packages.


## Setup Logging
The client has extensive logging, but we need to make sure it's activated within our Python environment.

In [2]:
import logging

logging.basicConfig(format='%(asctime)s %(name)-20s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')
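
If the INFO output becomes too chatty, individual loggers can be turned down with the standard `logging` module. The sketch below is one possible setup: it repeats the configuration from the cell above and then raises the threshold for two loggers. The names `OAuthUtil` and `RequestUtil` are taken from the name column of the log output in this notebook; whether you want to silence them is a matter of taste.

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(name)-20s %(levelname)s:%(message)s",
    level=logging.INFO,
    datefmt="%H:%M:%S %z",
)

# Logger names as they appear in the %(name)s column of this notebook's output.
for noisy in ("OAuthUtil", "RequestUtil"):
    logging.getLogger(noisy).setLevel(logging.WARNING)
```

Loggers that are not listed keep inheriting the INFO level from the root logger.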

## Create the Client Factory
In the code below we create the client factory without any arguments; it will then use the configuration file in the user's home directory.

The full factory flow is shown here:

<img src="https://github.com/davidglevy/vectara-skunk-client/raw/main/resources/images/factory-build-flow.png" alt="Factory Build Flow" width="600px"/>

In [3]:
from vectara.core import Factory

client = Factory().build()

12:06:19 +1100 Factory              INFO:initializing builder
12:06:19 +1100 Factory              INFO:Factory will load configuration from home directory
12:06:19 +1100 HomeConfigLoader     INFO:Loading configuration from users home directory [C:\Users\david]
12:06:19 +1100 HomeConfigLoader     INFO:Loading default configuration [default]
12:06:19 +1100 HomeConfigLoader     INFO:Parsing config
12:06:19 +1100 root                 INFO:We are processing authentication type [OAuth2]
12:06:19 +1100 OAuthUtil            INFO:Using provided OAuth2 URL [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
12:06:19 +1100 OAuthUtil            INFO:OAuth2 URL is [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
12:06:19 +1100 root                 INFO:initializing Client
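
Because the factory reads its settings from the home directory when called without arguments, it can help to confirm the file is there before building the client. The following is a minimal pre-flight sketch using only the standard library. The helper `find_home_config` is hypothetical and not part of the client; the file name comes from the setup steps at the top of this notebook.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".vec_auth.yaml"  # file name from the setup steps


def find_home_config(home: Optional[Path] = None) -> Path:
    """Return the path to the config file, or raise if it is missing."""
    base = home if home is not None else Path.home()
    candidate = base / CONFIG_NAME
    if not candidate.is_file():
        raise FileNotFoundError(f"Expected configuration at {candidate}")
    return candidate
```

Calling `find_home_config()` before `Factory().build()` turns a missing file into an explicit error that names the expected path.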


In [4]:
# Quick test to verify we can see all corpora
admin_service = client.admin_service
corpora = admin_service.list_corpora()
for corpus in corpora:
    logging.info(f"Found corpus [{corpus.name}] with id [{corpus.id}]")

12:06:21 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:21.526598
12:06:21 +1100 OAuthUtil            INFO:First time requesting token, authenticating
12:06:22 +1100 OAuthUtil            INFO:Received OAuth token, will expire [12/29/2023, 13:06:22]
12:06:22 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
12:06:22 +1100 RequestUtil          INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
12:06:23 +1100 root                 INFO:Found corpus [Australia Broadband] with id [6]
12:06:23 +1100 root                 INFO:Found corpus [Australian Importation Laws] with id [10]
12:06:23 +1100 root                 INFO:Found corpus [Building] with id [73]
12:06:23 +1100 root                 INFO:Found corpus [CorpusFilterAttributesIntTest-testWithFilterAttrib] with id [72]
12:06:23 +1100 root                 INFO:Found corpus [High Court of Australia] with id [17]
12:06:23 +1100 root           
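
The OAuthUtil lines above report the token expiry twice, once as a local time and once as a Unix timestamp. As a small worked check, the sketch below converts the logged timestamp back to local time. The fixed +11:00 offset is an assumption read from the `+1100` in the log prefix; nothing here calls the client.

```python
from datetime import datetime, timedelta, timezone

expiry_epoch = 1703815582  # value logged as "expiry is [1703815582]"
local_tz = timezone(timedelta(hours=11))  # assumed from the +1100 log prefix

expiry_local = datetime.fromtimestamp(expiry_epoch, tz=local_tz)
print(expiry_local.strftime("%m/%d/%Y, %H:%M:%S"))  # prints 12/29/2023, 13:06:22
```

The result matches the "will expire" message, one hour after the token was received.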

## Check for Existing Corpus
We'll now check if our test corpus exists, and if so, delete it so we start with a clean slate.

In [5]:
admin_service = client.admin_service

corpora = admin_service.list_corpora("01-first-client-example")

if len(corpora) >= 1:
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
else:
    logging.info("No existing corpus with the name 01-first-client-example")

12:06:26 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:26.627267
12:06:26 +1100 OAuthUtil            INFO:Expiry            2023-12-29 13:06:22
12:06:26 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
12:06:26 +1100 RequestUtil          INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
12:06:27 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:27.397251
12:06:27 +1100 OAuthUtil            INFO:Expiry            2023-12-29 13:06:22
12:06:27 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
12:06:27 +1100 RequestUtil          INFO:URL for operation delete-corpus is: https://api.vectara.io/v1/delete-corpus
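
The clean-up step above can be wrapped in a small helper. The sketch below uses only the two calls shown in the cell, `list_corpora(name)` and `delete_corpus(id)`, and adds an exact-name check in case the name filter also returns partial matches (that behaviour is an assumption, not something confirmed here). `delete_corpora_named`, `FakeCorpus` and `FakeAdminService` are hypothetical names; the fakes exist only so the example runs without a Vectara account.

```python
from dataclasses import dataclass
from typing import List


def delete_corpora_named(admin_service, name: str) -> List[int]:
    """Delete corpora whose name equals `name`; return the deleted ids."""
    deleted = []
    for corpus in admin_service.list_corpora(name):
        # Guard against the filter returning partial matches (an assumption).
        if corpus.name == name:
            admin_service.delete_corpus(corpus.id)
            deleted.append(corpus.id)
    return deleted


# Stand-ins so the sketch runs without a Vectara account.
@dataclass
class FakeCorpus:
    id: int
    name: str


class FakeAdminService:
    def __init__(self, corpora):
        self.corpora = list(corpora)

    def list_corpora(self, name_filter=None):
        # Substring match, to exercise the exact-name guard above.
        return [
            c for c in self.corpora
            if name_filter is None or name_filter in c.name
        ]

    def delete_corpus(self, corpus_id):
        self.corpora = [c for c in self.corpora if c.id != corpus_id]


fake = FakeAdminService([
    FakeCorpus(1, "01-first-client-example"),
    FakeCorpus(2, "01-first-client-example-backup"),
])
print(delete_corpora_named(fake, "01-first-client-example"))  # prints [1]
```

With the real client you would pass `client.admin_service` instead of the fake.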


## Create New Corpus
We now create the corpus using the simple signature of vectara.admin.AdminService#create_corpus, passing a name and a description.

In [6]:
create_corpus_result = admin_service.create_corpus("01-first-client-example", description="Example Corpus for use from Jupyter")
logging.info(f"New corpus created {create_corpus_result}")
corpus_id = create_corpus_result.corpusId

12:06:35 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:35.234444
12:06:35 +1100 OAuthUtil            INFO:Expiry            2023-12-29 13:06:22
12:06:35 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
12:06:35 +1100 RequestUtil          INFO:URL for operation create-corpus is: https://api.vectara.io/v1/create-corpus
12:06:37 +1100 AdminService         INFO:Created new corpus with 119
12:06:37 +1100 root                 INFO:New corpus created CreateCorpusResponse(corpusId=119, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created'))


## Load the Corpus
We'll now load the corpus using a file on our computer. This file is a Word document, which Vectara parses automatically.

In [7]:
indexer_service = client.indexer_service
indexer_service.upload(corpus_id, "C:\\Users\\david\\OneDrive\\Documents\\Publications\\RAG Options.docx")

12:06:38 +1100 IndexerService       INFO:Headers: {"c": "1623270172", "o": "119"}
12:06:38 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:38.263313
12:06:38 +1100 OAuthUtil            INFO:Expiry            2023-12-29 13:06:22
12:06:38 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
RAG Options.docx: 695kB [00:03, 180kB/s]                                                                               


UploadDocumentResponse(response=UploadDocumentResponseInner(status=None, quotaConsumed=StorageQuota(numChars='7936', numMetadataChars='3592')), document=None)
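
The upload cell hard-codes a Windows path. One way to make that step more portable is to build the path from the home directory and check that the file exists before uploading. This is a sketch under assumptions: the helper `existing_file` is hypothetical, and the folder layout simply mirrors the path used above.

```python
from pathlib import Path


def existing_file(path: Path) -> str:
    """Return the path as a string if it points to a file, else raise."""
    if not path.is_file():
        raise FileNotFoundError(f"Document not found: {path}")
    return str(path)


# Folder layout mirrors the hard-coded path in the upload cell.
doc_path = (
    Path.home() / "OneDrive" / "Documents" / "Publications" / "RAG Options.docx"
)
```

The upload call would then read `indexer_service.upload(corpus_id, existing_file(doc_path))`, passing a string path just as the original cell does.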

## Query the Corpus
Let's run a basic query on the corpus, which contains only a single document. Answers improve with more relevant material, but this shows the complete loop.

In [8]:
query_service = client.query_service
query = "Why is Vectara a good RAG system?"
response = query_service.query(query, corpus_id)


12:06:44 +1100 OAuthUtil            INFO:Current timestamp 2023-12-29 12:06:44.512343
12:06:44 +1100 OAuthUtil            INFO:Expiry            2023-12-29 13:06:22
12:06:44 +1100 OAuthUtil            INFO:Already authenticated with non-expired token, expiry is [1703815582]
12:06:44 +1100 RequestUtil          INFO:URL for operation query is: https://api.vectara.io/v1/query


## Show Result in Markdown
We'll now use the utility function "render_markdown" to render the response (a Python dict) as Markdown.

In [9]:
from vectara.util import render_markdown
from IPython.display import display, Markdown
rendered = render_markdown(query, response)
display(Markdown(rendered))


# Query: Why is Vectara a good RAG system?

Vectara is considered a good RAG system because it is specifically designed for serving RAG workloads and is the only ready-to-go system for this purpose [1]. It stands out among other options in the market, which are either platforms that can be used for RAG or libraries [1]. Vectara aims to be the "Snowflake" of RAG, providing a high level of abstraction for application builders with easy APIs [3]. Its focused delivery for application builders has resulted in an incredibly easy path to Generative AI solutions [5]. Additionally, Vectara offers advanced features like ACLs and multi-corpus searches, allowing organizations to handle their IP effectively [4].

 1. **RAG Options.docx** (RAG Options.docx): · ACLs on the Corpus? <b>Scoring Category Through the lens of “SaaS specifically for serving RAG operationally”, Vectara is the only ready-to-go system for serving these workloads.</b> The rest are either “platforms that can be used for” or “libraries” rather than a capability. *score: 0.93919635*
 2. **RAG Options.docx** (RAG Options.docx): RAG Options What the thing is: Choosing a RAG in today’s market Options · Cohere · OpenAI · Azure Search · Google AI · LangChain · Llama Index · Databricks Summarise · Background Sentence · Who each tech is created for (App Builders | Data Engineers | DSs) · Cost model (scale to zero) · Abstraction Strengths/Weaknesses · Ease of Use (right abstraction etc) · Operational Cost · Trust · Model Coupling · Advanced Features (Hybrid) Conclusion Introduction Wow what a difference a day makes!! <b>When Vectara was founded the Retrieval Augmented Generation (RAG) was a very lonely space.</b> But from the explosion of LLMs in 2022 have also seen the number of solutions which propose a RAG element to also increase. *score: 0.844621*
 3. **RAG Options.docx** (RAG Options.docx): Databricks bridged that gap with Serverless and Unity Catalog, but kudos where kudos is warranted, Snowflake is still the optimum level of abstraction for many Data Warehouse folk coming from 20 years on legacy platforms. <b>Vectara’s goal is to be the Snowflake of RAG, being the equivalent for Application Builders.</b> Team Costs How many cooks does it take to get your RAG pipeline working … and to continue working. *score: 0.84389603*
 4. **RAG Options.docx** (RAG Options.docx): Amr would have been great representing District 12!! <b>Vectara Founded in 2019 which has been designed from the ground up to be a focused RAG solution for application builders with easy APIs.</b> The “Snowflake” of the category – purpose built to allow organisations to think about their IP in terms of Corpus with advanced features like ACLs and multi-corpus searches. *score: 0.83304876*
 5. **RAG Options.docx** (RAG Options.docx): Who would have thought. <b>But seriously though: Vectara’s RAG features are on another level and the focused delivery of a platform for application builders by application builders has resulted in an incredibly easy path to Generative AI solutions.</b> image1.png *score: 0.8089236*
 6. **RAG Options.docx** (RAG Options.docx): https://twitter.com/elonmusk/status/1626516035863212034 All that being said, they are the leaders in terms of General LLMs and have made moves towards RAG with their Assistant feature. <b>Azure Search This is a “build it yourself” RAG system – achievable but you own the most of the steps and leveraging Open AI’s features to their Cognitive Search.</b> I mean, at least they didn’t completely rip off one of their Open Source partners tech stacks and rebadge it as their own this time. *score: 0.7482908*
 7. **RAG Options.docx** (RAG Options.docx): Completeness In the realm of “Don’t make me think”, how much of each platform is done for our users? <b>Again Vectara is platform that “just works” whereas others require many of the additional steps to be both done and wired in.</b> Again, Vecata scores well here. *score: 0.74707174*
 8. **RAG Options.docx** (RAG Options.docx): If the onus is entirely on the client to perform their own evaluation, this scores poorly amongst “application builders” who just want something that works … and works well. <b>Though Vectara scored well (I mean, yes we’re a bit biased) we still didn’t put a 10 as the market vertical is still too new – maybe after another year of great results and expanded use of Boomerang we’ll self-report a 9.</b> Advanced RAG Features What additional features doe the vendor support specifically for RAG? *score: 0.74091953*
 9. **RAG Options.docx** (RAG Options.docx): <b>RAG Options What the thing is: Choosing a RAG in today’s market Options · Cohere · OpenAI · Azure Search · Google AI · LangChain · Llama Index · Databricks Summarise · Background Sentence · Who each tech is created for (App Builders | Data Engineers | DSs) · Cost model (scale to zero) · Abstraction Strengths/Weaknesses · Ease of Use (right abstraction etc) · Operational Cost · Trust · Model Coupling · Advanced Features (Hybrid) Conclusion Introduction Wow what a difference a day makes!!</b> When Vectara was founded the Retrieval Augmented Generation (RAG) was a very lonely space. *score: 0.71499735*
 10. **RAG Options.docx** (RAG Options.docx): Final Scores (recalculate with missing): 1. <b>First: Vectara (58) 2.</b> Fourth: Databricks (23.2) 5. *score: 0.6951151*
