# Summarise in Arabic

We now continue the example with a recipe from the Vectara website: we search our employee handbook and ask for the summary in Arabic.

The REST version of this recipe is documented here: https://docs.vectara.com/docs/api-recipes

In [5]:
%pip install -q vectara-skunk-client==0.2.0

Note: you may need to restart the kernel to use updated packages.


In [6]:
import logging

logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')

In [7]:
from vectara.core import Factory

client = Factory().build()

17:29:39 +1100 INFO:initializing builder
17:29:39 +1100 INFO:Factory will load configuration from home directory
17:29:39 +1100 INFO:Loading configuration from users home directory [C:\Users\david]
17:29:39 +1100 INFO:Loading default configuration [default]
17:29:39 +1100 INFO:Parsing config
17:29:39 +1100 INFO:We are processing authentication type [OAuth2]
17:29:39 +1100 INFO:Using provided OAuth2 URL [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:29:39 +1100 INFO:OAuth2 URL is [https://vectara-prod-1623270172.auth.us-west-2.amazoncognito.com/oauth2/token]
17:29:39 +1100 INFO:initializing Client


In [8]:
admin_service = client.admin_service

corpora = admin_service.list_corpora("01-diff-language-output")

if len(corpora) >= 1:
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
else:
    logging.info("No existing corpus with the name 01-diff-language-output")

17:29:59 +1100 INFO:Current timestamp 2023-12-28 17:29:59.973323
17:29:59 +1100 INFO:First time requesting token, authenticating
17:30:02 +1100 INFO:Received OAuth token, will expire [12/28/2023, 18:30:02]
17:30:02 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748602]
17:30:02 +1100 INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
17:30:04 +1100 INFO:No existing corpus with the name 01-diff-language-output
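
The cleanup cell above can be wrapped in a small helper so the notebook can be rerun safely. This is a minimal sketch: `delete_corpora_by_name` is a hypothetical helper, not part of the client library, and it assumes only what the cell above already relies on (`list_corpora`, `delete_corpus`, and an `id` attribute on each returned corpus).

```python
import logging


def delete_corpora_by_name(admin_service, name):
    """Delete every corpus returned by list_corpora(name) and return the deleted ids.

    Hypothetical helper: it uses only the list_corpora and delete_corpus
    calls shown in the cell above.
    """
    corpora = admin_service.list_corpora(name)
    if not corpora:
        logging.info("No existing corpus with the name %s", name)
        return []

    deleted = []
    for corpus in corpora:
        admin_service.delete_corpus(corpus.id)
        deleted.append(corpus.id)
    return deleted
```

With the client built earlier, this would be called as `delete_corpora_by_name(client.admin_service, "01-diff-language-output")`.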


## Create Our Demo Corpus
We now create our new corpus, called "01-diff-language-output", where we'll upload our documents for this test.

In [9]:
create_corpus_result = admin_service.create_corpus("01-diff-language-output", description="Example Corpus to test different language output")
logging.info(f"New corpus created {create_corpus_result}")
corpus_id = create_corpus_result.corpusId

17:30:30 +1100 INFO:Current timestamp 2023-12-28 17:30:30.119009
17:30:30 +1100 INFO:Expiry            2023-12-28 18:30:02
17:30:30 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748602]
17:30:30 +1100 INFO:URL for operation create-corpus is: https://api.vectara.io/v1/create-corpus
17:30:33 +1100 INFO:Created new corpus with 113
17:30:33 +1100 INFO:New corpus created CreateCorpusResponse(corpusId=113, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created'))


## Upload Handbook to Corpus
We now upload our PDF to the corpus. Vectara parses and encodes the document, then stores the embeddings in its vector database.

In [11]:
from pathlib import Path

handbook_path = Path("../resources/vectara_employee_handbook.pdf")
indexer_service = client.indexer_service
indexer_service.upload(corpus_id, handbook_path)

17:30:49 +1100 INFO:Headers: {"c": "1623270172", "o": "113"}
17:30:49 +1100 INFO:Current timestamp 2023-12-28 17:30:49.342964
17:30:49 +1100 INFO:Expiry            2023-12-28 18:30:02
17:30:49 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748602]
vectara_employee_handbook.pdf: 52.7kB [00:05, 10.3kB/s]


UploadDocumentResponse(response=UploadDocumentResponseInner(status=None, quotaConsumed=StorageQuota(numChars='9215', numMetadataChars='4903')), document=None)

In [17]:
query_service = client.query_service
response = query_service.query("Can I bring any birds to the Vectara office?", corpus_id, response_lang="ar")

17:34:04 +1100 INFO:Response Language set to [ar]
17:34:04 +1100 INFO:Current timestamp 2023-12-28 17:34:04.900158
17:34:04 +1100 INFO:Expiry            2023-12-28 18:30:02
17:34:04 +1100 INFO:Already authenticated with non-expired token, expiry is [1703748602]
17:34:04 +1100 INFO:URL for operation query is: https://api.vectara.io/v1/query


Query is:
{
    "query": [
        {
            "query": "Can I bring any birds to the Vectara office?",
            "numResults": 10,
            "corpusKey": [
                {
                    "customerId": 1623270172,
                    "corpusId": 113
                }
            ],
            "summary": [
                {
                    "summarizerPromptName": "vectara-summary-ext-v1.2.0",
                    "responseLang": "ar",
                    "maxSummarizedResults": 5
                }
            ]
        }
    ]
}
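
For readers who want to see how the request body above is put together, the following sketch rebuilds it as a plain Python dict. The field names and values are copied from the printed query. The `build_query_body` function and its parameter names are hypothetical, and this is not how the client library constructs the request internally.

```python
import json


def build_query_body(query, customer_id, corpus_id, response_lang,
                     num_results=10, max_summarized_results=5,
                     summarizer="vectara-summary-ext-v1.2.0"):
    """Return a dict shaped like the request body printed above (hypothetical helper)."""
    return {
        "query": [
            {
                "query": query,
                "numResults": num_results,
                "corpusKey": [
                    {"customerId": customer_id, "corpusId": corpus_id}
                ],
                "summary": [
                    {
                        "summarizerPromptName": summarizer,
                        # Language code for the generated summary, e.g. "ar" for Arabic.
                        "responseLang": response_lang,
                        "maxSummarizedResults": max_summarized_results,
                    }
                ],
            }
        ]
    }


body = build_query_body(
    "Can I bring any birds to the Vectara office?",
    customer_id=1623270172,
    corpus_id=113,
    response_lang="ar",
)
print(json.dumps(body, indent=4))
```

Only `responseLang` differs from an English-language query; the search portion of the request is unchanged.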



In [20]:
print(response.summary[0].text)

نأسف لإبلاغك بأنّ الحيوانات الأليفة الشائعة مثل القطط والكلاب غير مسموح بها في مكاتب فكتارا[1]. ومع ذلك، تعتقد فكتارا في التزامها بحماية الحياة البرية، وترحب بوجود الطيور في مكان العمل لتجسيد روح الارتفاع إلى مستويات جديدة[2]. وتعمل فكتارا على دعم الحفاظ على الحياة البرية[3]. علاوة على ذلك، يوجد حديقة صغيرة في فكتارا تضم مجموعة من الحيوانات الاستثنائية[5].
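
The summary marks its sources with bracketed numbers such as `[1]` and `[5]`. A minimal sketch for pulling those numbers out of the text follows. It assumes each number is a 1-based position in the list of search results, which the output here suggests but does not confirm. `cited_result_numbers` is a hypothetical helper, and the sample string is an English stand-in for the Arabic summary.

```python
import re


def cited_result_numbers(summary_text):
    """Return the distinct citation numbers in a summary, in order of first appearance."""
    seen = []
    for match in re.findall(r"\[(\d+)\]", summary_text):
        number = int(match)
        if number not in seen:
            seen.append(number)
    return seen


# English stand-in with the same citation markers as the summary above.
sample = "Pets are not allowed[1]. Birds are welcome[2]. Wildlife is supported[3]. There is a small garden[5]."
print(cited_result_numbers(sample))  # [1, 2, 3, 5]
```

In the notebook, the same function could be applied to `response.summary[0].text`.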


In [19]:
from dataclasses import asdict
from vectara.util import _custom_asdict_factory
import json

print(json.dumps(asdict(response, dict_factory=_custom_asdict_factory),indent=4))

{
    "response": [
        {
            "text": "We regret to inform you that common household pets such as cats and dogs are not allowed on\nthe Vectara campuses.",
            "score": 0.73535573,
            "metadata": [
                {
                    "name": "lang",
                    "value": "eng"
                },
                {
                    "name": "section",
                    "value": "4"
                },
                {
                    "name": "offset",
                    "value": "0"
                },
                {
                    "name": "len",
                    "value": "113"
                }
            ],
            "corpusKey": {
                "corpusId": 113,
                "customerId": 0,
                "semantics": "DEFAULT",
                "dim": [],
                "metadataFilter": "",
                "lexicalInterpolationConfig": null
            },
            "resultOffset": 0
        },
        {
            