# 05 - Prompt Engineering
We'll now explore what is possible with prompt engineering.

## Lab Setup
We'll set up our lab and use a few PDFs about avoiding hallucinations and Retrieval Augmented Generation (RAG) for our corpus.


In [12]:
%pip install -q vectara-skunk-client==0.4.13

Note: you may need to restart the kernel to use updated packages.


In [13]:
from lab_setup import create_lab_corpus
corpus_id = create_lab_corpus("05-lab-prompt-engineering", quiet=True)

09:48:35 +1100 lab_setup            INFO:User prefix for lab: david
09:48:35 +1100 lab_setup            INFO:Setting up lab corpus with name [david-05-lab-prompt-engineering]
09:48:35 +1100 Factory              INFO:initializing builder
09:48:35 +1100 Factory              INFO:Factory will load configuration from home directory
09:48:35 +1100 root                 INFO:We are processing authentication type [OAuth2]
09:48:35 +1100 root                 INFO:initializing Client
09:48:43 +1100 AdminService         INFO:Created new corpus with 257
09:48:43 +1100 root                 INFO:New corpus created CreateCorpusResponse(corpusId=257, status=Status(code=<StatusCode.OK: 0>, statusDetail='Corpus Created', cause=None))
09:48:43 +1100 lab_setup            INFO:New lab created with id [257]


In [14]:
from pathlib import Path
from vectara_client.core import Factory

resources_dir = Path("./resources/05_prompt_engineering/vectara")
client = Factory().build()
indexer_service = client.indexer_service

for p in resources_dir.glob("*.pdf"):
    indexer_service.upload(corpus_id, p)

09:48:43 +1100 Factory              INFO:initializing builder
09:48:43 +1100 Factory              INFO:Factory will load configuration from home directory
09:48:43 +1100 root                 INFO:We are processing authentication type [OAuth2]
09:48:43 +1100 root                 INFO:initializing Client
09:48:43 +1100 IndexerService       INFO:Headers: {"c": "1623270172", "o": "257"}
Avoiding Hallucinations in LLMs.pdf: 3.31MB [00:05, 630kB/s]                                                         
09:48:50 +1100 IndexerService       INFO:Headers: {"c": "1623270172", "o": "257"}
RAG done right - part 1 - chunking.pdf: 4.69MB [00:06, 757kB/s]                                                      
09:48:56 +1100 IndexerService       INFO:Headers: {"c": "1623270172", "o": "257"}
RAG done right - part 2 - retrieval.pdf: 3.32MB [00:07, 466kB/s]                                                     


In [16]:
from vectara_client.core import Factory

client = Factory().build()

09:49:06 +1100 Factory              INFO:initializing builder
09:49:06 +1100 Factory              INFO:Factory will load configuration from home directory
09:49:06 +1100 root                 INFO:We are processing authentication type [OAuth2]
09:49:06 +1100 root                 INFO:initializing Client


## Summarization Using Default Prompt
The query below uses our default summarizer (`vectara-summary-ext-v1.2.0`, aka GPT 3.5) and builds the answer from the retrieved results only (RAG only).

In [17]:
from lab_setup import render_response

query = "Why is Vectara's platform better than other LLM options?"
qs = client.query_service
resp = qs.query(query, corpus_id)
render_response(query, resp, show_search_results=False)


# Query: Why is Vectara's platform better than other LLM options?

Vectara's platform stands out among other LLM options due to its focus on avoiding hallucinations in LLM-powered applications [1]. The platform utilizes Retrieval Augmented Generation (RAG) in a comprehensive and effective manner [2][3]. RAG is done right by Vectara, ensuring accurate chunking and retrieval [3]. These features contribute to the reliability and superiority of Vectara's platform compared to other LLM options.


## Summarization Using Custom Prompt
We'll now create a custom prompt with the following preferences:

1. We tailor the style by setting `persona="Head of Investor Relations"`.
2. We produce a cleaner answer, without a citation for each result, by setting `cite=False`.
3. We allow general context from the base LLM to creep in by setting `just_rag=False`. Whilst this may increase the risk of hallucinations, if done right it can bolster the results from the retrieval model.
4. We keep the result concise by setting `max_word_count=100`.
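To make the four preferences concrete, here is a minimal sketch of how such flags could translate into system-prompt wording. The function `sketch_system_prompt` and its exact phrasing are hypothetical, written only for illustration; this is not the `SimplePromptFactory` implementation, whose real output appears in the log of the next cell.

```python
def sketch_system_prompt(persona, cite=True, max_word_count=None, just_rag=True):
    """Hypothetical illustration only; NOT the SimplePromptFactory implementation."""
    # The persona sets the voice of the answer.
    parts = [f"You are a {persona} who takes the search results "
             "and returns only the most relevant answer."]
    if just_rag:
        # Strict RAG: restrict the model to the retrieved results.
        parts.append("Answer only from the search results in this chat.")
    else:
        # Relaxed RAG: the base model may add what it already knows.
        parts.append("Prefer the search results in this chat, "
                     "but you may add information you already know.")
    if cite:
        parts.append("Cite each search result you use as [number].")
    if max_word_count is not None:
        parts.append(f"Keep the answer under {max_word_count} words.")
    return " ".join(parts)


prompt = sketch_system_prompt(
    "Head of Investor Relations", cite=False, max_word_count=100, just_rag=False
)
print(prompt)
```

Each flag adds or swaps one instruction, which is why changing a single argument can noticeably change the tone, length or grounding of the answer.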

In [18]:
from vectara_client.util import SimplePromptFactory
import logging

prompt_factory = SimplePromptFactory(persona="Head of Investor Relations", cite=False, max_word_count=100, just_rag=False)
prompt_text = prompt_factory.build()

logging.info(f"Here's our tailored \"promptText\" we'll be supplying on the query:\n\n{prompt_text}\n")

resp = qs.query(query, corpus_id, promptText=prompt_text)
render_response(query, resp, show_search_results=False)

09:49:14 +1100 root                 INFO:Here's our tailored "promptText" we'll be supplying on the query:

[ {"role": "system", "content": "You are a Head of Investor Relations who takes the search results and only return the most relevant answer. Do not iterate over each question, preferably based on the search results in this chat. You may allow additional information you know in the results. Respond in the language denoted by ISO 639 code \"$vectaraLangCode\"."}, 
#foreach ($qResult in $vectaraQueryResults) 
   #if ($foreach.first) 
   {"role": "user", "content": "Search for \"$esc.java(${vectaraQuery})\", and give me the first search result."}, 
   {"role": "assistant", "content": "$esc.java(${qResult.getText()})" }, 
   #else 
   {"role": "user", "content": "Give me the \"$vectaraIdxWord[$foreach.index]\" search result."}, 
   {"role": "assistant", "content": "$esc.java(${qResult.getText()})" }, 
   #end 
 #end 
{"role": "user", "content": "Generate a detailed answer (that is no 


# Query: Why is Vectara's platform better than other LLM options?

Vectara's platform stands out among other LLM options due to its effective approach in avoiding hallucinations in applications powered by LLM. The use of Retrieval Augmented Generation (RAG) done right ensures accurate and reliable results. Vectara's platform also incorporates chunking, which further enhances the quality of information retrieval. With a focus on preventing hallucinations and ensuring robust retrieval and generation, Vectara offers a superior LLM solution that provides reliable and trustworthy outcomes for its users.
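The logged `promptText` is a template: its `#foreach` loop turns every search result into a user/assistant message pair before the final instruction is sent. The sketch below imitates that loop in plain Python so the resulting chat messages are easier to picture. It is an approximation under stated assumptions: `expand_results` and `IDX_WORDS` are hypothetical names, `IDX_WORDS` stands in for `$vectaraIdxWord` (assumed here to hold ordinal words), and the final "Generate a detailed answer" message is left out because it is cut off in the log.

```python
import json

# Assumed stand-in for $vectaraIdxWord: ordinal words indexed by result position.
# Only five entries are listed, so this sketch handles at most five results.
IDX_WORDS = ["first", "second", "third", "fourth", "fifth"]


def expand_results(query, results):
    """Mimic the template loop: one user/assistant pair per search result."""
    messages = []
    for index, text in enumerate(results):
        if index == 0:
            # Corresponds to the #if ($foreach.first) branch.
            ask = f'Search for "{query}", and give me the first search result.'
        else:
            # Corresponds to the #else branch.
            ask = f'Give me the "{IDX_WORDS[index]}" search result.'
        messages.append({"role": "user", "content": ask})
        messages.append({"role": "assistant", "content": text})
    return messages


messages = expand_results(
    "What is RAG?", ["Result text A", "Result text B", "Result text C"]
)
print(json.dumps(messages, indent=2))
```

Presenting the results as earlier turns of the conversation is what lets the system prompt refer to "the search results in this chat".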


## Persona: ELI5
Now let's respond as though we're explaining it in "Explain Like I'm 5" (ELI5) language.

In [19]:
prompt_factory = SimplePromptFactory(persona="Explaining to someone in ELI5 language", cite=False, max_word_count=100, just_rag=False)
prompt_text = prompt_factory.build()
resp = qs.query(query, corpus_id, promptText=prompt_text)
render_response(query, resp, show_search_results=False)


# Query: Why is Vectara's platform better than other LLM options?

Vectara's platform stands out among other LLM options due to its ability to prevent hallucinations in LLM-powered applications. This is achieved through a retrieval augmented generation (RAG) approach, which ensures accurate and reliable results. Vectara's platform also excels in chunking, enhancing the retrieval process. With its focus on maintaining the fidelity of generated content, Vectara offers a superior solution compared to other LLM options. Its commitment to addressing the limitations of LLM technology sets it apart, making it a preferred choice for those seeking trustworthy and dependable results.


In [20]:
query = "Explain what RAG is?"
prompt_factory = SimplePromptFactory(persona="Explaining to someone in ELI5 language", cite=True, max_word_count=100, just_rag=True)
prompt_text = prompt_factory.build()
resp = qs.query(query, corpus_id, promptText=prompt_text)
render_response(query, resp, show_search_results=False)


# Query: Explain what RAG is?

RAG, short for Retrieval Augmented Generation, is a concept that involves combining retrieval-based and generation-based approaches in natural language processing. It aims to improve the quality and coherence of generated text by incorporating information from retrieved sources. In the context of RAG, chunking and retrieval are essential components for achieving better results in text generation[1][2]. The use of RAG helps avoid hallucinations and enhances the reliability of LLM-powered applications[3].


In [21]:
query = "What is steam power?"
prompt_factory = SimplePromptFactory(persona="Explaining to someone in ELI5 language", cite=True, max_word_count=100, just_rag=False)
prompt_text = prompt_factory.build()
resp = qs.query(query, corpus_id, promptText=prompt_text)
render_response(query, resp, show_search_results=False)


# Query: What is steam power?

Steam power refers to the use of steam as a source of energy to generate power. It is commonly used in various applications and industries. The first search result, "Avoiding hallucinations in LLM-powered Applications," does not directly provide information about steam power. However, based on the context, steam power is not related to this result. The second and third search results, "Retrieval Augmented Generation (RAG) Done Right_ Retrieval" and "Retrieval Augmented Generation (RAG) Done Right_ Chunking," also do not seem to be relevant to the query. Therefore, based on the search results in this chat, we couldn't find a specific answer regarding steam power.
