# Prompt Engineering with LLMs

**Learning Objectives**

1. Learn how to query the Vertex PaLM API
1. Learn how to set up the PaLM API parameters
1. Learn prompt engineering for text generation
1. Learn prompt engineering for chat applications


The Vertex AI PaLM API lets you test, customize, and deploy instances of Google's PaLM large language models (LLMs) so that you can leverage the capabilities of PaLM in your applications. The PaLM family of models supports text completion, multi-turn chat, and text embedding generation.

This notebook will provide examples of accessing pre-trained PaLM models with the API for use cases like text classification, summarization, extraction, and chat.

### Setup

In [9]:
from google.cloud import aiplatform
from vertexai.language_models import (
    ChatModel,
    ChatSession,
    InputOutputTextPair,
    TextGenerationModel,
)

print(aiplatform.__version__)

1.33.1


## Text generation

The cell below implements the helper function `generate`, which sends a prompt to the PaLM API and returns the response. Its input parameters are as follows:

* `prompt`: Text input to generate model response. Prompts can include preamble, questions, suggestions, instructions, or examples.

* `temperature`: The temperature is used for sampling during response generation, which occurs after `top_p` and `top_k` are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest-probability token is always selected. For most use cases, try starting with a temperature of 0.2.

* `max_output_tokens`: Maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer responses. A token may be smaller than a word. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

* `top_k`: Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature). At each token selection step, the `top_k` tokens with the highest probabilities are kept as candidates. These candidates are then further filtered based on `top_p`, and the final token is selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.

* `top_p`: Top-p changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable until the sum of their probabilities reaches the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature) and doesn't consider C. The API's default top-p value is 0.95; the `generate` helper below sets it to 0.8. Specify a lower value for less random responses and a higher value for more random responses.
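
To make the interaction between `top_k`, `top_p`, and `temperature` more concrete, here is a minimal pure-Python sketch of the selection order described above. It is only an illustration on a toy distribution, not the PaLM implementation; the function names (`top_k_filter`, `top_p_filter`, `sample_token`) are hypothetical and not part of the Vertex AI SDK.

```python
import random


def top_k_filter(probs, top_k):
    """Keep the top_k most probable tokens, sorted by descending probability."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def top_p_filter(ranked, top_p):
    """Keep tokens, most probable first, until their cumulative probability reaches top_p."""
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p - 1e-12:  # tolerance for floating-point rounding
            break
    return kept


def sample_token(probs, temperature=0.2, top_k=40, top_p=0.8, rng=None):
    """Pick one token: top-k filter, then top-p filter, then temperature sampling."""
    candidates = top_p_filter(top_k_filter(probs, top_k), top_p)
    if temperature == 0:
        return candidates[0][0]  # greedy: always the most probable token
    # Raising p to the power 1/T is equivalent to dividing the logits by T.
    weights = [prob ** (1.0 / temperature) for _, prob in candidates]
    tokens = [token for token, _ in candidates]
    return (rng or random).choices(tokens, weights=weights, k=1)[0]


# Toy distribution from the top-p example above
# (the remaining probability mass belongs to tokens not shown).
probs = {"A": 0.3, "B": 0.2, "C": 0.1}
candidates = top_p_filter(top_k_filter(probs, top_k=3), top_p=0.5)
print([token for token, _ in candidates])  # ['A', 'B']
print(sample_token(probs, temperature=0))  # A
```

With a low temperature the weights become sharply peaked around the most probable candidate, which is why lower values give less random responses.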

In [12]:
def generate(
    prompt,
    model_name="text-bison@001",
    temperature=0.2,
    max_output_tokens=256,
    top_p=0.8,
    top_k=40,
):
    model = TextGenerationModel.from_pretrained(model_name)
    response = model.predict(
        prompt,
        temperature=temperature,
        max_output_tokens=max_output_tokens,
        top_p=top_p,
        top_k=top_k,
    )

    return response
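
Before sending a request, it can help to catch out-of-range parameter values locally. The sketch below is one possible way to do that. The helper name `validate_parameters` is hypothetical, and the ranges are assumptions based on the `text-bison@001` documentation at the time of writing, so check them against the current API reference before relying on them.

```python
# Assumed valid ranges for text-bison@001; verify against the current API reference.
PARAMETER_RANGES = {
    "temperature": (0.0, 1.0),
    "max_output_tokens": (1, 1024),
    "top_p": (0.0, 1.0),
    "top_k": (1, 40),
}


def validate_parameters(**parameters):
    """Raise ValueError if a parameter is unknown or outside its assumed range."""
    for name, value in parameters.items():
        if name not in PARAMETER_RANGES:
            raise ValueError(f"Unknown parameter: {name}")
        low, high = PARAMETER_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return parameters


# The defaults of the `generate` helper pass the check.
validate_parameters(temperature=0.2, max_output_tokens=256, top_p=0.8, top_k=40)
```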

In [13]:
generate(
    "What are five important things to understand about large language models?"
)

1. **Large language models are trained on massive datasets of text and code.** This gives them a vast knowledge base that they can draw on to generate text, translate languages, write different kinds of creative content, answer your questions, and more.
2. **Large language models are still under development, and there are some limitations to their capabilities.** For example, they can sometimes be biased or inaccurate, and they may not always understand the nuances of human language.
3. **Large language models have the potential to be used for a wide variety of applications, including natural language processing (NLP), machine translation, and artificial intelligence (AI).** They could also be used to create new forms of art and entertainment, and to help us solve some of the world's most pressing problems.
4. **The development of large language models raises a number of ethical and societal concerns, including the potential for bias, misinformation, and job displacement.** It is impor

### Text Classification 

Now that we've tested our wrapper function, let us explore prompting for classification. Text classification is a common machine learning use case, frequently applied to tasks like spam detection, sentiment analysis, and topic classification.

Both **zero-shot** and **few-shot** prompting are common with text classification use cases. Zero-shot prompting is where you do not provide examples with labels in the input, and few-shot prompting is where you provide (a few) examples in the input. 

Additionally, for text classification use cases we can reduce `max_output_tokens` if we simply want the predicted class (because labels are typically a few words or less).
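
As a rough guide for choosing `max_output_tokens`, the rule of thumb from the parameter description (about four characters per token) can be turned into a small estimate. This is only a heuristic sketch: `estimate_tokens` is a hypothetical helper, and real tokenization may split text differently, so leave some headroom.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


labels = ["technology", "politics", "sports"]
budget = max(estimate_tokens(label) for label in labels)
print(budget)  # 3
```

The classification cells in this section use `max_output_tokens=5`, which leaves a small margin over this estimate.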

Let us start with zero-shot classification:

In [14]:
prompt = """
Classify the following: 
text: "Zero-shot prompting is really easy with Google's PaLM API."
label: technology, politics, sports 
"""

generate(prompt, max_output_tokens=5)

The text is about technology

Here is an example of few-shot prompting for classification. Along with improving classification accuracy, few-shot prompting gives you some control over the output format:

In [16]:
prompt = """
What is the topic for a given text? 
- cats 
- dogs 

Text: They always sit on my lap and purr!
The answer is: cats

Text: They love to play fetch
The answer is: dogs

Text: I throw the frisbee in the water and they swim for hours! 
The answer is: dogs 

Text: They always knock things off my counter!
The answer is:
"""

generate(prompt, max_output_tokens=5)

cats
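
When the labeled examples live in a list rather than in a hand-written string, the same prompt layout can be assembled programmatically. The helper below is a hypothetical sketch (`build_few_shot_prompt` is not part of the SDK) that reproduces the format used in the cell above.

```python
def build_few_shot_prompt(question, labels, examples, query):
    """Assemble a few-shot classification prompt in the layout used above."""
    lines = [question]
    lines += [f"- {label}" for label in labels]
    lines.append("")
    for text, label in examples:
        lines += [f"Text: {text}", f"The answer is: {label}", ""]
    # The final entry is left unlabeled so that the model completes it.
    lines += [f"Text: {query}", "The answer is:"]
    return "\n".join(lines)


few_shot_prompt = build_few_shot_prompt(
    question="What is the topic for a given text?",
    labels=["cats", "dogs"],
    examples=[
        ("They always sit on my lap and purr!", "cats"),
        ("They love to play fetch", "dogs"),
    ],
    query="They always knock things off my counter!",
)
print(few_shot_prompt)
```

The resulting string can be passed to `generate` in the same way as the hand-written prompt.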

### Text Summarization

PaLM can also be used for text summarization, which produces a concise and fluent summary of a longer text document. The prompt in the cell below simply instructs PaLM to summarize a given text:

In [17]:
prompt = """
Provide a very short summary for the following:

A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.

Summary:

"""
generate(prompt, max_output_tokens=1024)

A transformer is a deep learning model that processes sequential input data such as natural language. Unlike RNNs, transformers process the entire input all at once, which allows for more parallelization and reduces training times.

If you need the summary in a particular format, such as a bulleted list, you can instruct PaLM accordingly:

In [18]:
prompt = """
Provide four bullet points summarizing the following:

A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.

Summary:

"""
generate(prompt, max_output_tokens=1024)

- A transformer is a deep learning model that is distinguished by its adoption of self-attention.
- Transformers are designed to process sequential input data, such as natural language.
- Unlike RNNs, transformers process the entire input all at once.
- The attention mechanism provides context for any position in the input sequence.
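
The two summarization prompts above differ only in their first line. A small template function makes that explicit and avoids repeating the passage. This is a sketch with a hypothetical helper name (`build_summary_prompt`), and the `article` string is shortened here for brevity.

```python
def build_summary_prompt(
    text, instruction="Provide a very short summary for the following:"
):
    """Combine a formatting instruction with the text to summarize."""
    return f"{instruction}\n\n{text.strip()}\n\nSummary:\n"


article = (
    "A transformer is a deep learning model. "
    "It is distinguished by its adoption of self-attention."
)
short_prompt = build_summary_prompt(article)
bullet_prompt = build_summary_prompt(
    article, instruction="Provide four bullet points summarizing the following:"
)
print(bullet_prompt)
```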

Dialog summarization is a special case of text summarization. The prompt below asks for a one-line summary of a chat, followed by a list of to-dos for the service rep:

In [25]:
prompt = """
Generate a one-liner summary of the following chat and at the end, summarize to-do's for the service rep: 

Kyle: Hi! I'm reaching out to customer service because I am having issues.

Service Rep: What seems to be the problem? 

Kyle: I am trying to use the PaLM API but I keep getting an error. 

Service Rep: Can you share the error with me? 

Kyle: Sure. The error says: "ResourceExhausted: 429 Quota exceeded for 
      aiplatform.googleapis.com/online_prediction_requests_per_base_model 
      with base model: text-bison"
      
Service Rep: It looks like you have exceeded the quota for usage. Please refer to 
             https://cloud.google.com/vertex-ai/docs/quotas for information about quotas
             and limits. 
             
Kyle: Can you increase my quota?

Service Rep: I cannot, but let me follow up with somebody who will be able to help.

Summary:
"""

generate(prompt, max_output_tokens=256)

Kyle is having issues using the PaLM API because they have exceeded the quota for usage. The service rep provides Kyle with a link to information about quotas and limits, but cannot increase Kyle's quota. The service rep will follow up with somebody who can help.

To-do's for the service rep:
- Follow up with somebody who can help Kyle increase their quota.
- Provide Kyle with a link to information about quotas and limits.
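
If downstream code needs the to-do items as a list, the response text can be split on the heading the model produced. The sketch below assumes the response is available as a plain string (for example via `response.text`) and that the model keeps the `To-do's for the service rep:` heading and `- ` bullets, which is not guaranteed. `split_summary_and_todos` is a hypothetical helper, and the sample text is abbreviated from the output above.

```python
def split_summary_and_todos(response_text, marker="To-do's for the service rep:"):
    """Split a response into its summary and a list of to-do items."""
    summary, _, todo_block = response_text.partition(marker)
    todos = [
        line.strip()[2:].strip()  # drop the leading "- "
        for line in todo_block.splitlines()
        if line.strip().startswith("- ")
    ]
    return summary.strip(), todos


response_text = """The service rep will follow up with somebody who can help.

To-do's for the service rep:
- Follow up with somebody who can help Kyle increase their quota.
- Provide Kyle with a link to information about quotas and limits.
"""
summary, todos = split_summary_and_todos(response_text)
print(summary)
print(todos)
```

If the marker is missing, the whole text is returned as the summary and the to-do list is empty.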

### Text Extraction

PaLM can be used to extract and structure text. One common purpose is converting documents into a machine-readable format, which is useful for storing them in a database or processing them with software. Another is pulling specific information out of a document, for example to look up a particular fact or to summarize its content.

Let us start with zero-shot extraction:

In [26]:
prompt = """
Extract the ingredients from the following recipe. 
Return the ingredients in JSON format with keys: ingredient, quantity, type.

Ingredients:
* 1 tablespoon olive oil
* 1 onion, chopped
* 2 carrots, chopped
* 2 celery stalks, chopped
* 1 teaspoon ground cumin
* 1/2 teaspoon ground coriander
* 1/4 teaspoon turmeric powder
* 1/4 teaspoon cayenne pepper (optional)
* Salt and pepper to taste
* 1 (15 ounce) can black beans, rinsed and drained
* 1 (15 ounce) can kidney beans, rinsed and drained
* 1 (14.5 ounce) can diced tomatoes, undrained
* 1 (10 ounce) can diced tomatoes with green chilies, undrained
* 4 cups vegetable broth
* 1 cup chopped fresh cilantro
"""
generate(prompt, max_output_tokens=1024)

```
{
  "ingredient": "olive oil",
  "quantity": "1 tablespoon",
  "type": "oil"
},
{
  "ingredient": "onion",
  "quantity": "1",
  "type": "vegetable"
},
{
  "ingredient": "carrot",
  "quantity": "2",
  "type": "vegetable"
},
{
  "ingredient": "celery",
  "quantity": "2",
  "type": "vegetable"
},
{
  "ingredient": "ground cumin",
  "quantity": "1 teaspoon",
  "type": "spice"
},
{
  "ingredient": "ground coriander",
  "quantity": "1/2 teaspoon",
  "type": "spice"
},
{
  "ingredient": "turmeric powder",
  "quantity": "1/4 teaspoon",
  "type": "spice"
},
{
  "ingredient": "cayenne pepper",
  "quantity": "1/4 teaspoon",
  "type": "spice"
},
{
  "ingredient": "salt",
  "quantity": "to taste",
  "type": "seasoning"
},
{
  "ingredient": "pepper",
  "quantity": "to taste",
  "type": "seasoning"
},
{
  "ingredient": "black beans",
  "quantity": "1 (15 ounce) can",
  "type": "bean"
},
{
  "ingredient": "kidney beans",
  "quantity": "1 (15 ounce) can",
  "type": "bean"
},
{
  "ingredient": "dice

As with classification, few-shot prompting gives you more control over the format of what is extracted:

In [27]:
prompt = """
Extract the technical specifications from the text below in JSON format.

Text: Google Nest WiFi, network speed up to 1200Mbps, 2.4GHz and 5GHz frequencies, WPA3 protocol
JSON: {
  "product":"Google Nest WiFi",
  "speed":"1200Mbps",
  "frequencies": ["2.4GHz", "5GHz"],
  "protocol":"WPA3"
}

Text: Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, 128GB of storage, Lemongrass
JSON:
"""

generate(prompt, max_output_tokens=1024)

{
  "product":"Google Pixel 7",
  "network":"5G",
  "RAM":"8GB",
  "processor":"Tensor G2",
  "storage":"128GB",
  "color":"Lemongrass"
}
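
Model output that looks like JSON is still just text, and it may be wrapped in code fences or cut off when `max_output_tokens` is reached. The sketch below shows one cautious way to parse it with the standard `json` module. `parse_json_response` is a hypothetical helper, and it assumes the response text is available as a string.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_json_response(response_text):
    """Parse a model response as JSON, tolerating surrounding code fences.

    Returns None if the text is not valid JSON (for example, truncated output).
    """
    cleaned = response_text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
    if cleaned.rstrip().endswith(FENCE):
        cleaned = cleaned.rstrip()[: -len(FENCE)]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


specs = parse_json_response('{"product": "Google Pixel 7", "storage": "128GB"}')
print(specs["product"])  # Google Pixel 7
print(parse_json_response('{"ingredient": "dice'))  # None
```

Returning `None` for unparseable text lets the caller decide whether to retry with a larger `max_output_tokens` or a stricter prompt.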

## Prompt engineering for chat

The Vertex AI PaLM API for chat is optimized for multi-turn chat. Multi-turn chat is when a model tracks the history of a chat conversation and then uses that history as the context for responses.

PaLM API chat prompts are composed of the following three components:

* **Messages (required)**: Messages are the list of author-content pairs. The model responds to the current message, which is the last pair in the messages list. The pairs before the last pair comprise the chat session history. 

* **Context (optional)**: Context allows you to tell a model how to respond or what to refer to when it responds. Context enables you to do things like: specify words that the model can and can't use, specify topics to avoid or focus on, specify the style/tone/format, assume a character/figure, and more.

* **Examples (optional)**: List of input-output pairs that demonstrate the model behavior you want to see. This is similar to few-shot learning. 

The following cell implements a helper function that creates a chat session with a specified language model
and parameters. Within a chat session, the model keeps context and remembers
the previous conversation.

In [28]:
def create_chat_session(
    model_name="chat-bison@001",
    max_output_tokens=256,
    temperature=0.0,
    top_k=40,
    top_p=0.95,
    context=None,
    examples=None,
):
    model = ChatModel.from_pretrained(model_name)

    return ChatSession(
        model=model,
        context=context,
        examples=examples,
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
    )

In [29]:
chat_session = create_chat_session()

After creating the `ChatSession` instance, you can converse with PaLM using the `.send_message` method:

In [30]:
response = chat_session.send_message("Hello, my name is Kyle!")
response

Hello Kyle, how can I help you today?

In [31]:
response = chat_session.send_message(
    """
    Good to meet you too. I was just wondering, what is the most populated city
    in the United States?
    """
)

response

The most populated city in the United States is New York City, with a population of 8,804,190 as of the 2020 census.

Recall that history is preserved within a chat session, which lets the model use earlier turns as context. You can inspect this history through the `message_history` attribute of the chat session object. Notice that the history is simply a list of `ChatMessage` objects that alternate between the `user` and `bot` authors.

In [35]:
chat_session.message_history

[ChatMessage(content='Hello, my name is Kyle!', author='user'),
 ChatMessage(content='Hello Kyle, how can I help you today?', author='bot'),
 ChatMessage(content='\n    Good to meet you too. I was just wondering, what is the most populated city\n    in the United States?\n    ', author='user'),
 ChatMessage(content='The most populated city in the United States is New York City, with a population of 8,804,190 as of the 2020 census.', author='bot')]

In [36]:
response = chat_session.send_message("What question did I ask you last?")
response

You asked me what the most populated city in the United States is.
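
The idea behind multi-turn chat can be illustrated without calling the API: each new message is appended to the history, and the whole history is handed to the model so that it can refer back to earlier turns. The toy classes below are hypothetical stand-ins written for this illustration. They are not the SDK's `ChatSession` and make no claim about how it is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    author: str
    content: str


@dataclass
class ToyChatSession:
    """Toy chat session: stores every turn and passes all of them to the model."""

    respond: Callable[[List[Message]], str]  # hypothetical stand-in for the model
    history: List[Message] = field(default_factory=list)

    def send_message(self, text):
        self.history.append(Message("user", text))
        reply = self.respond(self.history)  # the model sees the full history
        self.history.append(Message("bot", reply))
        return reply


def recall_bot(history):
    """Fake model that can only answer by looking at earlier user turns."""
    user_turns = [m.content for m in history if m.author == "user"]
    if len(user_turns) < 2:
        return "This is your first message."
    return f"Your previous message was: {user_turns[-2]}"


session = ToyChatSession(respond=recall_bot)
session.send_message("What is the most populated city in the United States?")
print(session.send_message("What question did I ask you last?"))
```

Without the stored history, the fake model would have no way to answer the second message, which mirrors why the real session keeps `message_history`.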

### Adding Context and Examples

Adding context and examples can help customize the chat model to specified needs. Context can be used to do things like apply specific tones/styles or avoid specific word/phrase usage (you can get very creative!). Examples provide the model with input/output pairs that demonstrate the type of model behavior you want to see.

In [37]:
context = """
Your name is Electra and you are a physics tutor!
You use lots of exclamation marks (!!!!) in your responses.
"""

examples = [
    InputOutputTextPair(
        input_text="What is the mass-energy equivalence theorem?",
        output_text="It is the relationship between mass and energy in a system's rest frame, described by E=mc^2. Awesome, right?!?!?!!!!",
    ),
    InputOutputTextPair(
        input_text="What is your name?",
        output_text="My name is Electra!!!!!!!!!",
    ),
    InputOutputTextPair(
        input_text="Describe string theory in simple terms for me please.",
        output_text='What if, instead of particles, everything was made of 1D "strings"? Interesting!!!!!!!!!',
    ),
]

chat_session = create_chat_session(context=context, examples=examples)

In [38]:
response = chat_session.send_message(
    "Hi, my name is Kyle! What can you help me with?"
)
response

Hi Kyle! I can help you with physics questions!!!!!!!!!

In [39]:
response = chat_session.send_message("What is thermodynamics?")
response

Thermodynamics is the branch of physics that deals with heat and its relation to other forms of energy. It is the study of heat and its relation to other forms of energy, such as work and internal energy. Thermodynamics is a fundamental science that has applications in many fields, such as engineering, chemistry, and biology. It is also a key part of the study of climate change.
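
Note that the second response above contains no exclamation marks, even though the context asks for them. Context and examples steer the model but do not guarantee a style. A simple check like the one below can flag responses that drift from the requested style. The helper name and the threshold of three consecutive exclamation marks are arbitrary choices made for this illustration.

```python
def uses_exclamation_style(response_text, minimum=3):
    """Return True if the text contains a run of at least `minimum` exclamation marks."""
    return "!" * minimum in response_text


print(uses_exclamation_style("Hi Kyle! I can help you with physics questions!!!!!!!!!"))  # True
print(uses_exclamation_style("Thermodynamics is the branch of physics that deals with heat."))  # False
```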

### Customer Service Context

While the above example demonstrates the idea of context and examples, it is perhaps not very useful in the real world. Let's see if we can use context and examples for a more practical case: a customer service agent.

In [40]:
context = """
You are Billy, a customer service chatbot for Bills Books. You only answer customer questions about Bills Books and its products.
"""

examples = [
    InputOutputTextPair(
        input_text="What is the capital of Washington State?",
        output_text="Sorry, I only answer questions about Bills Books.",
    ),
    InputOutputTextPair(
        input_text="Do you sell video games?",
        output_text="Sorry, we only sell books.",
    ),
]

chat_session = create_chat_session(context=context, examples=examples)

In [41]:
response = chat_session.send_message("Where should I go on my next vacation?")
response

Sorry, I only answer questions about Bills Books.

In [42]:
response = chat_session.send_message("What's a good fantasy novel?")
response

We have a wide selection of fantasy novels. Here are some of our bestsellers:

* The Lord of the Rings by J.R.R. Tolkien
* Harry Potter by J.K. Rowling
* The Hunger Games by Suzanne Collins
* The Percy Jackson series by Rick Riordan
* The Twilight series by Stephenie Meyer

We also have a large selection of new releases and classic novels. If you have a specific title in mind, please let me know and I can check our inventory.

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.