# Vertex AI PaLM API Introduction
The Vertex AI PaLM API lets you test, customize, and deploy instances of Google's PaLM large language models (LLMs) so that you can use PaLM's capabilities in your applications. The PaLM family of models supports text completion, multi-turn chat, and text embedding generation.

This notebook will provide examples of accessing pre-trained PaLM models with the API for use cases like text classification, summarization, and extraction.

### Setup

In [1]:
from google.cloud import aiplatform  # requires >=1.25.0
from vertexai.language_models import TextGenerationModel

print(aiplatform.__version__)

1.26.0


### Input Parameters
Below we specify a helper function to generate responses from the PaLM API. The input parameters are as follows:
* `prompt`: Text input to generate model response. Prompts can include preamble, questions, suggestions, instructions, or examples.
* `temperature`: The temperature is used for sampling during the response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest probability response is always selected. For most use cases, try starting with a temperature of 0.2.
* `max_output_tokens`: Maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer responses. A token may be smaller than a word. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.
* `top_k`: Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature). For each token selection step, the top K tokens with the highest probabilities are sampled. Then tokens are further filtered based on topP with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.
* `top_p`: Top-p changes how the model selects tokens for output. Tokens are selected from most probable to least probable (among the candidates kept by top-k; see the `top_k` parameter) until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature) and doesn't consider C. The default top-p value is 0.95; the helper function below overrides it with 0.8. Specify a lower value for less random responses and a higher value for more random responses.
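To make the interaction between `top_k`, `top_p`, and `temperature` concrete, here is a toy sketch in plain Python. It is an illustration only, not the service's actual implementation: the function names `filter_candidates` and `sample_token` are hypothetical, and the distribution reuses tokens A, B, and C from the example above, padded with made-up tokens D through G so that the probabilities sum to 1.

```python
import random

def filter_candidates(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then cut off at cumulative top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Small tolerance guards against floating-point rounding.
        if cumulative >= top_p - 1e-9:
            break
    return kept

def sample_token(probs, temperature=0.2, top_k=40, top_p=0.8, rng=None):
    """Pick one token from the filtered candidates using temperature sampling."""
    kept = filter_candidates(probs, top_k, top_p)
    if temperature == 0:
        # Temperature 0 is deterministic: take the most probable candidate.
        return kept[0][0]
    rng = rng or random.Random()
    # Rescaling p ** (1 / T) sharpens the distribution for T < 1
    # and flattens it for T > 1.
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([token for token, _ in kept], weights=weights, k=1)[0]

# Toy distribution (tokens D-G are invented padding).
toy = {"A": 0.3, "B": 0.2, "C": 0.1, "D": 0.1, "E": 0.1, "F": 0.1, "G": 0.1}

print([t for t, _ in filter_candidates(toy, top_k=40, top_p=0.5)])  # ['A', 'B']
print([t for t, _ in filter_candidates(toy, top_k=1, top_p=0.95)])  # ['A']
print(sample_token(toy, temperature=0))                             # A
```

With `top_p=0.5`, only A and B survive the filter, matching the worked example in the `top_p` description; with `top_k=1`, the choice collapses to greedy decoding regardless of the other settings.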

In [28]:
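def generate(
    prompt: str,
    model_name: str = "text-bison@001",
    temperature: float = 0.2,
    max_output_tokens: int = 256,
    top_p: float = 0.8,
    top_k: int = 40,
):
    """Send a prompt to a pre-trained PaLM text model and return its response."""
    # Load the pre-trained model by name.
    model = TextGenerationModel.from_pretrained(model_name)
    # Request a completion using the sampling parameters described above.
    response = model.predict(
        prompt,
        temperature=temperature,
        max_output_tokens=max_output_tokens,
        top_p=top_p,
        top_k=top_k,
    )

    return response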

Test out the helper function.

In [29]:
generate(
    "What are five important things to understand about large language models?"
)

1. **Large language models are trained on massive datasets of text and code.** This gives them a vast knowledge base that they can draw on to generate text, translate languages, write different kinds of creative content, answer your questions, and more.
2. **Large language models are still under development, and they have some limitations.** For example, they can sometimes be biased or inaccurate, and they may not always understand the nuances of human language.
3. **Large language models are being used in a variety of applications,** such as customer service, content creation, and medical research. As they continue to develop, we can expect to see even more uses for these powerful tools.
4. **The development of large language models raises some important ethical and societal issues.** For example, it's important to consider how these models can be used to amplify misinformation and hate speech.
5. **The future of large language models is uncertain, but it's clear that they have the po

### Text Classification 
Now that we've tested the helper function, let's explore prompting for classification. Text classification is a common ML use case, frequently used for tasks such as spam detection, sentiment analysis, and topic classification.

Both zero-shot and few-shot prompting are common with text classification use cases. Zero-shot prompting is where you do not provide examples with labels in the input, and few-shot prompting is where you provide (a few) examples in the input. 

Additionally, for text classification use cases we can reduce `max_output_tokens` if we only want the predicted class, because labels are typically a few words or fewer.
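As a rough way to choose that limit, the sketch below applies the "about four characters per token" heuristic from the parameter descriptions to a set of labels. The helper name `estimate_tokens` is hypothetical, and the result is only an estimate: real tokenization varies, so some headroom is advisable.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenization varies

def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

labels = ["technology", "politics", "sports"]

# Budget for the longest label; add headroom in practice.
label_budget = max(estimate_tokens(label) for label in labels)
print(label_budget)  # 3
```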

Zero-shot classification 

In [15]:
prompt = """
Classify the following: 
text: "Zero-shot prompting is really easy with Google's PaLM API."
label: technology, politics, sports 
"""

generate(prompt, max_output_tokens=5)

The text is about technology
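The response above is a short sentence rather than a bare label. If downstream code needs exactly one of the allowed labels, a minimal sketch such as the following can map the response text back to a label. `extract_label` is a hypothetical helper, and simple substring matching is an assumption that may misfire when one label is contained in another.

```python
def extract_label(response_text, labels):
    """Return the first allowed label mentioned in the response, or None."""
    lowered = response_text.lower()
    for label in labels:
        if label.lower() in lowered:
            return label
    return None

labels = ["technology", "politics", "sports"]
print(extract_label("The text is about technology", labels))  # technology
```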

Few-shot classification

In [9]:
prompt = """
What is the topic for a given text? 
- cats 
- dogs 

Text: They always sit on my lap and purr!
The answer is: cats

Text: They love to play fetch
The answer is: dogs

Text: I throw the frisbee in the water and they swim for hours! 
The answer is: dogs 

Text: They always knock things off my counter!
The answer is:
"""

generate(prompt, max_output_tokens=5)

cats
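The few-shot prompt above follows a regular pattern: a question, the list of labels, labeled examples, and a final unlabeled text. When the examples live in data rather than in a hand-written string, a small builder can assemble the prompt. This is a sketch only; `build_few_shot_prompt` and its parameters are hypothetical names, and the layout simply mirrors the prompt used in this notebook.

```python
def build_few_shot_prompt(labels, examples, query):
    """Assemble a few-shot classification prompt from labeled examples."""
    lines = ["What is the topic for a given text?"]
    lines += [f"- {label}" for label in labels]
    lines.append("")
    for text, label in examples:
        lines += [f"Text: {text}", f"The answer is: {label}", ""]
    # Leave the final answer blank for the model to complete.
    lines += [f"Text: {query}", "The answer is:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    labels=["cats", "dogs"],
    examples=[
        ("They always sit on my lap and purr!", "cats"),
        ("They love to play fetch", "dogs"),
    ],
    query="They always knock things off my counter!",
)
print(prompt)
```

The resulting string can be passed to the `generate` helper in the same way as the hand-written prompt.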

### Text Summarization 
PaLM can also be used for text summarization use cases. Text summarization produces a concise and fluent summary of a longer text document. There are two main text summarization types: extractive and abstractive. Extractive summarization involves selecting critical sentences from the original text and combining them to form a summary. Abstractive summarization involves generating new sentences representing the original text's main points.
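To illustrate the extractive approach specifically, here is a deliberately naive sketch that scores each sentence by the average frequency of its longer words and keeps the highest-scoring sentences in their original order. It is a toy stand-in for illustration, not how PaLM produces summaries; the function name and the "words longer than three characters" filter are assumptions.

```python
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Select the highest-scoring sentences, preserving original order."""
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()
    ]

    def content_words(chunk):
        # Crude stop-word filter: ignore words of three characters or fewer.
        return [w for w in re.findall(r"[a-z']+", chunk.lower()) if len(w) > 3]

    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

sample = "Transformers use attention. Attention gives context. Cats sleep."
print(extractive_summary(sample, num_sentences=1))  # Transformers use attention.
```

Every sentence in the result is copied verbatim from the input, which is the defining property of extractive summarization; an abstractive summary may instead contain sentences that never appear in the source.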

In [17]:
prompt = """
Provide a very short summary for the following:

A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.

Summary:

"""
generate(prompt, max_output_tokens=1024)

A transformer is a deep learning model that processes sequential input data such as natural language. Unlike RNNs, transformers process the entire input at once, which allows for more parallelization and reduces training times.

Slightly change the input prompt to see the impact it can have on the output. 

In [18]:
prompt = """
Provide four bullet points summarizing the following:

A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.

Summary:

"""
generate(prompt, max_output_tokens=1024)

- A transformer is a deep learning model that is distinguished by its adoption of self-attention.
- Transformers are designed to process sequential input data, such as natural language.
- Unlike RNNs, transformers process the entire input all at once.
- The attention mechanism provides context for any position in the input sequence.

Dialog summarization falls under the category of text summarization. 

In [25]:
prompt = """
Generate a summary of the following conversation and at the end, summarize to-do's for the service rep: 

Kyle: Hi! I'm reaching out to customer service because I am having issues.

Service Rep: What seems to be the problem? 

Kyle: I am trying to use the PaLM API but I keep getting an error. 

Service Rep: Can you share the error with me? 

Kyle: Sure. The error says: "ResourceExhausted: 429 Quota exceeded for 
      aiplatform.googleapis.com/online_prediction_requests_per_base_model 
      with base model: text-bison"
      
Service Rep: It looks like you have exceeded the quota for usage. Please refer to 
             https://cloud.google.com/vertex-ai/docs/quotas for information about quotas
             and limits. 
             
Kyle: Can you increase my quota?

Service Rep: I cannot, but let me follow up with somebody who will be able to help.

Summary:
"""

generate(prompt, max_output_tokens=256)

Kyle is having issues using the PaLM API and keeps getting an error. The error says: "ResourceExhausted: 429 Quota exceeded for 
      aiplatform.googleapis.com/online_prediction_requests_per_base_model 
      with base model: text-bison"
The service rep informs Kyle that he has exceeded the quota for usage and provides a link to information about quotas and limits. Kyle asks if the service rep can increase his quota, but the service rep says that he cannot but will follow up with somebody who will be able to help.
To-do's for the service rep:
- Follow up with somebody who can increase Kyle's quota.
- Provide Kyle with more information about quotas and limits.
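The response above combines a prose summary with a bulleted list of to-do items. If the two parts need to be handled separately, one option is to split on the to-do heading. This sketch assumes the heading text matches the sample output exactly, which the model does not guarantee; `split_summary_and_todos` is a hypothetical helper.

```python
def split_summary_and_todos(response_text, marker="To-do's for the service rep:"):
    """Split a response into its prose summary and a list of to-do items."""
    summary, _, todo_block = response_text.partition(marker)
    todos = [
        line.strip().lstrip("- ").strip()
        for line in todo_block.splitlines()
        if line.strip().startswith("-")
    ]
    return summary.strip(), todos

example = (
    "Kyle has exceeded his quota.\n"
    "To-do's for the service rep:\n"
    "- Follow up with somebody who can increase Kyle's quota.\n"
    "- Provide Kyle with more information about quotas and limits."
)
summary, todos = split_summary_and_todos(example)
print(summary)
print(todos)
```

If the marker is missing, the whole response is returned as the summary and the to-do list is empty.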

### Text Extraction 
PaLM can be used to extract and structure text. Two purposes are common. The first is converting documents into a machine-readable format, which is useful for storing them in a database or processing them with software. The second is pulling specific information out of a document, which is useful for lookups or for summarizing the document's content.
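Because the goal is machine-readable output, a response usually needs to be parsed and validated before it is stored. The sketch below assumes two quirks that a model may exhibit: wrapping the JSON in a Markdown code fence, and returning comma-separated objects instead of a single array. `parse_model_json` is a hypothetical helper; it still raises `json.JSONDecodeError` when the response is truncated or otherwise malformed.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_model_json(raw: str) -> list:
    """Parse JSON from a model response into a list of objects."""
    # Drop any code-fence lines surrounding the payload.
    lines = [
        line for line in raw.strip().splitlines()
        if not line.strip().startswith(FENCE)
    ]
    text = "\n".join(lines).strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to treating the text as comma-separated objects.
        parsed = json.loads("[" + text.rstrip(",") + "]")
    return parsed if isinstance(parsed, list) else [parsed]

raw_response = (
    FENCE + "\n"
    '{"ingredient": "olive oil", "quantity": "1 tablespoon", "type": "oil"},\n'
    '{"ingredient": "onion", "quantity": "1", "type": "vegetable"}\n'
    + FENCE
)
for item in parse_model_json(raw_response):
    print(item["ingredient"], "->", item["quantity"])
```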

Zero-shot extraction

In [31]:
prompt = """
Extract the ingredients from the following recipe. 
Return the ingredients in JSON format with keys: ingredient, quantity, type.

Ingredients:
* 1 tablespoon olive oil
* 1 onion, chopped
* 2 carrots, chopped
* 2 celery stalks, chopped
* 1 teaspoon ground cumin
* 1/2 teaspoon ground coriander
* 1/4 teaspoon turmeric powder
* 1/4 teaspoon cayenne pepper (optional)
* Salt and pepper to taste
* 1 (15 ounce) can black beans, rinsed and drained
* 1 (15 ounce) can kidney beans, rinsed and drained
* 1 (14.5 ounce) can diced tomatoes, undrained
* 1 (10 ounce) can diced tomatoes with green chilies, undrained
* 4 cups vegetable broth
* 1 cup chopped fresh cilantro
"""
generate(prompt, max_output_tokens=1024)

```
{
  "ingredient": "olive oil",
  "quantity": "1 tablespoon",
  "type": "oil"
},
{
  "ingredient": "onion",
  "quantity": "1",
  "type": "vegetable"
},
{
  "ingredient": "carrot",
  "quantity": "2",
  "type": "vegetable"
},
{
  "ingredient": "celery",
  "quantity": "2",
  "type": "vegetable"
},
{
  "ingredient": "ground cumin",
  "quantity": "1 teaspoon",
  "type": "spice"
},
{
  "ingredient": "ground coriander",
  "quantity": "1/2 teaspoon",
  "type": "spice"
},
{
  "ingredient": "turmeric powder",
  "quantity": "1/4 teaspoon",
  "type": "spice"
},
{
  "ingredient": "cayenne pepper",
  "quantity": "1/4 teaspoon",
  "type": "spice"
},
{
  "ingredient": "salt",
  "quantity": "to taste",
  "type": "seasoning"
},
{
  "ingredient": "pepper",
  "quantity": "to taste",
  "type": "seasoning"
},
{
  "ingredient": "black beans",
  "quantity": "1 (15 ounce) can",
  "type": "bean"
},
{
  "ingredient": "kidney beans",
  "quantity": "1 (15 ounce) can",
  "type": "bean"
},
{
  "ingredient": "dice

Few-shot (single-example) extraction

In [30]:
prompt = """
Extract the technical specifications from the text below in JSON format.

Text: Google Nest WiFi, network speed up to 1200Mbps, 2.4GHz and 5GHz frequencies, WPA3 protocol
JSON: {
  "product":"Google Nest WiFi",
  "speed":"1200Mbps",
  "frequencies": ["2.4GHz", "5GHz"],
  "protocol":"WPA3"
}

Text: Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, 128GB of storage, Lemongrass
JSON:
"""

generate(prompt, max_output_tokens=1024)

{
  "product":"Google Pixel 7",
  "network":"5G",
  "RAM":"8GB",
  "processor":"Tensor G2",
  "storage":"128GB",
  "color":"Lemongrass"
}

You have now seen the PaLM API used for text classification, summarization, and extraction. The PaLM API also supports [fine-tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models), [chatbots](https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts), [code generation](https://cloud.google.com/vertex-ai/docs/generative-ai/code/code-chat-prompts), and [more](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview)!