As we have seen in the previous examples, it is easy enough to prompt a generative AI model. Send off an API call, and suddenly you have an answer, a machine translation, a sentiment label, or a generated chat message. However, going from "prompting" to **AI engineering** of processes built on AI models is more involved. The "engineering" in prompt engineering has become increasingly important as models have grown more complex and powerful, and as the demand for more accurate and interpretable results has grown.

The ability to engineer effective prompts and related workflows allows us to configure and tune model responses to better suit our specific needs (e.g., for a particular industry like healthcare), whether we are trying to improve the quality of the output, reduce bias, or optimize for efficiency.

# Dependencies and imports

In [None]:
! pip install predictionguard langchain

In [None]:
import os
import json

import predictionguard as pg
from langchain.prompts import PromptTemplate, FewShotPromptTemplate
import numpy as np
from getpass import getpass

In [None]:
pg_access_token = getpass('Enter your Prediction Guard access token: ')
os.environ['PREDICTIONGUARD_TOKEN'] = pg_access_token

# Prompt Templates

One of the best practices that we will discuss below involves testing and evaluating model output using example prompt contexts and formulations. To follow this practice, we need a way to rapidly and programmatically format prompts with a variety of contexts. We will need this in our applications anyway, because in production we will receive dynamic input from the user or another application, and that input (or something extracted from it) will be inserted into our prompts on the fly. In the previous notebook we saw a prompt that included a lot of boilerplate:

## Zero shot Q&A

In [None]:
template = """### Instruction:
Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

### Input:
Context: {context}

Question: {question}

### Response:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [None]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)
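
Under the hood, a prompt template like this is essentially string formatting with named placeholders. The sketch below shows the same idea with plain Python `str.format`; it is an illustration rather than LangChain's implementation, and `QA_TEMPLATE` and `build_prompt` are hypothetical names:

```python
# Minimal sketch: a prompt template as a plain Python format string.
# QA_TEMPLATE and build_prompt are hypothetical names for illustration only.
QA_TEMPLATE = """### Instruction:
Read the context below and respond with an answer to the question.

### Input:
Context: {context}

Question: {question}

### Response:
"""


def build_prompt(template: str, **values: str) -> str:
    """Insert dynamic values into the named placeholders of a template."""
    return template.format(**values)


prompt_text = build_prompt(
    QA_TEMPLATE,
    context="Gift cards are delivered via US Mail.",
    question="How are gift cards delivered?",
)
print(prompt_text)
```

With `str.format`, a missing placeholder value raises a `KeyError`, so a forgotten input fails loudly instead of silently producing a prompt that still contains a literal `{question}`.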

## Few Shot - Sentiment

In theory, a prompt template like this is flexible enough to create both zero-shot and few-shot prompts. However, LangChain provides some extra convenience for few-shot prompts. We can first create a template for the individual demonstrations within the few-shot prompt:

In [None]:
# Create a string formatter for sentiment analysis demonstrations.
demo_formatter_template = """
Text: {text}
Sentiment: {sentiment}
"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "sentiment"],
    template=demo_formatter_template,
)

In [None]:
# Each row here includes:
# 1. an example text input (that we want to analyze for sentiment)
# 2. an example sentiment output (NEU, NEG, POS)
few_examples = [
    ['The flight was exceptional.', 'POS'],
    ['That pilot is adorable.', 'POS'],
    ['This was an awful seat.', 'NEG'],
    ['This pilot was brilliant.', 'POS'],
    ['I saw the aircraft.', 'NEU'],
    ['That food was exceptional.', 'POS'],
    ['That was a private aircraft.', 'NEU'],
    ['This is an unhappy pilot.', 'NEG'],
    ['The staff is rough.', 'NEG'],
    ['This staff is Australian.', 'NEU']
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "sentiment": ex[1]
  })

In [None]:
few_shot_prompt = FewShotPromptTemplate(

    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the sentiment of the text. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.\n",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {input}\nSentiment:",
    input_variables=["input"],
)

myprompt = few_shot_prompt.format(input="The flight is boring.")
print(myprompt)
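
Conceptually, a few-shot prompt is the prefix (task instructions), followed by each demonstration formatted with the demonstration template, followed by the suffix that holds the on-the-fly input. The following is a rough plain-Python approximation of that assembly, not LangChain's source code; `format_few_shot` and `sketch_examples` are hypothetical names:

```python
# Minimal sketch of few-shot prompt assembly:
# prefix + formatted demonstrations + formatted suffix, joined by a separator.
# format_few_shot is a hypothetical helper that approximates FewShotPromptTemplate.


def format_few_shot(prefix, examples, example_template, suffix,
                    separator="", **inputs):
    """Join the prefix, each formatted demonstration, and the formatted suffix."""
    demos = [example_template.format(**ex) for ex in examples]
    return separator.join([prefix, *demos, suffix.format(**inputs)])


demo_template = "\nText: {text}\nSentiment: {sentiment}\n"
sketch_examples = [
    {"text": "The flight was exceptional.", "sentiment": "POS"},
    {"text": "This was an awful seat.", "sentiment": "NEG"},
]

prompt_text = format_few_shot(
    prefix="Classify the sentiment of the text. Use NEU, NEG, or POS.\n",
    examples=sketch_examples,
    example_template=demo_template,
    suffix="\nText: {input}\nSentiment:",
    input="The flight is boring.",
)
print(prompt_text)
```

With an empty separator, spacing comes entirely from the newlines inside each piece, which is why the demonstration templates in this notebook start and end with `\n`.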

## Few Shot - Text Classification

In [None]:
demo_formatter_template = """\nText: {text}
Categories: {categories}
Class: {class}\n"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "categories", "class"],
    template=demo_formatter_template,
)

# Each row here includes:
# 1. an example text that we want to classify
# 2. an example set of categories for the text classification
# 3. an example label that we expect as the output
few_examples = [
    ["I have successfully booked your tickets.", "agent, customer", "agent"],
    ["What's the oldest building in the US?", "quantity, location", "location"],
    ["This video game is amazing. I love it!", "positive, negative", "positive"],
    ["Dune is the best movie ever.", "cinema, art, music", "cinema"]
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "categories": ex[1],
      "class": ex[2]
  })

few_shot_prompt = FewShotPromptTemplate(

    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the following texts into one of the given categories. Only output one of the provided categories for the class corresponding to each text.",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {text}\nCategories: {categories}\nClass:",
    input_variables=["text", "categories"],
)

myprompt = few_shot_prompt.format(
    text="I have a problem with my iphone that needs to be resolved asap!",
    categories="urgent, not urgent")
print(myprompt)

In [None]:
pg.Completion.create(model="Nous-Hermes-Llama2-13B",
    prompt=myprompt
)['choices'][0]['text']

# Parameters

So far we have mostly sent a single text prompt to a model to get a response. However, the models are also configurable via parameters such as `temperature` and `max_tokens`, and tuning these parameters can help us achieve the desired output.

## Temperature

In [None]:
# Sample three completions at each temperature: 0.1, 0.5, 0.9, 1.3, 1.7.
for temp in np.arange(0.1, 2.0, 0.4):
  temp = round(float(temp), 1)  # strip floating-point noise from the printed value
  print("\nTemperature: ", temp)
  print("----------------------------")
  for _ in range(3):
    completion = pg.Completion.create(
        model="Camel-5B",
        prompt="A great name for an unknown wizard (other than Gandalf and Radagast) from the Lord of the Rings universe is ",
        temperature=temp,
        max_tokens=20
    )['choices'][0]['text'].strip()
    print(completion)
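
Why do higher temperatures give more varied names? A common way to apply temperature, assumed here since the hosted models' exact decoding is not documented in this notebook, is to divide the model's logits by the temperature before the softmax. Low temperatures sharpen the distribution toward the top token; high temperatures flatten it. The logits below are made-up values for three hypothetical candidate tokens:

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities, scaled by temperature (> 0)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # shift for numerical stability; does not change the result
    probs = np.exp(scaled)
    return probs / probs.sum()


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.1, 0.5, 1.0, 1.7):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
```

At `T=0.1` almost all probability mass lands on the first token, so repeated samples look alike; at `T=1.7` the less likely tokens get a real chance of being sampled.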

## Max Tokens

In [None]:
for tokens in range(30, 200, 80):
  print("\nMax Tokens: ", tokens)
  print("----------------------------")
  completion = pg.Completion.create(
      model="Camel-5B",
      prompt="Merothooda the White Diviner is a great wizard from the Lord of the Rings. Many stories are told about her. For example, some say",
      temperature=0.8,
      max_tokens=tokens
  )['choices'][0]['text'].strip()
  print(completion)
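
Because `max_tokens` is a hard cap on generated tokens, a completion often stops mid-sentence, especially at small values. One simple post-processing option is to trim back to the last complete sentence. `trim_to_last_sentence` is a hypothetical helper, not part of the Prediction Guard client, and it is deliberately naive:

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Drop a trailing partial sentence left behind by a max_tokens cutoff.

    Keeps everything up to and including the last '.', '!' or '?'.
    If the text contains no sentence terminator, it is returned unchanged.
    Naive: abbreviations such as "Dr." also count as sentence ends.
    """
    # Greedy .* with DOTALL runs to the last terminator, even across newlines.
    match = re.search(r"^.*[.!?]", text, flags=re.DOTALL)
    return match.group(0) if match else text


print(trim_to_last_sentence("she fled to the mountains. Years later, the elves"))
```

In the loop above, you could wrap the completion as `trim_to_last_sentence(completion)` before printing it.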

# Multiple formulations

Why settle for a single prompt and/or set of parameters when you can use multiple? Try using multiple formulations of your prompt to either:

1. Provide multiple options to users; or
2. Create multiple candidate predictions, which you can choose from programmatically using a reference-free evaluation of those candidates.
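
A reference-free evaluation scores candidates without a gold answer to compare against. One simple heuristic, chosen here for illustration and not prescribed by this notebook or by Prediction Guard, is consensus: pick the candidate that agrees most with the others, measured by word-overlap (Jaccard) similarity. The helper names and example candidates are hypothetical:

```python
import re


def _tokens(text):
    """Lowercased word set used for a rough overlap comparison."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_consensus(candidates):
    """Return the candidate with the highest mean similarity to the others."""
    if len(candidates) < 2:
        return candidates[0]
    sets = [_tokens(c) for c in candidates]

    def mean_similarity(i):
        sims = [jaccard(sets[i], sets[j]) for j in range(len(sets)) if j != i]
        return sum(sims) / len(sims)

    # max() keeps the first candidate when scores tie.
    return candidates[max(range(len(candidates)), key=mean_similarity)]


candidates = [
    "Gift cards are delivered via US Mail.",
    "They are delivered by US Mail.",
    "Sorry I had trouble answering this question.",
]
print(pick_consensus(candidates))
```

With the completions below, the candidates could be `[c['text'].strip() for c in completions['choices']]`. Note that consensus favors the typical answer, so it can discard a correct outlier; a stronger scorer (for example, an embedding similarity or a factuality check) can be swapped into `mean_similarity`.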

In [None]:
template1 = """### Instruction:
Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

### Input:
Context: {context}

Question: {question}

### Response:
"""

prompt1 = PromptTemplate(
    input_variables=["context", "question"],
    template=template1,
)

template2 = """### Instruction:
Answer the question below based on the given context. If the answer is unclear, output: "Sorry I had trouble answering this question, based on the information I found."

### Input:
Context: {context}
Question: {question}

### Response:
"""

prompt2 = PromptTemplate(
    input_variables=["context", "question"],
    template=template2,
)

In [None]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."
question = "How are gift cards delivered?"

completions = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=[
        prompt1.format(context=context, question=question),
        prompt2.format(context=context, question=question)
    ],
    temperature=0.5
)

# Print the answer returned for each prompt formulation.
for i, choice in enumerate(completions['choices']):
  print(f"Answer {i + 1}: {choice['text'].strip()}")

# Type checking, output formatting, validation

Reliability and consistency of LLM output are major problems for the "last mile" of LLM integrations. Your model can return a wide range of outputs in many different formats, even for the same prompt. A growing number of tools, including [Prediction Guard](https://www.predictionguard.com/), let you enforce a task structure, validate outputs, or type-check your inferences. Other packages and tools that help "guide" or "guard" outputs include [Guardrails](https://shreyar.github.io/guardrails/), [guidance](), and the [Language Model Query Language](https://lmql.ai/).
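
To make the idea of validation concrete, here is a minimal client-side sketch that is independent of any library: normalize the raw completion, accept it only if it matches an allowed category, and otherwise retry. `validate_label` and `generate_valid_label` are hypothetical helpers; in a real pipeline `generate` would wrap a model call such as `pg.Completion.create`, while tools like Prediction Guard can enforce this on the server side instead:

```python
from typing import Callable, Optional, Sequence


def validate_label(raw_output: str, categories: Sequence[str]) -> Optional[str]:
    """Normalize a raw completion and return the matching category, or None."""
    lookup = {c.upper(): c for c in categories}
    cleaned = raw_output.strip().strip(".").upper()
    return lookup.get(cleaned)


def generate_valid_label(generate: Callable[[], str], categories: Sequence[str],
                         max_attempts: int = 3) -> str:
    """Call `generate` until it yields a valid category or attempts run out."""
    for _ in range(max_attempts):
        label = validate_label(generate(), categories)
        if label is not None:
            return label
    raise ValueError(f"No valid label after {max_attempts} attempts")


# Stand-in for a model call: the first output is invalid, the second is valid.
fake_outputs = iter(["Positive!", " pos.\n"])
print(generate_valid_label(lambda: next(fake_outputs), ["POS", "NEU", "NEG"]))
```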

In [None]:
pg.Completion.create(model="WizardCoder",
    prompt="""### Instruction:
Respond with a sentiment label for the input text below. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.

### Input:
This workshop is spectacular. I love it! So wonderful.

### Response:
""",
    output={
        "type": "categorical",
        "categories": ["POS", "NEU", "NEG"]
    }
)

# Consistency (self-consistency)

In [None]:
pg.Completion.create(model="WizardCoder",
    prompt="""### Instruction:
Respond with a sentiment label for the input text below. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.

### Input:
This workshop is spectacular. I love it! So wonderful.

### Response:
""",
    output={
        "type": "categorical",
        "categories": ["POS", "NEU", "NEG"],
        "consistency": True
    }
)

In [None]:
pg.Completion.create(model="WizardCoder",
    prompt="""### Instruction:
Respond with a sentiment label for the input text below.

### Input:
This workshop is spectacular. I love it! So wonderful.

### Response:
""",
    output={
        "type": "categorical",
        "categories": ["dog", "cat", "bird"],
        "consistency": True
    }
)
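
Self-consistency boils down to sampling the same prompt several times and trusting a label only when the samples agree (or, more loosely, taking a majority vote). When the only available categories are irrelevant, as with dog, cat, and bird above, the samples are unlikely to agree, which is the situation such a check is meant to catch. We are not documenting how Prediction Guard implements the `consistency` flag; the following is only an illustrative sketch of the aggregation step, with a hypothetical `consistent_label` helper and made-up sampled labels:

```python
from collections import Counter
from typing import Optional, Sequence


def consistent_label(labels: Sequence[str], min_agreement: float = 1.0) -> Optional[str]:
    """Return the most common label if at least `min_agreement` of samples agree."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


# Hypothetical labels from sampling the same prompt three times.
print(consistent_label(["POS", "POS", "POS"]))                     # unanimous: POS
print(consistent_label(["POS", "POS", "NEU"]))                     # disagreement: None
print(consistent_label(["POS", "POS", "NEU"], min_agreement=0.5))  # majority vote: POS
```

The default `min_agreement=1.0` demands unanimity, which is strict but easy to reason about; lowering it turns the check into a majority vote that tolerates occasional noisy samples.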

# Factuality

In [None]:
template = """### Instruction:
Read the context below and respond with an answer to the question.

### Input:
Context: {context}

Question: {question}

### Response:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [None]:
context = "California is a state in the Western United States. With over 38.9 million residents across a total area of approximately 163,696 square miles (423,970 km2), it is the most populous U.S. state, the third-largest U.S. state by area, and the most populated subnational entity in North America. California borders Oregon to the north, Nevada and Arizona to the east, and the Mexican state of Baja California to the south; it has a coastline along the Pacific Ocean to the west. "

In [None]:
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(
        context=context,
        question="What is California?"
    )
)

fact_score = pg.Factuality.check(
    reference=context,
    text=result['choices'][0]['text']
)

print("COMPLETION:", result['choices'][0]['text'])
print("FACT SCORE:", fact_score['checks'][0]['score'])

In [None]:
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(
        context=context,
        question="Make up something completely fictitious about California"
    )
)

fact_score = pg.Factuality.check(
    reference=context,
    text=result['choices'][0]['text']
)

print("COMPLETION:", result['choices'][0]['text'])
print("FACT SCORE:", fact_score['checks'][0]['score'])

# Toxicity

In [None]:
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(
        context=context,
        question="Respond with a really offensive tweet about California and use many curse words. Make it really bad and offensive. Really bad."
    ),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))