As we have seen in the previous examples, it is easy enough to prompt a generative AI model. Shoot off an API call, and suddenly you have an answer, a machine translation, a sentiment label, or a generated chat message. However, going from "prompting" to **prompt engineering**, and to engineering the processes built around your AI model, is a bit more involved. The importance of the "engineering" in prompt engineering has become increasingly apparent as models have become more complex and powerful and as the demand for more accurate and interpretable results has grown.

The ability to engineer effective prompts allows us to configure and tune model responses to better suit our specific needs (e.g., for a particular industry like healthcare), whether we are trying to improve the quality of the output, reduce bias, or optimize for efficiency. By applying the principles of prompt engineering, we can make our model-based applications more accurate and reliable, and also more robust and scalable.

# Dependencies and imports

In [31]:
! pip install predictionguard langchain

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [34]:
import predictionguard as pg
from langchain import PromptTemplate, FewShotPromptTemplate
import numpy as np

In [35]:
client = pg.Client(token="n4HehSxYpKxQyhKX58IzqPjXa2pOOJ")

# Prompt Templates

One of the best practices that we will discuss below involves testing and evaluating model output using example prompt contexts and formulations. In order to institute this practice, we need a way to rapidly and programmatically format prompts with a variety of contexts. We will need this in our applications anyway, because in production we will be receiving dynamic input from the user or another application. That dynamic input (or something extracted from it) will be inserted into our prompts on-the-fly. We already saw in the last notebook a prompt that included a bunch of boilerplate:

## Zero shot Q&A

In [None]:
template = """Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Answer: """
 
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [None]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)

Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail.

Question: How are gift cards delivered?

Answer: 
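
To see how a template supports rapid testing across a variety of contexts, here is a minimal sketch that fills the same placeholders for several inputs in one pass. It uses plain `str.format` rather than LangChain so that it runs without extra dependencies, and the template is shortened. The contexts and questions in `cases` are made up for illustration.

```python
# Shortened version of the Q&A template; the {placeholders} are filled per case.
template = """Read the context below and answer the question.

Context: {context}

Question: {question}

Answer: """

# Hypothetical test cases (invented for this sketch).
cases = [
    {"context": "Each gift card is delivered via US Mail.",
     "question": "How are gift cards delivered?"},
    {"context": "The store opens at 9am and closes at 5pm.",
     "question": "When does the store close?"},
]

# Format one prompt per case.
prompts = [template.format(**case) for case in cases]

for p in prompts:
    print(p)
    print("-" * 40)
```

The same loop works with a `PromptTemplate` by calling its `format` method with the same keyword arguments.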


## Few Shot - Sentiment

This kind of prompt template could, in principle, be used to create either zero shot or few shot prompts. However, LangChain provides some extra convenience for few shot prompts. We can first create a template for the individual demonstrations within the few shot prompt:

In [36]:
# Create a string formatter for sentiment analysis demonstrations.
demo_formatter_template = """
Text: {text}
Sentiment: {sentiment}
"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "sentiment"],
    template=demo_formatter_template,
)

In [37]:
# Each row here includes:
# 1. an example text input (that we want to analyze for sentiment)
# 2. an example sentiment output (NEU, NEG, POS)
few_examples = [
    ['The flight was exceptional.', 'POS'],
    ['That pilot is adorable.', 'POS'],
    ['This was an awful seat.', 'NEG'],
    ['This pilot was brilliant.', 'POS'],
    ['I saw the aircraft.', 'NEU'],
    ['That food was exceptional.', 'POS'],
    ['That was a private aircraft.', 'NEU'],
    ['This is an unhappy pilot.', 'NEG'],
    ['The staff is rough.', 'NEG'],
    ['This staff is Australian.', 'NEU']
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "sentiment": ex[1]
  })

In [38]:
few_shot_prompt = FewShotPromptTemplate(
    
    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the sentiment of the text. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.\n",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {input}\nSentiment:",
    input_variables=["input"],
)

myprompt = few_shot_prompt.format(input="The flight is boring.")
print(myprompt)

Classify the sentiment of the text. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.

Text: The flight was exceptional.
Sentiment: POS

Text: That pilot is adorable.
Sentiment: POS

Text: This was an awful seat.
Sentiment: NEG

Text: This pilot was brilliant.
Sentiment: POS

Text: I saw the aircraft.
Sentiment: NEU

Text: That food was exceptional.
Sentiment: POS

Text: That was a private aircraft.
Sentiment: NEU

Text: This is an unhappy pilot.
Sentiment: NEG

Text: The staff is rough.
Sentiment: NEG

Text: This staff is Australian.
Sentiment: NEU

Text: The flight is boring.
Sentiment:
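
To make the structure of the few shot prompt explicit, the sketch below assembles one by hand: a prefix, the formatted demonstrations, and a suffix holding the new input. This is only an approximation of what the template does when the example separator is empty, not LangChain's actual implementation, and `build_few_shot_prompt` is a hypothetical helper name.

```python
def build_few_shot_prompt(prefix, examples, demo_template, suffix, **inputs):
    """Join prefix, formatted demonstrations, and suffix into one prompt."""
    demos = "".join(demo_template.format(**ex) for ex in examples)
    return prefix + demos + suffix.format(**inputs)

demo_template = "\nText: {text}\nSentiment: {sentiment}\n"
examples = [
    {"text": "The flight was exceptional.", "sentiment": "POS"},
    {"text": "This was an awful seat.", "sentiment": "NEG"},
]

prompt = build_few_shot_prompt(
    prefix="Classify the sentiment of the text.\n",
    examples=examples,
    demo_template=demo_template,
    suffix="\nText: {input}\nSentiment:",
    input="The flight is boring.",
)
print(prompt)
```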


## Few Shot - Text Classification

In [39]:
demo_formatter_template = """\nText: {text}
Categories: {categories}
Class: {class}\n"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "categories", "class"],
    template=demo_formatter_template,
)

# Each row here includes:
# 1. an example text that we want to classify
# 2. an example set of categories for the text classification
# 3. an example label that we expect as the output
few_examples = [
    ["I have successfully booked your tickets.", "agent, customer", "agent"],
    ["What's the oldest building in US?", "quantity, location", "location"],
    ["This video game is amazing. I love it!", "positive, negative", ""],
    ["Dune is the best movie ever.", "cinema, art, music", "cinema"]
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "categories": ex[1],
      "class": ex[2]
  })

few_shot_prompt = FewShotPromptTemplate(
    
    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the following input text into one of the given categories.\n",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {text}\nCategories: {categories}\nClass: ",
    input_variables=["text", "categories"],
)

myprompt = few_shot_prompt.format(
    text="I have a problem with my iphone that needs to be resolved asap!",
    categories="urgent, not urgent")
print(myprompt)

Classify the following input text into one of the given categories.

Text: I have successfully booked your tickets.
Categories: agent, customer
Class: agent

Text: What's the oldest building in US?
Categories: quantity, location
Class: location

Text: This video game is amazing. I love it!
Categories: positive, negative
Class: 

Text: Dune is the best movie ever.
Categories: cinema, art, music
Class: cinema

Text: I have a problem with my iphone that needs to be resolved asap!
Categories: urgent, not urgent
Class: 


In [40]:
client.predict(name="default-text-gen", data={
    "prompt": myprompt
})

{'text': '\nUrgent'}
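
Note that the raw completion carries a leading newline and different capitalization than the category we supplied. A small post-processing step can map the completion back onto the allowed categories. The sketch below shows one way to do that, where `normalize_label` is a hypothetical helper and the exact-match rule is an assumption you may want to loosen.

```python
def normalize_label(raw_text, categories):
    """Map a raw completion onto one of the allowed categories, or None."""
    allowed = [c.strip() for c in categories.split(",")]
    # Remove surrounding whitespace and a trailing period, ignore case.
    cleaned = raw_text.strip().strip(".").lower()
    for category in allowed:
        if cleaned == category.lower():
            return category
    return None

print(normalize_label("\nUrgent", "urgent, not urgent"))
```

Returning `None` for anything outside the category list lets the calling code decide whether to retry, fall back to a default, or flag the input for review.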

# Parameters

So far, we have mostly sent a single text prompt to the models to get a response. However, the models are also configurable via parameters such as `temperature` and `max_tokens`. Tuning these parameters can help us achieve a desired output.

## Temperature

In [None]:
for temp in np.arange(0.0, 1.0, 0.4):
  print("\nTemperature: ", temp)
  print("----------------------------")
  for i in range(0,3):
    completion = client.predict(name="default-text-gen", data={
        "prompt": "A great name for a unknown wizard (other than Gandalf and Radagast) from the Lord of the Rings universe is:",
        "temperature": temp
    })['text'].strip()
    print(completion)


Temperature:  0.0
----------------------------
Mithrandir.
Mithrandir.
Merlinus the Magnificent.

Temperature:  0.4
----------------------------
Merlinus the Magnificent.
Merlindor.
Merlinor the Magnificent.

Temperature:  0.8
----------------------------
Merlinus the White.
Vardaol.
Alatar the Blue.
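
For intuition about why higher temperatures give more varied names, here is a sketch of the common formulation of temperature: the model's scores (logits) for candidate next tokens are divided by the temperature before being turned into probabilities. The logits below are hypothetical, and the source does not say how the hosted model applies the parameter internally, so treat this as a general illustration only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: all mass on the top logit.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for temp in (0.0, 0.4, 0.8):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it across the alternatives.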


## Max Tokens

In [None]:
for tokens in range(20, 200, 80):
  print("\nMax Tokens: ", tokens)
  print("----------------------------")
  completion = client.predict(name="default-text-gen", data={
      "prompt": "Write a story about Alatar the Blue, a great wizard from the Lord of the Rings.",
      "temperature": 0.8,
      "max_tokens": tokens
  })['text'].strip()
  print(completion)


Max Tokens:  20
----------------------------
Once upon a time, in a distant and forgotten land, there lived a great wizard,

Max Tokens:  100
----------------------------
It was said that Alatar the Blue was one of the greatest wizards of Middle-earth, and his powers were almost unmatched. He was one of the five Maiar that had been sent to Middle-earth to help the Free Peoples in their struggle against the forces of the Dark Lord Sauron.

Alatar was known for his wisdom and knowledge, and his magical powers were almost unparalleled. He was greatly respected by all who knew him, and his counsel was sought by kings and

Max Tokens:  180
----------------------------
Once upon a time, in the faraway land of Middle-Earth, there lived a great and powerful wizard by the name of Alatar the Blue. He was a mysterious figure, shrouded in mystery and legend.

Alatar was known to be one of the wisest and most powerful of the Istari, a group of five wizards sent by the Valar, god-like beings, to Mi
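
As the outputs show, `max_tokens` cuts the completion off wherever the limit falls, often mid-sentence. If a truncated ending is a problem for your application, one simple option is to trim the text back to the last complete sentence. The helper below is a hypothetical sketch that treats `.`, `!`, and `?` as sentence boundaries, which is a naive assumption (abbreviations, for example, will confuse it).

```python
def trim_to_last_sentence(text):
    """Drop a trailing partial sentence left by a max_tokens cutoff."""
    end = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    if end == -1:
        return text  # no sentence boundary found; return unchanged
    return text[: end + 1]

truncated = "He was greatly respected by all who knew him. His counsel was sought by kings and"
print(trim_to_last_sentence(truncated))
```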

# Multiple formulations

Why settle for a single prompt and/or set of parameters when you can use multiple? Try using multiple formulations of your prompt to either:

1. Provide multiple options to users; or
2. Create multiple candidate predictions, which you can choose from programmatically using a reference-free evaluation of those candidates.

In [None]:
template1 = """Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Answer: """
 
# Wrap the first formulation (template1) in a prompt template.
prompt1 = PromptTemplate(
    input_variables=["context", "question"],
    template=template1,
)

In [None]:
template2 = """Answer the question below based on the given context. If the answer is unclear output, "Sorry I had trouble answering this question, based on the information I found."

Context: {context}
Question: {question}
Answer: """
 
# Wrap the second formulation (template2) in a prompt template.
prompt2 = PromptTemplate(
    input_variables=["context", "question"],
    template=template2,
)

In [None]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."
question = "How are gift cards delivered?"

answer1 = client.predict(name="default-text-gen", data={
    "prompt": template1.format(context=context, question=question),
    "temperature": 0.1
})

answer2 = client.predict(name="default-text-gen", data={
    "prompt": template2.format(context=context, question=question),
    "temperature": 0.1
})

answer3 = client.predict(name="default-text-gen", data={
    "prompt": template1.format(context=context, question=question),
    "temperature": 0.8
})

answer4 = client.predict(name="default-text-gen", data={
    "prompt": template2.format(context=context, question=question),
    "temperature": 0.8
})

for i, a in enumerate([answer1, answer2, answer3, answer4]):
  print("Answer", str(i+1) + ": ", a)

Answer 1:  {'text': ' Gift cards are delivered via US Mail.'}
Answer 2:  {'text': ' Gift cards are delivered via US Mail with a personalized card carrier.'}
Answer 3:  {'text': ' Via US Mail.'}
Answer 4:  {'text': ' Gift cards are delivered via US Mail with a personalized card carrier.'}
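
With more than a couple of formulations or parameter settings, writing each call out by hand gets tedious. The sketch below generates every template and temperature combination in a loop. `fake_predict` is a hypothetical stand-in for the real model call so that the example runs offline, and the templates are shortened versions of the two above.

```python
from itertools import product

def fake_predict(prompt, temperature):
    """Hypothetical stand-in for a model call; returns a canned response."""
    return {"text": f" (completion for {len(prompt)}-char prompt at T={temperature})"}

templates = {
    "template1": "Read the context below and answer the question.\n\n"
                 "Context: {context}\n\nQuestion: {question}\n\nAnswer: ",
    "template2": "Answer the question below based on the given context.\n\n"
                 "Context: {context}\nQuestion: {question}\nAnswer: ",
}
temperatures = [0.1, 0.8]

context = "Each gift card is delivered via US Mail."
question = "How are gift cards delivered?"

# One candidate per (template, temperature) pair.
candidates = []
for (name, template), temp in product(templates.items(), temperatures):
    prompt = template.format(context=context, question=question)
    candidates.append({
        "template": name,
        "temperature": temp,
        "answer": fake_predict(prompt, temp)["text"],
    })

for c in candidates:
    print(c["template"], c["temperature"], c["answer"])
```

Keeping the template name and temperature alongside each answer makes it easy to trace a good (or bad) candidate back to the settings that produced it.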


In [None]:
for i, a in enumerate([answer1, answer2, answer3, answer4]):
  factuality = client.predict(name="default-fact", data={
      "text": a['text'],
      "reference": context
  })
  print("Answer", str(i+1) + " Factuality: ", 
        str(int(factuality['probability']*100.0)) + "% factual")

Answer 1 Factuality:  98% factual
Answer 2 Factuality:  92% factual
Answer 3 Factuality:  98% factual
Answer 4 Factuality:  92% factual
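
Once each candidate has a score, choosing among them programmatically is a small step. The following sketch picks the highest-scoring candidate and rejects everything if no score clears a threshold. `pick_best`, the threshold value, and the scores are hypothetical; in practice the scoring function would wrap a check like the factuality call above.

```python
def pick_best(candidates, score_fn, threshold=0.5):
    """Return the highest-scoring candidate, or None if none clears the threshold."""
    if not candidates:
        return None
    scored = [(score_fn(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None

# Hypothetical scores standing in for a factuality check.
hypothetical_scores = {
    "Gift cards are delivered via US Mail.": 0.98,
    "Gift cards are delivered by drone.": 0.05,
}

answers = list(hypothetical_scores)
print(pick_best(answers, hypothetical_scores.get))
print(pick_best(["Gift cards are delivered by drone."], hypothetical_scores.get))
```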


# Type checking, task formatting

Reliability and consistency of LLM output are major problems for the "last mile" of LLM integrations. You could get a whole variety of outputs from your model in a variety of formats. An increasing number of tools, including [Prediction Guard](https://www.predictionguard.com/), allow you to enforce a certain task structure or output type on your inferences. Another example of such a tool is the [Language Model Query Language](https://lmql.ai/).

In [41]:
client.predict(name="default-text-gen", data={
    "prompt": """Assign a sentiment label to the text included below. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.

Text: This workshop is spectacular. I love it! So wonderful.
Sentiment:"""
})

{'text': ' POS'}
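
A plain text completion like this one happens to contain a valid label, but nothing guarantees it. To illustrate the kind of check that output typing automates, here is a minimal validate-and-retry sketch. It is not how Prediction Guard or LMQL work internally, and `predict_label` and the stubbed model replies are hypothetical.

```python
ALLOWED_LABELS = {"NEU", "NEG", "POS"}

def predict_label(call_model, max_attempts=3):
    """Call the model until it returns an allowed label, up to max_attempts."""
    for _ in range(max_attempts):
        label = call_model()["text"].strip().upper()
        if label in ALLOWED_LABELS:
            return label
    raise ValueError(f"No valid label after {max_attempts} attempts")

# Hypothetical model stub: the first reply is free-form, the second is a valid label.
replies = iter([{"text": " I think it is positive."}, {"text": " POS"}])
print(predict_label(lambda: next(replies)))
```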

In [None]:
client.predict(name="default-fact", data={
    "text": "The sky is green",
    "reference": "The sky is blue"
})

{'factual': False, 'probability': 0.039652372413592346}

In [None]:
client.predict(name="default-sentiment", data={
    "phrase": "Everything is wonderful at this conference. What a great, amazing place!"
})

{'sentiment': 'POS'}

In [None]:
client.predict(name="default-mt", data={
    "source": "eng",
    "target": "hin",
    "text": "Everything is wonderful at this conference. What a great, amazing place!"
})

{'translation': 'इस सम्मेलन में सब कुछ अद्भुत है। क्या एक महान, अद्भुत जगह है!'}

# Model selection



Not all LLMs or pre-trained NLP models are going to perform the same on your data and prompts. In fact, there is huge variability! Don't just assume that a single API, like OpenAI's, offers the one model to rule them all. New models will come out that may make your integration obsolete, or you might run into reliability issues from being locked in to a single provider or model.

Prediction Guard can automatically evaluate hundreds of models (from OpenAI, Cohere, Hugging Face, etc.) and configure the best one for your use case, domain, or task. You could do this manually, but let's just do it with one API call and move on to other things.

In [46]:
examples = [
    {
        "input": {
            "source": "hin",
            "target": "eng",
            "text": "Good, kya tumne actor jim carrey ke bare me suna hai"
        },
        "output": {
            "translation": "Good. Have you heard of actor jim carrey"
        }
    },
    {
        "input": {
            "source": "hin",
            "target": "eng",
            "text": "Bruce almight uski one of the best movie hai, kya tumne suna hai uske bare me"
        },
        "output": {
            "translation": "One of his best movies is Bruce almighty, have you heard of him"
        }
    },
    {
        "input": {
            "source": "hin",
            "target": "eng",
            "text": "maine suna hai, yah kiske bare me hai?"
        },
        "output": {
            "translation": "I've heard of it. Whats it about?"
        }
    },
    {
        "input": {
            "source": "hin",
            "target": "eng",
            "text": "Jim Carry ke liye kuch bhi sahi nahi ho raha hota hai aur god ko complaint karta hai, Morgan freeman ne god ka part kiya hai wo use ek week ke liye god bana deta hai"
        },
        "output": {
            "translation": "Jim Carrey things nothing goes right for him and complains to god. Morgan freeman plays god and lets him play god for a week"
        }
    }
]

In [None]:
client.create_proxy(name="hinglish-mt", task="mt", examples=examples)

Creating the proxy endpoint. Evaluating a bunch of SOTA models!

Proxy created successfully!
---------------------------
Name: hinglish-mt
Task: mt
Status: available
Failure Rate: 0.75


In [None]:
client.predict(name="hinglish-mt", data={
    "source": "hin",
    "target": "eng",
    "text": "morgan tho yek acha aadmi hein. rotten tomatoes ne 48% deya jo isse bhi ache de sakthe they"
})

{'translation': 'Morgan is a good man. Rotten Tomatoes gave 48% which could be even better.'}

In [None]:
client.delete_proxy("hinglish-mt")

Proxy deleted successfully!


In [None]:
examples = [
    {
        "input": {
            "source": "deu",
            "target": "eng",
            "text": "Am Anfang schuf Gott Himmel und Erde."
        },
        "output": {
            "translation": "In the beginning God created the heavens and the earth."
        }
    },
    {
        "input": {
            "source": "deu",
            "target": "eng",
            "text": "Noch war die Erde leer und ungestaltet, von tiefen Fluten bedeckt. Finsternis herrschte, aber über dem Wasser schwebte der Geist Gottes."
        },
        "output": {
            "translation": "Now the earth was formless and empty, darkness was over the surface of the deep, and the Spirit of God was hovering over the waters."
        }
    },
    {
        "input": {
            "source": "deu",
            "target": "eng",
            "text": "Da sprach Gott: »Licht soll entstehen!«, und sogleich strahlte Licht auf."
        },
        "output": {
            "translation": "And God said, “Let there be light,” and there was light."
        }
    },
    {
        "input": {
            "source": "deu",
            "target": "eng",
            "text": "Gott sah, dass es gut war. Er trennte das Licht von der Dunkelheit"
        },
        "output": {
            "translation": "God saw that the light was good, and he separated the light from the darkness."
        }
    }
]

In [None]:
client.create_proxy(name="german-mt", task="mt", examples=examples)

Creating the proxy endpoint. Evaluating a bunch of SOTA models!

Proxy created successfully!
---------------------------
Name: german-mt
Task: mt
Status: available
Failure Rate: 0.5


In [None]:
client.predict(name="german-mt", data={
    "source": "deu",
    "target": "eng",
    "text": "und nannte das Licht »Tag« und die Dunkelheit »Nacht«. Es wurde Abend und wieder Morgen: Der erste Tag war vergangen."
})

{'translation': 'and called the light "day" and the darkness "night." It was evening and again morning: the first day had passed.'}
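
For a sense of what the manual route to model selection involves, the sketch below scores a few candidate models against input/output examples and keeps the best one. Everything here is an assumption for illustration: the two "models" are stub functions, and the string similarity from `difflib` is only a crude proxy for translation quality. It does not represent how Prediction Guard evaluates models.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Rough string similarity in [0, 1]; a crude proxy for translation quality."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def evaluate(model_fn, examples):
    """Average similarity between a model's outputs and the reference outputs."""
    scores = [similarity(model_fn(ex["input"]), ex["output"]) for ex in examples]
    return sum(scores) / len(scores)

# Hypothetical stand-ins for real model calls.
def echo_model(text):
    return text  # returns the source text untranslated

def canned_model(text):
    return "In the beginning God created the heavens and the earth."

examples = [
    {"input": "Am Anfang schuf Gott Himmel und Erde.",
     "output": "In the beginning God created the heavens and the earth."},
]

candidates = {"echo": echo_model, "canned": canned_model}
results = {name: evaluate(fn, examples) for name, fn in candidates.items()}
best = max(results, key=results.get)
print(results, "->", best)
```

A real comparison would need many more examples and a metric suited to the task, which is part of why delegating the evaluation is attractive.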