[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/prompt-engineering.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/prompt-engineering.ipynb)

# Prompt Engineering

In this notebook we'll explore the fundamentals of prompt engineering. We'll start by installing the `openai` library, which we'll be using throughout these examples. However, note that we can use other LLMs here, like those offered by Cohere or open-source alternatives available via Hugging Face.

In [1]:
!pip install -qU openai==0.27.7



## Structure of a Prompt

A prompt can consist of multiple components:

* Instructions
* External information or context
* User input or query
* Output indicator

Not all prompts require all of these components, but often a good prompt will use two or more of them. Let's define what they all are more precisely.

**Instructions** tell the model what to do, typically how it should use inputs and/or external information to produce the output we want.

**External information or context** is additional information that we either manually insert into the prompt, retrieve via a vector database (long-term memory), or pull in through other means (API calls, calculations, etc.).

**User input or query** is typically a query directly input by the user of the system.

**Output indicator** is the *beginning* of the generated text. For a model generating Python code we may put `import ` (as most Python scripts begin with a library `import`), or a chatbot may begin with `Chatbot: ` (assuming we format the chatbot script as lines of interchanging text between `User` and `Chatbot`).

Each of these components should usually be placed in the order we've described them. We start with instructions, provide context (if needed), then add the user input, and finally end with the output indicator.

In [1]:
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: Which libraries and model providers offer LLMs?

Answer: """

In this example we have:

```
Instructions

Context

Question (user input)

Output indicator ("Answer: ")
```
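As a minimal sketch of this layout, the four components can be assembled with a small helper. The function name `build_prompt` and its parameters are our own illustration (they are not part of the `openai` library); it simply joins whichever components are provided, in the order described above.

```python
def build_prompt(instructions, query, context=None, output_indicator="Answer: "):
    """Join prompt components in the order: instructions, context, query, output indicator."""
    parts = [instructions.strip()]
    if context:  # context is optional, not every prompt needs it
        parts.append(f"Context: {context.strip()}")
    parts.append(f"Question: {query.strip()}")
    # the output indicator goes last so the model continues directly after it
    return "\n\n".join(parts) + "\n\n" + output_indicator


example = build_prompt(
    instructions=(
        "Answer the question based on the context below. If the "
        "question cannot be answered using the information provided "
        'answer with "I don\'t know".'
    ),
    context="Libraries are places full of books.",
    query="Which libraries and model providers offer LLMs?",
)
print(example)
```

Keeping the components as separate arguments makes it easy to swap the context or the question without retyping the instructions.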

Let's try sending this to a GPT-3 model. For this, you will need [an OpenAI API key](https://beta.openai.com/account/api-keys).

Before we can query a `text-davinci-003` model we authenticate with the API like so:

In [2]:
import os
import openai

# get API key from top-right dropdown on OpenAI website
openai.api_key = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

openai.Engine.list()  # check we have authenticated

APIRemovedInV1: 

You tried to access openai.Engine, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


And generate a completion from our prompt.

In [3]:
# now query the model (this cell uses gpt-3.5-turbo-instruct)
res = openai.Completion.create(
    engine='gpt-3.5-turbo-instruct',
    prompt=prompt,
    max_tokens=256
)

print(res['choices'][0]['text'].strip())

APIRemovedInV1: 

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
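
The error above is raised when the legacy `openai.Completion` interface is called with `openai>=1.0.0` installed. Besides pinning the old version, the call can be ported to the 1.0 client. The following is a sketch under that assumption: the wrapper name `complete` is ours, and it expects a client created with `from openai import OpenAI; client = OpenAI()`, which reads `OPENAI_API_KEY` from the environment.

```python
def complete(client, prompt, model="gpt-3.5-turbo-instruct",
             max_tokens=256, temperature=1.0):
    """Hypothetical wrapper around the openai>=1.0 completions endpoint."""
    res = client.completions.create(
        model=model,  # the legacy `engine=` argument becomes `model=`
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    # responses are objects in 1.0, so we use attributes instead of res['choices']
    return res.choices[0].text.strip()


# Usage (needs `pip install "openai>=1.0.0"` and a valid API key):
# from openai import OpenAI
# client = OpenAI()
# print(complete(client, prompt))
```

Passing the client in as an argument keeps the wrapper independent of how authentication is configured.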


Alternatively, if we do *not* have the correct information within the `context`, the model should reply with `"I don't know"`. Let's try.

In [4]:
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Libraries are places full of books.

Question: Which libraries and model providers offer LLMs?

Answer: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256
)

print(res['choices'][0]['text'].strip())


Perfect, our instructions are being understood by the model. In most real use-cases we won't be providing the external information / context to the model manually. Instead, it will be an automatic process using something like [long-term memory](https://www.pinecone.io/learn/openai-gen-qa/) to retrieve relevant information from an external source.

For now, that's beyond the scope of what we're exploring here, but you can find more on that in the link above.

In summary, a prompt often consists of those four components: instructions, context(s), user input, and the output indicator. Now we'll take a look at creative vs. stricter generation.

## Generation Temperature

The `temperature` parameter used in generation models controls how "random" the model can be. The higher the temperature, the more likely the model is to choose a token that is *not* its first choice.

This works because the model assigns a probability to every token within its vocabulary at each _"step"_ of generation (each new word or sub-word).

With each new step forwards the model considers the previous tokens fed into it, encodes the information from these tokens over many model layers, then predicts the probability of each token that the model knows (i.e. is within the model *vocabulary*) based on that encoded information.

At a temperature of `0.0` the model will always select the top predicted token. At a temperature of `1.0` the model samples each token according to its assigned probability, so less likely tokens are sometimes chosen.
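
To make this concrete, here is a minimal sketch of temperature scaling, assuming the common formulation in which each token score (logit) is divided by the temperature before a softmax. The logits and the helper name `softmax_with_temperature` are made up for illustration, and this is not a description of OpenAI's internal implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # temperature 0 is treated as greedy decoding: all mass on the top token
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while a temperature of `1.0` leaves the softmax probabilities unchanged.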

Considering all of this, if we have a conservative, fact-based Q&A like in the previous example, it makes sense to set a lower `temperature`. However, if we want to produce some creative writing or chatbot conversations, we might want to experiment and increase `temperature`. Let's try it.

In [6]:
prompt = """The below is a conversation with a funny chatbot. The
chatbot's responses are amusing and entertaining.

Chatbot: Hi there! I'm a chatbot.
User: Hi, what are you doing today?
Chatbot: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256,
    temperature=0.0  # set the temperature, default is 1
)

print(res['choices'][0]['text'].strip())

Oh, just hanging out and having a good time. What about you?


In [7]:
prompt = """The below is a conversation with a funny chatbot. The
chatbot's responses are amusing and entertaining.

Chatbot: Hi there! I'm a chatbot.
User: Hi, what are you doing today?
Chatbot: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=512,
    temperature=1.0
)

print(res['choices'][0]['text'].strip())

I'm making people smile! What about you?


The second response is far more creative and demonstrates the type of difference we can expect between low `temperature` and high `temperature` generations.
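
The same effect can be simulated without calling the API. The sketch below repeatedly samples a next token from a made-up distribution, again assuming that scores are divided by the temperature before sampling. The tokens, scores, and the helper `sample_token` are hypothetical.

```python
import math
import random
from collections import Counter


def sample_token(scores, temperature, rng):
    """Sample one token from a {token: score} mapping at the given temperature."""
    tokens = list(scores)
    if temperature == 0:
        return max(tokens, key=scores.get)  # greedy: always the top-scoring token
    weights = [math.exp(scores[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# hypothetical next-token scores, not taken from any real model
scores = {"hanging": 2.0, "making": 1.0, "juggling": 0.1}
rng = random.Random(0)  # seeded so the simulation is repeatable
for t in (0.0, 1.0):
    counts = Counter(sample_token(scores, t, rng) for _ in range(1000))
    print(t, dict(counts))
```

At temperature `0.0` every sample is the same token, whereas at `1.0` the lower-scoring tokens also appear, which is why repeated runs at high temperature produce different text.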

## Few-shot Learning

Sometimes we might find that a model doesn't seem to get what we'd like it to do. We can see this in the following example:

In [5]:
prompt = """The following is a conversation with an AI assistant.
The assistant is typically sarcastic and witty, producing creative
and funny responses to the user's questions.

User: What is the meaning of life?
AI: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256,
    temperature=1.0
)

print(res['choices'][0]['text'].strip())


In this case we're asking for something amusing, a joke in return for our serious question. But we get a serious response even with the `temperature` set to `1.0`. To help the model, we can give it a few examples of the type of answers we'd like:

In [9]:
prompt = """The following are excerpts from conversations with an AI assistant.
The assistant is typically sarcastic and witty, producing creative
and funny responses to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256,
    temperature=1.0
)

print(res['choices'][0]['text'].strip())

All I can say is 42...just kidding! The meaning of life is the journey to find your own.


This is a much better response. We got it by providing a *few* examples of the inputs and outputs that we'd expect. We refer to this as _"few-shot learning"_.
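
When the examples live in a list, the few-shot prompt can be generated rather than typed by hand. This is a sketch only: `build_few_shot_prompt` and its `user_label` / `ai_label` parameters are names we introduce here, mirroring the `User:` / `AI:` format used above.

```python
def build_few_shot_prompt(instructions, examples, query,
                          user_label="User", ai_label="AI"):
    """Format (question, answer) example pairs ahead of the real query."""
    blocks = [instructions.strip()]
    for question, answer in examples:
        blocks.append(f"{user_label}: {question}\n{ai_label}: {answer}")
    # the final block leaves the AI turn open, acting as the output indicator
    blocks.append(f"{user_label}: {query}\n{ai_label}: ")
    return "\n\n".join(blocks)


examples = [
    ("How are you?", "I can't complain but sometimes I still do."),
    ("What time is it?", "It's time to get a watch."),
]
few_shot_prompt = build_few_shot_prompt(
    "The following are excerpts from conversations with an AI assistant. "
    "The assistant is typically sarcastic and witty.",
    examples,
    "What is the meaning of life?",
)
print(few_shot_prompt)
```

Adding or removing examples then only requires editing the `examples` list.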

## Adding Multiple Contexts

In some use-cases like question-answering we can use an external source of information to improve the reliability or *factuality* of model responses. We refer to this information as _"source knowledge"_, which is any knowledge fed into the model via the input prompt.

We'll create a list of "dummy" external information. In reality we'd likely use [long-term memory](https://www.pinecone.io/learn/openai-gen-qa/) or some form of information-retrieval API.

In [10]:
contexts = [
    (
        "Large Language Models (LLMs) are the latest models used in NLP. " +
        "Their superior performance over smaller models has made them incredibly " +
        "useful for developers building NLP enabled applications. These models " +
        "can be accessed via Hugging Face's `transformers` library, via OpenAI " +
        "using the `openai` library, and via Cohere using the `cohere` library."
    ),
    (
        "To use OpenAI's GPT-3 model for completion (generation) tasks, you " +
        "first need to get an API key from " +
        "'https://beta.openai.com/account/api-keys'."
    ),
    (
        "OpenAI's API is accessible via Python using the `openai` library. " +
        "After installing the library with pip you can use it as follows: \n" +
        "```import openai\nopenai.api_key = 'YOUR_API_KEY'\nprompt = \n" +
        "'<YOUR PROMPT>'\nres = openai.Completion.create(engine='text-davinci" +
        "-003', prompt=prompt, max_tokens=100)\nprint(res)"
    ),
    (
        "The OpenAI endpoint is available for completion tasks via the " +
        "LangChain library. To use it, first install the library with " +
        "`pip install langchain openai`. Then, import the library and " +
        "initialize the model as follows: \n" +
        "```from langchain.llms import OpenAI\nopenai = OpenAI(" +
        "model_name='text-davinci-003', openai_api_key='YOUR_API_KEY')\n" +
        "prompt = 'YOUR_PROMPT'\nprint(openai(prompt))```"
    )
]

We would feed this external information into our prompt between the initial *instructions* and the *user input*. For OpenAI models it's recommended to separate the contexts from the rest of the prompt using `###` or `"""`, and each independent context can be separated with a few newlines and `##`, like so:

In [11]:
context_str = '\n\n##\n\n'.join(contexts)

print(f"""Answer the question based on the contexts below. If the
question cannot be answered using the information provided answer
with "I don't know".

###

Contexts:
{context_str}

###

Question: Give me two examples of how to use OpenAI's GPT-3 model
using Python from start to finish

Answer: """)

Answer the question based on the contexts below. If the
question cannot be answered using the information provided answer
with "I don't know".

###

Contexts:
Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.

##

To use OpenAI's GPT-3 model for completion (generation) tasks, you first need to get an API key from 'https://beta.openai.com/account/api-keys'.

##

OpenAI's API is accessible via Python using the `openai` library. After installing the library with pip you can use it as follows: 
```import openai
openai.api_key = 'YOUR_API_KEY'
prompt = 
'<YOUR PROMPT>'
res = openai.Completion.create(engine='text-davinci-003', prompt=prompt, max_tokens=100)
print(res)

##

The OpenAI endpoint is avai

In [12]:
prompt = f"""Answer the question based on the contexts below. If the
question cannot be answered using the information provided answer
with "I don't know".

###

Contexts:
{context_str}

###

Question: Give me two examples of how to use OpenAI's GPT-3 model
using Python from start to finish

Answer: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256,
    temperature=0.0
)

print(res['choices'][0]['text'].strip())

1. import openai
openai.api_key = 'YOUR_API_KEY'
prompt = '<YOUR PROMPT>'
res = openai.Completion.create(engine='text-davinci-003', prompt=prompt, max_tokens=100)
print(res)

2. from langchain.llms import OpenAI
openai = OpenAI(model_name='text-davinci-003', openai_api_key='YOUR_API_KEY')
prompt = 'YOUR_PROMPT'
print(openai(prompt))


Not bad, but are these contexts actually helping? Maybe the model is able to answer these questions without the additional information (source knowledge), as it is able to rely solely on information stored within the model's internal parameters (parametric knowledge). Let's ask again without the external information.

In [13]:
prompt = f"""Answer the question based on the contexts below. If the
question cannot be answered using the information provided answer
with "I don't know".

Question: Give me two examples of how to use OpenAI's GPT-3 model
using Python from start to finish

Answer: """

res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    max_tokens=256,
    temperature=0.0
)

print(res['choices'][0]['text'].strip())

1. Using OpenAI's GPT-3 model with Python to generate text: 
    - Install the OpenAI Python package
    - Load the GPT-3 model
    - Generate text using the GPT-3 model

2. Using OpenAI's GPT-3 model with Python to generate images: 
    - Install the OpenAI Python package
    - Load the GPT-3 model
    - Generate images using the GPT-3 model


These are not really what we asked for, and are definitely not very specific. Clearly, adding source knowledge to our prompts can produce much better results.

## Maximum Prompt Sizes

Considering that we might want to feed external information into our prompts, they can naturally become quite large. So we need to ask how large our prompts can be, because there is a maximum size.

The maximum *context window* of an LLM refers to tokens across both the *prompt* and the *completion* text. For `text-davinci-003` this is `4097` tokens.

We can set the maximum completion length with the `max_tokens` parameter of `openai.Completion.create`, e.g. `max_tokens=123`. However, measuring the total number of input tokens is more complex.

Because tokens don't map directly to words, we can only measure the number of tokens in a text by actually tokenizing it. GPT models use [OpenAI's `tiktoken` tokenizer](https://github.com/openai/tiktoken). We can install the library via pip:

In [14]:
!pip install -qU tiktoken==0.4.0

Taking the earlier prompt we can measure the number of tokens like so:

In [15]:
import tiktoken

prompt = f"""Answer the question based on the contexts below. If the
question cannot be answered using the information provided answer
with "I don't know".

###

Contexts:
{'##'.join(contexts)}

###

Question: Give me two examples of how to use OpenAI's GPT-3 model
using Python from start to finish

Answer: """

encoder_name = 'p50k_base'
tokenizer = tiktoken.get_encoding(encoder_name)

len(tokenizer.encode(prompt))

412

When feeding this prompt into `text-davinci-003` it will use `412` of our maximum context window of `4097`, leaving us with `4097 - 412 == 3685` tokens for our completion.
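
The arithmetic above can be wrapped in a small helper. This is a sketch: `completion_budget` is a hypothetical name, and the default window of `4097` is the `text-davinci-003` figure quoted in this notebook.

```python
def completion_budget(prompt_tokens, context_window=4097):
    """Return how many tokens remain for the completion after the prompt."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, which leaves no room in a "
            f"{context_window}-token context window."
        )
    return remaining


# with the tokenizer above we would pass len(tokenizer.encode(prompt))
print(completion_budget(412))  # 3685
```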

---

*Not all OpenAI models use the `p50k_base` encoder. A table of encoders for different models can be found [here](); as of this writing they are:*

| Encoding name | OpenAI models |
| --- | --- |
| `gpt2` (or `r50k_base`) | Most GPT-3 models (and GPT-2) |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `cl100k_base` | `text-embedding-ada-002` |

---
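
The table can be expressed as a small lookup so the encoder name is chosen from the model name. The dictionary below only covers the models named in the table, and falling back to `r50k_base` is our own assumption based on its "Most GPT-3 models" row. The `tiktoken` library also provides `tiktoken.encoding_for_model`, which performs this lookup for the model names it knows about.

```python
# encoder names for the models listed in the table above
ENCODING_BY_MODEL = {
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "text-embedding-ada-002": "cl100k_base",
}


def encoding_name_for(model, default="r50k_base"):
    """Look up the encoder name for a model, falling back to an assumed default."""
    return ENCODING_BY_MODEL.get(model, default)


print(encoding_name_for("text-davinci-003"))
# the result can then be passed to tiktoken.get_encoding(...)
```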

So far we have capped the completion for this prompt at `256` tokens. We can increase this up to the maximum calculated above of `3685`:

In [16]:
res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=prompt,
    temperature=0.0,
    max_tokens=3685
)

print(res['choices'][0]['text'].strip())

1. Import the `openai` library with pip, set the API key, and use the `Completion.create()` method to generate a response to a prompt: 
```import openai
openai.api_key = 'YOUR_API_KEY'
prompt = '<YOUR PROMPT>'
res = openai.Completion.create(engine='text-davinci-003', prompt=prompt, max_tokens=100)
print(res)```

2. Install the LangChain library with `pip install langchain openai`, import the library, and initialize the model with the API key: 
```from langchain.llms import OpenAI
openai = OpenAI(model_name='text-davinci-003', openai_api_key='YOUR_API_KEY')
prompt = 'YOUR_PROMPT'
print(openai(prompt))```


The model doesn't need the full completion budget and doesn't try to fill it, but because we increased the value of `max_tokens`, inference does take notably longer.

If we exceed the maximum context window allowed, we'll see an error.

In [17]:
try:
    res = openai.Completion.create(
        engine='text-davinci-003',
        prompt=prompt,
        temperature=0.0,
        max_tokens=3686
    )
except openai.InvalidRequestError as e:
    print(e)

This model's maximum context length is 4097 tokens, however you requested 4098 tokens (412 in your prompt; 3686 for the completion). Please reduce your prompt; or completion length.


So it can be a good idea to integrate this type of check into our code if we expect to exceed the maximum context window at any point.
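
One way such a check might look is sketched below. Both helpers are hypothetical: `fits_context_window` tests a request before it is sent, and `fit_contexts` keeps contexts in order until the next one would exceed a token budget. The token counter is passed in as a function, so with the tokenizer above we could use `lambda s: len(tokenizer.encode(s))`; the word-splitting counter in the demo is only a stand-in.

```python
def fits_context_window(prompt_tokens, max_tokens, context_window=4097):
    """True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_tokens <= context_window


def fit_contexts(contexts, count_tokens, budget, separator="\n\n##\n\n"):
    """Join contexts in order, stopping before the token budget is exceeded."""
    kept = []
    for ctx in contexts:
        candidate = separator.join(kept + [ctx])
        if count_tokens(candidate) > budget:
            break  # adding this context would exceed the budget
        kept.append(ctx)
    return separator.join(kept)


print(fits_context_window(412, 3685))  # True
print(fits_context_window(412, 3686))  # False


def word_count(text):
    """Stand-in token counter for the demo; a real tokenizer should be used instead."""
    return len(text.split())


demo = fit_contexts(["a b c", "d e", "f g h i"], word_count, budget=6)
print(repr(demo))
```

Dropping whole contexts from the end is the simplest policy; if contexts are ordered by relevance, the most useful ones are the ones that remain.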