# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

Add your API key to the cell below, then run it.

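```python
# Uses the legacy (pre-1.0) openai Python SDK interface
import openai

# Paste your OpenAI API key between the quotes
openai.api_key = ""
```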

The code below loads the data you created previously, sorted by cosine distance. Run it as-is.

```python
import pandas as pd

df = pd.read_csv("distances.csv", index_col=0)
df
```

| Unnamed: 0 | text | embeddings | distances |
|---|---|---|---|
| 3 | The confirmed death toll in Turkey was 53,537;... | [ 0.00134106 -0.02500695 -0.01328005 ... 0.00... | 0.090159 |
| 0 | On 6 February 2023, at 04:17 TRT (01:17 UTC), ... | [-0.00792725 -0.01488955 -0.01365658 ... -0.00... | 0.116934 |
| 88 | Immediately after the earthquakes the Turkish ... | [-0.0144761 -0.02395907 -0.00486731 ... 0.01... | 0.122007 |
| 145 | The Turkish Government was criticized on socia... | [-0.00442697 0.00012691 -0.00424572 ... -0.00... | 0.122824 |
| 184 | Mahase, Elisabeth (7 February 2023). "Death to... | [-0.01114416 -0.0232568 -0.0060126 ... 0.00... | 0.123806 |
| ... | ... | ... | ... |
| 71 | `\t\t\t` | [-0.01765627 -0.02742681 -0.0249322 ... -0.01... | 0.295954 |
| 64 | `\t\t\t` | [-0.01765627 -0.02742681 -0.0249322 ... -0.01... | 0.295954 |
| 63 | `\t\t\t` | [-0.01765627 -0.02742681 -0.0249322 ... -0.01... | 0.295954 |
| 67 | `\t\t\t` | [-0.01765627 -0.02742681 -0.0249322 ... -0.01... | 0.295954 |
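If you want to see where the `distances` column comes from, here is a minimal sketch, assuming the earlier step stored the cosine distance (1 minus cosine similarity) between each chunk's embedding and the question's embedding. The 3-dimensional vectors, `toy_df`, and `question_embedding` are hypothetical stand-ins for the real, much longer embeddings. Smaller distances mean more similar text, which is why sorting in ascending order puts the most relevant rows first.

```python
import numpy as np
import pandas as pd


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical toy embeddings; real embeddings have far more dimensions
toy_df = pd.DataFrame({
    "text": ["damage estimates", "death toll", "unrelated"],
    "embeddings": [[0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.0, 0.0, 1.0]],
})
question_embedding = [1.0, 0.0, 0.0]  # hypothetical question embedding

# Distance of each row from the question, then closest rows first
toy_df["distances"] = toy_df["embeddings"].apply(
    lambda emb: cosine_distance(emb, question_embedding)
)
toy_df = toy_df.sort_values("distances", ascending=True)
print(toy_df[["text", "distances"]])
```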


## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

```python
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""
```

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

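In the place marked `{context}`, include as many entries from `df['text']` as will fit without the whole prompt exceeding `token_limit` tokens, taking them in order from the top of the dataframe (the closest matches first). In the place marked `{question}`, insert `USER_QUESTION`.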

Your overall goal is to create a string called `prompt` that contains all of the relevant information.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Prompt template with placeholders for the context and the question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""

# Separator placed between context chunks; it uses tokens too
separator = "\n\n###\n\n"
separator_token_count = len(tokenizer.encode(separator))

# Count the number of tokens in the prompt template and question
token_count = len(tokenizer.encode(prompt_template)) + \
              len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe (most relevant first)
for text in df["text"].values:
    # Tokens for this text, plus a separator if it is not the first chunk
    text_token_count = len(tokenizer.encode(text))
    if context_list:
        text_token_count += separator_token_count

    # Break once adding this text would exceed the token limit
    if token_count + text_token_count > token_limit:
        break

    # Otherwise there is enough room, so keep it
    token_count += text_token_count
    context_list.append(text)

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    separator.join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>
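The greedy loop in the solution can also be written as a reusable function. This is a minimal sketch, not the course's reference code: `pack_context` and `word_count` are hypothetical names, the word-based counter only stands in for the real tokenizer so the example runs offline, and, unlike the solution, it also skips whitespace-only rows (assuming the `\t\t\t` rows at the bottom of the dataframe are real tab characters).

```python
from typing import Callable, Iterable, List


def pack_context(
    texts: Iterable[str],
    count_tokens: Callable[[str], int],
    budget: int,
    separator: str = "\n\n###\n\n",
) -> List[str]:
    """Greedily keep texts, in order, while their total token count fits the budget.

    Whitespace-only texts are skipped, and the separator that joins each
    kept text to the previous one is counted against the budget.
    """
    kept: List[str] = []
    used = 0
    sep_tokens = count_tokens(separator)
    for text in texts:
        if not text.strip():
            continue  # skip empty chunks such as tab-only rows
        cost = count_tokens(text) + (sep_tokens if kept else 0)
        if used + cost > budget:
            break  # stop at the first chunk that does not fit
        kept.append(text)
        used += cost
    return kept


# Hypothetical word-based counter so the example runs without tiktoken
def word_count(s: str) -> int:
    return len(s.split())


chunks = ["one two three", "\t\t\t", "four five", "six seven eight nine"]
kept = pack_context(chunks, word_count, budget=6, separator=" ### ")
print(kept)
```

In the notebook, a call might pass `lambda s: len(tokenizer.encode(s))` as `count_tokens` and `token_limit` minus the tokens of `prompt_template` and `USER_QUESTION` as `budget`. Stopping at the first chunk that does not fit, rather than skipping it and trying later, shorter ones, keeps the context restricted to the closest matches.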

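```python
# Prompt template with placeholders for the context and the question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""

# Separator placed between context chunks; it uses tokens too
separator = "\n\n###\n\n"
separator_token_count = len(tokenizer.encode(separator))

# Count the number of tokens in the prompt template and question
token_count = len(tokenizer.encode(prompt_template)) + len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe (most relevant first)
for text in df["text"].values:
    # Tokens for this text, plus a separator if it is not the first chunk
    text_token_count = len(tokenizer.encode(text))
    if context_list:
        text_token_count += separator_token_count
    # Break once adding this text would exceed the token limit
    if token_count + text_token_count > token_limit:
        break
    token_count += text_token_count
    context_list.append(text)

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    separator.join(context_list),
    USER_QUESTION
)
print(prompt)
```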


```
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

The confirmed death toll in Turkey was 53,537; estimates of the number of dead in Syria were between 5,951 and 8,476. It is the deadliest earthquake in what is now present-day Turkey since the 526 Antioch earthquake and the deadliest natural disaster in its modern history. It is also the deadliest in present-day Syria since the 1822 Aleppo earthquake; the deadliest worldwide since the 2010 Haiti earthquake; and the fifth-deadliest of the 21st century. Damages were estimated at US$148.8 billion in Turkey, or nine-percent of the country's GDP, and US$14.8 billion in Syria.

###

On 6 February 2023, at 04:17 TRT (01:17 UTC), a Mw 7.8 earthquake struck southern and central Turkey and northern and western Syria. The epicenter was 37 km (23 mi) west–northwest of Gaziantep. The earthquake had a maximum Mercalli intensity of XII (Extreme) around the epic
```

## TODO 2: Send Custom Text Prompt to Completion Model

Using the `prompt` string you created, query an OpenAI `Completion` model to get an answer, with `max_tokens` set to 150.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

</details>
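The solution uses `openai.Completion.create`, which exists only in the pre-1.0 `openai` Python SDK; version 1.0 removed it in favor of methods on a client object. If you have a newer SDK installed, a roughly equivalent sketch is below. `get_answer` is a hypothetical helper name; the model name and the `max_tokens` default are taken from the solution, and the client is passed in so the function can be exercised without network access.

```python
def get_answer(client, prompt, model="gpt-3.5-turbo-instruct", max_tokens=150):
    """Send a prompt to a completion model and return the stripped answer text.

    `client` is expected to be an openai>=1.0 `OpenAI()` instance, whose
    `completions.create` method replaces the legacy `openai.Completion.create`.
    """
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
    )
    # In the 1.0+ SDK the response is an object, not a dict
    return response.choices[0].text.strip()


# Usage with the real SDK (requires network access and an API key):
# from openai import OpenAI
# client = OpenAI(api_key="...")
# print(get_answer(client, prompt))
```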

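```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150,
    logprobs=5,   # also return log probabilities for the most likely candidate tokens
    top_p=0.01,   # nucleus sampling over only the top 1% of probability mass (near-deterministic)
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

```
The estimated damages were US$148.8 billion in Turkey and US$14.8 billion in Syria.
```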


```python
response
```

```
<OpenAIObject text_completion id=cmpl-9dGOuKEWlsTg3xsazWnygbGSHGnFu at 0x15d5d3d10> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "text_offset": [
          3726,
          3730,
          3740,
          3748,
          3753,
          3756,
          3757,
          3760,
          3761,
          3762,
          3770,
          3773,
          3780,
          3784,
          3787,
          3788,
          3790,
          3791,
          3792,
          3800,
          3803,
          3809
        ],
        "token_logprobs": [
          -0.8970265,
          -0.16500674,
          -0.015015941,
          -0.55660945,
          -0.48361897,
          -0.0091695525,
          -0.017323367,
          -3.0471343e-05,
          -7.9940866e-05,
          -0.00016504127,
          -0.0021435972,
          -6.5278815e-05,
          -0.0513892,
          -0.00021772196,
          -0.00042066345,
          -0.0009349247,
          -
```
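Because the completion cell passes `logprobs=5`, the response includes a log probability for each generated token in `token_logprobs`. A minimal sketch of turning those values into ordinary probabilities, using the first five `token_logprobs` copied from the (truncated) output above; since each value is conditioned on the prompt and the tokens before it, their sum is the log probability of the whole span.

```python
import math

# First few values of "token_logprobs" copied from the response above
# (the printed response is truncated, so this is not the full list)
token_logprobs = [-0.8970265, -0.16500674, -0.015015941, -0.55660945, -0.48361897]

# A log probability converts back to a probability with exp()
token_probs = [math.exp(lp) for lp in token_logprobs]
for lp, p in zip(token_logprobs, token_probs):
    print(f"logprob={lp:.4f}  probability={p:.3f}")

# Summing log probabilities gives the log probability of the whole span,
# so exp(sum) is the joint probability of these tokens given the prompt
joint_prob = math.exp(sum(token_logprobs))
print(f"joint probability of the first {len(token_logprobs)} tokens: {joint_prob:.4f}")
```

Values close to 0 (probabilities close to 1) mark tokens the model was confident about; the first token here is noticeably less certain than the rest.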

## 🎉 Congratulations 🎉

You have now completed the prompt engineering process, using unsupervised ML to find relevant context and get a custom answer from an OpenAI model!