# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

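Set your API key in the `OPENAI_API_KEY` environment variable, then run the cells below. The first cell only adjusts proxy settings on one specific machine; the second configures the `openai` library.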

In [19]:
import os

# USERNAME may be unset (e.g. on Linux or macOS); default to "" so the lookup never raises KeyError
if 'A306709' in os.environ.get('USERNAME', ''):
    print("Running on Christoph's computer: update proxy settings.")
    os.environ["http_proxy"] = "http://sia-lb.telekom.de:8080"
    os.environ["https_proxy"] = "http://sia-lb.telekom.de:8080"
else:
    print("Running on any computer but Christoph's: don't update any proxy settings.")


Running on Christoph's computer: update proxy settings.


In [1]:
import os
import openai
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = os.getenv("OPENAI_API_KEY")

The code below loads the data you created previously, sorted by cosine distance. Run it as-is.

In [13]:
import pandas as pd

df = pd.read_csv("distances-earthquake.csv", index_col=0)
df

| Unnamed: 0 | text | embeddings | distances |
|---|---|---|---|
| 100 | June 22 – A 6.2 earthquake strikes the Durand ... | [-0.00169514 -0.00259986 0.00211974 ... 0.00... | 0.178249 |
| 183 | November 21 – A 5.6 earthquake strikes near Ci... | [ 0.00404612 -0.00411421 0.02369316 ... -0.00... | 0.180490 |
| 119 | July 27 – A 7.0 earthquake strikes the island ... | [ 0.00208242 0.00398824 0.00789941 ... 0.00... | 0.189535 |
| 136 | September 5 – A 6.8 earthquake strikes Luding ... | [ 0.01263801 0.01633098 0.01474545 ... 0.00... | 0.201293 |
| 96 | June 8 – 2022 South Khorasan train derailment:... | [-0.01958991 -0.02260781 0.02034438 ... -0.00... | 0.209306 |
| ... | ... | ... | ... |
| 138 | September 8 – Queen Elizabeth II of the United... | [-0.00249288 -0.0008894 -0.02050409 ... 0.02... | 0.311753 |
| 159 | October 16–October 23 – The 20th National Cong... | [-0.00917703 -0.01634478 0.01078704 ... 0.00... | 0.312086 |
| 187 | December 7 – The Congress of Peru removes Pres... | [-0.02641545 0.00803665 -0.01977251 ... 0.01... | 0.313500 |
| 189 | December 17 – Leo Varadkar succeeds Micheál Ma... | [ 0.00830376 -0.02379905 -0.02508822 ... -0.00... | 0.316687 |
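
The rows above are ordered by ascending `distances`, so the closest matches come first. As a reminder of what a cosine distance measures, here is a minimal pure-Python sketch. The helper name `cosine_distance` and the toy 2-D vectors are illustrative assumptions and not part of the notebook; the real embeddings have far more dimensions, and the distances were presumably computed against the question's embedding in the previous step.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-D "embeddings", made up for illustration only
question = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "diagonal": [1.0, 1.0],
}

# Sort the row labels by ascending distance to the question (closest first)
ranked = sorted(rows, key=lambda label: cosine_distance(question, rows[label]))
print(ranked)
```

Vector length does not matter here, only direction: `[2.0, 0.0]` is at distance 0 from `[1.0, 0.0]` even though it is twice as long.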


## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

In [14]:
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

In the place marked `context`, provide as much information from `df['text']` as possible without exceeding `token_limit`. In the place marked `question`, add `USER_QUESTION`.

Your overall goal is to create a string called `prompt` that contains all of the relevant information.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
              len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>

In [16]:
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
              len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count < token_limit:
        context_list.append(text)
    else:
        break


# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)


Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

June 22 – A 6.2 earthquake strikes the Durand Line between Afghanistan and Pakistan, killing at least 1,163 people.

###

November 21 – A 5.6 earthquake strikes near Cianjur in West Java, Indonesia, killing 635 people and injuring 7,700 more.

###

July 27 – A 7.0 earthquake strikes the island of Luzon in the Philippines killing 11 people and injuring over 600.

###

September 5 – A 6.8 earthquake strikes Luding County in Sichuan province, China, killing 117 and injuring 424.

###

June 8 – 2022 South Khorasan train derailment: In Iran, a passenger train derailed travelling from Tabas to Yazd crashed into an excavator and derailed, killing 18 and injuring 87.

###

August 17 – Turkey and Israel agree to restore full diplomatic relations after a period of tensions.

###

 – 2022 was also dominated by wars and armed conflicts. While escalations int
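
One detail worth noticing: the loop above adds up the tokens of the template, the question, and each text snippet, but not the `###` separators that `join` inserts between snippets, so the finished prompt can end up slightly over `token_limit`. The sketch below shows one way to budget for the separators as well. The function name `build_prompt`, its parameters, and the word-counting stand-in tokenizer are assumptions made for illustration; they are not part of the exercise.

```python
def build_prompt(texts, question, template, token_limit, count_tokens,
                 separator="\n\n###\n\n"):
    """Pack texts into the template until the token budget is used up."""
    used = count_tokens(template) + count_tokens(question)
    chosen = []
    for text in texts:
        # Every snippet after the first also costs one separator
        cost = count_tokens(text) + (count_tokens(separator) if chosen else 0)
        if used + cost > token_limit:
            break
        used += cost
        chosen.append(text)
    return template.format(separator.join(chosen), question)

# Stand-in counter for the demo: one "token" per whitespace-separated word
def count_words(s):
    return len(s.split())

template = "Context:\n\n{}\n\n---\n\nQuestion: {}\nAnswer:"
demo = build_prompt(
    ["a b c", "d e f", "g h i"], "What is x?", template,
    token_limit=18, count_tokens=count_words,
)
print(demo)
```

In the notebook, you could pass `lambda s: len(tokenizer.encode(s))` as `count_tokens` together with `df["text"].values`, `USER_QUESTION`, and `prompt_template`. The count stays slightly conservative because the `{}` placeholders in the template are counted too.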

## TODO 2: Send Custom Text Prompt to Completion Model

Using the `prompt` string you created, query an OpenAI `Completion` model to get an answer. Specify a `max_tokens` of 150.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

</details>

In [20]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)

I don't know
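
`openai.Completion.create` and the module-level `openai.api_base` setting belong to the pre-1.0 `openai` Python package. If your environment has version 1.0 or later installed, the same request goes through a client object instead. The sketch below assumes that newer package; the helper names `make_client` and `ask` are hypothetical, and the base URL is simply carried over from the setup cell.

```python
import os

def make_client():
    """Build a client for openai>=1.0 (imported lazily so this sketch loads without it)."""
    from openai import OpenAI
    return OpenAI(
        api_key=os.getenv("OPENAI_API_KEY"),
        base_url="https://openai.vocareum.com/v1",
    )

def ask(client, prompt, model="gpt-3.5-turbo-instruct", max_tokens=150):
    """Send the prompt to a completion model and return the stripped answer text."""
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
    )
    # In the 1.x client the response is an object, not a dict
    return response.choices[0].text.strip()
```

With these helpers, the call in the cell above would become `answer = ask(make_client(), prompt)`.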


## 🎉 Congratulations 🎉

You have now completed the prompt engineering process using unsupervised ML to get a custom answer from an OpenAI model!