# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

Replace the placeholder string in the cell below with your OpenAI API key, then run it.

In [1]:
import openai
openai.api_key = "OPENAI_API_KEY"
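
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As an optional alternative (not part of the original exercise), the key could be read from an environment variable. The sketch below assumes a variable named `OPENAI_API_KEY`; the helper `load_api_key` is hypothetical and exists only for illustration.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: fails early with a clear message instead of
    letting a missing key surface later as an authentication error.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (assumes the variable was exported before starting Jupyter):
# openai.api_key = load_api_key()
```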

The code below loads the data you created previously, already sorted by cosine distance to the question. Run it as-is.

In [2]:
import pandas as pd

df = pd.read_csv("distances.csv", index_col=0)
df

Unnamed: 0,text,embeddings,distances
3,"The confirmed death toll in Turkey was 53,537;...",[ 0.00133044 -0.02500667 -0.01326032 ... 0.00...,0.090168
0,"On 6 February 2023, at 04:17 TRT (01:17 UTC), ...",[-0.00790528 -0.01477885 -0.01355286 ... -0.00...,0.116899
88,Immediately after the earthquakes the Turkish ...,[-0.0144761 -0.02395907 -0.00486731 ... 0.01...,0.122131
145,The Turkish Government was criticized on socia...,[-0.00442697 0.00012691 -0.00424572 ... -0.00...,0.122801
184,"Mahase, Elisabeth (7 February 2023). ""Death to...",[-0.01114416 -0.0232568 -0.0060126 ... 0.00...,0.123756
...,...,...,...
71,\t\t\t,[-0.01771244 -0.02737251 -0.02497482 ... -0.01...,0.295928
72,\t\t\t,[-0.01771244 -0.02737251 -0.02497482 ... -0.01...,0.295928
75,\t\t\t,[-0.01771244 -0.02737251 -0.02497482 ... -0.01...,0.295928
67,\t\t\t,[-0.01771244 -0.02737251 -0.02497482 ... -0.01...,0.295928
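
The step that produced `distances.csv` is not shown here. As a reminder of what the `distances` column means, the following is a minimal sketch of cosine distance on toy two-dimensional vectors. The vectors, the labels, and the helper `cosine_distance` are made up for illustration and are not the code used to build the file.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-D vectors standing in for real embeddings (made-up values)
question = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "diagonal": [1.0, 1.0],
}

# Smallest distance first, mirroring the sort order of the dataframe
ranked = sorted(rows, key=lambda name: cosine_distance(question, rows[name]))
print(ranked)
```

Rows whose embeddings point in nearly the same direction as the question embedding get a distance near 0 and therefore appear at the top of the dataframe.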


## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

In [3]:
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

In the place marked `context`, provide as much information from `df['text']` as possible without exceeding `token_limit`. In the place marked `question`, add `USER_QUESTION`.

Your overall goal is to create a string called `prompt` that contains all of the relevant information.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
                        len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>
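
The solution counts the tokens of each text but not of the `"\n\n###\n\n"` separators that join them, so the final prompt can run slightly over `token_limit`. One possible refinement, sketched below, charges the separator against the budget as well. The function `select_context` and the word-based counter are assumptions for illustration; summing per-piece counts is itself only an approximation of the token count of the joined string.

```python
from typing import Callable, Iterable, List


def select_context(
    texts: Iterable[str],
    budget: int,
    count_tokens: Callable[[str], int],
    separator: str = "\n\n###\n\n",
) -> List[str]:
    """Greedily keep texts, in order, while they fit in `budget` tokens.

    The separator placed between consecutive texts is also charged
    against the budget.
    """
    selected: List[str] = []
    used = 0
    sep_cost = count_tokens(separator)
    for text in texts:
        # The first text needs no separator in front of it
        cost = count_tokens(text) + (sep_cost if selected else 0)
        if used + cost > budget:
            break
        selected.append(text)
        used += cost
    return selected


# Stand-in counter for demonstration only: one token per whitespace-separated word
def word_count(s: str) -> int:
    return len(s.split())


demo = select_context(["a b c", "d e", "f g h i"], budget=6, count_tokens=word_count)
print(demo)
```

With the notebook's objects, `count_tokens` could be `lambda s: len(tokenizer.encode(s))`, and `budget` could be whatever remains of `token_limit` after the template and question have been counted.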

In [5]:
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = len(tokenizer.encode(prompt_template)) + \
                        len(tokenizer.encode(USER_QUESTION))

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)


Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

The confirmed death toll in Turkey was 53,537; estimates of the number of dead in Syria were between 5,951 and 8,476. It is the deadliest earthquake in what is now present-day Turkey since the 526 Antioch earthquake and the deadliest natural disaster in its modern history. It is also the deadliest in present-day Syria since the 1822 Aleppo earthquake; the deadliest worldwide since the 2010 Haiti earthquake; and the fifth-deadliest of the 21st century. Damages were estimated at US$148.8 billion in Turkey, or nine-percent of the country's GDP, and US$14.8 billion in Syria.

###

On 6 February 2023, at 04:17 TRT (01:17 UTC), a Mw 7.8 earthquake struck southern and central Turkey and northern and western Syria. The epicenter was 37 km (23 mi) west–northwest of Gaziantep. The earthquake had a maximum Mercalli intensity of XII (Extreme) around the epic

## TODO 2: Send Custom Text Prompt to Completion Model

Using the `prompt` string you created, query an OpenAI `Completion` model to get an answer. Specify a `max_tokens` of 150.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

</details>
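
With `max_tokens=150`, a long answer can be cut off mid-sentence. The sketch below shows one way to notice that, assuming the legacy `Completion` response layout used in the solution, where each choice carries a `finish_reason` of `"length"` when the token cap was reached. The helper `extract_answer` and the hand-written `fake_response` are illustrative only and are not real model output.

```python
def extract_answer(response) -> str:
    """Return the stripped completion text, flagging truncated output.

    Assumes the legacy Completion response layout:
    response["choices"][0] with "text" and "finish_reason" entries.
    """
    choice = response["choices"][0]
    answer = choice["text"].strip()
    if choice.get("finish_reason") == "length":
        # The model stopped because it hit max_tokens, not because it finished
        answer += " [truncated]"
    return answer


# Hand-written stand-in for an API response (not real model output)
fake_response = {
    "choices": [{"text": "  placeholder answer ", "finish_reason": "stop"}]
}
print(extract_answer(fake_response))
```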

In [7]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
# Keyword arguments only: a positional prompt is taken as the API key
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME, prompt=prompt, max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)

## 🎉 Congratulations 🎉

You have now completed the prompt engineering process using unsupervised ML to get a custom answer from an OpenAI model!