EXPLANATION: https://www.youtube.com/watch?v=fmN19jXkH5o&t=1s

# Steps 3 & 4: Querying a Completion Model with a Custom Text Prompt

Add your API key to the cell below, then run it.

In [1]:
import openai
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = "YOUR API KEY"

The code below loads the data you created previously, sorted by cosine distance. Run it as-is.

In [2]:
import pandas as pd

df = pd.read_csv("distances.csv", index_col=0)
df

| Unnamed: 0 | text | embeddings | distances |
| --- | --- | --- | --- |
| 2 | There was widespread damage in an area of abou... | [-0.00367865 -0.02011255 -0.01324833 ... 0.00... | 0.087936 |
| 45 | The USGS Prompt Assessment of Global Earthquak... | [-0.00564726 -0.02702904 0.00706243 ... 0.00... | 0.102309 |
| 46 | The United Nations Development Programme estim... | [ 0.01216849 -0.00288201 -0.00684733 ... -0.00... | 0.116451 |
| 0 | On 6 February 2023, at 04:17 TRT (01:17 UTC), ... | [-0.00786579 -0.01488738 -0.01354739 ... -0.00... | 0.116957 |
| 70 | The Turkish Government was criticized on socia... | [-5.34295512e-04 -8.55211547e-05 -6.40815916e-... | 0.122179 |
| ... | ... | ... | ... |
| 27 | \t\t\t | [-0.01989811 -0.02768373 -0.02324662 ... -0.01... | 0.296139 |
| 28 | \t\t\t | [-0.01989811 -0.02768373 -0.02324662 ... -0.01... | 0.296139 |
| 36 | \t\t\t | [-0.01992117 -0.02773259 -0.02331025 ... -0.01... | 0.296254 |
| 31 | \t\t\t | [-0.01992117 -0.02773259 -0.02331025 ... -0.01... | 0.296254 |
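
The `distances` column ranks each text by how close its embedding is to the question's embedding, with smaller values first. The earlier step that produced it is not shown here, so the following is only a minimal sketch that assumes cosine distance is defined as 1 minus cosine similarity. The toy two-dimensional vectors and row names are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical "question" embedding and three toy "row" embeddings
question = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "diagonal": [1.0, 1.0],
}

# Sort row names from most to least similar (smallest distance first)
ranked = sorted(rows, key=lambda name: cosine_distance(question, rows[name]))
print(ranked)
```

Vector length does not affect the result, only direction does, which is why `[2.0, 0.0]` has a distance of zero from `[1.0, 0.0]`.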


## TODO 1: Build the Custom Text Prompt

Run the cell below as-is:

In [3]:
import tiktoken
# Create a tokenizer that is designed to align with our embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")

token_limit = 1000
USER_QUESTION = """What were the estimated damages of the 2023 \
Turkey-Syria earthquake?"""

ModuleNotFoundError: No module named 'tiktoken'
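
The `ModuleNotFoundError` above indicates that the `tiktoken` package is not installed in the environment running the notebook. As a rough sketch, a check like the one below can list missing packages before the cell is re-run. The helper name `missing_packages` is hypothetical, and the suggested `pip install` command assumes a standard pip-based environment.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["tiktoken"])
if missing:
    # Assumption: packages are installed with pip in this environment
    print("Install before continuing, e.g.: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```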

Now your task is to compose the custom text prompt.

The overall structure of the prompt should look like this:

```
Answer the question based on the context below, and if the
question can't be answered based on the context, say "I don't
know"

Context:

{context}

---

Question: {question}
Answer:
```

In the place marked `context`, provide as much information from `df['text']` as possible without exceeding `token_limit`. In the place marked `question`, add `USER_QUESTION`.

Your overall goal is to create a string called `prompt` that contains all of the relevant information.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = (
    len(tokenizer.encode(prompt_template))
    + len(tokenizer.encode(USER_QUESTION))
)

# Create a list to store text for context
context_list = []

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    token_count += len(tokenizer.encode(text))
    if token_count <= token_limit:
        context_list.append(text)
    else:
        # Break once we're over the token limit
        break

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
```

</details>

In [None]:
# Count the number of tokens in the prompt template and question
prompt_template = """
Answer the question based on the context below, and if the 
question can't be answered based on the context, say 
"I don't know"

Context: 

{}

---

Question: {}
Answer:"""
token_count = 

# Create a list to store text for context
context_list = 

# Loop over rows of the sorted dataframe
for text in df["text"].values:
    
    # Append text to context_list if there is enough room
    

# Use string formatting to complete the prompt
prompt = prompt_template.format(
    "\n\n###\n\n".join(context_list),
    USER_QUESTION
)
print(prompt)
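
The budget loop in this step can also be written as a small reusable function. The sketch below is one possible variant, not the notebook's solution: the names `build_context` and `word_count` are hypothetical, and `word_count` is a stand-in that counts whitespace-separated words rather than real tokens so the example runs without `tiktoken`. Unlike the solution, it also charges the budget for the `###` separators, and it treats the sum of per-piece counts as an approximation of the joined string's token count.

```python
def build_context(texts, count_tokens, token_limit, base_tokens=0,
                  separator="\n\n###\n\n"):
    """Greedily collect texts, in order, until the budget would be exceeded.

    base_tokens is the cost already spent (e.g. template plus question).
    Returns the joined context string and the total tokens counted.
    """
    sep_tokens = count_tokens(separator)
    used = base_tokens
    chosen = []
    for text in texts:
        # The separator is only needed between entries, not before the first
        cost = count_tokens(text) + (sep_tokens if chosen else 0)
        if used + cost > token_limit:
            break
        chosen.append(text)
        used += cost
    return separator.join(chosen), used


def word_count(text):
    # Stand-in tokenizer: counts words, NOT real model tokens
    return len(text.split())


context, used = build_context(
    ["alpha beta", "gamma delta epsilon", "zeta eta theta iota"],
    count_tokens=word_count,
    token_limit=8,
)
print(repr(context), used)
```

With the notebook's tokenizer, `count_tokens` could be `lambda t: len(tokenizer.encode(t))`, with `base_tokens` set to the token count of the template plus `USER_QUESTION`.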

## TODO 2: Send Custom Text Prompt to Completion Model

Using the `prompt` string you created, query an OpenAI `Completion` model to get an answer. Specify a `max_tokens` of 150.

If you get stuck, click to reveal the solution, then copy and paste it into the cell below.

---

<details>
    <summary style="cursor: pointer"><strong>Solution (click to show/hide)</strong></summary>

```python
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=prompt,
    max_tokens=150
)
answer = response["choices"][0]["text"].strip()
print(answer)
```

</details>

In [None]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
response = 
answer = 
print(answer)
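
Because `max_tokens` caps the length of the reply, it can help to check whether the answer was cut off. The sketch below assumes the response has the same dictionary-like shape the solution indexes into (`response["choices"][0]["text"]`) and that each choice carries a `finish_reason` field, which is `"length"` when the cap was hit. The helper name `extract_answer` and the sample response are hypothetical, and no API call is made.

```python
def extract_answer(response):
    """Return (answer_text, was_truncated) from a completion-style response."""
    choices = response.get("choices") or []
    if not choices:
        return "", False
    first = choices[0]
    text = (first.get("text") or "").strip()
    # "length" means generation stopped because max_tokens was reached
    truncated = first.get("finish_reason") == "length"
    return text, truncated


# Hand-written stand-in for a real API response
fake_response = {
    "choices": [{"text": "\n\n An example answer. ", "finish_reason": "stop"}]
}
answer_text, was_truncated = extract_answer(fake_response)
print(answer_text, was_truncated)
```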

## 🎉 Congratulations 🎉

You have now completed the prompt engineering process using unsupervised ML to get a custom answer from an OpenAI model!