# 08: Text Embedding for Question and Answering

> "Discover the power of Artificial Intelligence with OpenAI's powerful API to generate text and images in projects". Udemy course.

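Text embedding:
* text embedding != fine-tuning: embedding leaves the model weights unchanged and is used to find relevant text that is then added to the prompt as context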

In [147]:
import os
import openai
import numpy as np

In [148]:
#os.environ["OPENAI_API_KEY"] = "sk-"
openai.api_key = os.getenv("OPENAI_API_KEY")

In [149]:
def gpt_call(prompt):
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt},
        ],
        temperature=0,
        max_tokens=1024,
    )
    return response.choices[0].message.content

## Model hallucination

In [150]:
ans = gpt_call('Q:How many champions league does Manchester City won?\nA: ')  # Ground truth: Chelsea beat City 1-0 in the 20/21 final
print(ans)

Manchester City has won the UEFA Champions League once, in the 2020-2021 season.


In [151]:
prompt = '''Only answer the question if you are 100\% sure about the answer. If you are not sure, please say so. 

Q: How many champions league does Manchester City won?
A: 
'''

In [152]:
ans = gpt_call(prompt)
print(ans)

Manchester City has won the UEFA Champions League once, in the 2020-2021 season.


## Adding context with text embedding

In [153]:
import pandas as pd 
df = pd.read_csv('champions.csv')
df.head()

Unnamed: 0,Date,Club Club Manager
0,22/23,\tManchester City FC\tManchester City FC\tPep ...
1,21/22,\tReal Madrid CF\tReal Madrid CF\tCarlo Ancelotti
2,20/21,\tChelsea FC\tChelsea FC\tThomas Tuchel
3,19/20,\tFC Bayern Munique\tFC Bayern Munique\tHansi ...
4,18/19,\tFC Liverpool\tFC Liverpool\tJürgen Klopp


In [154]:
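def summary(date, winner_str):
    """Turn one table row into a plain-English sentence to use as context."""
    print(winner_str)  # debug: raw cell value
    # The cell contains a literal backslash followed by 't' (not a real tab),
    # hence the escaped separator.
    winner_list = winner_str.strip().split('\\t')
    print(winner_list)  # debug: parsed fields
    # Fields are: empty string, club, club (repeated), manager
    _, winner, _, manager = winner_list
    prompt = f'''The {date} Champions league winner was {winner} and the manager was {manager}.'''
    return prompt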

In [155]:
key = '20/21'
row = df.loc[df['Date'] == key]
row2 = row.iloc[:, 1]
print(row2.to_string(index=False, header=False))

 \tChelsea FC\tChelsea FC\tThomas Tuchel


In [156]:
summ = summary(key, row2.to_string(index=False, header=False))
print(summ)

 \tChelsea FC\tChelsea FC\tThomas Tuchel
['', 'Chelsea FC', 'Chelsea FC', 'Thomas Tuchel']
The 20/21 Champions league winner was Chelsea FC and the manager was Thomas Tuchel.
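To use the whole table as context, every row needs its own summary sentence. The following is a minimal sketch of a more tolerant parser, not the notebook's method: `summarize_row` is a hypothetical helper, and it assumes each club cell uses the literal `\t` separator seen in the output above, with the winner as the first non-empty field and the manager as the last.

```python
def summarize_row(date, club_field):
    """Build one context sentence from a (date, club cell) pair.

    Assumption: fields are separated by a literal backslash + 't',
    as in the printed output above, and empty fields can be ignored.
    """
    parts = [p.strip() for p in club_field.strip().split("\\t") if p.strip()]
    if len(parts) < 2:
        raise ValueError(f"Cannot parse club field: {club_field!r}")
    winner, manager = parts[0], parts[-1]
    return f"The {date} Champions league winner was {winner} and the manager was {manager}."


# Rows copied from the df.head() output above (only those shown in full).
rows = [
    ("21/22", r"\tReal Madrid CF\tReal Madrid CF\tCarlo Ancelotti"),
    ("20/21", r"\tChelsea FC\tChelsea FC\tThomas Tuchel"),
    ("18/19", r"\tFC Liverpool\tFC Liverpool\tJürgen Klopp"),
]
summaries = [summarize_row(date, club) for date, club in rows]
for sentence in summaries:
    print(sentence)
```

Raising on a cell that cannot be parsed keeps a malformed row from silently producing a wrong context sentence.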


## Embedding

In [157]:
def get_embedding(text, model="text-embedding-3-small"):
    # Replace newlines with spaces so the text is embedded as a single line
    text = text.replace("\n", " ")
    return openai.embeddings.create(input=[text], model=model).data[0].embedding

In [158]:
summ_emb = get_embedding(summ)
prompt_emb = get_embedding(prompt)
len(summ_emb), len(prompt_emb)

(1536, 1536)

## Vector similarity

In [159]:
def numpy_similarity(emb1, emb2):
    return np.dot(np.array(emb1), np.array(emb2))

In [160]:
sim = numpy_similarity(summ_emb, prompt_emb)

sim

0.3387965771147252
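A plain dot product equals cosine similarity only when both vectors have unit length. OpenAI's documentation describes its embeddings as normalized to length 1, so the two should agree here. For vectors from other sources, an explicit normalization is safer. A minimal sketch, where `cosine_similarity` is a hypothetical helper and not part of the course code:

```python
import numpy as np


def cosine_similarity(emb1, emb2):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a = np.asarray(emb1, dtype=float)
    b = np.asarray(emb2, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return float(np.dot(a, b) / denom)


# Same direction, different lengths: the cosine is 1 while the dot product is 2.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
```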

In [161]:
prompt2 = f'Context:{summ}\n{prompt}'
print(prompt2)

Context:The 20/21 Champions league winner was Chelsea FC and the manager was Thomas Tuchel.
Only answer the question if you are 100\% sure about the answer. If you are not sure, please say so. 

Q: How many champions league does Manchester City won?
A: 



In [162]:
print(gpt_call(prompt2))

Manchester City has won the Champions League 0 times.
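The `prompt2` string above holds a single context sentence. When several sentences are relevant, the same layout can be produced by a small helper. This is a sketch only: `build_prompt` is a hypothetical name, and joining the sentences with newlines is an assumption about what works well, not something the course prescribes.

```python
def build_prompt(question_prompt, context_lines):
    """Prefix the question prompt with zero or more context sentences."""
    if not context_lines:
        # Nothing relevant was found: send the question unchanged.
        return question_prompt
    context = "\n".join(context_lines)
    # Same "Context:...\n<prompt>" layout as prompt2 above.
    return f"Context:{context}\n{question_prompt}"


example = build_prompt(
    "Q: How many champions league does Manchester City won?\nA: ",
    ["The 20/21 Champions league winner was Chelsea FC and the manager was Thomas Tuchel."],
)
print(example)
```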


**In a real-world scenario, you compute this similarity between the question and every row of the table, then add the row (or rows) most similar to the question to the prompt as context.**
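The selection step can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: `most_similar_rows` is a hypothetical helper, and the 3-dimensional vectors are made-up stand-ins for real 1536-dimensional embeddings so the example runs without an API call. The dot product is used as the score, which assumes unit-length embeddings.

```python
import numpy as np


def most_similar_rows(question_emb, row_embs, top_k=1):
    """Return (row_index, score) pairs for the top_k most similar rows."""
    q = np.asarray(question_emb, dtype=float)
    m = np.asarray(row_embs, dtype=float)  # shape: (n_rows, dim)
    scores = m @ q                         # one dot product per row
    order = np.argsort(scores)[::-1][:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# One summary sentence per table row (rows taken from the table above).
row_summaries = [
    "The 21/22 Champions league winner was Real Madrid CF and the manager was Carlo Ancelotti.",
    "The 20/21 Champions league winner was Chelsea FC and the manager was Thomas Tuchel.",
    "The 18/19 Champions league winner was FC Liverpool and the manager was Jürgen Klopp.",
]

# Toy vectors, NOT real embeddings; in practice use get_embedding(summary).
row_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
question_emb = [0.8, 0.6, 0.0]  # toy question vector of unit length

best = most_similar_rows(question_emb, row_embs, top_k=2)
context = "\n".join(row_summaries[i] for i, _ in best)
print(context)
```

In practice the row embeddings would be computed once and stored alongside the table, so only the question needs to be embedded at query time.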