# Load the dataset

In [1]:
import pandas as pd

In [None]:
# This file is produced by the notebook embedding_wikipedia.ipynb
df = pd.read_csv("embeddings_2022_fifa_world_cup.csv", index_col=False)

# read_csv loads each embedding as a string, so convert it back to a list
import ast
df['embedding'] = df['embedding'].apply(ast.literal_eval)
df.head()

Unnamed: 0,text,embedding
0,2022 FIFA World Cup\n\nThe 2022 FIFA World Cup...,"[-0.002434000140056014, -0.012627567164599895,..."
1,2022 FIFA World Cup\n\nOverview\n\nSchedule\n\...,"[-0.0006040519219823182, -0.020295508205890656..."
2,2022 FIFA World Cup\n\nHost selection\n\nHost ...,"[0.01563253253698349, -0.009458768181502819, 0..."
3,2022 FIFA World Cup\n\nVenues\n\nThe first fiv...,"[0.01284846756607294, -0.007050588261336088, 0..."
4,2022 FIFA World Cup\n\nSquads\n\nBefore submit...,"[0.014988874085247517, -0.0017697860021144152,..."
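
The `ast.literal_eval` step is needed because a CSV cell can only hold text, so a list written to CSV comes back as its string representation. The sketch below illustrates this with the standard `csv` module and a made-up row (the values and the `rows` name are illustrative, not taken from the real file); pandas behaves the same way when it reads such a column.

```python
import ast
import csv
import io

# Toy row standing in for one line of the embeddings file (values are made up).
rows = [{"text": "sample section", "embedding": [0.1, -0.2, 0.3]}]

# Write to an in-memory CSV; the list is serialized with str().
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embedding"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell is now plain text.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["embedding"]

# literal_eval safely parses the text back into a Python list of floats.
parsed = ast.literal_eval(raw)
print(type(raw).__name__, type(parsed).__name__)
```

`ast.literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for this kind of parsing.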


In [None]:
df['embedding'].values[0][:10]

[-0.002434000140056014,
 -0.012627567164599895,
 0.02372991293668747,
 -0.005074540618807077,
 -0.026182977482676506,
 0.015760626643896103,
 -0.021048063412308693,
 -0.0010223754215985537,
 -0.01607838273048401,
 -0.039554089307785034]

# Similarity score

In [None]:
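# The code below uses the pre-1.0 openai interface (openai.Embedding, openai.ChatCompletion);
# tiktoken is needed later for counting tokens
%pip install "openai<1" tiktoken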

In [None]:
import openai

openai.api_key="YOUR_API_KEY"

We compute a similarity score between the question and every item in the corpus, 
then pass the highest-scoring items to the GPT-3.5 model as context for answering the question.
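
As a rough illustration of the score itself, here is a minimal pure-Python sketch of cosine similarity on toy two-dimensional vectors. The helper name `cosine_similarity` and the vectors are made up for this example; the notebook computes the same quantity as `1 - spatial.distance.cosine(x, y)` with SciPy.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between x and y: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

query_vec = [1.0, 0.0]
sim_same = cosine_similarity(query_vec, [2.0, 0.0])   # same direction
sim_orth = cosine_similarity(query_vec, [0.0, 3.0])   # perpendicular
sim_opp = cosine_similarity(query_vec, [-1.0, 0.0])   # opposite direction
print(sim_same, sim_orth, sim_opp)
```

The score depends only on direction, not on vector length, so it ranges from -1 (opposite) through 0 (unrelated) to 1 (same direction).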

In [None]:
from scipy import spatial

def similarity_score_top_inputs(
    query: str,  # our question
    df: pd.DataFrame,  # corpus of texts and their embeddings
    top_n: int = 50,
    embedding_model: str = "text-embedding-ada-002",
    similarity_score_func=lambda x, y: 1 - spatial.distance.cosine(x, y),
) -> tuple[tuple[float, ...], tuple[str, ...]]:
    """Return the top_n similarity scores and the corpus texts most similar to the query."""

    # Embed the query
    query_embedding_response = openai.Embedding.create(
        model=embedding_model,
        input=query,
    )
    query_embedding = query_embedding_response["data"][0]["embedding"]

    # Score every corpus row against the query embedding
    strings_similarity_score = [
        (similarity_score_func(query_embedding, row["embedding"]), row["text"])
        for _, row in df.iterrows()
    ]

    # Sort by score, highest first, then split into scores and texts
    strings_similarity_score.sort(key=lambda x: x[0], reverse=True)
    similarity_score, inputs = zip(*strings_similarity_score)

    return similarity_score[:top_n], inputs[:top_n]
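
The ranking step can be tried offline, without an API call, on a toy corpus. The sketch below is one possible variant rather than the notebook's method: it uses `heapq.nlargest` instead of a full sort and returns empty tuples for an empty corpus. The names `top_n_texts` and `toy_corpus` and all vectors are hypothetical.

```python
import heapq
import math

def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def top_n_texts(query_embedding, corpus, top_n=2):
    """corpus is a list of (text, embedding) pairs; returns (scores, texts)."""
    scored = (
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in corpus
    )
    # nlargest keeps only the top_n best pairs, ordered from highest score down.
    best = heapq.nlargest(top_n, scored, key=lambda pair: pair[0])
    if not best:
        return (), ()
    scores, texts = zip(*best)
    return scores, texts

# Made-up 2-D "embeddings" for three sections.
toy_corpus = [
    ("about the final", [0.9, 0.1]),
    ("about the venues", [0.1, 0.9]),
    ("about the winner", [1.0, 0.0]),
]
scores, texts = top_n_texts([1.0, 0.0], toy_corpus, top_n=2)
print(texts)
```

For a corpus of a few hundred sections the difference from a full sort is negligible; the variant mainly shows that only the best `top_n` items need to be kept.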

In [None]:
similarity_scores, inputs= similarity_score_top_inputs("Who is the winner of world cup 2022?", df, top_n=10)
for similarity_score, string in zip(similarity_scores, inputs):
    print(f"{similarity_score=:.3f}, \n{string[:200]}\n")

similarity_score=0.858, 
2022 FIFA World Cup

The 2022 FIFA World Cup was the 22nd FIFA World Cup, the quadrennial world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 1

similarity_score=0.856, 
2022 FIFA World Cup seeding

Personnel involved

The 32 nations involved in the 2022 World Cup (29 of which were known) were drawn into eight groups of four. Two of the remaining three spots were fill

similarity_score=0.855, 
2022 FIFA World Cup final

Background

Argentina had won the World Cup twice before, in 1978 and 1986. They had also finished as losing finalists thrice, in 1930, 1990 and 2014. After the 2014 final l

similarity_score=0.848, 
2022 FIFA World Cup Group H

Standings

In the round of 16:

The winners of Group H, Portugal, advanced to play the runners-up of Group G, Switzerland.
The runners-up of Group H, South Korea, advanced

similarity_score=0.847, 
2022 FIFA World Cup Group D

Group D of the 2022 FIFA World Cup too

# Prompt preparation

In [None]:
import tiktoken

def num_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Return the number of tokens in a string."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def message_preparation(
    query: str,
    df: pd.DataFrame,
    token_budget: int,
    model: str,
    top_n: int = 10
) -> str:
    """Return a prompt for GPT: the most relevant texts from the corpus 'df' that fit the token budget, followed by the question."""
    similarity_score, inputs_string = similarity_score_top_inputs(query, df, top_n)
    
    question = f"\n\nQuestion: {query}"

    introduction = 'Use the below articles from Wikipedia to answer the question. If the answer cannot be found in the articles, write "I could not find an answer."'
    message = introduction + '\n\nWikipedia article section:\n'
    
    for input_str in inputs_string:
        next_article = f'"""\n{input_str}\n"""\n'
        whole_message = message + next_article + question
        if (
            num_tokens(whole_message, model=model)
            > token_budget
        ):
            break
        else:
            message += next_article
    return message + question
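
The budget loop in `message_preparation` can be illustrated offline. In the sketch below, `count_tokens` is a hypothetical stand-in that counts whitespace-separated words instead of calling tiktoken, and `pack_sections`, the header, the question, and the sections are all made up for the example.

```python
def count_tokens(text):
    """Hypothetical stand-in for num_tokens: counts whitespace-separated words."""
    return len(text.split())

def pack_sections(sections, question, header, token_budget):
    """Append sections in order until the next one would exceed the budget."""
    message = header
    for section in sections:
        candidate = message + section
        # The question is always appended at the end, so count it too.
        if count_tokens(candidate + question) > token_budget:
            break
        message = candidate
    return message + question

header = "Use the articles below.\n"    # 4 words
question = "\nQuestion: who won?"       # 3 words
sections = ["one two three\n", "four five six seven\n", "eight\n"]

packed = pack_sections(sections, question, header, token_budget=11)
print(packed)
```

With a budget of 11 words, the first section fits (10 words in total) and the second does not (14), so the loop stops there. The third section is skipped even though it would fit on its own, because `break` ends the loop at the first overflow. This matches the original loop, where sections arrive sorted by relevance.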

In [None]:
query = "Who is the winner of the World Cup 2022?"
print(message_preparation(query, df, token_budget=4096, model='gpt-3.5-turbo',top_n=3))

Use the below articles from Wikipedia to answer the question. If the answer cannot be found in the articles, write "I could not find an answer."

Wikipedia article section:
"""
2022 FIFA World Cup

The 2022 FIFA World Cup was the 22nd FIFA World Cup, the quadrennial world championship for national football teams organized by FIFA. It took place in Qatar from 20 November to 18 December 2022, after the country was awarded the hosting rights in 2010. It was the first World Cup to be held in the Arab world and Muslim world, and the second held entirely in Asia after the 2002 tournament in South Korea and Japan.This tournament was the last with 32 participating teams, with the number of teams being increased to 48 for the 2026 edition. To avoid the extremes of Qatar's hot climate, the event was held during November and December. It was held over a reduced time frame of 29 days with 64 matches played in eight venues across five cities. Qatar entered the event—their first World Cup—automatica

# Chat with GPT-3.5-Turbo

In [None]:
def chat(
    query: str,
    df: pd.DataFrame = df,
    model: str = "gpt-3.5-turbo",
    token_budget: int = 4096 - 500,  # assuming a 4,096-token context window, leave room for the reply
    top_n: int = 50
) -> str:
    """Answer the question with the GPT-3.5-Turbo model, supplying relevant context retrieved from the corpus."""
    message = message_preparation(query, df, token_budget=token_budget, model=model, top_n=top_n)

    messages = [
        {"role": "user", "content": message},
    ]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0
    )
    response_message = response["choices"][0]["message"]["content"]
    return response_message

In [None]:
chat('Who is the winner of world cup 2022?')

'Argentina.'

In [None]:
chat('Who is the winner of world cup 2022? answer with details' )

"Argentina won the 2022 FIFA World Cup, defeating France 4-2 on penalties in the final after a 3-3 draw in extra time. It was Argentina's third title and their first since 1986, as well as being the first nation from outside of Europe to win the tournament since 2002. French player Kylian Mbappé became the first player to score a hat-trick in a World Cup final since Geoff Hurst in the 1966 final and won the Golden Boot as he scored the most goals (eight) during the tournament. Argentine captain Lionel Messi was voted the tournament's best player, winning the Golden Ball. Teammates Emiliano Martínez and Enzo Fernández won the Golden Glove, awarded to the tournament's best goalkeeper, and the Young Player Award, awarded to the tournament's best young player, respectively."

In [None]:
chat('Who is the defeated team at the final of world cup 2022?')

'The defeated team at the final of the 2022 FIFA World Cup was France.'

In [None]:
chat('At which stage of the competition the german team was defeated in the world cup 2022?')

'The German team was eliminated from the group stage of the 2022 FIFA World Cup.'

In [None]:
chat('How well did the Moroccan team in the world cup 2022?')

'The Moroccan team advanced to the round of 16 and later reached the semi-finals in the 2022 FIFA World Cup. They became the first African team to win their group since Nigeria in 1998. It was also the first time Morocco advanced to the knockout stage since 1986.'

In [None]:
chat('What are the different teams that Morocco has met during the world cup 2022?')

'Morocco has played against Croatia, Belgium, and Canada during the 2022 FIFA World Cup.'

In [None]:
chat('At which stage of the competition the Moroccan team has been defeated in the world cup 2022 before playing the third place?')

'I could not find an answer.'

# Asking ChatGPT without context

In [None]:
messages = [
    {"role": "user", "content": 'Who is the winner of world cup 2022?'},
]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0
)
response_message = response["choices"][0]["message"]["content"]
response_message

'As an AI language model, I do not have the ability to predict future events. Therefore, I cannot provide an answer to this question.'