In [79]:
import numpy as np
import openai
import pandas as pd

COMPLETIONS_MODEL = 'gpt-3.5-turbo'
EMBEDDING_MODEL = 'text-embedding-ada-002'
openai.api_key = '' # FIXME don't post your API key in a public repo 🤪
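
One way to address the FIXME above is to read the key from an environment variable instead of hard-coding it. This is a minimal sketch, assuming the key is stored in a variable named `OPENAI_API_KEY`; the helper `load_api_key` is hypothetical and not part of the `openai` package.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Illustrative helper; the default variable name is an assumption.
    """
    key = os.environ.get(var_name, "")
    if not key:
        # Fail early with a readable message rather than on the first API call.
        raise RuntimeError(f"Set the {var_name} environment variable before running the notebook.")
    return key
```

The result could then be assigned with `openai.api_key = load_api_key()`.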

# Load in the card database

Because we will use a knowledge base to help the model give factual answers, we first need to load it. I have opted to store the knowledge base in a pandas DataFrame.

In [7]:
df = pd.read_csv('../data/embedding.csv')
df = df.set_index(['name'])
print(f'length: {len(df)}')
df.sample(2)

length: 226


name,description
Tome of Fyendal,Tome of Fyendal is a 'Generic Action' card fro...
Nimblism (Blue),Nimblism (Blue) is a 'Generic Action' card fro...


# Create Embeddings

By creating an embedding of each card description in the database, we transform the text describing each card into a vector in a high-dimensional space.

In [8]:
def get_embedding(text: str, model: str=EMBEDDING_MODEL) -> list[float]:
    result = openai.Embedding.create(
      model=model,
      input=text
    )
    return result['data'][0]['embedding']

def compute_doc_embeddings(df: pd.DataFrame) -> dict[str, list[float]]:
    """
    Create an embedding for each row in the dataframe using the OpenAI Embeddings API.

    Return a dictionary that maps the index of each row (the card name) to its embedding vector.
    """
    return {
        idx: get_embedding(r.description) for idx, r in df.iterrows()
    }

# Compute embeddings

In [9]:
document_embedding = compute_doc_embeddings(df)
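
Computing the embeddings makes one API call per card, so it may be worth saving the result and reloading it on later runs. The following is a minimal sketch using only the standard library; the helper names `save_embeddings` and `load_embeddings` and the JSON file format are assumptions, not something the notebook defines.

```python
import json
from pathlib import Path


def save_embeddings(embeddings: dict[str, list[float]], path: str) -> None:
    """Write a {card name: embedding} mapping to a JSON file."""
    Path(path).write_text(json.dumps(embeddings), encoding="utf-8")


def load_embeddings(path: str) -> dict[str, list[float]]:
    """Read a {card name: embedding} mapping written by save_embeddings."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With these helpers, `document_embedding` could be written once with `save_embeddings(document_embedding, path)` and restored with `load_embeddings(path)`, where `path` is any file location of your choosing.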

In [10]:
# An example embedding:
example_entry = list(document_embedding.items())[0]
print(f"{example_entry[0]} : {example_entry[1][:5]}... ({len(example_entry[1])} entries)")

Alpha Rampage : [-0.008852995000779629, -0.020217960700392723, -0.01150753628462553, -0.011460011824965477, -0.007990778423845768]... (1536 entries)


# Find the most similar document

Once the cards have been transformed into vectors, we can measure their relatedness with standard functions such as Euclidean distance or cosine similarity. Because OpenAI embeddings are normalized to length 1, cosine similarity reduces to a plain dot product, which is slightly cheaper to compute and ranks documents in the same order as Euclidean distance (the highest similarity corresponds to the smallest distance).
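
The relationship between the two measures can be checked on toy data. For unit vectors, the squared Euclidean distance equals `2 - 2 * dot(x, y)`, so sorting by descending dot product and by ascending distance gives the same order. The sketch below uses made-up three-dimensional vectors rather than real embeddings, and the helper names are illustrative only.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(x: list[float], y: list[float]) -> float:
    """Dot product; equals cosine similarity when both vectors have length 1."""
    return sum(a * b for a, b in zip(x, y))


def euclidean(x: list[float], y: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy vectors standing in for embeddings (not real model output).
query = normalize([1.0, 2.0, 3.0])
docs = {
    "a": normalize([1.0, 2.0, 2.5]),
    "b": normalize([-1.0, 0.5, 2.0]),
    "c": normalize([3.0, -1.0, 0.2]),
}

# Most similar first: largest dot product, or equivalently smallest distance.
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_dist = sorted(docs, key=lambda k: euclidean(query, docs[k]))
```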

In [11]:
def vector_similarity(x: list[float], y: list[float]) -> float:
    """
    Returns the similarity between two vectors.

    Because OpenAI Embeddings are normalized to length 1, the cosine similarity is the same as the dot product.
    """
    return np.dot(np.array(x), np.array(y))

def order_document_sections_by_query_similarity(query: str, contexts: dict[str, list[float]]) -> list[tuple[float, str]]:
    """
    Find the query embedding for the supplied query, and compare it against all of the pre-calculated document embeddings
    to find the most relevant sections.

    Return the list of document sections, sorted by relevance in descending order.
    """
    query_embedding = get_embedding(query)

    document_similarities = sorted([
        (vector_similarity(query_embedding, doc_embedding), doc_index) for doc_index, doc_embedding in contexts.items()
    ], reverse=True)

    return document_similarities
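
The function above sorts every document even though only the first few results are used later. If the knowledge base grew much larger, one option would be to keep only the top k entries with `heapq.nlargest`. This is a sketch under that assumption; `top_k_similar` is a hypothetical helper that takes an already computed query embedding so that it makes no API call.

```python
import heapq


def top_k_similar(
    query_embedding: list[float],
    contexts: dict[str, list[float]],
    k: int = 2,
) -> list[tuple[float, str]]:
    """Return the k (similarity, name) pairs with the highest dot product."""
    scored = (
        (sum(a * b for a, b in zip(query_embedding, embedding)), name)
        for name, embedding in contexts.items()
    )
    # nlargest returns its results sorted from largest to smallest.
    return heapq.nlargest(k, scored)
```

For the 226 cards used here the full sort is cheap, so this only matters at larger scale.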

In [12]:
order_document_sections_by_query_similarity("In the card game Flesh and Blood, what does the card Ancestral Empowerment do?", document_embedding)[:5]

[(0.8664948512095075, 'Ancestral Empowerment'),
 (0.8032107622899627, 'Emerging Power (Red)'),
 (0.8022488010386051, 'Emerging Power (Blue)'),
 (0.7955817057647661, 'Overpower (Red)'),
 (0.7944881357889741, 'Overpower (Blue)')]

# Add most relevant section to the query prompt

When constructing a prompt, we can compute the similarity between the question and every card in the knowledge base and fetch the most relevant cards. Including the top n most relevant card descriptions as context helps the model give more factual answers.

In [13]:
SEPARATOR = "\n\n* "
ENCODING = "gpt2"  # tokenizer encoding for text-davinci-003 (not used elsewhere in this notebook)

In [14]:
def construct_prompt(question: str, context_embeddings: dict, df: pd.DataFrame) -> str:
    """
    Fetch the card descriptions most relevant to the question and build a prompt that includes them as context.
    """
    most_relevant_document_sections = order_document_sections_by_query_similarity(question, context_embeddings)

    chosen_sections = []
    chosen_sections_len = 0
    chosen_sections_indexes = []

    # Add two most relevant contexts
    for _, section_index in most_relevant_document_sections[:2]:
        document_section = df.loc[section_index]

        chosen_sections.append(SEPARATOR + document_section.description.replace("\r\n", " "))
        chosen_sections_indexes.append(str(section_index))

    # Useful diagnostic information
    print(f"Selected {len(chosen_sections)} document sections:")
    print("\n".join(chosen_sections_indexes))

    header = """Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."\n\nContext:"""

    return header + "".join(chosen_sections) + "\n\n Q: " + question + "\n A:"
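
`construct_prompt` always takes the two highest-ranked sections. If more sections were included, a length budget might be needed to keep the prompt within the model's context window. The sketch below is one possible approach, not what the notebook does: `select_sections_within_budget` is a hypothetical helper, and counting whitespace-separated words is only a rough stand-in for counting tokens.

```python
def select_sections_within_budget(sections: list[str], max_words: int = 200) -> list[str]:
    """Keep sections in ranked order until the word budget would be exceeded.

    Word count is an approximation; a real tokenizer would give different numbers.
    """
    chosen: list[str] = []
    used = 0
    for section in sections:
        cost = len(section.split())
        if used + cost > max_words:
            # Stop at the first section that does not fit, preserving rank order.
            break
        chosen.append(section)
        used += cost
    return chosen
```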

In [15]:
prompt = construct_prompt(
    'In the card game Flesh and Blood, what does the card Ancestral Empowerment do? '
    'Make sure to include information when appropriate for the class, card type, cost, pitch, defence, power, and any abilities.',
    document_embedding,
    df
)

print("===\n", prompt)

Selected 2 document sections:
Ancestral Empowerment
Emerging Power (Blue)
===
 Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:

* Ancestral Empowerment is a 'Ninja – Attack Reaction' card from the 'Welcome to Rathe' set. It costs 0, pitches for 1, defends for 3, has None power, and has the abilities; Target Ninja attack action card gains +1{p}.  Draw a card.

* Emerging Power (Blue) is a 'Guardian Action – Aura' card from the 'Welcome to Rathe' set. It costs 2, pitches for 3, defends for 3, has None power, and has the abilities; **Go again**  At the beginning of your action phase, destroy Emerging Power then the next Guardian attack action card you play this turn gains +1{p}.

 Q: In the card game Flesh and Blood, what does the card Ancestral Empowerment do? Make sure to include information when appropriate for the class, card type, cost, pitch, defence, power, and any abil

# Use the prompt with context

In [35]:
COMPLETIONS_API_PARAMS = {
    # We use temperature of 0.0 because it gives the most predictable, factual answer.
    "temperature": 0.0,
    "max_tokens": 300,
    "model": COMPLETIONS_MODEL,
}

In [72]:
def answer_query(
    query: str,
    dataframe: pd.DataFrame,
    document_embeddings: dict[str, list[float]],
    show_prompt: bool = False,
    use_embedding: bool = True
) -> str:

    if use_embedding:
        query = construct_prompt(
            query,
            document_embeddings,
            dataframe
        )

    if show_prompt:
        print(query)

    response = openai.ChatCompletion.create(
        messages=[{"role": "user", "content": query}],
        **COMPLETIONS_API_PARAMS
    )

    return response['choices'][0]['message']['content']

## Examples

### Card within vector database

In [73]:
answer_query(query='In the card game Flesh and Blood, what does the card '
                   'Ancestral Empowerment do? '
                   'Make sure to include information when appropriate for the '
                   'class, card type, cost, pitch, defence, power, and any abilities.',
             dataframe=df,
             document_embeddings=document_embedding,
             use_embedding=False)

"Ancestral Empowerment is a generic action card in Flesh and Blood. It costs 1 resource point to play and has a pitch value of 2. The card has no defence value and does not deal any damage.\n\nWhen played, Ancestral Empowerment allows the player to draw two cards from their deck. If the player's hero is a Runeblade, they may also reveal a Runeblade card from their hand and put it on top of their deck.\n\nThis card is useful for players who want to quickly cycle through their deck and find the cards they need to win the game. It is especially powerful for Runeblade heroes, who can use it to set up their next turn and ensure they draw the cards they need to execute their strategy."

We can see that when the system is given no context it 'hallucinates' plausible-sounding information that is, unfortunately, completely fabricated.

In [74]:
answer_query(query='In the card game Flesh and Blood, what does the card '
                   'Ancestral Empowerment do? '
                   'Make sure to include information when appropriate for the '
                   'class, card type, cost, pitch, defence, power, and any abilities.',
             dataframe=df,
             document_embeddings=document_embedding,
             use_embedding=True)

Selected 2 document sections:
Ancestral Empowerment
Emerging Power (Blue)


"Ancestral Empowerment is a 'Ninja – Attack Reaction' card from the 'Welcome to Rathe' set. It costs 0, pitches for 1, defends for 3, has None power, and has the abilities; Target Ninja attack action card gains +1{p}. Draw a card."

When provided with the relevant context, the 'hallucination' problem is greatly mitigated. However, the model copies the information in the context verbatim, which is a major limitation of this approach.

### Card not in vector database

In [75]:
answer_query(query='In the card game Flesh and Blood, what does the card '
                   'Underworld Dreams do? '
                   'Make sure to include information when appropriate for the '
                   'class, card type, cost, pitch, defence, power, and any abilities.',
             dataframe=df,
             document_embeddings=document_embedding,
             use_embedding=False)

'Underworld Dreams is a generic card in Flesh and Blood that belongs to the class of Shadow. It is an action card that costs 2 resources to play and requires a pitch of 1 Shadow. The card has no defence value and no power value.\n\nThe ability of Underworld Dreams is as follows: "At the beginning of your end phase, if Underworld Dreams is in your graveyard, you may banish it. If you do, each hero discards a card."\n\nThis ability allows the player to use the card multiple times if it is repeatedly put into the graveyard and then banished. The effect of forcing each hero to discard a card can be a powerful disruption tool, especially if used at the right time to disrupt the opponent\'s strategy.\n\nOverall, Underworld Dreams is a useful card for Shadow decks that want to disrupt their opponent\'s hand and gain an advantage in the game.'

In [76]:
answer_query(query='In the card game Flesh and Blood, what does the card '
                   'Underworld Dreams do? '
                   'Make sure to include information when appropriate for the '
                   'class, card type, cost, pitch, defence, power, and any abilities.',
             dataframe=df,
             document_embeddings=document_embedding,
             use_embedding=True)

Selected 2 document sections:
Sink Below (Red)
Sink Below (Blue)


"I don't know."

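Here we can see vector similarity at work: 'Underworld Dreams' is not in the database, so the retrieval step returns the cards it judges closest, and it is plausible that 'Underworld Dreams' sits relatively close to 'Sink Below' in the embedding space. Since neither retrieved card answers the question, the model follows the prompt's instruction and replies "I don't know."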
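
Retrieval always returns the nearest cards, even when none is truly relevant. One possible refinement is to drop results whose similarity falls below a cut-off before building the prompt. This is a sketch only: `filter_by_similarity` is a hypothetical helper and the 0.85 threshold is an arbitrary illustration. The sample scores are copied from the 'Ancestral Empowerment' ranking shown earlier; the scores for the 'Underworld Dreams' query are not shown in this notebook, so whether a given cut-off would exclude 'Sink Below' is untested.

```python
def filter_by_similarity(
    ranked: list[tuple[float, str]],
    min_similarity: float = 0.85,
) -> list[tuple[float, str]]:
    """Keep only (similarity, name) pairs at or above the threshold."""
    return [(score, name) for score, name in ranked if score >= min_similarity]


# Scores taken from the 'Ancestral Empowerment' ranking earlier in the notebook.
ranked = [
    (0.8664948512095075, "Ancestral Empowerment"),
    (0.8032107622899627, "Emerging Power (Red)"),
    (0.8022488010386051, "Emerging Power (Blue)"),
]
kept = filter_by_similarity(ranked, min_similarity=0.85)
```

With this cut-off only 'Ancestral Empowerment' is kept from the sample ranking.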

### Natural Language

In [78]:
answer_query(query='In the card game Flesh and Blood, what does the card '
                   'Ancestral Empowerment do? '
                   'Make sure to include information when appropriate for the '
                   'class, card type, cost, pitch, defence, power, and any abilities. '
                   'Do not copy the context verbatim but present the information with natural language.',
             dataframe=df,
             document_embeddings=document_embedding,
             use_embedding=True)

Selected 2 document sections:
Ancestral Empowerment
Emerging Power (Blue)


"Ancestral Empowerment is a card in the Flesh and Blood card game that belongs to the Ninja class and is an Attack Reaction card from the 'Welcome to Rathe' set. It costs 0 to play and can be pitched for 1. It has a defence value of 3 and no power. The card has the ability to give a target Ninja attack action card +1{p} and allows the player to draw a card."

Prompt engineering can be used to stop the model from copying the context verbatim and have it describe a card's functionality in a more natural manner. Note that some minor issues remain: symbolic markup such as {p}, which represents the power icon, is passed through rather than expressed in words.