# TTRPG Chatbot

This bot can be turned into almost anything by changing the prompt slightly.  It uses Pinecone to keep track of the conversation as it happens and to retrieve the most relevant parts of that conversation for the bot.  Note that I save a good amount of data (including the vector) to the JSON logs in the logs directory.

Note that Chroma is running in memory.  Do not expect its contents to persist when you restart the chatbot.
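
To make "retrieve the most useful bits" concrete, here is a minimal sketch of similarity search over stored message vectors. It assumes a cosine metric (the metric configured for the Pinecone index is not shown in this notebook) and uses a plain dictionary, `stored`, as a hypothetical stand-in for the index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_messages(query_vector, stored, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    `stored` is a hypothetical in-memory stand-in for the index: {id: vector}.
    """
    ranked = sorted(
        stored,
        key=lambda message_id: cosine_similarity(query_vector, stored[message_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional "embeddings"; the notebook's vectors have 1536 dimensions.
stored = {
    "msg_a": [1.0, 0.0, 0.0],
    "msg_b": [0.9, 0.1, 0.0],
    "msg_c": [0.0, 0.0, 1.0],
}
closest = top_k_messages([1.0, 0.05, 0.0], stored, k=2)
print(closest)
```

A real vector database avoids scoring every stored vector, but the ranking idea is the same.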

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()
import time
import datetime
import json
import re
from tqdm.notebook import tqdm

from PyPDF2 import PdfReader
import pinecone
import openai
import chromadb
from chromadb.config import Settings

openai_api_key = os.getenv("OPENAI_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")

embedding_dimensions = 1536
model_engine = "text-embedding-ada-002"
pinecone_name = "dnd-rules-lawyer"
pinecone_region = "asia-southeast1-gcp"  # Pinecone calls this the "environment"

pinecone.init(api_key=pinecone_api_key, environment=pinecone_region)
index = pinecone.Index(pinecone_name)
openai.api_key = openai_api_key

pdf_directory = "./pdfs/Rules"
bestiary_directory = "./pdfs/Bestiary"

persistant_directory = "./chroma"
client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory=persistant_directory))
collection_name = "dnd_documents"
collection = None

Using embedded DuckDB with persistence: data will be stored in: ./chroma


Below are the helper functions for using the chatbot.

Beware the commented-out code.  Uncommenting and running it will delete your vector database.

# Chroma Helper functions

Beware the commented-out code.  Uncommenting and running it will delete your collection.

In [2]:
#client.delete_collection(collection_name)

In [3]:
def get_file_list(directory_path):
    return [f for f in os.listdir(directory_path) if f.endswith(".pdf")]

In [4]:
def build_key(page_index, sentence_index, prefix=None):
    if prefix is not None:
        return prefix + "_p" + str(page_index) + "_s" + str(sentence_index)
    else:
        return "p" + str(page_index) + "_s" + str(sentence_index)

In [5]:
def tokenize_page_to_sentences(page, page_index, line_min_length=5, name_in_key=None):
    sentences_tuples = list()
    page_text = page.extract_text()
    lines = page_text.splitlines()
    page_text = "\n".join([line for line in lines if len(line.split()) > line_min_length])  # only take lines that are larger than k
    page_text = re.sub(r'E L T[\W\w]*\n', ' ', page_text)  # E L T pattern removed.  used in the pdf for tables
    page_text = re.sub(r'[\n•]', ' ', page_text)  # Replace line breaks and bullet characters with spaces
    page_text = re.sub(r'(?!\.)  (?!\.)', ' ', page_text)  # Remove double spaces in between letters
    sentences = page_text.split('.')  # break into sentences
    sentences = [sentence.strip() + "." for sentence in sentences if len(sentence) > 1]  # re-add "." to the end of the sentence
    for sentence_index, sentence in enumerate(sentences):
        sentences_tuples.append((build_key(page_index, sentence_index, name_in_key), sentence, page_index, sentence_index))
    return sentences_tuples
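
To show the shape of the tuples this step produces, here is a simplified sketch that works on plain text instead of a PyPDF2 page. The function name `split_into_keyed_sentences` and the sample text are made up for illustration, and the table and bullet cleanup is left out.

```python
import re

def split_into_keyed_sentences(page_text, page_index, prefix=None, line_min_length=5):
    # Keep only lines with more than `line_min_length` words (drops headers and page numbers)
    lines = [line for line in page_text.splitlines() if len(line.split()) > line_min_length]
    text = " ".join(lines)
    text = re.sub(r" {2,}", " ", text)  # collapse runs of spaces
    # Naive split on "."; this also splits abbreviations and decimals
    sentences = [s.strip() + "." for s in text.split(".") if len(s.strip()) > 1]
    keyed = []
    for sentence_index, sentence in enumerate(sentences):
        key = f"p{page_index}_s{sentence_index}"
        if prefix is not None:
            key = f"{prefix}_{key}"
        keyed.append((key, sentence, page_index, sentence_index))
    return keyed

sample_page = (
    "Chapter 9\n"
    "A check is a roll of the d20 plus modifiers. Compare the result to the DC.\n"
    "454\n"
)
rows = split_into_keyed_sentences(sample_page, page_index=3, prefix="rules.pdf")
for row in rows:
    print(row)
```

Each tuple carries the key, the sentence, the page index, and the sentence index, which is what later lets the bot look up neighbouring sentences.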

In [6]:
def pdf_to_document_tuples(file_path, line_min_length=5, name_in_key=None):
    sentences_tuples = []
    
    pdf_reader = PdfReader(file_path)
    pages = pdf_reader.pages
    for page_index, page in tqdm(enumerate(pages)):
        new_tuples = tokenize_page_to_sentences(page, page_index, line_min_length, name_in_key=name_in_key)
        for t in new_tuples:
            sentences_tuples.append(t)
        
    return sentences_tuples

In [7]:
def get_column_from_tuples(tuples_list, column_index):
    return [tuples[column_index] for tuples in tuples_list]

In [8]:
def vectorize_pdfs_in_directory_to_chroma(directory_path, collection):
    files = get_file_list(directory_path)
    
    for file in tqdm(files):
        # Get all documents in directory, tokenize them into sentences for embedding
        document_tuples = pdf_to_document_tuples(os.path.join(directory_path,file), name_in_key=file)
        # I pass a tuple for the extra metadata.  This used to be in the id of the call and required additional parsing after query.
        document_keys = get_column_from_tuples(document_tuples, 0)
        document_sentences = get_column_from_tuples(document_tuples, 1)
        document_page_index = get_column_from_tuples(document_tuples, 2)
        document_sentence_index = get_column_from_tuples(document_tuples, 3)
        # Create metadata
        metadata = [{"file_name":file, "page_index":document_page_index[i], "sentence_index":document_sentence_index[i]} for i in range(len(document_page_index))]
        # Add to chroma collection
        collection.add(documents=document_sentences, metadatas=metadata, ids=document_keys)
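
The column extraction and metadata construction above can also be written with `zip`. This is only a sketch of an alternative; `split_columns` is a hypothetical helper and does not call Chroma.

```python
def split_columns(document_tuples, file_name):
    """Turn (key, sentence, page_index, sentence_index) tuples into parallel lists."""
    if not document_tuples:
        return [], [], []
    # zip(*rows) transposes rows into columns
    keys, sentences, page_indexes, sentence_indexes = zip(*document_tuples)
    metadata = [
        {"file_name": file_name, "page_index": page, "sentence_index": sentence}
        for page, sentence in zip(page_indexes, sentence_indexes)
    ]
    return list(keys), list(sentences), metadata

example = [
    ("rules.pdf_p0_s0", "First sentence.", 0, 0),
    ("rules.pdf_p0_s1", "Second sentence.", 0, 1),
]
ids, documents, metadatas = split_columns(example, "rules.pdf")
print(ids, documents, metadatas)
```

The three lists line up by position, which is the form the `collection.add` call above is given.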

# Embedding to chroma.

In [10]:
# Take all documents in a directory and add it to the chroma vector database
collection = client.get_or_create_collection(collection_name)
vectorize_pdfs_in_directory_to_chroma(pdf_directory, collection)

No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction



# Chatbot Helper Functions

In [12]:
#results = index.query(vector=[0 for i in range(1536)], top_k=1000)
#print(results)
#index.delete([x["id"] for x in results["matches"]])

In [13]:
def create_metadata(content, user, vector):
    create_time = time.time()
    return {
        "Timestamp": create_time,
        "User": user,
        "Message": content,
        "Vector": vector
    }

In [14]:
def create_message_hash(content):
    return str(hash(content))
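
One caveat: Python salts the built-in `hash()` of strings per interpreter process by default, so the same message can get a different id after a restart. If stable ids matter, a content digest is one option. The sketch below uses `hashlib`; `create_stable_message_id` is a hypothetical name, not part of the notebook.

```python
import hashlib

def create_stable_message_id(content):
    """Hex SHA-256 digest of the message text; identical across interpreter runs."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

message_id = create_stable_message_id("Can I flank with a ranged weapon?")
print(message_id)
```

As with `hash()`, two identical messages map to the same id, so a repeated message overwrites its earlier log file.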

In [15]:
def get_openai_embeddings(content, engine="text-embedding-ada-002"):
    content = content.encode(encoding="ASCII", errors="ignore").decode()  # fix unicode errors
    response = openai.Embedding.create(input=content, engine=engine)
    vector = response['data'][0]['embedding']
    return vector

In [16]:
def openai_completion(prompt, model="text-davinci-003", temperature=0, top_p=1.0, max_tokens=400, freq_pen=0.0, pres_pen=0.0, max_retry=5):
    retry = 0
    prompt = prompt.encode(encoding="ASCII", errors="ignore").decode()  # fix unicode errors
    while True:
        try:
            response = openai.Completion.create(
                model=model,
                prompt=prompt,
                temperature=temperature,
                max_tokens=max_tokens,
                top_p=top_p,
                frequency_penalty=freq_pen,
                presence_penalty=pres_pen
            )
            text = response["choices"][0]["text"].strip()
            text = re.sub(r'[\r\n]+', "\n", text)
            text = re.sub(r'[\t]+', " ", text)
            return text

        except Exception as e:
            retry += 1
            if retry >= max_retry:
                return f"GPT3 error: {e}"
            print("Error in communication with openai.")
            time.sleep(1)
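
The retry loop in `openai_completion` waits a fixed second between attempts and returns the error text as if it were a reply, which the main loop would then embed and store. A possible variation, sketched here without any OpenAI calls, is to double the wait after each failure and re-raise the last error. `call_with_retries` and its parameters are hypothetical names.

```python
import time

def call_with_retries(func, max_retry=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_retry + 1):
        try:
            return func()
        except Exception as error:
            if attempt == max_retry:
                raise  # let the caller decide what to do with the final failure
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f}s")
            sleep(delay)

# Demonstration with a function that fails twice, then succeeds.
# The sleep function is replaced by a recorder so the demo does not actually wait.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

recorded_waits = []
result = call_with_retries(flaky_call, sleep=recorded_waits.append)
print(result, recorded_waits)
```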

In [17]:
def load_json(filepath):
    with open(filepath, "r", encoding="utf-8") as open_file:
        return json.load(open_file)

In [18]:
def load_conversation(results):
    result = list()
    for m in results["matches"]:
        if m['id'] == "GPT3 error: Unrecognized request argument supplied: stops": # For some reason pinecone will not remove this from the database
            continue
        info = load_json(f'./logs/{m["id"]}.json')
        result.append(info)
    ordered = sorted(result, key=lambda d: d['Timestamp'], reverse=False)
    messages = [i["Message"] for i in ordered]
    return "\n".join(messages).strip()
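
`load_conversation` assumes every id returned by the index has a log file. A more forgiving variant, sketched below under the assumption that logs are named `<id>.json` as in this notebook, skips ids with no file and then orders the rest by timestamp. `load_conversation_safely` is a hypothetical name, and it takes a list of ids rather than the Pinecone result object.

```python
import json
import os
import tempfile

def load_conversation_safely(match_ids, logs_dir):
    """Join logged messages in chronological order, skipping ids with no log file."""
    entries = []
    for match_id in match_ids:
        path = os.path.join(logs_dir, f"{match_id}.json")
        if not os.path.isfile(path):
            continue  # stale id in the index with no matching log
        with open(path, "r", encoding="utf-8") as log_file:
            entries.append(json.load(log_file))
    entries.sort(key=lambda entry: entry["Timestamp"])
    return "\n".join(entry["Message"] for entry in entries).strip()

# Demonstration with two throwaway log files and one id that has no file
with tempfile.TemporaryDirectory() as logs_dir:
    for name, stamp, text in [("b", 2.0, "second"), ("a", 1.0, "first")]:
        with open(os.path.join(logs_dir, f"{name}.json"), "w", encoding="utf-8") as f:
            json.dump({"Timestamp": stamp, "Message": text}, f)
    conversation = load_conversation_safely(["b", "missing", "a"], logs_dir)
print(conversation)
```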

In [19]:
def get_prompt(file_name):
    with open(file_name) as open_file:
        return open_file.read()

In [20]:
def save_json(filepath, content):
    with open(filepath + ".json", "w+") as f:
        json.dump(content, f)

In [22]:
def load_page(directory_path, file_name, page_number):
    pdf_reader = PdfReader(os.path.join(directory_path, file_name))
    return pdf_reader.pages[page_number]

In [29]:
def get_close_sentences(directory_path, file_name, page_number, sentence_number, buffer_size=1):
    pdf_reader = PdfReader(os.path.join(directory_path, file_name))
    pages = pdf_reader.pages
    sentences_tuples = tokenize_page_to_sentences(pages[page_number], page_number)
    target_index = sentence_number

    # Too close to the end of the page: borrow leading sentences from the next page, if there is one
    if sentence_number + buffer_size >= len(sentences_tuples) and page_number + 1 < len(pages):
        next_sentence_tuples = tokenize_page_to_sentences(pages[page_number + 1], page_number + 1)
        sentences_tuples.extend(next_sentence_tuples[:buffer_size])

    # Too close to the start of the page: borrow trailing sentences from the previous page, if there is one
    if sentence_number - buffer_size < 0 and page_number > 0:
        previous_sentence_tuples = tokenize_page_to_sentences(pages[page_number - 1], page_number - 1)
        borrowed = previous_sentence_tuples[-buffer_size:]
        sentences_tuples = borrowed + sentences_tuples
        target_index += len(borrowed)  # prepending shifts the position of the target sentence

    start = max(target_index - buffer_size, 0)  # never let the slice start go negative
    return sentences_tuples[start:target_index + buffer_size + 1]
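
The idea behind `get_close_sentences` is a window of neighbouring sentences that can spill onto the adjacent page. Here is a minimal sketch of that idea on plain lists, with no PDF involved. `sentence_window` and the sample pages are made up for illustration, and flattening the whole document is only practical for small inputs.

```python
def sentence_window(pages, page_number, sentence_number, buffer_size=1):
    """Return the target sentence with up to buffer_size neighbours on each side.

    `pages` is a list of pages, each a list of sentences (a stand-in for the
    tokenized PDF). Neighbours may come from an adjacent page.
    """
    # Flatten to one sequence while remembering where the target lands
    flat = []
    target = None
    for index, sentences in enumerate(pages):
        if index == page_number:
            target = len(flat) + sentence_number
        flat.extend(sentences)
    if target is None or not 0 <= sentence_number < len(pages[page_number]):
        raise IndexError("page or sentence index out of range")
    start = max(target - buffer_size, 0)  # clamp at the start of the document
    return flat[start:target + buffer_size + 1]

pages = [["A.", "B."], ["C.", "D.", "E."], ["F."]]
print(sentence_window(pages, page_number=1, sentence_number=0))
print(sentence_window(pages, page_number=0, sentence_number=0))
print(sentence_window(pages, page_number=2, sentence_number=0))
```

The first sentence of the document and the last one simply get a shorter window instead of wrapping around.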

In [24]:
def gather_close_context(directory_path, collection_query):
    close_context = list()
    # Chroma returns one list of results per query text; only one query is sent, so read index 0
    for metadata in collection_query["metadatas"][0]:
        close_sentences = get_close_sentences(
            directory_path, metadata["file_name"], metadata["page_index"], metadata["sentence_index"], buffer_size=1
        )
        close_context.append("  ".join([sentence[1] for sentence in close_sentences]))
    return "\n".join(close_context)

# Prompting

Here is the interesting part of the bot.  The prompt below can be changed to fit your needs, so you can create chatbots that answer questions about a specific subject.

In [25]:
get_prompt("prompt.txt")

'The Rule Lich: I am a chatbot named The Rule Lich.  My goals are to help you make the right choice in the Table Top Role-Playing games.  I will read the recent messages and then I will provide a long, detailed answer, using the additional information from the books.\n\nADDITIONAL BOOK INFORMATION:\n\n<ADDITIONAL_INFORMATION>\n\nPREVIOUS CONVERSATION:\n\n<PREVIOUS_CONVERSATION>\n\nUSER: <MESSAGE>\n\n\n\n'
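
The three placeholders in the prompt get filled in before each request. A small sketch of that substitution follows; `fill_prompt` is a hypothetical helper and the template here is a shortened stand-in for prompt.txt.

```python
def fill_prompt(template, additional_information, previous_conversation, message):
    """Substitute the three placeholders used by the prompt template."""
    replacements = {
        "<ADDITIONAL_INFORMATION>": additional_information,
        "<PREVIOUS_CONVERSATION>": previous_conversation,
        "<MESSAGE>": message,
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template

# Shortened stand-in for the contents of prompt.txt
template = (
    "ADDITIONAL BOOK INFORMATION:\n\n<ADDITIONAL_INFORMATION>\n\n"
    "PREVIOUS CONVERSATION:\n\n<PREVIOUS_CONVERSATION>\n\n"
    "USER: <MESSAGE>\n"
)
prompt = fill_prompt(
    template,
    additional_information="Flanking rules text from the book.",
    previous_conversation="USER: Hello",
    message="How does flanking work?",
)
print(prompt)
```

To repurpose the bot, change the text around the placeholders and keep the placeholders themselves intact.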

# Running the bot

In [30]:
user = "USER"
context_top_k = 3
top_k = 15
prompt_file = "prompt.txt"
logs_path = "./logs/"
pdf_directory = "./pdfs/Rules"

print("Welcome to the TTRPG chatbot.  Ask a ruling here!")
while True:
    payload = list()
    
    # Message Meta
    message = input("\n\n USER: ")
    message_hash = create_message_hash(message)

    # Create embedding of new message
    message_vector = get_openai_embeddings(message)
    
    # Save metadata about vector
    metadata = create_metadata(message, user, message_vector)
    save_json(logs_path + str(message_hash), metadata)
    
    # Append to payload for later indexing
    # Send to Pinecone after gpt message
    payload.append((message_hash, message_vector))
    
    # Query Chroma for additional rules from the books of the game
    if collection is not None:
        collection_query = collection.query(query_texts=[message], n_results=context_top_k)
        close_context = gather_close_context(pdf_directory, collection_query)
    else:
        close_context = ""
    
    # Query Pinecone for additional information from your messages or the chatbots
    results = index.query(vector=message_vector, top_k=top_k)
    conversation = load_conversation(results)
    prompt = get_prompt(prompt_file).replace("<PREVIOUS_CONVERSATION>", conversation).replace("<MESSAGE>", message).replace("<ADDITIONAL_INFORMATION>", close_context)
    # print(prompt)
    
    # Generate the response from the large lang model
    output = openai_completion(prompt)
    output_hash = create_message_hash(output)
    
    # Embed the output
    output_vector = get_openai_embeddings(output)

    # Save the output metadata
    metadata = create_metadata(output, "The Rule Lich", output_vector)
    save_json(logs_path + str(output_hash), metadata)
    
    # Append to the payload the response from gpt
    payload.append((output_hash, output_vector))
    
    # Upsert to the pinecone database
    index.upsert(payload)
    
    # Print response to the message
    print(f'\n {output}')
    

Welcome to the TTRPG chatbot.  Ask a ruling here!




 USER:  I am playing pathifnder 2e, does that change anything?


The Rule Lich: I am a chatbot named The Rule Lich.  My goals are to help you make the right choice in the Table Top Role-Playing games.  I will read the recent messages and then I will provide a long, detailed answer, using the additional information from the books.

ADDITIONAL BOOK INFORMATION:

Or perhaps you’re a brand new GM and looking for guidance to feel comfortable leading a game of your own.  Maybe you’ve been a Game Master for years, but this is your first time running a Pathfinder game.  No matter where you are as a Game Master, this book is a valuable tool that can help you tell the stories you want to tell with your players.
The rules for conditions are summarized on page 454 and described in full on pages 618–623.  Anything you do in the game has an effect.  Many of these outcomes are easy to adjudicate during the game.
The role of Game Master comes with the responsibility of ensuring you and the rest of the players have a rewarding, fun time during the game.  Games can d



 USER:  what about if I was playing in d20 moderen


The Rule Lich: I am a chatbot named The Rule Lich.  My goals are to help you make the right choice in the Table Top Role-Playing games.  I will read the recent messages and then I will provide a long, detailed answer, using the additional information from the books.

ADDITIONAL BOOK INFORMATION:

457–458 d4, d6, d8, d10, d12, d20, and d% Notations for different sizes of dice.  “d20” is a daily preparations During your morning preparations, you ready your gear, prepare spells, and otherwise get ready for your adventuring day.  480, 500 damage Damage dealt to a creature reduces that creature’s Hit Points on a 1-to-1 basis.
Add up all the various modifiers, bonuses, and penalties you identified in Step 1—this is your total modifier.  Next add that to the number that came up on your d20 roll.  This total is your check result.
You might use either the Society skill or a Lore skill you have that’s relevant to the task, and the DC depends on how common the knowledge of the cousin’s name might



 USER:  ok.  do you know anything about the fallout ttrpg?


The Rule Lich: I am a chatbot named The Rule Lich.  My goals are to help you make the right choice in the Table Top Role-Playing games.  I will read the recent messages and then I will provide a long, detailed answer, using the additional information from the books.

ADDITIONAL BOOK INFORMATION:

While raging, the badger is affected in the following ways.  It deals 4 additional damage with its bite a ttacks and 2 additional damage with its claw attacks.  It takes a – 1 penalty to AC.
Since then, individual pieces have turned up throughout the multiverse.  When you pick up a shot of the First Vault , it immediately reshapes itself to function with any ranged weapon and establishes you as its owner until another creature picks it up.  As its owner, you can use the shot’s  single ‑action activation after shooting it.
Activate [three-actions] Interact; Requirements The shot is loaded in your ranged weapon, or at hand if your ranged weapon has a reload of 0; Effect You line up a perfectly a

KeyboardInterrupt: Interrupted by user