# Interactive Introduction to ML and AI with a RAG-System

This tutorial builds a retrieval-augmented generation (RAG) system on top of a PDF containing a starter set of D&D 5e character [sheets](https://dnd5echaractersheet.com/).


## sys admin

Create a .env file with the following content:

`OPENAI_API_KEY="<API_KEY>"`
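
Before running the cells below, it can help to fail early if the key is missing. The following is a minimal sketch using only the standard library. `require_env` is a hypothetical helper name, not part of `python-dotenv` or `openai`, and it assumes the variable has already been loaded into the process environment (for example by `load_dotenv()`).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; assumes the variable was already
    loaded into the environment, e.g. by load_dotenv().
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would then surface a missing or empty key before any paid API call is attempted.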

In [35]:
# required libraries for the tutorial
import openai
import os
from dotenv import load_dotenv
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

## Load PDF data
Load the PDF and split it into chunks.
Each chunk holds at most 1000 characters, and neighbouring chunks overlap by at most 100 characters.

In [36]:
# Split the PDF into overlapping chunks of at most 1000 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
loader = PyPDFLoader("data/StarterSet_Charactersv2.pdf")
chunks = loader.load_and_split(text_splitter)
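
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break at separators such as paragraphs and newlines), and `split_with_overlap` is a hypothetical name used only for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters.

    Simplified illustration only: RecursiveCharacterTextSplitter additionally
    prefers to break at separators such as paragraphs and newlines.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


demo = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each piece starts with the last character of the previous one, which is the overlap. The overlap reduces the chance that a sentence cut at a chunk boundary loses its context.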

### Check the chunks
Get the text of a chunk with `chunks[index].page_content`.

In [37]:
# print(chunks[0])
# print("The chunk contains " + str(len(chunks[0].page_content)) + " characters")

## Setup models

We need two models: an embedding model to vectorise our chunks before storing them in our Chroma database, and a language model to generate answers to our questions.

In [38]:

# Load environment variables from .env file
load_dotenv()

# Access the API key using the variable name defined in the .env file
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialize the OpenAI chat model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)

# initialize the OpenAI embeddings model
embeddings = OpenAIEmbeddings()
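
To make "vectorise" more concrete: an embedding model maps each chunk to a list of numbers, and texts with similar meaning end up with vectors that point in similar directions. The sketch below computes cosine similarity on made-up three-dimensional vectors. Real embeddings have many more dimensions, and the values here are invented for illustration, not produced by `OpenAIEmbeddings`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors standing in for embeddings of three texts.
rogue = [0.9, 0.1, 0.0]
thief = [0.8, 0.2, 0.1]
castle = [0.0, 0.2, 0.9]

# The two related texts score higher than the unrelated pair.
print(cosine_similarity(rogue, thief) > cosine_similarity(rogue, castle))
```

A vector database such as Chroma stores one such vector per chunk and, at query time, returns the chunks whose vectors are closest to the query's vector.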

### Load / Create Chroma DB

We check whether the directory exists for two reasons:
1) We use OpenAI embeddings and pay for every embedding we generate
2) Chroma does not overwrite an existing database, but allows updating it

In [39]:
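if os.path.exists("chroma"):
    print("Loading Chroma from disk...")
    # Reuse the persisted collection; the name must match the one used at creation
    chroma_db = Chroma(persist_directory="chroma",
                       embedding_function=embeddings,
                       collection_name="lc_chroma_demo")
else:
    # First run: embed all chunks and persist them to disk
    chroma_db = Chroma.from_documents(documents=chunks,
                                    embedding=embeddings,
                                    persist_directory="chroma",
                                    collection_name="lc_chroma_demo")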

### Test Your Database

In [40]:
query = "What is this document about?"

Simple Similarity Search

In [41]:
result = chroma_db.similarity_search(query)
print(result)

[Document(page_content='Halfling\xadrogue\xad(criminal),\xadpage\xad2\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 5, 'source': 'data/StarterSet_Charactersv2.pdf'}), Document(page_content='fine\xadclothes,\xadsignet\xadring,\xadscroll \xad \nof pedigree\n*While wearing this armor, you \nhave disadvantage on Dexterity (Stealth)\xadchecks.Lawful neutral\nHuman\xadfighter\xad(noble),\xadpage\xad1\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 0, 'source': 'data/StarterSet_Charactersv2.pdf'}), Document(page_content='delivered in your trances, your god has called you to a new mission. A goblin tribe has made its lair in an ancient ruin now called Cragmaw Castle, where they have defiled a shrine once 
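
The `\xad` sequences in the output above are soft hyphens (U+00AD) left over from the PDF text extraction. In this output they appear where one would expect spaces, so a possible cleanup step, sketched here under that assumption, is to replace them with spaces and collapse repeated whitespace. `clean_text` is a hypothetical helper, not part of LangChain.

```python
import re


def clean_text(text: str) -> str:
    """Replace soft hyphens with spaces and collapse runs of whitespace.

    Assumes soft hyphens stand in for spaces, as the output above suggests.
    Newlines are collapsed to single spaces as well.
    """
    without_soft_hyphens = text.replace("\xad", " ")
    return re.sub(r"\s+", " ", without_soft_hyphens).strip()


sample = "Halfling\xadrogue\xad(criminal),\xadpage\xad2\xadof\xad2\xad TM"
print(clean_text(sample))  # Halfling rogue (criminal), page 2 of 2 TM
```

Such a function could be applied to the `page_content` of each chunk before the chunks are embedded, so that the stored text and the embeddings are based on readable words.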

Similarity Search with Scores

In [42]:
result_with_scores = chroma_db.similarity_search_with_score(query)
print(result_with_scores)

[(Document(page_content='Halfling\xadrogue\xad(criminal),\xadpage\xad2\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 5, 'source': 'data/StarterSet_Charactersv2.pdf'}), 0.49335355180071494), (Document(page_content='fine\xadclothes,\xadsignet\xadring,\xadscroll \xad \nof pedigree\n*While wearing this armor, you \nhave disadvantage on Dexterity (Stealth)\xadchecks.Lawful neutral\nHuman\xadfighter\xad(noble),\xadpage\xad1\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 0, 'source': 'data/StarterSet_Charactersv2.pdf'}), 0.5097083813913008), (Document(page_content='delivered in your trances, your god has called you to a new mission. A goblin tribe has made its lair in an ancient ruin now called Cragmaw 
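
Each entry in the list above is a `(Document, score)` pair. With LangChain's Chroma wrapper the score is a distance, so a lower value means a closer match. The sketch below keeps only matches under a cut-off. It uses plain `(text, score)` tuples as stand-ins for the real results, and both the `filter_by_distance` name and the threshold of 0.5 are assumptions chosen for illustration.

```python
def filter_by_distance(results: list[tuple[str, float]], max_distance: float) -> list[tuple[str, float]]:
    """Keep (text, distance) pairs at or below max_distance, closest first."""
    kept = [pair for pair in results if pair[1] <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Stand-in data: shortened texts with the two scores printed above.
stand_in_results = [
    ("fine clothes, signet ring, scroll of pedigree", 0.5097083813913008),
    ("Halfling rogue (criminal), page 2 of 2", 0.49335355180071494),
]

for text, distance in filter_by_distance(stand_in_results, max_distance=0.5):
    print(f"{distance:.3f}  {text}")
```

A sensible threshold depends on the embedding model and the distance metric, so it is best chosen by looking at the scores of a few known good and bad matches.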

In [43]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=chroma_db.as_retriever())

In [44]:
response = chain.invoke(query)
print(response)

{'query': 'What is this document about?', 'result': 'The document you provided appears to be a character sheet for a halfling rogue (criminal) and a human fighter (noble) in a Dungeons & Dragons game. It includes details about their backgrounds, equipment, alignments, and personal goals.'}
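
The `chain_type="stuff"` setting means that all retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below shows that idea in plain Python. The prompt wording and the `build_stuff_prompt` name are invented for illustration and are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, contexts: list[str]) -> str:
    """Join all retrieved chunks into one context block ahead of the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is this document about?",
    ["Halfling rogue (criminal), page 2 of 2", "Human fighter (noble), page 1 of 2"],
)
print(prompt)
```

This explains why the answer above mentions both the halfling rogue and the human fighter: the model only sees the chunks that the retriever returned, not the whole PDF.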


### Test some queries yourself

In [45]:
def get_response(query: str) -> str:
    # Exercise: combine the steps above. One possible solution is to run the
    # RetrievalQA chain and return only the answer text.
    return chain.invoke(query)["result"]