# RAG

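This walkthrough follows [this demo](https://github.com/LangGraph-GUI/LangGraph-learn/tree/main/07%20ollama%20RAG) to build a small Retrieval-Augmented Generation (RAG) system: we retrieve the text chunks most relevant to a question from a vector DB and hand them to an LLM as context.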

## Prerequisites

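1. A running `ollama` server
2. `langchain-ollama`
3. `langchain-core`
4. `chromadb`: [Chroma](https://docs.trychroma.com/docs/overview/introduction) is a vector DB.
5. `langchain`: provides the `RecursiveCharacterTextSplitter` imported below (the same class is also available from the `langchain-text-splitters` package as `langchain_text_splitters`).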

In [1]:
from langchain_ollama import ChatOllama
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
import chromadb

In [2]:
llm_name = 'deepseek-r1:7b'
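# NOTE: embeddings produced by some chat LLMs do not work well for similarity search in a vector DB like Chroma.
# Set manual_embedding = True to embed with llm_name; when False, Chroma's default embedding function is used.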
manual_embedding = False

## Preprocessing External Files

We will work on one external markdown file for this demo.

In [3]:
def load_file_content(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()

In [4]:
file_path = 'files/elden_ring.md'
file_content = load_file_content(file_path)

First, we use a text splitter to slice the file content into text chunks.

In [5]:
# text splitter
# https://python.langchain.com/docs/how_to/recursive_text_splitter/
# https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html
# https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)  # unit: number of chars

In [6]:
# split the document into chunks
chunks = text_splitter.split_text(file_content)

In [7]:
# debug sliced chunks
print([len(chunk) for chunk in chunks])
if len(chunks) >= 2:
    display(chunks[0][-100:])
    print('<SEPARATOR>')
    display(chunks[1][:100])

[365, 442, 488, 485, 388, 399, 435, 326, 316, 363, 285, 257, 296, 332, 414, 420, 477, 329, 261, 430, 392]


'gameplay FromSoftware is renowned for with a vast, open-world experience.\n\n---\n\n## Story and Setting'

<SEPARATOR>


'---\n\n## Story and Setting\n\n**Elden Ring** is set in the mystical world of the Lands Between, where t'
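The two snippets above end and start with the same text, which is the effect of `chunk_overlap`. As a rough illustration of that idea only, here is a minimal fixed-window splitter. It is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (the real splitter also tries to break on separators such as blank lines), and `split_with_overlap` is a hypothetical helper name.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Slice text into fixed-size windows; neighbours share `chunk_overlap` chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


demo_chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(demo_chunks)
# the tail of one chunk equals the head of the next
print(demo_chunks[0][-4:], demo_chunks[1][:4])
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one of the two neighbouring chunks.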

Next, we set up an embedding model, which maps natural-language text chunks to vectors in an embedding space. The chunks themselves are embedded later, when we add them to the vector DB.

In [8]:
# embedding model
# https://python.langchain.com/api_reference/ollama/embeddings/langchain_ollama.embeddings.OllamaEmbeddings.html
embed_model = OllamaEmbeddings(model=llm_name) if manual_embedding else None

In [9]:
# debug embedding model
if manual_embedding and len(chunks) > 0:
    embedded = embed_model.embed_documents([chunks[0]])
    print(len(embedded), len(embedded[0]))
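To make "embedding space" more concrete, the sketch below uses a toy word-count embedding and cosine similarity. This is an illustration under strong assumptions: `toy_embed`, `cosine_similarity`, and the tiny vocabulary are hypothetical, and a real embedding model returns dense learned vectors rather than word counts. The point is only that texts about the same topic end up with similar vectors.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str, vocabulary: List[str]) -> List[float]:
    """Toy embedding: one dimension per vocabulary word, valued by word count."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


toy_vocabulary = ["boss", "combat", "multiplayer", "soundtrack"]
toy_query = toy_embed("boss combat strategy", toy_vocabulary)
toy_doc_a = toy_embed("boss combat requires strategy", toy_vocabulary)
toy_doc_b = toy_embed("multiplayer supports co-op", toy_vocabulary)

print(cosine_similarity(toy_query, toy_doc_a))  # close to 1: same topic
print(cosine_similarity(toy_query, toy_doc_b))  # 0: no shared vocabulary words
```

As in the debug cell above, every text maps to a vector of the same length, which is what lets a query vector be compared against document vectors.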

Now we create a ChromaDB client and initialize a collection to store the embedded chunks.

In [10]:
# Chroma Ephemeral Client (in-memory)
# https://docs.trychroma.com/docs/run-chroma/ephemeral-client
# https://docs.trychroma.com/reference/python/client#client
chroma_client = chromadb.Client()

In [11]:
# Chroma Collection
# https://docs.trychroma.com/docs/collections/create-get-delete
# https://docs.trychroma.com/reference/python/client#createcollection
collection = chroma_client.create_collection(name='docs')

In [12]:
# add embedded chunks to the Chroma collection
# https://docs.trychroma.com/docs/collections/add-data
# https://docs.trychroma.com/reference/python/collection#add
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    embeddings=embed_model.embed_documents(chunks) if manual_embedding else None,  # embed_documents already returns a list of vectors
    documents=chunks
)

## Define the Execution Chain Including LLM

In [13]:
# initialize the ChatOllama model with desired parameters
# https://pypi.org/project/langchain-ollama/
# https://python.langchain.com/docs/integrations/chat/ollama/
# https://python.langchain.com/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html
llm = ChatOllama(model=llm_name)

In [14]:
# define a prompt template
template = """
I am helping the user to maintain a knowledge base - some local files.
I will answer the user's questions strictly based on the content retrieved from the local files.
Current question: {question}
"""
prompt = PromptTemplate.from_template(template)

In [15]:
# define a langchain pipeline - execution chain
llm_chain = prompt | llm | StrOutputParser()
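The `|` operator composes the three steps so that the output of one becomes the input of the next. The sketch below mimics that idea with Python's `__or__` hook. It is a conceptual illustration only, not LangChain's actual implementation; `Step`, `fill_template`, `fake_llm`, and `parse_output` are hypothetical stand-ins for the prompt, the model, and the output parser.

```python
class Step:
    """A callable pipeline stage that can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b yields a new Step that runs a first, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# stand-ins for prompt, llm, and StrOutputParser (no real model is called)
fill_template = Step(lambda inputs: f"Current question: {inputs['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
parse_output = Step(lambda message: message["content"])

toy_chain = fill_template | fake_llm | parse_output
print(toy_chain.invoke({"question": "What is Elden Ring?"}))
```

Note that the first stage expects the template variables as a dict, which is also the natural input for the real `llm_chain`.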

## Specify RAG Logic

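Now we show how a RAG system is actually implemented: embed the question, retrieve the most similar chunks from the collection, and prepend them to the question before calling the LLM. We define this logic in a single function.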

In [16]:
test_question = 'Describe the boss combat strategy of this game?'

In [17]:
# debug embed_query: the embedded vector length should match the one in embed_documents
if manual_embedding:
    test_embedded_query = embed_model.embed_query(test_question)
    print(len(test_embedded_query))

In [18]:
# debug collection.query
# https://docs.trychroma.com/docs/querying-collections/query-and-get
# https://docs.trychroma.com/reference/python/collection#query
# how distances are calculated: https://docs.trychroma.com/docs/collections/configure
test_results = collection.query(
    query_embeddings=[test_embedded_query] if manual_embedding else None,
    query_texts=[test_question] if not manual_embedding else None,
    n_results=3
)
print(test_results.keys())
print('ids:', test_results['ids'])
print('distances:', test_results['distances'])
# results are nested per query; index 0 is our only query, then 0 is its top hit
print(test_results['documents'][0][0])

dict_keys(['ids', 'embeddings', 'documents', 'uris', 'data', 'metadatas', 'distances', 'included'])
ids: [['9', '8', '12']]
distances: [[1.0183849334716797, 1.0212794542312622, 1.2001222372055054]]
**Boss Fights:**
- **Strategies**: Each boss in Elden Ring has unique mechanics and requires specific strategies to defeat. Learning these patterns and adapting your approach is key to success.
- **Spirit Summons**: Utilizing spirit summons can provide valuable assistance in tough fights, drawing aggro or dealing damage.

---

## Multiplayer and Online Features
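The result above is nested (one inner list per query) and sorted by ascending distance. The sketch below imitates that shape with a tiny in-memory store that ranks by squared L2 distance, which, as far as we know, is Chroma's default metric (see the configuration link in the cell above). `ToyCollection` is a hypothetical class for illustration only: it does a brute-force scan, whereas Chroma uses an approximate nearest-neighbour index.

```python
from typing import Dict, List


class ToyCollection:
    """Brute-force vector store mimicking the shape of a collection query result."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[List[float]] = []
        self._documents: List[str] = []

    def add(self, ids: List[str], embeddings: List[List[float]], documents: List[str]) -> None:
        self._ids.extend(ids)
        self._vectors.extend(embeddings)
        self._documents.extend(documents)

    def query(self, query_embeddings: List[List[float]], n_results: int = 3) -> Dict[str, list]:
        result: Dict[str, list] = {"ids": [], "distances": [], "documents": []}
        for query in query_embeddings:
            # squared L2 distance from the query to every stored vector
            scored = sorted(
                (sum((q - v) ** 2 for q, v in zip(query, vector)), index)
                for index, vector in enumerate(self._vectors)
            )[:n_results]
            result["ids"].append([self._ids[i] for _, i in scored])
            result["distances"].append([dist for dist, _ in scored])
            result["documents"].append([self._documents[i] for _, i in scored])
        return result


toy_collection = ToyCollection()
toy_collection.add(
    ids=["0", "1", "2"],
    embeddings=[[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]],
    documents=["chunk a", "chunk b", "chunk c"],
)
toy_results = toy_collection.query(query_embeddings=[[1.0, 1.0]], n_results=2)
print(toy_results["ids"], toy_results["distances"])
```

Smaller distances mean closer matches, so the first id in each inner list is the best hit for that query.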


In [19]:
# answer questions using RAG
def answer(question: str) -> str:
    # embed query
    if manual_embedding:
        embedded_query = embed_model.embed_query(question)
    # retrieve the most relevant document chunks
    search_res = collection.query(
        query_embeddings=[embedded_query] if manual_embedding else None,
        query_texts=[question] if not manual_embedding else None,
        n_results=3
    )
    search_documents = search_res['documents'][0]
    combined_docs = ''
    for i in range(len(search_documents)):
        combined_docs += f'Chunk {i}:\n{search_documents[i]}'
        if i != len(search_documents) - 1:
            combined_docs += '\n'
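    # add RAG results to the initial question
    enhanced_question = f'Relevant text chunks retrieved from the knowledge base:\n{combined_docs}\nAnswer this question: {question}'
    # print the fully formatted prompt for debugging
    print(prompt.format(question=enhanced_question))
    # execute: pass the template variable rather than the already formatted prompt,
    # otherwise the chain's prompt step would apply the template a second time
    generation = llm_chain.invoke({'question': enhanced_question})
    return generation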
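The prompt-assembly part of `answer` does not need a model or a vector DB, so it can be factored out and checked on its own. The following is one possible refactoring, not part of the original demo; `format_chunks` and `build_enhanced_question` are hypothetical helper names that reproduce the same string layout as the loop above.

```python
from typing import List


def format_chunks(documents: List[str]) -> str:
    """Label each retrieved chunk and join them with newlines."""
    return "\n".join(f"Chunk {i}:\n{doc}" for i, doc in enumerate(documents))


def build_enhanced_question(question: str, documents: List[str]) -> str:
    """Prepend the retrieved chunks to the user's question."""
    return (
        "Relevant text chunks retrieved from the knowledge base:\n"
        f"{format_chunks(documents)}\n"
        f"Answer this question: {question}"
    )


print(build_enhanced_question("What is Elden Ring?", ["first chunk", "second chunk"]))
```

Keeping this step pure makes it easy to inspect exactly what context the LLM receives for a given set of retrieved chunks.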

## Test

### Test 0

In [20]:
res = answer('What is Elden Ring?')


I am helping the user to maintain a knowledge base - some local files.
I will answer the user's questions strictly based on the content retrieved from the local files.
Current question: Relevant text chunks retrieved from the knowledge base:
Chunk 0:
---

# Conclusion

Elden Ring stands as a monumental achievement in the action RPG genre. Its vast open world, intricate lore, and challenging gameplay offer an unparalleled experience for fans and newcomers alike. Whether you're delving into the lore, mastering combat mechanics, or exploring every nook and cranny of the Lands Between, Elden Ring provides a rich, rewarding adventure that sets a new standard for the genre.

---
Chunk 1:
# Elden Ring: A Comprehensive Guide

**Elden Ring** is an action RPG developed by FromSoftware and published by Bandai Namco Entertainment. Directed by Hidetaka Miyazaki, in collaboration with George R. R. Martin, the game blends the intricate lore and challenging gameplay FromSoftware is renowned for with 

In [21]:
print(res)

<think>
Okay, so I need to figure out what Elden Ring is based on the provided knowledge base. Let me start by looking through the chunks.

Chunk 0 is titled "Conclusion" about Elden Ring. It says it's a monumental action RPG with a vast open world and intricate lore. The gameplay feels challenging and sets a new standard for the genre, involving exploring different areas and mastering combat mechanics.

Chunk 1 is a Comprehensive Guide that states Elden Ring is an action RPG developed by FromSoftware and published by Bandai Namco Entertainment. It's directed by Hidetaka Miyazaki, in collaboration with George R.R. Martin. This chunk gives the game's developer, publisher, and director along with its collaborative author.

Chunk 2 starts but doesn't finish the section on "Story and Setting." However, from Chunk 3 onwards, it seems to discuss Gameplay Mechanics. The previous answer explains that Elden Ring is an action RPG where players assume roles as Tarnished exiles to restore the Elde
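As the output shows, `deepseek-r1` prints its reasoning inside a `<think>` block before the final answer. If only the answer is wanted, the block can be removed afterwards. The sketch below assumes the reasoning is closed by a matching `</think>` tag (the truncated output above does not show it); `strip_reasoning` is a hypothetical helper, not part of the original demo.

```python
import re

# non-greedy match of one <think>...</think> block, including newlines inside it
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(generation: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace from model output."""
    return THINK_BLOCK.sub("", generation).strip()


sample_generation = "<think>\nLet me look at the chunks.\n</think>\n\nElden Ring is an action RPG."
print(strip_reasoning(sample_generation))
```

Text without a `<think>` block passes through unchanged apart from trimmed whitespace.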

### Test 1

In [22]:
res = answer('Does the game support multiplayer mode?')


I am helping the user to maintain a knowledge base - some local files.
I will answer the user's questions strictly based on the content retrieved from the local files.
Current question: Relevant text chunks retrieved from the knowledge base:
Chunk 0:
---

## Multiplayer and Online Features

**Co-op and PvP:**
- **Co-op**: Players can summon others to help with tough areas or bosses, making the game more manageable.
- **PvP**: Competitive multiplayer allows players to invade others' worlds, adding an element of risk and excitement.
Chunk 1:
**Special Features:**
- **Spirit Ashes and Summons**: Players can summon spirits to assist in combat, each with unique abilities and roles.
- **Mounted Combat**: Fighting on horseback adds a new layer of strategy.
- **Multiplayer**: Both cooperative and competitive multiplayer modes are available, allowing players to interact, fight, or assist each other.

---

## World and Environment
Chunk 2:
**Asynchronous Features:**
- **Bloodstains**: Viewing b

In [23]:
print(res)

<think>
Okay, so I'm trying to figure out whether the game supports multiplayer mode based on the given knowledge base. Let me go through the chunks provided step by step.

First, looking at Chunk 0 titled "Multiplayer and Online Features." It mentions both Co-op and PvP. Co-op allows players to summon others for tough areas or bosses, making it easier. PvP is competitive, letting players invade others' worlds, adding risk and excitement. So that's two multiplayer modes: Co-op and PvP.

Then in Chunk 1 under "Special Features," it again lists Multiplayer along with other features like Spirit Ashes, Mounted Combat, etc. It specifically mentions both cooperative (Co-op) and competitive (PvP) multiplayer modes. So this reinforces the previous point from Chunk 0.

Chunk 2 talks about Asynchronous Features with Messages and Bloodstains but doesn't mention anything specific to multiplayer beyond what's already covered in Chunks 0 and 1.

Putting it all together, both chunks 0 and 1 consisten

### Test 2

In [24]:
res = answer('Who produced the soundtrack for the game?')


I am helping the user to maintain a knowledge base - some local files.
I will answer the user's questions strictly based on the content retrieved from the local files.
Current question: Relevant text chunks retrieved from the knowledge base:
Chunk 0:
---

## Technical and Artistic Aspects

**Graphics and Art Style:**
- **Visuals**: Stunning graphics that blend gothic and high fantasy elements.
- **Art Direction**: A unique art style that enhances the game's atmosphere and storytelling.

**Audio:**
- **Soundtrack**: Composed by Yuka Kitamura and others, the music enhances the mood and intensity of the game.
- **Voice Acting**: High-quality voice acting brings characters and lore to life.

---

## Development and Release
Chunk 1:
---

## Development and Release

**History:**
- **Announcement**: First revealed in 2019, developed by FromSoftware and directed by Hidetaka Miyazaki, with world-building by George R. R. Martin.
- **Release Date**: Released on February 25, 2022, for PlayStation

In [25]:
print(res)

<think>
Alright, I need to figure out who produced the soundtrack for Elden Ring based on the provided knowledge base. Let me go through each chunk step by step.

First, looking at Chunk 0, it's about technical and artistic aspects, including graphics and art style, as well as audio components like the soundtrack and voice acting. Under "Audio: Soundtrack," it mentions that the soundtrack is composed by Yuka Kitamura and others. So that gives a direct answer from the knowledge base.

Then, Chunk 1 discusses development and release history. It doesn't mention anything about the soundtrack here, so I can ignore this chunk for this specific question.

Chunk 2 is titled "# Elden Ring: A Comprehensive Guide." It provides an overview of Elden Ring, mentioning that it's developed by FromSoftware and published by Bandai Namco Entertainment. The guide talks about the game's story, setting, graphics, etc., but again, no mention of the soundtrack here.

Putting this together, the knowledge base e