# Retrieval Augmented Generation (RAG)

RAG enhances an LLM by retrieving relevant documents from an external data source and passing them to the model before it generates a response. This lets the model ground its answers in the retrieved text rather than relying only on what it learned during training.
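
The retrieve-then-generate flow can be sketched without any external service. The block below is only an illustration under stated assumptions: `toy_retrieve` and `build_prompt` are hypothetical helpers, and a bag-of-words cosine similarity stands in for the embedding model and vector store used in the cells that follow.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Hypothetical retriever: return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question (the augmentation step)."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "CairnFORM is a physical ring chart that displays energy data.",
    "CRISPR-Cas9 is a genome editing tool.",
]
top = toy_retrieve("What is CairnFORM?", docs)
prompt = build_prompt("What is CairnFORM?", top)
print(prompt)
```

In a real pipeline the resulting prompt would be sent to the LLM; here it is only printed.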

In [1]:
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
# Initialize Ollama embeddings
ollama_embeddings = OllamaEmbeddings(model="qwen3:4b")

persist_directory = "db/"
pdf_folder_path = "files/"

all_chunks = []

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

for fn in os.listdir(pdf_folder_path):
    if not fn.endswith(".pdf"):
        continue
    print(f"Processing {fn}...")
    path_file = os.path.join(pdf_folder_path, fn)

    # Load the PDF document
    loader = PyPDFLoader(path_file)
    documents = loader.load()

    # Split the documents into smaller chunks for better retrieval
    chunks = text_splitter.split_documents(documents)
    all_chunks.extend(chunks)

# Create a Chroma database from the document chunks

print("Creating Chroma database...")

db = Chroma.from_documents(
    all_chunks,
    embedding=ollama_embeddings,
    persist_directory=persist_directory,
)


Processing daniel2018cairnform.pdf...
Processing doudna2014crispr.pdf...
Creating Chroma database...
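
To make the `chunk_size=500` and `chunk_overlap=50` settings concrete, here is a simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to break on separators such as paragraphs, lines, and spaces); `split_with_overlap` is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# 1200 characters of synthetic text, purely for illustration
sample = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one of the two neighboring chunks, at the cost of storing some text twice.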


In [9]:
PROMPT0 = "What is the CairnFORM prototype?"
results = db.similarity_search(PROMPT0)
print("Search results:")
for result in results:
    print(f"\n\n{result.page_content}")

Search results:


shared practices by displaying energy data in collective and
public spaces, such as public places and workplaces. It is 360˚-
readable, and as a dynamic physical ring chart, it can change
its cylindrical symmetry with quiet motion. W e conducted two
user studies. The ﬁrst study clearly revealed the attractiveness
of CairnFORM in a public place and its usability for a range
task and for a compare task. Consequently , this makes
CairnFORM useful to analyze renewable energy availability .


emmanuelle.charpentier@helmholtz-hzi.de (E.C.)


the two following exercises. These two exercises aim at
catching the ring’s motion with the peripheral vision, but with
different attention levels: one is with the focus of attention
(i.e., focusing attention on the detection task) and the other
7 The mid-peripheral vision—covering the region from 30˚ to 60˚—
deﬁnes the limit of the upwards peripheral vision [13].
8 Repeated measures with three apps on a smartphone and a tablet.


88-cm
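
Two of the four results above (an e-mail address and the fragment `88-cm`) carry almost no information. One possible mitigation, sketched here as an assumption rather than something the notebook does, is to drop very short chunks before indexing. `is_informative` and its thresholds are hypothetical and would need tuning on real data.

```python
def is_informative(text: str, min_chars: int = 40, min_words: int = 5) -> bool:
    """Heuristic filter: keep a chunk only if it is long enough in both
    characters and whitespace-separated words."""
    stripped = text.strip()
    return len(stripped) >= min_chars and len(stripped.split()) >= min_words


# Snippets copied from the search results above
candidates = [
    "88-cm",
    "emmanuelle.charpentier@helmholtz-hzi.de (E.C.)",
    "CairnFORM useful to analyze renewable energy availability .",
]
kept = [c for c in candidates if is_informative(c)]
print(kept)
```

In the indexing cell, the same check could be applied to `chunk.page_content` for each item in `all_chunks` before calling `Chroma.from_documents`.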

# Define Environment Variables

In [None]:
OLLAMA_BASE_URL = "http://localhost:11434"
LLM_MODEL = "qwen3:4b"
LLM_SEED = 42 
LLM_TEMPERATURE = 0.0
TEST_PROMPT0 = "What time is it?"
TEST_PROMPT1 = "What is the price of gold right now?"
TEST_PROMPT3 = "How many 1 in 111111111111111?"

# Initialize Ollama Chatbot

In [12]:
from langchain_ollama import ChatOllama

# Set up the Ollama chat model; the fixed seed and zero temperature
# defined above minimize randomness for reproducibility
llm = ChatOllama(
    base_url=OLLAMA_BASE_URL,
    model=LLM_MODEL,
    temperature=LLM_TEMPERATURE,
    seed=LLM_SEED,
    stream=True
)

In [16]:
from langchain_core.tools import tool

@tool
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = db.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized

tools = [retrieve]

In [17]:
from langchain_core.runnables import RunnableLambda
from inspect import signature

# Use `t` as the loop variable so it is not confused with the `tool` decorator
tools_names = ", ".join(t.name for t in tools)
tools_descriptions = "\n".join(
    f"{t.name}{signature(t.func)} - {t.description}" for t in tools
)


# Taken from https://smith.langchain.com/hub/hwchase17/react-json
instructions = f"""Answer the following questions as best you can.
You can answer directly if the user is greeting you or similar.
Otherwise, you have access to the following tools:

{tools_descriptions}

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are: {tools_names}

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Reminder to always use the exact characters `Final Answer:` when responding.
Always make your final answer easy to read and understand for humans.
"""

print(instructions)

Answer the following questions as best you can.
You can answer directly if the user is greeting you or similar.
Otherwise, you have access to the following tools:

retrieve(query: str) - Retrieve information related to a query.

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are: retrieve

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final A

In [18]:
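from typing import List, Optional

from IPython.display import clear_output
from langchain_core.messages import BaseMessage, AIMessage, HumanMessage, SystemMessage

class Chatbot:
    llm: ChatOllama
    history: List[BaseMessage]

    def __init__(self, llm: ChatOllama, history: Optional[List[BaseMessage]] = None):
        """Initialize the chatbot with an LLM and an optional history."""
        self.llm = llm
        # A mutable default ([]) would be shared across instances, so create a new list here
        self.history = history if history is not None else []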
    
    def invoke(self, prompt:str) -> None:
        """Run the chatbot with the current history."""
        clear_output(wait=True)
        self.pretty_print()
        human_message = HumanMessage(content=prompt)
        human_message.pretty_print()
        self.history.append(human_message)
        ai_message = AIMessage(content="")
        ai_message.pretty_print()
        print()
        for chunk in self.llm.stream(self.history):
            print(chunk.content, end="")
            ai_message.content += chunk.content
        print()
        self.history.append(ai_message)
        
    
    def interact(self) -> None:
        """Run the chatbot in interactive mode."""
        while True: 
            prompt = input("Prompt (Enter 'stop' to exit)")
            if prompt == "stop": 
                break
            self.invoke(prompt)


    def pretty_print(self) -> None:
        """Pretty print the chatbot's history."""
        for message in self.history:
            message.pretty_print()

In [21]:
import ast
import re
import sys
import traceback
from typing import List, Optional, Union

from IPython.display import clear_output
from langchain_core.agents import AgentAction, AgentActionMessageLog, AgentFinish
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_core.tools import StructuredTool

class Agent(Chatbot):

    # Tools created with the @tool decorator are LangChain StructuredTool objects
    tools: List[StructuredTool]

    def __init__(self, llm: ChatOllama, tools: List[StructuredTool], history: Optional[List[BaseMessage]] = None):
        """Initialize the agent with an LLM, tools, and an optional history."""
        # Pass a fresh list when no history is given so instances never share one
        super().__init__(llm, history if history is not None else [])
        self.tools = tools

    def invoke(self, prompt: str) -> None:
        """Run the agent loop: query the LLM, execute any requested tool, and stop at a final answer."""
        self.history.append(HumanMessage(content=prompt))
        clear_output(wait=True)
        self.pretty_print()
        stop = False
        while not stop:
            ai_pre_action_message = self.llm.invoke(self.history)
            self.history.append(ai_pre_action_message)
            try:
                action = self.parse_action(ai_pre_action_message.content)
                if isinstance(action, AgentAction):
                    tool_message = self.call_tool(action)
                    self.history.append(tool_message)
                if isinstance(action, AgentFinish):
                    stop = True
            except SyntaxError as e:
                self.history.append(SystemMessage(content=str(e)))
            except Exception as e:
                traceback.print_exc()
                sys.exit()
            clear_output(wait=True)
            self.pretty_print()


    def call_tool(self, action: AgentAction) -> ToolMessage:
        """Call the specified tool with the given action input."""
        tool = next((t for t in self.tools if t.name == action.tool), None)
        if not tool:
            return ToolMessage(content=f"Error: Tool '{action.tool}' does not exist.", tool_call_id="unknown_tool")
        result = None
        sig = signature(tool.func)
        if len(sig.parameters):
            action_input = action.tool_input
            if not isinstance(action_input, dict):
                param_name = next(iter(sig.parameters))
                action_input = {param_name: action_input}
            result = tool.func(**action_input)
        else:
            result = tool.func()
        return ToolMessage(content=f"{result}", tool_call_id=tool.func.__name__)
    
    
    def parse_action(self, text:str) -> Union[AgentAction, AgentActionMessageLog, AgentFinish]:
        """Parse the action from the LLM output text and return an AgentAction or AgentFinish object."""
        FINAL_ANSWER_ACTION = "Final Answer:"
        pattern = re.compile(r"^.*?`{3}(?:json)?\n?(.*?)`{3}.*?$", re.DOTALL)
        includes_answer = FINAL_ANSWER_ACTION in text
        try:
            found = pattern.search(text)
            if not found:
                raise ValueError("action not found.")
            action = found.group(1)
            response = ast.literal_eval(action.strip())
            return AgentAction(response["action"], response.get("action_input", {}), text)
        except Exception as e:
            if not includes_answer:
                raise SyntaxError("Reminder to always use the exact characters `Final Answer:` when responding.")
            output = text.split(FINAL_ANSWER_ACTION)[-1].strip()
            return AgentFinish({"output": output}, text)
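
The parsing step in `parse_action` can be tried in isolation. The sketch below is a standalone variant, not the method above: `extract_action` is a hypothetical helper, and it decodes the blob with `json.loads` instead of `ast.literal_eval`, which assumes the model emits strict JSON with double quotes.

```python
import json
import re

# Same idea as the pattern in parse_action: capture whatever sits between
# the first pair of triple-backtick fences, with an optional "json" tag
ACTION_PATTERN = re.compile(r"`{3}(?:json)?\n?(.*?)`{3}", re.DOTALL)


def extract_action(text: str):
    """Return (tool_name, tool_input) from the first fenced JSON blob, or None."""
    match = ACTION_PATTERN.search(text)
    if match is None:
        return None
    blob = json.loads(match.group(1).strip())
    return blob["action"], blob.get("action_input", {})


fence = "`" * 3  # built programmatically to keep this example readable
sample = (
    "Thought: I should look this up.\n"
    "Action:\n"
    f"{fence}\n"
    '{"action": "retrieve", "action_input": "CairnFORM prototype"}\n'
    f"{fence}\n"
)
print(extract_action(sample))
print(extract_action("Final Answer: hello"))
```

A reply without a fenced blob yields `None`, which is the case where the agent falls back to looking for `Final Answer:`.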


In [22]:
# Stop generation at "Observation:" so the agent, not the model, supplies tool results
bound_llm = llm.bind(stop=["Observation:"])
# Start the history with the system instructions, then run the agent on the first prompt
history = [SystemMessage(content=instructions)]
agent = Agent(llm=bound_llm, tools=tools, history=history)
agent.invoke(PROMPT0)


Answer the following questions as best you can.
You can answer directly if the user is greeting you or similar.
Otherwise, you have access to the following tools:

retrieve(query: str) - Retrieve information related to a query.

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are: retrieve

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final 

# Retrieval Integrated Generation (RIG)

Retrieval Integrated Generation (RIG) combines retrieval-based and generation-based approaches to improve the accuracy and relevance of responses. Unlike a pipeline that always retrieves before generating, RIG queries a knowledge base or database only when the model asks for it, then feeds the retrieved information back so the model can produce a more accurate, contextually appropriate answer.
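
The control flow described above can be sketched with a scripted stand-in for the model. Everything here is hypothetical and for illustration only: `scripted_model` imitates an LLM that requests one lookup and then answers, `lookup` is a keyword match over a tiny in-memory store, and the `Action:` / `Observation:` / `Final Answer:` markers mirror the prompt format used earlier in this notebook.

```python
import json

# Tiny in-memory knowledge base (assumed content, for illustration only)
KNOWLEDGE = {"cairnform": "CairnFORM is a dynamic physical ring chart for energy data."}


def lookup(query: str) -> str:
    """Hypothetical retriever: return the first passage whose key appears in the query."""
    for key, passage in KNOWLEDGE.items():
        if key in query.lower():
            return passage
    return "No matching passage."


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: request retrieval once, then answer from the observation."""
    if "Observation:" not in transcript:
        return 'Action: {"action": "lookup", "action_input": "CairnFORM"}'
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Final Answer: {observation}"


def run(question: str, max_steps: int = 3) -> str:
    """Alternate between model replies and retrieval until a final answer appears."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = scripted_model(transcript)
        if reply.startswith("Final Answer:"):
            return reply[len("Final Answer:"):].strip()
        # Retrieval happens only because the model asked for it
        request = json.loads(reply[len("Action:"):])
        transcript += f"\n{reply}\nObservation: {lookup(request['action_input'])}"
    return "No answer within the step budget."


answer = run("What is the CairnFORM prototype?")
print(answer)
```

The `max_steps` bound is a design choice worth keeping in a real agent as well, since a model that never emits a final answer would otherwise loop indefinitely.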