# Chatbot application using RAG - made by Horváth Botond
> The goal of this notebook is to present a chatbot application based on Retrieval-Augmented Generation (RAG).

Expectations:
* The chatbot should demonstrate the basic elements of agentic behavior:
autonomous decision making, and autonomous decomposition and execution of
subtasks.
* The application must use the Retrieval-Augmented Generation (RAG)
technique to integrate external knowledge sources.
* The solution should include structured documentation of the design
decisions, the architecture used, and the operational logic.

## Steps
1. Setting up the environment
2. Loading and saving the data
3. Splitting text into chunks
4. Creating embeddings and building vector database
5. Define tools and agent
6. Testing nodes and RAG

### 1. Setting up the environment
* Hugging Face -> models and datasets
* LangChain -> chaining and retrieval 
* FAISS -> vector store
* LangGraph -> agentic logic

In [1]:
# Installing required packages
%pip install -q langchain-community langgraph faiss-cpu huggingface_hub transformers sentence-transformers datasets tiktoken unstructured

Note: you may need to restart the kernel to use updated packages.





In [2]:
# Installing parser
%pip install mwparserfromhell



In [3]:
%pip install "accelerate>=0.26.0"



In [4]:
%pip install bitsandbytes



In [5]:
# Importing libraries

# Core libraries
import os
import json
import torch
from typing import List

# Hugging Face & Transformers
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset

# LangChain for RAG logic
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
from langchain.tools.retriever import create_retriever_tool
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain import LLMChain
from langchain.llms import HuggingFacePipeline

# LangGraph for agentic execution
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import create_react_agent

# Utilities
import logging
logging.basicConfig(level=logging.INFO)




### 2. Loading and saving the data
> Loading a legacy dataset from HuggingFace.

> Saving 500 articles

In [6]:
# Loading in streaming mode, because it takes up less disk space
streamed_dataset = load_dataset("wikipedia", "20220301.en", split="train", streaming=True, trust_remote_code=True)

In [7]:
# Getting a sample from the data
sampled_data = []
for i, article in enumerate(streamed_dataset):
    if i >= 500:
        break
    sampled_data.append(article)

print(f"Collected {len(sampled_data)} articles.")

Collected 500 articles.
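The sampling loop above takes the first 500 items from a streamed (iterable) dataset. The same pattern can be sketched with `itertools.islice`, which works on any iterable. In this sketch `take_first` and `fake_stream` are hypothetical names used only for illustration; `fake_stream` stands in for the streamed dataset so the example runs without a download.

```python
from itertools import islice


def take_first(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def fake_stream():
    """Hypothetical stand-in for a streamed dataset: an endless generator."""
    i = 0
    while True:
        yield {"title": f"Article {i}", "text": "..."}
        i += 1


# Only the first 500 items are ever produced by the generator.
sample = take_first(fake_stream(), 500)
print(f"Collected {len(sample)} articles.")
```

Because `islice` stops pulling items once the limit is reached, the rest of the stream is never read.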


In [8]:
# Print titles of the first 10 articles
for i, article in enumerate(sampled_data[:10]):
    print(f"{i+1}. {article['title']}")

1. Anarchism
2. Autism
3. Albedo
4. A
5. Alabama
6. Achilles
7. Abraham Lincoln
8. Aristotle
9. An American in Paris
10. Academy Award for Best Production Design


In [9]:
# Viewing one article (the tenth one, index 9)
sample_article = sampled_data[9]

print(f"Title: {sample_article['title']}")
print(f"URL: {sample_article['url']}\n")
print(sample_article['text'][:300])  # Show only the first 300 characters

Title: Academy Award for Best Production Design
URL: https://en.wikipedia.org/wiki/Academy%20Award%20for%20Best%20Production%20Design

The Academy Award for Best Production Design recognizes achievement for art direction in film. The category's original name was Best Art Direction, but was changed to its current name in 2012 for the 85th Academy Awards. This change resulted from the Art Director's branch of the Academy of Motion Pi


In [10]:
# Saving articles

# Creating a directory
os.makedirs("data", exist_ok=True)

# Save each article
for i, article in enumerate(sampled_data):
    filename = f"{article['title']}.json"
    filepath = os.path.join("data", filename)
    
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(article, f, indent=2, ensure_ascii=False)

print(f"Saved {len(sampled_data)} articles to 'data'")

Saved 500 articles to 'data'
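The save loop above uses the raw article title as the file name. Titles can contain characters such as `/`, `:` or `?`, which are not valid in file names on some systems. A minimal sketch of one possible safeguard follows; `safe_filename` is a hypothetical helper, and the set of replaced characters is an assumption based on common Windows restrictions.

```python
import re

# Characters that are commonly rejected in file names (assumed set).
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, index, ext=".json"):
    """Build a file name from a title, replacing problematic characters.

    The numeric prefix keeps names unique even if two titles
    become identical after cleaning.
    """
    cleaned = _INVALID_CHARS.sub("_", title).strip(" .")
    if not cleaned:
        cleaned = "untitled"
    return f"{index:04d}_{cleaned}{ext}"


print(safe_filename("AC/DC", 7))
print(safe_filename("Anarchism", 0))
```

The helper could replace the `filename = ...` line in the loop, with `i` passed as the index.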


### 3. Splitting text into chunks
> Converting articles to LangChain documents

> Specifying chunk size and overlap

In [11]:
# LangChain Document objects
documents = [Document(page_content=article["text"], metadata={"title": article["title"]}) for article in sampled_data]

# Use recursive character splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,      # Characters per chunk
    chunk_overlap=50     # Overlap between chunks to preserve context
)

# Split the documents into chunks
chunks = splitter.split_documents(documents)

print(f"Total number of text chunks: {len(chunks)}")
print(f"\nSample chunk (from article: {chunks[0].metadata['title']}):\n")
print(chunks[0].page_content)

Total number of text chunks: 46103

Sample chunk (from article: Anarchism):

Anarchism is a political philosophy and movement that is sceptical of authority and rejects all involuntary, coercive forms of hierarchy. Anarchism calls for the abolition of the state, which it holds to be unnecessary, undesirable, and harmful. As a historically left-wing movement, placed on the farthest left of the political spectrum, it is usually described alongside communalism and libertarian


In [12]:
# Average number of chunks per article
print(f"Average chunks per article: {len(chunks) / len(sampled_data):.2f}")

Average chunks per article: 92.21
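To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators such as paragraphs and sentences; `chunk_text` is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def chunk_text(text, chunk_size=400, chunk_overlap=50):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is not fully contained in the previous one.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Synthetic 1000-character text, used only for demonstration.
text = "".join(chr(97 + i % 26) for i in range(1000))
chunks_demo = chunk_text(text)

print(len(chunks_demo))
print(chunks_demo[0][-50:] == chunks_demo[1][:50])
```

The last 50 characters of each chunk reappear at the start of the next one, which is what preserves context across chunk boundaries.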


### 4. Creating embeddings and building vector database
> Choosing an embedding model

> Build vector database with FAISS

In [13]:
# Load a sentence transformer model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


In [14]:
VECTORSTORE_PATH = "vectorstore/faiss_index"

if os.path.exists(VECTORSTORE_PATH):
    print("Loading existing FAISS vector store...")
    vectorstore = FAISS.load_local(VECTORSTORE_PATH, 
                                   embeddings=embedding_model,
                                   allow_dangerous_deserialization=True)
else:
    print("Building FAISS vector store from chunks...")
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    vectorstore.save_local(VECTORSTORE_PATH)
    print("FAISS vector store created and saved.")

INFO:faiss.loader:Loading faiss with AVX512 support.
INFO:faiss.loader:Could not load library with AVX512 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx512'")
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
INFO:faiss:Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. This is only an error if you're trying to use GPU Faiss.


Loading existing FAISS vector store...
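Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query embedding. The sketch below illustrates this with plain Python and cosine similarity. The three-dimensional vectors are made up for the example, `top_k` is a hypothetical helper, and the FAISS index in this notebook may use a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=4):
    """Return the k (similarity, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, made-up embedding).
toy_index = [
    ("Achilles was a hero of the Trojan War.", [0.9, 0.1, 0.0]),
    ("Albedo measures diffuse reflection.", [0.0, 0.2, 0.9]),
    ("Aristotle was a Greek philosopher.", [0.6, 0.7, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A real index avoids this exhaustive comparison where possible, but the ranking idea is the same.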


### 5. Define tools and agent
> Define search tool

> Define agent with tool

> Define nodes

In [15]:
# Defining tool for searching

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

retriever_tool = create_retriever_tool(
    retriever,
    name="vectorstore_search",
    description="Searches the document store for relevant context."
)

In [19]:
# Defining function for loading generator model
def load_local_model(model_id):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)  # no quantization
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)
    return HuggingFacePipeline(pipeline=pipe)

In [17]:
pip show accelerate

Name: accelerate
Version: 1.7.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [20]:
# Loading generator model
model_id = "tiiuae/falcon-rw-1b"
llm = load_local_model(model_id)

Device set to use cpu


In [21]:
tools = [retriever_tool]

In [39]:
# Create agent with tool

from typing import TypedDict
from langchain_core.runnables import RunnableConfig
import re

class AgentState(TypedDict):
    input: str
    output: str
    docs: list  # store documents from retriever
    used_rag: bool

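def retrieval_node(state: AgentState) -> AgentState:
    print("[Node] retrieval_node")
    query = state["input"]
    # Call the retriever directly: it returns a list of Document objects,
    # whereas the retriever tool returns their contents as a single string.
    docs = retriever.invoke(query)
    return {
        **state,
        "docs": docs,
        # A top-k retriever returns results for almost any query,
        # so this flag is True whenever the index is non-empty.
        "used_rag": len(docs) > 0
    }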

def rag_generation_node(state: AgentState) -> AgentState:
    print("[Node] rag_generation_node")
    query = state["input"]
    docs = state["docs"]
    context = "\n\n".join(
        [doc.page_content if hasattr(doc, "page_content") else str(doc) for doc in docs]
    )
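    # Include the user's question; without it the model only sees the context.
    prompt = f"Use the following context to answer:\n\n{context}\n\nQuestion: {query}\nAnswer:"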
    response = llm.invoke(prompt)

    # Normalize response
    if isinstance(response, list):
        response = "".join(response)
    elif hasattr(response, "content"):
        response = response.content
    elif isinstance(response, bytes):
        response = response.decode("utf-8")

    response = response.replace("\n", "").strip()
    response = re.sub(r"[^\x20-\x7E]+", "", response)   # Remove non-ASCII

    return {**state, "output": response}


def fallback_generation_node(state: AgentState) -> AgentState:
    print("[Node] fallback_generation_node")
    query = state["input"]
    # Include the user's question; without it the model has nothing to answer.
    prompt = f"Answer the following question as best as you can:\n{query}\nAnswer:"
    response = llm.invoke(prompt)

    # Normalize response
    if isinstance(response, list):
        response = "".join(response)
    elif hasattr(response, "content"):
        response = response.content
    elif isinstance(response, bytes):
        response = response.decode("utf-8")

    response = response.replace("\n", "").strip()

    return {**state, "output": response}

def route_node(state: AgentState) -> str:
    print("[Router] Deciding next node based on RAG availability...")
    return "rag_generation" if state["used_rag"] else "fallback_generation"

# Build LangGraph
graph = StateGraph(AgentState)

# Add nodes
graph.add_node("retrieval", retrieval_node)
graph.add_node("rag_generation", rag_generation_node)
graph.add_node("fallback_generation", fallback_generation_node)

# Routing
graph.add_conditional_edges("retrieval", route_node, {
    "rag_generation": "rag_generation",
    "fallback_generation": "fallback_generation"
})

# Edges
graph.add_edge("rag_generation", END)
graph.add_edge("fallback_generation", END)

# Set entry point
graph.set_entry_point("retrieval")

# Compile graph
app = graph.compile()
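The router above chooses the RAG branch whenever retrieval returns something. A nearest-neighbour retriever returns the closest chunks for almost any query, so the fallback branch is rarely reached. One possible refinement is to route on a relevance score instead. The sketch below assumes each retrieved chunk comes with a distance where lower means closer (to my understanding, LangChain's FAISS store exposes such pairs through `similarity_search_with_score`). `route_by_distance` and the threshold value are hypothetical and would need tuning on real queries.

```python
def route_by_distance(scored_docs, max_distance=1.0):
    """Pick a branch from (text, distance) pairs; lower distance means more relevant.

    Returns the branch name and the chunks that passed the threshold.
    """
    relevant = [(text, dist) for text, dist in scored_docs if dist <= max_distance]
    branch = "rag_generation" if relevant else "fallback_generation"
    return branch, relevant


# Made-up distances, for illustration only.
on_topic = [("Achilles was a hero of the Trojan War.", 0.42), ("Unrelated chunk.", 1.7)]
off_topic = [("Roman numerals and counting boards.", 1.6), ("Prices in ancient Egypt.", 1.8)]

print(route_by_distance(on_topic))
print(route_by_distance(off_topic))
```

With this kind of check, a question unrelated to the indexed articles can reach the fallback node instead of being answered from unrelated context.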

### 6. Testing nodes and RAG

This is the final version of the agent. It prints the order of the visited nodes, whether RAG was used, and the answer with non-ASCII characters removed.

In [40]:
result = app.invoke({"input": "Who was Achilles?"})
print("Answer:", result["output"])
print("Used RAG:", result["used_rag"])

[Node] retrieval_node


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[Router] Deciding next node based on RAG availability...
[Node] rag_generation_node
Answer: Use the following context to answer:In Greek mythology, Achilles ( ) or Achilleus () was a hero of the Trojan War, the greatest of all the Greek warriors, and is the central character of Homer's Iliad. He was the son of the Nereid Thetis and Peleus, king of Phthia.According to the Achilleid, written by Statius in the 1st century AD, and to non-surviving previous sources, when Achilles was born Thetis tried to make him immortal by dipping him in the river Styx; however, he was left vulnerable at the part of the body by which she held him: his left heel (see Achilles' heel, Achilles' tendon). It is not clear if this version of events was known earlier. InAchilles is the subject of the poem Achilleis (1799), a fragment by Johann Wolfgang von Goethe. In 1899, the Polish playwright, painter and poet Stanisaw Wyspiaski published a national drama, based on Polish history, named Achilles. In 1921, Edwar
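The answer above starts with the prompt text itself, which suggests that the generation pipeline returns the prompt together with the continuation. A minimal sketch of removing the echoed prompt before the answer is displayed is shown below. `strip_prompt` is a hypothetical helper and has to run before newlines are stripped from the response, otherwise the prefix no longer matches. The `transformers` text-generation pipeline also accepts a `return_full_text` argument for this purpose; whether it takes effect through the LangChain wrapper may depend on the installed version.

```python
def strip_prompt(prompt, generated):
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated.strip()


# Made-up prompt and model output, for illustration only.
prompt = (
    "Use the following context to answer:\n\n"
    "Achilles was a hero of the Trojan War.\n\n"
    "Question: Who was Achilles?\nAnswer:"
)
generated = prompt + " A hero of the Trojan War."

answer = strip_prompt(prompt, generated)
print(answer)
```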

Output from an earlier version of the agent, which is the reason why I had to remove non-ASCII characters.

In [31]:
result = app.invoke({"input": "How much is 5+5?"})
print("Answer:", result["output"])
print("Used RAG:", result["used_rag"])

[Node] retrieval_node
[Router] Deciding next node based on RAG availability...
[Node] rag_generation_node


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Answer: Use the following context to answer:denote fives –five units, five tens, etc., essentially in a bi-quinary coded decimal system, related to the Roman numerals. The short grooves on the right may have been used for marking Roman "ounces" (i.e. fractions).earn 5 sacks (200 kg or 400 lb) of grain per month, while a foreman might earn 7 sacks (250 kg or 550 lb). Prices were fixed across the country and recorded in lists to facilitate trading; for example a shirt cost five copper deben, while a cow cost 140deben. Grain could be traded for other goods, according to the fixed price list. During the fifth centuryBC coined money was introduced into Egyptwas 5 per one million, which included children and adults.tracking and data acquisition network, added an additional $5.2 billion ($ adjusted).($))()()())())))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
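The regular expression in `rag_generation_node` drops every non-ASCII character, which is why the name Stanisław Wyspiański appears as "Stanisaw Wyspiaski" in the Achilles answer. If the unwanted characters are control or formatting characters (an assumption, since the original problem output is not shown in full), a gentler cleanup can keep accented letters. `clean_text` below is a hypothetical alternative, not the method used in this notebook.

```python
import unicodedata


def clean_text(text):
    """Collapse whitespace and drop control/format characters, keeping letters."""
    # Collapse newlines, tabs and repeated spaces into single spaces.
    collapsed = " ".join(text.split())
    # Unicode general categories starting with "C" cover control, format,
    # surrogate, private-use and unassigned code points.
    kept = "".join(ch for ch in collapsed if unicodedata.category(ch)[0] != "C")
    return unicodedata.normalize("NFC", kept)


raw = "Stanis\u0142aw\nWyspia\u0144ski\x00 published a drama named Achilles."
print(clean_text(raw))
```

This keeps legitimate non-ASCII text readable while still removing characters that cannot be displayed.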