# Ollama RAG Chatbot 1
Retrieval-augmented generation (RAG) is a technique that lets Large Language Models (LLMs) retrieve relevant passages from an external knowledge source at query time and incorporate them into their answers.
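
As a rough illustration of the idea, the sketch below retrieves the passage most similar to a question and pastes it into a prompt. It is a toy under stated assumptions: word counts stand in for a real embedding model, and `embed`, `retrieve`, `build_prompt`, and the sample passages are hypothetical names and data, not part of this project.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context):
    """Combine the retrieved context and the question into one prompt for an LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical passages, for illustration only
passages = [
    "The dental plan covers two cleanings per year.",
    "The cafeteria opens at eight in the morning.",
]
question = "How many dental cleanings are covered?"
context = retrieve(question, passages)
prompt = build_prompt(question, context)
print(prompt)
```

In the notebook itself, LlamaIndex handles both steps: an embedding model replaces the word counts, and the chat engine builds the prompt.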

## Project Objective
The objective of this project is to develop a Retrieval-Augmented Generation (RAG) chatbot using LlamaIndex to assist employees in accessing accurate and timely information about Capital One’s 2025 benefits offerings. The chatbot will leverage a Large Language Model (LLM) enhanced by an external knowledge base, the 2025 Capital One Benefits PDF document, to retrieve and generate relevant answers to employee inquiries.

## Reference
- Local-LLM-Chatbot-using-Ollama-and-Langchain, M-A-S1
- Reading Documents Directly from Llama Index as Files Instead of Specifying a Folder Path, Stack Overflow 78338438
- Claude: LlamaIndex RAG
- Build a chatbot with custom data sources, powered by LlamaIndex, Caroline Frasca, Krista Muir, and Yi Ding

## Setup

In [None]:
# Install llama-index-llms-ollama
!pip install llama-index-llms-ollama

Collecting llama-index-llms-ollama
  Downloading llama_index_llms_ollama-0.6.2-py3-none-any.whl.metadata (3.6 kB)
Collecting llama-index-core<0.13,>=0.12.4 (from llama-index-llms-ollama)
  Downloading llama_index_core-0.12.49-py3-none-any.whl.metadata (2.5 kB)
Collecting ollama>=0.5.1 (from llama-index-llms-ollama)
  Downloading ollama-0.5.1-py3-none-any.whl.metadata (4.3 kB)
Collecting aiosqlite (from llama-index-core<0.13,>=0.12.4->llama-index-llms-ollama)
  Downloading aiosqlite-0.21.0-py3-none-any.whl.metadata (4.3 kB)
Collecting banks<3,>=2.0.0 (from llama-index-core<0.13,>=0.12.4->llama-index-llms-ollama)
  Downloading banks-2.1.3-py3-none-any.whl.metadata (12 kB)
Collecting dataclasses-json (from llama-index-core<0.13,>=0.12.4->llama-index-llms-ollama)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (from llama-index-core<0.13,>=0.12.4->llama-index-llms-ollama)
  Downloading Deprecated-1.2.18-py2.py3-none-any.whl.metadata (5.

In [None]:
# Install Ollama
!sudo apt update
!sudo apt install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

In [None]:

# Start the Ollama server in the background

import subprocess
import time

# Popen returns immediately, so the server keeps running while the notebook continues
ollama_process = subprocess.Popen(["ollama", "serve"])

# Give the server a few seconds to start before pulling models
time.sleep(5)
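
The fixed five-second pause works when the server starts quickly, but it can be too short on a slow machine. A possible alternative, sketched below, polls the server until it answers. It assumes Ollama listens on its default address `http://localhost:11434`; `wait_for_server` is a hypothetical helper, not part of Ollama or LlamaIndex.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:11434", timeout=30.0, interval=0.5):
    """Poll url until it answers or timeout seconds pass; return True if it answered."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` right after starting `ollama serve` would replace `time.sleep(5)`; a `False` result means the server did not come up within the timeout.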

In [None]:
# Pulling LLM

!ollama pull llama3.2

In [None]:
# Pull embedding model
!ollama pull nomic-embed-text

In [None]:
# List available models
!ollama list

NAME                       ID              SIZE      MODIFIED       
nomic-embed-text:latest    0a109f422b47    274 MB    2 seconds ago     
llama3.2:latest            a80c4f17acd5    2.0 GB    11 minutes ago    


In [None]:
# Load the Ollama LLM class from LlamaIndex
from llama_index.llms.ollama import Ollama

In [None]:
llm = Ollama(
    model="llama3.2",
    request_timeout=300.0,
    # Manually set the context window to limit memory usage
    context_window=8000,
)

## Test LLM

In [None]:
# Test Model
resp = llm.complete("How many oceans are there?")

In [None]:
# Print Reply
print(resp)

There is one global ocean, which is often referred to as the "Pacific Ocean" or simply "the ocean." However, it's generally considered to be divided into five distinct bodies of water:

1. Pacific Ocean
2. Atlantic Ocean
3. Indian Ocean
4. Arctic Ocean
5. Southern Ocean (also known as the Antarctic Ocean)

The International Hydrographic Organization (IHO) recognized the Southern Ocean as a separate ocean in 2000, defining it as the waters surrounding Antarctica and extending north to the coast of South America, Africa, and Australia.

So, while there's only one global ocean, it's often divided into these five distinct bodies of water for geographical, scientific, and practical purposes.


## Prep for RAG

In [None]:
# Install llama-index, llama-index-readers-web, cohere, jedi
!pip install llama-index llama-index-readers-web cohere jedi

Collecting llama-index
  Downloading llama_index-0.12.49-py3-none-any.whl.metadata (12 kB)
Collecting llama-index-readers-web
  Downloading llama_index_readers_web-0.4.4-py3-none-any.whl.metadata (1.2 kB)
Collecting cohere
  Downloading cohere-5.16.1-py3-none-any.whl.metadata (3.4 kB)
Collecting jedi
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting llama-index-agent-openai<0.5,>=0.4.0 (from llama-index)
  Downloading llama_index_agent_openai-0.4.12-py3-none-any.whl.metadata (439 bytes)
Collecting llama-index-cli<0.5,>=0.4.2 (from llama-index)
  Downloading llama_index_cli-0.4.4-py3-none-any.whl.metadata (1.4 kB)
Collecting llama-index-embeddings-openai<0.4,>=0.3.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.7.10-py3-none-any.whl.metadata (3.3 kB)
Collecting llama-inde

In [None]:
# Install llama-index-embeddings-ollama
!pip install llama-index-embeddings-ollama

Collecting llama-index-embeddings-ollama
  Downloading llama_index_embeddings_ollama-0.6.0-py3-none-any.whl.metadata (684 bytes)
Downloading llama_index_embeddings_ollama-0.6.0-py3-none-any.whl (3.4 kB)
Installing collected packages: llama-index-embeddings-ollama
Successfully installed llama-index-embeddings-ollama-0.6.0


In [None]:
# Used to wrap long chatbot responses for display
import textwrap

## RAG Creation and Testing

In [None]:
# Retrieval-augmented generation
# Use the 2025 Capital One Benefits document as a knowledge base for the LLM

# Load libraries
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.ollama import OllamaEmbedding

# Embedding model (pulled earlier with `ollama pull nomic-embed-text`)
embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Register the embedding model and the previously created LLM as global defaults
Settings.embed_model = embed_model
Settings.llm = llm

# Load the Benefits document
reader = SimpleDirectoryReader(input_files=["/2025_Benefits_Guide_capitalone.pdf"])
documents = reader.load_data()
print(f"Length of docs: {len(documents)}")

# Split the documents into overlapping chunks (nodes)
text_splitter = SentenceSplitter(chunk_size=1000, chunk_overlap=200)
nodes = text_splitter.get_nodes_from_documents(documents)

# Create the index.
# LlamaIndex embeds the nodes automatically and stores them
# in its built-in in-memory vector store
index = VectorStoreIndex(nodes)

# Create the chat engine; "condense_question" mode rewrites each message,
# together with the chat history, into a standalone question before querying the index
chat_engine = index.as_chat_engine(chat_mode="condense_question")

# Ask a question
prompt = input("Your question: ")
response = chat_engine.chat(prompt)
print(response.response)

Length of docs: 34
Your question: can you summarize capitalone benefits
Capital One offers a comprehensive Total Rewards package that includes various benefits to support your total well-being. These benefits cover medical and prescription drug, dental and vision coverage, as well as financial protection options to help you save for the future. The guide provides an overview of these benefits, but it's essential to review all available options to maximize value for you and your family.
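
The `chunk_size=1000` and `chunk_overlap=200` settings in the cell above mean that consecutive chunks share some text, so a sentence cut at a chunk boundary still appears whole in a neighboring chunk. The sketch below is a simplified illustration of that sliding window, not LlamaIndex's implementation: it counts whitespace-separated words and ignores sentence boundaries, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, chunk_size=10, chunk_overlap=3):
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(1, 21))  # 20 placeholder words
chunks = chunk_words(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

With these toy values, each chunk starts with the last three words of the previous one. A larger overlap preserves more context across boundaries at the cost of more chunks to embed.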


In [None]:
prompt = input("Your question: ")
response = chat_engine.chat(prompt)
print("Chatbot: ",textwrap.fill(response.response,width=100))

Your question: Does capitalone offer health insurance to its employees
Chatbot:  Capital One's Total Rewards package includes a comprehensive healthcare component that offers
medical, dental, and vision coverage.


In [None]:
prompt = input("Your question: ")
response = chat_engine.chat(prompt)
print("Chatbot: ",textwrap.fill(response.response,width=100))

Your question: when is an employee eligible for health, dental, and vision benefits
Chatbot:  Yes, you can be eligible for Capital One's medical, dental, and vision benefits if you enroll by the
date of hire.


In [None]:
prompt = input("Your question: ")
response = chat_engine.chat(prompt)
print("Chatbot: ",textwrap.fill(response.response,width=100))

Your question: Does capitalone offer tax-advantaged spending and savings accounts
Chatbot:  Yes, Capital One offers a Health Savings Account (HSA) as a tax-advantageous spending and savings
option. It provides triple tax benefits: contributions are made pre-tax, the account balance earns
interest tax-free, and withdrawals for eligible medical expenses are also tax-free.
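
The question cells above repeat the same three lines, so they could be folded into one helper. The sketch below is one possible shape, assuming `ask` is any callable that takes a question and returns text; `run_chat` and `fake_ask` are hypothetical names, and `fake_ask` only stands in for the chat engine so the example runs without a model.

```python
import textwrap


def run_chat(ask, questions, width=100):
    """Send each question to ask() and return the wrapped replies in order."""
    replies = []
    for question in questions:
        answer = ask(question)
        wrapped = textwrap.fill(f"Chatbot: {answer}", width=width)
        print(f"Your question: {question}")
        print(wrapped)
        replies.append(wrapped)
    return replies


def fake_ask(question):
    """Hypothetical stand-in for the chat engine so the sketch runs offline."""
    return f"(no model attached) You asked: {question}"


replies = run_chat(fake_ask, ["Is dental coverage offered?"])
```

In the notebook, passing `lambda q: chat_engine.chat(q).response` as `ask` would route the questions to the chat engine created earlier.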


## Next Steps
- Develop a user-friendly interface for the chatbot
- Determine deployment strategy