# Developing RAG Systems with DeepSeek R1 & Ollama

## Ollama Setup
- https://sebastian-petrus.medium.com/developing-rag-systems-with-deepseek-r1-ollama-f2f561cfda97
- https://apidog.com/blog/rag-deepseek-r1-ollama/

In [1]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [2]:
import subprocess

# Run the command as a background process
subprocess.Popen(["nohup", "ollama", "serve"], stdout=open("nohup.out", "w"), stderr=subprocess.STDOUT)

<Popen: returncode: None args: ['nohup', 'ollama', 'serve']>
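Because `ollama serve` is started in the background, the next cell can run before the server is accepting connections. The sketch below polls the address printed by the installer (`127.0.0.1:11434`) until it responds. The helper names `ollama_is_up` and `wait_until_ready` are hypothetical, and the assumption that a plain GET on the root URL returns status 200 should be checked against your Ollama version.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on base_url answers with status 200 (assumed health signal)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc.: the server is not ready yet.
        return False


def wait_until_ready(probe, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready(ollama_is_up)` before the `ollama pull` cells would be one way to avoid racing the server startup.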

In [3]:
%%capture
!ollama pull 'deepseek-r1'

In [4]:
%%capture
!ollama pull 'deepseek-r1:1.5b'

In [5]:
%%capture
!pip install -q ollama

## DeepSeek R1 Model Variants
DeepSeek R1 ranges from 1.5B to 671B parameters. Start small with the 1.5B model for lightweight RAG applications.

```sh
ollama run deepseek-r1:1.5b
```
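To get a feel for why the 1.5B variant is the lightweight choice, a rough back-of-the-envelope estimate of weight memory is parameters times bytes per parameter. The sketch below is only that arithmetic; the bit widths are illustrative assumptions, and it ignores runtime overhead such as the context cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Smallest and largest variants mentioned above; bit widths are assumed examples.
for name, params in [("1.5B", 1.5e9), ("671B", 671e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.2f} GB")
```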

## Step-by-Step Guide to Building the RAG Pipeline
## Step 1: Import Libraries
We’ll use:

- [LangChain](https://github.com/langchain-ai/langchain) for document processing and retrieval.
- [Streamlit](https://streamlit.io/) for the user-friendly web interface.

![images](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*eiQWHXXZgS0pHdi-.png)

In [6]:
%%capture
!pip install -qq docling docling-core langchain_experimental langchain langchain-text-splitters langchain-huggingface langchain-chroma langchain-groq langchain-ollama langchain-openai langchain_community

In [7]:
%%capture
!pip install -q streamlit
!pip install -q pdfplumber
# faiss-cpu and faiss-gpu both provide the `faiss` module; install only one of them
!pip install -q faiss-cpu

In [8]:
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
# HuggingFaceEmbeddings now lives in langchain-huggingface (installed above);
# the langchain_community import is deprecated and emits a warning.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

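## Step 2: Load the PDF
Use PDFPlumberLoader to extract the text page by page without manual parsing. In a Streamlit app the file would come from the file uploader; this notebook reads a local path instead.

In [9]:
# Load PDF text
loader = PDFPlumberLoader("./documents/2309.15217v1.pdf")
docs = loader.load()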

In [10]:
len(docs)

8

In [11]:
docs[0]

Document(metadata={'source': './documents/2309.15217v1.pdf', 'file_path': './documents/2309.15217v1.pdf', 'page': 0, 'total_pages': 8, 'Author': '', 'CreationDate': 'D:20230928011700Z', 'Creator': 'LaTeX with hyperref', 'Keywords': '', 'ModDate': 'D:20230928011700Z', 'PTEX.Fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'Producer': 'pdfTeX-1.40.25', 'Subject': '', 'Title': '', 'Trapped': 'False'}, page_content='RAGAS: Automated Evaluation of Retrieval Augmented Generation\nShahulEs†,JithinJames†,LuisEspinosa-Anke∗♢,StevenSchockaert∗\n†ExplodingGradients\n∗CardiffNLP,CardiffUniversity,UnitedKingdom\n♢AMPLYFI,UnitedKingdom\nshahules786@gmail.com,jamesjithin97@gmail.com\n{espinosa-ankel,schockaerts1}@cardiff.ac.uk\nAbstract struggletomemoriseknowledgethatisonlyrarely\nmentioned in the training corpus (Kandpal et al.,\nWeintroduceRAGAS(RetrievalAugmented\n2022;Mallenetal.,2023). Thestandardsolution\nGeneration Assessment), a framework 

In [12]:
from IPython.display import Markdown, display

In [13]:
display(Markdown(docs[0].page_content))

RAGAS: Automated Evaluation of Retrieval Augmented Generation
ShahulEs†,JithinJames†,LuisEspinosa-Anke∗♢,StevenSchockaert∗
†ExplodingGradients
∗CardiffNLP,CardiffUniversity,UnitedKingdom
♢AMPLYFI,UnitedKingdom
shahules786@gmail.com,jamesjithin97@gmail.com
{espinosa-ankel,schockaerts1}@cardiff.ac.uk
Abstract struggletomemoriseknowledgethatisonlyrarely
mentioned in the training corpus (Kandpal et al.,
WeintroduceRAGAS(RetrievalAugmented
2022;Mallenetal.,2023). Thestandardsolution
Generation Assessment), a framework for
to these issues is to rely on Retrieval Augmented
reference-free evaluation of Retrieval Aug-
Generation (RAG) (Lee et al., 2019; Lewis et al.,
mented Generation (RAG) pipelines. RAG
2020; Guu et al., 2020). Answering a question
systems are composed of a retrieval and an
LLM based generation module, and provide then essentially involves retrieving relevant pas-
LLMswithknowledgefromareferencetextual sages from a corpus and feeding these passages,
database,whichenablesthemtoactasanatu- alongwiththeoriginalquestion,totheLM.While
rallanguagelayerbetweenauserandtextual initial approaches relied on specialised LMs for
databases,reducingtheriskofhallucinations.
retrieval-augmentedlanguagemodelling(Khandel-
EvaluatingRAGarchitecturesis,however,chal-
waletal.,2020;Borgeaudetal.,2022),recentwork
lengingbecausethereareseveraldimensionsto
has suggested that simply adding retrieved docu-
consider: theabilityoftheretrievalsystemto
identifyrelevantandfocusedcontextpassages, mentstotheinputofastandardLMcanalsowork
theabilityoftheLLMtoexploitsuchpassages well (Khattab et al., 2022; Ram et al., 2023; Shi
in a faithful way, or the quality of the gener- etal.,2023),thusmakingitpossibletouseretrieval-
ationitself. With RAGAS,weputforwarda augmented strategies in combination with LLMs
suiteofmetricswhichcanbeusedtoevaluate thatareonlyavailablethroughAPIs.
these different dimensions without having to
While the usefulness of retrieval-augmented
relyongroundtruthhumanannotations. We
strategies is clear, their implementation requires
positthatsuchaframeworkcancruciallycon-
asignificantamountoftuning, astheoverallper-
tributetofasterevaluationcyclesofRAGarchi-
tectures, which is especially important given formance will be affected by the retrieval model,
thefastadoptionofLLMs. theconsideredcorpus,theLM,orthepromptfor-
mulation,amongothers. Automatedevaluationof
1 Introduction
retrieval-augmentedsystemsisthusparamount. In
practice,RAGsystemsareoftenevaluatedinterms
Language Models (LMs) capture a vast amount
ofthelanguagemodellingtaskitself,i.e.bymea-
ofknowledgeabouttheworld,whichallowsthem
suringperplexityonsomereferencecorpus. How-
to answer questions without accessing any exter-
ever, such evaluations are not always predictive
nal sources. This idea of LMs as repositories of
ofdownstreamperformance(Wangetal.,2023c).
knowledgeemergedshortlyaftertheintroduction
Moreover,thisevaluationstrategyreliesontheLM
of BERT (Devlin et al., 2019) and became more
probabilities, which are not accessible for some
firmly established with the introduction of ever
closed models (e.g. ChatGPT and GPT-4). Ques-
largerLMs(Robertsetal.,2020). Whilethemost
tionansweringisanothercommonevaluationtask,
recent Large Language Models (LLMs) capture
butusuallyonlydatasetswithshortextractivean-
enough knowledge to rival human performance
swersareconsidered,whichmaynotberepresen-
acrossawidevarietyofquestionansweringbench-
tativeofhowthesystemwillbeused.
marks (Bubeck et al., 2023), the idea of using
Toaddresstheseissues,inthispaperwepresent
LLMsasknowledgebasesstillhastwofundamen-
tallimitations. First,LLMsarenotabletoanswer
RAGAS1,aframeworkfortheautomatedassess-
questions about events that have happened after
1RAGAS is available at https://github.com/
theyweretrained. Second,eventhelargestmodels explodinggradients/ragas.
3202
peS
62
]LC.sc[
1v71251.9032:viXra


## Step 3: Chunk Documents Strategically
Split the loaded pages into semantically coherent chunks with SemanticChunker, which uses embeddings to decide where one chunk ends and the next begins.

In [14]:
# Split text into semantic chunks  
text_splitter = SemanticChunker(HuggingFaceEmbeddings())  
documents = text_splitter.split_documents(docs)



In [15]:
len(documents)

24
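The idea behind semantic chunking can be sketched in a few lines: embed each sentence, measure the distance between neighbouring sentences, and start a new chunk where the distance jumps. The code below is a simplified illustration under stated assumptions, not LangChain's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the fixed `threshold` replaces whatever breakpoint rule the real splitter applies.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (NOT a real model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_chunks(text: str, threshold: float = 0.9) -> list[str]:
    """Start a new chunk wherever adjacent sentences are farther apart than `threshold`."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [toy_embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    # Compare each sentence with the one immediately before it.
    for prev, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, vec) > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

With a real embedding model the distances are far less binary than with word counts, which is why the choice of breakpoint matters in practice.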

In [16]:
documents[0]

Document(metadata={'source': './documents/2309.15217v1.pdf', 'file_path': './documents/2309.15217v1.pdf', 'page': 0, 'total_pages': 8, 'Author': '', 'CreationDate': 'D:20230928011700Z', 'Creator': 'LaTeX with hyperref', 'Keywords': '', 'ModDate': 'D:20230928011700Z', 'PTEX.Fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'Producer': 'pdfTeX-1.40.25', 'Subject': '', 'Title': '', 'Trapped': 'False'}, page_content='RAGAS: Automated Evaluation of Retrieval Augmented Generation\nShahulEs†,JithinJames†,LuisEspinosa-Anke∗♢,StevenSchockaert∗\n†ExplodingGradients\n∗CardiffNLP,CardiffUniversity,UnitedKingdom\n♢AMPLYFI,UnitedKingdom\nshahules786@gmail.com,jamesjithin97@gmail.com\n{espinosa-ankel,schockaerts1}@cardiff.ac.uk\nAbstract struggletomemoriseknowledgethatisonlyrarely\nmentioned in the training corpus (Kandpal et al.,\nWeintroduceRAGAS(RetrievalAugmented\n2022;Mallenetal.,2023). Thestandardsolution\nGeneration Assessment), a framework 

In [17]:
display(Markdown(documents[0].page_content))

RAGAS: Automated Evaluation of Retrieval Augmented Generation
ShahulEs†,JithinJames†,LuisEspinosa-Anke∗♢,StevenSchockaert∗
†ExplodingGradients
∗CardiffNLP,CardiffUniversity,UnitedKingdom
♢AMPLYFI,UnitedKingdom
shahules786@gmail.com,jamesjithin97@gmail.com
{espinosa-ankel,schockaerts1}@cardiff.ac.uk
Abstract struggletomemoriseknowledgethatisonlyrarely
mentioned in the training corpus (Kandpal et al.,
WeintroduceRAGAS(RetrievalAugmented
2022;Mallenetal.,2023). Thestandardsolution
Generation Assessment), a framework for
to these issues is to rely on Retrieval Augmented
reference-free evaluation of Retrieval Aug-
Generation (RAG) (Lee et al., 2019; Lewis et al.,
mented Generation (RAG) pipelines. RAG
2020; Guu et al., 2020). Answering a question
systems are composed of a retrieval and an
LLM based generation module, and provide then essentially involves retrieving relevant pas-
LLMswithknowledgefromareferencetextual sages from a corpus and feeding these passages,
database,whichenablesthemtoactasanatu- alongwiththeoriginalquestion,totheLM.While
rallanguagelayerbetweenauserandtextual initial approaches relied on specialised LMs for
databases,reducingtheriskofhallucinations. retrieval-augmentedlanguagemodelling(Khandel-
EvaluatingRAGarchitecturesis,however,chal-
waletal.,2020;Borgeaudetal.,2022),recentwork
lengingbecausethereareseveraldimensionsto
has suggested that simply adding retrieved docu-
consider: theabilityoftheretrievalsystemto
identifyrelevantandfocusedcontextpassages, mentstotheinputofastandardLMcanalsowork
theabilityoftheLLMtoexploitsuchpassages well (Khattab et al., 2022; Ram et al., 2023; Shi
in a faithful way, or the quality of the gener- etal.,2023),thusmakingitpossibletouseretrieval-
ationitself. With RAGAS,weputforwarda augmented strategies in combination with LLMs
suiteofmetricswhichcanbeusedtoevaluate thatareonlyavailablethroughAPIs. these different dimensions without having to
While the usefulness of retrieval-augmented
relyongroundtruthhumanannotations. We
strategies is clear, their implementation requires
positthatsuchaframeworkcancruciallycon-
asignificantamountoftuning, astheoverallper-
tributetofasterevaluationcyclesofRAGarchi-
tectures, which is especially important given formance will be affected by the retrieval model,
thefastadoptionofLLMs. theconsideredcorpus,theLM,orthepromptfor-
mulation,amongothers. Automatedevaluationof
1 Introduction
retrieval-augmentedsystemsisthusparamount. In
practice,RAGsystemsareoftenevaluatedinterms
Language Models (LMs) capture a vast amount
ofthelanguagemodellingtaskitself,i.e.bymea-
ofknowledgeabouttheworld,whichallowsthem
suringperplexityonsomereferencecorpus. How-
to answer questions without accessing any exter-
ever, such evaluations are not always predictive
nal sources. This idea of LMs as repositories of
ofdownstreamperformance(Wangetal.,2023c). knowledgeemergedshortlyaftertheintroduction
Moreover,thisevaluationstrategyreliesontheLM
of BERT (Devlin et al., 2019) and became more
probabilities, which are not accessible for some
firmly established with the introduction of ever
closed models (e.g. ChatGPT and GPT-4). Ques-
largerLMs(Robertsetal.,2020). Whilethemost
tionansweringisanothercommonevaluationtask,
recent Large Language Models (LLMs) capture
butusuallyonlydatasetswithshortextractivean-
enough knowledge to rival human performance
swersareconsidered,whichmaynotberepresen-
acrossawidevarietyofquestionansweringbench-
tativeofhowthesystemwillbeused. marks (Bubeck et al., 2023), the idea of using
Toaddresstheseissues,inthispaperwepresent
LLMsasknowledgebasesstillhastwofundamen-
tallimitations.

## Step 4: Create a Searchable Knowledge Base
Generate vector embeddings for the chunks and store them in a FAISS index.

- Embeddings allow fast, contextually relevant searches.
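Conceptually, retrieving the top chunks amounts to a nearest-neighbour search over the stored vectors. The sketch below does this by brute force with Euclidean distance; the 2-D vectors and chunk ids are made up for illustration, and a library such as FAISS performs the same kind of search far more efficiently on real embeddings.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k entries closest to `query`, nearest first."""
    scored = [(chunk_id, l2_distance(query, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 2-D "embeddings" for four chunks; real ones have hundreds of dimensions.
index = {
    "chunk-a": [0.0, 0.0],
    "chunk-b": [1.0, 0.0],
    "chunk-c": [0.0, 2.0],
    "chunk-d": [5.0, 5.0],
}
nearest = top_k([0.1, 0.0], index, k=3)
print([chunk_id for chunk_id, _ in nearest])  # ['chunk-a', 'chunk-b', 'chunk-c']
```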

In [18]:
# Generate embeddings  
embeddings = HuggingFaceEmbeddings()  
vector_store = FAISS.from_documents(documents, embeddings)  

# Connect retriever  
retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # Fetch top 3 chunks



## Step 5: Configure DeepSeek R1
Configure the **DeepSeek R1 1.5B** model and the prompt template that the **RetrievalQA chain** will use.

- The prompt instructs the model to ground its answers in the PDF’s content rather than rely on its training data.

In [32]:
llm = Ollama(model="deepseek-r1:1.5b")  # Our 1.5B parameter model  


# Craft the prompt template.
# An optional third rule, "3. Keep answers under 4 sentences.", is left out here.
prompt = """  
1. Use ONLY the context below.  
2. If unsure, say "I don’t know".  

Context: {context}  

Question: {question}  

Answer:  
"""  
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)

In [33]:
QA_CHAIN_PROMPT

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='  \n1. Use ONLY the context below.  \n2. If unsure, say "I don’t know".  \n\nContext: {context}  \n\nQuestion: {question}  \n\nAnswer:  \n')
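To see what the model actually receives, it helps to fill the two placeholders by hand. The sketch below uses plain `str.format`, which behaves like the template's `{context}` and `{question}` substitution for this simple case; the function name `fill_prompt` and the sample context sentence are illustrative only.

```python
TEMPLATE = """
1. Use ONLY the context below.
2. If unsure, say "I don’t know".

Context: {context}

Question: {question}

Answer:
"""


def fill_prompt(context: str, question: str) -> str:
    """Substitute both placeholders using plain str.format."""
    return TEMPLATE.format(context=context, question=question)


filled = fill_prompt(
    context="RAGAS is a framework for reference-free evaluation of RAG pipelines.",
    question="What is RAGAS?",
)
print(filled)
```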

## Step 6: Assemble the RAG Chain
Combine the prompt, the LLM, and the retriever into a single pipeline.

- The chain passes the retrieved chunks to the model as context, so answers can draw on the PDF rather than on the model’s memory alone.

In [34]:
from langchain.chains import LLMChain, StuffDocumentsChain, RetrievalQA

# Chain 1: Generate answers  
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)  

# Chain 2: Combine document chunks  
document_prompt = PromptTemplate(  
    template="Context:\ncontent:{page_content}\nsource:{source}",  
    input_variables=["page_content", "source"]  
)
document_variable_name = "context"

# Final RAG pipeline  
qa = RetrievalQA(  
    combine_documents_chain=StuffDocumentsChain(  
        llm_chain=llm_chain,  
        document_prompt=document_prompt,
        document_variable_name=document_variable_name
    ),  
    retriever=retriever  
)
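The "stuff" step can be pictured as plain string assembly: each retrieved chunk is rendered with the document prompt, and the results are joined into the single `context` string that the QA prompt expects. The sketch below mimics that with a hypothetical `Doc` dataclass standing in for LangChain's document type; the blank-line separator is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


DOCUMENT_TEMPLATE = "Context:\ncontent:{page_content}\nsource:{source}"


def stuff_documents(docs: list[Doc], separator: str = "\n\n") -> str:
    """Render each document with the template and join them into one context string."""
    rendered = [
        DOCUMENT_TEMPLATE.format(page_content=d.page_content, source=d.metadata.get("source", ""))
        for d in docs
    ]
    return separator.join(rendered)


sample_docs = [
    Doc("First retrieved chunk.", {"source": "./documents/2309.15217v1.pdf"}),
    Doc("Second retrieved chunk.", {"source": "./documents/2309.15217v1.pdf"}),
]
context = stuff_documents(sample_docs)
print(context)
```

Because every chunk is pasted into one prompt, the number of retrieved chunks directly affects prompt length, which matters for a small model.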

In [35]:
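user_input = input("Enter your question: ")
# Calling the chain directly is deprecated; invoke() takes the question under the "query" key
response = qa.invoke({"query": user_input})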

Enter your question:  Define LLMs, RAG, and Agents


In [36]:
print(response["result"])

<think>
Okay, so I need to define LLMs, RAG, and Agents based on the context provided. Let me start by reading through each section carefully.

First, there's a mention of Amos Azaria and Tom Mitchell in 2023. Their work is about the internal state of an LLM when lying. The source is from ./documents/2309.15217v1.pdf. I know that an LLM stands for Large Language Model, which is a type of AI designed to understand and generate human language.

Next, there's information about RAGAS. It says LLMs aren't great at answering the "What has happened" questions, but they can answer other types of questions. The source links to a GitHub repository where LLMs are trained on datasets. They're trained and then used in various ways, like scoring systems. I remember hearing about RAGAS before; it's a framework that allows multiple models to share the same evaluation metrics.

Then there's a section talking about the origins of LLMs. It mentions that these models have "exploding gradients" and are "ex

In [37]:
display(Markdown(response["result"]))

<think>
Okay, so I need to define LLMs, RAG, and Agents based on the context provided. Let me start by reading through each section carefully.

First, there's a mention of Amos Azaria and Tom Mitchell in 2023. Their work is about the internal state of an LLM when lying. The source is from ./documents/2309.15217v1.pdf. I know that an LLM stands for Large Language Model, which is a type of AI designed to understand and generate human language.

Next, there's information about RAGAS. It says LLMs aren't great at answering the "What has happened" questions, but they can answer other types of questions. The source links to a GitHub repository where LLMs are trained on datasets. They're trained and then used in various ways, like scoring systems. I remember hearing about RAGAS before; it's a framework that allows multiple models to share the same evaluation metrics.

Then there's a section talking about the origins of LLMs. It mentions that these models have "exploding gradients" and are "extremely large." They're also used in automated retrieval for questions after 1 year. I think this refers to how LLMs can retrieve information from documents, especially since they've been trained on a vast amount of data.

Finally, the context discusses different aspects of evaluating text generation systems. It mentions RAGS as an evaluation framework that integrates with LLMs. The idea is to assess facts versus fictions by using prompts. There's also talk about detecting hallucinations—when answers might be misleading or incorrect—and comparing faithfulness scores. This ties into how RAGA works, ensuring accurate evaluations.

Putting it all together: LLMs are AI models that can process language. RAGAS is a system that evaluates text responses, considering factual vs. fictional answers and using prompts to assess accuracy. Agents likely refer to the users or systems that interact with these models.
</think>

LLMs (Large Language Models) are advanced AI systems designed to generate and understand human language, capable of processing vast amounts of data and context.

RAGAS is a framework for evaluating text responses, focusing on factual versus fictional answers and using prompts to assess accuracy. It integrates with LLMs to ensure evaluations reflect truthfulness and correctness.

Agents in this context likely refer to systems or users that interact with these models, leveraging their capabilities for tasks like retrieval and evaluation of text.
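As the output above shows, DeepSeek R1 prefixes its answer with a `<think>...</think>` reasoning block. If only the final answer should be shown to users, the block can be separated out with a regular expression. This is a minimal sketch; `split_reasoning` is a hypothetical helper, and it deliberately leaves the text untouched when the closing tag is missing, as in a truncated response.

```python
import re

# Non-greedy match so only the first complete <think>...</think> block is captured.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' if no complete <think> block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(0)[len("<think>"):-len("</think>")].strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

For example, `reasoning, answer = split_reasoning(response["result"])` would let the notebook display `answer` alone.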