<a href="https://colab.research.google.com/github/claudio1975/Medium-blog/blob/master/DeepSeek_RAG/RAG_DeepSeek_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Naive RAG with DeepSeek and LangChain

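This notebook builds a simple RAG (Retrieval-Augmented Generation) pipeline with LangChain and a DeepSeek model from Hugging Face. The model loaded below is `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, a small dense model distilled from [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/DeepSeek-R1).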


**RAG process**

A RAG (Retrieval-Augmented Generation) system combines a retrieval system with an LLM. The corpus is first split into chunks and embedded into a vector database. For each question, the system retrieves the most relevant chunks from that database, then passes them to an LLM downloaded from Hugging Face, which generates an answer based on the retrieved text.
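
To make the two stages concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is an illustration only: the word-count "embedding", the function names, and the three-sentence corpus are assumptions made for this example, not what the notebook runs. The notebook itself uses a real embedding model, FAISS, and an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, corpus, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question, as a RAG prompt does."""
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\n"

# Hypothetical three-passage corpus for illustration.
corpus = [
    "DeepSeek-R1-Zero applies RL directly to the base model without any SFT data.",
    "FAISS offers efficient similarity search over dense vectors.",
    "DeepSeek-R1 applies RL starting from a checkpoint fine-tuned with cold-start data.",
]
question = "Which model applies RL directly to the base model?"
top = retrieve(question, corpus, k=2)
prompt_text = build_prompt(question, top)
print(prompt_text)  # this string is what would be sent to the LLM
```

In a real pipeline the last step hands `prompt_text` to the language model; everything before that is retrieval.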


# Prepare Workspace

In [None]:
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf &> /dev/null

In [None]:
!pip install -U langchain-huggingface &>/dev/null

In [None]:
!pip install -q langchain langchain-community &> /dev/null

In [None]:
# Loaders, vector store, and core LangChain building blocks
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Hugging Face integrations: embeddings and the pipeline wrapper
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
# Model, tokenizer, and generation pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


## Upload the data


In [None]:
# Load content from local PDFs
loader = PyPDFLoader("./2501.12948v1.pdf")
docs = loader.load()

In [None]:
# Example of a hand-built Document (page content plus metadata); it is only displayed, not stored
Document(page_content="DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.",
         metadata={
             'document_id' : '2501.12948v1',
             'document_source' : "ArXiv",
             'document_create_time' : "2025"
         })

Document(metadata={'document_id': '2501.12948v1', 'document_source': 'ArXiv', 'document_create_time': '2025'}, page_content='DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.')

In [None]:
print("\nPage Content: ", docs[0].page_content)
print("\nMeta Data: ", docs[0].metadata)


Page Content:  DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
DeepSeek-AI
research@deepseek.com
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without super-
vised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing
reasoning behaviors. However, it encounters challenges such as poor readability, and language
mixing. To address these issues and further enhance reasoning performance, we introduce
DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-
R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the
research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models
(1.5B, 7B, 8B, 14B, 32B, 70B) distilled from Dee

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)

In [None]:
# len(docs) is the page count (22 for this PDF); the chunks are in chunked_docs
print("PDF split into {0} chunks.".format(len(chunked_docs)))
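
The effect of `chunk_size` and `chunk_overlap` can be seen with a plain sliding window over characters. This is a simplified sketch, not the splitter's actual algorithm: `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph and line breaks, so its chunks usually end at more natural boundaries. The helper name `sliding_chunks` and the synthetic sample text are assumptions for illustration.

```python
def sliding_chunks(text, chunk_size=512, chunk_overlap=30):
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

# Synthetic 1200-character text, used only to show the window arithmetic.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the last 30 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.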


## Embeddings + Retriever

For embeddings, I use `HuggingFaceEmbeddings` with the [`BAAI/bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) embedding model.

To create the vector database, I use `FAISS`, a library developed by Facebook AI for efficient similarity search and clustering of dense vectors. Each chunk is embedded once and stored in the index; at query time the question is embedded with the same model and compared against the stored vectors.

In [None]:
db = FAISS.from_documents(chunked_docs,
                          HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5'))

In [None]:
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)
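
Conceptually, a similarity search with `k=3` returns the three stored vectors closest to the query vector. The sketch below does this by brute force over a handful of 2-D points. It assumes Euclidean distance as the metric, and the function name `top_k` and the sample vectors are made up for illustration; the real index works on high-dimensional embeddings and may use a different metric or an approximate search.

```python
import math

def top_k(query, vectors, k=3):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Hypothetical 2-D "embeddings"; real ones have hundreds of dimensions.
vectors = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 2.0),
    (3.0, 3.0),
    (0.5, 0.5),
]
hits = top_k((0.4, 0.4), vectors, k=3)
print(hits)
```

The indices in `hits` play the role of the chunk ids that the retriever maps back to `Document` objects.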

## Load the model

In [None]:
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

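## Set up the LLM chain

First, I create a `text-generation` pipeline from the loaded model and its tokenizer, and wrap it in `HuggingFacePipeline` so that LangChain can call it.

Next, I create a prompt template and pipe it into the LLM and an output parser to form `llm_chain`.

Then, I combine `llm_chain` with the retriever to create the RAG chain.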

In [None]:
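# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,         # low temperature keeps sampling close to the most likely tokens
    do_sample=True,          # sample from the distribution instead of greedy decoding
    repetition_penalty=1.1,  # mildly discourage repeated tokens
    return_full_text=True,   # the returned text contains the prompt followed by the generation
    max_new_tokens=500,      # upper bound on the number of generated tokens
)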

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
You are a professional AI researcher, give an help in study. Use the following context to answer the question using information provided by the paper:

{context}

Question: {question}
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


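# The question goes to the retriever and, unchanged, to the prompt.
# The retriever returns a list of Document objects, which is inserted into {context} as-is.
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)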



Device set to use cuda:0
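
Because the retriever output is placed into `{context}` without formatting, the prompt receives the printed form of the `Document` objects, including ids and metadata. A common variation is to join only the page text before filling the template. The sketch below shows the idea without loading any model. `FakeDoc` is a stand-in for a LangChain `Document`, and `format_docs` and the shortened template are hypothetical names introduced for this example.

```python
from collections import namedtuple

# Stand-in for a LangChain Document (assumption: only page_content and metadata matter here).
FakeDoc = namedtuple("FakeDoc", ["page_content", "metadata"])

def format_docs(docs):
    """Join the text of the retrieved documents, leaving out ids and metadata."""
    return "\n\n".join(doc.page_content for doc in docs)

# Shortened, hypothetical template with the same two placeholders as the notebook's prompt.
template = "Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}\n"

retrieved = [
    FakeDoc("DeepSeek-R1-Zero applies RL directly to the base model.", {"page": 4}),
    FakeDoc("DeepSeek-R1 starts from a checkpoint fine-tuned on cold-start data.", {"page": 4}),
]
filled = template.format(
    context=format_docs(retrieved),
    question="What are DeepSeek-R1-Zero and DeepSeek-R1?",
)
print(filled)

# Possible wiring into the chain (not executed here, shown as an assumption):
# rag_chain = (
#     {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | llm_chain
# )
```

With this change the model would see plain passages instead of object representations, which leaves more of the context window for actual text.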


# Questions

In [None]:
question = "What are the advantages of using reinforcement learning directly on a base model, as demonstrated by DeepSeek-R1-Zero?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)


You are a professional AI researcher, give an help in study. Use the following context to answer the question using information provided by the paper:

[Document(id='c6471a4b-9c42-46fa-8800-e1c9641e8926', metadata={'source': './2501.12948v1.pdf', 'page': 4, 'page_label': '5'}, page_content='the inclusion of a small amount of cold-start data. In the following sections, we present: (1)\nDeepSeek-R1-Zero, which applies RL directly to the base model without any SFT data, and\n(2) DeepSeek-R1, which applies RL starting from a checkpoint fine-tuned with thousands of\nlong Chain-of-Thought (CoT) examples. 3) Distill the reasoning capability from DeepSeek-R1 to\nsmall dense models.\n2.2. DeepSeek-R1-Zero: Reinforcement Learning on the Base Model'), Document(id='47ec7b12-9b93-49b2-a957-1d74f702f136', metadata={'source': './2501.12948v1.pdf', 'page': 3, 'page_label': '4'}, page_content='1.1. Contributions\nPost-Training: Large-Scale Reinforcement Learning on the Base Model\n• We directly apply 
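
The printed result starts with the filled prompt because the pipeline was created with `return_full_text=True`. One option is to set that flag to `False`; another is to cut the prompt off afterwards. The helper below is a minimal sketch of the second approach. It assumes the text `Question: <question>` appears in the output exactly as in the template, and `extract_answer` and the sample strings are hypothetical.

```python
def extract_answer(full_text, question):
    """Return the text after the first 'Question: <question>' marker, or the whole text if it is absent."""
    marker = "Question: " + question
    _, found, tail = full_text.partition(marker)
    return tail.strip() if found else full_text.strip()

# Made-up output with the same layout as the prompt template: context, question, then generated text.
sample_question = "What is cold-start data?"
sample_output = (
    "Use the following context...\n\nCONTEXT TEXT\n\n"
    "Question: " + sample_question + "\nANSWER TEXT"
)
answer = extract_answer(sample_output, sample_question)
print(answer)
```

In the notebook this would be called as `extract_answer(result, question)` after `rag_chain.invoke(question)`.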

In [None]:
question = "What is cold-start data and why is it used in DeepSeek-R1 training?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)


You are a professional AI researcher, give an help in study. Use the following context to answer the question using information provided by the paper:

[Document(id='6154be67-45be-4a07-87f7-2ee0fde4236f', metadata={'source': './2501.12948v1.pdf', 'page': 8, 'page_label': '9'}, page_content='models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-\nZero outputs in a readable format, and refining the results through post-processing by human\nannotators.\nIn this work, we collect thousands of cold-start data to fine-tune the DeepSeek-V3-Base as\nthe starting point for RL. Compared to DeepSeek-R1-Zero, the advantages of cold start data\n9'), Document(id='4505bf93-bd96-4a96-9bf8-a8e99c283b80', metadata={'source': './2501.12948v1.pdf', 'page': 15, 'page_label': '16'}, page_content='learning. DeepSeek-R1-Zero represents a pure RL approach without relying on cold-start\ndata, achieving strong performance across various tasks. DeepSeek-R1 is more powerful,\n

In [None]:
question = "What are DeepSeek-R1-Zero and DeepSeek-R1?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)


You are a professional AI researcher, give an help in study. Use the following context to answer the question using information provided by the paper:

[Document(id='4505bf93-bd96-4a96-9bf8-a8e99c283b80', metadata={'source': './2501.12948v1.pdf', 'page': 15, 'page_label': '16'}, page_content='learning. DeepSeek-R1-Zero represents a pure RL approach without relying on cold-start\ndata, achieving strong performance across various tasks. DeepSeek-R1 is more powerful,\nleveraging cold-start data alongside iterative RL fine-tuning. Ultimately, DeepSeek-R1 achieves\nperformance comparable to OpenAI-o1-1217 on a range of tasks.\nWe further explore distillation the reasoning capability to small dense models. We use'), Document(id='9dd48cae-8619-445f-a962-e393bb52a1a8', metadata={'source': './2501.12948v1.pdf', 'page': 7, 'page_label': '8'}, page_content='Figure 3 |The average response length of DeepSeek-R1-Zero on the training set during the RL\nprocess. DeepSeek-R1-Zero naturally learns to s