<a href="https://colab.research.google.com/github/claudio1975/Medium-blog/blob/master/DeepSeek_RAG/RAG_DeepSeek_v4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Naive RAG with DeepSeek and LangChain

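This notebook builds a simple RAG (Retrieval-Augmented Generation) pipeline with LangChain and a DeepSeek model from Hugging Face. The model loaded later in the notebook is `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, a small model distilled from [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/DeepSeek-R1).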


**RAG process**

A RAG (Retrieval-Augmented Generation) system combines a retrieval system with an LLM. It first retrieves the documents most relevant to a question from a corpus using a vector database, then passes them to an LLM (here, a model downloaded from Hugging Face), which generates an answer based on the retrieved documents.
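
The two steps can be illustrated with a toy sketch that needs no model at all. Everything here is an assumption made for illustration: the three-sentence corpus, the word-overlap scoring (a stand-in for embedding similarity), and the helper names `retrieve` and `build_prompt`. The LLM call itself is left out; the sketch stops at the prompt that would be sent to it.

```python
import re

# Hypothetical mini-corpus standing in for the chunks of a real document.
corpus = [
    "DeepSeek-R1-Zero is trained with reinforcement learning without supervised fine-tuning.",
    "FAISS is a library for similarity search over dense vectors.",
    "LangChain provides building blocks for LLM applications.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Rank documents by the number of words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Put the retrieved text in front of the question, as a RAG prompt does."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

question = "How is DeepSeek-R1-Zero trained?"
top_docs = retrieve(question, corpus, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In the rest of the notebook, the word-overlap scoring is replaced by embedding similarity in a vector store, and the prompt is passed to a real model.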


# Prepare Workspace

In [None]:
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf &> /dev/null

In [None]:
!pip install -U langchain-huggingface &>/dev/null

In [None]:
!pip install -q langchain langchain-community &> /dev/null

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
# Loaders and vector stores live in langchain-community
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
# The text splitter ships in langchain-text-splitters (a dependency of langchain)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline


## Load the data


In [None]:
# Load content from local PDFs
loader = PyPDFLoader("./2501.12948v1.pdf")
docs = loader.load()

In [None]:
# Example of building a Document by hand (not assigned, so it is not used later):
Document(page_content="DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.",
         metadata={
             'document_id' : '2501.12948v1',
             'document_source' : "ArXiv",
             'document_create_time' : "2025"
         })

Document(metadata={'document_id': '2501.12948v1', 'document_source': 'ArXiv', 'document_create_time': '2025'}, page_content='DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.')

In [None]:
print("\nPage Content: ", docs[0].page_content)
print("\nMeta Data: ", docs[0].metadata)


Page Content:  DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
DeepSeek-AI
research@deepseek.com
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without super-
vised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing
reasoning behaviors. However, it encounters challenges such as poor readability, and language
mixing. To address these issues and further enhance reasoning performance, we introduce
DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-
R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the
research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models
(1.5B, 7B, 8B, 14B, 32B, 70B) distilled from Dee

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)
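
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of overlapping character windows. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line breaks, while the hypothetical `sliding_chunks` helper below cuts at fixed offsets.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last three characters of the previous one, so a sentence cut at a chunk boundary still appears partly in both neighbours.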

In [None]:
# PyPDFLoader returns one Document per page, so len(docs) is the page count
print("PDF loaded - You have {0} pages.".format(len(docs)))

PDF loaded - You have 22 pages.


## Embeddings + Retriever

For embeddings, I use `HuggingFaceEmbeddings` with the [`BAAI/bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) embedding model.

To create the vector database, I use `FAISS`, a library developed by Facebook AI. This library offers efficient similarity search and clustering of dense vectors.
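
Conceptually, similarity search over dense vectors means finding the stored vectors closest to a query vector. The sketch below does this by brute force in plain Python. It only illustrates the idea and is not FAISS code: the vectors are made-up 3-dimensional examples, Euclidean distance is assumed as the metric, and `l2_distance` and `top_k` are hypothetical helper names.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k):
    """Return the indices of the k vectors closest to the query, nearest first."""
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return order[:k]

# Made-up 3-dimensional "embeddings"; bge-base-en-v1.5 vectors have 768 dimensions.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]
nearest = top_k(query, vectors, k=2)
print(nearest)
```

A library such as FAISS performs the same kind of lookup with optimized index structures, which matters once the corpus grows beyond a few thousand chunks.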

In [None]:
db = FAISS.from_documents(chunked_docs,
                          HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5'))


In [None]:
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)

## Load the model

In [None]:
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


## Setup the RAG

First, I create a text-generation pipeline using the loaded model and its tokenizer.

Next, I create a prompt template and chain it with the LLM to form `llm_chain`.

Then, I combine `llm_chain` with the retriever to create the RAG chain.

In [None]:
# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=500,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
You are a professional AI researcher helping with a study. Use the following context to answer the question, relying on the information provided by the paper:

{context}

Question: {question}
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)



Device set to use cuda:0
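
One detail of the chain above: the retriever returns a list of `Document` objects, and as far as I can tell the prompt template fills `{context}` with their string representation, metadata included. A common pattern is to join only the text of the chunks first. The sketch below shows such a helper; the name `format_docs` and the stand-in objects are assumptions for illustration, since any object with a `page_content` attribute works.

```python
from types import SimpleNamespace

def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins for LangChain Document objects (only page_content is read here).
fake_docs = [
    SimpleNamespace(page_content="First chunk.", metadata={"page": 0}),
    SimpleNamespace(page_content="Second chunk.", metadata={"page": 1}),
]
context = format_docs(fake_docs)
print(context)
```

In the chain, this could be wired in by replacing the first step with `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, so the model sees plain text instead of object representations.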


# Questions

In [None]:
question = "What are the advantages of using reinforcement learning directly on a base model, as demonstrated by DeepSeek-R1-Zero?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)

</think>

The advantages of using reinforcement learning directly on a base model, as demonstrated by DeepSeek-R1-Zero, include:

1. **Direct Application of RL**: DeepSeek-R1-Zero applies reinforcement learning directly to the base model without requiring any pre-trained or fine-tuned data.

2. **Chain-of-Thought (CoT) Utilization**: The model leverages CoT techniques inherently during its training process, allowing it to develop sophisticated problem-solving abilities through exploration and refinement.

3. **Self-Learning Capabilities**: By applying RL directly, DeepSeek-R1-Zero enables it to learn and improve its reasoning and decision-making processes independently, enhancing its overall performance.

4. **Enhanced Problem-Solving Through RL**: The integration of reinforcement learning within the base model allows for dynamic and adaptive learning, where the model can adapt its strategies based on feedback and outcomes observed during training.

These features collectively demonstr
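
The answer above starts with a `</think>` line, which appears to be the closing tag of the model's reasoning section. If only the final answer is wanted, a small post-processing step can drop everything up to that tag. This is a minimal sketch; `strip_reasoning` is a hypothetical helper and not part of LangChain or Transformers.

```python
def strip_reasoning(text, end_tag="</think>"):
    """Return the text after the last end_tag, or the whole text if the tag is absent."""
    _, found, answer = text.rpartition(end_tag)
    return answer.strip() if found else text.strip()

raw = "</think>\n\nThe advantages include direct application of RL."
print(strip_reasoning(raw))
```

It could be applied to the chain output, for example `print(strip_reasoning(result))`.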

In [None]:
question = "What is cold-start data and why is it used in DeepSeek-R1 training?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)

</think>

Cold-start data refers to initial training data that is scarce or not available during the initial phase of a machine learning model's development. It is often used when there is limited data available to train a model effectively.

In the context of DeepSeek-R1 training:
- **Cold-start data** is utilized to initialize the model's knowledge base.
- The model starts with minimal information but gradually gains expertise through repeated training cycles.
- This approach helps the model adapt better to new challenges and improve its overall performance over time.

By leveraging cold-start data, DeepSeek-R1 can efficiently learn and refine its capabilities, making it more effective at solving complex problems.


In [None]:
question = "What are DeepSeek-R1-Zero and DeepSeek-R1?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)

</think>

DeepSeek-R1-Zero and DeepSeek-R1 are both models developed by DeepSeek, but they represent different versions or iterations of the same system.

- **DeepSeek-R1**:
  - **Description**: It is a language model developed by DeepSeek, designed for tasks such as text generation, question answering, and problem-solving.
  - **Key Features**:
    - **Language Model**: Utilizes transformer-based architectures to handle sequence modeling tasks.
    - **Inference Speed**: Achieves high inference speeds, making it suitable for real-world applications.
    - **Task Capabilities**: Supports a wide range of tasks including natural language understanding, generation, and reasoning.
    - **Performance**: Demonstrates strong performance across multiple domains, as evidenced by its results in benchmarks like ARXIV.

- **DeepSeek-R1-Zero**:
  - **Description**: This version introduces a "pure RL" approach, which likely refers to reinforcement learning techniques optimized for sequential decisi