<h1>Building a small chatbot agent with DeepSeek on a Mac</h1>
<h3>Final project by Fausto Morales</h3>
<h3>ffmogbaj</h3>

We need to install several tools, starting with Ollama.

The installation steps are at: https://ollama.com/download

Once Ollama is installed, we download the DeepSeek model locally: https://ollama.com/library/deepseek-r1

Because our computer is small, we pick the smallest model: DeepSeek-R1-Distill-Qwen-1.5B

We run the following command in the terminal, which downloads the model on first use and then starts it:
```bash
ollama run deepseek-r1:1.5b
```
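
To confirm from Python that the model was downloaded, one option is to query Ollama's local HTTP API. The sketch below assumes the server is listening on its default address, `http://localhost:11434`, and that `/api/tags` returns a JSON object with a `models` list. The helper names are illustrative and not part of the project.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if the server runs elsewhere.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(tags_payload):
    """Return the model names listed in an /api/tags response payload."""
    return [entry.get("name", "") for entry in tags_payload.get("models", [])]


def is_model_installed(tags_payload, model_name):
    """Check whether `model_name` appears among the locally installed models."""
    return model_name in installed_models(tags_payload)


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=5):
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return json.load(reply)


# Example with a hand-written payload, so it runs without a server:
sample = {"models": [{"name": "deepseek-r1:1.5b"}]}
print(is_model_installed(sample, "deepseek-r1:1.5b"))
```

With Ollama running, `is_model_installed(fetch_tags(), "deepseek-r1:1.5b")` performs the same check against the real server.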

We also install the system dependencies needed to run the code in this notebook:
```bash
brew install poppler
brew install tesseract
```

We also add the following setting to the shell configuration file (`~/.zshrc`, since zsh is the default shell on macOS):
```bash
nano ~/.zshrc
```
and add this line (then run `source ~/.zshrc` or open a new terminal so it takes effect):
```bash
export OCR_AGENT="unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract"
```
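
Before running the notebook it can help to verify this setup from Python. The following is a minimal sketch using only the standard library. The helper name `check_setup` is hypothetical, and `pdftoppm` is assumed here as a representative binary installed by Poppler.

```python
import os
import shutil


def check_setup(binaries=("tesseract", "pdftoppm"), env_vars=("OCR_AGENT",)):
    """Report which required binaries are on PATH and which variables are set."""
    report = {}
    for name in binaries:
        # shutil.which returns the executable's path, or None if it is not found
        report[name] = shutil.which(name) is not None
    for name in env_vars:
        report[name] = bool(os.environ.get(name))
    return report


for item, ok in check_setup().items():
    print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Note that a Jupyter kernel started before `~/.zshrc` was edited will not see the new variable until it is restarted from a fresh shell.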

And we install the Python packages:
```bash
pip install ollama langchain chromadb gradio langchain-chroma langchain-ollama pypdf
pip install -U langchain-community unstructured
pip install pdfminer.six pi_heif unstructured_inference pdf2image python-dateutil pytesseract unstructured_pytesseract
```


In [1]:
import ollama
import re
import gradio as gr

# Name of the model served by Ollama
model_name = "deepseek-r1:1.5b"

# Generate a reply with the model
def generate_text(prompt):
    # Use the `chat` function to send a single user message to the model
    response = ollama.chat(model=model_name, messages=[{"role": "user", "content": prompt}])
    # Print the full response to inspect its structure
    print("Full model response:", response)
    # DeepSeek-R1 wraps its reasoning in <think>...</think>; strip it and keep only the answer
    cleaned_response = re.sub(r'<think>.*?</think>', '', response['message']['content'], flags=re.DOTALL).strip()
    return cleaned_response

# Create the interface
interface = gr.Interface(fn=generate_text, inputs="text", outputs="text")
# Launch the interface
interface.launch()




Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




Example result:
<img src="./img/Result model 1.png" alt="Example image" width="800">
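
DeepSeek-R1 emits its chain of reasoning between `<think>` and `</think>` before the final answer, which is why the code above filters the reply with a regular expression. A standalone sketch of that cleanup, with a hypothetical helper name and a made-up sample reply:

```python
import re

# Non-greedy match so that each <think>...</think> block is removed separately;
# re.DOTALL lets "." span the newlines inside the reasoning text.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text):
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user greets me.\nI should greet back.\n</think>\n\nHello! How can I help?"
print(strip_think(raw))  # → Hello! How can I help?
```

Without `re.DOTALL` the pattern would not match reasoning that spans several lines, and the raw `<think>` text would leak into the interface.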


Improvement: generative AI with RAG (retrieval-augmented generation).

Example: loading a PDF of the Wikipedia page for CONDUSEF and indexing it with embeddings (vectors).

The PDF was downloaded from the following URL: https://es.wikipedia.org/w/index.php?title=Especial:DownloadAsPdf&page=Comisión_Nacional_para_la_Protección_y_Defensa_de_los_Usuarios_de_Servicios_Financieros&action=show-download-screen
and placed under the notebook's path (the code reads it from `./pdf/1.pdf`).

<img src="https://cdn.getmidnight.com/171293841d3fdd4af2e12426ce202ac9/2024/02/User-Query-1.png" alt="Screenshot" style="zoom:25%;" />
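
The idea behind the retrieval step can be illustrated without any model: each text chunk is stored with a vector, the question is turned into a vector too, and the chunks whose vectors are closest to it are passed to the LLM as context. The sketch below uses made-up three-dimensional vectors and cosine similarity. Real embeddings come from a model and have far more dimensions, and all names here are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _vector in ranked[:k]]


# Toy "vector database": (chunk text, hand-made embedding)
store = [
    ("chunk about consumer protection", [0.9, 0.1, 0.0]),
    ("chunk about history", [0.1, 0.9, 0.0]),
    ("chunk about location", [0.0, 0.2, 0.9]),
]

# Hand-made embedding of a question that is mostly about consumer protection
query_vector = [0.8, 0.2, 0.1]
print(retrieve(query_vector, store, k=1))
# → ['chunk about consumer protection']
```

A vector store such as Chroma does the same ranking, but over vectors produced by an embedding model and with an index that avoids comparing against every chunk.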


In [2]:
# Set the protobuf implementation before importing libraries that load protobuf
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Imports
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Jupyter-specific imports
from IPython.display import display, Markdown

In [3]:
# Load PDF
local_path = "./pdf/1.pdf"
if os.path.exists(local_path):
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
    print(f"PDF loaded successfully: {local_path}")
else:
    # Stop here: the rest of the cell needs `data`
    raise FileNotFoundError(f"PDF not found: {local_path}")

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)
print(f"Text split into {len(chunks)} chunks")

# Create vector database
db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model=model_name),
    collection_name="local-rag"
)

print("Vector database created successfully")

# Set up LLM and retrieval
llm = ChatOllama(model=model_name)

# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 2
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt template
template = """Answer the question concisely, based ONLY on the following context:
{context}
Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)

# Create chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

def chat_with_pdf(question):
    """
    Chat with the PDF using the RAG chain and return the result as plain text.
    """
    # Run the RAG chain on the user's question
    result = chain.invoke(question)
    # Remove the model's reasoning, which DeepSeek-R1 emits between <think> and </think>
    cleaned_result = re.sub(r'<think>.*?</think>', '', result, flags=re.DOTALL).strip()

    # Return plain text (instead of using display(Markdown(...)))
    return cleaned_result


# Create the interface
interface = gr.Interface(fn=chat_with_pdf, inputs="text", outputs="text")
# Launch the interface
interface.launch()

PDF loaded successfully: ./pdf/1.pdf
Text split into 4 chunks
Vector database created successfully
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
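
`MultiQueryRetriever` asks the LLM for alternative phrasings of the question, runs a search for each one, and merges the results without duplicates. A rough sketch of the merge step, on plain strings and with hypothetical names (the library works on `Document` objects, and its internals may differ):

```python
def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for chunk in results:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Results of searches for several phrasings of one question (toy data)
per_query_results = [
    ["chunk A", "chunk B"],
    ["chunk B", "chunk C"],
    ["chunk A", "chunk D"],
]
print(unique_union(per_query_results))
# → ['chunk A', 'chunk B', 'chunk C', 'chunk D']
```

The benefit is recall: a chunk that a single phrasing misses can still be found by one of the others.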




Example result:
<img src="./img/Result model 2.png" alt="Example image" width="800">
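
The `chunk_size` and `chunk_overlap` parameters used above control how the document is cut before it is embedded. `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences. The simplified sketch below ignores that and uses a plain sliding window, only to show how the overlap makes consecutive chunks share text. The function name is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size chunks where each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

In this simplified model, `chunk_size=1000` with `chunk_overlap=200` means each chunk starts 800 characters after the previous one, so a sentence cut at a chunk boundary still appears whole in one of the two neighbours.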

RAG with more than one file.

Three PDFs were placed in the ./pdf folder: the Wikipedia page for CONDUSEF, the Wikipedia page for Banamex, and an arbitrary page from Banco de México about Article 48 Bis 2 of the Ley de Instituciones de Crédito.

In [4]:
import argparse
import os
import shutil
import sys
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings


# Name of the model served by Ollama
model_name = "deepseek-r1:1.5b"

CHROMA_PATH = "./chroma"
DATA_PATH = "./pdf/"


def get_embedding_function():
    embeddings = OllamaEmbeddings(model=model_name)
    # embeddings = OllamaEmbeddings(model="nomic-embed-text")
    return embeddings

def load_documents():
    document_loader = PyPDFDirectoryLoader(DATA_PATH)
    return document_loader.load()


def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)


def add_to_chroma(chunks: list[Document]):
    # Load the existing database.
    db = Chroma(
        persist_directory=CHROMA_PATH, embedding_function=get_embedding_function()
    )

    # Calculate Page IDs.
    chunks_with_ids = calculate_chunk_ids(chunks)

    # Add or Update the documents.
    existing_items = db.get(include=[])  # IDs are always included by default
    existing_ids = set(existing_items["ids"])
    print(f"Number of existing documents in DB: {len(existing_ids)}")

    # Only add documents that don't exist in the DB.
    new_chunks = []
    for chunk in chunks_with_ids:
        if chunk.metadata["id"] not in existing_ids:
            new_chunks.append(chunk)

    if len(new_chunks):
        print(f"👉 Adding new documents: {len(new_chunks)}")
        new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
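        db.add_documents(new_chunks, ids=new_chunk_ids)
        # Chroma >= 0.4 persists automatically; persist() only exists on older wrappers.
        if hasattr(db, "persist"):
            db.persist()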
    else:
        print("✅ No new documents to add")


def calculate_chunk_ids(chunks):

    # This will create IDs like "data/monopoly.pdf:6:2"
    # Page Source : Page Number : Chunk Index

    last_page_id = None
    current_chunk_index = 0

    for chunk in chunks:
        source = chunk.metadata.get("source")
        page = chunk.metadata.get("page")
        current_page_id = f"{source}:{page}"

        # If the page ID is the same as the last one, increment the index.
        if current_page_id == last_page_id:
            current_chunk_index += 1
        else:
            current_chunk_index = 0

        # Calculate the chunk ID.
        chunk_id = f"{current_page_id}:{current_chunk_index}"
        last_page_id = current_page_id

        # Add it to the page meta-data.
        chunk.metadata["id"] = chunk_id

    return chunks


def clear_database():
    if os.path.exists(CHROMA_PATH):
        shutil.rmtree(CHROMA_PATH)


# Check if the database should be cleared (using the --reset flag).
# parse_known_args ignores the extra arguments that Jupyter passes to the kernel.
parser = argparse.ArgumentParser()
parser.add_argument("--reset", action="store_true", help="Reset the database.")
args, unknown = parser.parse_known_args(sys.argv[1:])

if args.reset:
    print("✨ Clearing Database")
    clear_database()

# Create (or update) the data store.
documents = load_documents()
chunks = split_documents(documents)
add_to_chroma(chunks)


Ignoring wrong pointing object 8 0 (offset 0)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 11 0 (offset 0)
Ignoring wrong pointing object 13 0 (offset 0)
Ignoring wrong pointing object 76 0 (offset 0)


Number of existing documents in DB: 50
✅ No new documents to add
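
The ID scheme is what makes the cell safe to run repeatedly: a second run finds the same IDs in the database and adds nothing, as the output above shows. A toy sketch of the same logic, using plain dictionaries instead of `Document` objects and a set instead of Chroma (all names and file names are made up):

```python
def assign_chunk_ids(chunks):
    """Give each chunk an ID of the form 'source:page:index-within-page'."""
    last_page_id = None
    index = 0
    for chunk in chunks:
        page_id = f"{chunk['source']}:{chunk['page']}"
        # Consecutive chunks from the same page get increasing indexes
        index = index + 1 if page_id == last_page_id else 0
        chunk["id"] = f"{page_id}:{index}"
        last_page_id = page_id
    return chunks


def select_new(chunks, existing_ids):
    """Keep only the chunks whose ID is not stored yet."""
    return [c for c in chunks if c["id"] not in existing_ids]


chunks = assign_chunk_ids([
    {"source": "pdf/a.pdf", "page": 0},
    {"source": "pdf/a.pdf", "page": 0},
    {"source": "pdf/a.pdf", "page": 1},
])
print([c["id"] for c in chunks])
# → ['pdf/a.pdf:0:0', 'pdf/a.pdf:0:1', 'pdf/a.pdf:1:0']

stored = {"pdf/a.pdf:0:0", "pdf/a.pdf:0:1"}
print([c["id"] for c in select_new(chunks, stored)])
# → ['pdf/a.pdf:1:0']
```

One limitation of this scheme is that the ID depends only on position, so editing the text of an existing page does not produce a new ID and the stored chunk is not refreshed.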


In [5]:
from langchain.prompts import ChatPromptTemplate
from langchain_community.llms.ollama import Ollama
import re

PROMPT_TEMPLATE = """You are an AI language model assistant. 
Your task is to concisely answer the question based ONLY on the following context
{context}.
Question: {question}
"""


def query_rag(query_text):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    vector_db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB for the chunks closest to the query.
    results = vector_db.similarity_search_with_score(query_text, k=5)

    # Join the retrieved chunks into the context that goes into the prompt.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)


    model = Ollama(model=model_name)
    response_text = model.invoke(prompt)

    # Remove the reasoning that DeepSeek-R1 emits between <think> and </think>
    cleaned_result = re.sub(r'<think>.*?</think>', '', response_text, flags=re.DOTALL).strip()

    return cleaned_result

import gradio as gr

# Create the interface
interface = gr.Interface(fn=query_rag, inputs="text", outputs="text")
# Launch the interface
interface.launch()



Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.




Example result:
<img src="./img/Result model 3.png" alt="Example image" width="800">
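
A retrieval step of this kind boils down to taking the nearest chunks, joining their text, and placing it in the prompt as context. The sketch below shows that assembly on toy data, assuming each result is a `(text, distance)` pair where a smaller distance means a closer match. The names, the template wording, and the sample texts are illustrative.

```python
PROMPT_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def build_prompt(results, question, k=2):
    """Join the k closest chunks and fill the prompt template with them."""
    # Smaller distance first (assumption: scores are distances, not similarities)
    closest = sorted(results, key=lambda pair: pair[1])[:k]
    context = "\n\n---\n\n".join(text for text, _distance in closest)
    return PROMPT_TEMPLATE.format(context=context, question=question)


results = [
    ("chunk about complaints", 0.42),
    ("chunk about unrelated topic", 0.91),
    ("chunk about the agency's role", 0.35),
]
print(build_prompt(results, "What does the agency do?"))
```

Keeping `k` small matters with a 1.5B model: a shorter context leaves less room for irrelevant text to distract it.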


References for this project:

https://aws.amazon.com/es/what-is/retrieval-augmented-generation/

https://www.datacamp.com/es/tutorial/deepseek-r1-ollama

https://github.com/tonykipkemboi/ollama_pdf_rag/blob/main/notebooks/experiments/updated_rag_notebook.ipynb

https://github.com/pixegami/rag-tutorial-v2/tree/main

https://github.com/fcori47/rag_basico/blob/master/Clase%206%20-%20Final.ipynb

DeepSeek-AI (2025), *DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning* ([arXiv:2501.12948](https://arxiv.org/abs/2501.12948)).


By: Fausto Morales
ffmogbaj@gmail.com