# RAG (Retrieval-Augmented Generation) inside the notebook

**Student:** Camilo Andrés Quintero Rodríguez

This notebook is an **executable** tutorial that implements a RAG agent using **OpenAI + Pinecone**. Everything runs **inside the notebook**.


---

## 1) Install dependencies

Run this cell to install the libraries.


In [None]:

!pip install -U pip
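# langchain-openai and langchain-pinecone are imported in later cells; langchain-pinecone pulls in a compatible Pinecone SDK
!pip install langchain langchain-text-splitters langchain-community langchain-openai langchain-pinecone bs4 python-dotenv openai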


---

## 2) Configure keys


In [None]:


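import os
from dotenv import load_dotenv

load_dotenv()  # Load the keys from a local .env file, if one exists
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_ENVIRONMENT = os.getenv('PINECONE_ENVIRONMENT')  # Used later as the serverless region
PINECONE_INDEX_NAME = os.getenv('PINECONE_INDEX_NAME')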
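
A quick check before moving on can save a confusing failure in a later cell. The following is a minimal sketch, assuming the four settings are supplied through environment variables (the notebook itself does not say where the values come from); `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced only for this illustration.

```python
import os

# Names of the four settings this notebook uses (taken from the cell above)
REQUIRED_KEYS = (
    'OPENAI_API_KEY',
    'PINECONE_API_KEY',
    'PINECONE_ENVIRONMENT',
    'PINECONE_INDEX_NAME',
)


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Check the real process environment and report anything that is not set
missing = missing_keys(os.environ)
if missing:
    print('Missing settings: ' + ', '.join(missing))
```

Passing the mapping as an argument keeps the helper easy to try with a plain dictionary instead of the real environment.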


---

## 3) Load the web page




In [4]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

url = 'https://lilianweng.github.io/posts/2023-06-23-agent/'

# Use SoupStrainer to extract only the post's title, header, and content
strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(web_paths=(url,), bs_kwargs={'parse_only': strainer})

print('Downloading and processing the URL...')
docs = loader.load()
print(f'Documents loaded: {len(docs)}')
if len(docs) > 0:
    print('\n--- Document excerpt (first 800 characters) ---\n')
    print(docs[0].page_content[:800])
else:
    print('Could not extract any content; check the URL or the selector.')


USER_AGENT environment variable not set, consider setting it to identify your requests.


Downloading and processing the URL...
Documents loaded: 1

--- Document excerpt (first 800 characters) ---



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The 


---

## 4) Split the text into chunks

We will use `RecursiveCharacterTextSplitter` with `chunk_size=1000` and `chunk_overlap=200`.


In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP, add_start_index=True)
chunks = splitter.split_documents(docs)
print(f'Total chunks: {len(chunks)}')
print('\nExample - chunk 0 (first 500 chars):\n')
print(chunks[0].page_content[:500])


Total chunks: 63

Example - chunk 0 (first 500 chars):

LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-p
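
To build intuition for what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); it only shows a fixed-size sliding window in which consecutive chunks share `chunk_overlap` characters. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    step = chunk_size - chunk_overlap  # how far each new window advances
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return result


sample = 'abcdefghijklmnopqrstuvwxyz'
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.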


---

## 5) Embeddings and vector store

We use OpenAI embeddings + Pinecone.


In [None]:




import os

from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

# The LangChain integrations read both keys from the environment
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY

emb = OpenAIEmbeddings(model='text-embedding-3-small')
pc = Pinecone(api_key=PINECONE_API_KEY)

# Create the index only if it does not exist yet
existing = [idx['name'] for idx in pc.list_indexes()]
if PINECONE_INDEX_NAME not in existing:
    print('Creating the Pinecone index (if your plan and environment allow it)...')
    # dimension=1536 matches the output size of text-embedding-3-small
    pc.create_index(
        name=PINECONE_INDEX_NAME,
        dimension=1536,
        metric='cosine',
        spec=ServerlessSpec(cloud='aws', region=PINECONE_ENVIRONMENT),
    )
index = pc.Index(PINECONE_INDEX_NAME)
vector_store = PineconeVectorStore(index_name=PINECONE_INDEX_NAME, embedding=emb)

added = vector_store.add_documents(chunks)
print(f'Chunks added to Pinecone: {len(added)}')


OpenAI + Pinecone mode: OPENAI_API_KEY and PINECONE_API_KEY are required.


Chunks added to Pinecone: 63
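
The index above is created with `metric='cosine'`, so it may help to see what that metric computes. The sketch below is a brute-force illustration in plain Python with made-up three-dimensional vectors; the real embeddings have 1536 dimensions and the search runs inside Pinecone. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the k texts whose vectors are most similar to `query_vec`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": text -> invented embedding
toy_store = {
    'planning': [1.0, 0.0, 0.0],
    'memory': [0.0, 1.0, 0.0],
    'tool use': [0.7, 0.7, 0.0],
}

toy_hits = top_k([1.0, 0.1, 0.0], toy_store, k=2)
print(toy_hits)
# ['planning', 'tool use']
```

Cosine similarity depends only on direction, not on vector length, which is why it is a common choice for comparing text embeddings.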


---

## 6) Retriever and prompt

We build the retriever and the prompt that will feed the LLM.


In [7]:
from langchain_core.prompts import ChatPromptTemplate

TOP_K = 2
retriever = vector_store.as_retriever(search_kwargs={'k': TOP_K})

prompt_template = ChatPromptTemplate.from_template('''Use the following context to answer the user's question.\nIf you do not have enough information, say honestly that you do not have it.\n\nContext:\n{context}\n\nQuestion:\n{question}\n''')
print('Retriever and prompt created.')


Retriever and prompt created.
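
To make the role of the `{context}` and `{question}` placeholders concrete, here is a minimal sketch that fills a similar template with plain `str.format`. It assumes the retrieved chunks are already plain strings joined by blank lines, which is a simplification: in the notebook the retriever returns `Document` objects. `TEMPLATE` and `build_prompt` are hypothetical names used only here.

```python
# Same two placeholders as the notebook's prompt, written as a plain string
TEMPLATE = (
    "Use the following context to answer the user's question.\n"
    'If you do not have enough information, say honestly that you do not have it.\n\n'
    'Context:\n{context}\n\n'
    'Question:\n{question}\n'
)


def build_prompt(context_chunks, question):
    """Join the retrieved chunks and substitute them, with the question, into the template."""
    context = '\n\n'.join(context_chunks)
    return TEMPLATE.format(context=context, question=question)


filled = build_prompt(['chunk one', 'chunk two'], 'What is task decomposition?')
print(filled)
```

With `TOP_K = 2`, the context section would hold the text of the two chunks closest to the question.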


---

## 7) LLM and RAG pipeline

We use `ChatOpenAI` and a simple pipeline that retrieves context and passes the prompt to the LLM.


In [8]:
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Configure the LLM (temperature 0 for more deterministic responses)
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)

rag_chain = (
    RunnableParallel({'context': retriever, 'question': RunnablePassthrough()})
    | prompt_template
    | llm
)
print('RAG chain ready.')


RAG chain ready.
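
The data flow of the chain can be sketched with ordinary functions. This is only an illustration of the three stages (build the `context`/`question` dictionary, fill the prompt, call the model); the stand-ins `fake_retriever`, `fill_prompt`, `fake_llm`, and `toy_rag_chain` are hypothetical and make no network calls.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns a list of context strings."""
    return [f'(stub context for: {question})']


def fill_prompt(inputs):
    """Stand-in for the prompt template: turns the dictionary into one string."""
    return f"Context:\n{inputs['context']}\n\nQuestion:\n{inputs['question']}\n"


def fake_llm(prompt):
    """Stand-in for the chat model: returns a canned answer."""
    return f'[stub answer for a prompt of {len(prompt)} characters]'


def toy_rag_chain(question):
    # Stage 1: like RunnableParallel, feed the same input to both branches
    inputs = {'context': fake_retriever(question), 'question': question}
    # Stage 2: like `| prompt_template`, pass the dictionary to the prompt
    prompt = fill_prompt(inputs)
    # Stage 3: like `| llm`, pass the prompt to the model
    return fake_llm(prompt)


print(toy_rag_chain('What is task decomposition?'))
```

In the real chain, `RunnablePassthrough()` plays the role of passing `question` through unchanged, and each `|` hands the output of one stage to the next.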


---

## 8) Helper function `ask()`

This cell defines a function `ask(question)` that runs the RAG chain and prints the answer.


In [None]:
def ask(question: str):
    # Simple invocation of the RAG chain;
    # the retriever (a Runnable) fetches the context automatically
    result = rag_chain.invoke(question)
    # `result` is an AIMessage; its text is in `.content`
    try:
        content = result.content
    except AttributeError:
        # Fallback in case the structure differs
        content = str(result)
    print('\n--- RAG agent answer ---\n')
    print(content)



Function ask defined. Call ask("your question") to try it.


---

## 9) Try it with real questions

Write any question related to the article and run the cell. Example: `ask('What is task decomposition?')`.


In [None]:

ask('What is task decomposition?')




--- RAG agent answer ---

Task decomposition is the process of breaking down a larger task into smaller, more manageable sub-tasks or goals. This can be achieved in several ways:

1. Using large language models (LLMs) with simple prompts, such as asking for "Steps for XYZ" or "What are the subgoals for achieving XYZ?"
2. Employing task-specific instructions, like "Write a story outline" for writing a novel.
3. Incorporating human inputs to guide the decomposition process.

Additionally, there is a distinct approach called LLM+P, which involves using an external classical planner for long-horizon planning. This method utilizes the Planning Domain Definition Language (PDDL) to describe the planning problem, where the LLM translates the problem into PDDL, requests a classical planner to generate a plan, and then translates that plan back into natural language.
If you want to try ask() now, uncomment the call or run ask("your question").
