## RAG with LangChain + Gemini + Pinecone

This notebook implements a retrieval-augmented generation (RAG) system with LangChain, Gemini, and Pinecone. The pipeline runs in these steps:

1. Load the official LangChain documentation.
2. Split it into small chunks that are easier to work with.
3. Generate embeddings with Google Gemini, converting the text into numeric vectors.
4. Store those vectors in a Pinecone vector database to enable semantic search.
5. When the user asks a question, embed the question as well and query Pinecone for the most similar chunks.
6. Send those chunks together with the question to the Gemini model, using a suitable prompt, to generate a more precise and contextualized final answer.

## 1. Install Dependencies

In [7]:
!pip install -q \
    langchain \
    langchain-google-genai \
    langchain-pinecone \
    langchain-community \
    pinecone-client \
    python-dotenv \
    bs4 \
    tiktoken

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-text-splitters 0.3.5 requires langchain-core<0.4.0,>=0.3.29, but you have langchain-core 1.2.15 which is incompatible.



## 2. Load API Keys


In [15]:
import os
from dotenv import load_dotenv

load_dotenv(dotenv_path="../.env")

GOOGLE_API_KEY     = os.getenv("GOOGLE_API_KEY")
PINECONE_API_KEY   = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX     = os.getenv("PINECONE_INDEX_NAME", "rag-langchain-docs")

assert GOOGLE_API_KEY,   "Missing GOOGLE_API_KEY"
assert PINECONE_API_KEY, "Missing PINECONE_API_KEY"

print("API Keys loaded successfully")
print(f"Pinecone index name: {PINECONE_INDEX}")

API Keys loaded successfully
Pinecone index name: rag-langchain-docs


## 3. Load & Split Documents

In [9]:
!pip install -q langchain-text-splitters

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 1.2.10 requires langchain-core<2.0.0,>=1.2.10, but you have langchain-core 0.3.83 which is incompatible.
langchain-openai 1.1.10 requires langchain-core<2.0.0,>=1.2.13, but you have langchain-core 0.3.83 which is incompatible.
langgraph-prebuilt 1.0.8 requires langchain-core>=1.0.0, but you have langchain-core 0.3.83 which is incompatible.



In [16]:
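import os

# Set USER_AGENT before importing the loader: the loader module reads it
# when it is imported and warns if the variable is missing.
os.environ["USER_AGENT"] = "RAG-LangChain-Lab/1.0"

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter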

URLS = [
    "https://python.langchain.com/docs/tutorials/rag/",
    "https://python.langchain.com/docs/concepts/rag/",
    "https://python.langchain.com/docs/concepts/vectorstores/",
]

print("Loading documents from LangChain docs...")

loader = WebBaseLoader(web_paths=URLS)
docs = loader.load()

docs = [d for d in docs if len(d.page_content.strip()) > 100]

print(f"Loaded {len(docs)} documents")
print(f"Total characters: {sum(len(d.page_content) for d in docs):,}")

Loading documents from LangChain docs...
Loaded 3 documents
Total characters: 39,750


In [18]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True
)

splits = text_splitter.split_documents(docs)

print(f"Split into {len(splits)} chunks")
print(f"\n Example chunk (first 300 chars):")
print("-" * 50)
print(splits[0].page_content[:300])
print("-" * 50)
print(f"\nMetadata: {splits[0].metadata}")

Split into 58 chunks

 Example chunk (first 300 chars):
--------------------------------------------------
Build a RAG agent with LangChain - Docs by LangChainSkip to main contentDocs by LangChain home pageOpen sourceSearch...⌘KAsk AIGitHubTry LangSmithTry LangSmithSearch...NavigationLangChainBuild a RAG agent with LangChainDeep AgentsLangChainLangGraphIntegrationsLearnReferenceContributePythonLearnTutor
--------------------------------------------------

Metadata: {'source': 'https://python.langchain.com/docs/tutorials/rag/', 'title': 'Build a RAG agent with LangChain - Docs by LangChain', 'language': 'en', 'start_index': 0}
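
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is only an illustration: `RecursiveCharacterTextSplitter` additionally tries to break on natural separators such as paragraph and line boundaries, which this sketch ignores. The helper name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # illustrative input, not notebook data
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

The start offset stored with each chunk plays the same role as the `start_index` metadata field shown in the output above: it records where the chunk begins in the original document.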


## 4. Setup Embeddings (Google Gemini)


In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Embedding model used for both the document chunks and the user queries.
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key=GOOGLE_API_KEY,
)

test_vector = embeddings.embed_query("What is LangChain?")
print(f"Embedding model ready")
print(f"Embedding dimensions: {len(test_vector)}")
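
Semantic search compares embedding vectors by similarity, and this notebook configures the Pinecone index with the cosine metric. The following is a minimal sketch of cosine similarity in plain Python, using toy 3-dimensional vectors as stand-ins for real embeddings. The function name `cosine_similarity` and the example vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, far smaller than real embeddings).
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because cosine similarity depends only on direction, the vector scaled by two scores the same as the query itself, while the orthogonal vector scores zero.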

## 5. Setup Pinecone & Index Documents


In [None]:
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
import time

pc = Pinecone(api_key=PINECONE_API_KEY)

existing_indexes = [idx.name for idx in pc.list_indexes()]
print(f"Existing Pinecone indexes: {existing_indexes}")

if PINECONE_INDEX not in existing_indexes:
    print(f"Creating new index: '{PINECONE_INDEX}'...")
    pc.create_index(
        name=PINECONE_INDEX,
        # Must match the embedding size (768 for text-embedding-004).
        dimension=len(test_vector),
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1",
        ),
    )

    while not pc.describe_index(PINECONE_INDEX).status["ready"]:
        print("Waiting for index to be ready...")
        time.sleep(2)
    print(f"Index '{PINECONE_INDEX}' created!")
else:
    print(f"Index '{PINECONE_INDEX}' already exists, reusing it.")

In [None]:

index = pc.Index(PINECONE_INDEX)
stats = index.describe_index_stats()
vector_count = stats.get("total_vector_count", 0)

print(f"Vectors currently in index: {vector_count}")

if vector_count == 0:
    print(f"⬆ Uploading {len(splits)} chunks to Pinecone...")
    vectorstore = PineconeVectorStore.from_documents(
        documents=splits,
        embedding=embeddings,
        index_name=PINECONE_INDEX,
        pinecone_api_key=PINECONE_API_KEY
    )
    print(f"Documents indexed successfully!")
else:
    print("Connecting to existing vectorstore (skipping upload)...")
    vectorstore = PineconeVectorStore(
        index_name=PINECONE_INDEX,
        embedding=embeddings,
        pinecone_api_key=PINECONE_API_KEY
    )
    print("Connected to existing vectorstore!")

## 6. Setup Retriever


In [None]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  
)


test_query = "What is a vector store in LangChain?"
retrieved_docs = retriever.invoke(test_query)

print(f"Query: '{test_query}'")
print(f"Retrieved {len(retrieved_docs)} relevant chunks:\n")

for i, doc in enumerate(retrieved_docs):
    print(f"--- Chunk {i+1} (from: {doc.metadata.get('source', 'unknown')}) ---")
    print(doc.page_content[:200])
    print()
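
As a rough mental model of what `k=4` means, the sketch below performs a brute-force top-k search over a tiny in-memory store. It is an assumption-laden illustration, not Pinecone's implementation: a real vector database would typically avoid scanning every vector. The names `top_k` and `store`, and the 2-dimensional unit vectors, are hypothetical.

```python
import heapq


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Return the k highest (score, chunk_id) pairs, best first.

    Scores are dot products, which equal cosine similarity when all
    vectors have unit length (as in the toy data below).
    """
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), chunk_id)
        for chunk_id, vec in store.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical unit-length vectors standing in for stored chunk embeddings.
store = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.6, 0.8],
    "chunk-c": [0.0, 1.0],
    "chunk-d": [-1.0, 0.0],
}

results = top_k([1.0, 0.0], store, k=2)
for score, chunk_id in results:
    print(f"{chunk_id}: {score:.2f}")
```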

## 7. Setup LLM (Gemini)



In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=GOOGLE_API_KEY,
    temperature=0.3,
)

response = llm.invoke("Say 'RAG system ready!' in one sentence.")
print(f"LLM test: {response.content}")

## 8. Build the RAG Chain

Here we combine three key pieces: (1) a custom prompt that tells Gemini to answer using only the retrieved context rather than made-up information, (2) the retriever, which searches the vector database for the chunks most relevant to the user's question, and (3) the LLM, which takes those chunks together with the question and generates a coherent, contextualized final answer.
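
Before these pieces are wired together with LangChain, it may help to see the data flow in plain Python: retrieved chunks are joined into one context string, and that string plus the question fill the prompt template. This is a sketch under assumptions; `TEMPLATE`, `build_prompt`, and the sample chunks are hypothetical and contain no real retrieved data.

```python
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Join the chunks (each tagged with its source) and fill the template."""
    context = "\n\n".join(
        f"[Source: {chunk['source']}]\n{chunk['text']}" for chunk in chunks
    )
    return TEMPLATE.format(context=context, question=question)


# Made-up chunks standing in for retriever output.
sample_chunks = [
    {"source": "doc-1", "text": "A vector store indexes embeddings."},
    {"source": "doc-2", "text": "Retrievers return the most similar chunks."},
]

prompt_text = build_prompt("What is a vector store?", sample_chunks)
print(prompt_text)
```

The string produced here is what the model ultimately receives; instructing the model to use only this context is what keeps the answer grounded in the retrieved chunks.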


In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


RAG_PROMPT = """\
You are an expert assistant on LangChain documentation.
Use ONLY the following retrieved context to answer the question.
If you cannot find the answer in the context, say "I don't have enough context to answer this."
Keep your answer concise and accurate.

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(RAG_PROMPT)

def format_docs(docs):
    """Concatenate retrieved chunks into a single context string."""
    return "\n\n".join(
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("RAG Chain built successfully!")
print("\nChain structure:")
print("   User Question")
print("   Retriever (Pinecone similarity search)")
print("   Prompt Template")
print("   Gemini 1.5 Flash (LLM)")
print("   Final Answer")

## 9. Demo — Ask Questions!


In [None]:
def ask(question: str):
    """Ask the RAG system a question and display the result."""
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}")
    answer = rag_chain.invoke(question)
    print(f"Answer:\n{answer}")
    return answer

questions = [
    "What is RAG and why is it useful?",
    "What is a vector store and how does it work in LangChain?",
    "How does the retrieval step work in a RAG pipeline?",
    "What is LCEL (LangChain Expression Language)?",
]

for q in questions:
    ask(q)

## 10. Interactive Mode


In [None]:

my_question = "How do I add memory to a RAG chain?"

ask(my_question)

## 11. Inspect Retrieved Context


In [None]:
def ask_with_sources(question: str):
    """Ask a question and show both the answer and the source chunks."""
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}")
    
    relevant_docs = retriever.invoke(question)
    
    print(f"\nRetrieved {len(relevant_docs)} source chunks:")
    for i, doc in enumerate(relevant_docs):
        print(f"\n  [{i+1}] Source: {doc.metadata.get('source', 'unknown')}")
        print(f"       Preview: {doc.page_content[:150]}...")
    
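    # Reuse the chunks retrieved above so the answer is generated from
    # exactly the sources that were just displayed (no second retrieval).
    answer = (prompt | llm | StrOutputParser()).invoke(
        {"context": format_docs(relevant_docs), "question": question}
    )
    print(f"\nAnswer:\n{answer}")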

ask_with_sources("What components make up a RAG system?")