# RAG Introduction with LangChain 1.0 (LCEL) and Pinecone

This notebook demonstrates the complete RAG pipeline using **LangChain Expression Language (LCEL)**: document loading, chunking, embeddings, vector database setup, and retrieval-augmented generation.

## Setup and Installation

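In [None]:
# Install required packages
# langchain-community provides PyPDFLoader; the Pinecone SDK is published as "pinecone" (formerly "pinecone-client")
!pip install langchain langchain-community langchain-openai langchain-pinecone langchain-text-splitters pinecone pypdf python-dotenv -q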

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Verify API keys are loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in environment variables. Please create a .env file with your API key.")
if not os.getenv("PINECONE_API_KEY"):
    raise ValueError("PINECONE_API_KEY not found in environment variables. Please create a .env file with your API key.")

## Section 1: Document Loading and Chunking

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a PDF document (replace the path with your own PDF; instructor note: do this together with the students on the second pass)
# loader = PyPDFLoader("path/to/your/document.pdf")
# documents = loader.load()

# For demonstration, create sample documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence that enables systems to learn from data."),
    Document(page_content="Deep learning uses neural networks with multiple layers to process complex patterns."),
    Document(page_content="Natural language processing allows computers to understand and generate human language.")
]

In [None]:
# Chunking strategy 1: RecursiveCharacterTextSplitter (recommended)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len
)

chunks = text_splitter.split_documents(documents)
print(f"Number of chunks: {len(chunks)}")
print(f"First chunk: {chunks[0].page_content[:100]}...")

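**Scoping Insight**: Chunking matters when documents are long or structured. The three sample documents here are each under 100 characters, well below `chunk_size=500`, so the splitter returns them unchanged as three chunks. For small documents like these you can skip chunking entirely. Recognize when chunking adds value and when it is unnecessary complexity.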
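
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to split on separators such as paragraphs and sentences). The function name `chunk_text` and the sample inputs are illustrative assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical inputs: one short document and one long one
short_doc = "Machine learning is a subset of artificial intelligence."
long_doc = "x" * 1200

short_chunks = chunk_text(short_doc)
long_chunks = chunk_text(long_doc)

print(f"Short document: {len(short_chunks)} chunk(s)")
print(f"Long document chunk lengths: {[len(c) for c in long_chunks]}")
```

The short document comes back as a single chunk, which is the case where chunking adds nothing. The long one is split into overlapping windows, and the last 50 characters of each window repeat at the start of the next.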

## Section 2: Embeddings

In [None]:
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create embeddings for a sample text
sample_text = "Machine learning enables systems to learn from data"
sample_embedding = embeddings.embed_query(sample_text)
print(f"Embedding dimension: {len(sample_embedding)}")
print(f"First 5 values: {sample_embedding[:5]}")

**Scoping Insight**: Embedding costs scale with the number of tokens you embed, so they add up for large document collections. Consider cheaper embedding models for MVPs, and upgrade only when retrieval quality requires it. Estimate the cost before committing to a solution.
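
A rough back-of-the-envelope estimate can help with that decision. The sketch below assumes about 4 characters per token (a common rule of thumb for English text, not an exact count) and uses a placeholder price. `estimate_embedding_cost`, `ASSUMED_PRICE_PER_MILLION_TOKENS`, and the corpus size are hypothetical, so check your provider's current pricing before relying on the result.

```python
def estimate_embedding_cost(texts, price_per_million_tokens, chars_per_token=4.0):
    """Estimate embedding cost in USD from character counts (rough heuristic)."""
    total_chars = sum(len(t) for t in texts)
    estimated_tokens = total_chars / chars_per_token
    return estimated_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical corpus: 10,000 chunks of 2,000 characters each
corpus = ["a" * 2000] * 10_000

# Placeholder price in USD per 1M tokens (an assumption, not a quoted rate)
ASSUMED_PRICE_PER_MILLION_TOKENS = 0.02

cost = estimate_embedding_cost(corpus, ASSUMED_PRICE_PER_MILLION_TOKENS)
print(f"Estimated embedding cost: ${cost:.2f}")
```

For an accurate count you would tokenize with the model's own tokenizer. The heuristic is only meant to show the order of magnitude.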

## Section 3: Pinecone Vector Store Setup

In [None]:
import time

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize Pinecone client
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create or connect to an index
index_name = "rag-intro-index"

# Check if index exists, create if not
if index_name not in [index.name for index in pc.list_indexes()]:
    pc.create_index(
        name=index_name,
        dimension=1536,  # must match the embedding size (text-embedding-3-small returns 1536)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )
    # Wait until the new index is ready before writing to it
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)
    print(f"Created index: {index_name}")
else:
    print(f"Index {index_name} already exists")

In [None]:
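# Create vector store using LangChain (embeds the chunks and upserts them into the index)
# Note: re-running this cell upserts the same chunks again, which can leave duplicates in the index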
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name=index_name
)

print("Documents added to Pinecone vector store")

**Scoping Insight**: Pinecone is powerful but adds infrastructure complexity and cost. For small projects or MVPs, consider simpler alternatives like in-memory vector stores or Chroma. Use Pinecone when you need scale, performance, or managed infrastructure.
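
To show how little an in-memory alternative needs, here is a minimal sketch of a vector store that keeps everything in a Python list and ranks by cosine similarity. `TinyVectorStore` and the hand-written 3-dimensional vectors are illustrative assumptions; in practice the vectors would come from an embedding model, and a library-provided in-memory store or Chroma would replace this class.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: a list of (text, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=2):
        # Score every stored vector against the query, highest similarity first
        scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("machine learning", [1.0, 0.0, 0.0])  # toy vectors, not real embeddings
store.add("deep learning", [0.8, 0.6, 0.0])
store.add("cooking recipes", [0.0, 0.0, 1.0])

top = store.search([1.0, 0.1, 0.0], k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

This brute-force scan is linear in the number of stored vectors, which is fine for a few thousand chunks. A managed service like Pinecone earns its cost when the collection outgrows that.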

## Section 4: Query and Retrieval

In [None]:
# Perform a similarity search
query = "What is machine learning?"
results = vectorstore.similarity_search(query, k=2)

print(f"Query: {query}")
print(f"\nRetrieved {len(results)} documents:")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content}")

In [None]:
# Get similarity scores (with the cosine metric, higher scores mean closer matches)
results_with_scores = vectorstore.similarity_search_with_score(query, k=2)

print(f"Query: {query}")
print(f"\nRetrieved documents with scores:")
for doc, score in results_with_scores:
    print(f"\nScore: {score:.4f}")
    print(f"Content: {doc.page_content}")

**Scoping Insight**: Retrieval quality varies with chunking strategy and embedding model. Test retrieval before building the full RAG system. If retrieval consistently fails, the problem might be with chunking or embeddings, not the LLM.
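
One way to test retrieval on its own is a small hit-rate check: for each labeled query, see whether any of the top-k results contains the text you expect. The sketch below is an assumption-laden illustration. `hit_rate_at_k`, the keyword-overlap retriever, and the labeled queries are all hypothetical stand-ins so the example runs without a vector store.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=2):
    """Fraction of queries whose top-k results contain the expected snippet."""
    if not labeled_queries:
        return 0.0
    hits = 0
    for query, expected_snippet in labeled_queries:
        results = retrieve(query, k)
        if any(expected_snippet.lower() in text.lower() for text in results):
            hits += 1
    return hits / len(labeled_queries)


CORPUS = [
    "Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
    "Deep learning uses neural networks with multiple layers to process complex patterns.",
    "Natural language processing allows computers to understand and generate human language.",
]


def keyword_retrieve(query, k):
    """Stand-in retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().strip("?").split())

    def overlap(text):
        return len(query_words & set(text.lower().strip(".").split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


# Hypothetical evaluation set: (query, snippet the right document should contain)
labeled_queries = [
    ("What is machine learning?", "subset of artificial intelligence"),
    ("How do neural networks work?", "neural networks"),
    ("What is natural language processing?", "human language"),
    ("What is reinforcement learning?", "reward"),  # not covered by the corpus
]

rate = hit_rate_at_k(keyword_retrieve, labeled_queries, k=2)
print(f"Hit rate@2: {rate:.2f}")
```

To run the same check against the notebook's index, you could pass a function that wraps `vectorstore.similarity_search(query, k=k)` and returns each document's `page_content`. A low hit rate points at chunking or embeddings before the LLM is involved at all.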

## Section 5: Complete RAG Implementation with LCEL

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create a RAG prompt template
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the following context:

{context}

Question: {question}

Answer:"""
)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Helper to format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the LCEL RAG chain using the pipe (|) operator
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Ask a question
question = "What is machine learning?"
response = rag_chain.invoke(question)
print(f"Question: {question}")
print(f"\nAnswer: {response}")

**Scoping Insight**: LCEL makes RAG chains composable and transparent: each step (retrieval → formatting → prompting → LLM → parsing) is explicit. This makes the chain simpler to debug and extend than legacy chain types. However, RAG still adds complexity over simple API calls. Use it when you need to query large document collections or provide domain-specific knowledge.
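
To see why the pipe syntax composes so cleanly, here is a toy sketch of the idea in plain Python. It is not LangChain's implementation: `Step`, `fan_out`, and the fake retriever and LLM are hypothetical stand-ins that mimic the shape of the chain above (a dict that fans the input out to two branches, followed by a sequence of steps).

```python
class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `self | other` builds a new Step that feeds self's output into other
        return Step(lambda value: other.invoke(self.invoke(value)))


def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Stand-ins for the real components (assumptions for illustration only)
toy_retriever = Step(lambda question: ["Machine learning is a subset of AI."])
toy_format = Step(lambda docs: "\n\n".join(docs))
toy_passthrough = Step(lambda value: value)
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
toy_llm = Step(lambda prompt: prompt.upper())  # "answers" by upper-casing the prompt

toy_chain = (
    fan_out(context=toy_retriever | toy_format, question=toy_passthrough)
    | toy_prompt
    | toy_llm
)

result = toy_chain.invoke("What is ML?")
print(result)
```

Because every intermediate object exposes the same `invoke` method, you can call any prefix of the chain on its own, for example `(toy_retriever | toy_format).invoke(...)`, to inspect what the next step will receive. The same debugging approach applies to the real LCEL chain.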

## Section 6: Comparison: With vs Without RAG

In [None]:
# Without RAG: Direct API call (reuses the `llm` created in Section 5)

simple_response = llm.invoke("What is machine learning?")
print("Without RAG (direct API call):")
print(simple_response.content)

In [None]:
# With RAG: Context from vector database (LCEL chain)
rag_response = rag_chain.invoke("What is machine learning?")
print("With RAG (retrieved context):")
print(rag_response)

**Scoping Insight**: Compare the complexity and cost of both approaches. RAG is powerful but requires infrastructure, embeddings, and retrieval logic. Simple API calls work for many use cases. Recognize when the added complexity of RAG is justified by the requirements.