### **LangChain Tutorial: From Fundamentals to Advanced RAG**

This tutorial covers LangChain's essential concepts, from basic setup to building a Retrieval-Augmented Generation (RAG) system.

### **1. Introduction and Setup**

**What is LangChain?**

LangChain is a framework for developing applications powered by large language models (LLMs). It aims to simplify every stage of the LLM application lifecycle: development, productionization, and deployment.

**Core Concepts:**

* **LangChain Expression Language (LCEL):** A declarative syntax for composing components into chains. Chains built with LCEL support streaming (`.stream()`), batching (`.batch()`), and async calls (`.ainvoke()`) without extra code. The `|` (pipe) operator connects components so that each one's output becomes the next one's input.
* **Components:** LangChain provides modular components for building applications, including:
    * **Models:** Interfaces to various language models (e.g., OpenAI, Anthropic, Google).
    * **Prompts:** Templates for generating prompts for LLMs.
    * **Output Parsers:** Convert raw model output into strings, JSON, or typed objects.
* **Retrieval-Augmented Generation (RAG):** A technique that retrieves relevant documents from an external data source and passes them to the LLM as context. This lets answers draw on information that was not in the model's training data or that changes over time.
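To make the pipe operator concrete, here is a minimal pure-Python sketch of how `|` can compose components. The `Step` class and the fake prompt, model, and parser are hypothetical stand-ins and are not LangChain code. LangChain's real `Runnable` classes build streaming, batching, and async execution on top of this same composition idea.

```python
from typing import Any, Callable


class Step:
    """Hypothetical toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Composition: feed this step's output into the next step's input
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical components mimicking prompt -> model -> parser
prompt = Step(lambda d: f"Explain {d['topic']} simply.")
fake_model = Step(lambda text: {"content": f"[model reply to: {text}]"})
parser = Step(lambda msg: msg["content"])

chain = prompt | fake_model | parser
print(chain.invoke({"topic": "entropy"}))  # [model reply to: Explain entropy simply.]
```

Each `|` returns a new `Step`, so a chain is itself a component that can be piped further.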

**Installation**

First, let's install the necessary packages.

```python
# Install core LangChain and provider-specific packages
# (the leading "!" runs a shell command from a Jupyter/IPython cell)
!pip install langchain langchain-core langchain-community langchain-openai langchain-chroma langchain-text-splitters faiss-cpu pypdf sentence-transformers tiktoken -q

# Install environment management and web request libraries
!pip install python-dotenv requests beautifulsoup4 -q
```

**Environment Setup**

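Configure your API key. Keep it in an environment variable or a `.env` file rather than hard-coding it in the notebook, so it does not end up in version control or in shared copies of the notebook.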

```python
import os
import getpass
from dotenv import load_dotenv

# Load environment variables from a .env file if it exists
load_dotenv()

# Set up your OpenAI API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

print("✅ API keys configured!")
```

Output:

```text
✅ API keys configured!
```


### **2. Building Your First Chain with LCEL**

A chain in LangChain is a sequence of components in which each component's output becomes the next one's input. We'll use the LangChain Expression Language (LCEL) to build a simple chain.

**Initialize the Model**

We'll start by initializing a chat model.

```python
from langchain.chat_models import init_chat_model

# Initialize a chat model from OpenAI
model = init_chat_model("gpt-4o-mini", model_provider="openai", temperature=0.7)

print(f"✅ Initialized model: {model.__class__.__name__}")
```

Output:

```text
✅ Initialized model: ChatOpenAI
```


**Work with Prompt Templates**

Prompt templates allow you to create reusable and parameterized prompts.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create a prompt template
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in {expertise}."),
    ("human", "Explain {topic} in a simple and concise way.")
])

# Create a simple chain using LCEL
# The chain will:
# 1. Take a dict with 'expertise' and 'topic' keys.
# 2. Format the prompt using the prompt_template.
# 3. Pass the formatted prompt to the model.
# 4. Parse the model's output to a string.
simple_chain = prompt_template | model | StrOutputParser()

# Invoke the chain
response = simple_chain.invoke({
    "expertise": "physics",
    "topic": "quantum entanglement"
})

print(response)
```
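Under the hood, formatting a chat prompt template means filling each message's `{placeholders}` from the input dict. The following plain-Python sketch shows the idea with `str.format`. `format_chat_prompt` is a hypothetical helper that returns role/content dicts instead of LangChain message objects. In this sketch, a missing variable raises a `KeyError`. LangChain's default f-string templates also fail when an input variable is missing, and they require literal braces to be doubled (`{{` and `}}`).

```python
from typing import Dict, List, Tuple


def format_chat_prompt(
    messages: List[Tuple[str, str]], variables: Dict[str, str]
) -> List[Dict[str, str]]:
    """Hypothetical helper: fill {placeholders} in each (role, template) pair."""
    return [
        {"role": role, "content": template.format(**variables)}
        for role, template in messages
    ]


template = [
    ("system", "You are an expert in {expertise}."),
    ("human", "Explain {topic} in a simple and concise way."),
]

for message in format_chat_prompt(
    template, {"expertise": "physics", "topic": "quantum entanglement"}
):
    print(f"{message['role']}: {message['content']}")
```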

**Streaming Responses**

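To display output as the model generates it, call `.stream()` instead of `.invoke()`. It returns an iterator that yields the response in chunks; with `StrOutputParser` at the end of the chain, each chunk is a string.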

```python
print("🌊 Streaming response:")
for chunk in simple_chain.stream({
    "expertise": "culinary arts",
    "topic": "the Maillard reaction"
}):
    print(chunk, end="", flush=True)
print()  # finish the streamed line with a newline
```
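Streaming only works end to end if every stage passes partial output along as it arrives. Below is a minimal generator-based sketch of that idea. `fake_streaming_model` is a hypothetical stand-in that yields words. In the real chain, `StrOutputParser` plays the pass-through role by emitting each message chunk's text immediately. If a stage needed the whole reply before producing output, streaming would stall at that stage.

```python
from typing import Iterable, Iterator


def fake_streaming_model(prompt: str) -> Iterator[str]:
    # Hypothetical model: emits its reply word by word instead of all at once
    reply = f"You asked: {prompt}"
    for word in reply.split(" "):
        yield word + " "


def passthrough_parser(chunks: Iterable[str]) -> Iterator[str]:
    # Forward each chunk immediately; nothing waits for the full reply
    for chunk in chunks:
        yield chunk


def stream_chain(prompt: str) -> Iterator[str]:
    return passthrough_parser(fake_streaming_model(prompt))


pieces = []
for chunk in stream_chain("What is the Maillard reaction?"):
    print(chunk, end="", flush=True)
    pieces.append(chunk)
print()

full_text = "".join(pieces)  # concatenating the chunks rebuilds the whole reply
```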

### **3. Structured Output and Advanced Chains**

LangChain can parse model outputs into structured formats and build more complex, conditional chains.

**Pydantic Output Parser**

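Use a Pydantic model to define the desired output structure. `PydanticOutputParser` generates format instructions (including the model's JSON schema) to place in the prompt. It then parses the LLM's JSON reply and validates it against the Pydantic model.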

```python
from typing import List
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

# Define a Pydantic model for structured output
class Recipe(BaseModel):
    name: str = Field(description="The name of the recipe")
    ingredients: List[str] = Field(description="A list of ingredients")
    steps: List[str] = Field(description="The steps to prepare the recipe")

# Create a Pydantic output parser
pydantic_parser = PydanticOutputParser(pydantic_object=Recipe)

# Create a prompt that includes format instructions
structured_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world-class chef. Generate a recipe based on the user's request and format it as requested."),
    ("human", "I want a simple recipe for {dish}.\n\n{format_instructions}")
])

# Create the structured output chain
structured_chain = structured_prompt | model | pydantic_parser

# Invoke the chain; the parser returns a validated Recipe instance
recipe_request = {
    "dish": "scrambled eggs",
    "format_instructions": pydantic_parser.get_format_instructions()
}
recipe_output = structured_chain.invoke(recipe_request)

print(f"Recipe for: {recipe_output.name}")
print("\nIngredients:")
for ingredient in recipe_output.ingredients:
    print(f"- {ingredient}")
print("\nSteps:")
for number, step in enumerate(recipe_output.steps, start=1):
    print(f"{number}. {step}")
```
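Conceptually, the parser turns the model's text into a validated object. The sketch below imitates this with the standard library only: a `dataclass` stands in for the Pydantic model. The fence stripping and field checks are a simplified assumption about what such a parser must handle, not `PydanticOutputParser`'s actual code. `parse_recipe` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Recipe:
    name: str
    ingredients: List[str]
    steps: List[str]


FENCE = "`" * 3  # a Markdown code fence, built this way to keep it out of this listing


def parse_recipe(text: str) -> Recipe:
    """Hypothetical parser: extract JSON from model text and check it against Recipe."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    missing = [f.name for f in fields(Recipe) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    for key in ("ingredients", "steps"):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(x, str) for x in value):
            raise ValueError(f"{key} must be a list of strings")
    return Recipe(name=data["name"], ingredients=data["ingredients"], steps=data["steps"])


raw_reply = (
    FENCE + "json\n"
    '{"name": "Scrambled Eggs", "ingredients": ["2 eggs", "1 tbsp butter"], '
    '"steps": ["Whisk the eggs.", "Cook gently in butter."]}\n' + FENCE
)
recipe = parse_recipe(raw_reply)
print(recipe.name, recipe.ingredients)
```

Rejecting malformed replies early, rather than passing raw text downstream, is the main reason to put a parser at the end of the chain.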

**Conditional Chains with `RunnableBranch`**

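`RunnableBranch` takes `(condition, runnable)` pairs plus a default runnable. For each input, it runs the first runnable whose condition returns `True`, or the default if none match.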

```python
from langchain_core.runnables import RunnableBranch

# Define different prompts for different levels
beginner_prompt = ChatPromptTemplate.from_template("Explain {topic} to a complete beginner.")
expert_prompt = ChatPromptTemplate.from_template("Provide a detailed, technical explanation of {topic}.")

# Create a conditional chain using RunnableBranch
# This chain checks the 'level' input and routes to the appropriate prompt
conditional_chain = (
    RunnableBranch(
        (lambda x: x.get("level") == "expert", expert_prompt),
        beginner_prompt  # Default prompt
    )
    | model
    | StrOutputParser()
)

# Test the beginner path
beginner_response = conditional_chain.invoke({"topic": "black holes", "level": "beginner"})
print("--- Beginner Explanation ---")
print(beginner_response)

# Test the expert path
expert_response = conditional_chain.invoke({"topic": "black holes", "level": "expert"})
print("\n--- Expert Explanation ---")
print(expert_response)
```
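The routing rule can be sketched in plain Python: conditions are checked in order, the first match wins, and the default handles everything else. `make_branch` is a hypothetical helper, and its handlers return prompt strings instead of prompt templates.

```python
from typing import Any, Callable, Sequence, Tuple

Condition = Callable[[Any], bool]
Handler = Callable[[Any], Any]


def make_branch(cases: Sequence[Tuple[Condition, Handler]], default: Handler) -> Handler:
    """Hypothetical helper: run the first handler whose condition holds, else the default."""

    def run(value: Any) -> Any:
        for condition, handler in cases:
            if condition(value):
                return handler(value)
        return default(value)

    return run


choose_prompt = make_branch(
    [(lambda x: x.get("level") == "expert",
      lambda x: f"Provide a detailed, technical explanation of {x['topic']}.")],
    default=lambda x: f"Explain {x['topic']} to a complete beginner.",
)

print(choose_prompt({"topic": "black holes", "level": "expert"}))
print(choose_prompt({"topic": "black holes"}))
```

Each handler receives the full input dict. This is why, in the chain above, the selected prompt can still read `{topic}` even though routing looked only at `level`.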

### **4. Retrieval-Augmented Generation (RAG)**

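RAG connects your LLM to external data, allowing it to answer questions about information it wasn't trained on. The pipeline below has four steps: load and split documents, embed and store the chunks, retrieve the chunks relevant to a question, and pass them to the model as context.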

**Step 1: Document Loading and Splitting**

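Load data from a source (such as a web page) and split it into smaller chunks. Smaller chunks let the retriever return only the relevant passages, and several of them fit into the model's prompt at once.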

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load documents from a web page (requires beautifulsoup4).
# If this URL no longer resolves, point the loader at any page you want to query.
loader = WebBaseLoader("https://python.langchain.com/docs/modules/model_io/prompts/")
docs = loader.load()

# Initialize a text splitter; chunk_size and chunk_overlap are measured in characters by default
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Split the documents into chunks
splits = text_splitter.split_documents(docs)

print(f"Loaded {len(docs)} document(s) and split them into {len(splits)} chunks.")
```
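The two parameters work together. Each chunk holds at most `chunk_size` characters, and consecutive chunks share up to `chunk_overlap` characters, so text near a boundary appears in both neighbors. The sketch below is a plain sliding window, which simplifies the real splitter. `RecursiveCharacterTextSplitter` also tries to break on paragraph, line, and word boundaries (by default `"\n\n"`, `"\n"`, `" "`, then `""`) before cutting mid-word. As a result, its chunks are often shorter than `chunk_size`. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Hypothetical helper: fixed-size windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```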

**Step 2: Embeddings and Vector Stores**

Convert the text chunks into numerical representations (embeddings) and store them in a vector database for efficient searching.
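A vector store ranks chunks by how close their embeddings are to the query's embedding. Cosine similarity is a common measure. Chroma collections use squared Euclidean (L2) distance by default. OpenAI embeddings are normalized to unit length, and for unit vectors the two measures give the same ranking, because a smaller distance means a larger cosine:

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert},
\qquad
\lVert \mathbf{q} - \mathbf{d} \rVert^2 = \lVert \mathbf{q} \rVert^2 + \lVert \mathbf{d} \rVert^2 - 2\,\mathbf{q} \cdot \mathbf{d} = 2 - 2\cos(\mathbf{q}, \mathbf{d})
\quad \text{when } \lVert \mathbf{q} \rVert = \lVert \mathbf{d} \rVert = 1.
$$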

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed every chunk and store the vectors in a Chroma collection.
# Without a persist_directory, the collection is kept in memory only.
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

print("✅ Vector store created.")
```

**Step 3: Creating a Retriever**

A retriever is responsible for finding the most relevant document chunks based on a user's query.

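```python
# Create a retriever that returns the 3 most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Test the retriever
query = "What are prompt templates?"
retrieved_docs = retriever.invoke(query)

print(f"Retrieved {len(retrieved_docs)} documents for the query: '{query}'")
# Preview the top-ranked chunk to check that it is relevant
if retrieved_docs:
    print(retrieved_docs[0].page_content[:200])
```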
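`as_retriever(search_kwargs={"k": 3})` asks the vector store for the 3 chunks most similar to the query. The following is a minimal sketch of top-k similarity search using cosine similarity. It assumes hypothetical 3-dimensional vectors in place of real embeddings, which have far more dimensions (1,536 by default for `text-embedding-3-small`).

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" keyed by chunk topic
store = {
    "prompt templates": [0.9, 0.1, 0.0],
    "few-shot examples": [0.7, 0.3, 0.1],
    "vector stores": [0.1, 0.9, 0.2],
    "deployment": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], store, k=2)
print([doc_id for doc_id, _ in results])
```

Real vector stores such as Chroma use approximate nearest-neighbor indexes instead of scoring every vector, but they return the same kind of ranked top-k result.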

**Step 4: Building the RAG Chain**

Now, let's combine the retriever with a prompt and the LLM to create a complete RAG chain.

```python
from langchain_core.runnables import RunnablePassthrough

# Define a RAG prompt template
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer the user's question using only the following context. "
               "If the context does not contain the answer, say that you don't know.\n\nContext:\n{context}"),
    ("human", "{question}")
])

# Helper function to join the retrieved chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create the RAG chain.
# The leading dict runs both entries on the same input (the question string):
# "context" retrieves and formats chunks; RunnablePassthrough() forwards the question unchanged.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

# Test the RAG chain
rag_question = "How can I use few-shot examples in my prompts?"
rag_answer = rag_chain.invoke(rag_question)

print(f"\n❓ Question: {rag_question}")
print(f"🎯 Answer: {rag_answer}")
```
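LCEL converts the dict at the start of `rag_chain` into a `RunnableParallel`. Every value receives the same input (the question string), and the results are collected into a dict that fills the prompt's `{context}` and `{question}` variables. Below is a plain-Python sketch of that data flow. A hypothetical keyword-matching retriever stands in for the vector search, and `prepare_inputs` is likewise a hypothetical helper.

```python
from typing import Callable, Dict, List


def format_docs(docs: List[str]) -> str:
    # Same idea as the tutorial's helper, but over plain strings instead of Document objects
    return "\n\n".join(docs)


def fake_retrieve(question: str) -> List[str]:
    # Hypothetical retriever: keeps canned chunks that share at least one word with the question
    chunks = [
        "Few-shot prompts include worked examples in the prompt.",
        "Vector stores hold embeddings.",
    ]
    words = set(question.lower().split())
    return [chunk for chunk in chunks if words & set(chunk.lower().split())]


def prepare_inputs(question: str, retrieve: Callable[[str], List[str]]) -> Dict[str, str]:
    # Both keys are computed from the same input, mirroring the dict at the head of rag_chain:
    # "context" runs retrieval then formatting; "question" passes the input through unchanged
    return {"context": format_docs(retrieve(question)), "question": question}


question = "How can I use few-shot examples in my prompts?"
inputs = prepare_inputs(question, fake_retrieve)
system_message = f"Use the following context to answer the user's question.\n\nContext:\n{inputs['context']}"
print(system_message)
```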