## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superseded older classes such as `LLMChain`, LCEL gives us a more flexible system for building chains. LCEL uses the pipe operator `|` to _chain_ components together. Let's see how we'd reproduce an `LLMChain` using LCEL.

The `LLMChain` is the simplest chain originally introduced in LangChain (now deprecated). This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result. It is built from three components:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.

In [2]:
import os
from getpass import getpass
from dotenv import load_dotenv

# Load variables from .env into environment
load_dotenv()

os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGSMITH_ENDPOINT'] = "https://eu.api.smith.langchain.com"
os.environ['LANGSMITH_API_KEY'] =  os.getenv('LANGSMITH_API_KEY') or getpass('Enter your LangSmith API Key: ')
os.environ['LANGSMITH_PROJECT'] = 'LCEL'

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or getpass(
    "Enter GOOGLE API Key: "
)

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",      
    temperature=0.0)

In [5]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Give me a small report on {topic}?"
)

`StrOutputParser` parses the output of an LLM or chat model into a plain string: it extracts and returns just the text.
This is useful in chains where you want the plain text content without the surrounding message object or metadata.

In [8]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()

In [11]:
lcel_chain = prompt | llm | output_parser

In [12]:
result = lcel_chain.invoke({"topic": "RAG in LangChain"})
print(result)

## Small Report: Retrieval Augmented Generation (RAG) in LangChain

### Introduction to RAG

Retrieval Augmented Generation (RAG) is a powerful technique that enhances the capabilities of Large Language Models (LLMs) by allowing them to access and incorporate external, up-to-date, or proprietary information during the generation process. Traditional LLMs are limited to the data they were trained on, which can lead to hallucinations, outdated information, or an inability to answer questions about specific, private datasets. RAG addresses these limitations by first retrieving relevant information from a knowledge base and then using that information to ground the LLM's response.

### LangChain's Role in RAG

LangChain is a framework designed to simplify the development of applications powered by LLMs. It provides a modular and extensible set of tools and components that make building complex LLM workflows, including RAG pipelines, significantly easier. LangChain abstracts away much of th

We can view a formatted version of this output using the `Markdown` display:

In [14]:
from IPython.display import display, Markdown
display(Markdown(result))

## Small Report: Retrieval Augmented Generation (RAG) in LangChain

### Introduction to RAG

Retrieval Augmented Generation (RAG) is a powerful technique that enhances the capabilities of Large Language Models (LLMs) by allowing them to access and incorporate external, up-to-date, or proprietary information during the generation process. Traditional LLMs are limited to the data they were trained on, which can lead to hallucinations, outdated information, or an inability to answer questions about specific, private datasets. RAG addresses these limitations by first retrieving relevant information from a knowledge base and then using that information to ground the LLM's response.

### LangChain's Role in RAG

LangChain is a framework designed to simplify the development of applications powered by LLMs. It provides a modular and extensible set of tools and components that make building complex LLM workflows, including RAG pipelines, significantly easier. LangChain abstracts away much of the complexity involved in integrating various components like document loaders, text splitters, embedding models, vector stores, and LLMs.

### Key Components and Workflow of RAG in LangChain

A typical RAG pipeline in LangChain involves two main phases: **Indexing** (preparing the knowledge base) and **Retrieval & Generation** (answering user queries).

#### Phase 1: Indexing (Building the Knowledge Base)

1.  **Document Loading:** LangChain offers a wide array of `DocumentLoaders` to ingest data from various sources (PDFs, websites, databases, Notion, etc.). These loaders convert raw data into `Document` objects, which typically contain page content and metadata.
2.  **Text Splitting:** LLMs have context window limitations, and embeddings work best on smaller, coherent chunks of text. `TextSplitters` (e.g., `RecursiveCharacterTextSplitter`) break down large documents into smaller, manageable chunks, ensuring semantic integrity.
3.  **Embedding Generation:** Each text chunk is converted into a numerical vector (an embedding) using an `Embeddings` model (e.g., OpenAI Embeddings, HuggingFace Embeddings). These embeddings capture the semantic meaning of the text.
4.  **Vector Store Storage:** The generated embeddings, along with their corresponding original text chunks, are stored in a `VectorStore` (e.g., Chroma, FAISS, Pinecone, Weaviate). This vector store acts as a searchable index for efficient retrieval.

#### Phase 2: Retrieval & Generation (Answering Queries)

1.  **User Query:** A user submits a question or prompt.
2.  **Query Embedding:** The user's query is also converted into an embedding using the *same* embedding model used during indexing.
3.  **Retrieval:** The query embedding is used to perform a similarity search in the `VectorStore`. The `VectorStore` returns the top-k most semantically similar text chunks (documents) from the knowledge base. LangChain's `Retriever` interface standardizes this process.
4.  **Prompt Augmentation:** The retrieved relevant documents are then combined with the original user query to construct an augmented prompt. This prompt provides the LLM with the necessary context to formulate an accurate and grounded answer.
5.  **LLM Generation:** The augmented prompt is sent to an `LLM` (e.g., `ChatOpenAI`, `HuggingFaceHub`). The LLM uses the provided context to generate a coherent and informed response.
6.  **Orchestration (Chains/LCEL):** LangChain's `Chains` (like `RetrievalQA` or `create_retrieval_chain`) or the more flexible `LangChain Expression Language (LCEL)` are used to seamlessly connect all these steps. LCEL allows for building custom, highly optimized, and streaming-capable RAG pipelines with clear component separation.

### Benefits of RAG with LangChain

*   **Reduced Hallucinations:** LLMs are grounded in factual, retrieved information.
*   **Access to Private/Up-to-Date Data:** Enables LLMs to answer questions beyond their training data.
*   **Transparency:** Allows users to see the source documents used to generate the answer.
*   **Modularity:** LangChain's component-based architecture makes it easy to swap out different loaders, splitters, embedding models, vector stores, and LLMs.
*   **Rapid Prototyping:** Simplifies the development and iteration of complex RAG applications.
*   **Scalability:** Supports various vector stores suitable for different scales and deployment needs.

### Conclusion

RAG is a transformative approach for building robust and reliable LLM applications, and LangChain stands out as a premier framework for implementing it. By providing a comprehensive toolkit for data ingestion, indexing, retrieval, and LLM integration, LangChain empowers developers to create intelligent systems that leverage external knowledge effectively, pushing the boundaries of what LLMs can achieve.

### How Does the Pipe Operator Work?

Before moving on to other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionally, the pipe tells you that whatever the _left_ side outputs will be fed as input into the _right_ side. In the example of `prompt | llm | output_parser`, `prompt` feeds into `llm`, which in turn feeds into `output_parser`.

Let's make a basic class named `Runnable` that will transform a provided function into a _runnable_ object that we can then use with the pipe `|` operator.

In [19]:
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        def chained_func(*args, **kargs):
            return other.invoke(self.func(*args, **kargs))
        return Runnable(chained_func)
    
    def invoke(self, *args, **kargs):
        return self.func(*args, **kargs)
        

### 🧩 What This Class Does — In Simple Terms

`Runnable` lets you **chain functions together** using the `|` (pipe) operator — just like you do in Unix commands or Pandas pipelines.

So instead of:
```python
output = func3(func2(func1(data)))
```

You can write:
```python
pipeline = Runnable(func1) | Runnable(func2) | Runnable(func3)
result = pipeline.invoke(data)
```

- ✅ It’s cleaner
- ✅ It’s modular
- ✅ It builds a chain of steps

### Breakdown of Each Part

#### 1️⃣ The Constructor 
```python
def __init__(self, func):
    self.func = func
```

When you create a Runnable, you give it a function, and it stores that in self.func.

#### 2️⃣ The invoke Method
```python
def invoke(self, *args, **kargs):
    return self.func(*args, **kargs)
```
This just calls the stored function.

invoke() is a wrapper that always executes the function the same way, which helps when chaining multiple Runnable objects.

#### 3️⃣ The `__or__` Method
```python
def __or__(self, other):
    def chained_func(*args, **kargs):
        return other.invoke(self.func(*args, **kargs))
    return Runnable(chained_func)
```
This defines what happens when you use the | operator (pipe).

When you write `A | B`, Python calls:
```python
A.__or__(B)
```

Here’s what it does:

1. Defines a new function chained_func() that:
    * Runs A’s function first (self.func)
    * Passes its output to B’s invoke() method
2.  Returns a new Runnable containing this chained function.

This effectively builds a pipeline where:

input → A.func → B.func → output



### Why LangChain Uses This Pattern

In LangChain, this Runnable pattern allows you to build data flow pipelines easily:
```python
{
    "input": lambda x: x["input"],
    "history": lambda x: x["chat_history"]
} | prompt | llm
```

Each | connects one processing step to the next:

Extract data → format a prompt → call the model → return result
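Note that the left-most item in that example is a plain dictionary, not a `Runnable`. One way this can work is through Python's reflected operator `__ror__`, which Python tries on the right-hand operand when the left-hand one cannot handle `|`. The following is a simplified sketch that extends the toy `Runnable` from above; the `coerce` helper and `format_prompt` step are hypothetical names used for illustration, and this is not LangChain's actual implementation.

```python
class Runnable:
    """Toy runnable, extended so plain dicts and functions can join a chain."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Handles `self | other`.
        right = coerce(other)
        return Runnable(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Handles `other | self` when `other` (a dict, say) cannot handle the
        # operator itself and Python falls back to the right-hand operand.
        return coerce(other) | self


def coerce(obj):
    """Turn a Runnable, a dict of steps, or a plain callable into a Runnable."""
    if isinstance(obj, Runnable):
        return obj
    if isinstance(obj, dict):
        steps = {key: coerce(step) for key, step in obj.items()}
        return Runnable(
            lambda value: {key: step.invoke(value) for key, step in steps.items()}
        )
    if callable(obj):
        return Runnable(obj)
    raise TypeError(f"Cannot turn {type(obj).__name__} into a Runnable")


# Hypothetical stand-in for a prompt template.
format_prompt = Runnable(
    lambda d: f"History: {d['history']} | Input: {d['input']}"
)

pipeline = {
    "input": lambda x: x["input"],
    "history": lambda x: x["chat_history"],
} | format_prompt

output = pipeline.invoke({"input": "hi", "chat_history": "none"})
print(output)  # History: none | Input: hi
```

Each value in the dictionary receives the same input, and the results are gathered under the same keys before being passed to the next step.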

### Summary
- `__init__`: stores the function
- `invoke()`: calls the stored function
- `__or__()`: chains this `Runnable` with another one so their functions run sequentially

With the `Runnable` class, we can wrap a function and then chain several of these _runnable_ functions together using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [16]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [20]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [21]:
chain = add_five_runnable.__or__(sub_five_runnable).__or__(mul_five_runnable)
chain.invoke(10)  # ((10 + 5) - 5) * 5 = 50

50

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [23]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(10)

50

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [29]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [30]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the `invoke` method:

In [31]:
chain.invoke(10)

50

Those are our `RunnableLambda` functions. It's worth noting that each of these functions is expected to take a SINGLE argument. If you have a function that needs multiple values, you can pass in a dictionary and unpack its keys inside the function.

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel`: allows us to run multiple `Runnable` instances in parallel, acting almost as a Y-fork in the chain.

* `RunnablePassthrough`: allows us to pass a variable through to the next `Runnable` without modification.

To see these runnables in action, we will create two data sources. Each source provides part of the information, but to answer the question we will need both to be fed to the LLM.

In [53]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

#HuggingFaceEmbeddings creates vector embeddings for text using a free, local model.
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024",
        "dog is a domestic animal"
    ],
    embedding=embedding_model
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding_model
)


`DocArrayInMemorySearch` builds two in-memory vector stores:
* `vecstore_a`: "half the info is here", "DeepSeek-V3 was released in December 2024", "dog is a domestic animal"
* `vecstore_b`: "the other half of the info is here", "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"

Each store can retrieve the most relevant chunk for a user question using semantic similarity.

In [39]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        """Using the context provided, answer the user's question.
        Context:
        {context_a}
        {context_b}"""
    ),
    HumanMessagePromptTemplate.from_template("{question}")
])


Here we wrap our vector stores as retrievers so that their results can be gathered into a single `retrieval` runnable whose output feeds the prompt.

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

retriever_a = vecstore_a.as_retriever(search_kwargs={"k": 1})  # return only the single most similar document
retriever_b = vecstore_b.as_retriever()  # k not set, so the retriever's default applies

retrieval = RunnableParallel(
    {
        "context_a" : retriever_a, "context_b" : retriever_b, "question": RunnablePassthrough()
    }
)

RunnableParallel: Runs multiple Runnables (functions, retrievers, etc.) in parallel.

* context_a = retriever_a gets the most relevant doc(s) from vecstore_a

* context_b = retriever_b gets from vecstore_b

* question = RunnablePassthrough() simply returns the original input unchanged.

When you invoke this chain with a question (e.g., "what architecture does the model DeepSeek released in december use?"), the input string is:

* Passed to both retrievers as the query, so each returns its own matches.

* Also passed as-is to `"question"` via `RunnablePassthrough`.
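The fan-out described above can be sketched in plain Python. This is only a rough model of the idea, not necessarily how LangChain schedules the work internally; `fan_out` and the two fake retriever functions are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(steps, value):
    """Run every step on the same input concurrently and collect a dict of results."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(step, value) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}


# Hypothetical stand-ins for the two retrievers.
def fake_retriever_a(query):
    return [f"doc from A for: {query}"]


def fake_retriever_b(query):
    return [f"doc from B for: {query}"]


result = fan_out(
    {
        "context_a": fake_retriever_a,
        "context_b": fake_retriever_b,
        "question": lambda query: query,  # plays the role of a passthrough
    },
    "what is LCEL?",
)
print(result)
```

The same query reaches every branch, and the output dictionary has one entry per branch, which is the shape the prompt expects.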

The result of retrieval is a dictionary:

In [60]:
retrieval.invoke("what architecture does the model DeepSeek released in december use?")

{'context_a': [Document(metadata={}, page_content='DeepSeek-V3 was released in December 2024')],
 'context_b': [Document(metadata={}, page_content='the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters'),
  Document(metadata={}, page_content='the other half of the info is here')],
 'question': 'what architecture does the model DeepSeek released in december use?'}

This dictionary is used to fill the template variables in the prompt. The `llm` then generates an answer based on the filled-in prompt, and `output_parser` (here, `StrOutputParser()`) extracts the plain text from the LLM response.

The chain will look something like this:

![](https://github.com/aurelio-labs/langchain-course/blob/main/assets/lcel-flow.png?raw=1)

In [64]:
chain = retrieval | prompt | llm | output_parser

chain.invoke("what architecture does the model DeepSeek released in december use?")


'The DeepSeek model released in December (DeepSeek-V3) uses a mixture of experts architecture.'

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.

---

#### Vector Database

A **vector database** (or vector store) is a database specialized for storing vector representations (**embeddings**) of documents, chunks, or other items.

- **Semantic Search:** Lets you search for similar items based on vector similarity (not just keywords or exact match).
- **Example:** Given a user query, you can find the most *semantically* similar passage, even if it uses different words or phrasing.
- **Examples of vector database backends:** FAISS, Pinecone, Chroma, Weaviate, DocArrayInMemorySearch.

**In your code:** `vecstore_a = DocArrayInMemorySearch.from_texts([...], embedding=embedding_model)`

Here, each document is stored together with its embedding, which is what enables semantic search.
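To show what "vector similarity" means in practice, here is a minimal pure-Python sketch of a top-k search by cosine similarity. The three-dimensional vectors are invented for illustration (real embedding models produce vectors with hundreds of dimensions), and `top_k` is a hypothetical helper, not part of any vector store API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for two of the stored texts.
store = {
    "DeepSeek-V3 was released in December 2024": [0.9, 0.1, 0.0],
    "dog is a domestic animal": [0.0, 0.2, 0.9],
}


def top_k(query_vector, k=1):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# A made-up query vector that points roughly the same way as the first text.
best = top_k([0.8, 0.2, 0.1], k=1)
print(best)  # ['DeepSeek-V3 was released in December 2024']
```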

---

#### `.as_retriever()`

The `.as_retriever()` method converts a vector store object into a **retriever interface**.

- Allows you to use the vector store in a standard way for question answering, RAG, etc.
- **Retriever objects** in LangChain are *runnables*: you can call `.invoke(question)` to return relevant chunks, and you can place them in an LCEL chain.
- Includes configuration like `search_kwargs` for how many documents to fetch (`k`), filters, retrieval mode, etc.

**In your code:**
retriever_a = vecstore_a.as_retriever()
output = retriever_a.invoke("What are DeepSeek-V3's details?")

This finds the *k* most similar documents from `vecstore_a` based on embedding similarity to your query.

In [None]:
retriever_a.invoke("What are DeepSeek-V3's details?") # k = 1

[Document(metadata={}, page_content='DeepSeek-V3 was released in December 2024')]

In [63]:
retriever_b.invoke("What are DeepSeek-V3's details?") # k is not set

[Document(metadata={}, page_content='the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters'),
 Document(metadata={}, page_content='the other half of the info is here')]