[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/langchain-course/blob/main/chapters/07-lcel.ipynb)

#### LangChain Essentials Course

# LangChain's Expression Language

LangChain is one of the most popular open source libraries for AI Engineers. Its goal is to abstract away the complexity of building AI software, provide easy-to-use building blocks, and make it easier to switch between AI service providers.

In this example, we will introduce LangChain's Expression Language (LCEL), build a full chain with it, and look at how it works under the hood. The examples in this notebook use Google's `gemini-2.5-flash`.

In [1]:
!pip install -qU \
  langchain-core \
  langchain-google-genai \
  langchain-community \
  langsmith \
  google-generativeai \
  docarray==0.40.0

---

> ⚠️ We will be using Google's Gemini API for this example, allowing us to run everything via API. If you would like to use Ollama instead, check out the [Ollama LangChain Course](https://github.com/aurelio-labs/langchain-course/tree/main/notebooks/ollama).

---

> ⚠️ If using LangSmith, add your API key below:

In [2]:
import os
from getpass import getpass

# LangSmith Setup (optional, for observability)
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "aurelioai-langchain-course-prompts-gemini"

# Google Gemini API Key
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or \
    getpass("Enter Google API Key: ")

Enter LangSmith API Key: ··········
Enter Google API Key: ··········


---

## Traditional Chains vs LCEL

In this section we'll work through a basic example using the traditional method for building chains before jumping into LCEL. We will build a pipeline in which the user provides a topic and the LLM returns a short _research report_ on that topic.

### Traditional LLMChain

The `LLMChain` is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method. For this we need:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.

In [3]:
from langchain_core.prompts import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)
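To see what the LLM will actually receive, it can help to render the template by hand. The sketch below uses plain Python `str.format` as a stand-in rather than the `PromptTemplate` API itself; with its default f-string template format, `PromptTemplate` performs an equivalent substitution of `{topic}`. The topic value is only an example.

```python
# Plain-Python stand-in for rendering the template (not the PromptTemplate API)
prompt_template = "Give me a small report on {topic}"

rendered = prompt_template.format(topic="retrieval augmented generation")
print(rendered)
# Give me a small report on retrieval augmented generation
```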

For the LLM, we'll start by initializing our connection to the Google Gemini API. We need a Google API key (set above as `GOOGLE_API_KEY`), which you can get from [Google AI Studio](https://aistudio.google.com/app/apikey).

We will use the `gemini-2.5-flash` model with a `temperature` of `0.0`:

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(temperature=0.0, model="gemini-2.5-flash")

In [5]:
llm_out = llm.invoke("Hello there")
llm_out

AIMessage(content='Hello there! How can I help you today?', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run--1c87a0fe-9ec1-4a79-80dd-ea916aa01375-0', usage_metadata={'input_tokens': 3, 'output_tokens': 10, 'total_tokens': 49, 'input_token_details': {'cache_read': 0}})

Then we define our output parser, which will be used to parse the output of the LLM. In this case, we will use the `StrOutputParser`, which converts the `AIMessage` output from our LLM into a single string.

In [6]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [7]:
out = output_parser.invoke(llm_out)
out

'Hello there! How can I help you today?'
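Conceptually, the parser just pulls the text out of the message object and drops the metadata. A minimal stand-in, assuming a message type with a `content` attribute (`FakeMessage` and `parse_to_str` are hypothetical names used for illustration, not LangChain classes):

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message: text plus metadata."""
    content: str
    response_metadata: dict = field(default_factory=dict)


def parse_to_str(message):
    """Return only the text of the message, dropping the metadata."""
    return message.content


msg = FakeMessage(
    content="Hello there! How can I help you today?",
    response_metadata={"finish_reason": "STOP"},
)
parsed = parse_to_str(msg)
print(repr(parsed))
```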

Through the `LLMChain` class we can place each of our components into a linear `chain`.

In [8]:
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

Note that `LLMChain` _was_ deprecated in LangChain `0.1.17`. The expected way of constructing these chains today is through LCEL, which we'll cover in a moment.

We can `invoke` our `chain`, providing a `topic` that we'd like to be researched.

In [9]:
result = chain.invoke("retrieval augmented generation")
result

{'topic': 'retrieval augmented generation',
 'text': '## Small Report: Retrieval Augmented Generation (RAG)\n\n**Title:** Retrieval Augmented Generation (RAG): Enhancing LLM Accuracy and Relevance\n\n**Introduction:**\nRetrieval Augmented Generation (RAG) is a paradigm-shifting technique designed to enhance the capabilities of Large Language Models (LLMs) by providing them with access to external, up-to-date, and domain-specific information. While LLMs are powerful in generating human-like text, they often suffer from "hallucinations" (generating factually incorrect information), outdated knowledge, or a lack of specific context for niche queries. RAG addresses these limitations by combining the generative power of LLMs with the precision of information retrieval systems.\n\n**How it Works:**\nRAG operates in three primary steps:\n\n1.  **Retrieval:** When a user poses a query, the RAG system first searches a vast external knowledge base (e.g., a database of documents, articles, intern

We can view a formatted version of this output using the `Markdown` display:

In [10]:
from IPython.display import display, Markdown

display(Markdown(result["text"]))

## Small Report: Retrieval Augmented Generation (RAG)

**Title:** Retrieval Augmented Generation (RAG): Enhancing LLM Accuracy and Relevance

**Introduction:**
Retrieval Augmented Generation (RAG) is a paradigm-shifting technique designed to enhance the capabilities of Large Language Models (LLMs) by providing them with access to external, up-to-date, and domain-specific information. While LLMs are powerful in generating human-like text, they often suffer from "hallucinations" (generating factually incorrect information), outdated knowledge, or a lack of specific context for niche queries. RAG addresses these limitations by combining the generative power of LLMs with the precision of information retrieval systems.

**How it Works:**
RAG operates in three primary steps:

1.  **Retrieval:** When a user poses a query, the RAG system first searches a vast external knowledge base (e.g., a database of documents, articles, internal company data, or the internet). It identifies and retrieves the most relevant pieces of information, often called "chunks" or "documents," that are pertinent to the query. This knowledge base can be continuously updated, ensuring the information is current.

2.  **Augmentation:** The retrieved information is then used to "augment" or enrich the original user query. Instead of just sending the raw query to the LLM, the system constructs a new, more comprehensive prompt that includes both the user's question and the relevant context retrieved from the external source.

3.  **Generation:** Finally, this augmented prompt is fed into the LLM. The LLM then generates a response, but critically, it does so by grounding its answer in the provided retrieved context. This significantly reduces the likelihood of hallucinations and ensures the response is factual, relevant, and specific to the information found.

**Key Benefits:**

*   **Enhanced Accuracy & Factuality:** Reduces the risk of LLMs generating incorrect or fabricated information by providing verifiable sources.
*   **Access to Up-to-Date Information:** Bypasses the LLM's training data cutoff, allowing it to incorporate real-time or frequently updated data.
*   **Domain Specificity:** Enables LLMs to answer questions about proprietary, internal, or highly specialized knowledge that wasn't part of their original training.
*   **Reduced Training Costs:** Eliminates the need for expensive and time-consuming retraining (fine-tuning) of LLMs every time new information becomes available.
*   **Transparency & Attribution:** Can often provide citations or links to the source documents from which the information was retrieved, increasing user trust.
*   **Improved Explainability:** Makes it easier to understand *why* an LLM generated a particular answer, as the source context is explicit.

**Challenges & Considerations:**

*   **Retrieval Quality:** The effectiveness of RAG heavily depends on the quality of the retrieval step. Poorly retrieved information will lead to poor generation ("garbage in, garbage out").
*   **Context Window Limits:** LLMs have limits on how much text they can process in a single prompt, which can restrict the amount of retrieved context.
*   **Complexity:** Building and maintaining a robust RAG system, including data indexing, chunking strategies, and retrieval algorithms, can be complex.
*   **Latency:** The retrieval step adds an extra processing layer, potentially increasing response times compared to a standalone LLM.

**Applications:**
RAG is being widely adopted across various sectors, including:

*   **Customer Support:** Providing accurate answers from extensive knowledge bases.
*   **Enterprise Search:** Enabling employees to quickly find specific information within internal documents.
*   **Healthcare:** Answering medical queries based on the latest research and patient records.
*   **Legal:** Summarizing case law and retrieving relevant statutes.
*   **Education:** Creating personalized learning experiences with up-to-date content.

**Conclusion:**
Retrieval Augmented Generation represents a significant leap forward in making LLMs more reliable, trustworthy, and practical for real-world applications. By intelligently combining the strengths of information retrieval with the generative power of large language models, RAG mitigates key limitations of standalone LLMs, paving the way for more accurate, contextually relevant, and verifiable AI-driven solutions.

That is a simple `LLMChain` using the traditional LangChain method. Now let's move on to LCEL.

## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superseded traditional classes such as `LLMChain`, LCEL gives us a more flexible system for building chains. LCEL uses the pipe operator `|` to _chain_ components together. Let's see how we'd construct the equivalent of our `LLMChain` using LCEL.

In [11]:
lcel_chain = prompt | llm | output_parser

We can `invoke` this chain in the same way as we did before:

In [12]:
result = lcel_chain.invoke("retrieval augmented generation")
result

'## Small Report: Retrieval Augmented Generation (RAG)\n\n**Title:** Retrieval Augmented Generation (RAG): Enhancing LLM Accuracy and Relevance\n\n**Introduction:**\nRetrieval Augmented Generation (RAG) is a paradigm-shifting technique designed to enhance the capabilities of Large Language Models (LLMs) by providing them with access to external, up-to-date, and domain-specific information. While LLMs are powerful in generating human-like text, they often suffer from "hallucinations" (generating factually incorrect information), outdated knowledge, or a lack of specific context for niche queries. RAG addresses these limitations by combining the generative power of LLMs with the precision of information retrieval systems.\n\n**How it Works:**\nRAG operates in three primary steps:\n\n1.  **Retrieval:** When a user poses a query, the RAG system first searches a vast external knowledge base (e.g., a database of documents, articles, internal company data, or the internet). It identifies and 

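The output format is slightly different: the `LLMChain` returned a dictionary containing the `topic` and the generated `text`, whereas the LCEL chain returns the parsed string directly. The underlying functionality and content are the same. As before, we can view a formatted version of this output using the `Markdown` display: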

In [13]:
display(Markdown(result))

## Small Report: Retrieval Augmented Generation (RAG)

**Title:** Retrieval Augmented Generation (RAG): Enhancing LLM Accuracy and Relevance

**Introduction:**
Retrieval Augmented Generation (RAG) is a paradigm-shifting technique designed to enhance the capabilities of Large Language Models (LLMs) by providing them with access to external, up-to-date, and domain-specific information. While LLMs are powerful in generating human-like text, they often suffer from "hallucinations" (generating factually incorrect information), outdated knowledge, or a lack of specific context for niche queries. RAG addresses these limitations by combining the generative power of LLMs with the precision of information retrieval systems.

**How it Works:**
RAG operates in three primary steps:

1.  **Retrieval:** When a user poses a query, the RAG system first searches a vast external knowledge base (e.g., a database of documents, articles, internal company data, or the internet). It identifies and retrieves the most relevant pieces of information, often called "chunks" or "documents," that are pertinent to the query. This knowledge base can be continuously updated, ensuring the information is current.

2.  **Augmentation:** The retrieved information is then used to "augment" or enrich the original user query. Instead of just sending the raw query to the LLM, the system constructs a new, more comprehensive prompt that includes both the user's question and the relevant context retrieved from the external source.

3.  **Generation:** Finally, this augmented prompt is fed into the LLM. The LLM then generates a response, but critically, it does so by grounding its answer in the provided retrieved context. This significantly reduces the likelihood of hallucinations and ensures the response is factual, relevant, and specific to the information found.

**Key Benefits:**

*   **Enhanced Accuracy & Factuality:** Reduces the risk of LLMs generating incorrect or fabricated information by providing verifiable sources.
*   **Access to Up-to-Date Information:** Bypasses the LLM's training data cutoff, allowing it to incorporate real-time or frequently updated data.
*   **Domain Specificity:** Enables LLMs to answer questions about proprietary, internal, or highly specialized knowledge that wasn't part of their original training.
*   **Reduced Training Costs:** Eliminates the need for expensive and time-consuming retraining (fine-tuning) of LLMs every time new information becomes available.
*   **Transparency & Attribution:** Can often provide citations or links to the source documents from which the information was retrieved, increasing user trust.
*   **Improved Explainability:** Makes it easier to understand *why* an LLM generated a particular answer, as the source context is explicit.

**Challenges & Considerations:**

*   **Retrieval Quality:** The effectiveness of RAG heavily depends on the quality of the retrieval step. Poorly retrieved information will lead to poor generation ("garbage in, garbage out").
*   **Context Window Limits:** LLMs have limits on how much text they can process in a single prompt, which can restrict the amount of retrieved context.
*   **Complexity:** Building and maintaining a robust RAG system, including data indexing, chunking strategies, and retrieval algorithms, can be complex.
*   **Latency:** The retrieval step adds an extra processing layer, potentially increasing response times compared to a standalone LLM.

**Applications:**
RAG is being widely adopted across various sectors, including:

*   **Customer Support:** Providing accurate answers from extensive knowledge bases.
*   **Enterprise Search:** Enabling employees to quickly find specific information within internal documents.
*   **Healthcare:** Answering medical queries based on the latest research and patient records.
*   **Legal:** Summarizing case law and retrieving relevant statutes.
*   **Education:** Creating personalized learning experiences with up-to-date content.

**Conclusion:**
Retrieval Augmented Generation represents a significant leap forward in making LLMs more reliable, trustworthy, and practical for real-world applications. By intelligently combining the strengths of information retrieval with the generative power of large language models, RAG mitigates key limitations of standalone LLMs, paving the way for more accurate, contextually relevant, and verifiable AI-driven solutions.

### How Does the Pipe Operator Work?

Before moving on to other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionality-wise, the pipe tells you that whatever the _left_ side outputs will be fed as input into the _right_ side. In the example of `prompt | llm | output_parser`, `prompt` feeds into `llm`, which in turn feeds into `output_parser`.
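Put differently, a pipe chain reads left to right but behaves like nested function calls. The sketch below uses three hypothetical stand-in functions (`fake_prompt`, `fake_llm`, `fake_parser`) in place of the real components, purely to show the order in which data flows:

```python
def fake_prompt(topic):
    # Stand-in for the prompt template: builds the prompt text
    return f"Give me a small report on {topic}"


def fake_llm(prompt_text):
    # Stand-in for the model: wraps its "answer" in a message-like dict
    return {"content": f"REPORT FOR: {prompt_text}"}


def fake_parser(message):
    # Stand-in for the output parser: extracts the text
    return message["content"]


# `prompt | llm | output_parser` passes data along in this order:
step_1 = fake_prompt("LCEL")
step_2 = fake_llm(step_1)
step_3 = fake_parser(step_2)

# ...which is the same as nesting the calls inside out
nested = fake_parser(fake_llm(fake_prompt("LCEL")))
print(step_3 == nested)
```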

Let's make a basic class named `Runnable` that will turn a provided function into a _runnable_ object that we can then use with the pipe `|` operator.

In [14]:
class Runnable:
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)
    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

With the `Runnable` class, we can wrap a function in the class, allowing us to chain multiple of these _runnable_ functions together using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [15]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [16]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [17]:
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)

15

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [18]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)

15
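To check where the `15` comes from, the intermediate values can be traced by hand. This sketch re-declares the three functions so that it runs on its own, then applies them in chain order:

```python
def add_five(x):
    return x + 5


def sub_five(x):
    return x - 5


def mul_five(x):
    return x * 5


value = 3
trace = [value]
for step in (add_five, sub_five, mul_five):
    value = step(value)  # the output of one step becomes the input of the next
    trace.append(value)

print(trace)  # [3, 8, 3, 15]
```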

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [19]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [20]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the `invoke` method:

In [21]:
chain.invoke(3)

15

Now let's try something a little more demanding: this time we will generate a report and then edit that report using this functionality.

In [22]:
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)

In [23]:
chain = prompt | llm | output_parser

In [24]:
result = chain.invoke("AI")
display(Markdown(result))

## Artificial Intelligence (AI): A Concise Report

**Introduction**
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI is no longer a futuristic concept but a rapidly evolving field that is transforming industries and daily life.

**What is AI?**
At its core, AI involves developing algorithms that enable computers to analyze vast amounts of data, identify patterns, and make decisions or predictions. Key sub-fields include:
*   **Machine Learning (ML):** Systems learn from data without explicit programming.
*   **Deep Learning (DL):** A subset of ML that uses neural networks with many layers to learn complex patterns, often used for image and speech recognition.
*   **Natural Language Processing (NLP):** Enables computers to understand, interpret, and generate human language.
*   **Computer Vision:** Allows machines to "see" and interpret visual information from the world.

**Key Applications & Impact**
AI's influence is pervasive, impacting various sectors:
*   **Healthcare:** Drug discovery, personalized treatment plans, diagnostic tools, and robotic surgery.
*   **Finance:** Fraud detection, algorithmic trading, and personalized financial advice.
*   **Transportation:** Self-driving cars, traffic optimization, and logistics.
*   **Customer Service:** Chatbots and virtual assistants providing instant support.
*   **Entertainment:** Content recommendation systems (e.g., Netflix, Spotify), and generative AI for art and music.
*   **Manufacturing:** Predictive maintenance, quality control, and robotic automation.

**Benefits of AI**
*   **Increased Efficiency & Automation:** Automating repetitive tasks, freeing up human resources for more complex work.
*   **Enhanced Decision-Making:** Analyzing large datasets to provide insights and support better, faster decisions.
*   **Innovation & Problem Solving:** Enabling breakthroughs in scientific research and addressing complex global challenges.
*   **Personalization:** Delivering tailored experiences in various services, from shopping to education.

**Challenges & Considerations**
While promising, AI development faces significant challenges:
*   **Ethical Concerns:** Issues of bias in algorithms, privacy of data, and accountability for AI decisions.
*   **Job Displacement:** Potential for automation to displace human jobs in certain sectors.
*   **Complexity & Control:** Ensuring AI systems remain controllable and aligned with human values as they become more sophisticated.
*   **Security:** Protecting AI systems from malicious attacks and misuse.

**Conclusion**
Artificial Intelligence is a powerful and rapidly evolving technology that is reshaping our world. Its continued development promises further advancements across all aspects of society, offering immense potential for progress and innovation. However, realizing this potential responsibly requires careful consideration of its ethical, social, and economic implications to ensure a beneficial and equitable future for all.

Here we define two functions: `extract_fact`, which pulls out the main content of our text, and `replace_word`, which replaces "AI" with "skynet"!

In [25]:
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)
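To get a feel for what these two functions do before adding them to a chain, they can be run on a small made-up string. The sample text below is invented for illustration. Note that `extract_fact` drops everything before the first blank line and joins the remaining blocks with single newlines, and that `str.replace` matches substrings and is case-sensitive.

```python
def extract_fact(x):
    # Drop everything before the first blank line (here, the title block)
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x


old_word = "AI"
new_word = "skynet"


def replace_word(x):
    return x.replace(old_word, new_word)


# Made-up sample text, standing in for a model response
sample = "## A Report\n\n**Introduction**\nAI is everywhere.\n\n**Conclusion**\nAI-powered tools help."

extracted = extract_fact(sample)
final = replace_word(extracted)
print(final)

# str.replace works on substrings and is case-sensitive
print(replace_word("FAIR use of ai"))  # FskynetR use of ai
```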

Let's wrap these functions and see what the output is!

In [26]:
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)

In [27]:
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable

In [28]:
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))

**Introduction**
Retrieval Augmented Generation (RAG) is a technique designed to enhance the capabilities of large language models (LLMs) by providing them with access to external, up-to-date, and domain-specific information. While LLMs are powerful in generating human-like text, they often suffer from "hallucinations" (generating factually incorrect information), are limited by their training data cutoff, and lack specific knowledge about proprietary or niche subjects. RAG addresses these limitations by grounding the LLM's responses in verifiable, external data.
**How it Works**
RAG operates in three primary phases:
1.  **Retrieval Phase:**
    *   When a user poses a query, the system first analyzes it to understand the user's intent and keywords.
    *   It then searches a vast external knowledge base (e.g., a database of documents, articles, web pages, or a vector database containing embeddings of these sources) to find relevant information snippets or documents. This knowledge base is separate from the LLM's core training data.
2.  **Augmentation Phase:**
    *   The retrieved relevant information is then combined with the original user query.
    *   This combined input (query + retrieved context) forms an "augmented prompt" that is much richer and more specific than the original query alone.
3.  **Generation Phase:**
    *   The augmented prompt is fed into the LLM.
    *   The LLM then uses this combined information to formulate a more accurate, relevant, and grounded response, drawing directly from the provided context rather than solely relying on its internal, potentially outdated, or generic knowledge.
**Key Benefits**
*   **Improved Accuracy and Factuality:** Significantly reduces hallucinations by providing the LLM with verifiable facts.
*   **Access to Up-to-Date Information:** Allows LLMs to answer questions about recent events or newly published data, bypassing their training data cutoff.
*   **Domain-Specific Knowledge:** Enables LLMs to provide expert answers on proprietary or niche topics by connecting them to private knowledge bases (e.g., company documents, medical records, legal texts).
*   **Transparency and Explainability:** In many RAG implementations, the system can cite the sources from which it retrieved information, allowing users to verify the facts.
*   **Reduced Training Costs:** Eliminates the need for expensive and time-consuming retraining or fine-tuning of LLMs every time new information becomes available.
**Applications**
RAG is being widely adopted across various sectors, including:
*   **Customer Support Chatbots:** Providing accurate and up-to-date answers to customer queries based on product manuals, FAQs, and service policies.
*   **Knowledge Management:** Enabling employees to quickly find specific information within vast internal company documents.
*   **Research Assistance:** Helping researchers summarize papers, find specific data points, or answer complex questions by querying academic databases.
*   **Healthcare and Legal:** Assisting professionals in retrieving relevant case law, medical guidelines, or patient information.
*   **Content Creation:** Generating more factual and well-researched articles, reports, or summaries.
**Conclusion**
Retrieval Augmented Generation represents a significant step forward in making LLMs more reliable, versatile, and practical for real-world applications. By bridging the gap between the vast generative power of LLMs and the critical need for factual accuracy and real-time information, RAG is becoming a standard practice for deploying robust and trustworthy skynet-powered solutions.

Those are our `RunnableLambda` functions. It's worth noting that each of these functions is expected to take a SINGLE argument. If you have a function that needs multiple arguments, you can pass in a dictionary and unpack its keys inside the function.
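A minimal sketch of that pattern, using a hypothetical `scale_and_shift` function. It takes one dictionary and unpacks the values it needs, so it could be wrapped with `RunnableLambda` in the same way as the functions above:

```python
def scale_and_shift(inputs):
    # Single argument: a dict carrying every value the function needs
    x = inputs["x"]
    factor = inputs["factor"]
    offset = inputs["offset"]
    return x * factor + offset


result = scale_and_shift({"x": 3, "factor": 5, "offset": 2})
print(result)  # 17
```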

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel` — allows us to run multiple `Runnable` instances in parallel. Acting almost as a Y-fork in the chain.

* `RunnablePassthrough` — allows us to pass through a variable to the next `Runnable` without modification.
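Before using the real classes, the idea can be sketched in plain Python. This is a toy stand-in under stated assumptions, not LangChain's implementation: `run_parallel` and `passthrough` are hypothetical names, and the branches are simple lambdas rather than retrievers.

```python
from concurrent.futures import ThreadPoolExecutor


def passthrough(x):
    # Conceptual stand-in for RunnablePassthrough: return the input unchanged
    return x


def run_parallel(steps, x):
    """Conceptual stand-in for RunnableParallel.

    Calls every step with the same input and returns a dict of results
    under the same keys.
    """
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(step, x) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}


# Hypothetical branches: two "lookups" plus the untouched question
steps = {
    "context_a": lambda q: f"source A notes on: {q}",
    "context_b": lambda q: f"source B notes on: {q}",
    "question": passthrough,
}

out = run_parallel(steps, "what is LCEL?")
print(out)
```

Every branch receives the same input, and the result is a single dictionary keyed like the branches, which is the shape a multi-input prompt needs.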

In [30]:
!pip install sentence-transformers

To see these runnables in action, we will create two data sources. Each source provides part of the information, but to answer the question we will need both to be fed to the LLM.

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

# Use a lightweight and performant model
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024"
    ],
    embedding=embedding
)

vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding
)


The prompt has three inputs: two for context (`context_a` and `context_b`) and one for the question itself (`question`).

In [32]:
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""

In [33]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])
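To make the three inputs concrete, the two templates can be filled in by hand with plain `str.format`. This is a sketch rather than the `ChatPromptTemplate` API, and the values are hard-coded stand-ins taken from the texts stored above. In the actual chain, each retriever returns a list of documents rather than a plain string.

```python
system_template = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""
human_template = "{question}"

# Hard-coded values standing in for what the retrieval step would supply
inputs = {
    "context_a": "DeepSeek-V3 was released in December 2024",
    "context_b": "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters",
    "question": "what architecture does the model DeepSeek released in december use?",
}

# str.format ignores keyword arguments that a template does not use
system_message = system_template.format(**inputs)
human_message = human_template.format(**inputs)

print(system_message)
print(human_message)
```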

Here we wrap our vector stores as retrievers and combine them, together with the question, into a single `retrieval` step whose output supplies the prompt's inputs.

In [34]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a,
        "context_b": retriever_b,
        "question": RunnablePassthrough(),
    }
)

The chain we'll be constructing will look something like this:

![](https://github.com/aurelio-labs/langchain-course/blob/main/assets/lcel-flow.png?raw=1)

In [35]:
chain = retrieval | prompt | llm | output_parser

We `invoke` it as usual.

In [36]:
result = chain.invoke(
    "what architecture does the model DeepSeek released in december use?"
)
result

'The DeepSeek-V3 LLM, released in December, uses a mixture of experts architecture.'

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.

---