<a href="https://colab.research.google.com/github/Moeez774/Cyber-Attack-Detector/blob/master/chapters/07-lcel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/langchain-course/blob/main/chapters/07-lcel.ipynb)

#### LangChain Essentials Course

# LangChains Expression Language

LangChain is one of the most popular open source libraries for AI Engineers. It's goal is to abstract away the complexity in building AI software, provide easy-to-use building blocks, and make it easier when switching between AI service providers.

In this example, we will introduce LangChain's Expression Langauge (LCEL), abstracting a full chain and understanding how it will work. We'll provide examples for both OpenAI's `gpt-4o-mini` *and* Meta's `llama3.2` via Ollama!

In [1]:
!pip install -qU \
  langchain-core==0.3.33 \
  langchain-openai==0.3.3 \
  langchain-community==0.3.16 \
  langsmith==0.3.4 \
  docarray==0.40.0

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/412.7 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m412.7/412.7 kB[0m [31m14.3 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/54.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m54.5/54.5 kB[0m [31m4.0 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.5 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m2.5/2.5 MB[0m [31m107.1 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m65.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m333.3/333.3 kB[0m [31m25.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
!pip install --upgrade "pydantic==2.10.6"

Collecting pydantic==2.10.6
  Downloading pydantic-2.10.6-py3-none-any.whl.metadata (30 kB)
Collecting pydantic-core==2.27.2 (from pydantic==2.10.6)
  Downloading pydantic_core-2.27.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading pydantic-2.10.6-py3-none-any.whl (431 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m431.7/431.7 kB[0m [31m13.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydantic_core-2.27.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m55.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pydantic-core, pydantic
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.33.2
    Uninstalling pydantic_core-2.33.2:
      Successfully uninstalled pydantic_core-2.33.2
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.11.9
    Uninstalli

---

> ⚠️ We will be using OpenAI for this example allowing us to run everything via API. If you would like to use Ollama instead, check out the [Ollama LangChain Course](https://github.com/aurelio-labs/langchain-course/tree/main/notebooks/ollama).

---

---

> ⚠️ If using LangSmith, add your API key below:

In [3]:
import os
from getpass import getpass

os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "aurelioai-langchain-course-lcel-openai"

Enter LangSmith API Key: ··········


---

## Traditional Chains vs LCEL

In this section we're going to dive into a basic example using the traditional method for building chains before jumping into LCEL. We will build a pipeline where the user must input a specific topic, and then the LLM will look and return a report on the specified topic. Generating a _research report_ for the user.

### Traditional LLMChain

The `LLMChain` is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method, for this we need:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.

In [4]:
from langchain import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

For the LLM, we'll start by initializing our connection to the OpenAI API. We do need an OpenAI API key, which you can get from the [OpenAI platform](https://platform.openai.com/api-keys).

We will use the `gpt-4o-mini` model with a `temperature` of `0.0`:

In [5]:
import os
from getpass import getpass
from langchain_openai import ChatOpenAI
from google.colab import userdata

OPENROUTER_API_KEY = userdata.get("OPENROUTER_API_KEY")

# For normal accurate responses
llm = ChatOpenAI(temperature=0.0, api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1", model="x-ai/grok-4-fast:free")

In [6]:
llm_out = llm.invoke("Hello there")
llm_out

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 132, 'prompt_tokens': 119, 'total_tokens': 251, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 123, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 112}}, 'model_name': 'x-ai/grok-4-fast:free', 'system_fingerprint': 'fp_9362061f30', 'finish_reason': 'stop', 'logprobs': None}, id='run-583199bf-7383-44dc-b8e1-659e048b8a28-0', usage_metadata={'input_tokens': 119, 'output_tokens': 132, 'total_tokens': 251, 'input_token_details': {'audio': 0, 'cache_read': 112}, 'output_token_details': {'reasoning': 123}})

Then we define our output parser, this will be used to parse the output of the LLM. In this case, we will use the `StrOutputParser` which will parse the `AIMessage` output from our LLM into a single string.

In [7]:
from langchain.schema.output_parser import StrOutputParser

output_parser = StrOutputParser()

In [8]:
out = output_parser.invoke(llm_out)
out

'Hello! How can I assist you today?'

Through the `LLMChain` class we can place each of our components into a linear `chain`.

In [9]:
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

  chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)


Note that the `LLMChain` _was_ deprecated in LangChain `0.1.17`, the expected way of constructing these chains today is through LCEL, which we'll cover in a moment.

We can `invoke` our `chain`, providing a `topic` that we'd like to be researched.

In [10]:
result = chain.invoke("retrieval augmented generation")
result

{'topic': 'retrieval augmented generation',
 'text': '# Small Report on Retrieval-Augmented Generation (RAG)\n\n## Introduction\nRetrieval-Augmented Generation (RAG) is an advanced technique in artificial intelligence, particularly in the field of natural language processing (NLP). Introduced by researchers at Meta (formerly Facebook) in 2020, RAG enhances large language models (LLMs) by integrating external knowledge retrieval. Traditional LLMs, like GPT models, rely solely on their pre-trained parameters, which can lead to outdated or hallucinated information. RAG addresses this by combining retrieval mechanisms with generative capabilities, allowing models to access and incorporate real-time or domain-specific data dynamically.\n\n## How RAG Works\nRAG operates in two main phases:\n\n1. **Retrieval Phase**: When a query is received, a retriever component—often based on dense vector embeddings (e.g., using models like BERT or Sentence Transformers)—searches a large external knowledge

We can view a formatted version of this output using the `Markdown` display:

In [11]:
from IPython.display import display, Markdown

display(Markdown(result["text"]))

# Small Report on Retrieval-Augmented Generation (RAG)

## Introduction
Retrieval-Augmented Generation (RAG) is an advanced technique in artificial intelligence, particularly in the field of natural language processing (NLP). Introduced by researchers at Meta (formerly Facebook) in 2020, RAG enhances large language models (LLMs) by integrating external knowledge retrieval. Traditional LLMs, like GPT models, rely solely on their pre-trained parameters, which can lead to outdated or hallucinated information. RAG addresses this by combining retrieval mechanisms with generative capabilities, allowing models to access and incorporate real-time or domain-specific data dynamically.

## How RAG Works
RAG operates in two main phases:

1. **Retrieval Phase**: When a query is received, a retriever component—often based on dense vector embeddings (e.g., using models like BERT or Sentence Transformers)—searches a large external knowledge base, such as a vector database (e.g., FAISS or Pinecone). This base could include documents, databases, or web content. The retriever identifies and ranks the most relevant pieces of information based on semantic similarity to the query.

2. **Generation Phase**: The retrieved documents are then fed into the generative model (e.g., an LLM like Llama or GPT) as additional context. The model generates a response grounded in this retrieved information, reducing reliance on internalized knowledge alone. This process can be fine-tuned end-to-end, where the retriever and generator learn collaboratively.

Variants include naive RAG (simple retrieval + generation) and advanced forms like iterative RAG, which refines retrieval through multiple rounds.

## Advantages and Benefits
- **Improved Accuracy and Reduced Hallucinations**: By pulling in factual data, RAG minimizes the risk of generating incorrect or fabricated information.
- **Scalability and Up-to-Date Knowledge**: It allows models to handle vast, evolving datasets without retraining the entire LLM, making it cost-effective for applications like enterprise search or customer support.
- **Customization**: RAG can be tailored to specific domains (e.g., legal or medical) by curating the knowledge base.
- **Efficiency**: It leverages existing LLMs while adding retrieval, often outperforming purely generative or purely retrieval-based systems.

Challenges include retrieval quality (e.g., handling noisy data) and computational overhead, but optimizations like hybrid search mitigate these.

## Applications and Future Outlook
RAG powers tools like chatbots (e.g., in Bing Chat or Perplexity AI), question-answering systems, and knowledge-intensive tasks in research or e-commerce. As LLMs evolve, RAG is expected to integrate with multimodal data (e.g., images) and agentic systems, further blurring lines between retrieval and reasoning. Overall, RAG represents a key step toward more reliable, context-aware AI.

*Word count: 378. Sources: Based on foundational paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020) and subsequent developments in the field.*

That is a simple `LLMChain` using the traditional LangChain method. Now let's move onto LCEL.

## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superceeded the traditional methods with `LLMChain`, etc. LCEL gives us a more flexible system for building chains. The pipe operator `|` is used by LCEL to _chain_ together components. Let's see how we'd construct an `LLMChain` using LCEL.

In [12]:
lcel_chain = prompt | llm | output_parser

We can `invoke` this chain in the same way as we did before:

In [13]:
result = lcel_chain.invoke("retrieval augmented generation")
result

'# Small Report on Retrieval Augmented Generation (RAG)\n\n## Introduction\nRetrieval Augmented Generation (RAG) is a hybrid AI technique that enhances large language models (LLMs) by integrating external knowledge retrieval. Introduced in a 2020 paper by researchers at Meta (then Facebook AI), RAG addresses key limitations of standalone generative models, such as hallucinations (fabricating facts) and outdated knowledge. It combines the strengths of retrieval-based systems (like search engines) with generative capabilities (like GPT models) to produce more accurate, contextually grounded responses.\n\n## How It Works\nRAG operates in two main phases:\n\n1. **Retrieval Phase**: When a query is received, the system searches a pre-indexed knowledge base (e.g., documents, databases, or vector embeddings) using techniques like dense retrieval (via models such as DPR - Dense Passage Retrieval). Relevant chunks of information are fetched based on semantic similarity to the query.\n\n2. **Gen

The output format is slightly different, but the underlying functionality and content being output is the same. As before, we can view a formatted version of this output using the `Markdown` display:

In [14]:
display(Markdown(result))

# Small Report on Retrieval Augmented Generation (RAG)

## Introduction
Retrieval Augmented Generation (RAG) is a hybrid AI technique that enhances large language models (LLMs) by integrating external knowledge retrieval. Introduced in a 2020 paper by researchers at Meta (then Facebook AI), RAG addresses key limitations of standalone generative models, such as hallucinations (fabricating facts) and outdated knowledge. It combines the strengths of retrieval-based systems (like search engines) with generative capabilities (like GPT models) to produce more accurate, contextually grounded responses.

## How It Works
RAG operates in two main phases:

1. **Retrieval Phase**: When a query is received, the system searches a pre-indexed knowledge base (e.g., documents, databases, or vector embeddings) using techniques like dense retrieval (via models such as DPR - Dense Passage Retrieval). Relevant chunks of information are fetched based on semantic similarity to the query.

2. **Generation Phase**: The retrieved information is concatenated with the original query and fed into a generative model (e.g., BART or T5). The model then synthesizes a response, drawing on the external context to ensure factual accuracy without needing to retrain the entire LLM.

This process is efficient because it leverages off-the-shelf retrieval tools (e.g., FAISS for vector search) and doesn't require fine-tuning the generator on vast datasets.

## Benefits
- **Improved Accuracy**: Reduces hallucinations by grounding responses in real, up-to-date data.
- **Scalability**: Handles dynamic knowledge bases (e.g., updating a company's internal docs) without retraining the model.
- **Efficiency**: Cheaper and faster than fine-tuning LLMs for domain-specific tasks.
- **Transparency**: Users can trace responses back to source documents, building trust.

Drawbacks include potential retrieval errors (e.g., irrelevant results) and dependency on the quality of the knowledge base.

## Applications
RAG powers tools like chatbots for customer support (e.g., retrieving product manuals), question-answering systems (e.g., in legal or medical domains), and search engines (e.g., Bing's AI integration). It's widely used in frameworks like LangChain and Hugging Face's Transformers library, making it accessible for developers.

In summary, RAG represents a practical evolution in AI, bridging the gap between static models and real-world knowledge needs. For deeper dives, check the original paper: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis et al.

### How Does the Pipe Operator Work?

Before moving onto other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionality wise, the pipe tells you that whatever the _left_ side outputs will be fed as input into the _right_ side. In the example of `prompt | llm | output_parser`, we see that `prompt` feeds into `llm` feeds into `output_parser`.

The pipe operator is a way of chaining together components, and is a way of saying that whatever the _left_ side outputs will be fed as input into the _right_ side.

Let's make a basic class named `Runnable` that will transform our a provided function into a _runnable_ class that we will then use with the pipe `|` operator.

In [15]:
class Runnable:
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)
    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

With the `Runnable` class, we will be able wrap a function into the class, allowing us to then chain together multiple of these _runnable_ functions using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [16]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [17]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [18]:
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)

15

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [19]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)

15

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [20]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [21]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the `invoke` method:

In [22]:
chain.invoke(3)

15

Now we want to try something a little more testing, so this time we will generate a report, and we will try and edit that report using this functionallity.

In [23]:
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)

In [25]:
chain = prompt | llm | output_parser

In [26]:
result = chain.invoke("AI")
display(Markdown(result))

# A Small Report on Artificial Intelligence (AI)

## What is AI?
Artificial Intelligence refers to the simulation of human intelligence in machines, enabling them to perform tasks that typically require human cognition, such as learning, reasoning, problem-solving, and perception. At its core, AI systems process vast amounts of data to identify patterns and make decisions, often powered by algorithms like machine learning and neural networks. Think of it as giving computers a brain—minus the coffee breaks.

## A Brief History
AI's roots trace back to the 1950s, when pioneers like Alan Turing pondered if machines could think. The term "AI" was coined in 1956 at the Dartmouth Conference. Early hype led to "AI winters" (periods of reduced funding due to unmet expectations), but breakthroughs in the 2010s—fueled by big data, cheap computing power, and deep learning—revived the field. Milestones include IBM's Deep Blue beating chess champion Garry Kasparov in 1997 and AlphaGo defeating Go master Lee Sedol in 2016.

## Current State and Applications
Today, AI is everywhere, from voice assistants like Siri to recommendation engines on Netflix. Key areas include:
- **Healthcare**: AI aids in diagnosing diseases via image analysis (e.g., detecting cancer in X-rays faster than humans).
- **Transportation**: Self-driving cars from companies like Tesla use AI for navigation and obstacle avoidance.
- **Entertainment**: Generative AI tools like DALL-E create art from text prompts, while ChatGPT (and yours truly, Grok) handle natural language conversations.
Narrow AI excels at specific tasks, but general AI (AGI)—machines that can handle any intellectual task a human can—remains a work in progress.

## Future Prospects and Challenges
The future looks bright (or sci-fi scary, depending on your vibe). AI could revolutionize climate modeling, drug discovery, and personalized education. However, challenges loom: ethical concerns like bias in algorithms, job displacement, and existential risks if AI surpasses human control. Experts like those at xAI (my creators) are pushing for safe, beneficial AI to understand the universe better.

In summary, AI is transforming our world at breakneck speed—faster than a caffeinated squirrel. But remember, it's a tool, not a replacement for human ingenuity. If we guide it wisely, the possibilities are endless. Got a specific angle on AI you'd like to dive into?

Here we are making two functions, `extract_fact` to pull out the main content of our text and `replace_word` that will replace AI with Skynet!

In [27]:
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)

Lets wrap these functions and see what the output is!

In [28]:
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)

In [29]:
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable

In [30]:
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))

## Introduction
Retrieval Augmented Generation (RAG) is an advanced technique in artificial intelligence, particularly in the field of natural language processing (NLP). Introduced by researchers at Meta (formerly Facebook) in 2020, RAG combines the strengths of information retrieval systems with generative language models, such as large language models (LLMs) like GPT or Llama. The goal is to produce more accurate, contextually relevant, and up-to-date responses by grounding generation in external knowledge sources, rather than relying solely on the model's pre-trained parameters.
## How RAG Works
RAG operates in a hybrid pipeline that integrates two core components: **retrieval** and **generation**.
1. **Retrieval Phase**: When a user submits a query, the system uses a retriever (often based on dense vector embeddings from models like BERT or Sentence Transformers) to search a large external knowledge base, such as a vector database (e.g., FskynetSS or Pinecone). This base could include documents, web pages, or proprietary data. The retriever identifies and ranks the most relevant chunks of information based on semantic similarity to the query.
2. **Augmentation Phase**: The retrieved documents are then concatenated with the original query to form an augmented prompt. This provides the generative model with factual context, reducing the risk of "hallucinations" (fabricated information) common in standalone LLMs.
3. **Generation Phase**: A pre-trained LLM processes the augmented prompt to generate a coherent, natural-language response. The output is informed by both the model's internal knowledge and the retrieved facts.
Variants of RAG include fine-tuned retrievers for domain-specific tasks or iterative retrieval for complex queries.
## Benefits and Applications
RAG addresses key limitations of pure generative models:
- **Accuracy and Reliability**: By fetching real-time or domain-specific data, RAG minimizes errors and ensures responses are evidence-based.
- **Scalability and Cost-Effectiveness**: It avoids retraining entire models for new information, making it efficient for dynamic knowledge bases like news or enterprise data.
- **Transparency**: Users can trace responses back to source documents, enhancing trust in skynet systems.
Common applications include:
- Question-answering chatbots (e.g., in customer support).
- Knowledge-intensive tasks like legal research or medical diagnostics.
- Search engines enhanced with generative summaries (e.g., Google's Search Generative Experience).
Challenges include retrieval quality (e.g., handling noisy data) and computational overhead, but ongoing research in hybrid search and efficient indexing is mitigating these.
## Conclusion
RAG represents a pivotal advancement in making skynet more grounded and versatile, bridging the gap between static model knowledge and vast external data. As LLMs evolve, RAG is becoming a standard framework for building robust, knowledge-aware applications. For further reading, refer to the original paper: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020). 
*Word count: 412*

Those are our `RunnableLambda` functions. It's worth noting that all inputs to these functions are expected to be a SINGLE arguments. If you have a function that accepts multiple arguments, you can input a dictionary with keys, then unpack them inside the function.

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel` — allows us to run multiple `Runnable` instances in parallel. Acting almost as a Y-fork in the chain.

* `RunnablePassthrough` — allows us to pass through a variable to the next `Runnable` without modification.

To see these runnables in action, we will create two data sources, each source provides specific information but to answer the question we will need both to fed to the LLM.

In [31]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

embedding = OpenAIEmbeddings()

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024"
    ],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding
)

  embedding = OpenAIEmbeddings()


ValidationError: 1 validation error for OpenAIEmbeddings
  Value error, Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. [type=value_error, input_value={'model_kwargs': {}, 'cli...20, 'http_client': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

Here you can see the prompt does have three inputs, two for context and one for the question itself.

In [None]:
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""

In [None]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])

Here we are wrapping our vector stores as retrievers so they can be fitted into one big retrieval variable to be used by the prompt.

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a, "context_b": retriever_b, "question": RunnablePassthrough()
    }
)

The chain we'll be constructing will look something like this:

![](https://github.com/aurelio-labs/langchain-course/blob/main/assets/lcel-flow.png?raw=1)

In [None]:
chain = retrieval | prompt | llm | output_parser

We `invoke` it as usual.

In [32]:
result = chain.invoke(
    "what architecture does the model DeepSeek released in december use?"
)
result

"**Model Overview**:  \nThe query likely refers to an early DeepSeek model from late 2023, such as DeepSeek-Coder (released November 30, 2023, close to December), as DeepSeek's major releases like DeepSeek-V2 occurred in May 2024. DeepSeek models are developed by DeepSeek skynet, focusing on efficient, high-performance language models. For this report, I'll cover the architecture of DeepSeek-Coder (base for many variants), which shares core traits with later iterations. If you meant a specific December 2024 release (e.g., an update), please clarify.\n**Core Architecture**:  \n- **Type**: Decoder-only Transformer architecture, similar to GPT-series models. This enables autoregressive generation for tasks like coding, chat, and general NLP.\n- **Key Components**:\n  - **Layers**: Stacked Transformer decoder blocks (e.g., 30 layers in the 6.7B parameter version). Each block includes self-attention and feed-forward networks (FFNs).\n  - **Attention Mechanism**: Multi-Head Attention (MHA) w

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.

---