[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/langchain-course/blob/main/chapters/07-lcel.ipynb)

#### LangChain Essentials Course

# LangChain's Expression Language

LangChain is one of the most popular open source libraries for AI Engineers. Its goal is to abstract away the complexity in building AI software, provide easy-to-use building blocks, and make it easier to switch between AI service providers.

In this example, we will introduce LangChain's Expression Language (LCEL), build a full chain with it, and look at how it works. We'll provide examples for both OpenAI's `gpt-4o-mini` *and* Meta's `llama3.2` via Ollama!

In [35]:
!pip install -qU \
  langchain-core==0.3.33 \
  langchain-openai==0.3.3 \
  langchain-community==0.3.16 \
  langsmith==0.3.4 \
  docarray==0.40.0

---

> ⚠️ We will be using OpenAI for this example, allowing us to run everything via API. If you would like to use Ollama instead, check out the [Ollama LangChain Course](https://github.com/aurelio-labs/langchain-course/tree/main/notebooks/ollama).

---

> ⚠️ If using LangSmith, add your API key below:

In [1]:
import os
from getpass import getpass

os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "aurelioai-langchain-course-lcel-openai"

---

## Traditional Chains vs LCEL

In this section we'll walk through a basic example using the traditional method for building chains before jumping into LCEL. We will build a pipeline where the user inputs a topic and the LLM returns a short _research report_ on that topic.

### Traditional LLMChain

The `LLMChain` is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and _optionally_ adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method. For this we need:

* `prompt` — a `PromptTemplate` that will be used to generate the prompt for the LLM.
* `llm` — the LLM we will be using to generate the output.
* `output_parser` — an optional output parser that will be used to parse the structured output of the LLM.

In [2]:
from langchain_core.prompts import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)
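
As a rough mental model (not LangChain's actual implementation), filling a single-variable template behaves much like Python's built-in `str.format`. The sketch below uses only the standard library; `fill_template` is a hypothetical helper introduced purely for illustration.

```python
# Plain-Python mental model of a prompt template.
# `fill_template` is a hypothetical helper, not part of LangChain.
prompt_template = "Give me a small report on {topic}"

def fill_template(template: str, **variables: str) -> str:
    """Substitute named variables into the template, like str.format."""
    return template.format(**variables)

filled = fill_template(prompt_template, topic="retrieval augmented generation")
print(filled)
```

The resulting string is what ultimately gets sent to the LLM as the prompt text.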

For the LLM, we'll start by initializing our connection to the OpenAI API. We do need an OpenAI API key, which you can get from the [OpenAI platform](https://platform.openai.com/api-keys).

We will use the `gpt-4o-mini` model with a `temperature` of `0.0`:

In [3]:
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0.0,
)

In [4]:
llm_out = llm.invoke("Hello there")
llm_out

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 9, 'total_tokens': 19, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0392822090', 'id': 'chatcmpl-BTPehOSOsITW701fivEWl7Q9LWicI', 'finish_reason': 'stop', 'logprobs': None}, id='run-eac29f30-0f65-444e-9995-b855529aba4f-0', usage_metadata={'input_tokens': 9, 'output_tokens': 10, 'total_tokens': 19, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

Then we define our output parser, which will be used to parse the output of the LLM. In this case, we will use the `StrOutputParser`, which parses the `AIMessage` output from our LLM into a single string.

In [5]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [6]:
out = output_parser.invoke(llm_out)
out

'Hello! How can I assist you today?'

Through the `LLMChain` class we can place each of our components into a linear `chain`.

In [7]:
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

Note that `LLMChain` _was_ deprecated in LangChain `0.1.17`. The expected way of constructing these chains today is through LCEL, which we'll cover in a moment.

We can `invoke` our `chain`, providing a `topic` that we'd like to be researched.

In [8]:
result = chain.invoke("retrieval augmented generation")
result

{'topic': 'retrieval augmented generation',
 'text': '### Report on Retrieval-Augmented Generation (RAG)\n\n#### Introduction\nRetrieval-Augmented Generation (RAG) is an innovative approach that combines the strengths of information retrieval and natural language generation. This method enhances the capabilities of language models by allowing them to access external knowledge bases or documents during the generation process, thereby improving the accuracy and relevance of the generated content.\n\n#### Key Concepts\n1. **Retrieval Mechanism**: RAG employs a retrieval system to fetch relevant documents or snippets from a large corpus based on the input query. This retrieval step is crucial as it provides the model with contextually relevant information that may not be present in its training data.\n\n2. **Generation Mechanism**: After retrieving relevant documents, a generative model (often based on transformer architectures like BERT or GPT) synthesizes a coherent response or output. T

We can view a formatted version of this output using the `Markdown` display:

In [9]:
from IPython.display import display, Markdown

display(Markdown(result["text"]))

### Report on Retrieval-Augmented Generation (RAG)

#### Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach that combines the strengths of information retrieval and natural language generation. This method enhances the capabilities of language models by allowing them to access external knowledge bases or documents during the generation process, thereby improving the accuracy and relevance of the generated content.

#### Key Concepts
1. **Retrieval Mechanism**: RAG employs a retrieval system to fetch relevant documents or snippets from a large corpus based on the input query. This retrieval step is crucial as it provides the model with contextually relevant information that may not be present in its training data.

2. **Generation Mechanism**: After retrieving relevant documents, a generative model (often based on transformer architectures like BERT or GPT) synthesizes a coherent response or output. The model uses both the input query and the retrieved information to produce a more informed and contextually appropriate response.

3. **Hybrid Architecture**: RAG typically consists of two main components:
   - **Retriever**: This component identifies and retrieves relevant documents from a predefined knowledge base or database.
   - **Generator**: This component generates text based on the retrieved documents and the original query.

#### Advantages
- **Enhanced Knowledge Access**: By integrating retrieval, RAG can access up-to-date information and specialized knowledge that may not be encoded in the model's parameters.
- **Improved Accuracy**: The model can produce more accurate and contextually relevant responses, especially for niche topics or recent events.
- **Reduced Hallucination**: RAG can mitigate the issue of "hallucination," where generative models produce plausible-sounding but incorrect or nonsensical information.

#### Applications
- **Question Answering**: RAG is particularly effective in systems designed for answering questions, where it can pull in relevant data to provide precise answers.
- **Content Creation**: It can assist in generating articles, reports, or summaries by retrieving relevant information from various sources.
- **Conversational Agents**: RAG can enhance chatbots and virtual assistants by allowing them to provide more informed responses based on real-time data.

#### Challenges
- **Complexity**: The integration of retrieval and generation components can complicate the architecture and increase the computational requirements.
- **Quality of Retrieved Information**: The effectiveness of RAG heavily depends on the quality and relevance of the retrieved documents. Poor retrieval can lead to suboptimal generation.
- **Latency**: The retrieval process can introduce latency, which may affect the responsiveness of applications, particularly in real-time systems.

#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of natural language processing, combining the strengths of retrieval systems with generative models. By leveraging external knowledge, RAG enhances the accuracy and relevance of generated content, making it a powerful tool for various applications, from question answering to content creation. As research and development in this area continue, RAG is likely to play an increasingly important role in the evolution of intelligent systems.

That is a simple `LLMChain` using the traditional LangChain method. Now let's move on to LCEL.

## LangChain Expression Language (LCEL)

**L**ang**C**hain **E**xpression **L**anguage (LCEL) is the recommended approach to building chains in LangChain. Having superseded traditional classes such as `LLMChain`, LCEL gives us a more flexible system for building chains. LCEL uses the pipe operator `|` to _chain_ components together. Let's see how we'd construct the equivalent of an `LLMChain` using LCEL.

In [10]:
lcel_chain = prompt | llm | output_parser

We can `invoke` this chain in the same way as we did before:

In [11]:
result = lcel_chain.invoke("retrieval augmented generation")
result

'### Report on Retrieval-Augmented Generation (RAG)\n\n#### Introduction\nRetrieval-Augmented Generation (RAG) is an innovative approach that combines the strengths of information retrieval and natural language generation. This method enhances the capabilities of language models by allowing them to access external knowledge sources, thereby improving the accuracy and relevance of generated responses.\n\n#### Concept Overview\nRAG operates on the principle of integrating a retrieval mechanism with a generative model. The process typically involves two main components:\n\n1. **Retrieval Component**: This part of the system retrieves relevant documents or pieces of information from a large corpus based on a given query. It often employs techniques such as dense vector representations and similarity search to identify the most pertinent data.\n\n2. **Generation Component**: Once the relevant information is retrieved, the generative model (often based on architectures like Transformers) use

The output format is slightly different: `LLMChain` returned a dictionary with `topic` and `text` keys, whereas the LCEL chain returns the parsed string directly. The underlying functionality is the same. As before, we can view a formatted version of this output using the `Markdown` display:

In [12]:
display(Markdown(result))

### Report on Retrieval-Augmented Generation (RAG)

#### Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach that combines the strengths of information retrieval and natural language generation. This method enhances the capabilities of language models by allowing them to access external knowledge sources, thereby improving the accuracy and relevance of generated responses.

#### Concept Overview
RAG operates on the principle of integrating a retrieval mechanism with a generative model. The process typically involves two main components:

1. **Retrieval Component**: This part of the system retrieves relevant documents or pieces of information from a large corpus based on a given query. It often employs techniques such as dense vector representations and similarity search to identify the most pertinent data.

2. **Generation Component**: Once the relevant information is retrieved, the generative model (often based on architectures like Transformers) uses this data to produce coherent and contextually appropriate responses. The model can leverage the retrieved content to enhance its output, making it more informative and accurate.

#### Advantages
- **Enhanced Knowledge Access**: RAG allows models to tap into vast external knowledge bases, which can significantly improve the quality of responses, especially for niche or less common queries.
- **Dynamic Information Updating**: Unlike static models that rely solely on pre-existing training data, RAG can incorporate up-to-date information from external sources, making it more relevant in rapidly changing domains.
- **Improved Contextual Understanding**: By retrieving contextually relevant documents, RAG can generate responses that are more aligned with user intent and the specifics of the query.

#### Applications
RAG has a wide range of applications, including:
- **Customer Support**: Providing accurate and context-aware responses to customer inquiries by retrieving relevant documentation or FAQs.
- **Content Creation**: Assisting writers by generating content that is informed by existing literature or data.
- **Question Answering Systems**: Enhancing the performance of QA systems by retrieving and synthesizing information from multiple sources.

#### Challenges
Despite its advantages, RAG also faces several challenges:
- **Retrieval Quality**: The effectiveness of the system heavily depends on the quality of the retrieval component. Poor retrieval can lead to irrelevant or misleading information being used in generation.
- **Complexity**: Integrating retrieval and generation processes can increase the complexity of the system, requiring careful tuning and optimization.
- **Latency**: The retrieval process can introduce latency, which may be a concern in real-time applications.

#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of natural language processing, merging the capabilities of information retrieval with generative models. By leveraging external knowledge, RAG systems can produce more accurate, relevant, and contextually rich outputs. As research and development in this area continue, RAG is poised to play a crucial role in various applications, enhancing user experiences across multiple domains.

### How Does the Pipe Operator Work?

Before moving on to other LCEL features, let's take a moment to understand what the pipe operator `|` is doing and _how_ it works.

Functionally, the pipe tells us that whatever the _left_ side outputs will be fed as input into the _right_ side. In `prompt | llm | output_parser`, the output of `prompt` feeds into `llm`, whose output in turn feeds into `output_parser`.

Let's make a basic class named `Runnable` that turns a provided function into a _runnable_ object that we can then use with the pipe `|` operator.

In [13]:
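class Runnable:
    def __init__(self, func):
        # The function this runnable wraps
        self.func = func

    def __or__(self, other):
        # Called for `self | other`: run self first, then feed its output to other
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)

    def invoke(self, *args, **kwargs):
        # Execute the wrapped function
        return self.func(*args, **kwargs)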

With the `Runnable` class, we can wrap a function and then chain several of these _runnable_ functions together using the `__or__` method.

First, let's create a few functions that we'll chain together:

In [14]:
def add_five(x):
    return x+5

def sub_five(x):
    return x-5

def mul_five(x):
    return x*5

Now we wrap our functions with the `Runnable`:

In [15]:
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the `__or__` method from the `Runnable` class:

In [16]:
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)

15

So we can see that we're able to chain together our functions using `__or__`. The pipe `|` operator is simply a shortcut for the `__or__` method, so we can create the exact same chain like so:

In [17]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)

15
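
To make the left-to-right data flow visible, the sketch below repeats the toy `Runnable` class and wraps each function with a hypothetical `traced` helper (an assumption for illustration, not part of LangChain) that records every intermediate value.

```python
class Runnable:
    """Toy runnable, same idea as the class defined above."""
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self runs first, then its output is passed to other
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)

    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

trace = []  # (step name, output) pairs in execution order

def traced(name, func):
    """Hypothetical helper: wrap func so each call is logged to trace."""
    def wrapper(x):
        out = func(x)
        trace.append((name, out))
        return out
    return Runnable(wrapper)

chain = (
    traced("add_five", lambda x: x + 5)
    | traced("sub_five", lambda x: x - 5)
    | traced("mul_five", lambda x: x * 5)
)

result = chain.invoke(3)
print(result)  # 15
print(trace)   # [('add_five', 8), ('sub_five', 3), ('mul_five', 15)]
```

The trace shows each step receiving the previous step's output: 3 becomes 8, then 3 again, then 15.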

## LCEL `RunnableLambda`

The `RunnableLambda` class is LangChain's built-in method for constructing a _runnable_ object from a function. That is, it does the same thing as the custom `Runnable` class we created earlier. Let's try it out with the same functions as before.

In [18]:
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe `|` operator:

In [19]:
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call the chain using the `invoke` method:

In [20]:
chain.invoke(3)

15

Now let's try something a little more challenging: this time we will generate a report and then edit that report using `RunnableLambda` functions.

In [21]:
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)

In [22]:
chain = prompt | llm | output_parser

In [23]:
result = chain.invoke("AI")
display(Markdown(result))

### Report on Artificial Intelligence (AI)

#### Introduction
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines programmed to think and learn like humans. It encompasses a variety of technologies and methodologies, including machine learning, natural language processing, robotics, and computer vision. AI has rapidly evolved over the past few decades, becoming an integral part of various industries and daily life.

#### Current State of AI
As of 2023, AI technologies are being utilized across multiple sectors, including healthcare, finance, transportation, and entertainment. Key advancements include:

1. **Machine Learning (ML)**: Algorithms that enable computers to learn from and make predictions based on data. Deep learning, a subset of ML, has led to breakthroughs in image and speech recognition.

2. **Natural Language Processing (NLP)**: Technologies that allow machines to understand and respond to human language. Applications include chatbots, virtual assistants, and language translation services.

3. **Computer Vision**: The ability of machines to interpret and make decisions based on visual data. This technology is widely used in facial recognition, autonomous vehicles, and medical imaging.

4. **Robotics**: AI-driven robots are increasingly used in manufacturing, logistics, and even healthcare, performing tasks ranging from assembly line work to surgical assistance.

#### Applications of AI
- **Healthcare**: AI is used for diagnostics, personalized medicine, and predictive analytics, improving patient outcomes and operational efficiency.
- **Finance**: Algorithms analyze market trends, detect fraud, and automate trading, enhancing decision-making and risk management.
- **Transportation**: AI powers autonomous vehicles, optimizing routes and improving safety through real-time data analysis.
- **Entertainment**: Streaming services use AI to recommend content based on user preferences, while video games employ AI for more realistic and adaptive gameplay.

#### Challenges and Ethical Considerations
Despite its potential, AI poses several challenges:
- **Bias and Fairness**: AI systems can perpetuate existing biases present in training data, leading to unfair outcomes.
- **Privacy Concerns**: The use of AI in data collection raises significant privacy issues, necessitating robust regulations.
- **Job Displacement**: Automation may lead to job losses in certain sectors, prompting discussions about workforce retraining and economic impact.
- **Security Risks**: AI can be exploited for malicious purposes, including cyberattacks and misinformation campaigns.

#### Future Outlook
The future of AI is promising, with ongoing research aimed at creating more advanced, ethical, and transparent systems. Innovations in explainable AI (XAI) seek to make AI decision-making processes more understandable to users. Additionally, interdisciplinary collaboration will be crucial in addressing the ethical implications and ensuring that AI benefits society as a whole.

#### Conclusion
Artificial Intelligence is transforming industries and daily life, offering significant benefits while also presenting challenges that must be addressed. As technology continues to evolve, a balanced approach that prioritizes ethical considerations and societal impact will be essential for harnessing the full potential of AI.

--- 

This report provides a concise overview of AI, its applications, challenges, and future prospects. For further exploration, specific case studies and detailed analyses can be conducted based on particular sectors or technologies.

Here we define two functions: `extract_fact`, which drops the first block of the text (the report title) and keeps the main content, and `replace_word`, which replaces every occurrence of "AI" with "skynet"!

In [24]:
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)
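
A quick check on a tiny hand-written string (made up for illustration, not real LLM output) shows what each function does. Note that `str.replace` matches substrings, so "AI" inside a longer word is replaced too; the `replace_whole_word` variant below is a hypothetical alternative that uses a regex word boundary.

```python
import re

def extract_fact(x):
    # Drop the first block (everything before the first blank line)
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    return x

old_word = "AI"
new_word = "skynet"

def replace_word(x):
    return x.replace(old_word, new_word)

def replace_whole_word(x):
    # Hypothetical variant: only replace "AI" as a standalone word
    return re.sub(rf"\b{re.escape(old_word)}\b", new_word, x)

sample = "### Report\n\nAI is useful.\n\nOpenAI builds AI models."

body = extract_fact(sample)
print(body)
print(replace_word(body))
print(replace_whole_word(body))
```

Which variant is appropriate depends on whether partial matches such as "OpenAI" should be rewritten.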

Let's wrap these functions and see what the output is!

In [25]:
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)

In [26]:
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable

In [62]:
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))

#### Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach that combines the strengths of information retrieval and natural language generation. This method enhances the capabilities of language models by allowing them to access external knowledge sources, thereby improving the accuracy and relevance of generated responses.
#### Concept Overview
RAG operates on the principle of integrating a retrieval mechanism with a generative model. The process typically involves two main components:
1. **Retrieval Component**: This part of the system searches a large corpus of documents or knowledge bases to find relevant information based on a given query. It employs techniques such as vector embeddings and similarity search to identify the most pertinent documents.
2. **Generation Component**: Once relevant documents are retrieved, the generative model (often based on architectures like Transformers) synthesizes a coherent response by incorporating the retrieved information. This allows the model to produce more informed and contextually appropriate outputs.
#### Advantages
- **Enhanced Knowledge Access**: RAG allows models to leverage vast external datasets, which can significantly improve the quality of responses, especially in domains requiring up-to-date or specialized knowledge.
- **Reduced Hallucination**: Traditional generative models sometimes produce inaccurate or fabricated information (a phenomenon known as "hallucination"). By grounding responses in retrieved documents, RAG can mitigate this issue.
- **Dynamic Adaptability**: The retrieval component can be updated independently of the generative model, allowing the system to adapt to new information without retraining the entire model.
#### Applications
RAG has a wide range of applications, including:
- **Question Answering**: Providing accurate answers to user queries by retrieving relevant documents and generating responses based on that information.
- **Chatbots and Virtual Assistants**: Enhancing conversational agents with the ability to pull in real-time data and provide contextually relevant answers.
- **Content Creation**: Assisting in generating articles, reports, or summaries by retrieving and synthesizing information from multiple sources.
#### Challenges
Despite its advantages, RAG faces several challenges:
- **Complexity**: The integration of retrieval and generation components can complicate the system architecture and increase computational requirements.
- **Quality of Retrieved Information**: The effectiveness of RAG heavily depends on the quality and relevance of the retrieved documents. Poor retrieval can lead to suboptimal generation.
- **Latency**: The retrieval process can introduce delays, which may affect user experience in real-time applications.
#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of natural language processing, combining the strengths of retrieval and generation to produce more accurate and contextually relevant outputs. As research and development in this area continue, RAG is poised to play a crucial role in various applications, enhancing the capabilities of skynet systems in understanding and generating human-like text. Future work will likely focus on improving retrieval efficiency, enhancing the quality of generated content, and addressing the challenges associated with system complexity and latency.

Those are our `RunnableLambda` functions. It's worth noting that each of these functions is expected to take a SINGLE argument. If you have a function that needs multiple values, you can pass in a dictionary and unpack its keys inside the function.
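
As a minimal sketch of that dictionary pattern, the function below takes one dict argument and unpacks it internally. `add_and_scale` and its keys are hypothetical names chosen for illustration; a single-argument function like this is the shape `RunnableLambda` expects.

```python
def add_and_scale(inputs: dict):
    """Takes ONE argument (a dict) and unpacks the values it needs."""
    x = inputs["x"]
    y = inputs["y"]
    scale = inputs.get("scale", 1)  # optional key with a default
    return (x + y) * scale

result = add_and_scale({"x": 2, "y": 3, "scale": 10})
print(result)  # 50
```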

## LCEL `RunnableParallel` and `RunnablePassthrough`

LCEL provides us with various `Runnable` classes that allow us to control the flow of data and execution order through our chains. Two of these are `RunnableParallel` and `RunnablePassthrough`.

* `RunnableParallel` — allows us to run multiple `Runnable` instances in parallel, acting almost as a Y-fork in the chain.

* `RunnablePassthrough` — allows us to pass through a variable to the next `Runnable` without modification.
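
A plain-Python sketch can make the two ideas concrete. This is a mental model only, not how LangChain implements them: `run_parallel` and `passthrough` are hypothetical stand-ins, and the two lookup functions are made-up substitutes for real retrievers.

```python
from concurrent.futures import ThreadPoolExecutor

def passthrough(x):
    """Stand-in for RunnablePassthrough: return the input unchanged."""
    return x

def run_parallel(steps: dict, value):
    """Stand-in for RunnableParallel: give every step the same input,
    run them concurrently, and collect the outputs under the same keys."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(step, value) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}

# Made-up "retrievers" for illustration only
def lookup_a(question):
    return f"source A notes for: {question}"

def lookup_b(question):
    return f"source B notes for: {question}"

out = run_parallel(
    {"context_a": lookup_a, "context_b": lookup_b, "question": passthrough},
    "what is LCEL?",
)
print(out)
```

Every branch receives the same input, and the result is a single dictionary that the next step in a chain can consume.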

To see these runnables in action, we will create two data sources. Each source provides part of the information, and to answer the question we will need both to be fed to the LLM.

In [27]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

embedding = OpenAIEmbeddings()

vecstore_a = DocArrayInMemorySearch.from_texts(
    [
        "half the info is here",
        "DeepSeek-V3 was released in December 2024"
    ],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    [
        "the other half of the info is here",
        "the DeepSeek-V3 LLM is a mixture of experts model with 671B parameters"
    ],
    embedding=embedding
)

The prompt takes three inputs: two context variables in the system message, and the question itself in the human message.

In [28]:
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}
"""

In [29]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])

Here we wrap our vector stores as retrievers so that both can be combined, along with the question, into a single `retrieval` step whose output feeds the prompt.

In [30]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a,
        "context_b": retriever_b,
        "question": RunnablePassthrough(),
    }
)

The chain we'll be constructing will look something like this:

![](https://github.com/aurelio-labs/langchain-course/blob/main/assets/lcel-flow.png?raw=1)

In [31]:
chain = retrieval | prompt | llm | output_parser

We `invoke` it as usual.

In [32]:
result = chain.invoke(
    "what architecture does the model DeepSeek released in december use?"
)
result

'The DeepSeek-V3 model, released in December 2024, is a mixture of experts model with 671 billion parameters.'

With that we've seen how we can use `RunnableParallel` and `RunnablePassthrough` to control the flow of data and execution order through our chains.

---