Follow on from 05: 
Summary: Multi-Step Workflow with LCEL
Overview
This exercise demonstrates how to build a more complex, multi-step workflow using the LangChain Expression Language (LCEL). The project involves generating business ideas, analyzing them for strengths and weaknesses, and producing a structured business report, all through interconnected chains.

Key Steps Covered
1. Initial Setup
All necessary imports and environment configuration are in place.
LCEL building blocks such as RunnableParallel, the pipe operator (|), and output parsers are ready for use.
2. Idea Generation Chain
A prompt template is created to generate a business idea based on a given industry.
The chain uses:
The idea prompt
A language model (llm)
An output parser to structure the result
idea_chain = idea_prompt | llm | parse
Invoking the chain with an example input like "agro" produces an innovative business idea and a brief description.
3. Business Idea Analysis Chain
A second prompt template is created to analyze the generated business idea.

The analysis prompt asks for:

Three key strengths
Three potential weaknesses
This chain follows the same structure: prompt → model → parser.

analysis_chain = analysis_prompt | llm | parse
The output lists the strengths and weaknesses of the business idea clearly.
4. Business Report Generation Chain
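A prompt template and a Pydantic model are used to turn the analysis into a structured business report.
The report chain calls llm.with_structured_output(Report), which relies on the model provider's structured-output support (tool/function calling or JSON schema, depending on the integration and version) so the response matches the schema: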
A Report class is defined with strengths and weaknesses fields.
from typing import List
from pydantic import BaseModel

class Report(BaseModel):
    strengths: List[str]
    weaknesses: List[str]
The report chain builds a structured final report from the analysis results.
report_chain = report_prompt | llm.with_structured_output(Report)
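What "structured" buys is validation: the result is a Report object, and a payload that doesn't fit the schema is rejected. A small Pydantic-only sketch (no LLM involved; the sample strengths and weaknesses are made up):

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Report(BaseModel):
    strengths: List[str]
    weaknesses: List[str]


# with_structured_output(Report) returns a Report instance; here one is built by hand.
report = Report.model_validate({
    "strengths": ["Recurring revenue", "Low hardware cost", "Clear customer pain"],
    "weaknesses": ["Rural connectivity", "Seasonal cash flow", "Farmer adoption"],
})
print(report.strengths[0])  # Recurring revenue

# A payload missing a required field raises instead of slipping through.
try:
    Report.model_validate({"strengths": ["Recurring revenue"]})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```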
5. Building the End-to-End Workflow
An overall end-to-end chain combines:

idea_chain
analysis_chain
report_chain
This composite chain takes an industry input and produces a complete, structured business report in one invocation.

end_to_end_chain = idea_chain | analysis_chain | report_chain
Testing the full chain with "agro" yields:
An innovative idea
An analysis
A final structured report
6. Encouraged Exploration
Learners are encouraged to:
Add memory modules to track multiple interactions.
Explore additional runnables to expand the workflow.
Integrate even more complex processing steps or refine output formatting.

## Introduction to Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) enhances language models by providing external context from a retrieval system before generating responses. This approach improves accuracy and reduces hallucinations by grounding responses in relevant information.

How RAG Works
RAG pipelines consist of three main components:

Retrieval – Searches a database or document corpus to find relevant information. Uses vector search or keyword matching.

Augmentation – Combines retrieved documents with the user’s query in a structured prompt.

Generation – The LLM generates a response using both the query and retrieved context.
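The three stages can be sketched in plain Python. This is a toy under stated assumptions: retrieval is simple keyword overlap (not vector search), the corpus is a hypothetical three-sentence list, and the "LLM" is a stand-in callable that returns the first context line so the example runs offline.

```python
import re

# Hypothetical toy corpus standing in for a document store.
corpus = [
    "Anthony Hopkins was born on 31 December 1937.",
    "The Silence of the Lambs was released in 1991.",
    "RADA is a drama school in London.",
]


def tokenize(text):
    """Lowercase word set, used for simple keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, k=1):
    """Retrieval: rank documents by how many query words they share."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def augment(query, context_docs):
    """Augmentation: combine the retrieved text and the question into one prompt."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt, llm):
    """Generation: hand the augmented prompt to a language model (any callable here)."""
    return llm(prompt)


question = "When was The Silence of the Lambs released?"
prompt = augment(question, retrieve(question, corpus, k=1))
# Stand-in "LLM" that returns the first context line of the prompt.
answer = generate(prompt, llm=lambda p: p.splitlines()[1])
print(answer)  # The Silence of the Lambs was released in 1991.
```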

Building a RAG Pipeline
A RAG system requires both an offline indexing phase and an online retrieval phase.

Indexing (Offline Phase)
Before retrieval can happen, documents must be processed and stored efficiently:

Document Loaders – Extract raw data from files, APIs, or databases.
Text Splitters – Divide large documents into smaller, searchable chunks.
VectorStore & Embeddings – Convert text into vector representations and store them for fast retrieval.
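The text-splitter step can be illustrated with a plain sliding-window splitter. This is a simplified sketch, not LangChain's RecursiveCharacterTextSplitter (which first tries to break on paragraphs, lines, and spaces); it only reuses the same parameter names, chunk_size and chunk_overlap, to show how overlapping windows are laid out.

```python
def split_text(text, chunk_size=300, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap          # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):    # this window already reaches the end
            break
    return chunks


text = "abcdefghij" * 3                        # 30 characters
chunks = split_text(text, chunk_size=12, chunk_overlap=4)
print([len(c) for c in chunks])                # [12, 12, 12, 6]
print(chunks[0][-4:] == chunks[1][:4])         # True: neighbours share 4 characters
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, at the cost of storing some characters twice.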
Retrieval & Augmented Generation (Online Phase)
When a user submits a query, the system:

Searches the VectorStore for relevant document chunks.

Incorporates retrieved text into a structured prompt.

Generates an informed response using the LLM.
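The "search the VectorStore" step is, at its core, a similarity ranking. A minimal sketch using cosine similarity over hypothetical 3-dimensional vectors (real embedding models produce hundreds to thousands of dimensions, and production stores may use approximate indexes instead of this brute-force loop):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=3):
    """store: list of (chunk_text, vector) pairs. Returns the k most similar chunks."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up toy "embeddings" for illustration only.
store = [
    ("chunk about films", [0.9, 0.1, 0.0]),
    ("chunk about theatre", [0.1, 0.9, 0.0]),
    ("chunk about awards", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))  # ['chunk about films', 'chunk about awards']
```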

Applications of RAG
Customer Support – AI chatbots retrieve relevant FAQ responses.
Content Creation – AI-assisted writing tools generate fact-based content.
Research & Knowledge Management – Quickly synthesize insights from large datasets.
Design Considerations
Ensure retrieval quality – Poorly selected documents lead to irrelevant outputs.
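Optimize text chunking – Chunks that are too small lose the surrounding context needed for an answer; chunks that are too large mix several topics into one embedding, which blurs retrieval, and use up more of the context window.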
Manage LLM context limits – Retrieved text must fit within the model’s processing window.
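One simple way to respect the context limit is to add retrieved chunks in rank order until a budget is reached. This sketch assumes characters as a proxy for tokens and ignores separator length; a real system would count tokens with the model's own tokenizer.

```python
def pack_context(chunks, max_chars):
    """Keep chunks in rank order until the next one would exceed max_chars.

    Assumption: character count stands in for token count.
    """
    packed, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        packed.append(chunk)
        used += len(chunk)
    return packed


ranked = ["a" * 120, "b" * 100, "c" * 90]   # best match first
print([len(c) for c in pack_context(ranked, max_chars=250)])  # [120, 100]
```

Stopping at the first chunk that doesn't fit keeps the highest-ranked material; skipping it and trying smaller later chunks is an alternative that fills the budget more fully.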
Final Thoughts
RAG can substantially improve the accuracy and trustworthiness of AI-generated content. By combining retrieval systems with generative models, applications can deliver more precise, context-aware responses.

Demo: RAG pipelines

In [1]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders.wikipedia import WikipediaLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY (and any other settings) from a local .env file

1. Setting the Stage: Defining the Data Source
Wikipedia is used as the external knowledge source.
A search query retrieves a document about Anthony Hopkins.
Only one document is retrieved initially, containing a significant amount of text.
docs = WikipediaLoader(query="Anthony Hopkins", load_max_docs=1).load()

Loader: fetching the source text

In [2]:
loader = WikipediaLoader(
    "Anthony_Hopkins",
    load_max_docs=1,
    doc_content_chars_max=40000
)
docs = loader.load()

In [3]:
len(docs)

1

In [4]:
len(docs[0].page_content)

38910

In [5]:
print(docs[0].page_content)

Sir Philip Anthony Hopkins (born 31 December 1937) is a Welsh actor. Considered one of Britain's most recognisable and prolific actors, he is known for his performances on the screen and stage. Hopkins has received numerous accolades, including two Academy Awards, four BAFTA Awards, two Primetime Emmy Awards, and a Laurence Olivier Award. He has also received the Cecil B. DeMille Award in 2005 and the BAFTA Fellowship for lifetime achievement in 2008. He was knighted by Queen Elizabeth II for his services to drama in 1993.
After graduating from the Royal Welsh College of Music & Drama in 1957, Hopkins trained at RADA (the Royal Academy of Dramatic Art) in London. He was then spotted by Laurence Olivier, who invited him to join the Royal National Theatre in 1965. Productions at the National included King Lear (his favourite Shakespeare play), Coriolanus, Macbeth, and Antony and Cleopatra. In 1985, he received acclaim and a Laurence Olivier Award for his performance in the David Hare pla

2. Splitting Documents into Chunks
Large documents are split into smaller segments using a text splitter.
Smaller chunks give each embedding a narrower focus, so a query can match the specific passage that answers it instead of a whole article.
all_splits = text_splitter.split_documents(docs)
Example: One document is split into 182 sub-documents.

In [6]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300, 
    chunk_overlap=50
)
all_splits = text_splitter.split_documents(docs)


In [7]:
print(f"Split Wikipedia page into {len(all_splits)} sub-documents.")

Split Wikipedia page into 182 sub-documents.


In [8]:
all_splits[0]

Document(metadata={'title': 'Anthony Hopkins', 'summary': "Sir Philip Anthony Hopkins (born 31 December 1937) is a Welsh actor. Considered one of Britain's most recognisable and prolific actors, he is known for his performances on the screen and stage. Hopkins has received numerous accolades, including two Academy Awards, four BAFTA Awards, two Primetime Emmy Awards, and a Laurence Olivier Award. He has also received the Cecil B. DeMille Award in 2005 and the BAFTA Fellowship for lifetime achievement in 2008. He was knighted by Queen Elizabeth II for his services to drama in 1993.\nAfter graduating from the Royal Welsh College of Music & Drama in 1957, Hopkins trained at RADA (the Royal Academy of Dramatic Art) in London. He was then spotted by Laurence Olivier, who invited him to join the Royal National Theatre in 1965. Productions at the National included King Lear (his favourite Shakespeare play), Coriolanus, Macbeth, and Antony and Cleopatra. In 1985, he received acclaim and a Laur

In [9]:
# embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

3. Embedding and Storing Chunks
Each document chunk is converted into a vector embedding with OpenAI's text-embedding-3-large model.
All embeddings are stored in an in-memory vector store.
vector_store.add_documents(documents=all_splits)
This allows for fast similarity searches when a user query arrives.

In [10]:
vector_store = InMemoryVectorStore(embeddings)

In [11]:
document_ids = vector_store.add_documents(documents=all_splits)
document_ids[:10]

['ecc57f37-abbd-4022-afbd-a6767dabca44',
 '0e6ad605-2bd5-438e-8a34-0fab99116fa7',
 '521c1e1e-6f52-4e6c-b670-748ab8e09900',
 'c7c36634-816c-42f9-8e72-5923c1bd63f5',
 '5759db18-8ee1-41da-97ac-2c838d05ab13',
 'd065e6e6-9ea0-42e9-83a6-b8a612acc948',
 'b647cbeb-a97f-46ad-8363-03cf071a8160',
 'fd11f740-7cba-4f79-81d4-fd9e947d9d6e',
 '4e62d335-3daa-4fc6-a700-972e6118f5d1',
 '728531b0-02ff-424d-b87f-f62aca66c3ce']

4. Retriever Setup
A retriever is created from the vector store.
The retriever fetches the top k=3 most relevant documents based on similarity to the user query.
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

In [12]:
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

5. Prompt Template Creation
A chat prompt template is defined to guide the LLM:
It receives both the context (retrieved documents) and the question.
The model is instructed to answer from the provided context, to say it doesn't know when the context is insufficient, and to keep answers to three sentences.
prompt = ChatPromptTemplate.from_messages([
  ("system", "You are an assistant for question answering."),
  ("human", "Use the following context: {context}\nQuestion: {question}\nAnswer:")
])
The template is a runnable object, so it can be invoked directly.

In [13]:
template = ChatPromptTemplate([
    ("system", "You are an assistant for question-answering tasks."),
    ("human", "Use the following pieces of retrieved context to answer the question. "
              "If you don't know the answer, just say that you don't know. " 
              "Use three sentences maximum and keep the answer concise. "
              "\n# Question: \n-> {question} "
              "\n# Context: \n-> {context} "
              "\n# Answer: "),
])

In [14]:
template.invoke(
    {"context": "##CONTEXT##", "question": "##QUESTION##"}
).to_messages()

[SystemMessage(content='You are an assistant for question-answering tasks.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. \n# Question: \n-> ##QUESTION## \n# Context: \n-> ##CONTEXT## \n# Answer: ", additional_kwargs={}, response_metadata={})]

In [15]:
def format_docs(docs):
    formatted = "\n\n-> ".join(doc.page_content for doc in docs)
    return formatted
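To see what format_docs produces, here it is applied to three stand-in documents (a hypothetical dataclass with only a page_content field, instead of LangChain's Document). The separator adds "-> " in front of every chunk except the first; the template supplies that first "-> " itself, which is why every chunk in the printed prompt starts with an arrow.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def format_docs(docs):
    return "\n\n-> ".join(doc.page_content for doc in docs)


context = format_docs([Doc("first chunk"), Doc("second chunk"), Doc("third chunk")])
print("-> " + context)  # the "-> " prefix here mimics the template text before {context}
```

Printed output:

```
-> first chunk

-> second chunk

-> third chunk
```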

6. Generation Pipeline
The complete flow:
Format the retrieved documents into a single string for the context.
Fill the prompt template with the question and context.
Pass the resulting messages to the LLM.
Retrieve the final answer.
question = "When was The Silence of the Lambs released?"
context = format_docs(retriever.invoke(question))
messages = prompt.invoke({"context": context, "question": question})
answer = llm.invoke(messages)
Example output: "The Silence of the Lambs was released in 1991."

In [16]:
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
)

In [17]:
question = "When The Silence of the Lambs was released?"
context = format_docs(retriever.invoke(question))
messages = template.invoke({'question' : question, 'context' : context}).to_messages()
answer = llm.invoke(messages)

In [18]:
print(messages[1].content)

Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. 
# Question: 
-> When The Silence of the Lambs was released? 
# Context: 
-> === 1990–1999: The Silence of the Lambs and film stardom ===

-> Hopkins won acclaim among critics and audiences as the cannibalistic serial killer Hannibal Lecter in The Silence of the Lambs, for which he won the Academy Award for Best Actor in 1991, with Jodie Foster as Clarice Starling, who also won for Best Actress. The film won Best Picture, Best Director

-> Actress. The film won Best Picture, Best Director and Best Adapted Screenplay, and Hopkins also picked up his first BAFTA for Best Actor. Hopkins reprised his role as Lecter twice; in Ridley Scott's Hannibal (2001), and Red Dragon (2002). His original portrayal of the character in The Silence of 
# Answer: 


In [19]:
answer

AIMessage(content='The Silence of the Lambs was released in 1991.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 242, 'total_tokens': 255, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CO2usQXALSbi00sAWUiGpV0uL609x', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--9f60728f-9407-4dc3-abf2-a52961d55010-0', usage_metadata={'input_tokens': 242, 'output_tokens': 13, 'total_tokens': 255, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

7. Alternative: LCEL Version
The RAG flow can also be written in LCEL (LangChain Expression Language) as a single, more concise pipeline:
Run retrieval (for the context) and pass the question through unchanged, in parallel.
Fill the prompt template.
Invoke the LLM.
rag_chain = (
  {"context": retriever | format_docs, "question": RunnablePassthrough()}
  | prompt
  | llm
)
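Example question (as run below): "When he was born?" The question never names Hopkins; it still works because every indexed chunk comes from his Wikipedia article.
Answer: "Philip Anthony Hopkins was born on 31 December 1937."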

In [20]:
rag_chain = ( 
    RunnableParallel(
        context = retriever | format_docs, 
        question = RunnablePassthrough() 
    )
    | template 
    | llm 
)

In [21]:
rag_chain.invoke("When he was born?")

AIMessage(content='Philip Anthony Hopkins was born on 31 December 1937.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 235, 'total_tokens': 248, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CO2uvqSPZyj9VxtarxhcceAKgHknc', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--06ed11c9-c9da-45c9-9e3d-7c4f2538d494-0', usage_metadata={'input_tokens': 235, 'output_tokens': 13, 'total_tokens': 248, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

8. Key Concepts Reinforced
Augmenting LLMs with external knowledge improves factual accuracy.
Chaining retrieval, prompt construction, and generation creates flexible, reusable workflows.
LCEL simplifies the construction of complex pipelines with minimal code.
9. Conclusion
RAG pipelines enhance LLM outputs by grounding them in retrieved external documents.
Document splitting, retrieval tuning, and structured prompting are critical for high-quality RAG systems.