# 03 – Memory & LCEL Basics

**Learning Goals:**
- Understand conversational memory in LangChain
- Compare memory types: Buffer vs Summary
- Master LCEL (LangChain Expression Language) composition
- Build streaming, retry, and fallback patterns

**What we'll cover:**
1. **Section A: Memory 101** - Buffer and Summary memory patterns
2. **Section B: Memory in Chains** - Inject memory into conversational flows
3. **Section C: LCEL Basics** - Compose runnables with `|` operator
4. **Section D: Advanced LCEL** - Streaming, retry, fallbacks

**Prerequisites:** Notebooks 01 & 02 completed

**Note:** This notebook focuses on fundamentals, not RAG. No ChromaDB or retrieval here.


In [1]:
# ⚙️ Global Config & Services (using centralized modules)
import sys
import json
from pathlib import Path
from datetime import datetime
from dotenv import load_dotenv

# Add parent directory to path and change to project root
import os

# Get the notebook's current directory and find project root
notebook_dir = Path.cwd()
if notebook_dir.name == "notebooks":
    project_root = notebook_dir.parent
else:
    project_root = notebook_dir

# Change to project root and add to path
os.chdir(project_root)
sys.path.insert(0, str(project_root))

print(f"📂 Working directory: {os.getcwd()}")

from src.services.llm_services import (
    load_config,
    get_llm,
    validate_api_keys,
    print_config_summary
)

# Load environment variables
load_dotenv()

# Load configuration from config.yaml (now we're in project root)
config = load_config("src/config/config.yaml")

# Validate API keys
validate_api_keys(config, verbose=True)

# Print summary
print_config_summary(config)
print(f"  Note: Temperature is {config['temperature']} (good for conversational demos)")


📂 Working directory: /Users/machinelearningzuu/Dropbox/Zuu Crew/Courses/🚧 AI Engineer Essentials/Live Classes/Week 03
✅ Config loaded:
  LLM: openrouter (openai/gpt-4o-mini)
  Embeddings: sbert / sentence-transformers/all-MiniLM-L6-v2
  Temperature: 0.2
  Artifacts: ./artifacts
  Note: Temperature is 0.2 (good for conversational demos)




In [2]:
# Initialize LLM using factory from llm_services
llm = get_llm(config)
print(f"✅ LLM initialized: {config['llm_provider']} / {config['llm_model']}")

# Verify API key with test completion
print("\n🔍 Testing API connection...")
try:
    test_response = llm.invoke("Say 'API working!' if you can read this.")
    test_msg = test_response.content if hasattr(test_response, 'content') else str(test_response)
    print(f"✅ API key verified: {test_msg[:50]}")
except Exception as e:
    print(f"❌ API key test failed: {e}")
    print("⚠️  Please check your .env file and API key configuration.")


✅ LLM initialized: openrouter / gpt-4o-mini

🔍 Testing API connection...
✅ API key verified: API working!


---

## Section A: Memory 101

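LangChain provides memory primitives to maintain conversational context across turns. The classes in this section (`ConversationBufferMemory`, `ConversationChain`, and related ones) belong to LangChain's legacy memory API, which recent releases mark as deprecated, so running these cells may print a `LangChainDeprecationWarning`. They remain a clear way to learn the concepts; newer code typically uses `RunnableWithMessageHistory` or LangGraph persistence instead.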

### 1. ConversationBufferMemory

Stores the **full chat history** verbatim. Simple and lossless, but it grows with every turn and can eventually exceed the model's context window.
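
To make "stores the full chat history" concrete, here is a minimal pure-Python sketch of what a buffer memory does. `ToyBufferMemory` is a hypothetical class written for illustration, not LangChain's implementation; it only mirrors the `Human: ... / AI: ...` transcript format that the stored-memory output later in this notebook shows for `return_messages=False`.

```python
from typing import List, Tuple


class ToyBufferMemory:
    """Hypothetical buffer memory: keeps every turn, never trims."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []  # (human, ai) pairs in order

    def save_context(self, human: str, ai: str) -> None:
        # Every exchange is appended, so the history only ever grows
        self.turns.append((human, ai))

    def history(self) -> str:
        # Render as a "Human: ... / AI: ..." transcript string
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return "\n".join(lines)


toy_buffer = ToyBufferMemory()
toy_buffer.save_context("hi", "Hello!")
toy_buffer.save_context("my name is Isuru", "Nice to meet you, Isuru!")
print(toy_buffer.history())
```

Because nothing is ever dropped, the string handed to the model gets longer with every turn; that is the trade-off the windowed and summary variants address.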


In [3]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

# ConversationBufferMemory: Stores the FULL chat history (no window size; `k` only applies to ConversationBufferWindowMemory)
buffer_memory = ConversationBufferMemory(
    return_messages=False,  # return_messages: Format of stored history
                            #   False = string format "Human: ... AI: ..."
                            #   True = list of Message objects (better for chat prompts / LCEL)
)

# ConversationChain: Pre-built chain that manages conversation flow
conversation = ConversationChain(
    llm=llm,                # llm: Language model for generating responses
    memory=buffer_memory,   # memory: Memory object to store conversation history
    verbose=False           # verbose: If True, prints internal prompts (for debugging)
)



In [4]:
# Interactive conversation loop (type 'exit' to quit)
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        break
    response = conversation.predict(input=user_input)
    print("Human: ", user_input)
    print(f"AI: {response}")

Human:  hi
AI: Hello! How are you today? Is there anything specific you'd like to chat about or ask? I'm here to help!
Human:  goog morning
AI: Good morning! I hope you're having a great start to your day. Do you have any plans for today, or is there something on your mind that you'd like to discuss?
Human:  my name is Isuru
AI: Nice to meet you, Isuru! That's a lovely name. Where are you from, or what do you enjoy doing in your free time? I'm curious to learn more about you!
Human:  do you know my name ?
AI: Yes, you just told me your name is Isuru! It's great to know you. If you'd like to share more about yourself or ask anything, feel free!


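### 2. ConversationBufferWindowMemory

Keeps only the last **k** exchanges (a sliding window). Memory stays bounded, but anything older than k turns is forgotten.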

In [6]:
# ConversationBufferWindowMemory: Only keeps the last 'k' conversation turns
buffer_memory = ConversationBufferWindowMemory(
    return_messages=True,   # return_messages: Return as Message objects (better for LCEL)
    k=3                     # k: Number of conversation turns to keep
                            #   k=3 means last 3 human-AI exchanges are remembered
                            #   Older messages are dropped (sliding window)
)

conversation = ConversationChain(
    llm=llm,
    memory=buffer_memory,
    verbose=False
)

# Interactive conversation loop (type 'exit' to quit)
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    response = conversation.predict(input=user_input)
    print("Human: ", user_input)
    print(f"AI: {response}")

# View stored history (only last k=3 turns will be shown)
buffer_memory.load_memory_variables({})['history']

Human:  Hello
AI: Hello! How are you today? I'm here to chat about anything on your mind, whether it's a question, a topic you're interested in, or just some friendly banter. What would you like to talk about?
Human:  DO you know my name ?
AI: I don't know your name yet! But I'd love to learn it if you'd like to share. What should I call you?
Human:  My name is ISuru ?
AI: Nice to meet you, Isuru! That's a great name. How can I assist you today? Do you have any specific topics in mind or something you'd like to chat about?
Human:  DO you know my name ?
AI: I don't know your name yet! But I'd love to learn it if you'd like to share. What should I call you?
Human:  Its Isuru
AI: Got it, Isuru! Thanks for reminding me. What would you like to talk about today? Any specific interests or questions on your mind?


[HumanMessage(content='My name is ISuru ?', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Nice to meet you, Isuru! That's a great name. How can I assist you today? Do you have any specific topics in mind or something you'd like to chat about?", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='DO you know my name ?', additional_kwargs={}, response_metadata={}),
 AIMessage(content="I don't know your name yet! But I'd love to learn it if you'd like to share. What should I call you?", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Its Isuru', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Got it, Isuru! Thanks for reminding me. What would you like to talk about today? Any specific interests or questions on your mind?', additional_kwargs={}, response_metadata={})]
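
The stored history above holds exactly 6 messages, i.e. the last 3 exchanges; the first two turns of the session were dropped. A minimal sketch of that sliding window, using a hypothetical `ToyWindowMemory` built on `collections.deque(maxlen=k)`. This sketch discards old pairs as new ones arrive; LangChain's class may instead keep the full log and trim when history is loaded, so treat it as an illustration of the behaviour rather than the library's internals.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyWindowMemory:
    """Hypothetical sliding-window memory: keeps only the last k exchanges."""

    def __init__(self, k: int) -> None:
        # One entry per (human, ai) pair, so maxlen=k evicts the oldest pair
        self.window: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.window.append((human, ai))

    def messages(self) -> List[str]:
        # Flatten pairs into alternating human/AI messages (at most 2*k)
        out: List[str] = []
        for human, ai in self.window:
            out.extend([f"Human: {human}", f"AI: {ai}"])
        return out


toy_window = ToyWindowMemory(k=3)
for i in range(1, 6):  # five exchanges, like the session above
    toy_window.save_context(f"question {i}", f"answer {i}")

print(len(toy_window.messages()))  # 6
print(toy_window.messages()[0])    # Human: question 3
```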

### 3. ConversationSummaryMemory

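Instead of storing the full history, this memory uses an LLM to **summarize** the conversation so far, folding each new turn into a running summary. This keeps the prompt small in long conversations, but it costs an extra LLM call per turn and may drop details.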
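
The summarize-as-you-go idea can be sketched without an LLM by plugging in a stand-in summarizer. `ToySummaryMemory` and `naive_summarize` are hypothetical; in LangChain the summarizer is an LLM prompt, whereas here it is a deterministic function so the example runs offline. Note how the call counter grows by one per saved turn, and how the stand-in throws away the AI's answers: the kind of detail loss described above.

```python
from typing import Callable


class ToySummaryMemory:
    """Hypothetical summary memory: one summarizer call per saved turn."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        self.summarize = summarize  # (previous_summary, new_lines) -> new_summary
        self.summary = ""
        self.calls = 0

    def save_context(self, human: str, ai: str) -> None:
        new_lines = f"Human: {human}\nAI: {ai}"
        # Fold the new turn into the running summary
        self.summary = self.summarize(self.summary, new_lines)
        self.calls += 1


def naive_summarize(previous: str, new_lines: str) -> str:
    # Stand-in for an LLM call: keeps only the human side of each turn
    human_part = new_lines.splitlines()[0].removeprefix("Human: ")
    return (previous + " | " if previous else "") + f"user said: {human_part}"


toy_summary = ToySummaryMemory(naive_summarize)
toy_summary.save_context("Hi, my name is Alice.", "Hello Alice! Nice to meet you.")
toy_summary.save_context("What's the capital of France?", "The capital of France is Paris.")
print(toy_summary.summary)
print(toy_summary.calls)  # 2
```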


In [7]:
from langchain.memory import ConversationSummaryMemory

# ConversationSummaryMemory: Summarizes history using LLM (compact but loses detail)
summary_memory = ConversationSummaryMemory(
    llm=llm,              # llm: Required! Uses this LLM to generate summaries
    return_messages=True  # return_messages: Return as Message objects
                          #   The summary is stored as a SystemMessage
)

# Simulate a conversation by manually adding context
# save_context(inputs, outputs) - saves a single turn
summary_memory.save_context(
    {"input": "Hi, my name is Alice."},      # input: User's message
    {"output": "Hello Alice! Nice to meet you."}  # output: AI's response
)
summary_memory.save_context(
    {"input": "What's my name?"},
    {"output": "Your name is Alice."}
)
summary_memory.save_context(
    {"input": "What's the capital of France?"},
    {"output": "The capital of France is Paris."}
)

# View summarized history (a single SystemMessage instead of the raw turns)
print("📝 Summary Memory:")
print(summary_memory.load_memory_variables({}))
print("\n📊 For a 3-turn chat the summary can be longer than the raw transcript; the savings appear in long conversations")


📝 Summary Memory:
{'history': [SystemMessage(content='The human introduces herself as Alice, and the AI responds by greeting her and expressing pleasure in meeting her. The human then asks the AI for her name, and the AI confirms that her name is Alice. The human inquires about the capital of France, and the AI informs her that it is Paris.', additional_kwargs={}, response_metadata={})]}

📊 For a 3-turn chat the summary can be longer than the raw transcript; the savings appear in long conversations


In [8]:
print(summary_memory.load_memory_variables({})['history'][0].content)

The human introduces herself as Alice, and the AI responds by greeting her and expressing pleasure in meeting her. The human then asks the AI for her name, and the AI confirms that her name is Alice. The human inquires about the capital of France, and the AI informs her that it is Paris.


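### Trade-offs: Buffer vs Window vs Summary

| Memory Type | Pros | Cons |
|-------------|------|------|
| **Buffer** | Full detail, no extra LLM calls | Grows unbounded; can hit context limits |
| **Buffer Window** | Bounded size, no extra LLM calls | Forgets everything older than the last *k* turns |
| **Summary** | Compact in long conversations | Extra LLM call per turn; details can be lost or drift |

**When to use:**
- **Buffer**: Short conversations where you need the exact history
- **Buffer Window**: Conversations where only recent turns matter
- **Summary**: Long conversations where token cost matters more than exact wording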
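
A rough worked example of why the buffer's cost grows faster than the window's: if each exchange adds about `t` tokens, turn `n` of a buffer conversation resends `(n-1)·t` tokens of history, so the total over `N` turns is about `t·N(N-1)/2`, while a window of size `k` caps each turn at `k·t`. The sketch assumes a flat 100 tokens per exchange, an illustrative number rather than a measurement, and `history_tokens_per_turn` is a hypothetical helper.

```python
from typing import List, Optional


def history_tokens_per_turn(n_turns: int, tokens_per_exchange: int,
                            k: Optional[int] = None) -> List[int]:
    """History tokens resent on each turn; k=None models an unbounded buffer."""
    sizes = []
    for prior_exchanges in range(n_turns):
        # Turn i (0-based) has i earlier exchanges; a window keeps at most k
        kept = prior_exchanges if k is None else min(prior_exchanges, k)
        sizes.append(kept * tokens_per_exchange)
    return sizes


toy_buffer_sizes = history_tokens_per_turn(20, 100)
toy_window_sizes = history_tokens_per_turn(20, 100, k=3)
print("buffer: last turn", toy_buffer_sizes[-1], "total", sum(toy_buffer_sizes))  # 1900, 19000
print("window: last turn", toy_window_sizes[-1], "total", sum(toy_window_sizes))  # 300, 5400
```

The buffer total grows quadratically with conversation length, the window total only linearly; summary memory sits in between but adds its own LLM calls.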


---

## Section B: Memory in Chains

Let's inject memory into a simple conversational chain.
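
Under the hood, a chain with memory repeats three steps every turn: load the stored history, render it into the prompt next to the new input, and save the new exchange afterwards. The sketch below spells those steps out by hand. `toy_chat_turn` and `toy_fake_model` are hypothetical; the fake model can only "remember" a name if it appears in the prompt, which is exactly why injecting history matters. The prompt layout is an assumption, not `ConversationChain`'s exact template.

```python
from typing import Callable, List, Tuple


def toy_chat_turn(history: List[Tuple[str, str]], user_input: str,
                  model: Callable[[str], str]) -> str:
    # 1. Load: render every stored exchange into the prompt
    lines = [f"Human: {human}\nAI: {ai}" for human, ai in history]
    # 2. Append the new input and let the model continue as "AI:"
    lines.append(f"Human: {user_input}\nAI:")
    reply = model("\n".join(lines))
    # 3. Save: record the exchange so the next turn can see it
    history.append((user_input, reply))
    return reply


def toy_fake_model(prompt: str) -> str:
    # Hypothetical LLM stand-in: answers the name question only from prompt text
    if not prompt.endswith("What's my name?\nAI:"):
        return "Nice to meet you!"
    return "Your name is Bob." if "I'm Bob" in prompt else "I don't know your name."


toy_history: List[Tuple[str, str]] = []
toy_chat_turn(toy_history, "Hi, I'm Bob.", toy_fake_model)
toy_with_memory = toy_chat_turn(toy_history, "What's my name?", toy_fake_model)
toy_without_memory = toy_chat_turn([], "What's my name?", toy_fake_model)
print(toy_with_memory)     # Your name is Bob.
print(toy_without_memory)  # I don't know your name.
```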


In [9]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Create a conversational chain with memory
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,  # Set to True to see internal prompts
)

# Multi-turn conversation
print("🗨️  Conversational Chain with Memory\n")

response1 = conversation.predict(input="Hi, I'm Bob and I love Python programming.")
print(f"User: Hi, I'm Bob and I love Python programming.")
print(f"AI: {response1}\n")

response2 = conversation.predict(input="What's my name?")
print(f"User: What's my name?")
print(f"AI: {response2}\n")

response3 = conversation.predict(input="What do I love?")
print(f"User: What do I love?")
print(f"AI: {response3}\n")

# View memory
print("üìù Stored Memory:")
print(memory.load_memory_variables({}))


🗨️  Conversational Chain with Memory

User: Hi, I'm Bob and I love Python programming.
AI: Hello, Bob! It's great to meet you! Python is such a versatile and powerful programming language. What do you enjoy most about it? Are you working on any specific projects or exploring particular libraries? There’s so much you can do with Python, from web development with frameworks like Django and Flask to data analysis with libraries like Pandas and NumPy!

User: What's my name?
AI: Your name is Bob! It's nice to chat with you. Do you have any favorite Python projects or libraries you'd like to share?

User: What do I love?
AI: You love Python programming! It's a fantastic language with a wide range of applications. What aspects of Python do you find most enjoyable? Is it the simplicity of the syntax, the vast ecosystem of libraries, or perhaps the community support?

📝 Stored Memory:
{'history': "Human: Hi, I'm Bob and I love Python programming.\nAI: Hello, Bob! It's great to meet yo

### Resetting Memory

Between sessions, clear memory to start fresh.
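
`memory.clear()` wipes the single memory object the chain holds. When several users or sessions share one process, a common design is one history per session id, so clearing one leaves the others intact; LangChain's `RunnableWithMessageHistory` follows this idea with a `get_session_history` callable. The sketch below is a hypothetical pure-Python version of that bookkeeping (`toy_sessions`, `toy_remember`, `toy_reset`), not the library's API.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical per-session store: one independent history list per session id
toy_sessions: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)


def toy_remember(session_id: str, human: str, ai: str) -> None:
    toy_sessions[session_id].append((human, ai))


def toy_reset(session_id: str) -> None:
    # Same idea as memory.clear(), but scoped to a single session
    toy_sessions.pop(session_id, None)


toy_remember("alice", "Hi, I'm Alice.", "Hello Alice!")
toy_remember("bob", "Hi, I'm Bob.", "Hello Bob!")
toy_reset("bob")
print(sorted(toy_sessions))  # ['alice']
```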


In [10]:
# Clear memory
memory.clear()

response4 = conversation.predict(input="What's my name?")
print("After clearing memory:")
print("User: What's my name?")
print(f"AI: {response4}")
print("\n✅ Memory reset - AI no longer remembers Bob")


After clearing memory:
User: What's my name?
AI: I’m not sure what your name is! I don’t have access to that information. But I’d love to know it if you’d like to share!

✅ Memory reset - AI no longer remembers Bob


---

## Section C: LCEL (LangChain Expression Language) Basics

LCEL is a declarative way to compose LangChain components using the `|` operator.

### Core Concepts

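1. **Runnable**: The common interface (`invoke`, `stream`, `batch`) shared by all LCEL components
2. **Pipe (`|`)**: Chains runnables so each step's output becomes the next step's input
3. **RunnablePassthrough**: Passes its input through unchanged
4. **RunnableParallel** (alias `RunnableMap`): Runs several runnables on the same input and collects their outputs in a dict; a plain dict inside a chain is converted into one automatically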
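
To see what `|` and a parallel map actually do, here is a toy re-implementation in plain Python. `ToyRunnable`, `toy_passthrough`, and `toy_parallel` are hypothetical names; real LCEL runnables also support streaming, batching, and async, and `RunnableParallel` can run its branches concurrently, whereas this sketch runs them one after another.

```python
from typing import Any, Callable, Dict


class ToyRunnable:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # a | b builds a new runnable: run a, then feed its output into b
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


def toy_passthrough() -> ToyRunnable:
    # Returns its input unchanged
    return ToyRunnable(lambda value: value)


def toy_parallel(branches: Dict[str, ToyRunnable]) -> ToyRunnable:
    # Every branch gets the SAME input; results are collected under their keys
    # (run sequentially here for simplicity)
    return ToyRunnable(lambda value: {key: branch.invoke(value) for key, branch in branches.items()})


toy_shout = ToyRunnable(str.upper)
toy_exclaim = ToyRunnable(lambda text: text + "!")
toy_pipeline = toy_parallel({"original": toy_passthrough(), "loud": toy_shout | toy_exclaim})
toy_formatted = toy_pipeline | ToyRunnable(lambda d: f"{d['original']} -> {d['loud']}")

print(toy_pipeline.invoke("hello"))   # {'original': 'hello', 'loud': 'HELLO!'}
print(toy_formatted.invoke("hello"))  # hello -> HELLO!
```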

### Simple LCEL Chain

Let's build: `ChatPromptTemplate | LLM | StrOutputParser`


In [11]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# ChatPromptTemplate: Defines the structure of messages sent to the LLM
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer concisely."),  # System instruction
    ("human", "{question}")  # {question} is a placeholder filled at runtime
])

# Build LCEL chain using the | (pipe) operator
# Data flows: Input Dict → Prompt → LLM → Output Parser → String
chain = (
    prompt              # Step 1: Format input into messages
    | llm               # Step 2: Send to LLM, get response
    | StrOutputParser() # Step 3: Extract string from AIMessage
)

# Invoke the chain with input dictionary
response = chain.invoke({"question": "What is eczema and how is it treated?"})
print("🔗 Simple LCEL Chain:")
print("Question: What is eczema and how is it treated?")
print(f"Answer: {response}")


🔗 Simple LCEL Chain:
Question: What is eczema and how is it treated?
Answer: Eczema, also known as atopic dermatitis, is a chronic inflammatory skin condition characterized by dry, itchy, and inflamed skin. It can occur in various forms and may be triggered by allergens, irritants, stress, or changes in temperature.

**Treatment options include:**

1. **Moisturizers:** Regular use of emollients to keep the skin hydrated.
2. **Topical corticosteroids:** To reduce inflammation and itching during flare-ups.
3. **Topical calcineurin inhibitors:** Non-steroidal medications to control inflammation.
4. **Antihistamines:** To relieve itching, especially at night.
5. **Phototherapy:** Controlled exposure to ultraviolet light for severe cases.
6. **Systemic medications:** In severe cases, oral or injectable medications may be prescribed.
7. **Avoiding triggers:** Identifying and avoiding known irritants or allergens.

Consulting a healthcare provider for a personalized treatment plan is recom

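### RunnablePassthrough & RunnableParallel

Use `RunnablePassthrough` to forward input unchanged and `RunnableParallel` (alias `RunnableMap`; a plain dict in a chain is converted into one) to run several operations on the same input and collect the results under named keys. Because a passthrough forwards the *whole* input, use `operator.itemgetter` to pull individual fields out of a dict input.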
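
A passthrough forwards its *entire* input, so placing `RunnablePassthrough()` under two keys of a map gives both keys the same whole dict, not one field each. The plain-Python sketch below makes that visible; `toy_run_map`, `toy_identity`, and `toy_fake_retriever` are hypothetical stand-ins for a parallel map, a passthrough, and a retriever. `operator.itemgetter` picks single fields out of a dict input (LCEL accepts such callables inside a map), while a passthrough is the right tool when the input itself is the value you want, e.g. a bare question string.

```python
from operator import itemgetter

toy_inputs = {
    "context": "Zuu Crew AI is delivering Agentic AI Engineering Bootcamp.",
    "question": "What bootcamps is Zuu Crew AI doing?",
}


def toy_run_map(mapping, value):
    # Like a parallel map: every branch receives the SAME input value
    return {key: fn(value) for key, fn in mapping.items()}


def toy_identity(value):
    # What a passthrough does: return the input unchanged
    return value


# Pitfall: both keys receive the whole dict
toy_wrong = toy_run_map({"context": toy_identity, "question": toy_identity}, toy_inputs)

# Fix for dict input: pick out one field per key
toy_right = toy_run_map({"context": itemgetter("context"), "question": itemgetter("question")}, toy_inputs)


def toy_fake_retriever(question):
    # Hypothetical retriever: returns fixed context regardless of the question
    return "Zuu Crew AI is delivering Agentic AI Engineering Bootcamp."


# A passthrough fits when the input IS the question string
toy_with_passthrough = toy_run_map(
    {"context": toy_fake_retriever, "question": toy_identity},
    "What bootcamps is Zuu Crew AI doing?",
)
print(toy_wrong["question"])   # the whole dict
print(toy_right["question"])   # just the question string
print(toy_with_passthrough)
```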


In [12]:
from operator import itemgetter
from langchain_core.runnables import RunnableParallel

# RunnableParallel: Runs multiple operations on the SAME input and merges results into a dict
# Useful for preparing multiple inputs for a prompt

# Create a context-aware prompt template
context_prompt = ChatPromptTemplate.from_template("""
Use the context to answer the question.

Context: {context}
Question: {question}

Answer:""")

# Build chain with RunnableParallel for multiple inputs
# Note: RunnablePassthrough() would forward the WHOLE input dict to each key,
# so itemgetter is used to pick out the individual fields
chain_with_context = (
    RunnableParallel({
        "context": itemgetter("context"),    # Extract the context string
        "question": itemgetter("question")   # Extract the question string
    })
    | context_prompt    # Format into prompt with both placeholders
    | llm               # Generate answer
    | StrOutputParser() # Extract string
)

# Test the chain
result = chain_with_context.invoke({
    "context": "Zuu Crew AI is delivering Agentic AI Engineering Bootcamp.",
    "question": "What bootcamps is Zuu Crew AI doing?"
})

print("🔗 Chain with Context:")
print(f"Result: {result}")

🔗 Chain with Context:
Result: Zuu Crew AI is delivering the Agentic AI Engineering Bootcamp.


---

## Section D: Advanced LCEL Patterns

### 1. Streaming

Stream output chunks as they are generated, so users see the answer start appearing right away instead of waiting for the full response.
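
Streaming in Python is built on iteration: `chain.stream()` returns an iterator that yields chunks as the model produces them. The sketch below imitates that with a hypothetical generator, `toy_token_stream`, which yields word-sized chunks after an artificial delay; real chunks are model tokens, usually smaller than words. Joining the chunks reproduces the full text, which is also how you would collect a streamed answer for logging.

```python
import time
from typing import Iterator


def toy_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(): yields word-sized chunks."""
    for i, word in enumerate(text.split(" ")):
        time.sleep(delay)  # simulate generation latency per chunk
        # Keep the separating space on every chunk after the first
        yield word if i == 0 else " " + word


toy_chunks = []
for chunk in toy_token_stream("RAG combines retrieval with generation.", delay=0.05):
    print(chunk, end="", flush=True)  # display each chunk as soon as it arrives
    toy_chunks.append(chunk)
print()

toy_full_text = "".join(toy_chunks)  # reassemble the complete answer
```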


In [13]:
# Compare: Regular invoke (waits for full response) vs Streaming (token by token)

print("🌊 Without Streaming (waits for complete response):")
print(chain.invoke({"question": "Explain RAG in one sentence."}))

print("\n🌊 With Streaming (tokens appear as generated):")
print("Answer: ", end="")

# chain.stream() yields chunks as they're generated
for chunk in chain.stream({"question": "Explain RAG in one sentence."}):
    print(chunk, end="", flush=True)  # end="" prevents newlines, flush=True forces immediate output

print("\n\n✅ Streaming complete")


🌊 Without Streaming (waits for complete response):
RAG, or Retrieval-Augmented Generation, is a machine learning approach that combines information retrieval with generative models to enhance the generation of text by incorporating relevant external knowledge.

🌊 With Streaming (tokens appear as generated):
Answer: RAG (Retrieval-Augmented Generation) is a machine learning approach that combines retrieval of relevant documents from a knowledge base with generative models to produce more accurate and contextually relevant responses.

✅ Streaming complete


### 2. Retry with Fallback

Use `.with_retry()` for automatic retries and `.with_fallbacks()` for fallback models.
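
Conceptually, retrying means calling again after a failure, waiting a little longer each time, and giving up after a fixed number of attempts. A minimal sketch of that loop with exponential backoff plus random jitter, assuming a hypothetical helper `toy_call_with_retry` and simulated flaky APIs; it illustrates the idea behind `.with_retry()` and is not LangChain's implementation. As with `stop_after_attempt`, the count includes the first call.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def toy_call_with_retry(fn: Callable[[], T], stop_after_attempt: int = 3,
                        base_delay: float = 0.1,
                        retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Hypothetical retry helper: up to `stop_after_attempt` total calls."""
    if stop_after_attempt < 1:
        raise ValueError("stop_after_attempt must be >= 1")
    for attempt in range(1, stop_after_attempt + 1):
        try:
            return fn()
        except retry_on:
            if attempt == stop_after_attempt:
                raise  # out of attempts: re-raise the last error
            # Exponential backoff (base, 2*base, 4*base, ...) plus random jitter
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay))
    raise RuntimeError("not reached")


# Simulated API that fails twice with a transient error, then succeeds
toy_attempts = {"n": 0}


def toy_flaky_api() -> str:
    toy_attempts["n"] += 1
    if toy_attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


toy_result = toy_call_with_retry(toy_flaky_api, base_delay=0.01)
print(toy_result, "after", toy_attempts["n"], "attempts")  # ok after 3 attempts

# Simulated API that never recovers: the error surfaces after 3 attempts
toy_calls = {"n": 0}


def toy_always_down() -> str:
    toy_calls["n"] += 1
    raise ConnectionError("provider down")


try:
    toy_call_with_retry(toy_always_down, base_delay=0.0)
except ConnectionError as exc:
    print("gave up after", toy_calls["n"], "attempts:", exc)
```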


In [14]:
# .with_retry(): Automatically retry on transient failures
chain_with_retry = chain.with_retry(
    stop_after_attempt=3  # stop_after_attempt: Max total attempts, including the first call
                          #   3 = 1 initial call + up to 2 retries before raising the error
                          # Other options: wait_exponential_jitter (default True) adds jittered
                          #   exponential backoff; retry_if_exception_type limits which errors retry
)

print("🔄 Chain with retry enabled")
print("   - Up to 3 attempts in total on API failures")

# .with_fallbacks(): Use backup LLM if primary fails
# Example (requires a second LLM configured):
# fallback_llm = get_llm({"llm_provider": "groq", ...})
# Each fallback receives the same input dict as `chain`, so it needs its own prompt step:
# chain_with_fallback = chain.with_fallbacks([prompt | fallback_llm | StrOutputParser()])
# print("   - Falls back to secondary LLM if primary fails")

print("✅ Retry pattern configured")


🔄 Chain with retry enabled
   - Up to 3 attempts in total on API failures
✅ Retry pattern configured
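
A fallback differs from a retry: instead of calling the same model again, it moves on to the next candidate. Each candidate receives the same input as the primary, which is why a fallback for a prompt-based chain has to include its own prompt step rather than a bare model. A minimal sketch with hypothetical stand-ins `toy_primary_model` (always times out) and `toy_backup_model`, illustrating the idea behind `.with_fallbacks()` rather than its implementation.

```python
from typing import Callable, List


def toy_call_with_fallbacks(candidates: List[Callable[[str], str]], question: str) -> str:
    """Hypothetical helper: try each model in order, return the first success."""
    errors = []
    for model in candidates:
        try:
            return model(question)  # every candidate gets the same input
        except Exception as exc:  # any failure moves on to the next candidate
            errors.append(exc)
    raise RuntimeError(f"all {len(candidates)} candidates failed: {errors!r}")


def toy_primary_model(question: str) -> str:
    raise TimeoutError("primary provider is down")  # simulated outage


def toy_backup_model(question: str) -> str:
    return f"[backup] answer to: {question}"


toy_answer = toy_call_with_fallbacks([toy_primary_model, toy_backup_model], "Explain RAG")
print(toy_answer)  # [backup] answer to: Explain RAG
```

Retries and fallbacks compose: a common layout is to retry the primary a few times for transient errors and fall back only when it keeps failing.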


---

## Save Manifest


In [15]:
manifests_dir = Path(config["artifacts_root"]) / "manifests"
manifests_dir.mkdir(parents=True, exist_ok=True)

manifest = {
    "notebook": "03_memory_lcel_basics",
    "topics": [
        "ConversationBufferMemory",
        "ConversationBufferWindowMemory", 
        "ConversationSummaryMemory",
        "ConversationChain with memory",
        "LCEL composition (pipe operator)",
        "RunnableParallel",
        "Streaming",
        "Retry patterns"
    ],
    "llm_provider": config["llm_provider"],
    "llm_model": config["llm_model"],
    "created_at": datetime.now().isoformat(),
}

manifest_path = manifests_dir / "memory_lcel.json"
with open(manifest_path, "w") as f:
    json.dump(manifest, f, indent=2)

print(f"✅ Manifest saved: {manifest_path}")


✅ Manifest saved: artifacts/manifests/memory_lcel.json


---

## Summary

**What we learned:**

### Memory
- ✅ **Buffer Memory**: Stores full history (simple but grows)
- ✅ **Buffer Window Memory**: Keeps only the last k turns (bounded but forgets older context)
- ✅ **Summary Memory**: LLM-summarized history (compact but may drift)
- ✅ **Memory in Chains**: Inject context into conversational flows
- ✅ **Reset/Clear**: Start fresh between sessions

### LCEL
- ✅ **Composition**: Use `|` to chain runnables
- ✅ **RunnablePassthrough**: Pass data unchanged
- ✅ **RunnableParallel**: Run operations in parallel
- ✅ **Streaming**: Token-by-token generation
- ✅ **Retry**: Automatic retries on failure
- ✅ **Fallbacks**: Switch to backup LLM (sketched, not run)

**Key Patterns:**
```python
# Simple chain
chain = prompt | llm | parser

# With context
chain = RunnableParallel({...}) | prompt | llm | parser

# With retry
chain = chain.with_retry(stop_after_attempt=3)

# With streaming
for chunk in chain.stream({"question": "Explain RAG in one sentence."}):
    print(chunk, end="", flush=True)
```

**Artifacts:**
- `./artifacts/manifests/memory_lcel.json`
