# Session 4.1: BakeryAI - LangSmith Tracing & Observability

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/15iijCXgCn0xgj35mmglpfWL6CYtpWGaC?usp=sharing)

## 🎯 Production Readiness: Observability

### Why Observability Matters

In production, you need to know:
- 📊 **What's happening**: Which tools are being called?
- ⏱️ **Performance**: How long does each step take?
- 💰 **Cost**: How many tokens are we using?
- 🐛 **Errors**: Where do failures occur?
- 📈 **Quality**: Are responses getting better or worse?

### What is LangSmith?

**LangSmith** is LangChain's official observability and evaluation platform:

```
Your Application
     ↓
[LangSmith Tracing]
     ↓
Dashboard Shows:
- Every LLM call
- Tool executions
- Latency metrics
- Token usage
- Error logs
- User feedback
```

### LangSmith Features:

1. **Tracing**: See every step of agent execution
2. **Monitoring**: Real-time performance metrics
3. **Debugging**: Pinpoint errors instantly
4. **Evaluation**: Test prompts and chains
5. **Datasets**: Build test suites
6. **Feedback**: Collect user ratings

Let's instrument BakeryAI! 🚀

In [1]:
!pip install -q langchain langchain-openai langsmith
!pip install -q python-dotenv


## 1. Setting Up LangSmith

**Get your API key:**
1. Go to https://smith.langchain.com
2. Sign up for a free account
3. Go to Settings → API Keys
4. Create a new API key

In [2]:
from google.colab import userdata
import os

# Set the OpenAI API key from Colab Secrets, falling back to a placeholder
def set_openai_api_key(default_key: str = "YOUR_API_KEY") -> None:
    """Set OPENAI_API_KEY from the Colab secret MDX_OPENAI_API_KEY, or use a default value."""
    try:
        os.environ["OPENAI_API_KEY"] = userdata.get("MDX_OPENAI_API_KEY")
    except Exception:
        # Secret missing or notebook access not granted: use the placeholder
        os.environ["OPENAI_API_KEY"] = default_key

set_openai_api_key()

In [3]:
# Enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("MDX_LANGSMITH_API_KEY")
os.environ["LANGCHAIN_PROJECT"] = "mdx-conclave-agents"  # Project name

print("✅ LangSmith Configuration:")
print(f"   Tracing Enabled: {os.getenv('LANGCHAIN_TRACING_V2')}")
print(f"   Project: {os.getenv('LANGCHAIN_PROJECT')}")
print("\n⚠️  Make sure MDX_LANGSMITH_API_KEY is saved in Colab Secrets!")

✅ LangSmith Configuration:
   Tracing Enabled: true
   Project: mdx-conclave-agents

⚠️  Make sure MDX_LANGSMITH_API_KEY is saved in Colab Secrets!


In [4]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o")

print("✅ LangChain initialized with LangSmith tracing")

✅ LangChain initialized with LangSmith tracing
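
A missing or blank variable is an easy reason for traces not to appear, so it can help to check the configuration before running any chain. The sketch below is one way to do that with the standard library only; `EXPECTED_TRACING_VARS` and `missing_tracing_vars` are hypothetical names, and the list simply mirrors the four variables set in the configuration cell above.

```python
import os
from typing import Mapping

# The four variables set in the configuration cell above.
EXPECTED_TRACING_VARS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)

def missing_tracing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the expected variables that are unset or blank (hypothetical helper)."""
    return [name for name in EXPECTED_TRACING_VARS if not env.get(name, "").strip()]

missing = missing_tracing_vars()
if missing:
    print("Tracing is not fully configured; missing:", ", ".join(missing))
else:
    print("All tracing variables are set")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.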


## 2. Basic Tracing

With `LANGCHAIN_TRACING_V2` set to `true`, every LangChain runnable invoked from here on is traced automatically, with no changes to the chain code.

In [5]:
# Simple chain with automatic tracing
prompt = ChatPromptTemplate.from_template(
    "You are a helpful bakery assistant. Answer: {question}"
)

chain = prompt | llm | StrOutputParser()

# This will be traced in LangSmith!
result = chain.invoke({"question": "What makes a good chocolate cake?"})

print("Answer:", result)
print("\n✅ Check LangSmith dashboard to see the trace!")
print("   Go to: https://smith.langchain.com")

Answer: A good chocolate cake is the perfect balance of rich flavor, moist texture, and an appealing appearance. Here are some key elements that contribute to making a great chocolate cake:

1. **Quality Ingredients:** Use high-quality cocoa powder or chocolate, preferably with a higher cocoa content for a rich flavor. Fresh eggs, real butter, and pure vanilla extract will also enhance the taste.

2. **Moisture:** A good chocolate cake should be moist without being overly dense. Ingredients like buttermilk, sour cream, or even coffee can add moisture and depth of flavor.

3. **Texture:** The cake should be fluffy with a tender crumb. Properly sifting the dry ingredients and not over-mixing the batter can help achieve this. Incorporating air by creaming the butter and sugar well can also contribute to a lighter texture.

4. **Balance of Flavors:** The bitterness of the chocolate should be balanced with the sweetness from sugar and the richness from fats like butter or oil. A hint of sal

## 3. Tracing with Custom Names and Metadata

In [6]:
from langsmith import traceable
from langchain_core.runnables import RunnableConfig

@traceable(name="bakery_recommendation")
def get_cake_recommendation(occasion: str, preferences: str) -> str:
    """Get cake recommendation with custom tracing"""

    prompt = ChatPromptTemplate.from_template(
        """Recommend a cake for this occasion and preferences.

        Occasion: {occasion}
        Preferences: {preferences}

        Provide a specific recommendation."""
    )

    chain = prompt | llm | StrOutputParser()

    # Add metadata to trace
    config = RunnableConfig(
        run_name="cake_recommendation",
        tags=["recommendation", "customer_facing"],
        metadata={
            "occasion": occasion,
            "preferences": preferences,
            "version": "1.0"
        }
    )

    return chain.invoke(
        {"occasion": occasion, "preferences": preferences},
        config=config
    )

# Test with tracing
result = get_cake_recommendation(
    occasion="birthday party",
    preferences="chocolate lover"
)

print("Recommendation:", result)
print("\n✅ This trace includes custom tags and metadata!")

Recommendation: For a birthday party with a chocolate-loving honoree, I recommend a decadent Chocolate Fudge Cake. This cake is rich, moist, and intensely chocolate-flavored, making it a perfect centerpiece for the celebration. 

The layers are made with high-quality cocoa and dark chocolate to enhance the depth of flavor. The cake is generously filled and frosted with a luscious chocolate ganache or a silky chocolate buttercream, depending on preference. For an extra touch of indulgence, consider adding a layer of chocolate mousse inside or a drizzle of chocolate ganache on top. Garnish with chocolate shavings or curls for added visual appeal and texture.

This cake will surely delight any chocolate enthusiast and leave a lasting impression on all the party guests.

✅ This trace includes custom tags and metadata!
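
Because `RunnableConfig` is a typed dictionary, a plain `dict` with the same keys (`run_name`, `tags`, `metadata`) can also be passed as `config=`. The sketch below builds such a dictionary in one place so that tags and metadata stay consistent across calls; `build_trace_config` is a hypothetical helper, not part of LangChain.

```python
from typing import Any

def build_trace_config(run_name: str, tags: list[str], **metadata: Any) -> dict[str, Any]:
    """Build a config dict with the keys used in get_cake_recommendation (hypothetical helper)."""
    return {
        "run_name": run_name,
        "tags": sorted(set(tags)),  # de-duplicate so dashboard filters stay predictable
        "metadata": {k: v for k, v in metadata.items() if v is not None},
    }

config = build_trace_config(
    "cake_recommendation",
    ["recommendation", "customer_facing"],
    occasion="birthday party",
    preferences="chocolate lover",
    version="1.0",
)
print(config)
```

The result could then be used as `chain.invoke({...}, config=config)` in place of the inline `RunnableConfig(...)` call.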


## 4. Tracing Agents and Tools

In [7]:
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.prompts import MessagesPlaceholder

@tool
@traceable(name="check_inventory_tool")
def check_inventory(product_name: str) -> str:
    """Check product inventory status.

    Args:
        product_name: Name of the product to check

    Returns:
        Inventory status
    """
    # Simulate inventory check
    import random

    in_stock = random.random() < 0.8

    if in_stock:
        stock = random.randint(5, 20)
        return f"✅ {product_name} in stock: {stock} units"
    else:
        return f"❌ {product_name} out of stock"

@tool
@traceable(name="calculate_price_tool")
def calculate_price(product: str, quantity: int) -> str:
    """Calculate order price.

    Args:
        product: Product name
        quantity: Quantity to order

    Returns:
        Price calculation
    """
    base_price = 45
    total = base_price * quantity
    delivery = 10 if total < 100 else 0

    return f"Price: ${total}, Delivery: ${delivery}, Total: ${total + delivery}"

# Create agent
tools = [check_inventory, calculate_price]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are BakeryAI assistant. Help customers with orders."),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=False
)

print("✅ Agent created with traced tools")

✅ Agent created with traced tools


In [8]:
# Run agent - every step will be traced!
result = agent_executor.invoke({
    "input": "Check if chocolate cake is in stock and calculate price for 3"
})

print("Result:", result['output'])
print("\n✅ Check LangSmith - you'll see:")
print("   - Agent reasoning")
print("   - Tool calls (check_inventory, calculate_price)")
print("   - LLM calls")
print("   - Token usage")
print("   - Latency for each step")

Result: The chocolate cake is in stock with 8 units available. The price for 3 chocolate cakes is $135, and there's no additional delivery charge, bringing the total to $135. Would you like to place an order?

✅ Check LangSmith - you'll see:
   - Agent reasoning
   - Tool calls (check_inventory, calculate_price)
   - LLM calls
   - Token usage
   - Latency for each step


## 5. Capturing User Feedback

In [9]:
from langsmith import Client

# Initialize LangSmith client
try:
    client = Client()
    print("✅ LangSmith client initialized")
except Exception as e:
    print(f"⚠️  LangSmith client not available: {e}")
    print("   Make sure LANGCHAIN_API_KEY is set in the environment")
    client = None

# Function to capture feedback
def capture_user_feedback(run_id: str, score: float, comment: str = ""):
    """Capture user feedback for a run

    Args:
        run_id: The LangSmith run ID
        score: Rating from 0-1 (or 1-5 scaled to 0-1)
        comment: Optional feedback comment
    """
    if client:
        try:
            client.create_feedback(
                run_id=run_id,
                key="user_rating",
                score=score,
                comment=comment
            )
            print(f"✅ Feedback captured: score={score}")
        except Exception as e:
            print(f"⚠️  Could not capture feedback: {e}")

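# Example: Capture feedback after agent run
# In production, read the run_id from the trace itself, for example with
# collect_runs() from langchain_core.tracers.context around the invoke call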
print("\nExample feedback capture:")
print("  capture_user_feedback(run_id='abc123', score=0.8, comment='Helpful!')")

✅ LangSmith client initialized

Example feedback capture:
  capture_user_feedback(run_id='abc123', score=0.8, comment='Helpful!')
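
The docstring above mentions scaling a 1-5 rating to 0-1. A minimal sketch of that conversion follows, assuming a linear mapping in which 1 star becomes 0.0 and 5 stars become 1.0; `stars_to_score` is a hypothetical helper, and other conventions (such as `stars / 5`) are equally possible.

```python
def stars_to_score(stars: int, max_stars: int = 5) -> float:
    """Map a 1..max_stars rating linearly onto 0.0..1.0 (hypothetical helper)."""
    if max_stars < 2:
        raise ValueError("max_stars must be at least 2")
    if not 1 <= stars <= max_stars:
        raise ValueError(f"stars must be between 1 and {max_stars}")
    return (stars - 1) / (max_stars - 1)

for stars in (1, 3, 5):
    print(stars, "->", stars_to_score(stars))
```

The result can be passed as the `score` argument of `capture_user_feedback`.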


## 6. Creating Datasets for Testing

In [10]:
# Create a test dataset
test_cases = [
    {
        "input": "What is your refund policy?",
        "expected": "Should mention 24 hours and store credit"
    },
    {
        "input": "Check if chocolate cake is available",
        "expected": "Should use check_inventory tool"
    },
    {
        "input": "How much for 2 vanilla cakes?",
        "expected": "Should use calculate_price tool"
    }
]

# Create dataset in LangSmith
if client:
    try:
        dataset_name = "bakery-ai-test-cases"

        # Create dataset
        dataset = client.create_dataset(
            dataset_name=dataset_name,
            description="Test cases for BakeryAI agent"
        )

        # Add examples
        for case in test_cases:
            client.create_example(
                inputs={"input": case["input"]},
                outputs={"expected": case["expected"]},
                dataset_id=dataset.id
            )

        print(f"✅ Created dataset: {dataset_name}")
        print(f"   Examples: {len(test_cases)}")
    except Exception as e:
        print(f"⚠️  Dataset creation failed: {e}")
else:
    print("💡 Dataset structure defined (requires LangSmith API key to create)")
    for i, case in enumerate(test_cases, 1):
        print(f"\n{i}. Input: {case['input']}")
        print(f"   Expected: {case['expected']}")

✅ Created dataset: bakery-ai-test-cases
   Examples: 3
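
Running the cell above a second time is likely to fail, because a dataset named `bakery-ai-test-cases` already exists. One way to make the step repeatable is to look the dataset up first. The sketch below assumes the client exposes `has_dataset`, `read_dataset`, and `create_dataset` with a `dataset_name` keyword, as the `langsmith` `Client` does at the time of writing; `get_or_create_dataset` itself is a hypothetical helper.

```python
from typing import Any

def get_or_create_dataset(client: Any, name: str, description: str = "") -> Any:
    """Reuse the dataset called `name` if it exists, otherwise create it."""
    if client.has_dataset(dataset_name=name):
        return client.read_dataset(dataset_name=name)
    return client.create_dataset(dataset_name=name, description=description)
```

A call such as `get_or_create_dataset(client, "bakery-ai-test-cases", "Test cases for BakeryAI agent")` would replace the direct `client.create_dataset(...)` call. Note that the loop adding examples would still append duplicates on a re-run unless it is skipped when the dataset already exists.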


## 7. Running Evaluations

In [11]:
from langsmith.evaluation import evaluate

def run_evaluation():
    """Run evaluation on test dataset"""

    if not client:
        print("⚠️  LangSmith client not available")
        return

    # Define evaluation function
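    def my_evaluator(run, example):
        """Simple evaluator - checks if output contains expected keywords"""
        # run.outputs can be None when the run failed, so fall back to an empty dict
        output = (run.outputs or {}).get("output", "")
        expected = (example.outputs or {}).get("expected", "")

        # Loose check: passes if ANY word from the expected note appears in the
        # output, including filler words such as "should" or "and"
        score = 1.0 if any(word in output.lower() for word in expected.lower().split()) else 0.0

        return {"key": "keyword_match", "score": score}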

    # Run evaluation
    try:
        results = evaluate(
            lambda inputs: agent_executor.invoke(inputs),
            data="bakery-ai-test-cases",
            evaluators=[my_evaluator],
            experiment_prefix="bakery-eval"
        )

        print("✅ Evaluation complete!")
        print(f"   Results: {results}")
    except Exception as e:
        print(f"⚠️  Evaluation failed: {e}")

# Run evaluation
print("🧪 Running evaluation...")
run_evaluation()

🧪 Running evaluation...
View the evaluation results for experiment: 'bakery-eval-4ea4e596' at:
https://smith.langchain.com/o/570e9dfa-bc13-4000-96aa-f798e00a4212/datasets/42b09b39-211b-4e26-891e-8a177e68407b/compare?selectedSessions=36d29c66-3ff7-475d-b029-a0ec498e1eae





✅ Evaluation complete!
   Results: <ExperimentResults bakery-eval-4ea4e596>
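
The keyword check in `my_evaluator` matches any substring of the expected note, so it tends to score 1.0 almost every time. A stricter variant is sketched below: it takes an explicit list of keywords and returns the fraction found as whole words. `keyword_score` is a hypothetical helper, and it assumes each dataset example would store such a keyword list, which the dataset created earlier does not.

```python
import re

def keyword_score(output: str, keywords: list[str]) -> float:
    """Fraction of keywords that appear as whole words in output (case-insensitive)."""
    if not keywords:
        return 0.0
    words = set(re.findall(r"[a-z0-9_]+", output.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords)

answer = "Refunds are issued as store credit within 24 hours of purchase."
print(keyword_score(answer, ["24", "hours", "credit"]))  # all three present
print(keyword_score(answer, ["cash", "credit"]))         # one of two present
```

Inside an evaluator, the value would be returned as `{"key": "keyword_match", "score": keyword_score(output, keywords)}`.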


## 8. Custom Tracing for Complex Workflows

In [13]:
from langsmith import traceable
import time

# Create separate functions for each step
@traceable(name="validate_customer")
def validate_customer(customer_id: str):
    """Validate customer"""
    time.sleep(0.1)  # Simulate
    return True

@traceable(name="check_inventory")
def check_inventory_step(product: str):
    """Check inventory for product"""
    # Replace with your actual check_inventory.invoke() call
    return {"available": True, "quantity": 100}

@traceable(name="calculate_price")
def calculate_price_step(product: str, quantity: int):
    """Calculate price for order"""
    # Replace with your actual calculate_price.invoke() call
    base_price = 25.0 if "Cake" in product else 10.0
    return {"total": base_price * quantity, "unit_price": base_price}

@traceable(name="create_order")
def create_order_step(customer_id: str, product: str, quantity: int):
    """Create order record"""
    time.sleep(0.1)  # Simulate
    return f"ORD{int(time.time())}"

@traceable(name="order_processing_workflow")
def process_order(customer_id: str, product: str, quantity: int):
    """Complete order processing with custom tracing"""

    results = {}

    # Step 1: Validate customer
    results['customer_valid'] = validate_customer(customer_id)

    # Step 2: Check inventory
    results['inventory'] = check_inventory_step(product)

    # Step 3: Calculate price
    results['price'] = calculate_price_step(product, quantity)

    # Step 4: Create order
    results['order_id'] = create_order_step(customer_id, product, quantity)

    return results

# Test
result = process_order(
    customer_id="CUST123",
    product="Chocolate Cake",
    quantity=2
)

print("Order Processing Result:")
for key, value in result.items():
    print(f"  {key}: {value}")

print("\n✅ Check LangSmith - you'll see hierarchical trace of all steps!")

Order Processing Result:
  customer_valid: True
  inventory: {'available': True, 'quantity': 100}
  price: {'total': 50.0, 'unit_price': 25.0}
  order_id: ORD1761432412

✅ Check LangSmith - you'll see hierarchical trace of all steps!


## 9. Performance Monitoring

In [14]:
import time

class PerformanceMonitor:
    """Monitor application performance"""

    def __init__(self):
        self.metrics = {
            'total_requests': 0,
            'total_latency': 0,
            'errors': 0,
            'success': 0
        }

    @traceable(name="monitored_request")
    def process_request(self, request_type: str, data: dict):
        """Process request with monitoring"""
        start_time = time.time()

        try:
            # Process based on type
            if request_type == "chat":
                result = agent_executor.invoke(data)
            else:
                result = {"output": "Unknown request type"}

            # Update metrics
            self.metrics['success'] += 1
            latency = time.time() - start_time
            self.metrics['total_latency'] += latency
            self.metrics['total_requests'] += 1

            return {
                "success": True,
                "result": result,
                "latency": latency
            }

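        except Exception as e:
            # Count the latency of failed requests too, so avg_latency stays accurate
            latency = time.time() - start_time
            self.metrics['errors'] += 1
            self.metrics['total_latency'] += latency
            self.metrics['total_requests'] += 1

            return {
                "success": False,
                "error": str(e),
                "latency": latency
            }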

    def get_stats(self):
        """Get performance statistics"""
        avg_latency = (self.metrics['total_latency'] / self.metrics['total_requests']
                      if self.metrics['total_requests'] > 0 else 0)

        success_rate = (self.metrics['success'] / self.metrics['total_requests'] * 100
                       if self.metrics['total_requests'] > 0 else 0)

        return {
            'total_requests': self.metrics['total_requests'],
            'success_rate': f"{success_rate:.1f}%",
            'avg_latency': f"{avg_latency:.2f}s",
            'errors': self.metrics['errors']
        }

# Test monitoring
monitor = PerformanceMonitor()

test_requests = [
    {"input": "Check chocolate cake inventory"},
    {"input": "Calculate price for 2 vanilla cakes"},
    {"input": "What is your refund policy?"}
]

print("📊 Running monitored requests...\n")

for req in test_requests:
    result = monitor.process_request("chat", req)
    print(f"✅ Request: {req['input'][:40]}...")
    print(f"   Latency: {result['latency']:.2f}s\n")

# Show statistics
stats = monitor.get_stats()
print("\n📈 Performance Statistics:")
for key, value in stats.items():
    print(f"   {key}: {value}")

📊 Running monitored requests...

✅ Request: Check chocolate cake inventory...
   Latency: 1.46s

✅ Request: Calculate price for 2 vanilla cakes...
   Latency: 1.48s

✅ Request: What is your refund policy?...
   Latency: 3.11s


📈 Performance Statistics:
   total_requests: 3
   success_rate: 100.0%
   avg_latency: 2.02s
   errors: 0
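
An average can hide slow outliers: here one request took about twice as long as the other two. Percentiles make that visible. The sketch below uses the nearest-rank method and assumes the monitor would keep a list of individual latencies, which `PerformanceMonitor` does not currently do; `percentile` is a hypothetical helper.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies = [1.46, 1.48, 3.11]  # the three latencies printed above
print("p50:", percentile(latencies, 50))
print("p95:", percentile(latencies, 95))
```

With only three samples the p95 is simply the slowest request; the measure becomes more informative as the number of requests grows.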


## 10. LangSmith Dashboard Overview

**In the LangSmith dashboard, you can:**

### 📊 Traces
- See every LLM call with inputs/outputs
- Token usage per call
- Latency breakdown
- Tool executions

### 📈 Analytics
- Request volume over time
- Average latency trends
- Error rates
- Cost tracking

### 🧪 Datasets
- Store test cases
- Version control
- Regression testing

### ⭐ Feedback
- User ratings
- Comments
- Track improvements

### 🎯 Experiments
- A/B test prompts
- Compare models
- Evaluate performance

**Go explore:** https://smith.langchain.com

## Summary: What We Built

### ✅ Session 4.1 Achievements:

1. **LangSmith Setup**: Configured tracing for all chains
2. **Custom Tracing**: Added metadata and tags
3. **Agent Tracing**: Monitored tool calls and reasoning
4. **User Feedback**: Captured ratings and comments
5. **Test Datasets**: Created evaluation suites
6. **Evaluations**: Automated quality testing
7. **Complex Workflows**: Hierarchical tracing
8. **Performance Monitoring**: Latency and success-rate metrics

### 🎯 Production Observability:

**Now you can:**
- ✅ See every step of agent execution
- ✅ Track token usage and costs
- ✅ Identify bottlenecks and errors
- ✅ Test changes with datasets
- ✅ Collect user feedback
- ✅ Monitor performance over time

### 🚀 Next: Notebook 4.2

We'll deploy BakeryAI with **LangServe**:
- Create REST API endpoints
- FastAPI integration
- Async handling
- Production deployment
- Load balancing