# Project Session 1.1: Introduction to LangChain & BakeryAI Project Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Umx-ec4OzN63p7Xv--ZoopZ_Rxc1W_-_?usp=sharing)


## 🍰 Welcome to BakeryAI!

Throughout this course, we'll build **BakeryAI** - an intelligent customer service assistant for a bakery business. By the end of the course, you'll have a production-ready application that can:

- Answer customer questions about products
- Take orders conversationally
- Provide personalized recommendations
- Handle customer service inquiries
- Access company policies and procedures

### Session 1 Goals: Foundation

Today we'll build the foundation:

- ✅ Set up LangChain environment
- ✅ Load and explore bakery data
- ✅ Create a basic product information chatbot
- ✅ Build conversational memory

### The BakeryAI Data

We have:
- **Structured data**: CSV files with products, customers, orders
- **Unstructured data**: Company policies, handbooks, SOPs
- **Product catalogs**: Detailed information in various formats

## 1. Installation & Setup

In [None]:
# Install required packages
!pip install -q langchain langchain-openai langchain-core langchain-community
!pip install -q python-dotenv pandas


In [None]:
from google.colab import userdata
import os

# Set the OpenAI API key from a Colab secret, falling back to a default value
def set_openai_api_key(default_key: str = "YOUR_API_KEY") -> None:
    """Set OPENAI_API_KEY from the Colab secret MDX_OPENAI_API_KEY, or use default_key."""
    try:
        secret = userdata.get("MDX_OPENAI_API_KEY")
    except Exception:  # secret missing, or notebook access to it not granted
        secret = None
    os.environ["OPENAI_API_KEY"] = secret or default_key


set_openai_api_key()
#set_openai_api_key("sk-...")
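
Because the default is the placeholder string `YOUR_API_KEY`, a missing secret would otherwise only surface as an authentication error on the first model call. A minimal sketch of an early check follows; `require_api_key` and `PLACEHOLDER_KEY` are hypothetical names introduced here, not part of LangChain or Colab.

```python
import os

PLACEHOLDER_KEY = "YOUR_API_KEY"  # assumption: same placeholder as the cell above


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or raise if it is missing or a placeholder."""
    key = os.environ.get(var_name, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(
            f"{var_name} is not set to a real key; add it as a Colab secret "
            "or pass it to set_openai_api_key()."
        )
    return key
```

Calling `require_api_key()` right after `set_openai_api_key()` fails fast with a readable message instead of a later API error.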

In [None]:
!git clone https://github.com/IvanReznikov/mdx-langchain-conclave

Cloning into 'mdx-langchain-conclave'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 19 (delta 1), reused 19 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (19/19), 239.34 KiB | 15.96 MiB/s, done.
Resolving deltas: 100% (1/1), done.


In [None]:
import os
import pandas as pd
from langchain_openai import ChatOpenAI

# Initialize LLM
llm = ChatOpenAI(model="gpt-5-nano")

print("✅ Environment loaded successfully!")

✅ Environment loaded successfully!


## 2. 🍰 Exploring BakeryAI Data

Let's load and understand our bakery data.

In [None]:
import pandas as pd

try:
    cakes_df = pd.read_csv('/content/mdx-langchain-conclave/data/cake_descriptions.csv', encoding='cp1252')
    customers_df = pd.read_csv('/content/mdx-langchain-conclave/data/customers.csv', encoding='cp1252')
    orders_df = pd.read_csv('/content/mdx-langchain-conclave/data/orders.csv', encoding='cp1252')

    print("✅ Data loaded successfully!\n")
    print(f"📊 Cakes: {len(cakes_df)} products")
    print(f"👥 Customers: {len(customers_df)} customers")
    print(f"📦 Orders: {len(orders_df)} orders")

except FileNotFoundError:
    print("⚠️  Data files not found. Please ensure the repository was cloned into /content/mdx-langchain-conclave.")
    print("Creating sample cake data for demonstration...\n")

    # Sample cake data for demonstration (customers_df and orders_df are not recreated here)
    cakes_df = pd.DataFrame({
        'Name': ['Chocolate Truffle Cake', 'Vanilla Bean Cake', 'Red Velvet Cake'],
        'Category': ['Chocolate', 'Vanilla', 'Specialty'],
        'Description': [
            'Rich chocolate cake with truffle filling',
            'Classic vanilla cake with bean specks',
            'Velvety red cake with cream cheese frosting'
        ],
        'Energy_kcal': [450, 380, 420],
        'Available': [True, True, True]
    })

✅ Data loaded successfully!

📊 Cakes: 22 products
👥 Customers: 50 customers
📦 Orders: 100 orders
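
The CSV files are read with `encoding='cp1252'`, a Windows code page. The sketch below uses only the standard library to show why the encoding matters: bytes written as cp1252 can fail to decode as UTF-8. `decode_with_fallback` is a hypothetical helper and the sample text is made up.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")):
    """Try each encoding in order and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"None of {encodings} could decode the data")


# Made-up sample: accented text saved by a Windows tool
raw = "Crème brûlée cake".encode("cp1252")
text, used = decode_with_fallback(raw)
print(used, text)
```

With pandas the same idea would be to try `pd.read_csv(path, encoding="utf-8")` first and fall back to `cp1252` when a `UnicodeDecodeError` is raised.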


## 3. Your First BakeryAI Query

Let's create a simple chatbot that can answer questions about our products.

In [None]:
from langchain_core.messages import SystemMessage, HumanMessage

# Create system prompt with product context
def create_bakery_context():
    """Create context from available cakes"""
    context = "Available Cakes:\n"
    for _, cake in cakes_df.iterrows():
        context += f"- {cake['Name']}: {cake['Description']}\n"
    return context

# System message for BakeryAI
system_prompt = f"""
You are BakeryAI, a helpful customer service assistant for a premium bakery.
You help customers learn about products, place orders, and answer questions.

{create_bakery_context()}

Be friendly, informative, and help customers find the perfect cake for their needs.
"""

print("System Prompt Created:")
print(system_prompt[:300] + "...")

System Prompt Created:

You are BakeryAI, a helpful customer service assistant for a premium bakery.
You help customers learn about products, place orders, and answer questions.

Available Cakes:
- Torta della Nonna Amore: A nostalgic Tuscan-inspired cake with creamy ricotta and lemon warmth
- Festiva della Sicilia: A zes...
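
The whole catalog goes into the system prompt, so the prompt grows with the number of products. A minimal sketch of a context builder with an optional cap is shown here; it assumes the records come from something like `cakes_df.to_dict("records")`, and `build_catalog_context` and `max_items` are hypothetical names.

```python
def build_catalog_context(records, max_items=None):
    """Format product records as a bulleted list for the system prompt.

    records: iterable of dicts with 'Name' and 'Description' keys.
    max_items: optional cap on how many products are listed.
    """
    records = list(records)
    shown = records if max_items is None else records[:max_items]
    lines = [f"- {r['Name']}: {r['Description']}" for r in shown]
    hidden = len(records) - len(shown)
    if hidden > 0:
        # Tell the model the list is incomplete rather than silently truncating
        lines.append(f"(+{hidden} more products not listed)")
    return "Available Cakes:\n" + "\n".join(lines) + "\n"
```

Building the list with `"\n".join(...)` also avoids repeated string concatenation inside the loop.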


In [None]:
# Test BakeryAI
def ask_bakery_ai(question):
    """Simple function to query BakeryAI"""
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=question)
    ]
    response = llm.invoke(messages)
    return response.content

# Test queries
questions = [
    "What chocolate cakes do you have?",
    "Tell me about your Red Velvet Cake",
    "Which cake would be good for a birthday?"
]

for q in questions:
    print(f"\n🙋 Customer: {q}")
    print(f"🍰 BakeryAI: {ask_bakery_ai(q)}")
    print("-" * 80)


🙋 Customer: What chocolate cakes do you have?
🍰 BakeryAI: We do have several chocolate-forward options. Here are our chocolate cakes:

- Ferrari Redline Fudge: Dense chocolate fudge cake with a bold red mirror glaze
- ChocoCaramel Birthday Surprise: Rich chocolate with hidden caramel and a crunchy surprise
- Black Forest Cake: Classic chocolate base with cream and cherries
- Sacher Torte: Austrian dark chocolate cake with fruity jam filling
- Chocolate Truffle Cake: Deep, pure chocolate cake for chocolate lovers

Would you like more details on any of these, or help picking one for a specific occasion? Also, tell me your serving size and any dietary preferences, and I can tailor a recommendation.
--------------------------------------------------------------------------------

🙋 Customer: Tell me about your Red Velvet Cake
🍰 BakeryAI: Our Red Velvet Cake is a premium take on a Southern classic—moist, velvety, and perfectly balanced.

- Flavor and texture: A cocoa-kissed red velvet cake
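
Each call to `ask_bakery_ai` builds a fresh two-message list, so the model sees no earlier turns. The sketch below makes that explicit by passing the model in as a parameter, which also lets the function be exercised without an API call. `EchoLLM` is a hypothetical stand-in that only mimics the `.invoke(...).content` shape used above. LangChain chat models also accept `(role, content)` tuples in place of message objects, which is what `ask` sends.

```python
from types import SimpleNamespace


class EchoLLM:
    """Hypothetical stand-in for a chat model; records what it was asked."""

    def __init__(self, reply):
        self.reply = reply
        self.calls = []

    def invoke(self, messages):
        self.calls.append(messages)
        return SimpleNamespace(content=self.reply)


def ask(llm, system_prompt, question):
    """Send one system message and one human message; return the reply text."""
    messages = [("system", system_prompt), ("human", question)]
    return llm.invoke(messages).content


fake = EchoLLM("We have three chocolate cakes.")
print(ask(fake, "You are BakeryAI.", "What chocolate cakes do you have?"))
```

Swapping `fake` for the real `llm` object gives the same behavior as `ask_bakery_ai`.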

## 4. Enhancing with Structured Data

Let's make responses more data-driven by pulling specific information.

In [None]:
def get_cake_info(cake_name):
    """Get detailed information about a specific cake"""
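    # regex=False: treat the name as literal text, so characters like "(" or "+" cannot break the match
    cake = cakes_df[cakes_df['Name'].str.contains(cake_name, case=False, na=False, regex=False)]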
    if not cake.empty:
        return cake.iloc[0].to_dict()
    return None

def enhanced_bakery_response(question):
    """Enhanced response with structured data"""
    # Check if asking about specific cake
    for cake_name in cakes_df['Name']:
        if cake_name.lower() in question.lower():
            cake_info = get_cake_info(cake_name)
            if cake_info:
                enhanced_prompt = f"""
                Customer Question: {question}

                Specific Product Information:
                - Name: {cake_info['Name']}
                - Category: {cake_info['Category']}
                - Description: {cake_info['Description']}
                - Calories: {cake_info['Energy_kcal']} kcal

                Provide a helpful response using this information.
                """
                messages = [
                    SystemMessage(content="You are BakeryAI, a helpful bakery assistant."),
                    HumanMessage(content=enhanced_prompt)
                ]
                return llm.invoke(messages).content

    # Default response
    return ask_bakery_ai(question)

# Test enhanced version
print("🙋 Customer: Tell me more about the Chocolate Truffle Cake")
print(f"🍰 BakeryAI: {enhanced_bakery_response('Tell me more about the Chocolate Truffle Cake')}")

🙋 Customer: Tell me more about the Chocolate Truffle Cake
🍰 BakeryAI: Here’s a concise look at the Chocolate Truffle Cake:

- Name: Chocolate Truffle Cake
- Category: R
- Description: Deep, rich chocolate cake for purists
- Calories: 470 kcal per serving

What to expect:
- Flavor: Intense, pure chocolate experience designed for chocolate lovers.
- Best enjoyed with: A cup of coffee or espresso to enhance the richness. A scoop of vanilla ice cream or a light whipped cream can add contrast if you like.

Notes:
- Ingredients and allergen information aren’t listed here. If you have dietary restrictions, I can check or help you with alternatives.
- If you’d like, I can share available sizes, portions, or pull together a detailed ingredients list.

Would you like to know the full ingredient list, allergen details, or pricing/options for this cake?
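
`enhanced_bakery_response` returns on the first catalog name found in the question, so when one name is contained in another the result depends on row order. A minimal sketch that prefers the longest match is given below; `find_mentioned_cake` is a hypothetical helper, and "Truffle Cake" is a made-up name used to show the overlap.

```python
def find_mentioned_cake(question, names):
    """Return the longest catalog name that appears in the question, or None."""
    q = question.lower()
    matches = [n for n in names if n.lower() in q]
    return max(matches, key=len) if matches else None


names = ["Truffle Cake", "Chocolate Truffle Cake", "Red Velvet Cake"]
print(find_mentioned_cake("Tell me more about the chocolate truffle cake", names))
```

Passing `cakes_df['Name'].tolist()` as `names` would apply the same rule to the real catalog.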


## 5. Batch Processing for Product Descriptions

Generate marketing descriptions for the products, one request per cake (shown here for the first three).

In [None]:
# Generate marketing descriptions
def generate_marketing_copy(cake_name):
    prompt = f"Write a 2-sentence enticing marketing description for {cake_name}"
    return llm.invoke(prompt).content

# Generate for first 3 cakes
print("📝 Generated Marketing Copy:\n")
for cake in cakes_df['Name'].head(3):
    print(f"\n{cake}:")
    print(generate_marketing_copy(cake))

📝 Generated Marketing Copy:


Torta della Nonna Amore:
Indulge in Torta della Nonna Amore, a kiss of crema pasticcera cradled in a golden, flaky crust. Each slice, crowned with toasted pine nuts and a hint of lemon, delivers nostalgia, warmth, and irresistible Italian amore.

Festiva della Sicilia:
Dive into Festiva della Sicilia, where sun-soaked streets burst with music, color, and the irresistible scent of citrus, seafood, and almond pastries. Savor age-old flavors, wander through vibrant markets and timeless traditions, and let the island's warmth carry you into nights of dance, drama, and unforgettable memories.

Dubai Midnight Pistachio Fantasy:
Dubai Midnight Pistachio Fantasy seduces the senses with a lush blend of roasted pistachio, amber, and velvet vanilla that glows like a lantern-lit Dubai night. Let this intoxicating fragrance carry you into a dream of opulence and mystery, leaving a soft, unforgettable trail of midnight sweetness.
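
In the output above, the model describes Festiva della Sicilia as a festival and Dubai Midnight Pistachio Fantasy as a fragrance, because the prompt contains only the name and never says the product is a cake. A minimal sketch of a grounded prompt follows, combined with the `batch` method that LangChain runnables provide for sending several inputs in one call. `build_marketing_prompt` and `generate_marketing_copy_batch` are hypothetical names, and the prompt wording is an assumption.

```python
def build_marketing_prompt(name, description):
    """Build a prompt that states the product type and supplies the catalog description."""
    return (
        "Write a 2-sentence enticing marketing description for a bakery cake.\n"
        f"Cake name: {name}\n"
        f"Catalog description: {description}\n"
        "Stay consistent with the catalog description."
    )


def generate_marketing_copy_batch(llm, records):
    """Generate copy for several cakes in one batch call; returns {name: text}."""
    prompts = [build_marketing_prompt(r["Name"], r["Description"]) for r in records]
    replies = llm.batch(prompts)  # one reply per prompt, in the same order
    return {r["Name"]: reply.content for r, reply in zip(records, replies)}
```

A possible call would be `generate_marketing_copy_batch(llm, cakes_df.head(3).to_dict("records"))`.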


## 6. Streaming for Real-Time Responses

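Streaming prints the response piece by piece as it is generated, so the customer sees the answer start right away instead of waiting for the full reply.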

In [None]:
def stream_bakery_response(question):
    """Stream response to customer"""
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=question)
    ]

    print("🍰 BakeryAI: ", end="", flush=True)
    for chunk in llm.stream(messages):
        print(chunk.content, end="", flush=True)
    print("\n")

# Test streaming
print("🙋 Customer: Can you recommend a cake for a wedding anniversary?\n")
stream_bakery_response("Can you recommend a cake for a wedding anniversary?")

🙋 Customer: Can you recommend a cake for a wedding anniversary?

🍰 BakeryAI: Absolutely—congrats on the anniversary! Here are some elegant options that pair beautifully with a celebration vibe. I’ve included what makes each a good fit for an anniversary, plus what you might expect in flavor.

- Velvet Dream Cloud
  - Why it’s great: light, cloud-like texture with a refined, creamy finish. Feels luxurious but not heavy—perfect for a milestone “you two” moment.
  - Flavor note: soft sponge with subtle creaminess.

- Opera Cake
  - Why it’s great: classic and sophisticated French style with refined layers. Great for a timeless, elegant anniversary vibe.
  - Flavor note: almond sponge, coffee buttercream, chocolate glaze in layered structure.

- Tiramisu Cake
  - Why it’s great: romantic and crowd-pleasing, with a familiar dessert charm that many couples love.
  - Flavor note: coffee-kissed layers, creamy mascarpone filling.

- Red Velvet Cake
  - Why it’s great: romantic red color, soft c
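
`stream_bakery_response` prints the chunks but returns nothing, so the reply cannot be stored afterwards, for example in a conversation history. A minimal sketch that prints and also collects the text is shown below; `stream_and_collect` is a hypothetical helper, and it assumes each chunk's `content` is a plain string.

```python
def stream_and_collect(llm, messages, prefix="🍰 BakeryAI: "):
    """Print chunks as they arrive and return the full reply text."""
    print(prefix, end="", flush=True)
    parts = []
    for chunk in llm.stream(messages):
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)  # assumption: content is a str
    print()
    return "".join(parts)
```

The returned string can then be logged or appended to a message list for the next turn.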

## 7. 📊 BakeryAI Analytics Helper

Use the LLM to help analyze our data.

In [None]:
# Get data summary
data_summary = f"""
We have {len(cakes_df)} cakes in our catalog.
Categories: {cakes_df['Category'].unique().tolist()}
Average calories: {cakes_df['Energy_kcal'].mean():.0f} kcal
"""

analysis_prompt = f"""
You are a business analyst for BakeryAI.

Data Summary:
{data_summary}

Provide 3 actionable insights about our product catalog.
"""

insights = llm.invoke(analysis_prompt)
print("📊 Business Insights:")
print(insights.content)

📊 Business Insights:
Here are 3 actionable insights to optimize your cake catalog given: 22 SKUs, categories G and R, and an average of 399 kcal per cake.

1) Optimize category mix for higher margin and faster turnover
- What it implies: With 2 categories (G and R) and 22 SKUs, the mix and performance by category likely drive most revenue and margin.
- Actions:
  - Quantify category-level sales and gross margin (G vs. R) and identify which category is underperforming or overrepresented.
  - If one category dominates, either: (a) prune 1–2 low-velocity SKUs and reallocate shelf space, or (b) accelerate 2–3 new SKUs in the underrepresented category that fill gaps (flavors, formats, regional preferences).
  - Run a 6–8 week pilot to test the impact of the adjusted mix and bundle promotions to shift demand toward the target category.

2) Introduce a clearer calorie segmentation to broaden appeal
- What it implies: The catalog averages 399 kcal, which sits in a mid-to-indulgent range. This 
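
The summary above passes only the category codes and a single average, which is why the reply hedges with words like "likely". Per-category figures give the model more to work with. A minimal standard-library sketch follows; `summarize_by_category` is a hypothetical helper and the sample rows are made up, reusing the column names from this notebook.

```python
from collections import defaultdict


def summarize_by_category(records):
    """Return {category: {'count': n, 'avg_kcal': mean}} from product records."""
    groups = defaultdict(list)
    for r in records:
        groups[r["Category"]].append(r["Energy_kcal"])
    return {
        cat: {"count": len(vals), "avg_kcal": round(sum(vals) / len(vals))}
        for cat, vals in groups.items()
    }


# Made-up sample rows using the column names from this notebook
sample = [
    {"Category": "G", "Energy_kcal": 380},
    {"Category": "R", "Energy_kcal": 470},
    {"Category": "R", "Energy_kcal": 430},
]
print(summarize_by_category(sample))
```

With the DataFrame already loaded, `cakes_df.groupby('Category')['Energy_kcal'].agg(['count', 'mean'])` produces the same kind of table.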

## 8. 🎯 Exercise 1: Build a Product Recommender

**Task**: Create a function that:
1. Takes customer preferences (e.g., "chocolate lover", "health conscious")
2. Analyzes our product catalog
3. Recommends the top 2 cakes with reasoning
4. Returns structured output

In [None]:
def recommend_cakes(customer_preference):
    """
    Recommend cakes based on customer preferences

    Args:
        customer_preference: Description of customer preferences

    Returns:
        dict: Recommendations with reasoning
    """
    # TODO: Implement this function
    # Hint: Pass cake catalog to LLM and ask for recommendations
    pass

# Test your function
# print(recommend_cakes("I love chocolate and want something rich"))
# print(recommend_cakes("Looking for something light and not too sweet"))
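
As a hint for step 4 only (not a full solution), one way to get structured output is to ask the model for JSON and parse the reply. The sketch below assumes the prompt requested the shape `{"recommendations": [{"name": ..., "reason": ...}]}`; that shape and the helper name `parse_recommendations` are assumptions made for illustration.

```python
import json


def parse_recommendations(raw, max_items=2):
    """Parse a JSON reply into a list of {'name', 'reason'} dicts."""
    text = raw.strip()
    # Models sometimes add text or a code fence around the JSON; keep only the object
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in the reply")
    data = json.loads(text[start:end + 1])
    items = data.get("recommendations", [])
    return [
        {"name": str(i["name"]), "reason": str(i["reason"])}
        for i in items[:max_items]
    ]
```

`recommend_cakes` could then return `{"recommendations": parse_recommendations(reply_text)}` to satisfy the `dict` return type in the docstring.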

## 9. 🎯 Exercise 2: Order Intent Classifier

**Task**: Build a classifier that determines if a message is:
- A product inquiry
- An order attempt
- A complaint
- A general question

In [None]:
def classify_customer_intent(message):
    """
    Classify the intent of a customer message

    Args:
        message: Customer message

    Returns:
        str: Intent category
    """
    # TODO: Implement intent classification
    pass

# Test messages
# test_messages = [
#     "What flavors of cake do you have?",
#     "I'd like to order a chocolate cake for tomorrow",
#     "My order arrived damaged",
#     "What are your opening hours?"
# ]
# for msg in test_messages:
#     print(f"Message: {msg}")
#     print(f"Intent: {classify_customer_intent(msg)}\n")
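
As a hint (not a full solution): a model asked to classify a message may reply with extra words, punctuation, or different casing. The sketch below maps a free-text reply onto a fixed label set. The label strings in `INTENTS` and the helper `normalize_intent` are assumptions chosen to mirror the four categories in the task.

```python
INTENTS = ("product_inquiry", "order_attempt", "complaint", "general_question")


def normalize_intent(raw, default="general_question"):
    """Map a free-text model reply onto one of the four intent labels."""
    cleaned = raw.strip().lower().replace(" ", "_").replace("-", "_")
    for intent in INTENTS:
        if intent in cleaned:
            return intent
    return default  # fall back when no known label appears in the reply


print(normalize_intent("Intent: Order attempt."))
```

Falling back to `general_question` keeps `classify_customer_intent` returning a valid category even when the reply is unexpected.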

## Summary: What We Built

### ✅ Session 1.1 Achievements:

1. **Environment Setup**: LangChain + BakeryAI data loaded
2. **Basic Chatbot**: Answers product questions
3. **Data Integration**: Pulls structured information from CSV
4. **Enhanced Responses**: Context-aware product information
5. **Streaming**: Real-time customer interaction

### 🚀 BakeryAI Progress: 20%

```
[████░░░░░░░░░░░░░░░░] 20%
```

### Next Steps:

In the remaining Session 1 notebooks, we'll add:
- **Notebook 1.2**: Multi-provider support & embeddings for semantic search
- **Notebook 1.3**: Prompt templates for consistent customer interactions
- **Notebook 1.4**: Conversation memory to remember customer context

**🎯 End Goal**: A complete customer service chatbot that remembers conversations, searches products semantically, and handles multiple interaction types!