# Getting Started with Agentune Simulate

This notebook demonstrates how to use the Agentune Simulate library to create realistic customer-agent conversations using RAG (Retrieval-Augmented Generation) techniques.

## What you'll learn:
- Load conversation data from CSV files
- Set up vector stores for RAG-based simulation
- Run simulations with both in-memory and persistent storage
- Analyze simulation results

## Installation

First, install the required dependencies:

In [None]:
!pip install langchain-chroma langchain-openai nest-asyncio pandas
!pip install agentune-simulate

## Import Required Libraries

In [None]:
import logging
import nest_asyncio
from pathlib import Path
import pandas as pd
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from agentune.simulate.models import Outcomes
from agentune.simulate.rag import conversations_to_langchain_documents
from agentune.simulate.simulation.session_builder import SimulationSessionBuilder

# Import example utilities
from utils import load_conversations_from_csv, extract_outcomes_from_conversations

## Setup and API Key Configuration

This example uses OpenAI models, but any LangChain-compatible LLM can be used. Configure your API key for the model provider you choose:

In [None]:
import os
import getpass

# Set up OpenAI API key
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

print("✓ API key configured")

In [None]:
# Configure basic logging to see simulation progress
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')
logging.getLogger('httpx').setLevel(logging.WARNING)

# Fix asyncio event loop for Jupyter notebooks, to allow async code execution
nest_asyncio.apply()

print("✓ Logging configured")
print("✓ Asyncio event loop configured for Jupyter")

## Load and Explore Sample Data

**Important**: Whatever your data format or source, you must convert your data to `Conversation` objects for Agentune Simulate to work.

This example shows loading from CSV format:

In [None]:
# load_conversations_from_csv is an example utility function that converts CSV data to Conversation objects
# Example data is based on dhc2 dataset
# You need to implement a similar function for your data format and schema
conversations = load_conversations_from_csv("data/sample_conversations.csv")

print(f"Loaded {len(conversations)} conversations")
print(f"Sample conversation has {len(conversations[0].messages)} messages")
print(f"First message: {conversations[0].messages[0].content[:100]}...")
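If your data uses a different file layout, the core of a custom loader is grouping message rows by conversation and ordering them by time. The sketch below shows only that grouping step, using the standard library. `SimpleMessage`, `SimpleConversation`, and `group_rows_into_conversations` are hypothetical stand-ins for illustration; a real loader would build the library's `Conversation` objects instead, and the column names are assumed to match the sample CSV.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SimpleMessage:
    """Hypothetical stand-in for a message; not the library's model."""
    sender: str
    content: str
    timestamp: datetime


@dataclass(frozen=True)
class SimpleConversation:
    """Hypothetical stand-in for a conversation; not the library's model."""
    conversation_id: str
    messages: tuple


def group_rows_into_conversations(csv_file):
    """Group message rows by conversation_id and sort each group by timestamp."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["conversation_id"]].append(
            SimpleMessage(
                sender=row["sender"],
                content=row["content"],
                timestamp=datetime.fromisoformat(row["timestamp"]),
            )
        )
    return [
        SimpleConversation(conv_id, tuple(sorted(msgs, key=lambda m: m.timestamp)))
        for conv_id, msgs in grouped.items()
    ]


# Tiny in-memory CSV with made-up rows, standing in for a real file
sample_csv = io.StringIO(
    "conversation_id,sender,content,timestamp\n"
    "c1,agent,How can I help?,2024-01-01T10:00:05\n"
    "c1,customer,Hello,2024-01-01T10:00:00\n"
    "c2,customer,Hi there,2024-01-02T09:00:00\n"
)
sample_conversations = group_rows_into_conversations(sample_csv)
print(f"Grouped {len(sample_conversations)} conversations")
```

Sorting inside each group means the input rows do not need to arrive in chronological order.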

In [None]:
# Explore the data structure
df = pd.read_csv("data/sample_conversations.csv")
print("Dataset overview:")
print(f"- Total messages: {len(df)}")
print(f"- Unique conversations: {df['conversation_id'].nunique()}")
print(f"- Message distribution: {df['sender'].value_counts().to_dict()}")
print(f"- Outcome distribution: {df['outcome_name'].value_counts().to_dict()}")

df.head()

## Extract Outcomes for Simulation

Extract unique outcomes that our simulation will try to achieve:

In [None]:
# Extract unique outcomes from conversations
# Alternatively, you can define outcomes manually if you know them in advance
unique_outcomes = extract_outcomes_from_conversations(conversations)
outcomes = Outcomes(outcomes=tuple(unique_outcomes))

print(f"Found {len(unique_outcomes)} unique outcomes:")
for outcome in unique_outcomes:
    print(f"- {outcome.name}: {outcome.description}")

**Note**: You can also define outcomes manually if you know them in advance, instead of extracting them from existing conversations.
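As a rough illustration of what outcome extraction involves, the sketch below deduplicates (name, description) pairs from message rows. The row dictionaries, the outcome names, and `unique_outcome_pairs` are made up for this example; the `extract_outcomes_from_conversations` utility may work differently, and it returns the library's own outcome objects, not plain tuples.

```python
def unique_outcome_pairs(rows):
    """Return distinct (outcome_name, outcome_description) pairs in first-seen order."""
    seen = {}
    for row in rows:
        name = row.get("outcome_name")
        # Skip rows without an outcome and names already recorded
        if name and name not in seen:
            seen[name] = row.get("outcome_description", "")
    return list(seen.items())


# Made-up rows: each message row repeats its conversation's outcome
example_rows = [
    {"outcome_name": "resolved", "outcome_description": "Issue was resolved"},
    {"outcome_name": "resolved", "outcome_description": "Issue was resolved"},
    {"outcome_name": "unresolved", "outcome_description": "Issue was not resolved"},
    {"outcome_name": "", "outcome_description": ""},
]
outcome_pairs = unique_outcome_pairs(example_rows)
print(outcome_pairs)
```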

## Demo 1: Quick Simulation with InMemoryVectorStore

Let's start with a simple in-memory vector store for quick demonstration:

In [None]:
# Setup models - OpenAI models work well, other LangChain-compatible models can also be used
# Note: gpt-4o has been tested and performs best for realistic conversations
chat_model = ChatOpenAI(model="gpt-4o", temperature=0.7)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Use all conversations to populate the vector store (simplified approach)
# Advanced: you could split data to reserve some conversations for validation
training_conversations = conversations
print(f"Using {len(training_conversations)} conversations to populate the vector store")
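The split mentioned in the comment above could look like the following minimal sketch. `split_conversations`, its 20% default, and the fixed seed are assumptions for illustration and are not part of the library; the function works on any list, so plain strings stand in for `Conversation` objects here.

```python
import random


def split_conversations(items, validation_fraction=0.2, seed=42):
    """Shuffle a copy of items and return (training, validation) lists."""
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Strings stand in for Conversation objects
example_items = [f"conversation-{i}" for i in range(10)]
train_part, validation_part = split_conversations(example_items)
print(f"{len(train_part)} for the vector store, {len(validation_part)} held out")
```

With a split like this, only the training part would go into the vector store, and the held-out part could serve as starting points for simulation.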

In [None]:
# Create in-memory vector store
documents = conversations_to_langchain_documents(training_conversations)
vector_store = InMemoryVectorStore.from_documents(documents, embeddings)

print(f"✓ Created in-memory vector store with {len(documents)} documents")

In [None]:
# Build and run simulation session
# SimulationSessionBuilder provides an opinionated, easy-to-use interface for creating simulation sessions
# For more advanced configuration, see the documentation
session = SimulationSessionBuilder(
    default_chat_model=chat_model,
    outcomes=outcomes,
    vector_store=vector_store,
    max_messages=10
).build()

# Run simulation - provide real conversations to use as starting points
base_conversations = conversations[:5]  # Use first 5 conversations as starting points
# 5 simulated conversations will be generated
result = await session.run_simulation(real_conversations=base_conversations)

print("✓ Simulation completed!")

In [None]:
# Analyze results using built-in methods
print(result.generate_summary())

## Demo 2: Persistent Storage with Chroma

For production use, you'll want persistent vector storage. Chroma is a popular vector store that keeps vector data persistently, so you can reuse it across sessions. Other LangChain-compatible vector stores can also be used.

Here's how to use Chroma:

In [None]:
# Create persistent Chroma vector store
persist_directory = "./chroma_db"

chroma_store = Chroma(
    collection_name="conversation_examples",
    embedding_function=embeddings,
    persist_directory=persist_directory
)

chroma_store.add_documents(documents)
print(f"✓ Added {len(documents)} documents to Chroma")
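Because the store persists between sessions, re-running the cell above may add the same documents a second time. One common way to make ingestion repeatable is to derive each document's ID from its content, so identical content always maps to the same ID. The sketch below shows only the ID derivation; `stable_document_ids` is a hypothetical helper, and passing the result as `ids=` to `add_documents` is an assumption you should check against the vector store's documentation.

```python
import hashlib


def stable_document_ids(texts):
    """Derive a deterministic ID for each text from its SHA-256 digest."""
    return [hashlib.sha256(text.encode("utf-8")).hexdigest() for text in texts]


# Made-up texts standing in for document contents
example_texts = ["customer: Hello", "agent: How can I help?"]
ids_first_run = stable_document_ids(example_texts)
ids_second_run = stable_document_ids(example_texts)
print(ids_first_run == ids_second_run)

# Possible usage (assumption, not verified here):
# ids = stable_document_ids([doc.page_content for doc in documents])
# chroma_store.add_documents(documents, ids=ids)
```

Identical texts produce identical IDs, so deduplicate the documents first if your data can contain repeated content.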

In [None]:
# Build session with Chroma vector store (max_messages is not set here, so the builder's default applies)
chroma_session = SimulationSessionBuilder(
    default_chat_model=chat_model,
    outcomes=outcomes,
    vector_store=chroma_store,
).build()

# Run simulation with Chroma
chroma_result = await chroma_session.run_simulation(real_conversations=base_conversations)

print("✓ Chroma simulation completed!")

In [None]:
# Print the Chroma results and save them
print("=== CHROMA RESULTS ===")
print(chroma_result.generate_summary())

# Save results to file using built-in method
output_file = "chroma_simulation_results.json"
chroma_result.save_to_file(output_file)
print(f"\n✓ Results saved to {Path(output_file).absolute()}")
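To start working with the saved file, it can help to inspect its top-level structure before writing any analysis code. The sketch below makes no assumption about the result schema beyond the file being valid JSON; `summarize_json_structure` is a hypothetical helper, and the demo file and its keys are made up.

```python
import json
import tempfile
from pathlib import Path


def summarize_json_structure(path):
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported under a single key
    return {"<root>": type(data).__name__}


# Demonstrate on a throwaway file with made-up keys
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_results.json"
    demo_path.write_text(json.dumps({"items": [1, 2], "label": "demo"}), encoding="utf-8")
    structure = summarize_json_structure(demo_path)
print(structure)
```

Calling `summarize_json_structure(output_file)` on the file saved above would list whatever top-level fields the library actually writes.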

## Next Steps

Now that you've seen the basics, you can:

1. **Use your own data**: Convert your conversations to the expected format and load them using `load_conversations_from_csv()` or implement a custom loader
2. **Scale up**: Run larger simulations with more conversations and longer interactions
3. **Analyze**: Use the structured results for deeper analysis and visualization

### Converting your data to the expected format:

If you have tabular data, convert it to a DataFrame with these columns:
- `conversation_id`: Unique identifier for each conversation
- `sender`: Either "customer" or "agent"
- `content`: The message content
- `timestamp`: ISO format timestamp
- `outcome_name`: Name of the conversation outcome
- `outcome_description`: Description of the outcome

Then use `load_conversations_from_csv()` to convert to `Conversation` objects.
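Before loading, a quick schema check can catch missing columns or unexpected values early. The following is a minimal sketch using only the standard library; `find_schema_problems` is a hypothetical helper, and the checks reflect the column list above, not validation performed by the library.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = {
    "conversation_id", "sender", "content",
    "timestamp", "outcome_name", "outcome_description",
}
ALLOWED_SENDERS = {"customer", "agent"}


def find_schema_problems(csv_file):
    """Return a list of human-readable problems found in the CSV."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    # Row numbering starts at 2 because the header is row 1
    for row_number, row in enumerate(reader, start=2):
        if row["sender"] not in ALLOWED_SENDERS:
            problems.append(f"row {row_number}: unexpected sender {row['sender']!r}")
        try:
            datetime.fromisoformat(row["timestamp"])
        except ValueError:
            problems.append(f"row {row_number}: timestamp is not ISO format")
    return problems


# Made-up rows for demonstration
header = "conversation_id,sender,content,timestamp,outcome_name,outcome_description\n"
good_csv_text = header + "c1,customer,Hello,2024-01-01T10:00:00,resolved,Issue was resolved\n"
bad_csv_text = header + "c1,bot,Hello,yesterday,resolved,Issue was resolved\n"

print(find_schema_problems(io.StringIO(good_csv_text)))
print(find_schema_problems(io.StringIO(bad_csv_text)))
```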

### Resources:
- [Documentation](../README.md)
- [Streamlit web interface](../streamlit/)