# Contextual AI Multi-Turn vs Single-Turn Chat Testing

This notebook demonstrates the difference between multi-turn and single-turn agents in Contextual AI by creating two fresh agents.

**What we're testing:**
- **MultiTurnAgent**: created with multi-turn enabled; should remember conversation context
- **SingleTurnAgent**: created with multi-turn disabled; should NOT remember previous messages
- **Recurring conversations**: a previous conversation can be continued by passing its `conversation_id`

**Test approach:**
1. Create two new agents with different multi-turn settings
2. Tell both agents some simple facts (favorite color and pet names) 
3. Ask if they remember those facts
4. Test recurring conversation capability

**Expected results:**
- MultiTurnAgent should remember facts across messages
- SingleTurnAgent should forget facts between messages

In [2]:
%pip install contextual-client 

Collecting contextual-client
  Using cached contextual_client-0.8.0-py3-none-any.whl.metadata (16 kB)
Collecting matplotlib
  Downloading matplotlib-3.10.7-cp312-cp312-macosx_11_0_arm64.whl.metadata (11 kB)
Collecting anyio<5,>=3.5.0 (from contextual-client)
  Using cached anyio-4.11.0-py3-none-any.whl.metadata (4.1 kB)
Collecting distro<2,>=1.7.0 (from contextual-client)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from contextual-client)
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting pydantic<3,>=1.9.0 (from contextual-client)
  Downloading pydantic-2.12.2-py3-none-any.whl.metadata (85 kB)
Collecting sniffio (from contextual-client)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting typing-extensions<5,>=4.10 (from contextual-client)
  Using cached typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting idna>=2.8 (from anyio<5,>=3.5.0->contextual-client)
  Using cached idn

In [3]:
# Import required libraries
import time  # used for short pauses between queries
from contextual import ContextualAI

print("Libraries imported successfully!")

Libraries imported successfully!


## Configuration and Agent Creation

Set up the API credentials and create two new agents - one with multi-turn enabled and one without.

In [None]:
# Configuration - API Key only (we'll create agents dynamically)
API_KEY = "<your key here>"

# Initialize the Contextual AI client
client = ContextualAI(api_key=API_KEY)

print("Client initialized successfully!")
print("Ready to create agents...")

Client initialized successfully!
Ready to create agents...
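
Hardcoding the key in a notebook makes it easy to commit by accident. A minimal sketch of reading it from the environment instead; the helper `load_api_key` and the variable name `CONTEXTUAL_API_KEY` are assumptions made for illustration, not something this notebook defines:

```python
import os


def load_api_key(env=None, var_name="CONTEXTUAL_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is unset or blank.

    `var_name` is an assumed name; use whatever variable you export.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage in the cell above (not executed here):
# client = ContextualAI(api_key=load_api_key())
```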


In [5]:
# Placeholder datastore because an agent must have a datastore.
response = client.datastores.create(
    name="place-holder-datastore"
)
datastore_id = response.id

In [6]:
# Multi turn agent
multi_turn_agent = client.agents.create(
    name="MultiTurnAgent",
    datastore_ids=[datastore_id],
    system_prompt="",
    multiturn_system_prompt="",
    filter_prompt="",
    no_retrieval_system_prompt="",
    suggested_queries=[
        "Hi! I want to tell you that my favorite color is blue and I have 3 cats named Whiskers, Mittens, and Shadow",
        "What is my favorite color?",
        "How many cats do I have and what are their names?",
    ],
    agent_configs={
        "global_config": {
            "enable_multi_turn": True,
            "enable_filter": False,
            "enable_rerank": False,
            "should_check_retrieval_need": True,
        },
        "reformulation_config": {
            "enable_query_expansion": False,
            "enable_query_decomposition": False,
            "query_expansion_prompt": (""),
        },
        "generate_response_config": {
            "model": 'vertex_ai/gemini-2.0-flash-lite'
        }
    }
)

In [7]:
# Single turn agent
single_turn_agent = client.agents.create(
    name="SingleTurnAgent",
    datastore_ids=[datastore_id],
    system_prompt="",
    multiturn_system_prompt="",
    filter_prompt="",
    no_retrieval_system_prompt="",
    suggested_queries=[
        "Hi! I want to tell you that my favorite color is blue and I have 3 cats named Whiskers, Mittens, and Shadow",
        "What is my favorite color?",
        "How many cats do I have and what are their names?",
    ],
    agent_configs={
        "global_config": {
            "enable_multi_turn": False,
            "enable_filter": False,
            "enable_rerank": False,
            "should_check_retrieval_need": True,
        },
        "reformulation_config": {
            "enable_query_expansion": False,
            "enable_query_decomposition": False,
            "query_expansion_prompt": (""),
        },
        "generate_response_config": {
            "model": 'vertex_ai/gemini-2.0-flash-lite'
        }
    }
)
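
The two `agent_configs` dictionaries above differ only in `enable_multi_turn`. One way to make that explicit is to build both from a shared base. This is a sketch; `BASE_AGENT_CONFIGS` and `build_agent_configs` are hypothetical names, and the values are copied from the cells above:

```python
import copy

# Settings shared by both agents, copied from the create() calls above.
BASE_AGENT_CONFIGS = {
    "global_config": {
        "enable_multi_turn": True,
        "enable_filter": False,
        "enable_rerank": False,
        "should_check_retrieval_need": True,
    },
    "reformulation_config": {
        "enable_query_expansion": False,
        "enable_query_decomposition": False,
        "query_expansion_prompt": "",
    },
    "generate_response_config": {
        "model": "vertex_ai/gemini-2.0-flash-lite",
    },
}


def build_agent_configs(enable_multi_turn):
    """Return a fresh copy of the base config with only the multi-turn flag changed."""
    configs = copy.deepcopy(BASE_AGENT_CONFIGS)
    configs["global_config"]["enable_multi_turn"] = enable_multi_turn
    return configs


multi_turn_configs = build_agent_configs(True)
single_turn_configs = build_agent_configs(False)
```

Either dictionary can then be passed as `agent_configs=` to `client.agents.create`, so the only difference between the agents is the one being tested.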

## Test Values Setup

Define the specific values we'll use to test memory. These are simple, concrete facts that are easy to verify.

In [8]:
# Test values - these are the facts we'll share and test for memory
FAVORITE_COLOR = "blue"
CAT_COUNT = "3"
CAT_NAMES = ["Whiskers", "Mittens", "Shadow"]

# Initial message that contains all the facts
INITIAL_MESSAGE = f"Hi! I want to tell you that my favorite color is {FAVORITE_COLOR} and I have {CAT_COUNT} cats named {', '.join(CAT_NAMES)}."

print("Test values defined:")
print(f"- Favorite color: {FAVORITE_COLOR}")
print(f"- Number of cats: {CAT_COUNT}")
print(f"- Cat names: {', '.join(CAT_NAMES)}")
print(f"\nInitial message: {INITIAL_MESSAGE}")

Test values defined:
- Favorite color: blue
- Number of cats: 3
- Cat names: Whiskers, Mittens, Shadow

Initial message: Hi! I want to tell you that my favorite color is blue and I have 3 cats named Whiskers, Mittens, Shadow.


## Testing Multi-Turn Agent

Now we'll test the multi-turn enabled agent. This agent should remember our conversation context across multiple messages.

In [9]:
# Step 1: Send initial message to multi-turn agent (no conversation_id yet)
# This is the first message, so we don't have a conversation_id yet
# The API will create one and return it to us

print("=" * 60)
print(" MULTI-TURN AGENT - MESSAGE 1: Sharing facts ")
print("=" * 60)

response_1 = client.agents.query.create(
    agent_id=multi_turn_agent.id,
    messages=[{"content": INITIAL_MESSAGE, "role": "user"}]
)

# Get the conversation_id from the response - this is key for multi-turn!
conversation_id = getattr(response_1, 'conversation_id', None)

print(f"USER: {INITIAL_MESSAGE}")
print(f"\nASSISTANT: {response_1.message.content}")
print(f"\nConversation ID received: {conversation_id}")

# Store the response for later validation
multi_turn_response_1 = response_1.message.content

 MULTI-TURN AGENT - MESSAGE 1: Sharing facts 
USER: Hi! I want to tell you that my favorite color is blue and I have 3 cats named Whiskers, Mittens, Shadow.

ASSISTANT: That's wonderful! I love blue too. And three cats? That sounds like a lot of fun! Whiskers, Mittens, and Shadow are great names. Thanks for sharing! 


Conversation ID received: 9beba569-b8e2-471a-8f99-bc121f32c216


In [10]:
# Step 2: Test if multi-turn agent remembers favorite color
# Now we use the conversation_id from the previous message
# This should allow the agent to remember our previous conversation

time.sleep(1)  # Brief pause between messages

print("\n" + "=" * 60)
print(" MULTI-TURN AGENT - MESSAGE 2: Testing color memory ")
print("=" * 60)

color_question = "What is my favorite color?"

response_2 = client.agents.query.create(
    agent_id=multi_turn_agent.id,
    messages=[{"content": color_question, "role": "user"}],
    conversation_id=conversation_id  # This is the key - using the same conversation_id!
)

print(f"USER: {color_question}")
print(f"\nASSISTANT: {response_2.message.content}")

# Store the response for later validation
multi_turn_response_2 = response_2.message.content

# Check if the response contains our expected value
if FAVORITE_COLOR.lower() in multi_turn_response_2.lower():
    print(f"\n✓ SUCCESS: Agent remembered favorite color ({FAVORITE_COLOR})")
    color_test_passed = True
else:
    print(f"\n✗ FAILED: Agent did not remember favorite color ({FAVORITE_COLOR})")
    color_test_passed = False


 MULTI-TURN AGENT - MESSAGE 2: Testing color memory 
USER: What is my favorite color?

ASSISTANT: Your favorite color is blue! You told me that earlier. 😊


✓ SUCCESS: Agent remembered favorite color (blue)
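
The pattern in the two cells above (send the first message without a `conversation_id`, then reuse the returned ID on every later message) can be wrapped in a small helper. This is a sketch; the `Conversation` class is hypothetical, and it only assumes the `client.agents.query.create` call shape already used in this notebook:

```python
class Conversation:
    """Remembers the conversation_id from the first reply and reuses it afterwards."""

    def __init__(self, client, agent_id):
        self.client = client
        self.agent_id = agent_id
        self.conversation_id = None  # unknown until the first response arrives

    def send(self, text):
        """Send one user message and return the assistant's reply text."""
        kwargs = {
            "agent_id": self.agent_id,
            "messages": [{"content": text, "role": "user"}],
        }
        # Only pass conversation_id once the API has given us one.
        if self.conversation_id is not None:
            kwargs["conversation_id"] = self.conversation_id
        response = self.client.agents.query.create(**kwargs)
        if self.conversation_id is None:
            self.conversation_id = getattr(response, "conversation_id", None)
        return response.message.content


# Usage (not executed here):
# chat = Conversation(client, multi_turn_agent.id)
# chat.send(INITIAL_MESSAGE)
# print(chat.send("What is my favorite color?"))
```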


In [11]:
# Step 3: Test if multi-turn agent remembers cats
# Continue using the same conversation_id to test memory of cats

time.sleep(1)  # Brief pause between messages

print("\n" + "=" * 60)
print(" MULTI-TURN AGENT - MESSAGE 3: Testing cats memory ")
print("=" * 60)

cats_question = "How many cats do I have and what are their names?"

response_3 = client.agents.query.create(
    agent_id=multi_turn_agent.id,
    messages=[{"content": cats_question, "role": "user"}],
    conversation_id=conversation_id  # Same conversation_id again!
)

print(f"USER: {cats_question}")
print(f"\nASSISTANT: {response_3.message.content}")

# Store the response for later validation
multi_turn_response_3 = response_3.message.content

# Check if the response contains all our expected cat values
expected_cat_values = [CAT_COUNT] + CAT_NAMES
response_lower = multi_turn_response_3.lower()

found_cat_values = []
missing_cat_values = []

for value in expected_cat_values:
    if value.lower() in response_lower:
        found_cat_values.append(value)
    else:
        missing_cat_values.append(value)

if not missing_cat_values:
    print(f"\n✓ SUCCESS: Agent remembered all cat details")
    print(f"  Found: {found_cat_values}")
    cats_test_passed = True
else:
    print(f"\n✗ FAILED: Agent did not remember all cat details")
    print(f"  Found: {found_cat_values}")
    print(f"  Missing: {missing_cat_values}")
    cats_test_passed = False


 MULTI-TURN AGENT - MESSAGE 3: Testing cats memory 
USER: How many cats do I have and what are their names?

ASSISTANT: I am a language model, and I don't have any cats (or any pets at all!). You, on the other hand, have three cats named Whiskers, Mittens, and Shadow.


✗ FAILED: Agent did not remember all cat details
  Found: ['Whiskers', 'Mittens', 'Shadow']
  Missing: ['3']
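
The failure above is a limitation of the substring check, not of the agent's memory: the response says "three" while the check looks for the numeral "3". A more tolerant check could accept either form and match whole words only. This is a sketch; `NUMBER_WORDS`, `mentions`, and `split_found_missing` are hypothetical helpers, not part of the notebook:

```python
import re

# Spelled-out forms for small numerals; extend as needed.
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}


def mentions(response, value):
    """True if `value`, or its spelled-out form for a small numeral, appears as a whole word."""
    candidates = [value]
    if value in NUMBER_WORDS:
        candidates.append(NUMBER_WORDS[value])
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in candidates) + r")\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def split_found_missing(response, expected_values):
    """Partition expected values into those mentioned in the response and those not."""
    found = [v for v in expected_values if mentions(response, v)]
    missing = [v for v in expected_values if v not in found]
    return found, missing


# The response recorded above, checked with the tolerant matcher.
recorded = (
    "I am a language model, and I don't have any cats (or any pets at all!). "
    "You, on the other hand, have three cats named Whiskers, Mittens, and Shadow."
)
found, missing = split_found_missing(recorded, ["3", "Whiskers", "Mittens", "Shadow"])
print(found, missing)
```

Whole-word matching also avoids counting the "3" inside a number such as "13" as a hit.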


## Testing Single-Turn Agent

Now we'll test the single-turn agent with the same questions. This agent should NOT remember previous messages.

In [12]:
# Step 4: Send initial message to single-turn agent
# We share the same facts with the single-turn agent
# Note: We don't use conversation_id here since it's single-turn

time.sleep(2)  # Pause before switching agents

print("\n" + "=" * 60)
print(" SINGLE-TURN AGENT - MESSAGE 1: Sharing facts ")
print("=" * 60)

response_single_1 = client.agents.query.create(
    agent_id=single_turn_agent.id,
    messages=[{"content": INITIAL_MESSAGE, "role": "user"}]
)

print(f"USER: {INITIAL_MESSAGE}")
print(f"\nASSISTANT: {response_single_1.message.content}")

# Store the response
single_turn_response_1 = response_single_1.message.content


 SINGLE-TURN AGENT - MESSAGE 1: Sharing facts 
USER: Hi! I want to tell you that my favorite color is blue and I have 3 cats named Whiskers, Mittens, Shadow.

ASSISTANT: That's wonderful! It's great to know your favorite color is blue. And it sounds like you have a lovely little feline family with Whiskers, Mittens, and Shadow. Thanks for sharing! 



In [13]:
# Step 5: Test if single-turn agent remembers favorite color
# The agent should NOT recall the color, because single-turn agents don't remember previous messages

time.sleep(1)  # Brief pause between messages

print("\n" + "=" * 60)
print(" SINGLE-TURN AGENT - MESSAGE 2: Testing color memory (should fail) ")
print("=" * 60)

response_single_2 = client.agents.query.create(
    agent_id=single_turn_agent.id,
    messages=[{"content": color_question, "role": "user"}]
    # Note: No conversation_id - each message is independent
)

print(f"USER: {color_question}")
print(f"\nASSISTANT: {response_single_2.message.content}")

# Store the response
single_turn_response_2 = response_single_2.message.content

# Check if the response contains our expected value (should NOT)
if FAVORITE_COLOR.lower() in single_turn_response_2.lower():
    print(f"\n✗ UNEXPECTED: Single-turn agent remembered favorite color ({FAVORITE_COLOR})")
    single_color_test_passed = False  # This is bad - it shouldn't remember
else:
    print(f"\n✓ EXPECTED: Single-turn agent correctly forgot favorite color")
    single_color_test_passed = True  # This is good - it should forget


 SINGLE-TURN AGENT - MESSAGE 2: Testing color memory (should fail) 
USER: What is my favorite color?

ASSISTANT: As a large language model, I do not have access to your personal information, including your favorite color. I cannot know what your favorite color is.


✓ EXPECTED: Single-turn agent correctly forgot favorite color


In [14]:
# Step 6: Test if single-turn agent remembers cats
# The agent should NOT recall the cats either, because single-turn agents don't remember previous messages

time.sleep(1)  # Brief pause between messages

print("\n" + "=" * 60)
print(" SINGLE-TURN AGENT - MESSAGE 3: Testing cats memory (should fail) ")
print("=" * 60)

response_single_3 = client.agents.query.create(
    agent_id=single_turn_agent.id,
    messages=[{"content": cats_question, "role": "user"}]
    # Note: Still no conversation_id - each message is independent
)

print(f"USER: {cats_question}")
print(f"\nASSISTANT: {response_single_3.message.content}")

# Store the response
single_turn_response_3 = response_single_3.message.content

# Check if the response contains our cat values (should NOT)
response_lower = single_turn_response_3.lower()
single_found_cat_values = []

for value in expected_cat_values:
    if value.lower() in response_lower:
        single_found_cat_values.append(value)

if single_found_cat_values:
    print(f"\n✗ UNEXPECTED: Single-turn agent remembered some cat details: {single_found_cat_values}")
    single_cats_test_passed = False  # This is bad - it shouldn't remember
else:
    print(f"\n✓ EXPECTED: Single-turn agent correctly forgot all cat details")
    single_cats_test_passed = True  # This is good - it should forget


 SINGLE-TURN AGENT - MESSAGE 3: Testing cats memory (should fail) 
USER: How many cats do I have and what are their names?

ASSISTANT: I am a language model, and I do not have access to your personal information, including how many cats you have or their names. I cannot answer this question.


✓ EXPECTED: Single-turn agent correctly forgot all cat details


## Testing Recurring Conversation

Now we'll test if we can continue our conversation with the multi-turn agent using the same conversation_id from earlier.

In [15]:
# Step 7: Test recurring conversation with multi-turn agent
# Use the same conversation_id from our original multi-turn conversation
# This tests if we can "pick up where we left off"

time.sleep(2)  # Pause before continuing conversation

print("\n" + "=" * 60)
print(" RECURRING CONVERSATION - Testing full memory ")
print("=" * 60)

print(f"Continuing conversation with ID: {conversation_id}")

summary_question = "Can you summarize everything I told you about myself in our conversation?"

response_recurring = client.agents.query.create(
    agent_id=multi_turn_agent.id,
    messages=[{"content": summary_question, "role": "user"}],
    conversation_id=conversation_id  # Using the SAME conversation_id from the beginning!
)

print(f"USER: {summary_question}")
print(f"\nASSISTANT: {response_recurring.message.content}")

# Store the response
recurring_response = response_recurring.message.content

# Check if the response contains ALL our expected values
all_expected_values = [FAVORITE_COLOR] + [CAT_COUNT] + CAT_NAMES + ["cats"]  # Added "cats" as keyword
response_lower = recurring_response.lower()

recurring_found_values = []
recurring_missing_values = []

for value in all_expected_values:
    if value.lower() in response_lower:
        recurring_found_values.append(value)
    else:
        recurring_missing_values.append(value)

if not recurring_missing_values:
    print(f"\n✓ SUCCESS: Recurring conversation remembered everything")
    print(f"  Found all values: {recurring_found_values}")
    recurring_test_passed = True
else:
    print(f"\n✗ FAILED: Recurring conversation missing some details")
    print(f"  Found: {recurring_found_values}")
    print(f"  Missing: {recurring_missing_values}")
    recurring_test_passed = False


 RECURRING CONVERSATION - Testing full memory 
Continuing conversation with ID: 9beba569-b8e2-471a-8f99-bc121f32c216
USER: Can you summarize everything I told you about myself in our conversation?

ASSISTANT: Okay, here's a summary of what the user has told me:

*   **Favorite Color:** Blue
*   **Pets:** 3 cats
*   **Cat Names:** Whiskers, Mittens, and Shadow


✓ SUCCESS: Recurring conversation remembered everything
  Found all values: ['blue', '3', 'Whiskers', 'Mittens', 'Shadow', 'cats']


## Test Results Summary

Let's summarize all our test results and determine whether multi-turn functionality is working correctly. Test 2 can fail even when the agent remembers the cats: the check looks for the numeral "3", so a response that spells out "three" is reported as missing the count.

In [16]:
# Final Test Results Summary
print("=" * 60)
print(" FINAL TEST RESULTS ")
print("=" * 60)

# Collect all test results
all_tests = [
    ("Multi-turn agent remembers favorite color", color_test_passed),
    ("Multi-turn agent remembers cats", cats_test_passed),
    ("Single-turn agent forgets favorite color", single_color_test_passed),
    ("Single-turn agent forgets cats", single_cats_test_passed),
    ("Recurring conversation remembers everything", recurring_test_passed)
]

# Count passed tests
passed_tests = sum(test[1] for test in all_tests)
total_tests = len(all_tests)

print(f"\nTest Summary: {passed_tests}/{total_tests} tests passed\n")

# Display detailed results
for i, (test_name, passed) in enumerate(all_tests, 1):
    status = "PASS ✓" if passed else "FAIL ✗"
    print(f"  {i}. {test_name}: {status}")

print("\n" + "=" * 60)

# Overall assessment
if passed_tests == total_tests:
    print("🎉 SUCCESS: All tests passed! Multi-turn functionality is working correctly.")
elif passed_tests >= 3:  # Most tests passed (a count only; it does not check which ones)
    print("⚠️  PARTIAL SUCCESS: Multi-turn agent is working, but some edge cases failed.")
else:
    print("❌ FAILURE: Multi-turn functionality is not working as expected.")

print("\nWhat each test validates:")
print("- Tests 1-2: Multi-turn agent should remember conversation context")
print("- Tests 3-4: Single-turn agent should NOT remember previous messages")
print("- Test 5: Conversation can be continued using conversation_id")
print("=" * 60)

 FINAL TEST RESULTS 

Test Summary: 4/5 tests passed

  1. Multi-turn agent remembers favorite color: PASS ✓
  2. Multi-turn agent remembers cats: FAIL ✗
  3. Single-turn agent forgets favorite color: PASS ✓
  4. Single-turn agent forgets cats: PASS ✓
  5. Recurring conversation remembers everything: PASS ✓

⚠️  PARTIAL SUCCESS: Multi-turn agent is working, but some edge cases failed.

What each test validates:
- Tests 1-2: Multi-turn agent should remember conversation context
- Tests 3-4: Single-turn agent should NOT remember previous messages
- Test 5: Conversation can be continued using conversation_id
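
The overall assessment above relies on a count (`passed_tests >= 3`), so it would report partial success even if both multi-turn tests failed. A sketch of an assessment that looks at which group failed instead; `overall_status` is a hypothetical helper:

```python
def overall_status(multi_turn, single_turn, recurring):
    """Each argument is a list of booleans for one group of tests.

    Returns (status, failed_groups), where status is "SUCCESS", "PARTIAL", or "FAILURE".
    """
    groups = {
        "multi-turn": all(multi_turn),
        "single-turn": all(single_turn),
        "recurring": all(recurring),
    }
    failed = [name for name, ok in groups.items() if not ok]
    if not failed:
        return "SUCCESS", failed
    if len(failed) == len(groups):
        return "FAILURE", failed
    return "PARTIAL", failed


# Results from the run recorded above: only the multi-turn cats check failed.
status, failed_groups = overall_status(
    multi_turn=[True, False],
    single_turn=[True, True],
    recurring=[True],
)
print(status, failed_groups)
```

In the notebook, the lists would be built from `color_test_passed`, `cats_test_passed`, and the other result flags.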


## Cleanup (Optional)

If you want to delete the agents and the placeholder datastore created for this test, uncomment and run the cell below.

In [18]:
# Cleanup: delete the test agents and the placeholder datastore (optional)
# Uncomment the lines below to delete the resources we created

# print("Deleting test agents...")

# # Delete MultiTurnAgent
# try:
#     client.agents.delete(multi_turn_agent.id)
#     print(f"✓ MultiTurnAgent ({multi_turn_agent.id}) deleted")
# except Exception as e:
#     print(f"✗ Error deleting MultiTurnAgent: {e}")

# # Delete SingleTurnAgent  
# try:
#     client.agents.delete(single_turn_agent.id)
#     print(f"✓ SingleTurnAgent ({single_turn_agent.id}) deleted")
# except Exception as e:
#     print(f"✗ Error deleting SingleTurnAgent: {e}")

# # Delete datastore
# try:
#     client.datastores.delete(datastore_id)
#     print(f"✓ Datastore  ({datastore_id}) deleted")
# except Exception as e:
#     print(f"✗ Error deleting Datastore: {e}")

# print("Cleanup completed!")

print("Uncomment lines above to delete agents")

Uncomment lines above to delete agents
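
The three try/except blocks in the cleanup cell follow the same shape, so they could be folded into a loop that keeps going after a failure and reports what was left behind. This is a sketch; `run_cleanup` is a hypothetical helper, and the delete calls in the usage comment are the ones from the cell above:

```python
def run_cleanup(steps):
    """Run every (label, action) pair and return the labels whose action raised.

    `action` is a zero-argument callable; a failure in one step does not stop the rest.
    """
    failures = []
    for label, action in steps:
        try:
            action()
            print(f"✓ {label} deleted")
        except Exception as exc:
            print(f"✗ Error deleting {label}: {exc}")
            failures.append(label)
    return failures


# Usage (not executed here):
# run_cleanup([
#     ("MultiTurnAgent", lambda: client.agents.delete(multi_turn_agent.id)),
#     ("SingleTurnAgent", lambda: client.agents.delete(single_turn_agent.id)),
#     ("Datastore", lambda: client.datastores.delete(datastore_id)),
# ])
```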
