In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Get started with Vertex AI Memory Bank

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/agents/agent_engine/memory_bank/get_started_with_memory_bank.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fagents%2Fagent_engine%2Fmemory_bank%2Fget_started_with_memory_bank.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/agents/agent_engine/memory_bank/get_started_with_memory_bank.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/agents/agent_engine/memory_bank/get_started_with_memory_bank.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| Authors |
| --- |
| [Kimberly Milam](https://github.com/klmilam) |
| [Ivan Nardini](https://github.com/inardini) |

## Overview

This notebook is a hands-on guide to mastering **Vertex AI Memory Bank**, a service for building stateful, context-aware conversational AI agents. You will learn how to give your agent a persistent, long-term memory, allowing it to recall user preferences and past interactions across multiple sessions to provide truly personalized experiences. We will apply these concepts to a practical, real-world retail scenario: building a sophisticated assistant for an online fashion store.

By the end of this tutorial, you will not only understand the core concepts of Memory Bank but also know how to apply them to build an assistant that remembers preferences, recalls purchase history, and maintains context across conversations.

Here's a high-level overview of the steps we'll take:

* **Initial Setup**: We will begin with the fundamentals, configuring a new Memory Bank instance and learning how to create user sessions to store and retrieve conversation history.
* **Advanced Retrieval & Personalization**: We will explore advanced retrieval capabilities, leveraging similarity search to recall the most relevant memories and use them to personalize the agent's responses.
* **Memory Customization**: We will dive into domain-specific adaptation by defining custom memory topics and providing few-shot examples to improve the accuracy of data extraction.
* **Lifecycle Management**: Finally, we will address essential operational aspects like configuring Time-To-Live (TTL) for data retention and compliance, and managing the overall lifecycle of memories.

## Get started

### Install the Vertex AI SDK and other required packages

First, let's install the Vertex AI SDK. We require version 1.111.0 or later so that the Memory Bank features used in this notebook are available.

In [None]:
# Quote the requirement so the shell does not treat ">" as an output redirect
%pip install --upgrade --quiet "google-cloud-aiplatform>=1.111.0"

### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

import vertexai

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = vertexai.Client(project=PROJECT_ID, location=LOCATION)
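
One caveat with the cell above: `str(os.environ.get("GOOGLE_CLOUD_PROJECT"))` evaluates to the string `"None"` when the variable is unset, so a missing project ID only surfaces later as an API error. Below is a minimal sketch of a stricter resolver that fails early. The helper name `resolve_project_id` and its `env` parameter are hypothetical and not part of the Vertex AI SDK.

```python
import os

PLACEHOLDER = "[your-project-id]"


def resolve_project_id(explicit: str = PLACEHOLDER, env=None) -> str:
    """Return the explicit project ID, else GOOGLE_CLOUD_PROJECT, else raise."""
    # `env` is injectable so the helper can be exercised without touching os.environ.
    env = os.environ if env is None else env
    if explicit and explicit != PLACEHOLDER:
        return explicit
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError(
            "Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable."
        )
    return project
```

You could then write `PROJECT_ID = resolve_project_id(PROJECT_ID)` before creating the client.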

### Import libraries

We're importing standard Python libraries and, importantly, several class-based types from the Vertex AI SDK.

You can configure Memory Bank using plain Python dictionaries, but using these dedicated classes (`MemoryBankConfig`, `TtlConfig`, etc.) is a best practice: it provides type safety and makes your code more readable.

In [None]:
import datetime
import os
import uuid
import warnings

from IPython.display import Markdown, display

warnings.filterwarnings("ignore")

from google.genai.types import Content, Part

# Import class-based types for Memory Bank
from vertexai import types

To make the code more readable, we're creating shorter aliases for these long class names. This is a common Python practice that helps keep our code clean and concise without sacrificing the benefits of using the typed classes.


In [None]:
# Basic configuration types
MemoryBankConfig = types.ReasoningEngineContextSpecMemoryBankConfig
SimilaritySearchConfig = (
    types.ReasoningEngineContextSpecMemoryBankConfigSimilaritySearchConfig
)
GenerationConfig = types.ReasoningEngineContextSpecMemoryBankConfigGenerationConfig

# Advanced configuration types
TtlConfig = types.ReasoningEngineContextSpecMemoryBankConfigTtlConfig
GranularTtlConfig = (
    types.ReasoningEngineContextSpecMemoryBankConfigTtlConfigGranularTtlConfig
)
CustomizationConfig = types.MemoryBankCustomizationConfig
MemoryTopic = types.MemoryBankCustomizationConfigMemoryTopic
ManagedMemoryTopic = types.MemoryBankCustomizationConfigMemoryTopicManagedMemoryTopic
CustomMemoryTopic = types.MemoryBankCustomizationConfigMemoryTopicCustomMemoryTopic
GenerateMemoriesExample = types.MemoryBankCustomizationConfigGenerateMemoriesExample
ConversationSource = (
    types.MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSource
)
ConversationSourceEvent = (
    types.MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSourceEvent
)
ExampleGeneratedMemory = (
    types.MemoryBankCustomizationConfigGenerateMemoriesExampleGeneratedMemory
)
ManagedTopicEnum = types.ManagedTopicEnum

## The basics of Vertex AI Memory Bank

Let's start with the fundamentals. We'll create a Memory Bank, add a conversation, and perform basic memory generation and retrieval.

### Create Agent Engine with Simple Configuration

The AgentEngine resource acts as the top-level container for your Memory Bank instance. To create one, we need to provide a configuration.

Here, MemoryBankConfig has two key parts:

1. `similarity_search_config`: This specifies the **embedding model** used for similarity searches. Choosing the right model is important. If you expect multilingual conversations, you should opt for a model like `text-multilingual-embedding-002`. For now, `text-embedding-005` is a great default.  
2. `generation_config`: This defines the **LLM** that will extract and consolidate memories from conversations. The default, `gemini-2.5-flash`, is a fast and capable model perfect for this task.

In [None]:
print("🧠 Creating simple memory configuration...\n")

basic_memory_config = MemoryBankConfig(
    # Which embedding model to use for similarity search
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    # Which LLM to use for extracting memories from conversations
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
)

print("✅ Simple memory configuration created!")
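
Both models in the configuration above are referenced by a fully qualified resource name. Since the same path pattern is repeated for every model, a small helper can reduce typos. This is only a sketch: the function name `publisher_model_path` is hypothetical, and the path format is copied from the cell above.

```python
def publisher_model_path(project: str, location: str, model: str) -> str:
    """Build the fully qualified resource name for a Google publisher model."""
    return (
        f"projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}"
    )


# Example values only; substitute your own project and region.
embedding_model = publisher_model_path(
    "my-project", "us-central1", "text-embedding-005"
)
generation_model = publisher_model_path(
    "my-project", "us-central1", "gemini-2.5-flash"
)
print(embedding_model)
print(generation_model)
```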

Now we create the Agent Engine resource. Memory Bank is enabled by default when you create an Agent Engine, and this call provisions the necessary backend infrastructure.

In [None]:
print("🛠️ Creating agent engine with basic configuration...\n")

agent_engine = client.agent_engines.create(
    config={"context_spec": {"memory_bank_config": basic_memory_config}}
)

agent_engine_name = agent_engine.api_resource.name
print("✅ Agent Engine created with basic configuration!")
print(f"   Resource Name: {agent_engine_name}")

### Create a Session and Store a Simple Conversation

A **Session** is a chronological log of a single interaction between a user and your agent. It's the raw material from which memories are made. A critical piece of information here is the `user_id`. Memories are mapped to this ID, which allows the agent to recall information for a specific user across different sessions.

> Note: Using Vertex AI Agent Engine Session is not the only option supported. [Provide the source conversation directly in JSON format](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/generate-memories#json-format) if you're using a different session storage from Agent Engine Sessions.

In [None]:
print("💬 Creating a session for our customer...\n")

user_id = "customer_sarah_" + str(uuid.uuid4())[:4]

# Create a session
session = client.agent_engines.sessions.create(
    name=agent_engine_name,
    user_id=user_id,
    config={"display_name": f"Shopping session for {user_id}"},
)

session_name = session.response.name
print(f"✅ Session created: {session_name}")
print(f"   Customer: {user_id}")

This is the raw conversational data we'll use. It's a simple list of dictionaries, each representing a turn in the dialogue.

In [None]:
# Add a simple conversation
simple_conversation = [
    {
        "role": "user",
        "message": "Hi! I'm Sarah. I bought a navy blazer from you last month for $159 and it fits perfectly!",
    },
    {
        "role": "model",
        "message": "Hello Sarah! Wonderful to hear you're enjoying your navy blazer!",
    },
    {
        "role": "user",
        "message": "Yes! I wear size M in jackets but size L in sweaters. I prefer fitted jackets.",
    },
    {
        "role": "model",
        "message": "I've noted your size preferences - M for fitted jackets and L for sweaters.",
    },
    {
        "role": "user",
        "message": "I'm looking for a winter coat now. I only buy during sales though - never pay full price! Budget is $150-200.",
    },
    {
        "role": "model",
        "message": "I'll help you find winter coats on sale within your $150-200 budget.",
    },
]

Here, we loop through our conversation and append each turn as an event to the session we created earlier. This persists the conversation history, making it available for memory generation.

In [None]:
print("⬆️ Adding conversation to session...\n")

invocation_id = 0

for turn in simple_conversation:
    client.agent_engines.sessions.events.append(
        name=session_name,
        author=user_id,  # Required by Sessions
        invocation_id=str(invocation_id),  # Required by Sessions
        timestamp=datetime.datetime.now(
            tz=datetime.timezone.utc
        ),  # Required by Sessions
        config={
            "content": {"role": turn["role"], "parts": [{"text": turn["message"]}]}
        },
    )

    invocation_id += 1
    icon = "👤" if turn["role"] == "user" else "🤖"
    print(f"{icon} {turn['message']}")

print("\n✅ Conversation added to session!")
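
The loop above builds the same nested `content` payload for every turn. If you append conversations in several places, it may help to factor that mapping into a function. The following sketch uses a hypothetical helper name, `turn_to_event_config`, and mirrors the dictionary shape used in the cell above. It does not call the API.

```python
def turn_to_event_config(turn: dict) -> dict:
    """Convert a {"role", "message"} turn into the event `config` payload."""
    return {
        "content": {
            "role": turn["role"],
            "parts": [{"text": turn["message"]}],
        }
    }


sample_turns = [
    {"role": "user", "message": "I wear size M in jackets."},
    {"role": "model", "message": "Noted: size M for jackets."},
]

# Pair each payload with a string invocation ID, as the append call expects.
payloads = [
    (str(i), turn_to_event_config(turn)) for i, turn in enumerate(sample_turns)
]
for invocation, payload in payloads:
    print(invocation, payload)
```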

### Generate Memories from Conversation

Now let's see what memories are automatically extracted using the default configuration.

This is the core of memory generation. The generate method kicks off an asynchronous, long-running operation that performs two main steps:

1. **Extraction**: The generation model reads the conversation and extracts key facts. With the default configuration, it looks for information that matches pre-defined **Managed Topics** like `USER_PERSONAL_INFO` and `USER_PREFERENCES`.  
2. **Consolidation**: Memory Bank intelligently merges new facts with existing memories, avoiding duplicates and resolving contradictions.

> Note: The `wait_for_completion=True` flag is the default setting and makes this a blocking call, which is useful for this tutorial. In production, you would typically set it to `False` to run in the background.

In [None]:
print("🧠 Generating memories with default configuration...\n")

# Generate memories from the session
operation = client.agent_engines.memories.generate(
    name=agent_engine_name,
    vertex_session_source={"session": session_name},
    config={"wait_for_completion": True},
)

print("\n✅ Memories generated!")

The result of the generate operation is a list of generated memories, each with an associated action (`CREATED`, `UPDATED`, or `DELETED`). We'll loop through the response and use the `get` method to fetch the full text (`fact`) of each memory that was created or updated.

In [None]:
if operation.response and operation.response.generated_memories:
    print(f"✅ Generated {len(operation.response.generated_memories)} memories:\n")

    for i, gen_memory in enumerate(operation.response.generated_memories, 1):
        if gen_memory.action != "DELETED" and gen_memory.memory:
            try:
                full_memory = client.agent_engines.memories.get(
                    name=gen_memory.memory.name
                )
                print(f"   {i}. {full_memory.fact}")
            except Exception as e:
                print(f"Could not retrieve memory: {e}")
else:
    print("No memories generated")

print("\n💡 Note: These memories were extracted using default managed topics")
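
Because consolidation can create, update, or delete memories, it can be useful to tally the actions before printing individual facts. The sketch below uses stand-in objects rather than a real response, and it assumes that `action` compares equal to plain strings such as `"CREATED"`, as the comparison in the cell above suggests. The helper name `count_actions` is hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def count_actions(generated_memories) -> Counter:
    """Tally generated memories by their `action` value."""
    return Counter(mem.action for mem in generated_memories)


# Stand-ins for illustration; real items come from
# operation.response.generated_memories.
fake_generated = [
    SimpleNamespace(action="CREATED"),
    SimpleNamespace(action="CREATED"),
    SimpleNamespace(action="UPDATED"),
]

action_counts = count_actions(fake_generated)
for action in ("CREATED", "UPDATED", "DELETED"):
    print(f"{action}: {action_counts[action]}")
```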

### Simple Memory Retrieval

Now, let's retrieve the memories we just created for our customer. The simplest method is scope-based retrieval. A "scope" is a set of key-value pairs that defines a collection of memories. By providing `{"user_id": user_id}`, we are asking for all memories whose scope exactly matches it.

In [None]:
print(f"📚 Retrieving all memories for {user_id}...\n")

# Simple retrieval - get all memories
results = client.agent_engines.memories.retrieve(
    name=agent_engine_name, scope={"user_id": user_id}
)

all_memories = list(results)
print(f"Found {len(all_memories)} memories:\n")

for i, retrieved_memory in enumerate(all_memories, 1):
    print(f"{i}. {retrieved_memory.memory.fact}")

print("\n✅ Basic Memory Bank setup complete!")
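
To make "exactly match" concrete, here is a local toy model of scope matching. It is not the service's implementation, and the data is invented for illustration. It only shows the semantics described above: a memory is returned when its scope has the same keys and values as the requested scope, not when the requested scope is merely a subset.

```python
def scope_matches(memory_scope: dict, query_scope: dict) -> bool:
    """Exact match: same keys and same values (key order is irrelevant)."""
    return memory_scope == query_scope


# Invented (scope, fact) pairs for illustration.
stored = [
    ({"user_id": "sarah"}, "Wears size M in jackets"),
    ({"user_id": "sarah", "session_id": "s1"}, "Asked about winter coats"),
    ({"user_id": "alex"}, "Prefers sneakers"),
]

matches = [
    fact for scope, fact in stored if scope_matches(scope, {"user_id": "sarah"})
]
print(matches)
```

In this toy model, the memory scoped to both `user_id` and `session_id` is not returned for a `user_id`-only query.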

## Advanced Retrieval and Personalization

Retrieving all memories is good, but retrieving the most relevant memories is great. This is where similarity search shines.

### Similarity Search

Now let's use similarity search to find only relevant memories for specific queries.

To make our similarity search more interesting, let's add more conversational turns to our session, which will generate a richer set of memories.


In [None]:
# Add more conversation to have richer memories
additional_conversation = [
    {
        "role": "user",
        "message": "Hi! I bought a black leather jacket from you last year for $299. It was perfect!",
    },
    {
        "role": "model",
        "message": "Great to hear you loved the black leather jacket Sarah! Let me find similar styles.",
    },
    {
        "role": "user",
        "message": "Also, remember I prefer shopping during sales. My shipping address is 123 Main St, San Francisco.",
    },
]

We append these new turns to the same session object as before.

In [None]:
print("💬 Adding more conversation...\n")

# Add each turn to the session
for turn in additional_conversation:
    client.agent_engines.sessions.events.append(
        name=session_name,
        author=user_id,  # Required by Sessions
        invocation_id=str(invocation_id),  # Required by Sessions
        timestamp=datetime.datetime.now(
            tz=datetime.timezone.utc
        ),  # Required by Sessions
        config={
            "content": {"role": turn["role"], "parts": [{"text": turn["message"]}]}
        },
    )

    invocation_id += 1
    icon = "👤" if turn["role"] == "user" else "🤖"
    print(f"{icon} {turn['message']}")

print("\n✅ Conversation added to session!")

Now, we run the generate process again. Memory Bank will process the *entire* conversation history in the session, extract new facts, and consolidate them with the memories that already exist.

In [None]:
print("🧠 Generating additional memories with default configuration...\n")

operation = client.agent_engines.memories.generate(
    name=agent_engine_name,
    vertex_session_source={"session": session_name},
    config={"wait_for_completion": True},
)

print("✅ Additional memories generated!")

Let's look at the new memories.

In [None]:
if operation.response and operation.response.generated_memories:
    print(f"✅ Generated {len(operation.response.generated_memories)} new memories:\n")

    for i, gen_memory in enumerate(operation.response.generated_memories, 1):
        if gen_memory.action != "DELETED" and gen_memory.memory:
            try:
                full_memory = client.agent_engines.memories.get(
                    name=gen_memory.memory.name
                )
                print(f"   {i}. {full_memory.fact}")
            except Exception as e:
                print(f"Could not retrieve memory: {e}")
else:
    print("No new memories generated")

print("\n💡 Note: These memories were extracted using default managed topics")

Now, instead of retrieving all memories, we provide a `search_query`. Memory Bank embeds this query and compares it to the embedded memory facts, returning the most similar ones.

* `top_k`: Limits the number of results returned.  
* `distance`: The Euclidean distance between the query and memory embedding. A smaller distance means higher relevance.

In [None]:
# Similarity search - find relevant memories for specific queries
search_queries = [
    "What's the customer's name?",
    "What are the customer's style preferences?",
    "What is the customer's budget?",
]

print("🔍 Testing similarity search:\n")

for query in search_queries:
    print(f'Query: "{query}"')

    # Similarity search with top_k parameter
    results = client.agent_engines.memories.retrieve(
        name=agent_engine_name,
        scope={"user_id": user_id},
        similarity_search_params={
            "search_query": query,
            "top_k": 2,  # Get top 2 most relevant
        },
    )

    memories = list(results)
    if memories:
        for mem in memories:
            distance = mem.distance if hasattr(mem, "distance") else "N/A"
            print(f"   → {mem.memory.fact}")
            print(f"     (Distance: {distance})")
    print()
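
To build intuition for `top_k` and `distance`, the sketch below ranks a few hand-made two-dimensional vectors by Euclidean distance. Real embeddings have hundreds of dimensions and are computed by the embedding model, so the vectors and the helper name `top_k_by_distance` are purely illustrative assumptions.

```python
import math


def top_k_by_distance(query_vec, memories, k):
    """Return the k (distance, fact) pairs closest to query_vec."""
    scored = [(math.dist(query_vec, vec), fact) for fact, vec in memories]
    # Smaller distance means higher relevance, so sort ascending.
    return sorted(scored)[:k]


# Toy vectors, not real embeddings.
toy_memories = [
    ("Budget is $150-200", [1.0, 0.0]),
    ("Prefers fitted jackets", [0.0, 1.0]),
    ("Only buys during sales", [0.8, 0.2]),
]

nearest = top_k_by_distance([1.0, 0.0], toy_memories, k=2)
for distance, fact in nearest:
    print(f"{fact} (distance: {distance:.3f})")
```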

### Use memories for personalization

This function demonstrates the real-world application of Memory Bank using the **Retrieval-Augmented Generation (RAG)** pattern.

1. **Retrieve**: First, we use similarity search to fetch memories relevant to the user's current query.  
2. **Augment**: We then insert these retrieved memories directly into the prompt we send to Gemini.  
3. **Generate**: Gemini uses this augmented context to produce a highly personalized, factually-grounded response that it couldn't have generated on its own.

In [None]:
def personalize_shopping_experience(customer_id, product_query):
    """Use memories to personalize product recommendations with Gemini."""
    # Initialize Gemini client
    from google import genai

    genai_client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

    print(f"🛍️ Personalizing experience for: {product_query}\n")

    # Retrieve relevant memories
    results = client.agent_engines.memories.retrieve(
        name=agent_engine_name,
        scope={"user_id": customer_id},
        similarity_search_params={"search_query": product_query, "top_k": 3},
    )

    memories = list(results)

    print("📋 Customer Context (from memories):")
    memory_context = []
    for mem in memories:
        print(f"   • {mem.memory.fact}")
        memory_context.append(mem.memory.fact)

    # Use Gemini to generate personalized recommendations based on memories
    print("\n🤖 Generating personalized recommendations with Gemini...\n")

    # Create prompt with memory context
    prompt = f"""You are a personal shopping assistant for an online fashion store.
    Based on the following customer information from their history, provide 3 personalized product recommendations.

    Customer History:
    {chr(10).join(f"- {fact}" for fact in memory_context)}

    Customer Query: {product_query}

    Please provide 3 specific product recommendations with:
    1. Product name and price
    2. Why it matches their preferences
    3. Any special offers or alerts relevant to their shopping behavior

    Format your response as a numbered list with clear explanations."""

    # Generate recommendations with Gemini
    response = genai_client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )

    print("🎯 Personalized Recommendations from Gemini:")
    display(Markdown(response.text))

    return response.text


# Test personalization with Gemini
recommendations = personalize_shopping_experience(
    user_id, "winter jacket recommendations"
)
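
The "augment" step is plain string construction, so it can be isolated and checked without calling Memory Bank or Gemini. The following is a minimal sketch under that assumption. The helper name `build_augmented_prompt` and the fallback text for an empty memory list are illustrative choices, not part of the notebook's function above.

```python
def build_augmented_prompt(facts, query):
    """Insert retrieved memory facts into a prompt ahead of the user query."""
    # Fall back to an explicit marker so the model is not given an empty section.
    history = "\n".join(f"- {fact}" for fact in facts) or "- (no memories found)"
    return (
        "You are a personal shopping assistant for an online fashion store.\n\n"
        f"Customer History:\n{history}\n\n"
        f"Customer Query: {query}"
    )


example_prompt = build_augmented_prompt(
    ["Wears size M in jackets", "Only buys during sales"],
    "winter jacket recommendations",
)
print(example_prompt)
```

Keeping prompt assembly separate from retrieval and generation makes each of the three steps easier to test on its own.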

## Customizing Memory Extraction

The default managed topics are great for general-purpose agents, but for our specialized retail assistant, we can do better. We can teach Memory Bank about our specific domain using **Custom Topics** and **Few-Shot Examples**.

#### Custom Topics

Now let's customize what types of memories we want to extract by defining custom topics.

By default, Memory Bank uses pre-defined "managed topics" to identify and save meaningful information from conversations. These topics include:

* **Personal information (`USER_PERSONAL_INFO`)**: Key details about the user, like their name or hobbies.
* **User preferences (`USER_PREFERENCES`)**: The user's stated likes and dislikes.
* **Key conversation events (`KEY_CONVERSATION_DETAILS`)**: Important outcomes or milestones reached in the dialogue.
* **Explicit instructions (`EXPLICIT_INSTRUCTIONS`)**: Information the user directly asks the agent to remember or forget.

You can customize which information is saved by providing a `CustomizationConfig` to use a subset of these topics or to define your own custom topics.

##### Define Custom Memory Topics

While Memory Bank provides general-purpose **Managed Topics**, the real power comes from defining your own **Custom Topics** to tailor memory extraction to your specific domain. For our retail assistant, topics like `purchase_history` and `size_information` are far more useful than generic ones. Each custom topic has a label and a detailed description that instructs the extraction LLM on what to look for.

In [None]:
print("🎨 Defining custom topics...\n")

custom_topics = [
    # Keep some managed topics
    MemoryTopic(
        managed_memory_topic=ManagedMemoryTopic(
            managed_topic_enum=ManagedTopicEnum.USER_PERSONAL_INFO
        )
    ),
    MemoryTopic(
        managed_memory_topic=ManagedMemoryTopic(
            managed_topic_enum=ManagedTopicEnum.USER_PREFERENCES
        )
    ),
    # Add custom topics specific to e-commerce
    MemoryTopic(
        custom_memory_topic=CustomMemoryTopic(
            label="purchase_history",
            description="""Details about past purchases including product names,
                          prices, dates, and customer satisfaction with the products.""",
        )
    ),
    MemoryTopic(
        custom_memory_topic=CustomMemoryTopic(
            label="size_information",
            description="""Customer's clothing and shoe sizes for different types
                          of apparel (shirts, pants, jackets, shoes).""",
        )
    ),
    MemoryTopic(
        custom_memory_topic=CustomMemoryTopic(
            label="shopping_behavior",
            description="""Shopping patterns including preferred shopping times,
                          budget ranges, sale preferences, and brand loyalties.""",
        )
    ),
]

print("✅ Custom topics defined!")
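
As noted earlier, Memory Bank can also be configured with plain dictionaries. The sketch below builds dictionary versions of the topics, assuming the dictionary keys mirror the field names of the typed classes used above. That mapping is an assumption, so verify it against the SDK before relying on it. The helper names are hypothetical. The custom-topic helper also collapses the newlines and indentation that triple-quoted descriptions carry.

```python
def custom_topic(label: str, description: str) -> dict:
    """Dict form of a custom topic, with whitespace in the description collapsed."""
    return {
        "custom_memory_topic": {
            "label": label,
            "description": " ".join(description.split()),
        }
    }


def managed_topic(name: str) -> dict:
    """Dict form of a managed topic, referenced by its enum name."""
    return {"managed_memory_topic": {"managed_topic_enum": name}}


topics_as_dicts = [
    managed_topic("USER_PREFERENCES"),
    custom_topic(
        "size_information",
        """Customer's clothing and shoe sizes for different types
           of apparel (shirts, pants, jackets, shoes).""",
    ),
]
print(topics_as_dicts[1]["custom_memory_topic"]["description"])
```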

##### Create a Memory Bank config with customization

We now package our list of custom topics into a CustomizationConfig object. This object is then included in a new MemoryBankConfig. This tells our Agent Engine to use our specific topics for memory extraction instead of the defaults.


In [None]:
print(
    "🧠 Creating customization configuration and Memory Bank config with customization...\n"
)

# Create customization configuration
customization_config = CustomizationConfig(memory_topics=custom_topics)

# Create Memory Bank config with customization
custom_memory_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
    # NEW: Add customization
    customization_configs=[customization_config],
)

print("✅ Customization configuration created!")

##### Create new agent engine with custom configuration

To see the effects of our new configuration clearly, we'll create a brand new `AgentEngine`. This ensures that its memory generation process is governed only by our new custom topics.

In [None]:
print("🛠️ Creating new agent engine with custom configuration...\n")

# Create new agent engine with custom configuration
custom_agent_engine = client.agent_engines.create(
    config={"context_spec": {"memory_bank_config": custom_memory_config}}
)

custom_engine_name = custom_agent_engine.api_resource.name
print("✅ Agent Engine created with custom topics!")
print("   Custom topics: purchase_history, size_information, shopping_behavior")

##### Check how memories change with custom topics

Before we created custom topics, Memory Bank used default managed topics. Now with custom topics, let's generate memories from the same conversation and see the difference!

We need a new session associated with our new, custom-configured Agent Engine.

In [None]:
print("💬 Creating session for custom engine...")

# Create session for custom engine
custom_session = client.agent_engines.sessions.create(
    name=custom_engine_name,
    user_id=user_id,
    config={"display_name": f"Custom topics session for {user_id}"},
)
print(f"✅ Session created: {custom_session.response.name}")

We'll add the same combined conversation (the initial turns plus the additional ones) that we used with the default engine to this new session. This allows for a direct, "apples-to-apples" comparison.

In [None]:
print("⬆️ Adding conversation to session...")

full_conversation = simple_conversation + additional_conversation

invocation_id = 1

# Add full conversation to the new session
for turn in full_conversation:
    client.agent_engines.sessions.events.append(
        name=custom_session.response.name,
        author=turn["role"],
        invocation_id=str(invocation_id),
        timestamp=datetime.datetime.now(tz=datetime.timezone.utc),
        config={
            "content": {"role": turn["role"], "parts": [{"text": turn["message"]}]}
        },
    )
    invocation_id += 1
print("✅ Conversation added to session!")

Now, we trigger memory generation on our new engine. The underlying extraction process will now be guided by the descriptions of our custom topics (purchase_history, size_information, etc.).

In [None]:
print("\n🧠 Generating memories WITH custom topics...")

# Generate new memories with custom topics
custom_operation = client.agent_engines.memories.generate(
    name=custom_engine_name,
    vertex_session_source={"session": custom_session.response.name},
    config={"wait_for_completion": True},
)
print("✅ Memories generated with Custom Topics!")
print()
print(
    "⚠️ Note: It might take a few minutes for the new topics to be processed. Please be patient."
)

Let's check memories generated with custom topics.

In [None]:
# Return memories with custom topics
print("Memories with custom topics:")
if custom_operation.response and custom_operation.response.generated_memories:
    for i, gen_memory in enumerate(custom_operation.response.generated_memories, 1):
        if gen_memory.action != "DELETED" and gen_memory.memory:
            try:
                full_memory = client.agent_engines.memories.get(
                    name=gen_memory.memory.name
                )
                print(f"   {i}. {full_memory.fact}")
            except Exception as e:
                print(f"Could not retrieve memory: {e}")

By comparing the memories generated by the default engine with those from our custom-topic engine, you can see the direct impact of the customization. The new memories are more structured and aligned with our specific retail domain.


In [None]:
print("📊 Check How Custom Topics Change Memory Extraction...\n")

# Return memories with default topics
print("Memories with default topics:")
pager = client.agent_engines.memories.list(name=agent_engine_name)
all_memories = list(pager)
for i, retrieved_memory in enumerate(all_memories, 1):
    print(f"   {i}. {retrieved_memory.fact}")
print()

# Return memories with custom topics
print("Memories with custom topics:")
custom_pager = client.agent_engines.memories.list(name=custom_engine_name)
custom_memories = list(custom_pager)
for i, retrieved_memory in enumerate(custom_memories, 1):
    print(f"   {i}. {retrieved_memory.fact}")
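
Reading two printed lists side by side can be tedious. The sketch below shows one way to highlight which facts appear in only one of the two engines. `diff_facts` is a hypothetical helper that works on plain strings, and the sample facts are made up for illustration. It compares exact text, so two facts that say the same thing in different words will both be reported as unique.

```python
def diff_facts(default_facts, custom_facts):
    """Return (only_in_default, only_in_custom), preserving input order."""
    default_set, custom_set = set(default_facts), set(custom_facts)
    only_default = [f for f in default_facts if f not in custom_set]
    only_custom = [f for f in custom_facts if f not in default_set]
    return only_default, only_custom


# Made-up sample facts; in the notebook you could pass
# [m.fact for m in all_memories] and [m.fact for m in custom_memories].
default_facts = ["Customer likes jackets", "Customer wears size M"]
custom_facts = ["Customer wears size M", "Customer bought a jacket for $89"]

only_default, only_custom = diff_facts(default_facts, custom_facts)
print("Only with default topics:", only_default)
print("Only with custom topics:", only_custom)
```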

### Add Few-Shot Examples for Better Extraction

Few-shot examples help Memory Bank understand exactly how to extract memories for your custom topics.

#### Create few-shot examples

When using custom topics, you should **always** provide few-shot examples. These examples demonstrate the desired extraction behavior to the model. You provide a sample conversation and the exact memory fact you expect to be generated, which helps the model learn the nuances of your domain.

In [None]:
print("🎨 Defining few-shot examples...\n")

few_shot_examples = [
    GenerateMemoriesExample(
        conversation_source=ConversationSource(
            events=[
                ConversationSourceEvent(
                    content=Content(
                        role="user",
                        parts=[
                            Part(
                                text="I bought a blue denim jacket last month for $89 and love it!"
                            )
                        ],
                    )
                ),
                ConversationSourceEvent(
                    content=Content(
                        role="model",
                        parts=[
                            Part(
                                text="Great to hear you're enjoying your denim jacket!"
                            )
                        ],
                    )
                ),
            ]
        ),
        generated_memories=[
            ExampleGeneratedMemory(
                fact="Customer purchased a blue denim jacket for $89 last month and is satisfied with it"
            )
        ],
    ),
    GenerateMemoriesExample(
        conversation_source=ConversationSource(
            events=[
                ConversationSourceEvent(
                    content=Content(
                        role="user",
                        parts=[
                            Part(
                                text="I wear size L in shirts but size M in jackets because I like a fitted look"
                            )
                        ],
                    )
                )
            ]
        ),
        generated_memories=[
            ExampleGeneratedMemory(
                fact="Customer wears size L in shirts and size M in jackets, prefers fitted look for jackets"
            )
        ],
    ),
]

print("✅ Few-shot examples defined!")
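
Before attaching examples to a configuration, it can be worth checking that each one has a conversation and non-blank facts. The following is a minimal sketch, assuming the example objects expose the same attribute names used in their constructors above (`conversation_source.events`, `generated_memories`, `fact`). `check_examples` is a hypothetical helper, and the stand-in objects only mimic that shape.

```python
from types import SimpleNamespace


def check_examples(examples):
    """Return a list of human-readable problems found in few-shot examples."""
    problems = []
    for idx, example in enumerate(examples):
        if not example.conversation_source.events:
            problems.append(f"example {idx}: no conversation events")
        for memory in example.generated_memories:
            if not (memory.fact or "").strip():
                problems.append(f"example {idx}: blank fact")
    return problems


# Stand-ins that mirror the attribute layout of the SDK objects (assumption).
good_example = SimpleNamespace(
    conversation_source=SimpleNamespace(events=[object()]),
    generated_memories=[SimpleNamespace(fact="Customer wears size M in jackets")],
)
bad_example = SimpleNamespace(
    conversation_source=SimpleNamespace(events=[]),
    generated_memories=[SimpleNamespace(fact="   ")],
)

problems = check_examples([good_example, bad_example])
for problem in problems:
    print(problem)
```

In the notebook, the same helper could be called as `check_examples(few_shot_examples)` if the attribute assumption holds.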

#### Create customization with few-shot examples

Now we add our list of few-shot examples to the CustomizationConfig object, alongside our custom topics.


In [None]:
print("⚙️ Creating customization with few-shot examples...\n")

# Add examples to configuration
advanced_customization = CustomizationConfig(
    memory_topics=custom_topics,
    generate_memories_examples=few_shot_examples,  # NEW: Add examples
)

print("✅ Customization created with few-shot examples!")

#### Update the agent engine memory configuration with few-shot examples

We're now creating our most advanced memory configuration. It includes the base model settings, our custom topics, and the new few-shot examples.


In [None]:
print("🧠 Updating the agent engine with few-shot examples...\n")

# Update the agent engine memory bank config
advanced_memory_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
    customization_configs=[advanced_customization],
)

print("✅ Customization added to engine configuration!")

#### Update the existing engine with the new memory configuration

Instead of creating a new engine, this time we'll use the update method. This allows us to apply a new configuration to an existing engine *without* losing any of the memories already stored within it. This is how you would evolve your agent's memory capabilities in a production environment.

In [None]:
# Update existing engine
updated_engine = client.agent_engines.update(
    name=custom_engine_name,
    config={"context_spec": {"memory_bank_config": advanced_memory_config}},
)

print("✅ Agent Engine updated with few-shot examples!")
print("   Memory Bank now better understands your domain-specific patterns")

#### Check how memories change with few-shot examples

To test the impact of our few-shot examples, we create another new session on the *updated* engine.

In [None]:
# Create a new session to test few-shot impact

print("🛠️ Creating new session for few-shot engine...")
fewshot_session = client.agent_engines.sessions.create(
    name=custom_engine_name,  # Using the updated engine with few-shot examples
    user_id=user_id,
    config={"display_name": f"Few-shot test session for {user_id}"},
)
print(f"✅ Session created: {fewshot_session.response.name}")

Once again, we add the same original conversation to this new session to ensure a fair test.


In [None]:
print("⬆️ Adding conversation to session...")

# Add the original conversation to the new session
invocation_id = 1

for turn in full_conversation:
    client.agent_engines.sessions.events.append(
        name=fewshot_session.response.name,
        author=turn["role"],
        invocation_id=str(invocation_id),
        timestamp=datetime.datetime.now(tz=datetime.timezone.utc),
        config={
            "content": {"role": turn["role"], "parts": [{"text": turn["message"]}]}
        },
    )
    invocation_id += 1

print("✅ Conversation added to session!")
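
The same append loop appears for every session in this tutorial. One way to avoid repeating it is to build the keyword arguments for each event in a small function. This is a sketch only: `build_event_kwargs` is a hypothetical helper, the session name is a placeholder, and the sample conversation is made up. The dictionary keys mirror the arguments passed to `sessions.events.append` in the cell above.

```python
import datetime


def build_event_kwargs(session_name, turn, invocation_id):
    """Build the keyword arguments for one session event (hypothetical helper)."""
    return {
        "name": session_name,
        "author": turn["role"],
        "invocation_id": str(invocation_id),
        "timestamp": datetime.datetime.now(tz=datetime.timezone.utc),
        "config": {
            "content": {"role": turn["role"], "parts": [{"text": turn["message"]}]}
        },
    }


# Made-up conversation in the same {"role", "message"} shape as full_conversation.
sample_conversation = [
    {"role": "user", "message": "I wear size M in jackets."},
    {"role": "model", "message": "Noted, size M for jackets."},
]

# enumerate(..., 1) replaces the manual invocation_id counter.
events = [
    build_event_kwargs("example-session-name", turn, i)
    for i, turn in enumerate(sample_conversation, 1)
]
for event in events:
    print(event["invocation_id"], event["author"])
```

Each dictionary could then be unpacked into the call used above, for example `client.agent_engines.sessions.events.append(**event)`.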

We generate memories one last time. The extraction model is now guided by our custom topics and our specific examples.


In [None]:
print("\n🧠 Generating memories WITH few-shot examples...")

# Generate memories with few-shot configuration
fewshot_operation = client.agent_engines.memories.generate(
    name=custom_engine_name,
    vertex_session_source={"session": fewshot_session.response.name},
    config={"wait_for_completion": True},
)

print("✅ Memories generated with few-shot examples!\n")
print(
    "⚠️ Note: It might take a few minutes for the new topics to be processed. Please be patient."
)

Comparing the initial memories to this final set shows the effect of few-shot examples. The extracted facts should now be more precise, more granular, and more useful for our retail agent, following the patterns we demonstrated in the examples.


In [None]:
print("📊 Check How Few-shot Examples Change Memory Extraction...")

# Return memories with default topics
print("Memories without few-shot examples:")
pager = client.agent_engines.memories.list(name=agent_engine_name)
all_memories = list(pager)
for i, retrieved_memory in enumerate(all_memories, 1):
    print(f"   {i}. {retrieved_memory.fact}")
print()

# Return memories with custom topics and few-shot examples
print("Memories with custom topics and few-shot examples:")
custom_pager = client.agent_engines.memories.list(name=custom_engine_name)
custom_memories = list(custom_pager)
for i, retrieved_memory in enumerate(custom_memories, 1):
    print(f"   {i}. {retrieved_memory.fact}")

### Time-To-Live (TTL) & Memory Management

Managing data responsibly is crucial. Memory Bank provides Time-To-Live (TTL) settings that automatically expire and delete memories once their expiration time has passed. This supports privacy requirements (such as GDPR), data hygiene, and cost management.

#### Add TTL for compliance

##### Define TTL configuration

Here, we define a `granular_ttl_config` to set different retention periods for memories based on how they were created or updated.


In [None]:
print("⏱️ Define TTL configuration...")

# Define granular TTL for different operations
ttl_config = TtlConfig(
    granular_ttl_config=GranularTtlConfig(
        create_ttl="2592000s",  # 30 days for manually created memories
        generate_created_ttl="7776000s",  # 90 days for generated memories
        generate_updated_ttl="7776000s",  # 90 days for updated memories
    )
)

print("✅ TTL configuration defined!")
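
The TTL values above are durations written as a number of seconds followed by `s`. Computing them by hand is error-prone, so a small converter can help. This is a minimal sketch: `ttl_string` is a hypothetical helper built on the standard library, not part of the SDK.

```python
from datetime import timedelta


def ttl_string(**duration):
    """Convert timedelta keyword arguments (days=..., hours=...) to an 'Ns' string."""
    seconds = int(timedelta(**duration).total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be a positive duration")
    return f"{seconds}s"


print(ttl_string(days=30))  # 2592000s, the create_ttl used above
print(ttl_string(days=90))  # 7776000s, the generate_*_ttl used above
```

With this helper, the configuration could be written as `create_ttl=ttl_string(days=30)` instead of a hard-coded string.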

##### Update the agent engine memory configuration with TTL configuration

We now create our final configuration object, which includes our advanced customization (topics and examples) and the new TTL settings.

In [None]:
print("🧠 Updating the agent engine with TTL configuration...")

advanced_memory_config = MemoryBankConfig(
    # Basic configuration
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
    # Customization
    customization_configs=[advanced_customization],
    # NEW: TTL configuration
    ttl_config=ttl_config,
)

print("✅ TTL configuration added to engine configuration!")

##### Update the existing Agent Engine with TTL configuration

We use the update method one last time to apply our TTL policy to the Agent Engine. From this point forward, any new or updated memories in this engine will automatically have an expiration time set according to these rules.

In [None]:
print("🛠️ Updating the agent engine with TTL configuration...\n")

updated_engine = client.agent_engines.update(
    name=custom_engine_name,
    config={"context_spec": {"memory_bank_config": advanced_memory_config}},
)

print("✅ Agent Engine updated with TTL configuration!")
print("   Memory Bank will now retain memories for a limited time")

##### (Optional) Create memories with specific TTL

In addition to engine-level TTL policies, you can specify a per-memory TTL when you create it manually using memories.create. This overrides the engine's default. This is useful for short-lived data, like the contents of a shopping cart, which you might only want to remember for 7 days for a cart recovery campaign.

> Note: The `create` method has limited features. It does not perform memory consolidation, so it might create duplicate memories. For production applications, we recommend using the [`generate` method](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/generate-memories#consolidate-pre-extracted-memories) with pre-extracted memories.

In [None]:
print("\n📝 Creating memories with TTL...\n")

# Create a memory that will expire
temporary_memory = client.agent_engines.memories.create(
    name=custom_engine_name,
    fact="Customer has items in cart: 2 jackets worth $350",
    scope={"user_id": user_id},
    config={"ttl": "604800s"},  # 7 days - for cart recovery
)
print("✅ Created temporary memory (7-day TTL):")
print("   Customer has items in cart: 2 jackets worth $350")

Here's another example of manual creation, this time for a long-term fact like a customer's VIP status, which we want to retain for a full year.

In [None]:
# Create a longer-term memory
permanent_memory = client.agent_engines.memories.create(
    name=custom_engine_name,
    fact="Customer is a VIP member since 2024",
    scope={"user_id": user_id},
    config={"ttl": "31536000s"},  # 1 year - for loyalty
)

print("\n✅ Created long-term memory (1-year TTL):")
print("     Customer is a VIP member since 2024")
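
To reason about when a memory with a given TTL will disappear, you can add the TTL to its creation time. The sketch below assumes the expiration time is simply the creation time plus the TTL, which this notebook does not confirm. `expected_expiry` is a hypothetical helper, and the creation date is made up.

```python
from datetime import datetime, timedelta, timezone


def expected_expiry(create_time, ttl):
    """Return create_time plus a TTL given as an 'Ns' string (assumed semantics)."""
    if not ttl.endswith("s"):
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    return create_time + timedelta(seconds=float(ttl[:-1]))


# Made-up creation time for illustration.
created = datetime(2025, 1, 1, tzinfo=timezone.utc)

cart_expiry = expected_expiry(created, "604800s")  # 7-day cart memory
vip_expiry = expected_expiry(created, "31536000s")  # 365-day VIP memory
print("Cart memory expires:", cart_expiry.isoformat())
print("VIP memory expires:", vip_expiry.isoformat())
```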

#### Memory Management Operations

Let's explore memory management: listing, retrieving, and deleting memories.

##### List memories

The list method allows you to fetch all memories stored within a Memory Bank instance. It returns a pager object, which you can iterate through to handle large numbers of memories efficiently.

In [None]:
# List memories using pager
pager = client.agent_engines.memories.list(name=custom_engine_name)
all_memories = list(pager)

print(f"Total memories in Memory Bank: {len(all_memories)}")
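
Once the memories are in a list, ordinary Python filtering applies. The sketch below selects the facts that belong to one user. It assumes each memory exposes `fact` and a dict-like `scope` with a `user_id` key, matching the scope passed to `memories.create` earlier. `facts_for_user` is a hypothetical helper, and the stand-in objects are made up.

```python
from types import SimpleNamespace


def facts_for_user(memories, user_id):
    """Return the facts whose scope matches the given user (hypothetical helper)."""
    return [
        memory.fact
        for memory in memories
        if (memory.scope or {}).get("user_id") == user_id
    ]


# Stand-in memories for illustration; real ones come from list(pager).
sample_memories = [
    SimpleNamespace(fact="Customer wears size M", scope={"user_id": "user-a"}),
    SimpleNamespace(fact="Customer prefers wool", scope={"user_id": "user-b"}),
    SimpleNamespace(fact="Customer is a VIP member", scope={"user_id": "user-a"}),
]

user_a_facts = facts_for_user(sample_memories, "user-a")
for fact in user_a_facts:
    print(fact)
```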

##### Get a specific memory

If you know the full resource name of a memory, you can fetch its complete content directly using the get method.

In [None]:
if temporary_memory:
    retrieved_memory = client.agent_engines.memories.get(
        name=temporary_memory.response.name
    )
    print(f"   Memory Resource Name: {retrieved_memory.name}")
    print(f"   Memory created at: {retrieved_memory.create_time}")
    print(f"   Memory updated at: {retrieved_memory.update_time}")
    print(f"   Memory expires at: {retrieved_memory.expire_time}")
    print(f"   Retrieved specific memory: {retrieved_memory.fact}")
    print(f"   Scope: {retrieved_memory.scope}")

##### (Optional) Delete a memory

Finally, you can permanently delete a specific memory by its resource name using the delete method.


In [None]:
client.agent_engines.memories.delete(name=temporary_memory.response.name)
print("Memory deleted!")

## Cleaning up

It's always a best practice in cloud development to clean up resources you no longer need to avoid incurring unexpected costs. This final cell deletes the AgentEngine resources we created throughout this tutorial.


In [None]:
delete_agent_engines = True

if delete_agent_engines:
    # Delete agent engines
    client.agent_engines.delete(name=agent_engine_name, force=True)
    client.agent_engines.delete(name=custom_engine_name, force=True)
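
If the first delete call fails (for example, because the engine was already removed), the second one never runs. A more forgiving pattern is to attempt every deletion and collect the failures. This is a minimal sketch: `delete_all` is a hypothetical helper, and `fake_delete` and the resource names are stand-ins so the example runs without a cloud project.

```python
def delete_all(delete_fn, names):
    """Call delete_fn on every name and return {name: error message} for failures."""
    failures = {}
    for name in names:
        try:
            delete_fn(name)
        except Exception as exc:  # keep going so later resources are still deleted
            failures[name] = str(exc)
    return failures


deleted = []


def fake_delete(name):
    """Stand-in for a real delete call; fails for one made-up resource name."""
    if name == "engine-missing":
        raise RuntimeError("not found")
    deleted.append(name)


failures = delete_all(fake_delete, ["engine-missing", "engine-ok"])
print("Deleted:", deleted)
print("Failed:", failures)
```

In the notebook, `delete_fn` could be `lambda name: client.agent_engines.delete(name=name, force=True)`, called with `[agent_engine_name, custom_engine_name]`.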