# File Search Store Management for Rickbot

A notebook for experimenting with the `FileSearchStore` and how it can be used to manage file search in the Rickbot agent.

The best way to run this notebook is from Google Colab.

<a target="_blank" href="https://colab.research.google.com/github/derailed-dash/rickbot-adk/blob/main/notebooks/file_search_store.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

## Pre-Reqs and Notes

- File Search stores (`file_search_stores`) are a feature exclusive to the Gemini Developer API.
  - They do not work with the Vertex AI API, or with the Gen AI SDK in Vertex AI mode.
  - Therefore, do not set the `GOOGLE_CLOUD_LOCATION` or `GOOGLE_GENAI_USE_VERTEXAI` environment variables, and do not initialise Vertex AI.
- Make sure you have an up-to-date version of the `google-genai` package installed. 
  - Versions older than 1.49.0 do not support the File Search Tool.
  - You can upgrade all packages using `uv sync --upgrade`.
  - Or just `google-genai` using `uv sync --upgrade-package google-genai`
  - Or, if using `pip`: `pip install --upgrade google-genai`.
  - You can add to your `pyproject.toml` file; since we don't explicitly need it outside 
- Add your Gemini API key to Colab as a secret named `GEMINI_API_KEY`. You can then retrieve it using `userdata.get("GEMINI_API_KEY")`. The Colab cell further down also reads a `MODEL` secret in the same way.
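
The checks in this list can be automated before running the rest of the notebook. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version` and `check_prereqs` are hypothetical, and the version comparison is deliberately simple (it ignores pre-release suffixes).

```python
import os
import re
from importlib import metadata

MIN_GENAI_VERSION = (1, 49, 0)  # minimum version given in the notes above
VERTEX_ENV_VARS = ("GOOGLE_CLOUD_LOCATION", "GOOGLE_GENAI_USE_VERTEXAI")


def parse_version(version):
    """Turn '1.49.0' into (1, 49, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '1.49' compares equal to (1, 49, 0)
    return tuple(parts + [0] * (3 - len(parts)))


def check_prereqs(environ=None):
    """Return a list of human-readable problems; an empty list means all clear."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in VERTEX_ENV_VARS:
        if environ.get(name):
            problems.append(f"{name} is set; unset it to stay on the Gemini Developer API")
    try:
        installed = metadata.version("google-genai")
    except metadata.PackageNotFoundError:
        problems.append("google-genai is not installed")
    else:
        if parse_version(installed) < MIN_GENAI_VERSION:
            problems.append(f"google-genai {installed} is older than 1.49.0")
    return problems


for problem in check_prereqs():
    print(f"Warning: {problem}")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out, for example `check_prereqs({"GOOGLE_GENAI_USE_VERTEXAI": "true"})`.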

## Setup

In [None]:
import glob
import os
import time

from dotenv import load_dotenv
from google import genai
from google.genai import types
from pydantic import BaseModel


### Local Only

If running locally, set up the Google Cloud environment:

```bash
source scripts/setup-env.sh
```

Then to install the package dependencies into the virtual environment, use the `uv` tool:

1. From your agent's root directory, run `make install` to set up the virtual environment (`.venv`).
2. In this Jupyter notebook, select the kernel from the `.venv` folder to ensure all dependencies are available.

In [None]:
# Load env vars
if load_dotenv():
    print("Successfully loaded environment variables.")
else:
    print("Failed to load environment variables.")

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    raise ValueError("GEMINI_API_KEY environment variable not set.")
else:
    print("Successfully loaded Gemini API key.")

MODEL = os.getenv("MODEL")
if not MODEL:
    print("Warning: MODEL environment variable not set.")
else:
    print(f"Successfully loaded model: {MODEL}")

### Or In Colab

In [None]:
%pip install -q -U "google-genai>=1.49.0"

In [None]:
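from google.colab import userdata

# Copy the Colab secrets into the environment so genai.Client() can find the key
os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")
os.environ["MODEL"] = userdata.get("MODEL")

# Later cells use the MODEL variable, which is otherwise only set in the local-only cell
MODEL = os.environ["MODEL"]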

### Client Initialisation

In [None]:
client = genai.Client()
STORE_NAME = "rickbot-dazbo-ref" # as per personalities.yaml

## Store Management

### View All Stores

In [24]:
try:
    for a_store in client.file_search_stores.list():
        print(a_store)
except Exception as e:
    print(f"Error listing stores (check creds?): {e}")

name='fileSearchStores/demofilestore-5fayifsabstz' display_name='demo-file-store' create_time=datetime.datetime(2026, 1, 9, 22, 0, 19, 996699, tzinfo=TzInfo(0)) update_time=datetime.datetime(2026, 1, 9, 22, 0, 19, 996699, tzinfo=TzInfo(0)) active_documents_count=1 pending_documents_count=None failed_documents_count=None size_bytes=5948
name='fileSearchStores/rickbotdazboref-kw1ir7goyfuq' display_name='rickbot-dazbo-ref' create_time=datetime.datetime(2026, 1, 9, 23, 41, 30, 121403, tzinfo=TzInfo(0)) update_time=datetime.datetime(2026, 1, 9, 23, 41, 30, 121403, tzinfo=TzInfo(0)) active_documents_count=1 pending_documents_count=None failed_documents_count=None size_bytes=7422743


### Retrieve the Store

Here's a utility function to retrieve a store by its display name. Display names are not unique, so the function returns the first matching store, or `None` if nothing matches.

In [None]:
def get_store(store_name: str):
    """Retrieve a store by its display name"""
    try:
        for a_store in client.file_search_stores.list():
            if a_store.display_name == store_name:
                return a_store
    except Exception as e:
        print(f"Error in get_store path: {e}")
    return None
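
Because display names are not unique, it can be useful to see every match rather than only the first. The sketch below is an illustrative variant, not part of the notebook: `find_stores` is a hypothetical helper that takes any iterable of objects with a `display_name` attribute, and the stand-in objects replace real API results so the example runs offline.

```python
from types import SimpleNamespace


def find_stores(stores, display_name):
    """Return every store whose display_name matches, not just the first one."""
    return [store for store in stores if store.display_name == display_name]


# Stand-in objects for illustration only; the names are made up.
# In the notebook you would pass client.file_search_stores.list() instead.
fake_stores = [
    SimpleNamespace(name="fileSearchStores/example-a", display_name="demo"),
    SimpleNamespace(name="fileSearchStores/example-b", display_name="other"),
    SimpleNamespace(name="fileSearchStores/example-c", display_name="demo"),
]

matches = find_stores(fake_stores, "demo")
if len(matches) > 1:
    print(f"Warning: {len(matches)} stores share the display name 'demo':")
    for store in matches:
        print(f"  {store.name}")
```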

### Create the Store (One Time)

Once you've created the store, save its ID (the `name` value printed by the cell below) for use in your application.

In [None]:
if not get_store(STORE_NAME):
    file_search_store = client.file_search_stores.create(config={"display_name": STORE_NAME})
    print(f"Created store: {file_search_store.name}")
else:
    print(f"Store {STORE_NAME} already exists.")
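
One way to keep the store ID around is to write it to a small local file keyed by display name. This is only a sketch under assumptions: the JSON file and the helper names `save_store_id` and `load_store_id` are hypothetical, and your application may well hold this value elsewhere (for example in its own configuration).

```python
import json
from pathlib import Path


def save_store_id(path, display_name, store_id):
    """Record the store ID under its display name, keeping any existing entries."""
    path = Path(path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data[display_name] = store_id
    path.write_text(json.dumps(data, indent=2))


def load_store_id(path, display_name):
    """Return the saved store ID, or None if the file or the entry is missing."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(display_name)
```

In the notebook this could be called as `save_store_id("file_search_stores.json", STORE_NAME, file_search_store.name)` straight after creation; the file name here is an arbitrary choice.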

### View the Store

We can interrogate a store and see what files have been uploaded to it.

In [23]:
file_search_store = get_store(STORE_NAME)
if not file_search_store:
    print(f"Store {STORE_NAME} not found.")
else:
    print(file_search_store)

    # List all documents in the store
    # The 'parent' argument is the resource name of the store
    docs = client.file_search_stores.documents.list(parent=file_search_store.name)
    try:
        doc_list = list(docs)
        print(f"Docs in {STORE_NAME}: {len(doc_list)}")

        if not doc_list:
            print("No documents found in the store.")
        else:
            for i, doc in enumerate(doc_list):
                section_heading = f"Document {i}:"
                print("-" * len(section_heading))
                print(section_heading)
                print("-" * len(section_heading))
                print(f"  Display name:{doc.display_name}")
                print(f"  ID: {doc.name}")
                print(f"  Metadata: {doc.custom_metadata}")
    except Exception as e:
        print(f"Error listing docs (might be empty): {e}")

name='fileSearchStores/rickbotdazboref-kw1ir7goyfuq' display_name='rickbot-dazbo-ref' create_time=datetime.datetime(2026, 1, 9, 23, 41, 30, 121403, tzinfo=TzInfo(0)) update_time=datetime.datetime(2026, 1, 9, 23, 41, 30, 121403, tzinfo=TzInfo(0)) active_documents_count=1 pending_documents_count=None failed_documents_count=None size_bytes=7422743
Docs in rickbot-dazbo-ref: 1
-----------
Document 0:
-----------
  Display name:Sequencing Cloud Migration to Reduce Cost: What to Migrate and When
  ID: fileSearchStores/rickbotdazboref-kw1ir7goyfuq/documents/sequencing-cloud-migration--bnrw4u8z5bf4
  Metadata: [CustomMetadata(
  key='title',
  string_value='Sequencing Cloud Migration to Reduce Cost: What to Migrate and When'
), CustomMetadata(
  key='file_name',
  string_value='8c - Sequencing Cloud Migration to Reduce Cost_ What to Migrate and When _ by Dazbo (Darren Lester) _ Google Cloud - Community _ Mar, 2025 _ Medium.pdf'
), CustomMetadata(
  key='author',
  string_value='Dazbo (Darren L
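
The metadata is printed above as a list of `CustomMetadata` entries, which is awkward to read or look up by key. A minimal sketch of flattening it into a dictionary follows. `metadata_to_dict` is a hypothetical helper; it relies on the `key` and `string_value` attributes shown in the output, and treats `numeric_value` as an assumed fallback for non-string entries.

```python
from types import SimpleNamespace


def metadata_to_dict(custom_metadata):
    """Flatten a list of key/value metadata entries into a plain dict."""
    result = {}
    for item in custom_metadata or []:
        value = getattr(item, "string_value", None)
        if value is None:
            # Assumption: non-string entries expose numeric_value instead
            value = getattr(item, "numeric_value", None)
        result[item.key] = value
    return result


# Stand-in entries for illustration; in the notebook, pass doc.custom_metadata
example = [
    SimpleNamespace(key="title", string_value="An Example Title"),
    SimpleNamespace(key="author", string_value="An Example Author"),
]
print(metadata_to_dict(example))
```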

### Delete Store(s)


In [None]:
# First, point to the right store. For example:
store_to_delete = get_store("rickbot-adk-file-search-store")

# Delete the store
if store_to_delete:
    print(f"Deleting: {store_to_delete.name}")
    # Uncomment to delete
    # client.file_search_stores.delete(name=store_to_delete.name, config={'force': True})
else:
    print("Store not found.")
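
Rather than commenting and uncommenting the delete call, the same guard can be expressed as a dry-run flag. This is a sketch only: `delete_store` is a hypothetical wrapper, and it makes the same `client.file_search_stores.delete(...)` call as the cell above only when `dry_run=False`.

```python
def delete_store(client, store, dry_run=True):
    """Delete a store, or just report what would be deleted when dry_run is True."""
    if store is None:
        print("Store not found.")
        return False
    if dry_run:
        print(f"[dry run] Would delete: {store.name}")
        return False
    # force=True also removes the documents held in the store
    client.file_search_stores.delete(name=store.name, config={"force": True})
    print(f"Deleted: {store.name}")
    return True
```

Calling `delete_store(client, store_to_delete)` is then always safe, and `delete_store(client, store_to_delete, dry_run=False)` performs the deletion.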

## Upload and Process Files

Place the files you want to upload in a local folder, and point `UPLOAD_PATH` at it.

In [None]:
# UPLOAD_PATH = "/content/upload-files/"
UPLOAD_PATH = "../scratch/"

Create some utility classes and functions:

In [None]:
class DocumentMetadata(BaseModel):
    """Metadata for a document"""    
    title: str
    author: str
    abstract: str

def delete_doc(doc):
    """
    Delete a document from its file search store.
    The document's resource name already includes its store,
    so the store name does not need to be passed separately.
    """
    print(f"♻️  Deleting duplicate: '{doc.display_name}' (ID: {doc.name})")
    client.file_search_stores.documents.delete(name=doc.name, config={"force": True})
    time.sleep(2)  # small throttle and allow propagation

def generate_metadata(file_name: str, temp_file) -> DocumentMetadata:
    """Generate metadata for a document"""

    print(f"Extracting metadata from {file_name}...")
    response = client.models.generate_content(
        model=MODEL,
        contents=[
            """Please extract title, author, and short abstract from this document. 
            Each value should be under 200 characters.

            Abstracts should be succinct and NOT include preamble text like `This document describes...`

            Example bad abstract: 
            Now I want to cover a key consideration that can potentially 
            save you more in future IT spend than any other decision you can make: 
            embracing open source as a core element of your cloud strategy.

            Example good abstract:
            How you can significantly reduce IT spend by embracing open source
            as a core component of your cloud strategy.

            Example bad abstract:
            This article discusses how you can design your cloud landing zone.

            Example good abstract:
            How to design your cloud landing zone according to best practices.
            """,
            temp_file,
        ],
        config={
            "response_mime_type": "application/json",
            "response_schema": DocumentMetadata,
        },
    )

    metadata: DocumentMetadata = response.parsed
    print(f"Title: {metadata.title}")
    print(f"Author: {metadata.author}")
    print(f"Abstract: {metadata.abstract}")

    return metadata

def upload_doc(file_path, file_search_store):
    """Upload a document to the file search store"""

    file_name = os.path.basename(file_path)

    print(f"Uploading {file_name} for metadata extraction...")
    temp_file = client.files.upload(file=file_path)

    # Verify file is active (ready for inference)
    while temp_file.state.name == "PROCESSING":
        print("Still processing...", end="\r")
        time.sleep(2)
        temp_file = client.files.get(name=temp_file.name)

    if temp_file.state.name != "ACTIVE":
        raise RuntimeError(f"File upload failed with state: {temp_file.state.name}")

    # Now let's check if this is a replacement of an existing file
    # If so, we should delete the existing entry first
    # Iterate through all docs in the store
    for doc in client.file_search_stores.documents.list(parent=file_search_store.name):
        should_delete = False

        # Match by Display Name
        if doc.display_name == file_name:
            should_delete = True

        # Match by Custom Metadata
        # This catches docs where display_name was set to the Title
        elif doc.custom_metadata:
            for meta in doc.custom_metadata:
                if meta.key == "file_name" and meta.string_value == file_name:
                    should_delete = True
                    break

        if should_delete:
            delete_doc(doc)

    metadata = generate_metadata(file_name, temp_file)

    # Import the file into the file search store with custom metadata
    operation = client.file_search_stores.upload_to_file_search_store(
        file_search_store_name=file_search_store.name,
        file=file_path,
        config={
            "display_name": metadata.title,  # show the extracted title rather than the file name
            # 'chunking_config' : chunking_config["chunking_config"],
            "custom_metadata": [
                {"key": "title", "string_value": metadata.title},
                {"key": "file_name", "string_value": file_name},
                {"key": "author", "string_value": metadata.author},
                {"key": "abstract", "string_value": metadata.abstract},
            ],
        },
    )

    # Wait until import is complete
    while not operation.done:
        time.sleep(5)
        print("Still importing...")
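        operation = client.operations.get(operation)

    # A finished operation may still have failed, so check before reporting success
    if getattr(operation, "error", None):
        raise RuntimeError(f"Import of {file_name} failed: {operation.error}")
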
    print(f"{file_name} successfully uploaded and indexed")
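
Both polling loops in `upload_doc` wait indefinitely. If that is a concern, the waiting can be factored into a helper with a timeout. The following is a generic sketch, not tied to the Gemini API: `wait_until` and its parameters are hypothetical names, and `sleep` and `clock` are injectable only to make the helper easy to test.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"Gave up waiting after {timeout_s} seconds")
        sleep(interval_s)
```

In `upload_doc`, `check` could be a small function that refreshes the file with `client.files.get(name=temp_file.name)` and returns it once its state is no longer `PROCESSING`, returning `None` otherwise.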

Now actually **upload and process our documents**:

In [None]:
file_search_store = get_store(STORE_NAME)
if file_search_store is None:
    print(f"Store {STORE_NAME} not found.")
else:
    print(f"Uploading files to {file_search_store.name}...")
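    # Join the path properly and skip sub-directories, which cannot be uploaded
    files_to_upload = [p for p in glob.glob(os.path.join(UPLOAD_PATH, "*")) if os.path.isfile(p)]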
    if files_to_upload:
        for file_path in files_to_upload:
            print(f"Uploading {file_path}")
            upload_doc(file_path, file_search_store)
        print("Upload complete.")
    else:
        print(f"No files found in {UPLOAD_PATH}")
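
As written, an exception raised for one file stops the whole batch. A possible alternative is to collect failures and carry on, then report them at the end. This is a sketch under assumptions: `upload_all` is a hypothetical helper, and `upload_fn` stands for any callable that uploads a single path.

```python
def upload_all(file_paths, upload_fn):
    """Try every file; return (succeeded, failed) instead of stopping at the first error."""
    succeeded, failed = [], []
    for path in file_paths:
        try:
            upload_fn(path)
        except Exception as exc:  # broad on purpose: any failure is recorded, not raised
            failed.append((path, str(exc)))
        else:
            succeeded.append(path)
    return succeeded, failed
```

In this notebook the call might look like `upload_all(files_to_upload, lambda p: upload_doc(p, file_search_store))`, followed by printing each entry in `failed`.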

## Verify with Query

Now that the data is uploaded, let's verify we can retrieve it using the File Search Tool.

In [None]:
# Retrieve the store again to be sure
store = get_store(STORE_NAME)
question = """Give me a brief 4 step plan to optimise migration to cloud, 
achieving the fastest ROI and lowest overall TCO"""

if store:
    print(f"Querying store: {store.name} ({store.display_name})")

    try:
        # Use the File Search Tool
        if hasattr(types, "FileSearch"):
            print("FileSearch tool config...")
            response = client.models.generate_content(
                model=MODEL,
                contents=question,
                config=types.GenerateContentConfig(
                    tools=[types.Tool(file_search=types.FileSearch(file_search_store_names=[store.name]))]
                ),
            )
            print("\nResponse:")
            print(response.text)
        else:
            print("types.FileSearch not found. Skipping in-notebook query verification.")

    except Exception as e:
        print(f"Query failed: {e}")

else:
    print("Store not found, cannot verify.")
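
To confirm that an answer actually drew on the store, it can help to list the documents it cited. The sketch below assumes the response exposes `candidates[...].grounding_metadata.grounding_chunks[...].retrieved_context.title`; that attribute path is an assumption about the SDK's response shape rather than something shown in this notebook, so every lookup is guarded. `cited_titles` is a hypothetical helper name.

```python
from types import SimpleNamespace


def cited_titles(response):
    """Collect the distinct source titles found in a response's grounding data."""
    titles = []
    for candidate in getattr(response, "candidates", None) or []:
        grounding = getattr(candidate, "grounding_metadata", None)
        # Assumed attribute path; each getattr falls back to None if it is absent
        for chunk in getattr(grounding, "grounding_chunks", None) or []:
            context = getattr(chunk, "retrieved_context", None)
            title = getattr(context, "title", None)
            if title and title not in titles:
                titles.append(title)
    return titles


# Stand-in response for illustration; in the notebook, pass the real response
chunk = SimpleNamespace(retrieved_context=SimpleNamespace(title="An Example Title"))
fake_response = SimpleNamespace(
    candidates=[SimpleNamespace(grounding_metadata=SimpleNamespace(grounding_chunks=[chunk, chunk]))]
)
print(cited_titles(fake_response))
```

An empty list from a real response would suggest that the model answered without retrieving anything from the store.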