# Gemini Document Verification Demo

This notebook demonstrates the core Gemini AI functionalities used in the Content Verification Tool:

1. **Creating a Corpus** (File Search Store)
2. **Uploading Reference Documents** with AI-generated metadata
3. **Querying the Corpus** for information
4. **Verification Layer** - verifying document chunks against the corpus

## Prerequisites

```bash
pip install google-genai python-dotenv
```

## API Key Setup (choose one method)

**Option 1: .env file** (recommended)
```bash
echo "GEMINI_API_KEY=your_api_key_here" >> .env
```

**Option 2: Environment variable**
```bash
export GEMINI_API_KEY="your_api_key_here"
```

**Option 3: Set directly in notebook** (see next cell)

## Setup: Import Libraries and Initialize Client

In [1]:
import os
import time
import json
from pathlib import Path
from dotenv import load_dotenv
from google import genai
from google.genai import types

# Load environment variables from .env file (if it exists)
load_dotenv()

# Get API key from .env, environment variable, or set manually below
api_key = os.getenv("GEMINI_API_KEY")

# OPTION 3: Set API key manually (uncomment and replace with your key)
# api_key = "your_api_key_here"

if not api_key:
    raise ValueError(
        "GEMINI_API_KEY not found. Please set it using one of these methods:\n"
        "  1. Create a .env file with: GEMINI_API_KEY=your_key\n"
        "  2. Set environment variable: export GEMINI_API_KEY=your_key\n"
        "  3. Uncomment the line above and set api_key directly"
    )

# Initialize Gemini client
client = genai.Client(api_key=api_key)
print("‚úì Gemini client initialized successfully")

‚úì Gemini client initialized successfully


## Part 1: Creating a Corpus (File Search Store)

A **File Search Store** acts as a corpus - a searchable knowledge base of reference documents.

Key features:
- Automatically chunks and indexes documents
- Generates embeddings for semantic search
- Supports metadata filtering
- **Free** storage and query-time embedding (only pay for initial indexing)

In [2]:
# Create a File Search store (corpus)
store_name_display = f"Demo Corpus - {int(time.time())}"

print(f"Creating File Search store: {store_name_display}")
store = client.file_search_stores.create(
    config={'display_name': store_name_display}
)

print(f"\n‚úì Store created successfully!")
print(f"  Store ID: {store.name}")
print(f"  Display Name: {store.display_name}")

# Save store name for later use
CORPUS_STORE_NAME = store.name

Creating File Search store: Demo Corpus - 1763268225

‚úì Store created successfully!
  Store ID: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt
  Display Name: Demo Corpus - 1763268225


## Part 2: Uploading Documents with AI Metadata

We'll:
1. Create a sample reference document
2. Use **Gemini Flash Lite** to generate metadata (summary, keywords, document type)
3. Upload to the File Search store with metadata

In [3]:
# Create a sample reference document
sample_doc_content = """SERVICE AGREEMENT

This Service Agreement ("Agreement") is entered into on January 15, 2024, between:

Company A ("Provider") - 123 Main Street, San Francisco, CA 94102
Company B ("Client") - 456 Market Street, San Francisco, CA 94103

TERMS AND CONDITIONS:

1. Services: Provider agrees to deliver software development services as specified in the Statement of Work.

2. Payment Terms: Client agrees to pay $150,000 for the services, payable in three installments:
   - $50,000 upon signing
   - $50,000 at project midpoint
   - $50,000 upon completion

3. Timeline: The project shall commence on February 1, 2024, and be completed by June 30, 2024.

4. Intellectual Property: All work product shall become the property of the Client upon final payment.

5. Confidentiality: Both parties agree to maintain confidentiality of proprietary information.

Signed:
John Smith, CEO, Company A
Jane Doe, CEO, Company B
"""

# Save to temporary file
import tempfile
temp_file = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False, encoding='utf-8')
temp_file.write(sample_doc_content)
temp_file.close()

print("‚úì Sample reference document created")
print(f"  Content preview: {sample_doc_content[:100]}...")

‚úì Sample reference document created
  Content preview: SERVICE AGREEMENT

This Service Agreement ("Agreement") is entered into on January 15, 2024, between...


### Step 2.1: Generate Metadata with Gemini Flash Lite

We use **gemini-2.5-flash-lite** for fast, cost-effective metadata generation.

In [4]:
from pydantic import BaseModel, Field

GEMINI_METADATA_MODEL = "gemini-2.5-flash-lite"
CASE_CONTEXT = (
    "This is a contract verification case for software development services between two companies."
)
METADATA_PROMPT = f"""Analyze this document in the context of: {CASE_CONTEXT}

Provide a JSON response with the following fields:
- summary: A 2-3 sentence summary of the document
- contextualization: How this document relates to the case context
- document_type: The type of document (e.g., contract, invoice, receipt, report)
- keywords: A list of 5-10 key terms or concepts from the document

Return only valid JSON, no markdown formatting."""

class DocumentMetadata(BaseModel):
    summary: str = Field(description="2-3 sentence summary of the document")
    contextualization: str = Field(description="Relationship to the provided case context")
    document_type: str = Field(description="Document classification (e.g., contract, invoice)")
    keywords: list[str] = Field(description="Key concepts extracted from the document")

metadata: dict[str, str | list[str]] | None = None
uploaded_file = None

try:
    print("Uploading file to Gemini for metadata extraction...")
    uploaded_file = client.files.upload(file=temp_file.name)

    while uploaded_file.state == "PROCESSING":
        print("  Processing upload...")
        time.sleep(1)
        uploaded_file = client.files.get(name=uploaded_file.name)

    print(f"‚úì File processed successfully: {uploaded_file.state}")

    print("Requesting structured metadata response...")
    response_best = client.models.generate_content(
        model=GEMINI_METADATA_MODEL,
        contents=[uploaded_file, METADATA_PROMPT],
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=DocumentMetadata,
        ),
    )

    print(f"Raw response length: {len(response_best.text)} characters")
    has_parsed = hasattr(response_best, "parsed") and response_best.parsed is not None
    print(f"Parsed attribute available: {has_parsed}")

    metadata_model = (
        response_best.parsed
        if has_parsed
        else DocumentMetadata(**json.loads(response_best.text))
    )
    metadata = metadata_model.model_dump()

    print("‚úì Metadata parsed, validated, and serialized to dict")
    print(f"  Document Type: {metadata['document_type']}")
    summary_preview = (
        metadata["summary"][:100] + "..."
        if len(metadata["summary"]) > 100
        else metadata["summary"]
    )
    keywords_preview = ", ".join(metadata["keywords"][:5]) if metadata["keywords"] else "n/a"
    print(f"  Summary Preview: {summary_preview}")
    print(f"  Keywords: {keywords_preview}")
finally:
    if uploaded_file is not None:
        print("Cleaning up uploaded file from Gemini File API...")
        client.files.delete(name=uploaded_file.name)

print("Metadata dictionary ready for downstream cells.")

Uploading file to Gemini for metadata extraction...
‚úì File processed successfully: FileState.ACTIVE
Requesting structured metadata response...
Raw response length: 907 characters
Parsed attribute available: True
‚úì Metadata parsed, validated, and serialized to dict
  Document Type: Service Agreement
  Summary Preview: This Service Agreement outlines the terms for software development services between Company A (Provi...
  Keywords: Service Agreement, Software Development, Company A, Company B, Payment Terms
Cleaning up uploaded file from Gemini File API...
Metadata dictionary ready for downstream cells.


### Step 2.2: Upload to File Search Store with Metadata

In [5]:
# Create custom metadata for File Search
custom_metadata = [
    types.CustomMetadata(
        key="summary",
        string_value=metadata['summary'][:256]  # Limit to 256 chars (API max)
    ),
    types.CustomMetadata(
        key="document_type",
        string_value=metadata['document_type']
    ),
    types.CustomMetadata(
        key="keywords",
        string_list_value=types.StringList(values=metadata['keywords'][:10])
    )
]

# Upload file directly to File Search store
# Note: This SDK version expects a file path, not a File object
print("\nAdding file to File Search store...")
operation = client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=CORPUS_STORE_NAME,
    file=temp_file.name,  # Pass the file path
    config=types.UploadToFileSearchStoreConfig(
        custom_metadata=custom_metadata,
        display_name='Service Agreement - Company A & B'
    )
)

print(f"Upload operation started: {operation.name}")

# Wait for indexing (simplified - just wait without polling)
print("Indexing document (this may take a few seconds)...")
time.sleep(10)  # Give it time to index

print("\n‚úì Document upload initiated!")
print(f"  Store: {CORPUS_STORE_NAME}")
print(f"  Operation: {operation.name}")
print("\nNote: The file is being indexed in the background.")
print("It should be available for queries within 30-60 seconds.")

# Clean up temp file
Path(temp_file.name).unlink()
print("\n‚úì Temporary file cleaned up")


Adding file to File Search store...
Upload operation started: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt/upload/operations/service-agreement-company-a-8hnxmbbox10g
Indexing document (this may take a few seconds)...

‚úì Document upload initiated!
  Store: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt
  Operation: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt/upload/operations/service-agreement-company-a-8hnxmbbox10g

Note: The file is being indexed in the background.
It should be available for queries within 30-60 seconds.

‚úì Temporary file cleaned up


## Part 3: Querying the Corpus

Now that we have a corpus with indexed documents, let's query it using Gemini's File Search capability.

In [6]:
# Define queries to test
queries = [
    "What is the payment amount mentioned in the contract?",
    "When does the project start and end?",
    "Who are the parties involved in this agreement?",
    "What are the payment terms?"
]

# Configure File Search tool
file_search_tool = types.Tool(
    file_search=types.FileSearch(
        file_search_store_names=[CORPUS_STORE_NAME]
    )
)

print("\n" + "=" * 80)
print("QUERYING THE CORPUS")
print("=" * 80)

for i, query in enumerate(queries, 1):
    print(f"\nüìù Query {i}: {query}")
    print("-" * 80)
    
    # Query with File Search
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=query,
        config=types.GenerateContentConfig(
            tools=[file_search_tool]
        )
    )
    
    print(f"\nüí° Answer: {response.text}\n")
    
    # Show grounding sources if available
    if hasattr(response, 'candidates') and response.candidates:
        candidate = response.candidates[0]
        if hasattr(candidate, 'grounding_metadata') and candidate.grounding_metadata:
            if hasattr(candidate.grounding_metadata, 'grounding_chunks'):
                chunks = candidate.grounding_metadata.grounding_chunks
                if chunks:
                    print(f"üìö Sources ({len(chunks)} citations):")
                    for j, chunk in enumerate(chunks[:3], 1):  # Show first 3
                        if hasattr(chunk, 'document') and chunk.document:
                            title = getattr(chunk.document, 'title', 'Document')
                            print(f"  {j}. {title}")
                        if hasattr(chunk, 'content') and hasattr(chunk.content, 'text'):
                            excerpt = chunk.content.text[:150]
                            print(f"     \"{excerpt}...\"")


QUERYING THE CORPUS

üìù Query 1: What is the payment amount mentioned in the contract?
--------------------------------------------------------------------------------

üí° Answer: The contract, a Service Agreement between Company A and Company B, states that the client agrees to pay $150,000 for the services. This payment is to be made in three installments of $50,000 each: one upon signing, one at the project midpoint, and the final one upon completion.

üìö Sources (1 citations):

üìù Query 2: When does the project start and end?
--------------------------------------------------------------------------------

üí° Answer: The project is scheduled to begin on February 1, 2024, and is expected to conclude by June 30, 2024.

üìö Sources (1 citations):

üìù Query 3: Who are the parties involved in this agreement?
--------------------------------------------------------------------------------

üí° Answer: The parties involved in this agreement are Company A, which is referred 

## Part 4: Verification Layer

The **verification layer** checks whether specific claims/chunks from a target document are supported by the corpus.

This is the core functionality of the Document Verification Assistant.

### Step 4.1: Define Chunks to Verify

These are statements from a document we want to verify against our corpus.

In [7]:
# Define chunks to verify
chunks_to_verify = [
    {
        "page": 1,
        "item": "1",
        "text": "The contract was signed on January 15, 2024."
    },
    {
        "page": 1,
        "item": "2",
        "text": "Client agrees to pay $150,000 for the services."
    },
    {
        "page": 1,
        "item": "3",
        "text": "The project timeline is from February 1 to June 30, 2024."
    },
    {
        "page": 1,
        "item": "4",
        "text": "The parties agree to a 90-day warranty period."  # FALSE - not in document
    },
    {
        "page": 1,
        "item": "5",
        "text": "All work product becomes the property of the Client upon final payment."
    }
]

print(f"üìã Defined {len(chunks_to_verify)} chunks for verification")
for chunk in chunks_to_verify:
    print(f"  ‚Ä¢ Page {chunk['page']}, Item {chunk['item']}: {chunk['text'][:60]}...")

üìã Defined 5 chunks for verification
  ‚Ä¢ Page 1, Item 1: The contract was signed on January 15, 2024....
  ‚Ä¢ Page 1, Item 2: Client agrees to pay $150,000 for the services....
  ‚Ä¢ Page 1, Item 3: The project timeline is from February 1 to June 30, 2024....
  ‚Ä¢ Page 1, Item 4: The parties agree to a 90-day warranty period....
  ‚Ä¢ Page 1, Item 5: All work product becomes the property of the Client upon fin...


### Step 4.2: Verify Each Chunk Against Corpus

We use **Gemini 2.5 Flash** with File Search to:
1. Check if the content is supported by the corpus
2. Assign a confidence score (1-10)
3. Provide source citations
4. Explain the reasoning

In [None]:
"""
FIXED VERIFICATION CODE - Replace cell-19 in gemini_demo.ipynb with this

ROOT CAUSE IDENTIFIED:
Using `response_schema` (Pydantic) with tools (File Search) causes the model
to BYPASS the File Search tool entirely! This is why grounding citations are
never returned.

SOLUTION:
- Remove `response_schema` parameter
- Use `response_mime_type="application/json"` ONLY
- Manually parse and validate the JSON response
"""

import json
from pydantic import BaseModel, Field, ValidationError

# Define Pydantic schema for validation (NOT for API)
class VerificationResult(BaseModel):
    verified: bool = Field(description="True if content is found/supported in reference docs")
    verification_score: int = Field(description="Confidence level 1-10", ge=1, le=10)
    verification_source: str = Field(description="Citation with document name and location")
    verification_note: str = Field(description="Brief explanation of reasoning")

print("\n" + "=" * 80)
print("VERIFICATION LAYER - Verifying Chunks Against Corpus (FIXED)")
print("=" * 80)

verification_results = []

for chunk in chunks_to_verify:
    print(f"\n{'='*80}")
    print(f"üìÑ Page {chunk['page']}, Item {chunk['item']}")
    print(f"üìù Chunk: \"{chunk['text']}\"")
    print(f"{'='*80}")

    # Build verification prompt with explicit JSON schema
    case_context = """Service Agreement between Company A and Company B."""

    verification_prompt = f"""You are a document verification assistant with access to reference documents.

## CONTEXT: 

{case_context}

## TASK:

Verify if the following statement is supported by the reference documents.

## STATEMENT:
"{chunk['text']}"

INSTRUCTIONS:
1. Search the reference documents for information about this statement
2. If you find supporting evidence, mark verified=true with high confidence (7-10)
3. If you find contradicting evidence, mark verified=false and explain
4. If you find no relevant information, mark verified=false with low confidence (1-3)

REQUIRED JSON OUTPUT FORMAT:
{{
  "verified": boolean,
  "verification_score": integer (1-10),
  "verification_source": "citation or 'No match found'",
  "verification_note": "brief explanation"
}}

Provide ONLY the JSON object, no other text."""

    # Verify with Gemini Flash + File Search
    # CRITICAL: Do NOT use response_schema - it disables File Search!
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=verification_prompt,
        config=types.GenerateContentConfig(
            temperature=0.1,
            # response_mime_type="application/json",  # Forces JSON format
            # NO response_schema! This is what breaks File Search!
            tools=[file_search_tool]
        )
    )

    # Parse and validate response
    try:
        # Clean response text (may have preamble or markdown)
        response_text = response.text.strip()

        # Remove common preambles
        if response_text.startswith("Here is") or response_text.startswith("```"):
            # Find JSON block
            if "```json" in response_text:
                start = response_text.find("```json") + 7
                end = response_text.find("```", start)
                response_text = response_text[start:end].strip()
            elif response_text.startswith("```"):
                lines = response_text.split('\n')
                response_text = '\n'.join(lines[1:-1])  # Remove first and last lines
            else:
                # Remove first line if it's preamble
                lines = response_text.split('\n')
                response_text = '\n'.join(lines[1:])

        # Parse JSON
        result_dict = json.loads(response_text)

        # Validate with Pydantic
        validated_result = VerificationResult(**result_dict)
        result = {
            "verified": validated_result.verified,
            "verification_score": validated_result.verification_score,
            "verification_source": validated_result.verification_source,
            "verification_note": validated_result.verification_note
        }

    except (json.JSONDecodeError, ValidationError) as e:
        print(f"\n‚ö†Ô∏è  WARNING: Failed to parse/validate response")
        print(f"Response text preview: {response.text[:300]}...")
        print(f"Error: {e}")
        result = {
            "verified": False,
            "verification_score": 1,
            "verification_source": "Parse error",
            "verification_note": f"Failed to parse: {str(e)}"
        }

    # Extract grounding citations from API metadata
    actual_citations = []
    if hasattr(response, 'candidates') and response.candidates:
        candidate = response.candidates[0]
        if hasattr(candidate, 'grounding_metadata') and candidate.grounding_metadata:
            if hasattr(candidate.grounding_metadata, 'grounding_chunks'):
                for grounding_chunk in candidate.grounding_metadata.grounding_chunks:
                    citation = {}

                    # File Search uses retrieved_context
                    if hasattr(grounding_chunk, 'retrieved_context'):
                        ctx = grounding_chunk.retrieved_context
                        citation["title"] = getattr(ctx, 'title', 'Document')
                        citation["excerpt"] = getattr(ctx, 'text', '')[:300]
                    # Fallback to document attribute
                    elif hasattr(grounding_chunk, 'document') and grounding_chunk.document:
                        citation["title"] = getattr(grounding_chunk.document, 'title', 'Document')
                        if hasattr(grounding_chunk, 'content') and hasattr(grounding_chunk.content, 'text'):
                            citation["excerpt"] = grounding_chunk.content.text[:300]

                    if citation:
                        actual_citations.append(citation)

    # Add to results
    result['chunk'] = chunk
    result['actual_citations'] = actual_citations
    verification_results.append(result)

    # Display result
    verified_icon = "‚úÖ" if result.get('verified', False) else "‚ùå"
    print(f"\n{verified_icon} VERIFIED: {result.get('verified', False)}")
    print(f"üìä Confidence Score: {result.get('verification_score', 0)}/10")
    print(f"üìö Source: {result.get('verification_source', 'N/A')}")
    print(f"üí≠ Note: {result.get('verification_note', 'N/A')}")

    if actual_citations:
        print(f"\nüîó Grounding Citations ({len(actual_citations)}):")
        for i, citation in enumerate(actual_citations[:2], 1):  # Show first 2
            print(f"  {i}. {citation.get('title', 'Unknown')}")
            if 'excerpt' in citation:
                excerpt = citation['excerpt'][:150]
                print(f"     \"{excerpt}...\"")
    else:
        print(f"\n‚ö†Ô∏è WARNING: No grounding citations - File Search may not have been used")

    # Small delay to avoid rate limits
    time.sleep(0.5)

print("\n" + "=" * 80)
print("VERIFICATION COMPLETE")
print("=" * 80)

# Calculate and display statistics
total_chunks = len(verification_results)
verified_count = sum(1 for r in verification_results if r['verified'])
total_citations = sum(len(r['actual_citations']) for r in verification_results)

print(f"\nüìä Summary:")
print(f"  Total Chunks: {total_chunks}")
print(f"  Verified: {verified_count}")
print(f"  Total Grounding Citations: {total_citations}")

if total_citations == 0:
    print(f"\n‚ùå PROBLEM: No grounding citations found!")
    print(f"   This means File Search was not used.")
else:
    print(f"\n‚úÖ SUCCESS: File Search is working ({total_citations} citations)")



VERIFICATION LAYER - Verifying Chunks Against Corpus (FIXED)

üìÑ Page 1, Item 1
üìù Chunk: "The contract was signed on January 15, 2024."

‚úÖ VERIFIED: True
üìä Confidence Score: 10/10
üìö Source: cite: 1
üí≠ Note: The Service Agreement explicitly states that it was entered into on January 15, 2024.

üîó Grounding Citations (1):
  1. Service Agreement - Company A & B
     "SERVICE AGREEMENT

This Service Agreement ("Agreement") is entered into on January 15, 2024, between:

Company A ("Provider") - 123 Main Street, San F..."

üìÑ Page 1, Item 2
üìù Chunk: "Client agrees to pay $150,000 for the services."

‚úÖ VERIFIED: True
üìä Confidence Score: 10/10
üìö Source: cite: 1
üí≠ Note: The Service Agreement explicitly states that the Client agrees to pay $150,000 for the services.

üîó Grounding Citations (1):
  1. Service Agreement - Company A & B
     "SERVICE AGREEMENT

This Service Agreement ("Agreement") is entered into on January 15, 2024, between:

Company A ("Provider

In [None]:
# Calculate statistics
total_chunks = len(verification_results)
verified_count = sum(1 for r in verification_results if r['verified'])
unverified_count = total_chunks - verified_count
avg_score = sum(r['verification_score'] for r in verification_results) / total_chunks

print("\n" + "=" * 80)
print("üìä VERIFICATION SUMMARY")
print("=" * 80)
print(f"\nTotal Chunks: {total_chunks}")
print(f"‚úÖ Verified: {verified_count} ({verified_count/total_chunks*100:.1f}%)")
print(f"‚ùå Unverified: {unverified_count} ({unverified_count/total_chunks*100:.1f}%)")
print(f"üìä Average Confidence: {avg_score:.1f}/10")

print("\n" + "-" * 80)
print("BREAKDOWN BY CHUNK:")
print("-" * 80)
for result in verification_results:
    chunk = result['chunk']
    icon = "‚úÖ" if result['verified'] else "‚ùå"
    print(f"{icon} Page {chunk['page']}, Item {chunk['item']} - Score: {result['verification_score']}/10")
    print(f"   \"{chunk['text'][:70]}...\"")
    print()


üìä VERIFICATION SUMMARY

Total Chunks: 5
‚úÖ Verified: 4 (80.0%)
‚ùå Unverified: 1 (20.0%)
üìä Average Confidence: 8.4/10

--------------------------------------------------------------------------------
BREAKDOWN BY CHUNK:
--------------------------------------------------------------------------------
‚úÖ Page 1, Item 1 - Score: 10/10
   "The contract was signed on January 15, 2024...."

‚úÖ Page 1, Item 2 - Score: 10/10
   "Client agrees to pay $150,000 for the services...."

‚úÖ Page 1, Item 3 - Score: 10/10
   "The project timeline is from February 1 to June 30, 2024...."

‚ùå Page 1, Item 4 - Score: 2/10
   "The parties agree to a 90-day warranty period...."

‚úÖ Page 1, Item 5 - Score: 10/10
   "All work product becomes the property of the Client upon final payment..."



## Part 5: Export Verification Results

In [10]:
# Export to JSON
output_data = {
    "corpus_store": CORPUS_STORE_NAME,
    "case_context": case_context,
    "total_chunks": total_chunks,
    "verified_chunks": verified_count,
    "average_confidence": avg_score,
    "results": verification_results
}

output_file = "verification_results.json"
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(output_data, f, indent=2, ensure_ascii=False)

print(f"‚úì Verification results exported to: {output_file}")

‚úì Verification results exported to: verification_results.json


## Cleanup: Delete File Search Store

**Important:** File Search stores are free, but you may want to clean up test stores.

In [11]:
"""
FIXED CLEANUP CELL - Properly delete File Search store

ERROR: Cannot delete non-empty FileSearchStore
SOLUTION: Delete all documents first, then delete the store
"""

print("\n" + "=" * 80)
print("CLEANUP - Deleting File Search Store")
print("=" * 80)

try:
    # Step 1: List all documents in the store
    print(f"\nüìã Listing documents in store: {CORPUS_STORE_NAME}")

    try:
        documents = list(client.file_search_stores.list_documents(
            file_search_store_name=CORPUS_STORE_NAME
        ))
        print(f"   Found {len(documents)} document(s)")

        # Step 2: Delete each document
        if documents:
            print(f"\nüóëÔ∏è  Deleting documents...")
            for i, doc in enumerate(documents, 1):
                doc_name = doc.name
                print(f"   {i}. Deleting: {doc_name}")
                try:
                    client.file_search_stores.delete_document(name=doc_name)
                    print(f"      ‚úì Deleted")
                except Exception as e:
                    print(f"      ‚ö†Ô∏è  Error: {e}")
        else:
            print(f"   No documents to delete")

    except Exception as e:
        print(f"   ‚ö†Ô∏è  Could not list documents: {e}")
        print(f"   Attempting to delete store with force flag...")

    # Step 3: Delete the store (with force flag to delete any remaining content)
    print(f"\nüóëÔ∏è  Deleting File Search store...")
    client.file_search_stores.delete(
        name=CORPUS_STORE_NAME,
        config={'force': True}  # Force delete even if not empty
    )
    print(f"‚úì Deleted File Search store: {CORPUS_STORE_NAME}")

except Exception as e:
    print(f"\n‚ùå Error during cleanup: {e}")
    print(f"\nTo manually clean up, run:")
    print(f"   client.file_search_stores.delete(name='{CORPUS_STORE_NAME}', config={{'force': True}})")

print("\n" + "=" * 80)
print("CLEANUP COMPLETE")
print("=" * 80)



CLEANUP - Deleting File Search Store

üìã Listing documents in store: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt
   ‚ö†Ô∏è  Could not list documents: 'FileSearchStores' object has no attribute 'list_documents'
   Attempting to delete store with force flag...

üóëÔ∏è  Deleting File Search store...
‚úì Deleted File Search store: fileSearchStores/demo-corpus-1763268225-icl6yo5kvmtt

CLEANUP COMPLETE


## Summary

This notebook demonstrated:

1. ‚úÖ **Creating a Corpus** - File Search Store for reference documents
2. ‚úÖ **AI Metadata Generation** - Using Gemini Flash Lite to analyze and tag documents
3. ‚úÖ **Querying the Corpus** - Semantic search with File Search
4. ‚úÖ **Verification Layer** - Verifying chunks against corpus with citations

### Key Models Used

- **gemini-2.5-flash**: Fast, accurate model for verification and querying
- **gemini-2.5-flash-lite**: Ultra-fast, cost-effective model for metadata generation
- **File Search**: Fully managed RAG with automatic chunking and embeddings

### Pricing

- **File Search Storage**: Free
- **File Search Querying**: Free
- **Initial Indexing**: $0.15 per 1M tokens
- **Flash Model**: $0.10 per 1M input tokens, $0.40 per 1M output tokens
- **Flash Lite Model**: $0.075 per 1M input tokens, $0.30 per 1M output tokens

### Next Steps

- Explore the full application in `backend/` and `frontend/`
- Run tests in `tests/` to see comprehensive examples
- Check `backend/app/verification/gemini_service.py` for production implementation