
# 🔍 Semantic Similarity Search with Pinecone

This notebook demonstrates how incident reports stored in **Pinecone** can be queried
using **semantic similarity** rather than keyword matching.
It is also useful for testing the Pinecone database.

**Use case:**  
Find incidents similar to a new citizen report using vector embeddings.
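Semantic similarity is measured by comparing embedding vectors rather than words. The sketch below assumes the index uses the cosine metric (the notebook does not say which metric `incidents-index` was created with). The three-dimensional vectors are made-up toy values; real Gemini embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, for illustration only)
overflowing_bins = [0.9, 0.1, 0.2]
garbage_smell = [0.8, 0.2, 0.3]
broken_streetlight = [0.1, 0.9, 0.0]

print(round(cosine_similarity(overflowing_bins, garbage_smell), 3))
print(round(cosine_similarity(overflowing_bins, broken_streetlight), 3))
```

The two waste-related reports score much closer to each other than to the streetlight report, even though they share no keywords.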

---



## 📦 Requirements
```bash
pip install pinecone google-generativeai
```


In [1]:

# Import libraries
from pinecone import Pinecone
import google.generativeai as genai
import os




## 🔑 Configure API Keys
Set `PINECONE_API_KEY` and `GEMINI_API_KEY` as environment variables before running the notebook.


In [None]:

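# API keys (recommended via environment variables)
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

# Fail early with a clear message instead of an authentication error later
if not PINECONE_API_KEY or not GEMINI_API_KEY:
    raise ValueError("Set PINECONE_API_KEY and GEMINI_API_KEY before running this notebook.")

# Configure the Gemini SDK and create the Pinecone client
genai.configure(api_key=GEMINI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)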



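## 📌 Connect to Pinecone Index
Make sure the index dimension matches the output size of the embedding model. `models/embedding-001` returns 768-dimensional vectors, while the newer `gemini-embedding-001` returns 3072 by default. Pinecone rejects vectors whose dimension differs from the index's.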
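A cheap safeguard against a dimension mismatch is to compare the embedding length with the index dimension before querying. This is a minimal sketch: `check_dimension` is a hypothetical helper, and in the notebook the expected value could come from `pc.describe_index(index_name).dimension` in recent Pinecone client versions (confirm against your installed version).

```python
def check_dimension(embedding: list[float], index_dimension: int) -> None:
    """Raise a descriptive error if the embedding size differs from the index dimension."""
    # Hypothetical helper: Pinecone itself rejects mismatched vectors, but a
    # local check gives a clearer message before any network call is made.
    if len(embedding) != index_dimension:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, "
            f"but the index expects {index_dimension}. "
            "Use an embedding model whose output size matches the index."
        )


# Example: a 768-dimensional vector checked against a 3072-dimensional index
try:
    check_dimension([0.0] * 768, 3072)
except ValueError as err:
    print(err)
```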


In [None]:

index_name = "incidents-index"
index = pc.Index(index_name)



## ✍️ Create Embedding for a New Incident
This simulates a new incident report submitted by a citizen.


In [None]:

query_text = """
Overflowing trash bins near the central market causing bad smell
"""

embedding = genai.embed_content(
    model="models/embedding-001",
    content=query_text
)["embedding"]



## 🔎 Perform Similarity Search
We retrieve the most similar incidents based on semantic meaning.


In [None]:

results = index.query(
    vector=embedding,
    top_k=5,
    include_metadata=True
)

for match in results["matches"]:
    print(f"Score: {match['score']}")
    print(f"Original MongoDB ID: {match['metadata'].get('mongodb_id')}")
    print(f"Text: {match['metadata'].get('text')}")
    print("-" * 50)
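Conceptually, `index.query(vector=..., top_k=5)` returns the stored vectors that score highest against the query vector. The sketch below does the same ranking exhaustively in plain Python, assuming cosine similarity. Pinecone itself uses an approximate nearest-neighbour index rather than a linear scan, so treat this only as a mental model. The incident IDs and vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_top_k(
    query: list[float], vectors: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the top_k highest."""
    scored = ((vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items())
    # heapq.nlargest returns the results sorted by score, highest first
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Hypothetical stored incidents (ID -> toy embedding)
stored = {
    "incident-1": [0.8, 0.2, 0.3],  # garbage near the market
    "incident-2": [0.1, 0.9, 0.0],  # broken streetlight
    "incident-3": [0.7, 0.1, 0.4],  # litter in a park
}
query = [0.9, 0.1, 0.2]

for vec_id, score in brute_force_top_k(query, stored, top_k=2):
    print(f"{vec_id}: {score:.3f}")
```

A linear scan costs one similarity computation per stored vector, which is why a dedicated vector index becomes worthwhile as the number of incidents grows.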



## ✅ Why This Matters

- No keyword dependency  
- Finds *conceptually similar* incidents  
- Enables:
  - Duplicate detection
  - Incident clustering
  - RAG-based assistants
  - Smart dashboards
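For duplicate detection, one simple approach is to flag any match whose score reaches a cutoff as a likely duplicate of the new report. This minimal sketch assumes the matches are plain dicts shaped like the entries printed in the search step (a `score` plus `metadata` containing `mongodb_id`). The 0.9 threshold is an arbitrary placeholder that would need tuning on real data, and its meaning depends on the index metric.

```python
def find_likely_duplicates(matches: list[dict], threshold: float = 0.9) -> list[str]:
    """Return the MongoDB IDs of matches whose similarity score meets the threshold."""
    duplicates = []
    for match in matches:
        # Tolerate matches that carry no metadata at all
        metadata = match.get("metadata") or {}
        if match.get("score", 0.0) >= threshold and metadata.get("mongodb_id"):
            duplicates.append(metadata["mongodb_id"])
    return duplicates


# Hypothetical matches, shaped like the query results printed earlier
example_matches = [
    {"score": 0.95, "metadata": {"mongodb_id": "abc123", "text": "Trash overflowing at market"}},
    {"score": 0.91, "metadata": {"mongodb_id": "def456", "text": "Garbage bins full near bazaar"}},
    {"score": 0.62, "metadata": {"mongodb_id": "ghi789", "text": "Pothole on Main Street"}},
]

print(find_likely_duplicates(example_matches))
```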


