
Pinecone Serverless Reranking

Last Updated: April 27th, 2025

Daily Challenge: Pinecone Serverless Reranking in Action


Why are we doing this?

Reranking models boost search relevance by assigning similarity scores between a query and documents, then reordering results so the most pertinent information appears first. In contexts like healthcare, this helps clinicians quickly access the most critical clinical notes.


Task Overview & Detailed Explanations

Below is a skeleton pipeline. Each numbered item is an action you must complete. After every instruction, you'll find a clear explanation of what to do and why it's important. Whenever you see ..., replace it with the appropriate code or value, using the hint for guidance.


Part 1: Load Documents & Execute Reranking Model



1. Install Pinecone libraries


pip install pinecone==6.0.1 pinecone-notebooks


    What to do: Run this command in your terminal or notebook to install the Pinecone client library and the notebook helper package.
    Why: You'll need the client package to interact with Pinecone's API and the notebook helper to simplify authentication in environments like Colab.


2. Authenticate with Pinecone


import os
if not os.environ.get("PINECONE_API_KEY"):
   from pinecone_notebooks.colab import Authenticate
   Authenticate()


    What to do: Check if your environment has the PINECONE_API_KEY. If not, call Authenticate() to prompt for it.
    Why: Securely providing your API key lets the client connect to your Pinecone project without hard-coding secrets in your script.
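A minimal sketch of reading the key from the environment and failing early with a clear message. The helper name load_api_key is hypothetical and not part of the Pinecone client; the env parameter exists only so the function can be exercised without touching real environment variables.

```python
import os


def load_api_key(env=os.environ, name="PINECONE_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or authenticate first")
    return key


# Offline demonstration with a fake environment mapping (placeholder value, not a real key)
fake_env = {"PINECONE_API_KEY": "  demo-key  "}
demo_key = load_api_key(env=fake_env)
```

In the tutorial itself you would call load_api_key() with no arguments and pass the result to the client.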


3. Instantiate the Pinecone client


from pinecone import Pinecone
api_key = os.environ["PINECONE_API_KEY"]
environment = "..."  # e.g., "us-west1-gcp"
pc = Pinecone(api_key=api_key, environment=environment)


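    What to do: Fill in your Pinecone project's environment string (found in your Pinecone dashboard) in place of ..., then create a Pinecone client instance. The environment argument comes from the older pod-based setup; for serverless indexes the cloud and region are set on the index itself (see Part 2), so the API key is the setting that matters here.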
    Why: The client (pc) is your entry point for all Pinecone operations—creating indexes, querying, and reranking.


4. Define your query & documents


query = "Tell me about Apple's products"
documents = [
   ...  # Provide five text strings: some about the fruit, some about the company
]


    What to do: Replace ... with a list of five example sentences that include both references to the fruit “apple” and the company “Apple Inc.”.
    Why: You need a small set of documents to test the reranker's ability to distinguish between different contexts of the same word.


5. Call the reranker


from pinecone import RerankModel
reranked = pc.inference.rerank(
   model="bge-reranker-v2-m3",
   query=query,
   documents=[{"id": str(i), "text": doc} for i, doc in enumerate(documents)],
   top_n=...  # e.g., 3
)


    What to do: Fill in top_n with how many top results you want returned (e.g., 3).
    Why: top_n limits the number of reranked results, so you only retrieve the most relevant documents.
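To make the score, sort, and truncate idea concrete, here is a toy sketch. The word-overlap score is a stand-in chosen so the example runs offline; it is not how bge-reranker-v2-m3 computes relevance, and the names toy_score and toy_rerank are hypothetical.

```python
def toy_score(query: str, document: str) -> float:
    """Fraction of query words that also appear in the document (toy relevance)."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def toy_rerank(query: str, documents: list[str], top_n: int) -> list[tuple[float, str]]:
    """Score every document, sort by score descending, keep the first top_n."""
    scored = [(toy_score(query, doc), doc) for doc in documents]
    # The sort is stable, so documents with equal scores keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


notes = [
    "patient reports mild headache",
    "knee pain and swelling after surgery",
    "routine checkup, no complaints",
]
top = toy_rerank("knee swelling", notes, top_n=2)
for score, doc in top:
    print(f"{score:.2f}  {doc}")
```

The hosted reranker plays the role of toy_score, and top_n plays the same role as the slice at the end of toy_rerank.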


6. Inspect reranked results


def show_reranked(query, matches):
   print(f"Query: {query}")
   for i, m in enumerate(matches):
       ...  # Print the position (i+1), m.score, and m.document.text
show_reranked(query, reranked.data)  # the reranked rows are returned under .data


    What to do: Replace ... with code that prints out the rank (i+1), the similarity score m.score, and the document text m.document.text.
    Why: Seeing these values demonstrates how the reranker orders documents and what scores it assigns.
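One possible shape for the display logic, sketched against mock objects so it can be checked without calling the API. It assumes each result row exposes score and document.text as attributes, as the hint above describes; format_reranked and mock_rows are hypothetical names, and the mock scores are invented for illustration only.

```python
from types import SimpleNamespace


def format_reranked(query, rows):
    """Build printable lines: the query first, then rank, score, and text per row."""
    lines = [f"Query: {query}"]
    for i, row in enumerate(rows):
        lines.append(f"{i + 1}. score={row.score:.4f} text={row.document.text}")
    return lines


# Hypothetical stand-in for reranker output (scores are made up)
mock_rows = [
    SimpleNamespace(score=0.91, document=SimpleNamespace(text="Apple released a new iPhone.")),
    SimpleNamespace(score=0.02, document=SimpleNamespace(text="Apples are rich in fiber.")),
]
lines = format_reranked("Tell me about Apple's products", mock_rows)
print("\n".join(lines))
```

Returning the lines instead of printing inside the loop keeps the formatting easy to test.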


Part 2: Setup a Serverless Index for Medical Notes

1. Install data & model libraries


pip install pandas torch transformers sentence-transformers requests


    What to do: Install pandas for data manipulation, torch for model inference, transformers and sentence-transformers for loading embedding models, and requests for downloading the sample data.
    Why: You'll use these libraries to load, embed, and manipulate medical note data.


2. Import modules & define environment settings


import os, time, pandas as pd, torch
from pinecone import Pinecone, ServerlessSpec

cloud = "..."        # e.g., "aws"
region = "..."       # e.g., "us-east-1"
spec = ServerlessSpec(cloud=cloud, region=region)  # a serverless index is defined by its cloud and region
index_name = "pinecone-reranker"

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])


    What to do: Fill in cloud and region with the cloud provider and region where the index should live, then pass both to ServerlessSpec.
    Why: A serverless index scales on its own, so there are no CPU or memory values to choose; the spec only records where the index is hosted.


3. Create or recreate the index


if pc.has_index(index_name):
   pc.delete_index(index_name)
pc.create_index(
   name=index_name,
   dimension=...,           # must match embedding vector size
   spec=spec                # the ServerlessSpec defined in the previous step
)


    What to do: Set dimension equal to your embedding model's output size (e.g., 384).
    Why: The index's dimension must match the embedding vectors you'll insert, otherwise upserts will fail.
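A quick pre-flight check can catch a dimension mismatch before any upsert is attempted. This is a sketch on hypothetical records; find_dimension_mismatches is not a Pinecone function, and the (id, values) pair layout is an assumption made for the example.

```python
def find_dimension_mismatches(vectors, expected_dim):
    """Return the ids of vectors whose length differs from the index dimension."""
    return [vec_id for vec_id, values in vectors if len(values) != expected_dim]


# Hypothetical records: (id, embedding values)
records = [
    ("P001", [0.1] * 384),
    ("P002", [0.2] * 384),
    ("P003", [0.3] * 383),  # one value short on purpose
]
bad_ids = find_dimension_mismatches(records, expected_dim=384)
print(bad_ids)
```

An empty list means every vector matches the dimension the index was created with.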


Part 3: Load the Sample Data

1. Download & read JSONL


import requests, tempfile

with tempfile.TemporaryDirectory() as tmpdir:
   file_path = os.path.join(tmpdir, "sample_notes_data.jsonl")
   url = "..."  # raw GitHub URL to JSONL file
   resp = requests.get(url)
   resp.raise_for_status()
   with open(file_path, "wb") as f:  # the context manager closes the file before pandas reads it
       f.write(resp.content)
   df = pd.read_json(file_path, orient='records', lines=True)


    What to do: Insert the raw URL in place of ... to download the sample medical notes.
    Why: You need a DataFrame of medical notes (with embeddings already available) to index and test queries.


2. Preview the DataFrame


print(df.head())


    What to do: Run this to view the first few rows of the DataFrame.
    Why: Ensures you have the right columns (id, values, metadata) before upserting.
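For readers unfamiliar with the JSON Lines format that lines=True expects, the following sketch parses it with the standard library and checks for the three columns the upsert relies on. The two records are hypothetical and shortened; parse_jsonl and REQUIRED_KEYS are illustrative names, not part of pandas or Pinecone.

```python
import json

# Two hypothetical records in JSON Lines form: one JSON object per line
raw = (
    '{"id": "P001", "values": [0.1, 0.2], "metadata": {"symptoms": "chest pain"}}\n'
    '{"id": "P002", "values": [0.3, 0.4], "metadata": {"condition": "diabetes"}}\n'
)

REQUIRED_KEYS = {"id", "values", "metadata"}


def parse_jsonl(text):
    """Parse JSON Lines text, skipping blank lines and checking required keys."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {line_number} is missing {sorted(missing)}")
        records.append(record)
    return records


records = parse_jsonl(raw)
print([record["id"] for record in records])
```

pd.read_json(..., lines=True) performs the same per-line parsing and returns the records as DataFrame rows.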


Part 4: Upsert Data into the Index

1. Instantiate index client & upsert


index = pc.Index(index_name)
index.upsert_from_dataframe(df)


    What to do: Create an Index object and call upsert_from_dataframe.
    Why: This pushes all your note embeddings and metadata into Pinecone for later queries.


2. Wait for availability


def is_ready(idx):
   stats = idx.describe_index_stats()
   return stats.total_vector_count > 0

while not is_ready(index):
   time.sleep(5)
print(index.describe_index_stats())


    What to do: Poll until total_vector_count is greater than zero.
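    Why: Upserted vectors can take a few seconds to become queryable. A count above zero shows that indexing has started; compare it against len(df) if you need every vector present before you query.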
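The loop above waits forever if the index never fills. A hedged alternative is a polling helper with a timeout, sketched below. wait_until is a hypothetical helper, and the fake readiness check stands in for the is_ready(index) call so the example runs offline.

```python
import time


def wait_until(predicate, timeout_s=120.0, interval_s=5.0):
    """Call predicate() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Offline demonstration: a fake readiness check that succeeds on the third call
calls = {"count": 0}


def fake_is_ready():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(fake_is_ready, timeout_s=1.0, interval_s=0.01)
print(ready, calls["count"])
```

In the tutorial you could call wait_until(lambda: is_ready(index)) and raise an error when it returns False.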


Part 5: Query & Embedding Function

1. Define your embedding function


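from sentence_transformers import SentenceTransformer

# Load the model once; reloading it on every call is slow
model = SentenceTransformer("...")  # e.g., "all-MiniLM-L6-v2"

def get_embedding(text):
   # Convert the NumPy array to a plain list of floats for index.query
   return model.encode(text).tolist()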


    What to do: Provide the name of the sentence-transformer model you plan to use in place of ....
    Why: Converts incoming queries into the same vector space as your indexed notes.


2. Run a semantic search query


question = "..."  # e.g., "what if my patient has leg pain"
emb = get_embedding(question)
results = index.query(vector=emb, top_k=..., include_metadata=True)
matches = sorted(results.matches, key=lambda m: m.score, reverse=True)


    What to do: Replace the ... in question with your clinical question, and set top_k to the number of results you want back (e.g., 5).
    Why: Retrieves the most semantically similar notes from the index based on your clinical query.
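The notebook run at the end of this document reports the index metric as cosine. As a rough illustration of what a cosine top_k query does, here is a brute-force sketch in plain Python. The two-dimensional vectors are hypothetical and chosen so the ordering can be verified by hand; a real index uses approximate search over many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, vectors, k):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(vec_id, cosine_similarity(query_vec, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index
toy_index = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [1.0, 1.0],
}
nearest = top_k([1.0, 0.0], toy_index, k=2)
print(nearest)
```

Vector A points the same way as the query, C is at 45 degrees, and B is orthogonal, so A and C are returned in that order.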


Part 6: Display & Rerank Clinical Notes

1. Display initial search results


def show_results(q, matches):
   print(f"Question: {q}")
   for i, m in enumerate(matches):
       ...  # print i+1, m.id, m.score, m.metadata
show_results(question, matches)


    What to do: Fill in the print statement to show rank, vector ID, similarity score, and metadata.
    Why: Helps you see which notes were initially considered most relevant.


2. Prepare documents for reranking


rerank_docs = [
   {"id": m.id, "reranking_field": "; ".join([f"{k}: {v}" for k, v in m.metadata.items()])}
   for m in matches
]
rerank_query = "..."  # e.g., a more specific clinical question


    What to do: Set rerank_query to a refined question that tests finer distinctions (e.g., focusing on a procedure or symptom).
    Why: Each note's metadata is flattened into a single reranking_field string, which gives the reranker text to score against the refined query.
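The flattening step can be isolated into a small function and checked on one record. The sample metadata below is note P007 from the notebook run at the end of this document; metadata_to_text is a hypothetical helper name, and the handling of None is an added safeguard.

```python
def metadata_to_text(metadata):
    """Join a metadata dict into 'key: value; key: value' text; None becomes ''."""
    return "; ".join(f"{key}: {value}" for key, value in (metadata or {}).items())


note = {
    "surgery": "knee arthroscopy",
    "symptoms": "pain, swelling",
    "treatment": "physical therapy",
}
text = metadata_to_text(note)
print(text)
```

Keys appear in the dict's insertion order, so the reranker sees the fields in the order they were stored.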


3. Execute serverless reranking


reranked = pc.inference.rerank(
   model="bge-reranker-v2-m3",
   query=rerank_query,
   documents=rerank_docs,
   rank_fields=["reranking_field"],
   top_n=...  # number of top reranked notes to view
)


    What to do: Choose top_n to specify how many reranked results you need.
    Why: Reranking uses the refined query and metadata field to reorder notes by their new relevance scores.


4. Show reranked results


def show_reranked(q, matches):
   print(f"Refined Query: {q}")
   for i, m in enumerate(matches):
       ...  # print i+1, m.document.id, m.score, m.document.reranking_field
show_reranked(rerank_query, reranked.data)  # the reranked rows are returned under .data


    What to do: Complete the print logic to display each reranked note's rank, ID, score, and the reranking_field.
    Why: Allows you to compare how the reranker improves result ordering against the original search.
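One way to make that comparison explicit is to compute how far each note moved. The sketch below uses the two orderings printed in the notebook run at the end of this document; rank_changes is a hypothetical helper, and it assumes every reranked id also appears in the original list.

```python
def rank_changes(original_ids, reranked_ids):
    """Map each reranked id to (old_rank, new_rank), both 1-based."""
    old_rank = {doc_id: i for i, doc_id in enumerate(original_ids, start=1)}
    return {
        doc_id: (old_rank[doc_id], new_rank)
        for new_rank, doc_id in enumerate(reranked_ids, start=1)
    }


# Orderings taken from the semantic search and reranking output in the notebook run
original = ["P0100", "P047", "P095", "P007", "P092"]
reranked_order = ["P007", "P0100", "P047"]
changes = rank_changes(original, reranked_order)
for doc_id, (old, new) in changes.items():
    print(f"{doc_id}: {old} -> {new}")
```

Note P007, the only one mentioning swelling, rises from fourth place to first once the refined query is applied.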




In [2]:
from dotenv import load_dotenv
import os, requests, tempfile, time, pandas as pd
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone, ServerlessSpec, RerankModel

# load the API key
load_dotenv()
api_key = os.environ.get("API_PINECONE")
if not api_key:
    raise ValueError("API_PINECONE not found in .env")

In [None]:
# instantiate the Pinecone client
cloud = "aws"
region = "us-east-1"
environment = f"{cloud}-{region}"
pc = Pinecone(api_key=api_key, environment=environment)

# create the serverless index
index_name = "medical-notes-index"
dim = 384

spec = ServerlessSpec(cloud=cloud, region=region)

if pc.has_index(index_name):
    pc.delete_index(index_name)
pc.create_index(name=index_name, dimension=dim, spec=spec)
print(f"Created index '{index_name}' with dimension {dim}")

Created index 'medical-notes-index' with dimension 384


In [6]:
# download and load the JSONL data
url = "https://raw.githubusercontent.com/pinecone-io/examples/refs/heads/master/docs/data/sample_notes_data.jsonl"
with tempfile.TemporaryDirectory() as tmpdir:
    file_path = os.path.join(tmpdir, "sample_notes_data.jsonl")
    resp = requests.get(url)
    resp.raise_for_status()
    with open(file_path, "wb") as f:
        f.write(resp.content)
    df = pd.read_json(file_path, orient='records', lines=True)

print("DataFrame preview:")
print(df.head())

DataFrame preview:
     id                                             values  \
0  P011  [-0.2027486265, 0.2769146562, -0.1509393603, 0...   
1  P001  [0.1842793673, 0.4459365904, -0.0770567134, 0....   
2  P002  [-0.2040648609, -0.1739618927, -0.2897160649, ...   
3  P003  [0.1889383644, 0.2924542725, -0.2335938066, -0...   
4  P004  [-0.12171068040000001, 0.1674752235, -0.231888...   

                                            metadata  
0  {'advice': 'rest, hydrate', 'symptoms': 'heada...  
1  {'tests': 'EKG, stress test', 'symptoms': 'che...  
2  {'HbA1c': '7.2', 'condition': 'diabetes', 'med...  
3  {'symptoms': 'cough, wheezing', 'diagnosis': '...  
4  {'referral': 'dermatology', 'condition': 'susp...  


In [7]:
# upsert into the Pinecone index
index = pc.Index(index_name)
index.upsert_from_dataframe(df)
print("Upsert in progress...")

while True:
    stats = index.describe_index_stats()
    if stats.total_vector_count > 0:
        break
    print("Waiting for the index…")
    time.sleep(5)

print("Vectors inserted. Index details:")
print(stats)

sending upsert requests:   0%|          | 0/100 [00:00<?, ?it/s]

Upsert in progress...
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Waiting for the index…
Vectors inserted. Index details:
{'dimension': 384,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'': {'vector_count': 100}},
 'total_vector_count': 100,
 'vector_type': 'dense'}


In [8]:
# define the embedding function for the query
model = SentenceTransformer("all-MiniLM-L6-v2")
def get_embedding(text):
    return model.encode(text).tolist()

# semantic search
question = "What if my patient has leg pain?"
emb = get_embedding(question)
res = index.query(vector=emb, top_k=5, include_metadata=True)
matches = sorted(res.matches, key=lambda m: m.score, reverse=True)

print("\nRaw semantic search results:")
for i, m in enumerate(matches, 1):
    print(f"{i}. id={m.id}, score={m.score:.4f}, metadata={m.metadata}")


Raw semantic search results:
1. id=P0100, score=0.5329, metadata={'advice': 'over-the-counter pain relief, stretching', 'symptoms': 'muscle pain'}
2. id=P047, score=0.5078, metadata={'symptoms': 'back pain', 'treatment': 'physical therapy'}
3. id=P095, score=0.5078, metadata={'symptoms': 'back pain', 'treatment': 'physical therapy'}
4. id=P007, score=0.4539, metadata={'surgery': 'knee arthroscopy', 'symptoms': 'pain, swelling', 'treatment': 'physical therapy'}
5. id=P092, score=0.4475, metadata={'condition': 'dehydration', 'treatment': 'IV fluids'}


In [9]:
# prepare the documents for reranking
rerank_docs = [
    {"id": m.id, "text": "; ".join(f"{k}: {v}" for k, v in (m.metadata or {}).items())}
    for m in matches
]

rerank_query = "Patient reports leg swelling and difficulty walking."

# run the reranker with the fixes applied
try:
    reranked = pc.inference.rerank(
        model="bge-reranker-v2-m3",
        query=rerank_query,
        documents=rerank_docs,
        top_n=3,
        return_documents=True,
        parameters={"truncate": "END"}
    )
    if not reranked or not reranked.data:
        raise ValueError("The reranker returned no results.")
except Exception as e:
    print("Error during reranking:", e)
    raise  # re-raise so the cell stops without shutting down the notebook kernel

# display the reranked results
print("\nResults after reranking:")
for i, item in enumerate(reranked.data, 1):
    doc = item["document"]
    score = item["score"]
    text = doc.get("text", "")
    print(f"{i}. id={doc['id']}, score={score:.4f}, text={text[:100]}...")


Results after reranking:
1. id=P007, score=0.0768, text=surgery: knee arthroscopy; symptoms: pain, swelling; treatment: physical therapy...
2. id=P0100, score=0.0131, text=advice: over-the-counter pain relief, stretching; symptoms: muscle pain...
3. id=P047, score=0.0065, text=symptoms: back pain; treatment: physical therapy...
