# LAB | Extractive Question Answering

This notebook demonstrates how Pinecone helps you build an extractive question-answering application. To build an extractive question-answering system, we need three main components:

- A vector index to store and run semantic search
- A retriever model for embedding context passages
- A reader model to extract answers

We will use the SQuAD dataset, which consists of **questions** and **context** paragraphs containing question **answers**. We generate embeddings for the context passages using the retriever, index them in the vector database, and query with semantic search to retrieve the top k most relevant contexts containing potential answers to our question. We then use the reader model to extract the answers from the returned contexts.

Let's get started by installing the packages the notebook needs:

In [1]:
!pip uninstall -y pinecone-client # Uninstall the old library first (-y skips the confirmation prompt)
!pip install pinecone # Install the new official client
!pip install sentence-transformers
!pip install requests




In [3]:
import os
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['PINECONE_API_KEY'] = userdata.get('PINECONE_API_KEY')

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')

print("API keys loaded successfully.")

API keys loaded successfully.


# Install Dependencies

In [4]:
!pip install -qU datasets pinecone sentence-transformers torch

# Load Dataset

Now let's load the SQuAD dataset from the Hugging Face Hub. We load the dataset into a pandas DataFrame, keep only the title and context columns, and drop any duplicate context passages.

In [5]:
from datasets import load_dataset

# load the squad dataset into a pandas dataframe
df = load_dataset("squad", split="train").to_pandas()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [6]:
# select only title and context columns
df = df.loc[:, ['title', 'context']]
# or alternatively
# df = df[['title', 'context']]

# drop rows containing duplicate context passages
df = df.drop_duplicates(subset=['context'])

# display the head of the dataframe to confirm changes
print(df.head())

                       title  \
0   University_of_Notre_Dame   
5   University_of_Notre_Dame   
10  University_of_Notre_Dame   
15  University_of_Notre_Dame   
20  University_of_Notre_Dame   

                                              context  
0   Architecturally, the school has a Catholic cha...  
5   As at most other universities, Notre Dame's st...  
10  The university is the major seat of the Congre...  
15  The College of Engineering was established in ...  
20  All of Notre Dame's undergraduate students are...  


In [7]:
# The following code will uninstall the old Pinecone client
# and install the new one. This is a one-time fix.
!pip uninstall -y pinecone-client pinecone
!pip install --upgrade pinecone

print("Pinecone packages have been updated.")

Found existing installation: pinecone-client 6.0.0
Uninstalling pinecone-client-6.0.0:
  Successfully uninstalled pinecone-client-6.0.0
Found existing installation: pinecone 7.3.0
Uninstalling pinecone-7.3.0:
  Successfully uninstalled pinecone-7.3.0
Collecting pinecone
  Using cached pinecone-7.3.0-py3-none-any.whl.metadata (9.5 kB)
Using cached pinecone-7.3.0-py3-none-any.whl (587 kB)
Installing collected packages: pinecone
Successfully installed pinecone-7.3.0
Pinecone packages have been updated.


# Initialize Pinecone Index

The Pinecone index stores vector representations of our context passages, which we can retrieve using another vector (the query vector). We first need to initialize our connection to Pinecone to create our vector index. For this, we need a free [API key](https://app.pinecone.io/), and then we initialize the connection like so:

In [8]:
!pip install -qU langchain-pinecone pinecone-notebooks

In [9]:
from pinecone import Pinecone, ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

# connect to Pinecone; the client needs only the API key
# (the cloud and region are set on the index through ServerlessSpec above)
pc = Pinecone(api_key=PINECONE_API_KEY)
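The connection above depends on `PINECONE_API_KEY` having been loaded from the environment rather than typed into a cell. A minimal sketch of a guard for that step follows. The helper name `require_env` and the `DEMO_API_KEY` variable are hypothetical and used only for illustration.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Illustration only: DEMO_API_KEY is a placeholder, not a real credential.
os.environ.setdefault("DEMO_API_KEY", "placeholder-value")
demo_key = require_env("DEMO_API_KEY")
```

In this notebook the same helper could be called as `require_env("PINECONE_API_KEY")` before constructing the client, so a missing secret fails early instead of surfacing later as an authentication error.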

Now we create a new index called "question-answering" — we can name the index anything we want. We specify the metric type as "cosine" and dimension as 384 because the retriever we use to generate context embeddings is optimized for cosine similarity and outputs 384-dimension vectors.

In [10]:
index_name = "question-answering"

# check if the question-answering index exists
if index_name not in pc.list_indexes().names():
    # create the index if it does not exist
    pc.create_index(
        name=index_name,
        dimension=384,
        metric='cosine',
        spec=spec
    )
    print(f"Index '{index_name}' created successfully.")
else:
    print(f"Index '{index_name}' already exists.")

# connect to the question-answering index we created
index = pc.Index(index_name)

Index 'question-answering' already exists.


# Initialize Retriever

Next, we need to initialize our retriever. The retriever will mainly do two things:

- Generate embeddings for all context passages (context vectors/embeddings)
- Generate embeddings for our questions (query vector/embedding)

The retriever will generate embeddings in a way that the questions and context passages containing answers to our questions are nearby in the vector space. We can use cosine similarity to calculate the similarity between the query and context embeddings to find the context passages that contain potential answers to our question.
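To make the similarity step concrete, here is a small pure-Python sketch of cosine similarity and top-k ranking. The 3-dimensional vectors are made-up toy values standing in for the retriever's 384-dimensional embeddings, and the function names are illustrative. Pinecone performs this ranking for us on the server side.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank_contexts(query_vec, context_vecs, top_k=2):
    """Return (score, position) pairs for the top_k most similar context vectors."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(context_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy vectors (assumed values, not real embeddings)
toy_query = [1.0, 0.0, 1.0]
toy_contexts = [
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 1.0],  # partially aligned with the query
]
top = rank_contexts(toy_query, toy_contexts, top_k=2)
for score, position in top:
    print(f"context {position}: score {score:.3f}")
```

The context pointing in the same direction as the query ranks first, the partially aligned one second, and the orthogonal one is left out of the top 2.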

We will use a SentenceTransformer model named ``multi-qa-MiniLM-L6-cos-v1`` designed for semantic search and trained on 215M (question, answer) pairs from diverse sources as our retriever.

In [11]:
import torch
from sentence_transformers import SentenceTransformer

# set device to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Load the retriever model from the Hugging Face model hub.
# 'multi-qa-MiniLM-L6-cos-v1' is designed for semantic search and was trained on 215M
# (question, answer) pairs to make queries and relevant context passages
# close in vector space.
retriever = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1', device=device)

print("Retriever model loaded successfully.")


Using device: cuda
Retriever model loaded successfully.


In [12]:
print(df.columns)

Index(['title', 'context'], dtype='object')


# Generate Embeddings and Upsert

Next, we need to generate embeddings for the context passages. We will do this in batches to help us more quickly generate embeddings and upload them to the Pinecone index. When passing the documents to Pinecone, we need an id (a unique value), context embedding, and metadata for each document representing context passages in the dataset. The metadata is a dictionary containing data relevant to our embeddings, such as the article title, context passage, etc.
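The batching described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's own implementation: `build_upsert_batches` and `fake_encode` are hypothetical names, the batch size of 64 is an arbitrary default, and the passage is stored under a `"text"` metadata key so that it matches what the `get_context` helper later reads.

```python
def build_upsert_batches(records, encode, batch_size=64):
    """Yield lists of (id, embedding, metadata) tuples, one list per batch.

    records: list of dicts with 'title' and 'context' keys
    encode:  callable mapping a list of strings to a list of vectors
    """
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        embeddings = encode([r["context"] for r in batch])
        yield [
            (
                str(start + offset),  # unique id based on row position
                list(embedding),      # context embedding
                {"title": r["title"], "text": r["context"]},  # metadata
            )
            for offset, (r, embedding) in enumerate(zip(batch, embeddings))
        ]

def fake_encode(texts):
    # Stand-in for the retriever so the sketch runs without downloading a model.
    return [[float(len(t)), 0.0, 1.0] for t in texts]

toy_records = [
    {"title": f"Article {n}", "context": f"Passage number {n}."} for n in range(5)
]
batches = list(build_upsert_batches(toy_records, fake_encode, batch_size=2))
print([len(b) for b in batches])
```

With the notebook's objects, `records` could come from `df.to_dict(orient="records")`, `encode` could be `lambda texts: retriever.encode(texts).tolist()`, and each yielded batch could be passed to `index.upsert(vectors=batch)`.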

In [13]:
import pandas as pd
import numpy as np

# Create a small sample DataFrame to illustrate column access and renaming.
# It is kept in its own variable so the SQuAD `df` loaded above is not overwritten.
data = {'title': ['First Article', 'Second Article'],
        'context': ['This is the first article context.', 'This is the second article context.']}
sample_df = pd.DataFrame(data)

# Print the columns to confirm the names
print("Original Columns:")
print(sample_df.columns)
print("-" * 20)

# Option 1: Access the correct column
print("Using the correct column name 'title':")
first_article_title = sample_df['title'][0]
print(f"The title of the first article is: {first_article_title}")
print("-" * 20)

# Option 2: Rename the column
# We use a dictionary to map the old name to the new name.
sample_df = sample_df.rename(columns={'title': 'article_title'})

# Now, print the columns again to see the change
print("Columns after renaming:")
print(sample_df.columns)
print("-" * 20)

# Now you can safely access the column using the new name
first_article_title_renamed = sample_df['article_title'][0]
print(f"The title of the first article (using the new name) is: {first_article_title_renamed}")


Original Columns:
Index(['title', 'context'], dtype='object')
--------------------
Using the correct column name 'title':
The title of the first article is: First Article
--------------------
Columns after renaming:
Index(['article_title', 'context'], dtype='object')
--------------------
The title of the first article (using the new name) is: First Article


# Initialize Reader

We use the `deepset/electra-base-squad2` model from the HuggingFace model hub as our reader model. We load this model into a "question-answering" pipeline from HuggingFace transformers and feed it our questions and context passages individually. The model gives a prediction for each context we pass through the pipeline.

In [14]:
from transformers import pipeline

model_name = 'deepset/electra-base-squad2'
# load the reader model into a question-answering pipeline
reader = pipeline(tokenizer=model_name, model=model_name, task='question-answering', device=device)
reader

Device set to use cuda


<transformers.pipelines.question_answering.QuestionAnsweringPipeline at 0x7b26a16d67e0>

Now all the components we need are ready. Let's write some helper functions to execute our queries. The `get_context` function retrieves the context passages most likely to contain the answer to our question from the Pinecone index, and the `extract_answer` function extracts the answers from these context passages.

In [15]:
# This function gets context passages from the Pinecone index
def get_context(question, top_k):
    """
    Generates embeddings for a question and searches a Pinecone index
    to find the most relevant context passages.

    Args:
        question (str): The question string.
        top_k (int): The number of top context passages to retrieve.

    Returns:
        list: A list of context passages (strings).
    """

    # 1. Generate an embedding for the question with the retriever model
    #    (the SentenceTransformer loaded in the "Initialize Retriever" section).
    xq = retriever.encode(question).tolist()

    # 2. Search the Pinecone index (the `index` object created earlier) for
    #    the passages closest to the question. The current client expects
    #    the query vector to be passed by keyword.
    xc = index.query(vector=xq, top_k=top_k, include_metadata=True)

    # 3. Extract the context passages from the search result. This assumes each
    #    vector was upserted with its passage stored under the "text" metadata key.
    c = [match['metadata']['text'] for match in xc['matches']]

    return c


# This function extracts the answer from the context using the reader model
def extract_answer(question, context_list):
    """
    Extracts an answer for a question from a list of context passages
    using a question-answering pipeline.

    Args:
        question (str): The question string.
        context_list (list): A list of context passages (strings).

    Returns:
        dict: A dictionary containing the answer, score, start, and end.
    """
    # NOTE: This function relies on the `reader` pipeline defined in the
    # previous code cell.

    # The pipeline works best with a single context. For a list of contexts,
    # we extract the best answer by feeding each context to the reader
    # and selecting the prediction with the highest score.

    best_answer = None
    max_score = -1.0

    for context in context_list:
        prediction = reader(question=question, context=context)

        # Check if this prediction is better than the current best
        if prediction['score'] > max_score:
            best_answer = prediction
            max_score = prediction['score']

    return best_answer


In [16]:
from pprint import pprint

# This version replaces the extract_answer defined in the previous cell: it extracts
# an answer from each context passage and returns all results sorted by score
def extract_answer(question, context):
    """
    Feeds a question and a list of contexts to a question-answering reader model,
    extracts the best answer, and prints all results sorted by confidence score.

    Args:
        question (str): The question string.
        context (list): A list of context passages (strings).

    Returns:
        list: The list of all results (answers with their scores and contexts),
              sorted from highest to lowest score.
    """

    results = []
    for c in context:
        # Feed the reader the question and a single context to extract an answer
        answer = reader(question=question, context=c)

        # Add the context to the answer dictionary for printing both together
        answer["context"] = c
        results.append(answer)

    # Sort the results based on the score from the reader model, from highest to lowest
    sorted_results = sorted(results, key=lambda x: x['score'], reverse=True)

    # Print the sorted results for easy viewing
    print("All Answers Sorted by Confidence Score:")
    pprint(sorted_results)
    print("-" * 50)

    # Return the sorted list so you can use it later
    return sorted_results


In [17]:
import os
from pprint import pprint
from pinecone import Pinecone, ServerlessSpec
import time

# Read the API key from the environment (set from Colab secrets earlier in
# this notebook) instead of hardcoding it in the cell.
API_KEY = os.environ["PINECONE_API_KEY"]

# This is the name of your index from the Pinecone dashboard.
INDEX_NAME = "question-answering"

# 1. Initialize the Pinecone client
print("Initializing Pinecone client...")
pc = Pinecone(api_key=API_KEY)

# 2. Check if the index exists and create it if necessary
# Collect the names of the existing indexes. The loop variable is named `idx`
# so it is not confused with the `index` connection object used below.
existing_indexes = [idx.name for idx in pc.list_indexes()]

if INDEX_NAME not in existing_indexes:
    print(f"Index '{INDEX_NAME}' not found. Creating it now...")
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    # Wait for the index to be ready
    while not pc.describe_index(INDEX_NAME).status['ready']:
        time.sleep(1)

print(f"Successfully connected to index '{INDEX_NAME}'.")

# 3. Connect to the index
index = pc.Index(INDEX_NAME)

# 4. Upsert (insert) some sample data
# The dimension of the vector is 384. We create a helper function for clarity.
def create_vector_values(base_list):
    """Repeats a small list to create a 384-dimensional vector."""
    repeated_list = base_list * (384 // len(base_list))
    # Add any remaining elements to match the exact dimension
    remaining_elements = 384 % len(base_list)
    repeated_list.extend(base_list[:remaining_elements])
    return repeated_list

sample_vectors = [
    {
        "id": "vector1",
        "values": create_vector_values([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        "metadata": {"text": "What is the capital of France?"}
    },
    {
        "id": "vector2",
        "values": create_vector_values([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]),
        "metadata": {"text": "How does a vector database work?"}
    },
    {
        "id": "vector3",
        "values": create_vector_values([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]),
        "metadata": {"text": "What are the benefits of using Pinecone?"}
    }
]

print("\nUpserting sample vectors...")
index.upsert(vectors=sample_vectors)
print("Vectors upserted successfully.")

# 5. Query the index
# This is a query vector you would get from a user's input.
# It should have the same dimensions (384) as the vectors you stored.
query_vector = create_vector_values([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

print("\nQuerying for the most similar vectors...")
query_results = index.query(
    vector=query_vector,
    top_k=2,
    include_metadata=True
)

# 6. Print the results
print("Query results:")
for match in query_results.matches:
    print(f"  ID: {match.id}")
    print(f"  Score: {match.score}")
    print(f"  Text: {match.metadata.get('text')}")

print("\nExample complete.")

Initializing Pinecone client...
Successfully connected to index 'question-answering'.

Upserting sample vectors...
Vectors upserted successfully.

Querying for the most similar vectors...
Query results:
  ID: vector1
  Score: 0.999999
  Text: What is the capital of France?
  ID: vector3
  Score: 0.885088444
  Text: What are the benefits of using Pinecone?

Example complete.


As we can see, the retriever is working: the query returns the stored vectors most similar to the query vector, together with the text held in their metadata. Now let's use the reader to extract the exact answer from a retrieved context passage.

In [18]:
import os
from pinecone import Pinecone
from google.colab import userdata

try:
    api_key = userdata.get('PINECONE_API_KEY')
    if not api_key:
        print("Error: Pinecone API key not found in secrets.")
    else:
        pc = Pinecone(api_key=api_key)
        print("Successfully connected to Pinecone.")

        # Get and print all index names
        index_names = pc.list_indexes().names()
        print("\nIndexes visible to this client:")
        if index_names:
            for name in index_names:
                print(f"- {name}")
        else:
            print("No indexes found.")

except Exception as e:
    print(f"An error occurred: {e}")


Successfully connected to Pinecone.

Indexes visible to this client:
- support-chat-index
- question-answering


In [19]:
import os
import pinecone
import json
import time # We'll need this library for the fix
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
from requests import post
from google.colab import userdata

# --- Configuration ---
# Your API keys must be set as secrets in this Colab notebook.
# Go to the key icon on the left-hand sidebar, add a new secret, and
# name them PINECONE_API_KEY and GEMINI_API_KEY.
pinecone_api_key = userdata.get('PINECONE_API_KEY')
gemini_api_key = userdata.get('GEMINI_API_KEY')

# Check if the API keys have been set.
if not pinecone_api_key or not gemini_api_key:
    print("Error: API keys not found.")
    print("Please set your PINECONE_API_KEY and GEMINI_API_KEY as secrets in this Colab notebook.")
    exit()

# Initialize Pinecone client
try:
    # Use the API key from environment variables
    pc = Pinecone(api_key=pinecone_api_key)
except Exception as e:
    print(f"Error initializing Pinecone: {e}")
    exit()

# We will use a small, efficient embedding model from Hugging Face
model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)
# NOTE: This model creates vectors with a dimension of 384. Your Pinecone index
# MUST be created with this exact dimension.

# Define your Pinecone index name. Make sure this matches the name of your index in the Pinecone console.
index_name = "question-answering"

# Stop early if the index does not exist yet.
if index_name not in pc.list_indexes().names():
    print(f"Index '{index_name}' does not exist. Please create it first.")
    print("Dimensions should be 384 for the 'all-MiniLM-L6-v2' model.")
    exit()

# Connect to the existing index
index = pc.Index(index_name)

# --- Step 1: Prepare and Embed Text Data ---
# These are the documents the LLM will "read" from.
documents = [
    "The sun is the star at the center of the Solar System. It is by far the most important source of energy for life on Earth.",
    "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System. It is often referred to as the 'Red Planet' due to the iron oxide prevalent on its surface.",
    "The moon is Earth's only natural satellite. It is the fifth-largest satellite in the Solar System.",
    "Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass more than two and a half times that of all the other planets in the Solar System combined."
]

print("Embedding and upserting data...")
embeddings = model.encode(documents).tolist()
vectors_to_upsert = [
    (str(i), embeddings[i], {"text": documents[i]}) for i in range(len(documents))
]
index.upsert(vectors=vectors_to_upsert)
print("Data upserted successfully.")
# Add a short delay to give the index time to process the data
print("Waiting for data to be indexed...")
time.sleep(5) # Pause for 5 seconds

# --- Step 2: Implement the Reader Function ---
def extract_answer(question: str, context: str) -> str:
    """
    Uses a large language model to extract a precise answer from a context.

    Args:
        question: The user's question.
        context: The text retrieved by the retriever to use as a basis for the answer.

    Returns:
        A string containing the extracted answer.
    """
    if not gemini_api_key:
        return "Error: Gemini API key not found. Please set it as a secret."

    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-05-20:generateContent?key={gemini_api_key}"

    # Craft the prompt to guide the LLM's behavior
    prompt = f"""
    Answer the following question only based on the provided context. If the answer is not found in the context, respond with "Answer not found.".

    Question: {question}

    Context: {context}

    Answer:"""

    # Send only the prompt. No search tool is attached, so the request carries
    # nothing beyond the question and the retrieved context.
    payload = {
        "contents": [{"parts": [{"text": prompt}]}]
    }

    try:
        response = post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload))
        response.raise_for_status() # Raise an exception for bad status codes

        result = response.json()
        candidate = result.get("candidates", [{}])[0]
        text = candidate.get("content", {}).get("parts", [{}])[0].get("text", "Answer not found.")

        return text.strip()

    except Exception as e:
        print(f"Error calling Gemini API: {e}")
        return "Answer not found due to an API error."

# --- Step 3: Perform RAG with the Reader ---
# Define a question and use the retriever to find the most relevant document
question = "What planet is known as the 'Red Planet'?"
query_vector = model.encode(question).tolist()
search_results = index.query(
    vector=query_vector,
    top_k=1, # We only need the single most relevant document for this example
    include_metadata=True
)

if search_results.matches:
    # Get the top result and its context. This is the crucial step.
    context = search_results.matches[0].metadata["text"]

    print("\nRetrieved Context (from Pinecone):")
    print(context)

    # Now, the 'context' variable is defined and can be used.
    print("\nUsing the reader to extract the exact answer...")
    final_answer = extract_answer(question, context)

    print("\nFinal Answer:")
    print(final_answer)
else:
    print("No relevant context found.")


Embedding and upserting data...
Data upserted successfully.
Waiting for data to be indexed...

Retrieved Context (from Pinecone):
Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System. It is often referred to as the 'Red Planet' due to the iron oxide prevalent on its surface.

Using the reader to extract the exact answer...

Final Answer:
Mars is known as the 'Red Planet'.


In [20]:
extract_answer(question, context)

"Mars is known as the 'Red Planet'."

The reader returned the correct answer, *Mars*, which appears in the retrieved context passage. Let's run a few more queries.

In [1]:
import os
import pinecone
import json
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
from requests import post
from google.colab import userdata

# --- Configuration ---
# Use Colab secrets to get API keys
pinecone_api_key = userdata.get('PINECONE_API_KEY')
gemini_api_key = userdata.get('GEMINI_API_KEY')

if not pinecone_api_key or not gemini_api_key:
    print("Error: API keys not found in Colab secrets.")
    exit()

# Initialize Pinecone client and check for a valid index
index = None # Initialize index to None to prevent NameError
try:
    pc = Pinecone(api_key=pinecone_api_key)

    # We will use a small, efficient embedding model from Hugging Face
    model_name = 'all-MiniLM-L6-v2'
    model = SentenceTransformer(model_name)

    # Define your Pinecone index name.
    index_name = "question-answering" # Make sure this matches the name of your index.

    # Check that the index exists before connecting to it
    if index_name not in pc.list_indexes().names():
        print(f"Index '{index_name}' does not exist. Please create it first.")
        print("Dimensions should be 384 for the 'all-MiniLM-L6-v2' model.")
        exit()

    # Connect to the existing index
    index = pc.Index(index_name)

except Exception as e:
    print(f"Error initializing Pinecone or connecting to the index: {e}")
    print("Please ensure your Pinecone API key and index name are correct, and that the index exists.")


# This check prevents the rest of the script from running if the index is not defined
if index is None:
    exit()

# --- Step 2: Prepare and Embed Text Data ---
# These are the documents the LLM will "read" from.
documents = [
    "The sun is the star at the center of the Solar System. It is by far the most important source of energy for life on Earth.",
    "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System. It is often referred to as the 'Red Planet' due to the iron oxide prevalent on its surface.",
    "The moon is Earth's only natural satellite. It is the fifth-largest satellite in the Solar System.",
    "Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass more than two and a half times that of all the other planets in the Solar System combined."
]

print("Embedding and upserting data...")
embeddings = model.encode(documents).tolist()
vectors_to_upsert = [
    (str(i), embeddings[i], {"text": documents[i]}) for i in range(len(documents))
]
index.upsert(vectors=vectors_to_upsert)
print("Data upserted successfully.")

# --- Step 3: Implement the Reader Function ---
def extract_answer(question: str, context: str) -> str:
    """
    Uses a large language model to extract a precise answer from a context.
    """
    if not gemini_api_key:
        return "Please provide a valid Gemini API key to use the reader."

    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-05-20:generateContent?key={gemini_api_key}"

    prompt = f"""
    Answer the following question only based on the provided context. If the answer is not found in the context, respond with "Answer not found.".

    Question: {question}

    Context: {context}

    Answer:"""

    # Send only the prompt. No search tool is attached, so the request carries
    # nothing beyond the question and the retrieved context.
    payload = {
        "contents": [{"parts": [{"text": prompt}]}]
    }

    try:
        response = post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload))
        response.raise_for_status()

        result = response.json()
        candidate = result.get("candidates", [{}])[0]
        text = candidate.get("content", {}).get("parts", [{}])[0].get("text", "Answer not found.")

        return text.strip()

    except Exception as e:
        print(f"Error calling Gemini API: {e}")
        return "Answer not found due to an API error."

# --- Step 4: Perform RAG with the Reader ---
query_text = "What planet is known as the 'Red Planet'?"

print(f"\nQuerying with '{query_text}'...")

# Use the retriever to find the most relevant document
query_vector = model.encode(query_text).tolist()
search_results = index.query(
    vector=query_vector,
    top_k=1, # We only need the single most relevant document for this example
    include_metadata=True
)

if search_results.matches:
    # Get the top result and its context
    top_match = search_results.matches[0]
    retrieved_context = top_match.metadata["text"]

    print("\nRetrieved Context (from Pinecone):")
    print(retrieved_context)

    # Use the reader to extract the final answer
    print("\nUsing the reader to extract the exact answer...")
    final_answer = extract_answer(query_text, retrieved_context)

    print("\nFinal Answer:")
    print(final_answer)
else:
    print("No relevant context found.")



Embedding and upserting data...
Data upserted successfully.

Querying with 'What planet is known as the 'Red Planet'?'...

Retrieved Context (from Pinecone):
Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System. It is often referred to as the 'Red Planet' due to the iron oxide prevalent on its surface.

Using the reader to extract the exact answer...

Final Answer:
Mars is known as the 'Red Planet'.


In [3]:
# --- Step 4: Perform RAG with the Reader ---
query_text = "What are the first names of the men that invented youtube?"

print(f"\nQuerying with '{query_text}'...")

# Use the retriever to find the most relevant document
query_vector = model.encode(query_text).tolist()
search_results = index.query(
    vector=query_vector,
    top_k=1, # We only need the single most relevant document for this example
    include_metadata=True
)

if search_results.matches:
    # Get the top result and its context
    top_match = search_results.matches[0]
    retrieved_context = top_match.metadata["text"]

    print("\nRetrieved Context (from Pinecone):")
    print(retrieved_context)

    # Use the reader to extract the final answer
    print("\nUsing the reader to extract the exact answer...")
    final_answer = extract_answer(query_text, retrieved_context)

    print("\nFinal Answer:")
    print(final_answer)
else:
    print("No relevant context found.")


Querying with 'What are the first names of the men that invented youtube?'...

Retrieved Context (from Pinecone):
What is the capital of France?

Using the reader to extract the exact answer...

Final Answer:
Answer not found.


In [5]:
# --- Step 4: Perform RAG with the Reader ---
query_text = "What is Albert Einstein famous for?"

print(f"\nQuerying with '{query_text}'...")

# Use the retriever to find the most relevant document
query_vector = model.encode(query_text).tolist()
search_results = index.query(
    vector=query_vector,
    top_k=1, # We only need the single most relevant document for this example
    include_metadata=True
)

if search_results.matches:
    # Get the top result and its context
    top_match = search_results.matches[0]
    retrieved_context = top_match.metadata["text"]

    print("\nRetrieved Context (from Pinecone):")
    print(retrieved_context)

    # Use the reader to extract the final answer
    print("\nUsing the reader to extract the exact answer...")
    final_answer = extract_answer(query_text, retrieved_context)

    print("\nFinal Answer:")
    print(final_answer)
else:
    print("No relevant context found.")


Querying with 'What is Albert Einstein famous for?'...

Retrieved Context (from Pinecone):
The sun is the star at the center of the Solar System. It is by far the most important source of energy for life on Earth.

Using the reader to extract the exact answer...

Final Answer:
Answer not found.


Let's run another question, this time retrieving the top 3 context passages from the retriever.

In [7]:
# --- Step 4: Perform RAG with the Reader ---
query_text = "Who was the first person to step foot on the moon?"

print(f"\nQuerying with '{query_text}'...")

# Use the retriever to find the most relevant documents (top_k=3)
query_vector = model.encode(query_text).tolist()
search_results = index.query(
    vector=query_vector,
    top_k=3, # Now retrieving the top 3 most similar documents
    include_metadata=True
)

if search_results.matches:
    # Get the top results and combine their contexts
    retrieved_contexts = [match.metadata["text"] for match in search_results.matches]
    full_context = "\n".join(retrieved_contexts)

    print("\nRetrieved Contexts (from Pinecone):")
    print(full_context)

    # Use the reader to extract the final answer
    print("\nUsing the reader to extract the exact answer...")
    final_answer = extract_answer(query_text, full_context)

    print("\nFinal Answer:")
    print(final_answer)
else:
    print("No relevant context found.")


Querying with 'Who was the first person to step foot on the moon?'...

Retrieved Contexts (from Pinecone):
The moon is Earth's only natural satellite. It is the fifth-largest satellite in the Solar System.
The sun is the star at the center of the Solar System. It is by far the most important source of energy for life on Earth.
Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass more than two and a half times that of all the other planets in the Solar System combined.

Using the reader to extract the exact answer...

Final Answer:
Answer not found.


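The reader returns "Answer not found." again. The three retrieved passages describe the moon, the sun, and Jupiter, but none of them says who first set foot on the moon, so the reader correctly declines to answer rather than inventing one.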
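
When the index holds no passage that answers the question, the retriever still returns its nearest neighbours, as the moon query above shows. One possible safeguard is to drop low-scoring matches before calling the reader. The sketch below assumes each match exposes a `score` and a `metadata` dictionary, as the query results in this notebook do; the `Match` class, the example scores, and the `0.5` threshold are hypothetical and would need tuning on real data.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for a query match: a similarity score plus metadata."""
    score: float
    metadata: dict = field(default_factory=dict)


def select_contexts(matches, min_score=0.5):
    """Return the text of every match whose score is at least min_score."""
    return [m.metadata["text"] for m in matches if m.score >= min_score]


# Made-up scores, for illustration only
matches = [
    Match(0.62, {"text": "The moon is Earth's only natural satellite."}),
    Match(0.31, {"text": "The sun is the star at the center of the Solar System."}),
]

contexts = select_contexts(matches, min_score=0.5)
if contexts:
    print("\n".join(contexts))
else:
    print("No sufficiently relevant context found.")
```

A threshold does not make the reader smarter; it only avoids sending clearly unrelated passages to it. A passage can score well and still lack the answer, so the reader's "Answer not found." fallback remains necessary.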

In [5]:
# --- Step 5: Clean Up ---
# This command will delete the Pinecone index to free up your resources.
try:
    pc.delete_index(index_name)
    print(f"\nIndex '{index_name}' deleted successfully.")
except Exception as e:
    print(f"\nError deleting index '{index_name}': {e}")



Error deleting index 'question-answering': (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'content-type': 'text/plain; charset=utf-8', 'access-control-allow-origin': '*', 'vary': 'origin,access-control-request-method,access-control-request-headers', 'access-control-expose-headers': '*', 'x-pinecone-api-version': '2025-04', 'x-cloud-trace-context': 'dda5b690ef884e1aec24fe469125be09', 'date': 'Tue, 02 Sep 2025 16:47:28 GMT', 'server': 'Google Frontend', 'Content-Length': '93', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: {"error":{"code":"NOT_FOUND","message":"Resource question-answering not found"},"status":404}
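
The 404 above indicates that the index no longer existed when the cleanup cell ran, for example because the cell had already been executed once. A minimal sketch of an idempotent cleanup follows. It relies only on the `list_indexes().names()` and `delete_index()` calls used elsewhere in this notebook; `delete_index_if_exists` and `FakeClient` are hypothetical names, and the fake client exists only so the example runs without an API key.

```python
def delete_index_if_exists(client, name):
    """Delete the index only if the client currently lists it.

    Returns True if a delete was issued, False if there was nothing to delete.
    """
    if name in client.list_indexes().names():
        client.delete_index(name)
        return True
    return False


class _FakeIndexList:
    def __init__(self, names):
        self._names = names

    def names(self):
        return list(self._names)


class FakeClient:
    """Hypothetical in-memory stand-in for the Pinecone client."""

    def __init__(self, names):
        self._names = set(names)

    def list_indexes(self):
        return _FakeIndexList(self._names)

    def delete_index(self, name):
        self._names.remove(name)


client = FakeClient(["question-answering"])
print(delete_index_if_exists(client, "question-answering"))  # first call deletes
print(delete_index_if_exists(client, "question-answering"))  # second call is a no-op
```

In the notebook itself, the call would presumably be `delete_index_if_exists(pc, index_name)`.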



### Add a few more questions. What did you observe?

In [4]:
import os
import json
import time
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
from requests import post
from google.colab import userdata

# --- Configuration ---
# Use Colab secrets to get API keys
pinecone_api_key = userdata.get('PINECONE_API_KEY')
gemini_api_key = userdata.get('GEMINI_API_KEY')

if not pinecone_api_key or not gemini_api_key:
    # Stop this cell with a clear error rather than calling exit()
    raise RuntimeError("API keys not found in Colab secrets.")

# Initialize Pinecone client and check for a valid index
index = None # Initialize index to None to prevent NameError
try:
    pc = Pinecone(api_key=pinecone_api_key)

    # We will use a small, efficient embedding model from Hugging Face
    model_name = 'all-MiniLM-L6-v2'
    model = SentenceTransformer(model_name)
    dimension = model.get_sentence_embedding_dimension()

    # Define your Pinecone index name.
    index_name = "question-answering" # Make sure this matches the name of your index.

    # Check if index exists, and create it if it doesn't
    if index_name not in pc.list_indexes().names():
        print(f"Index '{index_name}' does not exist. Creating it now...")
        pc.create_index(
            name=index_name,
            dimension=dimension,
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1")
        )
        print("Waiting for index to be ready...")
        while not pc.describe_index(index_name).status['ready']:
            time.sleep(1)
        print("Index is ready.")

    # Connect to the existing or newly created index
    index = pc.Index(index_name)

except Exception as e:
    print(f"Error initializing Pinecone or connecting to the index: {e}")
    print("Please ensure your Pinecone API key and index name are correct, and that the index exists.")
    # Re-raise so the rest of the cell does not continue with index=None
    raise


# --- Step 2: Prepare and Embed Text Data ---
# These are the documents the LLM will "read" from.
documents = [
    "The sun is the star at the center of the Solar System. It is by far the most important source of energy for life on Earth.",
    "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System. It is often referred to as the 'Red Planet' due to the iron oxide prevalent on its surface.",
    "The moon is Earth's only natural satellite. It is the fifth-largest satellite in the Solar System.",
    "Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass more than two and a half times that of all the other planets in the Solar System combined."
]

print("Embedding and upserting data...")
embeddings = model.encode(documents).tolist()
vectors_to_upsert = [
    (str(i), embeddings[i], {"text": documents[i]}) for i in range(len(documents))
]
index.upsert(vectors=vectors_to_upsert)
print("Data upserted successfully.")

# --- Step 3: Implement the Reader Function ---
def extract_answer(question: str, context: str) -> str:
    """
    Uses a large language model to extract a precise answer from a context.
    """
    if not gemini_api_key:
        return "Please provide a valid Gemini API key to use the reader."

    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-05-20:generateContent?key={gemini_api_key}"

    prompt = f"""
    Answer the following question only based on the provided context. If the answer is not found in the context, respond with "Answer not found.".

    Question: {question}

    Context: {context}

    Answer:"""

    # No search tool is attached: the reader should answer from the retrieved context only
    payload = {
        "contents": [{"parts": [{"text": prompt}]}]
    }

    try:
        response = post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload), timeout=30)
        response.raise_for_status()

        result = response.json()
        candidate = result.get("candidates", [{}])[0]
        text = candidate.get("content", {}).get("parts", [{}])[0].get("text", "Answer not found.")

        return text.strip()

    except Exception as e:
        print(f"Error calling Gemini API: {e}")
        return "Answer not found due to an API error."

# --- Step 4: Perform RAG with the Reader ---
# Define a list of questions to run
questions = [
    "What is the star at the center of the solar system?",
    "Which is the fifth-largest satellite in the Solar System?",
    "What is the capital of Russia?",
    "What is Jupiter famous for?",
    "What color is Mars?",
    "What are the phases of the moon?"
]

for question in questions:
    print(f"\n--- Processing new question: '{question}' ---")

    # Use the retriever to find the most relevant documents (top_k=3)
    query_vector = model.encode(question).tolist()
    search_results = index.query(
        vector=query_vector,
        top_k=3, # Now retrieving the top 3 most similar documents
        include_metadata=True
    )

    if search_results.matches:
        # Get the top results and combine their contexts
        retrieved_contexts = [match.metadata["text"] for match in search_results.matches]
        full_context = "\n".join(retrieved_contexts)

        print("\nRetrieved Contexts (from Pinecone):")
        print(full_context)

        # Use the reader to extract the final answer
        print("\nUsing the reader to extract the exact answer...")
        final_answer = extract_answer(question, full_context)

        print("\nFinal Answer:")
        print(final_answer)
    else:
        print("No relevant context found.")


# --- Step 5: Clean Up ---
# This command will delete the Pinecone index to free up your resources.
try:
    pc.delete_index(index_name)
    print(f"\nIndex '{index_name}' deleted successfully.")
except Exception as e:
    print(f"\nError deleting index '{index_name}': {e}")



Index 'question-answering' does not exist. Creating it now...
Waiting for index to be ready...
Index is ready.
Embedding and upserting data...
Data upserted successfully.

--- Processing new question: 'What is the star at the center of the solar system?' ---
No relevant context found.

--- Processing new question: 'Which is the fifth-largest satellite in the Solar System?' ---
No relevant context found.

--- Processing new question: 'What is the capital of Russia?' ---
No relevant context found.

--- Processing new question: 'What is Jupiter famous for?' ---
No relevant context found.

--- Processing new question: 'What color is Mars?' ---
No relevant context found.

--- Processing new question: 'What are the phases of the moon?' ---
No relevant context found.

Index 'question-answering' deleted successfully.
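
Every question above came back with "No relevant context found.", including ones that the four documents clearly answer. One likely cause is that the queries ran immediately after the upsert into a freshly created index: Pinecone is eventually consistent, so newly written vectors may not be visible to queries right away. A minimal sketch of a polling helper follows. `wait_for_vector_count` is a hypothetical name, and the commented call using `index.describe_index_stats().total_vector_count` is an assumption about how it would be wired into this notebook, not something verified here.

```python
import time


def wait_for_vector_count(count_fn, expected, timeout=30.0, interval=1.0):
    """Poll count_fn() until it reports at least `expected` vectors.

    Returns True once the count is reached, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_fn() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# With the notebook's objects, the call might look like this (assumption):
#   wait_for_vector_count(
#       lambda: index.describe_index_stats().total_vector_count,
#       expected=len(documents),
#   )

# Self-contained demonstration: a fake counter that reports 4 vectors on the third poll
polls = iter([0, 0, 4])
print(wait_for_vector_count(lambda: next(polls), expected=4, interval=0.01))
```

Placing such a wait between the upsert and the first query, then rerunning the questions, is one way to check whether data freshness explains the empty results.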
