In [1]:
import PyPDF2

def extract_text_from_pdf(pdf_path):
    """
    Extracts all text from a given PDF file.

    Args:
        pdf_path (str): The path to the PDF file.

    Returns:
        str | None: All extracted text as a single string, or None if the file could not be read.
    """
    text = ""
    try:
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
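            for page in reader.pages:
                # Guard against pages with no extractable text (e.g. scanned images) and
                # keep a newline between pages so page boundaries are not lost.
                text += (page.extract_text() or "") + "\n"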
    except FileNotFoundError:
        print(f"Error: The file at {pdf_path} was not found.")
        return None
    except Exception as e:
        print(f"An error occurred while processing {pdf_path}: {e}")
        return None
    return text

# --- Implementation Instructions ---

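# 1. Install PyPDF2:
#    If you don't have PyPDF2 installed, open your terminal or command prompt and run:
#    pip install PyPDF2
#    (PyPDF2 is no longer maintained; its successor, pypdf, offers the same PdfReader
#    API via `pip install pypdf` and `from pypdf import PdfReader`.)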

# 2. Save the code:
#    Save the above Python code as a .py file (e.g., 'pdf_processor.py').

# 3. Place your PDF files:
#    Make sure 'AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf'
#    and 'Amazon_SageMaker_FAQs.pdf' are in the same directory as your Python script,
#    or provide the full path to your PDF files.

# 4. Use the function to extract text:
#    You can call the function like this:
#    ec2_pdf_path = "AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf"
#    sagemaker_pdf_path = "Amazon_SageMaker_FAQs.pdf"
#
#    ec2_text = extract_text_from_pdf(ec2_pdf_path)
#    sagemaker_text = extract_text_from_pdf(sagemaker_pdf_path)
#
#    if ec2_text:
#        print(f"Extracted text from {ec2_pdf_path} (first 500 chars):\n{ec2_text[:500]}...")
#    if sagemaker_text:
#        print(f"\nExtracted text from {sagemaker_pdf_path} (first 500 chars):\n{sagemaker_text[:500]}...")

In [2]:
ec2_pdf_path = "./AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf"
sagemaker_pdf_path = "./Amazon_SageMaker_FAQs.pdf"

ec2_text = extract_text_from_pdf(ec2_pdf_path)
sagemaker_text = extract_text_from_pdf(sagemaker_pdf_path)

if ec2_text:
    print(f"Extracted text from {ec2_pdf_path} (first 500 chars):\n{ec2_text[:500]}...")
if sagemaker_text:
    print(f"\nExtracted text from {sagemaker_pdf_path} (first 500 chars):\n{sagemaker_text[:500]}...")

Extracted text from ./AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf (first 500 chars):
DeveloperGuideAmazonElasticComputeCloud
Copyright©2025AmazonWebServices,Inc.and/oritsaﬃliates.Allrightsreserved.AmazonElasticComputeCloudDeveloperGuideAmazonElasticComputeCloud:DeveloperGuideCopyright©2025AmazonWebServices,Inc.and/oritsaﬃliates.Allrightsreserved.Amazon'strademarksandtradedressmaynotbeusedinconnectionwithanyproductorservicethatisnotAmazon's,inanymannerthatislikelytocauseconfusionamongcustomers,orinanymannerthatdisparagesordiscreditsAmazon.AllothertrademarksnotownedbyAmazonarethep...

Extracted text from ./Amazon_SageMaker_FAQs.pdf (first 500 chars):
What is Amazon SageMaker?,"Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.""In which Regions is Amazon SageMaker available?","For a list of the supported Amazon SageMaker AWS Regions, please 
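The EC2 output above shows words run together ("DeveloperGuideAmazonElasticComputeCloud"), a known weakness of PDF text extraction with some layouts, while the SageMaker text is spaced normally. Unspaced text embeds poorly and leaves the splitter few natural boundaries. A quick heuristic can flag such documents before indexing; the sketch below uses a hypothetical `looks_unspaced` helper with an assumed threshold of 5% whitespace that you would tune for your own files.

```python
def whitespace_ratio(text):
    """Fraction of characters in `text` that are whitespace (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isspace() for ch in text) / len(text)


def looks_unspaced(text, threshold=0.05):
    """Hypothetical heuristic: flag text whose whitespace ratio falls below `threshold`.

    Normal English prose is typically well above 5% whitespace; the threshold is
    an assumption to tune for your own documents.
    """
    return whitespace_ratio(text) < threshold


# Example: check the start of each extracted document (variables from the cell above).
# for name, extracted in [("EC2", ec2_text), ("SageMaker", sagemaker_text)]:
#     if extracted and looks_unspaced(extracted[:5000]):
#         print(f"Warning: {name} text may be missing spaces; try another extractor.")
```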

Once you have the full text from your PDFs, the next crucial step for a RAG system is chunking. Chunking involves breaking down the large text into smaller, manageable pieces. This is important because:

- Context window limits: language models can only process a limited amount of text at once (their context window), and embedding models have input limits too. For example, all-MiniLM-L6-v2, used later in this notebook, truncates input longer than 256 word pieces, so text beyond that point in an oversized chunk does not affect its embedding.
- Relevance: smaller chunks make retrieval more precise. If a query matches a small, focused chunk, the relevant passage comes back directly instead of being diluted inside a huge block of text.
- Efficiency: smaller chunks keep each embedding call and each LLM prompt short, which reduces cost and latency.
## Good Chunking Strategy: Recursive Character Text Splitter
A robust and widely used chunking strategy is the recursive character text splitter. It tries a list of separators in order, from coarse to fine (LangChain's defaults are paragraph breaks "\n\n", line breaks "\n", spaces " ", and finally individual characters ""), and falls back to a finer separator only for pieces that are still larger than the chunk size. This keeps paragraphs, lines, and words intact wherever possible, which preserves semantic meaning.

A common Python implementation is RecursiveCharacterTextSplitter in the langchain-text-splitters package (part of the LangChain ecosystem, which is popular for building RAG systems).
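To make the separator fallback concrete, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: it ignores chunk_overlap and merges pieces more simply, but it follows the same coarse-to-fine idea. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (no overlap).

    Splits on the first separator that occurs in `text`, greedily merges the
    pieces back into chunks of at most `chunk_size` characters, and recurses
    with the remaining (finer) separators on any piece that is still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # "" always matches, so character-level splitting is the final fallback.
    sep = next((s for s in separators if s == "" or s in text), "")
    finer = separators[separators.index(sep) + 1:] if sep in separators else ()
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the current chunk
            current = ""
        if len(piece) <= chunk_size:
            current = piece  # start a new chunk with this piece
        else:
            # Piece alone is too long: split it again with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=8))
# ['aaa bbb', 'ccc', 'ddd eee']
```

The paragraph break is tried first; only the first paragraph is too long, so it alone is split again on spaces.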

## Overview of the Solution
We will:

1. Install langchain-text-splitters, if you don't have it already.
2. Define a RecursiveCharacterTextSplitter with two key parameters: chunk_size (the maximum size of each chunk) and chunk_overlap (how many characters consecutive chunks share, so that context straddling a boundary is not lost).
3. Apply the splitter to the extracted text to generate a list of text chunks.
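The effect of chunk_size and chunk_overlap is easiest to see with a fixed-stride window, where each chunk starts chunk_size - chunk_overlap characters after the previous one. The hypothetical `sliding_window_chunks` helper below is only an illustration: LangChain's splitter applies overlap between separator-aligned pieces, so its chunk boundaries will not match exactly.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split `text` into fixed-size character windows that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the settings used below (chunk_size=1000, chunk_overlap=200), this simple scheme would start a new chunk every 800 characters.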

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """
    Splits a given text into smaller, overlapping chunks using RecursiveCharacterTextSplitter.

    Args:
        text (str): The input text to be chunked.
        chunk_size (int): The maximum number of characters in each chunk.
        chunk_overlap (int): The number of characters to overlap between consecutive chunks.

    Returns:
        list[str]: A list of text chunks.
    """
    if not isinstance(text, str) or not text:
        print("Error: Input text must be a non-empty string.")
        return []

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
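        length_function=len,  # Measure chunk_size and chunk_overlap in characters
        # Note: add_start_index=True only affects create_documents()/split_documents();
        # split_text() below returns plain strings, so the option is omitted here.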
    )
    chunks = text_splitter.split_text(text)
    return chunks

# --- Implementation Instructions ---

# 1. Install the necessary library:
#    If you haven't already, open your terminal or anaconda prompt and run:
#    pip install langchain-text-splitters

# 2. Ensure you have your extracted text:
#    Before running this, make sure you have executed the previous steps and have
#    your PDF text extracted into variables like 'ec2_text' and 'sagemaker_text'.
#    For example:
#    # ec2_text = extract_text_from_pdf("AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf")
#    # sagemaker_text = extract_text_from_pdf("Amazon_SageMaker_FAQs.pdf")

# 3. Apply the chunking function:
#    Now, you can call the 'chunk_text' function with your extracted PDF texts:

#    # Example usage (assuming ec2_text and sagemaker_text are already defined from previous step):
#    # if ec2_text: # Check if extraction was successful
#    #     ec2_chunks = chunk_text(ec2_text, chunk_size=1000, chunk_overlap=200)
#    #     print(f"\nNumber of chunks from EC2 PDF: {len(ec2_chunks)}")
#    #     print(f"First 2 EC2 chunks:\n{ec2_chunks[0]}\n---\n{ec2_chunks[1]}...")
#
#    # if sagemaker_text: # Check if extraction was successful
#    #     sagemaker_chunks = chunk_text(sagemaker_text, chunk_size=1000, chunk_overlap=200)
#    #     print(f"\nNumber of chunks from SageMaker PDF: {len(sagemaker_chunks)}")
#    #     print(f"First 2 SageMaker chunks:\n{sagemaker_chunks[0]}\n---\n{sagemaker_chunks[1]}...")
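chunk_text() returns the plain strings produced by split_text(), so each chunk's position in the source text is not kept. In LangChain, add_start_index=True takes effect through create_documents() or split_documents(), which store a 'start_index' entry in each Document's metadata. If you prefer to stay with plain strings, a minimal sketch (the `attach_start_indices` helper is hypothetical) can recover offsets by searching forward through the original text:

```python
def attach_start_indices(text, chunks):
    """Pair each chunk with its character offset in `text`.

    Searches forward from just after the previous match, so overlapping chunks
    (which start later in the text than their predecessor) are located in order.
    Returns -1 as the offset if a chunk is not found verbatim.
    """
    results = []
    search_from = 0
    for chunk in chunks:
        start = text.find(chunk, search_from)
        results.append({"text": chunk, "start_index": start})
        if start != -1:
            search_from = start + 1
    return results


# Overlapping chunks are found at offsets 0, 2, 4 and 6.
print(attach_start_indices("abcdefghij", ["abcd", "cdef", "efgh", "ghij"]))
```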

In [4]:
ec2_chunks = chunk_text(ec2_text, chunk_size=1000, chunk_overlap=200)
sagemaker_chunks = chunk_text(sagemaker_text, chunk_size=1000, chunk_overlap=200)

In [5]:
from pprint import pprint
pprint(type(ec2_chunks))
pprint(len(ec2_chunks))
print("----------------------------------------------")
pprint(type(sagemaker_chunks))
pprint(len(sagemaker_chunks))

<class 'list'>
1662
----------------------------------------------
<class 'list'>
97


In [6]:
# Install necessary libraries (if you haven't already)
# !pip install sentence-transformers qdrant-client

from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
import uuid # To generate unique IDs for our chunks

# --- 1. Load the Embedding Model ---
# This will download the model the first time it's run
print("Loading Sentence Transformer model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Model loaded.")

def get_embedding(text):
    """
    Generates an embedding for a given text using the loaded model.

    Args:
        text (str): The input text.

    Returns:
        list[float]: A list representing the embedding vector.
    """
    # Encode the text to get the embedding vector
    return embedding_model.encode(text).tolist()

# --- 2. Start Qdrant (External Step) ---
# Before running the Python code below, you need to start Qdrant.
# Open your terminal or command prompt and run the following Docker command:
# docker run -p 6333:6333 -p 6334:6334 \
#     -v $(pwd)/qdrant_storage:/qdrant/storage \
#     qdrant/qdrant

# Explanation of the Docker command:
# -p 6333:6333: Maps Qdrant's HTTP/REST API port (6333) to your local machine.
# -p 6334:6334: Maps Qdrant's gRPC port (6334) to your local machine.
# -v $(pwd)/qdrant_storage:/qdrant/storage: Mounts a persistent volume.
#   Data is saved in a 'qdrant_storage' folder in your current directory,
#   so it survives stopping and restarting the Docker container.
# qdrant/qdrant: The official Qdrant Docker image.

# --- 3. Initialize Qdrant Client ---
print("Initializing Qdrant client...")
qdrant_client = QdrantClient(host="localhost", port=6333) # Connect to the local Qdrant instance
print(qdrant_client)
print(qdrant_client.info()) # Print Qdrant server info to confirm connection
print("Qdrant client initialized.")

# Assuming ec2_chunks and sagemaker_chunks are lists of strings from your chunking step
# Make sure these variables are populated from the previous step in your notebook.

# --- 4. Define Collection Name and Vector Size ---
collection_name = "my_rag_documents"
# The vector size must match the output dimension of your embedding model.
# 'all-MiniLM-L6-v2' produces 384-dimensional embeddings.
vector_size = embedding_model.get_sentence_embedding_dimension() # Gets dimension from model

print(f"Embedding vector size: {vector_size}") # You should see this line in your output

# --- 5. Create the Qdrant Collection (if it doesn't exist yet) ---
# collection_exists()/create_collection() are used instead of recreate_collection(),
# which is deprecated in recent qdrant-client releases and deletes existing data.
if qdrant_client.collection_exists(collection_name=collection_name):
    print(f"Collection '{collection_name}' already exists.")
else:
    print(f"Creating collection '{collection_name}'...")
    qdrant_client.create_collection(
        collection_name=collection_name,
        # Cosine distance is the usual choice for Sentence Transformers embeddings
        vectors_config=models.VectorParams(size=vector_size, distance=models.Distance.COSINE),
    )
    print(f"Collection '{collection_name}' created.")

points = []
print("Generating embeddings and preparing points for Qdrant...")

# Both documents go through the same logic. Point IDs are derived deterministically
# from (source, chunk_index) with uuid5, so re-running this cell overwrites the same
# points instead of adding duplicates (uuid4 generates a new random ID on every run).
# Points uploaded earlier with random IDs are not removed; call
# qdrant_client.delete_collection(collection_name) once and re-run this cell for a clean start.
documents = [
    ("EC2", "AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf", ec2_chunks),
    ("SageMaker", "Amazon_SageMaker_FAQs.pdf", sagemaker_chunks),
]

for label, source, chunks in documents:
    print(f"Processing {label} chunks...")
    for i, chunk in enumerate(chunks):
        if not chunk.strip():  # Skip empty chunks
            continue

        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{i}"))
        payload = {
            "text": chunk,
            "source": source,  # Actual source filename
            "chunk_index": i,  # Chunk order within its original document
        }
        points.append(
            models.PointStruct(id=point_id, vector=get_embedding(chunk), payload=payload)
        )

if points:
    qdrant_client.upsert(
        collection_name=collection_name,
        wait=True,
        points=points
    )
    print(f"Successfully uploaded {len(points)} chunks to Qdrant collection '{collection_name}'.")
else:
    print("No chunks to upload or all chunks were empty.")

Loading Sentence Transformer model...
Model loaded.
Initializing Qdrant client...
<qdrant_client.qdrant_client.QdrantClient object at 0x323e5f230>
title='qdrant - vector search engine' version='1.14.1' commit='530430fac2a3ca872504f276d2c91a5c91f43fa0'
Qdrant client initialized.
Embedding vector size: 384
Collection 'my_rag_documents' already exists.
Generating embeddings and preparing points for Qdrant...
Processing EC2 chunks...
Processing SageMaker chunks...
Successfully uploaded 1759 chunks to Qdrant collection 'my_rag_documents'.
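The collection uses Distance.COSINE, and the scores printed during retrieval later in this notebook (0.54 and 0.73, for example) are cosine similarities: higher means more similar, with 1.0 for vectors pointing in the same direction. The sketch below spells out the formula dot(a, b) / (|a| * |b|) in plain Python for intuition only; Qdrant computes this internally (its documentation describes normalizing vectors on upload and then using a dot product), so you never need to call it yourself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([3, 4], [4, 3]))  # 24 / (5 * 5) = 0.96
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal vectors -> 0.0
```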


In [7]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

# --- 1. Load the Embedding Model (Same as used for indexing!) ---
# Ensure this matches the model used to embed your document chunks.
print("Loading Sentence Transformer model for retrieval...")
retrieval_embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Retrieval model loaded.")

def get_query_embedding(query_text):
    """
    Generates an embedding for a given query text using the loaded model.

    Args:
        query_text (str): The user's input query.

    Returns:
        list[float]: A list representing the query's embedding vector.
    """
    return retrieval_embedding_model.encode(query_text).tolist()

# --- 2. Initialize Qdrant Client (Same as used for indexing!) ---
print("Initializing Qdrant client for retrieval...")
retrieval_qdrant_client = QdrantClient(host="localhost", port=6333) # Connect to the local Qdrant instance
print("Qdrant client initialized for retrieval.")

# --- 3. Define Collection Name (Same as used for indexing!) ---
collection_name = "my_rag_documents"

def retrieve_context(query, top_k=3):
    """
    Retrieves the most relevant text chunks from Qdrant for a given query.

    Args:
        query (str): The user's input question.
        top_k (int): The number of top similar chunks to retrieve.

    Returns:
        list[dict]: A list of dictionaries, each containing the 'text' of a retrieved chunk
                    and potentially other payload information like 'source'.
    """
    print(f"\nRetrieving context for query: '{query}'")
    query_embedding = get_query_embedding(query)

    try:
        # query_points() replaces the deprecated search() method in recent qdrant-client releases.
        search_result = retrieval_qdrant_client.query_points(
            collection_name=collection_name,
            query=query_embedding,
            limit=top_k,
            # Return the stored payload (text, source, chunk_index) with each hit.
            # Pass a list of field names to fetch only some fields, or False for IDs/scores only.
            with_payload=True,
        ).points
        print(f"Retrieved {len(search_result)} chunks from Qdrant.")
        print("Search results:" + str(search_result))
        print("+++++++++++++++++++++++++++++++++++++")

        context_chunks = []
        for hit in search_result:
            print(hit)
            print("-------------------------------")
            print(f"  Retrieved chunk (score: {hit.score:.4f}, source: {hit.payload.get('source', 'N/A')}):")
            # The payload is a dict; 'preview' avoids reusing the chunk_text() helper's name
            preview = hit.payload.get('text', 'No text found in payload')
            print(f"    {preview[:150]}...")  # Print the first 150 characters as a preview

            context_chunks.append(hit.payload)  # Keep the whole payload (text, source, chunk_index)

        return context_chunks

    except Exception as e:
        print(f"An error occurred during Qdrant search: {e}")
        return []

# --- Implementation Instructions ---

# 1. Ensure Qdrant is still running:
#    The Docker container you started in the previous step needs to be active.
#    (i.e., `docker run -p 6333:6333 ... qdrant/qdrant` is still running in a terminal)

# 2. Run the code:
#    Copy this code into a new cell in your Python notebook and execute it.
#    This will define the `get_query_embedding` and `retrieve_context` functions.

# 3. Test the retrieval:
#    Now you can test it with some sample queries:

#    query1 = "What is Amazon EC2?"
#    retrieved_data1 = retrieve_context(query1, top_k=2)
#    # The 'retrieved_data1' list will contain dictionaries with 'text' and 'source'

#    query2 = "How does SageMaker help with machine learning?"
#    retrieved_data2 = retrieve_context(query2, top_k=2)

#    # You can now see the retrieved text and potentially the source from where it came.
#    # Example of accessing retrieved text:
#    # if retrieved_data1:
#    #     print("\n--- Full retrieved text for Query 1 (first chunk): ---")
#    #     print(retrieved_data1[0]['text'])

Loading Sentence Transformer model for retrieval...
Retrieval model loaded.
Initializing Qdrant client for retrieval...
Qdrant client initialized for retrieval.
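Conceptually, the Qdrant query inside retrieve_context (with limit=top_k) returns the top_k stored vectors most similar to the query vector. Qdrant uses an approximate HNSW index to do this quickly at scale; the exact brute-force equivalent, sketched below with a hypothetical `top_k_by_cosine` helper over a toy in-memory dict, shows what is being approximated.

```python
import heapq
import math


def top_k_by_cosine(query, vectors, k):
    """Exact top-k search: score every stored vector against `query`.

    `vectors` maps point IDs to non-zero embedding lists. Returns
    (score, point_id) tuples, highest cosine similarity first.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), point_id)
        for point_id, v in vectors.items()
    )
    return heapq.nlargest(k, scored)


toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for score, point_id in top_k_by_cosine([1.0, 0.1], toy_index, k=2):
    print(point_id, round(score, 3))
```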


In [8]:
query1 = "What is Amazon EC2?"
retrieved_data1 = retrieve_context(query1, top_k=2)
# The 'retrieved_data1' list will contain dictionaries with 'text' and 'source'

query2 = "How does SageMaker help with machine learning?"
retrieved_data2 = retrieve_context(query2, top_k=2)


Retrieving context for query: 'What is Amazon EC2?'
Retrieved 2 chunks from Qdrant.
Search results:[ScoredPoint(id='e44b77a4-f9b5-4252-97ff-d1c9e462152c', version=0, score=0.54200673, payload={'text': 'NoteThere\'smoreonGitHub.FindthecompleteexampleandlearnhowtosetupandrunintheAWSCodeExamplesRepository.#AssociatesanElasticIPaddresswithanAmazonElasticComputeCloud#(AmazonEC2)instance.##Prerequisites:##-TheallocationIDcorrespondingtotheElasticIPaddress.#-TheAmazonEC2instance.##@paramec2_client[Aws::EC2::Client]AninitializedEC2client.#@paramallocation_id[String]TheIDoftheallocationcorrespondingto#theElasticIPaddress.#@paraminstance_id[String]TheIDoftheinstance.#@return[String]TheassocationIDcorrespondingtotheassociationofthe#ElasticIPaddresstotheinstance.#@example#putsallocate_elastic_ip_address(#Aws::EC2::Client.new(region:\'us-west-2\'),#\'eipalloc-04452e528a66279EX\',#\'i-033c48ef067af3dEX\')defassociate_elastic_ip_address_with_instance(ec2_client,allocation_id,instance_id)response=ec2


In [9]:
from pprint import pprint
pprint(retrieved_data1)
print("----------------------------------------")
pprint(retrieved_data2)

[{'chunk_index': 474,
  'source': 'AmazonElasticComputeCloud-DeveloperGuide-ec2-dg.pdf',
  'text': 'NoteThere\'smoreonGitHub.FindthecompleteexampleandlearnhowtosetupandrunintheAWSCodeExamplesRepository.#AssociatesanElasticIPaddresswithanAmazonElasticComputeCloud#(AmazonEC2)instance.##Prerequisites:##-TheallocationIDcorrespondingtotheElasticIPaddress.#-TheAmazonEC2instance.##@paramec2_client[Aws::EC2::Client]AninitializedEC2client.#@paramallocation_id[String]TheIDoftheallocationcorrespondingto#theElasticIPaddress.#@paraminstance_id[String]TheIDoftheinstance.#@return[String]TheassocationIDcorrespondingtotheassociationofthe#ElasticIPaddresstotheinstance.#@example#putsallocate_elastic_ip_address(#Aws::EC2::Client.new(region:\'us-west-2\'),#\'eipalloc-04452e528a66279EX\',#\'i-033c48ef067af3dEX\')defassociate_elastic_ip_address_with_instance(ec2_client,allocation_id,instance_id)response=ec2_client.associate_address(allocation_id:allocation_id,instance_id:instance_id)response.association_idre

In [18]:
# !pip install -q google-generativeai python-dotenv  # If you choose Google Gemini

# LLM integration using Google Gemini. To use a different provider, replace the
# client setup below and the generate_content() call inside generate_answer().

# --- Example using Google Gemini (requires an API key) ---
import os  # To read the API key from environment variables

import dotenv
import google.generativeai as genai

# Load GEMINI_API_KEY from a local .env file (or set it in your shell environment)
# and configure the client. Avoid hard-coding the key in the notebook.
dotenv.load_dotenv()
genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))

# Create a GenerativeModel instance. Experimental model names such as this one may be
# retired; run genai.list_models() to see which models your key can use.
gemini_model = genai.GenerativeModel('gemini-2.0-flash-thinking-exp-1219')  # Or 'gemini-1.5-pro'

def generate_answer(query, context_chunks):
    """
    Generates an answer to the query using the provided context and an LLM.

    Args:
        query (str): The user's original question.
        context_chunks (list[dict]): A list of dictionaries, where each dict
                                     contains 'text' and 'source' of a retrieved chunk.

    Returns:
        str: The generated answer from the LLM.
    """
    if not context_chunks:
        return "I apologize, but I couldn't find relevant information in the documents to answer your question."

    # --- 1. Construct the Context String ---
    # Combine all retrieved chunk texts into a single string
    combined_context = "\n\n".join([chunk['text'] for chunk in context_chunks])

    # Optional: Add source information to the combined context for the LLM if desired
    # or for external tracking, though for pure answer generation, just the text is fine.
    # source_info = "\nSources:\n" + "\n".join(list(set([chunk['source'] for chunk in context_chunks])))
    # combined_context += source_info

    # --- 2. Construct the LLM Prompt ---
    # This is a critical part - clear instructions help the LLM perform better.
    prompt = f"""
    You are a helpful assistant that answers questions based on the provided context only.
    If the answer cannot be found in the context, clearly state that you cannot answer from the provided information.

    Question: {query}

    Context:
    {combined_context}

    Answer:
    """

    print("\n--- Sending to LLM ---")
    print("Prompt being sent (first 500 chars):\n", prompt[:500], "...")

    try:
        # --- 3. Send Prompt to LLM and Get Response (Gemini-specific) ---
        llm_response = gemini_model.generate_content(
            prompt,
            # Optional: Configure generation settings
            generation_config=genai.types.GenerationConfig(
                temperature=0.2, # Lower temperature for more factual, less creative answers
                max_output_tokens=1024 # Limit the length of the response
            ),
            # Optional: Configure safety settings if you want to relax/tighten them
            # safety_settings={
            #     'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE',
            #     'HARM_CATEGORY_HATE_SPEECH': 'BLOCK_NONE',
            #     'HARM_CATEGORY_SEXUALLY_EXPLICIT': 'BLOCK_NONE',
            #     'HARM_CATEGORY_DANGEROUS_CONTENT': 'BLOCK_NONE',
            # },
        )
        # Gemini's response object might have parts, or raise an error if it's blocked/empty
        # Check if the response actually contains text
        if llm_response.parts:
            return llm_response.text
        else:
            # This handles cases where response might be empty or blocked by safety settings
            print("Gemini response was empty or blocked.")
            # print(f"Prompt feedback: {llm_response.prompt_feedback}")  # Uncomment for debugging
            # print(f"Candidates: {llm_response.candidates}")  # Uncomment for debugging
            return "I'm sorry, I couldn't generate an answer. The model might have blocked the response due to safety concerns or found no suitable content."

    except genai.types.BlockedPromptException as e:
        print(f"Gemini API Error: Prompt was blocked by safety settings. Details: {e}")
        return "I'm sorry, your request could not be processed due to safety guidelines."
    except Exception as e:
        print(f"An unexpected error occurred while generating the answer with Gemini: {e}")
        return "An error occurred while trying to generate an answer."

# --- Implementation Instructions ---

# 1. Choose your LLM:
#    This cell uses Google Gemini. To use it:
#       - pip install -q google-generativeai python-dotenv
#       - Get your API key from Google AI Studio.
#       - Put GEMINI_API_KEY=<your key> in a .env file (or export it in your shell);
#         the cell loads it with dotenv and passes it to genai.configure().

# 2. Using a different LLM (e.g., OpenAI):
#    Replace the Gemini client setup and the `gemini_model.generate_content(...)` call
#    inside `generate_answer` with your provider's client call, keeping the same prompt.

# 3. Use the function with your retrieved data:

#    # Example using data from your previous steps:
#    # query1 = "What is Amazon EC2?"
#    # retrieved_data1 = retrieve_context(query1, top_k=2)
#    # final_answer1 = generate_answer(query1, retrieved_data1)
#    # print("\nFinal Answer 1:", final_answer1)
#
#    # query2 = "How does SageMaker help with machine learning?"
#    # retrieved_data2 = retrieve_context(query2, top_k=2)
#    # final_answer2 = generate_answer(query2, retrieved_data2)
#    # print("\nFinal Answer 2:", final_answer2)
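generate_answer joins the chunk texts without their sources (the source line is left commented out). The sketch below shows one way to label each chunk with its source and stop adding chunks once a rough character budget is reached, so the LLM can cite where an answer came from. The helper name `build_prompt` and the 4000-character default budget are assumptions, not part of the original code; the instructions mirror the notebook's prompt.

```python
def build_prompt(query, context_chunks, max_context_chars=4000):
    """Build a grounded prompt whose context blocks are numbered and labeled by source.

    Chunks are added in retrieval order until the (approximate) character budget
    `max_context_chars` would be exceeded; separators are not counted.
    """
    blocks, used = [], 0
    for number, chunk in enumerate(context_chunks, start=1):
        block = f"[{number}] (source: {chunk.get('source', 'unknown')})\n{chunk['text']}"
        if used + len(block) > max_context_chars:
            break  # stop before exceeding the budget
        blocks.append(block)
        used += len(block)

    context = "\n\n".join(blocks)
    return (
        "You are a helpful assistant that answers questions based on the provided context only.\n"
        "If the answer cannot be found in the context, clearly state that you cannot answer "
        "from the provided information.\n"
        "Cite the bracketed numbers of the context blocks you used.\n\n"
        f"Question: {query}\n\n"
        f"Context:\n{context}\n\n"
        "Answer:"
    )


chunks = [
    {"text": "You pay for ML compute, storage, and data processing resources.",
     "source": "Amazon_SageMaker_FAQs.pdf"},
]
print(build_prompt("Explain Amazon SageMaker pricing?", chunks))
```

Inside generate_answer, the f-string prompt could then be replaced by a call such as build_prompt(query, context_chunks).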

## Code to query the RAG system

In [21]:
# Ensure all previous setup cells have been run in your notebook:
# - PDF text extraction
# - Text chunking (into ec2_chunks, sagemaker_chunks)
# - SentenceTransformer model loading (for embeddings)
# - Qdrant client initialization and collection creation/upload
# - Google Gemini API key configuration and model initialization
# - `retrieve_context` and `generate_answer` functions defined

def query_rag_system(user_query, top_k_chunks=3):
    """
    Queries the RAG system to find relevant information and generate an answer.

    Args:
        user_query (str): The question the user wants to ask.
        top_k_chunks (int): The number of top relevant chunks to retrieve from Qdrant.

    Returns:
        str: The generated answer from the LLM based on the retrieved context.
    """
    print(f"\n--- Processing Query: '{user_query}' ---")

    # Step 1: Retrieve relevant context from Qdrant
    print("Retrieving context from vector database...")
    retrieved_data = retrieve_context(user_query, top_k=top_k_chunks)

    if not retrieved_data:
        print("No relevant context found. Cannot generate an answer.")
        return "I couldn't find any relevant information in my documents to answer your question."

    # Step 2: Generate answer using the LLM with the retrieved context
    print("Generating answer using the Language Model...")
    final_answer = generate_answer(user_query, retrieved_data)

    return final_answer

# --- Implementation Instructions ---

# 1. Ensure ALL previous code blocks (from PDF extraction to Gemini setup)
#    have been run successfully in your notebook.
#    Specifically, the `retrieve_context` and `generate_answer` functions
#    must be defined and accessible.

# 2. Run the above `query_rag_system` function definition cell.

# 3. Now, you can query your RAG system!
#    Just call the function with your question:

#    print("--- Query 1 ---")
#    answer1 = query_rag_system("What is the main purpose of Amazon EC2?")
#    print("\nFinal Answer 1:", answer1)
#
#    print("\n--- Query 2 ---")
#    answer2 = query_rag_system("How does Amazon SageMaker simplify machine learning workflows?")
#    print("\nFinal Answer 2:", answer2)
#
#    print("\n--- Query 3 ---")
#    answer3 = query_rag_system("What is an Elastic IP address and how is it used with EC2?")
#    print("\nFinal Answer 3:", answer3)
#
#    print("\n--- Query 4 (Out of context example) ---")
#    answer4 = query_rag_system("What is the capital of France?")
#    print("\nFinal Answer 4:", answer4) # Expect this to say it cannot answer from context
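Vector search always returns the nearest top_k chunks, even for a question the documents cannot answer (such as Query 4), so out-of-context detection here relies entirely on the LLM following the prompt. One common addition is a similarity cutoff before the LLM is called. The minimal sketch below assumes hits are available as (score, payload) pairs; retrieve_context currently returns payloads only, so it would also need to return hit.score. The 0.5 threshold is an assumption to tune; the hits printed in this notebook scored between about 0.54 and 0.73.

```python
def filter_hits_by_score(hits, min_score=0.5):
    """Keep payloads whose similarity score is at least `min_score`.

    `hits` is a list of (score, payload) pairs, e.g. built from Qdrant results as
    [(hit.score, hit.payload) for hit in results]. An empty return value means
    nothing was similar enough, so the caller can skip the LLM call entirely.
    """
    return [payload for score, payload in hits if score >= min_score]


hits = [(0.73, {"text": "SageMaker pricing chunk"}),
        (0.31, {"text": "Unrelated chunk"})]
print(filter_hits_by_score(hits))  # keeps only the 0.73 hit
```

Because query_rag_system already returns a fallback message when retrieve_context returns an empty list, filtering inside retrieve_context would route weak matches to that message.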

In [26]:
print("--- Query 1 ---")
answer1 = query_rag_system("Explain Amazon SageMaker pricing?")
print("\nFinal Answer 1:", answer1)

--- Query 1 ---

--- Processing Query: 'Explain Amazon SageMaker pricing?' ---
Retrieving context from vector database...

Retrieving context for query: 'Explain Amazon SageMaker pricing?'


Retrieved 3 chunks from Qdrant.
Search results:[ScoredPoint(id='8b3388b0-5d91-45b6-bc4d-a3a5a833d428', version=0, score=0.729482, payload={'text': 'to or disclosure of your content. As a customer, you maintain ownership of your content, and you select which AWS services can process, store, and host your content. We do not access your content for any purpose without your consent.""How am I charged for Amazon SageMaker?","You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You pay only for what you use, as you use it; there are no minimum fees and no upfront commitments. See the\xa0Amazon SageMaker pricing page\xa0and the\xa0Amazon SageMaker Pricing calculator\xa0for details.""How can I optimize my Amazon SageMaker costs, such as detecting and stopp

In [13]:
import google.generativeai as genai
import os
import dotenv

# Ensure your API key is configured (same as before)
dotenv.load_dotenv()
genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))

print("Listing available Gemini models:")
for m in genai.list_models():
    # Only print models that support the 'generateContent' method, as that's what we need
    if 'generateContent' in m.supported_generation_methods:
        print(f"  Model name: {m.name}, Supported methods: {m.supported_generation_methods}")

Listing available Gemini models:
  Model name: models/gemini-1.0-pro-vision-latest, Supported methods: ['generateContent', 'countTokens']
  Model name: models/gemini-pro-vision, Supported methods: ['generateContent', 'countTokens']
  Model name: models/gemini-1.5-pro-latest, Supported methods: ['generateContent', 'countTokens']
  Model name: models/gemini-1.5-pro-001, Supported methods: ['generateContent', 'countTokens', 'createCachedContent']
  Model name: models/gemini-1.5-pro-002, Supported methods: ['generateContent', 'countTokens', 'createCachedContent']
  Model name: models/gemini-1.5-pro, Supported methods: ['generateContent', 'countTokens']
  Model name: models/gemini-1.5-flash-latest, Supported methods: ['generateContent', 'countTokens']
  Model name: models/gemini-1.5-flash-001, Supported methods: ['generateContent', 'countTokens', 'createCachedContent']
  Model name: models/gemini-1.5-flash-001-tuning, Supported methods: ['generateContent', 'countTokens', 'createTunedModel']