# Python for AI Projects

## Introduction

**Natural Language Processing**

In this Jupyter notebook, we'll quickly set up our Python environment and get started with our Explore California NLP exercises.

### Challenge Exercises

1. Explore our `locations` NLP dataset
2. Implement keyword, TF-IDF, BM25, and semantic search functionality
3. Set up a simple Retrieval-Augmented Generation (RAG) workflow using a local LLM

### Getting Started

To execute a cell in this notebook, click the play button to its left or press `Command + Enter` or `Shift + Enter`. Run the cells one by one, from top to bottom.

In [None]:
# Initial setup steps
# ====================

# Install Python libraries
!pip install --quiet rank_bm25==0.2.2
!pip install --quiet faiss-cpu==1.11.0
!pip install --quiet ctransformers==0.2.27
!pip install --quiet dotenv==0.9.9

# Clone GitHub repo into a "data" folder
!git clone https://github.com/LinkedInLearning/applied-AI-and-machine-learning-for-data-practitioners-5932259.git data

# Need to change directory into "data" to download git lfs data objects
%cd data
!git lfs pull

# Then we need to change directory back up so all our paths are correct
%cd ..

# Turn off future warnings for cleaner output
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# 1. Data Exploration

We'll begin by calculating the following using the `text_data` column from `locations_df`. This column is the equivalent of the structured HTML data we would extract from a webpage on the Explore California website.

* Total count of locations
* Vocabulary size and most frequent keywords
* Generate a word cloud for `text_data` and compare this with the simplified `description` field
* Create sentence embeddings and visualize clusters in 3D space to identify similar locations based on their `description`

Let's first load our `locations` dataset using Pandas and get started by exploring the data.

## 1.1 Load Locations Data

In [None]:
# import pandas library
import pandas as pd

# Load in locations dataset
locations_df = pd.read_csv("data/locations.csv")

# View first few rows of dataframe
locations_df.head()

## 1.2 Locations Analysis

> How many unique locations are there?

In [None]:
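# How many unique locations are there?
# nunique() counts distinct names, so duplicate rows (if any) are not double-counted
print(f"There are {locations_df['location_name'].nunique()} unique locations")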

> How many locations per category?

In [None]:
# How many locations per category?
locations_df["category"].value_counts()

> How many locations per region?

In [None]:
# How many locations per region?
locations_df["region"].value_counts()

## 1.3 NLP Analysis

### 1.3.1 Most Frequent Terms

> Identify the top 25 most frequent terms across all locations
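
As a rough, standard-library-only illustration of what the counting step does, the sketch below tallies uni-grams across a tiny made-up corpus. The `toy_docs` list and the short `STOPWORDS` set are assumptions for illustration only; scikit-learn's `CountVectorizer` ships a much longer English stopword list, so counts on the real data will differ.

```python
import re
from collections import Counter

# Hypothetical mini-corpus (not taken from the locations dataset)
toy_docs = [
    "Wine tasting tours in the valley",
    "Hiking tours and scenic valley views",
    "Beach walks and wine at sunset",
]

# Tiny illustrative stopword list (assumption; far shorter than scikit-learn's)
STOPWORDS = {"in", "the", "and", "at"}

def tokenize(text):
    """Lowercase the text and keep word tokens of 2+ characters that are not stopwords."""
    return [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOPWORDS]

# Accumulate term counts across every document
term_counts = Counter()
for doc in toy_docs:
    term_counts.update(tokenize(doc))

top_terms = term_counts.most_common(3)
print(f"Unique terms: {len(term_counts)}")
print(top_terms)
```

Summing per-document counts like this is the same idea as summing the columns of the term-document matrix in the cell that follows.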

In [None]:
# ---------------------------------------
# Goal: Identify the Top 25 Most Frequent Terms Across All Locations
# ---------------------------------------

from sklearn.feature_extraction.text import CountVectorizer  # Used for tokenizing and counting word frequencies

# ---------------------------------------
# Step 1: Initialize CountVectorizer
# ---------------------------------------
# We use uni-gram tokenization (single words) and automatically remove common English stopwords
# This helps us focus on meaningful content-specific terms
vectorizer = CountVectorizer(stop_words="english")

# ---------------------------------------
# Step 2: Fit the vectorizer to the text data
# ---------------------------------------
# The input is a list of raw text strings from the 'text_data' column (assumed to be pre-cleaned)
# This will tokenize each document and build a term-document matrix
X = vectorizer.fit_transform(locations_df['text_data'])

# ---------------------------------------
# Step 3: Aggregate total term frequencies across all documents
# ---------------------------------------
# Get the list of all terms (vocabulary)
terms = vectorizer.get_feature_names_out()

# Sum up the count of each term across all documents
term_counts = X.toarray().sum(axis=0)

# Create a DataFrame showing each term and its total count
word_freq = pd.DataFrame({
    'term': terms,
    'count': term_counts
})

# Sort terms by descending frequency
word_freq = word_freq.sort_values(by="count", ascending=False)

# ---------------------------------------
# Step 4: Output results
# ---------------------------------------
print(f"Total number of unique uni-gram terms: {len(word_freq)}")

# Display the top 25 most frequent terms
word_freq.head(25)


### 1.3.2 Word Cloud Visualization

We'll use our HTML `text_data` and the summary `description` data to build 2 word clouds and compare them side-by-side.

In [None]:
# ---------------------------------------
# Goal: Compare Frequent Terms in 'text_data' vs 'description' Using Side-by-Side Word Clouds
# ---------------------------------------

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# ---------------------------------------
# Step 1: Prepare text inputs
# ---------------------------------------
wordcloud_text_inputs = " ".join(locations_df['text_data'])
wordcloud_description_inputs = " ".join(locations_df['description'])

# ---------------------------------------
# Step 2: Generate WordClouds
# ---------------------------------------
wordcloud_text = WordCloud(
    width=800, height=400, background_color='white'
).generate(wordcloud_text_inputs)

wordcloud_description = WordCloud(
    width=800, height=400, background_color='white'
).generate(wordcloud_description_inputs)

# ---------------------------------------
# Step 3: Plot the WordClouds side-by-side
# ---------------------------------------
fig, axes = plt.subplots(1, 2, figsize=(18, 8))  # Create 1 row, 2 column layout

# Left: WordCloud for HTML text_data
axes[0].imshow(wordcloud_text, interpolation='bilinear')
axes[0].axis('off')
axes[0].set_title("Top Terms in HTML Data", fontsize=24)

# Right: WordCloud for description
axes[1].imshow(wordcloud_description, interpolation='bilinear')
axes[1].axis('off')
axes[1].set_title("Top Terms in Summary Description", fontsize=24)

plt.tight_layout()
plt.show()


### 1.3.2 Embedding and Visualizing Location Descriptions

In this exercise, we’re using a lightweight **Sentence Transformer** model (`all-MiniLM-L6-v2`) to turn each location’s `description` into an **embedding** — a numeric representation of its meaning.

These models are built using the **Sentence-BERT (SBERT)** architecture, which extends BERT to efficiently produce **sentence-level embeddings** that can be compared using cosine similarity. This allows us to capture semantic meaning — not just exact word overlap — in a compact vector format.
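
To make "compared using cosine similarity" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and the names `beach`, `coast`, and `winery` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" standing in for real sentence vectors
beach = [0.9, 0.1, 0.0]
coast = [0.8, 0.2, 0.1]
winery = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(beach, coast)
sim_far = cosine_similarity(beach, winery)
print(f"beach vs coast:  {sim_close:.3f}")
print(f"beach vs winery: {sim_far:.3f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones score near 0, regardless of vector length.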

We’ll then explore the data using two unsupervised techniques:

- **KMeans clustering** groups together descriptions that are semantically similar — think of it as sorting locations by theme.
- **t-SNE** helps us reduce the 384-dimensional embeddings down to just 3 dimensions so we can visualize them in a chart.
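
The clustering idea can be sketched without any libraries. The version below is a bare-bones k-means on made-up 2-D points with hand-picked starting centroids; both are assumptions for illustration, and scikit-learn's `KMeans` chooses its starting centroids differently and operates on the full 384-dimensional embeddings.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Made-up 2-D points forming two obvious groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
initial_centroids = [points[0], points[3]]  # fixed start for reproducibility (assumption)

labels, centroids = kmeans(points, initial_centroids)
print(labels)
```

Each label plays the same role as the `cluster` column used to color the 3D plot later in this section.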

Finally, we plot the results in a **3D interactive Plotly scatter plot**. Each point represents a location, and clusters help reveal patterns in how different descriptions relate to each other.

👉 **Tip:** You can click on cluster names in the legend to isolate them, and use your mouse or trackpad to **pan, zoom, and rotate** around the 3D space for a better view.

![](https://raw.githubusercontent.com/LinkedInLearning/applied-AI-and-machine-learning-for-data-practitioners-5932259/main/images/embedding-workflow.png)

In [None]:
# ---------------------------------------
# 3D Visualization of Location Descriptions Using Sentence Embeddings, Clustering, and t-SNE
# ---------------------------------------

# Import core libraries
from sentence_transformers import SentenceTransformer      # For embedding text into dense vector space
from sklearn.cluster import KMeans                         # For clustering the text embeddings
from sklearn.manifold import TSNE                          # For dimensionality reduction (to 3D)
import plotly.express as px                                # For interactive plotting
import plotly.io as pio                                    # For controlling plotly renderers in Colab

# Ensure that Plotly renders correctly in Google Colab
pio.renderers.default = 'colab'

# ---------------------------------------
# Step 1: Encode Descriptions into Embeddings
# ---------------------------------------
# We use a pre-trained SentenceTransformer model to convert free-text location descriptions
# into dense semantic vectors that capture meaning
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = embedding_model.encode(locations_df['description'].tolist())

# ---------------------------------------
# Step 2: Apply KMeans Clustering on the Embeddings
# ---------------------------------------
# We assign each location to one of 5 semantic clusters using unsupervised learning
kmeans = KMeans(n_clusters=5, random_state=42)
clusters = kmeans.fit_predict(embeddings)

# ---------------------------------------
# Step 3: Reduce Embedding Dimensionality to 3D with t-SNE
# ---------------------------------------
# t-SNE projects high-dimensional embeddings into 3D space for visualization,
# preserving local similarity structure
tsne = TSNE(n_components=3, random_state=42)
embedding_3d = tsne.fit_transform(embeddings)

# ---------------------------------------
# Step 4: Prepare Data for Plotting
# ---------------------------------------
# Select only the relevant columns from the original dataset
locations_plotting_df = locations_df.loc[:, ["location_name", "description"]]

# Create a shortened version of the description (first 25 characters) for concise tooltips
locations_plotting_df["short_description"] = locations_plotting_df['description'].str[:25] + '...'

# Assign cluster labels as strings for categorical coloring in the legend
locations_plotting_df['cluster'] = 'Cluster ' + clusters.astype(str)

# Add the 3D coordinates from the t-SNE projection
locations_plotting_df['x'] = embedding_3d[:, 0]
locations_plotting_df['y'] = embedding_3d[:, 1]
locations_plotting_df['z'] = embedding_3d[:, 2]

# ---------------------------------------
# Step 5: Build an Interactive 3D Scatter Plot with Plotly
# ---------------------------------------
# Each point represents a location, colored by its cluster, and can be explored in 3D
fig = px.scatter_3d(
    locations_plotting_df,
    x='x', y='y', z='z',                      # 3D coordinates
    color='cluster',                         # Use cluster for color grouping
    hover_data={                             # Tooltip configuration
        'location_name': True,
        'short_description': True,
        'x': False, 'y': False, 'z': False
    },
    title="Embedding Location Descriptions",
    labels={'x': 't-SNE X', 'y': 't-SNE Y', 'z': 't-SNE Z'},   # Axis labels for readability
    # Order clusters explicitly in the legend
    category_orders={'cluster': sorted(locations_plotting_df['cluster'].unique())}

)

# ---------------------------------------
# Step 6: Format Plot Appearance
# ---------------------------------------
# Make the chart larger and cleaner with defined width, height, and no axis margins
fig.update_layout(
    width=1000,
    height=600,
    margin=dict(l=0, r=0, b=0, t=40)
)

# Tune the size and transparency of the plot markers
fig.update_traces(marker=dict(size=8, opacity=0.7))

# ---------------------------------------
# Step 7: Render the Interactive Chart
# ---------------------------------------
fig.show()


### 1.3.3 Inspecting Embeddings and Plotting Data

To better understand what’s happening under the hood, it’s helpful to look at the actual data we’re working with:

- **Raw embeddings:** After encoding the descriptions with `SentenceTransformer`, each location is represented as a 384-dimensional vector. These embeddings reflect the semantic meaning of each description and are the foundation for clustering and visualization.

- **Plotting DataFrame:** The `locations_plotting_df` DataFrame brings everything together — original fields like `location_name`, the `short_description`, assigned `cluster`, and the 3D coordinates from t-SNE. This structure helps connect the original text with the transformed features used for plotting.

Exploring these structures can help you connect the Python code to the actual data transformations — from text, to vectors, to clusters, to 3D points.

In [None]:
# View the embedding for Yosemite at index 0
print(f"Raw embedding values for Yosemite National Park (length = {len(embeddings[0])})")
print(embeddings[0])

In [None]:
locations_plotting_df.head()

# 2. Implementing Search for Explore California

In this section, we’ll build out three different **search functionalities** for our case study: **Explore California**. These techniques mirror how a travel website might power its **search experience** — helping users find tours and destinations based on what they type in.

We'll explore and compare the following search algorithms:

1. **Keyword Search**  
   A simple approach that checks if the user’s query appears directly in the text.

2. **TF-IDF (Term Frequency–Inverse Document Frequency)**  
   A classic information retrieval technique that scores how important a term is in a document relative to the rest of the dataset.

3. **BM25 (Best Matching 25)**  
   An advanced, ranking-based algorithm that improves on TF-IDF by accounting for term frequency saturation and document length.
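
To see what "term frequency saturation" and document-length normalization mean in practice, here is a small from-scratch sketch of an Okapi-style BM25 score. It is an illustration under stated assumptions, not the `rank_bm25` implementation: the IDF uses the common "+1 inside the log" variant, `k1=1.5` and `b=0.75` are typical values, and the `docs` list is made up.

```python
import math

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(d) for d in tokenized_docs) / n_docs
    scores = []
    for doc in tokenized_docs:
        score = 0.0
        for term in query_tokens:
            tf = doc.count(term)  # raw term frequency in this document
            if tf == 0:
                continue
            df = sum(1 for d in tokenized_docs if term in d)  # document frequency
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized via b
            length_norm = 1 - b + b * len(doc) / avg_len
            # Saturation: the tf factor approaches (k1 + 1) as tf grows
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Made-up tokenized documents, all the same length
docs = [
    "wine tours in napa valley".split(),
    "wine wine wine wine tasting".split(),
    "surfing lessons on the beach".split(),
]
scores = bm25_scores(["wine"], docs)
print([round(s, 3) for s in scores])
```

The second document mentions "wine" four times, yet its score is less than twice that of the first document. That diminishing return is the saturation effect.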

Once we’ve implemented and compared these traditional search methods, we’ll move on to a more modern approach: performing another round of **embeddings**, this time encoding the full `text_data` with the same **Sentence-BERT** model (`all-MiniLM-L6-v2`) to enable **semantic search** based on meaning rather than just words.

This will set the stage for building more intelligent and flexible retrieval systems, similar to what powers modern AI-driven search experiences.


## 2.1 Python Implementation

In [None]:
# ---------------------------------------
# 🔎 Unified Search Comparison for Explore California
# ---------------------------------------
# In this cell, we implement 4 search strategies:
# 1. Keyword Search
# 2. TF-IDF Vector Search
# 3. BM25 Ranking
# 4. Semantic Search using Sentence-BERT
# Each approach is defined as a function so we can easily compare results for the same query.
# ---------------------------------------

from sklearn.feature_extraction.text import TfidfVectorizer         # For TF-IDF vector search
from rank_bm25 import BM25Okapi                                     # For BM25 ranking
from sentence_transformers import SentenceTransformer, util        # For semantic search with SBERT
import pandas as pd
import numpy as np

# ---------------------------------------
# Step 1: Prepare the Corpus for Search
# ---------------------------------------
# We'll search against our detailed HTML data in the text_data field
corpus = locations_df["text_data"].fillna('').tolist()

# Tokenize for BM25 (required format: list of lists of words)
tokenized_corpus = [doc.lower().split() for doc in corpus]

# ---------------------------------------
# Step 2: Precompute Representations for Search
# ---------------------------------------

# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)

# BM25 indexing
bm25 = BM25Okapi(tokenized_corpus)

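# Sentence-BERT embeddings for semantic similarity (same model as above, applied to text_data).
# Note: this overwrites the earlier `embeddings` variable built from the descriptions.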
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = embedding_model.encode(corpus, convert_to_tensor=True)


# ---------------------------------------
# Step 3: Define Reusable Search Functions
# ---------------------------------------

def search_keyword(query, top_k=5):
    """Return rows that contain the query term (case-insensitive substring match)."""
    results = [i for i, doc in enumerate(corpus) if query.lower() in doc.lower()]
    return locations_df.iloc[results][:top_k]

def search_tfidf(query, top_k=5):
    """Rank documents by TF-IDF cosine similarity with the query."""
    q_vec = tfidf_vectorizer.transform([query])
    # TfidfVectorizer L2-normalizes rows by default, so this dot product equals cosine similarity
    scores = (tfidf_matrix @ q_vec.T).toarray().ravel()
    top_indices = scores.argsort()[::-1][:top_k]
    return locations_df.iloc[top_indices]

def search_bm25(query, top_k=5):
    """Rank documents by BM25 relevance score."""
    tokenized_query = query.lower().split()
    scores = bm25.get_scores(tokenized_query)
    top_indices = np.argsort(scores)[::-1][:top_k]
    return locations_df.iloc[top_indices]

def search_semantic(query, top_k=5):
    """Rank documents by semantic similarity using Sentence-BERT embeddings."""
    query_emb = embedding_model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, embeddings)[0].cpu().numpy()
    top_indices = np.argsort(scores)[::-1][:top_k]
    return locations_df.iloc[top_indices]

# ---------------------------------------
# Step 4: Define a Reusable Comparison Function for Search Outputs
# ---------------------------------------
# This function lets us easily compare the results of all four search methods
# — Keyword Match, TF-IDF, BM25, and Semantic SBERT —
# for any given query. It's useful for validating how different search techniques
# interpret and rank the same input phrase.
# ---------------------------------------

def compare_search_methods(query, top_k=5):
    """
    Run and display results from all four search methods for a given query.
    
    Parameters:
    - query (str): The search string to evaluate
    - top_k (int): Number of top results to return for each method (default = 5)
    """
    
    # Keyword Search (exact substring match)
    print(f"\n🔍 Keyword Search: '{query}'")
    display(search_keyword(query, top_k=top_k))

    # TF-IDF Vector Search (weighted match based on term rarity)
    print(f"\n🧠 TF-IDF Search: '{query}'")
    display(search_tfidf(query, top_k=top_k))

    # BM25 Ranking (term frequency-aware ranking algorithm)
    print(f"\n📚 BM25 Search: '{query}'")
    display(search_bm25(query, top_k=top_k))

    # Semantic Search using Sentence-BERT embeddings
    print(f"\n🤖 Semantic Search (SBERT): '{query}'")
    display(search_semantic(query, top_k=top_k))


## 2.2 Comparing Search Methods

Now that we’ve implemented all four search strategies — **Keyword**, **TF-IDF**, **BM25**, and **Semantic Search (SBERT)** — we’ll use our `compare_search_methods` function to evaluate how each method handles different types of queries.

We’ll start with a few simple, direct queries like:

> *“wine tours”*

Then move on to more natural, conversational questions such as:

> *“Where can I find a good place to workout?”*

This comparison helps reveal the strengths and limitations of each approach:

- **Keyword Search** is very strict: it only returns results that contain the exact query string as a substring, so it often misses reordered, related, or reworded content.
- **TF-IDF** and **BM25** offer more flexibility by considering word importance and frequency, though they still rely on exact token overlap.
- **Semantic Search (SBERT)** shines on open-ended or vague queries — it can understand meaning even when the wording doesn’t match exactly, making it ideal for natural language search.
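
The difference between substring matching and token overlap can be seen on a single made-up sentence. The helper names `keyword_match` and `token_overlap` are hypothetical and only approximate what the keyword and TF-IDF/BM25 functions above rely on.

```python
def keyword_match(query, doc):
    """Substring match: the whole query string must appear in the document."""
    return query.lower() in doc.lower()

def token_overlap(query, doc):
    """Count how many distinct query tokens also appear as document tokens."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Made-up document text
doc = "Guided tours of family-run wineries with wine tasting"

for query in ["wine tasting", "wine tours", "vineyard visits"]:
    print(f"{query!r}: keyword={keyword_match(query, doc)}, overlap={token_overlap(query, doc)}")
```

"wine tours" fails the substring test but still shares two tokens with the document, which is why token-based ranking can find it. "vineyard visits" shares no tokens at all, which is the gap semantic search is meant to close.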

---

👉 **Try it yourself:**  
Run the `compare_search_methods()` function with a few of your own queries to see how the results change.  
You can also adjust the `top_k` parameter to control how many top matches you want to return (e.g., 3, 5, 10).

This is a great way to explore how traditional vs. AI-powered search behaves in a realistic scenario!

In [None]:
compare_search_methods(
  query="wine",
  top_k=5
)

In [None]:
compare_search_methods(
  query="wine tours",
  top_k=2
)

In [None]:
compare_search_methods(
  query="nice beaches near SoCal",  # SoCal is short for Southern California
  top_k=3
)

In [None]:
compare_search_methods(
  query="Where can I find a good place to workout?",
  top_k=5
)

# 3. Retrieval-Augmented Generation LLM Workflow

In this next section, we’ll use a **local LLM (TinyLlama)** to answer natural language questions about our Explore California dataset.

We’ll try two approaches:

1. **Direct Q&A (No Context):**  
   First, we’ll ask the model a few questions without providing any background or external knowledge. This helps us see how well a small, locally-run model performs “out of the box.”

2. **Contextual Q&A using RAG:**  
   Next, we’ll implement a simple **retrieval-augmented generation (RAG)** pipeline using the **semantic embeddings** we created earlier. We’ll:
   - Use **`all-MiniLM-L6-v2`**, a Sentence-BERT model, to embed each location’s description into a semantic vector
   - Build a **FAISS** index from those embeddings for fast similarity search
   - Retrieve the most relevant descriptions for a given query
   - Insert those retrieved descriptions into the LLM’s prompt as context
   - Ask the same question again — and compare how much better the answers become
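
The retrieval and prompt-building steps above can be sketched with plain Python before bringing in FAISS or an LLM. Everything here is a stand-in: the `descriptions`, the hand-written 2-D vectors, and the helper names `retrieve` and `build_prompt` are assumptions for illustration, and the exhaustive squared-L2 search only mimics what a flat L2 index does.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors closest to the query, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: squared_l2(query_vec, doc_vecs[i]))
    return ranked[:k]

def build_prompt(query, context_docs):
    """Place the retrieved text ahead of the question."""
    context = "\n".join(context_docs)
    return f"### Context:\n{context}\n\n### Question:\n{query}\n\n### Response:"

# Made-up descriptions and hand-written 2-D vectors standing in for real embeddings
descriptions = [
    "Napa Valley: vineyards and wine tasting",
    "Big Sur: coastal cliffs and hiking trails",
    "Sonoma: family-run wineries",
]
doc_vecs = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2)]
query_vec = (1.0, 0.0)  # pretend embedding of the query "wine tours"

top = retrieve(query_vec, doc_vecs, k=2)
prompt = build_prompt("wine tours", [descriptions[i] for i in top])
print(prompt)
```

Only the two nearest descriptions reach the prompt, so the model sees wine-related context and never the unrelated hiking entry.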


---

**How we’re running TinyLlama locally via Google Colab:**  
We’ll be using the **GGUF version of TinyLlama** provided by [TheBloke](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) on Hugging Face, which is optimized for local inference.  
This model runs with the **`ctransformers`** library and supports fast, quantized loading with minimal memory — perfect for running on Google Colab.

You’ll specify the quantized model file (e.g., `Q4_K_M`) and load it directly via `AutoModelForCausalLM.from_pretrained()`.

---

🔍 **Why this matters:**  
Local models like TinyLlama are fast and run offline, but they come with limitations — especially around **context size** (the number of tokens they can read at once). This can limit how much background knowledge we can include in a single request.

As we move forward, we’ll also explore **cloud-based LLMs** via API calls (like OpenAI, Mistral, or Claude), which support much larger context windows and often produce more accurate and detailed responses — especially for complex or multi-turn questions.

This exercise will help you understand the tradeoffs between small local models and larger cloud-hosted ones — and how retrieval can help bridge that gap.


## 3.1 Python Implementation

In [None]:
# ---------------------------------------
# 🧠 Retrieval-Augmented Generation (RAG) using TinyLlama + MiniLM + FAISS
# ---------------------------------------
# This script compares two approaches to answering questions using a local LLM:
# 1. Direct prompt to TinyLlama (no context)
# 2. Retrieval-Augmented Generation (RAG) using semantic search + FAISS + TinyLlama
#
# Key Components:
# - SentenceTransformer (MiniLM) for embeddings
# - FAISS for fast similarity search
# - TinyLlama (GGUF, via ctransformers) for local inference
# - Token counting to check prompt size before sending to LLM
# ---------------------------------------

# ---------------------------------------
# 1. Import dependencies
# ---------------------------------------
import numpy as np
import pandas as pd
import faiss
import re
import textwrap
from sentence_transformers import SentenceTransformer
from ctransformers import AutoModelForCausalLM
from huggingface_hub import hf_hub_download

# ---------------------------------------
# 2. Prepare your dataset
# ---------------------------------------
# Assumes locations_df has a 'description' column with clean, descriptive text
corpus = locations_df["description"].fillna("").tolist()

# ---------------------------------------
# 3. Generate sentence embeddings using MiniLM
# ---------------------------------------
sbert_model = SentenceTransformer("all-MiniLM-L6-v2")                  # Lightweight and fast
sbert_embeddings = sbert_model.encode(corpus, convert_to_numpy=True)  # Convert to NumPy for FAISS

# ---------------------------------------
# 4. Build a FAISS index for fast similarity search
# ---------------------------------------
dimension = sbert_embeddings.shape[1]  # 384 for MiniLM
index = faiss.IndexFlatL2(dimension)
index.add(sbert_embeddings)

# ---------------------------------------
# 5. Load the TinyLlama local model
# ---------------------------------------
llama_path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
)

tiny_llama_llm = AutoModelForCausalLM.from_pretrained(
    llama_path,
    model_type="llama",         # Required for TinyLlama GGUF format
    gpu_layers=0,               # Set >0 if Colab supports GPU acceleration
    max_new_tokens=512          # Increased output length for richer answers
)

# ---------------------------------------
# 6. Utility: Estimate prompt token count
# ---------------------------------------
def count_tokens(text):
    """
    Estimate token count using basic word+punctuation splitting.
    Approximate — useful for checking against 512-token context window.
    """
    return len(re.findall(r"\w+|[^\w\s]", text, re.UNICODE))

# ---------------------------------------
# 7. Define QA response functions
# ---------------------------------------

def generate_direct_response(query):
    """
    Answer question using TinyLlama with no background knowledge.
    Helps evaluate zero-shot performance on local models.
    """
    prompt = f"""Answer the question as a helpful travel agent based only on your knowledge about California.

### Question:
{query}

### Response:"""
    return tiny_llama_llm(prompt)

def generate_rag_response(query, k=3, verbose=True):
    """
    Answer question using RAG: retrieve top-k similar descriptions using FAISS
    and inject them as context for TinyLlama to use in its response.
    Also prints estimated token count for visibility.
    """
    # Semantic search using query embedding
    query_embedding = sbert_model.encode(query, convert_to_numpy=True)
    _, top_indices = index.search(query_embedding.reshape(1, -1), k)

    # Build context: include both location name and description
    context_rows = locations_df.iloc[top_indices[0]]
    context = "\n".join(
        f"{row['location_name']}: {row['description']}" for _, row in context_rows.iterrows()
    )

    # Construct a formatted prompt with context and question
    prompt = f"""Answer the question as a helpful travel agent using only the provided context.

### Context:
{context}

### Question:
{query}

### Response:"""

    # Estimate tokens in the prompt
    prompt_tokens = count_tokens(prompt)
    if verbose:
        print(f"📏 Estimated tokens in prompt: {prompt_tokens}")
    
    # Stop if token limit exceeded
    if prompt_tokens > 512:
        raise ValueError("🚫 Prompt exceeds TinyLlama's 512-token context limit. Try reducing `k` or trimming the context.")

    # Generate response using TinyLlama and return it as a dict along with the prompt
    return {
        "prompt": prompt,
        "response": tiny_llama_llm(prompt)
    }

# ---------------------------------------
# 8. Compare Direct vs RAG-Enhanced Responses
# ---------------------------------------

# This is a convenience function to print long strings into multiple lines
def wrap_print(text):
    print(textwrap.fill(text, width=80))

def compare_llm_responses(query, k=3):
    """
    Compare TinyLlama's response to a query with and without RAG-enhanced context.

    Parameters:
    - query (str): The natural language question to ask the model
    - k (int): Number of top matching documents to retrieve for RAG
    """
    
    print("=" * 100)
    print(f"\n🧠 Query: {query}")
    
    # 🔹 Step 1: Run Direct Q&A with no background context
    print("\n🤖 Direct LLM Response (No Context Provided):")
    direct_response = generate_direct_response(query)
    wrap_print(direct_response)
    
    # 🔹 Step 2: Run RAG-enhanced Q&A
    print("\n📡 Running Retrieval-Augmented Generation (RAG)...")
    rag_response = generate_rag_response(query, k=k)

    # 🔹 Step 3: Show enhanced prompt that will be fed into the model
    print(f"\n📚 Enhanced Prompt With Retrieved Context (Top {k} Documents):")
    print(rag_response["prompt"])

    # 🔹 Step 4: Show the final model response with RAG
    print("\n💬 RAG-Enhanced Response (With Context):")
    wrap_print(rag_response["response"])
    print("\n" + "=" * 100)


## 3.2 Comparing Simple vs Conversational Queries

Now that we’ve set up both direct querying and our RAG (retrieval-augmented generation) pipeline, it’s time to test how well they perform across different types of natural language queries.

We’ll start with **simple, keyword-style queries** — like `"scenic hikes"` or `"wine tours"` — that resemble what a user might type into a traditional search box.

Then we’ll move on to more **conversational, open-ended questions**, such as:

- `"Where can I go kayaking or canoeing in California?"`
- `"What are some good places for stargazing?"`
- `"I'm traveling with kids — any family-friendly adventures?"`

By comparing the answers returned by:

- A **direct local LLM (TinyLlama)** without any context, and  
- The same model enhanced with **retrieved context from FAISS**,  

we’ll get a better sense of **how retrieval helps smaller models** understand and respond more accurately — especially as the queries become more natural and less keyword-focused.

---

⚠️ **Important Notes:**

- **Responses may vary slightly each time you run them.**
- **Token limits matter.** If the estimated prompt size (context + question) exceeds the 512-token limit, `generate_rag_response` raises a `ValueError` instead of calling the model.
- **💡 Recommendation:** Keep `k` at or below **5** to avoid exceeding the token limit.  
  If you include too many documents in the prompt, the model may not be able to read the full context effectively.
- **Try it yourself:**  
  - Experiment by entering your own questions  
  - Adjust the `k` parameter to include more or fewer context items  
  - Re-run the same query to see how the results might vary across runs
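
One way to stay under a token limit is to drop lower-ranked documents until the context fits. The sketch below is a hypothetical helper (`fit_to_budget` is not part of the notebook) that uses the same rough word-and-punctuation estimate as `count_tokens`; a real subword tokenizer will usually report more tokens than this estimate, so treat the budget as approximate.

```python
import re

def estimate_tokens(text):
    """Rough token estimate: count words and punctuation marks separately."""
    return len(re.findall(r"\w+|[^\w\s]", text))

def fit_to_budget(docs, budget):
    """Keep documents in ranked order until adding another would exceed the budget."""
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        kept.append(doc)
        used += cost
    return kept, used

# Made-up retrieved snippets, most relevant first
ranked_docs = [
    "Napa Valley: vineyards, tasting rooms, and hot-air balloons.",
    "Sonoma: family-run wineries and farm-to-table dining.",
    "Paso Robles: warm-climate reds and rolling hills.",
]

kept, used = fit_to_budget(ranked_docs, budget=30)
print(f"Kept {len(kept)} of {len(ranked_docs)} documents using ~{used} tokens")
```

Because the documents arrive in relevance order, trimming from the end sacrifices the least useful context first.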


In [None]:
compare_llm_responses("scenic hikes", k=3)

In [None]:
compare_llm_responses("wine tours", k=3)

In [None]:
compare_llm_responses("Where can I go kayaking or canoeing in California?", k=3)

In [None]:
compare_llm_responses("What are some good places for stargazing?", k=3)

In [None]:
compare_llm_responses("I'm traveling with kids — any family-friendly adventures?", k=3)