# 🎯 RAG for Libraries: Live Demo + Playground

**Two modes in one notebook:**

## 🎬 PART 1: LIVE DEMO (Top)
**As mentioned in the presentation**, this is the live demo: the four steps of RAG in action.
Streamlined for presentations: 5 cells, about 3-4 minutes.

## 🔬 PART 2: TINKERER'S PLAYGROUND (Bottom)
Deep dive with explanations and extra visualizations that mirror Act II of the presentation

**No API keys required for demo** | **Free to run** | **Run cells with ▶️**

---

*For LYRASIS presentation | [GitHub](https://github.com/radio-shaq/Lyrasis-slides-11-2025)*

---

# 🎬 PART 1: LIVE DEMO

**For presentations:** Run these 5 cells in sequence.

**What you'll see:**
- 📊 Beautiful embedding visualization
- 🆚 Side-by-side RAG vs no-RAG comparison
- ✨ The "wow" moment

---

## Demo Step 0: Preliminary Setup (30 sec)
**This is an interactive way to use Python** - runs in Google's cloud, safe to use

**What this does:** Installs libraries and imports

In [None]:
!pip install -q sentence-transformers chromadb pandas numpy scikit-learn matplotlib seaborn

import pandas as pd
from io import StringIO
from sentence_transformers import SentenceTransformer
import chromadb
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

print("✅ Ready!")

## Demo Step 1: INGEST - Upload Your Documents
**From the presentation: Step 1 of RAG** - Load your library's FAQs, policies, guides

**You can paste your own information here** - CSV format (plain text spreadsheet)

In [None]:
csv_data = """question,answer,category
"What are the library hours?","The library is open Monday-Friday 8:00 AM to 10:00 PM, Saturday 10:00 AM to 6:00 PM, and Sunday 12:00 PM to 8:00 PM. Hours may vary during holidays.",hours
"When does the library close?","Regular closing times are 10:00 PM Monday-Friday, 6:00 PM Saturday, and 8:00 PM Sunday. During finals week we extend to midnight.",hours
"Are you open on weekends?","Yes! We're open Saturdays 10:00 AM - 6:00 PM and Sundays 12:00 PM - 8:00 PM.",hours
"What are your holiday hours?","The library follows the university calendar. We're typically closed on major holidays and have reduced hours during breaks.",hours
"Do you have 24-hour study spaces?","We have a designated 24-hour study lounge on the second floor accessible with your student ID.",hours
"How do I reserve a study room?","Study rooms can be reserved through our online booking system at libcal.yourlibrary.edu. Rooms are available in 2-hour blocks.",facilities
"Can I book a group study room?","Yes! Group study rooms (4-8 people) can be booked online up to 7 days in advance.",facilities
"Do you have private study spaces?","We have individual study carrels on the third floor, first-come first-served.",facilities
"Is there a quiet study area?","Yes, the third floor is silent study. Second floor allows quiet conversation. First floor is collaborative.",facilities
"Can I eat in the library?","Light snacks and drinks with secure lids permitted on all floors except Archives. No hot food or open containers.",facilities
"How do I access databases from home?","Access all library databases off-campus by logging in with your university credentials when prompted.",database
"Why can't I access JSTOR from home?","Make sure you're using the library's link and entering full university credentials. Clear browser cache if needed.",database
"Do I need a VPN to use library resources?","No VPN needed! Our databases use proxy authentication through the library website.",database
"How do I find peer-reviewed articles?","Use databases like JSTOR or ProQuest. Most have a filter for peer-reviewed sources.",database
"What's the difference between a database and Google Scholar?","Library databases provide subscription access with better filtering and guaranteed full-text access.",database
"How many books can I check out?","Undergrads: 20 books. Graduate students: 50 books. Faculty limits vary.",policies
"How long can I keep a book?","3-week loan period with one renewal if no one else requested it.",policies
"Can I renew my books?","Yes! Renew online through your library account or call circulation desk. Once unless on hold.",policies
"What happens if I return a book late?","$0.25/day per item, max $10. Items 30+ days overdue are considered lost.",policies
"Can I renew if it's late?","Yes if no one requested it. You still owe fines for overdue days.",policies
"Do you have textbooks?","Limited textbooks on Course Reserve for in-library or overnight checkout.",collections
"How do I request a book from another library?","Use Interlibrary Loan! Log in to your ILL account. Most items arrive in 5-10 business days.",services
"Can I get articles from other universities?","Yes, through ILL. PDF usually within 2-3 business days.",services
"Is there a scanner I can use?","Yes! Flatbed and large-format scanners on all floors. Scan-to-email on copiers. All free.",technology
"Do you have laptops I can borrow?","Yes, 4-hour checkout at circulation desk with student ID. Windows and Mac available.",technology
"""

data = pd.read_csv(StringIO(csv_data))
print(f"✅ Loaded {len(data)} FAQs")
print(f"📊 Categories: {', '.join(data['category'].unique())}")

## Demo Step 2: STORE - Create Embeddings & Visualize 📊
**Remember the GPS coordinates analogy?** Here's how it actually works - creating meaning-based coordinates

**Accessibility:** Color + shape markers for universal design

In [None]:
# Create embeddings
embedder = SentenceTransformer('all-MiniLM-L6-v2')
texts = [f"Q: {row['question']} A: {row['answer']}" for _, row in data.iterrows()]
embeddings = embedder.encode(texts, show_progress_bar=True)

# Reduce to 2D for visualization
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings)

# Accessibility: Use different markers AND colors (universal design)
markers = ['o', 's', '^', 'D', 'v', '*', 'P', 'X']  # circle, square, triangle, diamond, etc.

# Beautiful plot with accessibility
plt.figure(figsize=(14, 9))
categories = data['category'].unique()
colors = plt.cm.Set3(range(len(categories)))

for i, cat in enumerate(categories):
    mask = data['category'] == cat
    marker = markers[i % len(markers)]  # Cycle through markers
    plt.scatter(
        embeddings_2d[mask, 0], embeddings_2d[mask, 1],
        c=[colors[i]], label=cat.title(),
        marker=marker,  # Different shape for each category
        s=200, alpha=0.7, edgecolors='black', linewidth=2
    )

plt.title('🧠 Similar FAQs Cluster Together', fontsize=18, fontweight='bold', pad=20)
plt.xlabel('Dimension 1', fontsize=14)
plt.ylabel('Dimension 2', fontsize=14)
plt.legend(title='Category', fontsize=12, title_fontsize=13, loc='best')
plt.grid(alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()

print("\nüí° Each point = 1 FAQ. Similar topics cluster together!")
print("‚ôø Accessibility: Different shapes + colors (not color alone)")

## Demo Step 3: Build the Semantic Card Catalog
**This is the 'semantic card catalog' concept from the slides** - storing FAQs by meaning, not keywords

**What this does:** Creates vector database with ChromaDB

In [None]:
# Create vector DB (or get existing one)
client = chromadb.EphemeralClient()
collection = client.get_or_create_collection(name="library_faqs")

# Clear existing data if re-running
try:
    collection.delete(ids=[f"faq_{i}" for i in range(100)])  # Delete any old data (covers up to 100 FAQs)
except Exception:
    pass  # Nothing to delete, that's fine

# Add FAQs to our semantic card catalog
collection.add(
    embeddings=[emb.tolist() for emb in embeddings],
    documents=data['answer'].tolist(),
    metadatas=[{"question": row['question'], "category": row['category']} 
               for _, row in data.iterrows()],
    ids=[f"faq_{i}" for i in range(len(data))]
)

print(f"✅ Semantic card catalog ready with {collection.count()} FAQs")
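
To make the "semantic card catalog" less of a black box, here is a minimal sketch of what a vector store does conceptually: keep vectors alongside documents and return the closest ones to a query vector. `TinyVectorStore`, the two-number vectors, and the sample documents are all made up for illustration; this is a brute-force toy, not how ChromaDB is implemented.

```python
import math

class TinyVectorStore:
    """Brute-force nearest-neighbour store (illustration only, not ChromaDB)."""

    def __init__(self):
        self._items = []  # list of (item_id, vector, document)

    def add(self, item_id, vector, document):
        self._items.append((item_id, vector, document))

    def query(self, vector, n_results=1):
        """Return the closest items as (distance, item_id, document), nearest first."""
        scored = [
            (math.dist(vector, stored), item_id, document)
            for item_id, stored, document in self._items
        ]
        return sorted(scored)[:n_results]

# Toy 2-number "embeddings"; real ones from all-MiniLM-L6-v2 have 384 numbers.
store = TinyVectorStore()
store.add("faq_hours", [0.9, 0.1], "Open 8 AM to 10 PM on weekdays.")
store.add("faq_food", [0.1, 0.9], "Light snacks with lidded drinks are fine.")

nearest = store.query([0.8, 0.2], n_results=1)
print(nearest[0][1], "->", nearest[0][2])
```

A real vector database adds indexing so it does not have to compare the query against every stored vector, but the idea of "closest vector wins" is the same.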

## Demo Step 4: RETRIEVE + GENERATE 🆚
**Steps 3 & 4 from presentation:** Reference interview (find relevant docs) + AI answers with citations

**See the difference:** Grounded and verifiable vs. generic guessing

In [None]:
def rag_answer(question):
    # RETRIEVE: Like doing a reference interview + browsing the right shelf
    query_emb = embedder.encode([question])[0]
    results = collection.query(query_embeddings=[query_emb.tolist()], n_results=1)
    if results['documents'][0]:
        answer = results['documents'][0][0]
        source = results['metadatas'][0][0]['question']
        # GENERATE: Return answer with citation - grounded and verifiable!
        return f"{answer}\n\n📚 Source: '{source}'"
    return "No info in knowledge base."

def no_rag_answer(question):
    # Canned stand-in for an ungrounded LLM reply: no model is called and the question is ignored.
    return "The library is typically open during regular business hours. Exact times may vary. Check with your library for accurate information."

# DEMO QUESTION
q = "Can I bring food into the library?"

print("=" * 80)
print(f"❓ QUESTION: {q}")
print("=" * 80)

print("\n🔴 WITHOUT RAG (15-30% hallucination rate):\n")
print(no_rag_answer(q))
print("\n⚠️  Vague. Generic. Might be wrong. NO SOURCE.\n")

print("=" * 80)

print("\n🟢 WITH RAG (2-5% hallucination rate - 6.5x more reliable):\n")
print(rag_answer(q))
print("\n✅ Specific. Accurate. Cited. GROUNDED AND VERIFIABLE.\n")

print("=" * 80)

---

# 🔬 PART 2: TINKERER'S PLAYGROUND

**Explore deeper!**

This section mirrors **Act II of the presentation** (How RAG Works) with:
- 🧠 Clear explanations using library analogies
- 📊 Visualizations to understand your data
- 🎮 Interactive query testing
- 🤖 Real LLM integration (OpenAI & Google Gemini)
- 📚 Resources & next steps

**Why this matters:** Part 1 showed you *what* RAG does. Part 2 shows you *how* it works and *how* to customize it.

**Remember from the presentation:** Your information literacy skills apply here - evaluating sources, understanding queries, citing accurately.

---

## 🧠 What is RAG? (Clear Explanation)

### The Problem: LLMs Don't Know Your Library

When you ask ChatGPT "What are your library hours?" it will:
- ❌ Make up generic hours
- ❌ Give outdated information (trained on data from 2021-2023)
- ❌ Provide no sources
- ❌ Hallucinate 15-30% of the time (TruthfulQA, 2022; Watanabe et al., 2025)

**Why?** As mentioned in the presentation: LLMs are trained on the internet, but have no access to YOUR library's current policies.

### The Solution: RAG (Retrieval-Augmented Generation)

**From the presentation:** Think of RAG as giving AI an "open book test" instead of asking it to memorize everything.

**WITHOUT RAG (Closed-book exam):**
```
Question → LLM Memory → Guessed Answer ❌
```

**WITH RAG (Open-book exam):**
```
Question → Search Your FAQs → Find Top Matches →
Give Them to LLM → Answer Based on YOUR Docs ✅
```

### The Four Steps of RAG (from Slide 8):

1. **INGEST** → Upload your documents (policies, FAQs, catalog data)
2. **STORE** → Create semantic card catalog (vector embeddings)
3. **RETRIEVE** → Find relevant sources (like reference interview + shelf browsing)
4. **GENERATE** → AI answers using ONLY retrieved context + cites sources

### Why RAG is Better - The Statistics:

As shown in the presentation (Slide 3):
- ✅ **Grounded in YOUR docs** - Uses your actual policies
- ✅ **Shows sources** - Cites which FAQ it used
- ✅ **Says "I don't know"** - When it can't find an answer
- ✅ **2-5% hallucination rate** - Instead of 15-30% (6.5x more reliable!)
- ✅ **Always current** - Update your FAQs, update the answers

**ACRL framework:** "RAG enhances generative AI by drawing on external sources... allowing outputs to be more grounded and verifiable." (ACRL AI Competencies, 2025)

**Big idea:** RAG = Research assistant with a library card to YOUR collection.

## 🔢 Understanding Embeddings (GPS Coordinates for Meaning)

**From the presentation:** Embeddings = GPS coordinates for concepts

### Why We Need Embeddings:

A computer comparing raw text cannot tell that "What are your hours?" and "When are you open?" mean the same thing.

But if we convert them to numbers (embeddings), the computer can see they're similar!

This enables **semantic search** - searching by meaning, not just keywords. Your cataloging and metadata skills apply here!

### How It Works:

- **"What are your hours?"** → [0.42, -0.13, 0.87, ...] (384 numbers)
- **"When are you open?"** → [0.39, -0.15, 0.85, ...] (similar numbers!)
- **"How do I renew a book?"** → [0.91, 0.22, -0.34, ...] (different numbers)

**The magic:** Different words, same meaning = similar embedding vectors (nearby in vector space)

**The analogy from Slide 11:** Like GPS coordinates on Earth - nearby coordinates = nearby locations. Nearby vectors = nearby meanings.
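
As a worked example of "nearby vectors = nearby meanings", the sketch below computes cosine similarity on the three-number snippets shown in the list above. Those snippets are truncated illustrations rather than real 384-number embeddings, so the scores only demonstrate the arithmetic; the variable names are made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First three numbers of each illustrative vector from the list above.
hours_vec = [0.42, -0.13, 0.87]   # "What are your hours?"
open_vec = [0.39, -0.15, 0.85]    # "When are you open?"
renew_vec = [0.91, 0.22, -0.34]   # "How do I renew a book?"

sim_same = cosine_similarity(hours_vec, open_vec)
sim_diff = cosine_similarity(hours_vec, renew_vec)
print(f"hours vs. open:  {sim_same:.3f}")
print(f"hours vs. renew: {sim_diff:.3f}")
```

The first score lands close to 1 (almost the same direction) and the second close to 0 (nearly unrelated), which is the pattern semantic search relies on.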

### This Creates Your Semantic Card Catalog:

Just like a traditional card catalog organized books by subject, author, and title - embeddings organize your FAQs by **meaning**.

- Traditional catalog: Books physically near each other on shelf by call number
- Semantic catalog: FAQs "near" each other in vector space by meaning

**Your expertise matters:** Like you evaluate subject headings and controlled vocabularies, you can evaluate whether similar questions are truly semantically related.

In [None]:
# See similarity in action - this is the math behind semantic search
from numpy.linalg import norm

def similarity(a, b):
    """Calculate cosine similarity between two vectors (-1 to 1, higher = more similar)"""
    return np.dot(a, b) / (norm(a) * norm(b))

# Create embeddings for three different questions
q1 = embedder.encode(["What time do you close?"])[0]
q2 = embedder.encode(["When does the library shut?"])[0]
q3 = embedder.encode(["How do I borrow a laptop?"])[0]

print("üîç Embedding Similarity Demo:\n")
print("This shows how the 'GPS coordinates for meaning' actually work:\n")
print(f"Question 1: 'What time do you close?'")
print(f"Question 2: 'When does the library shut?'")
print(f"Similarity: {similarity(q1, q2):.3f} ‚Üí HIGH (same topic!)\n")

print(f"Question 1: 'What time do you close?'")
print(f"Question 3: 'How do I borrow a laptop?'")
print(f"Similarity: {similarity(q1, q3):.3f} ‚Üí LOW (different topics)\n")

print("üí° Why this matters:")
print("   This is how your semantic card catalog finds relevant FAQs!")
print("   Higher score = closer in vector space = better match = better answer")
print("\n   Like a reference interview: understanding synonyms and rephrasing")

## 📊 Visualization: FAQs by Category

**What this shows:** How many questions we have in each category.

**Your source evaluation skills apply here:** 
- Which topics have good coverage vs. gaps?
- Does this represent diverse user needs?
- Are there underrepresented categories that need attention?

**This is collection development for your RAG knowledge base.**

In [None]:
import seaborn as sns

# Count FAQs per category
category_counts = data['category'].value_counts().sort_values(ascending=True)

# Create bar chart
plt.figure(figsize=(10, 6))
colors = plt.cm.Set3(range(len(category_counts)))
category_counts.plot(kind='barh', color=colors, edgecolor='black', linewidth=1.5)

plt.title('📊 Number of FAQs by Category', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Number of FAQs', fontsize=12)
plt.ylabel('Category', fontsize=12)
plt.grid(axis='x', alpha=0.3, linestyle='--')

# Add count labels on bars
for i, v in enumerate(category_counts):
    plt.text(v + 0.1, i, str(v), va='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

print("\nüí° Insights:")
most_common = category_counts.idxmax()
least_common = category_counts.idxmin()
print(f"   ‚Ä¢ Most FAQs: {most_common.title()} ({category_counts[most_common]} questions)")
print(f"   ‚Ä¢ Fewest FAQs: {least_common.title()} ({category_counts[least_common]} questions)")
print(f"   ‚Ä¢ Total coverage: {len(category_counts)} different categories")
print("\n   Consider adding more FAQs to underrepresented categories!")

## 🔍 Test Retrieval Quality

**What this does:** Shows how well our semantic card catalog matches questions to FAQs.

**From the presentation:** This is the "reference interview + shelf browsing" step (Step 3 - RETRIEVE).

**Your expertise matters here:** Good retrieval = good RAG answers. If it retrieves the wrong FAQ, the LLM will give a wrong answer - just like directing a patron to the wrong section of the library.

**Look for:** Does the matched FAQ make sense for the question asked? This is evaluating the quality of your retrieval system.

In [None]:
test_qs = [
    "What time do you close on Sunday?",
    "Can I access JSTOR from my apartment?",
    "How long can I keep a book?",
    "Where's the quiet zone?",
    "Do you have group study rooms?"
]

print("üîç Retrieval Quality Test:\n")
print("Testing if our semantic card catalog finds the right FAQ for each question...\n")

for q in test_qs:
    emb = embedder.encode([q])[0]
    res = collection.query(query_embeddings=[emb.tolist()], n_results=1)
    
    if res['documents'][0]:
        matched = res['metadatas'][0][0]['question']
        print(f"❓ Asked: \"{q}\"")
        print(f"✅ Matched: \"{matched}\"\n")

print("💡 Good retrieval = relevant matches (like finding the right shelf)")
print("   If matches look wrong, you might need:")
print("   • Better FAQ coverage (collection development)")
print("   • Different embeddings (better 'coordinates')")
print("   • More context in questions/answers (richer metadata)")
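
Eyeballing matches works for five questions; for more, one option is to write down the FAQ you expect for each test question and count how often the top match agrees. The sketch below assumes a hypothetical `retrieve` callable that returns the matched FAQ question; `stub_retrieve` is a canned stand-in with one deliberate miss, so the example runs without the embedding model.

```python
def hit_rate_at_1(retrieve, labelled_pairs):
    """Fraction of test questions whose top match equals the expected FAQ question."""
    if not labelled_pairs:
        return 0.0
    hits = sum(1 for asked, expected in labelled_pairs if retrieve(asked) == expected)
    return hits / len(labelled_pairs)

def stub_retrieve(asked):
    """Hypothetical retriever with canned results (the last entry is a deliberate miss)."""
    canned = {
        "Where's the quiet zone?": "Is there a quiet study area?",
        "Do you have group study rooms?": "Can I book a group study room?",
        "How long can I keep a book?": "Can I renew my books?",
    }
    return canned.get(asked)

# Each pair is (test question, FAQ question a librarian would expect to match).
labelled = [
    ("Where's the quiet zone?", "Is there a quiet study area?"),
    ("Do you have group study rooms?", "Can I book a group study room?"),
    ("How long can I keep a book?", "How long can I keep a book?"),
]

score = hit_rate_at_1(stub_retrieve, labelled)
print(f"Hit rate @1: {score:.2f}")
```

In the notebook, `retrieve` could wrap the `collection.query(...)` call from the cell above and return `res['metadatas'][0][0]['question']`.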

## 🎮 Interactive: Test Your Own Questions

**Try it yourself!** Change the question below and see how RAG performs.

**As mentioned in the presentation:** This is your chance to audit the code and see what works (and what doesn't).

**Experiment ideas:**
- Try questions similar to existing FAQs
- Try questions NOT in the FAQs (what happens when there's no source?)
- Try different phrasings of the same question (test semantic understanding)
- Try vague vs specific questions (does it still find the right answer?)
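
One thing the out-of-scope experiment tends to reveal: `rag_answer` returns the nearest FAQ whenever the collection is non-empty, however far away that FAQ is. A common remedy is a distance cutoff. The sketch below is a minimal version under stated assumptions: `answer_or_refuse` is a hypothetical helper, the `(distance, answer, source)` tuples are made-up values, and `max_distance=0.8` is an arbitrary starting point that would need tuning, since the scale depends on the distance metric your collection uses.

```python
NO_ANSWER = "No info in knowledge base."

def answer_or_refuse(hits, max_distance=0.8):
    """Return the nearest answer with its source, or refuse if it is too far away.

    hits: list of (distance, answer, source) tuples, nearest first.
    max_distance: assumed cutoff; tune it against your own test questions.
    """
    if not hits:
        return NO_ANSWER
    distance, answer, source = hits[0]
    if distance > max_distance:
        return NO_ANSWER
    return f"{answer}\n\nSource: '{source}'"

# Made-up distances: a close match and a far-off one.
close_hit = [(0.31, "Light snacks and drinks with secure lids permitted.", "Can I eat in the library?")]
far_hit = [(1.42, "3-week loan period with one renewal.", "How long can I keep a book?")]

print(answer_or_refuse(close_hit))
print(answer_or_refuse(far_hit))
```

Set the cutoff too low and good questions are refused; too high and off-topic questions get confident-sounding answers, so it is worth checking against a list of real patron questions.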

In [None]:
# 👇 CHANGE THIS QUESTION!
your_q = "Can I eat pizza in the library?"

print("=" * 80)
print(f"❓ YOUR QUESTION: {your_q}")
print("=" * 80)

print("\n🔴 WITHOUT RAG (Generic LLM - 15-30% hallucination rate):\n")
print(no_rag_answer(your_q))
print("\n⚠️  Notice: Vague, unhelpful, could be wrong, NO CITATION\n")

print("=" * 80)

print("\n🟢 WITH RAG (Grounded in YOUR FAQs - 2-5% hallucination rate):\n")
print(rag_answer(your_q))
print("\n✅ Notice: Specific, accurate, cited source - GROUNDED AND VERIFIABLE\n")

print("=" * 80)

print("\n💡 Try changing 'your_q' above to test different questions!")
print("   Remember: RAG can only answer from sources in the knowledge base")

## 🧪 Experiment: Top 3 Results

**What this shows:** RAG doesn't just find 1 match - it can retrieve multiple relevant FAQs.

**Why this matters (from the presentation):** 
- More context = better answers (like pulling multiple books from the shelf)
- Handles questions that span multiple FAQs
- Shows how confident the system is (big gap between #1 and #2 = very confident)

**Your reference skills:** Just like you might pull 3-4 relevant books for a patron, RAG can use multiple sources.

**Try changing the question** to see different results!

In [None]:
# 👇 CHANGE THIS!
exp_q = "How do I use databases off campus?"

emb = embedder.encode([exp_q])[0]
res = collection.query(query_embeddings=[emb.tolist()], n_results=3)

print(f"🔍 Query: \"{exp_q}\"\n")
print("Top 3 Most Relevant FAQs from Semantic Card Catalog:\n")

for i, (doc, meta) in enumerate(zip(res['documents'][0], res['metadatas'][0]), 1):
    print(f"[{i}] Question: {meta['question']}")
    print(f"    Answer: {doc[:100]}...\n")

print("üí° In production RAG, you'd give all 3 to the LLM for richer context!")
print("   Like handing a patron 3 relevant books instead of just 1")
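
The "gap between #1 and #2" idea can be put into numbers. The sketch below assumes you have a list of distances for the top results; ChromaDB query results normally carry a `distances` list next to `documents` and `metadatas`, but treat that as an assumption to verify in your installed version. `top_match_margin` and the sample distances are hypothetical.

```python
def top_match_margin(distances):
    """Gap between the best and second-best distance; a larger gap means a clearer winner."""
    if len(distances) < 2:
        return None  # Nothing to compare against.
    ordered = sorted(distances)
    return ordered[1] - ordered[0]

# Made-up distances for two queries.
clear_winner = top_match_margin([0.21, 0.95, 1.10])
near_tie = top_match_margin([0.48, 0.52, 0.61])

print(f"Clear winner margin: {clear_winner:.2f}")
print(f"Near tie margin:     {near_tie:.2f}")
```

A near tie is a hint that the question spans several FAQs, which is exactly the case where passing more than one source to the LLM tends to help.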

## 📁 Upload Your Own FAQs (Optional)

Want to try with your library's actual data?

**Required format:** CSV with `question`, `answer`, `category` columns

**Tip:** See `CSV_GENERATION_PROMPT.md` in the GitHub repo for a ChatGPT prompt that generates perfectly formatted CSVs from your existing docs!

**This is where your expertise matters most:** The quality of your FAQs (source evaluation, clear answers, good coverage) directly impacts RAG quality.

In [None]:
# Uncomment these lines to upload your own CSV (works in Google Colab)
# from google.colab import files
# uploaded = files.upload()
# custom_data = pd.read_csv(list(uploaded.keys())[0])
# data = custom_data  # Replace the sample FAQs; the other cells read from `data`
# print(f"✅ Loaded {len(data)} custom FAQs")
#
# Then re-run Demo Step 2 and Demo Step 3 to rebuild the embeddings and the vector database with your data!

---

# 🤖 REAL LLM INTEGRATION (OPTIONAL)

**⚠️ FOR LIVE DEMO:** The presenter will skip this section (API keys are private!)

**FOR YOU TO TRY LATER:** You can add your own API key and test the full RAG pipeline.

---

**So far:** We've built the RAG *retrieval* part (finding relevant FAQs from your semantic card catalog).

**Now:** Let's add the *generation* part (actual AI-powered answers)!

**From the presentation (Slide 8):** This is Step 4 - GENERATE answers using ONLY retrieved docs + cite sources.

## Two Options:

1. **OpenAI API** (GPT-4o-mini) - Fast, high quality, about $0.15 per 1M input tokens (output tokens are billed separately)
2. **Google Gemini API** (Gemini 1.5 Flash) - **FREE tier, no credit card needed!**

**Pick based on your needs:**
- **Free tier/testing?** → Gemini (generous free quota)
- **Production/quality?** → Either! Both are excellent
- **No credit card?** → Gemini (truly free tier)

**Remember from the presentation:** Both produce "grounded and verifiable" outputs when given good sources.

**🔒 Security:** All API key cells use `getpass` so your key stays hidden (shows dots, not actual characters).

---

## Option 1: OpenAI API (GPT-4o-mini) - OPTIONAL

**Why OpenAI?**
- ⚡ Very fast (<2 seconds)
- 🎯 High quality answers
- 💰 Cheap ($0.15 per 1M input tokens ≈ $0.15 per 2,000 questions at about 500 input tokens each)
- 🔧 Easy to use
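
The cost bullet is simple arithmetic, and a small helper makes it easy to redo for your own traffic. This is a rough sketch: `estimate_input_cost` is a hypothetical helper, the 500 tokens per question is the figure implied by "1M tokens ≈ 2,000 questions", and output tokens (billed separately) are left out. Check the provider's current price list before budgeting.

```python
def estimate_input_cost(num_questions, tokens_per_question=500, usd_per_million_tokens=0.15):
    """Estimated input-token cost in USD (output tokens are not included)."""
    total_tokens = num_questions * tokens_per_question
    return total_tokens / 1_000_000 * usd_per_million_tokens

# 2,000 questions at ~500 input tokens each is 1M tokens.
print(f"2,000 questions:  ${estimate_input_cost(2_000):.2f}")
# A busier service desk: 50,000 questions is 25M tokens.
print(f"50,000 questions: ${estimate_input_cost(50_000):.2f}")
```

Retrieving more sources per question (a larger `n_results`) raises the tokens per question, so the per-question cost scales with how much context you send.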

**Setup:**
1. Create an API key: https://platform.openai.com/api-keys
2. Run the cell below - it will prompt for your key (input hidden)
3. Or press Enter to skip this section

**Note:** You'll need to add a payment method, but usage is extremely cheap for testing.

**🔒 For live demos:** Your key input is hidden (shows dots, not characters) so you can safely paste from another monitor.

In [None]:
# Install OpenAI SDK
!pip install -q openai

from openai import OpenAI
from getpass import getpass

print("üîë OpenAI API Key Setup (OPTIONAL)")
print("=" * 60)
print("üìñ Get your API key: https://platform.openai.com/api-keys")
print("üîí Your input will be hidden (shows dots, not characters)")
print("‚è≠Ô∏è  Or press Enter to skip this section")
print("=" * 60)

OPENAI_API_KEY = getpass("\nAPI Key: ").strip()

if OPENAI_API_KEY:
    try:
        openai_client = OpenAI(api_key=OPENAI_API_KEY)
        # Test the key with a simple request
        openai_client.models.list()
        print("\n✅ OpenAI configured! You can now run the OpenAI RAG cells below.")
    except Exception as e:
        print(f"\n❌ Error configuring OpenAI: {e}")
        print("   Please check your API key and try again.")
        openai_client = None
else:
    openai_client = None
    print("\n⏭️  Skipped - You can try this later with your own key")
    print("   The demo (Part 1) works perfectly without any API keys!")

In [None]:
def rag_answer_openai(question, n_results=2, model="gpt-4o-mini"):
    """RAG with OpenAI GPT-4o-mini - Full pipeline from presentation."""
    # Check if OpenAI is configured
    if 'openai_client' not in globals() or openai_client is None:
        return "‚è≠Ô∏è OpenAI not configured. Run the setup cell above and add your API key (or press Enter to skip)."

    # Step 1: RETRIEVE - Find relevant FAQs from semantic card catalog
    query_emb = embedder.encode([question])[0]
    results = collection.query(query_embeddings=[query_emb.tolist()], n_results=n_results)

    if not results['documents'][0]:
        return "No information found in knowledge base."

    # Step 2: Build context from retrieved docs with sources
    context_parts = []
    for i, (doc, meta) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
        context_parts.append(f"Source {i+1} (FAQ: '{meta['question']}'):\n{doc}")
    context = "\n\n".join(context_parts)

    # Step 3: GENERATE - Call OpenAI to create answer based ONLY on context
    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful library assistant. Answer questions using ONLY the provided sources. Always cite which source you used. Be concise and accurate. If the sources don't contain the answer, say so."},
            {"role": "user", "content": f"Question: {question}\n\nSources:\n{context}\n\nAnswer the question using ONLY the sources above. Cite which source(s) you used."}
        ],
        temperature=0.3,  # Lower = more conservative/factual
        max_tokens=200
    )

    return response.choices[0].message.content

print("✅ OpenAI RAG function ready!")
print("\n💡 This implements the four-step RAG pipeline from the presentation:")
print("   1. INGEST: Already done (loaded FAQs)")
print("   2. STORE: Already done (semantic card catalog)")
print("   3. RETRIEVE: Search for relevant sources")
print("   4. GENERATE: Create grounded, verifiable answer with citations")

### Test OpenAI RAG:

In [None]:
# 👇 CHANGE THIS!
test_q = "Can I renew a book that's already late?"

print("=" * 80)
print(f"❓ QUESTION: {test_q}")
print("=" * 80)
print("\n🤖 OPENAI GPT-4o-mini RAG ANSWER:\n")

answer = rag_answer_openai(test_q)
print(answer)

if not answer.startswith("‚è≠Ô∏è"):
    print("\n" + "=" * 80)
    print("\n⚡ Fast! Natural! Cited! GROUNDED AND VERIFIABLE!")
    print("   This is production-ready RAG - 2-5% hallucination rate vs 15-30%")
else:
    print("\nüí° To enable OpenAI: Run the setup cell above with your API key")

---

## Option 2: Google Gemini API (Gemini 1.5 Flash) - OPTIONAL

**Why Gemini?**
- 🆓 Generous free tier (15 requests/minute and 1,500 requests/day at the time of writing; check Google's current limits)
- ⚡ Very fast (<2 seconds)
- 🎯 High quality (comparable to GPT-4o-mini)
- 💳 **No credit card required for free tier!**
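
If you loop over many questions on a free tier, a per-minute request limit is easy to hit. One simple approach is to space calls evenly. The sketch below is a minimal throttle under that assumption; `Throttle` is a hypothetical helper, not part of any Google SDK, and the `clock` and `sleep` parameters exist only so the logic can be tested without real waiting.

```python
import time

class Throttle:
    """Space calls at least 60 / per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

# 15 requests per minute works out to one request every 4 seconds.
throttle = Throttle(per_minute=15)
print(f"Minimum gap between calls: {throttle.min_interval:.1f} s")
```

Calling `throttle.wait()` immediately before each API request keeps a batch of test questions under the per-minute limit; it does nothing about daily quotas.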

**Setup:**
1. Get free API key: https://aistudio.google.com/app/apikey
2. Run the cell below - it will prompt for your key (input hidden)
3. Or press Enter to skip this section

**Perfect for:** Testing, prototyping, or low-volume production that stays within the free-tier rate limits

**🔒 For live demos:** Your key input is hidden (shows dots, not characters) so you can safely paste from another monitor.

In [None]:
# Install Google Generative AI SDK
!pip install -q google-generativeai

import google.generativeai as genai
from getpass import getpass

print("üîë Google Gemini API Key Setup (OPTIONAL)")
print("=" * 60)
print("üìñ Get your FREE API key: https://aistudio.google.com/app/apikey")
print("üí≥ No credit card required!")
print("üîí Your input will be hidden (shows dots, not characters)")
print("‚è≠Ô∏è  Or press Enter to skip this section")
print("=" * 60)

GEMINI_API_KEY = getpass("\nAPI Key: ").strip()

if GEMINI_API_KEY:
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        gemini_model = genai.GenerativeModel('gemini-1.5-flash')
        # Test the key with a simple request
        gemini_model.generate_content("test")
        print("\n✅ Gemini configured! You can now run the Gemini RAG cells below.")
    except Exception as e:
        print(f"\n❌ Error configuring Gemini: {e}")
        print("   Please check your API key and try again.")
        gemini_model = None
else:
    gemini_model = None
    print("\n⏭️  Skipped - You can try this later with your own key")
    print("   The demo (Part 1) works perfectly without any API keys!")

In [None]:
def rag_answer_gemini(question, n_results=2):
    """RAG with Google Gemini 1.5 Flash - Full pipeline from presentation."""
    # Check if Gemini is configured
    if 'gemini_model' not in globals() or gemini_model is None:
        return "‚è≠Ô∏è Gemini not configured. Run the setup cell above and add your API key (or press Enter to skip)."

    # Step 1: RETRIEVE - Find relevant FAQs from semantic card catalog
    query_emb = embedder.encode([question])[0]
    results = collection.query(query_embeddings=[query_emb.tolist()], n_results=n_results)

    if not results['documents'][0]:
        return "No information found in knowledge base."

    # Step 2: Build context from retrieved docs with sources
    context_parts = []
    for i, (doc, meta) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
        context_parts.append(f"Source {i+1} (FAQ: '{meta['question']}'):\n{doc}")
    context = "\n\n".join(context_parts)

    # Step 3: GENERATE - Build prompt for Gemini
    prompt = f"""You are a helpful library assistant. Answer the question using ONLY the provided sources. Always cite which source you used. Be concise and accurate. If the sources don't contain the answer, say so.

Question: {question}

Sources:
{context}

Answer the question using ONLY the sources above. Cite which source(s) you used."""

    # Step 4: Call Gemini to generate grounded answer
    response = gemini_model.generate_content(
        prompt,
        generation_config=genai.types.GenerationConfig(
            temperature=0.3,  # Lower = more conservative/factual
            max_output_tokens=200,
        )
    )

    return response.text

print("✅ Gemini RAG function ready!")
print("\n💡 This implements the four-step RAG pipeline from the presentation:")
print("   1. INGEST: Already done (loaded FAQs)")
print("   2. STORE: Already done (semantic card catalog)")
print("   3. RETRIEVE: Search for relevant sources")
print("   4. GENERATE: Create grounded, verifiable answer with citations")
print("\n🆓 Free tier: 15 requests/min, 1,500 requests/day at the time of writing (limits may change)")

### Test Gemini RAG:

In [None]:
# 👇 CHANGE THIS!
test_q = "How do I access databases from my apartment?"

print("=" * 80)
print(f"❓ QUESTION: {test_q}")
print("=" * 80)
print("\n🤖 GOOGLE GEMINI RAG ANSWER:\n")

answer = rag_answer_gemini(test_q)
print(answer)

if not answer.startswith("‚è≠Ô∏è"):
    print("\n" + "=" * 80)
    print("\n⚡ Fast! Natural! Cited! GROUNDED AND VERIFIABLE!")
    print("   Free tier - perfect for testing and prototyping")
else:
    print("\nüí° To enable Gemini: Run the setup cell above with your API key")

---

## 🆚 Compare Both LLMs

Try the same question with both and see how they compare!

**What to look for:**
- Which one cites sources better?
- Which one is more concise?
- Which one sounds more natural?
- Any hallucinations (info not in the sources)?

In [None]:
# 👇 CHANGE THIS!
compare_q = "What are the late fees for overdue books?"

print("=" * 80)
print(f"❓ QUESTION: {compare_q}")
print("=" * 80)

# Try OpenAI (if configured)
print("\n🟢 OPENAI GPT-4o-mini:\n")
openai_answer = rag_answer_openai(compare_q)
print(openai_answer)

print("\n" + "=" * 80)

# Try Gemini (if configured)
print("\nüîµ GOOGLE GEMINI 1.5 Flash:\n")
gemini_answer = rag_answer_gemini(compare_q)
print(gemini_answer)

print("\n" + "=" * 80)

# Show observations if at least one is configured
if not openai_answer.startswith("‚è≠Ô∏è") or not gemini_answer.startswith("‚è≠Ô∏è"):
    print("\nüí° Observations:")
    print("   ‚Ä¢ Both are fast (<2 seconds)")
    print("   ‚Ä¢ Both cite sources accurately")
    print("   ‚Ä¢ OpenAI: Requires payment method")
    print("   ‚Ä¢ Gemini: Truly free tier, no credit card needed")
    print("   ‚Ä¢ Quality: Both excellent for library FAQs!")
else:
    print("\nüí° To enable comparison: Run setup cells for OpenAI and/or Gemini above")

---

## 🎓 What You Learned

Congratulations! You just built a **working end-to-end RAG prototype**!

### Key Concepts You Mastered:

1. ✅ **Embeddings** - GPS coordinates for meaning (Slide 11)
2. ✅ **Semantic Card Catalog** - Vector database for meaning-based search (Slide 10)
3. ✅ **Reference Interview + Shelf Browsing** - Retrieval (Step 3)
4. ✅ **Grounded and Verifiable Outputs** - Generation with citations (Step 4)
5. ✅ **Full RAG Pipeline** - All four steps working together
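
To make the "GPS coordinates for meaning" idea concrete, here is a minimal sketch of meaning-based retrieval using cosine similarity. The three-dimensional vectors, the `toy_index` labels, and the `top_matches` helper are invented for illustration. The notebook itself relies on a real embedding model and a vector database for this step.

```python
import math

# Toy 3-dimensional "embeddings" (made-up numbers). A real embedding model
# produces vectors with hundreds of dimensions.
toy_index = {
    "library hours": [0.9, 0.1, 0.0],
    "late fees": [0.1, 0.9, 0.1],
    "printing costs": [0.2, 0.7, 0.3],
}


def cosine_similarity(a, b):
    """Similarity of direction between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vector, index, n_results=2):
    """Return the labels of the n_results entries closest to the query."""
    scored = [(cosine_similarity(query_vector, vec), label) for label, vec in index.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:n_results]]


# Pretend embedding of "How much do I owe for an overdue book?"
query = [0.0, 1.0, 0.2]
print(top_matches(query, toy_index, n_results=2))
```

The query shares no words with "late fees", yet that entry ranks first because its vector points in nearly the same direction.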

### The Big Idea (from the presentation):

> **RAG = Open-Book Exam for AI**
>
> Instead of hallucinating from memory, the AI looks up the answer in YOUR actual documents!

> **"RAG enhances generative AI by drawing on external sources... allowing outputs to be more grounded and verifiable."**  
> Source: ACRL AI Competencies, 2025

### What Makes Good RAG - Your Expertise Matters:

- 📚 **Quality FAQs** - Your source evaluation skills
- 🎯 **Good Retrieval** - Your reference interview skills
- 🤖 **Smart LLM** - Natural, cited responses
- ✅ **Citations** - Your information literacy teaching

### The Statistics (from Slide 3):

| Metric | Standard LLM | With RAG | Improvement |
|--------|-------------|----------|-------------|
| **Hallucination Rate** | 15-30% | 2-5% | **Roughly 6.5x fewer hallucinations** |
| **Citations** | None | Always | **Verifiable** |
| **Outdated Info** | Yes (2021-2023 cutoff) | No (your current docs) | **As current as your documents** |
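
The table does not say how the improvement figure was derived. One plausible reading, assumed here, is a comparison of the midpoints of the two hallucination-rate ranges. The sketch below works through that arithmetic and also shows how wide the ratio could be given the ranges.

```python
# Ranges copied from the table above (percent of answers with hallucinations).
standard_low, standard_high = 15.0, 30.0
rag_low, rag_high = 2.0, 5.0

# Assumption: compare range midpoints to get a single headline ratio.
standard_mid = (standard_low + standard_high) / 2  # 22.5
rag_mid = (rag_low + rag_high) / 2                 # 3.5
midpoint_ratio = standard_mid / rag_mid

# The ranges also bound the ratio: least favorable and most favorable pairing.
smallest_ratio = standard_low / rag_high   # 15 / 5
largest_ratio = standard_high / rag_low    # 30 / 2

print(f"Midpoint ratio: {midpoint_ratio:.1f}x")
print(f"Possible range: {smallest_ratio:.0f}x to {largest_ratio:.0f}x")
```

Under this assumption the midpoint ratio comes out near 6.4x, with a possible spread from 3x to 15x, so the headline number is best treated as an order-of-magnitude guide.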

### Trade-offs You Explored:

| Feature | OpenAI | Gemini |
|---------|--------|--------|
| **Cost** | $0.15 per 1M input tokens | Free tier |
| **Speed** | <2 sec | <2 sec |
| **Quality** | Excellent | Excellent |
| **Setup** | Needs payment method | No credit card |
| **Limits** | Pay as you go | 15 requests/min (free tier) |
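
For budgeting, the figures in the table can be turned into a back-of-the-envelope estimate. The traffic numbers below (`TOKENS_PER_QUESTION`, `QUESTIONS_PER_MONTH`) are assumptions for illustration only, and the sketch prices every token at the single rate from the table, which simplifies real pricing where input and output tokens are billed differently.

```python
# Figures from the table above.
OPENAI_PRICE_PER_MILLION_TOKENS = 0.15  # USD
GEMINI_FREE_REQUESTS_PER_MINUTE = 15

# Assumptions for illustration only; measure your own traffic and prompt sizes.
TOKENS_PER_QUESTION = 1_000   # question plus retrieved FAQ context
QUESTIONS_PER_MONTH = 5_000


def monthly_token_cost(questions, tokens_per_question, price_per_million):
    """Estimate monthly spend in USD at a flat per-token price."""
    total_tokens = questions * tokens_per_question
    return total_tokens / 1_000_000 * price_per_million


cost = monthly_token_cost(QUESTIONS_PER_MONTH, TOKENS_PER_QUESTION, OPENAI_PRICE_PER_MILLION_TOKENS)
print(f"Estimated OpenAI cost: ${cost:.2f} per month")

# Minimum spacing between requests needed to stay under the free-tier rate limit.
seconds_between_requests = 60 / GEMINI_FREE_REQUESTS_PER_MINUTE
print(f"Gemini free tier: at most one request every {seconds_between_requests:.0f} seconds")
```

Swap in your own monthly question count to see whether the free tier's rate limit or the paid tier's cost is the tighter constraint for your service desk.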

**For libraries (from Slide 18):**
- 🧪 **Prototype** → Gemini (free, no approvals needed)
- 🚀 **Production** → Either! Both work great
- 🔒 **Privacy concerns** → Local models (see resources)

---

## 🚀 Next Steps

**From Slide 24 - Getting Started This Week:**

### Immediate Next Steps (Easy - 2-3 hours):

1. **Experiment with YOUR library's FAQs**
   - Export from LibGuides/website
   - Use CSV_GENERATION_PROMPT.md (in GitHub repo) with ChatGPT
   - Upload and test!
   - **Apply your source evaluation skills:** Which FAQs work well? Which don't?

2. **Try public RAG tools**
   - NotebookLM (Google) - upload your FAQ PDF
   - Perplexity AI - see web + RAG
   - ChatGPT - upload docs, ask questions
   - **Evaluate critically:** Are citations accurate? Verifiable? Complete?

3. **Experiment with this notebook**
   - Try different questions
   - Adjust n_results (a higher value passes more retrieved context to the LLM)
   - Change temperature (0.1 = conservative, 0.7 = creative)
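
To see why a low temperature reads as "conservative" and a higher one as "creative", the sketch below applies the common softmax-with-temperature formulation to made-up scores for three candidate next words. The scores and words are invented, and how a specific provider applies the setting internally is not something this notebook shows.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
candidate_words = ["fees", "fines", "penalties"]
raw_scores = [2.0, 1.5, 0.5]

for temperature in (0.1, 0.7):
    probabilities = softmax_with_temperature(raw_scores, temperature)
    summary = ", ".join(f"{w}: {p:.2f}" for w, p in zip(candidate_words, probabilities))
    print(f"temperature={temperature}: {summary}")
```

At 0.1 almost all of the probability lands on the top-scoring word, so answers repeat closely from run to run. At 0.7 the alternatives keep a meaningful share, which produces more varied wording.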

### Intermediate (Medium):

4. **Add more document types**
   - Policy PDFs
   - Hours/locations
   - Staff directory
   - **Collection development for RAG:** What should be included?

5. **Build a simple UI**
   - Streamlit (easiest)
   - Gradio (good for demos)
   - Flask (more control)

6. **Improve retrieval**
   - Hybrid search (keywords + semantic)
   - Reranking top results
   - Better chunking strategies
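
As one illustration of hybrid search, the sketch below merges a keyword ranking and a semantic ranking with reciprocal rank fusion, a widely used way to combine result lists without having to calibrate their scores against each other. The document IDs and both result lists are made up, and `k=60` is simply a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists for the question "late fees for overdue books".
keyword_results = ["faq_fines", "faq_hours", "faq_renewals"]
semantic_results = ["faq_renewals", "faq_fines", "faq_lost_items"]

print(reciprocal_rank_fusion([keyword_results, semantic_results]))
```

Documents that rank well in both lists rise to the top, while a document found by only one method still makes the merged list further down.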

### Advanced (Hard):

7. **Add conversation memory**
   - Multi-turn dialogue
   - Context from previous questions
   - User session tracking

8. **Deploy to production**
   - HuggingFace Spaces (free!)
   - Embed in LibGuides
   - Slack/Teams bot

9. **Advanced features**
   - User feedback loop
   - Analytics dashboard
   - A/B testing
   - Multilingual support
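
Conversation memory (item 7) can start very small. The sketch below keeps the last few turns of one session and folds them into the prompt alongside the retrieved sources. The `ConversationMemory` class, the prompt wording, and the sample turns are hypothetical and not part of this notebook.

```python
from collections import deque


class ConversationMemory:
    """Keep the most recent question/answer turns for one user session."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add_turn(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, new_question, retrieved_context):
        """Combine sources, recent history, and the new question into one prompt."""
        history = "\n".join(f"Patron: {q}\nAssistant: {a}" for q, a in self.turns)
        return (
            "Answer using only the library sources below.\n\n"
            f"Sources:\n{retrieved_context}\n\n"
            f"Conversation so far:\n{history or '(none)'}\n\n"
            f"Patron: {new_question}\nAssistant:"
        )


# Sample turns are illustrative only.
memory = ConversationMemory(max_turns=2)
memory.add_turn("What are the library hours?", "Monday-Friday 8:00 AM to 10:00 PM.")
memory.add_turn("Are you open on weekends?", "Yes, Saturdays and Sundays.")
memory.add_turn("Can I renew online?", "Yes, through your account.")

prompt = memory.build_prompt(
    "What about during finals week?",
    "During finals week we extend to midnight.",
)
print(prompt)
```

Capping the number of stored turns keeps prompts short and limits how much of a patron's session is retained, which matters for the privacy concerns raised earlier.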

### Remember from the Presentation (Slide 25):

**Start Small, Stay Critical:**
- ✅ Pilots, not production (sandbox first)
- ✅ Ethics review before deployment
- ✅ Failure is okay - learning is the goal
- ✅ Apply ACRL Section 3.2 framework: evaluate data quality, bias, output sufficiency

**You Already Have RAG Expertise:**
- Source evaluation = document quality
- Cataloging = structuring knowledge  
- Reference interviews = understanding queries
- This is your professional wheelhouse!

---

## 📚 Resources

### This Presentation:
- **GitHub Repo:** [github.com/radio-shaq/Lyrasis-slides-11-2025](https://github.com/radio-shaq/Lyrasis-slides-11-2025)
- **CSV Generation Guide:** See `CSV_GENERATION_PROMPT.md` in repo
- **Slides + Speaker Notes:** PowerPoint and markdown versions in repo
- **ACRL Framework Resources:** Links and guidance

### Key Frameworks (from Presentation):

**ACRL AI Competencies for Academic Library Workers (October 2025):**
- Section 3.2: "Evaluate benefits and risks in deployment of AI technologies"
- **Core principle:** "RAG enhances generative AI... allowing outputs to be more grounded and verifiable"
- **Your role:** Apply information literacy to algorithmic systems
- Link in GitHub repo

### Research Citations (from Slide 3):

- **Lewis et al. (2020):** Original RAG paper - "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Meta AI/UCL/NYU)
- **TruthfulQA (Lin et al., 2022):** Baseline LLM hallucination rates (15-30%)
- **Watanabe et al. (2025):** RAG hallucination rates with reliable sources (2-5%)

### API Documentation:
- **OpenAI:** [platform.openai.com/docs](https://platform.openai.com/docs)
- **Google Gemini:** [ai.google.dev/docs](https://ai.google.dev/docs)
- **ChromaDB:** [docs.trychroma.com](https://docs.trychroma.com)
- **Sentence Transformers:** [sbert.net](https://www.sbert.net)

### Learning More:
- **Simon Willison's Blog:** Practical RAG experiments ([simonwillison.net](https://simonwillison.net))
- **LangChain Docs:** Framework for LLM apps ([python.langchain.com](https://python.langchain.com))
- **Pinecone Learning Center:** Vector DB tutorials ([pinecone.io/learn](https://www.pinecone.io/learn/))

### Tools to Explore:
- **LangChain:** RAG framework (makes this easier!)
- **LlamaIndex:** Document-focused RAG
- **Streamlit:** Easy web apps for demos
- **Gradio:** Interactive ML demos

### Library Examples (from Slide 9):
- **Columbia University:** RAG-enhanced CLIO search (2024)
- **Virginia Beach Public Library:** PAGE chatbot (110,000+ monthly engagements)
- **NotebookLM:** Google's RAG tool for research

### Questions?
- **Email:** davidmeincke@protonmail.com
- **GitHub Issues:** Open an issue in the repo!

---

## 🎉 Thank You!

You now have:
- ✅ A working RAG demo
- ✅ Understanding of embeddings & semantic card catalogs
- ✅ Integration with real LLMs (OpenAI & Gemini)
- ✅ A foundation to build production systems
- ✅ The knowledge that YOUR librarian skills apply to evaluating these systems

### Remember from the Presentation:

**"Your expertise maps to RAG"** (Slide 5):
- Source evaluation → Document quality
- Cataloging → Structuring knowledge
- Reference interviews → Query understanding
- Information literacy → Critical AI evaluation

**"RAG needs librarian expertise"** (Slide 26):
- Evaluating sources
- Understanding user needs
- Teaching critical thinking
- Bringing judgment to information systems

**Go forth and build amazing library AI tools!** 🚀📚

Apply your professional judgment. Stay critical. Start small.

---

*LYRASIS Presentation, November 2025*  
*Built with ❤️ for libraries*

*"I love how AI tools can help us build small shareable code projects much faster. But the librarian expertise - evaluating sources, understanding user needs, teaching critical thinking - that's timeless. RAG needs that. Students need that. You have that."*