<div style="border: 2px solid #ccc; border-radius: 12px; padding: 20px; max-width: 550px; margin: auto; background-color: #1e1e1e; color: #f0f0f0; font-family: Arial, sans-serif; line-height: 1.6;">

  <div style="text-align: center; margin-bottom: 20px;">
    <img src="../assets/images/SlideHunter_LogoV2.png" 
         alt="SlideHunter Mockup Logo"
         style="width: 60%; max-width: 60%; height: auto; border-radius: 8px; box-shadow: 0 0 10px rgba(0,0,0,0.4);">
  </div>

  <blockquote style="margin: 0; padding: 10px 20px; border-left: 4px solid #4faaff;">
    <p><strong>
      SlideHunter App Logo😁
    </strong></p>
    <p>
      SlideHunter — AI-Powered Lecture Navigator:
      <a href="../assets/images/SlideHunter_LogoV2.png" target="_blank" style="color: #4faaff;">
        Find exactly where a concept lives in course slides and notes. Get fast answers with pinpoint slide/page citations, powered by hybrid retrieval (FAISS + BM25 + reranker) and concise GPT-4o-mini summarization, with google/flan-t5-base as the fallback model.
      </a>
    </p>
  </blockquote>

</div>

# Query Demo (Answer from Context + Citations)
- Ask a question such as 'Where did we define precision vs. recall?' and get back (deck, page) citations.

## Load store & model (auto-CUDA on Deck)

In [23]:
import os

# Ask Hugging Face Transformers to skip its TensorFlow and Flax backends so TensorFlow (TF) is never imported.
os.environ["TRANSFORMERS_NO_TF"] = "1"
os.environ["TRANSFORMERS_NO_FLAX"] = "1"

# Quiet TF logs if something still pulls it in.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # 0=show all, 1=hide INFO, 2=hide INFO+WARNING, 3=hide INFO+WARNING+ERROR


# Imports

In [28]:
# Standard library and FAISS
import os
import json
import faiss
from pathlib import Path

# Imports for minimal demo using saved store
from scripts.nb01_helper import load_store, make_router, search

# Recreate the SAME embedding model used when the index was built
from sentence_transformers import SentenceTransformer
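
The helpers imported above include `make_router`, whose internals are not shown in this notebook. As a rough illustration of how embedding-based scope routing could work, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `pick_scope` function, the `margin` threshold, and the toy 3-dimensional vectors are hypothetical and are not taken from `scripts.nb01_helper`. Only the scope labels (`technical`, `career`, `all`) come from the notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_scope(query_vec, route_vecs, margin=0.05):
    """Hypothetical router: return the closest route label,
    or "all" when the top two routes are within `margin` of each other."""
    ranked = sorted(
        ((cosine(query_vec, vec), name) for name, vec in route_vecs.items()),
        reverse=True,
    )
    best_score, best_name = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][0] < margin:
        return "all"
    return best_name

# Toy vectors standing in for real sentence embeddings (made-up values).
routes = {
    "technical": [1.0, 0.1, 0.0],
    "career": [0.0, 0.2, 1.0],
}

print(pick_scope([0.9, 0.2, 0.1], routes))  # clearly closer to "technical"
print(pick_scope([0.5, 0.2, 0.5], routes))  # ambiguous, falls back to "all"
```

In a real setup the route vectors would presumably be embeddings of short domain descriptions produced by the same SentenceTransformer model that encodes the query, so that similarities are comparable.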

In [25]:
# Set the repo root once, via the SLIDEHUNT env var or the hardcoded fallback below
# To be used once the search() import from nb01_setup_and_ingest is resolved
BASE = Path(os.getenv("SLIDEHUNT", r"C:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt")).resolve()

# Paths under the repo
INDEX_PATH = BASE / "data" / "faiss" / "canvas.index"
FACTS_PATH = BASE / "data" / "faiss" / "facts.json"

# Show paths
print("CWD:", Path.cwd())
print("Reading:", INDEX_PATH)
print("Reading:", FACTS_PATH)

# Load FAISS index 
index = faiss.read_index(str(INDEX_PATH))

# Load facts/metas from JSON
with open(FACTS_PATH, "r", encoding="utf-8") as file:
    data = json.load(file)
facts, metas = data["facts"], data["metas"]

print("Loaded:", len(facts), "facts")


CWD: c:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\notebooks
Reading: C:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\data\faiss\canvas.index
Reading: C:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\data\faiss\facts.json
Loaded: 324 facts
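
The cell above reads `facts.json` as a dict with two parallel lists, `facts` and `metas`, which must line up one-to-one with the vectors in the FAISS index. The following is a minimal sketch of that layout and an alignment check, written in plain Python so it runs without FAISS. The `validate_store` helper and the two-entry payload are hypothetical and use made-up placeholder values. The meta keys mirror the ones this notebook reads when printing citations, and `n_vectors` stands in for `index.ntotal`.

```python
import json

def validate_store(payload, n_vectors):
    """Return (facts, metas) if both lists match the number of index vectors."""
    facts, metas = payload["facts"], payload["metas"]
    if not (n_vectors == len(facts) == len(metas)):
        raise ValueError(
            f"Misaligned store: {n_vectors} vectors, "
            f"{len(facts)} facts, {len(metas)} metas"
        )
    return facts, metas

# Made-up payload, round-tripped through JSON to mimic reading facts.json.
payload = json.loads(json.dumps({
    "facts": ["placeholder snippet A", "placeholder snippet B"],
    "metas": [
        {"course_name": "Course A", "module_name": "Module 1",
         "item_title": "Item 1", "type": "Page", "url": None},
        {"course_name": "Course A", "module_name": "Module 2",
         "item_title": "Item 2", "type": "Page", "url": None},
    ],
}))

# n_vectors would be index.ntotal in the notebook.
facts, metas = validate_store(payload, n_vectors=2)
print(len(facts), "facts aligned with", len(metas), "metas")
```

Raising an error with all three counts, rather than a bare assertion, makes it easier to tell whether the index or the JSON file is the stale artifact.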


In [26]:
# Load saved artifacts (FAISS index, fact snippets, metadata)
# These were previously built & saved with save_store()
index, facts, metas = load_store()   # reads from data/faiss/{canvas.index,facts.json}

# Important: queries must be embedded with the same model that produced the vectors stored in FAISS
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Precompute route embeddings for auto domain detection ("technical", "career", etc.)
router = make_router(model)

# Sanity check: index size must align with number of facts/metas
assert index.ntotal == len(facts) == len(metas), "Index/facts/metas are misaligned!"

# Test queries to validate the pipeline
tests = [
    "Where did we define precision vs. recall?",
    "tips for a resume and cover letter?",
    "What lecture slides did we learn about control flow?",
]

# Run each query through the search function
for q in tests:
    scope, hits = search(
        query=q,               # natural language question
        model=model,           # sentence-transformers embedder
        index=index,           # FAISS vector index
        facts=facts,           # list of fact text snippets
        metas=metas,           # metadata dicts (course/module/item info)
        router_emb=router,     # domain router embeddings
        k=4,                   # return top-4 matches
        scope="auto"           # let router decide technical/career/all
    )

    # Print query + resolved scope
    print(f"\nQ: {q}   [scope={scope}]")

    # If no results, notify and skip
    if not hits:
        print("  (no hits)")
        continue

    # Otherwise, print out citation-style matches with score
    for h in hits:
        m = h["meta"]
        cite = f"{m['course_name']} > {m['module_name']} > {m['item_title']} ({m['type']})"
        if m.get("url"):  # add URL if available
            cite += f"  [{m['url']}]"
        print(f"  {h['score']:.3f} :: {cite}")



Q: Where did we define precision vs. recall?   [scope=technical]
  0.382 :: IF '25 Data Science Cohort A > P2W3 (6/23-6/27) Classification Algorithms > 💻 W3D2 (6/24) Logistic Regression Accuracy Metrics (Page)  [https://tkh.instructure.com/courses/172/pages/w3d2-6-slash-24-logistic-regression-accuracy-metrics]
  0.306 :: Foundations '25 Data Science > Week 5:  Statistics(Feb. 24th- Feb. 27th) > What is Data Science? (Page)  [https://tkh.instructure.com/courses/165/pages/what-is-data-science]
  0.276 :: IF '25 Data Science Cohort A > P2W11 (8/18-8/22) Agents & End of Phase Project > 💻 W11D1 (8/18) Applied LLM Review & AI Agents (Page)  [https://tkh.instructure.com/courses/172/pages/w11d1-8-slash-18-applied-llm-review-and-ai-agents]
  0.263 :: IF '25 Data Science Cohort A > P2W9 (8/4-8/8) NLP Foundations & Transformers > 📚 P2W9 Overview & Lesson Plan (Page)  [https://tkh.instructure.com/courses/172/pages/p2w9-overview-and-lesson-plan]

Q: tips for a resume and cover letter?   [scope=car