# 02 — Query Demo (Answer from Context + Citations)
Ask a question such as 'Where did we define precision vs. recall?' and get back matches with (deck, page) citations.

## Load store & model (auto-CUDA on Deck)

In [1]:
import os

# Ask Hugging Face Transformers to skip its TensorFlow and Flax backends so TensorFlow (TF) is never imported.
os.environ["TRANSFORMERS_NO_TF"] = "1"
os.environ["TRANSFORMERS_NO_FLAX"] = "1"

# Quiet TF logs if something still pulls it in.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # 0=show all, 1=hide INFO, 2=hide INFO+WARNING, 3=hide INFO+WARNING+ERROR
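
Flags like these generally need to be set before the library that reads them is first imported. As a minimal sketch, assuming that holds for the variables above, the cell could be wrapped in a guard that also reports whether `transformers` has already been loaded. The helper name `configure_hf_env` is hypothetical and not part of the original notebook.

```python
import os
import sys

def configure_hf_env():
    """Set the backend/logging flags; hypothetical helper, not from the notebook."""
    flags = {
        "TRANSFORMERS_NO_TF": "1",
        "TRANSFORMERS_NO_FLAX": "1",
        "TF_CPP_MIN_LOG_LEVEL": "2",
    }
    for name, value in flags.items():
        os.environ[name] = value
    # If transformers is already in sys.modules, the flags were set too late
    # for this session (assumption: they are read at import time).
    already_imported = "transformers" in sys.modules
    return flags, already_imported

flags, too_late = configure_hf_env()
if too_late:
    print("transformers was already imported; restart the kernel and run this cell first.")
```

Running this as the very first cell keeps the ordering explicit instead of relying on cell execution order.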


## Search function

# Imports

In [10]:
%pip -q install importnb


Note: you may need to restart the kernel to use updated packages.


In [12]:

from pathlib import Path
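import os, sys, json, faiss
from importnb import Notebook

# This wildcard import only works after the notebook has been loaded through
# `with Notebook():` (cell In [9]); a plain import cannot read .ipynb files on its own.
from nb01_setup_and_ingest import *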


In [9]:
# 1) Point to the folder that contains the notebook you want to import
NB_DIR = Path(os.getenv("SLIDEHUNTER_BASE", ".")).resolve() / "notebooks"

# 2) Make sure the notebook file name is a valid module name:
#    rename to: notebooks/nb01_setup_and_ingest.ipynb
assert (NB_DIR / "nb01_setup_and_ingest.ipynb").exists(), "Notebook not found at NB_DIR"

# 3) Put that folder on sys.path so imports can see it
sys.path.insert(0, str(NB_DIR))

# 4) Import the notebook as a module
with Notebook():
    import nb01_setup_and_ingest as nb01

# 5) Grab the function
search = nb01.search
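
Step 2 above matters because the notebook's file stem has to be usable in an `import` statement. A small check along these lines could flag a bad name before the import fails. The helper `is_importable_notebook_name` and the second file name are hypothetical, used only for illustration.

```python
import keyword
from pathlib import Path

def is_importable_notebook_name(path):
    """Return True if the file stem can be used as a Python module name."""
    stem = Path(path).stem
    # A module name must be a valid identifier and must not be a reserved keyword.
    return stem.isidentifier() and not keyword.iskeyword(stem)

print(is_importable_notebook_name("notebooks/nb01_setup_and_ingest.ipynb"))  # True
print(is_importable_notebook_name("notebooks/01 - Setup and Ingest.ipynb"))  # False
```

Names with spaces, hyphens, or a leading digit fail the check, which is why the notebook is renamed to `nb01_setup_and_ingest.ipynb`.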




cuda_available: False
torch.cuda: None
device: CPU only

Foundations '25 Data Science
Foundations Course
IF '25 Data Science Cohort A
IF '25 NY Career Readiness and Success
  Module_id: 1118
  Module: Fellow Resources
 - Item: Fellow Success Resources (Page)
  Module_id: 1239
  Module: Phase 2 (6/9-8/29)
 - Item: Homework: Option 1 - Weekly Job Applications & Progress Report (Due August 30) (Assignment)
 - Item: P2W1 (6/12) NO CAREER CLASS - TECHNICAL CLASS (SubHeader)
 - Item: P2W2 (6/16) Bloomberg Ideathon (SubHeader)
 - Item: Homework (SubHeader)
 - Item: Homework: Watch Hackathon Video (Assignment)
 - Item: Homework: Upwardly Global Learning Paths: Tech Market/Resume/Cover Letter (Assignment)
 - Item: Homework: Draft Resume (Assignment)
 - Item: P2W2 NO CLASS MEETING 6/19 Juneteenth TKH Closed (SubHeader)
 - Item: P2W3 (6/26) Bloomberg Hackathon (SubHeader)
 - Item: Homework (SubHeader)
 - Item: Homework: Hackathon Activity Log + Judges' Feedback (Assignment)
 - Item: P2W4 (7/3) Re

FAISS ntotal: 323

Q: Where did we define precision vs. recall?   [scope=technical]
  0.382 :: IF '25 Data Science Cohort A > P2W3 (6/23-6/27) Classification Algorithms > 💻 W3D2 (6/24) Logistic Regression Accuracy Metrics (Page)  [https://tkh.instructure.com/courses/172/pages/w3d2-6-slash-24-logistic-regression-accuracy-metrics]
  0.306 :: Foundations '25 Data Science > Week 5:  Statistics(Feb. 24th- Feb. 27th) > What is Data Science? (Page)  [https://tkh.instructure.com/courses/165/pages/what-is-data-science]
  0.276 :: IF '25 Data Science Cohort A > P2W11 (8/18-8/22) Agents & End of Phase Project > 💻 W11D1 (8/18) Applied LLM Review & AI Agents (Page)  [https://tkh.instructure.com/courses/172/pages/w11d1-8-slash-18-applied-llm-review-and-ai-agents]
  0.263 :: IF '25 Data Science Cohort A > P2W9 (8/4-8/8) NLP Foundations & Transformers > 📚 P2W9 Overview & Lesson Plan (Page)  [https://tkh.instructure.com/courses/172/pages/p2w9-overview-and-lesson-plan]

Q: tips for a resume and cover le
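
The numbers in front of each hit above read like similarity scores between the question and each stored fact. The following is a pure-Python sketch of that ranking step, assuming cosine similarity over embedding vectors. It is not the notebook's `search` function; `cosine`, `top_k`, and the toy vectors are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, fact_vecs, facts, k=4):
    """Score every fact against the query and return the k best (score, fact) pairs."""
    scored = [(cosine(query_vec, vec), fact) for vec, fact in zip(fact_vecs, facts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings" for illustration; real embeddings have hundreds of dimensions.
toy_facts = ["fact A", "fact B", "fact C"]
toy_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

for score, fact in top_k([1.0, 0.0], toy_vecs, toy_facts, k=2):
    print(f"  {score:.3f} :: {fact}")
```

A vector index such as FAISS performs the same kind of ranking without scanning every vector in Python.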

In [13]:
# Set this once to your repo root (hard-code the path or use the SLIDEHUNTER_BASE env var)
BASE = Path(os.getenv("SLIDEHUNTER_BASE", r"C:\path\to\SLIDEHUNTER")).resolve()

INDEX_PATH = BASE / "data" / "faiss" / "canvas.index"
FACTS_PATH = BASE / "data" / "faiss" / "facts.json"

print("CWD:", Path.cwd())
print("Reading:", INDEX_PATH)
print("Reading:", FACTS_PATH)

# Load the FAISS index with faiss.read_index (it is a binary file, so plain open() + json won't work)
#index = faiss.read_index(str(INDEX_PATH))

# Load facts/metas with plain open()
with open(FACTS_PATH, "r", encoding="utf-8") as f:
    data = json.load(f)
facts, metas = data["facts"], data["metas"]

print("Loaded:", len(facts), "facts")


CWD: c:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\notebooks
Reading: C:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\data\faiss\canvas.index
Reading: C:\Users\oneps\Documents\Research_Dev_Documents\DataEden_Github\TEPP-2-SlideHunt-Repo\SlideHunt\data\faiss\facts.json
Loaded: 323 facts
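
The count printed above (323 facts) matches the `FAISS ntotal: 323` reported earlier. A small consistency check could catch a `facts.json` that has drifted out of sync with the index. `check_store` is a hypothetical helper, and it assumes `facts` and `metas` are parallel lists with one entry per indexed vector.

```python
def check_store(facts, metas, ntotal):
    """Raise ValueError if the sidecar lists and the index size disagree."""
    if len(facts) != len(metas):
        raise ValueError(f"{len(facts)} facts but {len(metas)} metas")
    if len(facts) != ntotal:
        raise ValueError(f"{len(facts)} facts but the index holds {ntotal} vectors")
    return True

# Toy data for illustration only; in the notebook one would pass index.ntotal.
print(check_store(["a", "b"], [{"id": 1}, {"id": 2}], ntotal=2))
```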


In [None]:
# Makes sure the folder exists,
# saves the FAISS index and the facts/metadata,
# then prints a confirmation.
# NOTE: STORE_DIR is not defined in this notebook; it is assumed to come from nb01_setup_and_ingest.
def save_store(index, facts, metas, store_dir=STORE_DIR):
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    index_path = store_dir / "canvas.index"
    facts_path = store_dir / "facts.json"
    faiss.write_index(index, str(index_path))
    with open(facts_path, "w", encoding="utf-8") as f:
        json.dump({"facts": facts, "metas": metas}, f, ensure_ascii=False)
    # Report the paths actually written, not the module-level INDEX_PATH/FACTS_PATH.
    print("saved:", index_path, "and", facts_path)



# Loads the vector index back into memory,
# opens the JSON file and loads the facts and metadata,
# and returns them so queries can run without recomputing the embeddings.
def load_store(store_dir=STORE_DIR):
    idx = faiss.read_index(os.path.join(store_dir, "canvas.index"))
    with open(os.path.join(store_dir, "facts.json"), "r", encoding="utf-8") as f:
        data = json.load(f)
    print("loaded:", os.path.join(store_dir, "canvas.index"), "and facts.json")
    return idx, data["facts"], data["metas"]

# Save the index, facts, and metadata right after building them:
#save_store(index, facts, metas)
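
The JSON half of the store can be exercised on its own, without FAISS. The sketch below round-trips a toy `facts`/`metas` pair through a temporary directory. `save_facts` and `load_facts` are hypothetical stand-ins for the JSON part of `save_store`/`load_store`, and the `{"kind": "Page"}` metadata shape is an assumption.

```python
import json
import tempfile
from pathlib import Path

def save_facts(facts, metas, store_dir):
    """Write facts and metas to <store_dir>/facts.json and return the path."""
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    path = store / "facts.json"
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps emoji and accents readable in the file.
        json.dump({"facts": facts, "metas": metas}, f, ensure_ascii=False)
    return path

def load_facts(store_dir):
    """Read facts and metas back from <store_dir>/facts.json."""
    with open(Path(store_dir) / "facts.json", "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["facts"], data["metas"]

with tempfile.TemporaryDirectory() as tmp:
    toy_facts = ["💻 W3D2 Logistic Regression Accuracy Metrics"]
    toy_metas = [{"kind": "Page"}]  # hypothetical metadata shape
    save_facts(toy_facts, toy_metas, tmp)
    loaded_facts, loaded_metas = load_facts(tmp)

print(loaded_facts == toy_facts, loaded_metas == toy_metas)
```

Writing and reading with an explicit `encoding="utf-8"` matters here because page titles in the store contain emoji.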

In [None]:

# To reload in a fresh session (no re-embedding needed):
# index, facts, metas = load_store()
# route_emb = {k: model.encode([v], normalize_embeddings=True).astype("float32") for k,v in ROUTE_DESC.items()}


# Alternatively, reload directly without load_store:
# index = faiss.read_index("data/faiss/canvas.index")
# with open("data/faiss/facts.json", "r", encoding="utf-8") as f:
#     facts_meta = json.load(f)
# facts, metas = facts_meta["facts"], facts_meta["metas"]
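
The commented `route_emb` line above builds one normalized embedding per route description. As a sketch of how such a dictionary might be used to pick a scope like the `[scope=technical]` tag in the search output, the query can be compared against each route vector and the best match kept. `pick_route`, the `career` route name, and the toy vectors are assumptions for illustration, not the notebook's routing code.

```python
def pick_route(query_vec, route_vecs):
    """Return the route name whose vector has the largest dot product with the query.

    Assumes all vectors are already normalized, so the dot product equals cosine similarity.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(route_vecs, key=lambda name: dot(query_vec, route_vecs[name]))

# Toy 2-D vectors; real route embeddings would come from model.encode(...).
toy_routes = {"technical": [1.0, 0.0], "career": [0.0, 1.0]}
print(pick_route([0.8, 0.6], toy_routes))
```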
