# Query Quizlet Flashcards

This notebook queries the Quizlet flashcards that `preprocess_flashcards.ipynb` embedded and stored in the vector database.

**Features:**
- Semantic search across all flashcards
- Filter by source file (subject)
- Adjustable number of results
- View detailed metadata

## Setup

In [1]:
from quizlet_rag import QuizletRAGPipeline
from pathlib import Path



## Configuration

In [2]:
# Configuration (should match preprocess_flashcards.ipynb)
VECTOR_DB_PATH = "./quizlet_db"
COLLECTION_NAME = "quizlet_flashcards"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Query settings
DEFAULT_K = 5  # Default number of results to return
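
The "should match" comment matters because queries are presumably embedded with `EMBEDDING_MODEL`. If preprocessing used a different model, the query vectors and the stored vectors would not live in the same embedding space. A minimal sketch of a consistency check follows. It assumes the preprocessing notebook records its settings as a plain dict. The `saved` dict and the `config_mismatches` helper are hypothetical and are not part of `quizlet_rag`.

```python
def config_mismatches(current: dict, saved: dict) -> dict:
    """Return {key: (current_value, saved_value)} for every key whose values differ.

    A key missing from either dict is reported with None on the missing side.
    """
    keys = sorted(set(current) | set(saved))
    return {
        key: (current.get(key), saved.get(key))
        for key in keys
        if current.get(key) != saved.get(key)
    }


# Hypothetical settings recorded by preprocess_flashcards.ipynb
saved = {
    "vector_db_path": "./quizlet_db",
    "collection_name": "quizlet_flashcards",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
}
# Mirrors the Configuration cell of this notebook
current = {
    "vector_db_path": "./quizlet_db",
    "collection_name": "quizlet_flashcards",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
}

mismatches = config_mismatches(current, saved)
if mismatches:
    raise ValueError(f"Query config differs from preprocessing config: {mismatches}")
print("Configuration matches preprocessing settings")
```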

## Initialize Pipeline & Load Vector Store

In [3]:
# Fail fast if the vector store is missing: every later cell needs `pipeline`
if not Path(VECTOR_DB_PATH).exists():
    raise FileNotFoundError(
        f"Vector store not found at {VECTOR_DB_PATH}. "
        "Run 'preprocess_flashcards.ipynb' first to create it."
    )

# Initialize the pipeline with the same embedding model used during preprocessing
pipeline = QuizletRAGPipeline(
    embedding_model=EMBEDDING_MODEL,
    vector_store_path=VECTOR_DB_PATH
)

# Load the existing collection from disk
pipeline.load_existing_vectorstore(collection_name=COLLECTION_NAME)

Using device: cuda
✓ Loaded existing vector store from ./quizlet_db


## Simple Query

In [4]:
query = "What are some fields of study in artificial intelligence?"

print(f"Query: '{query}'")

results = pipeline.query(query, k=3)

for i, doc in enumerate(results, 1):
    print(f"\nResult {i}:")
    print(doc.page_content)
    print(f"- Source: {doc.metadata.get('source_name')}")
    print(f"- Card #{doc.metadata.get('card_number')}")
    print(f"- Flashcard Type: {doc.metadata.get('flashcard_type')}")

Query: 'What are some fields of study in artificial intelligence?'

Result 1:
Term: artificial intelligence
computer science
Definition: Which of the following are popular college majors/emphasis areas for college students interested in preparing for a career in AI?
- Source: ai-4.json
- Card #40
- Flashcard Type: question_to_answer

Result 2:
Term: examples of applications of ai
Definition: 1. finance (to detect fraud)
2. medicine (diagnosis, medical images)
3. robotics
4. online/telephone customer service
5. transportation (cars and "fuzzy logic", traffic light systems, self driving cars)
6. telecommunications (maintenance)
7. toys and games
8. music
9. data mining
10. spam filtering
- Source: ai-5.json
- Card #22
- Flashcard Type: multiple_choice

Result 3:
Term: Which of the following are popular AI career fields?
Definition: AI engineer
user experience
- Source: ai-3.json
- Card #37
- Flashcard Type: question_to_answer
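
`pipeline.query` comes from the local `quizlet_rag` module, so this notebook does not show its internals. Conceptually, semantic search embeds the query and returns the `k` stored cards whose vectors are closest to it. The sketch below shows that ranking step using cosine similarity. It uses tiny hand-made vectors in place of real `all-MiniLM-L6-v2` embeddings. The vectors, card texts, and `top_k` helper are illustrative assumptions, and the actual vector store may use a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], cards: list, k: int) -> list[str]:
    """Return the texts of the k cards most similar to query_vec, best first.

    `cards` is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in cards]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings"; real MiniLM vectors have 384 dimensions
cards = [
    ("Term: machine learning", [0.9, 0.1, 0.0]),
    ("Term: Natural Language Processing", [0.1, 0.9, 0.1]),
    ("Term: robotics", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.3, 0.0]

for rank, text in enumerate(top_k(query_vec, cards, k=2), 1):
    print(rank, text)
```

Changing `k` only changes how far down this ranked list the results go. That is why the cells in this notebook can pass different `k` values to the same query.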


## Query with Source Filter

In [5]:
# Query only a specific flashcard set
query = "Natural Language Processing"
source_filter = "ai-1.json"

print(f"Query: '{query}'")
print(f"Filtering by source: {source_filter}")

results = pipeline.query(
    query,
    k=3,
    filter_metadata={"source_name": source_filter}
)

if results:
    for i, doc in enumerate(results, 1):
        print(f"\nResult {i}:")
        print(doc.page_content)
        print(f"- Card #{doc.metadata.get('card_number')}")
        print(f"- Flashcard Type: {doc.metadata.get('flashcard_type')}")
else:
    print(f"\nNo results found for '{query}' in {source_filter}")

Query: 'Natural Language Processing'
Filtering by source: ai-1.json

Result 1:
Term: Natural Language Processing
Definition: A field of computer science and linguistics concerned with the interactions between computers and human languages.
- Card #4
- Flashcard Type: term_definition

Result 2:
Term: Computational Linguistics
Definition: An interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
- Card #49
- Flashcard Type: term_definition

Result 3:
Term: Frames
Definition: This concept, proposed by Marvin Minsky, "is an artificial intelligence data structure used to divide knowledge into substructures by representing "stereotyped situations." They are connected together to form a complete idea.
- Card #63
- Flashcard Type: multiple_choice
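
Passing `filter_metadata={"source_name": ...}` restricts the search to cards from one file. The in-memory sketch below shows the same idea as an exact-match filter applied before ranking. The `Card` dataclass and the `filtered_search` helper are hypothetical. The dot-product scores stand in for real embedding similarity. This notebook does not show how `quizlet_rag` forwards the filter to the vector store, and some stores apply filters during the index search rather than as a separate pass.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Stand-in for a stored flashcard: text, metadata, and a toy embedding."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    vector: tuple = ()


def matches(metadata: dict, filter_metadata: dict) -> bool:
    """True if every key/value pair in the filter equals the card's metadata."""
    return all(metadata.get(key) == value for key, value in filter_metadata.items())


def filtered_search(query_vec, cards, k, filter_metadata=None):
    """Keep only cards that pass the filter, then rank them by dot product."""
    candidates = [
        card for card in cards
        if filter_metadata is None or matches(card.metadata, filter_metadata)
    ]
    candidates.sort(
        key=lambda card: sum(q * v for q, v in zip(query_vec, card.vector)),
        reverse=True,
    )
    return candidates[:k]


cards = [
    Card("Term: Natural Language Processing", {"source_name": "ai-1.json"}, (0.9, 0.1)),
    Card("Term: NLP", {"source_name": "ai-4.json"}, (1.0, 0.0)),
    Card("Term: Frames", {"source_name": "ai-1.json"}, (0.2, 0.8)),
]

results = filtered_search((1.0, 0.0), cards, k=3, filter_metadata={"source_name": "ai-1.json"})
for doc in results:
    print(doc.page_content, "|", doc.metadata["source_name"])
```

In this sketch, the `ai-4.json` card is excluded even though it scores highest. When fewer than `k` cards match, fewer results come back, and when no cards match, the list is empty. The `if results:` branch in the cell above handles that empty case.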


## Batch Queries

In [6]:
# Run several queries in turn
queries = [
    "ai",
    "nlp",
    "machine learning",
]

k = 3  # Results per query

for query in queries:
    print(f"Query: '{query}'")
    
    results = pipeline.query(query, k=k)
    
    for i, doc in enumerate(results, 1):
        print(f"\n  {i}. {doc.page_content}")
        print(f"     Source: {doc.metadata.get('source_name')} | Card #{doc.metadata.get('card_number')} | Flashcard Type: {doc.metadata.get('flashcard_type')}")
    
    print('\n', '=' * 60, '\n')

Query: 'ai'

  1. Term: AI
Definition: Artificial Intelligence
     Source: ai-4.json | Card #53 | Flashcard Type: term_definition

  2. Term: According to John McCarthy from Stanford University, artificial intelligence (AI) is "the science and engineering of making intelligent machines and intelligent computer programs. It is related to the similar task of using computers to understand Blank______ intelligence, but AI does not have to confine itself to biologically observable methods."
Definition: human
     Source: ai-3.json | Card #4 | Flashcard Type: fill_in_blank

  3. Term: Artificial Intelligence
Definition: refers to the art and science of creating computer systems that stimulate human though and behavior
     Source: ai-2.json | Card #1 | Flashcard Type: term_definition


Query: 'nlp'

  1. Term: NLP
Definition: Natural Language Processing
     Source: ai-4.json | Card #54 | Flashcard Type: term_definition

  2. Term: NLU
Definition: Natural Language Understanding
     Source:
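
The query cells repeat the same metadata printing. One option is a small shared formatter. The sketch below assumes only that each result exposes `page_content` and a `metadata` dict with the keys used above. A `SimpleNamespace` stands in for a real result document, and `format_result` and `format_batch` are hypothetical helpers.

```python
from types import SimpleNamespace


def format_result(rank: int, doc) -> str:
    """Render one result in the batch-query layout: text, then an indented metadata line."""
    meta = doc.metadata
    details = " | ".join([
        f"Source: {meta.get('source_name')}",
        f"Card #{meta.get('card_number')}",
        f"Flashcard Type: {meta.get('flashcard_type')}",
    ])
    return f"  {rank}. {doc.page_content}\n     {details}"


def format_batch(query: str, results) -> str:
    """Render all results for one query under a header line."""
    lines = [f"Query: '{query}'"]
    lines += [format_result(i, doc) for i, doc in enumerate(results, 1)]
    return "\n\n".join(lines)


# Stand-in result mirroring the first 'nlp' hit printed above
doc = SimpleNamespace(
    page_content="Term: NLP\nDefinition: Natural Language Processing",
    metadata={"source_name": "ai-4.json", "card_number": 54, "flashcard_type": "term_definition"},
)
print(format_batch("nlp", [doc]))
```

Because the helper uses `metadata.get`, a missing key prints as `None` instead of raising, which matches the behavior of the original cells.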

## Interactive Query Loop

In [7]:
# Interactive query mode
print("Interactive Query Mode")
print("Type 'quit' or 'exit' to stop\n")

while True:
    query = input("\nEnter your query: ").strip()
    
    if query.lower() in ['quit', 'exit', 'q']:
        print("Goodbye!")
        break
    
    if not query:
        continue
    
    results = pipeline.query(query, k=DEFAULT_K)
    
    for i, doc in enumerate(results, 1):
        print(f"\nResult {i}:")
        print(doc.page_content)
        print(f"- Source: {doc.metadata.get('source_name')} | Card #{doc.metadata.get('card_number')} | Flashcard Type: {doc.metadata.get('flashcard_type')}")
    
    print('\n', '=' * 60, '\n')

Interactive Query Mode
Type 'quit' or 'exit' to stop


Result 1:
Term: 67
Definition: In a recent study by Deloitte, __percent of executives responded they are "not comfortable" using data from advanced AI systems.
- Source: ai-4.json | Card #30 | Flashcard Type: term_definition

Result 2:
Term: What does "epoch" mean in training? A. One full pass through the training dataset B. One gradient calculation C. A hyperparameter that controls complexity D. A type of loss function
Definition: A
- Source: high-quality-testcards.json | Card #44 | Flashcard Type: multiple_choice

Result 3:
Term: machine learning
Definition: the ability to learn from experience to continuously improve performance
- Source: ai-2.json | Card #7 | Flashcard Type: term_definition

Result 4:
Term: Machine learning
Deep Learning
Definition: Two subfields if AI include which of the following?
- Source: ai-4.json | Card #2 | Flashcard Type: question_to_answer

Result 5:
Term: human brain can process ___ instructions per 
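
Because `input()` blocks, the loop above is awkward to test or rerun. The sketch below separates the control logic from I/O. It assumes the same commands as the cell above: `quit`, `exit`, or `q` (case-insensitive) stop the loop, and blank input is skipped. `run_queries` and `fake_search` are hypothetical names, and `fake_search` stands in for a function that would call `pipeline.query` and print the results.

```python
from typing import Callable, Iterable, List

EXIT_COMMANDS = {"quit", "exit", "q"}


def run_queries(lines: Iterable[str], search: Callable[[str], list]) -> List[str]:
    """Process raw input lines until an exit command, returning the queries searched.

    Blank lines are skipped; `search` is called once per non-empty query.
    """
    handled = []
    for raw in lines:
        query = raw.strip()
        if query.lower() in EXIT_COMMANDS:
            break
        if not query:
            continue
        search(query)
        handled.append(query)
    return handled


calls = []


def fake_search(query: str) -> list:
    """Stand-in for pipeline.query: records the query and returns no results."""
    calls.append(query)
    return []


# Simulated session: two real queries, a blank line, then an exit command
session = ["machine learning", "   ", "epoch", "QUIT", "never reached"]
print(run_queries(session, fake_search))
```

In the notebook, `lines` could be a generator that yields successive `input()` calls. Tests can pass a plain list instead.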