# Advanced Vector Store Queries

This notebook demonstrates detailed querying of the Qdrant vector store containing:
- **Resume data** (from `resume_ale.md`): work experience, education, skills
- **Personality traits** (from `personalities_16.md`): personality, strengths, weaknesses

We'll explore:
1. Collection metadata and structure
2. Filtering by section type
3. Viewing embeddings and payloads
4. Semantic search examples
5. Specific queries for resume vs personality data

## 1. Initialize Vector Store Connection

In [1]:
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
from pathlib import Path
import json

# Initialize Qdrant client with local storage
storage_path = "../vector_db/qdrant_storage"
client = QdrantClient(path=storage_path)

# Collection name
collection_name = "resume_data"

print("✅ Connected to Qdrant vector store")
print(f"📂 Storage path: {Path(storage_path).absolute()}")

✅ Connected to Qdrant vector store
📂 Storage path: c:\Users\Ale\Documents\Data-Science-Projects\GitHub\Resume_Claude_SDK_Agent\notebooks\..\vector_db\qdrant_storage


## 2. Explore Collection Structure

In [2]:
# Get all collections
collections = client.get_collections()
print("📚 Available Collections:")
for collection in collections.collections:
    print(f"   - {collection.name}")

# Get detailed collection info
if client.collection_exists(collection_name):
    collection_info = client.get_collection(collection_name)
    
    print(f"\n📊 Collection '{collection_name}' Details:")
    print(f"   Total documents: {collection_info.points_count}")
    print(f"   Vector dimensions: {collection_info.config.params.vectors.size}")
    print(f"   Distance metric: {collection_info.config.params.vectors.distance}")
    print(f"   Status: {collection_info.status}")
    
    # Count by section type
    from collections import Counter
    all_records, _ = client.scroll(
        collection_name=collection_name,
        limit=1000,
        with_payload=True,
        with_vectors=False
    )
    
    section_counts = Counter(r.payload.get('section_type', 'unknown') for r in all_records)
    
    print(f"\n📈 Documents by Section Type:")
    for section, count in sorted(section_counts.items()):
        print(f"   {section:20s}: {count:3d} chunks")
else:
    print(f"❌ Collection '{collection_name}' not found")

📚 Available Collections:
   - resume_data

📊 Collection 'resume_data' Details:
   Total documents: 49
   Vector dimensions: 1536
   Distance metric: Cosine
   Status: green

📈 Documents by Section Type:
   continuing_studies  :   7 chunks
   education           :   2 chunks
   personal_info       :   1 chunks
   personality         :  14 chunks
   professional_summary:   1 chunks
   skills              :   5 chunks
   work_experience     :  19 chunks


## 3. Query Resume Data (from resume_ale.md)

### 3.1 View Work Experience with Full Metadata

In [3]:
# Filter for work experience entries
work_filter = Filter(
    must=[
        FieldCondition(
            key="section_type",
            match=MatchValue(value="work_experience")
        )
    ]
)

work_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=work_filter,
    limit=20,
    with_payload=True,
    with_vectors=False  # Set True to see embeddings
)

print(f"💼 Work Experience Chunks (showing {len(work_records)}):\n")

for i, record in enumerate(work_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"CHUNK {i} - ID: {record.id}")
    print(f"{'='*80}")
    print(f"📄 Content (Achievement):")
    print(f"   {payload.get('content', 'N/A')}")
    print(f"\n🏢 Metadata:")
    print(f"   Company:        {metadata.get('company', 'N/A')}")
    print(f"   Position:       {metadata.get('position', 'N/A')}")
    print(f"   Start Date:     {metadata.get('start_date', 'N/A')}")
    print(f"   End Date:       {metadata.get('end_date', 'N/A')}")
    print(f"   Source File:    {payload.get('source_file', 'N/A')}")
    print(f"   Section Type:   {payload.get('section_type', 'N/A')}")
    print()

💼 Work Experience Chunks (showing 19):

CHUNK 1 - ID: 0a064195-2a60-4559-ac32-61ceb3b48bcb
📄 Content (Achievement):
   Data Scientist II: Developed a Power BI dashboard to track changes in imported food volumes, collaborating with import inspectors to define metrics and design visualizations in Power BI for stakeholder use.

🏢 Metadata:
   Company:        Canadian Food Inspection Agency
   Position:       Data Scientist II
   Start Date:     March-2025
   End Date:       November-2025
   Source File:    resume_ale.md
   Section Type:   work_experience

CHUNK 2 - ID: 2910f376-6469-4bac-aec3-8627504b7d30
📄 Content (Achievement):
   Data Scientist II: Automated forecasting and reduced manual effort by 40 hours per month by deploying the forecasting pipeline and scheduling automated runs using Python and Microsoft Fabric.

🏢 Metadata:
   Company:        Canadian Food Inspection Agency
   Position:       Data Scientist II
   Start Date:     March-2025
   End Date:       Novem

### 3.2 View Work Experience WITH Embeddings

Each chunk has a 1536-dimensional embedding vector generated by OpenAI's `text-embedding-3-small` model.

In [4]:
# Get one work experience record WITH embeddings
work_with_vector, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=work_filter,
    limit=1,  # one record is enough to inspect a vector
    with_payload=True,
    with_vectors=True  # ← Include embeddings
)

if work_with_vector:
    record = work_with_vector[0]
    vector = record.vector
    
    print(f"🔢 Embedding Vector Details:")
    print(f"   Vector dimensions: {len(vector)}")
    print(f"   Vector type: {type(vector)}")
    print(f"   First 10 values: {vector[:10]}")
    print(f"   Last 10 values:  {vector[-10:]}")
    print(f"\n📊 Vector Statistics:")
    import numpy as np
    vector_array = np.array(vector)
    print(f"   Min value:  {vector_array.min():.6f}")
    print(f"   Max value:  {vector_array.max():.6f}")
    print(f"   Mean value: {vector_array.mean():.6f}")
    print(f"   Std dev:    {vector_array.std():.6f}")
    
    print(f"\n📄 Associated Content:")
    print(f"   {record.payload.get('content', 'N/A')[:150]}...")

🔢 Embedding Vector Details:
   Vector dimensions: 1536
   Vector type: <class 'list'>
   First 10 values: [-0.008849185891449451, -0.010623646900057793, 0.050725314766168594, -0.03128138184547424, -0.003268592059612274, 0.016022169962525368, -0.019386133179068565, 0.020599933341145515, 0.016264930367469788, 0.028761299327015877]
   Last 10 values:  [0.0031183119863271713, 0.010646766982972622, 0.013074368238449097, 0.02090049348771572, 0.02324717491865158, -0.02385985665023327, -0.001502800965681672, 0.0018813912756741047, 0.02025313302874565, 0.0021877314429730177]

📊 Vector Statistics:
   Min value:  -0.097289
   Max value:  0.109034
   Mean value: 0.001180
   Std dev:    0.025488

📄 Associated Content:
   Data Scientist II: Developed a Power BI dashboard to track changes in imported food volumes, collaborating with import inspectors to define metrics an...
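
The printed statistics also allow a quick sanity check on the vector's length. For any vector, the sum of squares equals `n * (std**2 + mean**2)` when `std` is the population standard deviation (NumPy's default). The sketch below plugs in the numbers from the output above; it assumes those rounded values are accurate enough for the purpose. The result comes out very close to 1, which is consistent with OpenAI embeddings being normalized to unit length, so cosine similarity and dot product rank them the same way.

```python
import math

# Values copied from the printed statistics above
n = 1536
mean = 0.001180
std = 0.025488  # population std, as returned by ndarray.std() with default ddof=0

# sum(x_i^2) = n * (std^2 + mean^2), so the L2 norm is its square root
norm = math.sqrt(n * (std ** 2 + mean ** 2))
print(f"Approximate L2 norm: {norm:.4f}")
```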


### 3.3 Query Education & Skills Sections

In [5]:
# Query education entries
education_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="education"))]
)

education_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=education_filter,
    limit=20,
    with_payload=True
)

print(f"🎓 Education Entries ({len(education_records)}):\n")
for i, record in enumerate(education_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"EDUCATION CHUNK {i}")
    print(f"{'='*80}")
    print(f"📝 Degree:        {metadata.get('degree', 'N/A')}")
    print(f"🏫 Institution:   {metadata.get('institution', 'N/A')}")
    # Note: the stored metadata has a 'dates' key rather than 'year', so this prints 'N/A'
    print(f"📅 Year:          {metadata.get('year', 'N/A')}")
    print(f"📂 Source File:   {payload.get('source_file', 'N/A')}")
    print(f"🏷️  Section Type:  {payload.get('section_type', 'N/A')}")
    print(f"\n📄 Content:\n   {payload.get('content', 'N/A')}")
    print(f"\n🔍 Full Metadata: {json.dumps(metadata, indent=2)}")
    print()

# Query skills
skills_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="skills"))]
)

skills_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=skills_filter,
    limit=20,
    with_payload=True
)

print(f"\n🛠️  Skills Entries ({len(skills_records)}):\n")
for i, record in enumerate(skills_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"SKILL CHUNK {i}")
    print(f"{'='*80}")
    print(f"📂 Category:      {metadata.get('category', 'N/A')}")
    print(f"📄 Skills:        {payload.get('content', 'N/A')}")
    print(f"📁 Source File:   {payload.get('source_file', 'N/A')}")
    print(f"🏷️  Section Type:  {payload.get('section_type', 'N/A')}")
    print(f"\n🔍 Full Metadata: {json.dumps(metadata, indent=2)}")
    print()

🎓 Education Entries (2):

EDUCATION CHUNK 1
📝 Degree:        BSc in Biotechnology Engineering
🏫 Institution:   Tec de Monterrey
📅 Year:          N/A
📂 Source File:   resume_ale.md
🏷️  Section Type:  education

📄 Content:
   BSc in Biotechnology Engineering from Tec de Monterrey. August-2012 - May-2017 | Mexico

🔍 Full Metadata: {
  "degree": "BSc in Biotechnology Engineering",
  "institution": "Tec de Monterrey",
  "dates": "August-2012 - May-2017 | Mexico"
}

EDUCATION CHUNK 2
📝 Degree:        MSc in Food Science
🏫 Institution:   University of British Columbia
📅 Year:          N/A
📂 Source File:   resume_ale.md
🏷️  Section Type:  education

📄 Content:
   MSc in Food Science from University of British Columbia. January-2019 - October-2020 | Canada

🔍 Full Metadata: {
  "degree": "MSc in Food Science",
  "institution": "University of British Columbia",
  "dates": "January-2019 - October-2020 | Canada"
}


🛠️  Skills Entries (5):

SKI

## 4. Query Personality Traits Data (from personalities_16.md)

### 4.1 View Personality Sections

In [6]:
# Query personality trait chunks (main sections like "Personality Traits", "Career Preferences")
personality_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="personality"))]
)

personality_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=personality_filter,
    limit=20,
    with_payload=True
)

print(f"🧠 Personality Trait Chunks ({len(personality_records)}):\n")

for i, record in enumerate(personality_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"PERSONALITY CHUNK {i}")
    print(f"{'='*80}")
    print(f"📝 Section:       {metadata.get('section', 'N/A')}")
    print(f"📂 Source File:   {payload.get('source_file', 'N/A')}")
    print(f"🏷️  Section Type:  {payload.get('section_type', 'N/A')}")
    print(f"\n📄 Content:\n   {payload.get('content', 'N/A')}")
    print(f"\n🔍 Full Metadata: {json.dumps(metadata, indent=2)}")
    print()

🧠 Personality Trait Chunks (14):

PERSONALITY CHUNK 1
📝 Section:       Big-Picture Focus
📂 Source File:   personalities_16.md
🏷️  Section Type:  personality

📄 Content:
   I prefer focusing on overarching goals and strategies rather than micromanaging small details.

🔍 Full Metadata: {
  "section": "Big-Picture Focus"
}

PERSONALITY CHUNK 2
📝 Section:       Conceptual Thinking
📂 Source File:   personalities_16.md
🏷️  Section Type:  personality

📄 Content:
   I effortlessly grasp abstract, complex ideas, making me particularly suited to roles that require strategic analysis and long-term planning.

🔍 Full Metadata: {
  "section": "Conceptual Thinking"
}

PERSONALITY CHUNK 3
📝 Section:       Innovative Mindset
📂 Source File:   personalities_16.md
🏷️  Section Type:  personality

📄 Content:
   My ability to see possibilities others overlook often helps me find smarter solutions and effective improvements at work.

🔍 Full Metadata: {
  "

### 4.2 View Strength Chunks

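Strengths are subsections (`###` headers) under the main "Strengths" section and are expected to carry `section_type='strength'`. In the run recorded here the query returns no chunks, so no points in the collection currently have that type.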

In [7]:
# Query strength chunks
strength_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="strength"))]
)

strength_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=strength_filter,
    limit=10,
    with_payload=True
)

print(f"💪 Strength Chunks ({len(strength_records)}):\n")

for i, record in enumerate(strength_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"STRENGTH {i}: {metadata.get('subsection', 'N/A')}")
    print(f"{'='*80}")
    print(f"📝 Parent Section: {metadata.get('section', 'N/A')}")
    print(f"📌 Subsection:     {metadata.get('subsection', 'N/A')}")
    print(f"📂 Source File:    {payload.get('source_file', 'N/A')}")
    print(f"🏷️  Section Type:   {payload.get('section_type', 'N/A')}")
    print(f"\n📄 Content:\n   {payload.get('content', 'N/A')}")
    print(f"\n🔍 Full Metadata: {json.dumps(metadata, indent=2)}")
    print()

💪 Strength Chunks (0):



### 4.3 View Weakness Chunks (Excluded from Cover Letters)

In [8]:
# Query weakness chunks (these are stored but NOT retrieved for cover letters)
weakness_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="weakness"))]
)

weakness_records, _ = client.scroll(
    collection_name=collection_name,
    scroll_filter=weakness_filter,
    limit=10,
    with_payload=True
)

print(f"⚠️  Weakness Chunks ({len(weakness_records)}):")
print(f"   (Note: These are intentionally EXCLUDED from cover letter retrieval)\n")

for i, record in enumerate(weakness_records, 1):
    payload = record.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"WEAKNESS {i}: {metadata.get('subsection', 'N/A')}")
    print(f"{'='*80}")
    print(f"📝 Parent Section: {metadata.get('section', 'N/A')}")
    print(f"📌 Subsection:     {metadata.get('subsection', 'N/A')}")
    print(f"📂 Source File:    {payload.get('source_file', 'N/A')}")
    print(f"🏷️  Section Type:   {payload.get('section_type', 'N/A')}")
    print(f"\n📄 Content:\n   {payload.get('content', 'N/A')}")
    print(f"\n🔍 Full Metadata: {json.dumps(metadata, indent=2)}")
    print()

⚠️  Weakness Chunks (0):
   (Note: These are intentionally EXCLUDED from cover letter retrieval)



## 5. Semantic Search Examples

### 5.1 Search for Python-Related Work Experience

This demonstrates how semantic search works with embeddings.
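
Conceptually, the search scores every stored vector against the query vector with the collection's distance metric (cosine here) and keeps the best matches. The sketch below reproduces that ranking by brute force in plain Python. It is only an illustration: the 3-dimensional vectors and the names `cosine_similarity`, `top_k`, and `toy_store` are invented, and Qdrant itself uses an index rather than scoring every point.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], stored: Dict[str, List[float]], k: int = 5,
          score_threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Score every stored vector, drop those below the threshold, return the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    if score_threshold is not None:
        scored = [(name, s) for name, s in scored if s >= score_threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors, invented for illustration only
toy_store = {
    "etl_pipeline": [0.9, 0.1, 0.0],
    "dashboard": [0.5, 0.5, 0.0],
    "big_picture": [0.0, 0.1, 0.9],
}
ranked = top_k([1.0, 0.0, 0.0], toy_store, k=2, score_threshold=0.5)
print(ranked)
```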

In [12]:
# Import OpenAI embeddings to create query vectors
import sys
sys.path.append('..')
from src.core.embeddings import OpenAIEmbeddings

# Initialize embedder
embedder = OpenAIEmbeddings()

# Create a query for Python-related achievements
query_text = "Python data analysis ETL pipeline machine learning"
query_vector = embedder.embed_query(query_text)

print(f"🔍 Semantic Search Query: '{query_text}'")
print(f"   Query vector dimensions: {len(query_vector)}")

# Search with vector similarity using query_points (newer API)
results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=5,
    score_threshold=0.5  # Minimum similarity score; lower-scoring results are dropped
).points

print(f"\n📊 Top {len(results)} Results (by semantic similarity):\n")

for i, result in enumerate(results, 1):
    payload = result.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"RESULT {i} - Similarity Score: {result.score:.4f}")
    print(f"{'='*80}")
    print(f"📄 Content: {payload.get('content', 'N/A')}")
    print(f"🏷️  Section Type: {payload.get('section_type', 'N/A')}")
    if payload.get('section_type') == 'work_experience':
        print(f"   Company: {metadata.get('company', 'N/A')}")
        print(f"   Position: {metadata.get('position', 'N/A')}")
    print()

🔍 Semantic Search Query: 'Python data analysis ETL pipeline machine learning'
   Query vector dimensions: 1536

📊 Top 4 Results (by semantic similarity):

RESULT 1 - Similarity Score: 0.6339
📄 Content: Data Analyst: Built an ETL pipeline integrating five data sources totaling over 1M records using SQL and Python, automating ingestion and cleaning and saving 8 hours weekly in data preparation.
🏷️  Section Type: work_experience
   Company: Rubicon Organics
   Position: Data Analyst

RESULT 2 - Similarity Score: 0.5509
📄 Content: Data Scientist II: Extracted and processed millions of import/export transactions by building web-scraping collectors and a PySpark ETL pipeline to load cleaned data into a Microsoft Fabric lakehouse.
🏷️  Section Type: work_experience
   Company: Canadian Food Inspection Agency
   Position: Data Scientist II

RESULT 3 - Similarity Score: 0.5154
📄 Content: Data Scientist II: Automated data categorization, reducing data cleaning time by ove

### 5.2 Search for Personality Traits Matching Job Requirements

This mimics how `retrieve_personality_traits()` works in the resume generator.

In [13]:
# Simulate a job analysis with soft skills and keywords
job_analysis = {
    'soft_skills': ['analytical thinking', 'problem-solving', 'collaboration'],
    'keywords': ['strategic', 'innovative', 'team player']
}

# Build query (same logic as retrieve_personality_traits)
query_parts = job_analysis.get('soft_skills', []) + job_analysis.get('keywords', [])
query_text = ' '.join(query_parts)
query_vector = embedder.embed_query(query_text)

print(f"🔍 Job Requirements Query: '{query_text}'\n")

# Search all sections first
all_results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=10
).points

# Filter for personality/strength only (mimics retrieve_personality_traits)
personality_results = [
    r for r in all_results 
    if r.payload.get('section_type') in ['personality', 'strength']
]

print(f"📊 Retrieved {len(personality_results)} Personality/Strength Traits:\n")

for i, result in enumerate(personality_results[:5], 1):  # Top 5
    payload = result.payload
    metadata = payload.get('metadata', {})
    
    print(f"{'='*80}")
    print(f"TRAIT {i} - Similarity: {result.score:.4f}")
    print(f"{'='*80}")
    print(f"🏷️  Type: {payload.get('section_type', 'N/A')}")
    print(f"📝 Section: {metadata.get('section', 'N/A')}")
    if metadata.get('subsection'):
        print(f"   Subsection: {metadata.get('subsection', 'N/A')}")
    print(f"📄 Content:\n   {payload.get('content', 'N/A')}")
    print()

print("\n💡 These traits would be injected into the cover letter prompt!")

🔍 Job Requirements Query: 'analytical thinking problem-solving collaboration strategic innovative team player'

📊 Retrieved 5 Personality/Strength Traits:

TRAIT 1 - Similarity: 0.4947
🏷️  Type: personality
📝 Section: Conceptual Thinking
📄 Content:
   I effortlessly grasp abstract, complex ideas, making me particularly suited to roles that require strategic analysis and long-term planning.

TRAIT 2 - Similarity: 0.4716
🏷️  Type: personality
📝 Section: Innovative Mindset
📄 Content:
   My ability to see possibilities others overlook often helps me find smarter solutions and effective improvements at work.

TRAIT 3 - Similarity: 0.4082
🏷️  Type: personality
📝 Section: Goal-Oriented
📄 Content:
   I stay motivated by clear goals and visible progress, consistently tracking achievements and identifying next steps. # Weaknesses My preference for working independently and my dislike for office politics can sometimes hinder my career progression. I need to 
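
The third trait above contains the text `# Weaknesses` in the middle of its content, which suggests the chunker merged the start of the weaknesses section into the `Goal-Oriented` chunk. A small check like the one below could flag such chunks before they reach a prompt. It is a heuristic sketch, not part of the project's code: `find_inline_headers` is a hypothetical name, and the pattern only looks for a run of `#` followed by a capitalized word after other text.

```python
import re
from typing import List

# A markdown header marker that appears after other text on the same line
_INLINE_HEADER = re.compile(r"(?<=\S)\s+(#{1,6})\s+([A-Z][\w-]*)")


def find_inline_headers(content: str) -> List[str]:
    """Return header-like fragments (e.g. '# Weaknesses') embedded inside chunk text."""
    return [f"{hashes} {word}" for hashes, word in _INLINE_HEADER.findall(content)]


# Text copied from the TRAIT 3 output above (cut short)
chunk = ("I stay motivated by clear goals and visible progress, consistently tracking "
         "achievements and identifying next steps. # Weaknesses My preference for "
         "working independently")
print(find_inline_headers(chunk))
```

Running the same check over every `personality` payload would show whether other chunks are affected.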

### 5.3 Semantic Search with Section Filtering

Combine semantic search with metadata filters for precise results.
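
One reason to filter during the search rather than afterwards: a post-filter only sees the top `limit` hits, so if another section dominates the ranking the wanted section can vanish entirely. Sections 5.2 and 6 post-filter in Python after `query_points(limit=10)`, and the section 6 output reports 0 personality traits. The sketch below contrasts the two strategies on invented scores; `post_filter` and `filter_first` are hypothetical names. In Qdrant, filtering first corresponds to passing `query_filter`, as the next cell does, and a `MatchAny` condition can express "personality or strength" in one `FieldCondition`.

```python
from typing import List, Set, Tuple

Hit = Tuple[float, str]  # (similarity score, section_type)

# Toy hits, invented for illustration: work experience outranks personality
hits: List[Hit] = [
    (0.62, "work_experience"), (0.58, "work_experience"), (0.55, "skills"),
    (0.53, "work_experience"), (0.49, "personality"), (0.47, "personality"),
]


def post_filter(hits: List[Hit], wanted: Set[str], limit: int) -> List[Hit]:
    """Take the top `limit` hits overall, then keep only the wanted sections."""
    top = sorted(hits, reverse=True)[:limit]
    return [h for h in top if h[1] in wanted]


def filter_first(hits: List[Hit], wanted: Set[str], limit: int) -> List[Hit]:
    """Restrict to the wanted sections, then take the top `limit` of those."""
    eligible = [h for h in hits if h[1] in wanted]
    return sorted(eligible, reverse=True)[:limit]


wanted = {"personality", "strength"}
print(post_filter(hits, wanted, limit=4))
print(filter_first(hits, wanted, limit=4))
```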

In [14]:
# Search for data science achievements ONLY in work experience
query_text = "data science machine learning SQL Python dashboard visualization"
query_vector = embedder.embed_query(query_text)

# Apply filter to only search work_experience
work_filter = Filter(
    must=[FieldCondition(key="section_type", match=MatchValue(value="work_experience"))]
)

results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    query_filter=work_filter,  # ← Apply filter during search
    limit=5
).points

print(f"🔍 Query: '{query_text}'")
print(f"🎯 Filter: section_type = 'work_experience'")
print(f"\n📊 Top {len(results)} Work Achievements:\n")

for i, result in enumerate(results, 1):
    payload = result.payload
    metadata = payload.get('metadata', {})
    
    print(f"{i}. [Score: {result.score:.4f}] {metadata.get('company', 'N/A')} - {metadata.get('position', 'N/A')}")
    print(f"   {payload.get('content', 'N/A')[:100]}...")
    print()

🔍 Query: 'data science machine learning SQL Python dashboard visualization'
🎯 Filter: section_type = 'work_experience'

📊 Top 5 Work Achievements:

1. [Score: 0.5815] Canadian Food Inspection Agency - Data Scientist
   Data Scientist: Standardized descriptive and statistical reporting in Power BI, reducing report-gene...

2. [Score: 0.5102] Rubicon Organics - Data Analyst
   Data Analyst: Built an ETL pipeline integrating five data sources totaling over 1M records using SQL...

3. [Score: 0.5098] Rubicon Organics - Data Analyst
   Data Analyst: Built three Power BI dashboards for sales and marketing by collaborating with stakehol...

4. [Score: 0.4951] Canadian Food Inspection Agency - Data Scientist II
   Data Scientist II: Automated forecasting and reduced manual effort by 40 hours per month by deployin...

5. [Score: 0.4918] Canadian Food Inspection Agency - Data Scientist II
   Data Scientist II: Implemented daily automated data refreshes, replacing weekly manual CSV expor

## 6. Complete RAG Workflow Example

This demonstrates the full retrieval flow used in resume generation.

In [15]:
# Simulate a complete RAG workflow for a Data Scientist job
print("="*80)
print("COMPLETE RAG WORKFLOW: Data Scientist Position")
print("="*80)

# 1. Job context
job_title = "Senior Data Scientist"
company = "Tech Corp"
job_description = """
Looking for a data scientist with strong Python skills, experience with machine learning,
SQL databases, and data visualization. Must have excellent analytical and problem-solving
abilities with strong communication skills.
"""

print(f"\n📋 Job: {job_title} at {company}")
print(f"📝 Requirements: Python, ML, SQL, data viz, analytical thinking, communication\n")

# 2. PHASE 1: RETRIEVAL
print("="*80)
print("PHASE 1: RETRIEVAL (Vector Similarity Search)")
print("="*80)

# Create query embedding
query_text = f"{job_title} {company} {job_description}"
query_vector = embedder.embed_query(query_text)

# Retrieve work experience
work_results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="section_type", match=MatchValue(value="work_experience"))]
    ),
    limit=10
).points

print(f"\n🔍 Retrieved {len(work_results)} relevant work achievements:")
for i, result in enumerate(work_results[:5], 1):
    metadata = result.payload.get('metadata', {})
    print(f"   {i}. [{result.score:.3f}] {metadata.get('company')} - {result.payload.get('content', '')[:60]}...")

# Retrieve personality traits
job_analysis = {
    'soft_skills': ['analytical', 'problem-solving', 'communication'],
    'keywords': ['data-driven', 'collaborative']
}

personality_query = ' '.join(job_analysis['soft_skills'] + job_analysis['keywords'])
personality_vector = embedder.embed_query(personality_query)

personality_results = client.query_points(
    collection_name=collection_name,
    query=personality_vector,
    limit=10
).points

# Filter for personality/strength
personality_filtered = [
    r for r in personality_results 
    if r.payload.get('section_type') in ['personality', 'strength']
][:5]

print(f"\n🧠 Retrieved {len(personality_filtered)} personality traits:")
for i, result in enumerate(personality_filtered, 1):
    print(f"   {i}. [{result.score:.3f}] {result.payload.get('content', '')[:60]}...")

# 3. PHASE 2: AUGMENTATION
print(f"\n{'='*80}")
print("PHASE 2: AUGMENTATION (Combine Context)")
print("="*80)
print("\n✅ Would combine:")
print(f"   - Job requirements: {job_title}, Python, ML, SQL...")
print(f"   - {len(work_results[:5])} work achievements")
print(f"   - {len(personality_filtered)} personality traits")
print("   - Into a structured prompt for Claude")

# 4. PHASE 3: GENERATION
print(f"\n{'='*80}")
print("PHASE 3: GENERATION (Claude LLM)")
print("="*80)
print("\n✅ Would call Claude API with augmented prompt to generate:")
print("   - Tailored resume sections")
print("   - Personalized cover letter")
print("   - Using ONLY the retrieved context")

print(f"\n{'='*80}")
print("✅ RAG WORKFLOW COMPLETE")
print("="*80)

COMPLETE RAG WORKFLOW: Data Scientist Position

📋 Job: Senior Data Scientist at Tech Corp
📝 Requirements: Python, ML, SQL, data viz, analytical thinking, communication

PHASE 1: RETRIEVAL (Vector Similarity Search)

🔍 Retrieved 10 relevant work achievements:
   1. [0.452] Canadian Food Inspection Agency - Data Scientist II: Extracted and processed millions of impor...
   2. [0.448] Canadian Food Inspection Agency - Data Scientist: Standardized descriptive and statistical rep...
   3. [0.448] Canadian Food Inspection Agency - Data Scientist II: Implemented daily automated data refreshe...
   4. [0.441] Canadian Food Inspection Agency - Data Scientist: Analyzed pathogen occurrence trends across 5...
   5. [0.433] Rubicon Organics - Data Analyst: Built an ETL pipeline integrating five data so...

🧠 Retrieved 0 personality traits:

PHASE 2: AUGMENTATION (Combine Context)

✅ Would combine:
   - Job requirements: Senior Data Scientist, Python, ML, SQL...
   - 5 work achievement
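
The augmentation phase is only described by `print` statements in the cell above. As a rough idea of what "combine into a structured prompt" could look like, here is a minimal sketch. `build_prompt` and the prompt wording are hypothetical and are not taken from the resume generator's actual code.

```python
from typing import List


def build_prompt(job_title: str, company: str, job_description: str,
                 achievements: List[str], traits: List[str]) -> str:
    """Hypothetical helper: assemble retrieved context into one prompt string."""
    lines = [
        f"Target role: {job_title} at {company}",
        "Job description:",
        job_description.strip(),
        "",
        "Relevant achievements (use only these):",
    ]
    lines += [f"- {a}" for a in achievements]
    lines.append("")
    lines.append("Personality traits (use only these):")
    # Make an empty retrieval visible instead of silently omitting the section
    lines += [f"- {t}" for t in traits] or ["- (none retrieved)"]
    return "\n".join(lines)


example_prompt = build_prompt(
    "Senior Data Scientist",
    "Tech Corp",
    "Looking for a data scientist with strong Python skills.",
    ["Data Analyst: Built an ETL pipeline integrating five data sources."],
    [],  # mirrors the run above, where no personality traits were retrieved
)
print(example_prompt)
```

With the notebook's variables, the two lists would come from `[r.payload.get('content', '') for r in work_results[:5]]` and the same expression over `personality_filtered`.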

## Summary

This notebook demonstrated:

1. **Collection Structure**: Viewing all collections, document counts, and section types
2. **Resume Data Queries**: Filtering work experience, education, skills with full metadata
3. **Personality Data Queries**: Viewing personality, strength, and weakness chunks
4. **Embeddings**: Inspecting 1536-dimensional vectors and their statistics
5. **Semantic Search**: Using OpenAI embeddings for similarity-based retrieval
6. **Section Filtering**: Combining semantic search with metadata filters
7. **Complete RAG Flow**: End-to-end retrieval → augmentation → generation workflow

### Key Insights

- **Chunking preserves context**: Each chunk contains a complete semantic unit (achievement, trait, skill)
- **Embeddings enable semantic matching**: Query "machine learning" matches "ML model", "predictive analytics"
- **Metadata enables filtering**: Can retrieve only work experience, only strengths, etc.
- **Similarity scores guide selection**: Higher scores = more relevant to query
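- **Weaknesses are meant to be excluded**: `retrieve_personality_traits()` filters out `section_type='weakness'`. In this run no chunk carries that type, and the `Goal-Oriented` personality chunk in section 5.2 contains `# Weaknesses` text, so the filter alone does not keep weakness content out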

### Next Steps

- Run cells to explore your actual vector database
- Modify queries to test different job requirements
- Experiment with `score_threshold` values
- Try combining multiple filters (e.g., company + section_type)
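
As a starting point for the `score_threshold` experiment, the effect of a threshold can be previewed offline on scores already printed in this notebook. The sketch below uses the five scores from the section 5.3 output; `count_at_or_above` is a hypothetical helper and the thresholds are arbitrary choices.

```python
from typing import List

# Scores copied from the section 5.3 output
scores = [0.5815, 0.5102, 0.5098, 0.4951, 0.4918]


def count_at_or_above(scores: List[float], threshold: float) -> int:
    """Number of results that would survive a given minimum score."""
    return sum(1 for s in scores if s >= threshold)


for threshold in (0.45, 0.50, 0.55):
    kept = count_at_or_above(scores, threshold)
    print(f"score_threshold={threshold:.2f} keeps {kept} of {len(scores)}")
```

Moving the threshold from 0.50 to 0.55 drops two of the three remaining results here, which shows how tightly clustered these scores are.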