# Resume Evaluation & Matching

Notebook playground for vector-based resume/job matching and optional LLM evaluation. 

**Uses `workspace/functions/resume_eval_core.py`** - a lightweight module with the same logic as the Streamlit pages (`6_Resume_Evaluation.py` and `7_Resume_Matching.py`) but without Streamlit dependencies.

## Quick Start

1. **Run Cell 1** - Sets up Python paths and imports libraries
2. **Run Cell 2** - Configures the database and Ollama endpoints
3. **Run Cell 4** - Imports the `resume_eval_core` functions
4. **Run Cell 11** - Runs a quick text-based example
5. **Run Cell 13** - Evaluates actual PDF resumes from the `Resume_testing` folder

## Available Functions

- `eval_resume_core()` - End-to-end resume evaluation pipeline
- `find_top_jobs_core()` - Vector search for matching jobs
- `eval_with_llm_core()` - Get LLM recommendations (Yes/No + suggestions)
- `extract_pdf_core()` - Extract text from PDF resumes
- `clean_text_core()` - Clean and normalize text

## Requirements for Vector Search

For vector search to work, you need:
- **Database connection**: PostgreSQL with pgvector extension
- **Jobs table**: Must have `embedding` column (for sbert) or `word2vec_embedding` (for word2vec)
- **Embeddings**: Jobs must have pre-computed embeddings in the database
- **Database URL**: Configured in Cell 2 or via `DATABASE_URL` environment variable
- **Word2Vec Model**: Default embedding method uses Word2Vec (300 dimensions). Model should be in `workspace/models/word2vec_model.joblib`

**Default Embedding Method**: Word2Vec (300 dimensions) - simpler, faster, and more reliable than SBERT
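
As a rough illustration of what a 300-dimensional Word2Vec document embedding can look like, the sketch below averages the vectors of the tokens it recognizes. This is an assumption about the general technique, not the actual logic of `generate_embedding` in `resume_eval_core`; `mean_pool_embedding` and `toy_vectors` are hypothetical names, and the vectors are random stand-ins for a trained model.

```python
import numpy as np


def mean_pool_embedding(text, word_vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    tokens = text.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    return np.mean(known, axis=0)


# Toy vocabulary standing in for a trained model (random values, not real vectors).
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=300) for w in ("python", "sql", "tableau")}

emb = mean_pool_embedding("Python SQL dashboards", toy_vectors)
print(emb.shape)
print(mean_pool_embedding("unknown words only", toy_vectors))
```

In this sketch, text with no in-vocabulary tokens yields `None`. That is one plausible way an embedding call can come back empty, though whether the project's code behaves this way is not confirmed here.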

If vector search returns no results, check:
1. Database is running and accessible
2. Jobs table exists and has data
3. Embeddings are computed and stored in the database (use `word2vec_embedding` column for Word2Vec)
4. Database connection settings in Cell 2 are correct
5. Word2Vec model exists in `workspace/models/word2vec_model.joblib` (or use `embedding_type='sbert'` as fallback)
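
For intuition about what the vector search ranks, here is a minimal in-memory sketch of cosine-similarity top-k retrieval with NumPy. The project runs the search in PostgreSQL with pgvector; `top_k_cosine` and the toy vectors below are hypothetical and only illustrate the ranking idea.

```python
import numpy as np


def top_k_cosine(query, matrix, k=3):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query`."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity per row
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]


# Three toy "job" vectors and one toy "resume" vector (2-D for readability).
job_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
resume_vector = np.array([1.0, 0.1])

idx, scores = top_k_cosine(resume_vector, job_vectors, k=2)
print(idx, scores)
```

The query vector must have the same dimensionality as the stored vectors, so a Word2Vec query can only be compared against a column that holds Word2Vec embeddings of the same size.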


In [125]:
# Optional: install runtime deps (commented to avoid accidental runs)
# %pip install -q sentence-transformers spacy pypdf sqlalchemy psycopg2-binary ollama transformers
# !python -m spacy download en_core_web_sm

# Setup: Add parent directories to Python path and import basic libraries
import sys
import os
import json
from pathlib import Path
from typing import List, Dict, Optional

import numpy as np
import pandas as pd

# Add workspace and app-streamlit to Python path so we can import functions.resume_eval_core
notebook_dir = Path.cwd()
workspace_root = notebook_dir.parent if notebook_dir.name == "workspace" else notebook_dir
app_streamlit_path = workspace_root / "app-streamlit"

# Set workspace path
WORKSPACE_PATH = Path(os.getenv("WORKSPACE_PATH", Path.cwd()))
if WORKSPACE_PATH.name != "workspace":
    WORKSPACE_PATH = WORKSPACE_PATH / "workspace"

# Add paths to sys.path. Each insert(0, ...) prepends, so paths added later are searched first
# (root, then app-streamlit, then workspace) when none of them is already on sys.path.
if str(WORKSPACE_PATH) not in sys.path:
    sys.path.insert(0, str(WORKSPACE_PATH))
if str(app_streamlit_path) not in sys.path:
    sys.path.insert(0, str(app_streamlit_path))
if str(workspace_root) not in sys.path:
    sys.path.insert(0, str(workspace_root))

print("✅ Python path configured:")
print(f"   - Workspace: {WORKSPACE_PATH}")
print(f"   - App-streamlit: {app_streamlit_path}")
print(f"   - Root: {workspace_root}")


✅ Python path configured:
   - Workspace: /home/jovyan/workspace
   - App-streamlit: /home/jovyan/app-streamlit
   - Root: /home/jovyan


In [126]:
# Ollama settings (used by evaluate_with_llm)
# Defaults set to your endpoint/model; override via env vars if needed.
os.environ["OLLAMA_API_URL"] = os.getenv("OLLAMA_API_URL", "http://hyper07.ddns.net:11434")
os.environ["OLLAMA_HOST"] = os.getenv("OLLAMA_HOST", os.environ["OLLAMA_API_URL"])
os.environ["OLLAMA_MODEL"] = os.getenv("OLLAMA_MODEL", "gpt-oss:20b")

# Database settings used by functions.database
# Preferred: single DATABASE_URL; fallback: individual vars
# Example: postgresql://user:password@host:port/database
os.environ["DATABASE_URL"] = os.getenv("DATABASE_URL", "postgresql://admin:PassW0rd@localhost:45432/db")

# Individual vars (the fallback when DATABASE_URL is not used); adjust these to match your setup:
os.environ["POSTGRES_USER"] = "admin"
os.environ["POSTGRES_PASSWORD"] = "PassW0rd"
os.environ["POSTGRES_DB"] = "db"
os.environ["POSTGRES_HOST"] = "localhost"
os.environ["POSTGRES_PORT"] = "45432"

print("OLLAMA_API_URL=", os.environ["OLLAMA_API_URL"])
print("OLLAMA_MODEL=", os.environ["OLLAMA_MODEL"])
print("DATABASE_URL=", os.environ["DATABASE_URL"])

OLLAMA_API_URL= http://hyper07.ddns.net:11434
OLLAMA_MODEL= gpt-oss:20b
DATABASE_URL= postgresql://admin:PassW0rd@nlp-postgres:5432/db
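
The cell above prints the full `DATABASE_URL`, password included. A minimal sketch of masking the password before printing, using only the standard library, might look like this; `redact_db_url` is a hypothetical helper and not part of the project.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_db_url(url):
    """Replace the password in a database URL with '***' for safe printing."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


redacted = redact_db_url("postgresql://admin:PassW0rd@localhost:45432/db")
print("DATABASE_URL=", redacted)
```

This only changes what is displayed; the value stored in `os.environ` stays intact for the database helpers.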


In [127]:
# Cell merged into Cell 1 above ✅



In [128]:
# Import Streamlit-free core helpers (path was set in cell 1)
try:
    from functions.resume_eval_core import (
        evaluate_resume as eval_resume_core,
        find_top_jobs_for_resume as find_top_jobs_core,
        evaluate_with_llm as eval_with_llm_core,
        extract_text_from_pdf as extract_pdf_core,
        clean_text as clean_text_core,
    )
    # Import MASTER_SKILL_LIST from nlp_config or fallback to functions module
    try:
        from functions.nlp_config import MASTER_SKILL_LIST
    except Exception:  # noqa: BLE001
        # Fallback: try importing from functions module (which exports from vector_search_eval)
        try:
            from functions import MASTER_SKILL_LIST
        except Exception:  # noqa: BLE001
            # Last resort: try vector_search_eval directly
            from functions.vector_search_eval import MASTER_SKILL_LIST
    
    SKILL_LIST = MASTER_SKILL_LIST
    CORE_IMPORT_OK = True
    print(f"✅ Successfully imported resume_eval_core and {len(SKILL_LIST)} skills")
except Exception as e:  # noqa: BLE001
    CORE_IMPORT_OK = False
    SKILL_LIST = []
    print(f"❌ Core import failed: {e}")
    import traceback
    traceback.print_exc()
    print("Make sure you run Cell 1 first to set up the Python path")


✅ Successfully imported resume_eval_core and 462 skills


In [129]:
# ✅ All functions now imported from resume_eval_core module
# Available functions:
#   - clean_text_core(text: str) -> str
#   - extract_pdf_core(file_path: str) -> Optional[str]
#   - eval_resume_core(resume_text, skill_list, top_k, embedding_type, run_llm, llm_model, llm_api_url)
#   - find_top_jobs_core(resume_text, skill_list, top_k, embedding_type)
#   - eval_with_llm_core(resume_text, job, model_name, api_url)

if CORE_IMPORT_OK:
    print("✅ Ready to use resume_eval_core functions")
    print(f"   Available skills: {len(SKILL_LIST)}")
    
    # Quick diagnostic: Check if database connection and embeddings are available
    try:
        from functions.database import create_db_engine
        from sqlalchemy import text
        engine = create_db_engine()
        if engine:
            with engine.connect() as conn:
                # Check if jobs table exists and has embeddings
                result = conn.execute(text("""
                    SELECT 
                        COUNT(*) as total_jobs,
                        COUNT(embedding) as jobs_with_sbert,
                        COUNT(word2vec_embedding) as jobs_with_w2v
                    FROM jobs
                """))
                row = result.fetchone()
                if row:
                    print(f"\n📊 Database Status:")
                    print(f"   Total jobs: {row[0]}")
                    print(f"   Jobs with SBERT embeddings: {row[1]}")
                    print(f"   Jobs with Word2Vec embeddings: {row[2]}")
                    if row[1] == 0 and row[2] == 0:
                        print("   ⚠️  No embeddings found! Vector search will not work.")
                        print("   💡 Run embedding generation script to populate embeddings.")
                    elif row[1] > 0:
                        print("   ✅ SBERT embeddings available - ready for vector search!")
                    elif row[2] > 0:
                        print("   ✅ Word2Vec embeddings available - ready for vector search!")
        else:
            print("\n⚠️  Database connection failed. Check DATABASE_URL in Cell 2.")
    except Exception as e:
        print(f"\n⚠️  Could not check database status: {e}")
    
    # Test embedding generation capability (Word2Vec is default, 300 dimensions)
    print(f"\n🔧 Testing embedding generation...")
    try:
        from functions.resume_eval_core import generate_embedding
        from pathlib import Path
        import os
        
        # Check where models might be located
        print("   📁 Checking for Word2Vec model in common locations:")
        model_locations = [
            Path.cwd() / "models" / "word2vec_model.joblib",
            Path.cwd() / "workspace" / "models" / "word2vec_model.joblib",
            WORKSPACE_PATH / "models" / "word2vec_model.joblib",
        ]
        found_model = False
        for model_path in model_locations:
            if model_path.exists():
                print(f"      ✅ Found: {model_path}")
                found_model = True
                break
        if not found_model:
            print(f"      ❌ Word2Vec model not found in expected locations")
            print(f"      💡 The code will auto-fallback to SBERT if available")
        
        test_text = "Test text for embedding"
        # Test Word2Vec first (default, simpler, 300 dimensions)
        # Note: generate_embedding will auto-fallback to SBERT if Word2Vec fails
        test_emb = generate_embedding(test_text, method="word2vec")
        if test_emb is not None:
            print(f"   ✅ Embedding generation works (shape: {test_emb.shape})")
            if test_emb.shape[0] == 300:
                print(f"      Using Word2Vec (300 dimensions)")
            else:
                print(f"      Using SBERT (auto-fallback, {test_emb.shape[0]} dimensions)")
        else:
            print(f"   ❌ Both Word2Vec and SBERT embedding generation failed!")
            print(f"      💡 Check if models are available or install dependencies")
    except Exception as e:
        print(f"   ❌ Error testing embedding: {e}")
        import traceback
        traceback.print_exc()
else:
    print("⚠️  Core functions not available - check imports above")


✅ Ready to use resume_eval_core functions
   Available skills: 462

📊 Database Status:
   Total jobs: 14687
   Jobs with SBERT embeddings: 14687
   Jobs with Word2Vec embeddings: 14687
   ✅ SBERT embeddings available - ready for vector search!

🔧 Testing embedding generation...
   📁 Checking for Word2Vec model in common locations:
      ✅ Found: /home/jovyan/models/word2vec_model.joblib
   ❌ Both Word2Vec and SBERT embedding generation failed!
      💡 Check if models are available or install dependencies


In [130]:
sample_resume = """
Data analyst with 4 years experience in SQL, Python, Tableau, and building ETL pipelines. Led migration to Snowflake and created dashboards for revenue analytics.
""".strip()

# Test embedding generation first (Word2Vec is default, 300 dimensions)
print("🔍 Testing embedding generation (Word2Vec, 300 dim)...")
try:
    from functions.resume_eval_core import generate_embedding
    test_emb = generate_embedding(sample_resume, method="word2vec")
    if test_emb is not None:
        print(f"✅ Word2Vec embedding generated successfully (shape: {test_emb.shape}, 300 dimensions)")
    else:
        print("❌ Word2Vec embedding generation failed!")
        print("   This usually means:")
        print("   - Word2Vec model not found in workspace/models/word2vec_model.joblib")
        print("   - Try using 'sbert' as fallback: embedding_type='sbert'")
except Exception as e:
    print(f"❌ Error testing embedding: {e}")
    import traceback
    traceback.print_exc()

print("\n🔍 Running vector search with Word2Vec (300 dim)...")
# If database / embeddings are wired, this will pull live matches; otherwise returns []
try:
    results = eval_resume_core(
        sample_resume,
        skill_list=SKILL_LIST,
        top_k=3,
        embedding_type="word2vec",  # Using Word2Vec (300 dimensions) - simpler and more reliable
        run_llm=False,
        llm_model=os.environ.get("OLLAMA_MODEL"),
        llm_api_url=os.environ.get("OLLAMA_API_URL"),
    )
    
    if not results:
        print("❌ No matches returned.")
        print("\n💡 Troubleshooting:")
        print("   1. Check if embedding generation worked above")
        print("   2. Verify database connection in Cell 5 output")
        print("   3. Check if find_similar_jobs_vector is available:")
        try:
            from functions.database import find_similar_jobs_vector
            print("      ✅ find_similar_jobs_vector is available")
        except Exception as e:
            print(f"      ❌ find_similar_jobs_vector import failed: {e}")
        print("   4. Try running find_top_jobs_core directly to see detailed errors")
    else:
        print(f"✅ Found {len(results)} matching jobs!")
        display(pd.DataFrame(results))
except Exception as e:
    print(f"❌ Error during evaluation: {e}")
    import traceback
    traceback.print_exc()



🔍 Testing embedding generation (Word2Vec, 300 dim)...
❌ Word2Vec embedding generation failed!
   This usually means:
   - Word2Vec model not found in workspace/models/word2vec_model.joblib
   - Try using 'sbert' as fallback: embedding_type='sbert'

🔍 Running vector search with Word2Vec (300 dim)...
❌ No matches returned.

💡 Troubleshooting:
   1. Check if embedding generation worked above
   2. Verify database connection in Cell 5 output
   3. Check if find_similar_jobs_vector is available:
      ✅ find_similar_jobs_vector is available
   4. Try running find_top_jobs_core directly to see detailed errors


### Example: Evaluate a PDF Resume

The cell below shows how to load a PDF resume and evaluate it against the jobs database.



In [131]:
# Example: Evaluate a PDF resume from the Resume_testing folder OR use sample text
resume_dir = WORKSPACE_PATH / "Resume_testing"

# Try to load from PDF first, otherwise use sample text
resume_text = None
if resume_dir.exists():
    pdf_files = list(resume_dir.glob("*.pdf"))
    if pdf_files and CORE_IMPORT_OK:
        # Take the first PDF as an example
        sample_pdf = pdf_files[0]
        print(f"📄 Loading resume: {sample_pdf.name}")

        # Extract text
        resume_text = extract_pdf_core(str(sample_pdf))
        if resume_text:
            print(f"✅ Extracted {len(resume_text)} characters from PDF")
        else:
            print("⚠️  Failed to extract text from PDF, using sample text instead")
    else:
        print("⚠️  No PDF files found, using sample text instead")

# Fallback to sample text if PDF extraction failed
if not resume_text:
    resume_text = """          
    SUMMARY
    Experienced Data Scientist with 5 years of expertise in machine learning, statistical analysis, and data-driven decision making. 
    Proficient in Python, SQL, and cloud platforms. Strong background in NLP, deep learning, and predictive modeling.
    
    SKILLS
    Python, SQL, R, Machine Learning, Deep Learning, NLP, TensorFlow, PyTorch, scikit-learn, pandas, numpy, 
    AWS, Docker, Git, Tableau, Statistical Analysis, A/B Testing
    
    EXPERIENCE
    
    Senior Data Scientist | Tech Innovations Inc. | San Francisco, CA
    2020 - Present
    - Developed and deployed machine learning models for customer churn prediction, improving retention by 25%
    - Built NLP pipelines for sentiment analysis and topic modeling using BERT and LDA
    - Led team of 3 junior data scientists in building recommendation systems
    - Implemented A/B testing frameworks for product feature evaluation
    - Created automated ETL pipelines using Python and AWS Lambda
    
    Data Scientist | Analytics Solutions LLC | San Francisco, CA
    2018 - 2020
    - Built predictive models for sales forecasting using time series analysis
    - Performed statistical analysis on large datasets (10M+ records) using Spark and SQL
    - Developed data visualization dashboards in Tableau for executive reporting
    - Collaborated with engineering teams to deploy models to production
    
    EDUCATION
    
    Master's in Data Science | Stanford University | Stanford, CA
    2018
    """.strip()
    print(f"📝 Using sample resume text ({len(resume_text)} characters)")

if resume_text:
    print(f"\n🔍 Running vector search evaluation...")

    # Test embedding generation first (Word2Vec, 300 dimensions)
    try:
        from functions.resume_eval_core import generate_embedding
        test_emb = generate_embedding(resume_text[:500], method="word2vec")  # Test with first 500 chars
        if test_emb is None:
            print("⚠️  Warning: Word2Vec embedding generation test failed. Results may be empty.")
            print("   💡 Try using 'sbert' as fallback: embedding_type='sbert'")
        else:
            print(f"✅ Word2Vec embedding test passed (shape: {test_emb.shape}, 300 dimensions)")
    except Exception as e:
        print(f"⚠️  Embedding test error: {e}")
    
    # Evaluate (without LLM for speed) - Using Word2Vec (300 dimensions)
    try:
        results = eval_resume_core(
            resume_text,
            skill_list=SKILL_LIST,
            top_k=3,
            embedding_type="word2vec",  # Using Word2Vec (300 dimensions) - simpler and more reliable
            run_llm=False,  # Set to True to get LLM recommendations
        )
        
        if results:
            print(f"\n✅ Found {len(results)} matching jobs:")
            display(pd.DataFrame(results))
        else:
            print("\n❌ No job matches found.")
            print("\n💡 Troubleshooting steps:")
            print("   1. Check Cell 5 - Database status should show embeddings available")
            print("   2. Check Cell 6 - Embedding generation should work")
            print("   3. Verify database connection settings in Cell 2")
            print("   4. Try running find_top_jobs_core directly:")
            print("      results = find_top_jobs_core(resume_text, SKILL_LIST, top_k=3, embedding_type='sbert')")
    except Exception as e:
        print(f"❌ Error during evaluation: {e}")
        import traceback
        traceback.print_exc()
else:
    print("❌ Failed to get resume text")


📝 Using sample resume text (1461 characters)

🔍 Running vector search evaluation...
   💡 Try using 'sbert' as fallback: embedding_type='sbert'

❌ No job matches found.

💡 Troubleshooting steps:
   1. Check Cell 5 - Database status should show embeddings available
   2. Check Cell 6 - Embedding generation should work
   3. Verify database connection settings in Cell 2
   4. Try running find_top_jobs_core directly:
      results = find_top_jobs_core(resume_text, SKILL_LIST, top_k=3, embedding_type='sbert')


### Database Records Preview

View the last 10 records from the `jobs` table in the database.
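
The preview query casts the embedding columns to text, which yields bracketed, comma-separated strings such as `[0.032178283,-0.0045534386,...]`. If you want to inspect such a value numerically (for example, to check its dimensionality), a minimal sketch is to parse it as JSON; `parse_vector_text` is a hypothetical helper, not part of `functions.database`.

```python
import json

import numpy as np


def parse_vector_text(value):
    """Parse a bracketed vector string such as '[0.1,-0.2]' into a float32 array."""
    if value is None:
        return None  # row has no embedding stored
    return np.asarray(json.loads(value), dtype=np.float32)


vec = parse_vector_text("[0.032178283,-0.0045534386,0.1007853]")
print(vec.shape, vec.dtype)
```

Applied to a full `w2v_embedding_str` value, `vec.shape[0]` would give the stored dimension, which can then be compared with the dimension of the query embedding.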


In [132]:
# View last 10 records from database with vector values
try:
    from functions.database import create_db_engine, execute_query
    from sqlalchemy import text
    
    engine = create_db_engine()
    if engine:
        # Query sample data from database with vector values as strings
        query = text("""
            SELECT 
                id,
                title,
                company,
                LEFT(text, 200) as text_preview,
                CASE 
                    WHEN embedding IS NOT NULL THEN embedding::text
                    ELSE NULL 
                END as sbert_embedding_str,
                CASE 
                    WHEN word2vec_embedding IS NOT NULL THEN word2vec_embedding::text
                    ELSE NULL 
                END as w2v_embedding_str,
                CASE 
                    WHEN embedding IS NOT NULL THEN 'Yes' 
                    ELSE 'No' 
                END as has_sbert_embedding,
                CASE 
                    WHEN word2vec_embedding IS NOT NULL THEN 'Yes' 
                    ELSE 'No' 
                END as has_w2v_embedding,
                created_at
            FROM jobs
            ORDER BY created_at DESC NULLS LAST
            LIMIT 10
        """)
        
        with engine.connect() as conn:
            result = conn.execute(query)
            rows = result.fetchall()
            
            if rows:
                # Convert to DataFrame
                columns = ['id', 'title', 'company', 'text_preview', 'sbert_embedding_str', 'w2v_embedding_str', 
                          'has_sbert_embedding', 'has_w2v_embedding', 'created_at']
                df_sample = pd.DataFrame(rows, columns=columns)
                
                # Create preview columns for embeddings (first 100 chars)
                df_display = df_sample.copy()
                df_display['sbert_preview'] = df_display['sbert_embedding_str'].apply(
                    lambda x: (x[:100] + '...') if x and len(str(x)) > 100 else (x if x else 'N/A')
                )
                df_display['w2v_preview'] = df_display['w2v_embedding_str'].apply(
                    lambda x: (x[:100] + '...') if x and len(str(x)) > 100 else (x if x else 'N/A')
                )
                
                # Get total count
                count_query = text("SELECT COUNT(*) as total FROM jobs")
                count_result = conn.execute(count_query)
                total_count = count_result.fetchone()[0]
                
                print(f"✅ Found {len(df_sample)} sample records (showing latest 10)")
                print(f"📊 Total jobs in database: {total_count:,}")
                print("\n" + "="*80)
                
                # Display columns: basic info + embedding previews
                display_cols = ['id', 'title', 'company', 'text_preview', 'sbert_preview', 'w2v_preview', 
                               'has_sbert_embedding', 'has_w2v_embedding', 'created_at']
                display(df_display[display_cols])
                
                # Show full vector values in expandable format
                print("\n" + "="*80)
                print("📊 Full Vector Values (expand to view):")
                print("="*80)
                
                for idx, row in df_sample.iterrows():
                    print(f"\n🔹 Record {idx + 1}: {row['title']} @ {row['company']}")
                    print(f"   ID: {row['id']}")
                    
                    if row['sbert_embedding_str']:
                        sbert_str = str(row['sbert_embedding_str'])
                        print(f"\n   📐 SBERT Embedding ({len(sbert_str)} chars):")
                        if len(sbert_str) > 200:
                            print(f"      {sbert_str[:200]}...")
                            print(f"      ... (truncated, full length: {len(sbert_str)} characters)")
                        else:
                            print(f"      {sbert_str}")
                    else:
                        print(f"\n   📐 SBERT Embedding: None")
                    
                    if row['w2v_embedding_str']:
                        w2v_str = str(row['w2v_embedding_str'])
                        print(f"\n   📐 Word2Vec Embedding ({len(w2v_str)} chars):")
                        if len(w2v_str) > 200:
                            print(f"      {w2v_str[:200]}...")
                            print(f"      ... (truncated, full length: {len(w2v_str)} characters)")
                        else:
                            print(f"      {w2v_str}")
                    else:
                        print(f"\n   📐 Word2Vec Embedding: None")
                    
                    print("-" * 80)
                
                # Show embedding statistics
                print("\n📈 Embedding Statistics:")
                sbert_count = (df_sample['has_sbert_embedding'] == 'Yes').sum()
                w2v_count = (df_sample['has_w2v_embedding'] == 'Yes').sum()
                # Use the actual sample size so the ratio stays correct when fewer than 10 rows exist
                print(f"   - Jobs with SBERT embeddings (in sample): {sbert_count}/{len(df_sample)}")
                print(f"   - Jobs with Word2Vec embeddings (in sample): {w2v_count}/{len(df_sample)}")
                
                # Get overall statistics
                stats_query = text("""
                    SELECT 
                        COUNT(*) as total_jobs,
                        COUNT(embedding) as jobs_with_sbert,
                        COUNT(word2vec_embedding) as jobs_with_w2v
                    FROM jobs
                """)
                stats_result = conn.execute(stats_query)
                stats_row = stats_result.fetchone()
                if stats_row:
                    print(f"\n📊 Overall Database Statistics:")
                    print(f"   - Total jobs: {stats_row[0]:,}")
                    print(f"   - Jobs with SBERT embeddings: {stats_row[1]:,} ({stats_row[1]/stats_row[0]*100:.1f}%)")
                    print(f"   - Jobs with Word2Vec embeddings: {stats_row[2]:,} ({stats_row[2]/stats_row[0]*100:.1f}%)")
            else:
                print("ℹ️  No records found in the database.")
                print("💡 Import embeddings using the Streamlit app (page 4) or run the import script.")
    else:
        print("❌ Database connection failed. Check DATABASE_URL in Cell 2.")
except Exception as e:
    print(f"❌ Error querying database: {e}")
    import traceback
    traceback.print_exc()
    print("\n💡 Make sure:")
    print("   1. Database is running and accessible")
    print("   2. DATABASE_URL is set correctly in Cell 2")
    print("   3. The 'jobs' table exists (run 'Setup / Ensure Jobs Table' in Streamlit app page 4)")


✅ Found 10 sample records (showing latest 10)
📊 Total jobs in database: 14,687



| | id | title | company | text_preview | sbert_preview | w2v_preview | has_sbert_embedding | has_w2v_embedding | created_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4318422955 | Quantity Surveyor | Artelia Vietnam | Job Title:\nQuantity Surveyor\nJob Description... | [-0.06695009,0.0046474594,0.027306074,0.030171... | [0.032178283,-0.0045534386,0.1007853,0.1007668... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 1 | 4331851631 | 🎥 Freelance Videographer Needed in Mumbai! 🎥 | DIGITAL WEB LONDON | Job Title:\n🎥 Freelance Videographer Needed in... | [-7.551803e-05,-0.11403701,0.016413225,-0.0532... | [0.099463336,0.051961232,0.11220918,-0.0152620... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 2 | 4318431141 | Desenvolvedor Java Kotlin - SP | innolevels | Job Title:\nDesenvolvedor Java Kotlin - SP\nJo... | [-0.11486245,0.04842331,-0.06716769,-0.0423658... | [0.06835436,-0.10130825,0.32482982,0.0862046,0... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 3 | 4332062636 | Business Development Executive | UForce Solutions | Job Title:\nBusiness Development Executive\nJo... | [-0.037340913,0.03121129,0.052649077,-0.058466... | [0.07128589,0.043121543,0.09117663,-0.00993084... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 4 | 4318881666 | International Customer Service Representative ... | Design Indian | Job Title:\nInternational Customer Service Rep... | [-0.045741126,-0.057420842,0.06831001,0.027257... | [0.08275574,0.07060182,0.05877195,0.078775905,... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 5 | 4290761534 | LMHC (Remote) | CMP.jobs | Job Title:\nLMHC (Remote)\nJob Description:\nA... | [0.0106550185,-0.005671574,-0.03397301,0.00154... | [0.03051495,0.091563106,0.076338105,-0.0377251... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 6 | 4332065592 | Asistente Administrativa | Castelo Branco Industrial | Job Title:\nAsistente Administrativa\nJob Desc... | [-0.06089491,0.02499839,-0.052517217,-0.074152... | [-0.05257533,-0.09478265,0.3872193,0.33953944,... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 7 | 4331857107 | CUISINIER(ERE) | AY Solutions - Centre d'incubation et d'insert... | Job Title:\nCUISINIER(ERE)\nJob Description:\n... | [-0.05383858,0.035472285,-0.037387066,-0.07056... | [0.08255422,0.20889747,0.5118412,0.21753241,0.... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 8 | 4317658799 | Sales Manager | MIIT GLOBAL | Job Title:\nSales Manager\nJob Description:\nA... | [-0.027104381,-0.027074695,-0.0060258587,-0.01... | [0.11635299,-0.0030421098,0.08128214,0.0868816... | Yes | Yes | 2025-12-03 16:14:06.243561 |
| 9 | 4318859733 | AV Engineer | Haystack | Job Title:\nAV Engineer\nJob Description:\nAbo... | [-0.041359562,-0.028956838,0.0007664749,0.0031... | [0.049155172,-0.007493671,0.12139886,0.0468305... | Yes | Yes | 2025-12-03 16:14:06.243561 |



📊 Full Vector Values (expand to view):

🔹 Record 1: Quantity Surveyor @ Artelia Vietnam
   ID: 4318422955

   📐 SBERT Embedding (4690 chars):
      [-0.06695009,0.0046474594,0.027306074,0.03017137,-0.120232,-0.0059182253,-0.043976583,0.0297666,-0.03882374,0.019348878,-0.05100315,-0.1495947,-0.009789393,0.04504226,-0.011575956,0.061756797,0.030343...
      ... (truncated, full length: 4690 characters)

   📐 Word2Vec Embedding (3551 chars):
      [0.032178283,-0.0045534386,0.1007853,0.100766845,-0.024244877,-0.09830581,0.15196925,0.058376405,0.06546201,0.13593107,0.1155943,-0.013743897,-0.04188189,0.08585754,-0.1291054,0.09318715,-0.044313066,...
      ... (truncated, full length: 3551 characters)
--------------------------------------------------------------------------------

🔹 Record 2: 🎥 Freelance Videographer Needed in Mumbai! 🎥 @ DIGITAL WEB LONDON
   ID: 4331851631

   📐 SBERT Embedding (4718 chars):
      [-7.551803e-05,-0.11403701,0.016413225,-0.05324966