## 1. Setup and Imports

In [None]:
# Install required packages (uncomment if needed)
# !pip install sentence-transformers langchain langchain-community python-docx pdfplumber numpy pandas matplotlib scikit-learn

In [None]:
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Import application modules
from app.parsers import ResumeParser, JobParser
from app.embeddings import EmbeddingService, SimilarityMatcher
from app.embeddings.similarity import compute_skill_overlap
from app.chains import ExplanationChain

import numpy as np
import pandas as pd

print("✅ All modules imported successfully!")

## 2. Sample Data

Let's create some sample resumes and job descriptions for demonstration.

In [None]:
# Sample Resume 1: Data Scientist
resume_1_text = """
John Smith
john.smith@email.com | (555) 123-4567

PROFESSIONAL SUMMARY
Experienced Data Scientist with 5+ years of expertise in machine learning, 
statistical analysis, and data visualization. Passionate about leveraging 
AI to solve complex business problems.

SKILLS
Python, R, SQL, TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy, 
Machine Learning, Deep Learning, NLP, Computer Vision, Statistics,
Data Visualization, Tableau, Power BI, AWS, Docker, Git

EXPERIENCE
Senior Data Scientist at TechCorp Inc.
2020 - Present
- Built ML models for customer churn prediction achieving 92% accuracy
- Developed NLP pipeline for sentiment analysis of customer feedback
- Led a team of 3 junior data scientists

Data Scientist at DataAnalytics Co.
2018 - 2020
- Created predictive models for sales forecasting
- Implemented recommendation systems using collaborative filtering
- Built ETL pipelines processing 10TB+ daily

EDUCATION
Master of Science in Computer Science
Stanford University, 2018

Bachelor of Science in Mathematics
UC Berkeley, 2016

CERTIFICATIONS
- AWS Certified Machine Learning Specialty
- Google Cloud Professional Data Engineer
"""

# Sample Resume 2: Software Engineer
resume_2_text = """
Sarah Johnson
sarah.j@email.com | (555) 987-6543

SUMMARY
Full-stack software engineer with 4 years of experience building 
scalable web applications. Expert in React, Node.js, and cloud services.

SKILLS
JavaScript, TypeScript, React, Angular, Node.js, Express, Python,
Django, PostgreSQL, MongoDB, Redis, AWS, Docker, Kubernetes,
CI/CD, Git, Agile, REST API, GraphQL

EXPERIENCE
Software Engineer at WebTech Solutions
2021 - Present
- Developed React-based dashboard serving 100K+ users
- Built microservices architecture using Node.js and Docker
- Reduced API response time by 40% through optimization

Junior Developer at StartupXYZ
2020 - 2021
- Built e-commerce platform using Django and React
- Implemented payment integration with Stripe API

EDUCATION
Bachelor of Science in Computer Science
MIT, 2020
"""

# Sample Resume 3: Marketing Analyst
resume_3_text = """
Michael Chen
m.chen@email.com | (555) 456-7890

PROFILE
Marketing analyst with strong data analysis skills and 3 years 
of experience in digital marketing and campaign optimization.

SKILLS
Excel, SQL, Google Analytics, Tableau, Python, R, Statistics,
A/B Testing, SEO, SEM, Social Media Marketing, Content Strategy,
Market Research, Campaign Analysis, Data Visualization

EXPERIENCE
Marketing Analyst at BrandCo
2022 - Present
- Analyzed marketing campaign performance across channels
- Built dashboards tracking KPIs for executive team
- Increased ROI by 25% through data-driven optimization

Marketing Coordinator at MediaAgency
2021 - 2022
- Managed social media accounts with 500K followers
- Conducted market research and competitor analysis

EDUCATION
Bachelor of Business Administration - Marketing
NYU Stern School of Business, 2021
"""

print("✅ Sample resumes created!")

In [None]:
# Sample Job Descriptions

job_1_text = """
Job Title: Senior Machine Learning Engineer
Company: AI Innovations Inc.
Location: San Francisco, CA (Hybrid)

About Us:
AI Innovations is a leading AI company developing cutting-edge ML solutions.

Responsibilities:
- Design and implement ML models for production systems
- Lead ML projects from research to deployment
- Mentor junior team members
- Collaborate with product teams on AI features

Requirements:
- 5+ years experience in machine learning
- Strong proficiency in Python, TensorFlow or PyTorch
- Experience with NLP or Computer Vision
- Knowledge of MLOps and model deployment
- Master's or PhD in CS, Statistics, or related field

Preferred:
- Experience with LLMs and Generative AI
- Publications in top ML conferences
- AWS or GCP certification

Benefits:
- Competitive salary ($180K - $250K)
- Stock options
- Health insurance
- Remote flexibility
"""

job_2_text = """
Job Title: Full Stack Developer
Company: TechStartup Co.
Location: Remote

We're looking for a passionate full-stack developer to join our team!

What You'll Do:
- Build and maintain web applications using React and Node.js
- Design RESTful APIs and database schemas
- Write clean, maintainable code with tests
- Participate in code reviews and agile ceremonies

Requirements:
- 3+ years of full-stack development experience
- Proficiency in JavaScript/TypeScript, React, Node.js
- Experience with SQL and NoSQL databases
- Familiarity with cloud services (AWS/GCP/Azure)
- Strong communication skills

Nice to Have:
- Experience with Docker and Kubernetes
- Knowledge of GraphQL
- Contributions to open source projects

Salary: $120K - $160K
"""

job_3_text = """
Job Title: Data Analyst
Company: RetailGiant Corp.
Location: New York, NY

About the Role:
Join our analytics team to drive data-informed decisions across the organization.

Key Responsibilities:
- Analyze large datasets to identify trends and insights
- Create dashboards and reports for stakeholders
- Support marketing and sales teams with data analysis
- Develop and maintain data pipelines

Qualifications:
- Bachelor's degree in Business, Statistics, or related field
- 2+ years of data analysis experience
- Expert in Excel and SQL
- Experience with visualization tools (Tableau, Power BI)
- Strong analytical and problem-solving skills

Preferred Skills:
- Python or R programming
- Experience in retail or e-commerce industry
- Knowledge of statistical analysis

Compensation: $80K - $110K + bonus
"""

print("✅ Sample job descriptions created!")

## 3. Parse Resumes and Job Descriptions

In [None]:
# Initialize parsers
resume_parser = ResumeParser()
job_parser = JobParser()

print("✅ Parsers initialized!")

In [None]:
# Parse resumes from text
resume_texts = [resume_1_text, resume_2_text, resume_3_text]
resumes = []

for i, text in enumerate(resume_texts):
    # Pass the resume text as in-memory bytes, as if it were an uploaded .txt file
    resume = resume_parser.parse(
        file_content=text.encode('utf-8'),
        file_name=f"resume_{i+1}.txt"
    )
    resumes.append(resume)
    print(f"\n📄 Resume {i+1}: {resume.name}")
    print(f"   Email: {resume.email}")
    print(f"   Skills: {', '.join(resume.skills[:10])}...")
    print(f"   Experience entries: {len(resume.experience)}")

print(f"\n✅ Parsed {len(resumes)} resumes!")

In [None]:
# Parse job descriptions
job_texts = [job_1_text, job_2_text, job_3_text]
jobs = []

for i, text in enumerate(job_texts):
    job = job_parser.parse(text=text)
    jobs.append(job)
    print(f"\n💼 Job {i+1}: {job.title}")
    print(f"   Company: {job.company}")
    print(f"   Required Skills: {', '.join(job.required_skills[:8])}...")
    print(f"   Experience: {job.required_experience}")

print(f"\n✅ Parsed {len(jobs)} job descriptions!")

## 4. Generate Embeddings

In [None]:
# Initialize embedding service with SentenceTransformers (local, no API key needed)
embedding_service = EmbeddingService(provider="sentence-transformers")

print(f"Provider: {embedding_service.provider}")
print(f"Embedding dimension: {embedding_service.embedding_dim}")

In [None]:
# Generate embeddings for resumes
print("Generating resume embeddings...")
resume_embeddings = embedding_service.embed_documents(resumes)
print(f"Resume embeddings shape: {resume_embeddings.shape}")

# Generate embeddings for jobs
print("\nGenerating job embeddings...")
job_embeddings = embedding_service.embed_documents(jobs)
print(f"Job embeddings shape: {job_embeddings.shape}")

print("\n✅ Embeddings generated!")

## 5. Compute Semantic Similarity

In [None]:
# Initialize similarity matcher
matcher = SimilarityMatcher(similarity_metric="cosine")

# Compute similarity matrix
similarity_matrix = matcher.compute_similarity_matrix(resume_embeddings, job_embeddings)

# Create a nice DataFrame to display
resume_names = [r.name or f"Resume {i+1}" for i, r in enumerate(resumes)]
job_titles = [j.title or f"Job {i+1}" for i, j in enumerate(jobs)]

similarity_df = pd.DataFrame(
    similarity_matrix,
    index=resume_names,
    columns=job_titles
)

print("📊 Similarity Matrix (Resume vs Job):")
print("="*60)
display(similarity_df.style.background_gradient(cmap='Blues').format("{:.2%}"))
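
The internals of `SimilarityMatcher` are not shown in this notebook. As a rough sketch of what a cosine similarity matrix involves, the snippet below normalizes each embedding row to unit length and takes a matrix product. The function name `cosine_similarity_matrix` and the toy vectors are illustrative assumptions, not part of the `app` package.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return an (n_a, n_b) matrix of cosine similarities between rows of a and b."""
    # Normalize each row to unit length; the floor of 1e-12 guards against zero vectors.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # The dot product of unit vectors is the cosine of the angle between them.
    return a_unit @ b_unit.T


# Toy 2-D "embeddings" (hypothetical values, for illustration only)
toy_resumes = np.array([[1.0, 0.0], [1.0, 1.0]])
toy_jobs = np.array([[2.0, 0.0], [0.0, 3.0]])

toy_scores = cosine_similarity_matrix(toy_resumes, toy_jobs)
print(toy_scores.round(3))
```

Because the rows are normalized first, vector length has no effect on the score: `[1, 0]` and `[2, 0]` point the same way and score 1.0, while `[1, 0]` and `[0, 3]` are orthogonal and score 0.0.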

In [None]:
# Match resumes to jobs
matches = matcher.match_resumes_to_jobs(
    resume_embeddings,
    job_embeddings,
    resumes,
    jobs,
    top_k=3,
    threshold=0.0
)

print("🎯 Top Matches for Each Resume:")
print("="*60)

for i, resume_matches in enumerate(matches):
    resume = resumes[i]
    print(f"\n📄 {resume.name or resume.file_name}")
    print("-" * 40)
    
    for match in resume_matches:
        job = jobs[match.job_index]
        score = match.similarity_score
        
        # Determine match quality
        if score >= 0.7:
            quality = "🟢 Excellent"
        elif score >= 0.5:
            quality = "🟡 Good"
        else:
            quality = "🔴 Low"
        
        print(f"  {quality} {job.title} ({score:.1%})")
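
The 0.7 and 0.5 cut-offs used above appear again in later cells. One possible way to keep them in a single place is a small helper like the sketch below. The name `match_quality` and the constant names are hypothetical, and the thresholds are simply the ones this notebook already uses, not calibrated values.

```python
# Thresholds copied from the notebook's own cells; adjust them in one place if needed.
EXCELLENT_THRESHOLD = 0.7
GOOD_THRESHOLD = 0.5


def match_quality(score: float) -> str:
    """Map a similarity score to a coarse quality label."""
    if score >= EXCELLENT_THRESHOLD:
        return "excellent"
    if score >= GOOD_THRESHOLD:
        return "good"
    return "low"


# Example: label a few hypothetical scores
for demo_score in (0.82, 0.55, 0.31):
    print(f"{demo_score:.0%} -> {match_quality(demo_score)}")
```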

## 6. Skill Overlap Analysis

In [None]:
# Analyze skill overlap between best matches
print("🔍 Skill Gap Analysis:")
print("="*60)

for i, resume in enumerate(resumes):
    best_match = matches[i][0]  # Top match for this resume
    job = jobs[best_match.job_index]
    
    overlap = compute_skill_overlap(resume.skills, job.required_skills)
    
    print(f"\n📄 {resume.name} → 💼 {job.title}")
    print(f"   Match Score: {best_match.similarity_score:.1%}")
    print(f"   Skill Coverage: {overlap['coverage_percentage']:.0f}% ({overlap['matched_count']}/{overlap['total_required']})")
    
    if overlap['matching_skills']:
        print(f"   ✅ Matching: {', '.join(list(overlap['matching_skills'])[:5])}")
    
    if overlap['missing_skills']:
        print(f"   ❌ Missing: {', '.join(list(overlap['missing_skills'])[:5])}")
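
The implementation of `compute_skill_overlap` lives in `app.embeddings.similarity` and is not shown here. Judging only from the keys this notebook reads from its result, a simple set-based version might look like the sketch below. The function name `skill_overlap_sketch` and the case-insensitive exact matching are assumptions; the real function may normalize or fuzzy-match skills differently.

```python
def skill_overlap_sketch(candidate_skills, required_skills):
    """Compare two skill lists using case-insensitive exact matching."""
    candidate = {s.strip().lower() for s in candidate_skills}
    required = {s.strip().lower() for s in required_skills}

    matching = candidate & required   # required skills the candidate has
    missing = required - candidate    # required skills the candidate lacks

    # Avoid division by zero when a job lists no required skills.
    coverage = 100.0 * len(matching) / len(required) if required else 0.0

    return {
        "matching_skills": sorted(matching),
        "missing_skills": sorted(missing),
        "matched_count": len(matching),
        "total_required": len(required),
        "coverage_percentage": coverage,
    }


# Hypothetical skill lists, for illustration only
demo = skill_overlap_sketch(
    ["Python", "SQL", "Tableau"],
    ["python", "sql", "Excel", "Power BI"],
)
print(demo)
```

Exact matching treats "Power BI" and "PowerBI" as different skills, which is one reason a production matcher may need alias handling or embedding-based comparison.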

## 7. AI-Powered Match Explanations (Optional)

This section uses LangChain to generate detailed explanations. It requires one of the following:
- Ollama running locally with the `llama3.2` model
- A Google API key for Gemini, set in the `GOOGLE_API_KEY` environment variable

If neither is available, the notebook prints a short rule-based summary instead.

In [None]:
# Try to initialize explanation chain
explanation_chain = None

try:
    # Try Ollama first (local)
    explanation_chain = ExplanationChain(provider="ollama")
    print("✅ Using Ollama for explanations")
except Exception as e:
    print(f"⚠️ Ollama not available: {e}")
    
    # Try Google as fallback
    import os
    if os.getenv("GOOGLE_API_KEY"):
        try:
            explanation_chain = ExplanationChain(provider="google")
            print("✅ Using Google Gemini for explanations")
        except Exception as e:
            print(f"⚠️ Google AI not available: {e}")
    else:
        print("ℹ️ Set GOOGLE_API_KEY environment variable to enable AI explanations")

In [None]:
# Generate explanation for best match
if explanation_chain:
    print("🤖 Generating AI-Powered Match Explanation...")
    print("="*60)
    
    # Explain the top-ranked job for the first resume
    best_resume_idx = 0  # John Smith (Data Scientist)
    best_match = matches[best_resume_idx][0]
    
    resume = resumes[best_resume_idx]
    job = jobs[best_match.job_index]
    
    print(f"\nAnalyzing: {resume.name} → {job.title}")
    print(f"Match Score: {best_match.similarity_score:.1%}")
    print()
    
    # Compute skill overlap for context
    skill_overlap = compute_skill_overlap(resume.skills, job.required_skills)
    
    # Generate explanation
    explanation = explanation_chain.explain_match(
        resume, job, 
        best_match.similarity_score,
        skill_overlap
    )
    
    print(explanation.to_text())
else:
    print("ℹ️ AI explanations not available. Using quick summary instead.")
    print()
    
    # Generate simple summaries
    for i, resume in enumerate(resumes):
        best_match = matches[i][0]
        job = jobs[best_match.job_index]
        score = best_match.similarity_score
        
        resume_skills = set(s.lower() for s in resume.skills)
        job_skills = set(s.lower() for s in job.required_skills)
        matching = resume_skills & job_skills
        
        if score >= 0.7:
            fit = "excellent"
        elif score >= 0.5:
            fit = "good"
        else:
            fit = "low"
        
        print(f"📄 {resume.name} → {job.title}")
        print(f"   {fit.title()} match ({score:.0%}) with {len(matching)} matching skills")
        if matching:
            print(f"   Key matches: {', '.join(list(matching)[:4])}")
        print()

## 8. Visualization

In [None]:
import matplotlib.pyplot as plt

# Create heatmap visualization
fig, ax = plt.subplots(figsize=(10, 6))

im = ax.imshow(similarity_matrix, cmap='Blues', aspect='auto', vmin=0, vmax=1)

# Labels
ax.set_xticks(range(len(job_titles)))
ax.set_yticks(range(len(resume_names)))
ax.set_xticklabels(job_titles, rotation=45, ha='right', fontsize=10)
ax.set_yticklabels(resume_names, fontsize=10)

# Add values on heatmap
for i in range(len(resume_names)):
    for j in range(len(job_titles)):
        score = similarity_matrix[i, j]
        color = 'white' if score > 0.5 else 'black'
        ax.text(j, i, f'{score:.0%}', ha='center', va='center', color=color, fontsize=11)

ax.set_xlabel('Job Positions', fontsize=12)
ax.set_ylabel('Candidates', fontsize=12)
ax.set_title('🎯 Resume-Job Similarity Matrix', fontsize=14, fontweight='bold')

# Colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('Similarity Score', fontsize=11)

plt.tight_layout()
plt.show()

In [None]:
# Bar chart of best matches
fig, ax = plt.subplots(figsize=(10, 5))

# Prepare data
candidates = []
best_job = []
scores = []

for i, resume in enumerate(resumes):
    best_match = matches[i][0]
    job = jobs[best_match.job_index]
    
    candidates.append(resume.name or f"Resume {i+1}")
    best_job.append(job.title or f"Job {best_match.job_index+1}")
    scores.append(best_match.similarity_score)

colors = ['#28a745' if s >= 0.7 else '#ffc107' if s >= 0.5 else '#dc3545' for s in scores]

bars = ax.barh(candidates, scores, color=colors)

# Add job labels on bars
for bar, job_title, score in zip(bars, best_job, scores):
    ax.text(bar.get_width() + 0.02, bar.get_y() + bar.get_height()/2,
            f'{job_title} ({score:.0%})', va='center', fontsize=9)

ax.set_xlabel('Match Score', fontsize=12)
ax.set_title('🏆 Best Job Match for Each Candidate', fontsize=14, fontweight='bold')
ax.set_xlim(0, 1.3)
ax.axvline(0.7, color='green', linestyle='--', alpha=0.5, label='Excellent (70%+)')
ax.axvline(0.5, color='orange', linestyle='--', alpha=0.5, label='Good (50%+)')
ax.legend(loc='lower right')

plt.tight_layout()
plt.show()

## 9. Summary

This notebook demonstrated the complete workflow of the Smart Resume & Job Matcher:

1. **Parsing**: Extracted structured information from resumes and job descriptions
2. **Embedding**: Generated semantic vector representations using SentenceTransformers
3. **Matching**: Computed cosine similarity to find best matches
4. **Analysis**: Analyzed skill gaps and coverage
5. **Explanation**: Generated AI-powered match explanations (when LLM available)
6. **Visualization**: Created heatmaps and charts to visualize results

### Key Findings:
- **John Smith** (Data Scientist) is an excellent match for the Senior Machine Learning Engineer role
- **Sarah Johnson** (Software Engineer) matches well with the Full Stack Developer position
- **Michael Chen** (Marketing Analyst) is best suited to the Data Analyst role

### Next Steps:
- Run the Streamlit app for an interactive UI: `streamlit run app/main.py`
- Upload your own resumes and job descriptions
- Configure different embedding providers (Ollama, Google) in `.env`

In [None]:
print("🎉 Demo Complete!")
print("\nTo run the Streamlit app, execute:")
print("  streamlit run app/main.py")