# ResumeRecommendationReview

- Author: [Ilgyun Jeong](https://github.com/johnny9210)
- Design:
- Peer Review:
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)

## Overview

The ResumeRecommendationReview system is a comprehensive solution designed to simplify and enhance the job application process for individuals seeking corporate positions. The system is divided into two main components, each tailored to address key challenges faced by job seekers:

1) Company Recommendation
Using advanced matching algorithms, the system analyzes a user’s uploaded resume and compares it with job postings on LinkedIn. Based on this analysis, it identifies and recommends companies that align closely with the candidate’s qualifications, skills, and career aspirations.

2) Resume Evaluation and Enhancement
For the recommended companies, the system conducts a detailed evaluation of the user’s resume. It highlights strengths, identifies areas for improvement, and provides actionable suggestions for tailoring the resume to better fit the expectations of target roles. This ensures candidates can present their qualifications in the most impactful way possible.

By integrating these two components, the ResumeRecommendationReview system streamlines the job application journey, empowering users to:

- Discover job opportunities that best match their unique profile.
- Optimize their resumes for maximum impact, increasing their chances of securing interviews and job offers.

**Key Features**:

- **CV/Resume Upload**: 
  Users begin by uploading their existing CV or resume in a supported file format (e.g., PDF)
  The system extracts relevant keywords, experiences, and skill sets to build a user profile.

- **Job Matching with LinkedIn Postings**: 
  The platform automatically scans LinkedIn job listings (and potentially other job boards) for roles that align with the user’s skill set and career interests.
  A matching algorithm ranks and recommends a list of the most relevant companies and positions for the candidate to consider.

- **Comparison & Evaluation (LLM-as-a-Judge)**  
  The system leverages a Large Language Model (LLM) to analyze the uploaded resume and specific job requirements. 
  It evaluates the alignment between the user's experience and the job description, identifying strengths,  skill gaps, and areas in need of improvement.
  Additionally, the system evaluates the recommendation performance using **cosine similarity** to measure the semantic alignment and **NDCG (Normalized Discounted Cumulative Gain)** to assess the ranking quality of the recommendations.
  
- **Automated Resume Enhancement**: 
  Based on the LLM evaluation, the system provides a detailed report highlighting sections that need modification.
  Suggested edits may include restructuring experience points, emphasizing relevant skills, or adding keywords that match the job posting’s expectations.


### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Data Preparation and Preprocessing](#data-preparation-and-preprocessing)
- [Setting Up ChromaDB and Storing Data](#Setting-Up-ChromaDB-and-Storing-Data)
- [Company Recommendation System](#Company-Recommendation-System)
- [LLM-Based Resume Evaluation System](#LLM-Based-Resume-Evaluation-System)
- [LLM-Based Resume Revise System](#LLM-Based-Resume-Revise-System)

### References



---


## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**

- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.
- You can checkout the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.


In [2]:
%%capture --no-stderr
%pip install langchain-opentutorial


[notice] A new release of pip available: 22.3.1 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [3]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "chromadb",
        "langchain_chroma",
        "langchain_openai",
        "PyMuPDF",
        "pydantic",
        "pandas",
        "kagglehub",
        "langchain_community",
        "numpy",
    ],
    verbose=False,
    upgrade=False,
)

In [4]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "ResumeRecommendationReview",
        "UPSTAGE_API_KEY": "",
    }
)

Environment variables have been set successfully.


In [5]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Data Preparation and Preprocessing

This section covers the data preparation and preprocessing steps required for the Resume Recommendation System. The key stages include:

- **Processing resume data (PDF)**  
- **Processing LinkedIn job postings**  

For the LinkedIn job postings data, this tutorial uses the dataset available on Kaggle: [arshkon/linkedin-job-postings](https://www.kaggle.com/arshkon/linkedin-job-postings).  

Using the raw data directly to build the recommendation system may lead to suboptimal performance. Therefore, the data is refined and preprocessed to focus specifically on recruitment-related information to enhance the accuracy and relevance of the recommendations.

Install and Import Required Libraries

In [6]:
# Import required libraries
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd
import fitz  # PyMuPDF
import re
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import Dict, List
import kagglehub
import json

  from .autonotebook import tqdm as notebook_tqdm


Text Splitting Configuration

Set up configurations to divide the extracted text into manageable sizes, ensuring smooth processing:

Parameter Descriptions:
- `chunk_size`: The maximum length of each text chunk, ensuring the text is divided into manageable sections.
- `chunk_overlap`: The length of overlapping text between chunks, providing continuity and context for downstream tasks.
- `separators`: The delimiters used to split the text, such as line breaks or punctuation, to optimize the splitting process.

In [7]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, separators=["\n\n", "\n", " ", ""]
)

Defining the Pydantic Model

In this section, we define a structured data model using Pydantic, which ensures validation and consistency in the data extracted from resumes. This model is critical for organizing key sections of a resume into a format that the system can analyze effectively.

In [8]:
# Define the Pydantic model
class ResumeSection(BaseModel):
    skills: List[str] = Field(description="List of job-related technical skills")
    work_experience: List[Dict[str, str]] = Field(
        description="Work experience (role, description)"
    )
    projects: List[Dict[str, str]] = Field(
        description="Project experience (name, description)"
    )
    achievements: List[str] = Field(
        description="List of major achievements and accomplishments"
    )
    education: List[Dict[str, str]] = Field(
        description="Education information (name, description)"
    )


# Configure the PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=ResumeSection)

Analyzing Interests in Resumes

The `analyze_interests` function is designed to extract and summarize the key areas of interest and research focus from a resume. It uses a **Large Language Model (LLM)** to process the resume text and provide a concise summary, helping to identify the candidate's academic and professional interests effectively.

Purpose
- Extracts **main areas of interest** and **research focus** from the provided resume text.
- Generates a **brief summary** (2-3 sentences) that highlights the candidate's academic and career patterns.
- Focuses solely on interests and research areas to provide targeted insights.

In [9]:
llm = ChatOpenAI(model="gpt-4o", temperature=0.2)


def analyze_interests(resume_text: str, llm) -> str:
    """Analyzes the complete resume text to identify key interest areas."""
    interest_prompt = """Analysis this resume text and provide a brief summary (2-3 sentences) 
    of the person's main areas of interest and research focus. Focus on their academic interests, 
    research topics, and career patterns.

    Resume Text:
    {text}

    Provide a concise summary focusing ONLY on their interests and research areas."""

    messages = [{"role": "user", "content": interest_prompt.format(text=resume_text)}]
    response = llm.invoke(messages)
    return response.content.strip()

Analyzing Career Fit in Resumes

The `analyze_career_fit` function evaluates a candidate's resume to recommend the most suitable job roles along with their respective fit scores. By leveraging a **Large Language Model (LLM)**, this function identifies key areas of expertise and rates the candidate's suitability for various technical positions.

Purpose
- Recommends **job roles** based on the candidate's skills, research background, and career trajectory.
- Assigns a **fit score** (0.0 to 1.0) for each role, reflecting the candidate's alignment with the position.

In [10]:
def analyze_career_fit(resume_text: str, llm) -> Dict[str, float]:
    """Analyzes the resume to recommend suitable job roles and their fit scores."""
    career_prompt = """You are an expert technical recruiter. Analyze this resume and recommend the most suitable job roles.
    Focus on the candidate's expertise, research background, and career trajectory.
    
    Resume Text:
    {text}
    
    Based on their background, rate the candidate's fit (0.0 to 1.0) for different technical roles.
    Consider:
    - Technical expertise and depth
    - Research contributions
    - Project complexity
    - Educational background
    - Career progression
    
    Return ONLY a JSON object with role-fit pairs, like:
    {{"Research Scientist": 0.95, "Machine Learning Engineer": 0.9, "Algorithm Engineer": 0.85}}
    
    Include only roles with fit score > 0.7. Focus on senior/research level positions if appropriate."""

    messages = [{"role": "user", "content": career_prompt.format(text=resume_text)}]
    response = llm.invoke(messages)

    try:
        return json.loads(response.content.strip())
    except json.JSONDecodeError:
        return {}

Processing Resumes to Extract Key Job-Related Information

The `process_resume` function analyzes a resume file, extracting and processing key information relevant to job applications. It combines **text extraction**, **interest analysis**, and **career fit evaluation** to generate structured, weighted insights from the resume.

### Function Overview

Purpose
- Extract **key job-related information** from resumes in PDF format.
- Use **LLM analysis** to evaluate the candidate's skills, experience, projects, achievements, and education.
- Assign **weights** to each section based on relevance to the target job role.

In [11]:
def process_resume(file_path, target_job_title=None):
    """Analyze a resume to extract key job-related information."""
    doc = fitz.open(file_path)
    resume_text = ""
    for page in doc:
        resume_text += page.get_text()

    # Get interest summary and career fit analysis
    interest_summary = analyze_interests(resume_text, llm)
    career_fit = analyze_career_fit(resume_text, llm)

    prompt_template = """You are a professional resume analyst specializing in research and technical roles.
    Analyze the resume in detail, focusing on the candidate's expertise level and research background.
    
    Target Job Title: {target_job_title}
    
    Resume Content:
    {resume_text}
    
    Extract the information in the following format:
    {format_instructions}
    
    Focus on extracting information most relevant to research and technical roles.
    Pay special attention to:
    - Research contributions and impact
    - Technical depth in each area
    - Project complexity and leadership
    - Academic achievements and specializations"""

    # Create the prompt
    prompt = ChatPromptTemplate.from_template(prompt_template)

    # Format the messages
    messages = prompt.format_messages(
        target_job_title=target_job_title if target_job_title else "Not specified",
        resume_text=resume_text,
        format_instructions=parser.get_format_instructions(),
    )

    # Perform LLM analysis
    response = llm.invoke(messages)

    try:
        parsed_sections = parser.parse(response.content)
        print("Resume analysis completed.")
    except Exception as e:
        print(f"Parsing error: {e}")
        print(f"LLM response: {response.content}")
        return []

    # Apply weights based on job relevance
    weighted_content = []

    # Skills (Weight: 0.25)
    if parsed_sections.skills:
        skills_text = " ".join(parsed_sections.skills)
        weighted_content.append((skills_text, 0.25))

    # Work Experience (Weight: 0.3)
    if parsed_sections.work_experience:
        experience_text = "\n".join(
            [
                f"{exp.get('role', '')}: {exp.get('description', '')}"
                for exp in parsed_sections.work_experience
            ]
        )
        weighted_content.append((experience_text, 0.3))

    # Projects (Weight: 0.2)
    if parsed_sections.projects:
        projects_text = "\n".join(
            [
                f"{proj.get('name', '')}: {proj.get('description', '')}"
                for proj in parsed_sections.projects
            ]
        )
        weighted_content.append((projects_text, 0.2))

    # Achievements (Weight: 0.1)
    if parsed_sections.achievements:
        achievements_text = " ".join(parsed_sections.achievements)
        weighted_content.append((achievements_text, 0.1))

    # Education (Weight: 0.05)
    if parsed_sections.education:
        education_text = "\n".join(
            [
                f"{edu.get('name', '')}: {edu.get('description', '')}"
                for edu in parsed_sections.education
            ]
        )
        weighted_content.append((education_text, 0.05))

    # Add interest summary and career fit (combined weight: 0.1)
    if interest_summary or career_fit:
        analysis_text = (
            "Research Interests and Focus Areas: " + interest_summary + "\n\n"
        )
        if career_fit:
            analysis_text += "Recommended Roles:\n"
            for role, score in sorted(
                career_fit.items(), key=lambda x: x[1], reverse=True
            ):
                analysis_text += f"- {role}: {score:.2f}\n"

        weighted_content.append((analysis_text, 0.1))

        # Adjust other weights to maintain total of 1.0
        weighted_content = [
            (content, weight * 0.9) for content, weight in weighted_content[:-1]
        ] + [weighted_content[-1]]

    # Generate chunks for each section
    processed_chunks = []
    for content, weight in weighted_content:
        if content.strip():  # Process only non-empty strings
            chunks = text_splitter.split_text(content)
            processed_chunks.extend([(chunk, weight) for chunk in chunks])

    print(f"Number of extracted chunks: {len(processed_chunks)}")
    print("\nCareer Analysis Summary:")
    print("------------------------")
    print("Interests:", interest_summary)
    print("\nRecommended Roles:")
    for role, score in sorted(career_fit.items(), key=lambda x: x[1], reverse=True):
        print(f"- {role}: {score:.2f}")

    return processed_chunks

Resume Processing Example

Here's an example of how to use the `process_resume` function to extract structured data from a resume:

In [11]:
process_resume("../data/joannadrummond-cv.pdf")

Resume analysis completed.
Number of extracted chunks: 8

Career Analysis Summary:
------------------------
Interests: Joanna Drummond's primary academic interests and research focus lie in computer science, particularly in algorithms, artificial intelligence, and game theory. Her research has extensively explored stable matching problems, preference elicitation, and multi-agent systems, with a specific emphasis on developing algorithms for stable and approximately stable matches using partial information and multi-attribute preferences. Additionally, she has investigated the application of machine learning techniques in educational contexts, such as classifying student engagement and understanding in intelligent tutoring systems.

Recommended Roles:


[('Python Java Julia R Matlab Unix Shell Scripting (bash) Linux Mac OSX Windows LATEX Weka',
  0.225),
 ('Research Intern: Microsoft Research, with Ian Kash and Peter Key, May 2016 to August 2016. Investigated simple pricing for cloud computing.\nResearch Assistant: University of Toronto, Department of Computer Science, August 2011 to Present. Investigated Bayes-Nash and ex-post equilibria for matching games with imperfect information, stable and approximately stable matching using multi-attribute preference information, and elicitation schemes using multi-attribute based queries.\nResearch Assistant: University of Pittsburgh Department of Computer Science, April 2008 to May 2011. Investigated the impact of different training set populations on accurately classifying student uncertainty while using a spoken intelligent physics tutor.\nDirected Study: University of Pittsburgh Department of Computer Science, September 2010 to December 2010. Analyzed and proved properties about an algorit

LinkedIn Data Preprocessing

This step involves loading job posting data and extracting only the necessary details. The dataset used for this tutorial is sourced from **Kaggle**: [arshkon/linkedin-job-postings](https://www.kaggle.com/arshkon/linkedin-job-postings).

- `company_name`: The name of the company offering the job posting.
- `title`: The title of the job being offered.
- `description`: A detailed description of the job, including responsibilities, qualifications, and expectations.
- `max_salary`: The maximum salary offered for the position.
- `med_salary`: The median salary for the position, providing an average range for the offered pay.
- `min_salary`: The minimum salary offered for the position.
- `skills_desc`: A list or summary of the required or preferred skills for the position.
- `work_type`: The type of work arrangement, such as full-time, part-time, remote, or hybrid.

Purpose of These Columns
These selected columns are essential for processing job posting data. They allow the system to:

- Extract relevant metadata for recommendation and filtering.
- Match resumes to job postings based on skills, and job details.
- Provide users with clear and actionable job-related information.

In [23]:
path = kagglehub.dataset_download("arshkon/linkedin-job-postings", path="postings.csv")

df = pd.read_csv(path)

selected_columns = [
    "company_name",
    "title",
    "description",
    "max_salary",
    "med_salary",
    "min_salary",
    "skills_desc",
    "work_type",
]
linkedin_df = df[selected_columns].copy()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 14: invalid start byte

Text Cleaning Function

Here’s a utility function designed to clean and preprocess text data for better consistency and quality:

If there are any `null` values in the company name field, those entries are excluded. (While other fields may also have `null` values, this step focuses only on excluding records with `null` in the company name.)

In [24]:
def clean_text(text):
    if pd.isna(text):
        return ""
    # Remove HTML tags
    text = re.sub(r"<[^>]+>", "", str(text))
    # Remove consecutive whitespace
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Remove rows where company_name is empty
linkedin_df = linkedin_df.dropna(subset=["company_name"])
# Alternative using boolean indexing:
# linkedin_df = linkedin_df[linkedin_df['company_name'].notna()]

# Clean text data
linkedin_df["description"] = linkedin_df["description"].apply(clean_text)
linkedin_df["skills_desc"] = linkedin_df["skills_desc"].apply(clean_text)
linkedin_df["title"] = linkedin_df["title"].apply(clean_text)

# Process salary information
for col in ["max_salary", "med_salary", "min_salary"]:
    linkedin_df[col] = pd.to_numeric(linkedin_df[col], errors="coerce")

# Handle missing values
linkedin_df["work_type"] = linkedin_df["work_type"].fillna("Not specified")

NameError: name 'linkedin_df' is not defined

Processing Job Postings Data

The `process_job_postings` function integrates and processes job information from a LinkedIn dataset to create structured documents for analysis or recommendation purposes.

This function takes a DataFrame of LinkedIn job postings and processes each entry into a standardized format, combining relevant details like company name, job title, required skills, and salary information.

In [26]:
def process_job_postings(linkedin_df):
    """Process and integrate job information"""
    job_documents = []

    # Integrate information for each job
    for _, row in linkedin_df.iterrows():
        # Format salary information
        salary_info = "No salary information"
        if pd.notna(row["min_salary"]) and pd.notna(row["max_salary"]):
            salary_info = f"{row['min_salary']:,.0f} - {row['max_salary']:,.0f}"
        elif pd.notna(row["med_salary"]):
            salary_info = f"Average {row['med_salary']:,.0f}"

        # Integrate job information
        job_text = f"""
        Company: {row['company_name']}
        Position: {row['title']}
        Work Type: {row['work_type']}
        Salary: {salary_info}
        
        Required Skills:
        {row['skills_desc']}
        
        Job Description:
        {row['description']}
        """

        # Store with metadata
        job_documents.append(
            {
                "content": job_text,
                "metadata": {
                    "company": row["company_name"],
                    "title": row["title"],
                    "work_type": row["work_type"],
                    "salary": salary_info,
                    "skills": row["skills_desc"],
                },
            }
        )

    return job_documents


# Usage example
job_documents = process_job_postings(linkedin_df)

NameError: name 'linkedin_df' is not defined

## Setting Up ChromaDB and Storing Data

Using ChromaDB for Storing and Retrieving Resume and Job Posting Data
In this section, we will explore how to use ChromaDB to store resume and job posting data as vector representations and perform similarity-based searches.

What is `ChromaDB`?

`ChromaDB` is a vector database that allows text data to be stored as embeddings, enabling efficient similarity-based searches. In our Resume Recommendation System, ChromaDB is used for the following purposes:

- Vectorizing Text: Converting resume and job posting text into vector representations.
- Efficient Similarity Search: Performing fast searches based on the similarity of embeddings.
- Metadata-Based Search and Filtering: Enhancing search results with filters like job title, or company name.


Setup Steps
Preparing Required Libraries

Before starting, import the necessary libraries:

Roles of Each Library:

- `langchain_community.vectorstores`: Provides integration with ChromaDB.
- `langchain_openai`: Enables the use of OpenAI embedding models.
- `chromadb`: Provides vector database functionality.

In [27]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
import chromadb

Initializing ChromaDB

Set up ChromaDB and create collections:

Why Use PersistentClient?
- `Permanent Data Storage`: Ensures that data is not lost when the application or session ends.
- `Data Persistence Across Sessions`: Allows the system to retain data for use in future queries without requiring re-upload or re-processing.
- `Ease of Backup and Recovery`: Provides a reliable way to save and restore data for robustness and fault tolerance.

In [28]:
client = chromadb.PersistentClient(path="../data/chromadb")

# Create separate collections for resumes and job postings
resume_collection = client.create_collection("resumes")
job_collection = client.create_collection("jobs")

Storing Data
This step involves saving resume and job posting data into ChromaDB for efficient querying and management.
Origin data has too many data, so we use only 500 data

In [29]:
# Prepare resume data
resume_file_path = "../data/joannadrummond-cv.pdf"  # Path to the resume PDF file
resume_chunks = process_resume(
    resume_file_path
)  # Using the previously defined process_resume function

resume_texts = [chunk[0] for chunk in resume_chunks]
resume_metadatas = [
    {"source": "resume", "type": "text", "weight": chunk[1]} for chunk in resume_chunks
]

resume_ids = [f"resume_chunk_{i}" for i in range(len(resume_chunks))]

# origin data has too many data, so we use only 500 data
job_documents_ = job_documents[:500]

# Prepare job posting data (same as before)
job_texts = [doc["content"] for doc in job_documents_]
job_metadatas = [doc["metadata"] for doc in job_documents_]
job_ids = [f"job_{i}" for i in range(len(job_documents_))]

# Generate and store embeddings
embeddings = OpenAIEmbeddings()

# Resume embeddings
resume_embeddings = embeddings.embed_documents(resume_texts)
resume_collection.add(
    embeddings=resume_embeddings,
    documents=resume_texts,
    metadatas=resume_metadatas,
    ids=resume_ids,
)

# Job posting embeddings
job_embeddings = embeddings.embed_documents(job_texts)
job_collection.add(
    embeddings=job_embeddings, documents=job_texts, metadatas=job_metadatas, ids=job_ids
)

Resume analysis completed.
Number of extracted chunks: 8

Career Analysis Summary:
------------------------
Interests: Joanna Drummond's main areas of interest and research focus are in computer science, particularly in algorithms, artificial intelligence, and game theory. Her research has concentrated on stable matching problems, preference elicitation, and decision-making under uncertainty, with applications in multi-agent systems and educational technologies. She has also explored topics related to dialogue systems and student engagement in educational settings.

Recommended Roles:


NameError: name 'job_documents' is not defined

Example of Job_documents_

In [30]:
job_documents_[0]

NameError: name 'job_documents_' is not defined

## Company Recommendation System

This section focuses on recommending companies that align with the candidate's resume and evaluates the recommendations using two key metrics:

1. **Cosine Similarity for Recommendation Evaluation**:  
   - Measures the similarity between the candidate's resume and the job posting.  
   - A higher cosine similarity score indicates a stronger match between the candidate's profile and the company's job requirements.

2. **NDCG (Normalized Discounted Cumulative Gain) for Recommendation Evaluation**:  
   - Assesses the quality of the ranking of recommended companies.  
   - A higher NDCG score signifies that the most relevant companies appear at the top of the recommendation list, reflecting better ranking performance.

### Understanding the Scores
- **High Scores**:  
   - Indicate a strong alignment between the resume and the recommended companies (Cosine Similarity).  
   - Demonstrate that the ranking system effectively prioritizes the most relevant companies (NDCG).  
- **Low Scores**:  
   - Suggest weaker matches between the resume and job postings or suboptimal ranking of recommendations.  

The goal is to achieve high scores in both metrics, ensuring accurate and effective company recommendations for the candidate.

Job Recommendation System with Weighted Similarity Search

This implementation utilizes a **Job Recommendation System** to match resumes with the most relevant job postings. By combining **cosine similarity** and **weighted scoring**, the system ensures accurate and tailored recommendations.


---
- **Personalized Matching**: Matches resumes to job postings with high accuracy.
- **Flexible Scoring**: Incorporates weighted factors to prioritize specific job attributes.
- **Enhanced Readability**: Formats job descriptions for easy review.

In [31]:
import numpy as np
from typing import List, Dict, Union

# Retrieve resume and job posting collections
resume_collection = client.get_collection("resumes")
job_collection = client.get_collection("jobs")

# Retrieve stored resume data from ChromaDB
resume_results = resume_collection.get(
    include=["documents", "metadatas"]
)  # Use the get() method to fetch all data

# Combine resume texts
full_resume_text = " ".join(resume_results["documents"])

# Configure embeddings
embeddings = OpenAIEmbeddings()

# Convert the resume text into a query vector
query_embedding = embeddings.embed_query(full_resume_text)

# Use ChromaDB to search for the top 5 most similar job postings
job_results = job_collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    include=["documents", "metadatas", "distances"],
)

# List to store recommended jobs
recommended_jobs = []


class JobRecommender:
    def __init__(self, resume_collection, job_collection):
        self.resume_collection = resume_collection
        self.job_collection = job_collection
        self.embeddings = OpenAIEmbeddings()

    def get_resume_text(self) -> str:
        """Get combined resume text from collection"""
        resume_results = self.resume_collection.get(include=["documents", "metadatas"])
        return " ".join(resume_results["documents"])

    def get_query_embedding(self, text: str) -> List[float]:
        """Convert text to embedding vector"""
        return self.embeddings.embed_query(text)

    def weighted_similarity_search(
        self, query_embedding: List[float], method: str = "cosine", n_results: int = 5
    ) -> List[Dict]:
        """
        Search jobs using weighted similarity

        Args:
            query_embedding: The query embedding vector
            method: Similarity method ('cosine' or 'distance')
            n_results: Number of results to return
        """
        include_params = ["documents", "metadatas"]
        include_params.append("embeddings" if method == "cosine" else "distances")

        results = self.job_collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results * 2,  # Get more results for reranking
            include=include_params,
        )

        weighted_results = []
        for i in range(len(results["documents"][0])):
            weight = results["metadatas"][0][i].get("weight", 1.0)

            if method == "cosine":
                doc_embedding = results["embeddings"][0][i]
                similarity = np.dot(query_embedding, doc_embedding) / (
                    np.linalg.norm(query_embedding) * np.linalg.norm(doc_embedding)
                )
            else:  # distance
                distance = results["distances"][0][i]
                similarity = 1 - distance

            weighted_score = similarity * weight
            job_desc = self._clean_job_description(results["documents"][0][i])

            # Ensure consistent dictionary structure with search_jobs_by_distance
            weighted_results.append(
                {
                    "company": results["metadatas"][0][i].get("company", "Unknown"),
                    "title": results["metadatas"][0][i].get("title", "Unknown"),
                    "description": job_desc,
                    "similarity": weighted_score,  # Use weighted_score as the similarity
                    "metadata": results["metadatas"][0][i],
                }
            )

        # Sort by weighted score and get top results
        weighted_results.sort(key=lambda x: x["similarity"], reverse=True)
        return weighted_results[:n_results]

    def _clean_job_description(self, description: str) -> str:
        """Clean job description text"""
        return description.strip().replace("\n\n", "\n")

    def print_recommendations(self, recommendations: List[Dict]) -> List[Dict]:
        """Print job recommendations and return the results"""
        print("\n=== Similar Job Posting Search Results ===")  # Changed from Korean
        results = []

        for i, job in enumerate(recommendations, 1):
            print(f"\n\nJob Posting #{i}")
            print("=" * 80)
            print(f"Company: {job['company']}")
            print(f"Position: {job['title']}")
            print(f"Similarity Score: {job['similarity']:.2f}")
            print("\n[Job Description]")

            desc_lines = [
                line.strip() for line in job["description"].split("\n") if line.strip()
            ]
            for line in desc_lines:
                print(line)
            print("-" * 80)

            # Add results to list  # Changed from Korean
            results.append(
                {
                    "company": job["company"],
                    "title": job["title"],
                    "description": job["description"],
                    "similarity": job["similarity"],
                }
            )

        return results


# Execution section modification  # Changed from Korean
# Initialize collections
resume_collection = client.get_collection("resumes")
job_collection = client.get_collection("jobs")

# Create recommender instance
recommender = JobRecommender(resume_collection, job_collection)

# Get resume text and create query embedding
resume_text = recommender.get_resume_text()
query_embedding = recommender.get_query_embedding(resume_text)

# Get recommendations using different methods
weighted_recommendations = recommender.weighted_similarity_search(
    query_embedding, method="cosine"
)

# Print results and store them
print("\n=== Weighted Recommendations ===")
recommended_jobs = recommender.print_recommendations(weighted_recommendations)


=== Weighted Recommendations ===

=== Similar Job Posting Search Results ===


Resume and Job Recommendation Evaluation System

This implementation introduces a comprehensive evaluation system for job recommendations based on resumes.

 The system leverages **Discounted Cumulative Gain (DCG)** and **Normalized Discounted Cumulative Gain (NDCG)** to measure the quality of recommendations. Additionally, precision and recall metrics are calculated for further analysis.

In [32]:
from typing import List, Dict
import numpy as np
import math


class ResumeProcessor:
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0.2)

    def process_resume(self, resume_texts: List[str]) -> str:
        """Process text using already refined resume_texts"""
        return " ".join(resume_texts)


class NDCGEvaluator:
    def __init__(self, model_name="gpt-4", temperature=0.2):
        self.llm = ChatOpenAI(model=model_name, temperature=temperature)

        # Ground truth generation prompt
        self.relevance_prompt = ChatPromptTemplate.from_template(
            """
        As an expert recruiter, evaluate the relevance between this resume and job posting.
        Consider technical skills, experience level, and overall fit.

        Resume:
        {resume_text}

        Job Posting:
        {job_text}

        Rate the relevance on a scale of 0 to 1, where:
        1.0: Perfect match
        0.8: Very good match
        0.6: Good match
        0.4: Moderate match
        0.2: Poor match
        0.0: No match

        Provide only the numerical score, nothing else.
        """
        )

    def calculate_dcg(self, relevance_scores: List[float], k: int = None) -> float:
        """Calculate Discounted Cumulative Gain
        Formula: DCG = sum(rel_i / log2(i + 2)) where rel_i is the relevance of item i
        """
        if k is None:
            k = len(relevance_scores)
        else:
            k = min(k, len(relevance_scores))

        dcg = 0.0
        for i in range(k):
            # 2^rel - 1 is commonly used for NDCG calculation to emphasize relevant items
            rel = 2 ** relevance_scores[i] - 1
            dcg += rel / math.log2(i + 2)
        return dcg

    def calculate_ndcg(
        self, predicted_scores: List[float], ideal_scores: List[float], k: int = None
    ) -> float:
        """Calculate Normalized Discounted Cumulative Gain
        NDCG = DCG / IDCG where IDCG is DCG of ideal ordering
        """
        if not predicted_scores or not ideal_scores:
            return 0.0

        if k is None:
            k = len(predicted_scores)
        k = min(k, len(predicted_scores))

        # Sort ideal scores in descending order
        ideal_scores_sorted = sorted(ideal_scores, reverse=True)

        dcg = self.calculate_dcg(predicted_scores[:k], k)
        idcg = self.calculate_dcg(ideal_scores_sorted[:k], k)

        if idcg == 0:
            return 0.0

        ndcg = dcg / idcg
        # Ensure NDCG is between 0 and 1
        return max(0.0, min(1.0, ndcg))

    def generate_ground_truth(
        self, resume_text: str, job_postings: List[Dict]
    ) -> Dict[str, float]:
        """Generate ground truth relevance scores using LLM"""
        ground_truth = {}

        for job in job_postings:
            if job["company"] == "Unknown":
                continue

            messages = self.relevance_prompt.format_messages(
                resume_text=resume_text, job_text=job["description"]
            )

            response = self.llm.invoke(messages)
            try:
                score = float(response.content.strip())
                ground_truth[job["company"]] = score
            except ValueError:
                print(f"Error parsing score for company {job['company']}")
                ground_truth[job["company"]] = 0.0

        return ground_truth

    def normalize_scores(self, scores: List[float]) -> List[float]:
        """Normalize scores to 0-1 range"""
        if not scores:
            return scores

        min_score = min(scores)
        max_score = max(scores)

        if max_score == min_score:
            return [1.0 for _ in scores]

        return [(score - min_score) / (max_score - min_score) for score in scores]

    def evaluate_recommendations(
        self, resume_text: str, recommended_jobs: List[Dict], k: int = None
    ) -> Dict[str, float]:
        """Evaluate recommendations using NDCG"""
        # Filter out Unknown companies
        valid_jobs = [job for job in recommended_jobs if job["company"] != "Unknown"]

        # Generate ground truth scores
        ground_truth = self.generate_ground_truth(resume_text, valid_jobs)

        # Get predicted scores and normalize them
        predicted_scores = [job["similarity"] for job in valid_jobs]
        predicted_scores = self.normalize_scores(predicted_scores)

        # Get ideal scores in the same order as predictions
        ideal_scores = [ground_truth[job["company"]] for job in valid_jobs]

        # Calculate NDCG
        ndcg_score = self.calculate_ndcg(predicted_scores, ideal_scores, k)

        # Additional metrics
        if k is None:
            k = len(valid_jobs)

        # Calculate precision and recall using threshold of 0.6 for relevance
        relevant_recommended = sum(1 for score in ideal_scores[:k] if score >= 0.6)
        total_relevant = sum(1 for score in ground_truth.values() if score >= 0.6)

        precision_at_k = relevant_recommended / k if k > 0 else 0
        recall_at_k = relevant_recommended / total_relevant if total_relevant > 0 else 0

        return {
            "ndcg": ndcg_score,
            "precision@k": precision_at_k,
            "recall@k": recall_at_k,
            "ground_truth": ground_truth,
            "normalized_predictions": dict(
                zip([job["company"] for job in valid_jobs], predicted_scores)
            ),
        }


def print_evaluation_results(metrics: Dict[str, float], recommended_jobs: List[Dict]):
    """Print detailed evaluation results"""
    print("\n=== Recommendation Evaluation Results ===")
    print(f"NDCG Score: {metrics['ndcg']:.3f}")
    print(f"Precision@k: {metrics['precision@k']:.3f}")
    print(f"Recall@k: {metrics['recall@k']:.3f}")

    print("\nDetailed Company Scores:")
    print("=" * 80)
    print(f"{'Company':<30} {'Original':<10} {'Normalized':<10} {'Ground Truth':<10}")
    print("-" * 80)

    for job in recommended_jobs:
        company = job["company"]
        if company == "Unknown":
            continue

        original = job["similarity"]
        normalized = metrics["normalized_predictions"].get(company, 0.0)
        ground_truth = metrics["ground_truth"].get(company, 0.0)
        print(
            f"{company:<30} {original:<10.3f} {normalized:<10.3f} {ground_truth:<10.3f}"
        )

Excute Evaluation

In [33]:
# Use existing processed resume_texts
resume_processor = ResumeProcessor()
resume_text = resume_processor.process_resume(resume_texts)

evaluator = NDCGEvaluator()
metrics = evaluator.evaluate_recommendations(
    resume_text=resume_text,
    recommended_jobs=recommended_jobs,
    k=5,  # Evaluate top-5 recommendations
)

# Print evaluation results
print_evaluation_results(metrics, recommended_jobs)


=== Recommendation Evaluation Results ===
NDCG Score: 0.000
Precision@k: 0.000
Recall@k: 0.000

Detailed Company Scores:
Company                        Original   Normalized Ground Truth
--------------------------------------------------------------------------------


## LLM-Based Resume Evaluation System

In this section, we implement a system that utilizes a `Large Language Model (LLM)` to compare and analyze resumes against job requirements.

---

What is `LLM-as-a-Judge`?

The `LLM-as-a-Judge` system leverages the advanced reasoning and natural language understanding capabilities of an LLM to serve as an impartial evaluator in the hiring process. By acting as a "judge," the LLM compares a candidate’s resume to job requirements, evaluates their alignment, and provides actionable feedback.

Key features of the `LLM-as-a-Judge` system include:
- `Contextual Understanding`: It comprehends detailed job descriptions and resumes beyond simple keyword matching, enabling nuanced evaluations.  
- `Feedback Generation`: Provides insights into the candidate's strengths and areas for improvement.  
- `Decision Support`: Assists hiring managers or applicants by generating a recommendation on the candidate's suitability for the role.

This system bridges the gap between human evaluation and automated analysis, ensuring more accurate and tailored results in the recruitment process.

---

Functionalities

The `LLM-as-a-Judge` system provides the following functionalities:

- `Detailed Analysis`: Analyzes resumes and job requirements in detail, identifying key qualifications and expectations.  
- `Alignment Evaluation`: Assesses how well the candidate's skills and experiences match the job requirements.  
- `Strengths and Improvement Areas`: Identifies the candidate's strengths and offers suggestions for improvement.  
- `Role Suitability Recommendation`: Provides a final recommendation on whether the candidate is a good fit for the role.  


LLM-Based Resume Evaluation System

This system leverages a **Large Language Model (LLM)** to evaluate resumes against job descriptions systematically. It provides detailed feedback based on predefined evaluation criteria, helping candidates understand their strengths, areas for improvement, and overall suitability for specific roles.

In [34]:
from pydantic import BaseModel, Field


# Define Pydantic Models
class CriterionEvaluation(BaseModel):
    """Evaluation result for individual criteria"""

    score: int = Field(description="Evaluation score (1-5)")
    reasoning: str = Field(description="Reasoning behind the score")
    evidence: List[str] = Field(description="Evidence found in the resume")
    suggestions: List[str] = Field(description="Suggestions for improvement")


class DetailedEvaluation(BaseModel):
    """Detailed evaluation results"""

    technical_fit: CriterionEvaluation
    experience_relevance: CriterionEvaluation
    industry_knowledge: CriterionEvaluation
    education_qualification: CriterionEvaluation
    soft_skills: CriterionEvaluation
    overall_score: int = Field(description="Overall score (0-100)")
    key_strengths: List[str] = Field(description="Key strengths")
    improvement_areas: List[str] = Field(description="Areas for improvement")
    final_recommendation: str = Field(description="Final recommendation")


class LLMJudge:
    def __init__(self, model_name="gpt-4o", temperature=0.1):
        self.llm = ChatOpenAI(model=model_name, temperature=temperature)
        self.parser = PydanticOutputParser(pydantic_object=DetailedEvaluation)

        # Define evaluation criteria
        self.evaluation_criteria = {
            "technical_fit": {
                "weight": 30,
                "description": "Evaluation of technical fit",
                "subcriteria": [
                    "required_skills_match",
                    "tech_stack_relevance",
                    "skill_proficiency",
                ],
            },
            "experience_relevance": {
                "weight": 25,
                "description": "Evaluation of experience relevance",
                "subcriteria": ["role_similarity", "impact_scale", "problem_solving"],
            },
            "industry_knowledge": {
                "weight": 15,
                "description": "Evaluation of industry knowledge",
                "subcriteria": [
                    "domain_expertise",
                    "trend_awareness",
                    "industry_exposure",
                ],
            },
            "education_qualification": {
                "weight": 15,
                "description": "Evaluation of education and qualifications",
                "subcriteria": [
                    "degree_relevance",
                    "certifications",
                    "continuous_learning",
                ],
            },
            "soft_skills": {
                "weight": 15,
                "description": "Evaluation of soft skills",
                "subcriteria": [
                    "leadership_teamwork",
                    "communication",
                    "problem_approach",
                ],
            },
        }

        # Evaluation prompt template
        self.prompt_template = """You are a professional hiring evaluator.
        Evaluate the provided resume objectively and fairly based on the following criteria.

        Job Information:
        Company: {company_name}
        Position: {position}
        Job Description: {job_description}

        Resume Content:
        {resume_text}

        Evaluation Criteria:
        {evaluation_criteria}

        Guidelines for Evaluation:
        1. Assign a score from 1-5 for each evaluation area and provide detailed reasoning.
        2. Scoring criteria:
           5: Outstanding - Exceeds expectations significantly
           4: Excellent - Meets and slightly exceeds expectations
           3: Adequate - Meets expectations
           2: Needs Improvement - Falls slightly short of expectations
           1: Poor - Falls significantly short of expectations
        3. Provide specific evidence found in the resume for each area.
        4. Offer concrete suggestions for improvement.

        Provide the evaluation results in the following format:
        {format_instructions}
        """

        self.prompt = ChatPromptTemplate.from_template(
            template=self.prompt_template,
            partial_variables={
                "format_instructions": self.parser.get_format_instructions(),
                "evaluation_criteria": json.dumps(
                    self.evaluation_criteria, indent=2, ensure_ascii=False
                ),
            },
        )

    def evaluate(self, resume_text: str, job_info: dict) -> DetailedEvaluation:
        """Perform resume evaluation"""
        try:
            messages = self.prompt.format_messages(
                company_name=job_info.get("company", "Unknown"),
                position=job_info.get("position", "Unknown"),
                job_description=job_info.get("description", ""),
                resume_text=resume_text,
            )

            response = self.llm.invoke(messages)
            evaluation = self.parser.parse(response.content)
            return evaluation

        except Exception as e:
            print(f"Error during evaluation: {str(e)}")
            raise


class ResumeEvaluationSystem:
    def __init__(self):
        self.resume_processor = ResumeProcessor()
        self.judge = LLMJudge()

    def evaluate_with_recommendations(
        self, resume_path: str, recommended_jobs: List[dict], top_n: int = 3
    ) -> List[Dict]:
        """Evaluate the resume for the recommended jobs"""
        # Extract resume text
        resume_text = self.resume_processor.process_resume(resume_path)

        # Select top N jobs
        sorted_jobs = sorted(
            recommended_jobs, key=lambda x: x["similarity"], reverse=True
        )[:top_n]
        evaluations = []

        for job in sorted_jobs:
            job_info = {
                "company": job["company"],
                "position": job["title"],
                "description": job["description"],
                "similarity_score": job["similarity"],
            }

            try:
                # Perform evaluation
                evaluation = self.judge.evaluate(resume_text, job_info)

                # Generate evaluation report
                report = format_evaluation_report(evaluation)

                evaluations.append(
                    {"job_info": job_info, "evaluation": evaluation, "report": report}
                )

            except Exception as e:
                print(f"Error evaluating for {job_info['company']}: {str(e)}")
                continue

        return evaluations


def format_evaluation_report(evaluation: DetailedEvaluation) -> str:
    """Format evaluation results into a report"""
    output = []
    output.append("\n📊 Resume Evaluation Report")
    output.append("=" * 50)

    output.append(f"\n💡 Overall Score: {evaluation.overall_score}/100\n")

    # Evaluation by criteria
    criteria_items = [
        ("🔧 Technical Fit (30%)", evaluation.technical_fit),
        ("👔 Experience Relevance (25%)", evaluation.experience_relevance),
        ("🎯 Industry Knowledge (15%)", evaluation.industry_knowledge),
        ("📚 Education Qualification (15%)", evaluation.education_qualification),
        ("🤝 Soft Skills (15%)", evaluation.soft_skills),
    ]

    for title, criterion in criteria_items:
        output.append(f"\n{title}")
        output.append(f"Score: {criterion.score}/5")
        output.append(f"Reasoning: {criterion.reasoning}")
        output.append("Evidence Found:")
        for evidence in criterion.evidence:
            output.append(f"  • {evidence}")
        output.append("Suggestions:")
        for suggestion in criterion.suggestions:
            output.append(f"  • {suggestion}")

    # Overall evaluation
    output.append("\n📋 Overall Evaluation")
    output.append("-" * 30)

    output.append("\n💪 Key Strengths:")
    for strength in evaluation.key_strengths:
        output.append(f"  • {strength}")

    output.append("\n📈 Areas for Improvement:")
    for area in evaluation.improvement_areas:
        output.append(f"  • {area}")

    output.append("\n🎯 Final Recommendation:")
    output.append(f"{evaluation.final_recommendation}")

    return "\n".join(output)


def print_comprehensive_report(evaluations: List[Dict]):
    """Display the complete evaluation results"""
    print("\n" + "=" * 80)
    print("📋 Comprehensive Resume Evaluation Report")
    print("=" * 80)

    for idx, eval_result in enumerate(evaluations, 1):
        job_info = eval_result["job_info"]
        evaluation = eval_result["evaluation"]

        print(f"\n{idx}. {job_info['company']} - {job_info['position']}")
        print(f"Recommendation Similarity Score: {job_info['similarity_score']:.2f}")
        print("-" * 50)
        print(eval_result["report"])
        print("\n" + "=" * 80)

Excute Evaluation

In [35]:
# Resume file path
resume_path = "../data/joannadrummond-cv.pdf"

# Initialize evaluation system
evaluation_system = ResumeEvaluationSystem()

# First, get the resume text
resume_chunks = process_resume(resume_path)
resume_text = " ".join([chunk[0] for chunk in resume_chunks])

# Perform resume evaluation
print("Evaluating resume...")
evaluations = evaluation_system.evaluate_with_recommendations(
    resume_text,  # Pass the actual resume text instead of the path
    recommended_jobs=recommended_jobs,
    top_n=1,
)

# Print comprehensive report
print_comprehensive_report(evaluations)

Resume analysis completed.
Number of extracted chunks: 6

Career Analysis Summary:
------------------------
Interests: Joanna Drummond's primary academic interests and research focus lie in computer science, particularly in algorithms, artificial intelligence, and game theory. Her research has extensively explored stable matching problems, preference elicitation, and multi-agent systems, with a strong emphasis on decision-making under uncertainty and the application of machine learning techniques to educational technologies. Her career pattern reflects a consistent engagement with theoretical and applied aspects of computer science, evidenced by her work on stable matchings and dialogue systems, as well as her involvement in teaching and research internships.

Recommended Roles:
Evaluating resume...

📋 Comprehensive Resume Evaluation Report


## LLM-Based Resume Revise System

This tutorial demonstrates how to create a system that evaluates and improves resumes using a **Large Language Model (LLM)**. 

The system provides actionable suggestions to optimize resumes for specific job descriptions, enhancing the candidate’s chances of securing a role.

---

Key Components

1. **EnhancementSuggestion Model**
The `EnhancementSuggestion` model defines the structure for improvement suggestions:
- **`section`**: The specific resume section being improved (e.g., "Skills" or "Work Experience").
- **`current_content`**: The original content of the section.
- **`improved_content`**: The suggested improvement for the section.
- **`explanation`**: A detailed explanation of why the improvement is recommended.

---

2. **ResumeEnhancement Model**
The `ResumeEnhancement` model provides a holistic improvement report:
- **`improvements`**: A list of section-specific suggestions.
- **`keyword_optimization`**: Suggested keywords to include in the resume for optimization.
- **`general_suggestions`**: Overall suggestions for structure and presentation.
- **`action_items`**: Practical, actionable items for the candidate to implement.

---

3. **ResumeEnhancementSystem**
This class integrates LLM-based analysis to generate detailed improvement suggestions for resumes.

In [36]:
from langchain.output_parsers import PydanticOutputParser


class EnhancementSuggestion(BaseModel):
    """Suggestions for improvement for each resume section"""

    section: str = Field(description="Resume section")
    current_content: str = Field(description="Current content")
    improved_content: str = Field(description="Suggested improvement")
    explanation: str = Field(description="Reason for the improvement and explanation")


class ResumeEnhancement(BaseModel):
    """Overall suggestions for resume improvement"""

    improvements: List[EnhancementSuggestion] = Field(
        description="Suggestions for each section"
    )
    keyword_optimization: List[str] = Field(description="Keywords to optimize")
    general_suggestions: List[str] = Field(description="General suggestions")
    action_items: List[str] = Field(description="Actionable items")


class ResumeEnhancementSystem:
    def __init__(self, model_name="gpt-4o", temperature=0.1):
        self.llm = ChatOpenAI(model=model_name, temperature=temperature)
        self.parser = PydanticOutputParser(pydantic_object=ResumeEnhancement)

        # Prompt template for generating improvement suggestions
        self.prompt_template = """You are a professional resume consultant.
        Based on the provided evaluation results, offer detailed and actionable suggestions for improving the resume.

        Current Resume:
        {resume_text}

        Evaluation Results:
        {evaluation_results}

        Job Information:
        {job_info}

        Please include the following considerations when making your suggestions:
        1. Specific improvement suggestions for each section
        2. Key job-related keywords
        3. General structural and expression improvements
        4. Short-term and long-term actionable items

        Pay particular attention to the following:
        - Emphasize areas with high scores
        - Provide concrete solutions for areas with low scores
        - Tailor suggestions to the characteristics of the job
        - Ensure realistic and actionable recommendations

        Provide the improvement suggestions in the following format:
        {format_instructions}
        """

        self.prompt = ChatPromptTemplate.from_template(
            template=self.prompt_template,
            partial_variables={
                "format_instructions": self.parser.get_format_instructions()
            },
        )

    def generate_improvements(
        self, resume_text: str, evaluation_results: List[Dict], job_info: Dict
    ) -> ResumeEnhancement:
        """Generate improvement suggestions based on the evaluation results"""
        try:
            # Serialize evaluation_results (if DetailedEvaluation objects are included)
            evaluation_data = [
                (
                    eval_result.model_dump()
                    if hasattr(eval_result, "model_dump")
                    else eval_result
                )
                for eval_result in evaluation_results
            ]

            messages = self.prompt.format_messages(
                resume_text=resume_text,
                evaluation_results=json.dumps(
                    evaluation_data, ensure_ascii=False, indent=2
                ),
                job_info=json.dumps(job_info, ensure_ascii=False, indent=2),
            )

            response = self.llm.invoke(messages)
            suggestions = self.parser.parse(response.content)
            return suggestions

        except Exception as e:
            print(f"Error while generating improvement suggestions: {str(e)}")
            raise


def format_enhancement_report(enhancement: ResumeEnhancement) -> str:
    """Format the improvement suggestions into a report"""
    output = []
    output.append("\n📝 Resume Improvement Report")
    output.append("=" * 50)

    # Section-specific suggestions
    output.append("\n📋 Section-Specific Improvements")
    output.append("-" * 30)
    for improvement in enhancement.improvements:
        output.append(f"\n[{improvement.section}]")
        output.append("Current:")
        output.append(f"  {improvement.current_content}")
        output.append("Improved:")
        output.append(f"  {improvement.improved_content}")
        output.append("Reason:")
        output.append(f"  {improvement.explanation}")

    # Keyword optimization
    output.append("\n🔍 Recommended Keywords")
    output.append("-" * 30)
    for keyword in enhancement.keyword_optimization:
        output.append(f"• {keyword}")

    # General suggestions
    output.append("\n💡 General Suggestions")
    output.append("-" * 30)
    for suggestion in enhancement.general_suggestions:
        output.append(f"• {suggestion}")

    # Action items
    output.append("\n✅ Actionable Steps")
    output.append("-" * 30)
    for item in enhancement.action_items:
        output.append(f"• {item}")

    return "\n".join(output)


class IntegratedResumeSystem:
    """A system combining evaluation and improvement"""

    def __init__(self):
        self.evaluation_system = ResumeEvaluationSystem()
        self.enhancement_system = ResumeEnhancementSystem()

    def analyze_and_improve(
        self, resume_path: str, recommended_jobs: List[dict], top_n: int = 3
    ):
        """Perform integrated resume evaluation and improvement suggestions"""
        try:
            # First, process the resume to get the text content
            resume_chunks = process_resume(resume_path)
            resume_text = " ".join([chunk[0] for chunk in resume_chunks])

            # 1. Perform resume evaluation
            print("Evaluating the resume...")
            evaluations = self.evaluation_system.evaluate_with_recommendations(
                resume_text,  # Pass the processed text instead of path
                recommended_jobs=recommended_jobs,
                top_n=top_n,
            )

            # 2. Generate improvement suggestions for each recommended job
            print("Generating improvement suggestions...")
            improvements = []

            for eval_result in evaluations:
                job_info = eval_result["job_info"]
                evaluation = eval_result["evaluation"]

                # Generate improvement suggestions using the already processed resume text
                enhancement = self.enhancement_system.generate_improvements(
                    resume_text=resume_text,  # Use the processed text
                    evaluation_results=[evaluation.model_dump()],
                    job_info=job_info,
                )

                improvements.append(
                    {
                        "job_info": job_info,
                        "evaluation": evaluation,
                        "enhancement": enhancement,
                    }
                )

            return improvements

        except Exception as e:
            print(f"Error during analysis and improvement: {str(e)}")
            raise

Excute Evaluation

you can choose how many jobs you want to evaluate by changing the `top_n` value.

In [37]:
# Resume file path
resume_path = "../data/joannadrummond-cv.pdf"

# Initialize the integrated system
system = IntegratedResumeSystem()

# Perform analysis and improvements
results = system.analyze_and_improve(
    resume_path=resume_path, recommended_jobs=recommended_jobs, top_n=3
)

# Display the results
for result in results:
    print(f"\nJob: {result['job_info']['position']} @ {result['job_info']['company']}")
    print("=" * 80)
    print("\n[Evaluation Results]")
    print(result["evaluation"])
    print("\n[Improvement Suggestions]")
    print(format_enhancement_report(result["enhancement"]))
    print("=" * 80)

Resume analysis completed.
Number of extracted chunks: 8

Career Analysis Summary:
------------------------
Interests: Joanna Drummond's main areas of interest and research focus are in computer science, specifically in algorithms, artificial intelligence, and stable matching problems. Her research has extensively explored topics such as stable and approximately stable matching using multi-attribute preference information, preference elicitation, and the application of machine learning techniques to educational technologies. She has also investigated decision-making under uncertainty and the impact of dialogue systems on learning, demonstrating a strong interest in the intersection of computational methods and human-centered applications.

Recommended Roles:
Evaluating the resume...
Generating improvement suggestions...
