# **CV Checker by Omer Hausner**
---
---

# Introduction
---
In this notebook, I built a model that compares your CV (resume) against a job description and summarizes the strengths and weaknesses of your CV relative to the job's requirements. Finally, the model suggests concrete steps to improve your CV for the application.

# Imports & Installations

In [None]:
# Install pdfplumber (PDF text extraction) and docx2txt (DOCX text extraction) if not already installed
!uv pip install pdfplumber
!uv pip install docx2txt

In [None]:
import os

from openai import OpenAI
from IPython.display import Markdown, display, update_display

from dotenv import load_dotenv

import docx2txt
import pdfplumber



# Config

In [None]:
# Load OPENAI_API_KEY from a .env file and run a basic sanity check on its format

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    


# Preprocessing
---
In this section, we define the helper functions and variables needed by the final model.

# Scraper Utils

In [None]:
from bs4 import BeautifulSoup
import requests


# Standard headers to fetch a website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url):
    """
    Return the title and contents of the website at the given url;
    truncate to 3,000 characters as a sensible limit
    """
    response = requests.get(url, headers=headers, timeout=15)  # avoid hanging forever on a slow site
    soup = BeautifulSoup(response.content, "html.parser")
    # soup.title.string is None when <title> is empty or contains nested tags
    title = soup.title.string if soup.title and soup.title.string else "No title found"
    if soup.body:
        # Drop elements that carry no readable text for the LLM
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:3_000]


def fetch_website_links(url):
    """
    Return the links on the website at the given url
    I realize this is inefficient as we're parsing twice! This is to keep the code in the lab simple.
    Feel free to use a class and optimize it!
    """
    response = requests.get(url, headers=headers, timeout=15)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [link for link in links if link]
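The cleaning step in `fetch_website_contents` (drop `<script>`/`<style>` content, keep the visible text, prefix the `<title>`) can also be sketched with the standard library's `html.parser`, which makes the logic easy to test offline without `requests` or BeautifulSoup. This is a simplified, dependency-free illustration rather than a drop-in replacement: `TextExtractor` and `html_to_text` are hypothetical names, it does not restrict itself to `<body>` the way the original does, and it is less forgiving of malformed markup than BeautifulSoup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the <title> and visible text of an HTML page."""

    # Tags whose contents should never reach the LLM prompt
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace between tags and anything inside <script>/<style>
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def html_to_text(html, limit=3_000):
    """Return 'title\\n\\nvisible text', truncated to `limit` characters like fetch_website_contents."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = parser.title or "No title found"
    return (title + "\n\n" + "\n".join(parser.chunks))[:limit]
```

Keeping the parsing separate from the HTTP request like this means the cleaning rules can be checked on a hand-written HTML string before pointing the notebook at a live job posting.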


## Resume Guidelines
For this task, I chose a website containing resume-writing guidelines, which help the LLM take the basic requirements of a good resume into account.

In [None]:
# This URL contains guidelines on how to write a good resume.
# You may change it to any other URL you like, or set it to None if you don't want to use any guidelines.
resume_guidelines_url = "https://nationalcareers.service.gov.uk/careers-advice/cv-sections"

# Fetch the contents of the resume guidelines website.
# Note: set_system_message fetches the page again itself, so this cell is only a quick check that the URL works.
resume_guidelines = fetch_website_contents(resume_guidelines_url) if resume_guidelines_url else None

## Extract resume to text
We now define functions that take the file path of a resume and extract its contents as plain text.

Only Word (.docx) and PDF (.pdf) files are supported.

In [None]:
# Text Extraction Functions

def extract_text_from_pdf(pdf_path: str) -> str:
    """
    Extract the text of every page of a PDF file and return it as a single string.
    """
    with pdfplumber.open(pdf_path) as pdf:
        all_text = ''
        for page in pdf.pages:
            # extract_text() may return None for pages without a text layer (e.g. scanned images)
            all_text += (page.extract_text() or '') + '\n'

    return all_text

def extract_text_from_docx(docx_path: str) -> str:
    """
    Extract the text of a DOCX file and return it as a string.
    """
    all_text = docx2txt.process(docx_path)

    return all_text

def extract_text_from_resume(resume_path: str) -> str:
    """
    Extract text from a resume file (PDF or DOCX) and return it as a string.
    """
    _, file_extension = os.path.splitext(resume_path)
    if file_extension.lower() == '.pdf':
        text = extract_text_from_pdf(resume_path)
    elif file_extension.lower() == '.docx':
        text = extract_text_from_docx(resume_path)
    else:
        raise ValueError("Unsupported file format. Please provide a PDF or DOCX file.")

    return text
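`extract_text_from_resume` picks an extractor with an `if`/`elif` chain on the file extension. A dictionary that maps extensions to functions is a common alternative that turns adding a new format into a one-line change. The sketch below assumes `pathlib` for reading the suffix and uses hypothetical stand-in extractors (`fake_pdf`, `fake_docx`) so it runs without real files; in the notebook the dictionary would point at `extract_text_from_pdf` and `extract_text_from_docx` instead.

```python
from pathlib import Path


def fake_pdf(path):
    # Hypothetical stand-in for extract_text_from_pdf
    return f"pdf text from {path}"


def fake_docx(path):
    # Hypothetical stand-in for extract_text_from_docx
    return f"docx text from {path}"


# Map lower-case file extensions to the function that handles them
EXTRACTORS = {
    ".pdf": fake_pdf,
    ".docx": fake_docx,
}


def extract_text(resume_path, extractors=EXTRACTORS):
    """Dispatch on the (case-insensitive) extension, like extract_text_from_resume does."""
    suffix = Path(resume_path).suffix.lower()
    try:
        extractor = extractors[suffix]
    except KeyError:
        supported = ", ".join(sorted(extractors))
        raise ValueError(f"Unsupported file format {suffix!r}; expected one of: {supported}") from None
    return extractor(resume_path)
```

Because the supported extensions live in one place, the error message can list them automatically instead of repeating them in a hard-coded string.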


In [None]:
## Job description - Example
job_desc_url = "https://www.linkedin.com/jobs/view/4336621982/?alternateChannel=search&eBP=NOT_ELIGIBLE_FOR_CHARGING&trk=d_flagship3_search_srp_jobs&refId=%2BvHkE19BdH5zD0S1GtNrEg%3D%3D&trackingId=d7wYUZZ%2F%2BgVrwN%2FnD2sKlw%3D%3D"
job_description = fetch_website_contents(job_desc_url)  # Replace the URL above with the job posting you want to apply for

In [None]:
job_description

## Set system and user messages

In [None]:
# System Message
def set_system_message(resume_guidelines_url):
    resume_guidelines = fetch_website_contents(resume_guidelines_url) if resume_guidelines_url else None
    
    system_message = f"""
    You are an expert career advisor helping people improve their CVs (resumes) for specific job applications.
    Your task is to analyze a user's CV against a given job description and provide feedback on how well the CV matches the job requirements.
    You will provide a summary of strengths and weaknesses in the CV relative to the job description, and suggest specific improvements to better align the CV with the job.
    Also consider the overall presentation, clarity, structure and relevance of the CV content.
    Use the following guidelines to evaluate the CV:
    {resume_guidelines if resume_guidelines else "No specific guidelines provided."}
    Be concise and focus on actionable feedback, with up to 4 main bullet points in each section.

    Provide your response in the following format:
    1. General Feedback:
    2. Summary of Strengths:
    3. Summary of Weaknesses:
    4. Suggested Improvements:

    Respond in markdown format, using headings and bullet points where appropriate, and emojis to enhance readability.
    Keep the response concise, ideally under 300 words.
    
    """
    return system_message

In [None]:
def set_user_message(resume_path: str, job_description_url: str) -> str:
    job_description = fetch_website_contents(job_description_url)
    resume_text = extract_text_from_resume(resume_path)

    user_message = f"""
    Here is the job description:
    {job_description}

    Here is my CV:
    {resume_text}

    Please analyze the CV against the job description and provide your feedback.
    """
    return user_message
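Because the prompts are triple-quoted f-strings inside indented functions, every template line is sent to the model with the function's leading spaces. This is usually harmless, but `textwrap.dedent` can strip that indentation. The subtle point is ordering: the template has to be dedented *before* the CV and job text are inserted, since their unindented lines would otherwise leave no common indentation for `dedent` to remove. A minimal sketch, where `build_user_message` is a hypothetical variant of `set_user_message` that takes already-extracted text instead of a URL and a file path:

```python
import textwrap

# Dedent the template first, then fill it in; dedenting after interpolation
# would do nothing because the inserted texts start at column 0.
USER_TEMPLATE = textwrap.dedent(
    """\
    Here is the job description:
    {job_description}

    Here is my CV:
    {resume_text}

    Please analyze the CV against the job description and provide your feedback.
    """
)


def build_user_message(job_description, resume_text):
    """Hypothetical variant of set_user_message that receives plain text."""
    # str.format only parses the template, so braces inside the CV or job text are safe
    return USER_TEMPLATE.format(job_description=job_description, resume_text=resume_text)
```

Taking text rather than a URL and a path also makes the prompt easy to inspect without any network access.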

In [None]:
def set_messages(resume_guidelines_url, job_description_url, resume_path):
    system_message = set_system_message(resume_guidelines_url)
    user_message = set_user_message(resume_path, job_description_url)
    
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message}
    ]
    
    return messages

# Set Final Model

In [None]:
openai = OpenAI()
MODEL = 'gpt-4.1-mini'

In [None]:
def cv_checker_model(resume_guidelines_url, job_description_url, resume_path, model=MODEL):
    messages = set_messages(resume_guidelines_url, job_description_url, resume_path)
    response = openai.chat.completions.create(model=model, messages=messages)

    display(Markdown(response.choices[0].message.content))
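`cv_checker_model` waits for the full completion before rendering it. Since `update_display` is already imported, a streaming variant is a natural extension. The sketch below isolates the part that can be tested offline: it concatenates the `delta.content` pieces of streamed Chat Completions chunks and passes the growing text to a callback. `accumulate_stream` and `fake_chunk` are hypothetical names; the chunk shape (`chunk.choices[0].delta.content`, which can be `None`) follows the OpenAI Python SDK's streaming responses, and the commented wiring at the end is untested here because it needs an API key.

```python
from types import SimpleNamespace


def accumulate_stream(stream, on_update):
    """Concatenate streamed text deltas, calling on_update(text_so_far) after each new piece."""
    text = ""
    for chunk in stream:
        # Some chunks (e.g. a final usage-only chunk) carry no choices at all
        if not chunk.choices:
            continue
        # delta.content is None on role-only and finish chunks
        piece = chunk.choices[0].delta.content or ""
        if piece:
            text += piece
            on_update(text)
    return text


def fake_chunk(content):
    # Hypothetical stand-in shaped like a streamed chat completion chunk
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


# Possible wiring inside the notebook (requires an API key):
#   stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)
#   handle = display(Markdown(""), display_id=True)
#   accumulate_stream(stream, lambda t: update_display(Markdown(t), display_id=handle.display_id))
```

Passing the rendering step in as a callback keeps the accumulation logic independent of IPython, so the same function could print to a terminal instead.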

## Set Variables

In [None]:
resume_guidelines_url = "https://nationalcareers.service.gov.uk/careers-advice/cv-sections"

## CHANGE HERE: the job posting URL and the path to your resume (.pdf or .docx)
job_description_url = "https://www.linkedin.com/jobs/view/4336621982/?alternateChannel=search&eBP=NOT_ELIGIBLE_FOR_CHARGING&trk=d_flagship3_search_srp_jobs&refId=%2BvHkE19BdH5zD0S1GtNrEg%3D%3D&trackingId=d7wYUZZ%2F%2BgVrwN%2FnD2sKlw%3D%3D"
resume_path = r"enter_your_resume_path_here.pdf"


## Run Model

In [None]:
cv_checker_model(resume_guidelines_url, job_description_url, resume_path, model=MODEL)