# Evaluation

Let's compare my workflow against a baseline: feed the entire job description and the entire resume (without parsing) to an LLM and ask it to predict a score.
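For reference, the baseline could look roughly like the sketch below. Everything in it is an assumption for illustration: the prompt wording, the 0 to 1 score scale, and the helper names `build_baseline_prompt` and `parse_baseline_score` are hypothetical and not part of `resume_scanner`. The LLM call itself is left out.

```python
import json

# Hypothetical baseline prompt: raw job description + raw resume text, no parsing.
BASELINE_TEMPLATE = """You are an expert at evaluating resumes for a job opening.
Score how well the resume fits the job description on a scale from 0 to 1.
Respond with JSON only: {{"score": <number between 0 and 1>, "comment": "<short explanation>"}}

Job description:
{job_desc}

Resume:
{resume_text}
"""


def build_baseline_prompt(job_desc: str, resume_text: str) -> str:
    """Fill the baseline template with the unparsed inputs."""
    return BASELINE_TEMPLATE.format(job_desc=job_desc, resume_text=resume_text)


def parse_baseline_score(reply: str) -> float:
    """Extract the score from the model's JSON reply and check its range."""
    score = float(json.loads(reply)["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score
```

Keeping the baseline on the same 0 to 1 scale as the weighted score computed at the end of this notebook would make the two numbers directly comparable.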

In [1]:
import json
import sys
import os
from pprint import pprint
from pydantic import BaseModel, RootModel, Field
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))

from resume_scanner.utils.with_structured_output import with_structured_output
from resume_scanner.parsing.parsing import parse_resume
from resume_scanner.parsing.job_parsing import parse_job_desc
from resume_scanner.scoring.scoring import score_resume

In [117]:
parsed_resume = parse_resume("../data/input/resumes/Kevin_resume.pdf")

In [118]:
pprint(parsed_resume)

{'Education': [{'Coursework': ['Data Structures & Algorithms',
                               'Software Engineering',
                               'Computer Systems',
                               'Discrete Math',
                               'Linear Algebra'],
                'Graduation Year': 2026,
                'Honors': ['Dean’s Honor Roll',
                           'Engineering Honors (EH)',
                           'Dean’s Excellence Award Semi-finalist'],
                'Majors': ['BS in Computer Science'],
                'Minors': ['Statistics', 'Math'],
                'Name': 'Texas A&M University'}],
 'Projects': [{'Contributions': ['Developed LLM-based credit card '
                                 'recommendation system',
                                 'Devised chain-of-thought (CoT) '
                                 'recommendation prompt',
                                 'Augmented recommendations with self-curated '
                                 'JS

In [103]:
with open("../data/input/jobs/google_swe_senior.txt", "r") as file:
    job_desc = file.read()

In [123]:
pprint(job_desc)

('Senior Software Engineer, AI/ML GenAI, Google Cloud AI\n'
 '\n'
 'Minimum qualifications:\n'
 '- Bachelor’s degree or equivalent practical experience.\n'
 '- 5 years of experience with software development in one or more programming '
 'languages, and with data structures/algorithms.\n'
 '- 3 years of experience testing, maintaining, or launching software '
 'products, and 1 year of experience with software design and architecture.\n'
 '- 3 years of experience with state of the art GenAI techniques (e.g., LLMs, '
 'Multi-Modal, Large Vision Models) or with GenAI-related concepts (language '
 'modeling, computer vision).\n'
 '- 3 years of experience with ML infrastructure (e.g., model deployment, '
 'model evaluation, optimization, data processing, debugging).\n'
 '\n'
 'Preferred qualifications:\n'
 "- Master's degree or PhD in Computer Science or related technical field.\n"
 '- 1 year of experience in a technical leadership role.\n'
 '- Experience developing accessible technologies.

In [80]:
FINAL_SCORING_TEMPLATE = """
### Instruction

You are an expert at evaluating resumes for a job opening. Your goal is to score each resume based on its alignment with the provided job description and parsed sections. Resumes are parsed into the following sections: Experience, Education, Projects, Leadership, Research, and Skills. Not all resumes will include every section. Use the scoring criteria below to evaluate each section and calculate an overall score. Provide concise, explainable feedback for each score.

#### Scoring Criteria

For each section, score based on the following criteria:
1. **Relevance (0-5)**: How well does the content align with the job description?
2. **Depth (0-5)**: How substantial and well-developed is the content?
3. **Impact (0-5)**: Does the content demonstrate measurable outcomes or achievements?

If a section is missing, assign a score of 0 for that section.

For "Education":
- Relevance: Score highly if the degree is directly related to the job. Consider coursework or certifications that match the job requirements.
- Depth: Evaluate based on the inclusion of relevant coursework, academic achievements, or certifications.
- Impact: Look for measurable outcomes such as GPA (if relevant), honors, or academic recognition that highlight academic excellence.

#### Scoring Rubric

| Score | Meaning |
|-------|---------|
| **0** | **Not Applicable / Missing**: No content provided, or the section is irrelevant to the job. |
| **1** | **Poor**: Content exists but is highly generic, irrelevant, or underdeveloped. Little to no measurable impact is demonstrated. |
| **2** | **Below Average**: Somewhat relevant but lacks depth or specificity. Minimal impact or achievements are demonstrated. |
| **3** | **Average**: Content is moderately relevant, with adequate detail. Some measurable impact or effort is evident, but not exceptional. |
| **4** | **Good**: Content is highly relevant, well-detailed, and demonstrates meaningful contributions or achievements. Could be improved slightly to reach exceptional quality. |
| **5** | **Excellent**: Content is exceptionally relevant, detailed, and impactful, showcasing strong alignment with job requirements and significant measurable outcomes. |

---

### Output Format

```json
{{
   "Experience": {{
      "Relevance": [Score],
      "Depth": [Score],
      "Impact": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Education": {{
      "Alignment": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Projects": {{
      "Relevance": [Score],
      "Depth": [Score],
      "Impact": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Leadership": {{
      "Relevance": [Score],
      "Depth": [Score],
      "Impact": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Research": {{
      "Relevance": [Score],
      "Depth": [Score],
      "Impact": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Skills": {{
      "Alignment": [Score],
      "Comment": "[Explanation for score that mentions specific requirements from job description]"
   }},
   "Overall Comment": "[General comments about the candidate's resume, including strengths, weaknesses, and alignment with the job description]"
}}
```

---

### Input

Work Experience:
```{work_experience}```

Education:
```{education}```

Projects:
```{projects}```

Leadership:
```{leadership}```

Research:
```{research}```

Skills:
```{skills}```

Job description:
```{job_desc}```

Output:
"""

In [99]:
class SectionScore(BaseModel):
    relevance: int  = Field(..., alias="Relevance", description="Relevance score (0-5)")
    depth: int      = Field(..., alias="Depth", description="Depth score (0-5)")
    impact: int     = Field(..., alias="Impact", description="Impact score (0-5)")
    comment: str    = Field(..., alias="Comment", description="Explanation for the scores in this section")
    
class ReducedSectionScore(BaseModel):
    alignment: int  = Field(..., alias="Alignment", description="Alignment score (0-5)")
    comment: str    = Field(..., alias="Comment", description="Explanation for the score in this section")

class ResumeEvaluation(BaseModel):
    experience: SectionScore         = Field(..., alias="Experience")
    education: ReducedSectionScore   = Field(..., alias="Education")
    projects: SectionScore           = Field(..., alias="Projects")
    leadership: SectionScore         = Field(..., alias="Leadership")
    research: SectionScore           = Field(..., alias="Research")
    skills: ReducedSectionScore      = Field(..., alias="Skills")
    overall_comment: str             = Field(
        ...,
        alias="Overall Comment",
        description="General comments about the resume, including strengths and weaknesses",
    )

In [77]:
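# Note: this definition shadows the score_resume imported from
# resume_scanner.scoring.scoring in the first cell.
def score_resume(resume: dict, job_desc: str):
    """Score a parsed resume against a job description via with_structured_output."""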
    evaluation = with_structured_output(
        prompt=FINAL_SCORING_TEMPLATE.format(
            job_desc=job_desc,
            work_experience=resume.get("Work Experience", "Unavailable."),
            education=resume.get("Education", "Unavailable."),
            projects=resume.get("Projects", "Unavailable."),
            leadership=resume.get("Leadership", "Unavailable."),
            research=resume.get("Research", "Unavailable."),
            skills=resume.get("Skills", "Unavailable.")
        ),
        schema=ResumeEvaluation
    )
    return evaluation

In [100]:
from openai import OpenAI

client = OpenAI()

def score_resume_gpt(resume: dict, job_desc: str):
    """Same prompt as score_resume, sent to GPT-4o and parsed into a ResumeEvaluation."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-11-20",
        messages=[
            {
                "role": "user",
                "content": FINAL_SCORING_TEMPLATE.format(
                    job_desc=job_desc,
                    work_experience=resume.get("Work Experience", "Unavailable."),
                    education=resume.get("Education", "Unavailable."),
                    projects=resume.get("Projects", "Unavailable."),
                    leadership=resume.get("Leadership", "Unavailable."),
                    research=resume.get("Research", "Unavailable."),
                    skills=resume.get("Skills", "Unavailable.")
                ),
            }
        ],
        response_format=ResumeEvaluation
    )
    
    return response.choices[0].message.parsed

In [119]:
eval = score_resume_gpt(parsed_resume, job_desc)

In [121]:
eval = eval.model_dump()

In [122]:
pprint(eval)

{'education': {'alignment': 5,
               'comment': 'Pursuing a BS in Computer Science with relevant '
                          'coursework supports the educational qualifications '
                          'for the position.'},
 'experience': {'comment': 'Significant experience aligns with GenAI '
                           'development and software engineering requirements '
                           'of the role, demonstrating impact through '
                           'measurable achievements.',
                'depth': 4,
                'impact': 4,
                'relevance': 4},
 'leadership': {'comment': 'No leadership experience is explicitly listed, '
                           'which is a preferred qualification for the role.',
                'depth': 0,
                'impact': 0,
                'relevance': 0},
 'overall_comment': "The candidate's resume highlights strong alignment with "
                    'the job through relevant experience, education, an
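Neither `SectionScore` nor `ReducedSectionScore` constrains its integers to the 0-5 range from the rubric, so a quick sanity check on the dumped dictionary may be worthwhile before aggregating. The helper below is a minimal sketch; `out_of_range_scores` is a hypothetical name, and the `sample` values are made up to show a violation rather than taken from the run above.

```python
def out_of_range_scores(evaluation: dict, low: int = 0, high: int = 5) -> list:
    """Return (section, field, value) for every integer score outside [low, high]."""
    problems = []
    for section, fields in evaluation.items():
        if not isinstance(fields, dict):  # e.g. 'overall_comment' is a plain string
            continue
        for field, value in fields.items():
            if isinstance(value, int) and not low <= value <= high:
                problems.append((section, field, value))
    return problems


# Made-up evaluation with one deliberately invalid score.
sample = {
    "education": {"alignment": 5, "comment": "..."},
    "leadership": {"relevance": 0, "depth": 0, "impact": 7, "comment": "..."},
    "overall_comment": "...",
}
print(out_of_range_scores(sample))
```

An empty list means every score respects the rubric.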

## Assign weights to the sections

Let's ask the LLM to generate weights for each resume section based on the job description.

In [58]:
SECTION_WEIGHT_TEMPLATE = """
### Instruction

You are an expert at evaluating job descriptions. Given a job description, your job is to assign weights to resume sections that will be used to score resumes against the job description.

There are six resume sections:
1. Education
2. Work Experience
3. Projects
4. Leadership
5. Research
6. Skills

Assign a weight between 0 and 1 that determines how important each resume section is when evaluating a resume's fit to the provided job description.

**ALL WEIGHTS MUST SUM TO 1**

Please think step-by-step and output your reasoning in the "Reasoning" section before assigning weights. **You must explicitly sum the weights you provided and validate that they add up to 1. If they do not, please re-compute the weights.**

### Output Format

```json
{{
    "Reasoning": "<Reasoning leading to weight assignments AND validation that weights add up to 1>",
    "Education": <weight between 0 and 1>,
    "Work Experience": <weight between 0 and 1>,
    "Projects": <weight between 0 and 1>,
    "Leadership": <weight between 0 and 1>,
    "Research": <weight between 0 and 1>,
    "Skills": <weight between 0 and 1>
}}
```

---

### Input

Job Description:
{job_desc}

Output:
"""

In [39]:
class ResumeWeights(BaseModel):
    reasoning: str = Field(..., alias="Reasoning")
    education: float = Field(..., ge=0, le=1, alias="Education")
    experience: float = Field(..., ge=0, le=1, alias="Work Experience")
    projects: float = Field(..., ge=0, le=1, alias="Projects")
    leadership: float = Field(..., ge=0, le=1, alias="Leadership")
    research: float = Field(..., ge=0, le=1, alias="Research")
    skills: float = Field(..., ge=0, le=1, alias="Skills")

In [89]:
weights = with_structured_output(
    prompt=SECTION_WEIGHT_TEMPLATE.format(job_desc=job_desc),
    schema=ResumeWeights
)

In [90]:
pprint(weights)

{'Education': 0.15,
 'Leadership': 0.05,
 'Projects': 0.1,
 'Reasoning': 'Based on the job description, we can infer that the most '
              'important qualifications for this position are related to '
              'software development, AI/ML, and Google Cloud expertise. The '
              'minimum qualifications require 5 years of experience with '
              'software development and 3 years of experience with '
              'state-of-the-art GenAI techniques or concepts. The preferred '
              "qualifications include a Master's degree or PhD in Computer "
              'Science or related technical field, and 1 year of experience in '
              'a technical leadership role. Given these requirements, we can '
              'assign weights to each resume section as follows:\n'
              '\n'
              '* Education: 0.15 (a Bachelor’s degree is the minimum '
              'qualification, but a Master’s degree or PhD is preferred)\n'
              '* Work
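The prompt asks the model to make the weights sum to 1, but `ResumeWeights` only bounds each weight individually, so the total is not enforced anywhere. One option is to rescale the weights after the call. The sketch below assumes the result is a plain dict keyed by the aliases, as in the output above; `normalize_weights` is a hypothetical helper and the sample values are illustrative, not the model's actual output.

```python
import math


def normalize_weights(weights: dict) -> dict:
    """Rescale numeric section weights so they sum to 1.

    Non-numeric entries such as 'Reasoning' are dropped.
    """
    numeric = {
        key: float(value)
        for key, value in weights.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    total = sum(numeric.values())
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    return {key: value / total for key, value in numeric.items()}


# Illustrative weights that sum to 1.2 instead of 1.
raw = {
    "Reasoning": "...",
    "Education": 0.2,
    "Work Experience": 0.5,
    "Projects": 0.2,
    "Leadership": 0.1,
    "Research": 0.1,
    "Skills": 0.1,
}
normalized = normalize_weights(raw)
print(math.isclose(sum(normalized.values()), 1.0))
```

Rescaling keeps the relative importance of the sections intact while guaranteeing that the final score stays within 0 to 1.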

## Final Scoring

In [124]:
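# Rescale each section to [0, 1] (divide by 5 for single-score sections,
# by 15 for the three-score sections), then apply the section weight.
edu_score = eval["education"]["alignment"] / 5 * weights["Education"]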
exp_score = (eval["experience"]["depth"] + eval["experience"]["impact"] + eval["experience"]["relevance"]) / 15 * weights["Work Experience"]
proj_score = (eval["projects"]["depth"] + eval["projects"]["impact"] + eval["projects"]["relevance"]) / 15 * weights["Projects"]
research_score = (eval["research"]["depth"] + eval["research"]["impact"] + eval["research"]["relevance"]) / 15 * weights["Research"]
leadership_score = (eval["leadership"]["depth"] + eval["leadership"]["impact"] + eval["leadership"]["relevance"]) / 15 * weights["Leadership"]
skills_score = eval["skills"]["alignment"] / 5 * weights["Skills"]

final_score = edu_score + exp_score + proj_score + research_score + leadership_score + skills_score
final_score

0.69
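The per-section arithmetic above can also be written as one reusable function. This is a sketch under the assumption that the evaluation is the dumped dict keyed by field name and the weights are keyed by alias, as in the cells above; `section_fraction`, `final_score`, and `WEIGHT_KEYS` are hypothetical names, and the example values are made up.

```python
def section_fraction(section: dict) -> float:
    """Average the integer sub-scores of one section, rescaled from 0-5 to 0-1."""
    scores = [value for value in section.values() if isinstance(value, int)]
    return sum(scores) / (5 * len(scores)) if scores else 0.0


# Evaluation keys (model field names) -> weight keys (aliases).
WEIGHT_KEYS = {
    "education": "Education",
    "experience": "Work Experience",
    "projects": "Projects",
    "leadership": "Leadership",
    "research": "Research",
    "skills": "Skills",
}


def final_score(evaluation: dict, weights: dict) -> float:
    """Weighted sum of the rescaled section scores."""
    return sum(
        section_fraction(evaluation[name]) * weights[weight_key]
        for name, weight_key in WEIGHT_KEYS.items()
    )


# Made-up inputs: every section gets full marks, so the score equals the weight total.
three = {"relevance": 5, "depth": 5, "impact": 5, "comment": "..."}
one = {"alignment": 5, "comment": "..."}
example_eval = {
    "education": one, "experience": three, "projects": three,
    "leadership": three, "research": three, "skills": one,
    "overall_comment": "...",
}
example_weights = {
    "Education": 0.25, "Work Experience": 0.25, "Projects": 0.125,
    "Leadership": 0.125, "Research": 0.125, "Skills": 0.125,
}
print(final_score(example_eval, example_weights))
```

For a three-score section this reduces to `(relevance + depth + impact) / 15`, and for a single-score section to `alignment / 5`, matching the cell above.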