# Evaluation

This notebook compares my workflow (parse the resume into sections, then score it) against a baseline that feeds the entire job description and the entire unparsed resume to an LLM and asks it to predict a score.
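As a rough illustration of the baseline, the sketch below builds a single prompt from the raw job description and the raw resume text. The template wording, the 1-5 scale, and the names `BASELINE_TEMPLATE` and `build_baseline_prompt` are assumptions for illustration; they are not taken from the `resume_scanner` package.

```python
# Hypothetical baseline prompt: the wording and the 1-5 scale are assumptions.
BASELINE_TEMPLATE = """You are an expert at evaluating resumes for a job opening.
Read the job description and the resume, then rate the fit from 1 (poor) to 5 (excellent).

Job description:
{job_desc}

Resume:
{resume_text}

Score:"""


def build_baseline_prompt(job_desc: str, resume_text: str) -> str:
    """Fill the baseline template with the raw, unparsed texts."""
    return BASELINE_TEMPLATE.format(
        job_desc=job_desc.strip(),
        resume_text=resume_text.strip(),
    )


prompt = build_baseline_prompt(" Senior SWE ", "Kevin\nBS CS")
print(prompt)
```

The resulting string would then be sent to the LLM as-is, with no section parsing in between.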

In [1]:
import json
import sys
import os
from pprint import pprint
from pydantic import BaseModel, RootModel, Field
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))

from resume_scanner.utils.with_structured_output import with_structured_output
from resume_scanner.parsing.parsing import parse_resume
from resume_scanner.parsing.job_parsing import parse_job_desc
from resume_scanner.scoring.scoring import score_resume

In [2]:
parsed_resume = parse_resume("../data/input/resumes/Kevin_resume.pdf")

In [3]:
pprint(parsed_resume)

{'Education': [{'Coursework': ['Data Structures & Algorithms',
                               'Software Engineering',
                               'Computer Systems',
                               'Discrete Math',
                               'Linear Algebra'],
                'Graduation Year': 2026,
                'Honors': ['Dean’s Honor Roll',
                           'Engineering Honors (EH)',
                           'Dean’s Excellence Award Semi-finalist'],
                'Majors': ['BS in Computer Science'],
                'Minors': ['Statistics', 'Math'],
                'Name': 'Texas A&M University'}],
 'Projects': [{'Contributions': ['Developed LLM-based credit card '
                                 'recommendation system',
                                 'Devised chain-of-thought (CoT) '
                                 'recommendation prompt',
                                 'Augmented recommendations with self-curated '
                                 'JS

In [4]:
with open("../data/input/jobs/google_swe_senior.txt", "r") as file:
    job_desc = file.read()

In [10]:
FINAL_SCORING_TEMPLATE = """
### Instruction

You are an expert at evaluating resumes for a job opening. Your goal is to score each resume based on its alignment with the provided job description and parsed sections. Resumes are parsed into the following sections: Experience, Education, Projects, Leadership, Research, and Skills. Not all resumes will include every section. Use the scoring criteria below to evaluate each section and calculate an overall score. Provide concise, explainable feedback for each score.

#### Scoring Criteria

For each section, score based on the following criteria:
1. **Relevance (0-5)**: How well does the content align with the job description?
2. **Depth (0-5)**: How substantial and well-developed is the content?
3. **Impact (0-5)**: Does the content demonstrate measurable outcomes or achievements?

If a section is missing:
- Assign `None` for a section if it is irrelevant to the job. For example "Research" may not be relevant for "Software Engineering" roles.
- Otherwise, assign a score of 0 for that section and note its absence. For example "Experience" is required for most "Software Engineering" roles.

For "Education":
- Relevance: Score highly if the degree is directly related to the job. Consider coursework or certifications that match the job requirements.
- Depth: Evaluate based on the inclusion of relevant coursework, academic achievements, or certifications.
- Impact: Look for measurable outcomes such as GPA (if relevant), honors, or academic recognition that highlight academic excellence.

#### Scoring Rubric

| Score | Meaning |
|-------|---------|
| **0** | **Not Applicable / Missing**: No content provided, or the section is irrelevant to the job. |
| **1** | **Poor**: Content exists but is highly generic, irrelevant, or underdeveloped. Little to no measurable impact is demonstrated. |
| **2** | **Below Average**: Somewhat relevant but lacks depth or specificity. Minimal impact or achievements are demonstrated. |
| **3** | **Average**: Content is moderately relevant, with adequate detail. Some measurable impact or effort is evident, but not exceptional. |
| **4** | **Good**: Content is highly relevant, well-detailed, and demonstrates meaningful contributions or achievements. Could be improved slightly to reach exceptional quality. |
| **5** | **Excellent**: Content is exceptionally relevant, detailed, and impactful, showcasing strong alignment with job requirements and significant measurable outcomes. |

---

### Task

1. Evaluate each section and assign scores for Relevance, Depth, and Impact (0-5 scale).
2. Provide comments justifying the scores for each section.
3. Provide brief overall comments on the resume.

---

### Output Format

```json
{{
   "Section Scores": {{
      "Experience": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }},
      "Education": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }},
      "Projects": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }},
      "Leadership": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }},
      "Research": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }},
      "Skills": {{
         "Relevance": [Score],
         "Depth": [Score],
         "Impact": [Score],
         "Comment": "[Explanation for the scores in this section]"
      }}
   }},
   "Overall Comment": "[General comments about the candidate's resume, including strengths, weaknesses, and alignment with the job description]"
}}
```

---

### Input

Job description:
```{job_desc}```

Work Experience:
```{work_experience}```

Education:
```{education}```

Projects:
```{projects}```

Leadership:
```{leadership}```

Research:
```{research}```

Skills:
```{skills}```

Output:
"""
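The template doubles its braces (`{{` and `}}`) around the JSON example because it is later filled with `str.format`, which treats a single brace as the start of a placeholder. A minimal sketch of that behavior, using a shortened stand-in template (`MINI_TEMPLATE` is a hypothetical name, not part of the notebook):

```python
# Shortened stand-in for the scoring template; only the brace handling matters here.
MINI_TEMPLATE = """Return JSON like:
{{
   "Relevance": [Score]
}}

Job description:
{job_desc}
"""

# Doubled braces become literal braces; {job_desc} is substituted.
rendered = MINI_TEMPLATE.format(job_desc="Senior Software Engineer")
print(rendered)
```

After formatting, the JSON skeleton contains ordinary single braces and only the named placeholder has been replaced.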

In [6]:
class SectionScore(BaseModel):
    relevance: int = Field(..., ge=0, le=5, alias="Relevance", description="Relevance score (0-5)")
    depth: int = Field(..., ge=0, le=5, alias="Depth", description="Depth score (0-5)")
    impact: int = Field(..., ge=0, le=5, alias="Impact", description="Impact score (0-5)")
    comment: str = Field(..., alias="Comment", description="Explanation for the scores in this section")

class ResumeEvaluation(BaseModel):
    section_scores: dict[str, SectionScore | None] = Field(
        ..., alias="Section Scores", description="Scores and comments for each section"
    )
    overall_comment: str = Field(
        ..., alias="Overall Comment", description="General comments about the resume, including strengths and weaknesses"
    )

In [12]:
resume = parsed_resume
prompt = FINAL_SCORING_TEMPLATE.format(
    job_desc=job_desc,
    work_experience=resume.get("Work Experience", "Unavailable."),
    education=resume.get("Education", "Unavailable."),
    projects=resume.get("Projects", "Unavailable."),
    leadership=resume.get("Leadership", "Unavailable."),
    research=resume.get("Research", "Unavailable."),
    skills=resume.get("Skills", "Unavailable.")
)
pprint(prompt)

('\n'
 '### Instruction\n'
 '\n'
 'You are an expert at evaluating resumes for a job opening. Your goal is to '
 'score each resume based on its alignment with the provided job description '
 'and parsed sections. Resumes are parsed into the following sections: '
 'Experience, Education, Projects, Leadership, Research, and Skills. Not all '
 'resumes will include every section. Use the scoring criteria below to '
 'evaluate each section and calculate an overall score. Provide concise, '
 'explainable feedback for each score.\n'
 '\n'
 '#### Scoring Criteria\n'
 '\n'
 'For each section, score based on the following criteria:\n'
 '1. **Relevance (0-5)**: How well does the content align with the job '
 'description?\n'
 '2. **Depth (0-5)**: How substantial and well-developed is the content?\n'
 '3. **Impact (0-5)**: Does the content demonstrate measurable outcomes or '
 'achievements?\n'
 '\n'
 'If a section is missing:\n'
 '- Assign `None` for a section if it is irrelevant to the job. Fo

In [7]:
def score_resume(resume: dict, job_desc: str):
    """Score a parsed resume against a job description.

    Note: this definition shadows the `score_resume` imported from
    `resume_scanner.scoring.scoring` in the first cell.
    """
    # Missing sections are passed to the prompt as "Unavailable."
    prompt = FINAL_SCORING_TEMPLATE.format(
        job_desc=job_desc,
        work_experience=resume.get("Work Experience", "Unavailable."),
        education=resume.get("Education", "Unavailable."),
        projects=resume.get("Projects", "Unavailable."),
        leadership=resume.get("Leadership", "Unavailable."),
        research=resume.get("Research", "Unavailable."),
        skills=resume.get("Skills", "Unavailable.")
    )
    return with_structured_output(prompt=prompt, schema=ResumeEvaluation)

In [11]:
# Avoid the name `eval`, which would shadow the Python built-in.
evaluation = score_resume(parsed_resume, job_desc)
pprint(evaluation)

{'Overall Comment': 'The candidate has a good balance of technical skills and '
                    'project experience. However, there is no relevant '
                    'education information or leadership/research experience.',
 'Section Scores': {'Education': {'Comment': 'No relevant education '
                                             'information',
                                  'Depth': 0,
                                  'Impact': 0,
                                  'Relevance': 0},
                    'Leadership': {'Comment': 'No leadership experience',
                                   'Depth': 0,
                                   'Impact': 0,
                                   'Relevance': 0},
                    'Projects': {'Comment': 'Two projects with moderate depth '
                                            'and high impact',
                                 'Depth': 2,
                                 'Impact': 3,
                                 'Rele
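The prompt asks for an overall score, but the output format only carries per-section scores. One possible way to aggregate them afterwards is sketched below; the averaging rule, the function name `overall_score`, and the example values are assumptions for illustration, not the notebook's method or its actual results.

```python
from statistics import mean
from typing import Optional


def overall_score(section_scores: dict) -> Optional[float]:
    """Average Relevance, Depth and Impact over every scored section.

    Sections mapped to None (judged irrelevant to the job) are skipped.
    Returns None when no section was scored at all.
    """
    per_section = [
        mean((s["Relevance"], s["Depth"], s["Impact"]))
        for s in section_scores.values()
        if s is not None
    ]
    return round(mean(per_section), 2) if per_section else None


# Made-up values in the same shape as "Section Scores".
example = {
    "Projects": {"Relevance": 4, "Depth": 2, "Impact": 3, "Comment": "..."},
    "Leadership": {"Relevance": 0, "Depth": 0, "Impact": 0, "Comment": "..."},
    "Research": None,
}
print(overall_score(example))  # 1.5
```

An unweighted mean treats every section equally; a weighted variant (for example, weighting Experience more heavily) would be a straightforward extension.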

## Parsing -> scoring

Parallel parsing: ~1:30
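The notebook does not show how the parallel parsing was run. Given the `ThreadPoolExecutor` and `as_completed` imports in the first cell, it could look roughly like the sketch below. `parse_all` and `fake_parse` are hypothetical names, and `fake_parse` is a stand-in so the example runs without the `resume_scanner` package.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_all(paths, parse_fn, max_workers=4):
    """Parse every path concurrently; return {path: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(parse_fn, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one resume fails
                results[path] = exc
    return results


# Stand-in parser (assumption); in the notebook this would be parse_resume.
def fake_parse(path):
    return {"source": path}


parsed = parse_all(["a.pdf", "b.pdf"], fake_parse)
print(parsed)
```

Threads suit this workload if each parse is dominated by waiting on LLM calls rather than by local computation.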

In [3]:
pprint(parsed_resume)

{'Education': [{'GPA': 4.0,
                'Graduation Year': 2026,
                'Majors': ['BS in Computer Science'],
                'Minors': ['Statistics', 'Math'],
                'Name': 'Texas A&M University'}],
 'Projects': [{'Contributions': ['Developed LLM-based credit card '
                                 'recommendation system',
                                 'Devised chain-of-thought (CoT) '
                                 'recommendation prompt',
                                 'Augmented recommendations with self-curated '
                                 'JSON database'],
               'Name': 'Credit Card Recommender (Best Financial Hack)',
               'Skills': ['Python', 'Jupyter', 'Flask', 'React', 'LangChain']},
              {'Contributions': ['Designed modular React components and '
                                 'managed application state',
                                 'Orchestrated integration of Google OAuth API',
                          

In [4]:
score_resume(parsed_resume, job_desc)

{'Reasoning': {'Skills': 'The candidate has a strong background in software development with experience in multiple programming languages. They have also worked on data structures and algorithms, which is a requirement for the position.',
  'Experience': "The candidate's experience in testing, maintaining, or launching software products aligns with the job requirements. However, they lack experience with state-of-the-art GenAI techniques and ML infrastructure, which are preferred qualifications.",
  'Education': "The candidate has a bachelor's degree, which meets the minimum qualification requirement. However, they do not have a master's degree or PhD in Computer Science or related technical field, which is a preferred qualification."},
 'Overall Assessment': 'The candidate has some relevant skills and experience for the position but lacks some of the preferred qualifications. They may be considered for an entry-level position or further training to meet the requirements.',
 'Score': 3